I am looking for an efficient way to copy files on Linux, on a filesystem that supports copy-on-write (COW).
I want my implementation to use copy-on-write if possible, and otherwise fall back to other efficient variants. I also care about server-side copy (supported by SMB, NFS and others) and about zero-copy (i.e. bypassing the CPU or memory if possible).
(This question is not specific to any programming language. It could be C or C++, but also Python, Go, or any other language that has bindings to the OS syscalls or some other way to make a syscall. If this is confusing to you, just answer in C.)
It looks like ioctl_ficlonerange, ioctl_ficlone (i.e. ioctl with FICLONE or FICLONERANGE) support copy-on-write (COW). Specifically, FICLONE is used by GNU cp (here, via --reflink).
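To make the FICLONE path concrete, here is a minimal sketch of a whole-file clone in C. The helper name try_reflink is hypothetical, and the sketch assumes Linux headers that define FICLONE (the ioctl first appeared in Linux 4.5). On a filesystem without reflink support, or across filesystems, I would expect the ioctl to fail with an error such as EOPNOTSUPP, EINVAL or EXDEV, so the caller has to be prepared to fall back.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <linux/fs.h>   /* FICLONE */
#include <sys/ioctl.h>
#include <unistd.h>

/* Hypothetical helper: try to make dst_path a COW clone of src_path.
 * Returns 0 on success, -1 with errno set on failure. */
int try_reflink(const char *src_path, const char *dst_path)
{
    assert(src_path != NULL && dst_path != NULL);

    int src_fd = open(src_path, O_RDONLY);
    if (src_fd < 0)
        return -1;

    int dst_fd = open(dst_path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (dst_fd < 0) {
        int saved = errno;
        close(src_fd);
        errno = saved;
        return -1;
    }

    /* Ask the filesystem to share all of src's extents with dst. */
    int rc = ioctl(dst_fd, FICLONE, src_fd);

    /* Preserve the ioctl's errno across the close() calls. */
    int saved = errno;
    close(src_fd);
    close(dst_fd);
    errno = saved;
    return rc;
}
```

Note that on failure this sketch leaves an empty destination file behind; a real implementation would probably either remove it or reuse it for the fallback copy.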
Then there is also copy_file_range, which also seems to support COW and server-side copy.
(LWN about copy_file_range.)
It sounds as if copy_file_range is more generic (e.g. it supports server-side copy; I am not sure whether that is supported by FICLONE).
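For comparison, a minimal sketch of a whole-file copy with copy_file_range in C. The helper name copy_with_cfr and the 1 GiB chunk cap are my own assumptions, not anything prescribed by the API. The sketch assumes glibc 2.27 or later (which provides the wrapper), that both descriptors are positioned at offset 0, and that st_size is meaningful for the source. Because the call may copy fewer bytes than requested, it has to be made in a loop, with the remaining length tracked in an off_t.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

#define MAX_CHUNK ((size_t)1 << 30)  /* arbitrary per-call cap (assumption) */

/* Hypothetical helper: copy all of src_fd to dst_fd with copy_file_range.
 * Returns 0 on success, -1 with errno set on failure. */
int copy_with_cfr(int src_fd, int dst_fd)
{
    assert(src_fd >= 0 && dst_fd >= 0);

    struct stat st;
    if (fstat(src_fd, &st) < 0)
        return -1;

    off_t remaining = st.st_size;
    while (remaining > 0) {
        size_t chunk = remaining > (off_t)MAX_CHUNK ? MAX_CHUNK
                                                    : (size_t)remaining;
        /* NULL offsets: use and advance each descriptor's file offset. */
        ssize_t n = copy_file_range(src_fd, NULL, dst_fd, NULL, chunk, 0);
        if (n < 0)
            return -1;
        if (n == 0)
            break;  /* source ended earlier than fstat() reported */
        remaining -= n;
    }
    return 0;
}
```

Whether the kernel turns this into a reflink, a server-side copy, or a plain in-kernel copy is up to the filesystem; the caller cannot tell from the return value alone.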
However, copy_file_range seems to have some issues.
E.g. here, Paul Eggert comments:
[copy_file_range]'s man page says it uses a size_t (not off_t) to count the number of bytes to be copied, which is a strange choice for a file-copying API.
Are there situations where FICLONE would work better/different than copy_file_range?
Are there situations where FICLONE would work better/different than FICLONERANGE?
Specifically, assume that the underlying FS supports this and that you want to copy a whole file. My questions about these functions are:
Do they (FICLONE, FICLONERANGE, copy_file_range) always perform exactly the same operation? (Assuming the underlying FS supports copy-on-write and/or server-side copy.)
Or are there situations where it makes sense to use copy_file_range instead of FICLONE? (E.g. COW only works with copy_file_range but not with FICLONE, or the other way around. Or can this never happen?)
Or, to put the same question differently: would copy_file_range always be fine, or are there situations where I would want to use FICLONE instead?
Why does GNU cp use FICLONE and not copy_file_range? (Is there a technical reason, or is this just historic?)
Related: GNU cp originally did not use reflink by default (see comment by the GNU coreutils maintainer Pádraig Brady).
However, that was changed recently (this commit, bug report 24400), i.e. COW behavior is now the default if possible (--reflink=auto).
Related question about Python for COW support.
Related discussion about FICLONE vs copy_file_range by Python developers. I.e. this seems to be a valid question, and it's not totally clear whether to use FICLONE or copy_file_range.
Related Syncthing documentation about the choice of methods for copying data between files, and a Syncthing issue about copy_file_range and others for efficient file copying, e.g. with COW support.
It also suggests that it is not so clear that FICLONE would do the same as copy_file_range, so their solution is to just try all of them, falling back to the next one on failure, in this order:
ioctl (with FICLONE), copy_file_range, sendfile, duplicate_extents, standard.
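A rough sketch in C of what such a "try each method in turn" chain could look like on Linux. This is my own illustration, not Syncthing's code: all names are hypothetical, duplicate_extents is left out (as far as I understand, it is a Windows mechanism), and the sketch assumes both descriptors start at offset 0, the destination is empty and writable, and the source's st_size is meaningful.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <linux/fs.h>      /* FICLONE */
#include <sys/ioctl.h>
#include <sys/sendfile.h>
#include <sys/stat.h>
#include <unistd.h>

#define CHUNK ((size_t)1 << 30)  /* arbitrary per-call cap (assumption) */

/* Which step of the (hypothetical) chain ended up doing the copy. */
enum copy_method { COPY_FAILED = -1, COPY_FICLONE, COPY_CFR,
                   COPY_SENDFILE, COPY_READ_WRITE };

/* Undo a partial attempt: rewind both descriptors, empty the destination. */
static int reset_fds(int src_fd, int dst_fd)
{
    if (lseek(src_fd, 0, SEEK_SET) < 0 || lseek(dst_fd, 0, SEEK_SET) < 0)
        return -1;
    return ftruncate(dst_fd, 0);
}

static int via_copy_file_range(int src_fd, int dst_fd, off_t size)
{
    while (size > 0) {
        size_t chunk = size > (off_t)CHUNK ? CHUNK : (size_t)size;
        ssize_t n = copy_file_range(src_fd, NULL, dst_fd, NULL, chunk, 0);
        if (n <= 0)
            return -1;  /* error or unexpected EOF: let the caller fall back */
        size -= n;
    }
    return 0;
}

static int via_sendfile(int src_fd, int dst_fd, off_t size)
{
    while (size > 0) {
        size_t chunk = size > (off_t)CHUNK ? CHUNK : (size_t)size;
        ssize_t n = sendfile(dst_fd, src_fd, NULL, chunk);
        if (n <= 0)
            return -1;
        size -= n;
    }
    return 0;
}

static int via_read_write(int src_fd, int dst_fd)
{
    static char buf[1 << 16];
    for (;;) {
        ssize_t n = read(src_fd, buf, sizeof buf);
        if (n < 0) {
            if (errno == EINTR) continue;
            return -1;
        }
        if (n == 0)
            return 0;  /* end of file */
        for (ssize_t off = 0; off < n; ) {
            ssize_t w = write(dst_fd, buf + off, (size_t)(n - off));
            if (w < 0) {
                if (errno == EINTR) continue;
                return -1;
            }
            off += w;
        }
    }
}

/* Try FICLONE, then copy_file_range, then sendfile, then read/write. */
enum copy_method copy_file_fallback(int src_fd, int dst_fd)
{
    assert(src_fd >= 0 && dst_fd >= 0);
    struct stat st;
    if (fstat(src_fd, &st) < 0)
        return COPY_FAILED;
    if (ioctl(dst_fd, FICLONE, src_fd) == 0)
        return COPY_FICLONE;
    if (via_copy_file_range(src_fd, dst_fd, st.st_size) == 0)
        return COPY_CFR;
    if (reset_fds(src_fd, dst_fd) < 0)
        return COPY_FAILED;
    if (via_sendfile(src_fd, dst_fd, st.st_size) == 0)
        return COPY_SENDFILE;
    if (reset_fds(src_fd, dst_fd) < 0)
        return COPY_FAILED;
    return via_read_write(src_fd, dst_fd) == 0 ? COPY_READ_WRITE : COPY_FAILED;
}
```

The return value only says which call succeeded; it does not tell me whether copy_file_range actually shared extents, which is exactly the part I am unsure about.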
Related issue by Go developers on the usage of copy_file_range.
It sounds as if they agree that copy_file_range is always to be preferred over sendfile.
(Question copied from here, but I don't see how it is not focused enough. This question is very focused and asks a very specific thing (whether FICLONE and copy_file_range behave the same), and should be extremely clear. I formulated the question in multiple different ways to make it even clearer. This question is also extremely well researched, and should already be very valuable to the community as-is with all the references. I would have been very happy if I had found such a question, even without answers, when I started researching the differences between FICLONE and copy_file_range.)