Message ID | b7dd4b16ffffa1114177f37bc349d437fc51cc63.1739484084.git.wqu@suse.com (mailing list archive) |
---|---|
State | New |
Series | btrfs-progs: docs: add an extra note to btrfs data checksum and directIO |
On 13.02.25 23:02, Qu Wenruo wrote:

Not a native English speaker either but I think it should be:

> +.. note::
> +   Since data checksum is calculated just before submitting to the block device,
           ^~the
> +   btrfs has a strong requirement that those data can not be modified until the
             this data/those data blocks ~^
> +   writeback is finished.
> +
> +   This requirement is met for buffered IO as btrfs has full control on the
> +   page cache, but direct IOs (``O_DIRECT``) bypass the page cache, and btrfs
                                            bypasses ~^
> +   can not control the direct IO buffer (can be user space memory), thus it's
        as it can be in user space memory ~^
> +   possible that user space programs modify the buffer before it's fully written
> +   back, and lead to data checksum mismatch.
this leads ~^          ^~a
> +
> +   To avoid such checksum mismatch, since v6.14 btrfs will force direct IOs to
                a ~^
> +   fall back to buffered IOs, if the inode requires data checksum.
                                                   a ~^
> +   This will bring a small performance penalty, if the end user requires true
> +   zero-copy direct IOs, they should set the ``NODATASUM`` flag for the inode
> +   and make sure the direct IO buffer is fully aligned to btrfs block size.
> +
> +

Byte,
	Johannes
diff --git a/Documentation/ch-checksumming.rst b/Documentation/ch-checksumming.rst
index 5e47a6bfb492..782191692746 100644
--- a/Documentation/ch-checksumming.rst
+++ b/Documentation/ch-checksumming.rst
@@ -3,6 +3,24 @@ writing and verified after reading the blocks from devices. The whole metadata
 block has an inline checksum stored in the b-tree node header. Each data block
 has a detached checksum stored in the checksum tree.
 
+.. note::
+   Since data checksum is calculated just before submitting to the block device,
+   btrfs has a strong requirement that those data can not be modified until the
+   writeback is finished.
+
+   This requirement is met for buffered IO as btrfs has full control on the
+   page cache, but direct IOs (``O_DIRECT``) bypass the page cache, and btrfs
+   can not control the direct IO buffer (can be user space memory), thus it's
+   possible that user space programs modify the buffer before it's fully written
+   back, and lead to data checksum mismatch.
+
+   To avoid such checksum mismatch, since v6.14 btrfs will force direct IOs to
+   fall back to buffered IOs, if the inode requires data checksum.
+   This will bring a small performance penalty, if the end user requires true
+   zero-copy direct IOs, they should set the ``NODATASUM`` flag for the inode
+   and make sure the direct IO buffer is fully aligned to btrfs block size.
+
+
 There are several checksum algorithms supported. The default and backward
 compatible algorithm is *crc32c*. Since kernel 5.5 there are three more with
 different characteristics and trade-offs regarding speed and strength. The following list
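The race described in the note can be illustrated with a small user-space model. This is only a sketch and not btrfs code: `submit_write` and `verify` are hypothetical helpers, `zlib.crc32` stands in for the filesystem's crc32c, and the 4096-byte block size is an assumption chosen for illustration.

```python
import zlib
from typing import Tuple

BLOCK_SIZE = 4096  # assumed block size, for illustration only


def submit_write(buf: bytearray, modify_in_flight: bool) -> Tuple[int, bytes]:
    """Model one block write: checksum first, then the device copies the data.

    Hypothetical helper, not a kernel or btrfs interface.
    """
    # The checksum is calculated just before submission.
    csum = zlib.crc32(buf)
    if modify_in_flight:
        # A user space program touches the direct IO buffer before
        # writeback has finished.
        buf[0] ^= 0xFF
    # The device picks up whatever the buffer holds at this point.
    on_disk = bytes(buf)
    return csum, on_disk


def verify(csum: int, on_disk: bytes) -> bool:
    """Model the read path: recompute the checksum and compare."""
    return zlib.crc32(on_disk) == csum


stable = verify(*submit_write(bytearray(BLOCK_SIZE), modify_in_flight=False))
racy = verify(*submit_write(bytearray(BLOCK_SIZE), modify_in_flight=True))
print("stable buffer verifies:", stable)
print("modified buffer verifies:", racy)
```

With a stable buffer the stored checksum matches the data that reached the device. With the in-flight modification the data on the device is intact but no longer matches the checksum computed earlier, which is the false mismatch the fallback to buffered IO avoids.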
In the v6.14 kernel release, btrfs will force direct IO to fall back to
buffered IO if the inode requires data checksum.

This causes a small performance drop, in exchange for solving the false
data checksum mismatch problem caused by direct IO.

Although the change is small for most end users, for those requiring
zero-copy direct IO it is a behavior change and requires a proper
documentation update.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 Documentation/ch-checksumming.rst | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)
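For users who do need the zero-copy path, a pre-flight alignment check along these lines may be useful. It is a sketch under stated assumptions: the 4096-byte block size is assumed rather than queried from the filesystem, `buffer_address` and `is_fully_aligned` are hypothetical helper names, and treating "fully aligned" as covering buffer address, file offset and length is my reading of the note, not something the patch spells out.

```python
import ctypes
import mmap

BLOCK_SIZE = 4096  # assumption: the real btrfs block size depends on the filesystem


def buffer_address(buf) -> int:
    """Return the start address of a writable buffer object (hypothetical helper)."""
    return ctypes.addressof(ctypes.c_char.from_buffer(buf))


def is_fully_aligned(buf, offset: int, length: int,
                     block_size: int = BLOCK_SIZE) -> bool:
    """Check that buffer address, file offset and IO length are block aligned."""
    return (buffer_address(buf) % block_size == 0
            and offset % block_size == 0
            and length % block_size == 0)


# Anonymous mmap regions start on a page boundary, which makes them a
# convenient source of aligned buffers.
buf = mmap.mmap(-1, 2 * BLOCK_SIZE)
print(is_fully_aligned(buf, 0, len(buf)))
print(is_fully_aligned(buf, 512, len(buf)))
```

The check only covers alignment. The inode still needs the ``NODATASUM`` flag mentioned in the note for direct IO to stay zero-copy on v6.14 and later.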