Message ID | cover.1742834133.git.dsterba@suse.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | [GIT,PULL] Btrfs updates for 6.15 | expand |
The pull request you sent on Mon, 24 Mar 2025 17:37:51 +0100:
> git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git tags/for-6.15-tag
has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/fd71def6d9abc5ae362fb9995d46049b7b0ed391
Thank you!
On Mon, Mar 24, 2025 at 05:37:51PM +0100, David Sterba wrote: > Hi, > > please pull the following btrfs updates, thanks. > > User visible changes: > > - fall back to buffered write if direct io is done on a file that requires > checksums <trimming the everybody linux-btrfs from the cc list> What? We use this constantly in a bunch of places to avoid page cache overhead, it's perfectly legitimate to do DIO to a file that requires checksums. Does the vm case mess this up? Absolutely, but that's why we say use NOCOW for that case. We've always had this behavior, we've always been clear that if you break it you buy it. This is a huge regression for a pretty significant use case. Thanks, Josef
On Fri, Mar 28, 2025 at 09:27:51AM -0400, Josef Bacik wrote: > On Mon, Mar 24, 2025 at 05:37:51PM +0100, David Sterba wrote: > > Hi, > > > > please pull the following btrfs updates, thanks. > > > > User visible changes: > > > > - fall back to buffered write if direct io is done on a file that requires > > checksums > > <trimming the everybody linux-btrfs from the cc list> > > What? We use this constantly in a bunch of places to avoid page cache overhead, > it's perfectly legitimate to do DIO to a file that requires checksums. Does the > vm case mess this up? Absolutely, but that's why we say use NOCOW for that > case. We've always had this behavior, we've always been clear that if you break > it you buy it. This is a huge regression for a pretty significant use case. The patch has been up for like 2 months and you could have said "don't because reasons" any time before the pull request. Now we're left with a revert, or other alternatives making the use cases working.
On Fri, Mar 28, 2025 at 06:36:44PM +0100, David Sterba wrote: > On Fri, Mar 28, 2025 at 09:27:51AM -0400, Josef Bacik wrote: > > On Mon, Mar 24, 2025 at 05:37:51PM +0100, David Sterba wrote: > > > Hi, > > > > > > please pull the following btrfs updates, thanks. > > > > > > User visible changes: > > > > > > - fall back to buffered write if direct io is done on a file that requires > > > checksums > > > > <trimming the everybody linux-btrfs from the cc list> > > > > What? We use this constantly in a bunch of places to avoid page cache overhead, > > it's perfectly legitimate to do DIO to a file that requires checksums. Does the > > vm case mess this up? Absolutely, but that's why we say use NOCOW for that > > case. We've always had this behavior, we've always been clear that if you break > > it you buy it. This is a huge regression for a pretty significant use case. > > The patch has been up for like 2 months and you could have said "don't > because reasons" any time before the pull request. Now we're left with a > revert, or other alternatives making the use cases working. Boris told me about this and I forgot. I took everybody off the CC list because I don't want to revert it, in fact generally speaking I'd love to never have these style of bug reports again. But it is a pretty significant change. Are we ok going forward saying you don't get O_DIRECT unless you want NOCOW? Should we maybe allow for users to indicate they're not dumb and can be trusted to do O_DIRECT properly? I just think this opens us up to a lot more uncomfortable conversations than the other behavior. I personally think this is better, if it's been sitting there for 2 months then hooray we're in agreement. But I'm also worried it'll come back to bite us. Thanks, Josef
在 2025/3/29 06:09, Josef Bacik 写道: > On Fri, Mar 28, 2025 at 06:36:44PM +0100, David Sterba wrote: >> On Fri, Mar 28, 2025 at 09:27:51AM -0400, Josef Bacik wrote: >>> On Mon, Mar 24, 2025 at 05:37:51PM +0100, David Sterba wrote: >>>> Hi, >>>> >>>> please pull the following btrfs updates, thanks. >>>> >>>> User visible changes: >>>> >>>> - fall back to buffered write if direct io is done on a file that requires >>>> checksums >>> >>> <trimming the everybody linux-btrfs from the cc list> >>> >>> What? We use this constantly in a bunch of places to avoid page cache overhead, >>> it's perfectly legitimate to do DIO to a file that requires checksums. Does the >>> vm case mess this up? Absolutely, but that's why we say use NOCOW for that >>> case. We've always had this behavior, we've always been clear that if you break >>> it you buy it. This is a huge regression for a pretty significant use case. BTW, it's not clear, it's not in the documents (until recently, along with this change), and no one is really explaining that to end users until now. If you're a experienced fs/mm developer, sure it's obvious, but most users (millions of VM users who choose to use none cache mode) are not so experienced. Future more, a lot of such VM users are just trying to reduce page cache memory usage, not so about the zero-copy performance. And the buffered fallback should still make a lot of sense for them, and still allow working data checksum (which they may care more than the performance). >> >> The patch has been up for like 2 months and you could have said "don't >> because reasons" any time before the pull request. Now we're left with a >> revert, or other alternatives making the use cases working. > > Boris told me about this and I forgot. I took everybody off the CC list because > I don't want to revert it, in fact generally speaking I'd love to never have > these style of bug reports again. > > But it is a pretty significant change. Are we ok going forward saying you don't > get O_DIRECT unless you want NOCOW? Despite NOCOW, there are still other things (like alignment) can make O_DIRECT to fall back buffered, without even letting the users know. And it is always fs specific on whether it choose to fall buffered or return -EINVAL. So in the first place there is never any guarantee that O_DIRECT always results in zero-copy writes. > Should we maybe allow for users to indicate > they're not dumb and can be trusted to do O_DIRECT properly? I just think this > opens us up to a lot more uncomfortable conversations than the other behavior. For whatever reason, no matter dumb users or not, all operations through a fs should not lead to data/metadata corruption, no matter if it's real corruption or just false csum mismatch. (Not to mention it's not easy to fix the csum mismatch either) If dumb programs can still corrupt the fs, then it's a bug in the fs. To me, this is very straightforward, and the priority should be obvious. > > I personally think this is better, if it's been sitting there for 2 months then > hooray we're in agreement. But I'm also worried it'll come back to bite us. I'd say, let it bite. If it's some performance critical use cases, I believe the developer should be reasonable enough to be persuaded (either choose perf and go NODATASUM, or choose datacsum and accept the perf drop). If it's newbie level usages, they shouldn't even notice any difference, except the fact btrfs no longer corrupts their files. Even if it really bites in the future, I doubt it will be any worse. On the other hand, the data csum mismatch for VM images are already biting us for too long, and it's really damaging the reputation of Btrfs. Thanks, Qu > Thanks, > > Josef
On Fri, 28 Mar 2025 at 20:40, Josef Bacik <josef@toxicpanda.com> wrote: > > ... Should we maybe allow for users to indicate > they're not dumb and can be trusted to do O_DIRECT properly? I just think this Seriously, I don't think it's a good idea. Though, I'd vote for adding a tracing point so that people chasing a performance issue/regression can check if this is possibly the cause. Yet, that wouldn't restore the performance for you, of course. Anyways, if you really trust your app that much (hey, it's your call, right?) you can always patch *your* kernel. But IMO such a hack does not belong to the upstream kernel. As Qu mentioned, the FS needs to stay consistent and not allow users to shoot themselves into a leg. --nX > > Josef >
On Fri, Mar 28, 2025 at 03:39:27PM -0400, Josef Bacik wrote: > On Fri, Mar 28, 2025 at 06:36:44PM +0100, David Sterba wrote: > > On Fri, Mar 28, 2025 at 09:27:51AM -0400, Josef Bacik wrote: > > > On Mon, Mar 24, 2025 at 05:37:51PM +0100, David Sterba wrote: > > > > Hi, > > > > > > > > please pull the following btrfs updates, thanks. > > > > > > > > User visible changes: > > > > > > > > - fall back to buffered write if direct io is done on a file that requires > > > > checksums > > > > > > <trimming the everybody linux-btrfs from the cc list> > > > > > > What? We use this constantly in a bunch of places to avoid page cache overhead, > > > it's perfectly legitimate to do DIO to a file that requires checksums. Does the > > > vm case mess this up? Absolutely, but that's why we say use NOCOW for that > > > case. We've always had this behavior, we've always been clear that if you break > > > it you buy it. This is a huge regression for a pretty significant use case. > > > > The patch has been up for like 2 months and you could have said "don't > > because reasons" any time before the pull request. Now we're left with a > > revert, or other alternatives making the use cases working. > > Boris told me about this and I forgot. I took everybody off the CC list because > I don't want to revert it, in fact generally speaking I'd love to never have > these style of bug reports again. > > But it is a pretty significant change. Are we ok going forward saying you don't > get O_DIRECT unless you want NOCOW? Should we maybe allow for users to indicate > they're not dumb and can be trusted to do O_DIRECT properly? I just think this > opens us up to a lot more uncomfortable conversations than the other behavior. > > I personally think this is better, if it's been sitting there for 2 months then > hooray we're in agreement. But I'm also worried it'll come back to bite us. I think we're in agreement where the fix is going and I also agree this is a significant change and can indeed become worse in some sense than what we have now. We already have exceptions for direct io when combined with various features, most notably with compression where it falls back to buffered right away. We may need to enumerate and document the common combinations and decide what to do about them and what are potential drawbacks. The upcoming uncached IO support can possibly fill some gaps. The fix is going towards correctness at the cost of performance and arguably this should have been the default already. We ended up with many people noticing checksum mismatches in syslog due to the VM setup. It's rather bad usability to tell people to ignore certain errors that would otherwise be quite significant. Some of the combinations: 1) dio + no csum + no cow -> true direct io 2) dio + no csum + cow -> almost direct io 3) dio + csum + cow + no compression -> fallback to buffered 4) dio + csum + cow compression -> fallback to buffered The changed case is 3. Quoting your reply: > We use this constantly in a bunch of places to avoid page cache overhead, > it's perfectly legitimate to do DIO to a file that requires checksums. So if you want to avoid page cache overhead the uncached mode does not help that much I guess, the pages and related structures still need to be set up. I'm wondering if the use of direct io is more for convenience (not to pollute the page cache) or for the real expectations of the performance gains when the caching is done on the application side (e.g. databases). The mutually exclusive feature is checksumming, so if the direct io is for performance then the application could also turn off the checksums. We don't have a magic bit and opening mode that can keep the dio + csum working as before, and currently don't have a suggestion how to implement it. Seems that one way or another the user application will have to do some work to get best the performance.