Message ID | YqOZ3v68HrM9LI//@casper.infradead.org (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | [GIT,PULL] Folio fixes for 5.19 | expand |
On Fri, Jun 10, 2022 at 12:22 PM Matthew Wilcox <willy@infradead.org> wrote: > > - Don't release a folio while it's still locked Ugh. I *hate* this patch. It's just incredibly broken. Yes, I've pulled this, but I have looked at that readahead_folio() function before, and I have despised it before, but this patch really drove home how incredibly broken that function is. Honestly, readahead_folio() returns a folio *AFTER* it has dropped the ref to that folio. That's broken to start with. It's not only really wrong to say "here's a pointer, and by the way, you don't actually hold a ref to it any more". It's doubly broken because it's *very* different from the similarly named readahead_page() that doesn't have that fundamental interface bug. But it's now *extra* broken now that you realize that the dropping of the ref was very very wrong, so you re-get the ref. WTF? As far as I can tell, ALL THE OTHER users of this broken function fall into two categories: (a) they are broken (see the exact same broken pattern in fs/erofs/fscache.c, for example) (b) they only use the thing as a boolean Anyway, I've pulled this, but I really seriously object to that completely mis-designed function. Please send me a follow-up patch that fixes it by just making the *callers* drop the refcount, instead of doing it incorrectly inside readahead_folio(). Alternatively, make "readahead_folio()" just return a boolean - so that the (b) case situation can continue the use this function - and create a new function that just exposes __readahead_folio() without this broken "drop refcount" behavior). Or explain why that "drop and retake ref" isn't (a) completely broken and racy (b) inefficient (c) returning a non-ref'd folio pointer is horribly broken interface to begin with because right now it's just disgusting in so many ways and this bug is just the last drop for me. Linus
The pull request you sent on Fri, 10 Jun 2022 20:22:06 +0100:
> git://git.infradead.org/users/willy/pagecache.git tags/folio-5.19a
has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/a32e7ea362356af8e89e67600432bad83d2325da
Thank you!
On Fri, Jun 10, 2022 at 12:56:48PM -0700, Linus Torvalds wrote: > On Fri, Jun 10, 2022 at 12:22 PM Matthew Wilcox <willy@infradead.org> wrote: > > > > - Don't release a folio while it's still locked > > Ugh. > > I *hate* this patch. It's just incredibly broken. > > Yes, I've pulled this, but I have looked at that readahead_folio() > function before, and I have despised it before, but this patch really > drove home how incredibly broken that function is. > > Honestly, readahead_folio() returns a folio *AFTER* it has dropped the > ref to that folio. OK, you caught me. I realised (a little too late) that the rules around refcounts in ->readpage and ->readahead are different, and that creates pain for people writing filesystems. For ->readahead, I stuck with the refcount model that was in ->readpages (there is an extra refcount on the folio and the filesystem must put it before it returns). But I don't want to change the refcounting rules on a method without changing something else about the method, because trying to find a missing refcount change is misery. Anyway, my cunning thought was that if I bundle the change to the refcount rule with the change from readahead_page() to readahead_folio(), once all filesystems are converted to readahead_folio(), I can pull the refcount game out of readahead_folio() and do it in the caller where it belongs, all transparent to the filesystems. I think it's worth doing, because it's two fewer atomic ops per folio that we read from a file. But I didn't think through the transition process clearly enough, and right now it's a mess. How would you like me to proceed? (I don't think the erofs code has a bug because it doesn't remove the folio from the pagecache while holding the lock -- the folio lock prevents anyone _else_ from removing the folio from the pagecache, so there must be a reference on the folio up until erofs calls folio_unlock()).
On Fri, Jun 10, 2022 at 2:40 PM Matthew Wilcox <willy@infradead.org> wrote: > > But I don't want to change the refcounting rules on a method without > changing something else about the method, because trying to find a > missing refcount change is misery. Anyway, my cunning thought was > that if I bundle the change to the refcount rule with the change > from readahead_page() to readahead_folio(), once all filesystems > are converted to readahead_folio(), I can pull the refcount game out > of readahead_folio() and do it in the caller where it belongs, all > transparent to the filesystems. Hmm. Any reason why that can't be done right now? Aren't we basically converted already? Yeah, yeah, there's a couple of users of readahead_page() left, but if cleaning up the folio case requires some fixup to those, then that sounds better than the current "folio interface is very messy". > (I don't think the erofs code has a bug because it doesn't remove > the folio from the pagecache while holding the lock -- the folio lock > prevents anyone _else_ from removing the folio from the pagecache, > so there must be a reference on the folio up until erofs calls > folio_unlock()). Ahh. Ugh. And I guess the whole "clearing the lock bit is the last time we touch the page flags" and "folio_wake_bit() is very careful to only touch the external waitqueue" so that there can be no nasty races with somebody coming in *exactly* as the folio is unlocked. This has been subtle before, but I think we did allow it exactly for this kind of reason. I've swapped out the details. Linus