mbox series

[GIT,PULL] Folio fixes for 5.19

Message ID YqOZ3v68HrM9LI//@casper.infradead.org (mailing list archive)
State New
Headers show
Series [GIT,PULL] Folio fixes for 5.19 | expand

Pull-request

git://git.infradead.org/users/willy/pagecache.git tags/folio-5.19a

Message

Matthew Wilcox June 10, 2022, 7:22 p.m. UTC
The following changes since commit 3d9f55c57bc3659f986acc421eac431ff6edcc83:

  Merge tag 'fs_for_v5.19-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs (2022-06-09 12:26:05 -0700)

are available in the Git repository at:

  git://git.infradead.org/users/willy/pagecache.git tags/folio-5.19a

for you to fetch changes up to 334f6f53abcf57782bd2fe81da1cbd893e4ef05c:

  mm: Add kernel-doc for folio->mlock_count (2022-06-09 16:24:25 -0400)

----------------------------------------------------------------
Four folio-related fixes for 5.19:

 - Don't release a folio while it's still locked

 - Fix a use-after-free after dropping the mmap_lock

 - Fix a memory leak when splitting a page

 - Fix a kernel-doc warning for struct folio

----------------------------------------------------------------
Matthew Wilcox (Oracle) (4):
      filemap: Don't release a locked folio
      filemap: Cache the value of vm_flags
      mm/huge_memory: Fix xarray node memory leak
      mm: Add kernel-doc for folio->mlock_count

 include/linux/mm_types.h | 5 +++++
 include/linux/xarray.h   | 1 +
 lib/xarray.c             | 5 +++--
 mm/filemap.c             | 9 +++++----
 mm/huge_memory.c         | 3 +--
 mm/readahead.c           | 2 ++
 6 files changed, 17 insertions(+), 8 deletions(-)

Comments

Linus Torvalds June 10, 2022, 7:56 p.m. UTC | #1
On Fri, Jun 10, 2022 at 12:22 PM Matthew Wilcox <willy@infradead.org> wrote:
>
>  - Don't release a folio while it's still locked

Ugh.

I *hate* this patch. It's just incredibly broken.

Yes, I've pulled this, but I have looked at that readahead_folio()
function before, and I have despised it before, but this patch really
drove home how incredibly broken that function is.

Honestly, readahead_folio() returns a folio *AFTER* it has dropped the
ref to that folio.

That's broken to start with. It's not only really wrong to say "here's
a pointer, and by the way, you don't actually hold a ref to it any
more".

It's doubly broken because it's *very* different from the similarly
named readahead_page() that doesn't have that fundamental interface
bug.

But it's now *extra* broken now that you realize that the dropping of
the ref was very very wrong, so you re-get the ref. WTF?

As far as I can tell, ALL THE OTHER users of this broken function fall
into two categories:

 (a) they are broken (see the exact same broken pattern in
fs/erofs/fscache.c, for example)

 (b) they only use the thing as a boolean

Anyway, I've pulled this, but I really seriously object to that
completely mis-designed function.

Please send me a follow-up patch that fixes it by just making the
*callers* drop the refcount, instead of doing it incorrectly inside
readahead_folio().

Alternatively, make "readahead_folio()" just return a boolean - so
that the (b) case situation can continue the use this function - and
create a new function that just exposes __readahead_folio() without
this broken "drop refcount" behavior).

Or explain why that "drop and retake ref" isn't

 (a) completely broken and racy

 (b) inefficient

 (c) returning a non-ref'd folio pointer is horribly broken interface
to begin with

because right now it's just disgusting in so many ways and this bug is
just the last drop for me.

                 Linus
pr-tracker-bot@kernel.org June 10, 2022, 7:58 p.m. UTC | #2
The pull request you sent on Fri, 10 Jun 2022 20:22:06 +0100:

> git://git.infradead.org/users/willy/pagecache.git tags/folio-5.19a

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/a32e7ea362356af8e89e67600432bad83d2325da

Thank you!
Matthew Wilcox June 10, 2022, 9:39 p.m. UTC | #3
On Fri, Jun 10, 2022 at 12:56:48PM -0700, Linus Torvalds wrote:
> On Fri, Jun 10, 2022 at 12:22 PM Matthew Wilcox <willy@infradead.org> wrote:
> >
> >  - Don't release a folio while it's still locked
> 
> Ugh.
> 
> I *hate* this patch. It's just incredibly broken.
> 
> Yes, I've pulled this, but I have looked at that readahead_folio()
> function before, and I have despised it before, but this patch really
> drove home how incredibly broken that function is.
> 
> Honestly, readahead_folio() returns a folio *AFTER* it has dropped the
> ref to that folio.

OK, you caught me.

I realised (a little too late) that the rules around refcounts in
->readpage and ->readahead are different, and that creates pain for
people writing filesystems.  For ->readahead, I stuck with the refcount
model that was in ->readpages (there is an extra refcount on the folio
and the filesystem must put it before it returns).

But I don't want to change the refcounting rules on a method without
changing something else about the method, because trying to find a
missing refcount change is misery.  Anyway, my cunning thought was
that if I bundle the change to the refcount rule with the change
from readahead_page() to readahead_folio(), once all filesystems
are converted to readahead_folio(), I can pull the refcount game out
of readahead_folio() and do it in the caller where it belongs, all
transparent to the filesystems.

I think it's worth doing, because it's two fewer atomic ops per folio
that we read from a file.  But I didn't think through the transition
process clearly enough, and right now it's a mess.  How would you like
me to proceed?

(I don't think the erofs code has a bug because it doesn't remove
the folio from the pagecache while holding the lock -- the folio lock
prevents anyone _else_ from removing the folio from the pagecache,
so there must be a reference on the folio up until erofs calls
folio_unlock()).
Linus Torvalds June 10, 2022, 11:27 p.m. UTC | #4
On Fri, Jun 10, 2022 at 2:40 PM Matthew Wilcox <willy@infradead.org> wrote:
>
> But I don't want to change the refcounting rules on a method without
> changing something else about the method, because trying to find a
> missing refcount change is misery.  Anyway, my cunning thought was
> that if I bundle the change to the refcount rule with the change
> from readahead_page() to readahead_folio(), once all filesystems
> are converted to readahead_folio(), I can pull the refcount game out
> of readahead_folio() and do it in the caller where it belongs, all
> transparent to the filesystems.

Hmm. Any reason why that can't be done right now? Aren't we basically
converted already?

Yeah, yeah, there's a couple of users of readahead_page() left, but if
cleaning up the folio case requires some fixup to those, then that
sounds better than the current "folio interface is very messy".

> (I don't think the erofs code has a bug because it doesn't remove
> the folio from the pagecache while holding the lock -- the folio lock
> prevents anyone _else_ from removing the folio from the pagecache,
> so there must be a reference on the folio up until erofs calls
> folio_unlock()).

Ahh. Ugh. And I guess the whole "clearing the lock bit is the last
time we touch the page flags" and "folio_wake_bit() is very careful to
only touch the external waitqueue" so that there can be no nasty races
with somebody coming in *exactly* as the folio is unlocked.

This has been subtle before, but I think we did allow it exactly for
this kind of reason. I've swapped out the details.

            Linus