
[GIT,PULL] Two folio fixes for 5.18

Message ID: YnRhFrLuRM5SY+hq@casper.infradead.org

Pull-request

git://git.infradead.org/users/willy/pagecache.git tags/folio-5.18f

Message

Matthew Wilcox May 5, 2022, 11:43 p.m. UTC
Darrick and Brian have done amazing work debugging the race I created
in the folio BIO iterator.  The readahead problem was deterministic,
so easy to fix.

The following changes since commit a7391ad3572431a354c927cf8896e86e50d7d0bf:

  Merge tag 'iommu-fixes-v5.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu (2022-05-04 11:04:52 -0700)

are available in the Git repository at:

  git://git.infradead.org/users/willy/pagecache.git tags/folio-5.18f

for you to fetch changes up to b9ff43dd27434dbd850b908e2e0e1f6e794efd9b:

  mm/readahead: Fix readahead with large folios (2022-05-05 00:47:29 -0400)

----------------------------------------------------------------
Two folio fixes for 5.18:

 - Fix a race where the BIO folio iterator called folio_next() on a folio
   without holding a reference, so the folio could be split or freed
   underneath us and we'd advance to the next page instead of the
   intended next folio (a sketch of the fix pattern follows this list).

 - Fix readahead creating single-page folios instead of the intended
   large folios when doing reads that are not a power of two in size.
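
The first fix, per the shortlog entry below ("block: Do not call
folio_next() on an unreferenced folio") and the small include/linux/bio.h
diffstat, comes down to looking up the next folio while the current one is
still safe to dereference.  The sketch below is modelled on the folio_iter
helpers in include/linux/bio.h; the _next field and the exact bookkeeping
are a reconstruction of that pattern, not necessarily the literal upstream
diff.

struct folio_iter {
	struct folio *folio;
	size_t offset;
	size_t length;
	/* private: cached while @folio is known to be live */
	struct folio *_next;
	size_t _seg_count;
	int _i;
};

static inline void bio_first_folio(struct folio_iter *fi, struct bio *bio,
				   int i)
{
	struct bio_vec *bvec = bio_first_bvec_all(bio) + i;

	fi->folio = page_folio(bvec->bv_page);
	fi->offset = bvec->bv_offset +
			PAGE_SIZE * (bvec->bv_page - &fi->folio->page);
	fi->_seg_count = bvec->bv_len;
	fi->length = min(folio_size(fi->folio) - fi->offset, fi->_seg_count);
	/* Look up the next folio now, while this one is still referenced. */
	fi->_next = folio_next(fi->folio);
	fi->_i = i;
}

static inline void bio_next_folio(struct folio_iter *fi, struct bio *bio)
{
	fi->_seg_count -= fi->length;
	if (fi->_seg_count) {
		/*
		 * Use the cached pointer rather than calling
		 * folio_next(fi->folio): the caller may have dropped its
		 * reference, so fi->folio could already be split or freed.
		 */
		fi->folio = fi->_next;
		fi->offset = 0;
		fi->length = min(folio_size(fi->folio), fi->_seg_count);
		fi->_next = folio_next(fi->folio);
	} else if (fi->_i + 1 < bio->bi_vcnt) {
		bio_first_folio(fi, bio, fi->_i + 1);
	} else {
		fi->folio = NULL;
	}
}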

----------------------------------------------------------------
Matthew Wilcox (Oracle) (2):
      block: Do not call folio_next() on an unreferenced folio
      mm/readahead: Fix readahead with large folios

 include/linux/bio.h |  5 ++++-
 mm/readahead.c      | 15 +++++++++------
 2 files changed, 13 insertions(+), 7 deletions(-)
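
The second fix touches only mm/readahead.c.  Going by the description
above, a conceptual sketch (the helper name is hypothetical, not the
literal upstream change) is to pick the largest folio order that still
fits in the remaining readahead window, rather than falling back to
order-0 whenever the requested size is not a power of two:

static unsigned int ra_folio_order(unsigned long remaining_pages,
				   unsigned int preferred_order)
{
	unsigned int order = preferred_order;

	/* Shrink the folio until it fits in what is left of the window. */
	while (order && (1UL << order) > remaining_pages)
		order--;

	return order;
}

With this approach, a 5-page window at preferred order 2 is filled with
one order-2 folio plus one order-0 folio instead of five single-page
folios.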

Comments

pr-tracker-bot@kernel.org May 6, 2022, 12:02 a.m. UTC | #1
The pull request you sent on Fri, 6 May 2022 00:43:18 +0100:

> git://git.infradead.org/users/willy/pagecache.git tags/folio-5.18f

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/fe27d189e3f42e31d3c8223d5daed7285e334c5e

Thank you!
Andrew Morton May 10, 2022, 10:18 p.m. UTC | #2
On Fri, 6 May 2022 00:43:18 +0100 Matthew Wilcox <willy@infradead.org> wrote:

>  - Fix readahead creating single-page folios instead of the intended
>    large folios when doing reads that are not a power of two in size.

I worry about the idea of using hugepages in readahead.  We're
increasing the load on the hugepage allocator, which is already
groaning under the load.

The obvious risk is that handing out hugepages to a low-value consumer
(copying around pagecache which is only ever accessed via the direct
map) will deny their availability to high-value consumers (that
compute-intensive task against a large dataset).

Has testing and instrumentation been used to demonstrate that this is
not actually going to be a problem, or are we at risk of getting
unhappy reports?
Matthew Wilcox May 10, 2022, 10:30 p.m. UTC | #3
On Tue, May 10, 2022 at 03:18:09PM -0700, Andrew Morton wrote:
> On Fri, 6 May 2022 00:43:18 +0100 Matthew Wilcox <willy@infradead.org> wrote:
> 
> >  - Fix readahead creating single-page folios instead of the intended
> >    large folios when doing reads that are not a power of two in size.
> 
> I worry about the idea of using hugepages in readahead.  We're
> increasing the load on the hugepage allocator, which is already
> groaning under the load.

Well, hang on.  We're not using the hugepage allocator, we're using
the page allocator.  We're also using variable order pages, not
necessarily PMD_ORDER.  I was under the impression that we were
using GFP_TRANSHUGE_LIGHT, but I now don't see that.  So that might
be something that needs to be changed.

> The obvious risk is that handing out hugepages to a low-value consumer
> (copying around pagecache which is only ever accessed via the direct
> map) will deny their availability to high-value consumers (that
> compute-intensive task against a large dataset).
> 
> Has testing and instrumentation been used to demonstrate that this is
> not actually going to be a problem, or are we at risk of getting
> unhappy reports?

It's hard to demonstrate that it's definitely not going to cause a
problem.  But I actually believe it will help; by keeping page cache
memory in larger chunks, we make it easier to defrag memory and create
PMD-order pages when they're needed.
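
The distinction drawn above (large pagecache folios come from the ordinary
buddy allocator at a variable order, not from the hugetlb pool) can be
illustrated with a hypothetical allocation helper.  The function name and
the GFP adjustment below are assumptions in the spirit of
GFP_TRANSHUGE_LIGHT, not the actual readahead allocation path:

static struct folio *readahead_alloc_folio(struct address_space *mapping,
					   unsigned int order)
{
	gfp_t gfp = readahead_gfp_mask(mapping);

	/*
	 * Keep higher-order attempts opportunistic: skip direct reclaim
	 * so the allocator can fail fast and the caller can fall back to
	 * a smaller folio (the fallback itself is not shown here).
	 */
	if (order)
		gfp &= ~__GFP_DIRECT_RECLAIM;

	return filemap_alloc_folio(gfp, order);
}
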
Andrew Morton May 10, 2022, 10:45 p.m. UTC | #4
On Tue, 10 May 2022 23:30:02 +0100 Matthew Wilcox <willy@infradead.org> wrote:

> On Tue, May 10, 2022 at 03:18:09PM -0700, Andrew Morton wrote:
> > On Fri, 6 May 2022 00:43:18 +0100 Matthew Wilcox <willy@infradead.org> wrote:
> > 
> > >  - Fix readahead creating single-page folios instead of the intended
> > >    large folios when doing reads that are not a power of two in size.
> > 
> > I worry about the idea of using hugepages in readahead.  We're
> > increasing the load on the hugepage allocator, which is already
> > groaning under the load.
> 
> Well, hang on.  We're not using the hugepage allocator, we're using
> the page allocator.  We're also using variable order pages, not
> necessarily PMD_ORDER.

Ah, OK, misapprehended.  I guess there remains a fragmentation risk.

>  I was under the impression that we were
> using GFP_TRANSHUGE_LIGHT, but I now don't see that.  So that might
> be something that needs to be changed.
> 
> > The obvious risk is that handing out hugepages to a low-value consumer
> > (copying around pagecache which is only ever accessed via the direct
> > map) will deny their availability to high-value consumers (that
> > compute-intensive task against a large dataset).
> > 
> > Has testing and instrumentation been used to demonstrate that this is
> > not actually going to be a problem, or are we at risk of getting
> > unhappy reports?
> 
> It's hard to demonstrate that it's definitely not going to cause a
> problem.  But I actually believe it will help; by keeping page cache
> memory in larger chunks, we make it easier to defrag memory and create
> PMD-order pages when they're needed.

Obviously it'll be very workload-dependent.