
mm: Fix error handling in __filemap_get_folio() with FGP_NOWAIT

Message ID 20250223235719.66576-1-raphaelsc@scylladb.com (mailing list archive)
State New

Commit Message

Raphael S. Carvalho Feb. 23, 2025, 11:57 p.m. UTC
Original report:
https://lore.kernel.org/all/CAKhLTr1UL3ePTpYjXOx2AJfNk8Ku2EdcEfu+CH1sf3Asr=B-Dw@mail.gmail.com/T/

When doing buffered writes with FGP_NOWAIT under memory pressure, the system
returned ENOMEM even though plenty of memory was available. User space used
the io_uring interface, which in turn submits I/O with FGP_NOWAIT (the fast
path).

retsnoop pointed to iomap_get_folio:

00:34:16.180612 -> 00:34:16.180651 TID/PID 253786/253721
(reactor-1/combined_tests):

                    entry_SYSCALL_64_after_hwframe+0x76
                    do_syscall_64+0x82
                    __do_sys_io_uring_enter+0x265
                    io_submit_sqes+0x209
                    io_issue_sqe+0x5b
                    io_write+0xdd
                    xfs_file_buffered_write+0x84
                    iomap_file_buffered_write+0x1a6
    32us [-ENOMEM]  iomap_write_begin+0x408
iter=&{.inode=0xffff8c67aa031138,.len=4096,.flags=33,.iomap={.addr=0xffffffffffffffff,.length=4096,.type=1,.flags=3,.bdev=0x…
pos=0 len=4096 foliop=0xffffb32c296b7b80
!    4us [-ENOMEM]  iomap_get_folio
iter=&{.inode=0xffff8c67aa031138,.len=4096,.flags=33,.iomap={.addr=0xffffffffffffffff,.length=4096,.type=1,.flags=3,.bdev=0x…
pos=0 len=4096

This is likely a regression caused by 66dabbb65d67 ("mm: return an ERR_PTR
from __filemap_get_folio"), which performed the following changes:
    --- a/fs/iomap/buffered-io.c
    +++ b/fs/iomap/buffered-io.c
    @@ -468,19 +468,12 @@ EXPORT_SYMBOL_GPL(iomap_is_partially_uptodate);
    struct folio *iomap_get_folio(struct iomap_iter *iter, loff_t pos)
    {
            unsigned fgp = FGP_LOCK | FGP_WRITE | FGP_CREAT | FGP_STABLE | FGP_NOFS;
    -       struct folio *folio;

            if (iter->flags & IOMAP_NOWAIT)
                    fgp |= FGP_NOWAIT;

    -       folio = __filemap_get_folio(iter->inode->i_mapping, pos >> PAGE_SHIFT,
    +       return __filemap_get_folio(iter->inode->i_mapping, pos >> PAGE_SHIFT,
                            fgp, mapping_gfp_mask(iter->inode->i_mapping));
    -       if (folio)
    -               return folio;
    -
    -       if (iter->flags & IOMAP_NOWAIT)
    -               return ERR_PTR(-EAGAIN);
    -       return ERR_PTR(-ENOMEM);
    }

Essentially, that patch moved the choice of error code into
__filemap_get_folio(), but it missed proper FGP_NOWAIT handling, so ENOMEM
escapes to user space. Had it correctly returned -EAGAIN with NOWAIT,
either io_uring or user space itself would be able to retry the request.
Patching io_uring alone is not enough: the iomap interface is the one
responsible for the error code, and the pwritev2(RWF_NOWAIT) and AIO
interfaces must return the proper error too.

The patch was tested with the ScyllaDB test suite (the original reproducer),
and all tests now pass under memory pressure.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
---
 mm/filemap.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)
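
To illustrate why the error code matters to callers, here is a minimal
user-space sketch of the retry pattern the commit message refers to for
pwritev2(RWF_NOWAIT). The helper name write_nowait_then_block() is
hypothetical and not part of the patch; the sketch assumes glibc's
pwritev2() wrapper and treats EOPNOTSUPP and ENOSYS as "fast path
unavailable", which is an assumption of this example rather than something
the thread discusses.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of the patch): attempt a non-blocking
 * buffered write first, then fall back to an ordinary blocking write
 * when the kernel says the fast path cannot make progress.
 */
static ssize_t write_nowait_then_block(int fd, const void *buf, size_t len,
                                       off_t pos)
{
    struct iovec iov = { .iov_base = (void *)buf, .iov_len = len };
    ssize_t ret = pwritev2(fd, &iov, 1, pos, RWF_NOWAIT);

    if (ret >= 0)
        return ret;

    /*
     * EAGAIN: the write would have blocked, so retry on the slow path.
     * EOPNOTSUPP/ENOSYS: RWF_NOWAIT or pwritev2 is unavailable here
     * (assumption of this sketch).
     * Anything else, including ENOMEM, is reported as a hard failure.
     */
    if (errno != EAGAIN && errno != EOPNOTSUPP && errno != ENOSYS)
        return -1;

    return pwrite(fd, buf, len, pos);
}
```

With a caller shaped like this, an ENOMEM leaking out of the NOWAIT path is
treated as a real failure instead of a cue to retry, which matches the
symptom described in the report.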

Comments

Matthew Wilcox Feb. 24, 2025, 4:14 a.m. UTC | #1
On Sun, Feb 23, 2025 at 08:57:19PM -0300, Raphael S. Carvalho wrote:
> This is likely a regression caused by 66dabbb65d67 ("mm: return an ERR_PTR
> from __filemap_get_folio"), which performed the following changes:
>     --- a/fs/iomap/buffered-io.c
>     +++ b/fs/iomap/buffered-io.c
>     @@ -468,19 +468,12 @@ EXPORT_SYMBOL_GPL(iomap_is_partially_uptodate);
>     struct folio *iomap_get_folio(struct iomap_iter *iter, loff_t pos)
>     {
>             unsigned fgp = FGP_LOCK | FGP_WRITE | FGP_CREAT | FGP_STABLE | FGP_NOFS;
>     -       struct folio *folio;
> 
>             if (iter->flags & IOMAP_NOWAIT)
>                     fgp |= FGP_NOWAIT;
> 
>     -       folio = __filemap_get_folio(iter->inode->i_mapping, pos >> PAGE_SHIFT,
>     +       return __filemap_get_folio(iter->inode->i_mapping, pos >> PAGE_SHIFT,
>                             fgp, mapping_gfp_mask(iter->inode->i_mapping));
>     -       if (folio)
>     -               return folio;
>     -
>     -       if (iter->flags & IOMAP_NOWAIT)
>     -               return ERR_PTR(-EAGAIN);
>     -       return ERR_PTR(-ENOMEM);
>     }

We don't usually put this in the changelog ...

> Essentially, that patch is moving error picking decision to
> __filemap_get_folio, but it missed proper FGP_NOWAIT handling, so ENOMEM
> is being escaped to user space. Had it correctly returned -EAGAIN with NOWAIT,
> either io_uring or user space itself would be able to retry the request.
> It's not enough to patch io_uring since the iomap interface is the one
> responsible for it, and pwritev2(RWF_NOWAIT) and AIO interfaces must return
> the proper error too.
> 
> The patch was tested with scylladb test suite (its original reproducer), and
> the tests all pass now when memory is pressured.
> 
> Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Instead, we add:

Fixes: 66dabbb65d67 ("mm: return an ERR_PTR from __filemap_get_folio")

> ---
>  mm/filemap.c | 9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/filemap.c b/mm/filemap.c
> index 804d7365680c..b06bd6eedaf7 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -1986,8 +1986,15 @@ struct folio *__filemap_get_folio(struct address_space *mapping, pgoff_t index,
>  
>  		if (err == -EEXIST)
>  			goto repeat;
> -		if (err)
> +		if (err) {
> +			/*
> +			 * Presumably ENOMEM, either from when allocating or
> +			 * adding folio (this one for xarray node)
> +			 */

I don't like the comment.  Better to do that in code:

			if ((fgp_flags & FGP_NOWAIT) && (err == -ENOMEM))

> +			if (fgp_flags & FGP_NOWAIT)
> +				err = -EAGAIN;
>  			return ERR_PTR(err);
> +		}
>  		/*
>  		 * filemap_add_folio locks the page, and for mmap
>  		 * we expect an unlocked page.
> -- 
> 2.48.1
>
Raphael S. Carvalho Feb. 24, 2025, 7:59 a.m. UTC | #2
On Mon, Feb 24, 2025 at 1:14 AM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Sun, Feb 23, 2025 at 08:57:19PM -0300, Raphael S. Carvalho wrote:
> > This is likely a regression caused by 66dabbb65d67 ("mm: return an ERR_PTR
> > from __filemap_get_folio"), which performed the following changes:
> >     --- a/fs/iomap/buffered-io.c
> >     +++ b/fs/iomap/buffered-io.c
> >     @@ -468,19 +468,12 @@ EXPORT_SYMBOL_GPL(iomap_is_partially_uptodate);
> >     struct folio *iomap_get_folio(struct iomap_iter *iter, loff_t pos)
> >     {
> >             unsigned fgp = FGP_LOCK | FGP_WRITE | FGP_CREAT | FGP_STABLE | FGP_NOFS;
> >     -       struct folio *folio;
> >
> >             if (iter->flags & IOMAP_NOWAIT)
> >                     fgp |= FGP_NOWAIT;
> >
> >     -       folio = __filemap_get_folio(iter->inode->i_mapping, pos >> PAGE_SHIFT,
> >     +       return __filemap_get_folio(iter->inode->i_mapping, pos >> PAGE_SHIFT,
> >                             fgp, mapping_gfp_mask(iter->inode->i_mapping));
> >     -       if (folio)
> >     -               return folio;
> >     -
> >     -       if (iter->flags & IOMAP_NOWAIT)
> >     -               return ERR_PTR(-EAGAIN);
> >     -       return ERR_PTR(-ENOMEM);
> >     }
>
> We don't usually put this in the changelog ...
>
> > Essentially, that patch is moving error picking decision to
> > __filemap_get_folio, but it missed proper FGP_NOWAIT handling, so ENOMEM
> > is being escaped to user space. Had it correctly returned -EAGAIN with NOWAIT,
> > either io_uring or user space itself would be able to retry the request.
> > It's not enough to patch io_uring since the iomap interface is the one
> > responsible for it, and pwritev2(RWF_NOWAIT) and AIO interfaces must return
> > the proper error too.
> >
> > The patch was tested with scylladb test suite (its original reproducer), and
> > the tests all pass now when memory is pressured.
> >
> > Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
>
> Instead, we add:
>
> Fixes: 66dabbb65d67 (mm: return an ERR_PTR from __filemap_get_folio)

Thanks, will fix it in v2.

>
> > ---
> >  mm/filemap.c | 9 ++++++++-
> >  1 file changed, 8 insertions(+), 1 deletion(-)
> >
> > diff --git a/mm/filemap.c b/mm/filemap.c
> > index 804d7365680c..b06bd6eedaf7 100644
> > --- a/mm/filemap.c
> > +++ b/mm/filemap.c
> > @@ -1986,8 +1986,15 @@ struct folio *__filemap_get_folio(struct address_space *mapping, pgoff_t index,
> >
> >               if (err == -EEXIST)
> >                       goto repeat;
> > -             if (err)
> > +             if (err) {
> > +                     /*
> > +                      * Presumably ENOMEM, either from when allocating or
> > +                      * adding folio (this one for xarray node)
> > +                      */
>
> I don't like the comment.  Better to do that in code:
>

Initially I was doing exactly what you proposed above, but after
reading do_read_cache_folio() and the patch that introduced the
regression, which transforms a failure to get a folio (a NULL) with
FGP_NOWAIT into EAGAIN, I decided to do this instead. It's indeed better
to remove assumptions, though; this is not ideal for the long run. Will
change in v2. Thanks.
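
For readers following the discussion, the error translation suggested in
the review can be modelled in user space roughly as below. This is only a
sketch, not the v2 patch: err_ptr(), ptr_err(), is_err() and
model_get_folio_error() are stand-ins written for this example (the first
three mimic the kernel's ERR_PTR()/PTR_ERR()/IS_ERR() helpers), and
MODEL_FGP_NOWAIT is an illustrative flag value, not the kernel's definition.

```c
#include <errno.h>
#include <stdint.h>

#define MAX_ERRNO 4095
/* Illustrative value only; the real FGP_NOWAIT lives in the kernel headers. */
#define MODEL_FGP_NOWAIT 0x20u

/* User-space stand-ins for the kernel's ERR_PTR(), PTR_ERR() and IS_ERR(). */
static inline void *err_ptr(long err) { return (void *)err; }
static inline long ptr_err(const void *p) { return (long)p; }
static inline int is_err(const void *p)
{
    /* Error pointers occupy the top MAX_ERRNO addresses. */
    return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

/*
 * Model of the error path only: translate -ENOMEM into -EAGAIN when the
 * caller asked for NOWAIT, and pass every other error through unchanged.
 */
static void *model_get_folio_error(int err, unsigned int fgp_flags)
{
    if ((fgp_flags & MODEL_FGP_NOWAIT) && err == -ENOMEM)
        err = -EAGAIN;
    return err_ptr(err);
}
```

Checking err == -ENOMEM explicitly, rather than rewriting every error,
keeps unrelated failures such as -EIO visible to NOWAIT callers.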

Patch

diff --git a/mm/filemap.c b/mm/filemap.c
index 804d7365680c..b06bd6eedaf7 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1986,8 +1986,15 @@  struct folio *__filemap_get_folio(struct address_space *mapping, pgoff_t index,
 
 		if (err == -EEXIST)
 			goto repeat;
-		if (err)
+		if (err) {
+			/*
+			 * Presumably ENOMEM, either from when allocating or
+			 * adding folio (this one for xarray node)
+			 */
+			if (fgp_flags & FGP_NOWAIT)
+				err = -EAGAIN;
 			return ERR_PTR(err);
+		}
 		/*
 		 * filemap_add_folio locks the page, and for mmap
 		 * we expect an unlocked page.