diff mbox series

[RFC,2/4] fs: Add IOCB_NOIO flag for generic_file_read_iter

Message ID 20200702165120.1469875-3-agruenba@redhat.com (mailing list archive)
State New, archived
Headers show
Series Fix gfs2 readahead deadlocks | expand

Commit Message

Andreas Gruenbacher July 2, 2020, 4:51 p.m. UTC
Add an IOCB_NOIO flag that indicates to generic_file_read_iter that it
shouldn't trigger any filesystem I/O for the actual request or for
readahead.  This allows to do tentative reads out of the page cache as
some filesystems allow, and to take the appropriate locks and retry the
reads only if the requested pages are not cached.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
---
 include/linux/fs.h |  1 +
 mm/filemap.c       | 16 ++++++++++++++--
 2 files changed, 15 insertions(+), 2 deletions(-)

Comments

Linus Torvalds July 2, 2020, 6:06 p.m. UTC | #1
On Thu, Jul 2, 2020 at 9:51 AM Andreas Gruenbacher <agruenba@redhat.com> wrote:
>
> Add an IOCB_NOIO flag that indicates to generic_file_read_iter that it
> shouldn't trigger any filesystem I/O for the actual request or for
> readahead.  This allows to do tentative reads out of the page cache as
> some filesystems allow, and to take the appropriate locks and retry the
> reads only if the requested pages are not cached.

This looks sane to me, except for this part:
>                 if (!PageUptodate(page)) {
> -                       if (iocb->ki_flags & IOCB_NOWAIT) {
> +                       if (iocb->ki_flags & (IOCB_NOWAIT | IOCB_NOIO)) {
>                                 put_page(page);
>                                 goto would_block;
>                         }

This path doesn't actually initiate reads at all - it waits for
existing reads to finish.

So I think it should only check for IOCB_NOWAIT.

Of course, if you want to avoid both new reads to be submitted _and_
avoid waiting for existing pending reads, you should just set both
flags, and you get the semantics you want. So for your case, this may
not make any difference.

But if the issue is a deadlock where the code can block for IO, but
not call back down to the filesystem for new IO (because it holds a
lock that the filesystem might need) then this patch as-is is wrong,
because it disallows even that case.

              Linus
Andreas Gruenbacher July 2, 2020, 7:58 p.m. UTC | #2
On Thu, Jul 2, 2020 at 8:06 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Thu, Jul 2, 2020 at 9:51 AM Andreas Gruenbacher <agruenba@redhat.com> wrote:
> > Add an IOCB_NOIO flag that indicates to generic_file_read_iter that it
> > shouldn't trigger any filesystem I/O for the actual request or for
> > readahead.  This allows to do tentative reads out of the page cache as
> > some filesystems allow, and to take the appropriate locks and retry the
> > reads only if the requested pages are not cached.
>
> This looks sane to me, except for this part:
> >                 if (!PageUptodate(page)) {
> > -                       if (iocb->ki_flags & IOCB_NOWAIT) {
> > +                       if (iocb->ki_flags & (IOCB_NOWAIT | IOCB_NOIO)) {
> >                                 put_page(page);
> >                                 goto would_block;
> >                         }
>
> This path doesn't actually initiate reads at all - it waits for
> existing reads to finish.
>
> So I think it should only check for IOCB_NOWAIT.
>
> Of course, if you want to avoid both new reads to be submitted _and_
> avoid waiting for existing pending reads, you should just set both
> flags, and you get the semantics you want. So for your case, this may
> not make any difference.

Indeed, in the gfs2 case, waiting for existing pending reads should be
fine. I'll send an update after some testing.

Thanks,
Andreas
Linus Torvalds July 2, 2020, 8:17 p.m. UTC | #3
On Thu, Jul 2, 2020 at 12:58 PM Andreas Gruenbacher <agruenba@redhat.com> wrote:
>
> > Of course, if you want to avoid both new reads to be submitted _and_
> > avoid waiting for existing pending reads, you should just set both
> > flags, and you get the semantics you want. So for your case, this may
> > not make any difference.
>
> Indeed, in the gfs2 case, waiting for existing pending reads should be
> fine. I'll send an update after some testing.

Do note that "wait for pending reads" very much does imply "wait for
those reads to _complete_".

And maybe the IO completion handler itself ends up having to finalize
something and take the lock to do that?

So in that case, even just "waiting" will cause a deadlock. Not
because the waiter itself needs the lock, but because the thing it
waits for might possibly need it.

But in many simple cases, IO completion shouldn't need any filesystem
locks. I just don't know the gfs2 code at all, so I'm not even going
to guess. I just wanted to mention it.

               Linus
Andreas Gruenbacher July 3, 2020, 9:45 a.m. UTC | #4
On Thu, Jul 2, 2020 at 10:18 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Thu, Jul 2, 2020 at 12:58 PM Andreas Gruenbacher <agruenba@redhat.com> wrote:
> > > Of course, if you want to avoid both new reads to be submitted _and_
> > > avoid waiting for existing pending reads, you should just set both
> > > flags, and you get the semantics you want. So for your case, this may
> > > not make any difference.
> >
> > Indeed, in the gfs2 case, waiting for existing pending reads should be
> > fine. I'll send an update after some testing.
>
> Do note that "wait for pending reads" very much does imply "wait for
> those reads to _complete_".
>
> And maybe the IO completion handler itself ends up having to finalize
> something and take the lock to do that?
>
> So in that case, even just "waiting" will cause a deadlock. Not
> because the waiter itself needs the lock, but because the thing it
> waits for might possibly need it.
>
> But in many simple cases, IO completion shouldn't need any filesystem
> locks. I just don't know the gfs2 code at all, so I'm not even going
> to guess. I just wanted to mention it.

Yes, that makes sense. Luckily gfs2 doesn't do any such locking on IO
completion.

Thanks,
Andreas
Andreas Gruenbacher July 7, 2020, 2:30 p.m. UTC | #5
On Thu, Jul 2, 2020 at 8:06 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Thu, Jul 2, 2020 at 9:51 AM Andreas Gruenbacher <agruenba@redhat.com> wrote:
> > Add an IOCB_NOIO flag that indicates to generic_file_read_iter that it
> > shouldn't trigger any filesystem I/O for the actual request or for
> > readahead.  This allows to do tentative reads out of the page cache as
> > some filesystems allow, and to take the appropriate locks and retry the
> > reads only if the requested pages are not cached.
>
> This looks sane to me, except for this part:
> >                 if (!PageUptodate(page)) {
> > -                       if (iocb->ki_flags & IOCB_NOWAIT) {
> > +                       if (iocb->ki_flags & (IOCB_NOWAIT | IOCB_NOIO)) {
> >                                 put_page(page);
> >                                 goto would_block;
> >                         }
>
> This path doesn't actually initiate reads at all - it waits for
> existing reads to finish.
>
> So I think it should only check for IOCB_NOWAIT.

It turns out that label readpage is reachable from here via goto
page_not_up_to_date / goto page_not_up_to_date_locked. So IOCB_NOIO
needs to be checked somewhere. I'll send an update.

Andreas
diff mbox series

Patch

diff --git a/include/linux/fs.h b/include/linux/fs.h
index 3f881a892ea7..1ab2ea19e883 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -315,6 +315,7 @@  enum rw_hint {
 #define IOCB_SYNC		(1 << 5)
 #define IOCB_WRITE		(1 << 6)
 #define IOCB_NOWAIT		(1 << 7)
+#define IOCB_NOIO		(1 << 8)
 
 struct kiocb {
 	struct file		*ki_filp;
diff --git a/mm/filemap.c b/mm/filemap.c
index f0ae9a6308cb..e8318f99f468 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2028,7 +2028,7 @@  ssize_t generic_file_buffered_read(struct kiocb *iocb,
 
 		page = find_get_page(mapping, index);
 		if (!page) {
-			if (iocb->ki_flags & IOCB_NOWAIT)
+			if (iocb->ki_flags & (IOCB_NOWAIT | IOCB_NOIO))
 				goto would_block;
 			page_cache_sync_readahead(mapping,
 					ra, filp,
@@ -2038,12 +2038,16 @@  ssize_t generic_file_buffered_read(struct kiocb *iocb,
 				goto no_cached_page;
 		}
 		if (PageReadahead(page)) {
+			if (iocb->ki_flags & IOCB_NOIO) {
+				put_page(page);
+				goto out;
+			}
 			page_cache_async_readahead(mapping,
 					ra, filp, page,
 					index, last_index - index);
 		}
 		if (!PageUptodate(page)) {
-			if (iocb->ki_flags & IOCB_NOWAIT) {
+			if (iocb->ki_flags & (IOCB_NOWAIT | IOCB_NOIO)) {
 				put_page(page);
 				goto would_block;
 			}
@@ -2249,6 +2253,14 @@  EXPORT_SYMBOL_GPL(generic_file_buffered_read);
  *
  * This is the "read_iter()" routine for all filesystems
  * that can use the page cache directly.
+ *
+ * The IOCB_NOWAIT flag in iocb->ki_flags indicates that -EAGAIN shall
+ * be returned when no data can be read without issuing I/O requests;
+ * this doesn't prevent readahead.  The IOCB_NOIO flag indicates that
+ * -EAGAIN shall be returned when no data can be read without issuing
+ * I/O requests, and 0 shall be returned when readhead would be
+ * triggered.
+ *
  * Return:
  * * number of bytes copied, even for partial reads
  * * negative error code if nothing was read