diff mbox series

[v3,5/7] iomap: Support restarting direct I/O requests after user copy failures

Message ID 20210723205840.299280-6-agruenba@redhat.com (mailing list archive)
State New, archived
Headers show
Series gfs2: Fix mmap + page fault deadlocks | expand

Commit Message

Andreas Gruenbacher July 23, 2021, 8:58 p.m. UTC
In __iomap_dio_rw, when iomap_apply returns an -EFAULT error, complete the
request synchronously and reset the iterator to the start position.  This
allows callers to deal with the failure and retry the operation.

In gfs2, we need to disable page faults while we're holding glocks to prevent
deadlocks.  This patch is the minimum solution I could find to make
iomap_dio_rw work with page faults disabled.  It's still expensive because any
I/O that was carried out before hitting -EFAULT needs to be retried.

A possible improvement would be to add an IOMAP_DIO_FAULT_RETRY or similar flag
that would allow iomap_dio_rw to return a short result when hitting -EFAULT.
Callers could then retry only the rest of the request after dealing with the
page fault.

Asynchronous requests turn into synchronous requests up to the point of the
page fault in any case, but they could be retried asynchronously after dealing
with the page fault.  To make that work, the completion notification would have
to include the bytes read or written before the page fault(s) as well, and we'd
need an additional iomap_dio_rw argument for that.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
---
 fs/iomap/direct-io.c | 9 +++++++++
 1 file changed, 9 insertions(+)

Comments

Jan Kara July 26, 2021, 5:19 p.m. UTC | #1
On Fri 23-07-21 22:58:38, Andreas Gruenbacher wrote:
> In __iomap_dio_rw, when iomap_apply returns an -EFAULT error, complete the
> request synchronously and reset the iterator to the start position.  This
> allows callers to deal with the failure and retry the operation.
> 
> In gfs2, we need to disable page faults while we're holding glocks to prevent
> deadlocks.  This patch is the minimum solution I could find to make
> iomap_dio_rw work with page faults disabled.  It's still expensive because any
> I/O that was carried out before hitting -EFAULT needs to be retried.
> 
> A possible improvement would be to add an IOMAP_DIO_FAULT_RETRY or similar flag
> that would allow iomap_dio_rw to return a short result when hitting -EFAULT.
> Callers could then retry only the rest of the request after dealing with the
> page fault.
> 
> Asynchronous requests turn into synchronous requests up to the point of the
> page fault in any case, but they could be retried asynchronously after dealing
> with the page fault.  To make that work, the completion notification would have
> to include the bytes read or written before the page fault(s) as well, and we'd
> need an additional iomap_dio_rw argument for that.
> 
> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
> ---
>  fs/iomap/direct-io.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
> 
> diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
> index cc0b4bc8861b..b0a494211bb4 100644
> --- a/fs/iomap/direct-io.c
> +++ b/fs/iomap/direct-io.c
> @@ -561,6 +561,15 @@ __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
>  		ret = iomap_apply(inode, pos, count, iomap_flags, ops, dio,
>  				iomap_dio_actor);
>  		if (ret <= 0) {
> +			if (ret == -EFAULT) {
> +				/*
> +				 * To allow retrying the request, fail
> +				 * synchronously and reset the iterator.
> +				 */
> +				wait_for_completion = true;
> +				iov_iter_revert(dio->submit.iter, dio->size);
> +			}
> +

Hum, OK, but this means that if userspace submits large enough write, GFS2
will livelock trying to complete it? While other filesystems can just
submit multiple smaller bios constructed in iomap_apply() (paging in
different parts of the buffer) and thus complete the write?

								Honza

>  			/* magic error code to fall back to buffered I/O */
>  			if (ret == -ENOTBLK) {
>  				wait_for_completion = true;
> -- 
> 2.26.3
>
Andreas Grünbacher July 26, 2021, 5:45 p.m. UTC | #2
Jan Kara <jack@suse.cz> schrieb am Mo., 26. Juli 2021, 19:21:

> On Fri 23-07-21 22:58:38, Andreas Gruenbacher wrote:
> > In __iomap_dio_rw, when iomap_apply returns an -EFAULT error, complete
> the
> > request synchronously and reset the iterator to the start position.  This
> > allows callers to deal with the failure and retry the operation.
> >
> > In gfs2, we need to disable page faults while we're holding glocks to
> prevent
> > deadlocks.  This patch is the minimum solution I could find to make
> > iomap_dio_rw work with page faults disabled.  It's still expensive
> because any
> > I/O that was carried out before hitting -EFAULT needs to be retried.
> >
> > A possible improvement would be to add an IOMAP_DIO_FAULT_RETRY or
> similar flag
> > that would allow iomap_dio_rw to return a short result when hitting
> -EFAULT.
> > Callers could then retry only the rest of the request after dealing with
> the
> > page fault.
> >
> > Asynchronous requests turn into synchronous requests up to the point of
> the
> > page fault in any case, but they could be retried asynchronously after
> dealing
> > with the page fault.  To make that work, the completion notification
> would have
> > to include the bytes read or written before the page fault(s) as well,
> and we'd
> > need an additional iomap_dio_rw argument for that.
> >
> > Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
> > ---
> >  fs/iomap/direct-io.c | 9 +++++++++
> >  1 file changed, 9 insertions(+)
> >
> > diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
> > index cc0b4bc8861b..b0a494211bb4 100644
> > --- a/fs/iomap/direct-io.c
> > +++ b/fs/iomap/direct-io.c
> > @@ -561,6 +561,15 @@ __iomap_dio_rw(struct kiocb *iocb, struct iov_iter
> *iter,
> >               ret = iomap_apply(inode, pos, count, iomap_flags, ops, dio,
> >                               iomap_dio_actor);
> >               if (ret <= 0) {
> > +                     if (ret == -EFAULT) {
> > +                             /*
> > +                              * To allow retrying the request, fail
> > +                              * synchronously and reset the iterator.
> > +                              */
> > +                             wait_for_completion = true;
> > +                             iov_iter_revert(dio->submit.iter,
> dio->size);
> > +                     }
> > +
>
> Hum, OK, but this means that if userspace submits large enough write, GFS2
> will livelock trying to complete it? While other filesystems can just
> submit multiple smaller bios constructed in iomap_apply() (paging in
> different parts of the buffer) and thus complete the write?
>

No. First, this affects reads but not writes. We cannot just blindly repeat
writes; when a page fault occurs in the middle of a write, the result will
be a short write. For reads, the plan is to ads a flag to allow
iomap_dio_rw to return a partial result when a page fault occurs.
(Currently, it fails the entire request.) Then we can handle the page fault
and complete the rest of the request.

The changes needed for that are simple on the iomap side, but we need to go
through some gymnastics for handling the page fault without giving up the
glock in the non-contended case. There will still be the potential for
losing the lock and having to re-acquire it, in which case we'll actually
have to repeat the entire read.

Thanks,
Andreas

                                                                Honza
>
> >                       /* magic error code to fall back to buffered I/O */
> >                       if (ret == -ENOTBLK) {
> >                               wait_for_completion = true;
> > --
> > 2.26.3
> >
> --
> Jan Kara <jack@suse.com>
> SUSE Labs, CR
>
Jan Kara July 26, 2021, 6:08 p.m. UTC | #3
On Mon 26-07-21 19:45:22, Andreas Grünbacher wrote:
> Jan Kara <jack@suse.cz> schrieb am Mo., 26. Juli 2021, 19:21:
> 
> > On Fri 23-07-21 22:58:38, Andreas Gruenbacher wrote:
> > > In __iomap_dio_rw, when iomap_apply returns an -EFAULT error, complete
> > the
> > > request synchronously and reset the iterator to the start position.  This
> > > allows callers to deal with the failure and retry the operation.
> > >
> > > In gfs2, we need to disable page faults while we're holding glocks to
> > prevent
> > > deadlocks.  This patch is the minimum solution I could find to make
> > > iomap_dio_rw work with page faults disabled.  It's still expensive
> > because any
> > > I/O that was carried out before hitting -EFAULT needs to be retried.
> > >
> > > A possible improvement would be to add an IOMAP_DIO_FAULT_RETRY or
> > similar flag
> > > that would allow iomap_dio_rw to return a short result when hitting
> > -EFAULT.
> > > Callers could then retry only the rest of the request after dealing with
> > the
> > > page fault.
> > >
> > > Asynchronous requests turn into synchronous requests up to the point of
> > the
> > > page fault in any case, but they could be retried asynchronously after
> > dealing
> > > with the page fault.  To make that work, the completion notification
> > would have
> > > to include the bytes read or written before the page fault(s) as well,
> > and we'd
> > > need an additional iomap_dio_rw argument for that.
> > >
> > > Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
> > > ---
> > >  fs/iomap/direct-io.c | 9 +++++++++
> > >  1 file changed, 9 insertions(+)
> > >
> > > diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
> > > index cc0b4bc8861b..b0a494211bb4 100644
> > > --- a/fs/iomap/direct-io.c
> > > +++ b/fs/iomap/direct-io.c
> > > @@ -561,6 +561,15 @@ __iomap_dio_rw(struct kiocb *iocb, struct iov_iter
> > *iter,
> > >               ret = iomap_apply(inode, pos, count, iomap_flags, ops, dio,
> > >                               iomap_dio_actor);
> > >               if (ret <= 0) {
> > > +                     if (ret == -EFAULT) {
> > > +                             /*
> > > +                              * To allow retrying the request, fail
> > > +                              * synchronously and reset the iterator.
> > > +                              */
> > > +                             wait_for_completion = true;
> > > +                             iov_iter_revert(dio->submit.iter,
> > dio->size);
> > > +                     }
> > > +
> >
> > Hum, OK, but this means that if userspace submits large enough write, GFS2
> > will livelock trying to complete it? While other filesystems can just
> > submit multiple smaller bios constructed in iomap_apply() (paging in
> > different parts of the buffer) and thus complete the write?
> >
> 
> No. First, this affects reads but not writes. We cannot just blindly repeat
> writes; when a page fault occurs in the middle of a write, the result will
> be a short write. For reads, the plan is to ads a flag to allow
> iomap_dio_rw to return a partial result when a page fault occurs.
> (Currently, it fails the entire request.) Then we can handle the page fault
> and complete the rest of the request.
> 
> The changes needed for that are simple on the iomap side, but we need to go
> through some gymnastics for handling the page fault without giving up the
> glock in the non-contended case. There will still be the potential for
> losing the lock and having to re-acquire it, in which case we'll actually
> have to repeat the entire read.

I've missed you've already sent out v4 (I'm catching up after vacation so
my mailbox is a bit of a mess). What's in there addresses my objection.
I'm sorry for the noise.

								Honza
diff mbox series

Patch

diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index cc0b4bc8861b..b0a494211bb4 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -561,6 +561,15 @@  __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 		ret = iomap_apply(inode, pos, count, iomap_flags, ops, dio,
 				iomap_dio_actor);
 		if (ret <= 0) {
+			if (ret == -EFAULT) {
+				/*
+				 * To allow retrying the request, fail
+				 * synchronously and reset the iterator.
+				 */
+				wait_for_completion = true;
+				iov_iter_revert(dio->submit.iter, dio->size);
+			}
+
 			/* magic error code to fall back to buffered I/O */
 			if (ret == -ENOTBLK) {
 				wait_for_completion = true;