diff mbox

[RFC] O_DIRECT vs EFAULT (was Re: [PATCH 10/12] new iov_iter flavour: pipe-backed)

Message ID 20161003033409.GT19539@ZenIV.linux.org.uk (mailing list archive)
State New, archived
Headers show

Commit Message

Al Viro Oct. 3, 2016, 3:34 a.m. UTC
On Fri, Sep 30, 2016 at 09:30:21AM +0200, Miklos Szeredi wrote:
> On Fri, Sep 30, 2016 at 12:50 AM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> > On Thu, Sep 29, 2016 at 10:53:55PM +0200, Miklos Szeredi wrote:
> >
> >> The EFAULT logic seems to be missing across the board.  And callers
> >> don't expect a zero return value.  Most will loop indefinitely.
> >
> > Nope.  copy_page_to_iter() *never* returns -EFAULT.  Including the iovec
> > one - check copy_page_to_iter_iovec().  Any caller that does not expect
> > a zero return value from that primitive is a bug, triggerable as soon as
> > you feed it an iovec with NULL ->iov_base.
> 
> Right.
> 
> I was actually looking at iov_iter_get_pages() callers...

FWIW, that's interesting - O_DIRECT readv()/writev() reacts to fault anywhere
as "nothing done, return -EFAULT now", rather than a short read/write.
Despite that some IO is actually done.  Note, BTW, that we are not even
consistent between the filesystems - local block ones do IO and give -EFAULT,
while NFS, Lustre and FUSE do short read/write, reporting -EFAULT only upon
shortening to nothing.  So does ceph, except that shortening might be for
more than one page.

Considering how weak POSIX is in that area, we are probably not violating
anything, but... it would be more convenient if we treated those as
short read/write, same way for all filesystems.

Linus, do you have any objections against such behaviour change?  AFAICS,
all it takes is this:

--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Linus Torvalds Oct. 3, 2016, 5:07 p.m. UTC | #1
On Sun, Oct 2, 2016 at 8:34 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> Linus, do you have any objections against such behaviour change?  AFAICS,
> all it takes is this:
>
> diff --git a/fs/direct-io.c b/fs/direct-io.c
> index 7c3ce73..3a8ebda 100644
> --- a/fs/direct-io.c
> +++ b/fs/direct-io.c
> @@ -246,6 +246,8 @@ static ssize_t dio_complete(struct dio *dio, ssize_t ret, bool is_async)
>                 if ((dio->op == REQ_OP_READ) &&
>                     ((offset + transferred) > dio->i_size))
>                         transferred = dio->i_size - offset;
> +               if (ret == -EFAULT)
> +                       ret = 0;

I don't think that's right. To me it looks like the short read case
might have changed "transferred" back to zero, in which case we do
*not* want to skip the EFAULT.

But if there's some reason that can't happen (ie "dio->i_size" is
guaranteed to be larger than "offset"), then with a comment to that
effect it's ok.

Otherwise I think it would need to be something like

        /* If we were partially successful, ignore later EFAULT */
        if (transferred && ret == -EFAULT)
                ret = 0;

or something. Yes?

                Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Al Viro Oct. 3, 2016, 6:54 p.m. UTC | #2
On Mon, Oct 03, 2016 at 10:07:39AM -0700, Linus Torvalds wrote:
> On Sun, Oct 2, 2016 at 8:34 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> >
> > Linus, do you have any objections against such behaviour change?  AFAICS,
> > all it takes is this:
> >
> > diff --git a/fs/direct-io.c b/fs/direct-io.c
> > index 7c3ce73..3a8ebda 100644
> > --- a/fs/direct-io.c
> > +++ b/fs/direct-io.c
> > @@ -246,6 +246,8 @@ static ssize_t dio_complete(struct dio *dio, ssize_t ret, bool is_async)
> >                 if ((dio->op == REQ_OP_READ) &&
> >                     ((offset + transferred) > dio->i_size))
> >                         transferred = dio->i_size - offset;
> > +               if (ret == -EFAULT)
> > +                       ret = 0;
> 
> I don't think that's right. To me it looks like the short read case
> might have changed "transferred" back to zero, in which case we do
> *not* want to skip the EFAULT.

There's this in do_blockdev_direct_IO():
        /* Once we sampled i_size check for reads beyond EOF */
        dio->i_size = i_size_read(inode);
        if (iov_iter_rw(iter) == READ && offset >= dio->i_size) {
                if (dio->flags & DIO_LOCKING)
                        mutex_unlock(&inode->i_mutex);
                kmem_cache_free(dio_cache, dio);
                retval = 0;
                goto out;
        }
so that shouldn't happen.  Said that,

> But if there's some reason that can't happen (ie "dio->i_size" is
> guaranteed to be larger than "offset"), then with a comment to that
> effect it's ok.
> 
> Otherwise I think it would need to be something like
> 
>         /* If we were partially successful, ignore later EFAULT */
>         if (transferred && ret == -EFAULT)
>                 ret = 0;

... it's certainly less brittle that way.  I'd probably still put it under
the same if (dio->result) and write it as
	if (unlikely(ret == -EFAULT) && transferred)
though.
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/fs/direct-io.c b/fs/direct-io.c
index 7c3ce73..3a8ebda 100644
--- a/fs/direct-io.c
+++ b/fs/direct-io.c
@@ -246,6 +246,8 @@  static ssize_t dio_complete(struct dio *dio, ssize_t ret, bool is_async)
 		if ((dio->op == REQ_OP_READ) &&
 		    ((offset + transferred) > dio->i_size))
 			transferred = dio->i_size - offset;
+		if (ret == -EFAULT)
+			ret = 0;
 	}
 
 	if (ret == 0)