NFS: Retry a zero-length short read

Message ID	79adfc63f33887475f6a0f0eeaacd8b52b2ed12c.1458119750.git.bcodding@redhat.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <linux-nfs-owner@kernel.org> From: Benjamin Coddington <bcodding@redhat.com> To: Trond Myklebust <trond.myklebust@primarydata.com>, Anna Schumaker <anna.schumaker@netapp.com> Cc: linux-nfs@vger.kernel.org Subject: [PATCH] NFS: Retry a zero-length short read Date: Wed, 16 Mar 2016 05:17:07 -0400 Message-Id: <79adfc63f33887475f6a0f0eeaacd8b52b2ed12c.1458119750.git.bcodding@redhat.com> Sender: linux-nfs-owner@vger.kernel.org Precedence: bulk

Benjamin Coddington March 16, 2016, 9:17 a.m. UTC

A zero-length short read without eof should be retried rather than sending
an error to the application.

Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
---
 fs/nfs/read.c |    5 -----
 1 files changed, 0 insertions(+), 5 deletions(-)

Trond Myklebust March 16, 2016, 1:14 p.m. UTC | #1

On Wed, Mar 16, 2016 at 5:17 AM, Benjamin Coddington
<bcodding@redhat.com> wrote:
>
> A zero-length short read without eof should be retried rather than sending
> an error to the application.


In what situation would returning a 0 length read not be a bug? If the
server intended that we back off and retry, it has the alternative of
sending a JUKEBOX/DELAY error.


> Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
> ---
>  fs/nfs/read.c |    5 -----
>  1 files changed, 0 insertions(+), 5 deletions(-)
>
> diff --git a/fs/nfs/read.c b/fs/nfs/read.c
> index eb31e23..7269d42 100644
> --- a/fs/nfs/read.c
> +++ b/fs/nfs/read.c
> @@ -244,11 +244,6 @@ static void nfs_readpage_retry(struct rpc_task *task,
>
>         /* This is a short read! */
>         nfs_inc_stats(hdr->inode, NFSIOS_SHORTREAD);
> -       /* Has the server at least made some progress? */
> -       if (resp->count == 0) {
> -               nfs_set_pgio_error(hdr, -EIO, argp->offset);
> -               return;
> -       }
>
>         /* For non rpc-based layout drivers, retry-through-MDS */
>         if (!task->tk_ops) {
> --
> 1.7.1
>
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Benjamin Coddington March 16, 2016, 2:22 p.m. UTC | #2

On Wed, 16 Mar 2016, Trond Myklebust wrote:

> On Wed, Mar 16, 2016 at 5:17 AM, Benjamin Coddington
> <bcodding@redhat.com> wrote:
> >
> > A zero-length short read without eof should be retried rather than sending
> > an error to the application.
>
>
> In what situation would returning a 0 length read not be a bug? If the
> server intended that we back off and retry, it has the alternative of
> sending a JUKEBOX/DELAY error.

If the server completes a local read but then another writer comes in and
appends to the file before the server checks if it needs to set EOF, then
the response might be 0 length without EOF set.

I'm also using https://tools.ietf.org/html/rfc7530#section-16.23.5 to guide
how I think the client should behave; it says that the client should retry
a short read without eof set.  I think that should include a response with
0 length.


> > Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
> > ---
> >  fs/nfs/read.c |    5 -----
> >  1 files changed, 0 insertions(+), 5 deletions(-)
> >
> > diff --git a/fs/nfs/read.c b/fs/nfs/read.c
> > index eb31e23..7269d42 100644
> > --- a/fs/nfs/read.c
> > +++ b/fs/nfs/read.c
> > @@ -244,11 +244,6 @@ static void nfs_readpage_retry(struct rpc_task *task,
> >
> >         /* This is a short read! */
> >         nfs_inc_stats(hdr->inode, NFSIOS_SHORTREAD);
> > -       /* Has the server at least made some progress? */
> > -       if (resp->count == 0) {
> > -               nfs_set_pgio_error(hdr, -EIO, argp->offset);
> > -               return;
> > -       }
> >
> >         /* For non rpc-based layout drivers, retry-through-MDS */
> >         if (!task->tk_ops) {
> > --
> > 1.7.1
> >
>
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Trond Myklebust March 16, 2016, 2:40 p.m. UTC | #3

On Wed, Mar 16, 2016 at 10:22 AM, Benjamin Coddington
<bcodding@redhat.com> wrote:
> On Wed, 16 Mar 2016, Trond Myklebust wrote:
>
>> On Wed, Mar 16, 2016 at 5:17 AM, Benjamin Coddington
>> <bcodding@redhat.com> wrote:
>> >
>> > A zero-length short read without eof should be retried rather than sending
>> > an error to the application.
>>
>>
>> In what situation would returning a 0 length read not be a bug? If the
>> server intended that we back off and retry, it has the alternative of
>> sending a JUKEBOX/DELAY error.
>
> If the server completes a local read but then another writer comes in and
> appends to the file before the server checks if it needs to set EOF, then
> the response might be 0 length without EOF set.

Why isn't that EOF check done atomically with the read itself? This
still sounds like a server bug to me.

> I'm also using https://tools.ietf.org/html/rfc7530#section-16.23.5 to guide
> how I think the client should behave; it says that the client should retry
> a short read without eof set.  I think that should include a response with
> 0 length.

>> > Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
>> > ---
>> >  fs/nfs/read.c |    5 -----
>> >  1 files changed, 0 insertions(+), 5 deletions(-)
>> >
>> > diff --git a/fs/nfs/read.c b/fs/nfs/read.c
>> > index eb31e23..7269d42 100644
>> > --- a/fs/nfs/read.c
>> > +++ b/fs/nfs/read.c
>> > @@ -244,11 +244,6 @@ static void nfs_readpage_retry(struct rpc_task *task,
>> >
>> >         /* This is a short read! */
>> >         nfs_inc_stats(hdr->inode, NFSIOS_SHORTREAD);
>> > -       /* Has the server at least made some progress? */
>> > -       if (resp->count == 0) {
>> > -               nfs_set_pgio_error(hdr, -EIO, argp->offset);
>> > -               return;
>> > -       }
>> >
>> >         /* For non rpc-based layout drivers, retry-through-MDS */
>> >         if (!task->tk_ops) {
>> > --
>> > 1.7.1
>> >
>>
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Benjamin Coddington March 16, 2016, 2:56 p.m. UTC | #4

On Wed, 16 Mar 2016, Trond Myklebust wrote:

> On Wed, Mar 16, 2016 at 10:22 AM, Benjamin Coddington
> <bcodding@redhat.com> wrote:
> > On Wed, 16 Mar 2016, Trond Myklebust wrote:
> >
> >> On Wed, Mar 16, 2016 at 5:17 AM, Benjamin Coddington
> >> <bcodding@redhat.com> wrote:
> >> >
> >> > A zero-length short read without eof should be retried rather than sending
> >> > an error to the application.
> >>
> >>
> >> In what situation would returning a 0 length read not be a bug? If the
> >> server intended that we back off and retry, it has the alternative of
> >> sending a JUKEBOX/DELAY error.
> >
> > If the server completes a local read but then another writer comes in and
> > appends to the file before the server checks if it needs to set EOF, then
> > the response might be 0 length without EOF set.
>
> Why isn't that EOF check done atomically with the read itself? This
> still sounds like a server bug to me.

I don't know -- I would guess because doing that atomically is harder than
not, and I don't know where the RFCs say that a zero length response without
eof is to be treated as an error or condition to be avoided.

I'll look into that, and respond here.

> > I'm also using https://tools.ietf.org/html/rfc7530#section-16.23.5 to guide
> > how I think the client should behave; it says that the client should retry
> > a short read without eof set.  I think that should include a response with
> > 0 length.

Here's the verbatim from section 12.23.5:

   If the server returns a "short read" (i.e., less data than requested
   and eof is set to FALSE), the client should send another READ to get
   the remaining data.  A server may return less data than requested
   under several circumstances.  The file may have been truncated by
   another client or perhaps on the server itself, changing the file
   size from what the requesting client believes to be the case.  This
   would reduce the actual amount of data available to the client.  It
   is possible that the server reduces the transfer size and so returns
   a short read result.  Server resource exhaustion may also result in a
   short read.

Ben

> >> > Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
> >> > ---
> >> >  fs/nfs/read.c |    5 -----
> >> >  1 files changed, 0 insertions(+), 5 deletions(-)
> >> >
> >> > diff --git a/fs/nfs/read.c b/fs/nfs/read.c
> >> > index eb31e23..7269d42 100644
> >> > --- a/fs/nfs/read.c
> >> > +++ b/fs/nfs/read.c
> >> > @@ -244,11 +244,6 @@ static void nfs_readpage_retry(struct rpc_task *task,
> >> >
> >> >         /* This is a short read! */
> >> >         nfs_inc_stats(hdr->inode, NFSIOS_SHORTREAD);
> >> > -       /* Has the server at least made some progress? */
> >> > -       if (resp->count == 0) {
> >> > -               nfs_set_pgio_error(hdr, -EIO, argp->offset);
> >> > -               return;
> >> > -       }
> >> >
> >> >         /* For non rpc-based layout drivers, retry-through-MDS */
> >> >         if (!task->tk_ops) {
> >> > --
> >> > 1.7.1
> >> >
> >>
>
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Benjamin Coddington March 16, 2016, 3:20 p.m. UTC | #5

On Wed, 16 Mar 2016, Benjamin Coddington wrote:

> On Wed, 16 Mar 2016, Trond Myklebust wrote:
>
> > On Wed, Mar 16, 2016 at 10:22 AM, Benjamin Coddington
> > <bcodding@redhat.com> wrote:
> > > On Wed, 16 Mar 2016, Trond Myklebust wrote:
> > >
> > >> On Wed, Mar 16, 2016 at 5:17 AM, Benjamin Coddington
> > >> <bcodding@redhat.com> wrote:
> > >> >
> > >> > A zero-length short read without eof should be retried rather than sending
> > >> > an error to the application.
> > >>
> > >>
> > >> In what situation would returning a 0 length read not be a bug? If the
> > >> server intended that we back off and retry, it has the alternative of
> > >> sending a JUKEBOX/DELAY error.
> > >
> > > If the server completes a local read but then another writer comes in and
> > > appends to the file before the server checks if it needs to set EOF, then
> > > the response might be 0 length without EOF set.
> >
> > Why isn't that EOF check done atomically with the read itself? This
> > still sounds like a server bug to me.
>
> I don't know -- I would guess because doing that atomically is harder than
> not, and I don't know where the RFCs say that a zero length response without
> eof is to be treated as an error or condition to be avoided.
>
> I'll look into that, and respond here.

Indeed, it seems that it is more convenient for the linux server to send a
zero-length response without eof when the file grows.  It would probably be
more helpful if the server handled that case, but I think that 7530 states
that it doesn't have to handle that case.

Ben

> > > I'm also using https://tools.ietf.org/html/rfc7530#section-16.23.5 to guide
> > > how I think the client should behave; it says that the client should retry
> > > a short read without eof set.  I think that should include a response with
> > > 0 length.
>
> Here's the verbatim from section 12.23.5:
>
>    If the server returns a "short read" (i.e., less data than requested
>    and eof is set to FALSE), the client should send another READ to get
>    the remaining data.  A server may return less data than requested
>    under several circumstances.  The file may have been truncated by
>    another client or perhaps on the server itself, changing the file
>    size from what the requesting client believes to be the case.  This
>    would reduce the actual amount of data available to the client.  It
>    is possible that the server reduces the transfer size and so returns
>    a short read result.  Server resource exhaustion may also result in a
>    short read.
>
> Ben
>
> > >> > Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
> > >> > ---
> > >> >  fs/nfs/read.c |    5 -----
> > >> >  1 files changed, 0 insertions(+), 5 deletions(-)
> > >> >
> > >> > diff --git a/fs/nfs/read.c b/fs/nfs/read.c
> > >> > index eb31e23..7269d42 100644
> > >> > --- a/fs/nfs/read.c
> > >> > +++ b/fs/nfs/read.c
> > >> > @@ -244,11 +244,6 @@ static void nfs_readpage_retry(struct rpc_task *task,
> > >> >
> > >> >         /* This is a short read! */
> > >> >         nfs_inc_stats(hdr->inode, NFSIOS_SHORTREAD);
> > >> > -       /* Has the server at least made some progress? */
> > >> > -       if (resp->count == 0) {
> > >> > -               nfs_set_pgio_error(hdr, -EIO, argp->offset);
> > >> > -               return;
> > >> > -       }
> > >> >
> > >> >         /* For non rpc-based layout drivers, retry-through-MDS */
> > >> >         if (!task->tk_ops) {
> > >> > --
> > >> > 1.7.1
> > >> >
> > >>
> >
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Trond Myklebust March 16, 2016, 4:22 p.m. UTC | #6

On Wed, Mar 16, 2016 at 11:20 AM, Benjamin Coddington
<bcodding@redhat.com> wrote:
> On Wed, 16 Mar 2016, Benjamin Coddington wrote:
>
>> On Wed, 16 Mar 2016, Trond Myklebust wrote:
>>
>> > On Wed, Mar 16, 2016 at 10:22 AM, Benjamin Coddington
>> > <bcodding@redhat.com> wrote:
>> > > On Wed, 16 Mar 2016, Trond Myklebust wrote:
>> > >
>> > >> On Wed, Mar 16, 2016 at 5:17 AM, Benjamin Coddington
>> > >> <bcodding@redhat.com> wrote:
>> > >> >
>> > >> > A zero-length short read without eof should be retried rather than sending
>> > >> > an error to the application.
>> > >>
>> > >>
>> > >> In what situation would returning a 0 length read not be a bug? If the
>> > >> server intended that we back off and retry, it has the alternative of
>> > >> sending a JUKEBOX/DELAY error.
>> > >
>> > > If the server completes a local read but then another writer comes in and
>> > > appends to the file before the server checks if it needs to set EOF, then
>> > > the response might be 0 length without EOF set.
>> >
>> > Why isn't that EOF check done atomically with the read itself? This
>> > still sounds like a server bug to me.
>>
>> I don't know -- I would guess because doing that atomically is harder than
>> not, and I don't know where the RFCs say that a zero length response without
>> eof is to be treated as an error or condition to be avoided.
>>
>> I'll look into that, and respond here.
>
> Indeed, it seems that it is more convenient for the linux server to send a
> zero-length response without eof when the file grows.  It would probably be
> more helpful if the server handled that case, but I think that 7530 states
> that it doesn't have to handle that case.

Here is what RFC5661 and RFC7530 say.

   If the READ ended at the end-of-file (formally, in a correctly formed
   READ request, if offset + count is equal to the size of the file), or
   the READ request extends beyond the size of the file (if offset +
   count is greater than the size of the file), eof is returned as TRUE;
   otherwise, it is FALSE.  A successful READ of an empty file will
   always return eof as TRUE.

Here is what RFC1813 says:

      eof
         If the read ended at the end-of-file (formally, in a
         correctly formed READ request, if READ3args.offset plus
         READ3resok.count is equal to the size of the file), eof
         is returned as TRUE; otherwise it is FALSE. A
         successful READ of an empty file will always return eof
         as TRUE.

Where does it say that the eof determination is allowed to be
non-atomic? Unlike structures such as change_info4, there isn't an
"atomic" flag to allow the server to communicate to the client that it
cannot rely on the eof flag. Since the determination is part of the
same READ operation, you can't point to the "COMPOUNDS are not atomic"
either.


>
> Ben
>
>> > > I'm also using https://tools.ietf.org/html/rfc7530#section-16.23.5 to guide
>> > > how I think the client should behave; it says that the client should retry
>> > > a short read without eof set.  I think that should include a response with
>> > > 0 length.
>>
>> Here's the verbatim from section 12.23.5:
>>
>>    If the server returns a "short read" (i.e., less data than requested
>>    and eof is set to FALSE), the client should send another READ to get
>>    the remaining data.  A server may return less data than requested
>>    under several circumstances.  The file may have been truncated by
>>    another client or perhaps on the server itself, changing the file
>>    size from what the requesting client believes to be the case.  This
>>    would reduce the actual amount of data available to the client.  It
>>    is possible that the server reduces the transfer size and so returns
>>    a short read result.  Server resource exhaustion may also result in a
>>    short read.
>>
>> Ben
>>
>> > >> > Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
>> > >> > ---
>> > >> >  fs/nfs/read.c |    5 -----
>> > >> >  1 files changed, 0 insertions(+), 5 deletions(-)
>> > >> >
>> > >> > diff --git a/fs/nfs/read.c b/fs/nfs/read.c
>> > >> > index eb31e23..7269d42 100644
>> > >> > --- a/fs/nfs/read.c
>> > >> > +++ b/fs/nfs/read.c
>> > >> > @@ -244,11 +244,6 @@ static void nfs_readpage_retry(struct rpc_task *task,
>> > >> >
>> > >> >         /* This is a short read! */
>> > >> >         nfs_inc_stats(hdr->inode, NFSIOS_SHORTREAD);
>> > >> > -       /* Has the server at least made some progress? */
>> > >> > -       if (resp->count == 0) {
>> > >> > -               nfs_set_pgio_error(hdr, -EIO, argp->offset);
>> > >> > -               return;
>> > >> > -       }
>> > >> >
>> > >> >         /* For non rpc-based layout drivers, retry-through-MDS */
>> > >> >         if (!task->tk_ops) {
>> > >> > --
>> > >> > 1.7.1
>> > >> >
>> > >>
>> >
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

J. Bruce Fields March 16, 2016, 5:18 p.m. UTC | #7

On Wed, Mar 16, 2016 at 12:22:22PM -0400, Trond Myklebust wrote:
> On Wed, Mar 16, 2016 at 11:20 AM, Benjamin Coddington
> <bcodding@redhat.com> wrote:
> > On Wed, 16 Mar 2016, Benjamin Coddington wrote:
> >
> >> On Wed, 16 Mar 2016, Trond Myklebust wrote:
> >>
> >> > On Wed, Mar 16, 2016 at 10:22 AM, Benjamin Coddington
> >> > <bcodding@redhat.com> wrote:
> >> > > On Wed, 16 Mar 2016, Trond Myklebust wrote:
> >> > >
> >> > >> On Wed, Mar 16, 2016 at 5:17 AM, Benjamin Coddington
> >> > >> <bcodding@redhat.com> wrote:
> >> > >> >
> >> > >> > A zero-length short read without eof should be retried rather than sending
> >> > >> > an error to the application.
> >> > >>
> >> > >>
> >> > >> In what situation would returning a 0 length read not be a bug? If the
> >> > >> server intended that we back off and retry, it has the alternative of
> >> > >> sending a JUKEBOX/DELAY error.
> >> > >
> >> > > If the server completes a local read but then another writer comes in and
> >> > > appends to the file before the server checks if it needs to set EOF, then
> >> > > the response might be 0 length without EOF set.
> >> >
> >> > Why isn't that EOF check done atomically with the read itself? This
> >> > still sounds like a server bug to me.
> >>
> >> I don't know -- I would guess because doing that atomically is harder than
> >> not, and I don't know where the RFCs say that a zero length response without
> >> eof is to be treated as an error or condition to be avoided.
> >>
> >> I'll look into that, and respond here.
> >
> > Indeed, it seems that it is more convenient for the linux server to send a
> > zero-length response without eof when the file grows.  It would probably be
> > more helpful if the server handled that case, but I think that 7530 states
> > that it doesn't have to handle that case.
> 
> Here is what RFC5661 and RFC7530 say.
> 
>    If the READ ended at the end-of-file (formally, in a correctly formed
>    READ request, if offset + count is equal to the size of the file), or
>    the READ request extends beyond the size of the file (if offset +
>    count is greater than the size of the file), eof is returned as TRUE;
>    otherwise, it is FALSE.  A successful READ of an empty file will
>    always return eof as TRUE.
> 
> Here is what RFC1813 says:
> 
>       eof
>          If the read ended at the end-of-file (formally, in a
>          correctly formed READ request, if READ3args.offset plus
>          READ3resok.count is equal to the size of the file), eof
>          is returned as TRUE; otherwise it is FALSE. A
>          successful READ of an empty file will always return eof
>          as TRUE.
> 
> Where does it say that the eof determination is allowed to be
> non-atomic? Unlike structures such as change_info4, there isn't an
> "atomic" flag to allow the server to communicate to the client that it
> cannot rely on the eof flag. Since the determination is part of the
> same READ operation, you can't point to the "COMPOUNDS are not atomic"
> either.

I wonder why READ has eof at all, instead of adopting read()'s
convention that 0 means eof?

In any case, our server should be able to count on that convention
internally, so if getting a real "eof" out of the filesystem looks
daunting, why not for now just check for that case in
nfsd4_encode_readv()?:


-	eof = (read->rd_offset + maxcount >=
+	eof = (maxcount == 0) || (read->rd_offset + maxcount >=
 	       d_inode(read->rd_fhp->fh_dentry)->i_size);
 

(Actually, I'm not sure that's completely right: if 0-length read
requests are legal then we need to exclude that case.)

--b.
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Benjamin Coddington March 16, 2016, 5:30 p.m. UTC | #8

On Wed, 16 Mar 2016, Trond Myklebust wrote:

> On Wed, Mar 16, 2016 at 11:20 AM, Benjamin Coddington
> <bcodding@redhat.com> wrote:
> > On Wed, 16 Mar 2016, Benjamin Coddington wrote:
> >
> >> On Wed, 16 Mar 2016, Trond Myklebust wrote:
> >>
> >> > On Wed, Mar 16, 2016 at 10:22 AM, Benjamin Coddington
> >> > <bcodding@redhat.com> wrote:
> >> > > On Wed, 16 Mar 2016, Trond Myklebust wrote:
> >> > >
> >> > >> On Wed, Mar 16, 2016 at 5:17 AM, Benjamin Coddington
> >> > >> <bcodding@redhat.com> wrote:
> >> > >> >
> >> > >> > A zero-length short read without eof should be retried rather than sending
> >> > >> > an error to the application.
> >> > >>
> >> > >>
> >> > >> In what situation would returning a 0 length read not be a bug? If the
> >> > >> server intended that we back off and retry, it has the alternative of
> >> > >> sending a JUKEBOX/DELAY error.
> >> > >
> >> > > If the server completes a local read but then another writer comes in and
> >> > > appends to the file before the server checks if it needs to set EOF, then
> >> > > the response might be 0 length without EOF set.
> >> >
> >> > Why isn't that EOF check done atomically with the read itself? This
> >> > still sounds like a server bug to me.
> >>
> >> I don't know -- I would guess because doing that atomically is harder than
> >> not, and I don't know where the RFCs say that a zero length response without
> >> eof is to be treated as an error or condition to be avoided.
> >>
> >> I'll look into that, and respond here.
> >
> > Indeed, it seems that it is more convenient for the linux server to send a
> > zero-length response without eof when the file grows.  It would probably be
> > more helpful if the server handled that case, but I think that 7530 states
> > that it doesn't have to handle that case.
>
> Here is what RFC5661 and RFC7530 say.
>
>    If the READ ended at the end-of-file (formally, in a correctly formed
>    READ request, if offset + count is equal to the size of the file), or
>    the READ request extends beyond the size of the file (if offset +
>    count is greater than the size of the file), eof is returned as TRUE;
>    otherwise, it is FALSE.  A successful READ of an empty file will
>    always return eof as TRUE.
>
> Here is what RFC1813 says:
>
>       eof
>          If the read ended at the end-of-file (formally, in a
>          correctly formed READ request, if READ3args.offset plus
>          READ3resok.count is equal to the size of the file), eof
>          is returned as TRUE; otherwise it is FALSE. A
>          successful READ of an empty file will always return eof
>          as TRUE.

These cover the eof case, but what about server resource exhaustion?
NFS4ERR_DELAY seems inappropriate since it signals that the operation could
not be completed in an appropriate amount of time.

RFC5661 also says:

   If the client specifies a count value of 0 (zero), the READ succeeds
   and returns 0 (zero) bytes of data (subject to access permissions
   checking).  The server may choose to return fewer bytes than
   specified by the client.  The client needs to check for this
   condition and handle the condition appropriately.

So there is a case where a server can send a read response of zero length
without eof set, and this is perfectly fine.  And, it is the server's choice
to send fewer bytes.

> Where does it say that the eof determination is allowed to be
> non-atomic? Unlike structures such as change_info4, there isn't an
> "atomic" flag to allow the server to communicate to the client that it
> cannot rely on the eof flag. Since the determination is part of the
> same READ operation, you can't point to the "COMPOUNDS are not atomic"
> either.

I don't see where it says that the eof determination is allowed to be
non-atomic, but I do see where it says the amount of data returned is up to
the server.  Why can't the server decide to skip the local read altogether
this time around, returning no data and also not setting eof?

Ben

> >> > > I'm also using https://tools.ietf.org/html/rfc7530#section-16.23.5 to guide
> >> > > how I think the client should behave; it says that the client should retry
> >> > > a short read without eof set.  I think that should include a response with
> >> > > 0 length.
> >>
> >> Here's the verbatim from section 12.23.5:
> >>
> >>    If the server returns a "short read" (i.e., less data than requested
> >>    and eof is set to FALSE), the client should send another READ to get
> >>    the remaining data.  A server may return less data than requested
> >>    under several circumstances.  The file may have been truncated by
> >>    another client or perhaps on the server itself, changing the file
> >>    size from what the requesting client believes to be the case.  This
> >>    would reduce the actual amount of data available to the client.  It
> >>    is possible that the server reduces the transfer size and so returns
> >>    a short read result.  Server resource exhaustion may also result in a
> >>    short read.
> >>
> >> Ben
> >>
> >> > >> > Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
> >> > >> > ---
> >> > >> >  fs/nfs/read.c |    5 -----
> >> > >> >  1 files changed, 0 insertions(+), 5 deletions(-)
> >> > >> >
> >> > >> > diff --git a/fs/nfs/read.c b/fs/nfs/read.c
> >> > >> > index eb31e23..7269d42 100644
> >> > >> > --- a/fs/nfs/read.c
> >> > >> > +++ b/fs/nfs/read.c
> >> > >> > @@ -244,11 +244,6 @@ static void nfs_readpage_retry(struct rpc_task *task,
> >> > >> >
> >> > >> >         /* This is a short read! */
> >> > >> >         nfs_inc_stats(hdr->inode, NFSIOS_SHORTREAD);
> >> > >> > -       /* Has the server at least made some progress? */
> >> > >> > -       if (resp->count == 0) {
> >> > >> > -               nfs_set_pgio_error(hdr, -EIO, argp->offset);
> >> > >> > -               return;
> >> > >> > -       }
> >> > >> >
> >> > >> >         /* For non rpc-based layout drivers, retry-through-MDS */
> >> > >> >         if (!task->tk_ops) {
> >> > >> > --
> >> > >> > 1.7.1
> >> > >> >
> >> > >>
> >> >
> >> --
> >> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> >> the body of a message to majordomo@vger.kernel.org
> >> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Benjamin Coddington March 16, 2016, 5:36 p.m. UTC | #9

On Wed, 16 Mar 2016, J. Bruce Fields wrote:

> On Wed, Mar 16, 2016 at 12:22:22PM -0400, Trond Myklebust wrote:
> > On Wed, Mar 16, 2016 at 11:20 AM, Benjamin Coddington
> > <bcodding@redhat.com> wrote:
> > > On Wed, 16 Mar 2016, Benjamin Coddington wrote:
> > >
> > >> On Wed, 16 Mar 2016, Trond Myklebust wrote:
> > >>
> > >> > On Wed, Mar 16, 2016 at 10:22 AM, Benjamin Coddington
> > >> > <bcodding@redhat.com> wrote:
> > >> > > On Wed, 16 Mar 2016, Trond Myklebust wrote:
> > >> > >
> > >> > >> On Wed, Mar 16, 2016 at 5:17 AM, Benjamin Coddington
> > >> > >> <bcodding@redhat.com> wrote:
> > >> > >> >
> > >> > >> > A zero-length short read without eof should be retried rather than sending
> > >> > >> > an error to the application.
> > >> > >>
> > >> > >>
> > >> > >> In what situation would returning a 0 length read not be a bug? If the
> > >> > >> server intended that we back off and retry, it has the alternative of
> > >> > >> sending a JUKEBOX/DELAY error.
> > >> > >
> > >> > > If the server completes a local read but then another writer comes in and
> > >> > > appends to the file before the server checks if it needs to set EOF, then
> > >> > > the response might be 0 length without EOF set.
> > >> >
> > >> > Why isn't that EOF check done atomically with the read itself? This
> > >> > still sounds like a server bug to me.
> > >>
> > >> I don't know -- I would guess because doing that atomically is harder than
> > >> not, and I don't know where the RFCs say that a zero length response without
> > >> eof is to be treated as an error or condition to be avoided.
> > >>
> > >> I'll look into that, and respond here.
> > >
> > > Indeed, it seems that it is more convenient for the linux server to send a
> > > zero-length response without eof when the file grows.  It would probably be
> > > more helpful if the server handled that case, but I think that 7530 states
> > > that it doesn't have to handle that case.
> >
> > Here is what RFC5661 and RFC7530 say.
> >
> >    If the READ ended at the end-of-file (formally, in a correctly formed
> >    READ request, if offset + count is equal to the size of the file), or
> >    the READ request extends beyond the size of the file (if offset +
> >    count is greater than the size of the file), eof is returned as TRUE;
> >    otherwise, it is FALSE.  A successful READ of an empty file will
> >    always return eof as TRUE.
> >
> > Here is what RFC1813 says:
> >
> >       eof
> >          If the read ended at the end-of-file (formally, in a
> >          correctly formed READ request, if READ3args.offset plus
> >          READ3resok.count is equal to the size of the file), eof
> >          is returned as TRUE; otherwise it is FALSE. A
> >          successful READ of an empty file will always return eof
> >          as TRUE.
> >
> > Where does it say that the eof determination is allowed to be
> > non-atomic? Unlike structures such as change_info4, there isn't an
> > "atomic" flag to allow the server to communicate to the client that it
> > cannot rely on the eof flag. Since the determination is part of the
> > same READ operation, you can't point to the "COMPOUNDS are not atomic"
> > either.
>
> I wonder why READ has eof at all, instead of adopting read()'s
> convention that 0 means eof?
>
> In any case, our server should be able to count on that convention
> internally, so if getting a real "eof" out of the filesystem looks
> daunting, why not for now just check for that case in
> nfsd4_encode_readv()?:

It's probably not really that daunting.  Couldn't we check to see if inode's
size has increased, and if so do the read again?

The concept of READ and eof atomicity should also include a concept of a
boundary or barrier.  The operation is atomic within what bound - the
server's response processing or the underlying filesystem?

Ben

>
> -	eof = (read->rd_offset + maxcount >=
> +	eof = (maxcount == 0) || (read->rd_offset + maxcount >=
>  	       d_inode(read->rd_fhp->fh_dentry)->i_size);
>
>
> (Actually, I'm not sure that's completely right: if 0-length read
> requests are legal then we need to exclude that case.)
>
> --b.
>
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

J. Bruce Fields March 16, 2016, 7:15 p.m. UTC | #10

On Wed, Mar 16, 2016 at 01:36:48PM -0400, Benjamin Coddington wrote:
> On Wed, 16 Mar 2016, J. Bruce Fields wrote:
> 
> > On Wed, Mar 16, 2016 at 12:22:22PM -0400, Trond Myklebust wrote:
> > > On Wed, Mar 16, 2016 at 11:20 AM, Benjamin Coddington
> > > <bcodding@redhat.com> wrote:
> > > > On Wed, 16 Mar 2016, Benjamin Coddington wrote:
> > > >
> > > >> On Wed, 16 Mar 2016, Trond Myklebust wrote:
> > > >>
> > > >> > On Wed, Mar 16, 2016 at 10:22 AM, Benjamin Coddington
> > > >> > <bcodding@redhat.com> wrote:
> > > >> > > On Wed, 16 Mar 2016, Trond Myklebust wrote:
> > > >> > >
> > > >> > >> On Wed, Mar 16, 2016 at 5:17 AM, Benjamin Coddington
> > > >> > >> <bcodding@redhat.com> wrote:
> > > >> > >> >
> > > >> > >> > A zero-length short read without eof should be retried rather than sending
> > > >> > >> > an error to the application.
> > > >> > >>
> > > >> > >>
> > > >> > >> In what situation would returning a 0 length read not be a bug? If the
> > > >> > >> server intended that we back off and retry, it has the alternative of
> > > >> > >> sending a JUKEBOX/DELAY error.
> > > >> > >
> > > >> > > If the server completes a local read but then another writer comes in and
> > > >> > > appends to the file before the server checks if it needs to set EOF, then
> > > >> > > the response might be 0 length without EOF set.
> > > >> >
> > > >> > Why isn't that EOF check done atomically with the read itself? This
> > > >> > still sounds like a server bug to me.
> > > >>
> > > >> I don't know -- I would guess because doing that atomically is harder than
> > > >> not, and I don't know where the RFCs say that a zero length response without
> > > >> eof is to be treated as an error or condition to be avoided.
> > > >>
> > > >> I'll look into that, and respond here.
> > > >
> > > > Indeed, it seems that it is more convenient for the linux server to send a
> > > > zero-length response without eof when the file grows.  It would probably be
> > > > more helpful if the server handled that case, but I think that 7530 states
> > > > that it doesn't have to handle that case.
> > >
> > > Here is what RFC5661 and RFC7530 say.
> > >
> > >    If the READ ended at the end-of-file (formally, in a correctly formed
> > >    READ request, if offset + count is equal to the size of the file), or
> > >    the READ request extends beyond the size of the file (if offset +
> > >    count is greater than the size of the file), eof is returned as TRUE;
> > >    otherwise, it is FALSE.  A successful READ of an empty file will
> > >    always return eof as TRUE.
> > >
> > > Here is what RFC1813 says:
> > >
> > >       eof
> > >          If the read ended at the end-of-file (formally, in a
> > >          correctly formed READ request, if READ3args.offset plus
> > >          READ3resok.count is equal to the size of the file), eof
> > >          is returned as TRUE; otherwise it is FALSE. A
> > >          successful READ of an empty file will always return eof
> > >          as TRUE.
> > >
> > > Where does it say that the eof determination is allowed to be
> > > non-atomic? Unlike structures such as change_info4, there isn't an
> > > "atomic" flag to allow the server to communicate to the client that it
> > > cannot rely on the eof flag. Since the determination is part of the
> > > same READ operation, you can't point to the "COMPOUNDS are not atomic"
> > > either.
> >
> > I wonder why READ has eof at all, instead of adopting read()'s
> > convention that 0 means eof?
> >
> > In any case, our server should be able to count on that convention
> > internally, so if getting a real "eof" out of the filesystem looks
> > daunting, why not for now just check for that case in
> > nfsd4_encode_readv()?:
> 
> It's probably not really that daunting.  Couldn't we check to see if inode's
> size has increased, and if so do the read again?

Does the client really have trouble with short reads in general, or only
with zero-length short reads?

> The concept of READ and eof atomicity should also include a concept of a
> boundary or barrier.  The operation is atomic within what bound - the
> server's response processing or the underlying filesystem?

Ideally the read data and eof both represent the same state of the file
with no file-modifying operations intervening between the two.

I could come up with artificial examples that would allow clients to
detect that we don't meet this standard.  I can't think of one that
would occur in practice (may just be my lack of imagination).

--b.
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Benjamin Coddington March 16, 2016, 7:46 p.m. UTC | #11

> On Wed, Mar 16, 2016 at 01:36:48PM -0400, Benjamin Coddington wrote:
> > On Wed, 16 Mar 2016, J. Bruce Fields wrote:
> >
> > > On Wed, Mar 16, 2016 at 12:22:22PM -0400, Trond Myklebust wrote:
> > > > On Wed, Mar 16, 2016 at 11:20 AM, Benjamin Coddington
> > > > <bcodding@redhat.com> wrote:
> > > > > On Wed, 16 Mar 2016, Benjamin Coddington wrote:
> > > > >
> > > > >> On Wed, 16 Mar 2016, Trond Myklebust wrote:
> > > > >>
> > > > >> > On Wed, Mar 16, 2016 at 10:22 AM, Benjamin Coddington
> > > > >> > <bcodding@redhat.com> wrote:
> > > > >> > > On Wed, 16 Mar 2016, Trond Myklebust wrote:
> > > > >> > >
> > > > >> > >> On Wed, Mar 16, 2016 at 5:17 AM, Benjamin Coddington
> > > > >> > >> <bcodding@redhat.com> wrote:
> > > > >> > >> >
> > > > >> > >> > A zero-length short read without eof should be retried rather than sending
> > > > >> > >> > an error to the application.
> > > > >> > >>
> > > > >> > >>
> > > > >> > >> In what situation would returning a 0 length read not be a bug? If the
> > > > >> > >> server intended that we back off and retry, it has the alternative of
> > > > >> > >> sending a JUKEBOX/DELAY error.
> > > > >> > >
> > > > >> > > If the server completes a local read but then another writer comes in and
> > > > >> > > appends to the file before the server checks if it needs to set EOF, then
> > > > >> > > the response might be 0 length without EOF set.
> > > > >> >
> > > > >> > Why isn't that EOF check done atomically with the read itself? This
> > > > >> > still sounds like a server bug to me.
> > > > >>
> > > > >> I don't know -- I would guess because doing that atomically is harder than
> > > > >> not, and I don't know where the RFCs say that a zero length response without
> > > > >> eof is to be treated as an error or condition to be avoided.
> > > > >>
> > > > >> I'll look into that, and respond here.
> > > > >
> > > > > Indeed, it seems that it is more convenient for the linux server to send a
> > > > > zero-length response without eof when the file grows.  It would probably be
> > > > > more helpful if the server handled that case, but I think that 7530 states
> > > > > that it doesn't have to handle that case.
> > > >
> > > > Here is what RFC5661 and RFC7530 say.
> > > >
> > > >    If the READ ended at the end-of-file (formally, in a correctly formed
> > > >    READ request, if offset + count is equal to the size of the file), or
> > > >    the READ request extends beyond the size of the file (if offset +
> > > >    count is greater than the size of the file), eof is returned as TRUE;
> > > >    otherwise, it is FALSE.  A successful READ of an empty file will
> > > >    always return eof as TRUE.
> > > >
> > > > Here is what RFC1813 says:
> > > >
> > > >       eof
> > > >          If the read ended at the end-of-file (formally, in a
> > > >          correctly formed READ request, if READ3args.offset plus
> > > >          READ3resok.count is equal to the size of the file), eof
> > > >          is returned as TRUE; otherwise it is FALSE. A
> > > >          successful READ of an empty file will always return eof
> > > >          as TRUE.
> > > >
> > > > Where does it say that the eof determination is allowed to be
> > > > non-atomic? Unlike structures such as change_info4, there isn't an
> > > > "atomic" flag to allow the server to communicate to the client that it
> > > > cannot rely on the eof flag. Since the determination is part of the
> > > > same READ operation, you can't point to the "COMPOUNDS are not atomic"
> > > > either.
> > >
> > > I wonder why READ has eof at all, instead of adopting read()'s
> > > convention that 0 means eof?
> > >
> > > In any case, our server should be able to count on that convention
> > > internally, so if getting a real "eof" out of the filesystem looks
> > > daunting, why not for now just check for that case in
> > > nfsd4_encode_readv()?:
> >
> > It's probably not really that daunting.  Couldn't we check to see if inode's
> > size has increased, and if so do the read again?
>
> Does the client really have trouble with short reads in general, or only
> with zero-length short reads?

Just the zero-length short read.

> > The concept of READ and eof atomicity should also include a concept of a
> > boundary or barrier.  The operation is atomic within what bound - the
> > server's response processing or the underlying filesystem?
>
> Ideally the read data and eof both represent the same state of the file
> with no file-modifying operations intervening between the two.
>
> I could come up with artificial examples that would allow clients to
> detect that we don't meet this standard.  I can't think of one that
> would occur in practice (may just be my lack of imagination).

So, sounds like fixing this is a good idea on the server. I hope Trond will
let us know if he still feels that the client ought not to be changed since
it seems an easy enough fix to avoid a similar problem on another server.
Perhaps there's a downside I'm not seeing on the client.  Or maybe the
convention of read() returning 0 meaning eof is global enough to cause it to
be acceptible behavior -- we really should treat a zero-length read response
without eof as an error.  My lack of experience is showing..  :)

Ben
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

J. Bruce Fields March 16, 2016, 7:46 p.m. UTC | #12

On Wed, Mar 16, 2016 at 01:36:48PM -0400, Benjamin Coddington wrote:
> On Wed, 16 Mar 2016, J. Bruce Fields wrote:
> 
> > On Wed, Mar 16, 2016 at 12:22:22PM -0400, Trond Myklebust wrote:
> > > On Wed, Mar 16, 2016 at 11:20 AM, Benjamin Coddington
> > > <bcodding@redhat.com> wrote:
> > > > On Wed, 16 Mar 2016, Benjamin Coddington wrote:
> > > >
> > > >> On Wed, 16 Mar 2016, Trond Myklebust wrote:
> > > >>
> > > >> > On Wed, Mar 16, 2016 at 10:22 AM, Benjamin Coddington
> > > >> > <bcodding@redhat.com> wrote:
> > > >> > > On Wed, 16 Mar 2016, Trond Myklebust wrote:
> > > >> > >
> > > >> > >> On Wed, Mar 16, 2016 at 5:17 AM, Benjamin Coddington
> > > >> > >> <bcodding@redhat.com> wrote:
> > > >> > >> >
> > > >> > >> > A zero-length short read without eof should be retried rather than sending
> > > >> > >> > an error to the application.
> > > >> > >>
> > > >> > >>
> > > >> > >> In what situation would returning a 0 length read not be a bug? If the
> > > >> > >> server intended that we back off and retry, it has the alternative of
> > > >> > >> sending a JUKEBOX/DELAY error.
> > > >> > >
> > > >> > > If the server completes a local read but then another writer comes in and
> > > >> > > appends to the file before the server checks if it needs to set EOF, then
> > > >> > > the response might be 0 length without EOF set.
> > > >> >
> > > >> > Why isn't that EOF check done atomically with the read itself? This
> > > >> > still sounds like a server bug to me.
> > > >>
> > > >> I don't know -- I would guess because doing that atomically is harder than
> > > >> not, and I don't know where the RFCs say that a zero length response without
> > > >> eof is to be treated as an error or condition to be avoided.
> > > >>
> > > >> I'll look into that, and respond here.
> > > >
> > > > Indeed, it seems that it is more convenient for the linux server to send a
> > > > zero-length response without eof when the file grows.  It would probably be
> > > > more helpful if the server handled that case, but I think that 7530 states
> > > > that it doesn't have to handle that case.
> > >
> > > Here is what RFC5661 and RFC7530 say.
> > >
> > >    If the READ ended at the end-of-file (formally, in a correctly formed
> > >    READ request, if offset + count is equal to the size of the file), or
> > >    the READ request extends beyond the size of the file (if offset +
> > >    count is greater than the size of the file), eof is returned as TRUE;
> > >    otherwise, it is FALSE.  A successful READ of an empty file will
> > >    always return eof as TRUE.
> > >
> > > Here is what RFC1813 says:
> > >
> > >       eof
> > >          If the read ended at the end-of-file (formally, in a
> > >          correctly formed READ request, if READ3args.offset plus
> > >          READ3resok.count is equal to the size of the file), eof
> > >          is returned as TRUE; otherwise it is FALSE. A
> > >          successful READ of an empty file will always return eof
> > >          as TRUE.
> > >
> > > Where does it say that the eof determination is allowed to be
> > > non-atomic? Unlike structures such as change_info4, there isn't an
> > > "atomic" flag to allow the server to communicate to the client that it
> > > cannot rely on the eof flag. Since the determination is part of the
> > > same READ operation, you can't point to the "COMPOUNDS are not atomic"
> > > either.
> >
> > I wonder why READ has eof at all, instead of adopting read()'s
> > convention that 0 means eof?
> >
> > In any case, our server should be able to count on that convention
> > internally, so if getting a real "eof" out of the filesystem looks
> > daunting, why not for now just check for that case in
> > nfsd4_encode_readv()?:
> 
> It's probably not really that daunting.  Couldn't we check to see if inode's
> size has increased, and if so do the read again?

By the way, in theory, this isn't enough:

	truncate(0)
	write("AA")
			read(1) -> "A"
	truncate(0)
	write("B")
			read i_size -> 1

			return ("A", eof = 1) to client

But the file never simultaneously had content "A" and size 1.

Also, why only check for increases and not decreases?  And how do you
ensure forward progress?

--b.

> 
> The concept of READ and eof atomicity should also include a concept of a
> boundary or barrier.  The operation is atomic within what bound - the
> server's response processing or the underlying filesystem?
> 
> Ben
> 
> >
> > -	eof = (read->rd_offset + maxcount >=
> > +	eof = (maxcount == 0) || (read->rd_offset + maxcount >=
> >  	       d_inode(read->rd_fhp->fh_dentry)->i_size);
> >
> >
> > (Actually, I'm not sure that's completely right: if 0-length read
> > requests are legal then we need to exclude that case.)
> >
> > --b.
> >
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

J. Bruce Fields March 16, 2016, 7:56 p.m. UTC | #13

On Wed, Mar 16, 2016 at 03:46:28PM -0400, Benjamin Coddington wrote:
> So, sounds like fixing this is a good idea on the server. I hope Trond will
> let us know if he still feels that the client ought not to be changed since
> it seems an easy enough fix to avoid a similar problem on another server.
> Perhaps there's a downside I'm not seeing on the client.

My worry would just be ensuring forward progress--if the client gets
some data back, then at least the next read can start at a later
offset....  With zero reads, we can set a maximum number of retries, I
guess, but that makes it little messy.

> Or maybe the
> convention of read() returning 0 meaning eof is global enough to cause it to
> be acceptible behavior -- we really should treat a zero-length read response
> without eof as an error.  My lack of experience is showing..  :)

Eh, I think it's legitimately more confusing than it should be.

--b.
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Trond Myklebust March 16, 2016, 8:02 p.m. UTC | #14

On Wed, Mar 16, 2016 at 3:56 PM, J. Bruce Fields <bfields@fieldses.org> wrote:
> On Wed, Mar 16, 2016 at 03:46:28PM -0400, Benjamin Coddington wrote:
>> So, sounds like fixing this is a good idea on the server. I hope Trond will
>> let us know if he still feels that the client ought not to be changed since
>> it seems an easy enough fix to avoid a similar problem on another server.
>> Perhaps there's a downside I'm not seeing on the client.
>
> My worry would just be ensuring forward progress--if the client gets
> some data back, then at least the next read can start at a later
> offset....  With zero reads, we can set a maximum number of retries, I
> guess, but that makes it little messy.
>
>> Or maybe the
>> convention of read() returning 0 meaning eof is global enough to cause it to
>> be acceptible behavior -- we really should treat a zero-length read response
>> without eof as an error.  My lack of experience is showing..  :)
>
> Eh, I think it's legitimately more confusing than it should be.
>

POSIX is very specific about the cases where you are allowed to return
a short read:

See http://pubs.opengroup.org/onlinepubs/9699919799/functions/read.html

"The value returned may be less than nbyte if the number of bytes left
in the file is less than nbyte, if the read() request was interrupted
by a signal, or if the file is a pipe or FIFO or special file and has
fewer than nbyte bytes immediately available for reading. For example,
a read() from a file associated with a terminal may return one typed
line of data."

So I'm guessing most POSIX based server implementations should have no
trouble working out exactly when to set the eof flag. However the
client has no clue as to what OS your server is based on, which is
presumably the main reason why NFS has an eof flag in the first place.
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Mkrtchyan, Tigran March 17, 2016, 2:03 a.m. UTC | #15

I agree with Trond, that returning zero bytes without setting eof
with a high probability a server side issue. We had that situation
with dCache server, where eof flag was set only if you read beyond
file size, e.q. READ with count=0 at the offset=file size, we returned
zero bytes with no eof set. The pynfs test, actually, do retry such
request and there was an infinite loop.

I think, if we (you) add retry on zero byte short-reads
without eof we may have applications/client hangs in case of
misbehaving servers. But failing with EIO is not the best
option. May be it makes sense to query file size in such
situations? As this is a rare corner case, performance
penalty will by negligible.

Tigran. 

----- Original Message -----
> From: "Trond Myklebust" <trond.myklebust@primarydata.com>
> To: "J. Bruce Fields" <bfields@fieldses.org>
> Cc: "Benjamin Coddington" <bcodding@redhat.com>, "Anna Schumaker" <anna.schumaker@netapp.com>, "Linux NFS Mailing List"
> <linux-nfs@vger.kernel.org>
> Sent: Wednesday, March 16, 2016 9:02:49 PM
> Subject: Re: [PATCH] NFS: Retry a zero-length short read

> On Wed, Mar 16, 2016 at 3:56 PM, J. Bruce Fields <bfields@fieldses.org> wrote:
>> On Wed, Mar 16, 2016 at 03:46:28PM -0400, Benjamin Coddington wrote:
>>> So, sounds like fixing this is a good idea on the server. I hope Trond will
>>> let us know if he still feels that the client ought not to be changed since
>>> it seems an easy enough fix to avoid a similar problem on another server.
>>> Perhaps there's a downside I'm not seeing on the client.
>>
>> My worry would just be ensuring forward progress--if the client gets
>> some data back, then at least the next read can start at a later
>> offset....  With zero reads, we can set a maximum number of retries, I
>> guess, but that makes it little messy.
>>
>>> Or maybe the
>>> convention of read() returning 0 meaning eof is global enough to cause it to
>>> be acceptible behavior -- we really should treat a zero-length read response
>>> without eof as an error.  My lack of experience is showing..  :)
>>
>> Eh, I think it's legitimately more confusing than it should be.
>>
> 
> POSIX is very specific about the cases where you are allowed to return
> a short read:
> 
> See http://pubs.opengroup.org/onlinepubs/9699919799/functions/read.html
> 
> "The value returned may be less than nbyte if the number of bytes left
> in the file is less than nbyte, if the read() request was interrupted
> by a signal, or if the file is a pipe or FIFO or special file and has
> fewer than nbyte bytes immediately available for reading. For example,
> a read() from a file associated with a terminal may return one typed
> line of data."
> 
> So I'm guessing most POSIX based server implementations should have no
> trouble working out exactly when to set the eof flag. However the
> client has no clue as to what OS your server is based on, which is
> presumably the main reason why NFS has an eof flag in the first place.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Benjamin Coddington March 17, 2016, 10:11 a.m. UTC | #16

On Thu, 17 Mar 2016, Mkrtchyan, Tigran wrote:
> I agree with Trond, that returning zero bytes without setting eof
> with a high probability a server side issue. We had that situation
> with dCache server, where eof flag was set only if you read beyond
> file size, e.q. READ with count=0 at the offset=file size, we returned
> zero bytes with no eof set. The pynfs test, actually, do retry such
> request and there was an infinite loop.

But that isn't a short read.. if the request is with count=0 and the
response is 0 without eof, there shouldn't be a retry, and the current
client won't retry in that case, nor fail with EIO.

> I think, if we (you) add retry on zero byte short-reads
> without eof we may have applications/client hangs in case of
> misbehaving servers. But failing with EIO is not the best
> option. May be it makes sense to query file size in such
> situations? As this is a rare corner case, performance
> penalty will by negligible.

I do see now that the resistance to this change is because it seems
important to ensure that we make forward progress.  This change would lose
that guarantee for a server that gets "stuck".  And to Trond's point: if the
zero-length short read really was due to eof at read time, then eof should
have been set.

It seems tricky for the linux server to determine when to set eof.  The best
way might be to adopt the convention that a local short read means we should
set eof and accept the small penalty that a read that completes fully (not
short) up to the end of the file will require another read to detect eof.
That should be a fairly rare case except where file sizes end up being
multiples of rsize.

Ben

>
> Tigran.
>
> ----- Original Message -----
> > From: "Trond Myklebust" <trond.myklebust@primarydata.com>
> > To: "J. Bruce Fields" <bfields@fieldses.org>
> > Cc: "Benjamin Coddington" <bcodding@redhat.com>, "Anna Schumaker" <anna.schumaker@netapp.com>, "Linux NFS Mailing List"
> > <linux-nfs@vger.kernel.org>
> > Sent: Wednesday, March 16, 2016 9:02:49 PM
> > Subject: Re: [PATCH] NFS: Retry a zero-length short read
>
> > On Wed, Mar 16, 2016 at 3:56 PM, J. Bruce Fields <bfields@fieldses.org> wrote:
> >> On Wed, Mar 16, 2016 at 03:46:28PM -0400, Benjamin Coddington wrote:
> >>> So, sounds like fixing this is a good idea on the server. I hope Trond will
> >>> let us know if he still feels that the client ought not to be changed since
> >>> it seems an easy enough fix to avoid a similar problem on another server.
> >>> Perhaps there's a downside I'm not seeing on the client.
> >>
> >> My worry would just be ensuring forward progress--if the client gets
> >> some data back, then at least the next read can start at a later
> >> offset....  With zero reads, we can set a maximum number of retries, I
> >> guess, but that makes it little messy.
> >>
> >>> Or maybe the
> >>> convention of read() returning 0 meaning eof is global enough to cause it to
> >>> be acceptible behavior -- we really should treat a zero-length read response
> >>> without eof as an error.  My lack of experience is showing..  :)
> >>
> >> Eh, I think it's legitimately more confusing than it should be.
> >>
> >
> > POSIX is very specific about the cases where you are allowed to return
> > a short read:
> >
> > See http://pubs.opengroup.org/onlinepubs/9699919799/functions/read.html
> >
> > "The value returned may be less than nbyte if the number of bytes left
> > in the file is less than nbyte, if the read() request was interrupted
> > by a signal, or if the file is a pipe or FIFO or special file and has
> > fewer than nbyte bytes immediately available for reading. For example,
> > a read() from a file associated with a terminal may return one typed
> > line of data."
> >
> > So I'm guessing most POSIX based server implementations should have no
> > trouble working out exactly when to set the eof flag. However the
> > client has no clue as to what OS your server is based on, which is
> > presumably the main reason why NFS has an eof flag in the first place.
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Trond Myklebust March 17, 2016, 1:24 p.m. UTC | #17

On Thu, Mar 17, 2016 at 6:11 AM, Benjamin Coddington
<bcodding@redhat.com> wrote:
> On Thu, 17 Mar 2016, Mkrtchyan, Tigran wrote:
>> I agree with Trond, that returning zero bytes without setting eof
>> with a high probability a server side issue. We had that situation
>> with dCache server, where eof flag was set only if you read beyond
>> file size, e.q. READ with count=0 at the offset=file size, we returned
>> zero bytes with no eof set. The pynfs test, actually, do retry such
>> request and there was an infinite loop.
>
> But that isn't a short read.. if the request is with count=0 and the
> response is 0 without eof, there shouldn't be a retry, and the current
> client won't retry in that case, nor fail with EIO.
>
>> I think, if we (you) add retry on zero byte short-reads
>> without eof we may have applications/client hangs in case of
>> misbehaving servers. But failing with EIO is not the best
>> option. May be it makes sense to query file size in such
>> situations? As this is a rare corner case, performance
>> penalty will by negligible.
>
> I do see now that the resistance to this change is because it seems
> important to ensure that we make forward progress.  This change would lose
> that guarantee for a server that gets "stuck".  And to Trond's point: if the
> zero-length short read really was due to eof at read time, then eof should
> have been set.
>
> It seems tricky for the linux server to determine when to set eof.  The best
> way might be to adopt the convention that a local short read means we should
> set eof and accept the small penalty that a read that completes fully (not
> short) up to the end of the file will require another read to detect eof.
> That should be a fairly rare case except where file sizes end up being
> multiples of rsize.

It should be trivial for the linux server to determine when eof is to
be set, because it is running in a POSIX environment. As I pointed out
earlier, the POSIX spec says that all reads to a regular file that
return less than the number of requested bytes are hitting eof.

>
> Ben
>
>>
>> Tigran.
>>
>> ----- Original Message -----
>> > From: "Trond Myklebust" <trond.myklebust@primarydata.com>
>> > To: "J. Bruce Fields" <bfields@fieldses.org>
>> > Cc: "Benjamin Coddington" <bcodding@redhat.com>, "Anna Schumaker" <anna.schumaker@netapp.com>, "Linux NFS Mailing List"
>> > <linux-nfs@vger.kernel.org>
>> > Sent: Wednesday, March 16, 2016 9:02:49 PM
>> > Subject: Re: [PATCH] NFS: Retry a zero-length short read
>>
>> > On Wed, Mar 16, 2016 at 3:56 PM, J. Bruce Fields <bfields@fieldses.org> wrote:
>> >> On Wed, Mar 16, 2016 at 03:46:28PM -0400, Benjamin Coddington wrote:
>> >>> So, sounds like fixing this is a good idea on the server. I hope Trond will
>> >>> let us know if he still feels that the client ought not to be changed since
>> >>> it seems an easy enough fix to avoid a similar problem on another server.
>> >>> Perhaps there's a downside I'm not seeing on the client.
>> >>
>> >> My worry would just be ensuring forward progress--if the client gets
>> >> some data back, then at least the next read can start at a later
>> >> offset....  With zero reads, we can set a maximum number of retries, I
>> >> guess, but that makes it little messy.
>> >>
>> >>> Or maybe the
>> >>> convention of read() returning 0 meaning eof is global enough to cause it to
>> >>> be acceptible behavior -- we really should treat a zero-length read response
>> >>> without eof as an error.  My lack of experience is showing..  :)
>> >>
>> >> Eh, I think it's legitimately more confusing than it should be.
>> >>
>> >
>> > POSIX is very specific about the cases where you are allowed to return
>> > a short read:
>> >
>> > See http://pubs.opengroup.org/onlinepubs/9699919799/functions/read.html
>> >
>> > "The value returned may be less than nbyte if the number of bytes left
>> > in the file is less than nbyte, if the read() request was interrupted
>> > by a signal, or if the file is a pipe or FIFO or special file and has
>> > fewer than nbyte bytes immediately available for reading. For example,
>> > a read() from a file associated with a terminal may return one typed
>> > line of data."
>> >
>> > So I'm guessing most POSIX based server implementations should have no
>> > trouble working out exactly when to set the eof flag. However the
>> > client has no clue as to what OS your server is based on, which is
>> > presumably the main reason why NFS has an eof flag in the first place.
>> > --
>> > To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
>> > the body of a message to majordomo@vger.kernel.org
>> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Benjamin Coddington March 17, 2016, 1:34 p.m. UTC | #18

On Thu, 17 Mar 2016, Trond Myklebust wrote:

> On Thu, Mar 17, 2016 at 6:11 AM, Benjamin Coddington
> <bcodding@redhat.com> wrote:
> > On Thu, 17 Mar 2016, Mkrtchyan, Tigran wrote:
> >> I agree with Trond, that returning zero bytes without setting eof
> >> with a high probability a server side issue. We had that situation
> >> with dCache server, where eof flag was set only if you read beyond
> >> file size, e.q. READ with count=0 at the offset=file size, we returned
> >> zero bytes with no eof set. The pynfs test, actually, do retry such
> >> request and there was an infinite loop.
> >
> > But that isn't a short read.. if the request is with count=0 and the
> > response is 0 without eof, there shouldn't be a retry, and the current
> > client won't retry in that case, nor fail with EIO.
> >
> >> I think, if we (you) add retry on zero byte short-reads
> >> without eof we may have applications/client hangs in case of
> >> misbehaving servers. But failing with EIO is not the best
> >> option. May be it makes sense to query file size in such
> >> situations? As this is a rare corner case, performance
> >> penalty will by negligible.
> >
> > I do see now that the resistance to this change is because it seems
> > important to ensure that we make forward progress.  This change would lose
> > that guarantee for a server that gets "stuck".  And to Trond's point: if the
> > zero-length short read really was due to eof at read time, then eof should
> > have been set.
> >
> > It seems tricky for the linux server to determine when to set eof.  The best
> > way might be to adopt the convention that a local short read means we should
> > set eof and accept the small penalty that a read that completes fully (not
> > short) up to the end of the file will require another read to detect eof.
> > That should be a fairly rare case except where file sizes end up being
> > multiples of rsize.
>
> It should be trivial for the linux server to determine when eof is to
> be set, because it is running in a POSIX environment. As I pointed out
> earlier, the POSIX spec says that all reads to a regular file that
> return less than the number of requested bytes are hitting eof.

I agree.  I meant to say it seems tricky to do it they way we are doing it
now: by checking i_size afterward.  So, thanks for pointing out how specific
POSIX is about this.  I'll send a patch for the server.

> >
> > Ben
> >
> >>
> >> Tigran.
> >>
> >> ----- Original Message -----
> >> > From: "Trond Myklebust" <trond.myklebust@primarydata.com>
> >> > To: "J. Bruce Fields" <bfields@fieldses.org>
> >> > Cc: "Benjamin Coddington" <bcodding@redhat.com>, "Anna Schumaker" <anna.schumaker@netapp.com>, "Linux NFS Mailing List"
> >> > <linux-nfs@vger.kernel.org>
> >> > Sent: Wednesday, March 16, 2016 9:02:49 PM
> >> > Subject: Re: [PATCH] NFS: Retry a zero-length short read
> >>
> >> > On Wed, Mar 16, 2016 at 3:56 PM, J. Bruce Fields <bfields@fieldses.org> wrote:
> >> >> On Wed, Mar 16, 2016 at 03:46:28PM -0400, Benjamin Coddington wrote:
> >> >>> So, sounds like fixing this is a good idea on the server. I hope Trond will
> >> >>> let us know if he still feels that the client ought not to be changed since
> >> >>> it seems an easy enough fix to avoid a similar problem on another server.
> >> >>> Perhaps there's a downside I'm not seeing on the client.
> >> >>
> >> >> My worry would just be ensuring forward progress--if the client gets
> >> >> some data back, then at least the next read can start at a later
> >> >> offset....  With zero reads, we can set a maximum number of retries, I
> >> >> guess, but that makes it little messy.
> >> >>
> >> >>> Or maybe the
> >> >>> convention of read() returning 0 meaning eof is global enough to cause it to
> >> >>> be acceptible behavior -- we really should treat a zero-length read response
> >> >>> without eof as an error.  My lack of experience is showing..  :)
> >> >>
> >> >> Eh, I think it's legitimately more confusing than it should be.
> >> >>
> >> >
> >> > POSIX is very specific about the cases where you are allowed to return
> >> > a short read:
> >> >
> >> > See http://pubs.opengroup.org/onlinepubs/9699919799/functions/read.html
> >> >
> >> > "The value returned may be less than nbyte if the number of bytes left
> >> > in the file is less than nbyte, if the read() request was interrupted
> >> > by a signal, or if the file is a pipe or FIFO or special file and has
> >> > fewer than nbyte bytes immediately available for reading. For example,
> >> > a read() from a file associated with a terminal may return one typed
> >> > line of data."
> >> >
> >> > So I'm guessing most POSIX based server implementations should have no
> >> > trouble working out exactly when to set the eof flag. However the
> >> > client has no clue as to what OS your server is based on, which is
> >> > presumably the main reason why NFS has an eof flag in the first place.
> >> > --
> >> > To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> >> > the body of a message to majordomo@vger.kernel.org
> >> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >>
>
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

J. Bruce Fields March 22, 2016, 9:04 p.m. UTC | #19

On Wed, Mar 16, 2016 at 04:02:49PM -0400, Trond Myklebust wrote:
> On Wed, Mar 16, 2016 at 3:56 PM, J. Bruce Fields <bfields@fieldses.org> wrote:
> > On Wed, Mar 16, 2016 at 03:46:28PM -0400, Benjamin Coddington wrote:
> >> So, sounds like fixing this is a good idea on the server. I hope Trond will
> >> let us know if he still feels that the client ought not to be changed since
> >> it seems an easy enough fix to avoid a similar problem on another server.
> >> Perhaps there's a downside I'm not seeing on the client.
> >
> > My worry would just be ensuring forward progress--if the client gets
> > some data back, then at least the next read can start at a later
> > offset....  With zero reads, we can set a maximum number of retries, I
> > guess, but that makes it little messy.
> >
> >> Or maybe the
> >> convention of read() returning 0 meaning eof is global enough to cause it to
> >> be acceptible behavior -- we really should treat a zero-length read response
> >> without eof as an error.  My lack of experience is showing..  :)
> >
> > Eh, I think it's legitimately more confusing than it should be.
> >
> 
> POSIX is very specific about the cases where you are allowed to return
> a short read:
> 
> See http://pubs.opengroup.org/onlinepubs/9699919799/functions/read.html
> 
> "The value returned may be less than nbyte if the number of bytes left
> in the file is less than nbyte, if the read() request was interrupted
> by a signal, or if the file is a pipe or FIFO or special file and has
> fewer than nbyte bytes immediately available for reading. For example,
> a read() from a file associated with a terminal may return one typed
> line of data."

It bothers me a bit that that's "if" and not "if and only if", but maybe
that's the posix style.  The linux man pages are even looser ("this may
happen for example because..." and then the above reasons).

--b.

> 
> So I'm guessing most POSIX based server implementations should have no
> trouble working out exactly when to set the eof flag. However the
> client has no clue as to what OS your server is based on, which is
> presumably the main reason why NFS has an eof flag in the first place.
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

NFS: Retry a zero-length short read

Commit Message

Comments

Patch