Message ID | 79adfc63f33887475f6a0f0eeaacd8b52b2ed12c.1458119750.git.bcodding@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Wed, Mar 16, 2016 at 5:17 AM, Benjamin Coddington <bcodding@redhat.com> wrote: > > A zero-length short read without eof should be retried rather than sending > an error to the application. In what situation would returning a 0 length read not be a bug? If the server intended that we back off and retry, it has the alternative of sending a JUKEBOX/DELAY error. > Signed-off-by: Benjamin Coddington <bcodding@redhat.com> > --- > fs/nfs/read.c | 5 ----- > 1 files changed, 0 insertions(+), 5 deletions(-) > > diff --git a/fs/nfs/read.c b/fs/nfs/read.c > index eb31e23..7269d42 100644 > --- a/fs/nfs/read.c > +++ b/fs/nfs/read.c > @@ -244,11 +244,6 @@ static void nfs_readpage_retry(struct rpc_task *task, > > /* This is a short read! */ > nfs_inc_stats(hdr->inode, NFSIOS_SHORTREAD); > - /* Has the server at least made some progress? */ > - if (resp->count == 0) { > - nfs_set_pgio_error(hdr, -EIO, argp->offset); > - return; > - } > > /* For non rpc-based layout drivers, retry-through-MDS */ > if (!task->tk_ops) { > -- > 1.7.1 > -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, 16 Mar 2016, Trond Myklebust wrote: > On Wed, Mar 16, 2016 at 5:17 AM, Benjamin Coddington > <bcodding@redhat.com> wrote: > > > > A zero-length short read without eof should be retried rather than sending > > an error to the application. > > > In what situation would returning a 0 length read not be a bug? If the > server intended that we back off and retry, it has the alternative of > sending a JUKEBOX/DELAY error. If the server completes a local read but then another writer comes in and appends to the file before the server checks if it needs to set EOF, then the response might be 0 length without EOF set. I'm also using https://tools.ietf.org/html/rfc7530#section-16.23.5 to guide how I think the client should behave; it says that the client should retry a short read without eof set. I think that should include a response with 0 length. > > Signed-off-by: Benjamin Coddington <bcodding@redhat.com> > > --- > > fs/nfs/read.c | 5 ----- > > 1 files changed, 0 insertions(+), 5 deletions(-) > > > > diff --git a/fs/nfs/read.c b/fs/nfs/read.c > > index eb31e23..7269d42 100644 > > --- a/fs/nfs/read.c > > +++ b/fs/nfs/read.c > > @@ -244,11 +244,6 @@ static void nfs_readpage_retry(struct rpc_task *task, > > > > /* This is a short read! */ > > nfs_inc_stats(hdr->inode, NFSIOS_SHORTREAD); > > - /* Has the server at least made some progress? */ > > - if (resp->count == 0) { > > - nfs_set_pgio_error(hdr, -EIO, argp->offset); > > - return; > > - } > > > > /* For non rpc-based layout drivers, retry-through-MDS */ > > if (!task->tk_ops) { > > -- > > 1.7.1 > > > -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Mar 16, 2016 at 10:22 AM, Benjamin Coddington <bcodding@redhat.com> wrote: > On Wed, 16 Mar 2016, Trond Myklebust wrote: > >> On Wed, Mar 16, 2016 at 5:17 AM, Benjamin Coddington >> <bcodding@redhat.com> wrote: >> > >> > A zero-length short read without eof should be retried rather than sending >> > an error to the application. >> >> >> In what situation would returning a 0 length read not be a bug? If the >> server intended that we back off and retry, it has the alternative of >> sending a JUKEBOX/DELAY error. > > If the server completes a local read but then another writer comes in and > appends to the file before the server checks if it needs to set EOF, then > the response might be 0 length without EOF set. Why isn't that EOF check done atomically with the read itself? This still sounds like a server bug to me. > I'm also using https://tools.ietf.org/html/rfc7530#section-16.23.5 to guide > how I think the client should behave; it says that the client should retry > a short read without eof set. I think that should include a response with > 0 length. >> > Signed-off-by: Benjamin Coddington <bcodding@redhat.com> >> > --- >> > fs/nfs/read.c | 5 ----- >> > 1 files changed, 0 insertions(+), 5 deletions(-) >> > >> > diff --git a/fs/nfs/read.c b/fs/nfs/read.c >> > index eb31e23..7269d42 100644 >> > --- a/fs/nfs/read.c >> > +++ b/fs/nfs/read.c >> > @@ -244,11 +244,6 @@ static void nfs_readpage_retry(struct rpc_task *task, >> > >> > /* This is a short read! */ >> > nfs_inc_stats(hdr->inode, NFSIOS_SHORTREAD); >> > - /* Has the server at least made some progress? */ >> > - if (resp->count == 0) { >> > - nfs_set_pgio_error(hdr, -EIO, argp->offset); >> > - return; >> > - } >> > >> > /* For non rpc-based layout drivers, retry-through-MDS */ >> > if (!task->tk_ops) { >> > -- >> > 1.7.1 >> > >> -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, 16 Mar 2016, Trond Myklebust wrote: > On Wed, Mar 16, 2016 at 10:22 AM, Benjamin Coddington > <bcodding@redhat.com> wrote: > > On Wed, 16 Mar 2016, Trond Myklebust wrote: > > > >> On Wed, Mar 16, 2016 at 5:17 AM, Benjamin Coddington > >> <bcodding@redhat.com> wrote: > >> > > >> > A zero-length short read without eof should be retried rather than sending > >> > an error to the application. > >> > >> > >> In what situation would returning a 0 length read not be a bug? If the > >> server intended that we back off and retry, it has the alternative of > >> sending a JUKEBOX/DELAY error. > > > > If the server completes a local read but then another writer comes in and > > appends to the file before the server checks if it needs to set EOF, then > > the response might be 0 length without EOF set. > > Why isn't that EOF check done atomically with the read itself? This > still sounds like a server bug to me. I don't know -- I would guess because doing that atomically is harder than not, and I don't know where the RFCs say that a zero length response without eof is to be treated as an error or condition to be avoided. I'll look into that, and respond here. > > I'm also using https://tools.ietf.org/html/rfc7530#section-16.23.5 to guide > > how I think the client should behave; it says that the client should retry > > a short read without eof set. I think that should include a response with > > 0 length. Here's the verbatim from section 12.23.5: If the server returns a "short read" (i.e., less data than requested and eof is set to FALSE), the client should send another READ to get the remaining data. A server may return less data than requested under several circumstances. The file may have been truncated by another client or perhaps on the server itself, changing the file size from what the requesting client believes to be the case. This would reduce the actual amount of data available to the client. It is possible that the server reduces the transfer size and so returns a short read result. Server resource exhaustion may also result in a short read. Ben > >> > Signed-off-by: Benjamin Coddington <bcodding@redhat.com> > >> > --- > >> > fs/nfs/read.c | 5 ----- > >> > 1 files changed, 0 insertions(+), 5 deletions(-) > >> > > >> > diff --git a/fs/nfs/read.c b/fs/nfs/read.c > >> > index eb31e23..7269d42 100644 > >> > --- a/fs/nfs/read.c > >> > +++ b/fs/nfs/read.c > >> > @@ -244,11 +244,6 @@ static void nfs_readpage_retry(struct rpc_task *task, > >> > > >> > /* This is a short read! */ > >> > nfs_inc_stats(hdr->inode, NFSIOS_SHORTREAD); > >> > - /* Has the server at least made some progress? */ > >> > - if (resp->count == 0) { > >> > - nfs_set_pgio_error(hdr, -EIO, argp->offset); > >> > - return; > >> > - } > >> > > >> > /* For non rpc-based layout drivers, retry-through-MDS */ > >> > if (!task->tk_ops) { > >> > -- > >> > 1.7.1 > >> > > >> > -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, 16 Mar 2016, Benjamin Coddington wrote: > On Wed, 16 Mar 2016, Trond Myklebust wrote: > > > On Wed, Mar 16, 2016 at 10:22 AM, Benjamin Coddington > > <bcodding@redhat.com> wrote: > > > On Wed, 16 Mar 2016, Trond Myklebust wrote: > > > > > >> On Wed, Mar 16, 2016 at 5:17 AM, Benjamin Coddington > > >> <bcodding@redhat.com> wrote: > > >> > > > >> > A zero-length short read without eof should be retried rather than sending > > >> > an error to the application. > > >> > > >> > > >> In what situation would returning a 0 length read not be a bug? If the > > >> server intended that we back off and retry, it has the alternative of > > >> sending a JUKEBOX/DELAY error. > > > > > > If the server completes a local read but then another writer comes in and > > > appends to the file before the server checks if it needs to set EOF, then > > > the response might be 0 length without EOF set. > > > > Why isn't that EOF check done atomically with the read itself? This > > still sounds like a server bug to me. > > I don't know -- I would guess because doing that atomically is harder than > not, and I don't know where the RFCs say that a zero length response without > eof is to be treated as an error or condition to be avoided. > > I'll look into that, and respond here. Indeed, it seems that it is more convenient for the linux server to send a zero-length response without eof when the file grows. It would probably be more helpful if the server handled that case, but I think that 7530 states that it doesn't have to handle that case. Ben > > > I'm also using https://tools.ietf.org/html/rfc7530#section-16.23.5 to guide > > > how I think the client should behave; it says that the client should retry > > > a short read without eof set. I think that should include a response with > > > 0 length. > > Here's the verbatim from section 12.23.5: > > If the server returns a "short read" (i.e., less data than requested > and eof is set to FALSE), the client should send another READ to get > the remaining data. A server may return less data than requested > under several circumstances. The file may have been truncated by > another client or perhaps on the server itself, changing the file > size from what the requesting client believes to be the case. This > would reduce the actual amount of data available to the client. It > is possible that the server reduces the transfer size and so returns > a short read result. Server resource exhaustion may also result in a > short read. > > Ben > > > >> > Signed-off-by: Benjamin Coddington <bcodding@redhat.com> > > >> > --- > > >> > fs/nfs/read.c | 5 ----- > > >> > 1 files changed, 0 insertions(+), 5 deletions(-) > > >> > > > >> > diff --git a/fs/nfs/read.c b/fs/nfs/read.c > > >> > index eb31e23..7269d42 100644 > > >> > --- a/fs/nfs/read.c > > >> > +++ b/fs/nfs/read.c > > >> > @@ -244,11 +244,6 @@ static void nfs_readpage_retry(struct rpc_task *task, > > >> > > > >> > /* This is a short read! */ > > >> > nfs_inc_stats(hdr->inode, NFSIOS_SHORTREAD); > > >> > - /* Has the server at least made some progress? */ > > >> > - if (resp->count == 0) { > > >> > - nfs_set_pgio_error(hdr, -EIO, argp->offset); > > >> > - return; > > >> > - } > > >> > > > >> > /* For non rpc-based layout drivers, retry-through-MDS */ > > >> > if (!task->tk_ops) { > > >> > -- > > >> > 1.7.1 > > >> > > > >> > > > -- > To unsubscribe from this list: send the line "unsubscribe linux-nfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Mar 16, 2016 at 11:20 AM, Benjamin Coddington <bcodding@redhat.com> wrote: > On Wed, 16 Mar 2016, Benjamin Coddington wrote: > >> On Wed, 16 Mar 2016, Trond Myklebust wrote: >> >> > On Wed, Mar 16, 2016 at 10:22 AM, Benjamin Coddington >> > <bcodding@redhat.com> wrote: >> > > On Wed, 16 Mar 2016, Trond Myklebust wrote: >> > > >> > >> On Wed, Mar 16, 2016 at 5:17 AM, Benjamin Coddington >> > >> <bcodding@redhat.com> wrote: >> > >> > >> > >> > A zero-length short read without eof should be retried rather than sending >> > >> > an error to the application. >> > >> >> > >> >> > >> In what situation would returning a 0 length read not be a bug? If the >> > >> server intended that we back off and retry, it has the alternative of >> > >> sending a JUKEBOX/DELAY error. >> > > >> > > If the server completes a local read but then another writer comes in and >> > > appends to the file before the server checks if it needs to set EOF, then >> > > the response might be 0 length without EOF set. >> > >> > Why isn't that EOF check done atomically with the read itself? This >> > still sounds like a server bug to me. >> >> I don't know -- I would guess because doing that atomically is harder than >> not, and I don't know where the RFCs say that a zero length response without >> eof is to be treated as an error or condition to be avoided. >> >> I'll look into that, and respond here. > > Indeed, it seems that it is more convenient for the linux server to send a > zero-length response without eof when the file grows. It would probably be > more helpful if the server handled that case, but I think that 7530 states > that it doesn't have to handle that case. Here is what RFC5661 and RFC7530 say. If the READ ended at the end-of-file (formally, in a correctly formed READ request, if offset + count is equal to the size of the file), or the READ request extends beyond the size of the file (if offset + count is greater than the size of the file), eof is returned as TRUE; otherwise, it is FALSE. A successful READ of an empty file will always return eof as TRUE. Here is what RFC1813 says: eof If the read ended at the end-of-file (formally, in a correctly formed READ request, if READ3args.offset plus READ3resok.count is equal to the size of the file), eof is returned as TRUE; otherwise it is FALSE. A successful READ of an empty file will always return eof as TRUE. Where does it say that the eof determination is allowed to be non-atomic? Unlike structures such as change_info4, there isn't an "atomic" flag to allow the server to communicate to the client that it cannot rely on the eof flag. Since the determination is part of the same READ operation, you can't point to the "COMPOUNDS are not atomic" either. > > Ben > >> > > I'm also using https://tools.ietf.org/html/rfc7530#section-16.23.5 to guide >> > > how I think the client should behave; it says that the client should retry >> > > a short read without eof set. I think that should include a response with >> > > 0 length. >> >> Here's the verbatim from section 12.23.5: >> >> If the server returns a "short read" (i.e., less data than requested >> and eof is set to FALSE), the client should send another READ to get >> the remaining data. A server may return less data than requested >> under several circumstances. The file may have been truncated by >> another client or perhaps on the server itself, changing the file >> size from what the requesting client believes to be the case. This >> would reduce the actual amount of data available to the client. It >> is possible that the server reduces the transfer size and so returns >> a short read result. Server resource exhaustion may also result in a >> short read. >> >> Ben >> >> > >> > Signed-off-by: Benjamin Coddington <bcodding@redhat.com> >> > >> > --- >> > >> > fs/nfs/read.c | 5 ----- >> > >> > 1 files changed, 0 insertions(+), 5 deletions(-) >> > >> > >> > >> > diff --git a/fs/nfs/read.c b/fs/nfs/read.c >> > >> > index eb31e23..7269d42 100644 >> > >> > --- a/fs/nfs/read.c >> > >> > +++ b/fs/nfs/read.c >> > >> > @@ -244,11 +244,6 @@ static void nfs_readpage_retry(struct rpc_task *task, >> > >> > >> > >> > /* This is a short read! */ >> > >> > nfs_inc_stats(hdr->inode, NFSIOS_SHORTREAD); >> > >> > - /* Has the server at least made some progress? */ >> > >> > - if (resp->count == 0) { >> > >> > - nfs_set_pgio_error(hdr, -EIO, argp->offset); >> > >> > - return; >> > >> > - } >> > >> > >> > >> > /* For non rpc-based layout drivers, retry-through-MDS */ >> > >> > if (!task->tk_ops) { >> > >> > -- >> > >> > 1.7.1 >> > >> > >> > >> >> > >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in >> the body of a message to majordomo@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Mar 16, 2016 at 12:22:22PM -0400, Trond Myklebust wrote: > On Wed, Mar 16, 2016 at 11:20 AM, Benjamin Coddington > <bcodding@redhat.com> wrote: > > On Wed, 16 Mar 2016, Benjamin Coddington wrote: > > > >> On Wed, 16 Mar 2016, Trond Myklebust wrote: > >> > >> > On Wed, Mar 16, 2016 at 10:22 AM, Benjamin Coddington > >> > <bcodding@redhat.com> wrote: > >> > > On Wed, 16 Mar 2016, Trond Myklebust wrote: > >> > > > >> > >> On Wed, Mar 16, 2016 at 5:17 AM, Benjamin Coddington > >> > >> <bcodding@redhat.com> wrote: > >> > >> > > >> > >> > A zero-length short read without eof should be retried rather than sending > >> > >> > an error to the application. > >> > >> > >> > >> > >> > >> In what situation would returning a 0 length read not be a bug? If the > >> > >> server intended that we back off and retry, it has the alternative of > >> > >> sending a JUKEBOX/DELAY error. > >> > > > >> > > If the server completes a local read but then another writer comes in and > >> > > appends to the file before the server checks if it needs to set EOF, then > >> > > the response might be 0 length without EOF set. > >> > > >> > Why isn't that EOF check done atomically with the read itself? This > >> > still sounds like a server bug to me. > >> > >> I don't know -- I would guess because doing that atomically is harder than > >> not, and I don't know where the RFCs say that a zero length response without > >> eof is to be treated as an error or condition to be avoided. > >> > >> I'll look into that, and respond here. > > > > Indeed, it seems that it is more convenient for the linux server to send a > > zero-length response without eof when the file grows. It would probably be > > more helpful if the server handled that case, but I think that 7530 states > > that it doesn't have to handle that case. > > Here is what RFC5661 and RFC7530 say. > > If the READ ended at the end-of-file (formally, in a correctly formed > READ request, if offset + count is equal to the size of the file), or > the READ request extends beyond the size of the file (if offset + > count is greater than the size of the file), eof is returned as TRUE; > otherwise, it is FALSE. A successful READ of an empty file will > always return eof as TRUE. > > Here is what RFC1813 says: > > eof > If the read ended at the end-of-file (formally, in a > correctly formed READ request, if READ3args.offset plus > READ3resok.count is equal to the size of the file), eof > is returned as TRUE; otherwise it is FALSE. A > successful READ of an empty file will always return eof > as TRUE. > > Where does it say that the eof determination is allowed to be > non-atomic? Unlike structures such as change_info4, there isn't an > "atomic" flag to allow the server to communicate to the client that it > cannot rely on the eof flag. Since the determination is part of the > same READ operation, you can't point to the "COMPOUNDS are not atomic" > either. I wonder why READ has eof at all, instead of adopting read()'s convention that 0 means eof? In any case, our server should be able to count on that convention internally, so if getting a real "eof" out of the filesystem looks daunting, why not for now just check for that case in nfsd4_encode_readv()?: - eof = (read->rd_offset + maxcount >= + eof = (maxcount == 0) || (read->rd_offset + maxcount >= d_inode(read->rd_fhp->fh_dentry)->i_size); (Actually, I'm not sure that's completely right: if 0-length read requests are legal then we need to exclude that case.) --b. -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, 16 Mar 2016, Trond Myklebust wrote: > On Wed, Mar 16, 2016 at 11:20 AM, Benjamin Coddington > <bcodding@redhat.com> wrote: > > On Wed, 16 Mar 2016, Benjamin Coddington wrote: > > > >> On Wed, 16 Mar 2016, Trond Myklebust wrote: > >> > >> > On Wed, Mar 16, 2016 at 10:22 AM, Benjamin Coddington > >> > <bcodding@redhat.com> wrote: > >> > > On Wed, 16 Mar 2016, Trond Myklebust wrote: > >> > > > >> > >> On Wed, Mar 16, 2016 at 5:17 AM, Benjamin Coddington > >> > >> <bcodding@redhat.com> wrote: > >> > >> > > >> > >> > A zero-length short read without eof should be retried rather than sending > >> > >> > an error to the application. > >> > >> > >> > >> > >> > >> In what situation would returning a 0 length read not be a bug? If the > >> > >> server intended that we back off and retry, it has the alternative of > >> > >> sending a JUKEBOX/DELAY error. > >> > > > >> > > If the server completes a local read but then another writer comes in and > >> > > appends to the file before the server checks if it needs to set EOF, then > >> > > the response might be 0 length without EOF set. > >> > > >> > Why isn't that EOF check done atomically with the read itself? This > >> > still sounds like a server bug to me. > >> > >> I don't know -- I would guess because doing that atomically is harder than > >> not, and I don't know where the RFCs say that a zero length response without > >> eof is to be treated as an error or condition to be avoided. > >> > >> I'll look into that, and respond here. > > > > Indeed, it seems that it is more convenient for the linux server to send a > > zero-length response without eof when the file grows. It would probably be > > more helpful if the server handled that case, but I think that 7530 states > > that it doesn't have to handle that case. > > Here is what RFC5661 and RFC7530 say. > > If the READ ended at the end-of-file (formally, in a correctly formed > READ request, if offset + count is equal to the size of the file), or > the READ request extends beyond the size of the file (if offset + > count is greater than the size of the file), eof is returned as TRUE; > otherwise, it is FALSE. A successful READ of an empty file will > always return eof as TRUE. > > Here is what RFC1813 says: > > eof > If the read ended at the end-of-file (formally, in a > correctly formed READ request, if READ3args.offset plus > READ3resok.count is equal to the size of the file), eof > is returned as TRUE; otherwise it is FALSE. A > successful READ of an empty file will always return eof > as TRUE. These cover the eof case, but what about server resource exhaustion? NFS4ERR_DELAY seems inappropriate since it signals that the operation could not be completed in an appropriate amount of time. RFC5661 also says: If the client specifies a count value of 0 (zero), the READ succeeds and returns 0 (zero) bytes of data (subject to access permissions checking). The server may choose to return fewer bytes than specified by the client. The client needs to check for this condition and handle the condition appropriately. So there is a case where a server can send a read response of zero length without eof set, and this is perfectly fine. And, it is the server's choice to send fewer bytes. > Where does it say that the eof determination is allowed to be > non-atomic? Unlike structures such as change_info4, there isn't an > "atomic" flag to allow the server to communicate to the client that it > cannot rely on the eof flag. Since the determination is part of the > same READ operation, you can't point to the "COMPOUNDS are not atomic" > either. I don't see where it says that the eof determination is allowed to be non-atomic, but I do see where it says the amount of data returned is up to the server. Why can't the server decide to skip the local read altogether this time around, returning no data and also not setting eof? Ben > >> > > I'm also using https://tools.ietf.org/html/rfc7530#section-16.23.5 to guide > >> > > how I think the client should behave; it says that the client should retry > >> > > a short read without eof set. I think that should include a response with > >> > > 0 length. > >> > >> Here's the verbatim from section 12.23.5: > >> > >> If the server returns a "short read" (i.e., less data than requested > >> and eof is set to FALSE), the client should send another READ to get > >> the remaining data. A server may return less data than requested > >> under several circumstances. The file may have been truncated by > >> another client or perhaps on the server itself, changing the file > >> size from what the requesting client believes to be the case. This > >> would reduce the actual amount of data available to the client. It > >> is possible that the server reduces the transfer size and so returns > >> a short read result. Server resource exhaustion may also result in a > >> short read. > >> > >> Ben > >> > >> > >> > Signed-off-by: Benjamin Coddington <bcodding@redhat.com> > >> > >> > --- > >> > >> > fs/nfs/read.c | 5 ----- > >> > >> > 1 files changed, 0 insertions(+), 5 deletions(-) > >> > >> > > >> > >> > diff --git a/fs/nfs/read.c b/fs/nfs/read.c > >> > >> > index eb31e23..7269d42 100644 > >> > >> > --- a/fs/nfs/read.c > >> > >> > +++ b/fs/nfs/read.c > >> > >> > @@ -244,11 +244,6 @@ static void nfs_readpage_retry(struct rpc_task *task, > >> > >> > > >> > >> > /* This is a short read! */ > >> > >> > nfs_inc_stats(hdr->inode, NFSIOS_SHORTREAD); > >> > >> > - /* Has the server at least made some progress? */ > >> > >> > - if (resp->count == 0) { > >> > >> > - nfs_set_pgio_error(hdr, -EIO, argp->offset); > >> > >> > - return; > >> > >> > - } > >> > >> > > >> > >> > /* For non rpc-based layout drivers, retry-through-MDS */ > >> > >> > if (!task->tk_ops) { > >> > >> > -- > >> > >> > 1.7.1 > >> > >> > > >> > >> > >> > > >> -- > >> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in > >> the body of a message to majordomo@vger.kernel.org > >> More majordomo info at http://vger.kernel.org/majordomo-info.html > -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, 16 Mar 2016, J. Bruce Fields wrote: > On Wed, Mar 16, 2016 at 12:22:22PM -0400, Trond Myklebust wrote: > > On Wed, Mar 16, 2016 at 11:20 AM, Benjamin Coddington > > <bcodding@redhat.com> wrote: > > > On Wed, 16 Mar 2016, Benjamin Coddington wrote: > > > > > >> On Wed, 16 Mar 2016, Trond Myklebust wrote: > > >> > > >> > On Wed, Mar 16, 2016 at 10:22 AM, Benjamin Coddington > > >> > <bcodding@redhat.com> wrote: > > >> > > On Wed, 16 Mar 2016, Trond Myklebust wrote: > > >> > > > > >> > >> On Wed, Mar 16, 2016 at 5:17 AM, Benjamin Coddington > > >> > >> <bcodding@redhat.com> wrote: > > >> > >> > > > >> > >> > A zero-length short read without eof should be retried rather than sending > > >> > >> > an error to the application. > > >> > >> > > >> > >> > > >> > >> In what situation would returning a 0 length read not be a bug? If the > > >> > >> server intended that we back off and retry, it has the alternative of > > >> > >> sending a JUKEBOX/DELAY error. > > >> > > > > >> > > If the server completes a local read but then another writer comes in and > > >> > > appends to the file before the server checks if it needs to set EOF, then > > >> > > the response might be 0 length without EOF set. > > >> > > > >> > Why isn't that EOF check done atomically with the read itself? This > > >> > still sounds like a server bug to me. > > >> > > >> I don't know -- I would guess because doing that atomically is harder than > > >> not, and I don't know where the RFCs say that a zero length response without > > >> eof is to be treated as an error or condition to be avoided. > > >> > > >> I'll look into that, and respond here. > > > > > > Indeed, it seems that it is more convenient for the linux server to send a > > > zero-length response without eof when the file grows. It would probably be > > > more helpful if the server handled that case, but I think that 7530 states > > > that it doesn't have to handle that case. > > > > Here is what RFC5661 and RFC7530 say. > > > > If the READ ended at the end-of-file (formally, in a correctly formed > > READ request, if offset + count is equal to the size of the file), or > > the READ request extends beyond the size of the file (if offset + > > count is greater than the size of the file), eof is returned as TRUE; > > otherwise, it is FALSE. A successful READ of an empty file will > > always return eof as TRUE. > > > > Here is what RFC1813 says: > > > > eof > > If the read ended at the end-of-file (formally, in a > > correctly formed READ request, if READ3args.offset plus > > READ3resok.count is equal to the size of the file), eof > > is returned as TRUE; otherwise it is FALSE. A > > successful READ of an empty file will always return eof > > as TRUE. > > > > Where does it say that the eof determination is allowed to be > > non-atomic? Unlike structures such as change_info4, there isn't an > > "atomic" flag to allow the server to communicate to the client that it > > cannot rely on the eof flag. Since the determination is part of the > > same READ operation, you can't point to the "COMPOUNDS are not atomic" > > either. > > I wonder why READ has eof at all, instead of adopting read()'s > convention that 0 means eof? > > In any case, our server should be able to count on that convention > internally, so if getting a real "eof" out of the filesystem looks > daunting, why not for now just check for that case in > nfsd4_encode_readv()?: It's probably not really that daunting. Couldn't we check to see if inode's size has increased, and if so do the read again? The concept of READ and eof atomicity should also include a concept of a boundary or barrier. The operation is atomic within what bound - the server's response processing or the underlying filesystem? Ben > > - eof = (read->rd_offset + maxcount >= > + eof = (maxcount == 0) || (read->rd_offset + maxcount >= > d_inode(read->rd_fhp->fh_dentry)->i_size); > > > (Actually, I'm not sure that's completely right: if 0-length read > requests are legal then we need to exclude that case.) > > --b. > -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Mar 16, 2016 at 01:36:48PM -0400, Benjamin Coddington wrote: > On Wed, 16 Mar 2016, J. Bruce Fields wrote: > > > On Wed, Mar 16, 2016 at 12:22:22PM -0400, Trond Myklebust wrote: > > > On Wed, Mar 16, 2016 at 11:20 AM, Benjamin Coddington > > > <bcodding@redhat.com> wrote: > > > > On Wed, 16 Mar 2016, Benjamin Coddington wrote: > > > > > > > >> On Wed, 16 Mar 2016, Trond Myklebust wrote: > > > >> > > > >> > On Wed, Mar 16, 2016 at 10:22 AM, Benjamin Coddington > > > >> > <bcodding@redhat.com> wrote: > > > >> > > On Wed, 16 Mar 2016, Trond Myklebust wrote: > > > >> > > > > > >> > >> On Wed, Mar 16, 2016 at 5:17 AM, Benjamin Coddington > > > >> > >> <bcodding@redhat.com> wrote: > > > >> > >> > > > > >> > >> > A zero-length short read without eof should be retried rather than sending > > > >> > >> > an error to the application. > > > >> > >> > > > >> > >> > > > >> > >> In what situation would returning a 0 length read not be a bug? If the > > > >> > >> server intended that we back off and retry, it has the alternative of > > > >> > >> sending a JUKEBOX/DELAY error. > > > >> > > > > > >> > > If the server completes a local read but then another writer comes in and > > > >> > > appends to the file before the server checks if it needs to set EOF, then > > > >> > > the response might be 0 length without EOF set. > > > >> > > > > >> > Why isn't that EOF check done atomically with the read itself? This > > > >> > still sounds like a server bug to me. > > > >> > > > >> I don't know -- I would guess because doing that atomically is harder than > > > >> not, and I don't know where the RFCs say that a zero length response without > > > >> eof is to be treated as an error or condition to be avoided. > > > >> > > > >> I'll look into that, and respond here. > > > > > > > > Indeed, it seems that it is more convenient for the linux server to send a > > > > zero-length response without eof when the file grows. It would probably be > > > > more helpful if the server handled that case, but I think that 7530 states > > > > that it doesn't have to handle that case. > > > > > > Here is what RFC5661 and RFC7530 say. > > > > > > If the READ ended at the end-of-file (formally, in a correctly formed > > > READ request, if offset + count is equal to the size of the file), or > > > the READ request extends beyond the size of the file (if offset + > > > count is greater than the size of the file), eof is returned as TRUE; > > > otherwise, it is FALSE. A successful READ of an empty file will > > > always return eof as TRUE. > > > > > > Here is what RFC1813 says: > > > > > > eof > > > If the read ended at the end-of-file (formally, in a > > > correctly formed READ request, if READ3args.offset plus > > > READ3resok.count is equal to the size of the file), eof > > > is returned as TRUE; otherwise it is FALSE. A > > > successful READ of an empty file will always return eof > > > as TRUE. > > > > > > Where does it say that the eof determination is allowed to be > > > non-atomic? Unlike structures such as change_info4, there isn't an > > > "atomic" flag to allow the server to communicate to the client that it > > > cannot rely on the eof flag. Since the determination is part of the > > > same READ operation, you can't point to the "COMPOUNDS are not atomic" > > > either. > > > > I wonder why READ has eof at all, instead of adopting read()'s > > convention that 0 means eof? > > > > In any case, our server should be able to count on that convention > > internally, so if getting a real "eof" out of the filesystem looks > > daunting, why not for now just check for that case in > > nfsd4_encode_readv()?: > > It's probably not really that daunting. Couldn't we check to see if inode's > size has increased, and if so do the read again? Does the client really have trouble with short reads in general, or only with zero-length short reads? > The concept of READ and eof atomicity should also include a concept of a > boundary or barrier. The operation is atomic within what bound - the > server's response processing or the underlying filesystem? Ideally the read data and eof both represent the same state of the file with no file-modifying operations intervening between the two. I could come up with artificial examples that would allow clients to detect that we don't meet this standard. I can't think of one that would occur in practice (may just be my lack of imagination). --b. -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
> On Wed, Mar 16, 2016 at 01:36:48PM -0400, Benjamin Coddington wrote: > > On Wed, 16 Mar 2016, J. Bruce Fields wrote: > > > > > On Wed, Mar 16, 2016 at 12:22:22PM -0400, Trond Myklebust wrote: > > > > On Wed, Mar 16, 2016 at 11:20 AM, Benjamin Coddington > > > > <bcodding@redhat.com> wrote: > > > > > On Wed, 16 Mar 2016, Benjamin Coddington wrote: > > > > > > > > > >> On Wed, 16 Mar 2016, Trond Myklebust wrote: > > > > >> > > > > >> > On Wed, Mar 16, 2016 at 10:22 AM, Benjamin Coddington > > > > >> > <bcodding@redhat.com> wrote: > > > > >> > > On Wed, 16 Mar 2016, Trond Myklebust wrote: > > > > >> > > > > > > >> > >> On Wed, Mar 16, 2016 at 5:17 AM, Benjamin Coddington > > > > >> > >> <bcodding@redhat.com> wrote: > > > > >> > >> > > > > > >> > >> > A zero-length short read without eof should be retried rather than sending > > > > >> > >> > an error to the application. > > > > >> > >> > > > > >> > >> > > > > >> > >> In what situation would returning a 0 length read not be a bug? If the > > > > >> > >> server intended that we back off and retry, it has the alternative of > > > > >> > >> sending a JUKEBOX/DELAY error. > > > > >> > > > > > > >> > > If the server completes a local read but then another writer comes in and > > > > >> > > appends to the file before the server checks if it needs to set EOF, then > > > > >> > > the response might be 0 length without EOF set. > > > > >> > > > > > >> > Why isn't that EOF check done atomically with the read itself? This > > > > >> > still sounds like a server bug to me. > > > > >> > > > > >> I don't know -- I would guess because doing that atomically is harder than > > > > >> not, and I don't know where the RFCs say that a zero length response without > > > > >> eof is to be treated as an error or condition to be avoided. > > > > >> > > > > >> I'll look into that, and respond here. > > > > > > > > > > Indeed, it seems that it is more convenient for the linux server to send a > > > > > zero-length response without eof when the file grows. It would probably be > > > > > more helpful if the server handled that case, but I think that 7530 states > > > > > that it doesn't have to handle that case. > > > > > > > > Here is what RFC5661 and RFC7530 say. > > > > > > > > If the READ ended at the end-of-file (formally, in a correctly formed > > > > READ request, if offset + count is equal to the size of the file), or > > > > the READ request extends beyond the size of the file (if offset + > > > > count is greater than the size of the file), eof is returned as TRUE; > > > > otherwise, it is FALSE. A successful READ of an empty file will > > > > always return eof as TRUE. > > > > > > > > Here is what RFC1813 says: > > > > > > > > eof > > > > If the read ended at the end-of-file (formally, in a > > > > correctly formed READ request, if READ3args.offset plus > > > > READ3resok.count is equal to the size of the file), eof > > > > is returned as TRUE; otherwise it is FALSE. A > > > > successful READ of an empty file will always return eof > > > > as TRUE. > > > > > > > > Where does it say that the eof determination is allowed to be > > > > non-atomic? Unlike structures such as change_info4, there isn't an > > > > "atomic" flag to allow the server to communicate to the client that it > > > > cannot rely on the eof flag. Since the determination is part of the > > > > same READ operation, you can't point to the "COMPOUNDS are not atomic" > > > > either. > > > > > > I wonder why READ has eof at all, instead of adopting read()'s > > > convention that 0 means eof? > > > > > > In any case, our server should be able to count on that convention > > > internally, so if getting a real "eof" out of the filesystem looks > > > daunting, why not for now just check for that case in > > > nfsd4_encode_readv()?: > > > > It's probably not really that daunting. Couldn't we check to see if inode's > > size has increased, and if so do the read again? > > Does the client really have trouble with short reads in general, or only > with zero-length short reads? Just the zero-length short read. > > The concept of READ and eof atomicity should also include a concept of a > > boundary or barrier. The operation is atomic within what bound - the > > server's response processing or the underlying filesystem? > > Ideally the read data and eof both represent the same state of the file > with no file-modifying operations intervening between the two. > > I could come up with artificial examples that would allow clients to > detect that we don't meet this standard. I can't think of one that > would occur in practice (may just be my lack of imagination). So, sounds like fixing this is a good idea on the server. I hope Trond will let us know if he still feels that the client ought not to be changed since it seems an easy enough fix to avoid a similar problem on another server. Perhaps there's a downside I'm not seeing on the client. Or maybe the convention of read() returning 0 meaning eof is global enough to cause it to be acceptible behavior -- we really should treat a zero-length read response without eof as an error. My lack of experience is showing.. :) Ben -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Mar 16, 2016 at 01:36:48PM -0400, Benjamin Coddington wrote: > On Wed, 16 Mar 2016, J. Bruce Fields wrote: > > > On Wed, Mar 16, 2016 at 12:22:22PM -0400, Trond Myklebust wrote: > > > On Wed, Mar 16, 2016 at 11:20 AM, Benjamin Coddington > > > <bcodding@redhat.com> wrote: > > > > On Wed, 16 Mar 2016, Benjamin Coddington wrote: > > > > > > > >> On Wed, 16 Mar 2016, Trond Myklebust wrote: > > > >> > > > >> > On Wed, Mar 16, 2016 at 10:22 AM, Benjamin Coddington > > > >> > <bcodding@redhat.com> wrote: > > > >> > > On Wed, 16 Mar 2016, Trond Myklebust wrote: > > > >> > > > > > >> > >> On Wed, Mar 16, 2016 at 5:17 AM, Benjamin Coddington > > > >> > >> <bcodding@redhat.com> wrote: > > > >> > >> > > > > >> > >> > A zero-length short read without eof should be retried rather than sending > > > >> > >> > an error to the application. > > > >> > >> > > > >> > >> > > > >> > >> In what situation would returning a 0 length read not be a bug? If the > > > >> > >> server intended that we back off and retry, it has the alternative of > > > >> > >> sending a JUKEBOX/DELAY error. > > > >> > > > > > >> > > If the server completes a local read but then another writer comes in and > > > >> > > appends to the file before the server checks if it needs to set EOF, then > > > >> > > the response might be 0 length without EOF set. > > > >> > > > > >> > Why isn't that EOF check done atomically with the read itself? This > > > >> > still sounds like a server bug to me. > > > >> > > > >> I don't know -- I would guess because doing that atomically is harder than > > > >> not, and I don't know where the RFCs say that a zero length response without > > > >> eof is to be treated as an error or condition to be avoided. > > > >> > > > >> I'll look into that, and respond here. > > > > > > > > Indeed, it seems that it is more convenient for the linux server to send a > > > > zero-length response without eof when the file grows. It would probably be > > > > more helpful if the server handled that case, but I think that 7530 states > > > > that it doesn't have to handle that case. > > > > > > Here is what RFC5661 and RFC7530 say. > > > > > > If the READ ended at the end-of-file (formally, in a correctly formed > > > READ request, if offset + count is equal to the size of the file), or > > > the READ request extends beyond the size of the file (if offset + > > > count is greater than the size of the file), eof is returned as TRUE; > > > otherwise, it is FALSE. A successful READ of an empty file will > > > always return eof as TRUE. > > > > > > Here is what RFC1813 says: > > > > > > eof > > > If the read ended at the end-of-file (formally, in a > > > correctly formed READ request, if READ3args.offset plus > > > READ3resok.count is equal to the size of the file), eof > > > is returned as TRUE; otherwise it is FALSE. A > > > successful READ of an empty file will always return eof > > > as TRUE. > > > > > > Where does it say that the eof determination is allowed to be > > > non-atomic? Unlike structures such as change_info4, there isn't an > > > "atomic" flag to allow the server to communicate to the client that it > > > cannot rely on the eof flag. Since the determination is part of the > > > same READ operation, you can't point to the "COMPOUNDS are not atomic" > > > either. > > > > I wonder why READ has eof at all, instead of adopting read()'s > > convention that 0 means eof? > > > > In any case, our server should be able to count on that convention > > internally, so if getting a real "eof" out of the filesystem looks > > daunting, why not for now just check for that case in > > nfsd4_encode_readv()?: > > It's probably not really that daunting. Couldn't we check to see if inode's > size has increased, and if so do the read again? By the way, in theory, this isn't enough: truncate(0) write("AA") read(1) -> "A" truncate(0) write("B") read i_size -> 1 return ("A", eof = 1) to client But the file never simultaneously had content "A" and size 1. Also, why only check for increases and not decreases? And how do you ensure forward progress? --b. > > The concept of READ and eof atomicity should also include a concept of a > boundary or barrier. The operation is atomic within what bound - the > server's response processing or the underlying filesystem? > > Ben > > > > > - eof = (read->rd_offset + maxcount >= > > + eof = (maxcount == 0) || (read->rd_offset + maxcount >= > > d_inode(read->rd_fhp->fh_dentry)->i_size); > > > > > > (Actually, I'm not sure that's completely right: if 0-length read > > requests are legal then we need to exclude that case.) > > > > --b. > > -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Mar 16, 2016 at 03:46:28PM -0400, Benjamin Coddington wrote: > So, sounds like fixing this is a good idea on the server. I hope Trond will > let us know if he still feels that the client ought not to be changed since > it seems an easy enough fix to avoid a similar problem on another server. > Perhaps there's a downside I'm not seeing on the client. My worry would just be ensuring forward progress--if the client gets some data back, then at least the next read can start at a later offset.... With zero reads, we can set a maximum number of retries, I guess, but that makes it little messy. > Or maybe the > convention of read() returning 0 meaning eof is global enough to cause it to > be acceptible behavior -- we really should treat a zero-length read response > without eof as an error. My lack of experience is showing.. :) Eh, I think it's legitimately more confusing than it should be. --b. -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Mar 16, 2016 at 3:56 PM, J. Bruce Fields <bfields@fieldses.org> wrote: > On Wed, Mar 16, 2016 at 03:46:28PM -0400, Benjamin Coddington wrote: >> So, sounds like fixing this is a good idea on the server. I hope Trond will >> let us know if he still feels that the client ought not to be changed since >> it seems an easy enough fix to avoid a similar problem on another server. >> Perhaps there's a downside I'm not seeing on the client. > > My worry would just be ensuring forward progress--if the client gets > some data back, then at least the next read can start at a later > offset.... With zero reads, we can set a maximum number of retries, I > guess, but that makes it little messy. > >> Or maybe the >> convention of read() returning 0 meaning eof is global enough to cause it to >> be acceptible behavior -- we really should treat a zero-length read response >> without eof as an error. My lack of experience is showing.. :) > > Eh, I think it's legitimately more confusing than it should be. > POSIX is very specific about the cases where you are allowed to return a short read: See http://pubs.opengroup.org/onlinepubs/9699919799/functions/read.html "The value returned may be less than nbyte if the number of bytes left in the file is less than nbyte, if the read() request was interrupted by a signal, or if the file is a pipe or FIFO or special file and has fewer than nbyte bytes immediately available for reading. For example, a read() from a file associated with a terminal may return one typed line of data." So I'm guessing most POSIX based server implementations should have no trouble working out exactly when to set the eof flag. However the client has no clue as to what OS your server is based on, which is presumably the main reason why NFS has an eof flag in the first place. -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
I agree with Trond, that returning zero bytes without setting eof with a high probability a server side issue. We had that situation with dCache server, where eof flag was set only if you read beyond file size, e.q. READ with count=0 at the offset=file size, we returned zero bytes with no eof set. The pynfs test, actually, do retry such request and there was an infinite loop. I think, if we (you) add retry on zero byte short-reads without eof we may have applications/client hangs in case of misbehaving servers. But failing with EIO is not the best option. May be it makes sense to query file size in such situations? As this is a rare corner case, performance penalty will by negligible. Tigran. ----- Original Message ----- > From: "Trond Myklebust" <trond.myklebust@primarydata.com> > To: "J. Bruce Fields" <bfields@fieldses.org> > Cc: "Benjamin Coddington" <bcodding@redhat.com>, "Anna Schumaker" <anna.schumaker@netapp.com>, "Linux NFS Mailing List" > <linux-nfs@vger.kernel.org> > Sent: Wednesday, March 16, 2016 9:02:49 PM > Subject: Re: [PATCH] NFS: Retry a zero-length short read > On Wed, Mar 16, 2016 at 3:56 PM, J. Bruce Fields <bfields@fieldses.org> wrote: >> On Wed, Mar 16, 2016 at 03:46:28PM -0400, Benjamin Coddington wrote: >>> So, sounds like fixing this is a good idea on the server. I hope Trond will >>> let us know if he still feels that the client ought not to be changed since >>> it seems an easy enough fix to avoid a similar problem on another server. >>> Perhaps there's a downside I'm not seeing on the client. >> >> My worry would just be ensuring forward progress--if the client gets >> some data back, then at least the next read can start at a later >> offset.... With zero reads, we can set a maximum number of retries, I >> guess, but that makes it little messy. >> >>> Or maybe the >>> convention of read() returning 0 meaning eof is global enough to cause it to >>> be acceptible behavior -- we really should treat a zero-length read response >>> without eof as an error. My lack of experience is showing.. :) >> >> Eh, I think it's legitimately more confusing than it should be. >> > > POSIX is very specific about the cases where you are allowed to return > a short read: > > See http://pubs.opengroup.org/onlinepubs/9699919799/functions/read.html > > "The value returned may be less than nbyte if the number of bytes left > in the file is less than nbyte, if the read() request was interrupted > by a signal, or if the file is a pipe or FIFO or special file and has > fewer than nbyte bytes immediately available for reading. For example, > a read() from a file associated with a terminal may return one typed > line of data." > > So I'm guessing most POSIX based server implementations should have no > trouble working out exactly when to set the eof flag. However the > client has no clue as to what OS your server is based on, which is > presumably the main reason why NFS has an eof flag in the first place. > -- > To unsubscribe from this list: send the line "unsubscribe linux-nfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, 17 Mar 2016, Mkrtchyan, Tigran wrote: > I agree with Trond, that returning zero bytes without setting eof > with a high probability a server side issue. We had that situation > with dCache server, where eof flag was set only if you read beyond > file size, e.q. READ with count=0 at the offset=file size, we returned > zero bytes with no eof set. The pynfs test, actually, do retry such > request and there was an infinite loop. But that isn't a short read.. if the request is with count=0 and the response is 0 without eof, there shouldn't be a retry, and the current client won't retry in that case, nor fail with EIO. > I think, if we (you) add retry on zero byte short-reads > without eof we may have applications/client hangs in case of > misbehaving servers. But failing with EIO is not the best > option. May be it makes sense to query file size in such > situations? As this is a rare corner case, performance > penalty will by negligible. I do see now that the resistance to this change is because it seems important to ensure that we make forward progress. This change would lose that guarantee for a server that gets "stuck". And to Trond's point: if the zero-length short read really was due to eof at read time, then eof should have been set. It seems tricky for the linux server to determine when to set eof. The best way might be to adopt the convention that a local short read means we should set eof and accept the small penalty that a read that completes fully (not short) up to the end of the file will require another read to detect eof. That should be a fairly rare case except where file sizes end up being multiples of rsize. Ben > > Tigran. > > ----- Original Message ----- > > From: "Trond Myklebust" <trond.myklebust@primarydata.com> > > To: "J. Bruce Fields" <bfields@fieldses.org> > > Cc: "Benjamin Coddington" <bcodding@redhat.com>, "Anna Schumaker" <anna.schumaker@netapp.com>, "Linux NFS Mailing List" > > <linux-nfs@vger.kernel.org> > > Sent: Wednesday, March 16, 2016 9:02:49 PM > > Subject: Re: [PATCH] NFS: Retry a zero-length short read > > > On Wed, Mar 16, 2016 at 3:56 PM, J. Bruce Fields <bfields@fieldses.org> wrote: > >> On Wed, Mar 16, 2016 at 03:46:28PM -0400, Benjamin Coddington wrote: > >>> So, sounds like fixing this is a good idea on the server. I hope Trond will > >>> let us know if he still feels that the client ought not to be changed since > >>> it seems an easy enough fix to avoid a similar problem on another server. > >>> Perhaps there's a downside I'm not seeing on the client. > >> > >> My worry would just be ensuring forward progress--if the client gets > >> some data back, then at least the next read can start at a later > >> offset.... With zero reads, we can set a maximum number of retries, I > >> guess, but that makes it little messy. > >> > >>> Or maybe the > >>> convention of read() returning 0 meaning eof is global enough to cause it to > >>> be acceptible behavior -- we really should treat a zero-length read response > >>> without eof as an error. My lack of experience is showing.. :) > >> > >> Eh, I think it's legitimately more confusing than it should be. > >> > > > > POSIX is very specific about the cases where you are allowed to return > > a short read: > > > > See http://pubs.opengroup.org/onlinepubs/9699919799/functions/read.html > > > > "The value returned may be less than nbyte if the number of bytes left > > in the file is less than nbyte, if the read() request was interrupted > > by a signal, or if the file is a pipe or FIFO or special file and has > > fewer than nbyte bytes immediately available for reading. For example, > > a read() from a file associated with a terminal may return one typed > > line of data." > > > > So I'm guessing most POSIX based server implementations should have no > > trouble working out exactly when to set the eof flag. However the > > client has no clue as to what OS your server is based on, which is > > presumably the main reason why NFS has an eof flag in the first place. > > -- > > To unsubscribe from this list: send the line "unsubscribe linux-nfs" in > > the body of a message to majordomo@vger.kernel.org > > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Mar 17, 2016 at 6:11 AM, Benjamin Coddington <bcodding@redhat.com> wrote: > On Thu, 17 Mar 2016, Mkrtchyan, Tigran wrote: >> I agree with Trond, that returning zero bytes without setting eof >> with a high probability a server side issue. We had that situation >> with dCache server, where eof flag was set only if you read beyond >> file size, e.q. READ with count=0 at the offset=file size, we returned >> zero bytes with no eof set. The pynfs test, actually, do retry such >> request and there was an infinite loop. > > But that isn't a short read.. if the request is with count=0 and the > response is 0 without eof, there shouldn't be a retry, and the current > client won't retry in that case, nor fail with EIO. > >> I think, if we (you) add retry on zero byte short-reads >> without eof we may have applications/client hangs in case of >> misbehaving servers. But failing with EIO is not the best >> option. May be it makes sense to query file size in such >> situations? As this is a rare corner case, performance >> penalty will by negligible. > > I do see now that the resistance to this change is because it seems > important to ensure that we make forward progress. This change would lose > that guarantee for a server that gets "stuck". And to Trond's point: if the > zero-length short read really was due to eof at read time, then eof should > have been set. > > It seems tricky for the linux server to determine when to set eof. The best > way might be to adopt the convention that a local short read means we should > set eof and accept the small penalty that a read that completes fully (not > short) up to the end of the file will require another read to detect eof. > That should be a fairly rare case except where file sizes end up being > multiples of rsize. It should be trivial for the linux server to determine when eof is to be set, because it is running in a POSIX environment. As I pointed out earlier, the POSIX spec says that all reads to a regular file that return less than the number of requested bytes are hitting eof. > > Ben > >> >> Tigran. >> >> ----- Original Message ----- >> > From: "Trond Myklebust" <trond.myklebust@primarydata.com> >> > To: "J. Bruce Fields" <bfields@fieldses.org> >> > Cc: "Benjamin Coddington" <bcodding@redhat.com>, "Anna Schumaker" <anna.schumaker@netapp.com>, "Linux NFS Mailing List" >> > <linux-nfs@vger.kernel.org> >> > Sent: Wednesday, March 16, 2016 9:02:49 PM >> > Subject: Re: [PATCH] NFS: Retry a zero-length short read >> >> > On Wed, Mar 16, 2016 at 3:56 PM, J. Bruce Fields <bfields@fieldses.org> wrote: >> >> On Wed, Mar 16, 2016 at 03:46:28PM -0400, Benjamin Coddington wrote: >> >>> So, sounds like fixing this is a good idea on the server. I hope Trond will >> >>> let us know if he still feels that the client ought not to be changed since >> >>> it seems an easy enough fix to avoid a similar problem on another server. >> >>> Perhaps there's a downside I'm not seeing on the client. >> >> >> >> My worry would just be ensuring forward progress--if the client gets >> >> some data back, then at least the next read can start at a later >> >> offset.... With zero reads, we can set a maximum number of retries, I >> >> guess, but that makes it little messy. >> >> >> >>> Or maybe the >> >>> convention of read() returning 0 meaning eof is global enough to cause it to >> >>> be acceptible behavior -- we really should treat a zero-length read response >> >>> without eof as an error. My lack of experience is showing.. :) >> >> >> >> Eh, I think it's legitimately more confusing than it should be. >> >> >> > >> > POSIX is very specific about the cases where you are allowed to return >> > a short read: >> > >> > See http://pubs.opengroup.org/onlinepubs/9699919799/functions/read.html >> > >> > "The value returned may be less than nbyte if the number of bytes left >> > in the file is less than nbyte, if the read() request was interrupted >> > by a signal, or if the file is a pipe or FIFO or special file and has >> > fewer than nbyte bytes immediately available for reading. For example, >> > a read() from a file associated with a terminal may return one typed >> > line of data." >> > >> > So I'm guessing most POSIX based server implementations should have no >> > trouble working out exactly when to set the eof flag. However the >> > client has no clue as to what OS your server is based on, which is >> > presumably the main reason why NFS has an eof flag in the first place. >> > -- >> > To unsubscribe from this list: send the line "unsubscribe linux-nfs" in >> > the body of a message to majordomo@vger.kernel.org >> > More majordomo info at http://vger.kernel.org/majordomo-info.html >> -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, 17 Mar 2016, Trond Myklebust wrote: > On Thu, Mar 17, 2016 at 6:11 AM, Benjamin Coddington > <bcodding@redhat.com> wrote: > > On Thu, 17 Mar 2016, Mkrtchyan, Tigran wrote: > >> I agree with Trond, that returning zero bytes without setting eof > >> with a high probability a server side issue. We had that situation > >> with dCache server, where eof flag was set only if you read beyond > >> file size, e.q. READ with count=0 at the offset=file size, we returned > >> zero bytes with no eof set. The pynfs test, actually, do retry such > >> request and there was an infinite loop. > > > > But that isn't a short read.. if the request is with count=0 and the > > response is 0 without eof, there shouldn't be a retry, and the current > > client won't retry in that case, nor fail with EIO. > > > >> I think, if we (you) add retry on zero byte short-reads > >> without eof we may have applications/client hangs in case of > >> misbehaving servers. But failing with EIO is not the best > >> option. May be it makes sense to query file size in such > >> situations? As this is a rare corner case, performance > >> penalty will by negligible. > > > > I do see now that the resistance to this change is because it seems > > important to ensure that we make forward progress. This change would lose > > that guarantee for a server that gets "stuck". And to Trond's point: if the > > zero-length short read really was due to eof at read time, then eof should > > have been set. > > > > It seems tricky for the linux server to determine when to set eof. The best > > way might be to adopt the convention that a local short read means we should > > set eof and accept the small penalty that a read that completes fully (not > > short) up to the end of the file will require another read to detect eof. > > That should be a fairly rare case except where file sizes end up being > > multiples of rsize. > > It should be trivial for the linux server to determine when eof is to > be set, because it is running in a POSIX environment. As I pointed out > earlier, the POSIX spec says that all reads to a regular file that > return less than the number of requested bytes are hitting eof. I agree. I meant to say it seems tricky to do it they way we are doing it now: by checking i_size afterward. So, thanks for pointing out how specific POSIX is about this. I'll send a patch for the server. > > > > Ben > > > >> > >> Tigran. > >> > >> ----- Original Message ----- > >> > From: "Trond Myklebust" <trond.myklebust@primarydata.com> > >> > To: "J. Bruce Fields" <bfields@fieldses.org> > >> > Cc: "Benjamin Coddington" <bcodding@redhat.com>, "Anna Schumaker" <anna.schumaker@netapp.com>, "Linux NFS Mailing List" > >> > <linux-nfs@vger.kernel.org> > >> > Sent: Wednesday, March 16, 2016 9:02:49 PM > >> > Subject: Re: [PATCH] NFS: Retry a zero-length short read > >> > >> > On Wed, Mar 16, 2016 at 3:56 PM, J. Bruce Fields <bfields@fieldses.org> wrote: > >> >> On Wed, Mar 16, 2016 at 03:46:28PM -0400, Benjamin Coddington wrote: > >> >>> So, sounds like fixing this is a good idea on the server. I hope Trond will > >> >>> let us know if he still feels that the client ought not to be changed since > >> >>> it seems an easy enough fix to avoid a similar problem on another server. > >> >>> Perhaps there's a downside I'm not seeing on the client. > >> >> > >> >> My worry would just be ensuring forward progress--if the client gets > >> >> some data back, then at least the next read can start at a later > >> >> offset.... With zero reads, we can set a maximum number of retries, I > >> >> guess, but that makes it little messy. > >> >> > >> >>> Or maybe the > >> >>> convention of read() returning 0 meaning eof is global enough to cause it to > >> >>> be acceptible behavior -- we really should treat a zero-length read response > >> >>> without eof as an error. My lack of experience is showing.. :) > >> >> > >> >> Eh, I think it's legitimately more confusing than it should be. > >> >> > >> > > >> > POSIX is very specific about the cases where you are allowed to return > >> > a short read: > >> > > >> > See http://pubs.opengroup.org/onlinepubs/9699919799/functions/read.html > >> > > >> > "The value returned may be less than nbyte if the number of bytes left > >> > in the file is less than nbyte, if the read() request was interrupted > >> > by a signal, or if the file is a pipe or FIFO or special file and has > >> > fewer than nbyte bytes immediately available for reading. For example, > >> > a read() from a file associated with a terminal may return one typed > >> > line of data." > >> > > >> > So I'm guessing most POSIX based server implementations should have no > >> > trouble working out exactly when to set the eof flag. However the > >> > client has no clue as to what OS your server is based on, which is > >> > presumably the main reason why NFS has an eof flag in the first place. > >> > -- > >> > To unsubscribe from this list: send the line "unsubscribe linux-nfs" in > >> > the body of a message to majordomo@vger.kernel.org > >> > More majordomo info at http://vger.kernel.org/majordomo-info.html > >> > -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Mar 16, 2016 at 04:02:49PM -0400, Trond Myklebust wrote: > On Wed, Mar 16, 2016 at 3:56 PM, J. Bruce Fields <bfields@fieldses.org> wrote: > > On Wed, Mar 16, 2016 at 03:46:28PM -0400, Benjamin Coddington wrote: > >> So, sounds like fixing this is a good idea on the server. I hope Trond will > >> let us know if he still feels that the client ought not to be changed since > >> it seems an easy enough fix to avoid a similar problem on another server. > >> Perhaps there's a downside I'm not seeing on the client. > > > > My worry would just be ensuring forward progress--if the client gets > > some data back, then at least the next read can start at a later > > offset.... With zero reads, we can set a maximum number of retries, I > > guess, but that makes it little messy. > > > >> Or maybe the > >> convention of read() returning 0 meaning eof is global enough to cause it to > >> be acceptible behavior -- we really should treat a zero-length read response > >> without eof as an error. My lack of experience is showing.. :) > > > > Eh, I think it's legitimately more confusing than it should be. > > > > POSIX is very specific about the cases where you are allowed to return > a short read: > > See http://pubs.opengroup.org/onlinepubs/9699919799/functions/read.html > > "The value returned may be less than nbyte if the number of bytes left > in the file is less than nbyte, if the read() request was interrupted > by a signal, or if the file is a pipe or FIFO or special file and has > fewer than nbyte bytes immediately available for reading. For example, > a read() from a file associated with a terminal may return one typed > line of data." It bothers me a bit that that's "if" and not "if and only if", but maybe that's the posix style. The linux man pages are even looser ("this may happen for example because..." and then the above reasons). --b. > > So I'm guessing most POSIX based server implementations should have no > trouble working out exactly when to set the eof flag. However the > client has no clue as to what OS your server is based on, which is > presumably the main reason why NFS has an eof flag in the first place. -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/fs/nfs/read.c b/fs/nfs/read.c index eb31e23..7269d42 100644 --- a/fs/nfs/read.c +++ b/fs/nfs/read.c @@ -244,11 +244,6 @@ static void nfs_readpage_retry(struct rpc_task *task, /* This is a short read! */ nfs_inc_stats(hdr->inode, NFSIOS_SHORTREAD); - /* Has the server at least made some progress? */ - if (resp->count == 0) { - nfs_set_pgio_error(hdr, -EIO, argp->offset); - return; - } /* For non rpc-based layout drivers, retry-through-MDS */ if (!task->tk_ops) {
A zero-length short read without eof should be retried rather than sending an error to the application. Signed-off-by: Benjamin Coddington <bcodding@redhat.com> --- fs/nfs/read.c | 5 ----- 1 files changed, 0 insertions(+), 5 deletions(-)