diff mbox series

[v2] NFSD: trim reads past NFS_OFFSET_MAX and fix NFSv3 check

Message ID 20220122124953.1606281-1-dan.aloni@vastdata.com (mailing list archive)
State New, archived
Headers show
Series [v2] NFSD: trim reads past NFS_OFFSET_MAX and fix NFSv3 check | expand

Commit Message

Dan Aloni Jan. 22, 2022, 12:49 p.m. UTC
Due to change in client 8cfb9015280d ("NFS: Always provide aligned
buffers to the RPC read layers"), a read of 0xfff is aligned up to
server rsize of 0x0fff.

As a result, in a test where the server has a file of size
0x7fffffffffffffff, and the client tries to read from the offset
0x7ffffffffffff000, the read causes loff_t overflow in the server and it
returns an NFS code of EINVAL to the client. The client as a result
indefinitely retries the request.

This fixes the issue at server side by trimming reads past
NFS_OFFSET_MAX. It also adds a missing check for out of bound offset
in NFSv3.

Fixes: 8cfb9015280d ("NFS: Always provide aligned buffers to the RPC read layers")
Signed-off-by: Dan Aloni <dan.aloni@vastdata.com>
---
 fs/nfsd/nfs4proc.c |  3 +++
 fs/nfsd/vfs.c      | 11 +++++++++++
 2 files changed, 14 insertions(+)

Comments

Chuck Lever Jan. 22, 2022, 5:37 p.m. UTC | #1
Some additional comments on v2 below. We need to sort the
NFS_OFFSET_MAX v. OFFSET_MAX issue before you send a v3.


> On Jan 22, 2022, at 7:49 AM, Dan Aloni <dan.aloni@vastdata.com> wrote:
> 
> Due to change in client 8cfb9015280d ("NFS: Always provide aligned
> buffers to the RPC read layers"), a read of 0xfff is aligned up to
> server rsize of 0x0fff.

scripts/checkpatch.pl will complain about the way you name the
commit here. It will want:

Due to commit 8cfb9015280d ("NFS: Always provide aligned buffers
to the RPC read layers") on the client, 


> As a result, in a test where the server has a file of size
> 0x7fffffffffffffff, and the client tries to read from the offset
> 0x7ffffffffffff000, the read causes loff_t overflow in the server and it
> returns an NFS code of EINVAL to the client. The client as a result
> indefinitely retries the request.
> 
> This fixes the issue at server side by trimming reads past
> NFS_OFFSET_MAX. It also adds a missing check for out of bound offset
> in NFSv3.

RFC 1813 section 3.3.6 does say this:

>>       It is possible for the server to return fewer than count
>>       bytes of data. If the server returns less than the count
>>       requested and eof set to FALSE, the client should issue
>>       another READ to get the remaining data. A server may
>>       return less data than requested under several
>>       circumstances. The file may have been truncated by another
>>       client or perhaps on the server itself, changing the file
>>       size from what the requesting client believes to be the
>>       case. This would reduce the actual amount of data
>>       available to the client. It is possible that the server
>>       may back off the transfer size and reduce the read request
>>       return. Server resource exhaustion may also occur
>>       necessitating a smaller read return.

Similar language in RFC 8881 section 18.22.4:

>>    If the server returns a "short read" (i.e., fewer data than requested
>>    and eof is set to FALSE), the client should send another READ to get
>>    the remaining data.  A server may return less data than requested
>>    under several circumstances.  The file may have been truncated by
>>    another client or perhaps on the server itself, changing the file
>>    size from what the requesting client believes to be the case.  This
>>    would reduce the actual amount of data available to the client.  It
>>    is possible that the server reduce the transfer size and so return a
>>    short read result.  Server resource exhaustion may also occur in a
>>    short read.

So the server could be returning INVAL and leaving EOF set
to false. That might be triggering the client to retry the
READ. Does the server need to set the EOF flag if the READ
arguments cross the limit of the range that the server can
return? Does the client need to check for this case and
stop retrying? The specs aren't particularly clear on this
matter.


> Fixes: 8cfb9015280d ("NFS: Always provide aligned buffers to the RPC read layers")

It's arguable that you are fixing 8cfb9015280d. I don't
think that commit is actually broken.

Also, the server behavior is wrong even before that commit,
and that commit is a client change... Mentioning this commit
at the top of the patch description is fine, since that is
how you discovered the problem, but I'd prefer if you drop
this line.

The patch does warrant a Cc: stable, though.


> Signed-off-by: Dan Aloni <dan.aloni@vastdata.com>
> ---
> fs/nfsd/nfs4proc.c |  3 +++
> fs/nfsd/vfs.c      | 11 +++++++++++
> 2 files changed, 14 insertions(+)
> 
> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
> index 486c5dba4b65..3b1e395a93b6 100644
> --- a/fs/nfsd/nfs4proc.c
> +++ b/fs/nfsd/nfs4proc.c
> @@ -788,6 +788,9 @@ nfsd4_read(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
> 	trace_nfsd_read_start(rqstp, &cstate->current_fh,
> 			      read->rd_offset, read->rd_length);
> 
> +	if (unlikely(read->rd_offset + read->rd_length > NFS_OFFSET_MAX))
> +		read->rd_length = NFS_OFFSET_MAX - read->rd_offset;
> +

rd_offset is range-checked before the trace point, so I think
this check needs to go before the trace point as well. That
way the adjusted argument values are recorded.

And we need to verify whether NFS_OFFSET_MAX is the correct
constant for this check.


> 	/*
> 	 * If we do a zero copy read, then a client will see read data
> 	 * that reflects the state of the file *after* performing the
> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> index 738d564ca4ce..4a209f807466 100644
> --- a/fs/nfsd/vfs.c
> +++ b/fs/nfsd/vfs.c
> @@ -1046,6 +1046,16 @@ __be32 nfsd_read(struct svc_rqst *rqstp, struct svc_fh *fhp,
> 	__be32 err;
> 
> 	trace_nfsd_read_start(rqstp, fhp, offset, *count);
> +
> +	if (unlikely(offset > NFS_OFFSET_MAX)) {
> +		/* We can return this according to Section 3.3.6 */

RFC 1813 section 3.3.6 says that READ is permitted to return
NFS3ERR_INVAL, but does not mandate that status code in this
particular case (it's provided for general issues similar to
this). So returning INVAL here is an implementation choice.

Thus mentioning the spec here is IMO perhaps misleading, so
I'd like you to drop this comment.


> +		err = nfserr_inval;
> +		goto error;
> +	}
> +
> +	if (unlikely(offset + *count > NFS_OFFSET_MAX))
> +		*count = NFS_OFFSET_MAX - offset;
> +

Same as above: these range-checks need to go before the trace
point, IMO.


> 	err = nfsd_file_acquire(rqstp, fhp, NFSD_MAY_READ, &nf);
> 	if (err)
> 		return err;
> @@ -1058,6 +1068,7 @@ __be32 nfsd_read(struct svc_rqst *rqstp, struct svc_fh *fhp,
> 
> 	nfsd_file_put(nf);
> 
> +error:
> 	trace_nfsd_read_done(rqstp, fhp, offset, *count);
> 
> 	return err;
> -- 
> 2.23.0


--
Chuck Lever
Dan Aloni Jan. 23, 2022, 6:29 a.m. UTC | #2
On Sat, Jan 22, 2022 at 05:37:16PM +0000, Chuck Lever III wrote:
> Similar language in RFC 8881 section 18.22.4:
> 
> >>    If the server returns a "short read" (i.e., fewer data than requested
> >>    and eof is set to FALSE), the client should send another READ to get
> >>    the remaining data.  A server may return less data than requested
> >>    under several circumstances.  The file may have been truncated by
> >>    another client or perhaps on the server itself, changing the file
> >>    size from what the requesting client believes to be the case.  This
> >>    would reduce the actual amount of data available to the client.  It
> >>    is possible that the server reduce the transfer size and so return a
> >>    short read result.  Server resource exhaustion may also occur in a
> >>    short read.
> 
> So the server could be returning INVAL and leaving EOF set
> to false. That might be triggering the client to retry the
> READ. Does the server need to set the EOF flag if the READ
> arguments cross the limit of the range that the server can
> return? Does the client need to check for this case and
> stop retrying? The specs aren't particularly clear on this
> matter.

But eof only in `resok` and not in `resfail`. For reference, here's the
reply I got:

Network File System, READ Reply  Error: NFS3ERR_INVAL
    [Program Version: 3]
    [V3 Procedure: READ (6)]
    Status: NFS3ERR_INVAL (22)
    file_attributes  Regular File mode: 0600 uid: 0 gid: 0
        attributes_follow: value follows (1)
        attributes  Regular File mode: 0600 uid: 0 gid: 0
            Type: Regular File (1)
            Mode: 0600, S_IRUSR, S_IWUSR
            nlink: 1
            uid: 0
            gid: 0
            size: 9223372036854775807
            used: 4096
            rdev: 0,0
            fsid: 0x0000000000000002 (2)
            fileid: 69
            atime: Jan 18, 2022 20:20:33.611267439 IST
                seconds: 1642530033
                nano seconds: 611267439
            mtime: Jan 18, 2022 20:20:33.571266608 IST
                seconds: 1642530033
                nano seconds: 571266608
            ctime: Jan 18, 2022 20:20:33.571266608 IST
                seconds: 1642530033
                nano seconds: 571266608
diff mbox series

Patch

diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index 486c5dba4b65..3b1e395a93b6 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -788,6 +788,9 @@  nfsd4_read(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
 	trace_nfsd_read_start(rqstp, &cstate->current_fh,
 			      read->rd_offset, read->rd_length);
 
+	if (unlikely(read->rd_offset + read->rd_length > NFS_OFFSET_MAX))
+		read->rd_length = NFS_OFFSET_MAX - read->rd_offset;
+
 	/*
 	 * If we do a zero copy read, then a client will see read data
 	 * that reflects the state of the file *after* performing the
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index 738d564ca4ce..4a209f807466 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -1046,6 +1046,16 @@  __be32 nfsd_read(struct svc_rqst *rqstp, struct svc_fh *fhp,
 	__be32 err;
 
 	trace_nfsd_read_start(rqstp, fhp, offset, *count);
+
+	if (unlikely(offset > NFS_OFFSET_MAX)) {
+		/* We can return this according to Section 3.3.6 */
+		err = nfserr_inval;
+		goto error;
+	}
+
+	if (unlikely(offset + *count > NFS_OFFSET_MAX))
+		*count = NFS_OFFSET_MAX - offset;
+
 	err = nfsd_file_acquire(rqstp, fhp, NFSD_MAY_READ, &nf);
 	if (err)
 		return err;
@@ -1058,6 +1068,7 @@  __be32 nfsd_read(struct svc_rqst *rqstp, struct svc_fh *fhp,
 
 	nfsd_file_put(nf);
 
+error:
 	trace_nfsd_read_done(rqstp, fhp, offset, *count);
 
 	return err;