diff mbox series

[v3] NFSD: trim reads past NFS_OFFSET_MAX and fix NFSv3 check

Message ID 20220123095023.2775411-1-dan.aloni@vastdata.com (mailing list archive)
State New, archived
Headers show
Series [v3] NFSD: trim reads past NFS_OFFSET_MAX and fix NFSv3 check | expand

Commit Message

Dan Aloni Jan. 23, 2022, 9:50 a.m. UTC
Due to commit 8cfb9015280d ("NFS: Always provide aligned buffers to the
RPC read layers") on the client, a read of 0xfff is aligned up to server
rsize of 0x1000.

As a result, in a test where the server has a file of size
0x7fffffffffffffff, and the client tries to read from the offset
0x7ffffffffffff000, the read causes loff_t overflow in the server and it
returns an NFS code of EINVAL to the client. The client as a result
indefinitely retries the request.

This fixes the issue at server side by trimming reads past
NFS_OFFSET_MAX. It also adds a missing check for out of bound offset in
NFSv3, copying a similar check from NFSv4.x.

Cc: <stable@vger.kernel.org>
Signed-off-by: Dan Aloni <dan.aloni@vastdata.com>
---
 fs/nfsd/nfs4proc.c | 3 +++
 fs/nfsd/vfs.c      | 6 ++++++
 2 files changed, 9 insertions(+)

Comments

Trond Myklebust Jan. 23, 2022, 3:29 p.m. UTC | #1
On Sun, 2022-01-23 at 11:50 +0200, Dan Aloni wrote:
> Due to commit 8cfb9015280d ("NFS: Always provide aligned buffers to
> the
> RPC read layers") on the client, a read of 0xfff is aligned up to
> server
> rsize of 0x1000.
> 
> As a result, in a test where the server has a file of size
> 0x7fffffffffffffff, and the client tries to read from the offset
> 0x7ffffffffffff000, the read causes loff_t overflow in the server and
> it
> returns an NFS code of EINVAL to the client. The client as a result
> indefinitely retries the request.
> 
> This fixes the issue at server side by trimming reads past
> NFS_OFFSET_MAX. It also adds a missing check for out of bound offset
> in
> NFSv3, copying a similar check from NFSv4.x.
> 
> Cc: <stable@vger.kernel.org>
> Signed-off-by: Dan Aloni <dan.aloni@vastdata.com>
> ---
>  fs/nfsd/nfs4proc.c | 3 +++
>  fs/nfsd/vfs.c      | 6 ++++++
>  2 files changed, 9 insertions(+)
> 
> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
> index 486c5dba4b65..816bdf212559 100644
> --- a/fs/nfsd/nfs4proc.c
> +++ b/fs/nfsd/nfs4proc.c
> @@ -785,6 +785,9 @@ nfsd4_read(struct svc_rqst *rqstp, struct
> nfsd4_compound_state *cstate,
>         if (read->rd_offset >= OFFSET_MAX)
>                 return nfserr_inval;
>  
> +       if (unlikely(read->rd_offset + read->rd_length > OFFSET_MAX))
> +               read->rd_length = OFFSET_MAX - read->rd_offset;
> +
>         trace_nfsd_read_start(rqstp, &cstate->current_fh,
>                               read->rd_offset, read->rd_length);
>  
> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> index 738d564ca4ce..ad4df374433e 100644
> --- a/fs/nfsd/vfs.c
> +++ b/fs/nfsd/vfs.c
> @@ -1045,6 +1045,12 @@ __be32 nfsd_read(struct svc_rqst *rqstp,
> struct svc_fh *fhp,
>         struct file *file;
>         __be32 err;
>  
> +       if (unlikely(offset >= NFS_OFFSET_MAX))
> +               return nfserr_inval;

POSIX does allow you to lseek to the maximum filesize offset (sb-
>s_maxbytes), however any subsequent read will return 0 bytes (i.e.
eof), whereas a subsequent write will return EFBIG.

> +
> +       if (unlikely(offset + *count > NFS_OFFSET_MAX))
> +               *count = NFS_OFFSET_MAX - offset;

Can we please fix this to use the actual per-filesystem file size
limit, which is set as sb->s_maxbytes, instead of using NFS_OFFSET_MAX?

Ditto for 'read' above.

> +
>         trace_nfsd_read_start(rqstp, fhp, offset, *count);
>         err = nfsd_file_acquire(rqstp, fhp, NFSD_MAY_READ, &nf);
>         if (err)
Chuck Lever III Jan. 23, 2022, 5:03 p.m. UTC | #2
> On Jan 23, 2022, at 10:29 AM, Trond Myklebust <trondmy@hammerspace.com> wrote:
> 
> On Sun, 2022-01-23 at 11:50 +0200, Dan Aloni wrote:
>> Due to commit 8cfb9015280d ("NFS: Always provide aligned buffers to
>> the
>> RPC read layers") on the client, a read of 0xfff is aligned up to
>> server
>> rsize of 0x1000.
>> 
>> As a result, in a test where the server has a file of size
>> 0x7fffffffffffffff, and the client tries to read from the offset
>> 0x7ffffffffffff000, the read causes loff_t overflow in the server and
>> it
>> returns an NFS code of EINVAL to the client. The client as a result
>> indefinitely retries the request.
>> 
>> This fixes the issue at server side by trimming reads past
>> NFS_OFFSET_MAX. It also adds a missing check for out of bound offset
>> in
>> NFSv3, copying a similar check from NFSv4.x.
>> 
>> Cc: <stable@vger.kernel.org>
>> Signed-off-by: Dan Aloni <dan.aloni@vastdata.com>
>> ---
>>  fs/nfsd/nfs4proc.c | 3 +++
>>  fs/nfsd/vfs.c      | 6 ++++++
>>  2 files changed, 9 insertions(+)
>> 
>> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
>> index 486c5dba4b65..816bdf212559 100644
>> --- a/fs/nfsd/nfs4proc.c
>> +++ b/fs/nfsd/nfs4proc.c
>> @@ -785,6 +785,9 @@ nfsd4_read(struct svc_rqst *rqstp, struct
>> nfsd4_compound_state *cstate,
>>         if (read->rd_offset >= OFFSET_MAX)
>>                 return nfserr_inval;
>>  
>> +       if (unlikely(read->rd_offset + read->rd_length > OFFSET_MAX))
>> +               read->rd_length = OFFSET_MAX - read->rd_offset;
>> +
>>         trace_nfsd_read_start(rqstp, &cstate->current_fh,
>>                               read->rd_offset, read->rd_length);
>>  
>> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
>> index 738d564ca4ce..ad4df374433e 100644
>> --- a/fs/nfsd/vfs.c
>> +++ b/fs/nfsd/vfs.c
>> @@ -1045,6 +1045,12 @@ __be32 nfsd_read(struct svc_rqst *rqstp,
>> struct svc_fh *fhp,
>>         struct file *file;
>>         __be32 err;
>>  
>> +       if (unlikely(offset >= NFS_OFFSET_MAX))
>> +               return nfserr_inval;
> 
> POSIX does allow you to lseek to the maximum filesize offset (sb-
>> s_maxbytes), however any subsequent read will return 0 bytes (i.e.
> eof), whereas a subsequent write will return EFBIG.

I'm simply trying to clarify your request.

You've stated that the Linux NFS client does not handle INVAL
responses, even though both RFC 1813 and 8881 permit it. So
are you suggesting (here) that the Linux NFS server should
not return INVAL on READs beyond the filesystem's supported
maximum file size but instead return a successful 0-byte
result with EOF set?


>> +
>> +       if (unlikely(offset + *count > NFS_OFFSET_MAX))
>> +               *count = NFS_OFFSET_MAX - offset;
> 
> Can we please fix this to use the actual per-filesystem file size
> limit, which is set as sb->s_maxbytes, instead of using NFS_OFFSET_MAX?
> 
> Ditto for 'read' above.

That sounds reasonable. But I wonder if there are some other
places that need the same treatment.


>> +
>>         trace_nfsd_read_start(rqstp, fhp, offset, *count);
>>         err = nfsd_file_acquire(rqstp, fhp, NFSD_MAY_READ, &nf);
>>         if (err)

--
Chuck Lever
Chuck Lever III Jan. 26, 2022, 4:23 p.m. UTC | #3
Hi Dan-

Dropping stable@. No need to copy them on this discussion.

Also, you don't need to actually cc: stable when you repost.
Just add the tag as you did below. Automation will pick up
the patch when it goes into mainline.

More below.


> On Jan 23, 2022, at 12:03 PM, Chuck Lever III <chuck.lever@oracle.com> wrote:
> 
>> 
>> On Jan 23, 2022, at 10:29 AM, Trond Myklebust <trondmy@hammerspace.com> wrote:
>> 
>> On Sun, 2022-01-23 at 11:50 +0200, Dan Aloni wrote:
>>> Due to commit 8cfb9015280d ("NFS: Always provide aligned buffers to
>>> the
>>> RPC read layers") on the client, a read of 0xfff is aligned up to
>>> server
>>> rsize of 0x1000.
>>> 
>>> As a result, in a test where the server has a file of size
>>> 0x7fffffffffffffff, and the client tries to read from the offset
>>> 0x7ffffffffffff000, the read causes loff_t overflow in the server and
>>> it
>>> returns an NFS code of EINVAL to the client. The client as a result
>>> indefinitely retries the request.
>>> 
>>> This fixes the issue at server side by trimming reads past
>>> NFS_OFFSET_MAX. It also adds a missing check for out of bound offset
>>> in
>>> NFSv3, copying a similar check from NFSv4.x.
>>> 
>>> Cc: <stable@vger.kernel.org>
>>> Signed-off-by: Dan Aloni <dan.aloni@vastdata.com>
>>> ---
>>> fs/nfsd/nfs4proc.c | 3 +++
>>> fs/nfsd/vfs.c      | 6 ++++++
>>> 2 files changed, 9 insertions(+)
>>> 
>>> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
>>> index 486c5dba4b65..816bdf212559 100644
>>> --- a/fs/nfsd/nfs4proc.c
>>> +++ b/fs/nfsd/nfs4proc.c
>>> @@ -785,6 +785,9 @@ nfsd4_read(struct svc_rqst *rqstp, struct
>>> nfsd4_compound_state *cstate,
>>>        if (read->rd_offset >= OFFSET_MAX)
>>>                return nfserr_inval;
>>> 
>>> +       if (unlikely(read->rd_offset + read->rd_length > OFFSET_MAX))
>>> +               read->rd_length = OFFSET_MAX - read->rd_offset;
>>> +
>>>        trace_nfsd_read_start(rqstp, &cstate->current_fh,
>>>                              read->rd_offset, read->rd_length);
>>> 
>>> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
>>> index 738d564ca4ce..ad4df374433e 100644
>>> --- a/fs/nfsd/vfs.c
>>> +++ b/fs/nfsd/vfs.c
>>> @@ -1045,6 +1045,12 @@ __be32 nfsd_read(struct svc_rqst *rqstp,
>>> struct svc_fh *fhp,
>>>        struct file *file;
>>>        __be32 err;
>>> 
>>> +       if (unlikely(offset >= NFS_OFFSET_MAX))
>>> +               return nfserr_inval;
>> 
>> POSIX does allow you to lseek to the maximum filesize offset (sb-
>>> s_maxbytes), however any subsequent read will return 0 bytes (i.e.
>> eof), whereas a subsequent write will return EFBIG.
> 
> I'm simply trying to clarify your request.
> 
> You've stated that the Linux NFS client does not handle INVAL
> responses, even though both RFC 1813 and 8881 permit it. So
> are you suggesting (here) that the Linux NFS server should
> not return INVAL on READs beyond the filesystem's supported
> maximum file size but instead return a successful 0-byte
> result with EOF set?

After some thought and discussion with Solaris NFS server
engineers, I think this is the best response to a READ
whose range arguments go past the server's advertised
maxfilesize.

So can you please confirm that your final fix does:

1. The range of values that was failing before but shouldn't
   have, now succeeds

2. When the offset is less than maxfilesize but offset+count
   exceeds it, the READ should succeed but return a short
   result and set EOF

3. When the range is completely outside maxfilesize, the
   READ should succeed but return zero bytes and set EOF

And I don't mind if you split the fix over two or three
patches if that makes it more clear.


>>> +
>>> +       if (unlikely(offset + *count > NFS_OFFSET_MAX))
>>> +               *count = NFS_OFFSET_MAX - offset;
>> 
>> Can we please fix this to use the actual per-filesystem file size
>> limit, which is set as sb->s_maxbytes, instead of using NFS_OFFSET_MAX?
>> 
>> Ditto for 'read' above.
> 
> That sounds reasonable.

I'm wondering whether the VFS itself will bound the range
arguments relative to sb->s_maxbytes. I'm told that the
kiocb iterators used in fs/nfsd/vfs.c should do that.

All that NFSD has to do is ensure that the incoming
client values are converted from u64 to loff_t without
underflowing. So comparing @offset with OFFSET_MAX here
seems like the right thing to do?

We just don't want the READ to fail with INVAL.


> But I wonder if there are some other
> places that need the same treatment.

I've confirmed that there /are/ other places that need to
be fixed. I've created a series of patches that will address
those. The first of those was the COMMIT patch I posted
yesterday.

Dan, I'd like to include your READ fixes in that series.


>>> +
>>>        trace_nfsd_read_start(rqstp, fhp, offset, *count);
>>>        err = nfsd_file_acquire(rqstp, fhp, NFSD_MAY_READ, &nf);
>>>        if (err)
> 
> --
> Chuck Lever

--
Chuck Lever
diff mbox series

Patch

diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index 486c5dba4b65..816bdf212559 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -785,6 +785,9 @@  nfsd4_read(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
 	if (read->rd_offset >= OFFSET_MAX)
 		return nfserr_inval;
 
+	if (unlikely(read->rd_offset + read->rd_length > OFFSET_MAX))
+		read->rd_length = OFFSET_MAX - read->rd_offset;
+
 	trace_nfsd_read_start(rqstp, &cstate->current_fh,
 			      read->rd_offset, read->rd_length);
 
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index 738d564ca4ce..ad4df374433e 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -1045,6 +1045,12 @@  __be32 nfsd_read(struct svc_rqst *rqstp, struct svc_fh *fhp,
 	struct file *file;
 	__be32 err;
 
+	if (unlikely(offset >= NFS_OFFSET_MAX))
+		return nfserr_inval;
+
+	if (unlikely(offset + *count > NFS_OFFSET_MAX))
+		*count = NFS_OFFSET_MAX - offset;
+
 	trace_nfsd_read_start(rqstp, fhp, offset, *count);
 	err = nfsd_file_acquire(rqstp, fhp, NFSD_MAY_READ, &nf);
 	if (err)