Message ID | 20220123095023.2775411-1-dan.aloni@vastdata.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [v3] NFSD: trim reads past NFS_OFFSET_MAX and fix NFSv3 check | expand |
On Sun, 2022-01-23 at 11:50 +0200, Dan Aloni wrote: > Due to commit 8cfb9015280d ("NFS: Always provide aligned buffers to > the > RPC read layers") on the client, a read of 0xfff is aligned up to > server > rsize of 0x1000. > > As a result, in a test where the server has a file of size > 0x7fffffffffffffff, and the client tries to read from the offset > 0x7ffffffffffff000, the read causes loff_t overflow in the server and > it > returns an NFS code of EINVAL to the client. The client as a result > indefinitely retries the request. > > This fixes the issue at server side by trimming reads past > NFS_OFFSET_MAX. It also adds a missing check for out of bound offset > in > NFSv3, copying a similar check from NFSv4.x. > > Cc: <stable@vger.kernel.org> > Signed-off-by: Dan Aloni <dan.aloni@vastdata.com> > --- > fs/nfsd/nfs4proc.c | 3 +++ > fs/nfsd/vfs.c | 6 ++++++ > 2 files changed, 9 insertions(+) > > diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c > index 486c5dba4b65..816bdf212559 100644 > --- a/fs/nfsd/nfs4proc.c > +++ b/fs/nfsd/nfs4proc.c > @@ -785,6 +785,9 @@ nfsd4_read(struct svc_rqst *rqstp, struct > nfsd4_compound_state *cstate, > if (read->rd_offset >= OFFSET_MAX) > return nfserr_inval; > > + if (unlikely(read->rd_offset + read->rd_length > OFFSET_MAX)) > + read->rd_length = OFFSET_MAX - read->rd_offset; > + > trace_nfsd_read_start(rqstp, &cstate->current_fh, > read->rd_offset, read->rd_length); > > diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c > index 738d564ca4ce..ad4df374433e 100644 > --- a/fs/nfsd/vfs.c > +++ b/fs/nfsd/vfs.c > @@ -1045,6 +1045,12 @@ __be32 nfsd_read(struct svc_rqst *rqstp, > struct svc_fh *fhp, > struct file *file; > __be32 err; > > + if (unlikely(offset >= NFS_OFFSET_MAX)) > + return nfserr_inval; POSIX does allow you to lseek to the maximum filesize offset (sb- >s_maxbytes), however any subsequent read will return 0 bytes (i.e. eof), whereas a subsequent write will return EFBIG. > + > + if (unlikely(offset + *count > NFS_OFFSET_MAX)) > + *count = NFS_OFFSET_MAX - offset; Can we please fix this to use the actual per-filesystem file size limit, which is set as sb->s_maxbytes, instead of using NFS_OFFSET_MAX? Ditto for 'read' above. > + > trace_nfsd_read_start(rqstp, fhp, offset, *count); > err = nfsd_file_acquire(rqstp, fhp, NFSD_MAY_READ, &nf); > if (err)
> On Jan 23, 2022, at 10:29 AM, Trond Myklebust <trondmy@hammerspace.com> wrote: > > On Sun, 2022-01-23 at 11:50 +0200, Dan Aloni wrote: >> Due to commit 8cfb9015280d ("NFS: Always provide aligned buffers to >> the >> RPC read layers") on the client, a read of 0xfff is aligned up to >> server >> rsize of 0x1000. >> >> As a result, in a test where the server has a file of size >> 0x7fffffffffffffff, and the client tries to read from the offset >> 0x7ffffffffffff000, the read causes loff_t overflow in the server and >> it >> returns an NFS code of EINVAL to the client. The client as a result >> indefinitely retries the request. >> >> This fixes the issue at server side by trimming reads past >> NFS_OFFSET_MAX. It also adds a missing check for out of bound offset >> in >> NFSv3, copying a similar check from NFSv4.x. >> >> Cc: <stable@vger.kernel.org> >> Signed-off-by: Dan Aloni <dan.aloni@vastdata.com> >> --- >> fs/nfsd/nfs4proc.c | 3 +++ >> fs/nfsd/vfs.c | 6 ++++++ >> 2 files changed, 9 insertions(+) >> >> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c >> index 486c5dba4b65..816bdf212559 100644 >> --- a/fs/nfsd/nfs4proc.c >> +++ b/fs/nfsd/nfs4proc.c >> @@ -785,6 +785,9 @@ nfsd4_read(struct svc_rqst *rqstp, struct >> nfsd4_compound_state *cstate, >> if (read->rd_offset >= OFFSET_MAX) >> return nfserr_inval; >> >> + if (unlikely(read->rd_offset + read->rd_length > OFFSET_MAX)) >> + read->rd_length = OFFSET_MAX - read->rd_offset; >> + >> trace_nfsd_read_start(rqstp, &cstate->current_fh, >> read->rd_offset, read->rd_length); >> >> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c >> index 738d564ca4ce..ad4df374433e 100644 >> --- a/fs/nfsd/vfs.c >> +++ b/fs/nfsd/vfs.c >> @@ -1045,6 +1045,12 @@ __be32 nfsd_read(struct svc_rqst *rqstp, >> struct svc_fh *fhp, >> struct file *file; >> __be32 err; >> >> + if (unlikely(offset >= NFS_OFFSET_MAX)) >> + return nfserr_inval; > > POSIX does allow you to lseek to the maximum filesize offset (sb- >> s_maxbytes), however any subsequent read will return 0 bytes (i.e. > eof), whereas a subsequent write will return EFBIG. I'm simply trying to clarify your request. You've stated that the Linux NFS client does not handle INVAL responses, even though both RFC 1813 and 8881 permit it. So are you suggesting (here) that the Linux NFS server should not return INVAL on READs beyond the filesystem's supported maximum file size but instead return a successful 0-byte result with EOF set? >> + >> + if (unlikely(offset + *count > NFS_OFFSET_MAX)) >> + *count = NFS_OFFSET_MAX - offset; > > Can we please fix this to use the actual per-filesystem file size > limit, which is set as sb->s_maxbytes, instead of using NFS_OFFSET_MAX? > > Ditto for 'read' above. That sounds reasonable. But I wonder if there are some other places that need the same treatment. >> + >> trace_nfsd_read_start(rqstp, fhp, offset, *count); >> err = nfsd_file_acquire(rqstp, fhp, NFSD_MAY_READ, &nf); >> if (err) -- Chuck Lever
Hi Dan- Dropping stable@. No need to copy them on this discussion. Also, you don't need to actually cc: stable when you repost. Just add the tag as you did below. Automation will pick up the patch when it goes into mainline. More below. > On Jan 23, 2022, at 12:03 PM, Chuck Lever III <chuck.lever@oracle.com> wrote: > >> >> On Jan 23, 2022, at 10:29 AM, Trond Myklebust <trondmy@hammerspace.com> wrote: >> >> On Sun, 2022-01-23 at 11:50 +0200, Dan Aloni wrote: >>> Due to commit 8cfb9015280d ("NFS: Always provide aligned buffers to >>> the >>> RPC read layers") on the client, a read of 0xfff is aligned up to >>> server >>> rsize of 0x1000. >>> >>> As a result, in a test where the server has a file of size >>> 0x7fffffffffffffff, and the client tries to read from the offset >>> 0x7ffffffffffff000, the read causes loff_t overflow in the server and >>> it >>> returns an NFS code of EINVAL to the client. The client as a result >>> indefinitely retries the request. >>> >>> This fixes the issue at server side by trimming reads past >>> NFS_OFFSET_MAX. It also adds a missing check for out of bound offset >>> in >>> NFSv3, copying a similar check from NFSv4.x. >>> >>> Cc: <stable@vger.kernel.org> >>> Signed-off-by: Dan Aloni <dan.aloni@vastdata.com> >>> --- >>> fs/nfsd/nfs4proc.c | 3 +++ >>> fs/nfsd/vfs.c | 6 ++++++ >>> 2 files changed, 9 insertions(+) >>> >>> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c >>> index 486c5dba4b65..816bdf212559 100644 >>> --- a/fs/nfsd/nfs4proc.c >>> +++ b/fs/nfsd/nfs4proc.c >>> @@ -785,6 +785,9 @@ nfsd4_read(struct svc_rqst *rqstp, struct >>> nfsd4_compound_state *cstate, >>> if (read->rd_offset >= OFFSET_MAX) >>> return nfserr_inval; >>> >>> + if (unlikely(read->rd_offset + read->rd_length > OFFSET_MAX)) >>> + read->rd_length = OFFSET_MAX - read->rd_offset; >>> + >>> trace_nfsd_read_start(rqstp, &cstate->current_fh, >>> read->rd_offset, read->rd_length); >>> >>> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c >>> index 738d564ca4ce..ad4df374433e 100644 >>> --- a/fs/nfsd/vfs.c >>> +++ b/fs/nfsd/vfs.c >>> @@ -1045,6 +1045,12 @@ __be32 nfsd_read(struct svc_rqst *rqstp, >>> struct svc_fh *fhp, >>> struct file *file; >>> __be32 err; >>> >>> + if (unlikely(offset >= NFS_OFFSET_MAX)) >>> + return nfserr_inval; >> >> POSIX does allow you to lseek to the maximum filesize offset (sb- >>> s_maxbytes), however any subsequent read will return 0 bytes (i.e. >> eof), whereas a subsequent write will return EFBIG. > > I'm simply trying to clarify your request. > > You've stated that the Linux NFS client does not handle INVAL > responses, even though both RFC 1813 and 8881 permit it. So > are you suggesting (here) that the Linux NFS server should > not return INVAL on READs beyond the filesystem's supported > maximum file size but instead return a successful 0-byte > result with EOF set? After some thought and discussion with Solaris NFS server engineers, I think this is the best response to a READ whose range arguments go past the server's advertised maxfilesize. So can you please confirm that your final fix does: 1. The range of values that was failing before but shouldn't have, now succeeds 2. When the offset is less than maxfilesize but offset+count exceeds it, the READ should succeed but return a short result and set EOF 3. When the range is completely outside maxfilesize, the READ should succeed but return zero bytes and set EOF And I don't mind if you split the fix over two or three patches if that makes it more clear. >>> + >>> + if (unlikely(offset + *count > NFS_OFFSET_MAX)) >>> + *count = NFS_OFFSET_MAX - offset; >> >> Can we please fix this to use the actual per-filesystem file size >> limit, which is set as sb->s_maxbytes, instead of using NFS_OFFSET_MAX? >> >> Ditto for 'read' above. > > That sounds reasonable. I'm wondering whether the VFS itself will bound the range arguments relative to sb->s_maxbytes. I'm told that the kiocb iterators used in fs/nfsd/vfs.c should do that. All that NFSD has to do is ensure that the incoming client values are converted from u64 to loff_t without underflowing. So comparing @offset with OFFSET_MAX here seems like the right thing to do? We just don't want the READ to fail with INVAL. > But I wonder if there are some other > places that need the same treatment. I've confirmed that there /are/ other places that need to be fixed. I've created a series of patches that will address those. The first of those was the COMMIT patch I posted yesterday. Dan, I'd like to include your READ fixes in that series. >>> + >>> trace_nfsd_read_start(rqstp, fhp, offset, *count); >>> err = nfsd_file_acquire(rqstp, fhp, NFSD_MAY_READ, &nf); >>> if (err) > > -- > Chuck Lever -- Chuck Lever
diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index 486c5dba4b65..816bdf212559 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -785,6 +785,9 @@ nfsd4_read(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, if (read->rd_offset >= OFFSET_MAX) return nfserr_inval; + if (unlikely(read->rd_offset + read->rd_length > OFFSET_MAX)) + read->rd_length = OFFSET_MAX - read->rd_offset; + trace_nfsd_read_start(rqstp, &cstate->current_fh, read->rd_offset, read->rd_length); diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c index 738d564ca4ce..ad4df374433e 100644 --- a/fs/nfsd/vfs.c +++ b/fs/nfsd/vfs.c @@ -1045,6 +1045,12 @@ __be32 nfsd_read(struct svc_rqst *rqstp, struct svc_fh *fhp, struct file *file; __be32 err; + if (unlikely(offset >= NFS_OFFSET_MAX)) + return nfserr_inval; + + if (unlikely(offset + *count > NFS_OFFSET_MAX)) + *count = NFS_OFFSET_MAX - offset; + trace_nfsd_read_start(rqstp, fhp, offset, *count); err = nfsd_file_acquire(rqstp, fhp, NFSD_MAY_READ, &nf); if (err)
Due to commit 8cfb9015280d ("NFS: Always provide aligned buffers to the RPC read layers") on the client, a read of 0xfff is aligned up to server rsize of 0x1000. As a result, in a test where the server has a file of size 0x7fffffffffffffff, and the client tries to read from the offset 0x7ffffffffffff000, the read causes loff_t overflow in the server and it returns an NFS code of EINVAL to the client. The client as a result indefinitely retries the request. This fixes the issue at server side by trimming reads past NFS_OFFSET_MAX. It also adds a missing check for out of bound offset in NFSv3, copying a similar check from NFSv4.x. Cc: <stable@vger.kernel.org> Signed-off-by: Dan Aloni <dan.aloni@vastdata.com> --- fs/nfsd/nfs4proc.c | 3 +++ fs/nfsd/vfs.c | 6 ++++++ 2 files changed, 9 insertions(+)