diff mbox series

[RFC,1/6] NFSD: Fix NFSv4 SETATTR's handling of large file sizes

Message ID 164329971128.5879.15718457509790221509.stgit@bazille.1015granger.net (mailing list archive)
State New, archived
Headers show
Series NFSD size, offset, and count sanity | expand

Commit Message

Chuck Lever Jan. 27, 2022, 4:08 p.m. UTC
iattr::ia_size is a loff_t. decode_fattr4() dumps a full u64 value
in there. If that value is larger than S64_MAX, then ia_size has
underflowed.

In this case the negative size is passed through to the VFS and
underlying filesystems. I've observed XFS behavior: it returns
EIO but still attempts to access past the end of the device.
IOW it assumes the caller has already sanity-checked the value.

Have our server return NFS4ERR_FBIG to the client when the passed-in
file size cannot be held in a loff_t variable.

> 15.1.4.4.  NFS4ERR_FBIG (Error Code 27)
>
>    The file is too large.  The operation would have caused the file to
>    grow beyond the server's limit.

It's likely that other NFSv4 operations that take a fattr4 argument
(such as OPEN) have a similar issue).

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 fs/nfsd/nfs4proc.c |    3 +++
 1 file changed, 3 insertions(+)

Comments

Dave Chinner Jan. 28, 2022, 12:36 a.m. UTC | #1
On Thu, Jan 27, 2022 at 11:08:31AM -0500, Chuck Lever wrote:
> iattr::ia_size is a loff_t. decode_fattr4() dumps a full u64 value
> in there. If that value is larger than S64_MAX, then ia_size has
> underflowed.
> 
> In this case the negative size is passed through to the VFS and
> underlying filesystems. I've observed XFS behavior: it returns
> EIO but still attempts to access past the end of the device.

What attempts to access beyond the end of the device? A file offset
is not a disk offset, and the filesystem cannot allocate blocks for
IO that are outside the device boundaries. So I don't understand how
setting an inode size of >LLONGMAX can cause the filesystem to
access blocks outside the range it can allocate and map IO to. If
this falls through to trying to access data outside the range the
filesystem is allowed to access then we've got a bug that needs to
be fixed.

Can you please clarify the behaviour that is occurring here (stack
traces demonstrating the IO path that leads to access past the end
of device would be useful) so we can look into this further?

> IOW it assumes the caller has already sanity-checked the value.

Every filesystem assumes that the iattr that is passed to ->setattr
by notify_change() has been sanity checked and the parameters are
within the valid VFS supported ranges, not just XFS. Perhaps this
check should be in notify_change, not in the callers?

Cheers,

Dave.
Chuck Lever Jan. 28, 2022, 1:48 a.m. UTC | #2
> On Jan 27, 2022, at 7:36 PM, Dave Chinner <david@fromorbit.com> wrote:
> 
> On Thu, Jan 27, 2022 at 11:08:31AM -0500, Chuck Lever wrote:
>> iattr::ia_size is a loff_t. decode_fattr4() dumps a full u64 value
>> in there. If that value is larger than S64_MAX, then ia_size has
>> underflowed.
>> 
>> In this case the negative size is passed through to the VFS and
>> underlying filesystems. I've observed XFS behavior: it returns
>> EIO but still attempts to access past the end of the device.
> 
> What attempts to access beyond the end of the device? A file offset
> is not a disk offset, and the filesystem cannot allocate blocks for
> IO that are outside the device boundaries. So I don't understand how
> setting an inode size of >LLONGMAX can cause the filesystem to
> access blocks outside the range it can allocate and map IO to. If
> this falls through to trying to access data outside the range the
> filesystem is allowed to access then we've got a bug that needs to
> be fixed.
> 
> Can you please clarify the behaviour that is occurring here (stack
> traces demonstrating the IO path that leads to access past the end
> of device would be useful) so we can look into this further?

Reproducer:

I constructed a pynfs test that sends an NFSv4.0 SETATTR request
to set the file length to U64_MAX. That test was applied to pynfs
today.

git://git.linux-nfs.org/projects/bfields/pynfs.git

I will note that I tried this test against a tmpfs export as
well. No crash, but a subsequent GETATTR returned U64_MAX,
which is surprising. There's really no checking in that path
either.


Note below: md0 is the device where this filesystem resides.
It's a pair of 3TB PCIe NVMe cards striped together. Kernel at
the time on this system was 5.17-rc1 + a few NFSD patches.

Jan 26 16:01:26 klimt.1015granger.net rpc.mountd[972]: v4.0 client attached: 0x61bb6d4261eef9f6 from "192.168.1.67:53636"
Jan 26 16:01:26 klimt.1015granger.net kernel: ------------[ cut here ]------------
Jan 26 16:01:26 klimt.1015granger.net kernel: WARNING: CPU: 2 PID: 1009 at fs/iomap/iter.c:33 iomap_iter+0x1b5/0x272
Jan 26 16:01:26 klimt.1015granger.net kernel: Modules linked in: rfkill rpcrdma rdma_ucm ib_srpt ib_umad ib_isert ib_ipoib iscsi_target_mod ib_>
Jan 26 16:01:26 klimt.1015granger.net kernel: CPU: 2 PID: 1009 Comm: nfsd Not tainted 5.17.0-rc1-00165-g2785fad9b745 #3338
Jan 26 16:01:26 klimt.1015granger.net kernel: Hardware name: Supermicro Super Server/X10SRL-F, BIOS 3.3 10/28/2020
Jan 26 16:01:26 klimt.1015granger.net kernel: RIP: 0010:iomap_iter+0x1b5/0x272
Jan 26 16:01:26 klimt.1015granger.net kernel: Code: 8b 73 08 49 8b 04 24 4d 89 e9 4d 89 f0 48 8b 3b e8 c0 79 8e 00 85 c0 0f 88 c1 00 00 00 48 8>
Jan 26 16:01:26 klimt.1015granger.net kernel: RSP: 0018:ffffa65701ea3a80 EFLAGS: 00010213
Jan 26 16:01:26 klimt.1015granger.net kernel: RAX: 0000000000000000 RBX: ffffa65701ea3ad0 RCX: ffff8ada8cf0f840
Jan 26 16:01:26 klimt.1015granger.net kernel: RDX: ffffffffffffffff RSI: ffff8ada49411000 RDI: ffff8ada8cf0f840
Jan 26 16:01:26 klimt.1015granger.net kernel: RBP: ffffa65701ea3aa0 R08: ffff8ada4a56b000 R09: ffffa65701ea3b40
Jan 26 16:01:26 klimt.1015granger.net kernel: R10: ffffa65701ea39a0 R11: 000000000efc25f5 R12: ffffffffc06d6100
Jan 26 16:01:26 klimt.1015granger.net kernel: R13: ffffa65701ea3b40 R14: ffffa65701ea3af8 R15: ffff8ada4a56b000
Jan 26 16:01:26 klimt.1015granger.net kernel: FS:  0000000000000000(0000) GS:ffff8ae97fd00000(0000) knlGS:0000000000000000
Jan 26 16:01:26 klimt.1015granger.net kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 26 16:01:26 klimt.1015granger.net kernel: CR2: 00007ffff8126260 CR3: 00000001a16dc001 CR4: 00000000001706e0
Jan 26 16:01:26 klimt.1015granger.net kernel: Call Trace:
Jan 26 16:01:26 klimt.1015granger.net kernel:  <TASK>
Jan 26 16:01:26 klimt.1015granger.net kernel:  iomap_zero_range+0x6c/0x1a9
Jan 26 16:01:26 klimt.1015granger.net kernel:  ? __radix_tree_lookup+0x2f/0xac
Jan 26 16:01:26 klimt.1015granger.net kernel:  iomap_truncate_page+0x31/0x36
Jan 26 16:01:26 klimt.1015granger.net kernel:  xfs_truncate_page+0x39/0x3b [xfs]
Jan 26 16:01:26 klimt.1015granger.net kernel:  xfs_setattr_size+0x11a/0x306 [xfs]
Jan 26 16:01:26 klimt.1015granger.net kernel:  xfs_vn_setattr_size+0x4e/0x57 [xfs]
Jan 26 16:01:26 klimt.1015granger.net kernel:  xfs_vn_setattr+0x67/0xb1 [xfs]
Jan 26 16:01:26 klimt.1015granger.net kernel:  notify_change+0x2ac/0x3a2
Jan 26 16:01:26 klimt.1015granger.net kernel:  nfsd_setattr+0x200/0x268 [nfsd]
Jan 26 16:01:26 klimt.1015granger.net kernel:  ? nfsd_setattr+0x200/0x268 [nfsd]
Jan 26 16:01:26 klimt.1015granger.net kernel:  nfsd4_setattr+0xf1/0x130 [nfsd]
Jan 26 16:01:26 klimt.1015granger.net kernel:  nfsd4_proc_compound+0x337/0x474 [nfsd]
Jan 26 16:01:26 klimt.1015granger.net kernel:  nfsd_dispatch+0x1a9/0x260 [nfsd]
Jan 26 16:01:26 klimt.1015granger.net kernel:  svc_process_common+0x331/0x4bc [sunrpc]
Jan 26 16:01:26 klimt.1015granger.net kernel:  ? nfsd_svc+0x2f5/0x2f5 [nfsd]
Jan 26 16:01:26 klimt.1015granger.net kernel:  svc_process+0xc8/0xe7 [sunrpc]
Jan 26 16:01:26 klimt.1015granger.net kernel:  nfsd+0xdd/0x160 [nfsd]
Jan 26 16:01:26 klimt.1015granger.net kernel:  kthread+0xf7/0xff
Jan 26 16:01:26 klimt.1015granger.net kernel:  ? nfsd_shutdown_threads+0x65/0x65 [nfsd]
Jan 26 16:01:26 klimt.1015granger.net kernel:  ? kthread_complete_and_exit+0x20/0x20
Jan 26 16:01:26 klimt.1015granger.net kernel:  ret_from_fork+0x22/0x30
Jan 26 16:01:26 klimt.1015granger.net kernel:  </TASK>
Jan 26 16:01:26 klimt.1015granger.net kernel: ---[ end trace 0000000000000000 ]---
Jan 26 16:01:26 klimt.1015granger.net kernel: ------------[ cut here ]------------
Jan 26 16:01:26 klimt.1015granger.net kernel: WARNING: CPU: 2 PID: 1009 at fs/iomap/iter.c:35 iomap_iter+0x1ca/0x272
Jan 26 16:01:26 klimt.1015granger.net kernel: Modules linked in: rfkill rpcrdma rdma_ucm ib_srpt ib_umad ib_isert ib_ipoib iscsi_target_mod ib_>
Jan 26 16:01:26 klimt.1015granger.net kernel: CPU: 2 PID: 1009 Comm: nfsd Tainted: G        W         5.17.0-rc1-00165-g2785fad9b745 #3338
Jan 26 16:01:26 klimt.1015granger.net kernel: Hardware name: Supermicro Super Server/X10SRL-F, BIOS 3.3 10/28/2020
Jan 26 16:01:26 klimt.1015granger.net kernel: RIP: 0010:iomap_iter+0x1ca/0x272
Jan 26 16:01:26 klimt.1015granger.net kernel: Code: 85 c0 0f 88 c1 00 00 00 48 8b 43 30 48 8b 53 08 48 39 d0 7e 02 0f 0b 48 8b 4b 38 48 85 c9 7>
Jan 26 16:01:26 klimt.1015granger.net kernel: RSP: 0018:ffffa65701ea3a80 EFLAGS: 00010293
Jan 26 16:01:26 klimt.1015granger.net kernel: RAX: f8ada41a253c0000 RBX: ffffa65701ea3ad0 RCX: f8ada41a253c0000
Jan 26 16:01:26 klimt.1015granger.net kernel: RDX: ffffffffffffffff RSI: ffff8ada49411000 RDI: ffff8ada8cf0f840
Jan 26 16:01:26 klimt.1015granger.net kernel: RBP: ffffa65701ea3aa0 R08: ffff8ada4a56b000 R09: ffffa65701ea3b40
Jan 26 16:01:26 klimt.1015granger.net kernel: R10: ffffa65701ea39a0 R11: 000000000efc25f5 R12: ffffffffc06d6100
Jan 26 16:01:26 klimt.1015granger.net kernel: R13: ffffa65701ea3b40 R14: ffffa65701ea3af8 R15: ffff8ada4a56b000
Jan 26 16:01:26 klimt.1015granger.net kernel: FS:  0000000000000000(0000) GS:ffff8ae97fd00000(0000) knlGS:0000000000000000
Jan 26 16:01:26 klimt.1015granger.net kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 26 16:01:26 klimt.1015granger.net kernel: CR2: 00007ffff8126260 CR3: 00000001a16dc001 CR4: 00000000001706e0
Jan 26 16:01:26 klimt.1015granger.net kernel: Call Trace:
Jan 26 16:01:26 klimt.1015granger.net kernel:  <TASK>
Jan 26 16:01:26 klimt.1015granger.net kernel:  iomap_zero_range+0x6c/0x1a9
Jan 26 16:01:26 klimt.1015granger.net kernel:  ? __radix_tree_lookup+0x2f/0xac
Jan 26 16:01:26 klimt.1015granger.net kernel:  iomap_truncate_page+0x31/0x36
Jan 26 16:01:26 klimt.1015granger.net kernel:  xfs_truncate_page+0x39/0x3b [xfs]
Jan 26 16:01:26 klimt.1015granger.net kernel:  xfs_setattr_size+0x11a/0x306 [xfs]
Jan 26 16:01:26 klimt.1015granger.net kernel:  xfs_vn_setattr_size+0x4e/0x57 [xfs]
Jan 26 16:01:26 klimt.1015granger.net kernel:  xfs_vn_setattr+0x67/0xb1 [xfs]
Jan 26 16:01:26 klimt.1015granger.net kernel:  notify_change+0x2ac/0x3a2
Jan 26 16:01:26 klimt.1015granger.net kernel:  nfsd_setattr+0x200/0x268 [nfsd]
Jan 26 16:01:26 klimt.1015granger.net kernel:  ? nfsd_setattr+0x200/0x268 [nfsd]
Jan 26 16:01:26 klimt.1015granger.net kernel:  nfsd4_setattr+0xf1/0x130 [nfsd]
Jan 26 16:01:26 klimt.1015granger.net kernel:  nfsd4_proc_compound+0x337/0x474 [nfsd]
Jan 26 16:01:26 klimt.1015granger.net kernel:  nfsd_dispatch+0x1a9/0x260 [nfsd]
Jan 26 16:01:26 klimt.1015granger.net kernel:  svc_process_common+0x331/0x4bc [sunrpc]
Jan 26 16:01:26 klimt.1015granger.net kernel:  ? nfsd_svc+0x2f5/0x2f5 [nfsd]
Jan 26 16:01:26 klimt.1015granger.net kernel:  svc_process+0xc8/0xe7 [sunrpc]
Jan 26 16:01:26 klimt.1015granger.net kernel:  nfsd+0xdd/0x160 [nfsd]
Jan 26 16:01:26 klimt.1015granger.net kernel:  kthread+0xf7/0xff
Jan 26 16:01:26 klimt.1015granger.net kernel:  ? nfsd_shutdown_threads+0x65/0x65 [nfsd]
Jan 26 16:01:26 klimt.1015granger.net kernel:  ? kthread_complete_and_exit+0x20/0x20
Jan 26 16:01:26 klimt.1015granger.net kernel:  ret_from_fork+0x22/0x30
Jan 26 16:01:26 klimt.1015granger.net kernel:  </TASK>
Jan 26 16:01:26 klimt.1015granger.net kernel: ---[ end trace 0000000000000000 ]---
Jan 26 16:01:26 klimt.1015granger.net kernel: nfsd: attempt to access beyond end of device
                                              md0: rw=2048, want=19907765165852672, limit=12501942272


>> IOW it assumes the caller has already sanity-checked the value.
> 
> Every filesystem assumes that the iattr that is passed to ->setattr
> by notify_change() has been sanity checked and the parameters are
> within the valid VFS supported ranges, not just XFS. Perhaps this
> check should be in notify_change, not in the callers?

My (limited) understanding of the VFS code is that functions at
the notify_change() level expect that its callers will have
already sanitized the input -- those callers are largely the
system call routines. That's why I chose to address this in NFSD.

Maybe inode_newsize_ok() needs to check for negative size values?


--
Chuck Lever
Dave Chinner Feb. 9, 2022, 9:55 p.m. UTC | #3
On Fri, Jan 28, 2022 at 01:48:48AM +0000, Chuck Lever III wrote:
> 
> 
> > On Jan 27, 2022, at 7:36 PM, Dave Chinner <david@fromorbit.com> wrote:
> > 
> > On Thu, Jan 27, 2022 at 11:08:31AM -0500, Chuck Lever wrote:
> >> IOW it assumes the caller has already sanity-checked the value.
> > 
> > Every filesystem assumes that the iattr that is passed to ->setattr
> > by notify_change() has been sanity checked and the parameters are
> > within the valid VFS supported ranges, not just XFS. Perhaps this
> > check should be in notify_change, not in the callers?
> 
> My (limited) understanding of the VFS code is that functions at
> the notify_change() level expect that its callers will have
> already sanitized the input -- those callers are largely the
> system call routines. That's why I chose to address this in NFSD.
> 
> Maybe inode_newsize_ok() needs to check for negative size values?

Yeah, that would seem reasonable - the size passed to it is a
loff_t, and it's not checked for overflows/negative values. So if it
checked for offset < 0 if would catch this....

Cheers,

Dave.
diff mbox series

Patch

diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index ed1ee25647be..b8ac2b9bce74 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -972,6 +972,9 @@  nfsd4_setattr(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
 	int err;
 
 	if (setattr->sa_iattr.ia_valid & ATTR_SIZE) {
+		if (setattr->sa_iattr.ia_size < 0)
+			return nfserr_fbig;
+
 		status = nfs4_preprocess_stateid_op(rqstp, cstate,
 				&cstate->current_fh, &setattr->sa_stateid,
 				WR_STATE, NULL, NULL);