nfsd: special case truncates some more

Message ID	20170123153615.GA32201@lst.de (mailing list archive)
State	New, archived

Christoph Hellwig Jan. 23, 2017, 3:36 p.m. UTC

On Mon, Jan 23, 2017 at 01:33:48PM +0100, Christoph Hellwig wrote:
> I'll need to look at the exact NFS semantics in that area, but after
> a bit of research I can probably come up with something that will work.

Here is my first attempt.  As vfs_truncate will add the ctime and mtime
updates when needed it just leaves handling that quirk to vfs_truncate
and then exits early if no other attributes are set.

Unfortunately at least the Linux client always seems to also request
a mtime update with a size update.  We could keep the

	if (iap->ia_size != i_size_read(inode))

check from the old code and remove ATTR_MTIME, but these racy checks
outside i_rwsem make me feel a bit uneasy.  Jeff, Bruce - any opinion
if we should add something like this:

	/* vfs_truncate will update ctime and mtime if the size changes */
	if (iap->ia_size != i_size_read(inode))
		iap->ia_valid &= ~ATTR_MTIME;

back to nfsd_setattr?  This would avoid the additional setattr call,
but makes me feel dirty :)

---
From 0e06e2fc6157bb97692ed47c21e36120efb9f15c Mon Sep 17 00:00:00 2001
From: Christoph Hellwig <hch@lst.de>
Date: Sun, 22 Jan 2017 17:17:48 +0100
Subject: nfsd: special case truncates some more
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Both the NFS protocols and the Linux VFS use a setattr operation with a
bitmap of attributes to set various file attributes, including the
file size and the uid/gid.

The Linux syscalls never mix size updates with unrelated updates like
the uid/gid, and some file systems like XFS and GFS2 rely on the fact
that truncates do not update random other attributes, and many
other file systems handle the case but do not update the different
attributes in the same transaction.  NFSD on the other hand passes
the attributes it gets on the wire more or less directly through to
the VFS, leading to updates the file systems don't expect.  XFS at
least has an assert on the allowed attributes, which caught an NFS
client setting the size and group ID at the same time.

To handle this issue properly this switches nfsd to call vfs_truncate
for size changes, and then handle all other attributes through
notify_change.  As a side effect this also means less boilerplate
code around the size change as we can now reuse the VFS code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/nfsd/vfs.c | 92 +++++++++++++++++++----------------------------------------
 1 file changed, 30 insertions(+), 62 deletions(-)
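
Editor's sketch (not part of the patch): the split described in the
commit message can be modelled in plain userspace C.  The SK_ATTR_*
values and the sk_split_setattr() helper are hypothetical names made up
for this illustration; the kernel has its own ATTR_* bits and does the
work inside nfsd_setattr.  The sketch only shows the decision logic:
peel the size bit off for a truncate, then pass whatever is left
(plus a ctime update) to a separate setattr call.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag values for this sketch only, not the kernel's ATTR_* bits. */
#define SK_ATTR_MODE	(1u << 0)
#define SK_ATTR_UID	(1u << 1)
#define SK_ATTR_GID	(1u << 2)
#define SK_ATTR_SIZE	(1u << 3)
#define SK_ATTR_MTIME	(1u << 5)
#define SK_ATTR_CTIME	(1u << 6)

struct sk_split {
	int		do_truncate;	/* non-zero: issue the truncate first */
	uint32_t	setattr_mask;	/* bits for the follow-up setattr, 0 = skip it */
};

/* Split one wire-level attribute bitmap into a truncate plus a setattr. */
static struct sk_split sk_split_setattr(uint32_t valid)
{
	struct sk_split s = { 0, 0 };

	if (valid & SK_ATTR_SIZE) {
		s.do_truncate = 1;
		valid &= ~SK_ATTR_SIZE;
	}

	/* Only call setattr if something other than the size is left. */
	if (valid)
		s.setattr_mask = valid | SK_ATTR_CTIME;

	/* The size bit must never reach the second call. */
	assert(!(s.setattr_mask & SK_ATTR_SIZE));
	return s;
}
```

With this model, a request carrying both the size and the gid yields
one truncate followed by a setattr for the gid and ctime, which is the
combination the XFS assert mentioned above objects to when it arrives
in a single call.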

Jeff Layton Jan. 23, 2017, 3:52 p.m. UTC | #1

On Mon, 2017-01-23 at 16:36 +0100, Christoph Hellwig wrote:
> On Mon, Jan 23, 2017 at 01:33:48PM +0100, Christoph Hellwig wrote:
> > I'll need to look at the exact NFS semantics in that area, but after
> > a bit of research I can probably come up with something that will work.
> 
> Here is my first attempt.  As vfs_truncate will add the ctime and mtime
> updates when needed it just leaves handling that quirk to vfs_truncate
> and then exits early if no other attributes are set.
> 
> Unfortunately at least the Linux client always seems to also request
> a mtime update with a size update.  We could keep the
> 
> 	if (iap->ia_size != i_size_read(inode))
> 
> check from the old code and remove ATTR_MTIME, but these racy checks
> outside i_rwsem make me feel a bit uneasy.  Jeff, Bruce - any opinion
> if we should add something like this:
> 

Ok, that's more complicated than it looked at first blush. :)

To be clear, the client is requesting to set the mtime to current server
time and not to a specific mtime, right?

> 	/* vfs_truncate will update ctime and mtime if the size changes */
> 	if (iap->ia_size != i_size_read(inode))
> 		iap->ia_valid &= ~ATTR_MTIME;
> 
> back to nfsd_setattr?  This would avoid the additional setattr call,
> but makes me feel dirty :)
> 

I agree that I wouldn't want to go with a potentially racy check.

I don't see where vfs_truncate will handle the times though. do_truncate
will, but you have to pass in a non-zero time_attrs and vfs_truncate
always sets that to 0.

If we did want to do this, it seems like it might be better to just add
a new time_attrs arg to vfs_truncate that gets passed to do_truncate.
Most callers would set it to zero, but nfsd could set it to:

    iap->ia_valid & (ATTR_MTIME|ATTR_CTIME)

Would that work?

> ---
> From 0e06e2fc6157bb97692ed47c21e36120efb9f15c Mon Sep 17 00:00:00 2001
> From: Christoph Hellwig <hch@lst.de>
> Date: Sun, 22 Jan 2017 17:17:48 +0100
> Subject: nfsd: special case truncates some more
> MIME-Version: 1.0
> Content-Type: text/plain; charset=UTF-8
> Content-Transfer-Encoding: 8bit
> 
> Both the NFS protocols and the Linux VFS use a setattr operation with a
> bitmap of attributes to set various file attributes, including the
> file size and the uid/gid.
> 
> The Linux syscalls never mix size updates with unrelated updates like
> the uid/gid, and some file systems like XFS and GFS2 rely on the fact
> that truncates do not update random other attributes, and many
> other file systems handle the case but do not update the different
> attributes in the same transaction.  NFSD on the other hand passes
> the attributes it gets on the wire more or less directly through to
> the VFS, leading to updates the file systems don't expect.  XFS at
> least has an assert on the allowed attributes, which caught an NFS
> client setting the size and group ID at the same time.
> 
> To handle this issue properly this switches nfsd to call vfs_truncate
> for size changes, and then handle all other attributes through
> notify_change.  As a side effect this also means less boilerplate
> code around the size change as we can now reuse the VFS code.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  fs/nfsd/vfs.c | 92 +++++++++++++++++++----------------------------------------
>  1 file changed, 30 insertions(+), 62 deletions(-)
> 
> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> index 26c6fdb..4ca5b92 100644
> --- a/fs/nfsd/vfs.c
> +++ b/fs/nfsd/vfs.c
> @@ -332,37 +332,6 @@ nfsd_sanitize_attrs(struct inode *inode, struct iattr *iap)
>  	}
>  }
>  
> -static __be32
> -nfsd_get_write_access(struct svc_rqst *rqstp, struct svc_fh *fhp,
> -		struct iattr *iap)
> -{
> -	struct inode *inode = d_inode(fhp->fh_dentry);
> -	int host_err;
> -
> -	if (iap->ia_size < inode->i_size) {
> -		__be32 err;
> -
> -		err = nfsd_permission(rqstp, fhp->fh_export, fhp->fh_dentry,
> -				NFSD_MAY_TRUNC | NFSD_MAY_OWNER_OVERRIDE);
> -		if (err)
> -			return err;
> -	}
> -
> -	host_err = get_write_access(inode);
> -	if (host_err)
> -		goto out_nfserrno;
> -
> -	host_err = locks_verify_truncate(inode, NULL, iap->ia_size);
> -	if (host_err)
> -		goto out_put_write_access;
> -	return 0;
> -
> -out_put_write_access:
> -	put_write_access(inode);
> -out_nfserrno:
> -	return nfserrno(host_err);
> -}
> -
>  /*
>   * Set various file attributes.  After this call fhp needs an fh_put.
>   */
> @@ -377,7 +346,6 @@ nfsd_setattr(struct svc_rqst *rqstp, struct svc_fh *fhp, struct iattr *iap,
>  	__be32		err;
>  	int		host_err;
>  	bool		get_write_count;
> -	int		size_change = 0;
>  
>  	if (iap->ia_valid & (ATTR_ATIME | ATTR_MTIME | ATTR_SIZE))
>  		accmode |= NFSD_MAY_WRITE|NFSD_MAY_OWNER_OVERRIDE;
> @@ -390,11 +358,11 @@ nfsd_setattr(struct svc_rqst *rqstp, struct svc_fh *fhp, struct iattr *iap,
>  	/* Get inode */
>  	err = fh_verify(rqstp, fhp, ftype, accmode);
>  	if (err)
> -		goto out;
> +		return err;
>  	if (get_write_count) {
>  		host_err = fh_want_write(fhp);
>  		if (host_err)
> -			return nfserrno(host_err);
> +			goto out_host_err;
>  	}
>  
>  	dentry = fhp->fh_dentry;
> @@ -405,50 +373,50 @@ nfsd_setattr(struct svc_rqst *rqstp, struct svc_fh *fhp, struct iattr *iap,
>  		iap->ia_valid &= ~ATTR_MODE;
>  
>  	if (!iap->ia_valid)
> -		goto out;
> +		return 0;
>  
>  	nfsd_sanitize_attrs(inode, iap);
>  
> +	if (check_guard && guardtime != inode->i_ctime.tv_sec)
> +		return nfserr_notsync;
> +
>  	/*
>  	 * The size case is special, it changes the file in addition to the
> -	 * attributes.
> +	 * attributes, and file systems don't expect it to be mixed with
> +	 * "random" attribute changes.  We thus split out the size change
> +	 * into a separate call to vfs_truncate, and do the rest as a
> +	 * separate setattr call.
> +	 *
> +	 * Note that vfs_truncate will also update ctime and mtime if
> +	 * the file size changes.
>  	 */
>  	if (iap->ia_valid & ATTR_SIZE) {
> -		err = nfsd_get_write_access(rqstp, fhp, iap);
> -		if (err)
> -			goto out;
> -		size_change = 1;
> +		struct path path = {
> +			.mnt	= fhp->fh_export->ex_path.mnt,
> +			.dentry	= dentry,
> +		};
>  
> -		/*
> -		 * RFC5661, Section 18.30.4:
> -		 *   Changing the size of a file with SETATTR indirectly
> -		 *   changes the time_modify and change attributes.
> -		 *
> -		 * (and similar for the older RFCs)
> -		 */
> -		if (iap->ia_size != i_size_read(inode))
> -			iap->ia_valid |= ATTR_MTIME;
> +		host_err = vfs_truncate(&path, iap->ia_size);
> +		if (host_err)
> +			goto out_host_err;
> +
> +		iap->ia_valid &= ~ATTR_SIZE;
> +		if (!iap->ia_valid)
> +			goto done;
>  	}
>  
>  	iap->ia_valid |= ATTR_CTIME;
>  
> -	if (check_guard && guardtime != inode->i_ctime.tv_sec) {
> -		err = nfserr_notsync;
> -		goto out_put_write_access;
> -	}
> -
>  	fh_lock(fhp);
>  	host_err = notify_change(dentry, iap, NULL);
>  	fh_unlock(fhp);
> -	err = nfserrno(host_err);
> +	if (host_err)
> +		goto out_host_err;
>  
> -out_put_write_access:
> -	if (size_change)
> -		put_write_access(inode);
> -	if (!err)
> -		err = nfserrno(commit_metadata(fhp));
> -out:
> -	return err;
> +done:
> +	host_err = commit_metadata(fhp);
> +out_host_err:
> +	return nfserrno(host_err);
>  }
>  
>  #if defined(CONFIG_NFSD_V4)

Christoph Hellwig Jan. 23, 2017, 4:05 p.m. UTC | #2

On Mon, Jan 23, 2017 at 10:52:09AM -0500, Jeff Layton wrote:
> To be clear, the client is requesting to set the mtime to current server
> time and not to a specific mtime, right?

Yes.  And I think it's mostly the Linux client being lazy - ATTR_MTIME
is what it gets from the VFS for a truncate operation (but not ftruncate,
so we probably won't see it on the wire in that case, but I need to verify
that first).  Yet another reason for ->truncate :)

> I don't see where vfs_truncate will handle the times though. do_truncate
> will, but you have to pass in a non-zero time_attrs and vfs_truncate
> always sets that to 0.

This is the magic of the Linux VFS interface.  For an ATTR_SIZE operation
the file system is expected to update mtime and ctime if the size changes
even if ATTR_MTIME and ATTR_CTIME are not set.  See the comments
in xfs_vn_setattr_size, which I wrote many years ago when I tripped
over this interesting calling convention.

> If we did want to do this, it seems like it might be better to just add
> a new time_attrs arg to vfs_truncate that gets passed to do_truncate.
> Most callers would set it to zero, but nfsd could set it to:
> 
>     iap->ia_valid & (ATTR_MTIME|ATTR_CTIME)
> 
> Would that work?

I'd hate it.  I'd rather spend my time on a real truncate operation
which makes all the above magic explicit, and as a side effect would
fix the Linux client sending spurious mtime update requests that
the protocol already requires to be done implicitly.
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Jeff Layton Jan. 23, 2017, 4:14 p.m. UTC | #3

On Mon, 2017-01-23 at 17:05 +0100, Christoph Hellwig wrote:
> On Mon, Jan 23, 2017 at 10:52:09AM -0500, Jeff Layton wrote:
> > To be clear, the client is requesting to set the mtime to current server
> > time and not to a specific mtime, right?
> 
> Yes.  And I think it's mostly the Linux client being lazy - ATTR_MTIME
> is what it gets from the VFS for a truncate operation (but not ftruncate,
> so we probably won't see it on the wire in that case, but I need to verify
> that first).  Yet another reason for ->truncate :)
> 

Heh, ok. Makes sense.

> > I don't see where vfs_truncate will handle the times though. do_truncate
> > will, but you have to pass in a non-zero time_attrs and vfs_truncate
> > always sets that to 0.
> 
> This is the magic of the Linux VFS interface.  For an ATTR_SIZE operation
> the file system is expected to update mtime and ctime if the size changes
> even if ATTR_MTIME and ATTR_CTIME are not set.  See the comments
> in xfs_vn_setattr_size, which I wrote many years ago when I tripped
> over this interesting calling convention.
> 

Ick.

> > If we did want to do this, it seems like it might be better to just add
> > a new time_attrs arg to vfs_truncate that gets passed to do_truncate.
> > Most callers would set it to zero, but nfsd could set it to:
> > 
> >     iap->ia_valid & (ATTR_MTIME|ATTR_CTIME)
> > 
> > Would that work?
> 
> I'd hate it.  I'd rather spend my time on a real truncate operation
> which makes all the above magic explicit, and as a side effect would
> fix the Linux client sending spurious mtime update requests that
> the protocol already requires to be done implicitly.

Fair enough. In that case, I wouldn't try to optimize away the second
notify_change if the client sets ATTR_MTIME/ATTR_CTIME for now.

We might have to dip down into the fs twice for a truncate, but so be
it. If it becomes a problem then we can consider that more reason to add
a real ->truncate operation.

Trond Myklebust Jan. 23, 2017, 4:20 p.m. UTC | #4

On Mon, 2017-01-23 at 17:05 +0100, Christoph Hellwig wrote:
> On Mon, Jan 23, 2017 at 10:52:09AM -0500, Jeff Layton wrote:
> > To be clear, the client is requesting to set the mtime to current server
> > time and not to a specific mtime, right?
> 
> Yes.  And I think it's mostly the Linux client being lazy - ATTR_MTIME
> is what it gets from the VFS for a truncate operation (but not ftruncate,
> so we probably won't see it on the wire in that case, but I need to verify
> that first).  Yet another reason for ->truncate :)
> 

Note that the POSIX spec seems to have changed recently. The current
spec appears to state that we should set the mtime and ctime (and
change attribute) on success in open(O_TRUNC), truncate() and
ftruncate(). In previous incarnations of the spec, truncate() would
only set the time if the size was changed:

See:
http://pubs.opengroup.org/onlinepubs/9699919799/functions/ftruncate.html
http://pubs.opengroup.org/onlinepubs/9699919799/functions/truncate.html
http://pubs.opengroup.org/onlinepubs/9699919799/functions/open.html

-- 
Trond Myklebust
Linux NFS client maintainer, PrimaryData
trond.myklebust@primarydata.com


Christoph Hellwig Jan. 23, 2017, 4:26 p.m. UTC | #5

On Mon, Jan 23, 2017 at 04:20:45PM +0000, Trond Myklebust wrote:
> Note that the POSIX spec seems to have changed recently. The current
> spec appears to state that we should set the mtime and ctime (and
> change attribute) on success in open(O_TRUNC), truncate() and
> ftruncate(). In previous incarnations of the spec, truncate() would
> only set the time if the size was changed:

Interesting.  But in this case historical POSIX and thus Linux behavior
still takes precedence and we're not suddenly going to change behavior.

Trond Myklebust Jan. 23, 2017, 5:25 p.m. UTC | #6

On Mon, 2017-01-23 at 17:26 +0100, hch wrote:
> On Mon, Jan 23, 2017 at 04:20:45PM +0000, Trond Myklebust wrote:
> > Note that the POSIX spec seems to have changed recently. The current
> > spec appears to state that we should set the mtime and ctime (and
> > change attribute) on success in open(O_TRUNC), truncate() and
> > ftruncate(). In previous incarnations of the spec, truncate() would
> > only set the time if the size was changed:
> 
> Interesting.  But in this case historical POSIX and thus Linux behavior
> still takes precedence and we're not suddenly going to change behavior.
> 

In that case the client will be required to continue to need to send
mtime/ctime in order to ensure that we get the same historical
semantics w.r.t. ftruncate() vs truncate().

IOW: It's not a question of the client being lazy about clearing the
flags. It's a question of enforcing the correct semantics.

-- 
Trond Myklebust
Linux NFS client maintainer, PrimaryData
trond.myklebust@primarydata.com

Christoph Hellwig Jan. 23, 2017, 5:38 p.m. UTC | #7

On Mon, Jan 23, 2017 at 05:25:34PM +0000, Trond Myklebust wrote:
> In that case the client will be required to continue to need to send
> mtime/ctime in order to ensure that we get the same historical
> semantics w.r.t. ftruncate() vs truncate().
> 
> IOW: It's not a question of the client being lazy about clearing the
> flags. It's a question of enforcing the correct semantics.

No, the NFS spec requires the server to add an implicit mtime
when the size actually changes.  In fact the current code has a comment
pointing to the section:

 * RFC5661, Section 18.30.4:
 *   Changing the size of a file with SETATTR indirectly
 *   changes the time_modify and change attributes.
 *
 * (and similar for the older RFCs)

And yes, I've double checked that in the RFC.

Trond Myklebust Jan. 23, 2017, 5:42 p.m. UTC | #8

On Mon, 2017-01-23 at 18:38 +0100, hch wrote:
> On Mon, Jan 23, 2017 at 05:25:34PM +0000, Trond Myklebust wrote:
> > In that case the client will be required to continue to need to send
> > mtime/ctime in order to ensure that we get the same historical
> > semantics w.r.t. ftruncate() vs truncate().
> > 
> > IOW: It's not a question of the client being lazy about clearing the
> > flags. It's a question of enforcing the correct semantics.
> 
> No, the NFS spec requires the server to add an implicit mtime
> when the size actually changes.  In fact the current code has a comment
> pointing to the section:
> 
>  * RFC5661, Section 18.30.4:
>  *   Changing the size of a file with SETATTR indirectly
>  *   changes the time_modify and change attributes.
>  *
>  * (and similar for the older RFCs)
> 
> And yes, I've double checked that in the RFC.

Sure, but truncate() on POSIX adds the requirement that the mtime/ctime
should change even when the file size is not changed.

-- 
Trond Myklebust
Linux NFS client maintainer, PrimaryData
trond.myklebust@primarydata.com
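
Editor's sketch (not from the thread): whether a truncate to the
current size touches mtime can be observed from userspace with the
standard POSIX calls stat(), truncate() and ftruncate().  The helper
name mtime_changes_on_same_size_truncate() is made up for this
illustration, and the code assumes a Linux/glibc style struct stat with
an st_mtim member.  The result depends on the kernel, the file system
and (over NFS) the client and server involved, so no particular outcome
is claimed here.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/*
 * Truncate 'path' to its current size, either by path (truncate) or by
 * file descriptor (ftruncate), and report whether mtime moved.
 * Returns 1 if mtime changed, 0 if it did not, -1 on error.
 */
static int mtime_changes_on_same_size_truncate(const char *path,
		int use_ftruncate)
{
	struct stat before, after;
	struct timespec pause = { 0, 20 * 1000 * 1000 };	/* 20 ms */
	int ret;

	if (stat(path, &before) != 0)
		return -1;

	/* Let the clock advance so an update would be visible. */
	nanosleep(&pause, NULL);

	if (use_ftruncate) {
		int fd = open(path, O_WRONLY);

		if (fd < 0)
			return -1;
		ret = ftruncate(fd, before.st_size);
		close(fd);
	} else {
		ret = truncate(path, before.st_size);
	}

	if (ret != 0 || stat(path, &after) != 0)
		return -1;

	/* The size must be untouched; only the timestamps are of interest. */
	assert(after.st_size == before.st_size);

	return before.st_mtim.tv_sec != after.st_mtim.tv_sec ||
	       before.st_mtim.tv_nsec != after.st_mtim.tv_nsec;
}
```

Calling the helper once with use_ftruncate set to 0 and once with 1 on
the same file, locally and on an NFS mount, is one way to compare the
truncate() and ftruncate() semantics discussed above.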

J. Bruce Fields Jan. 24, 2017, 4:25 p.m. UTC | #9

On Mon, Jan 23, 2017 at 05:26:07PM +0100, hch wrote:
> On Mon, Jan 23, 2017 at 04:20:45PM +0000, Trond Myklebust wrote:
> > Note that the POSIX spec seems to have changed recently. The current
> > spec appears to state that we should set the mtime and ctime (and
> > change attribute) on success in open(O_TRUNC), truncate() and
> > ftruncate(). In previous incarnations of the spec, truncate() would
> > only set the time if the size was changed:
> 
> Interesting.  But in this case historical POSIX and thus Linux behavior
> still takes precedence and we're not suddenly going to change behavior.

Makes sense as a general rule, but is it really likely that anyone
depends on ctime/mtime not changing on a non-size-changing truncate()?

--b.

J. Bruce Fields Jan. 24, 2017, 10:02 p.m. UTC | #10

On Mon, Jan 23, 2017 at 10:52:09AM -0500, Jeff Layton wrote:
> On Mon, 2017-01-23 at 16:36 +0100, Christoph Hellwig wrote:
> > On Mon, Jan 23, 2017 at 01:33:48PM +0100, Christoph Hellwig wrote:
> > > I'll need to look at the exact NFS semantics in that area, but after
> > > a bit of research I can probably come up with something that will work.
> > 
> > Here is my first attempt.  As vfs_truncate will add the ctime and mtime
> > updates when needed it just leaves handling that quirk to vfs_truncate
> > and then exits early if no other attributes are set.
> > 
> > Unfortunately at least the Linux client always seems to also request
> > a mtime update with a size update.  We could keep the
> > 
> > 	if (iap->ia_size != i_size_read(inode))
> > 
> > check from the old code and remove ATTR_MTIME, but these racy checks
> > outside i_rwsem make me feel a bit uneasy.  Jeff, Bruce - any opinion
> > if we should add something like this:
> > 
> 
> Ok, that's more complicated than it looked at first blush. :)
> 
> To be clear, the client is requesting to set the mtime to current server
> time and not to a specific mtime, right?
> 
> > 	/* vfs_truncate will update ctime and mtime if the size changes */
> > 	if (iap->ia_size != i_size_read(inode))
> > 		iap->ia_valid &= ~ATTR_MTIME;
> > 
> > back to nfsd_setattr?  This would avoid the additional setattr call,
> > but makes me feel dirty :)
> > 
> 
> I agree that I wouldn't want to go with a potentially racy check.

Unless I'm misunderstanding: we've always had the race, and the
consequence is just an unnecessary update in the case the truncate
didn't actually change the size (although it looked like it would at the
time of the check).

I don't like that, but it's not going to keep me up at night.

--b.

