diff mbox

nfsd: Make creates return EEXIST correctly instead of EPERM

Message ID 1467942466-3081422-1-git-send-email-green@linuxhacker.ru (mailing list archive)
State New, archived
Headers show

Commit Message

Oleg Drokin July 8, 2016, 1:47 a.m. UTC
It looks like we are bit overzealous about failing mkdir/create/mknod
with permission denied if the parent dir is not writeable.
Need to make sure the name does not exist first, because we need to
return EEXIST in that case.

Signed-off-by: Oleg Drokin <green@linuxhacker.ru>
---
A very similar problem exists with symlinks, but the patch is more
involved, so assuming this one is ok, I'll send a symlink one separately.
 fs/nfsd/nfs4proc.c |  6 +++++-
 fs/nfsd/vfs.c      | 11 ++++++++++-
 2 files changed, 15 insertions(+), 2 deletions(-)

Comments

Jeff Layton July 8, 2016, 11:02 a.m. UTC | #1
On Thu, 2016-07-07 at 21:47 -0400, Oleg Drokin wrote:
> It looks like we are bit overzealous about failing mkdir/create/mknod
> with permission denied if the parent dir is not writeable.
> Need to make sure the name does not exist first, because we need to
> return EEXIST in that case.
> 
> Signed-off-by: Oleg Drokin <green@linuxhacker.ru>
> ---
> A very similar problem exists with symlinks, but the patch is more
> involved, so assuming this one is ok, I'll send a symlink one separately.
>  fs/nfsd/nfs4proc.c |  6 +++++-
>  fs/nfsd/vfs.c      | 11 ++++++++++-
>  2 files changed, 15 insertions(+), 2 deletions(-)
> 


nit: subject says EPERM, but I think you mean EACCES. The mnemonic I've
always used is that EPERM is "permanent". IOW, changing permissions
won't ever allow the user to do something. For instance, unprivileged
users can never chown a file, so they should get back EPERM there. When
a directory isn't writeable on a create they should get EACCES since
they could do the create if the directory were writeable.


> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
> index de1ff1d..0067520 100644
> --- a/fs/nfsd/nfs4proc.c
> +++ b/fs/nfsd/nfs4proc.c
> @@ -605,8 +605,12 @@ nfsd4_create(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
>  
>  	fh_init(&resfh, NFS4_FHSIZE);
>  
> +	/*
> +	 * We just check thta parent is accessible here, nfsd_* do their
> +	 * own access permission checks
> +	 */
>  	status = fh_verify(rqstp, &cstate->current_fh, S_IFDIR,
> -			   NFSD_MAY_CREATE);
> +			   NFSD_MAY_EXEC);
>  	if (status)
>  		return status;
>  
> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> index 6fbd81e..6a45ec6 100644
> --- a/fs/nfsd/vfs.c
> +++ b/fs/nfsd/vfs.c
> @@ -1161,7 +1161,11 @@ nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
>  	if (isdotent(fname, flen))
>  		goto out;
>  
> -	err = fh_verify(rqstp, fhp, S_IFDIR, NFSD_MAY_CREATE);
> +	/*
> +	 * Even though it is a create, first we see if we are even allowed
> +	 * to peek inside the parent
> +	 */
> +	err = fh_verify(rqstp, fhp, S_IFDIR, NFSD_MAY_EXEC);
>  	if (err)
>  		goto out;
>  
> @@ -1211,6 +1215,11 @@ nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
>  		goto out; 
>  	}
>  
> +	/* Now let's see if we actually have permissions to create */
> +	err = nfsd_permission(rqstp, fhp->fh_export, dentry, NFSD_MAY_CREATE);
> +	if (err)
> +		goto out;
> +
>  	if (!(iap->ia_valid & ATTR_MODE))
>  		iap->ia_mode = 0;
>  	iap->ia_mode = (iap->ia_mode & S_IALLUGO) | type;


Ouch. This means two nfsd_permission calls per create operation. If
it's necessary for correctness then so be it, but is it actually
documented anywhere (POSIX perhaps?) that we must prefer EEXIST over
EACCES in this situation?
Oleg Drokin July 8, 2016, 3:14 p.m. UTC | #2
On Jul 8, 2016, at 7:02 AM, Jeff Layton wrote:

> On Thu, 2016-07-07 at 21:47 -0400, Oleg Drokin wrote:
>> It looks like we are bit overzealous about failing mkdir/create/mknod
>> with permission denied if the parent dir is not writeable.
>> Need to make sure the name does not exist first, because we need to
>> return EEXIST in that case.
>> 
>> Signed-off-by: Oleg Drokin <green@linuxhacker.ru>
>> ---
>> A very similar problem exists with symlinks, but the patch is more
>> involved, so assuming this one is ok, I'll send a symlink one separately.
>>  fs/nfsd/nfs4proc.c |  6 +++++-
>>  fs/nfsd/vfs.c      | 11 ++++++++++-
>>  2 files changed, 15 insertions(+), 2 deletions(-)
>> 
> 
> 
> nit: subject says EPERM, but I think you mean EACCES. The mnemonic I've
> always used is that EPERM is "permanent". IOW, changing permissions
> won't ever allow the user to do something. For instance, unprivileged
> users can never chown a file, so they should get back EPERM there. When
> a directory isn't writeable on a create they should get EACCES since
> they could do the create if the directory were writeable.

Hm, I see, thanks.
Confusing that you get "Permission denied" from perror ;)

>> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
>> index de1ff1d..0067520 100644
>> --- a/fs/nfsd/nfs4proc.c
>> +++ b/fs/nfsd/nfs4proc.c
>> @@ -605,8 +605,12 @@ nfsd4_create(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
>>  
>>  	fh_init(&resfh, NFS4_FHSIZE);
>>  
>> +	/*
>> +	 * We just check thta parent is accessible here, nfsd_* do their
>> +	 * own access permission checks
>> +	 */
>>  	status = fh_verify(rqstp, &cstate->current_fh, S_IFDIR,
>> -			   NFSD_MAY_CREATE);
>> +			   NFSD_MAY_EXEC);
>>  	if (status)
>>  		return status;
>>  
>> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
>> index 6fbd81e..6a45ec6 100644
>> --- a/fs/nfsd/vfs.c
>> +++ b/fs/nfsd/vfs.c
>> @@ -1161,7 +1161,11 @@ nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
>>  	if (isdotent(fname, flen))
>>  		goto out;
>>  
>> -	err = fh_verify(rqstp, fhp, S_IFDIR, NFSD_MAY_CREATE);
>> +	/*
>> +	 * Even though it is a create, first we see if we are even allowed
>> +	 * to peek inside the parent
>> +	 */
>> +	err = fh_verify(rqstp, fhp, S_IFDIR, NFSD_MAY_EXEC);
>>  	if (err)
>>  		goto out;
>>  
>> @@ -1211,6 +1215,11 @@ nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
>>  		goto out; 
>>  	}
>>  
>> +	/* Now let's see if we actually have permissions to create */
>> +	err = nfsd_permission(rqstp, fhp->fh_export, dentry, NFSD_MAY_CREATE);
>> +	if (err)
>> +		goto out;
>> +
>>  	if (!(iap->ia_valid & ATTR_MODE))
>>  		iap->ia_mode = 0;
>>  	iap->ia_mode = (iap->ia_mode & S_IALLUGO) | type;
> 
> 
> Ouch. This means two nfsd_permission calls per create operation. If
> it's necessary for correctness then so be it, but is it actually
> documented anywhere (POSIX perhaps?) that we must prefer EEXIST over
> EACCES in this situation?

Opengroup manpage: http://pubs.opengroup.org/onlinepubs/009695399/functions/mkdir.html
newer version is here:
http://pubs.opengroup.org/onlinepubs/9699919799/

They tell us that we absolutely must fail with EEXIST if the name is a symlink
(so we need to lookup it anyway), and also that EEXIST is the failure code
if the path exists.

Are double permission checks really as bad for nfs? it looked like it would
call mostly into VFS so even if first call would be expensive, second call should
be really cheap?

In theory we can do the fh_verify(.., NFSD_MAY_NOP) to skip the first
nfsd_permission() call, but I suspect then we'd violate the spec in the other way:
i.e. mkdir in a directory where you cannot read and execute will give you
EEXIST where as that's when EACCES is desired I imagine.


> 
> -- 
> 
> Jeff Layton <jlayton@poochiereds.net>

--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jeff Layton July 8, 2016, 3:53 p.m. UTC | #3
On Fri, 2016-07-08 at 11:14 -0400, Oleg Drokin wrote:
> On Jul 8, 2016, at 7:02 AM, Jeff Layton wrote:
> 
> > On Thu, 2016-07-07 at 21:47 -0400, Oleg Drokin wrote:
> > > It looks like we are bit overzealous about failing mkdir/create/mknod
> > > with permission denied if the parent dir is not writeable.
> > > Need to make sure the name does not exist first, because we need to
> > > return EEXIST in that case.
> > > 
> > > Signed-off-by: Oleg Drokin <green@linuxhacker.ru>
> > > ---
> > > A very similar problem exists with symlinks, but the patch is more
> > > involved, so assuming this one is ok, I'll send a symlink one separately.
> > >  fs/nfsd/nfs4proc.c |  6 +++++-
> > >  fs/nfsd/vfs.c      | 11 ++++++++++-
> > >  2 files changed, 15 insertions(+), 2 deletions(-)
> > > 
> > 
> > 
> > nit: subject says EPERM, but I think you mean EACCES. The mnemonic I've
> > always used is that EPERM is "permanent". IOW, changing permissions
> > won't ever allow the user to do something. For instance, unprivileged
> > users can never chown a file, so they should get back EPERM there. When
> > a directory isn't writeable on a create they should get EACCES since
> > they could do the create if the directory were writeable.
> 
> Hm, I see, thanks.
> Confusing that you get "Permission denied" from perror ;)
> 

Yes indeed. It's a subtle and confusing distinction.

> > > diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
> > > index de1ff1d..0067520 100644
> > > --- a/fs/nfsd/nfs4proc.c
> > > +++ b/fs/nfsd/nfs4proc.c
> > > @@ -605,8 +605,12 @@ nfsd4_create(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
> > >  
> > >  	fh_init(&resfh, NFS4_FHSIZE);
> > >  
> > > +	/*
> > > +	 * We just check thta parent is accessible here, nfsd_* do their
> > > +	 * own access permission checks
> > > +	 */
> > >  	status = fh_verify(rqstp, &cstate->current_fh, S_IFDIR,
> > > -			   NFSD_MAY_CREATE);
> > > +			   NFSD_MAY_EXEC);
> > >  	if (status)
> > >  		return status;
> > >  
> > > diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> > > index 6fbd81e..6a45ec6 100644
> > > --- a/fs/nfsd/vfs.c
> > > +++ b/fs/nfsd/vfs.c
> > > @@ -1161,7 +1161,11 @@ nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
> > >  	if (isdotent(fname, flen))
> > >  		goto out;
> > >  
> > > -	err = fh_verify(rqstp, fhp, S_IFDIR, NFSD_MAY_CREATE);
> > > +	/*
> > > +	 * Even though it is a create, first we see if we are even allowed
> > > +	 * to peek inside the parent
> > > +	 */
> > > +	err = fh_verify(rqstp, fhp, S_IFDIR, NFSD_MAY_EXEC);
> > >  	if (err)
> > >  		goto out;
> > >  
> > > @@ -1211,6 +1215,11 @@ nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
> > >  		goto out; 
> > >  	}
> > >  
> > > +	/* Now let's see if we actually have permissions to create */
> > > +	err = nfsd_permission(rqstp, fhp->fh_export, dentry, NFSD_MAY_CREATE);
> > > +	if (err)
> > > +		goto out;
> > > +
> > >  	if (!(iap->ia_valid & ATTR_MODE))
> > >  		iap->ia_mode = 0;
> > >  	iap->ia_mode = (iap->ia_mode & S_IALLUGO) | type;
> > 
> > 
> > Ouch. This means two nfsd_permission calls per create operation. If
> > it's necessary for correctness then so be it, but is it actually
> > documented anywhere (POSIX perhaps?) that we must prefer EEXIST over
> > EACCES in this situation?
> 
> Opengroup manpage: http://pubs.opengroup.org/onlinepubs/009695399/functions/mkdir.html
> newer version is here:
> http://pubs.opengroup.org/onlinepubs/9699919799/
> 
> They tell us that we absolutely must fail with EEXIST if the name is a symlink
> (so we need to lookup it anyway), and also that EEXIST is the failure code
> if the path exists.
> 

I'm not sure that that verbiage supersedes the fact that you don't have
write permissions on the directory. Does it?

ISTM that it's perfectly valid to shortcut looking up the dentry if the
user doesn't have write permissions on the directory, even when the
target is a symlink.

IOW, I'm not sure I see a bug here.

> Are double permission checks really as bad for nfs? it looked like it would
> call mostly into VFS so even if first call would be expensive, second call should
> be really cheap?
> 

It depends on the underlying fs. In most cases, you're right, but you
can export things that overload the ->permission op, and those can be
as expensive as they like (within reason of course).


> In theory we can do the fh_verify(.., NFSD_MAY_NOP) to skip the first
> nfsd_permission() call, but I suspect then we'd violate the spec in the other way:
> i.e. mkdir in a directory where you cannot read and execute will give you
> EEXIST where as that's when EACCES is desired I imagine.
>
Oleg Drokin July 8, 2016, 3:59 p.m. UTC | #4
On Jul 8, 2016, at 11:53 AM, Jeff Layton wrote:

> On Fri, 2016-07-08 at 11:14 -0400, Oleg Drokin wrote:
>> On Jul 8, 2016, at 7:02 AM, Jeff Layton wrote:
>> 
>>> On Thu, 2016-07-07 at 21:47 -0400, Oleg Drokin wrote:
>>>> It looks like we are bit overzealous about failing mkdir/create/mknod
>>>> with permission denied if the parent dir is not writeable.
>>>> Need to make sure the name does not exist first, because we need to
>>>> return EEXIST in that case.
>>>> 
>>>> Signed-off-by: Oleg Drokin <green@linuxhacker.ru>
>>>> ---
>>>> A very similar problem exists with symlinks, but the patch is more
>>>> involved, so assuming this one is ok, I'll send a symlink one separately.
>>>>  fs/nfsd/nfs4proc.c |  6 +++++-
>>>>  fs/nfsd/vfs.c      | 11 ++++++++++-
>>>>  2 files changed, 15 insertions(+), 2 deletions(-)
>>>> 
>>> 
>>> 
>>> nit: subject says EPERM, but I think you mean EACCES. The mnemonic I've
>>> always used is that EPERM is "permanent". IOW, changing permissions
>>> won't ever allow the user to do something. For instance, unprivileged
>>> users can never chown a file, so they should get back EPERM there. When
>>> a directory isn't writeable on a create they should get EACCES since
>>> they could do the create if the directory were writeable.
>> 
>> Hm, I see, thanks.
>> Confusing that you get "Permission denied" from perror ;)
>> 
> 
> Yes indeed. It's a subtle and confusing distinction.
> 
>>>> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
>>>> index de1ff1d..0067520 100644
>>>> --- a/fs/nfsd/nfs4proc.c
>>>> +++ b/fs/nfsd/nfs4proc.c
>>>> @@ -605,8 +605,12 @@ nfsd4_create(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
>>>>  
>>>>  	fh_init(&resfh, NFS4_FHSIZE);
>>>>  
>>>> +	/*
>>>> +	 * We just check thta parent is accessible here, nfsd_* do their
>>>> +	 * own access permission checks
>>>> +	 */
>>>>  	status = fh_verify(rqstp, &cstate->current_fh, S_IFDIR,
>>>> -			   NFSD_MAY_CREATE);
>>>> +			   NFSD_MAY_EXEC);
>>>>  	if (status)
>>>>  		return status;
>>>>  
>>>> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
>>>> index 6fbd81e..6a45ec6 100644
>>>> --- a/fs/nfsd/vfs.c
>>>> +++ b/fs/nfsd/vfs.c
>>>> @@ -1161,7 +1161,11 @@ nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
>>>>  	if (isdotent(fname, flen))
>>>>  		goto out;
>>>>  
>>>> -	err = fh_verify(rqstp, fhp, S_IFDIR, NFSD_MAY_CREATE);
>>>> +	/*
>>>> +	 * Even though it is a create, first we see if we are even allowed
>>>> +	 * to peek inside the parent
>>>> +	 */
>>>> +	err = fh_verify(rqstp, fhp, S_IFDIR, NFSD_MAY_EXEC);
>>>>  	if (err)
>>>>  		goto out;
>>>>  
>>>> @@ -1211,6 +1215,11 @@ nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
>>>>  		goto out; 
>>>>  	}
>>>>  
>>>> +	/* Now let's see if we actually have permissions to create */
>>>> +	err = nfsd_permission(rqstp, fhp->fh_export, dentry, NFSD_MAY_CREATE);
>>>> +	if (err)
>>>> +		goto out;
>>>> +
>>>>  	if (!(iap->ia_valid & ATTR_MODE))
>>>>  		iap->ia_mode = 0;
>>>>  	iap->ia_mode = (iap->ia_mode & S_IALLUGO) | type;
>>> 
>>> 
>>> Ouch. This means two nfsd_permission calls per create operation. If
>>> it's necessary for correctness then so be it, but is it actually
>>> documented anywhere (POSIX perhaps?) that we must prefer EEXIST over
>>> EACCES in this situation?
>> 
>> Opengroup manpage: http://pubs.opengroup.org/onlinepubs/009695399/functions/mkdir.html
>> newer version is here:
>> http://pubs.opengroup.org/onlinepubs/9699919799/
>> 
>> They tell us that we absolutely must fail with EEXIST if the name is a symlink
>> (so we need to lookup it anyway), and also that EEXIST is the failure code
>> if the path exists.
>> 
> 
> I'm not sure that that verbiage supersedes the fact that you don't have
> write permissions on the directory. Does it?

"If path names a symbolic link, mkdir() shall fail and set errno to [EEXIST]."

This sounds pretty straightforward to me, no?
Since it does not matter that we do not have write permissions here, because
the name already exists.

(also there are tons of applications that make this assumption when
badly reimplementing their mkdir -p thing, I imagine they also have this same
reading of the man page - this is why I even care about it).

> ISTM that it's perfectly valid to shortcut looking up the dentry if the
> user doesn't have write permissions on the directory, even when the
> target is a symlink.
> 
> IOW, I'm not sure I see a bug here.
> 
>> Are double permission checks really as bad for nfs? it looked like it would
>> call mostly into VFS so even if first call would be expensive, second call should
>> be really cheap?
>> 
> 
> It depends on the underlying fs. In most cases, you're right, but you
> can export things that overload the ->permission op, and those can be
> as expensive as they like (within reason of course).
> 
> 
>> In theory we can do the fh_verify(.., NFSD_MAY_NOP) to skip the first
>> nfsd_permission() call, but I suspect then we'd violate the spec in the other way:
>> i.e. mkdir in a directory where you cannot read and execute will give you
>> EEXIST where as that's when EACCES is desired I imagine.
>> 
> 
> -- 
> 
> Jeff Layton <jlayton@poochiereds.net>

--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
J. Bruce Fields July 8, 2016, 4:04 p.m. UTC | #5
On Fri, Jul 08, 2016 at 11:53:28AM -0400, Jeff Layton wrote:
> On Fri, 2016-07-08 at 11:14 -0400, Oleg Drokin wrote:
> > On Jul 8, 2016, at 7:02 AM, Jeff Layton wrote:
> > 
> > > On Thu, 2016-07-07 at 21:47 -0400, Oleg Drokin wrote:
> > > > It looks like we are bit overzealous about failing mkdir/create/mknod
> > > > with permission denied if the parent dir is not writeable.
> > > > Need to make sure the name does not exist first, because we need to
> > > > return EEXIST in that case.
> > > > 
> > > > Signed-off-by: Oleg Drokin <green@linuxhacker.ru>
> > > > ---
> > > > A very similar problem exists with symlinks, but the patch is more
> > > > involved, so assuming this one is ok, I'll send a symlink one separately.
> > > >  fs/nfsd/nfs4proc.c |  6 +++++-
> > > >  fs/nfsd/vfs.c      | 11 ++++++++++-
> > > >  2 files changed, 15 insertions(+), 2 deletions(-)
> > > > 
> > > 
> > > 
> > > nit: subject says EPERM, but I think you mean EACCES. The mnemonic I've
> > > always used is that EPERM is "permanent". IOW, changing permissions
> > > won't ever allow the user to do something. For instance, unprivileged
> > > users can never chown a file, so they should get back EPERM there. When
> > > a directory isn't writeable on a create they should get EACCES since
> > > they could do the create if the directory were writeable.
> > 
> > Hm, I see, thanks.
> > Confusing that you get "Permission denied" from perror ;)
> > 
> 
> Yes indeed. It's a subtle and confusing distinction.
> 
> > > > diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
> > > > index de1ff1d..0067520 100644
> > > > --- a/fs/nfsd/nfs4proc.c
> > > > +++ b/fs/nfsd/nfs4proc.c
> > > > @@ -605,8 +605,12 @@ nfsd4_create(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
> > > >  
> > > >  	fh_init(&resfh, NFS4_FHSIZE);
> > > >  
> > > > +	/*
> > > > +	 * We just check thta parent is accessible here, nfsd_* do their
> > > > +	 * own access permission checks
> > > > +	 */
> > > >  	status = fh_verify(rqstp, &cstate->current_fh, S_IFDIR,
> > > > -			   NFSD_MAY_CREATE);
> > > > +			   NFSD_MAY_EXEC);
> > > >  	if (status)
> > > >  		return status;
> > > >  
> > > > diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> > > > index 6fbd81e..6a45ec6 100644
> > > > --- a/fs/nfsd/vfs.c
> > > > +++ b/fs/nfsd/vfs.c
> > > > @@ -1161,7 +1161,11 @@ nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
> > > >  	if (isdotent(fname, flen))
> > > >  		goto out;
> > > >  
> > > > -	err = fh_verify(rqstp, fhp, S_IFDIR, NFSD_MAY_CREATE);
> > > > +	/*
> > > > +	 * Even though it is a create, first we see if we are even allowed
> > > > +	 * to peek inside the parent
> > > > +	 */
> > > > +	err = fh_verify(rqstp, fhp, S_IFDIR, NFSD_MAY_EXEC);
> > > >  	if (err)
> > > >  		goto out;
> > > >  
> > > > @@ -1211,6 +1215,11 @@ nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
> > > >  		goto out; 
> > > >  	}
> > > >  
> > > > +	/* Now let's see if we actually have permissions to create */
> > > > +	err = nfsd_permission(rqstp, fhp->fh_export, dentry, NFSD_MAY_CREATE);
> > > > +	if (err)
> > > > +		goto out;
> > > > +
> > > >  	if (!(iap->ia_valid & ATTR_MODE))
> > > >  		iap->ia_mode = 0;
> > > >  	iap->ia_mode = (iap->ia_mode & S_IALLUGO) | type;
> > > 
> > > 
> > > Ouch. This means two nfsd_permission calls per create operation. If
> > > it's necessary for correctness then so be it, but is it actually
> > > documented anywhere (POSIX perhaps?) that we must prefer EEXIST over
> > > EACCES in this situation?
> > 
> > Opengroup manpage: http://pubs.opengroup.org/onlinepubs/009695399/functions/mkdir.html
> > newer version is here:
> > http://pubs.opengroup.org/onlinepubs/9699919799/
> > 
> > They tell us that we absolutely must fail with EEXIST if the name is a symlink
> > (so we need to lookup it anyway), and also that EEXIST is the failure code
> > if the path exists.
> > 
> 
> I'm not sure that that verbiage supersedes the fact that you don't have
> write permissions on the directory. Does it?
> 
> ISTM that it's perfectly valid to shortcut looking up the dentry if the
> user doesn't have write permissions on the directory, even when the
> target is a symlink.
> 
> IOW, I'm not sure I see a bug here.

If this is causing real programs to behave incorrectly, then that may
matter more than the letter of the spec.  But I'm a little curious why
we'd be hearing about that just now--did the client or server's behavior
change recently?

> > Are double permission checks really as bad for nfs? it looked like it would
> > call mostly into VFS so even if first call would be expensive, second call should
> > be really cheap?
> > 
> 
> It depends on the underlying fs. In most cases, you're right, but you
> can export things that overload the ->permission op, and those can be
> as expensive as they like (within reason of course).

Weird if the expense of a second permission call is significant compared
to following the mkdir and sync.  But, what do I know.

--b.
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Oleg Drokin July 8, 2016, 4:16 p.m. UTC | #6
On Jul 8, 2016, at 12:04 PM, J. Bruce Fields wrote:

> On Fri, Jul 08, 2016 at 11:53:28AM -0400, Jeff Layton wrote:
>> On Fri, 2016-07-08 at 11:14 -0400, Oleg Drokin wrote:
>>> On Jul 8, 2016, at 7:02 AM, Jeff Layton wrote:
>>> 
>>>> On Thu, 2016-07-07 at 21:47 -0400, Oleg Drokin wrote:
>>>>> It looks like we are bit overzealous about failing mkdir/create/mknod
>>>>> with permission denied if the parent dir is not writeable.
>>>>> Need to make sure the name does not exist first, because we need to
>>>>> return EEXIST in that case.
>>>>> 
>>>>> Signed-off-by: Oleg Drokin <green@linuxhacker.ru>
>>>>> ---
>>>>> A very similar problem exists with symlinks, but the patch is more
>>>>> involved, so assuming this one is ok, I'll send a symlink one separately.
>>>>>  fs/nfsd/nfs4proc.c |  6 +++++-
>>>>>  fs/nfsd/vfs.c      | 11 ++++++++++-
>>>>>  2 files changed, 15 insertions(+), 2 deletions(-)
>>>>> 
>>>> 
>>>> 
>>>> nit: subject says EPERM, but I think you mean EACCES. The mnemonic I've
>>>> always used is that EPERM is "permanent". IOW, changing permissions
>>>> won't ever allow the user to do something. For instance, unprivileged
>>>> users can never chown a file, so they should get back EPERM there. When
>>>> a directory isn't writeable on a create they should get EACCES since
>>>> they could do the create if the directory were writeable.
>>> 
>>> Hm, I see, thanks.
>>> Confusing that you get "Permission denied" from perror ;)
>>> 
>> 
>> Yes indeed. It's a subtle and confusing distinction.
>> 
>>>>> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
>>>>> index de1ff1d..0067520 100644
>>>>> --- a/fs/nfsd/nfs4proc.c
>>>>> +++ b/fs/nfsd/nfs4proc.c
>>>>> @@ -605,8 +605,12 @@ nfsd4_create(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
>>>>>  
>>>>>  	fh_init(&resfh, NFS4_FHSIZE);
>>>>>  
>>>>> +	/*
>>>>> +	 * We just check thta parent is accessible here, nfsd_* do their
>>>>> +	 * own access permission checks
>>>>> +	 */
>>>>>  	status = fh_verify(rqstp, &cstate->current_fh, S_IFDIR,
>>>>> -			   NFSD_MAY_CREATE);
>>>>> +			   NFSD_MAY_EXEC);
>>>>>  	if (status)
>>>>>  		return status;
>>>>>  
>>>>> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
>>>>> index 6fbd81e..6a45ec6 100644
>>>>> --- a/fs/nfsd/vfs.c
>>>>> +++ b/fs/nfsd/vfs.c
>>>>> @@ -1161,7 +1161,11 @@ nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
>>>>>  	if (isdotent(fname, flen))
>>>>>  		goto out;
>>>>>  
>>>>> -	err = fh_verify(rqstp, fhp, S_IFDIR, NFSD_MAY_CREATE);
>>>>> +	/*
>>>>> +	 * Even though it is a create, first we see if we are even allowed
>>>>> +	 * to peek inside the parent
>>>>> +	 */
>>>>> +	err = fh_verify(rqstp, fhp, S_IFDIR, NFSD_MAY_EXEC);
>>>>>  	if (err)
>>>>>  		goto out;
>>>>>  
>>>>> @@ -1211,6 +1215,11 @@ nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
>>>>>  		goto out; 
>>>>>  	}
>>>>>  
>>>>> +	/* Now let's see if we actually have permissions to create */
>>>>> +	err = nfsd_permission(rqstp, fhp->fh_export, dentry, NFSD_MAY_CREATE);
>>>>> +	if (err)
>>>>> +		goto out;
>>>>> +
>>>>>  	if (!(iap->ia_valid & ATTR_MODE))
>>>>>  		iap->ia_mode = 0;
>>>>>  	iap->ia_mode = (iap->ia_mode & S_IALLUGO) | type;
>>>> 
>>>> 
>>>> Ouch. This means two nfsd_permission calls per create operation. If
>>>> it's necessary for correctness then so be it, but is it actually
>>>> documented anywhere (POSIX perhaps?) that we must prefer EEXIST over
>>>> EACCES in this situation?
>>> 
>>> Opengroup manpage: http://pubs.opengroup.org/onlinepubs/009695399/functions/mkdir.html
>>> newer version is here:
>>> http://pubs.opengroup.org/onlinepubs/9699919799/
>>> 
>>> They tell us that we absolutely must fail with EEXIST if the name is a symlink
>>> (so we need to lookup it anyway), and also that EEXIST is the failure code
>>> if the path exists.
>>> 
>> 
>> I'm not sure that that verbiage supersedes the fact that you don't have
>> write permissions on the directory. Does it?
>> 
>> ISTM that it's perfectly valid to shortcut looking up the dentry if the
>> user doesn't have write permissions on the directory, even when the
>> target is a symlink.
>> 
>> IOW, I'm not sure I see a bug here.
> 
> If this is causing real programs to behave incorrectly, then that may
> matter more than the letter of the spec.  But I'm a little curious why
> we'd be hearing about that just now--did the client or server's behavior
> change recently?

We, on the Lustre side, have been hearing about this since 2010, (this optimization
was enabled in 2009).

I suspect some people just complain in places that not everybody monitors.
I tried 3.10 and it has the same problem here.
I just tried on RHEL6 (2.6.32) and the problem is also apparent there.

Also it's confusing how you get different errors depending on if the cache is hot or not:
[green@centos6-16 racer]$ mkdir test
mkdir: cannot create directory `test': Permission denied
[green@centos6-16 racer]$ ls -ld test
drwxr-xr-x 2 root root 4096 Jul  8 12:12 test
[green@centos6-16 racer]$ mkdir test
mkdir: cannot create directory `test': File exists


>>> Are double permission checks really as bad for nfs? it looked like it would
>>> call mostly into VFS so even if first call would be expensive, second call should
>>> be really cheap?
>>> 
>> 
>> It depends on the underlying fs. In most cases, you're right, but you
>> can export things that overload the ->permission op, and those can be
>> as expensive as they like (within reason of course).
> 
> Weird if the expense of a second permission call is significant compared
> to following the mkdir and sync.  But, what do I know.
> 
> --b.

--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jeff Layton July 8, 2016, 4:17 p.m. UTC | #7
On Fri, 2016-07-08 at 11:59 -0400, Oleg Drokin wrote:
> On Jul 8, 2016, at 11:53 AM, Jeff Layton wrote:
> 
> > On Fri, 2016-07-08 at 11:14 -0400, Oleg Drokin wrote:
> > > On Jul 8, 2016, at 7:02 AM, Jeff Layton wrote:
> > > 
> > > > On Thu, 2016-07-07 at 21:47 -0400, Oleg Drokin wrote:
> > > > > It looks like we are bit overzealous about failing mkdir/create/mknod
> > > > > with permission denied if the parent dir is not writeable.
> > > > > Need to make sure the name does not exist first, because we need to
> > > > > return EEXIST in that case.
> > > > > 
> > > > > Signed-off-by: Oleg Drokin <green@linuxhacker.ru>
> > > > > ---
> > > > > A very similar problem exists with symlinks, but the patch is more
> > > > > involved, so assuming this one is ok, I'll send a symlink one separately.
> > > > >  fs/nfsd/nfs4proc.c |  6 +++++-
> > > > >  fs/nfsd/vfs.c      | 11 ++++++++++-
> > > > >  2 files changed, 15 insertions(+), 2 deletions(-)
> > > > > 
> > > > 
> > > > 
> > > > nit: subject says EPERM, but I think you mean EACCES. The mnemonic I've
> > > > always used is that EPERM is "permanent". IOW, changing permissions
> > > > won't ever allow the user to do something. For instance, unprivileged
> > > > users can never chown a file, so they should get back EPERM there. When
> > > > a directory isn't writeable on a create they should get EACCES since
> > > > they could do the create if the directory were writeable.
> > > 
> > > Hm, I see, thanks.
> > > Confusing that you get "Permission denied" from perror ;)
> > > 
> > 
> > Yes indeed. It's a subtle and confusing distinction.
> > 
> > > > > diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
> > > > > index de1ff1d..0067520 100644
> > > > > --- a/fs/nfsd/nfs4proc.c
> > > > > +++ b/fs/nfsd/nfs4proc.c
> > > > > @@ -605,8 +605,12 @@ nfsd4_create(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
> > > > >  
> > > > >  	fh_init(&resfh, NFS4_FHSIZE);
> > > > >  
> > > > > +	/*
> > > > > +	 * We just check thta parent is accessible here, nfsd_* do their
> > > > > +	 * own access permission checks
> > > > > +	 */
> > > > >  	status = fh_verify(rqstp, &cstate->current_fh, S_IFDIR,
> > > > > -			   NFSD_MAY_CREATE);
> > > > > +			   NFSD_MAY_EXEC);
> > > > >  	if (status)
> > > > >  		return status;
> > > > >  
> > > > > diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> > > > > index 6fbd81e..6a45ec6 100644
> > > > > --- a/fs/nfsd/vfs.c
> > > > > +++ b/fs/nfsd/vfs.c
> > > > > @@ -1161,7 +1161,11 @@ nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
> > > > >  	if (isdotent(fname, flen))
> > > > >  		goto out;
> > > > >  
> > > > > -	err = fh_verify(rqstp, fhp, S_IFDIR, NFSD_MAY_CREATE);
> > > > > +	/*
> > > > > +	 * Even though it is a create, first we see if we are even allowed
> > > > > +	 * to peek inside the parent
> > > > > +	 */
> > > > > +	err = fh_verify(rqstp, fhp, S_IFDIR, NFSD_MAY_EXEC);
> > > > >  	if (err)
> > > > >  		goto out;
> > > > >  
> > > > > @@ -1211,6 +1215,11 @@ nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
> > > > >  		goto out; 
> > > > >  	}
> > > > >  
> > > > > +	/* Now let's see if we actually have permissions to create */
> > > > > +	err = nfsd_permission(rqstp, fhp->fh_export, dentry, NFSD_MAY_CREATE);
> > > > > +	if (err)
> > > > > +		goto out;
> > > > > +
> > > > >  	if (!(iap->ia_valid & ATTR_MODE))
> > > > >  		iap->ia_mode = 0;
> > > > >  	iap->ia_mode = (iap->ia_mode & S_IALLUGO) | type;
> > > > 
> > > > 
> > > > Ouch. This means two nfsd_permission calls per create operation. If
> > > > it's necessary for correctness then so be it, but is it actually
> > > > documented anywhere (POSIX perhaps?) that we must prefer EEXIST over
> > > > EACCES in this situation?
> > > 
> > > Opengroup manpage: http://pubs.opengroup.org/onlinepubs/009695399/functions/mkdir.html
> > > newer version is here:
> > > http://pubs.opengroup.org/onlinepubs/9699919799/
> > > 
> > > They tell us that we absolutely must fail with EEXIST if the name is a symlink
> > > (so we need to lookup it anyway), and also that EEXIST is the failure code
> > > if the path exists.
> > > 
> > 
> > I'm not sure that that verbiage supersedes the fact that you don't have
> > write permissions on the directory. Does it?
> 
> "If path names a symbolic link, mkdir() shall fail and set errno to [EEXIST]."
> 
> This sounds pretty straightforward to me, no?
> Since it does not matter that we do not have write permissions here, because
> the name already exists.
> 
> (also there are tons of applications that make this assumption when
> badly reimplementing their mkdir -p thing, I imagine they also have this same
> reading of the man page - this is why I even care about it).
> 

I always have trouble with this sort of thing. Just because it's in
DESCRIPTION, does that make it supersede the part in ERRORS? IOW, I
think that's just telling you how to handle a symlink as the last
component, not that you have to do that before the permissions check.

Now that said, as a practical matter I do agree that EEXIST _is_
probably the more helpful error message. If there are applications that
rely on this then we probably should just take your patch.

Reviewed-by: Jeff Layton <jlayton@poochiereds.net>
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Oleg Drokin July 8, 2016, 4:28 p.m. UTC | #8
On Jul 8, 2016, at 12:17 PM, Jeff Layton wrote:

> On Fri, 2016-07-08 at 11:59 -0400, Oleg Drokin wrote:
>> On Jul 8, 2016, at 11:53 AM, Jeff Layton wrote:
>> 
>>> On Fri, 2016-07-08 at 11:14 -0400, Oleg Drokin wrote:
>>>> On Jul 8, 2016, at 7:02 AM, Jeff Layton wrote:
>>>> 
>>>>> On Thu, 2016-07-07 at 21:47 -0400, Oleg Drokin wrote:
>>>>>> It looks like we are bit overzealous about failing mkdir/create/mknod
>>>>>> with permission denied if the parent dir is not writeable.
>>>>>> Need to make sure the name does not exist first, because we need to
>>>>>> return EEXIST in that case.
>>>>>> 
>>>>>> Signed-off-by: Oleg Drokin <green@linuxhacker.ru>
>>>>>> ---
>>>>>> A very similar problem exists with symlinks, but the patch is more
>>>>>> involved, so assuming this one is ok, I'll send a symlink one separately.
>>>>>>  fs/nfsd/nfs4proc.c |  6 +++++-
>>>>>>  fs/nfsd/vfs.c      | 11 ++++++++++-
>>>>>>  2 files changed, 15 insertions(+), 2 deletions(-)
>>>>>> 
>>>>> 
>>>>> 
>>>>> nit: subject says EPERM, but I think you mean EACCES. The mnemonic I've
>>>>> always used is that EPERM is "permanent". IOW, changing permissions
>>>>> won't ever allow the user to do something. For instance, unprivileged
>>>>> users can never chown a file, so they should get back EPERM there. When
>>>>> a directory isn't writeable on a create they should get EACCES since
>>>>> they could do the create if the directory were writeable.
>>>> 
>>>> Hm, I see, thanks.
>>>> Confusing that you get "Permission denied" from perror ;)
>>>> 
>>> 
>>> Yes indeed. It's a subtle and confusing distinction.
>>> 
>>>>>> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
>>>>>> index de1ff1d..0067520 100644
>>>>>> --- a/fs/nfsd/nfs4proc.c
>>>>>> +++ b/fs/nfsd/nfs4proc.c
>>>>>> @@ -605,8 +605,12 @@ nfsd4_create(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
>>>>>>  
>>>>>>  	fh_init(&resfh, NFS4_FHSIZE);
>>>>>>  
>>>>>> +	/*
>>>>>> +	 * We just check thta parent is accessible here, nfsd_* do their
>>>>>> +	 * own access permission checks
>>>>>> +	 */
>>>>>>  	status = fh_verify(rqstp, &cstate->current_fh, S_IFDIR,
>>>>>> -			   NFSD_MAY_CREATE);
>>>>>> +			   NFSD_MAY_EXEC);
>>>>>>  	if (status)
>>>>>>  		return status;
>>>>>>  
>>>>>> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
>>>>>> index 6fbd81e..6a45ec6 100644
>>>>>> --- a/fs/nfsd/vfs.c
>>>>>> +++ b/fs/nfsd/vfs.c
>>>>>> @@ -1161,7 +1161,11 @@ nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
>>>>>>  	if (isdotent(fname, flen))
>>>>>>  		goto out;
>>>>>>  
>>>>>> -	err = fh_verify(rqstp, fhp, S_IFDIR, NFSD_MAY_CREATE);
>>>>>> +	/*
>>>>>> +	 * Even though it is a create, first we see if we are even allowed
>>>>>> +	 * to peek inside the parent
>>>>>> +	 */
>>>>>> +	err = fh_verify(rqstp, fhp, S_IFDIR, NFSD_MAY_EXEC);
>>>>>>  	if (err)
>>>>>>  		goto out;
>>>>>>  
>>>>>> @@ -1211,6 +1215,11 @@ nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
>>>>>>  		goto out; 
>>>>>>  	}
>>>>>>  
>>>>>> +	/* Now let's see if we actually have permissions to create */
>>>>>> +	err = nfsd_permission(rqstp, fhp->fh_export, dentry, NFSD_MAY_CREATE);
>>>>>> +	if (err)
>>>>>> +		goto out;
>>>>>> +
>>>>>>  	if (!(iap->ia_valid & ATTR_MODE))
>>>>>>  		iap->ia_mode = 0;
>>>>>>  	iap->ia_mode = (iap->ia_mode & S_IALLUGO) | type;
>>>>> 
>>>>> 
>>>>> Ouch. This means two nfsd_permission calls per create operation. If
>>>>> it's necessary for correctness then so be it, but is it actually
>>>>> documented anywhere (POSIX perhaps?) that we must prefer EEXIST over
>>>>> EACCES in this situation?
>>>> 
>>>> Opengroup manpage: http://pubs.opengroup.org/onlinepubs/009695399/functions/mkdir.html
>>>> newer version is here:
>>>> http://pubs.opengroup.org/onlinepubs/9699919799/
>>>> 
>>>> They tell us that we absolutely must fail with EEXIST if the name is a symlink
>>>> (so we need to lookup it anyway), and also that EEXIST is the failure code
>>>> if the path exists.
>>>> 
>>> 
>>> I'm not sure that that verbiage supersedes the fact that you don't have
>>> write permissions on the directory. Does it?
>> 
>> "If path names a symbolic link, mkdir() shall fail and set errno to [EEXIST]."
>> 
>> This sounds pretty straightforward to me, no?
>> Since it does not matter that we do not have write permissions here, because
>> the name already exists.
>> 
>> (also there are tons of applications that make this assumption when
>> badly reimplementing their mkdir -p thing, I imagine they also have this same
>> reading of the man page - this is why I even care about it).
>> 
> 
> I always have trouble with this sort of thing. Just because it's in
> DESCRIPTION, does that make it supersede the part in ERRORS? IOW, I
> think that's just telling you how to handle a symlink as the last
> component, not that you have to do that before the permissions check.

Personally I think of it like open(O_CREAT).
If the file exists, you just open it even if you have no permissions to
add stuff in the parent.

Same with mkdir/mknod - if the name already exists - you don't need any
write rights to create it - it's already there.

symlink was likely just special cased to highlight that it should not
just be followed (unlike what happens with open).

Also VFS even goes to some pains to prioritize returning EEXIST over
other errors like EROFS and such (something nfs (both server and client?)/lustre
does not, but I suspect it's a lot more fringe case?).

> Now that said, as a practical matter I do agree that EEXIST _is_
> probably the more helpful error message. If there are applications that
> rely on this then we probably should just take your patch.
> 
> Reviewed-by: Jeff Layton <jlayton@poochiereds.net>

--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
J. Bruce Fields July 8, 2016, 8:49 p.m. UTC | #9
On Fri, Jul 08, 2016 at 12:16:14PM -0400, Oleg Drokin wrote:
> 
> On Jul 8, 2016, at 12:04 PM, J. Bruce Fields wrote:
> 
> > On Fri, Jul 08, 2016 at 11:53:28AM -0400, Jeff Layton wrote:
> >> On Fri, 2016-07-08 at 11:14 -0400, Oleg Drokin wrote:
> >>> On Jul 8, 2016, at 7:02 AM, Jeff Layton wrote:
> >>> 
> >>>> On Thu, 2016-07-07 at 21:47 -0400, Oleg Drokin wrote:
> >>>>> It looks like we are bit overzealous about failing mkdir/create/mknod
> >>>>> with permission denied if the parent dir is not writeable.
> >>>>> Need to make sure the name does not exist first, because we need to
> >>>>> return EEXIST in that case.
> >>>>> 
> >>>>> Signed-off-by: Oleg Drokin <green@linuxhacker.ru>
> >>>>> ---
> >>>>> A very similar problem exists with symlinks, but the patch is more
> >>>>> involved, so assuming this one is ok, I'll send a symlink one separately.
> >>>>>  fs/nfsd/nfs4proc.c |  6 +++++-
> >>>>>  fs/nfsd/vfs.c      | 11 ++++++++++-
> >>>>>  2 files changed, 15 insertions(+), 2 deletions(-)
> >>>>> 
> >>>> 
> >>>> 
> >>>> nit: subject says EPERM, but I think you mean EACCES. The mnemonic I've
> >>>> always used is that EPERM is "permanent". IOW, changing permissions
> >>>> won't ever allow the user to do something. For instance, unprivileged
> >>>> users can never chown a file, so they should get back EPERM there. When
> >>>> a directory isn't writeable on a create they should get EACCES since
> >>>> they could do the create if the directory were writeable.
> >>> 
> >>> Hm, I see, thanks.
> >>> Confusing that you get "Permission denied" from perror ;)
> >>> 
> >> 
> >> Yes indeed. It's a subtle and confusing distinction.
> >> 
> >>>>> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
> >>>>> index de1ff1d..0067520 100644
> >>>>> --- a/fs/nfsd/nfs4proc.c
> >>>>> +++ b/fs/nfsd/nfs4proc.c
> >>>>> @@ -605,8 +605,12 @@ nfsd4_create(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
> >>>>>  
> >>>>>  	fh_init(&resfh, NFS4_FHSIZE);
> >>>>>  
> >>>>> +	/*
> >>>>> +	 * We just check thta parent is accessible here, nfsd_* do their
> >>>>> +	 * own access permission checks
> >>>>> +	 */
> >>>>>  	status = fh_verify(rqstp, &cstate->current_fh, S_IFDIR,
> >>>>> -			   NFSD_MAY_CREATE);
> >>>>> +			   NFSD_MAY_EXEC);
> >>>>>  	if (status)
> >>>>>  		return status;
> >>>>>  
> >>>>> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> >>>>> index 6fbd81e..6a45ec6 100644
> >>>>> --- a/fs/nfsd/vfs.c
> >>>>> +++ b/fs/nfsd/vfs.c
> >>>>> @@ -1161,7 +1161,11 @@ nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
> >>>>>  	if (isdotent(fname, flen))
> >>>>>  		goto out;
> >>>>>  
> >>>>> -	err = fh_verify(rqstp, fhp, S_IFDIR, NFSD_MAY_CREATE);
> >>>>> +	/*
> >>>>> +	 * Even though it is a create, first we see if we are even allowed
> >>>>> +	 * to peek inside the parent
> >>>>> +	 */
> >>>>> +	err = fh_verify(rqstp, fhp, S_IFDIR, NFSD_MAY_EXEC);
> >>>>>  	if (err)
> >>>>>  		goto out;
> >>>>>  
> >>>>> @@ -1211,6 +1215,11 @@ nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
> >>>>>  		goto out; 
> >>>>>  	}
> >>>>>  
> >>>>> +	/* Now let's see if we actually have permissions to create */
> >>>>> +	err = nfsd_permission(rqstp, fhp->fh_export, dentry, NFSD_MAY_CREATE);
> >>>>> +	if (err)
> >>>>> +		goto out;
> >>>>> +
> >>>>>  	if (!(iap->ia_valid & ATTR_MODE))
> >>>>>  		iap->ia_mode = 0;
> >>>>>  	iap->ia_mode = (iap->ia_mode & S_IALLUGO) | type;
> >>>> 
> >>>> 
> >>>> Ouch. This means two nfsd_permission calls per create operation. If
> >>>> it's necessary for correctness then so be it, but is it actually
> >>>> documented anywhere (POSIX perhaps?) that we must prefer EEXIST over
> >>>> EACCES in this situation?
> >>> 
> >>> Opengroup manpage: http://pubs.opengroup.org/onlinepubs/009695399/functions/mkdir.html
> >>> newer version is here:
> >>> http://pubs.opengroup.org/onlinepubs/9699919799/
> >>> 
> >>> They tell us that we absolutely must fail with EEXIST if the name is a symlink
> >>> (so we need to lookup it anyway), and also that EEXIST is the failure code
> >>> if the path exists.
> >>> 
> >> 
> >> I'm not sure that that verbiage supersedes the fact that you don't have
> >> write permissions on the directory. Does it?
> >> 
> >> ISTM that it's perfectly valid to shortcut looking up the dentry if the
> >> user doesn't have write permissions on the directory, even when the
> >> target is a symlink.
> >> 
> >> IOW, I'm not sure I see a bug here.
> > 
> > If this is causing real programs to behave incorrectly, then that may
> > matter more than the letter of the spec.  But I'm a little curious why
> > we'd be hearing about that just now--did the client or server's behavior
> > change recently?
> 
> We, on the Lustre side, have been hearing about this since 2010, (this optimization
> was enabled in 2009).
> 
> I suspect some people just complain in places that not everybody monitors.

Sure, but you said "tons of programs" do this, and off hand I can't
recall a single report.  That's weird.

Anyway, I agree that the behavior your want seems more consistent at
least.

--b.

> I tried 3.10 and it has the same problem here.
> I just tried on RHEL6 (2.6.32) and the problem is also apparent there.
> 
> Also it's confusing how you get different errors depending on if the cache is hot or not:
> [green@centos6-16 racer]$ mkdir test
> mkdir: cannot create directory `test': Permission denied
> [green@centos6-16 racer]$ ls -ld test
> drwxr-xr-x 2 root root 4096 Jul  8 12:12 test
> [green@centos6-16 racer]$ mkdir test
> mkdir: cannot create directory `test': File exists
> 
> 
> >>> Are double permission checks really as bad for nfs? it looked like it would
> >>> call mostly into VFS so even if first call would be expensive, second call should
> >>> be really cheap?
> >>> 
> >> 
> >> It depends on the underlying fs. In most cases, you're right, but you
> >> can export things that overload the ->permission op, and those can be
> >> as expensive as they like (within reason of course).
> > 
> > Weird if the expense of a second permission call is significant compared
> > to following the mkdir and sync.  But, what do I know.
> > 
> > --b.
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
J. Bruce Fields July 8, 2016, 8:54 p.m. UTC | #10
On Thu, Jul 07, 2016 at 09:47:46PM -0400, Oleg Drokin wrote:
> It looks like we are bit overzealous about failing mkdir/create/mknod
> with permission denied if the parent dir is not writeable.
> Need to make sure the name does not exist first, because we need to
> return EEXIST in that case.
> 
> Signed-off-by: Oleg Drokin <green@linuxhacker.ru>
> ---
> A very similar problem exists with symlinks, but the patch is more
> involved, so assuming this one is ok, I'll send a symlink one separately.
>  fs/nfsd/nfs4proc.c |  6 +++++-
>  fs/nfsd/vfs.c      | 11 ++++++++++-
>  2 files changed, 15 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
> index de1ff1d..0067520 100644
> --- a/fs/nfsd/nfs4proc.c
> +++ b/fs/nfsd/nfs4proc.c
> @@ -605,8 +605,12 @@ nfsd4_create(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
>  
>  	fh_init(&resfh, NFS4_FHSIZE);
>  
> +	/*
> +	 * We just check thta parent is accessible here, nfsd_* do their
> +	 * own access permission checks
> +	 */
>  	status = fh_verify(rqstp, &cstate->current_fh, S_IFDIR,
> -			   NFSD_MAY_CREATE);
> +			   NFSD_MAY_EXEC);
>  	if (status)
>  		return status;
>  
> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> index 6fbd81e..6a45ec6 100644
> --- a/fs/nfsd/vfs.c
> +++ b/fs/nfsd/vfs.c
> @@ -1161,7 +1161,11 @@ nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
>  	if (isdotent(fname, flen))
>  		goto out;
>  
> -	err = fh_verify(rqstp, fhp, S_IFDIR, NFSD_MAY_CREATE);
> +	/*
> +	 * Even though it is a create, first we see if we are even allowed
> +	 * to peek inside the parent
> +	 */
> +	err = fh_verify(rqstp, fhp, S_IFDIR, NFSD_MAY_EXEC);

Looks like in the v3 case we haven't actually locked the directory yet
at this point so this check is a little race-prone.

I wonder why the code's structured that way--it's confusing.

--b.

>  	if (err)
>  		goto out;
>  
> @@ -1211,6 +1215,11 @@ nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
>  		goto out; 
>  	}
>  
> +	/* Now let's see if we actually have permissions to create */
> +	err = nfsd_permission(rqstp, fhp->fh_export, dentry, NFSD_MAY_CREATE);
> +	if (err)
> +		goto out;
> +
>  	if (!(iap->ia_valid & ATTR_MODE))
>  		iap->ia_mode = 0;
>  	iap->ia_mode = (iap->ia_mode & S_IALLUGO) | type;
> -- 
> 2.7.4
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Oleg Drokin July 8, 2016, 9:47 p.m. UTC | #11
On Jul 8, 2016, at 4:49 PM, J. Bruce Fields wrote:

> On Fri, Jul 08, 2016 at 12:16:14PM -0400, Oleg Drokin wrote:
>> 
>> On Jul 8, 2016, at 12:04 PM, J. Bruce Fields wrote:
>> 
>>> On Fri, Jul 08, 2016 at 11:53:28AM -0400, Jeff Layton wrote:
>>>> On Fri, 2016-07-08 at 11:14 -0400, Oleg Drokin wrote:
>>>>> On Jul 8, 2016, at 7:02 AM, Jeff Layton wrote:
>>>>> 
>>>>>> On Thu, 2016-07-07 at 21:47 -0400, Oleg Drokin wrote:
>>>>>>> It looks like we are bit overzealous about failing mkdir/create/mknod
>>>>>>> with permission denied if the parent dir is not writeable.
>>>>>>> Need to make sure the name does not exist first, because we need to
>>>>>>> return EEXIST in that case.
>>>>>>> 
>>>>>>> Signed-off-by: Oleg Drokin <green@linuxhacker.ru>
>>>>>>> ---
>>>>>>> A very similar problem exists with symlinks, but the patch is more
>>>>>>> involved, so assuming this one is ok, I'll send a symlink one separately.
>>>>>>> fs/nfsd/nfs4proc.c |  6 +++++-
>>>>>>> fs/nfsd/vfs.c      | 11 ++++++++++-
>>>>>>> 2 files changed, 15 insertions(+), 2 deletions(-)
>>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> nit: subject says EPERM, but I think you mean EACCES. The mnemonic I've
>>>>>> always used is that EPERM is "permanent". IOW, changing permissions
>>>>>> won't ever allow the user to do something. For instance, unprivileged
>>>>>> users can never chown a file, so they should get back EPERM there. When
>>>>>> a directory isn't writeable on a create they should get EACCES since
>>>>>> they could do the create if the directory were writeable.
>>>>> 
>>>>> Hm, I see, thanks.
>>>>> Confusing that you get "Permission denied" from perror ;)
>>>>> 
>>>> 
>>>> Yes indeed. It's a subtle and confusing distinction.
>>>> 
>>>>>>> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
>>>>>>> index de1ff1d..0067520 100644
>>>>>>> --- a/fs/nfsd/nfs4proc.c
>>>>>>> +++ b/fs/nfsd/nfs4proc.c
>>>>>>> @@ -605,8 +605,12 @@ nfsd4_create(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
>>>>>>> 
>>>>>>> 	fh_init(&resfh, NFS4_FHSIZE);
>>>>>>> 
>>>>>>> +	/*
>>>>>>> +	 * We just check thta parent is accessible here, nfsd_* do their
>>>>>>> +	 * own access permission checks
>>>>>>> +	 */
>>>>>>> 	status = fh_verify(rqstp, &cstate->current_fh, S_IFDIR,
>>>>>>> -			   NFSD_MAY_CREATE);
>>>>>>> +			   NFSD_MAY_EXEC);
>>>>>>> 	if (status)
>>>>>>> 		return status;
>>>>>>> 
>>>>>>> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
>>>>>>> index 6fbd81e..6a45ec6 100644
>>>>>>> --- a/fs/nfsd/vfs.c
>>>>>>> +++ b/fs/nfsd/vfs.c
>>>>>>> @@ -1161,7 +1161,11 @@ nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
>>>>>>> 	if (isdotent(fname, flen))
>>>>>>> 		goto out;
>>>>>>> 
>>>>>>> -	err = fh_verify(rqstp, fhp, S_IFDIR, NFSD_MAY_CREATE);
>>>>>>> +	/*
>>>>>>> +	 * Even though it is a create, first we see if we are even allowed
>>>>>>> +	 * to peek inside the parent
>>>>>>> +	 */
>>>>>>> +	err = fh_verify(rqstp, fhp, S_IFDIR, NFSD_MAY_EXEC);
>>>>>>> 	if (err)
>>>>>>> 		goto out;
>>>>>>> 
>>>>>>> @@ -1211,6 +1215,11 @@ nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
>>>>>>> 		goto out; 
>>>>>>> 	}
>>>>>>> 
>>>>>>> +	/* Now let's see if we actually have permissions to create */
>>>>>>> +	err = nfsd_permission(rqstp, fhp->fh_export, dentry, NFSD_MAY_CREATE);
>>>>>>> +	if (err)
>>>>>>> +		goto out;
>>>>>>> +
>>>>>>> 	if (!(iap->ia_valid & ATTR_MODE))
>>>>>>> 		iap->ia_mode = 0;
>>>>>>> 	iap->ia_mode = (iap->ia_mode & S_IALLUGO) | type;
>>>>>> 
>>>>>> 
>>>>>> Ouch. This means two nfsd_permission calls per create operation. If
>>>>>> it's necessary for correctness then so be it, but is it actually
>>>>>> documented anywhere (POSIX perhaps?) that we must prefer EEXIST over
>>>>>> EACCES in this situation?
>>>>> 
>>>>> Opengroup manpage: http://pubs.opengroup.org/onlinepubs/009695399/functions/mkdir.html
>>>>> newer version is here:
>>>>> http://pubs.opengroup.org/onlinepubs/9699919799/
>>>>> 
>>>>> They tell us that we absolutely must fail with EEXIST if the name is a symlink
>>>>> (so we need to lookup it anyway), and also that EEXIST is the failure code
>>>>> if the path exists.
>>>>> 
>>>> 
>>>> I'm not sure that that verbiage supersedes the fact that you don't have
>>>> write permissions on the directory. Does it?
>>>> 
>>>> ISTM that it's perfectly valid to shortcut looking up the dentry if the
>>>> user doesn't have write permissions on the directory, even when the
>>>> target is a symlink.
>>>> 
>>>> IOW, I'm not sure I see a bug here.
>>> 
>>> If this is causing real programs to behave incorrectly, then that may
>>> matter more than the letter of the spec.  But I'm a little curious why
>>> we'd be hearing about that just now--did the client or server's behavior
>>> change recently?
>> 
>> We, on the Lustre side, have been hearing about this since 2010, (this optimization
>> was enabled in 2009).
>> 
>> I suspect some people just complain in places that not everybody monitors.
> 
> Sure, but you said "tons of programs" do this, and off hand I can't
> recall a single report.  That's weird.

I wonder if people just accept that "NFS is just weird" and code in workarounds,
where as with Lustre we promise (almost) full POSIX compliance, and also came much later
so people are just seeing that "this does not work" and complain more loudly?

From the two ready examples, the torque job scheduler and Matlab are the two examples
of programs we got bugreports from, but they are not the only ones for sure.
There are some "custom code references too.
So I guess "tons of" is a relative measure.

> Anyway, I agree that the behavior your want seems more consistent at
> least.
> 
> --b.
> 
>> I tried 3.10 and it has the same problem here.
>> I just tried on RHEL6 (2.6.32) and the problem is also apparent there.
>> 
>> Also it's confusing how you get different errors depending on if the cache is hot or not:
>> [green@centos6-16 racer]$ mkdir test
>> mkdir: cannot create directory `test': Permission denied
>> [green@centos6-16 racer]$ ls -ld test
>> drwxr-xr-x 2 root root 4096 Jul  8 12:12 test
>> [green@centos6-16 racer]$ mkdir test
>> mkdir: cannot create directory `test': File exists
>> 
>> 
>>>>> Are double permission checks really as bad for nfs? it looked like it would
>>>>> call mostly into VFS so even if first call would be expensive, second call should
>>>>> be really cheap?
>>>>> 
>>>> 
>>>> It depends on the underlying fs. In most cases, you're right, but you
>>>> can export things that overload the ->permission op, and those can be
>>>> as expensive as they like (within reason of course).
>>> 
>>> Weird if the expense of a second permission call is significant compared
>>> to following the mkdir and sync.  But, what do I know.
>>> 
>>> --b.

--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Oleg Drokin July 8, 2016, 9:53 p.m. UTC | #12
On Jul 8, 2016, at 4:54 PM, J. Bruce Fields wrote:

> On Thu, Jul 07, 2016 at 09:47:46PM -0400, Oleg Drokin wrote:
>> It looks like we are bit overzealous about failing mkdir/create/mknod
>> with permission denied if the parent dir is not writeable.
>> Need to make sure the name does not exist first, because we need to
>> return EEXIST in that case.
>> 
>> Signed-off-by: Oleg Drokin <green@linuxhacker.ru>
>> ---
>> A very similar problem exists with symlinks, but the patch is more
>> involved, so assuming this one is ok, I'll send a symlink one separately.
>> fs/nfsd/nfs4proc.c |  6 +++++-
>> fs/nfsd/vfs.c      | 11 ++++++++++-
>> 2 files changed, 15 insertions(+), 2 deletions(-)
>> 
>> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
>> index de1ff1d..0067520 100644
>> --- a/fs/nfsd/nfs4proc.c
>> +++ b/fs/nfsd/nfs4proc.c
>> @@ -605,8 +605,12 @@ nfsd4_create(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
>> 
>> 	fh_init(&resfh, NFS4_FHSIZE);
>> 
>> +	/*
>> +	 * We just check thta parent is accessible here, nfsd_* do their
>> +	 * own access permission checks
>> +	 */
>> 	status = fh_verify(rqstp, &cstate->current_fh, S_IFDIR,
>> -			   NFSD_MAY_CREATE);
>> +			   NFSD_MAY_EXEC);
>> 	if (status)
>> 		return status;
>> 
>> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
>> index 6fbd81e..6a45ec6 100644
>> --- a/fs/nfsd/vfs.c
>> +++ b/fs/nfsd/vfs.c
>> @@ -1161,7 +1161,11 @@ nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
>> 	if (isdotent(fname, flen))
>> 		goto out;
>> 
>> -	err = fh_verify(rqstp, fhp, S_IFDIR, NFSD_MAY_CREATE);
>> +	/*
>> +	 * Even though it is a create, first we see if we are even allowed
>> +	 * to peek inside the parent
>> +	 */
>> +	err = fh_verify(rqstp, fhp, S_IFDIR, NFSD_MAY_EXEC);
> 
> Looks like in the v3 case we haven't actually locked the directory yet
> at this point so this check is a little race-prone.

In reality this check is not really needed, I suspect.
When we call vfs_create/mknod/mkdir later on, it has it's own permission check
anyway so if there was a race and somebody changed dir access in the middle,
there's going to be another check anyway and it would be caught.
Unless there's some weird server-side permission wiggling as well that makes it
ineffective, but I imagine that one cannot really change in a racy way?

> I wonder why the code's structured that way--it's confusing.

Probably years of accumulated "damage" ;)

> --b.
> 
>> 	if (err)
>> 		goto out;
>> 
>> @@ -1211,6 +1215,11 @@ nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
>> 		goto out; 
>> 	}
>> 
>> +	/* Now let's see if we actually have permissions to create */
>> +	err = nfsd_permission(rqstp, fhp->fh_export, dentry, NFSD_MAY_CREATE);
>> +	if (err)
>> +		goto out;
>> +
>> 	if (!(iap->ia_valid & ATTR_MODE))
>> 		iap->ia_mode = 0;
>> 	iap->ia_mode = (iap->ia_mode & S_IALLUGO) | type;
>> -- 
>> 2.7.4

--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Al Viro July 9, 2016, 2:52 a.m. UTC | #13
On Fri, Jul 08, 2016 at 11:59:50AM -0400, Oleg Drokin wrote:

> "If path names a symbolic link, mkdir() shall fail and set errno to [EEXIST]."
> 
> This sounds pretty straightforward to me, no?
> Since it does not matter that we do not have write permissions here, because
> the name already exists.

When more than one condition applies, we have every right to return any of
them.  POSIX does *NOT* specify the order of checks.  Never had.
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Oleg Drokin July 9, 2016, 2:58 a.m. UTC | #14
On Jul 8, 2016, at 10:52 PM, Al Viro wrote:

> On Fri, Jul 08, 2016 at 11:59:50AM -0400, Oleg Drokin wrote:
> 
>> "If path names a symbolic link, mkdir() shall fail and set errno to [EEXIST]."
>> 
>> This sounds pretty straightforward to me, no?
>> Since it does not matter that we do not have write permissions here, because
>> the name already exists.
> 
> When more than one condition applies, we have every right to return any of
> them.  POSIX does *NOT* specify the order of checks.  Never had.

Out of curiosity, why does filename_create() delay EROFS then?

--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Al Viro July 9, 2016, 3:10 a.m. UTC | #15
On Fri, Jul 08, 2016 at 05:47:22PM -0400, Oleg Drokin wrote:

> I wonder if people just accept that "NFS is just weird" and code in workarounds,
> where as with Lustre we promise (almost) full POSIX compliance, and also came much later
> so people are just seeing that "this does not work" and complain more loudly?

To quote POSIX: "If more than one error occurs in processing a function call,
any one of the possible errors may be returned, as the order of detection is
undefined." (from System Interfaces: General Information: 2.3 Error Numbers)

And regarding mkdir(2) it has
[EACCES]
    Search permission is denied on a component of the path prefix, or write
permission is denied on the parent directory of the directory to be created.
[EEXIST]
    The named file exists.
among the error conditions.  In situations when both apply, the implementation
is bloody well allowed to return either.  It might be nicer to return EEXIST
in such cases, for consistency sake (if another thread does stat() on the
pathname in question just as you are about to call mkdir(2), you will get
EEXIST without ever reaching permission(9), let alone ->mkdir() method), but
please do not bring POSIX compliance as an argument.  It's a QoI argument and
nothing beyond that.
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Al Viro July 9, 2016, 3:13 a.m. UTC | #16
On Fri, Jul 08, 2016 at 10:58:38PM -0400, Oleg Drokin wrote:

> > When more than one condition applies, we have every right to return any of
> > them.  POSIX does *NOT* specify the order of checks.  Never had.
> 
> Out of curiosity, why does filename_create() delay EROFS then?

QoI and historical behaviour...
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Oleg Drokin July 9, 2016, 3:41 a.m. UTC | #17
On Jul 8, 2016, at 11:10 PM, Al Viro wrote:

> On Fri, Jul 08, 2016 at 05:47:22PM -0400, Oleg Drokin wrote:
> 
>> I wonder if people just accept that "NFS is just weird" and code in workarounds,
>> where as with Lustre we promise (almost) full POSIX compliance, and also came much later
>> so people are just seeing that "this does not work" and complain more loudly?
> 
> To quote POSIX: "If more than one error occurs in processing a function call,
> any one of the possible errors may be returned, as the order of detection is
> undefined." (from System Interfaces: General Information: 2.3 Error Numbers)
> 
> And regarding mkdir(2) it has
> [EACCES]
>    Search permission is denied on a component of the path prefix, or write
> permission is denied on the parent directory of the directory to be created.
> [EEXIST]
>    The named file exists.
> among the error conditions.  In situations when both apply, the implementation
> is bloody well allowed to return either.  It might be nicer to return EEXIST
> in such cases, for consistency sake (if another thread does stat() on the
> pathname in question just as you are about to call mkdir(2), you will get
> EEXIST without ever reaching permission(9), let alone ->mkdir() method), but
> please do not bring POSIX compliance as an argument.  It's a QoI argument and
> nothing beyond that.

Ok, I see.
Thanks.

Bruce, do you want the patch resubmitted with a rewritten commit message,
or do you think it's best to just drop it them?

--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
J. Bruce Fields July 13, 2016, 7 p.m. UTC | #18
On Fri, Jul 08, 2016 at 11:41:41PM -0400, Oleg Drokin wrote:
> 
> On Jul 8, 2016, at 11:10 PM, Al Viro wrote:
> 
> > On Fri, Jul 08, 2016 at 05:47:22PM -0400, Oleg Drokin wrote:
> > 
> >> I wonder if people just accept that "NFS is just weird" and code in workarounds,
> >> where as with Lustre we promise (almost) full POSIX compliance, and also came much later
> >> so people are just seeing that "this does not work" and complain more loudly?
> > 
> > To quote POSIX: "If more than one error occurs in processing a function call,
> > any one of the possible errors may be returned, as the order of detection is
> > undefined." (from System Interfaces: General Information: 2.3 Error Numbers)
> > 
> > And regarding mkdir(2) it has
> > [EACCES]
> >    Search permission is denied on a component of the path prefix, or write
> > permission is denied on the parent directory of the directory to be created.
> > [EEXIST]
> >    The named file exists.
> > among the error conditions.  In situations when both apply, the implementation
> > is bloody well allowed to return either.  It might be nicer to return EEXIST
> > in such cases, for consistency sake (if another thread does stat() on the
> > pathname in question just as you are about to call mkdir(2), you will get
> > EEXIST without ever reaching permission(9), let alone ->mkdir() method), but
> > please do not bring POSIX compliance as an argument.  It's a QoI argument and
> > nothing beyond that.
> 
> Ok, I see.
> Thanks.
> 
> Bruce, do you want the patch resubmitted with a rewritten commit message,
> or do you think it's best to just drop it them?

Other things being equal I still agree with you that there'd be
advantages to being more consistent, so a changelog update would be
fine.

--b.
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
J. Bruce Fields July 21, 2016, 8:34 p.m. UTC | #19
On Fri, Jul 08, 2016 at 05:53:19PM -0400, Oleg Drokin wrote:
> 
> On Jul 8, 2016, at 4:54 PM, J. Bruce Fields wrote:
> 
> > On Thu, Jul 07, 2016 at 09:47:46PM -0400, Oleg Drokin wrote:
> >> It looks like we are bit overzealous about failing mkdir/create/mknod
> >> with permission denied if the parent dir is not writeable.
> >> Need to make sure the name does not exist first, because we need to
> >> return EEXIST in that case.
> >> 
> >> Signed-off-by: Oleg Drokin <green@linuxhacker.ru>
> >> ---
> >> A very similar problem exists with symlinks, but the patch is more
> >> involved, so assuming this one is ok, I'll send a symlink one separately.
> >> fs/nfsd/nfs4proc.c |  6 +++++-
> >> fs/nfsd/vfs.c      | 11 ++++++++++-
> >> 2 files changed, 15 insertions(+), 2 deletions(-)
> >> 
> >> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
> >> index de1ff1d..0067520 100644
> >> --- a/fs/nfsd/nfs4proc.c
> >> +++ b/fs/nfsd/nfs4proc.c
> >> @@ -605,8 +605,12 @@ nfsd4_create(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
> >> 
> >> 	fh_init(&resfh, NFS4_FHSIZE);
> >> 
> >> +	/*
> >> +	 * We just check thta parent is accessible here, nfsd_* do their
> >> +	 * own access permission checks
> >> +	 */
> >> 	status = fh_verify(rqstp, &cstate->current_fh, S_IFDIR,
> >> -			   NFSD_MAY_CREATE);
> >> +			   NFSD_MAY_EXEC);
> >> 	if (status)
> >> 		return status;
> >> 
> >> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> >> index 6fbd81e..6a45ec6 100644
> >> --- a/fs/nfsd/vfs.c
> >> +++ b/fs/nfsd/vfs.c
> >> @@ -1161,7 +1161,11 @@ nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
> >> 	if (isdotent(fname, flen))
> >> 		goto out;
> >> 
> >> -	err = fh_verify(rqstp, fhp, S_IFDIR, NFSD_MAY_CREATE);
> >> +	/*
> >> +	 * Even though it is a create, first we see if we are even allowed
> >> +	 * to peek inside the parent
> >> +	 */
> >> +	err = fh_verify(rqstp, fhp, S_IFDIR, NFSD_MAY_EXEC);
> > 
> > Looks like in the v3 case we haven't actually locked the directory yet
> > at this point so this check is a little race-prone.
> 
> In reality this check is not really needed, I suspect.
> When we call vfs_create/mknod/mkdir later on, it has it's own permission check
> anyway so if there was a race and somebody changed dir access in the middle,
> there's going to be another check anyway and it would be caught.
> Unless there's some weird server-side permission wiggling as well that makes it
> ineffective, but I imagine that one cannot really change in a racy way?

Yeah, I think I'll just change those NFSD_MAY_EXEC's to NFSD_MAY_NOP's.
We still need the fh_verify there since it's also what does the
filehandle->dentry translation, but we don't need permission checking
here yet.

Applying with that one change.  (And I'll followup with some additional
minor cleanup of the create code.)

--b.

> 
> > I wonder why the code's structured that way--it's confusing.
> 
> Probably years of accumulated "damage" ;)
> 
> > --b.
> > 
> >> 	if (err)
> >> 		goto out;
> >> 
> >> @@ -1211,6 +1215,11 @@ nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
> >> 		goto out; 
> >> 	}
> >> 
> >> +	/* Now let's see if we actually have permissions to create */
> >> +	err = nfsd_permission(rqstp, fhp->fh_export, dentry, NFSD_MAY_CREATE);
> >> +	if (err)
> >> +		goto out;
> >> +
> >> 	if (!(iap->ia_valid & ATTR_MODE))
> >> 		iap->ia_mode = 0;
> >> 	iap->ia_mode = (iap->ia_mode & S_IALLUGO) | type;
> >> -- 
> >> 2.7.4
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Oleg Drokin July 21, 2016, 8:37 p.m. UTC | #20
On Jul 21, 2016, at 4:34 PM, J. Bruce Fields wrote:

> On Fri, Jul 08, 2016 at 05:53:19PM -0400, Oleg Drokin wrote:
>> 
>> On Jul 8, 2016, at 4:54 PM, J. Bruce Fields wrote:
>> 
>>> On Thu, Jul 07, 2016 at 09:47:46PM -0400, Oleg Drokin wrote:
>>>> It looks like we are bit overzealous about failing mkdir/create/mknod
>>>> with permission denied if the parent dir is not writeable.
>>>> Need to make sure the name does not exist first, because we need to
>>>> return EEXIST in that case.
>>>> 
>>>> Signed-off-by: Oleg Drokin <green@linuxhacker.ru>
>>>> ---
>>>> A very similar problem exists with symlinks, but the patch is more
>>>> involved, so assuming this one is ok, I'll send a symlink one separately.
>>>> fs/nfsd/nfs4proc.c |  6 +++++-
>>>> fs/nfsd/vfs.c      | 11 ++++++++++-
>>>> 2 files changed, 15 insertions(+), 2 deletions(-)
>>>> 
>>>> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
>>>> index de1ff1d..0067520 100644
>>>> --- a/fs/nfsd/nfs4proc.c
>>>> +++ b/fs/nfsd/nfs4proc.c
>>>> @@ -605,8 +605,12 @@ nfsd4_create(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
>>>> 
>>>> 	fh_init(&resfh, NFS4_FHSIZE);
>>>> 
>>>> +	/*
>>>> +	 * We just check thta parent is accessible here, nfsd_* do their
>>>> +	 * own access permission checks
>>>> +	 */
>>>> 	status = fh_verify(rqstp, &cstate->current_fh, S_IFDIR,
>>>> -			   NFSD_MAY_CREATE);
>>>> +			   NFSD_MAY_EXEC);
>>>> 	if (status)
>>>> 		return status;
>>>> 
>>>> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
>>>> index 6fbd81e..6a45ec6 100644
>>>> --- a/fs/nfsd/vfs.c
>>>> +++ b/fs/nfsd/vfs.c
>>>> @@ -1161,7 +1161,11 @@ nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
>>>> 	if (isdotent(fname, flen))
>>>> 		goto out;
>>>> 
>>>> -	err = fh_verify(rqstp, fhp, S_IFDIR, NFSD_MAY_CREATE);
>>>> +	/*
>>>> +	 * Even though it is a create, first we see if we are even allowed
>>>> +	 * to peek inside the parent
>>>> +	 */
>>>> +	err = fh_verify(rqstp, fhp, S_IFDIR, NFSD_MAY_EXEC);
>>> 
>>> Looks like in the v3 case we haven't actually locked the directory yet
>>> at this point so this check is a little race-prone.
>> 
>> In reality this check is not really needed, I suspect.
>> When we call vfs_create/mknod/mkdir later on, it has it's own permission check
>> anyway so if there was a race and somebody changed dir access in the middle,
>> there's going to be another check anyway and it would be caught.
>> Unless there's some weird server-side permission wiggling as well that makes it
>> ineffective, but I imagine that one cannot really change in a racy way?
> 
> Yeah, I think I'll just change those NFSD_MAY_EXEC's to NFSD_MAY_NOP's.
> We still need the fh_verify there since it's also what does the
> filehandle->dentry translation, but we don't need permission checking
> here yet.

This will likely need an extra test to ensure that when you
do mkdir where you do not have exec permissions, you would get EACCES instead
of EEXIST, otherwise that would be information leakage, no?
Or do you think the second time we do nfsd_permission, that would be covered?

> Applying with that one change.  (And I'll followup with some additional
> minor cleanup of the create code.)
> 
> --b.
> 
>> 
>>> I wonder why the code's structured that way--it's confusing.
>> 
>> Probably years of accumulated "damage" ;)
>> 
>>> --b.
>>> 
>>>> 	if (err)
>>>> 		goto out;
>>>> 
>>>> @@ -1211,6 +1215,11 @@ nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
>>>> 		goto out; 
>>>> 	}
>>>> 
>>>> +	/* Now let's see if we actually have permissions to create */
>>>> +	err = nfsd_permission(rqstp, fhp->fh_export, dentry, NFSD_MAY_CREATE);
>>>> +	if (err)
>>>> +		goto out;
>>>> +
>>>> 	if (!(iap->ia_valid & ATTR_MODE))
>>>> 		iap->ia_mode = 0;
>>>> 	iap->ia_mode = (iap->ia_mode & S_IALLUGO) | type;
>>>> -- 
>>>> 2.7.4

--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
J. Bruce Fields July 22, 2016, 1:57 a.m. UTC | #21
On Thu, Jul 21, 2016 at 04:37:40PM -0400, Oleg Drokin wrote:
> 
> On Jul 21, 2016, at 4:34 PM, J. Bruce Fields wrote:
> 
> > On Fri, Jul 08, 2016 at 05:53:19PM -0400, Oleg Drokin wrote:
> >> 
> >> On Jul 8, 2016, at 4:54 PM, J. Bruce Fields wrote:
> >> 
> >>> On Thu, Jul 07, 2016 at 09:47:46PM -0400, Oleg Drokin wrote:
> >>>> It looks like we are bit overzealous about failing mkdir/create/mknod
> >>>> with permission denied if the parent dir is not writeable.
> >>>> Need to make sure the name does not exist first, because we need to
> >>>> return EEXIST in that case.
> >>>> 
> >>>> Signed-off-by: Oleg Drokin <green@linuxhacker.ru>
> >>>> ---
> >>>> A very similar problem exists with symlinks, but the patch is more
> >>>> involved, so assuming this one is ok, I'll send a symlink one separately.
> >>>> fs/nfsd/nfs4proc.c |  6 +++++-
> >>>> fs/nfsd/vfs.c      | 11 ++++++++++-
> >>>> 2 files changed, 15 insertions(+), 2 deletions(-)
> >>>> 
> >>>> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
> >>>> index de1ff1d..0067520 100644
> >>>> --- a/fs/nfsd/nfs4proc.c
> >>>> +++ b/fs/nfsd/nfs4proc.c
> >>>> @@ -605,8 +605,12 @@ nfsd4_create(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
> >>>> 
> >>>> 	fh_init(&resfh, NFS4_FHSIZE);
> >>>> 
> >>>> +	/*
> >>>> +	 * We just check thta parent is accessible here, nfsd_* do their
> >>>> +	 * own access permission checks
> >>>> +	 */
> >>>> 	status = fh_verify(rqstp, &cstate->current_fh, S_IFDIR,
> >>>> -			   NFSD_MAY_CREATE);
> >>>> +			   NFSD_MAY_EXEC);
> >>>> 	if (status)
> >>>> 		return status;
> >>>> 
> >>>> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> >>>> index 6fbd81e..6a45ec6 100644
> >>>> --- a/fs/nfsd/vfs.c
> >>>> +++ b/fs/nfsd/vfs.c
> >>>> @@ -1161,7 +1161,11 @@ nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
> >>>> 	if (isdotent(fname, flen))
> >>>> 		goto out;
> >>>> 
> >>>> -	err = fh_verify(rqstp, fhp, S_IFDIR, NFSD_MAY_CREATE);
> >>>> +	/*
> >>>> +	 * Even though it is a create, first we see if we are even allowed
> >>>> +	 * to peek inside the parent
> >>>> +	 */
> >>>> +	err = fh_verify(rqstp, fhp, S_IFDIR, NFSD_MAY_EXEC);
> >>> 
> >>> Looks like in the v3 case we haven't actually locked the directory yet
> >>> at this point so this check is a little race-prone.
> >> 
> >> In reality this check is not really needed, I suspect.
> >> When we call vfs_create/mknod/mkdir later on, it has it's own permission check
> >> anyway so if there was a race and somebody changed dir access in the middle,
> >> there's going to be another check anyway and it would be caught.
> >> Unless there's some weird server-side permission wiggling as well that makes it
> >> ineffective, but I imagine that one cannot really change in a racy way?
> > 
> > Yeah, I think I'll just change those NFSD_MAY_EXEC's to NFSD_MAY_NOP's.
> > We still need the fh_verify there since it's also what does the
> > filehandle->dentry translation, but we don't need permission checking
> > here yet.
> 
> This will likely need an extra test to ensure that when you
> do mkdir where you do not have exec permissions, you would get EACCES instead
> of EEXIST, otherwise that would be information leakage, no?
> Or do you think the second time we do nfsd_permission, that would be covered?

No, you're right, for some reason I thought that the check for a
positive inode didn't happen till later.  But actually the logic is
basically:

	lock inode
	lookup_one_len
	return nfserr_exist if looked up dentry is positive.
	check for create permission
	vfs_create

So, yes, the initial MAY_EXEC test's needed to prevent that information
leak.

That said... I wonder why it's done that way?  Seems to me we could just
tremove that nfserr_exist check and the vfs would handle it for us....
I'll try that.

--b.

> 
> > Applying with that one change.  (And I'll followup with some additional
> > minor cleanup of the create code.)
> > 
> > --b.
> > 
> >> 
> >>> I wonder why the code's structured that way--it's confusing.
> >> 
> >> Probably years of accumulated "damage" ;)
> >> 
> >>> --b.
> >>> 
> >>>> 	if (err)
> >>>> 		goto out;
> >>>> 
> >>>> @@ -1211,6 +1215,11 @@ nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
> >>>> 		goto out; 
> >>>> 	}
> >>>> 
> >>>> +	/* Now let's see if we actually have permissions to create */
> >>>> +	err = nfsd_permission(rqstp, fhp->fh_export, dentry, NFSD_MAY_CREATE);
> >>>> +	if (err)
> >>>> +		goto out;
> >>>> +
> >>>> 	if (!(iap->ia_valid & ATTR_MODE))
> >>>> 		iap->ia_mode = 0;
> >>>> 	iap->ia_mode = (iap->ia_mode & S_IALLUGO) | type;
> >>>> -- 
> >>>> 2.7.4
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Oleg Drokin July 22, 2016, 6:35 a.m. UTC | #22
On Jul 21, 2016, at 9:57 PM, J. Bruce Fields wrote:

> On Thu, Jul 21, 2016 at 04:37:40PM -0400, Oleg Drokin wrote:
>> 
>> On Jul 21, 2016, at 4:34 PM, J. Bruce Fields wrote:
>> 
>>> On Fri, Jul 08, 2016 at 05:53:19PM -0400, Oleg Drokin wrote:
>>>> 
>>>> On Jul 8, 2016, at 4:54 PM, J. Bruce Fields wrote:
>>>> 
>>>>> On Thu, Jul 07, 2016 at 09:47:46PM -0400, Oleg Drokin wrote:
>>>>>> It looks like we are bit overzealous about failing mkdir/create/mknod
>>>>>> with permission denied if the parent dir is not writeable.
>>>>>> Need to make sure the name does not exist first, because we need to
>>>>>> return EEXIST in that case.
>>>>>> 
>>>>>> Signed-off-by: Oleg Drokin <green@linuxhacker.ru>
>>>>>> ---
>>>>>> A very similar problem exists with symlinks, but the patch is more
>>>>>> involved, so assuming this one is ok, I'll send a symlink one separately.
>>>>>> fs/nfsd/nfs4proc.c |  6 +++++-
>>>>>> fs/nfsd/vfs.c      | 11 ++++++++++-
>>>>>> 2 files changed, 15 insertions(+), 2 deletions(-)
>>>>>> 
>>>>>> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
>>>>>> index de1ff1d..0067520 100644
>>>>>> --- a/fs/nfsd/nfs4proc.c
>>>>>> +++ b/fs/nfsd/nfs4proc.c
>>>>>> @@ -605,8 +605,12 @@ nfsd4_create(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
>>>>>> 
>>>>>> 	fh_init(&resfh, NFS4_FHSIZE);
>>>>>> 
>>>>>> +	/*
>>>>>> +	 * We just check thta parent is accessible here, nfsd_* do their
>>>>>> +	 * own access permission checks
>>>>>> +	 */
>>>>>> 	status = fh_verify(rqstp, &cstate->current_fh, S_IFDIR,
>>>>>> -			   NFSD_MAY_CREATE);
>>>>>> +			   NFSD_MAY_EXEC);
>>>>>> 	if (status)
>>>>>> 		return status;
>>>>>> 
>>>>>> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
>>>>>> index 6fbd81e..6a45ec6 100644
>>>>>> --- a/fs/nfsd/vfs.c
>>>>>> +++ b/fs/nfsd/vfs.c
>>>>>> @@ -1161,7 +1161,11 @@ nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
>>>>>> 	if (isdotent(fname, flen))
>>>>>> 		goto out;
>>>>>> 
>>>>>> -	err = fh_verify(rqstp, fhp, S_IFDIR, NFSD_MAY_CREATE);
>>>>>> +	/*
>>>>>> +	 * Even though it is a create, first we see if we are even allowed
>>>>>> +	 * to peek inside the parent
>>>>>> +	 */
>>>>>> +	err = fh_verify(rqstp, fhp, S_IFDIR, NFSD_MAY_EXEC);
>>>>> 
>>>>> Looks like in the v3 case we haven't actually locked the directory yet
>>>>> at this point so this check is a little race-prone.
>>>> 
>>>> In reality this check is not really needed, I suspect.
>>>> When we call vfs_create/mknod/mkdir later on, it has it's own permission check
>>>> anyway so if there was a race and somebody changed dir access in the middle,
>>>> there's going to be another check anyway and it would be caught.
>>>> Unless there's some weird server-side permission wiggling as well that makes it
>>>> ineffective, but I imagine that one cannot really change in a racy way?
>>> 
>>> Yeah, I think I'll just change those NFSD_MAY_EXEC's to NFSD_MAY_NOP's.
>>> We still need the fh_verify there since it's also what does the
>>> filehandle->dentry translation, but we don't need permission checking
>>> here yet.
>> 
>> This will likely need an extra test to ensure that when you
>> do mkdir where you do not have exec permissions, you would get EACCES instead
>> of EEXIST, otherwise that would be information leakage, no?
>> Or do you think the second time we do nfsd_permission, that would be covered?
> 
> No, you're right, for some reason I thought that the check for a
> positive inode didn't happen till later.  But actually the logic is
> basically:
> 
> 	lock inode
> 	lookup_one_len
> 	return nfserr_exist if looked up dentry is positive.
> 	check for create permission
> 	vfs_create
> 
> So, yes, the initial MAY_EXEC test's needed to prevent that information
> leak.
> 
> That said... I wonder why it's done that way?  Seems to me we could just
> tremove that nfserr_exist check and the vfs would handle it for us....
> I'll try that.

It won't work because the very first thing vfs_create does is may_create(),
and so you get EACCES right there instead of the EEXIST.

> 
> --b.
> 
>> 
>>> Applying with that one change.  (And I'll followup with some additional
>>> minor cleanup of the create code.)
>>> 
>>> --b.
>>> 
>>>> 
>>>>> I wonder why the code's structured that way--it's confusing.
>>>> 
>>>> Probably years of accumulated "damage" ;)
>>>> 
>>>>> --b.
>>>>> 
>>>>>> 	if (err)
>>>>>> 		goto out;
>>>>>> 
>>>>>> @@ -1211,6 +1215,11 @@ nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
>>>>>> 		goto out; 
>>>>>> 	}
>>>>>> 
>>>>>> +	/* Now let's see if we actually have permissions to create */
>>>>>> +	err = nfsd_permission(rqstp, fhp->fh_export, dentry, NFSD_MAY_CREATE);
>>>>>> +	if (err)
>>>>>> +		goto out;
>>>>>> +
>>>>>> 	if (!(iap->ia_valid & ATTR_MODE))
>>>>>> 		iap->ia_mode = 0;
>>>>>> 	iap->ia_mode = (iap->ia_mode & S_IALLUGO) | type;
>>>>>> -- 
>>>>>> 2.7.4

--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
J. Bruce Fields July 22, 2016, 10:55 a.m. UTC | #23
On Fri, Jul 22, 2016 at 02:35:26AM -0400, Oleg Drokin wrote:
> 
> On Jul 21, 2016, at 9:57 PM, J. Bruce Fields wrote:
> 
> > On Thu, Jul 21, 2016 at 04:37:40PM -0400, Oleg Drokin wrote:
> >> 
> >> On Jul 21, 2016, at 4:34 PM, J. Bruce Fields wrote:
> >> 
> >>> On Fri, Jul 08, 2016 at 05:53:19PM -0400, Oleg Drokin wrote:
> >>>> 
> >>>> On Jul 8, 2016, at 4:54 PM, J. Bruce Fields wrote:
> >>>> 
> >>>>> On Thu, Jul 07, 2016 at 09:47:46PM -0400, Oleg Drokin wrote:
> >>>>>> It looks like we are bit overzealous about failing mkdir/create/mknod
> >>>>>> with permission denied if the parent dir is not writeable.
> >>>>>> Need to make sure the name does not exist first, because we need to
> >>>>>> return EEXIST in that case.
> >>>>>> 
> >>>>>> Signed-off-by: Oleg Drokin <green@linuxhacker.ru>
> >>>>>> ---
> >>>>>> A very similar problem exists with symlinks, but the patch is more
> >>>>>> involved, so assuming this one is ok, I'll send a symlink one separately.
> >>>>>> fs/nfsd/nfs4proc.c |  6 +++++-
> >>>>>> fs/nfsd/vfs.c      | 11 ++++++++++-
> >>>>>> 2 files changed, 15 insertions(+), 2 deletions(-)
> >>>>>> 
> >>>>>> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
> >>>>>> index de1ff1d..0067520 100644
> >>>>>> --- a/fs/nfsd/nfs4proc.c
> >>>>>> +++ b/fs/nfsd/nfs4proc.c
> >>>>>> @@ -605,8 +605,12 @@ nfsd4_create(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
> >>>>>> 
> >>>>>> 	fh_init(&resfh, NFS4_FHSIZE);
> >>>>>> 
> >>>>>> +	/*
> >>>>>> +	 * We just check thta parent is accessible here, nfsd_* do their
> >>>>>> +	 * own access permission checks
> >>>>>> +	 */
> >>>>>> 	status = fh_verify(rqstp, &cstate->current_fh, S_IFDIR,
> >>>>>> -			   NFSD_MAY_CREATE);
> >>>>>> +			   NFSD_MAY_EXEC);
> >>>>>> 	if (status)
> >>>>>> 		return status;
> >>>>>> 
> >>>>>> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> >>>>>> index 6fbd81e..6a45ec6 100644
> >>>>>> --- a/fs/nfsd/vfs.c
> >>>>>> +++ b/fs/nfsd/vfs.c
> >>>>>> @@ -1161,7 +1161,11 @@ nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
> >>>>>> 	if (isdotent(fname, flen))
> >>>>>> 		goto out;
> >>>>>> 
> >>>>>> -	err = fh_verify(rqstp, fhp, S_IFDIR, NFSD_MAY_CREATE);
> >>>>>> +	/*
> >>>>>> +	 * Even though it is a create, first we see if we are even allowed
> >>>>>> +	 * to peek inside the parent
> >>>>>> +	 */
> >>>>>> +	err = fh_verify(rqstp, fhp, S_IFDIR, NFSD_MAY_EXEC);
> >>>>> 
> >>>>> Looks like in the v3 case we haven't actually locked the directory yet
> >>>>> at this point so this check is a little race-prone.
> >>>> 
> >>>> In reality this check is not really needed, I suspect.
> >>>> When we call vfs_create/mknod/mkdir later on, it has it's own permission check
> >>>> anyway so if there was a race and somebody changed dir access in the middle,
> >>>> there's going to be another check anyway and it would be caught.
> >>>> Unless there's some weird server-side permission wiggling as well that makes it
> >>>> ineffective, but I imagine that one cannot really change in a racy way?
> >>> 
> >>> Yeah, I think I'll just change those NFSD_MAY_EXEC's to NFSD_MAY_NOP's.
> >>> We still need the fh_verify there since it's also what does the
> >>> filehandle->dentry translation, but we don't need permission checking
> >>> here yet.
> >> 
> >> This will likely need an extra test to ensure that when you
> >> do mkdir where you do not have exec permissions, you would get EACCES instead
> >> of EEXIST, otherwise that would be information leakage, no?
> >> Or do you think the second time we do nfsd_permission, that would be covered?
> > 
> > No, you're right, for some reason I thought that the check for a
> > positive inode didn't happen till later.  But actually the logic is
> > basically:
> > 
> > 	lock inode
> > 	lookup_one_len
> > 	return nfserr_exist if looked up dentry is positive.
> > 	check for create permission
> > 	vfs_create
> > 
> > So, yes, the initial MAY_EXEC test's needed to prevent that information
> > leak.
> > 
> > That said... I wonder why it's done that way?  Seems to me we could just
> > tremove that nfserr_exist check and the vfs would handle it for us....
> > I'll try that.
> 
> It won't work because the very first thing vfs_create does is may_create(),
> and so you get EACCES right there instead of the EEXIST.

static inline int may_create(struct inode *dir, struct dentry *child)
{
        audit_inode_child(dir, child, AUDIT_TYPE_CHILD_CREATE);
        if (child->d_inode)
                return -EEXIST;
	...

So it looks OK to me.

--b.
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Oleg Drokin July 22, 2016, 3:13 p.m. UTC | #24
On Jul 22, 2016, at 6:55 AM, J. Bruce Fields wrote:

> On Fri, Jul 22, 2016 at 02:35:26AM -0400, Oleg Drokin wrote:
>> 
>> On Jul 21, 2016, at 9:57 PM, J. Bruce Fields wrote:
>> 
>>> On Thu, Jul 21, 2016 at 04:37:40PM -0400, Oleg Drokin wrote:
>>>> 
>>>> On Jul 21, 2016, at 4:34 PM, J. Bruce Fields wrote:
>>>> 
>>>>> On Fri, Jul 08, 2016 at 05:53:19PM -0400, Oleg Drokin wrote:
>>>>>> 
>>>>>> On Jul 8, 2016, at 4:54 PM, J. Bruce Fields wrote:
>>>>>> 
>>>>>>> On Thu, Jul 07, 2016 at 09:47:46PM -0400, Oleg Drokin wrote:
>>>>>>>> It looks like we are bit overzealous about failing mkdir/create/mknod
>>>>>>>> with permission denied if the parent dir is not writeable.
>>>>>>>> Need to make sure the name does not exist first, because we need to
>>>>>>>> return EEXIST in that case.
>>>>>>>> 
>>>>>>>> Signed-off-by: Oleg Drokin <green@linuxhacker.ru>
>>>>>>>> ---
>>>>>>>> A very similar problem exists with symlinks, but the patch is more
>>>>>>>> involved, so assuming this one is ok, I'll send a symlink one separately.
>>>>>>>> fs/nfsd/nfs4proc.c |  6 +++++-
>>>>>>>> fs/nfsd/vfs.c      | 11 ++++++++++-
>>>>>>>> 2 files changed, 15 insertions(+), 2 deletions(-)
>>>>>>>> 
>>>>>>>> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
>>>>>>>> index de1ff1d..0067520 100644
>>>>>>>> --- a/fs/nfsd/nfs4proc.c
>>>>>>>> +++ b/fs/nfsd/nfs4proc.c
>>>>>>>> @@ -605,8 +605,12 @@ nfsd4_create(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
>>>>>>>> 
>>>>>>>> 	fh_init(&resfh, NFS4_FHSIZE);
>>>>>>>> 
>>>>>>>> +	/*
>>>>>>>> +	 * We just check thta parent is accessible here, nfsd_* do their
>>>>>>>> +	 * own access permission checks
>>>>>>>> +	 */
>>>>>>>> 	status = fh_verify(rqstp, &cstate->current_fh, S_IFDIR,
>>>>>>>> -			   NFSD_MAY_CREATE);
>>>>>>>> +			   NFSD_MAY_EXEC);
>>>>>>>> 	if (status)
>>>>>>>> 		return status;
>>>>>>>> 
>>>>>>>> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
>>>>>>>> index 6fbd81e..6a45ec6 100644
>>>>>>>> --- a/fs/nfsd/vfs.c
>>>>>>>> +++ b/fs/nfsd/vfs.c
>>>>>>>> @@ -1161,7 +1161,11 @@ nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
>>>>>>>> 	if (isdotent(fname, flen))
>>>>>>>> 		goto out;
>>>>>>>> 
>>>>>>>> -	err = fh_verify(rqstp, fhp, S_IFDIR, NFSD_MAY_CREATE);
>>>>>>>> +	/*
>>>>>>>> +	 * Even though it is a create, first we see if we are even allowed
>>>>>>>> +	 * to peek inside the parent
>>>>>>>> +	 */
>>>>>>>> +	err = fh_verify(rqstp, fhp, S_IFDIR, NFSD_MAY_EXEC);
>>>>>>> 
>>>>>>> Looks like in the v3 case we haven't actually locked the directory yet
>>>>>>> at this point so this check is a little race-prone.
>>>>>> 
>>>>>> In reality this check is not really needed, I suspect.
>>>>>> When we call vfs_create/mknod/mkdir later on, it has it's own permission check
>>>>>> anyway so if there was a race and somebody changed dir access in the middle,
>>>>>> there's going to be another check anyway and it would be caught.
>>>>>> Unless there's some weird server-side permission wiggling as well that makes it
>>>>>> ineffective, but I imagine that one cannot really change in a racy way?
>>>>> 
>>>>> Yeah, I think I'll just change those NFSD_MAY_EXEC's to NFSD_MAY_NOP's.
>>>>> We still need the fh_verify there since it's also what does the
>>>>> filehandle->dentry translation, but we don't need permission checking
>>>>> here yet.
>>>> 
>>>> This will likely need an extra test to ensure that when you
>>>> do mkdir where you do not have exec permissions, you would get EACCES instead
>>>> of EEXIST, otherwise that would be information leakage, no?
>>>> Or do you think the second time we do nfsd_permission, that would be covered?
>>> 
>>> No, you're right, for some reason I thought that the check for a
>>> positive inode didn't happen till later.  But actually the logic is
>>> basically:
>>> 
>>> 	lock inode
>>> 	lookup_one_len
>>> 	return nfserr_exist if looked up dentry is positive.
>>> 	check for create permission
>>> 	vfs_create
>>> 
>>> So, yes, the initial MAY_EXEC test's needed to prevent that information
>>> leak.
>>> 
>>> That said... I wonder why it's done that way?  Seems to me we could just
>>> tremove that nfserr_exist check and the vfs would handle it for us....
>>> I'll try that.
>> 
>> It won't work because the very first thing vfs_create does is may_create(),
>> and so you get EACCES right there instead of the EEXIST.
> 
> static inline int may_create(struct inode *dir, struct dentry *child)
> {
>        audit_inode_child(dir, child, AUDIT_TYPE_CHILD_CREATE);
>        if (child->d_inode)
>                return -EEXIST;
> 	...
> 
> So it looks OK to me.

Hm, in fact indeed. I was just too worked up about the client side, but on the
server side there was a real lookup already, so it does look workable.

> 
> --b.

--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bruce Fields July 22, 2016, 5:48 p.m. UTC | #25
From: "J. Bruce Fields" <bfields@redhat.com>

On Fri, Jul 22, 2016 at 11:13:20AM -0400, Oleg Drokin wrote:
> Hm, in fact indeed. I was just too worked up about the client side,
> but on the server side there was a real lookup already, so it does
> look workable.

So I end up with the following.  This is all (after your patch) pretty
trivial cleanup, but I think it's overdue for that code.

--b.


J. Bruce Fields (6):
  nfsd: remove redundant zero-length check from create
  nfsd: remove redundant i_lookup check
  nfsd: reorganize nfsd_create
  nfsd: remove unnecessary positive-dentry check
  nfsd: clean up bad-type check in nfsd_create_locked
  nfsd: drop unnecessary MAY_EXEC check from create

Oleg Drokin (1):
  nfsd: Make creates return EEXIST instead of EACCES

 fs/nfsd/nfs4proc.c |   3 +-
 fs/nfsd/nfsproc.c  |   7 +--
 fs/nfsd/vfs.c      | 131 ++++++++++++++++++++++++-----------------------------
 fs/nfsd/vfs.h      |   3 ++
 4 files changed, 66 insertions(+), 78 deletions(-)
diff mbox

Patch

diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index de1ff1d..0067520 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -605,8 +605,12 @@  nfsd4_create(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
 
 	fh_init(&resfh, NFS4_FHSIZE);
 
+	/*
+	 * We just check thta parent is accessible here, nfsd_* do their
+	 * own access permission checks
+	 */
 	status = fh_verify(rqstp, &cstate->current_fh, S_IFDIR,
-			   NFSD_MAY_CREATE);
+			   NFSD_MAY_EXEC);
 	if (status)
 		return status;
 
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index 6fbd81e..6a45ec6 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -1161,7 +1161,11 @@  nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
 	if (isdotent(fname, flen))
 		goto out;
 
-	err = fh_verify(rqstp, fhp, S_IFDIR, NFSD_MAY_CREATE);
+	/*
+	 * Even though it is a create, first we see if we are even allowed
+	 * to peek inside the parent
+	 */
+	err = fh_verify(rqstp, fhp, S_IFDIR, NFSD_MAY_EXEC);
 	if (err)
 		goto out;
 
@@ -1211,6 +1215,11 @@  nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
 		goto out; 
 	}
 
+	/* Now let's see if we actually have permissions to create */
+	err = nfsd_permission(rqstp, fhp->fh_export, dentry, NFSD_MAY_CREATE);
+	if (err)
+		goto out;
+
 	if (!(iap->ia_valid & ATTR_MODE))
 		iap->ia_mode = 0;
 	iap->ia_mode = (iap->ia_mode & S_IALLUGO) | type;