nfsd: fix handling of readdir in v4root vs. mount upcall timeout

Message ID 20221213180826.216690-1-jlayton@kernel.org (mailing list archive)
State New, archived
Series nfsd: fix handling of readdir in v4root vs. mount upcall timeout

Commit Message

Jeffrey Layton Dec. 13, 2022, 6:08 p.m. UTC
If a v4 READDIR operation hits a mountpoint and gets back an error,
then it will include that entry in the reply and set RDATTR_ERROR for it
to the error.

That's fine for "normal" exported filesystems, but on the v4root, we
need to be more careful to only expose the existence of dentries that
lead to exports.

If the mountd upcall times out while checking to see whether a
mountpoint on the v4root is exported, then we have no recourse other
than to fail the whole operation.

Cc: Steve Dickson <steved@redhat.com>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216777
Reported-by:  JianHong Yin <yin-jianhong@163.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/nfsd/nfs4xdr.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)
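
As a rough illustration of the per-entry decision described above, here is a minimal standalone sketch. It is not the nfsd code: the enum names and classify_dirent() are hypothetical stand-ins for the status switch in nfsd4_encode_dirent(), and it assumes the client requested the RDATTR_ERROR attribute.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the nfsd status codes named in the patch. */
enum dirent_status {
	DIRENT_OK,       /* attributes were encoded without error */
	DIRENT_NOENT,    /* models nfserr_noent */
	DIRENT_JUKEBOX,  /* models nfserr_jukebox (e.g. upcall timed out) */
	DIRENT_OTHER     /* any other per-entry error */
};

/* What the encoder does with a single directory entry. */
enum dirent_action {
	ACTION_ENCODE,        /* include the entry normally */
	ACTION_SKIP,          /* drop the entry and keep going */
	ACTION_RDATTR_ERROR,  /* include the entry, report the error in RDATTR_ERROR */
	ACTION_FAIL_READDIR   /* abort the whole READDIR */
};

/*
 * Assumption: the client asked for RDATTR_ERROR, so ordinary per-entry
 * errors can be reported inside the entry rather than failing the call.
 */
enum dirent_action classify_dirent(enum dirent_status status, bool on_v4root)
{
	switch (status) {
	case DIRENT_OK:
		return ACTION_ENCODE;
	case DIRENT_NOENT:
		return ACTION_SKIP;
	case DIRENT_JUKEBOX:
		/* On the pseudoroot we cannot tell whether to reveal the entry. */
		if (on_v4root)
			return ACTION_FAIL_READDIR;
		/* fall through */
	default:
		return ACTION_RDATTR_ERROR;
	}
}
```

The only case the patch changes is DIRENT_JUKEBOX with on_v4root set. Every other combination behaves as before.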

Comments

Chuck Lever Dec. 13, 2022, 7 p.m. UTC | #1
> On Dec 13, 2022, at 1:08 PM, Jeff Layton <jlayton@kernel.org> wrote:
> 
> If v4 READDIR operation hits a mountpoint and gets back an error,
> then it will include that entry in the reply and set RDATTR_ERROR for it
> to the error.
> 
> That's fine for "normal" exported filesystems, but on the v4root, we
> need to be more careful to only expose the existence of dentries that
> lead to exports.
> 
> If the mountd upcall times out while checking to see whether a
> mountpoint on the v4root is exported, then we have no recourse other
> than to fail the whole operation.

Thank you for chasing this down!

Failing the whole READDIR when mountd times out might be a bad idea.
If the mountd upcall times out every time, the client can't make
any progress and will continue to emit the failing READDIR request.

Would it be better to skip the unresolvable entry instead and let
the READDIR succeed without that entry?
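
To make the two options concrete, here is a small hypothetical model (not nfsd code). It walks the upcall results for the entries of a v4root directory and either fails the whole listing on the first timeout or skips the unresolvable entry. All names are invented for the illustration.

```c
#include <stddef.h>

/* Outcome of asking mountd whether a mountpoint is exported. */
enum upcall_result { EXPORTED, NOT_EXPORTED, UPCALL_TIMEOUT };

/* The two behaviours under discussion for a timed-out upcall. */
enum timeout_policy { FAIL_WHOLE_READDIR, SKIP_ENTRY };

/*
 * Returns the number of entries that would be listed, or -1 if the
 * whole READDIR is failed and the client has to retry later.
 */
int list_v4root(const enum upcall_result *results, size_t n,
		enum timeout_policy policy)
{
	int listed = 0;

	for (size_t i = 0; i < n; i++) {
		switch (results[i]) {
		case EXPORTED:
			listed++;          /* leads to an export: reveal it */
			break;
		case NOT_EXPORTED:
			break;             /* never shown on the pseudoroot */
		case UPCALL_TIMEOUT:
			if (policy == FAIL_WHOLE_READDIR)
				return -1; /* behaviour proposed by the patch */
			break;             /* alternative: leave the entry out */
		}
	}
	return listed;
}
```

With FAIL_WHOLE_READDIR, a persistently timing-out upcall makes every retry return -1. With SKIP_ENTRY, the listing succeeds but may silently omit an entry that is in fact exported.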


> Cc: Steve Dickson <steved@redhat.com>
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=216777
> Reported-by:  JianHong Yin <yin-jianhong@163.com>
> Signed-off-by: Jeff Layton <jlayton@kernel.org>
> ---
> fs/nfsd/nfs4xdr.c | 12 ++++++++++++
> 1 file changed, 12 insertions(+)
> 
> diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
> index 2b4ae858c89b..984528ce8d68 100644
> --- a/fs/nfsd/nfs4xdr.c
> +++ b/fs/nfsd/nfs4xdr.c
> @@ -3588,6 +3588,7 @@ nfsd4_encode_dirent(void *ccdv, const char *name, int namlen,
> 	struct readdir_cd *ccd = ccdv;
> 	struct nfsd4_readdir *cd = container_of(ccd, struct nfsd4_readdir, common);
> 	struct xdr_stream *xdr = cd->xdr;
> +	struct svc_export *exp = cd->rd_fhp->fh_export;
> 	int start_offset = xdr->buf->len;
> 	int cookie_offset;
> 	u32 name_and_cookie;
> @@ -3629,6 +3630,17 @@ nfsd4_encode_dirent(void *ccdv, const char *name, int namlen,
> 	case nfserr_noent:
> 		xdr_truncate_encode(xdr, start_offset);
> 		goto skip_entry;
> +	case nfserr_jukebox:
> +		/*
> +		 * The pseudoroot should only display dentries that lead to
> +		 * exports. If we get EJUKEBOX here, then we can't tell whether
> +		 * this entry should be included. Just fail the whole READDIR
> +		 * with NFS4ERR_DELAY in that case, and hope that the situation
> +		 * will resolve itself by the client's next attempt.
> +		 */
> +		if (exp->ex_flags & NFSEXP_V4ROOT)
> +			goto fail;
> +		fallthrough;
> 	default:
> 		/*
> 		 * If the client requested the RDATTR_ERROR attribute,
> -- 
> 2.38.1
> 

--
Chuck Lever
Jeffrey Layton Dec. 13, 2022, 8:02 p.m. UTC | #2
On Tue, 2022-12-13 at 19:00 +0000, Chuck Lever III wrote:
> 
> > On Dec 13, 2022, at 1:08 PM, Jeff Layton <jlayton@kernel.org> wrote:
> > 
> > If v4 READDIR operation hits a mountpoint and gets back an error,
> > then it will include that entry in the reply and set RDATTR_ERROR for it
> > to the error.
> > 
> > That's fine for "normal" exported filesystems, but on the v4root, we
> > need to be more careful to only expose the existence of dentries that
> > lead to exports.
> > 
> > If the mountd upcall times out while checking to see whether a
> > mountpoint on the v4root is exported, then we have no recourse other
> > than to fail the whole operation.
> 
> Thank you for chasing this down!
> 
> Failing the whole READDIR when mountd times out might be a bad idea.
> If the mountd upcall times out every time, the client can't make
> any progress and will continue to emit the failing READDIR request.
> 
> Would it be better to skip the unresolvable entry instead and let
> the READDIR succeed without that entry?
> 

Mounting doesn't usually require working READDIR. In that situation, a
readdir() might hang (until the client kills it), but a lookup of other
dentries that aren't perpetually stalled should be ok.

If mountd is that hosed then I think it's unlikely that any progress
will be possible anyway.

> 
> > Cc: Steve Dickson <steved@redhat.com>
> > Link: https://bugzilla.kernel.org/show_bug.cgi?id=216777
> > Reported-by:  JianHong Yin <yin-jianhong@163.com>
> > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> > ---
> > fs/nfsd/nfs4xdr.c | 12 ++++++++++++
> > 1 file changed, 12 insertions(+)
> > 
> > diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
> > index 2b4ae858c89b..984528ce8d68 100644
> > --- a/fs/nfsd/nfs4xdr.c
> > +++ b/fs/nfsd/nfs4xdr.c
> > @@ -3588,6 +3588,7 @@ nfsd4_encode_dirent(void *ccdv, const char *name, int namlen,
> > 	struct readdir_cd *ccd = ccdv;
> > 	struct nfsd4_readdir *cd = container_of(ccd, struct nfsd4_readdir, common);
> > 	struct xdr_stream *xdr = cd->xdr;
> > +	struct svc_export *exp = cd->rd_fhp->fh_export;
> > 	int start_offset = xdr->buf->len;
> > 	int cookie_offset;
> > 	u32 name_and_cookie;
> > @@ -3629,6 +3630,17 @@ nfsd4_encode_dirent(void *ccdv, const char *name, int namlen,
> > 	case nfserr_noent:
> > 		xdr_truncate_encode(xdr, start_offset);
> > 		goto skip_entry;
> > +	case nfserr_jukebox:
> > +		/*
> > +		 * The pseudoroot should only display dentries that lead to
> > +		 * exports. If we get EJUKEBOX here, then we can't tell whether
> > +		 * this entry should be included. Just fail the whole READDIR
> > +		 * with NFS4ERR_DELAY in that case, and hope that the situation
> > +		 * will resolve itself by the client's next attempt.
> > +		 */
> > +		if (exp->ex_flags & NFSEXP_V4ROOT)
> > +			goto fail;
> > +		fallthrough;
> > 	default:
> > 		/*
> > 		 * If the client requested the RDATTR_ERROR attribute,
> > -- 
> > 2.38.1
> > 
> 
> --
> Chuck Lever
> 
> 
>
Ian Kent Dec. 13, 2022, 11:14 p.m. UTC | #3
On 14/12/22 04:02, Jeff Layton wrote:
> On Tue, 2022-12-13 at 19:00 +0000, Chuck Lever III wrote:
>>> On Dec 13, 2022, at 1:08 PM, Jeff Layton <jlayton@kernel.org> wrote:
>>>
>>> If v4 READDIR operation hits a mountpoint and gets back an error,
>>> then it will include that entry in the reply and set RDATTR_ERROR for it
>>> to the error.
>>>
>>> That's fine for "normal" exported filesystems, but on the v4root, we
>>> need to be more careful to only expose the existence of dentries that
>>> lead to exports.
>>>
>>> If the mountd upcall times out while checking to see whether a
>>> mountpoint on the v4root is exported, then we have no recourse other
>>> than to fail the whole operation.
>> Thank you for chasing this down!
>>
>> Failing the whole READDIR when mountd times out might be a bad idea.
>> If the mountd upcall times out every time, the client can't make
>> any progress and will continue to emit the failing READDIR request.
>>
>> Would it be better to skip the unresolvable entry instead and let
>> the READDIR succeed without that entry?
>>
> Mounting doesn't usually require working READDIR. In that situation, a
> readdir() might hang (until the client kills), but a lookup of other
> dentries that aren't perpetually stalled should be ok in this situation.
>
> If mountd is that hosed then I think it's unlikely that any progress
> will be possible anyway.

The READDIR shouldn't trigger a mount, yes, but if it's a valid automount
point (basically a valid dentry in this case, I think) it should be listed.
It certainly shouldn't hold up the READDIR; passing into it is when a
mount should occur.

That's usually the behavior we want for automounts; we don't want mount
storms on directories full of automount points.


Ian
Jeffrey Layton Dec. 14, 2022, 12:39 a.m. UTC | #4
On Wed, 2022-12-14 at 07:14 +0800, Ian Kent wrote:
> On 14/12/22 04:02, Jeff Layton wrote:
> > On Tue, 2022-12-13 at 19:00 +0000, Chuck Lever III wrote:
> > > > On Dec 13, 2022, at 1:08 PM, Jeff Layton <jlayton@kernel.org> wrote:
> > > > 
> > > > If v4 READDIR operation hits a mountpoint and gets back an error,
> > > > then it will include that entry in the reply and set RDATTR_ERROR for it
> > > > to the error.
> > > > 
> > > > That's fine for "normal" exported filesystems, but on the v4root, we
> > > > need to be more careful to only expose the existence of dentries that
> > > > lead to exports.
> > > > 
> > > > If the mountd upcall times out while checking to see whether a
> > > > mountpoint on the v4root is exported, then we have no recourse other
> > > > than to fail the whole operation.
> > > Thank you for chasing this down!
> > > 
> > > Failing the whole READDIR when mountd times out might be a bad idea.
> > > If the mountd upcall times out every time, the client can't make
> > > any progress and will continue to emit the failing READDIR request.
> > > 
> > > Would it be better to skip the unresolvable entry instead and let
> > > the READDIR succeed without that entry?
> > > 
> > Mounting doesn't usually require working READDIR. In that situation, a
> > readdir() might hang (until the client kills), but a lookup of other
> > dentries that aren't perpetually stalled should be ok in this situation.
> > 
> > If mountd is that hosed then I think it's unlikely that any progress
> > will be possible anyway.
> 
> The READDIR shouldn't trigger a mount yes, but if it's a valid automount
> 
> point (basically a valid dentry in this case I think) it should be listed.
> 
> It certainly shouldn't hold up the READDIR, passing into it is when a
> 
> mount should occur.
> 
> 
> That's usually the behavior we want for automounts, we don't want mount
> 
> storms on directories full of automount points.
> 


We only want to display it if it's a valid _exported_ mountpoint.

The idea here is to only reveal the parts of the namespace that are
exported in the nfsv4 pseudoroot. The "normal" contents are not shown --
only exported mountpoints and ancestor directories of those mountpoints.
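
The visibility rule above can be sketched as a plain path check. This is a simplified userspace illustration, not how nfsd or mountd implements it. The function names are hypothetical, and paths are assumed to be absolute and normalized with no trailing slash.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Return true if `path` is `export` itself or a directory above it.
 * Assumes absolute, normalized paths without a trailing slash
 * (the root "/" is the one exception).
 */
static bool is_export_or_ancestor(const char *path, const char *export)
{
	size_t plen = strlen(path);

	if (strcmp(path, "/") == 0)
		return true;    /* the root is above every export */
	if (strncmp(path, export, plen) != 0)
		return false;   /* not a prefix of the export path */
	/* A prefix only counts if it ends on a path component boundary. */
	return export[plen] == '\0' || export[plen] == '/';
}

/* A pseudoroot entry is shown only if it leads to at least one export. */
bool visible_in_pseudoroot(const char *path,
			   const char *const *exports, size_t nexports)
{
	for (size_t i = 0; i < nexports; i++)
		if (is_export_or_ancestor(path, exports[i]))
			return true;
	return false;
}
```

With exports "/srv/data" and "/home", the paths "/", "/srv" and "/srv/data" are visible, while "/etc" and "/srv/da" are not. Anything below an export is served through the export itself, so this check deliberately returns false for it.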

We don't want mountd triggering automounts, in general. If the
underlying filesystem was exported, then it should also already be
mounted, since nfsd doesn't currently trigger automounts in
follow_down().

There is also a separate patchset by Richard Weinberger to allow nfsd to
trigger automounts if the parent filesystem is exported with -o
crossmnt. That should be ok with this patch, since the automount will be
triggered before the upcall to mountd. That should ensure that it's
already mounted by the time we get to upcalling for its export.
Ian Kent Dec. 14, 2022, 5:37 a.m. UTC | #5
On 14/12/22 08:39, Jeff Layton wrote:
> On Wed, 2022-12-14 at 07:14 +0800, Ian Kent wrote:
>> On 14/12/22 04:02, Jeff Layton wrote:
>>> On Tue, 2022-12-13 at 19:00 +0000, Chuck Lever III wrote:
>>>>> On Dec 13, 2022, at 1:08 PM, Jeff Layton <jlayton@kernel.org> wrote:
>>>>>
>>>>> If v4 READDIR operation hits a mountpoint and gets back an error,
>>>>> then it will include that entry in the reply and set RDATTR_ERROR for it
>>>>> to the error.
>>>>>
>>>>> That's fine for "normal" exported filesystems, but on the v4root, we
>>>>> need to be more careful to only expose the existence of dentries that
>>>>> lead to exports.
>>>>>
>>>>> If the mountd upcall times out while checking to see whether a
>>>>> mountpoint on the v4root is exported, then we have no recourse other
>>>>> than to fail the whole operation.
>>>> Thank you for chasing this down!
>>>>
>>>> Failing the whole READDIR when mountd times out might be a bad idea.
>>>> If the mountd upcall times out every time, the client can't make
>>>> any progress and will continue to emit the failing READDIR request.
>>>>
>>>> Would it be better to skip the unresolvable entry instead and let
>>>> the READDIR succeed without that entry?
>>>>
>>> Mounting doesn't usually require working READDIR. In that situation, a
>>> readdir() might hang (until the client kills), but a lookup of other
>>> dentries that aren't perpetually stalled should be ok in this situation.
>>>
>>> If mountd is that hosed then I think it's unlikely that any progress
>>> will be possible anyway.
>> The READDIR shouldn't trigger a mount yes, but if it's a valid automount
>>
>> point (basically a valid dentry in this case I think) it should be listed.
>>
>> It certainly shouldn't hold up the READDIR, passing into it is when a
>>
>> mount should occur.
>>
>>
>> That's usually the behavior we want for automounts, we don't want mount
>>
>> storms on directories full of automount points.
>>
>
> We only want to display it if it's a valid _exported_ mountpoint.
>
> The idea here is to only reveal the parts of the namespace that are
> exported in the nfsv4 pseudoroot. The "normal" contents are not shown --
> only exported mountpoints and ancestor directories of those mountpoints.
>
> We don't want mountd triggering automounts, in general. If the
> underlying filesystem was exported, then it should also already be
> mounted, since nfsd doesn't currently trigger automounts in
> follow_down().

Umm ... must they already be mounted?

Can't it be a valid mount point that is either not yet mounted or timed out
and umounted? In that case shouldn't it be listed? I know that's
not that good an outcome because its stat info will change when
it gets walked into, but it's usually the only sane choice.


>
> There is also a separate patchset by Richard Weinberger to allow nfsd to
> trigger automounts if the parent filesystem is exported with -o
> crossmnt. That should be ok with this patch, since the automount will be
> triggered before the upcall to mountd. That should ensure that it's
> already mounted by the time we get to upcalling for its export.

Yep, saw that, ;)


Ian
Chuck Lever Jan. 1, 2023, 6:09 p.m. UTC | #6
> On Dec 14, 2022, at 12:37 AM, Ian Kent <raven@themaw.net> wrote:
> 
> On 14/12/22 08:39, Jeff Layton wrote:
>> On Wed, 2022-12-14 at 07:14 +0800, Ian Kent wrote:
>>> On 14/12/22 04:02, Jeff Layton wrote:
>>>> On Tue, 2022-12-13 at 19:00 +0000, Chuck Lever III wrote:
>>>>>> On Dec 13, 2022, at 1:08 PM, Jeff Layton <jlayton@kernel.org> wrote:
>>>>>> 
>>>>>> If v4 READDIR operation hits a mountpoint and gets back an error,
>>>>>> then it will include that entry in the reply and set RDATTR_ERROR for it
>>>>>> to the error.
>>>>>> 
>>>>>> That's fine for "normal" exported filesystems, but on the v4root, we
>>>>>> need to be more careful to only expose the existence of dentries that
>>>>>> lead to exports.
>>>>>> 
>>>>>> If the mountd upcall times out while checking to see whether a
>>>>>> mountpoint on the v4root is exported, then we have no recourse other
>>>>>> than to fail the whole operation.
>>>>> Thank you for chasing this down!
>>>>> 
>>>>> Failing the whole READDIR when mountd times out might be a bad idea.
>>>>> If the mountd upcall times out every time, the client can't make
>>>>> any progress and will continue to emit the failing READDIR request.
>>>>> 
>>>>> Would it be better to skip the unresolvable entry instead and let
>>>>> the READDIR succeed without that entry?
>>>>> 
>>>> Mounting doesn't usually require working READDIR. In that situation, a
>>>> readdir() might hang (until the client kills), but a lookup of other
>>>> dentries that aren't perpetually stalled should be ok in this situation.
>>>> 
>>>> If mountd is that hosed then I think it's unlikely that any progress
>>>> will be possible anyway.
>>> The READDIR shouldn't trigger a mount yes, but if it's a valid automount
>>> 
>>> point (basically a valid dentry in this case I think) it should be listed.
>>> 
>>> It certainly shouldn't hold up the READDIR, passing into it is when a
>>> 
>>> mount should occur.
>>> 
>>> 
>>> That's usually the behavior we want for automounts, we don't want mount
>>> 
>>> storms on directories full of automount points.
>>> 
>> 
>> We only want to display it if it's a valid _exported_ mountpoint.
>> 
>> The idea here is to only reveal the parts of the namespace that are
>> exported in the nfsv4 pseudoroot. The "normal" contents are not shown --
>> only exported mountpoints and ancestor directories of those mountpoints.
>> 
>> We don't want mountd triggering automounts, in general. If the
>> underlying filesystem was exported, then it should also already be
>> mounted, since nfsd doesn't currently trigger automounts in
>> follow_down().
> 
> Umm ... must they already be mounted?
> 
> 
> Can't it be a valid mount point either not yet mounted or timed out
> 
> and umounted. In that case shouldn't it be listed, I know that's
> 
> not the that good an outcome because its stat info will change when
> 
> it gets walked into but it's usually the only sane choice.
> 
> 
>> 
>> There is also a separate patchset by Richard Weinberger to allow nfsd to
>> trigger automounts if the parent filesystem is exported with -o
>> crossmnt. That should be ok with this patch, since the automount will be
>> triggered before the upcall to mountd. That should ensure that it's
>> already mounted by the time we get to upcalling for its export.
> 
> Yep, saw that, ;)

I'm not sure if there is consensus on this patch.

It's been pushed to nfsd's for-rc branch for wider testing, but if
there's a strong objection I can pull it out before the next -rc PR.


--
Chuck Lever
Chuck Lever Jan. 1, 2023, 6:18 p.m. UTC | #7
> On Jan 1, 2023, at 1:09 PM, Chuck Lever III <chuck.lever@oracle.com> wrote:
> 
> 
> 
>> On Dec 14, 2022, at 12:37 AM, Ian Kent <raven@themaw.net> wrote:
>> 
>> On 14/12/22 08:39, Jeff Layton wrote:
>>> On Wed, 2022-12-14 at 07:14 +0800, Ian Kent wrote:
>>>> On 14/12/22 04:02, Jeff Layton wrote:
>>>>> On Tue, 2022-12-13 at 19:00 +0000, Chuck Lever III wrote:
>>>>>>> On Dec 13, 2022, at 1:08 PM, Jeff Layton <jlayton@kernel.org> wrote:
>>>>>>> 
>>>>>>> If v4 READDIR operation hits a mountpoint and gets back an error,
>>>>>>> then it will include that entry in the reply and set RDATTR_ERROR for it
>>>>>>> to the error.
>>>>>>> 
>>>>>>> That's fine for "normal" exported filesystems, but on the v4root, we
>>>>>>> need to be more careful to only expose the existence of dentries that
>>>>>>> lead to exports.
>>>>>>> 
>>>>>>> If the mountd upcall times out while checking to see whether a
>>>>>>> mountpoint on the v4root is exported, then we have no recourse other
>>>>>>> than to fail the whole operation.
>>>>>> Thank you for chasing this down!
>>>>>> 
>>>>>> Failing the whole READDIR when mountd times out might be a bad idea.
>>>>>> If the mountd upcall times out every time, the client can't make
>>>>>> any progress and will continue to emit the failing READDIR request.
>>>>>> 
>>>>>> Would it be better to skip the unresolvable entry instead and let
>>>>>> the READDIR succeed without that entry?
>>>>>> 
>>>>> Mounting doesn't usually require working READDIR. In that situation, a
>>>>> readdir() might hang (until the client kills), but a lookup of other
>>>>> dentries that aren't perpetually stalled should be ok in this situation.
>>>>> 
>>>>> If mountd is that hosed then I think it's unlikely that any progress
>>>>> will be possible anyway.
>>>> The READDIR shouldn't trigger a mount yes, but if it's a valid automount
>>>> 
>>>> point (basically a valid dentry in this case I think) it should be listed.
>>>> 
>>>> It certainly shouldn't hold up the READDIR, passing into it is when a
>>>> 
>>>> mount should occur.
>>>> 
>>>> 
>>>> That's usually the behavior we want for automounts, we don't want mount
>>>> 
>>>> storms on directories full of automount points.
>>>> 
>>> 
>>> We only want to display it if it's a valid _exported_ mountpoint.
>>> 
>>> The idea here is to only reveal the parts of the namespace that are
>>> exported in the nfsv4 pseudoroot. The "normal" contents are not shown --
>>> only exported mountpoints and ancestor directories of those mountpoints.
>>> 
>>> We don't want mountd triggering automounts, in general. If the
>>> underlying filesystem was exported, then it should also already be
>>> mounted, since nfsd doesn't currently trigger automounts in
>>> follow_down().
>> 
>> Umm ... must they already be mounted?
>> 
>> 
>> Can't it be a valid mount point either not yet mounted or timed out
>> 
>> and umounted. In that case shouldn't it be listed, I know that's
>> 
>> not the that good an outcome because its stat info will change when
>> 
>> it gets walked into but it's usually the only sane choice.
>> 
>> 
>>> 
>>> There is also a separate patchset by Richard Weinberger to allow nfsd to
>>> trigger automounts if the parent filesystem is exported with -o
>>> crossmnt. That should be ok with this patch, since the automount will be
>>> triggered before the upcall to mountd. That should ensure that it's
>>> already mounted by the time we get to upcalling for its export.
>> 
>> Yep, saw that, ;)
> 
> I'm not sure if there is consensus on this patch.
> 
> It's been pushed to nfsd's for-rc branch for wider testing, but if
> there's a strong objection I can pull it out before the next -rc PR.

Also, do we agree that it should get a "Cc: stable" tag?


--
Chuck Lever
Jeffrey Layton Jan. 1, 2023, 9:16 p.m. UTC | #8
On Wed, 2022-12-14 at 13:37 +0800, Ian Kent wrote:
> On 14/12/22 08:39, Jeff Layton wrote:
> > On Wed, 2022-12-14 at 07:14 +0800, Ian Kent wrote:
> > > On 14/12/22 04:02, Jeff Layton wrote:
> > > > On Tue, 2022-12-13 at 19:00 +0000, Chuck Lever III wrote:
> > > > > > On Dec 13, 2022, at 1:08 PM, Jeff Layton <jlayton@kernel.org> wrote:
> > > > > > 
> > > > > > If v4 READDIR operation hits a mountpoint and gets back an error,
> > > > > > then it will include that entry in the reply and set RDATTR_ERROR for it
> > > > > > to the error.
> > > > > > 
> > > > > > That's fine for "normal" exported filesystems, but on the v4root, we
> > > > > > need to be more careful to only expose the existence of dentries that
> > > > > > lead to exports.
> > > > > > 
> > > > > > If the mountd upcall times out while checking to see whether a
> > > > > > mountpoint on the v4root is exported, then we have no recourse other
> > > > > > than to fail the whole operation.
> > > > > Thank you for chasing this down!
> > > > > 
> > > > > Failing the whole READDIR when mountd times out might be a bad idea.
> > > > > If the mountd upcall times out every time, the client can't make
> > > > > any progress and will continue to emit the failing READDIR request.
> > > > > 
> > > > > Would it be better to skip the unresolvable entry instead and let
> > > > > the READDIR succeed without that entry?
> > > > > 
> > > > Mounting doesn't usually require working READDIR. In that situation, a
> > > > readdir() might hang (until the client kills), but a lookup of other
> > > > dentries that aren't perpetually stalled should be ok in this situation.
> > > > 
> > > > If mountd is that hosed then I think it's unlikely that any progress
> > > > will be possible anyway.
> > > The READDIR shouldn't trigger a mount yes, but if it's a valid automount
> > > 
> > > point (basically a valid dentry in this case I think) it should be listed.
> > > 
> > > It certainly shouldn't hold up the READDIR, passing into it is when a
> > > 
> > > mount should occur.
> > > 
> > > 
> > > That's usually the behavior we want for automounts, we don't want mount
> > > 
> > > storms on directories full of automount points.
> > > 
> > 
> > We only want to display it if it's a valid _exported_ mountpoint.
> > 
> > The idea here is to only reveal the parts of the namespace that are
> > exported in the nfsv4 pseudoroot. The "normal" contents are not shown --
> > only exported mountpoints and ancestor directories of those mountpoints.
> > 
> > We don't want mountd triggering automounts, in general. If the
> > underlying filesystem was exported, then it should also already be
> > mounted, since nfsd doesn't currently trigger automounts in
> > follow_down().
> 
> Umm ... must they already be mounted?
> 
> 
> Can't it be a valid mount point either not yet mounted or timed out
> 
> and umounted. In that case shouldn't it be listed, I know that's
> 
> not the that good an outcome because its stat info will change when
> 
> it gets walked into but it's usually the only sane choice.
> 

Yes, it does need to already be mounted.

The proposed kernel patches from Richard only trigger an automount if
the parent mount is exported with -o crossmnt. I think this is necessary
to avoid nfs client activity triggering automounts of filesystems that
are not exported.

> 
> > 
> > There is also a separate patchset by Richard Weinberger to allow nfsd to
> > trigger automounts if the parent filesystem is exported with -o
> > crossmnt. That should be ok with this patch, since the automount will be
> > triggered before the upcall to mountd. That should ensure that it's
> > already mounted by the time we get to upcalling for its export.
>
Ian Kent Jan. 2, 2023, 6:34 a.m. UTC | #9
On 2/1/23 02:09, Chuck Lever III wrote:
>
>> On Dec 14, 2022, at 12:37 AM, Ian Kent <raven@themaw.net> wrote:
>>
>> On 14/12/22 08:39, Jeff Layton wrote:
>>> On Wed, 2022-12-14 at 07:14 +0800, Ian Kent wrote:
>>>> On 14/12/22 04:02, Jeff Layton wrote:
>>>>> On Tue, 2022-12-13 at 19:00 +0000, Chuck Lever III wrote:
>>>>>>> On Dec 13, 2022, at 1:08 PM, Jeff Layton <jlayton@kernel.org> wrote:
>>>>>>>
>>>>>>> If v4 READDIR operation hits a mountpoint and gets back an error,
>>>>>>> then it will include that entry in the reply and set RDATTR_ERROR for it
>>>>>>> to the error.
>>>>>>>
>>>>>>> That's fine for "normal" exported filesystems, but on the v4root, we
>>>>>>> need to be more careful to only expose the existence of dentries that
>>>>>>> lead to exports.
>>>>>>>
>>>>>>> If the mountd upcall times out while checking to see whether a
>>>>>>> mountpoint on the v4root is exported, then we have no recourse other
>>>>>>> than to fail the whole operation.
>>>>>> Thank you for chasing this down!
>>>>>>
>>>>>> Failing the whole READDIR when mountd times out might be a bad idea.
>>>>>> If the mountd upcall times out every time, the client can't make
>>>>>> any progress and will continue to emit the failing READDIR request.
>>>>>>
>>>>>> Would it be better to skip the unresolvable entry instead and let
>>>>>> the READDIR succeed without that entry?
>>>>>>
>>>>> Mounting doesn't usually require working READDIR. In that situation, a
>>>>> readdir() might hang (until the client kills), but a lookup of other
>>>>> dentries that aren't perpetually stalled should be ok in this situation.
>>>>>
>>>>> If mountd is that hosed then I think it's unlikely that any progress
>>>>> will be possible anyway.
>>>> The READDIR shouldn't trigger a mount yes, but if it's a valid automount
>>>>
>>>> point (basically a valid dentry in this case I think) it should be listed.
>>>>
>>>> It certainly shouldn't hold up the READDIR, passing into it is when a
>>>>
>>>> mount should occur.
>>>>
>>>>
>>>> That's usually the behavior we want for automounts, we don't want mount
>>>>
>>>> storms on directories full of automount points.
>>>>
>>> We only want to display it if it's a valid _exported_ mountpoint.
>>>
>>> The idea here is to only reveal the parts of the namespace that are
>>> exported in the nfsv4 pseudoroot. The "normal" contents are not shown --
>>> only exported mountpoints and ancestor directories of those mountpoints.
>>>
>>> We don't want mountd triggering automounts, in general. If the
>>> underlying filesystem was exported, then it should also already be
>>> mounted, since nfsd doesn't currently trigger automounts in
>>> follow_down().
>> Umm ... must they already be mounted?
>>
>>
>> Can't it be a valid mount point either not yet mounted or timed out
>>
>> and umounted. In that case shouldn't it be listed, I know that's
>>
>> not the that good an outcome because its stat info will change when
>>
>> it gets walked into but it's usually the only sane choice.
>>
>>
>>> There is also a separate patchset by Richard Weinberger to allow nfsd to
>>> trigger automounts if the parent filesystem is exported with -o
>>> crossmnt. That should be ok with this patch, since the automount will be
>>> triggered before the upcall to mountd. That should ensure that it's
>>> already mounted by the time we get to upcalling for its export.
>> Yep, saw that, ;)
> I'm not sure if there is consensus on this patch.
>
> It's been pushed to nfsd's for-rc branch for wider testing, but if
> there's a strong objection I can pull it out before the next -rc PR.


I don't have any objections; my original comment about it breaking
existing behavior has been addressed.

The only reason I've commented further is because of my time with
automounting but, as Jeff kind of points out, nfsd is not quite the
same as what I'm used to, specifically the way exports are implemented
in nfsd.

Still, you never know, my comments may trigger a thought in someone along
the way, ;)


Ian
Ian Kent Jan. 2, 2023, 6:41 a.m. UTC | #10
On 2/1/23 05:16, Jeff Layton wrote:
> On Wed, 2022-12-14 at 13:37 +0800, Ian Kent wrote:
>> On 14/12/22 08:39, Jeff Layton wrote:
>>> On Wed, 2022-12-14 at 07:14 +0800, Ian Kent wrote:
>>>> On 14/12/22 04:02, Jeff Layton wrote:
>>>>> On Tue, 2022-12-13 at 19:00 +0000, Chuck Lever III wrote:
>>>>>>> On Dec 13, 2022, at 1:08 PM, Jeff Layton <jlayton@kernel.org> wrote:
>>>>>>>
>>>>>>> If v4 READDIR operation hits a mountpoint and gets back an error,
>>>>>>> then it will include that entry in the reply and set RDATTR_ERROR for it
>>>>>>> to the error.
>>>>>>>
>>>>>>> That's fine for "normal" exported filesystems, but on the v4root, we
>>>>>>> need to be more careful to only expose the existence of dentries that
>>>>>>> lead to exports.
>>>>>>>
>>>>>>> If the mountd upcall times out while checking to see whether a
>>>>>>> mountpoint on the v4root is exported, then we have no recourse other
>>>>>>> than to fail the whole operation.
>>>>>> Thank you for chasing this down!
>>>>>>
>>>>>> Failing the whole READDIR when mountd times out might be a bad idea.
>>>>>> If the mountd upcall times out every time, the client can't make
>>>>>> any progress and will continue to emit the failing READDIR request.
>>>>>>
>>>>>> Would it be better to skip the unresolvable entry instead and let
>>>>>> the READDIR succeed without that entry?
>>>>>>
>>>>> Mounting doesn't usually require working READDIR. In that situation, a
>>>>> readdir() might hang (until the client kills), but a lookup of other
>>>>> dentries that aren't perpetually stalled should be ok in this situation.
>>>>>
>>>>> If mountd is that hosed then I think it's unlikely that any progress
>>>>> will be possible anyway.
>>>> The READDIR shouldn't trigger a mount yes, but if it's a valid automount
>>>>
>>>> point (basically a valid dentry in this case I think) it should be listed.
>>>>
>>>> It certainly shouldn't hold up the READDIR, passing into it is when a
>>>>
>>>> mount should occur.
>>>>
>>>>
>>>> That's usually the behavior we want for automounts, we don't want mount
>>>>
>>>> storms on directories full of automount points.
>>>>
>>> We only want to display it if it's a valid _exported_ mountpoint.
>>>
>>> The idea here is to only reveal the parts of the namespace that are
>>> exported in the nfsv4 pseudoroot. The "normal" contents are not shown --
>>> only exported mountpoints and ancestor directories of those mountpoints.
>>>
>>> We don't want mountd triggering automounts, in general. If the
>>> underlying filesystem was exported, then it should also already be
>>> mounted, since nfsd doesn't currently trigger automounts in
>>> follow_down().
>> Umm ... must they already be mounted?
>>
>> Can't it be a valid mount point either not yet mounted or timed out
>> and umounted. In that case shouldn't it be listed, I know that's
>> not the that good an outcome because its stat info will change when
>> it gets walked into but it's usually the only sane choice.
>>
> Yes, it does need to already be mounted.
>
> The proposed kernel patches from Richard only trigger an automount if
> the parent mount is exported with -o crossmnt. I think this is necessary
> to avoid nfs client activity triggering automounts of filesystems that
> are not exported.

I'll be interested to see how this goes.

Over the years I've had a lot of difficulty with automount unwanted
mounting ...

Still nfsd exports are a bit like invisible dentry trees to the local
system aren't they ... so this situation is very different to what
I've worked on ...

Ian

>
>>> There is also a separate patchset by Richard Weinberger to allow nfsd to
>>> trigger automounts if the parent filesystem is exported with -o
>>> crossmnt. That should be ok with this patch, since the automount will be
>>> triggered before the upcall to mountd. That should ensure that it's
>>> already mounted by the time we get to upcalling for its export.
Ian Kent Jan. 2, 2023, 6:57 a.m. UTC | #11
On 2/1/23 14:34, Ian Kent wrote:
>
> On 2/1/23 02:09, Chuck Lever III wrote:
>>
>>> On Dec 14, 2022, at 12:37 AM, Ian Kent <raven@themaw.net> wrote:
>>>
>>> On 14/12/22 08:39, Jeff Layton wrote:
>>>> On Wed, 2022-12-14 at 07:14 +0800, Ian Kent wrote:
>>>>> On 14/12/22 04:02, Jeff Layton wrote:
>>>>>> On Tue, 2022-12-13 at 19:00 +0000, Chuck Lever III wrote:
>>>>>>>> On Dec 13, 2022, at 1:08 PM, Jeff Layton <jlayton@kernel.org> wrote:
>>>>>>>>
>>>>>>>> If v4 READDIR operation hits a mountpoint and gets back an error,
>>>>>>>> then it will include that entry in the reply and set RDATTR_ERROR for it
>>>>>>>> to the error.
>>>>>>>>
>>>>>>>> That's fine for "normal" exported filesystems, but on the v4root, we
>>>>>>>> need to be more careful to only expose the existence of dentries that
>>>>>>>> lead to exports.
>>>>>>>>
>>>>>>>> If the mountd upcall times out while checking to see whether a
>>>>>>>> mountpoint on the v4root is exported, then we have no recourse other
>>>>>>>> than to fail the whole operation.
>>>>>>> Thank you for chasing this down!
>>>>>>>
>>>>>>> Failing the whole READDIR when mountd times out might be a bad idea.
>>>>>>> If the mountd upcall times out every time, the client can't make
>>>>>>> any progress and will continue to emit the failing READDIR request.
>>>>>>>
>>>>>>> Would it be better to skip the unresolvable entry instead and let
>>>>>>> the READDIR succeed without that entry?
>>>>>>>
>>>>>> Mounting doesn't usually require working READDIR. In that situation, a
>>>>>> readdir() might hang (until the client kills), but a lookup of other
>>>>>> dentries that aren't perpetually stalled should be ok in this situation.
>>>>>>
>>>>>> If mountd is that hosed then I think it's unlikely that any progress
>>>>>> will be possible anyway.
>>>>> The READDIR shouldn't trigger a mount yes, but if it's a valid automount
>>>>> point (basically a valid dentry in this case I think) it should be listed.
>>>>>
>>>>> It certainly shouldn't hold up the READDIR, passing into it is when a
>>>>> mount should occur.
>>>>>
>>>>> That's usually the behavior we want for automounts, we don't want mount
>>>>> storms on directories full of automount points.
>>>>>
>>>> We only want to display it if it's a valid _exported_ mountpoint.
>>>>
>>>> The idea here is to only reveal the parts of the namespace that are
>>>> exported in the nfsv4 pseudoroot. The "normal" contents are not shown --
>>>> only exported mountpoints and ancestor directories of those mountpoints.
>>>>
>>>> We don't want mountd triggering automounts, in general. If the
>>>> underlying filesystem was exported, then it should also already be
>>>> mounted, since nfsd doesn't currently trigger automounts in
>>>> follow_down().
>>> Umm ... must they already be mounted?
>>>
>>> Can't it be a valid mount point either not yet mounted or timed out
>>> and umounted. In that case shouldn't it be listed, I know that's
>>> not the that good an outcome because its stat info will change when
>>> it gets walked into but it's usually the only sane choice.
>>>
>>>> There is also a separate patchset by Richard Weinberger to allow nfsd to
>>>> trigger automounts if the parent filesystem is exported with -o
>>>> crossmnt. That should be ok with this patch, since the automount will be
>>>> triggered before the upcall to mountd. That should ensure that it's
>>>> already mounted by the time we get to upcalling for its export.
>>> Yep, saw that, ;)
>> I'm not sure if there is consensus on this patch.
>>
>> It's been pushed to nfsd's for-rc branch for wider testing, but if
>> there's a strong objection I can pull it out before the next -rc PR.
>
> I don't have any objections, my original comment about it breaking
> existing behavior has been addressed.


Actually I'm confused with the other patch series Jeff mentioned.

I still don't have any objections, ;)

I was a little curious about the error handling but that's
because my memories of the jukebox error handling on the client
side are different to what's being done but here it's the server
so it makes sense to assume the client will do the work and retry
or whatever.

Ian
Jeffrey Layton Jan. 3, 2023, 1:31 p.m. UTC | #12
On Sun, 2023-01-01 at 18:18 +0000, Chuck Lever III wrote:
> 
> > On Jan 1, 2023, at 1:09 PM, Chuck Lever III <chuck.lever@oracle.com> wrote:
> > 
> > 
> > 
> > > On Dec 14, 2022, at 12:37 AM, Ian Kent <raven@themaw.net> wrote:
> > > 
> > > On 14/12/22 08:39, Jeff Layton wrote:
> > > > On Wed, 2022-12-14 at 07:14 +0800, Ian Kent wrote:
> > > > > On 14/12/22 04:02, Jeff Layton wrote:
> > > > > > On Tue, 2022-12-13 at 19:00 +0000, Chuck Lever III wrote:
> > > > > > > > On Dec 13, 2022, at 1:08 PM, Jeff Layton <jlayton@kernel.org> wrote:
> > > > > > > > 
> > > > > > > > If v4 READDIR operation hits a mountpoint and gets back an error,
> > > > > > > > then it will include that entry in the reply and set RDATTR_ERROR for it
> > > > > > > > to the error.
> > > > > > > > 
> > > > > > > > That's fine for "normal" exported filesystems, but on the v4root, we
> > > > > > > > need to be more careful to only expose the existence of dentries that
> > > > > > > > lead to exports.
> > > > > > > > 
> > > > > > > > If the mountd upcall times out while checking to see whether a
> > > > > > > > mountpoint on the v4root is exported, then we have no recourse other
> > > > > > > > than to fail the whole operation.
> > > > > > > Thank you for chasing this down!
> > > > > > > 
> > > > > > > Failing the whole READDIR when mountd times out might be a bad idea.
> > > > > > > If the mountd upcall times out every time, the client can't make
> > > > > > > any progress and will continue to emit the failing READDIR request.
> > > > > > > 
> > > > > > > Would it be better to skip the unresolvable entry instead and let
> > > > > > > the READDIR succeed without that entry?
> > > > > > > 
> > > > > > Mounting doesn't usually require working READDIR. In that situation, a
> > > > > > readdir() might hang (until the client kills), but a lookup of other
> > > > > > dentries that aren't perpetually stalled should be ok in this situation.
> > > > > > 
> > > > > > If mountd is that hosed then I think it's unlikely that any progress
> > > > > > will be possible anyway.
> > > > > > The READDIR shouldn't trigger a mount yes, but if it's a valid automount
> > > > > > point (basically a valid dentry in this case I think) it should be listed.
> > > > > >
> > > > > > It certainly shouldn't hold up the READDIR, passing into it is when a
> > > > > > mount should occur.
> > > > > >
> > > > > > That's usually the behavior we want for automounts, we don't want mount
> > > > > > storms on directories full of automount points.
> > > > > >
> > > > 
> > > > We only want to display it if it's a valid _exported_ mountpoint.
> > > > 
> > > > The idea here is to only reveal the parts of the namespace that are
> > > > exported in the nfsv4 pseudoroot. The "normal" contents are not shown --
> > > > only exported mountpoints and ancestor directories of those mountpoints.
> > > > 
> > > > We don't want mountd triggering automounts, in general. If the
> > > > underlying filesystem was exported, then it should also already be
> > > > mounted, since nfsd doesn't currently trigger automounts in
> > > > follow_down().
> > > 
> > > > Umm ... must they already be mounted?
> > > >
> > > > Can't it be a valid mount point either not yet mounted or timed out
> > > > and umounted. In that case shouldn't it be listed, I know that's
> > > > not the that good an outcome because its stat info will change when
> > > > it gets walked into but it's usually the only sane choice.
> > > >
> > > > 
> > > > There is also a separate patchset by Richard Weinberger to allow nfsd to
> > > > trigger automounts if the parent filesystem is exported with -o
> > > > crossmnt. That should be ok with this patch, since the automount will be
> > > > triggered before the upcall to mountd. That should ensure that it's
> > > > already mounted by the time we get to upcalling for its export.
> > > 
> > > Yep, saw that, ;)
> > 
> > I'm not sure if there is consensus on this patch.
> > 
> > It's been pushed to nfsd's for-rc branch for wider testing, but if
> > there's a strong objection I can pull it out before the next -rc PR.
> 
> Also, do we agree that it should get a "Cc: stable" tag?
> 

Yes, I think so. This potentially exposes some info to clients that they
really shouldn't have.
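
For readers following the thread, the per-entry decision this patch
introduces can be modelled outside the kernel. The sketch below is a
userspace simplification, not the nfsd code: entry_status, entry_action,
classify_entry() and classify_entry_demo() are hypothetical names made up
for illustration, and it assumes the client asked for the RDATTR_ERROR
attribute, so any other per-entry error is reported on the entry itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the per-entry status seen by the encoder. */
enum entry_status {
	ENTRY_OK,        /* attributes encoded without error */
	ENTRY_NOENT,     /* entry vanished or is not visible */
	ENTRY_JUKEBOX,   /* mountd upcall timed out, export status unknown */
	ENTRY_OTHER_ERR  /* any other per-entry failure */
};

/* What the READDIR encoder does with one directory entry. */
enum entry_action {
	EMIT_ENTRY,             /* include the entry normally */
	SKIP_ENTRY,             /* leave the entry out of the reply */
	EMIT_WITH_RDATTR_ERROR, /* include it, error carried in RDATTR_ERROR */
	FAIL_READDIR            /* fail the whole operation (NFS4ERR_DELAY) */
};

/*
 * Model of the switch in the patch. Assumption: the client requested
 * RDATTR_ERROR, so the default case always emits the entry with the error.
 */
enum entry_action classify_entry(enum entry_status st, bool on_v4root)
{
	switch (st) {
	case ENTRY_OK:
		return EMIT_ENTRY;
	case ENTRY_NOENT:
		return SKIP_ENTRY;
	case ENTRY_JUKEBOX:
		/*
		 * On the pseudoroot we cannot tell whether the entry leads
		 * to an export, so revealing it (even with an error) could
		 * leak its existence. Fail the whole READDIR instead.
		 */
		if (on_v4root)
			return FAIL_READDIR;
		/* fall through */
	default:
		return EMIT_WITH_RDATTR_ERROR;
	}
}

/* Usage example: the same timeout is handled differently per export type. */
void classify_entry_demo(void)
{
	assert(classify_entry(ENTRY_JUKEBOX, true) == FAIL_READDIR);
	assert(classify_entry(ENTRY_JUKEBOX, false) == EMIT_WITH_RDATTR_ERROR);
}
```

The point of the model is the asymmetry: on a normal export a timed-out
entry is still listed with its error, while on the v4root the same timeout
fails the request so the client has to retry.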

Patch

diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index 2b4ae858c89b..984528ce8d68 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -3588,6 +3588,7 @@  nfsd4_encode_dirent(void *ccdv, const char *name, int namlen,
 	struct readdir_cd *ccd = ccdv;
 	struct nfsd4_readdir *cd = container_of(ccd, struct nfsd4_readdir, common);
 	struct xdr_stream *xdr = cd->xdr;
+	struct svc_export *exp = cd->rd_fhp->fh_export;
 	int start_offset = xdr->buf->len;
 	int cookie_offset;
 	u32 name_and_cookie;
@@ -3629,6 +3630,17 @@  nfsd4_encode_dirent(void *ccdv, const char *name, int namlen,
 	case nfserr_noent:
 		xdr_truncate_encode(xdr, start_offset);
 		goto skip_entry;
+	case nfserr_jukebox:
+		/*
+		 * The pseudoroot should only display dentries that lead to
+		 * exports. If we get EJUKEBOX here, then we can't tell whether
+		 * this entry should be included. Just fail the whole READDIR
+		 * with NFS4ERR_DELAY in that case, and hope that the situation
+		 * will resolve itself by the client's next attempt.
+		 */
+		if (exp->ex_flags & NFSEXP_V4ROOT)
+			goto fail;
+		fallthrough;
 	default:
 		/*
 		 * If the client requested the RDATTR_ERROR attribute,