
[1/1] NFSD: fix use-after-free in __nfs42_ssc_open()

Message ID 1670786549-27041-1-git-send-email-dai.ngo@oracle.com (mailing list archive)
State New, archived
Series [1/1] NFSD: fix use-after-free in __nfs42_ssc_open()

Commit Message

Dai Ngo Dec. 11, 2022, 7:22 p.m. UTC
The problem is caused by the source's vfsmount being unmounted while it
still remains on the delayed unmount list. This happens when
nfs42_ssc_open() returns an error.
Fix it by removing nfsd4_interssc_disconnect() and leaving the vfsmount
for the laundromat to unmount when its idle time expires.
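
Roughly, the sequence that triggers the use-after-free looks like this
(simplified sketch of the call flow, not the literal code paths):

    nfsd4_interssc_connect()
        -> nfsd4_ssc_setup_dul()       /* vfsmount tracked on the delayed
                                          unmount list for the laundromat */
    nfs42_ssc_open() fails
        -> nfsd4_interssc_disconnect()
            -> nfs_do_sb_deactive() + mntput()   /* released immediately */
    ...
    laundromat later processes the delayed unmount list
        -> touches the already-released vfsmount: use-after-free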

Reported-by: Xingyuan Mo <hdthky0@gmail.com>
Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
---
 fs/nfsd/nfs4proc.c | 23 +++++++----------------
 1 file changed, 7 insertions(+), 16 deletions(-)

Comments

Xingyuan Mo Dec. 12, 2022, 11:22 a.m. UTC | #1
Can I share the patch with the linux-distros list, so that
distros can do their own testing and preparations?

Regards,
Xingyuan Mo

On Mon, Dec 12, 2022 at 3:22 AM Dai Ngo <dai.ngo@oracle.com> wrote:
>
> Problem caused by source's vfsmount being unmounted but remains
> on the delayed unmount list. This happens when nfs42_ssc_open()
> return errors.
> Fixed by removing nfsd4_interssc_connect(), leave the vfsmount
> for the laundromat to unmount when idle time expires.
>
> Reported-by: Xingyuan Mo <hdthky0@gmail.com>
> Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
> ---
>  fs/nfsd/nfs4proc.c | 23 +++++++----------------
>  1 file changed, 7 insertions(+), 16 deletions(-)
>
> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
> index 8beb2bc4c328..756e42cf0d01 100644
> --- a/fs/nfsd/nfs4proc.c
> +++ b/fs/nfsd/nfs4proc.c
> @@ -1463,13 +1463,6 @@ nfsd4_interssc_connect(struct nl4_server *nss, struct svc_rqst *rqstp,
>         return status;
>  }
>
> -static void
> -nfsd4_interssc_disconnect(struct vfsmount *ss_mnt)
> -{
> -       nfs_do_sb_deactive(ss_mnt->mnt_sb);
> -       mntput(ss_mnt);
> -}
> -
>  /*
>   * Verify COPY destination stateid.
>   *
> @@ -1572,11 +1565,6 @@ nfsd4_cleanup_inter_ssc(struct vfsmount *ss_mnt, struct file *filp,
>  {
>  }
>
> -static void
> -nfsd4_interssc_disconnect(struct vfsmount *ss_mnt)
> -{
> -}
> -
>  static struct file *nfs42_ssc_open(struct vfsmount *ss_mnt,
>                                    struct nfs_fh *src_fh,
>                                    nfs4_stateid *stateid)
> @@ -1762,7 +1750,8 @@ static int nfsd4_do_async_copy(void *data)
>                 struct file *filp;
>
>                 filp = nfs42_ssc_open(copy->ss_mnt, &copy->c_fh,
> -                                     &copy->stateid);
> +                                       &copy->stateid);
> +
>                 if (IS_ERR(filp)) {
>                         switch (PTR_ERR(filp)) {
>                         case -EBADF:
> @@ -1771,7 +1760,7 @@ static int nfsd4_do_async_copy(void *data)
>                         default:
>                                 nfserr = nfserr_offload_denied;
>                         }
> -                       nfsd4_interssc_disconnect(copy->ss_mnt);
> +                       /* ss_mnt will be unmounted by the laundromat */
>                         goto do_callback;
>                 }
>                 nfserr = nfsd4_do_copy(copy, filp, copy->nf_dst->nf_file,
> @@ -1852,8 +1841,10 @@ nfsd4_copy(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
>         if (async_copy)
>                 cleanup_async_copy(async_copy);
>         status = nfserrno(-ENOMEM);
> -       if (nfsd4_ssc_is_inter(copy))
> -               nfsd4_interssc_disconnect(copy->ss_mnt);
> +       /*
> +        * source's vfsmount of inter-copy will be unmounted
> +        * by the laundromat
> +        */
>         goto out;
>  }
>
> --
> 2.9.5
>
Greg KH Dec. 12, 2022, 11:59 a.m. UTC | #2
On Mon, Dec 12, 2022 at 07:22:38PM +0800, Xingyuan Mo wrote:
> Can I share the patch with the linux-distros list, so that
> distros can do their own testing and preparations?

Be _VERY_ aware of the rules of that list before you send anything, as
some very strict requirements will be put on you regarding what you must
do once you post to them.

Personally, I would prefer if this gets into Linus's tree first so that
we can get it into a stable release before letting distros know about
it, otherwise you are forcing me into a very tight schedule that might
require you to tell the world about the problem _BEFORE_ the fix is in
Linus's or any stable kernel trees.

thanks,

greg k-h
Jeff Layton Dec. 12, 2022, 12:22 p.m. UTC | #3
On Sun, 2022-12-11 at 11:22 -0800, Dai Ngo wrote:
> Problem caused by source's vfsmount being unmounted but remains
> on the delayed unmount list. This happens when nfs42_ssc_open()
> return errors.
> Fixed by removing nfsd4_interssc_connect(), leave the vfsmount
> for the laundromat to unmount when idle time expires.
> 
> Reported-by: Xingyuan Mo <hdthky0@gmail.com>
> Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
> ---
>  fs/nfsd/nfs4proc.c | 23 +++++++----------------
>  1 file changed, 7 insertions(+), 16 deletions(-)
> 
> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
> index 8beb2bc4c328..756e42cf0d01 100644
> --- a/fs/nfsd/nfs4proc.c
> +++ b/fs/nfsd/nfs4proc.c
> @@ -1463,13 +1463,6 @@ nfsd4_interssc_connect(struct nl4_server *nss, struct svc_rqst *rqstp,
>  	return status;
>  }
>  
> -static void
> -nfsd4_interssc_disconnect(struct vfsmount *ss_mnt)
> -{
> -	nfs_do_sb_deactive(ss_mnt->mnt_sb);
> -	mntput(ss_mnt);
> -}
> -
>  /*
>   * Verify COPY destination stateid.
>   *
> @@ -1572,11 +1565,6 @@ nfsd4_cleanup_inter_ssc(struct vfsmount *ss_mnt, struct file *filp,
>  {
>  }
>  
> -static void
> -nfsd4_interssc_disconnect(struct vfsmount *ss_mnt)
> -{
> -}
> -
>  static struct file *nfs42_ssc_open(struct vfsmount *ss_mnt,
>  				   struct nfs_fh *src_fh,
>  				   nfs4_stateid *stateid)
> @@ -1762,7 +1750,8 @@ static int nfsd4_do_async_copy(void *data)
>  		struct file *filp;
>  
>  		filp = nfs42_ssc_open(copy->ss_mnt, &copy->c_fh,
> -				      &copy->stateid);
> +					&copy->stateid);
> +
>  		if (IS_ERR(filp)) {
>  			switch (PTR_ERR(filp)) {
>  			case -EBADF:
> @@ -1771,7 +1760,7 @@ static int nfsd4_do_async_copy(void *data)
>  			default:
>  				nfserr = nfserr_offload_denied;
>  			}
> -			nfsd4_interssc_disconnect(copy->ss_mnt);
> +			/* ss_mnt will be unmounted by the laundromat */
>  			goto do_callback;
>  		}
>  		nfserr = nfsd4_do_copy(copy, filp, copy->nf_dst->nf_file,
> @@ -1852,8 +1841,10 @@ nfsd4_copy(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
>  	if (async_copy)
>  		cleanup_async_copy(async_copy);
>  	status = nfserrno(-ENOMEM);
> -	if (nfsd4_ssc_is_inter(copy))
> -		nfsd4_interssc_disconnect(copy->ss_mnt);
> +	/*
> +	 * source's vfsmount of inter-copy will be unmounted
> +	 * by the laundromat
> +	 */
>  	goto out;
>  }
>  

This looks reasonable at first glance, but I have some concerns with the
refcounting around ss_mnt elsewhere in this code. nfsd4_ssc_setup_dul
looks for an existing connection and bumps the ni->nsui_refcnt if it
finds one.

But then later, nfsd4_cleanup_inter_ssc has a couple of cases where it
just does a bare mntput:

        if (!nn) {
                mntput(ss_mnt);
                return;
        }
...
        if (!found) {
                mntput(ss_mnt);
                return;
        }

The first one looks bogus. Can net_generic return NULL? If so how, and
why is it not a problem elsewhere in the kernel?

For the second case, if the ni is no longer on the list, where did the
extra ss_mnt reference come from? Maybe that should be a WARN_ON or
BUG_ON?
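
To spell out the ownership model I'd expect (just a sketch, not the
current code):

        /*
         * nfsd4_ssc_setup_dul() takes or bumps the reference recorded in
         * the delayed-unmount entry (ni->nsui_refcnt).
         * nfsd4_cleanup_inter_ssc() should then release through that same
         * entry so the laundromat can do the final unmount.
         * A bare mntput(ss_mnt) with no matching entry means the reference
         * the caller was holding is unaccounted for.
         */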
Jeff Layton Dec. 12, 2022, 12:26 p.m. UTC | #4
On Mon, 2022-12-12 at 12:59 +0100, Greg KH wrote:
> On Mon, Dec 12, 2022 at 07:22:38PM +0800, Xingyuan Mo wrote:
> > Can I share the patch with the linux-distros list, so that
> > distros can do their own testing and preparations?
> 
> Be _VERY_ aware of the rules regarding that list before you send
> anything as you are going to have some very strict rules put on you with
> regards to what you must do once you post to them.
> 
> Personally, I would prefer if this gets into Linus's tree first so that
> we can get it into a stable release before letting distros know about
> it, otherwise you are forcing me into a very tight schedule that might
> require you to tell the world about the problem _BEFORE_ the fix is in
> Linus's or any stable kernel trees.
> 

I think that ship has sailed, as the linux-nfs list was cc'ed on this
patch.
Greg KH Dec. 12, 2022, 12:44 p.m. UTC | #5
On Mon, Dec 12, 2022 at 07:26:27AM -0500, Jeff Layton wrote:
> On Mon, 2022-12-12 at 12:59 +0100, Greg KH wrote:
> > On Mon, Dec 12, 2022 at 07:22:38PM +0800, Xingyuan Mo wrote:
> > > Can I share the patch with the linux-distros list, so that
> > > distros can do their own testing and preparations?
> > 
> > Be _VERY_ aware of the rules regarding that list before you send
> > anything as you are going to have some very strict rules put on you with
> > regards to what you must do once you post to them.
> > 
> > Personally, I would prefer if this gets into Linus's tree first so that
> > we can get it into a stable release before letting distros know about
> > it, otherwise you are forcing me into a very tight schedule that might
> > require you to tell the world about the problem _BEFORE_ the fix is in
> > Linus's or any stable kernel trees.
> > 
> 
> I think that ship has sailed, as the linux-nfs list was cc'ed on this
> patch.

Ah, missed that, nice!

So no need to let linux-distros know about it either :)

thanks,

greg k-h
Dai Ngo Dec. 12, 2022, 1:34 p.m. UTC | #6
On 12/12/22 4:22 AM, Jeff Layton wrote:
> On Sun, 2022-12-11 at 11:22 -0800, Dai Ngo wrote:
>> Problem caused by source's vfsmount being unmounted but remains
>> on the delayed unmount list. This happens when nfs42_ssc_open()
>> return errors.
>> Fixed by removing nfsd4_interssc_connect(), leave the vfsmount
>> for the laundromat to unmount when idle time expires.
>>
>> Reported-by: Xingyuan Mo <hdthky0@gmail.com>
>> Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
>> ---
>>   fs/nfsd/nfs4proc.c | 23 +++++++----------------
>>   1 file changed, 7 insertions(+), 16 deletions(-)
>>
>> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
>> index 8beb2bc4c328..756e42cf0d01 100644
>> --- a/fs/nfsd/nfs4proc.c
>> +++ b/fs/nfsd/nfs4proc.c
>> @@ -1463,13 +1463,6 @@ nfsd4_interssc_connect(struct nl4_server *nss, struct svc_rqst *rqstp,
>>   	return status;
>>   }
>>   
>> -static void
>> -nfsd4_interssc_disconnect(struct vfsmount *ss_mnt)
>> -{
>> -	nfs_do_sb_deactive(ss_mnt->mnt_sb);
>> -	mntput(ss_mnt);
>> -}
>> -
>>   /*
>>    * Verify COPY destination stateid.
>>    *
>> @@ -1572,11 +1565,6 @@ nfsd4_cleanup_inter_ssc(struct vfsmount *ss_mnt, struct file *filp,
>>   {
>>   }
>>   
>> -static void
>> -nfsd4_interssc_disconnect(struct vfsmount *ss_mnt)
>> -{
>> -}
>> -
>>   static struct file *nfs42_ssc_open(struct vfsmount *ss_mnt,
>>   				   struct nfs_fh *src_fh,
>>   				   nfs4_stateid *stateid)
>> @@ -1762,7 +1750,8 @@ static int nfsd4_do_async_copy(void *data)
>>   		struct file *filp;
>>   
>>   		filp = nfs42_ssc_open(copy->ss_mnt, &copy->c_fh,
>> -				      &copy->stateid);
>> +					&copy->stateid);
>> +
>>   		if (IS_ERR(filp)) {
>>   			switch (PTR_ERR(filp)) {
>>   			case -EBADF:
>> @@ -1771,7 +1760,7 @@ static int nfsd4_do_async_copy(void *data)
>>   			default:
>>   				nfserr = nfserr_offload_denied;
>>   			}
>> -			nfsd4_interssc_disconnect(copy->ss_mnt);
>> +			/* ss_mnt will be unmounted by the laundromat */
>>   			goto do_callback;
>>   		}
>>   		nfserr = nfsd4_do_copy(copy, filp, copy->nf_dst->nf_file,
>> @@ -1852,8 +1841,10 @@ nfsd4_copy(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
>>   	if (async_copy)
>>   		cleanup_async_copy(async_copy);
>>   	status = nfserrno(-ENOMEM);
>> -	if (nfsd4_ssc_is_inter(copy))
>> -		nfsd4_interssc_disconnect(copy->ss_mnt);
>> +	/*
>> +	 * source's vfsmount of inter-copy will be unmounted
>> +	 * by the laundromat
>> +	 */
>>   	goto out;
>>   }
>>   
> This looks reasonable at first glance, but I have some concerns with the
> refcounting around ss_mnt elsewhere in this code. nfsd4_ssc_setup_dul
> looks for an existing connection and bumps the ni->nsui_refcnt if it
> finds one.
>
> But then later, nfsd4_cleanup_inter_ssc has a couple of cases where it
> just does a bare mntput:
>
>          if (!nn) {
>                  mntput(ss_mnt);
>                  return;
>          }
> ...
>          if (!found) {
>                  mntput(ss_mnt);
>                  return;
>          }
>
> The first one looks bogus. Can net_generic return NULL? If so how, and
> why is it not a problem elsewhere in the kernel?

it looks like net_generic() cannot fail, and nowhere else checks it for
NULL, so I will remove this check.

>
> For the second case, if the ni is no longer on the list, where did the
> extra ss_mnt reference come from? Maybe that should be a WARN_ON or
> BUG_ON?

if ni is not found on the list then it's a bug somewhere, so I will add
a BUG_ON for this.

Thanks,
-Dai
Jeff Layton Dec. 12, 2022, 1:40 p.m. UTC | #7
On Mon, 2022-12-12 at 05:34 -0800, dai.ngo@oracle.com wrote:
> On 12/12/22 4:22 AM, Jeff Layton wrote:
> > On Sun, 2022-12-11 at 11:22 -0800, Dai Ngo wrote:
> > > Problem caused by source's vfsmount being unmounted but remains
> > > on the delayed unmount list. This happens when nfs42_ssc_open()
> > > return errors.
> > > Fixed by removing nfsd4_interssc_connect(), leave the vfsmount
> > > for the laundromat to unmount when idle time expires.
> > > 
> > > Reported-by: Xingyuan Mo <hdthky0@gmail.com>
> > > Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
> > > ---
> > >   fs/nfsd/nfs4proc.c | 23 +++++++----------------
> > >   1 file changed, 7 insertions(+), 16 deletions(-)
> > > 
> > > diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
> > > index 8beb2bc4c328..756e42cf0d01 100644
> > > --- a/fs/nfsd/nfs4proc.c
> > > +++ b/fs/nfsd/nfs4proc.c
> > > @@ -1463,13 +1463,6 @@ nfsd4_interssc_connect(struct nl4_server *nss, struct svc_rqst *rqstp,
> > >   	return status;
> > >   }
> > >   
> > > -static void
> > > -nfsd4_interssc_disconnect(struct vfsmount *ss_mnt)
> > > -{
> > > -	nfs_do_sb_deactive(ss_mnt->mnt_sb);
> > > -	mntput(ss_mnt);
> > > -}
> > > -
> > >   /*
> > >    * Verify COPY destination stateid.
> > >    *
> > > @@ -1572,11 +1565,6 @@ nfsd4_cleanup_inter_ssc(struct vfsmount *ss_mnt, struct file *filp,
> > >   {
> > >   }
> > >   
> > > -static void
> > > -nfsd4_interssc_disconnect(struct vfsmount *ss_mnt)
> > > -{
> > > -}
> > > -
> > >   static struct file *nfs42_ssc_open(struct vfsmount *ss_mnt,
> > >   				   struct nfs_fh *src_fh,
> > >   				   nfs4_stateid *stateid)
> > > @@ -1762,7 +1750,8 @@ static int nfsd4_do_async_copy(void *data)
> > >   		struct file *filp;
> > >   
> > >   		filp = nfs42_ssc_open(copy->ss_mnt, &copy->c_fh,
> > > -				      &copy->stateid);
> > > +					&copy->stateid);
> > > +
> > >   		if (IS_ERR(filp)) {
> > >   			switch (PTR_ERR(filp)) {
> > >   			case -EBADF:
> > > @@ -1771,7 +1760,7 @@ static int nfsd4_do_async_copy(void *data)
> > >   			default:
> > >   				nfserr = nfserr_offload_denied;
> > >   			}
> > > -			nfsd4_interssc_disconnect(copy->ss_mnt);
> > > +			/* ss_mnt will be unmounted by the laundromat */
> > >   			goto do_callback;
> > >   		}
> > >   		nfserr = nfsd4_do_copy(copy, filp, copy->nf_dst->nf_file,
> > > @@ -1852,8 +1841,10 @@ nfsd4_copy(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
> > >   	if (async_copy)
> > >   		cleanup_async_copy(async_copy);
> > >   	status = nfserrno(-ENOMEM);
> > > -	if (nfsd4_ssc_is_inter(copy))
> > > -		nfsd4_interssc_disconnect(copy->ss_mnt);
> > > +	/*
> > > +	 * source's vfsmount of inter-copy will be unmounted
> > > +	 * by the laundromat
> > > +	 */
> > >   	goto out;
> > >   }
> > >   
> > This looks reasonable at first glance, but I have some concerns with the
> > refcounting around ss_mnt elsewhere in this code. nfsd4_ssc_setup_dul
> > looks for an existing connection and bumps the ni->nsui_refcnt if it
> > finds one.
> > 
> > But then later, nfsd4_cleanup_inter_ssc has a couple of cases where it
> > just does a bare mntput:
> > 
> >          if (!nn) {
> >                  mntput(ss_mnt);
> >                  return;
> >          }
> > ...
> >          if (!found) {
> >                  mntput(ss_mnt);
> >                  return;
> >          }
> > 
> > The first one looks bogus. Can net_generic return NULL? If so how, and
> > why is it not a problem elsewhere in the kernel?
> 
> it looks like net_generic can not fail, no where else check for NULL
> so I will remove this check.
> 
> > 
> > For the second case, if the ni is no longer on the list, where did the
> > extra ss_mnt reference come from? Maybe that should be a WARN_ON or
> > BUG_ON?
> 
> if ni is not found on the list then it's a bug somewhere so I will add
> a BUG_ON on this.
> 

Probably better to just WARN_ON and let any references leak in that
case. A BUG_ON implies a panic in some environments, and it's best to
avoid that unless there really is no choice.
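
Something along these lines (sketch only):

        if (!found) {
                WARN_ON_ONCE(1);        /* refcount imbalance; leak ss_mnt */
                return;
        }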
Xingyuan Mo Dec. 12, 2022, 1:44 p.m. UTC | #8
On Mon, Dec 12, 2022 at 7:59 PM Greg KH <greg@kroah.com> wrote:
> Personally, I would prefer if this gets into Linus's tree first so that
> we can get it into a stable release before letting distros know about
> it

I already let them know about this issue on 8th Dec, and they asked me if I
could share this patch, which is why I asked here.

> otherwise you are forcing me into a very tight schedule that might
> require you to tell the world about the problem _BEFORE_ the fix is in
> Linus's or any stable kernel trees.

Sorry for any inconvenience caused to you.

Regards,
Xingyuan Mo
Dai Ngo Dec. 12, 2022, 1:57 p.m. UTC | #9
On 12/12/22 5:40 AM, Jeff Layton wrote:
> On Mon, 2022-12-12 at 05:34 -0800, dai.ngo@oracle.com wrote:
>> On 12/12/22 4:22 AM, Jeff Layton wrote:
>>> On Sun, 2022-12-11 at 11:22 -0800, Dai Ngo wrote:
>>>> Problem caused by source's vfsmount being unmounted but remains
>>>> on the delayed unmount list. This happens when nfs42_ssc_open()
>>>> return errors.
>>>> Fixed by removing nfsd4_interssc_connect(), leave the vfsmount
>>>> for the laundromat to unmount when idle time expires.
>>>>
>>>> Reported-by: Xingyuan Mo <hdthky0@gmail.com>
>>>> Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
>>>> ---
>>>>    fs/nfsd/nfs4proc.c | 23 +++++++----------------
>>>>    1 file changed, 7 insertions(+), 16 deletions(-)
>>>>
>>>> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
>>>> index 8beb2bc4c328..756e42cf0d01 100644
>>>> --- a/fs/nfsd/nfs4proc.c
>>>> +++ b/fs/nfsd/nfs4proc.c
>>>> @@ -1463,13 +1463,6 @@ nfsd4_interssc_connect(struct nl4_server *nss, struct svc_rqst *rqstp,
>>>>    	return status;
>>>>    }
>>>>    
>>>> -static void
>>>> -nfsd4_interssc_disconnect(struct vfsmount *ss_mnt)
>>>> -{
>>>> -	nfs_do_sb_deactive(ss_mnt->mnt_sb);
>>>> -	mntput(ss_mnt);
>>>> -}
>>>> -
>>>>    /*
>>>>     * Verify COPY destination stateid.
>>>>     *
>>>> @@ -1572,11 +1565,6 @@ nfsd4_cleanup_inter_ssc(struct vfsmount *ss_mnt, struct file *filp,
>>>>    {
>>>>    }
>>>>    
>>>> -static void
>>>> -nfsd4_interssc_disconnect(struct vfsmount *ss_mnt)
>>>> -{
>>>> -}
>>>> -
>>>>    static struct file *nfs42_ssc_open(struct vfsmount *ss_mnt,
>>>>    				   struct nfs_fh *src_fh,
>>>>    				   nfs4_stateid *stateid)
>>>> @@ -1762,7 +1750,8 @@ static int nfsd4_do_async_copy(void *data)
>>>>    		struct file *filp;
>>>>    
>>>>    		filp = nfs42_ssc_open(copy->ss_mnt, &copy->c_fh,
>>>> -				      &copy->stateid);
>>>> +					&copy->stateid);
>>>> +
>>>>    		if (IS_ERR(filp)) {
>>>>    			switch (PTR_ERR(filp)) {
>>>>    			case -EBADF:
>>>> @@ -1771,7 +1760,7 @@ static int nfsd4_do_async_copy(void *data)
>>>>    			default:
>>>>    				nfserr = nfserr_offload_denied;
>>>>    			}
>>>> -			nfsd4_interssc_disconnect(copy->ss_mnt);
>>>> +			/* ss_mnt will be unmounted by the laundromat */
>>>>    			goto do_callback;
>>>>    		}
>>>>    		nfserr = nfsd4_do_copy(copy, filp, copy->nf_dst->nf_file,
>>>> @@ -1852,8 +1841,10 @@ nfsd4_copy(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
>>>>    	if (async_copy)
>>>>    		cleanup_async_copy(async_copy);
>>>>    	status = nfserrno(-ENOMEM);
>>>> -	if (nfsd4_ssc_is_inter(copy))
>>>> -		nfsd4_interssc_disconnect(copy->ss_mnt);
>>>> +	/*
>>>> +	 * source's vfsmount of inter-copy will be unmounted
>>>> +	 * by the laundromat
>>>> +	 */
>>>>    	goto out;
>>>>    }
>>>>    
>>> This looks reasonable at first glance, but I have some concerns with the
>>> refcounting around ss_mnt elsewhere in this code. nfsd4_ssc_setup_dul
>>> looks for an existing connection and bumps the ni->nsui_refcnt if it
>>> finds one.
>>>
>>> But then later, nfsd4_cleanup_inter_ssc has a couple of cases where it
>>> just does a bare mntput:
>>>
>>>           if (!nn) {
>>>                   mntput(ss_mnt);
>>>                   return;
>>>           }
>>> ...
>>>           if (!found) {
>>>                   mntput(ss_mnt);
>>>                   return;
>>>           }
>>>
>>> The first one looks bogus. Can net_generic return NULL? If so how, and
>>> why is it not a problem elsewhere in the kernel?
>> it looks like net_generic can not fail, no where else check for NULL
>> so I will remove this check.
>>
>>> For the second case, if the ni is no longer on the list, where did the
>>> extra ss_mnt reference come from? Maybe that should be a WARN_ON or
>>> BUG_ON?
>> if ni is not found on the list then it's a bug somewhere so I will add
>> a BUG_ON on this.
>>
> Probably better to just WARN_ON and let any references leak in that
> case. A BUG_ON implies a panic in some environments, and it's best to
> avoid that unless there really is no choice.

ok, thanks Jeff!

-Dai
Greg KH Dec. 12, 2022, 1:59 p.m. UTC | #10
On Mon, Dec 12, 2022 at 08:40:31AM -0500, Jeff Layton wrote:
> On Mon, 2022-12-12 at 05:34 -0800, dai.ngo@oracle.com wrote:
> > On 12/12/22 4:22 AM, Jeff Layton wrote:
> > > On Sun, 2022-12-11 at 11:22 -0800, Dai Ngo wrote:
> > > > Problem caused by source's vfsmount being unmounted but remains
> > > > on the delayed unmount list. This happens when nfs42_ssc_open()
> > > > return errors.
> > > > Fixed by removing nfsd4_interssc_connect(), leave the vfsmount
> > > > for the laundromat to unmount when idle time expires.
> > > > 
> > > > Reported-by: Xingyuan Mo <hdthky0@gmail.com>
> > > > Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
> > > > ---
> > > >   fs/nfsd/nfs4proc.c | 23 +++++++----------------
> > > >   1 file changed, 7 insertions(+), 16 deletions(-)
> > > > 
> > > > diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
> > > > index 8beb2bc4c328..756e42cf0d01 100644
> > > > --- a/fs/nfsd/nfs4proc.c
> > > > +++ b/fs/nfsd/nfs4proc.c
> > > > @@ -1463,13 +1463,6 @@ nfsd4_interssc_connect(struct nl4_server *nss, struct svc_rqst *rqstp,
> > > >   	return status;
> > > >   }
> > > >   
> > > > -static void
> > > > -nfsd4_interssc_disconnect(struct vfsmount *ss_mnt)
> > > > -{
> > > > -	nfs_do_sb_deactive(ss_mnt->mnt_sb);
> > > > -	mntput(ss_mnt);
> > > > -}
> > > > -
> > > >   /*
> > > >    * Verify COPY destination stateid.
> > > >    *
> > > > @@ -1572,11 +1565,6 @@ nfsd4_cleanup_inter_ssc(struct vfsmount *ss_mnt, struct file *filp,
> > > >   {
> > > >   }
> > > >   
> > > > -static void
> > > > -nfsd4_interssc_disconnect(struct vfsmount *ss_mnt)
> > > > -{
> > > > -}
> > > > -
> > > >   static struct file *nfs42_ssc_open(struct vfsmount *ss_mnt,
> > > >   				   struct nfs_fh *src_fh,
> > > >   				   nfs4_stateid *stateid)
> > > > @@ -1762,7 +1750,8 @@ static int nfsd4_do_async_copy(void *data)
> > > >   		struct file *filp;
> > > >   
> > > >   		filp = nfs42_ssc_open(copy->ss_mnt, &copy->c_fh,
> > > > -				      &copy->stateid);
> > > > +					&copy->stateid);
> > > > +
> > > >   		if (IS_ERR(filp)) {
> > > >   			switch (PTR_ERR(filp)) {
> > > >   			case -EBADF:
> > > > @@ -1771,7 +1760,7 @@ static int nfsd4_do_async_copy(void *data)
> > > >   			default:
> > > >   				nfserr = nfserr_offload_denied;
> > > >   			}
> > > > -			nfsd4_interssc_disconnect(copy->ss_mnt);
> > > > +			/* ss_mnt will be unmounted by the laundromat */
> > > >   			goto do_callback;
> > > >   		}
> > > >   		nfserr = nfsd4_do_copy(copy, filp, copy->nf_dst->nf_file,
> > > > @@ -1852,8 +1841,10 @@ nfsd4_copy(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
> > > >   	if (async_copy)
> > > >   		cleanup_async_copy(async_copy);
> > > >   	status = nfserrno(-ENOMEM);
> > > > -	if (nfsd4_ssc_is_inter(copy))
> > > > -		nfsd4_interssc_disconnect(copy->ss_mnt);
> > > > +	/*
> > > > +	 * source's vfsmount of inter-copy will be unmounted
> > > > +	 * by the laundromat
> > > > +	 */
> > > >   	goto out;
> > > >   }
> > > >   
> > > This looks reasonable at first glance, but I have some concerns with the
> > > refcounting around ss_mnt elsewhere in this code. nfsd4_ssc_setup_dul
> > > looks for an existing connection and bumps the ni->nsui_refcnt if it
> > > finds one.
> > > 
> > > But then later, nfsd4_cleanup_inter_ssc has a couple of cases where it
> > > just does a bare mntput:
> > > 
> > >          if (!nn) {
> > >                  mntput(ss_mnt);
> > >                  return;
> > >          }
> > > ...
> > >          if (!found) {
> > >                  mntput(ss_mnt);
> > >                  return;
> > >          }
> > > 
> > > The first one looks bogus. Can net_generic return NULL? If so how, and
> > > why is it not a problem elsewhere in the kernel?
> > 
> > it looks like net_generic can not fail, no where else check for NULL
> > so I will remove this check.
> > 
> > > 
> > > For the second case, if the ni is no longer on the list, where did the
> > > extra ss_mnt reference come from? Maybe that should be a WARN_ON or
> > > BUG_ON?
> > 
> > if ni is not found on the list then it's a bug somewhere so I will add
> > a BUG_ON on this.
> > 
> 
> Probably better to just WARN_ON and let any references leak in that
> case. A BUG_ON implies a panic in some environments, and it's best to
> avoid that unless there really is no choice.

WARN_ON also causes machines that have panic_on_warn enabled to reboot.
Why not just handle the error and keep going?  Why panic at all?

thanks,

greg k-h
Jeff Layton Dec. 12, 2022, 2:31 p.m. UTC | #11
On Mon, 2022-12-12 at 14:59 +0100, Greg KH wrote:
> On Mon, Dec 12, 2022 at 08:40:31AM -0500, Jeff Layton wrote:
> > On Mon, 2022-12-12 at 05:34 -0800, dai.ngo@oracle.com wrote:
> > > On 12/12/22 4:22 AM, Jeff Layton wrote:
> > > > On Sun, 2022-12-11 at 11:22 -0800, Dai Ngo wrote:
> > > > > Problem caused by source's vfsmount being unmounted but remains
> > > > > on the delayed unmount list. This happens when nfs42_ssc_open()
> > > > > return errors.
> > > > > Fixed by removing nfsd4_interssc_connect(), leave the vfsmount
> > > > > for the laundromat to unmount when idle time expires.
> > > > > 
> > > > > Reported-by: Xingyuan Mo <hdthky0@gmail.com>
> > > > > Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
> > > > > ---
> > > > >   fs/nfsd/nfs4proc.c | 23 +++++++----------------
> > > > >   1 file changed, 7 insertions(+), 16 deletions(-)
> > > > > 
> > > > > diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
> > > > > index 8beb2bc4c328..756e42cf0d01 100644
> > > > > --- a/fs/nfsd/nfs4proc.c
> > > > > +++ b/fs/nfsd/nfs4proc.c
> > > > > @@ -1463,13 +1463,6 @@ nfsd4_interssc_connect(struct nl4_server *nss, struct svc_rqst *rqstp,
> > > > >   	return status;
> > > > >   }
> > > > >   
> > > > > -static void
> > > > > -nfsd4_interssc_disconnect(struct vfsmount *ss_mnt)
> > > > > -{
> > > > > -	nfs_do_sb_deactive(ss_mnt->mnt_sb);
> > > > > -	mntput(ss_mnt);
> > > > > -}
> > > > > -
> > > > >   /*
> > > > >    * Verify COPY destination stateid.
> > > > >    *
> > > > > @@ -1572,11 +1565,6 @@ nfsd4_cleanup_inter_ssc(struct vfsmount *ss_mnt, struct file *filp,
> > > > >   {
> > > > >   }
> > > > >   
> > > > > -static void
> > > > > -nfsd4_interssc_disconnect(struct vfsmount *ss_mnt)
> > > > > -{
> > > > > -}
> > > > > -
> > > > >   static struct file *nfs42_ssc_open(struct vfsmount *ss_mnt,
> > > > >   				   struct nfs_fh *src_fh,
> > > > >   				   nfs4_stateid *stateid)
> > > > > @@ -1762,7 +1750,8 @@ static int nfsd4_do_async_copy(void *data)
> > > > >   		struct file *filp;
> > > > >   
> > > > >   		filp = nfs42_ssc_open(copy->ss_mnt, &copy->c_fh,
> > > > > -				      &copy->stateid);
> > > > > +					&copy->stateid);
> > > > > +
> > > > >   		if (IS_ERR(filp)) {
> > > > >   			switch (PTR_ERR(filp)) {
> > > > >   			case -EBADF:
> > > > > @@ -1771,7 +1760,7 @@ static int nfsd4_do_async_copy(void *data)
> > > > >   			default:
> > > > >   				nfserr = nfserr_offload_denied;
> > > > >   			}
> > > > > -			nfsd4_interssc_disconnect(copy->ss_mnt);
> > > > > +			/* ss_mnt will be unmounted by the laundromat */
> > > > >   			goto do_callback;
> > > > >   		}
> > > > >   		nfserr = nfsd4_do_copy(copy, filp, copy->nf_dst->nf_file,
> > > > > @@ -1852,8 +1841,10 @@ nfsd4_copy(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
> > > > >   	if (async_copy)
> > > > >   		cleanup_async_copy(async_copy);
> > > > >   	status = nfserrno(-ENOMEM);
> > > > > -	if (nfsd4_ssc_is_inter(copy))
> > > > > -		nfsd4_interssc_disconnect(copy->ss_mnt);
> > > > > +	/*
> > > > > +	 * source's vfsmount of inter-copy will be unmounted
> > > > > +	 * by the laundromat
> > > > > +	 */
> > > > >   	goto out;
> > > > >   }
> > > > >   
> > > > This looks reasonable at first glance, but I have some concerns with the
> > > > refcounting around ss_mnt elsewhere in this code. nfsd4_ssc_setup_dul
> > > > looks for an existing connection and bumps the ni->nsui_refcnt if it
> > > > finds one.
> > > > 
> > > > But then later, nfsd4_cleanup_inter_ssc has a couple of cases where it
> > > > just does a bare mntput:
> > > > 
> > > >          if (!nn) {
> > > >                  mntput(ss_mnt);
> > > >                  return;
> > > >          }
> > > > ...
> > > >          if (!found) {
> > > >                  mntput(ss_mnt);
> > > >                  return;
> > > >          }
> > > > 
> > > > The first one looks bogus. Can net_generic return NULL? If so how, and
> > > > why is it not a problem elsewhere in the kernel?
> > > 
> > > it looks like net_generic can not fail, no where else check for NULL
> > > so I will remove this check.
> > > 
> > > > 
> > > > For the second case, if the ni is no longer on the list, where did the
> > > > extra ss_mnt reference come from? Maybe that should be a WARN_ON or
> > > > BUG_ON?
> > > 
> > > if ni is not found on the list then it's a bug somewhere so I will add
> > > a BUG_ON on this.
> > > 
> > 
> > Probably better to just WARN_ON and let any references leak in that
> > case. A BUG_ON implies a panic in some environments, and it's best to
> > avoid that unless there really is no choice.
> 
> WARN_ON also causes machines to boot that have panic_on_warn enabled.
> Why not just handle the error and keep going?  Why panic at all?
> 

Who the hell sets panic_on_warn (outside of testing environments)? I'm
suggesting a WARN_ON because not finding an entry at this point
represents a bug that we'd want reported.

The caller should hold a reference to the object that holds a vfsmount
reference. It relies on that vfsmount to do a copy. If it's gone at this
point where we're releasing that reference, then we're looking at a
refcounting bug of some sort.

I would expect anyone who sets panic_on_warn to _desire_ a panic in this
situation. After all, they asked for it. Presumably they want it to do
some coredump analysis or something?

It is debatable whether the stack trace at this point would be helpful
though, so you might consider a pr_warn or something less log-spammy.
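
For example (sketch only):

        if (!found) {
                pr_warn("%s: no delayed-unmount entry for ss_mnt\n", __func__);
                return;
        }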
Greg KH Dec. 12, 2022, 5:14 p.m. UTC | #12
On Mon, Dec 12, 2022 at 09:31:19AM -0500, Jeff Layton wrote:
> On Mon, 2022-12-12 at 14:59 +0100, Greg KH wrote:
> > On Mon, Dec 12, 2022 at 08:40:31AM -0500, Jeff Layton wrote:
> > > On Mon, 2022-12-12 at 05:34 -0800, dai.ngo@oracle.com wrote:
> > > > On 12/12/22 4:22 AM, Jeff Layton wrote:
> > > > > On Sun, 2022-12-11 at 11:22 -0800, Dai Ngo wrote:
> > > > > > Problem caused by source's vfsmount being unmounted but remains
> > > > > > on the delayed unmount list. This happens when nfs42_ssc_open()
> > > > > > return errors.
> > > > > > Fixed by removing nfsd4_interssc_connect(), leave the vfsmount
> > > > > > for the laundromat to unmount when idle time expires.
> > > > > > 
> > > > > > Reported-by: Xingyuan Mo <hdthky0@gmail.com>
> > > > > > Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
> > > > > > ---
> > > > > >   fs/nfsd/nfs4proc.c | 23 +++++++----------------
> > > > > >   1 file changed, 7 insertions(+), 16 deletions(-)
> > > > > > 
> > > > > > diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
> > > > > > index 8beb2bc4c328..756e42cf0d01 100644
> > > > > > --- a/fs/nfsd/nfs4proc.c
> > > > > > +++ b/fs/nfsd/nfs4proc.c
> > > > > > @@ -1463,13 +1463,6 @@ nfsd4_interssc_connect(struct nl4_server *nss, struct svc_rqst *rqstp,
> > > > > >   	return status;
> > > > > >   }
> > > > > >   
> > > > > > -static void
> > > > > > -nfsd4_interssc_disconnect(struct vfsmount *ss_mnt)
> > > > > > -{
> > > > > > -	nfs_do_sb_deactive(ss_mnt->mnt_sb);
> > > > > > -	mntput(ss_mnt);
> > > > > > -}
> > > > > > -
> > > > > >   /*
> > > > > >    * Verify COPY destination stateid.
> > > > > >    *
> > > > > > @@ -1572,11 +1565,6 @@ nfsd4_cleanup_inter_ssc(struct vfsmount *ss_mnt, struct file *filp,
> > > > > >   {
> > > > > >   }
> > > > > >   
> > > > > > -static void
> > > > > > -nfsd4_interssc_disconnect(struct vfsmount *ss_mnt)
> > > > > > -{
> > > > > > -}
> > > > > > -
> > > > > >   static struct file *nfs42_ssc_open(struct vfsmount *ss_mnt,
> > > > > >   				   struct nfs_fh *src_fh,
> > > > > >   				   nfs4_stateid *stateid)
> > > > > > @@ -1762,7 +1750,8 @@ static int nfsd4_do_async_copy(void *data)
> > > > > >   		struct file *filp;
> > > > > >   
> > > > > >   		filp = nfs42_ssc_open(copy->ss_mnt, &copy->c_fh,
> > > > > > -				      &copy->stateid);
> > > > > > +					&copy->stateid);
> > > > > > +
> > > > > >   		if (IS_ERR(filp)) {
> > > > > >   			switch (PTR_ERR(filp)) {
> > > > > >   			case -EBADF:
> > > > > > @@ -1771,7 +1760,7 @@ static int nfsd4_do_async_copy(void *data)
> > > > > >   			default:
> > > > > >   				nfserr = nfserr_offload_denied;
> > > > > >   			}
> > > > > > -			nfsd4_interssc_disconnect(copy->ss_mnt);
> > > > > > +			/* ss_mnt will be unmounted by the laundromat */
> > > > > >   			goto do_callback;
> > > > > >   		}
> > > > > >   		nfserr = nfsd4_do_copy(copy, filp, copy->nf_dst->nf_file,
> > > > > > @@ -1852,8 +1841,10 @@ nfsd4_copy(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
> > > > > >   	if (async_copy)
> > > > > >   		cleanup_async_copy(async_copy);
> > > > > >   	status = nfserrno(-ENOMEM);
> > > > > > -	if (nfsd4_ssc_is_inter(copy))
> > > > > > -		nfsd4_interssc_disconnect(copy->ss_mnt);
> > > > > > +	/*
> > > > > > +	 * source's vfsmount of inter-copy will be unmounted
> > > > > > +	 * by the laundromat
> > > > > > +	 */
> > > > > >   	goto out;
> > > > > >   }
> > > > > >   
> > > > > This looks reasonable at first glance, but I have some concerns with the
> > > > > refcounting around ss_mnt elsewhere in this code. nfsd4_ssc_setup_dul
> > > > > looks for an existing connection and bumps the ni->nsui_refcnt if it
> > > > > finds one.
> > > > > 
> > > > > But then later, nfsd4_cleanup_inter_ssc has a couple of cases where it
> > > > > just does a bare mntput:
> > > > > 
> > > > >          if (!nn) {
> > > > >                  mntput(ss_mnt);
> > > > >                  return;
> > > > >          }
> > > > > ...
> > > > >          if (!found) {
> > > > >                  mntput(ss_mnt);
> > > > >                  return;
> > > > >          }
> > > > > 
> > > > > The first one looks bogus. Can net_generic return NULL? If so how, and
> > > > > why is it not a problem elsewhere in the kernel?
> > > > 
> > > > it looks like net_generic can not fail, no where else check for NULL
> > > > so I will remove this check.
> > > > 
> > > > > 
> > > > > For the second case, if the ni is no longer on the list, where did the
> > > > > extra ss_mnt reference come from? Maybe that should be a WARN_ON or
> > > > > BUG_ON?
> > > > 
> > > > if ni is not found on the list then it's a bug somewhere so I will add
> > > > a BUG_ON on this.
> > > > 
> > > 
> > > Probably better to just WARN_ON and let any references leak in that
> > > case. A BUG_ON implies a panic in some environments, and it's best to
> > > avoid that unless there really is no choice.
> > 
> > WARN_ON also causes machines to boot that have panic_on_warn enabled.
> > Why not just handle the error and keep going?  Why panic at all?
> > 
> 
> Who the hell sets panic_on_warn (outside of testing environments)?

All cloud providers and anyone else that wants to "kill the system that
had a problem and have it reboot fast" in order to keep things working
overall.

> I'm
> suggesting a WARN_ON because not finding an entry at this point
> represents a bug that we'd want reported.

Your call, but we are generally discouraging adding new WARN_ON() for
anything that userspace could ever trigger.  And if userspace can't
trigger it, then it's a normal type of error that you need to handle
anyway, right?

Anyway, your call, just letting you know.

> The caller should hold a reference to the object that holds a vfsmount
> reference. It relies on that vfsmount to do a copy. If it's gone at this
> point where we're releasing that reference, then we're looking at a
> refcounting bug of some sort.

refcounting in the nfsd code, or outside of that?

> I would expect anyone who sets panic_on_warn to _desire_ a panic in this
> situation. After all, they asked for it. Presumably they want it to do
> some coredump analysis or something?
> 
> It is debatable whether the stack trace at this point would be helpful
> though, so you might consider a pr_warn or something less log-spammy.

If you can recover from it, then yeah, pr_warn() is usually best.

thanks,

greg k-h
Jeff Layton Dec. 12, 2022, 5:44 p.m. UTC | #13
On Mon, 2022-12-12 at 18:14 +0100, Greg KH wrote:
> On Mon, Dec 12, 2022 at 09:31:19AM -0500, Jeff Layton wrote:
> > On Mon, 2022-12-12 at 14:59 +0100, Greg KH wrote:
> > > On Mon, Dec 12, 2022 at 08:40:31AM -0500, Jeff Layton wrote:
> > > > On Mon, 2022-12-12 at 05:34 -0800, dai.ngo@oracle.com wrote:
> > > > > On 12/12/22 4:22 AM, Jeff Layton wrote:
> > > > > > On Sun, 2022-12-11 at 11:22 -0800, Dai Ngo wrote:
> > > > > > > Problem caused by source's vfsmount being unmounted but remains
> > > > > > > on the delayed unmount list. This happens when nfs42_ssc_open()
> > > > > > > return errors.
> > > > > > > Fixed by removing nfsd4_interssc_connect(), leave the vfsmount
> > > > > > > for the laundromat to unmount when idle time expires.
> > > > > > > 
> > > > > > > Reported-by: Xingyuan Mo <hdthky0@gmail.com>
> > > > > > > Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
> > > > > > > ---
> > > > > > >   fs/nfsd/nfs4proc.c | 23 +++++++----------------
> > > > > > >   1 file changed, 7 insertions(+), 16 deletions(-)
> > > > > > > 
> > > > > > > diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
> > > > > > > index 8beb2bc4c328..756e42cf0d01 100644
> > > > > > > --- a/fs/nfsd/nfs4proc.c
> > > > > > > +++ b/fs/nfsd/nfs4proc.c
> > > > > > > @@ -1463,13 +1463,6 @@ nfsd4_interssc_connect(struct nl4_server *nss, struct svc_rqst *rqstp,
> > > > > > >   	return status;
> > > > > > >   }
> > > > > > >   
> > > > > > > -static void
> > > > > > > -nfsd4_interssc_disconnect(struct vfsmount *ss_mnt)
> > > > > > > -{
> > > > > > > -	nfs_do_sb_deactive(ss_mnt->mnt_sb);
> > > > > > > -	mntput(ss_mnt);
> > > > > > > -}
> > > > > > > -
> > > > > > >   /*
> > > > > > >    * Verify COPY destination stateid.
> > > > > > >    *
> > > > > > > @@ -1572,11 +1565,6 @@ nfsd4_cleanup_inter_ssc(struct vfsmount *ss_mnt, struct file *filp,
> > > > > > >   {
> > > > > > >   }
> > > > > > >   
> > > > > > > -static void
> > > > > > > -nfsd4_interssc_disconnect(struct vfsmount *ss_mnt)
> > > > > > > -{
> > > > > > > -}
> > > > > > > -
> > > > > > >   static struct file *nfs42_ssc_open(struct vfsmount *ss_mnt,
> > > > > > >   				   struct nfs_fh *src_fh,
> > > > > > >   				   nfs4_stateid *stateid)
> > > > > > > @@ -1762,7 +1750,8 @@ static int nfsd4_do_async_copy(void *data)
> > > > > > >   		struct file *filp;
> > > > > > >   
> > > > > > >   		filp = nfs42_ssc_open(copy->ss_mnt, &copy->c_fh,
> > > > > > > -				      &copy->stateid);
> > > > > > > +					&copy->stateid);
> > > > > > > +
> > > > > > >   		if (IS_ERR(filp)) {
> > > > > > >   			switch (PTR_ERR(filp)) {
> > > > > > >   			case -EBADF:
> > > > > > > @@ -1771,7 +1760,7 @@ static int nfsd4_do_async_copy(void *data)
> > > > > > >   			default:
> > > > > > >   				nfserr = nfserr_offload_denied;
> > > > > > >   			}
> > > > > > > -			nfsd4_interssc_disconnect(copy->ss_mnt);
> > > > > > > +			/* ss_mnt will be unmounted by the laundromat */
> > > > > > >   			goto do_callback;
> > > > > > >   		}
> > > > > > >   		nfserr = nfsd4_do_copy(copy, filp, copy->nf_dst->nf_file,
> > > > > > > @@ -1852,8 +1841,10 @@ nfsd4_copy(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
> > > > > > >   	if (async_copy)
> > > > > > >   		cleanup_async_copy(async_copy);
> > > > > > >   	status = nfserrno(-ENOMEM);
> > > > > > > -	if (nfsd4_ssc_is_inter(copy))
> > > > > > > -		nfsd4_interssc_disconnect(copy->ss_mnt);
> > > > > > > +	/*
> > > > > > > +	 * source's vfsmount of inter-copy will be unmounted
> > > > > > > +	 * by the laundromat
> > > > > > > +	 */
> > > > > > >   	goto out;
> > > > > > >   }
> > > > > > >   
> > > > > > This looks reasonable at first glance, but I have some concerns with the
> > > > > > refcounting around ss_mnt elsewhere in this code. nfsd4_ssc_setup_dul
> > > > > > looks for an existing connection and bumps the ni->nsui_refcnt if it
> > > > > > finds one.
> > > > > > 
> > > > > > But then later, nfsd4_cleanup_inter_ssc has a couple of cases where it
> > > > > > just does a bare mntput:
> > > > > > 
> > > > > >          if (!nn) {
> > > > > >                  mntput(ss_mnt);
> > > > > >                  return;
> > > > > >          }
> > > > > > ...
> > > > > >          if (!found) {
> > > > > >                  mntput(ss_mnt);
> > > > > >                  return;
> > > > > >          }
> > > > > > 
> > > > > > The first one looks bogus. Can net_generic return NULL? If so how, and
> > > > > > why is it not a problem elsewhere in the kernel?
> > > > > 
> > > > > it looks like net_generic can not fail, no where else check for NULL
> > > > > so I will remove this check.
> > > > > 
> > > > > > 
> > > > > > For the second case, if the ni is no longer on the list, where did the
> > > > > > extra ss_mnt reference come from? Maybe that should be a WARN_ON or
> > > > > > BUG_ON?
> > > > > 
> > > > > if ni is not found on the list then it's a bug somewhere so I will add
> > > > > a BUG_ON on this.
> > > > > 
> > > > 
> > > > Probably better to just WARN_ON and let any references leak in that
> > > > case. A BUG_ON implies a panic in some environments, and it's best to
> > > > avoid that unless there really is no choice.
> > > 
> > > WARN_ON also causes machines to boot that have panic_on_warn enabled.
> > > Why not just handle the error and keep going?  Why panic at all?
> > > 
> > 
> > Who the hell sets panic_on_warn (outside of testing environments)?
> 
> All cloud providers and anyone else that wants to "kill the system that
> had a problem and have it reboot fast" in order to keep things working
> overall.
> 

If that's the case, then this situation would probably be one where a
cloud provider would want to crash it and come back. NFS grace periods
can suck though.

> > I'm
> > suggesting a WARN_ON because not finding an entry at this point
> > represents a bug that we'd want reported.
> 
> Your call, but we are generally discouraging adding new WARN_ON() for
> anything that userspace could ever trigger.  And if userspace can't
> trigger it, then it's a normal type of error that you need to handle
> anyway, right?
> 
> Anyway, your call, just letting you know.
> 

Understood.

> > The caller should hold a reference to the object that holds a vfsmount
> > reference. It relies on that vfsmount to do a copy. If it's gone at this
> > point where we're releasing that reference, then we're looking at a
> > refcounting bug of some sort.
> 
> refcounting in the nfsd code, or outside of that?
> 

It'd be in the nfsd code, but might affect the vfsmount refcount. Inter-
server copy is quite the tenuous house of cards. ;)

> > I would expect anyone who sets panic_on_warn to _desire_ a panic in this
> > situation. After all, they asked for it. Presumably they want it to do
> > some coredump analysis or something?
> > 
> > It is debatable whether the stack trace at this point would be helpful
> > though, so you might consider a pr_warn or something less log-spammy.
> 
> If you can recover from it, then yeah, pr_warn() is usually best.
> 

It does look like Dai went with pr_warn on his v2 patch.

We'd "recover" by leaking a vfsmount reference. The immediate crash
would be avoided, but it might make for interesting "fun" later when you
went to try and unmount the thing.

Hopefully, we're just being paranoid here and this situation never
happens.
Chuck Lever Dec. 12, 2022, 6:16 p.m. UTC | #14
> On Dec 12, 2022, at 12:44 PM, Jeff Layton <jlayton@kernel.org> wrote:
> 
> On Mon, 2022-12-12 at 18:14 +0100, Greg KH wrote:
>> On Mon, Dec 12, 2022 at 09:31:19AM -0500, Jeff Layton wrote:
>>> On Mon, 2022-12-12 at 14:59 +0100, Greg KH wrote:
>>>> On Mon, Dec 12, 2022 at 08:40:31AM -0500, Jeff Layton wrote:
>>>>> On Mon, 2022-12-12 at 05:34 -0800, dai.ngo@oracle.com wrote:
>>>>>> On 12/12/22 4:22 AM, Jeff Layton wrote:
>>>>>>> On Sun, 2022-12-11 at 11:22 -0800, Dai Ngo wrote:
>>>>>>>> Problem caused by source's vfsmount being unmounted but remains
>>>>>>>> on the delayed unmount list. This happens when nfs42_ssc_open()
>>>>>>>> return errors.
>>>>>>>> Fixed by removing nfsd4_interssc_connect(), leave the vfsmount
>>>>>>>> for the laundromat to unmount when idle time expires.
>>>>>>>> 
>>>>>>>> Reported-by: Xingyuan Mo <hdthky0@gmail.com>
>>>>>>>> Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
>>>>>>>> ---
>>>>>>>>  fs/nfsd/nfs4proc.c | 23 +++++++----------------
>>>>>>>>  1 file changed, 7 insertions(+), 16 deletions(-)
>>>>>>>> 
>>>>>>>> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
>>>>>>>> index 8beb2bc4c328..756e42cf0d01 100644
>>>>>>>> --- a/fs/nfsd/nfs4proc.c
>>>>>>>> +++ b/fs/nfsd/nfs4proc.c
>>>>>>>> @@ -1463,13 +1463,6 @@ nfsd4_interssc_connect(struct nl4_server *nss, struct svc_rqst *rqstp,
>>>>>>>>  	return status;
>>>>>>>>  }
>>>>>>>> 
>>>>>>>> -static void
>>>>>>>> -nfsd4_interssc_disconnect(struct vfsmount *ss_mnt)
>>>>>>>> -{
>>>>>>>> -	nfs_do_sb_deactive(ss_mnt->mnt_sb);
>>>>>>>> -	mntput(ss_mnt);
>>>>>>>> -}
>>>>>>>> -
>>>>>>>>  /*
>>>>>>>>   * Verify COPY destination stateid.
>>>>>>>>   *
>>>>>>>> @@ -1572,11 +1565,6 @@ nfsd4_cleanup_inter_ssc(struct vfsmount *ss_mnt, struct file *filp,
>>>>>>>>  {
>>>>>>>>  }
>>>>>>>> 
>>>>>>>> -static void
>>>>>>>> -nfsd4_interssc_disconnect(struct vfsmount *ss_mnt)
>>>>>>>> -{
>>>>>>>> -}
>>>>>>>> -
>>>>>>>>  static struct file *nfs42_ssc_open(struct vfsmount *ss_mnt,
>>>>>>>>  				   struct nfs_fh *src_fh,
>>>>>>>>  				   nfs4_stateid *stateid)
>>>>>>>> @@ -1762,7 +1750,8 @@ static int nfsd4_do_async_copy(void *data)
>>>>>>>>  		struct file *filp;
>>>>>>>> 
>>>>>>>>  		filp = nfs42_ssc_open(copy->ss_mnt, &copy->c_fh,
>>>>>>>> -				      &copy->stateid);
>>>>>>>> +					&copy->stateid);
>>>>>>>> +
>>>>>>>>  		if (IS_ERR(filp)) {
>>>>>>>>  			switch (PTR_ERR(filp)) {
>>>>>>>>  			case -EBADF:
>>>>>>>> @@ -1771,7 +1760,7 @@ static int nfsd4_do_async_copy(void *data)
>>>>>>>>  			default:
>>>>>>>>  				nfserr = nfserr_offload_denied;
>>>>>>>>  			}
>>>>>>>> -			nfsd4_interssc_disconnect(copy->ss_mnt);
>>>>>>>> +			/* ss_mnt will be unmounted by the laundromat */
>>>>>>>>  			goto do_callback;
>>>>>>>>  		}
>>>>>>>>  		nfserr = nfsd4_do_copy(copy, filp, copy->nf_dst->nf_file,
>>>>>>>> @@ -1852,8 +1841,10 @@ nfsd4_copy(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
>>>>>>>>  	if (async_copy)
>>>>>>>>  		cleanup_async_copy(async_copy);
>>>>>>>>  	status = nfserrno(-ENOMEM);
>>>>>>>> -	if (nfsd4_ssc_is_inter(copy))
>>>>>>>> -		nfsd4_interssc_disconnect(copy->ss_mnt);
>>>>>>>> +	/*
>>>>>>>> +	 * source's vfsmount of inter-copy will be unmounted
>>>>>>>> +	 * by the laundromat
>>>>>>>> +	 */
>>>>>>>>  	goto out;
>>>>>>>>  }
>>>>>>>> 
>>>>>>> This looks reasonable at first glance, but I have some concerns with the
>>>>>>> refcounting around ss_mnt elsewhere in this code. nfsd4_ssc_setup_dul
>>>>>>> looks for an existing connection and bumps the ni->nsui_refcnt if it
>>>>>>> finds one.
>>>>>>> 
>>>>>>> But then later, nfsd4_cleanup_inter_ssc has a couple of cases where it
>>>>>>> just does a bare mntput:
>>>>>>> 
>>>>>>>         if (!nn) {
>>>>>>>                 mntput(ss_mnt);
>>>>>>>                 return;
>>>>>>>         }
>>>>>>> ...
>>>>>>>         if (!found) {
>>>>>>>                 mntput(ss_mnt);
>>>>>>>                 return;
>>>>>>>         }
>>>>>>> 
>>>>>>> The first one looks bogus. Can net_generic return NULL? If so how, and
>>>>>>> why is it not a problem elsewhere in the kernel?
>>>>>> 
>>>>>> it looks like net_generic can not fail, no where else check for NULL
>>>>>> so I will remove this check.
>>>>>> 
>>>>>>> 
>>>>>>> For the second case, if the ni is no longer on the list, where did the
>>>>>>> extra ss_mnt reference come from? Maybe that should be a WARN_ON or
>>>>>>> BUG_ON?
>>>>>> 
>>>>>> if ni is not found on the list then it's a bug somewhere so I will add
>>>>>> a BUG_ON on this.
>>>>>> 
>>>>> 
>>>>> Probably better to just WARN_ON and let any references leak in that
>>>>> case. A BUG_ON implies a panic in some environments, and it's best to
>>>>> avoid that unless there really is no choice.
>>>> 
>>>> WARN_ON also causes machines to boot that have panic_on_warn enabled.
>>>> Why not just handle the error and keep going?  Why panic at all?
>>>> 
>>> 
>>> Who the hell sets panic_on_warn (outside of testing environments)?
>> 
>> All cloud providers and anyone else that wants to "kill the system that
>> had a problem and have it reboot fast" in order to keep things working
>> overall.
>> 
> 
> If that's the case, then this situation would probably be one where a
> cloud provider would want to crash it and come back. NFS grace periods
> can suck though.
> 
>>> I'm
>>> suggesting a WARN_ON because not finding an entry at this point
>>> represents a bug that we'd want reported.
>> 
>> Your call, but we are generally discouraging adding new WARN_ON() for
>> anything that userspace could ever trigger.  And if userspace can't
>> trigger it, then it's a normal type of error that you need to handle
>> anyway, right?
>> 
>> Anyway, your call, just letting you know.
>> 
> 
> Understood.
> 
>>> The caller should hold a reference to the object that holds a vfsmount
>>> reference. It relies on that vfsmount to do a copy. If it's gone at this
>>> point where we're releasing that reference, then we're looking at a
>>> refcounting bug of some sort.
>> 
>> refcounting in the nfsd code, or outside of that?
>> 
> 
> It'd be in the nfsd code, but might affect the vfsmount refcount. Inter-
> server copy is quite the tenuous house of cards. ;)
> 
>>> I would expect anyone who sets panic_on_warn to _desire_ a panic in this
>>> situation. After all, they asked for it. Presumably they want it to do
>>> some coredump analysis or something?
>>> 
>>> It is debatable whether the stack trace at this point would be helpful
>>> though, so you might consider a pr_warn or something less log-spammy.
>> 
>> If you can recover from it, then yeah, pr_warn() is usually best.
>> 
> 
> It does look like Dai went with pr_warn on his v2 patch.
> 
> We'd "recover" by leaking a vfsmount reference. The immediate crash
> would be avoided, but it might make for interesting "fun" later when you
> went to try and unmount the thing.

This is a red flag for me. If the leak prevents the system from
shutting down reliably, then we need to do something more than
a pr_warn(), I would think.


--
Chuck Lever
Jeff Layton Dec. 12, 2022, 6:38 p.m. UTC | #15
On Mon, 2022-12-12 at 18:16 +0000, Chuck Lever III wrote:
> 
> > On Dec 12, 2022, at 12:44 PM, Jeff Layton <jlayton@kernel.org> wrote:
> > 
> > On Mon, 2022-12-12 at 18:14 +0100, Greg KH wrote:
> > > On Mon, Dec 12, 2022 at 09:31:19AM -0500, Jeff Layton wrote:
> > > > On Mon, 2022-12-12 at 14:59 +0100, Greg KH wrote:
> > > > > On Mon, Dec 12, 2022 at 08:40:31AM -0500, Jeff Layton wrote:
> > > > > > On Mon, 2022-12-12 at 05:34 -0800, dai.ngo@oracle.com wrote:
> > > > > > > On 12/12/22 4:22 AM, Jeff Layton wrote:
> > > > > > > > On Sun, 2022-12-11 at 11:22 -0800, Dai Ngo wrote:
> > > > > > > > > Problem caused by source's vfsmount being unmounted but remains
> > > > > > > > > on the delayed unmount list. This happens when nfs42_ssc_open()
> > > > > > > > > return errors.
> > > > > > > > > Fixed by removing nfsd4_interssc_connect(), leave the vfsmount
> > > > > > > > > for the laundromat to unmount when idle time expires.
> > > > > > > > > 
> > > > > > > > > Reported-by: Xingyuan Mo <hdthky0@gmail.com>
> > > > > > > > > Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
> > > > > > > > > ---
> > > > > > > > >  fs/nfsd/nfs4proc.c | 23 +++++++----------------
> > > > > > > > >  1 file changed, 7 insertions(+), 16 deletions(-)
> > > > > > > > > 
> > > > > > > > > diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
> > > > > > > > > index 8beb2bc4c328..756e42cf0d01 100644
> > > > > > > > > --- a/fs/nfsd/nfs4proc.c
> > > > > > > > > +++ b/fs/nfsd/nfs4proc.c
> > > > > > > > > @@ -1463,13 +1463,6 @@ nfsd4_interssc_connect(struct nl4_server *nss, struct svc_rqst *rqstp,
> > > > > > > > >  	return status;
> > > > > > > > >  }
> > > > > > > > > 
> > > > > > > > > -static void
> > > > > > > > > -nfsd4_interssc_disconnect(struct vfsmount *ss_mnt)
> > > > > > > > > -{
> > > > > > > > > -	nfs_do_sb_deactive(ss_mnt->mnt_sb);
> > > > > > > > > -	mntput(ss_mnt);
> > > > > > > > > -}
> > > > > > > > > -
> > > > > > > > >  /*
> > > > > > > > >   * Verify COPY destination stateid.
> > > > > > > > >   *
> > > > > > > > > @@ -1572,11 +1565,6 @@ nfsd4_cleanup_inter_ssc(struct vfsmount *ss_mnt, struct file *filp,
> > > > > > > > >  {
> > > > > > > > >  }
> > > > > > > > > 
> > > > > > > > > -static void
> > > > > > > > > -nfsd4_interssc_disconnect(struct vfsmount *ss_mnt)
> > > > > > > > > -{
> > > > > > > > > -}
> > > > > > > > > -
> > > > > > > > >  static struct file *nfs42_ssc_open(struct vfsmount *ss_mnt,
> > > > > > > > >  				   struct nfs_fh *src_fh,
> > > > > > > > >  				   nfs4_stateid *stateid)
> > > > > > > > > @@ -1762,7 +1750,8 @@ static int nfsd4_do_async_copy(void *data)
> > > > > > > > >  		struct file *filp;
> > > > > > > > > 
> > > > > > > > >  		filp = nfs42_ssc_open(copy->ss_mnt, &copy->c_fh,
> > > > > > > > > -				      &copy->stateid);
> > > > > > > > > +					&copy->stateid);
> > > > > > > > > +
> > > > > > > > >  		if (IS_ERR(filp)) {
> > > > > > > > >  			switch (PTR_ERR(filp)) {
> > > > > > > > >  			case -EBADF:
> > > > > > > > > @@ -1771,7 +1760,7 @@ static int nfsd4_do_async_copy(void *data)
> > > > > > > > >  			default:
> > > > > > > > >  				nfserr = nfserr_offload_denied;
> > > > > > > > >  			}
> > > > > > > > > -			nfsd4_interssc_disconnect(copy->ss_mnt);
> > > > > > > > > +			/* ss_mnt will be unmounted by the laundromat */
> > > > > > > > >  			goto do_callback;
> > > > > > > > >  		}
> > > > > > > > >  		nfserr = nfsd4_do_copy(copy, filp, copy->nf_dst->nf_file,
> > > > > > > > > @@ -1852,8 +1841,10 @@ nfsd4_copy(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
> > > > > > > > >  	if (async_copy)
> > > > > > > > >  		cleanup_async_copy(async_copy);
> > > > > > > > >  	status = nfserrno(-ENOMEM);
> > > > > > > > > -	if (nfsd4_ssc_is_inter(copy))
> > > > > > > > > -		nfsd4_interssc_disconnect(copy->ss_mnt);
> > > > > > > > > +	/*
> > > > > > > > > +	 * source's vfsmount of inter-copy will be unmounted
> > > > > > > > > +	 * by the laundromat
> > > > > > > > > +	 */
> > > > > > > > >  	goto out;
> > > > > > > > >  }
> > > > > > > > > 
> > > > > > > > This looks reasonable at first glance, but I have some concerns with the
> > > > > > > > refcounting around ss_mnt elsewhere in this code. nfsd4_ssc_setup_dul
> > > > > > > > looks for an existing connection and bumps the ni->nsui_refcnt if it
> > > > > > > > finds one.
> > > > > > > > 
> > > > > > > > But then later, nfsd4_cleanup_inter_ssc has a couple of cases where it
> > > > > > > > just does a bare mntput:
> > > > > > > > 
> > > > > > > >         if (!nn) {
> > > > > > > >                 mntput(ss_mnt);
> > > > > > > >                 return;
> > > > > > > >         }
> > > > > > > > ...
> > > > > > > >         if (!found) {
> > > > > > > >                 mntput(ss_mnt);
> > > > > > > >                 return;
> > > > > > > >         }
> > > > > > > > 
> > > > > > > > The first one looks bogus. Can net_generic return NULL? If so how, and
> > > > > > > > why is it not a problem elsewhere in the kernel?
> > > > > > > 
> > > > > > > it looks like net_generic can not fail, no where else check for NULL
> > > > > > > so I will remove this check.
> > > > > > > 
> > > > > > > > 
> > > > > > > > For the second case, if the ni is no longer on the list, where did the
> > > > > > > > extra ss_mnt reference come from? Maybe that should be a WARN_ON or
> > > > > > > > BUG_ON?
> > > > > > > 
> > > > > > > if ni is not found on the list then it's a bug somewhere so I will add
> > > > > > > a BUG_ON on this.
> > > > > > > 
> > > > > > 
> > > > > > Probably better to just WARN_ON and let any references leak in that
> > > > > > case. A BUG_ON implies a panic in some environments, and it's best to
> > > > > > avoid that unless there really is no choice.
> > > > > 
> > > > > WARN_ON also causes machines to boot that have panic_on_warn enabled.
> > > > > Why not just handle the error and keep going?  Why panic at all?
> > > > > 
> > > > 
> > > > Who the hell sets panic_on_warn (outside of testing environments)?
> > > 
> > > All cloud providers and anyone else that wants to "kill the system that
> > > had a problem and have it reboot fast" in order to keep things working
> > > overall.
> > > 
> > 
> > If that's the case, then this situation would probably be one where a
> > cloud provider would want to crash it and come back. NFS grace periods
> > can suck though.
> > 
> > > > I'm
> > > > suggesting a WARN_ON because not finding an entry at this point
> > > > represents a bug that we'd want reported.
> > > 
> > > Your call, but we are generally discouraging adding new WARN_ON() for
> > > anything that userspace could ever trigger.  And if userspace can't
> > > trigger it, then it's a normal type of error that you need to handle
> > > anyway, right?
> > > 
> > > Anyway, your call, just letting you know.
> > > 
> > 
> > Understood.
> > 
> > > > The caller should hold a reference to the object that holds a vfsmount
> > > > reference. It relies on that vfsmount to do a copy. If it's gone at this
> > > > point where we're releasing that reference, then we're looking at a
> > > > refcounting bug of some sort.
> > > 
> > > refcounting in the nfsd code, or outside of that?
> > > 
> > 
> > It'd be in the nfsd code, but might affect the vfsmount refcount. Inter-
> > server copy is quite the tenuous house of cards. ;)
> > 
> > > > I would expect anyone who sets panic_on_warn to _desire_ a panic in this
> > > > situation. After all, they asked for it. Presumably they want it to do
> > > > some coredump analysis or something?
> > > > 
> > > > It is debatable whether the stack trace at this point would be helpful
> > > > though, so you might consider a pr_warn or something less log-spammy.
> > > 
> > > If you can recover from it, then yeah, pr_warn() is usually best.
> > > 
> > 
> > It does look like Dai went with pr_warn on his v2 patch.
> > 
> > We'd "recover" by leaking a vfsmount reference. The immediate crash
> > would be avoided, but it might make for interesting "fun" later when you
> > went to try and unmount the thing.
> 
> This is a red flag for me. If the leak prevents the system from
> shutting down reliably, then we need to do something more than
> a pr_warn(), I would think.
> 

Sorry, I should correct myself.

We wouldn't (necessarily) leak a vfsmount reference. If the entry was no
longer on the list, then presumably it has already been cleaned up and
the vfsmount reference put.

It's still a bug though since we _should_ still have a reference to the
nfsd4_ssc_umount_item at this point. So this is really just a potential
use-after-free.

FWIW, the object handling here is somewhat weird: the copy operation
holds a reference to the nfsd4_ssc_umount_item but passes around a
pointer to the vfsmount.

I have to wonder if it'd be cleaner to have nfsd4_setup_inter_ssc pass
back a pointer to the nfsd4_ssc_umount_item, so you could pass that to
nfsd4_cleanup_inter_ssc and skip searching for it again at cleanup time.
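
Roughly, I'm picturing something like the sketch below. This is untested
and purely illustrative -- the struct layout, the locking, and the helper
names are simplified stand-ins I made up for this mail, not the actual
fs/nfsd code:

    /* Simplified stand-in for the real umount item -- illustration only. */
    struct nfsd4_ssc_umount_item {
            struct list_head        nsui_list;      /* delayed unmount list */
            struct vfsmount         *nsui_vfsmount;
            unsigned int            nsui_refcnt;
            unsigned long           nsui_expire;    /* when the laundromat may act */
    };

    /*
     * Setup hands back the umount item itself instead of the bare
     * vfsmount, so the copy keeps a pointer to the object it actually
     * holds a reference on.
     */
    static __be32
    setup_inter_ssc_sketch(struct nfsd4_ssc_umount_item **nsuip)
    {
            /* mount (or find) the item, bump nsui_refcnt, set *nsuip */
            return 0;
    }

    /*
     * Cleanup then drops the reference directly; no list search, so the
     * "entry not found" case cannot arise in the first place.
     */
    static void
    cleanup_inter_ssc_sketch(struct nfsd4_ssc_umount_item *nsui,
                             spinlock_t *ssc_lock, unsigned long timeout)
    {
            spin_lock(ssc_lock);
            nsui->nsui_refcnt--;
            nsui->nsui_expire = jiffies + timeout;
            spin_unlock(ssc_lock);
            /* the laundromat unmounts nsui->nsui_vfsmount once it's idle */
    }

The copy would then carry the nfsd4_ssc_umount_item pointer instead of
copy->ss_mnt, and both the error and success paths would funnel through
the same put.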
Dai Ngo Dec. 12, 2022, 7:16 p.m. UTC | #16
On 12/12/22 10:38 AM, Jeff Layton wrote:
> On Mon, 2022-12-12 at 18:16 +0000, Chuck Lever III wrote:
>>> On Dec 12, 2022, at 12:44 PM, Jeff Layton <jlayton@kernel.org> wrote:
>>>
>>> On Mon, 2022-12-12 at 18:14 +0100, Greg KH wrote:
>>>> On Mon, Dec 12, 2022 at 09:31:19AM -0500, Jeff Layton wrote:
>>>>> On Mon, 2022-12-12 at 14:59 +0100, Greg KH wrote:
>>>>>> On Mon, Dec 12, 2022 at 08:40:31AM -0500, Jeff Layton wrote:
>>>>>>> On Mon, 2022-12-12 at 05:34 -0800, dai.ngo@oracle.com wrote:
>>>>>>>> On 12/12/22 4:22 AM, Jeff Layton wrote:
>>>>>>>>> On Sun, 2022-12-11 at 11:22 -0800, Dai Ngo wrote:
>>>>>>>>>> Problem caused by source's vfsmount being unmounted but remains
>>>>>>>>>> on the delayed unmount list. This happens when nfs42_ssc_open()
>>>>>>>>>> return errors.
>>>>>>>>>> Fixed by removing nfsd4_interssc_connect(), leave the vfsmount
>>>>>>>>>> for the laundromat to unmount when idle time expires.
>>>>>>>>>>
>>>>>>>>>> Reported-by: Xingyuan Mo <hdthky0@gmail.com>
>>>>>>>>>> Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
>>>>>>>>>> ---
>>>>>>>>>>   fs/nfsd/nfs4proc.c | 23 +++++++----------------
>>>>>>>>>>   1 file changed, 7 insertions(+), 16 deletions(-)
>>>>>>>>>>
>>>>>>>>>> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
>>>>>>>>>> index 8beb2bc4c328..756e42cf0d01 100644
>>>>>>>>>> --- a/fs/nfsd/nfs4proc.c
>>>>>>>>>> +++ b/fs/nfsd/nfs4proc.c
>>>>>>>>>> @@ -1463,13 +1463,6 @@ nfsd4_interssc_connect(struct nl4_server *nss, struct svc_rqst *rqstp,
>>>>>>>>>>   	return status;
>>>>>>>>>>   }
>>>>>>>>>>
>>>>>>>>>> -static void
>>>>>>>>>> -nfsd4_interssc_disconnect(struct vfsmount *ss_mnt)
>>>>>>>>>> -{
>>>>>>>>>> -	nfs_do_sb_deactive(ss_mnt->mnt_sb);
>>>>>>>>>> -	mntput(ss_mnt);
>>>>>>>>>> -}
>>>>>>>>>> -
>>>>>>>>>>   /*
>>>>>>>>>>    * Verify COPY destination stateid.
>>>>>>>>>>    *
>>>>>>>>>> @@ -1572,11 +1565,6 @@ nfsd4_cleanup_inter_ssc(struct vfsmount *ss_mnt, struct file *filp,
>>>>>>>>>>   {
>>>>>>>>>>   }
>>>>>>>>>>
>>>>>>>>>> -static void
>>>>>>>>>> -nfsd4_interssc_disconnect(struct vfsmount *ss_mnt)
>>>>>>>>>> -{
>>>>>>>>>> -}
>>>>>>>>>> -
>>>>>>>>>>   static struct file *nfs42_ssc_open(struct vfsmount *ss_mnt,
>>>>>>>>>>   				   struct nfs_fh *src_fh,
>>>>>>>>>>   				   nfs4_stateid *stateid)
>>>>>>>>>> @@ -1762,7 +1750,8 @@ static int nfsd4_do_async_copy(void *data)
>>>>>>>>>>   		struct file *filp;
>>>>>>>>>>
>>>>>>>>>>   		filp = nfs42_ssc_open(copy->ss_mnt, &copy->c_fh,
>>>>>>>>>> -				      &copy->stateid);
>>>>>>>>>> +					&copy->stateid);
>>>>>>>>>> +
>>>>>>>>>>   		if (IS_ERR(filp)) {
>>>>>>>>>>   			switch (PTR_ERR(filp)) {
>>>>>>>>>>   			case -EBADF:
>>>>>>>>>> @@ -1771,7 +1760,7 @@ static int nfsd4_do_async_copy(void *data)
>>>>>>>>>>   			default:
>>>>>>>>>>   				nfserr = nfserr_offload_denied;
>>>>>>>>>>   			}
>>>>>>>>>> -			nfsd4_interssc_disconnect(copy->ss_mnt);
>>>>>>>>>> +			/* ss_mnt will be unmounted by the laundromat */
>>>>>>>>>>   			goto do_callback;
>>>>>>>>>>   		}
>>>>>>>>>>   		nfserr = nfsd4_do_copy(copy, filp, copy->nf_dst->nf_file,
>>>>>>>>>> @@ -1852,8 +1841,10 @@ nfsd4_copy(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
>>>>>>>>>>   	if (async_copy)
>>>>>>>>>>   		cleanup_async_copy(async_copy);
>>>>>>>>>>   	status = nfserrno(-ENOMEM);
>>>>>>>>>> -	if (nfsd4_ssc_is_inter(copy))
>>>>>>>>>> -		nfsd4_interssc_disconnect(copy->ss_mnt);
>>>>>>>>>> +	/*
>>>>>>>>>> +	 * source's vfsmount of inter-copy will be unmounted
>>>>>>>>>> +	 * by the laundromat
>>>>>>>>>> +	 */
>>>>>>>>>>   	goto out;
>>>>>>>>>>   }
>>>>>>>>>>
>>>>>>>>> This looks reasonable at first glance, but I have some concerns with the
>>>>>>>>> refcounting around ss_mnt elsewhere in this code. nfsd4_ssc_setup_dul
>>>>>>>>> looks for an existing connection and bumps the ni->nsui_refcnt if it
>>>>>>>>> finds one.
>>>>>>>>>
>>>>>>>>> But then later, nfsd4_cleanup_inter_ssc has a couple of cases where it
>>>>>>>>> just does a bare mntput:
>>>>>>>>>
>>>>>>>>>          if (!nn) {
>>>>>>>>>                  mntput(ss_mnt);
>>>>>>>>>                  return;
>>>>>>>>>          }
>>>>>>>>> ...
>>>>>>>>>          if (!found) {
>>>>>>>>>                  mntput(ss_mnt);
>>>>>>>>>                  return;
>>>>>>>>>          }
>>>>>>>>>
>>>>>>>>> The first one looks bogus. Can net_generic return NULL? If so how, and
>>>>>>>>> why is it not a problem elsewhere in the kernel?
>>>>>>>> it looks like net_generic can not fail, no where else check for NULL
>>>>>>>> so I will remove this check.
>>>>>>>>
>>>>>>>>> For the second case, if the ni is no longer on the list, where did the
>>>>>>>>> extra ss_mnt reference come from? Maybe that should be a WARN_ON or
>>>>>>>>> BUG_ON?
>>>>>>>> if ni is not found on the list then it's a bug somewhere so I will add
>>>>>>>> a BUG_ON on this.
>>>>>>>>
>>>>>>> Probably better to just WARN_ON and let any references leak in that
>>>>>>> case. A BUG_ON implies a panic in some environments, and it's best to
>>>>>>> avoid that unless there really is no choice.
>>>>>> WARN_ON also causes machines to boot that have panic_on_warn enabled.
>>>>>> Why not just handle the error and keep going?  Why panic at all?
>>>>>>
>>>>> Who the hell sets panic_on_warn (outside of testing environments)?
>>>> All cloud providers and anyone else that wants to "kill the system that
>>>> had a problem and have it reboot fast" in order to keep things working
>>>> overall.
>>>>
>>> If that's the case, then this situation would probably be one where a
>>> cloud provider would want to crash it and come back. NFS grace periods
>>> can suck though.
>>>
>>>>> I'm
>>>>> suggesting a WARN_ON because not finding an entry at this point
>>>>> represents a bug that we'd want reported.
>>>> Your call, but we are generally discouraging adding new WARN_ON() for
>>>> anything that userspace could ever trigger.  And if userspace can't
>>>> trigger it, then it's a normal type of error that you need to handle
>>>> anyway, right?
>>>>
>>>> Anyway, your call, just letting you know.
>>>>
>>> Understood.
>>>
>>>>> The caller should hold a reference to the object that holds a vfsmount
>>>>> reference. It relies on that vfsmount to do a copy. If it's gone at this
>>>>> point where we're releasing that reference, then we're looking at a
>>>>> refcounting bug of some sort.
>>>> refcounting in the nfsd code, or outside of that?
>>>>
>>> It'd be in the nfsd code, but might affect the vfsmount refcount. Inter-
>>> server copy is quite the tenuous house of cards. ;)
>>>
>>>>> I would expect anyone who sets panic_on_warn to _desire_ a panic in this
>>>>> situation. After all, they asked for it. Presumably they want it to do
>>>>> some coredump analysis or something?
>>>>>
>>>>> It is debatable whether the stack trace at this point would be helpful
>>>>> though, so you might consider a pr_warn or something less log-spammy.
>>>> If you can recover from it, then yeah, pr_warn() is usually best.
>>>>
>>> It does look like Dai went with pr_warn on his v2 patch.
>>>
>>> We'd "recover" by leaking a vfsmount reference. The immediate crash
>>> would be avoided, but it might make for interesting "fun" later when you
>>> went to try and unmount the thing.
>> This is a red flag for me. If the leak prevents the system from
>> shutting down reliably, then we need to do something more than
>> a pr_warn(), I would think.
>>
> Sorry, I should correct myself.
>
> We wouldn't (necessarily) leak a vfsmount reference. If the entry was no
> longer on the list, then presumably it has already been cleaned up and
> the vfsmount reference put.

I think the issue here is not the vfsmount reference count. The issue is
that we could not find an nfsd4_ssc_umount_item on the list that matches
the vfsmount ss_mnt. So the question is: what should we do in this case?

Prior to this patch, when we hit this scenario we just went ahead and
unmounted the ss_mnt right there, since it won't be unmounted by the
laundromat (it's not on the delayed unmount list).

With this patch, we don't even unmount the ss_mnt; we just do a pr_warn.

I'd prefer to go back to the previous code: do the unmount and also
emit a pr_warn.
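
Roughly like this (sketch only -- the exact placement in
nfsd4_cleanup_inter_ssc and the warning text are placeholders, not the
final wording):

    if (!found) {
            /*
             * Not on the delayed unmount list, so the laundromat will
             * never unmount it: warn and drop the reference here.
             */
            pr_warn("nfsd: %s: no nfsd4_ssc_umount_item for vfsmount\n",
                    __func__);
            mntput(ss_mnt);
            return;
    }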

> It's still a bug though since we _should_ still have a reference to the
> nfsd4_ssc_umount_item at this point. So this is really just a potential
> use-after-free.

The ss_mnt might still have a reference on the nfsd4_ssc_umount_item,
but we just can't find it on the list. Even though the possibility of
this happening is slim to none, we still have to check for it.

> FWIW, the object handling here is somewhat weird as the copy operation
> holds a reference to the nfsd4_ssc_umount_item but passes around a
> pointer to the vfsmount
>
> I have to wonder if it'd be cleaner to have nfsd4_setup_inter_ssc pass
> back a pointer to the nfsd4_ssc_umount_item, so you could pass that to
> nfsd4_cleanup_inter_ssc and skip searching for it again at cleanup time.

Yes, I think the approach of returning a pointer to the
nfsd4_ssc_umount_item would be better.  We won't have to deal with the
situation where we can't find an item on the list (even though that
almost never happens).

Can we do this enhancement after fixing this use-after-free problem, in
a separate patch series?


-Dai
Chuck Lever Dec. 12, 2022, 7:28 p.m. UTC | #17
> On Dec 12, 2022, at 2:16 PM, Dai Ngo <dai.ngo@oracle.com> wrote:
> 
> 
> On 12/12/22 10:38 AM, Jeff Layton wrote:
>> On Mon, 2022-12-12 at 18:16 +0000, Chuck Lever III wrote:
>>>> On Dec 12, 2022, at 12:44 PM, Jeff Layton <jlayton@kernel.org> wrote:
>>>> 
>>>> On Mon, 2022-12-12 at 18:14 +0100, Greg KH wrote:
>>>>> On Mon, Dec 12, 2022 at 09:31:19AM -0500, Jeff Layton wrote:
>>>>>> On Mon, 2022-12-12 at 14:59 +0100, Greg KH wrote:
>>>>>>> On Mon, Dec 12, 2022 at 08:40:31AM -0500, Jeff Layton wrote:
>>>>>>>> On Mon, 2022-12-12 at 05:34 -0800, dai.ngo@oracle.com wrote:
>>>>>>>>> On 12/12/22 4:22 AM, Jeff Layton wrote:
>>>>>>>>>> On Sun, 2022-12-11 at 11:22 -0800, Dai Ngo wrote:
>>>>>>>>>>> Problem caused by source's vfsmount being unmounted but remains
>>>>>>>>>>> on the delayed unmount list. This happens when nfs42_ssc_open()
>>>>>>>>>>> return errors.
>>>>>>>>>>> Fixed by removing nfsd4_interssc_connect(), leave the vfsmount
>>>>>>>>>>> for the laundromat to unmount when idle time expires.
>>>>>>>>>>> 
>>>>>>>>>>> Reported-by: Xingyuan Mo <hdthky0@gmail.com>
>>>>>>>>>>> Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
>>>>>>>>>>> ---
>>>>>>>>>>>  fs/nfsd/nfs4proc.c | 23 +++++++----------------
>>>>>>>>>>>  1 file changed, 7 insertions(+), 16 deletions(-)
>>>>>>>>>>> 
>>>>>>>>>>> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
>>>>>>>>>>> index 8beb2bc4c328..756e42cf0d01 100644
>>>>>>>>>>> --- a/fs/nfsd/nfs4proc.c
>>>>>>>>>>> +++ b/fs/nfsd/nfs4proc.c
>>>>>>>>>>> @@ -1463,13 +1463,6 @@ nfsd4_interssc_connect(struct nl4_server *nss, struct svc_rqst *rqstp,
>>>>>>>>>>>  	return status;
>>>>>>>>>>>  }
>>>>>>>>>>> 
>>>>>>>>>>> -static void
>>>>>>>>>>> -nfsd4_interssc_disconnect(struct vfsmount *ss_mnt)
>>>>>>>>>>> -{
>>>>>>>>>>> -	nfs_do_sb_deactive(ss_mnt->mnt_sb);
>>>>>>>>>>> -	mntput(ss_mnt);
>>>>>>>>>>> -}
>>>>>>>>>>> -
>>>>>>>>>>>  /*
>>>>>>>>>>>   * Verify COPY destination stateid.
>>>>>>>>>>>   *
>>>>>>>>>>> @@ -1572,11 +1565,6 @@ nfsd4_cleanup_inter_ssc(struct vfsmount *ss_mnt, struct file *filp,
>>>>>>>>>>>  {
>>>>>>>>>>>  }
>>>>>>>>>>> 
>>>>>>>>>>> -static void
>>>>>>>>>>> -nfsd4_interssc_disconnect(struct vfsmount *ss_mnt)
>>>>>>>>>>> -{
>>>>>>>>>>> -}
>>>>>>>>>>> -
>>>>>>>>>>>  static struct file *nfs42_ssc_open(struct vfsmount *ss_mnt,
>>>>>>>>>>>  				   struct nfs_fh *src_fh,
>>>>>>>>>>>  				   nfs4_stateid *stateid)
>>>>>>>>>>> @@ -1762,7 +1750,8 @@ static int nfsd4_do_async_copy(void *data)
>>>>>>>>>>>  		struct file *filp;
>>>>>>>>>>> 
>>>>>>>>>>>  		filp = nfs42_ssc_open(copy->ss_mnt, &copy->c_fh,
>>>>>>>>>>> -				      &copy->stateid);
>>>>>>>>>>> +					&copy->stateid);
>>>>>>>>>>> +
>>>>>>>>>>>  		if (IS_ERR(filp)) {
>>>>>>>>>>>  			switch (PTR_ERR(filp)) {
>>>>>>>>>>>  			case -EBADF:
>>>>>>>>>>> @@ -1771,7 +1760,7 @@ static int nfsd4_do_async_copy(void *data)
>>>>>>>>>>>  			default:
>>>>>>>>>>>  				nfserr = nfserr_offload_denied;
>>>>>>>>>>>  			}
>>>>>>>>>>> -			nfsd4_interssc_disconnect(copy->ss_mnt);
>>>>>>>>>>> +			/* ss_mnt will be unmounted by the laundromat */
>>>>>>>>>>>  			goto do_callback;
>>>>>>>>>>>  		}
>>>>>>>>>>>  		nfserr = nfsd4_do_copy(copy, filp, copy->nf_dst->nf_file,
>>>>>>>>>>> @@ -1852,8 +1841,10 @@ nfsd4_copy(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
>>>>>>>>>>>  	if (async_copy)
>>>>>>>>>>>  		cleanup_async_copy(async_copy);
>>>>>>>>>>>  	status = nfserrno(-ENOMEM);
>>>>>>>>>>> -	if (nfsd4_ssc_is_inter(copy))
>>>>>>>>>>> -		nfsd4_interssc_disconnect(copy->ss_mnt);
>>>>>>>>>>> +	/*
>>>>>>>>>>> +	 * source's vfsmount of inter-copy will be unmounted
>>>>>>>>>>> +	 * by the laundromat
>>>>>>>>>>> +	 */
>>>>>>>>>>>  	goto out;
>>>>>>>>>>>  }
>>>>>>>>>>> 
>>>>>>>>>> This looks reasonable at first glance, but I have some concerns with the
>>>>>>>>>> refcounting around ss_mnt elsewhere in this code. nfsd4_ssc_setup_dul
>>>>>>>>>> looks for an existing connection and bumps the ni->nsui_refcnt if it
>>>>>>>>>> finds one.
>>>>>>>>>> 
>>>>>>>>>> But then later, nfsd4_cleanup_inter_ssc has a couple of cases where it
>>>>>>>>>> just does a bare mntput:
>>>>>>>>>> 
>>>>>>>>>>         if (!nn) {
>>>>>>>>>>                 mntput(ss_mnt);
>>>>>>>>>>                 return;
>>>>>>>>>>         }
>>>>>>>>>> ...
>>>>>>>>>>         if (!found) {
>>>>>>>>>>                 mntput(ss_mnt);
>>>>>>>>>>                 return;
>>>>>>>>>>         }
>>>>>>>>>> 
>>>>>>>>>> The first one looks bogus. Can net_generic return NULL? If so how, and
>>>>>>>>>> why is it not a problem elsewhere in the kernel?
>>>>>>>>> it looks like net_generic can not fail, no where else check for NULL
>>>>>>>>> so I will remove this check.
>>>>>>>>> 
>>>>>>>>>> For the second case, if the ni is no longer on the list, where did the
>>>>>>>>>> extra ss_mnt reference come from? Maybe that should be a WARN_ON or
>>>>>>>>>> BUG_ON?
>>>>>>>>> if ni is not found on the list then it's a bug somewhere so I will add
>>>>>>>>> a BUG_ON on this.
>>>>>>>>> 
>>>>>>>> Probably better to just WARN_ON and let any references leak in that
>>>>>>>> case. A BUG_ON implies a panic in some environments, and it's best to
>>>>>>>> avoid that unless there really is no choice.
>>>>>>> WARN_ON also causes machines to boot that have panic_on_warn enabled.
>>>>>>> Why not just handle the error and keep going?  Why panic at all?
>>>>>>> 
>>>>>> Who the hell sets panic_on_warn (outside of testing environments)?
>>>>> All cloud providers and anyone else that wants to "kill the system that
>>>>> had a problem and have it reboot fast" in order to keep things working
>>>>> overall.
>>>>> 
>>>> If that's the case, then this situation would probably be one where a
>>>> cloud provider would want to crash it and come back. NFS grace periods
>>>> can suck though.
>>>> 
>>>>>> I'm
>>>>>> suggesting a WARN_ON because not finding an entry at this point
>>>>>> represents a bug that we'd want reported.
>>>>> Your call, but we are generally discouraging adding new WARN_ON() for
>>>>> anything that userspace could ever trigger.  And if userspace can't
>>>>> trigger it, then it's a normal type of error that you need to handle
>>>>> anyway, right?
>>>>> 
>>>>> Anyway, your call, just letting you know.
>>>>> 
>>>> Understood.
>>>> 
>>>>>> The caller should hold a reference to the object that holds a vfsmount
>>>>>> reference. It relies on that vfsmount to do a copy. If it's gone at this
>>>>>> point where we're releasing that reference, then we're looking at a
>>>>>> refcounting bug of some sort.
>>>>> refcounting in the nfsd code, or outside of that?
>>>>> 
>>>> It'd be in the nfsd code, but might affect the vfsmount refcount. Inter-
>>>> server copy is quite the tenuous house of cards. ;)
>>>> 
>>>>>> I would expect anyone who sets panic_on_warn to _desire_ a panic in this
>>>>>> situation. After all, they asked for it. Presumably they want it to do
>>>>>> some coredump analysis or something?
>>>>>> 
>>>>>> It is debatable whether the stack trace at this point would be helpful
>>>>>> though, so you might consider a pr_warn or something less log-spammy.
>>>>> If you can recover from it, then yeah, pr_warn() is usually best.
>>>>> 
>>>> It does look like Dai went with pr_warn on his v2 patch.
>>>> 
>>>> We'd "recover" by leaking a vfsmount reference. The immediate crash
>>>> would be avoided, but it might make for interesting "fun" later when you
>>>> went to try and unmount the thing.
>>> This is a red flag for me. If the leak prevents the system from
>>> shutting down reliably, then we need to do something more than
>>> a pr_warn(), I would think.
>>> 
>> Sorry, I should correct myself.
>> 
>> We wouldn't (necessarily) leak a vfsmount reference. If the entry was no
>> longer on the list, then presumably it has already been cleaned up and
>> the vfsmount reference put.
> 
> I think the issue here is not vfsmount reference count. The issue is that
> we could not find a nfsd4_ssc_umount_item on the list that matches the
> vfsmount ss_mnt. So the question is what should we do in this case?
> 
> Prior to this patch, when we hit this scenario we just go ahead and
> unmount the ss_mnt there since it won't be unmounted by the laundromat
> (it's not on the delayed unmount list).
> 
> With this patch, we don't even unmount the ss_mnt, we just do a pr_warn.
> 
> I'd prefer to go back to the previous code to do the unmount and also
> do a pr_warn.
> 
>> It's still a bug though since we _should_ still have a reference to the
>> nfsd4_ssc_umount_item at this point. So this is really just a potential
>> use-after-free.
> 
> The ss_mnt still might have a reference on the nfsd4_ssc_umount_item
> but we just can't find it on the list. Even though the possibility for
> this to happen is from slim to none, we still have to check for it.
> 
>> FWIW, the object handling here is somewhat weird as the copy operation
>> holds a reference to the nfsd4_ssc_umount_item but passes around a
>> pointer to the vfsmount
>> 
>> I have to wonder if it'd be cleaner to have nfsd4_setup_inter_ssc pass
>> back a pointer to the nfsd4_ssc_umount_item, so you could pass that to
>> nfsd4_cleanup_inter_ssc and skip searching for it again at cleanup time.
> 
> Yes, I think returning a pointer to the nfsd4_ssc_umount_item approach
> would be better.  We won't have to deal with the situation where we can't
> find an item on the list (even though it almost never happen).
> 
> Can we do this enhancement after fixing this use-after-free problem, in
> a separate patch series?

Is there a reason not fix it correctly now?

I'd rather not merge a fix that leaves the possibility of a leak.

--
Chuck Lever
Dai Ngo Dec. 12, 2022, 7:45 p.m. UTC | #18
On 12/12/22 11:28 AM, Chuck Lever III wrote:
>
>> On Dec 12, 2022, at 2:16 PM, Dai Ngo <dai.ngo@oracle.com> wrote:
>>
>>
>> On 12/12/22 10:38 AM, Jeff Layton wrote:
>>> On Mon, 2022-12-12 at 18:16 +0000, Chuck Lever III wrote:
>>>>> On Dec 12, 2022, at 12:44 PM, Jeff Layton <jlayton@kernel.org> wrote:
>>>>>
>>>>> On Mon, 2022-12-12 at 18:14 +0100, Greg KH wrote:
>>>>>> On Mon, Dec 12, 2022 at 09:31:19AM -0500, Jeff Layton wrote:
>>>>>>> On Mon, 2022-12-12 at 14:59 +0100, Greg KH wrote:
>>>>>>>> On Mon, Dec 12, 2022 at 08:40:31AM -0500, Jeff Layton wrote:
>>>>>>>>> On Mon, 2022-12-12 at 05:34 -0800, dai.ngo@oracle.com wrote:
>>>>>>>>>> On 12/12/22 4:22 AM, Jeff Layton wrote:
>>>>>>>>>>> On Sun, 2022-12-11 at 11:22 -0800, Dai Ngo wrote:
>>>>>>>>>>>> Problem caused by source's vfsmount being unmounted but remains
>>>>>>>>>>>> on the delayed unmount list. This happens when nfs42_ssc_open()
>>>>>>>>>>>> return errors.
>>>>>>>>>>>> Fixed by removing nfsd4_interssc_connect(), leave the vfsmount
>>>>>>>>>>>> for the laundromat to unmount when idle time expires.
>>>>>>>>>>>>
>>>>>>>>>>>> Reported-by: Xingyuan Mo <hdthky0@gmail.com>
>>>>>>>>>>>> Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
>>>>>>>>>>>> ---
>>>>>>>>>>>>   fs/nfsd/nfs4proc.c | 23 +++++++----------------
>>>>>>>>>>>>   1 file changed, 7 insertions(+), 16 deletions(-)
>>>>>>>>>>>>
>>>>>>>>>>>> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
>>>>>>>>>>>> index 8beb2bc4c328..756e42cf0d01 100644
>>>>>>>>>>>> --- a/fs/nfsd/nfs4proc.c
>>>>>>>>>>>> +++ b/fs/nfsd/nfs4proc.c
>>>>>>>>>>>> @@ -1463,13 +1463,6 @@ nfsd4_interssc_connect(struct nl4_server *nss, struct svc_rqst *rqstp,
>>>>>>>>>>>>   	return status;
>>>>>>>>>>>>   }
>>>>>>>>>>>>
>>>>>>>>>>>> -static void
>>>>>>>>>>>> -nfsd4_interssc_disconnect(struct vfsmount *ss_mnt)
>>>>>>>>>>>> -{
>>>>>>>>>>>> -	nfs_do_sb_deactive(ss_mnt->mnt_sb);
>>>>>>>>>>>> -	mntput(ss_mnt);
>>>>>>>>>>>> -}
>>>>>>>>>>>> -
>>>>>>>>>>>>   /*
>>>>>>>>>>>>    * Verify COPY destination stateid.
>>>>>>>>>>>>    *
>>>>>>>>>>>> @@ -1572,11 +1565,6 @@ nfsd4_cleanup_inter_ssc(struct vfsmount *ss_mnt, struct file *filp,
>>>>>>>>>>>>   {
>>>>>>>>>>>>   }
>>>>>>>>>>>>
>>>>>>>>>>>> -static void
>>>>>>>>>>>> -nfsd4_interssc_disconnect(struct vfsmount *ss_mnt)
>>>>>>>>>>>> -{
>>>>>>>>>>>> -}
>>>>>>>>>>>> -
>>>>>>>>>>>>   static struct file *nfs42_ssc_open(struct vfsmount *ss_mnt,
>>>>>>>>>>>>   				   struct nfs_fh *src_fh,
>>>>>>>>>>>>   				   nfs4_stateid *stateid)
>>>>>>>>>>>> @@ -1762,7 +1750,8 @@ static int nfsd4_do_async_copy(void *data)
>>>>>>>>>>>>   		struct file *filp;
>>>>>>>>>>>>
>>>>>>>>>>>>   		filp = nfs42_ssc_open(copy->ss_mnt, &copy->c_fh,
>>>>>>>>>>>> -				      &copy->stateid);
>>>>>>>>>>>> +					&copy->stateid);
>>>>>>>>>>>> +
>>>>>>>>>>>>   		if (IS_ERR(filp)) {
>>>>>>>>>>>>   			switch (PTR_ERR(filp)) {
>>>>>>>>>>>>   			case -EBADF:
>>>>>>>>>>>> @@ -1771,7 +1760,7 @@ static int nfsd4_do_async_copy(void *data)
>>>>>>>>>>>>   			default:
>>>>>>>>>>>>   				nfserr = nfserr_offload_denied;
>>>>>>>>>>>>   			}
>>>>>>>>>>>> -			nfsd4_interssc_disconnect(copy->ss_mnt);
>>>>>>>>>>>> +			/* ss_mnt will be unmounted by the laundromat */
>>>>>>>>>>>>   			goto do_callback;
>>>>>>>>>>>>   		}
>>>>>>>>>>>>   		nfserr = nfsd4_do_copy(copy, filp, copy->nf_dst->nf_file,
>>>>>>>>>>>> @@ -1852,8 +1841,10 @@ nfsd4_copy(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
>>>>>>>>>>>>   	if (async_copy)
>>>>>>>>>>>>   		cleanup_async_copy(async_copy);
>>>>>>>>>>>>   	status = nfserrno(-ENOMEM);
>>>>>>>>>>>> -	if (nfsd4_ssc_is_inter(copy))
>>>>>>>>>>>> -		nfsd4_interssc_disconnect(copy->ss_mnt);
>>>>>>>>>>>> +	/*
>>>>>>>>>>>> +	 * source's vfsmount of inter-copy will be unmounted
>>>>>>>>>>>> +	 * by the laundromat
>>>>>>>>>>>> +	 */
>>>>>>>>>>>>   	goto out;
>>>>>>>>>>>>   }
>>>>>>>>>>>>
>>>>>>>>>>> This looks reasonable at first glance, but I have some concerns with the
>>>>>>>>>>> refcounting around ss_mnt elsewhere in this code. nfsd4_ssc_setup_dul
>>>>>>>>>>> looks for an existing connection and bumps the ni->nsui_refcnt if it
>>>>>>>>>>> finds one.
>>>>>>>>>>>
>>>>>>>>>>> But then later, nfsd4_cleanup_inter_ssc has a couple of cases where it
>>>>>>>>>>> just does a bare mntput:
>>>>>>>>>>>
>>>>>>>>>>>          if (!nn) {
>>>>>>>>>>>                  mntput(ss_mnt);
>>>>>>>>>>>                  return;
>>>>>>>>>>>          }
>>>>>>>>>>> ...
>>>>>>>>>>>          if (!found) {
>>>>>>>>>>>                  mntput(ss_mnt);
>>>>>>>>>>>                  return;
>>>>>>>>>>>          }
>>>>>>>>>>>
>>>>>>>>>>> The first one looks bogus. Can net_generic return NULL? If so how, and
>>>>>>>>>>> why is it not a problem elsewhere in the kernel?
>>>>>>>>>> it looks like net_generic can not fail, no where else check for NULL
>>>>>>>>>> so I will remove this check.
>>>>>>>>>>
>>>>>>>>>>> For the second case, if the ni is no longer on the list, where did the
>>>>>>>>>>> extra ss_mnt reference come from? Maybe that should be a WARN_ON or
>>>>>>>>>>> BUG_ON?
>>>>>>>>>> if ni is not found on the list then it's a bug somewhere so I will add
>>>>>>>>>> a BUG_ON on this.
>>>>>>>>>>
>>>>>>>>> Probably better to just WARN_ON and let any references leak in that
>>>>>>>>> case. A BUG_ON implies a panic in some environments, and it's best to
>>>>>>>>> avoid that unless there really is no choice.
>>>>>>>> WARN_ON also causes machines to boot that have panic_on_warn enabled.
>>>>>>>> Why not just handle the error and keep going?  Why panic at all?
>>>>>>>>
>>>>>>> Who the hell sets panic_on_warn (outside of testing environments)?
>>>>>> All cloud providers and anyone else that wants to "kill the system that
>>>>>> had a problem and have it reboot fast" in order to keep things working
>>>>>> overall.
>>>>>>
>>>>> If that's the case, then this situation would probably be one where a
>>>>> cloud provider would want to crash it and come back. NFS grace periods
>>>>> can suck though.
>>>>>
>>>>>>> I'm
>>>>>>> suggesting a WARN_ON because not finding an entry at this point
>>>>>>> represents a bug that we'd want reported.
>>>>>> Your call, but we are generally discouraging adding new WARN_ON() for
>>>>>> anything that userspace could ever trigger.  And if userspace can't
>>>>>> trigger it, then it's a normal type of error that you need to handle
>>>>>> anyway, right?
>>>>>>
>>>>>> Anyway, your call, just letting you know.
>>>>>>
>>>>> Understood.
>>>>>
>>>>>>> The caller should hold a reference to the object that holds a vfsmount
>>>>>>> reference. It relies on that vfsmount to do a copy. If it's gone at this
>>>>>>> point where we're releasing that reference, then we're looking at a
>>>>>>> refcounting bug of some sort.
>>>>>> refcounting in the nfsd code, or outside of that?
>>>>>>
>>>>> It'd be in the nfsd code, but might affect the vfsmount refcount. Inter-
>>>>> server copy is quite the tenuous house of cards. ;)
>>>>>
>>>>>>> I would expect anyone who sets panic_on_warn to _desire_ a panic in this
>>>>>>> situation. After all, they asked for it. Presumably they want it to do
>>>>>>> some coredump analysis or something?
>>>>>>>
>>>>>>> It is debatable whether the stack trace at this point would be helpful
>>>>>>> though, so you might consider a pr_warn or something less log-spammy.
>>>>>> If you can recover from it, then yeah, pr_warn() is usually best.
>>>>>>
>>>>> It does look like Dai went with pr_warn on his v2 patch.
>>>>>
>>>>> We'd "recover" by leaking a vfsmount reference. The immediate crash
>>>>> would be avoided, but it might make for interesting "fun" later when you
>>>>> went to try and unmount the thing.
>>>> This is a red flag for me. If the leak prevents the system from
>>>> shutting down reliably, then we need to do something more than
>>>> a pr_warn(), I would think.
>>>>
>>> Sorry, I should correct myself.
>>>
>>> We wouldn't (necessarily) leak a vfsmount reference. If the entry was no
>>> longer on the list, then presumably it has already been cleaned up and
>>> the vfsmount reference put.
>> I think the issue here is not vfsmount reference count. The issue is that
>> we could not find a nfsd4_ssc_umount_item on the list that matches the
>> vfsmount ss_mnt. So the question is what should we do in this case?
>>
>> Prior to this patch, when we hit this scenario we just go ahead and
>> unmount the ss_mnt there since it won't be unmounted by the laundromat
>> (it's not on the delayed unmount list).
>>
>> With this patch, we don't even unmount the ss_mnt, we just do a pr_warn.
>>
>> I'd prefer to go back to the previous code to do the unmount and also
>> do a pr_warn.
>>
>>> It's still a bug though since we _should_ still have a reference to the
>>> nfsd4_ssc_umount_item at this point. So this is really just a potential
>>> use-after-free.
>> The ss_mnt still might have a reference on the nfsd4_ssc_umount_item
>> but we just can't find it on the list. Even though the possibility for
>> this to happen is from slim to none, we still have to check for it.
>>
>>> FWIW, the object handling here is somewhat weird as the copy operation
>>> holds a reference to the nfsd4_ssc_umount_item but passes around a
>>> pointer to the vfsmount
>>>
>>> I have to wonder if it'd be cleaner to have nfsd4_setup_inter_ssc pass
>>> back a pointer to the nfsd4_ssc_umount_item, so you could pass that to
>>> nfsd4_cleanup_inter_ssc and skip searching for it again at cleanup time.
>> Yes, I think returning a pointer to the nfsd4_ssc_umount_item approach
>> would be better.  We won't have to deal with the situation where we can't
>> find an item on the list (even though it almost never happen).
>>
>> Can we do this enhancement after fixing this use-after-free problem, in
>> a separate patch series?
> Is there a reason not fix it correctly now?
>
> I'd rather not merge a fix that leaves the possibility of a leak.

I think we should do it separately because the reported use-after-free
problem is not related to the problem of not finding the ni item on
the list.

AFAIK, the problem of not finding the ni item on the list has never
been reported, nor has it happened during my testing.

The fix for the use-after-free deals with the handling of errors
returned from nfs42_ssc_open and other places in the code. This
fix remains the same no matter how the vfsmount is passed around.
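
For reference, the laundromat side that the fix relies on works roughly
like this. This is only a simplified sketch of the delayed-unmount scan,
not the actual nfsd4_ssc_expire_umount code; the field types and the
two-pass structure here are assumptions for illustration:

    /* Sketch: reap idle entries from the delayed unmount list. */
    static void ssc_expire_umount_sketch(struct nfsd_net *nn)
    {
            struct nfsd4_ssc_umount_item *ni, *tmp;
            LIST_HEAD(reap);

            /* Pass 1: under the lock, pick out expired, unreferenced items. */
            spin_lock(&nn->nfsd_ssc_lock);
            list_for_each_entry_safe(ni, tmp, &nn->nfsd_ssc_mount_list, nsui_list) {
                    if (time_after(jiffies, ni->nsui_expire) && !ni->nsui_refcnt)
                            list_move(&ni->nsui_list, &reap);
            }
            spin_unlock(&nn->nfsd_ssc_lock);

            /* Pass 2: unmount outside the lock. */
            list_for_each_entry_safe(ni, tmp, &reap, nsui_list) {
                    list_del(&ni->nsui_list);
                    mntput(ni->nsui_vfsmount);      /* the deferred unmount */
                    kfree(ni);
            }
    }

So as long as the error paths leave the entry on the delayed unmount
list with its refcount dropped, the mount gets cleaned up once the idle
time expires.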

-Dai
Jeff Layton Dec. 12, 2022, 7:46 p.m. UTC | #19
On Mon, 2022-12-12 at 19:28 +0000, Chuck Lever III wrote:
> 
> > On Dec 12, 2022, at 2:16 PM, Dai Ngo <dai.ngo@oracle.com> wrote:
> > 
> > 
> > On 12/12/22 10:38 AM, Jeff Layton wrote:
> > > On Mon, 2022-12-12 at 18:16 +0000, Chuck Lever III wrote:
> > > > > On Dec 12, 2022, at 12:44 PM, Jeff Layton <jlayton@kernel.org> wrote:
> > > > > 
> > > > > On Mon, 2022-12-12 at 18:14 +0100, Greg KH wrote:
> > > > > > On Mon, Dec 12, 2022 at 09:31:19AM -0500, Jeff Layton wrote:
> > > > > > > On Mon, 2022-12-12 at 14:59 +0100, Greg KH wrote:
> > > > > > > > On Mon, Dec 12, 2022 at 08:40:31AM -0500, Jeff Layton wrote:
> > > > > > > > > On Mon, 2022-12-12 at 05:34 -0800, dai.ngo@oracle.com wrote:
> > > > > > > > > > On 12/12/22 4:22 AM, Jeff Layton wrote:
> > > > > > > > > > > On Sun, 2022-12-11 at 11:22 -0800, Dai Ngo wrote:
> > > > > > > > > > > > Problem caused by source's vfsmount being unmounted but remains
> > > > > > > > > > > > on the delayed unmount list. This happens when nfs42_ssc_open()
> > > > > > > > > > > > return errors.
> > > > > > > > > > > > Fixed by removing nfsd4_interssc_connect(), leave the vfsmount
> > > > > > > > > > > > for the laundromat to unmount when idle time expires.
> > > > > > > > > > > > 
> > > > > > > > > > > > Reported-by: Xingyuan Mo <hdthky0@gmail.com>
> > > > > > > > > > > > Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
> > > > > > > > > > > > ---
> > > > > > > > > > > >  fs/nfsd/nfs4proc.c | 23 +++++++----------------
> > > > > > > > > > > >  1 file changed, 7 insertions(+), 16 deletions(-)
> > > > > > > > > > > > 
> > > > > > > > > > > > diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
> > > > > > > > > > > > index 8beb2bc4c328..756e42cf0d01 100644
> > > > > > > > > > > > --- a/fs/nfsd/nfs4proc.c
> > > > > > > > > > > > +++ b/fs/nfsd/nfs4proc.c
> > > > > > > > > > > > @@ -1463,13 +1463,6 @@ nfsd4_interssc_connect(struct nl4_server *nss, struct svc_rqst *rqstp,
> > > > > > > > > > > >  	return status;
> > > > > > > > > > > >  }
> > > > > > > > > > > > 
> > > > > > > > > > > > -static void
> > > > > > > > > > > > -nfsd4_interssc_disconnect(struct vfsmount *ss_mnt)
> > > > > > > > > > > > -{
> > > > > > > > > > > > -	nfs_do_sb_deactive(ss_mnt->mnt_sb);
> > > > > > > > > > > > -	mntput(ss_mnt);
> > > > > > > > > > > > -}
> > > > > > > > > > > > -
> > > > > > > > > > > >  /*
> > > > > > > > > > > >   * Verify COPY destination stateid.
> > > > > > > > > > > >   *
> > > > > > > > > > > > @@ -1572,11 +1565,6 @@ nfsd4_cleanup_inter_ssc(struct vfsmount *ss_mnt, struct file *filp,
> > > > > > > > > > > >  {
> > > > > > > > > > > >  }
> > > > > > > > > > > > 
> > > > > > > > > > > > -static void
> > > > > > > > > > > > -nfsd4_interssc_disconnect(struct vfsmount *ss_mnt)
> > > > > > > > > > > > -{
> > > > > > > > > > > > -}
> > > > > > > > > > > > -
> > > > > > > > > > > >  static struct file *nfs42_ssc_open(struct vfsmount *ss_mnt,
> > > > > > > > > > > >  				   struct nfs_fh *src_fh,
> > > > > > > > > > > >  				   nfs4_stateid *stateid)
> > > > > > > > > > > > @@ -1762,7 +1750,8 @@ static int nfsd4_do_async_copy(void *data)
> > > > > > > > > > > >  		struct file *filp;
> > > > > > > > > > > > 
> > > > > > > > > > > >  		filp = nfs42_ssc_open(copy->ss_mnt, &copy->c_fh,
> > > > > > > > > > > > -				      &copy->stateid);
> > > > > > > > > > > > +					&copy->stateid);
> > > > > > > > > > > > +
> > > > > > > > > > > >  		if (IS_ERR(filp)) {
> > > > > > > > > > > >  			switch (PTR_ERR(filp)) {
> > > > > > > > > > > >  			case -EBADF:
> > > > > > > > > > > > @@ -1771,7 +1760,7 @@ static int nfsd4_do_async_copy(void *data)
> > > > > > > > > > > >  			default:
> > > > > > > > > > > >  				nfserr = nfserr_offload_denied;
> > > > > > > > > > > >  			}
> > > > > > > > > > > > -			nfsd4_interssc_disconnect(copy->ss_mnt);
> > > > > > > > > > > > +			/* ss_mnt will be unmounted by the laundromat */
> > > > > > > > > > > >  			goto do_callback;
> > > > > > > > > > > >  		}
> > > > > > > > > > > >  		nfserr = nfsd4_do_copy(copy, filp, copy->nf_dst->nf_file,
> > > > > > > > > > > > @@ -1852,8 +1841,10 @@ nfsd4_copy(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
> > > > > > > > > > > >  	if (async_copy)
> > > > > > > > > > > >  		cleanup_async_copy(async_copy);
> > > > > > > > > > > >  	status = nfserrno(-ENOMEM);
> > > > > > > > > > > > -	if (nfsd4_ssc_is_inter(copy))
> > > > > > > > > > > > -		nfsd4_interssc_disconnect(copy->ss_mnt);
> > > > > > > > > > > > +	/*
> > > > > > > > > > > > +	 * source's vfsmount of inter-copy will be unmounted
> > > > > > > > > > > > +	 * by the laundromat
> > > > > > > > > > > > +	 */
> > > > > > > > > > > >  	goto out;
> > > > > > > > > > > >  }
> > > > > > > > > > > > 
> > > > > > > > > > > This looks reasonable at first glance, but I have some concerns with the
> > > > > > > > > > > refcounting around ss_mnt elsewhere in this code. nfsd4_ssc_setup_dul
> > > > > > > > > > > looks for an existing connection and bumps the ni->nsui_refcnt if it
> > > > > > > > > > > finds one.
> > > > > > > > > > > 
> > > > > > > > > > > But then later, nfsd4_cleanup_inter_ssc has a couple of cases where it
> > > > > > > > > > > just does a bare mntput:
> > > > > > > > > > > 
> > > > > > > > > > >         if (!nn) {
> > > > > > > > > > >                 mntput(ss_mnt);
> > > > > > > > > > >                 return;
> > > > > > > > > > >         }
> > > > > > > > > > > ...
> > > > > > > > > > >         if (!found) {
> > > > > > > > > > >                 mntput(ss_mnt);
> > > > > > > > > > >                 return;
> > > > > > > > > > >         }
> > > > > > > > > > > 
> > > > > > > > > > > The first one looks bogus. Can net_generic return NULL? If so how, and
> > > > > > > > > > > why is it not a problem elsewhere in the kernel?
> > > > > > > > > > it looks like net_generic can not fail, no where else check for NULL
> > > > > > > > > > so I will remove this check.
> > > > > > > > > > 
> > > > > > > > > > > For the second case, if the ni is no longer on the list, where did the
> > > > > > > > > > > extra ss_mnt reference come from? Maybe that should be a WARN_ON or
> > > > > > > > > > > BUG_ON?
> > > > > > > > > > if ni is not found on the list then it's a bug somewhere so I will add
> > > > > > > > > > a BUG_ON on this.
> > > > > > > > > > 
> > > > > > > > > Probably better to just WARN_ON and let any references leak in that
> > > > > > > > > case. A BUG_ON implies a panic in some environments, and it's best to
> > > > > > > > > avoid that unless there really is no choice.
> > > > > > > > WARN_ON also causes machines to boot that have panic_on_warn enabled.
> > > > > > > > Why not just handle the error and keep going?  Why panic at all?
> > > > > > > > 
> > > > > > > Who the hell sets panic_on_warn (outside of testing environments)?
> > > > > > All cloud providers and anyone else that wants to "kill the system that
> > > > > > had a problem and have it reboot fast" in order to keep things working
> > > > > > overall.
> > > > > > 
> > > > > If that's the case, then this situation would probably be one where a
> > > > > cloud provider would want to crash it and come back. NFS grace periods
> > > > > can suck though.
> > > > > 
> > > > > > > I'm
> > > > > > > suggesting a WARN_ON because not finding an entry at this point
> > > > > > > represents a bug that we'd want reported.
> > > > > > Your call, but we are generally discouraging adding new WARN_ON() for
> > > > > > anything that userspace could ever trigger.  And if userspace can't
> > > > > > trigger it, then it's a normal type of error that you need to handle
> > > > > > anyway, right?
> > > > > > 
> > > > > > Anyway, your call, just letting you know.
> > > > > > 
> > > > > Understood.
> > > > > 
> > > > > > > The caller should hold a reference to the object that holds a vfsmount
> > > > > > > reference. It relies on that vfsmount to do a copy. If it's gone at this
> > > > > > > point where we're releasing that reference, then we're looking at a
> > > > > > > refcounting bug of some sort.
> > > > > > refcounting in the nfsd code, or outside of that?
> > > > > > 
> > > > > It'd be in the nfsd code, but might affect the vfsmount refcount. Inter-
> > > > > server copy is quite the tenuous house of cards. ;)
> > > > > 
> > > > > > > I would expect anyone who sets panic_on_warn to _desire_ a panic in this
> > > > > > > situation. After all, they asked for it. Presumably they want it to do
> > > > > > > some coredump analysis or something?
> > > > > > > 
> > > > > > > It is debatable whether the stack trace at this point would be helpful
> > > > > > > though, so you might consider a pr_warn or something less log-spammy.
> > > > > > If you can recover from it, then yeah, pr_warn() is usually best.
> > > > > > 
> > > > > It does look like Dai went with pr_warn on his v2 patch.
> > > > > 
> > > > > We'd "recover" by leaking a vfsmount reference. The immediate crash
> > > > > would be avoided, but it might make for interesting "fun" later when you
> > > > > went to try and unmount the thing.
> > > > This is a red flag for me. If the leak prevents the system from
> > > > shutting down reliably, then we need to do something more than
> > > > a pr_warn(), I would think.
> > > > 
> > > Sorry, I should correct myself.
> > > 
> > > We wouldn't (necessarily) leak a vfsmount reference. If the entry was no
> > > longer on the list, then presumably it has already been cleaned up and
> > > the vfsmount reference put.
> > 
> > I think the issue here is not vfsmount reference count. The issue is that
> > we could not find a nfsd4_ssc_umount_item on the list that matches the
> > vfsmount ss_mnt. So the question is what should we do in this case?
> > 
> > Prior to this patch, when we hit this scenario we just go ahead and
> > unmount the ss_mnt there since it won't be unmounted by the laundromat
> > (it's not on the delayed unmount list).
> > 
> > With this patch, we don't even unmount the ss_mnt, we just do a pr_warn.
> > 
> > I'd prefer to go back to the previous code to do the unmount and also
> > do a pr_warn.
> > 
> > > It's still a bug though since we _should_ still have a reference to the
> > > nfsd4_ssc_umount_item at this point. So this is really just a potential
> > > use-after-free.
> > 
> > The ss_mnt still might have a reference on the nfsd4_ssc_umount_item
> > but we just can't find it on the list. Even though the possibility for
> > this to happen is from slim to none, we still have to check for it.
> > 
> > > FWIW, the object handling here is somewhat weird as the copy operation
> > > holds a reference to the nfsd4_ssc_umount_item but passes around a
> > > pointer to the vfsmount
> > > 
> > > I have to wonder if it'd be cleaner to have nfsd4_setup_inter_ssc pass
> > > back a pointer to the nfsd4_ssc_umount_item, so you could pass that to
> > > nfsd4_cleanup_inter_ssc and skip searching for it again at cleanup time.
> > 
> > Yes, I think returning a pointer to the nfsd4_ssc_umount_item approach
> > would be better.  We won't have to deal with the situation where we can't
> > find an item on the list (even though it almost never happen).
> > 
> > Can we do this enhancement after fixing this use-after-free problem, in
> > a separate patch series?

^^^
I think that'd be best.

> Is there a reason not fix it correctly now?
> 
> I'd rather not merge a fix that leaves the possibility of a leak.

We're going to need to backport this to earlier kernels and it'll need
to go in soon. I think it'd be best to take a minimal fix for the
reported crash, and then Dai can have some time to do a larger cleanup.
Chuck Lever Dec. 12, 2022, 7:48 p.m. UTC | #20
> On Dec 12, 2022, at 2:46 PM, Jeff Layton <jlayton@kernel.org> wrote:
> 
> On Mon, 2022-12-12 at 19:28 +0000, Chuck Lever III wrote:
>> 
>>> On Dec 12, 2022, at 2:16 PM, Dai Ngo <dai.ngo@oracle.com> wrote:
>>> 
>>> 
>>> On 12/12/22 10:38 AM, Jeff Layton wrote:
>>>> On Mon, 2022-12-12 at 18:16 +0000, Chuck Lever III wrote:
>>>>>> On Dec 12, 2022, at 12:44 PM, Jeff Layton <jlayton@kernel.org> wrote:
>>>>>> 
>>>>>> On Mon, 2022-12-12 at 18:14 +0100, Greg KH wrote:
>>>>>>> On Mon, Dec 12, 2022 at 09:31:19AM -0500, Jeff Layton wrote:
>>>>>>>> On Mon, 2022-12-12 at 14:59 +0100, Greg KH wrote:
>>>>>>>>> On Mon, Dec 12, 2022 at 08:40:31AM -0500, Jeff Layton wrote:
>>>>>>>>>> On Mon, 2022-12-12 at 05:34 -0800, dai.ngo@oracle.com wrote:
>>>>>>>>>>> On 12/12/22 4:22 AM, Jeff Layton wrote:
>>>>>>>>>>>> On Sun, 2022-12-11 at 11:22 -0800, Dai Ngo wrote:
>>>>>>>>>>>>> Problem caused by source's vfsmount being unmounted but remains
>>>>>>>>>>>>> on the delayed unmount list. This happens when nfs42_ssc_open()
>>>>>>>>>>>>> return errors.
>>>>>>>>>>>>> Fixed by removing nfsd4_interssc_connect(), leave the vfsmount
>>>>>>>>>>>>> for the laundromat to unmount when idle time expires.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Reported-by: Xingyuan Mo <hdthky0@gmail.com>
>>>>>>>>>>>>> Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
>>>>>>>>>>>>> ---
>>>>>>>>>>>>> fs/nfsd/nfs4proc.c | 23 +++++++----------------
>>>>>>>>>>>>> 1 file changed, 7 insertions(+), 16 deletions(-)
>>>>>>>>>>>>> 
>>>>>>>>>>>>> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
>>>>>>>>>>>>> index 8beb2bc4c328..756e42cf0d01 100644
>>>>>>>>>>>>> --- a/fs/nfsd/nfs4proc.c
>>>>>>>>>>>>> +++ b/fs/nfsd/nfs4proc.c
>>>>>>>>>>>>> @@ -1463,13 +1463,6 @@ nfsd4_interssc_connect(struct nl4_server *nss, struct svc_rqst *rqstp,
>>>>>>>>>>>>> 	return status;
>>>>>>>>>>>>> }
>>>>>>>>>>>>> 
>>>>>>>>>>>>> -static void
>>>>>>>>>>>>> -nfsd4_interssc_disconnect(struct vfsmount *ss_mnt)
>>>>>>>>>>>>> -{
>>>>>>>>>>>>> -	nfs_do_sb_deactive(ss_mnt->mnt_sb);
>>>>>>>>>>>>> -	mntput(ss_mnt);
>>>>>>>>>>>>> -}
>>>>>>>>>>>>> -
>>>>>>>>>>>>> /*
>>>>>>>>>>>>>  * Verify COPY destination stateid.
>>>>>>>>>>>>>  *
>>>>>>>>>>>>> @@ -1572,11 +1565,6 @@ nfsd4_cleanup_inter_ssc(struct vfsmount *ss_mnt, struct file *filp,
>>>>>>>>>>>>> {
>>>>>>>>>>>>> }
>>>>>>>>>>>>> 
>>>>>>>>>>>>> -static void
>>>>>>>>>>>>> -nfsd4_interssc_disconnect(struct vfsmount *ss_mnt)
>>>>>>>>>>>>> -{
>>>>>>>>>>>>> -}
>>>>>>>>>>>>> -
>>>>>>>>>>>>> static struct file *nfs42_ssc_open(struct vfsmount *ss_mnt,
>>>>>>>>>>>>> 				   struct nfs_fh *src_fh,
>>>>>>>>>>>>> 				   nfs4_stateid *stateid)
>>>>>>>>>>>>> @@ -1762,7 +1750,8 @@ static int nfsd4_do_async_copy(void *data)
>>>>>>>>>>>>> 		struct file *filp;
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 		filp = nfs42_ssc_open(copy->ss_mnt, &copy->c_fh,
>>>>>>>>>>>>> -				      &copy->stateid);
>>>>>>>>>>>>> +					&copy->stateid);
>>>>>>>>>>>>> +
>>>>>>>>>>>>> 		if (IS_ERR(filp)) {
>>>>>>>>>>>>> 			switch (PTR_ERR(filp)) {
>>>>>>>>>>>>> 			case -EBADF:
>>>>>>>>>>>>> @@ -1771,7 +1760,7 @@ static int nfsd4_do_async_copy(void *data)
>>>>>>>>>>>>> 			default:
>>>>>>>>>>>>> 				nfserr = nfserr_offload_denied;
>>>>>>>>>>>>> 			}
>>>>>>>>>>>>> -			nfsd4_interssc_disconnect(copy->ss_mnt);
>>>>>>>>>>>>> +			/* ss_mnt will be unmounted by the laundromat */
>>>>>>>>>>>>> 			goto do_callback;
>>>>>>>>>>>>> 		}
>>>>>>>>>>>>> 		nfserr = nfsd4_do_copy(copy, filp, copy->nf_dst->nf_file,
>>>>>>>>>>>>> @@ -1852,8 +1841,10 @@ nfsd4_copy(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
>>>>>>>>>>>>> 	if (async_copy)
>>>>>>>>>>>>> 		cleanup_async_copy(async_copy);
>>>>>>>>>>>>> 	status = nfserrno(-ENOMEM);
>>>>>>>>>>>>> -	if (nfsd4_ssc_is_inter(copy))
>>>>>>>>>>>>> -		nfsd4_interssc_disconnect(copy->ss_mnt);
>>>>>>>>>>>>> +	/*
>>>>>>>>>>>>> +	 * source's vfsmount of inter-copy will be unmounted
>>>>>>>>>>>>> +	 * by the laundromat
>>>>>>>>>>>>> +	 */
>>>>>>>>>>>>> 	goto out;
>>>>>>>>>>>>> }
>>>>>>>>>>>>> 
>>>>>>>>>>>> This looks reasonable at first glance, but I have some concerns with the
>>>>>>>>>>>> refcounting around ss_mnt elsewhere in this code. nfsd4_ssc_setup_dul
>>>>>>>>>>>> looks for an existing connection and bumps the ni->nsui_refcnt if it
>>>>>>>>>>>> finds one.
>>>>>>>>>>>> 
>>>>>>>>>>>> But then later, nfsd4_cleanup_inter_ssc has a couple of cases where it
>>>>>>>>>>>> just does a bare mntput:
>>>>>>>>>>>> 
>>>>>>>>>>>>        if (!nn) {
>>>>>>>>>>>>                mntput(ss_mnt);
>>>>>>>>>>>>                return;
>>>>>>>>>>>>        }
>>>>>>>>>>>> ...
>>>>>>>>>>>>        if (!found) {
>>>>>>>>>>>>                mntput(ss_mnt);
>>>>>>>>>>>>                return;
>>>>>>>>>>>>        }
>>>>>>>>>>>> 
>>>>>>>>>>>> The first one looks bogus. Can net_generic return NULL? If so how, and
>>>>>>>>>>>> why is it not a problem elsewhere in the kernel?
>>>>>>>>>>> it looks like net_generic can not fail, no where else check for NULL
>>>>>>>>>>> so I will remove this check.
>>>>>>>>>>> 
>>>>>>>>>>>> For the second case, if the ni is no longer on the list, where did the
>>>>>>>>>>>> extra ss_mnt reference come from? Maybe that should be a WARN_ON or
>>>>>>>>>>>> BUG_ON?
>>>>>>>>>>> if ni is not found on the list then it's a bug somewhere so I will add
>>>>>>>>>>> a BUG_ON on this.
>>>>>>>>>>> 
>>>>>>>>>> Probably better to just WARN_ON and let any references leak in that
>>>>>>>>>> case. A BUG_ON implies a panic in some environments, and it's best to
>>>>>>>>>> avoid that unless there really is no choice.
>>>>>>>>> WARN_ON also causes machines to boot that have panic_on_warn enabled.
>>>>>>>>> Why not just handle the error and keep going?  Why panic at all?
>>>>>>>>> 
>>>>>>>> Who the hell sets panic_on_warn (outside of testing environments)?
>>>>>>> All cloud providers and anyone else that wants to "kill the system that
>>>>>>> had a problem and have it reboot fast" in order to keep things working
>>>>>>> overall.
>>>>>>> 
>>>>>> If that's the case, then this situation would probably be one where a
>>>>>> cloud provider would want to crash it and come back. NFS grace periods
>>>>>> can suck though.
>>>>>> 
>>>>>>>> I'm
>>>>>>>> suggesting a WARN_ON because not finding an entry at this point
>>>>>>>> represents a bug that we'd want reported.
>>>>>>> Your call, but we are generally discouraging adding new WARN_ON() for
>>>>>>> anything that userspace could ever trigger.  And if userspace can't
>>>>>>> trigger it, then it's a normal type of error that you need to handle
>>>>>>> anyway, right?
>>>>>>> 
>>>>>>> Anyway, your call, just letting you know.
>>>>>>> 
>>>>>> Understood.
>>>>>> 
>>>>>>>> The caller should hold a reference to the object that holds a vfsmount
>>>>>>>> reference. It relies on that vfsmount to do a copy. If it's gone at this
>>>>>>>> point where we're releasing that reference, then we're looking at a
>>>>>>>> refcounting bug of some sort.
>>>>>>> refcounting in the nfsd code, or outside of that?
>>>>>>> 
>>>>>> It'd be in the nfsd code, but might affect the vfsmount refcount. Inter-
>>>>>> server copy is quite the tenuous house of cards. ;)
>>>>>> 
>>>>>>>> I would expect anyone who sets panic_on_warn to _desire_ a panic in this
>>>>>>>> situation. After all, they asked for it. Presumably they want it to do
>>>>>>>> some coredump analysis or something?
>>>>>>>> 
>>>>>>>> It is debatable whether the stack trace at this point would be helpful
>>>>>>>> though, so you might consider a pr_warn or something less log-spammy.
>>>>>>> If you can recover from it, then yeah, pr_warn() is usually best.
>>>>>>> 
>>>>>> It does look like Dai went with pr_warn on his v2 patch.
>>>>>> 
>>>>>> We'd "recover" by leaking a vfsmount reference. The immediate crash
>>>>>> would be avoided, but it might make for interesting "fun" later when you
>>>>>> went to try and unmount the thing.
>>>>> This is a red flag for me. If the leak prevents the system from
>>>>> shutting down reliably, then we need to do something more than
>>>>> a pr_warn(), I would think.
>>>>> 
>>>> Sorry, I should correct myself.
>>>> 
>>>> We wouldn't (necessarily) leak a vfsmount reference. If the entry was no
>>>> longer on the list, then presumably it has already been cleaned up and
>>>> the vfsmount reference put.
>>> 
>>> I think the issue here is not vfsmount reference count. The issue is that
>>> we could not find a nfsd4_ssc_umount_item on the list that matches the
>>> vfsmount ss_mnt. So the question is what should we do in this case?
>>> 
>>> Prior to this patch, when we hit this scenario we just go ahead and
>>> unmount the ss_mnt there since it won't be unmounted by the laundromat
>>> (it's not on the delayed unmount list).
>>> 
>>> With this patch, we don't even unmount the ss_mnt, we just do a pr_warn.
>>> 
>>> I'd prefer to go back to the previous code to do the unmount and also
>>> do a pr_warn.
>>> 
>>>> It's still a bug though since we _should_ still have a reference to the
>>>> nfsd4_ssc_umount_item at this point. So this is really just a potential
>>>> use-after-free.
>>> 
>>> The ss_mnt still might have a reference on the nfsd4_ssc_umount_item
>>> but we just can't find it on the list. Even though the possibility for
>>> this to happen is from slim to none, we still have to check for it.
>>> 
>>>> FWIW, the object handling here is somewhat weird as the copy operation
>>>> holds a reference to the nfsd4_ssc_umount_item but passes around a
>>>> pointer to the vfsmount
>>>> 
>>>> I have to wonder if it'd be cleaner to have nfsd4_setup_inter_ssc pass
>>>> back a pointer to the nfsd4_ssc_umount_item, so you could pass that to
>>>> nfsd4_cleanup_inter_ssc and skip searching for it again at cleanup time.
>>> 
>>> Yes, I think returning a pointer to the nfsd4_ssc_umount_item approach
>>> would be better.  We won't have to deal with the situation where we can't
>>> find an item on the list (even though it almost never happen).
>>> 
>>> Can we do this enhancement after fixing this use-after-free problem, in
>>> a separate patch series?
> 
> ^^^
> I think that'd be best.
> 
>> Is there a reason not fix it correctly now?
>> 
>> I'd rather not merge a fix that leaves the possibility of a leak.
> 
> We're going to need to backport this to earlier kernels and it'll need
> to go in soon. I think it'd be to take a minimal fix for the reported
> crash, and then Dai can have some time to do a larger cleanup.

Backporting is important, of course.

What I was hearing was that the simple fix couldn't be done without
introducing a leak or UAF.


--
Chuck Lever
Dai Ngo Dec. 12, 2022, 7:59 p.m. UTC | #21
On 12/12/22 11:48 AM, Chuck Lever III wrote:
>
>> On Dec 12, 2022, at 2:46 PM, Jeff Layton <jlayton@kernel.org> wrote:
>>
>> On Mon, 2022-12-12 at 19:28 +0000, Chuck Lever III wrote:
>>>> On Dec 12, 2022, at 2:16 PM, Dai Ngo <dai.ngo@oracle.com> wrote:
>>>>
>>>>
>>>> On 12/12/22 10:38 AM, Jeff Layton wrote:
>>>>> On Mon, 2022-12-12 at 18:16 +0000, Chuck Lever III wrote:
>>>>>>> On Dec 12, 2022, at 12:44 PM, Jeff Layton <jlayton@kernel.org> wrote:
>>>>>>>
>>>>>>> On Mon, 2022-12-12 at 18:14 +0100, Greg KH wrote:
>>>>>>>> On Mon, Dec 12, 2022 at 09:31:19AM -0500, Jeff Layton wrote:
>>>>>>>>> On Mon, 2022-12-12 at 14:59 +0100, Greg KH wrote:
>>>>>>>>>> On Mon, Dec 12, 2022 at 08:40:31AM -0500, Jeff Layton wrote:
>>>>>>>>>>> On Mon, 2022-12-12 at 05:34 -0800, dai.ngo@oracle.com wrote:
>>>>>>>>>>>> On 12/12/22 4:22 AM, Jeff Layton wrote:
>>>>>>>>>>>>> On Sun, 2022-12-11 at 11:22 -0800, Dai Ngo wrote:
>>>>>>>>>>>>>> Problem caused by source's vfsmount being unmounted but remains
>>>>>>>>>>>>>> on the delayed unmount list. This happens when nfs42_ssc_open()
>>>>>>>>>>>>>> return errors.
>>>>>>>>>>>>>> Fixed by removing nfsd4_interssc_connect(), leave the vfsmount
>>>>>>>>>>>>>> for the laundromat to unmount when idle time expires.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Reported-by: Xingyuan Mo <hdthky0@gmail.com>
>>>>>>>>>>>>>> Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
>>>>>>>>>>>>>> ---
>>>>>>>>>>>>>> fs/nfsd/nfs4proc.c | 23 +++++++----------------
>>>>>>>>>>>>>> 1 file changed, 7 insertions(+), 16 deletions(-)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
>>>>>>>>>>>>>> index 8beb2bc4c328..756e42cf0d01 100644
>>>>>>>>>>>>>> --- a/fs/nfsd/nfs4proc.c
>>>>>>>>>>>>>> +++ b/fs/nfsd/nfs4proc.c
>>>>>>>>>>>>>> @@ -1463,13 +1463,6 @@ nfsd4_interssc_connect(struct nl4_server *nss, struct svc_rqst *rqstp,
>>>>>>>>>>>>>> 	return status;
>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> -static void
>>>>>>>>>>>>>> -nfsd4_interssc_disconnect(struct vfsmount *ss_mnt)
>>>>>>>>>>>>>> -{
>>>>>>>>>>>>>> -	nfs_do_sb_deactive(ss_mnt->mnt_sb);
>>>>>>>>>>>>>> -	mntput(ss_mnt);
>>>>>>>>>>>>>> -}
>>>>>>>>>>>>>> -
>>>>>>>>>>>>>> /*
>>>>>>>>>>>>>>   * Verify COPY destination stateid.
>>>>>>>>>>>>>>   *
>>>>>>>>>>>>>> @@ -1572,11 +1565,6 @@ nfsd4_cleanup_inter_ssc(struct vfsmount *ss_mnt, struct file *filp,
>>>>>>>>>>>>>> {
>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> -static void
>>>>>>>>>>>>>> -nfsd4_interssc_disconnect(struct vfsmount *ss_mnt)
>>>>>>>>>>>>>> -{
>>>>>>>>>>>>>> -}
>>>>>>>>>>>>>> -
>>>>>>>>>>>>>> static struct file *nfs42_ssc_open(struct vfsmount *ss_mnt,
>>>>>>>>>>>>>> 				   struct nfs_fh *src_fh,
>>>>>>>>>>>>>> 				   nfs4_stateid *stateid)
>>>>>>>>>>>>>> @@ -1762,7 +1750,8 @@ static int nfsd4_do_async_copy(void *data)
>>>>>>>>>>>>>> 		struct file *filp;
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 		filp = nfs42_ssc_open(copy->ss_mnt, &copy->c_fh,
>>>>>>>>>>>>>> -				      &copy->stateid);
>>>>>>>>>>>>>> +					&copy->stateid);
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> 		if (IS_ERR(filp)) {
>>>>>>>>>>>>>> 			switch (PTR_ERR(filp)) {
>>>>>>>>>>>>>> 			case -EBADF:
>>>>>>>>>>>>>> @@ -1771,7 +1760,7 @@ static int nfsd4_do_async_copy(void *data)
>>>>>>>>>>>>>> 			default:
>>>>>>>>>>>>>> 				nfserr = nfserr_offload_denied;
>>>>>>>>>>>>>> 			}
>>>>>>>>>>>>>> -			nfsd4_interssc_disconnect(copy->ss_mnt);
>>>>>>>>>>>>>> +			/* ss_mnt will be unmounted by the laundromat */
>>>>>>>>>>>>>> 			goto do_callback;
>>>>>>>>>>>>>> 		}
>>>>>>>>>>>>>> 		nfserr = nfsd4_do_copy(copy, filp, copy->nf_dst->nf_file,
>>>>>>>>>>>>>> @@ -1852,8 +1841,10 @@ nfsd4_copy(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
>>>>>>>>>>>>>> 	if (async_copy)
>>>>>>>>>>>>>> 		cleanup_async_copy(async_copy);
>>>>>>>>>>>>>> 	status = nfserrno(-ENOMEM);
>>>>>>>>>>>>>> -	if (nfsd4_ssc_is_inter(copy))
>>>>>>>>>>>>>> -		nfsd4_interssc_disconnect(copy->ss_mnt);
>>>>>>>>>>>>>> +	/*
>>>>>>>>>>>>>> +	 * source's vfsmount of inter-copy will be unmounted
>>>>>>>>>>>>>> +	 * by the laundromat
>>>>>>>>>>>>>> +	 */
>>>>>>>>>>>>>> 	goto out;
>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>
>>>>>>>>>>>>> This looks reasonable at first glance, but I have some concerns with the
>>>>>>>>>>>>> refcounting around ss_mnt elsewhere in this code. nfsd4_ssc_setup_dul
>>>>>>>>>>>>> looks for an existing connection and bumps the ni->nsui_refcnt if it
>>>>>>>>>>>>> finds one.
>>>>>>>>>>>>>
>>>>>>>>>>>>> But then later, nfsd4_cleanup_inter_ssc has a couple of cases where it
>>>>>>>>>>>>> just does a bare mntput:
>>>>>>>>>>>>>
>>>>>>>>>>>>>         if (!nn) {
>>>>>>>>>>>>>                 mntput(ss_mnt);
>>>>>>>>>>>>>                 return;
>>>>>>>>>>>>>         }
>>>>>>>>>>>>> ...
>>>>>>>>>>>>>         if (!found) {
>>>>>>>>>>>>>                 mntput(ss_mnt);
>>>>>>>>>>>>>                 return;
>>>>>>>>>>>>>         }
>>>>>>>>>>>>>
>>>>>>>>>>>>> The first one looks bogus. Can net_generic return NULL? If so how, and
>>>>>>>>>>>>> why is it not a problem elsewhere in the kernel?
>>>>>>>>>>>> it looks like net_generic cannot fail; nowhere else checks for NULL,
>>>>>>>>>>>> so I will remove this check.
>>>>>>>>>>>>
>>>>>>>>>>>>> For the second case, if the ni is no longer on the list, where did the
>>>>>>>>>>>>> extra ss_mnt reference come from? Maybe that should be a WARN_ON or
>>>>>>>>>>>>> BUG_ON?
>>>>>>>>>>>> if ni is not found on the list then it's a bug somewhere so I will add
>>>>>>>>>>>> a BUG_ON on this.
>>>>>>>>>>>>
>>>>>>>>>>> Probably better to just WARN_ON and let any references leak in that
>>>>>>>>>>> case. A BUG_ON implies a panic in some environments, and it's best to
>>>>>>>>>>> avoid that unless there really is no choice.
>>>>>>>>>> WARN_ON also causes machines to reboot that have panic_on_warn enabled.
>>>>>>>>>> Why not just handle the error and keep going?  Why panic at all?
>>>>>>>>>>
>>>>>>>>> Who the hell sets panic_on_warn (outside of testing environments)?
>>>>>>>> All cloud providers and anyone else that wants to "kill the system that
>>>>>>>> had a problem and have it reboot fast" in order to keep things working
>>>>>>>> overall.
>>>>>>>>
>>>>>>> If that's the case, then this situation would probably be one where a
>>>>>>> cloud provider would want to crash it and come back. NFS grace periods
>>>>>>> can suck though.
>>>>>>>
>>>>>>>>> I'm
>>>>>>>>> suggesting a WARN_ON because not finding an entry at this point
>>>>>>>>> represents a bug that we'd want reported.
>>>>>>>> Your call, but we are generally discouraging adding new WARN_ON() for
>>>>>>>> anything that userspace could ever trigger.  And if userspace can't
>>>>>>>> trigger it, then it's a normal type of error that you need to handle
>>>>>>>> anyway, right?
>>>>>>>>
>>>>>>>> Anyway, your call, just letting you know.
>>>>>>>>
>>>>>>> Understood.
>>>>>>>
>>>>>>>>> The caller should hold a reference to the object that holds a vfsmount
>>>>>>>>> reference. It relies on that vfsmount to do a copy. If it's gone at this
>>>>>>>>> point where we're releasing that reference, then we're looking at a
>>>>>>>>> refcounting bug of some sort.
>>>>>>>> refcounting in the nfsd code, or outside of that?
>>>>>>>>
>>>>>>> It'd be in the nfsd code, but might affect the vfsmount refcount. Inter-
>>>>>>> server copy is quite the tenuous house of cards. ;)
>>>>>>>
>>>>>>>>> I would expect anyone who sets panic_on_warn to _desire_ a panic in this
>>>>>>>>> situation. After all, they asked for it. Presumably they want it to do
>>>>>>>>> some coredump analysis or something?
>>>>>>>>>
>>>>>>>>> It is debatable whether the stack trace at this point would be helpful
>>>>>>>>> though, so you might consider a pr_warn or something less log-spammy.
>>>>>>>> If you can recover from it, then yeah, pr_warn() is usually best.
>>>>>>>>
>>>>>>> It does look like Dai went with pr_warn on his v2 patch.
>>>>>>>
>>>>>>> We'd "recover" by leaking a vfsmount reference. The immediate crash
>>>>>>> would be avoided, but it might make for interesting "fun" later when you
>>>>>>> went to try and unmount the thing.
>>>>>> This is a red flag for me. If the leak prevents the system from
>>>>>> shutting down reliably, then we need to do something more than
>>>>>> a pr_warn(), I would think.
>>>>>>
>>>>> Sorry, I should correct myself.
>>>>>
>>>>> We wouldn't (necessarily) leak a vfsmount reference. If the entry was no
>>>>> longer on the list, then presumably it has already been cleaned up and
>>>>> the vfsmount reference put.
>>>> I think the issue here is not the vfsmount reference count. The issue is that
>>>> we could not find a nfsd4_ssc_umount_item on the list that matches the
>>>> vfsmount ss_mnt. So the question is what should we do in this case?
>>>>
>>>> Prior to this patch, when we hit this scenario we just go ahead and
>>>> unmount the ss_mnt there since it won't be unmounted by the laundromat
>>>> (it's not on the delayed unmount list).
>>>>
>>>> With this patch, we don't even unmount the ss_mnt, we just do a pr_warn.
>>>>
>>>> I'd prefer to go back to the previous code to do the unmount and also
>>>> do a pr_warn.
>>>>
>>>>> It's still a bug though since we _should_ still have a reference to the
>>>>> nfsd4_ssc_umount_item at this point. So this is really just a potential
>>>>> use-after-free.
>>>> The ss_mnt still might have a reference on the nfsd4_ssc_umount_item
>>>> but we just can't find it on the list. Even though the possibility for
>>>> this to happen is slim to none, we still have to check for it.
>>>>
>>>>> FWIW, the object handling here is somewhat weird as the copy operation
>>>>> holds a reference to the nfsd4_ssc_umount_item but passes around a
>>>>> pointer to the vfsmount
>>>>>
>>>>> I have to wonder if it'd be cleaner to have nfsd4_setup_inter_ssc pass
>>>>> back a pointer to the nfsd4_ssc_umount_item, so you could pass that to
>>>>> nfsd4_cleanup_inter_ssc and skip searching for it again at cleanup time.
>>>> Yes, I think returning a pointer to the nfsd4_ssc_umount_item approach
>>>> would be better.  We won't have to deal with the situation where we can't
>>>> find an item on the list (even though it almost never happens).
>>>>
>>>> Can we do this enhancement after fixing this use-after-free problem, in
>>>> a separate patch series?
>> ^^^
>> I think that'd be best.
>>
>>> Is there a reason not to fix it correctly now?
>>>
>>> I'd rather not merge a fix that leaves the possibility of a leak.
>> We're going to need to backport this to earlier kernels and it'll need
>> to go in soon. I think it'd be best to take a minimal fix for the reported
>> crash, and then Dai can have some time to do a larger cleanup.
> Backporting is important, of course.
>
> What I was hearing was that the simple fix couldn't be done without
> introducing a leak or UAF.

No, this is not true. This fix is independent of the enhancement
suggested by Jeff, which changes the way the vfsmount is passed around
so that the condition where the ni item is not found on the list cannot arise.

-Dai

>
>
> --
> Chuck Lever
>
>
>
Jeff Layton Dec. 12, 2022, 9:17 p.m. UTC | #22
On Mon, 2022-12-12 at 11:59 -0800, dai.ngo@oracle.com wrote:
> On 12/12/22 11:48 AM, Chuck Lever III wrote:
> > 
> > > On Dec 12, 2022, at 2:46 PM, Jeff Layton <jlayton@kernel.org> wrote:
> > > 
> > > On Mon, 2022-12-12 at 19:28 +0000, Chuck Lever III wrote:
> > > > > On Dec 12, 2022, at 2:16 PM, Dai Ngo <dai.ngo@oracle.com> wrote:
> > > > > 
> > > > > 
> > > > > On 12/12/22 10:38 AM, Jeff Layton wrote:
> > > > > > On Mon, 2022-12-12 at 18:16 +0000, Chuck Lever III wrote:
> > > > > > > > On Dec 12, 2022, at 12:44 PM, Jeff Layton <jlayton@kernel.org> wrote:
> > > > > > > > 
> > > > > > > > On Mon, 2022-12-12 at 18:14 +0100, Greg KH wrote:
> > > > > > > > > On Mon, Dec 12, 2022 at 09:31:19AM -0500, Jeff Layton wrote:
> > > > > > > > > > On Mon, 2022-12-12 at 14:59 +0100, Greg KH wrote:
> > > > > > > > > > > On Mon, Dec 12, 2022 at 08:40:31AM -0500, Jeff Layton wrote:
> > > > > > > > > > > > On Mon, 2022-12-12 at 05:34 -0800, dai.ngo@oracle.com wrote:
> > > > > > > > > > > > > On 12/12/22 4:22 AM, Jeff Layton wrote:
> > > > > > > > > > > > > > On Sun, 2022-12-11 at 11:22 -0800, Dai Ngo wrote:
> > > > > > > > > > > > > > > Problem caused by source's vfsmount being unmounted but remains
> > > > > > > > > > > > > > > on the delayed unmount list. This happens when nfs42_ssc_open()
> > > > > > > > > > > > > > > return errors.
> > > > > > > > > > > > > > > Fixed by removing nfsd4_interssc_connect(), leave the vfsmount
> > > > > > > > > > > > > > > for the laundromat to unmount when idle time expires.
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > Reported-by: Xingyuan Mo <hdthky0@gmail.com>
> > > > > > > > > > > > > > > Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
> > > > > > > > > > > > > > > ---
> > > > > > > > > > > > > > > fs/nfsd/nfs4proc.c | 23 +++++++----------------
> > > > > > > > > > > > > > > 1 file changed, 7 insertions(+), 16 deletions(-)
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
> > > > > > > > > > > > > > > index 8beb2bc4c328..756e42cf0d01 100644
> > > > > > > > > > > > > > > --- a/fs/nfsd/nfs4proc.c
> > > > > > > > > > > > > > > +++ b/fs/nfsd/nfs4proc.c
> > > > > > > > > > > > > > > @@ -1463,13 +1463,6 @@ nfsd4_interssc_connect(struct nl4_server *nss, struct svc_rqst *rqstp,
> > > > > > > > > > > > > > > 	return status;
> > > > > > > > > > > > > > > }
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > -static void
> > > > > > > > > > > > > > > -nfsd4_interssc_disconnect(struct vfsmount *ss_mnt)
> > > > > > > > > > > > > > > -{
> > > > > > > > > > > > > > > -	nfs_do_sb_deactive(ss_mnt->mnt_sb);
> > > > > > > > > > > > > > > -	mntput(ss_mnt);
> > > > > > > > > > > > > > > -}
> > > > > > > > > > > > > > > -
> > > > > > > > > > > > > > > /*
> > > > > > > > > > > > > > >   * Verify COPY destination stateid.
> > > > > > > > > > > > > > >   *
> > > > > > > > > > > > > > > @@ -1572,11 +1565,6 @@ nfsd4_cleanup_inter_ssc(struct vfsmount *ss_mnt, struct file *filp,
> > > > > > > > > > > > > > > {
> > > > > > > > > > > > > > > }
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > -static void
> > > > > > > > > > > > > > > -nfsd4_interssc_disconnect(struct vfsmount *ss_mnt)
> > > > > > > > > > > > > > > -{
> > > > > > > > > > > > > > > -}
> > > > > > > > > > > > > > > -
> > > > > > > > > > > > > > > static struct file *nfs42_ssc_open(struct vfsmount *ss_mnt,
> > > > > > > > > > > > > > > 				   struct nfs_fh *src_fh,
> > > > > > > > > > > > > > > 				   nfs4_stateid *stateid)
> > > > > > > > > > > > > > > @@ -1762,7 +1750,8 @@ static int nfsd4_do_async_copy(void *data)
> > > > > > > > > > > > > > > 		struct file *filp;
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > 		filp = nfs42_ssc_open(copy->ss_mnt, &copy->c_fh,
> > > > > > > > > > > > > > > -				      &copy->stateid);
> > > > > > > > > > > > > > > +					&copy->stateid);
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > 		if (IS_ERR(filp)) {
> > > > > > > > > > > > > > > 			switch (PTR_ERR(filp)) {
> > > > > > > > > > > > > > > 			case -EBADF:
> > > > > > > > > > > > > > > @@ -1771,7 +1760,7 @@ static int nfsd4_do_async_copy(void *data)
> > > > > > > > > > > > > > > 			default:
> > > > > > > > > > > > > > > 				nfserr = nfserr_offload_denied;
> > > > > > > > > > > > > > > 			}
> > > > > > > > > > > > > > > -			nfsd4_interssc_disconnect(copy->ss_mnt);
> > > > > > > > > > > > > > > +			/* ss_mnt will be unmounted by the laundromat */
> > > > > > > > > > > > > > > 			goto do_callback;
> > > > > > > > > > > > > > > 		}
> > > > > > > > > > > > > > > 		nfserr = nfsd4_do_copy(copy, filp, copy->nf_dst->nf_file,
> > > > > > > > > > > > > > > @@ -1852,8 +1841,10 @@ nfsd4_copy(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
> > > > > > > > > > > > > > > 	if (async_copy)
> > > > > > > > > > > > > > > 		cleanup_async_copy(async_copy);
> > > > > > > > > > > > > > > 	status = nfserrno(-ENOMEM);
> > > > > > > > > > > > > > > -	if (nfsd4_ssc_is_inter(copy))
> > > > > > > > > > > > > > > -		nfsd4_interssc_disconnect(copy->ss_mnt);
> > > > > > > > > > > > > > > +	/*
> > > > > > > > > > > > > > > +	 * source's vfsmount of inter-copy will be unmounted
> > > > > > > > > > > > > > > +	 * by the laundromat
> > > > > > > > > > > > > > > +	 */
> > > > > > > > > > > > > > > 	goto out;
> > > > > > > > > > > > > > > }
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > This looks reasonable at first glance, but I have some concerns with the
> > > > > > > > > > > > > > refcounting around ss_mnt elsewhere in this code. nfsd4_ssc_setup_dul
> > > > > > > > > > > > > > looks for an existing connection and bumps the ni->nsui_refcnt if it
> > > > > > > > > > > > > > finds one.
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > But then later, nfsd4_cleanup_inter_ssc has a couple of cases where it
> > > > > > > > > > > > > > just does a bare mntput:
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > >         if (!nn) {
> > > > > > > > > > > > > >                 mntput(ss_mnt);
> > > > > > > > > > > > > >                 return;
> > > > > > > > > > > > > >         }
> > > > > > > > > > > > > > ...
> > > > > > > > > > > > > >         if (!found) {
> > > > > > > > > > > > > >                 mntput(ss_mnt);
> > > > > > > > > > > > > >                 return;
> > > > > > > > > > > > > >         }
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > The first one looks bogus. Can net_generic return NULL? If so how, and
> > > > > > > > > > > > > > why is it not a problem elsewhere in the kernel?
> > > > > > > > > > > > > it looks like net_generic cannot fail; nowhere else checks for NULL,
> > > > > > > > > > > > > so I will remove this check.
> > > > > > > > > > > > > 
> > > > > > > > > > > > > > For the second case, if the ni is no longer on the list, where did the
> > > > > > > > > > > > > > extra ss_mnt reference come from? Maybe that should be a WARN_ON or
> > > > > > > > > > > > > > BUG_ON?
> > > > > > > > > > > > > if ni is not found on the list then it's a bug somewhere so I will add
> > > > > > > > > > > > > a BUG_ON on this.
> > > > > > > > > > > > > 
> > > > > > > > > > > > Probably better to just WARN_ON and let any references leak in that
> > > > > > > > > > > > case. A BUG_ON implies a panic in some environments, and it's best to
> > > > > > > > > > > > avoid that unless there really is no choice.
> > > > > > > > > > > WARN_ON also causes machines to reboot that have panic_on_warn enabled.
> > > > > > > > > > > Why not just handle the error and keep going?  Why panic at all?
> > > > > > > > > > > 
> > > > > > > > > > Who the hell sets panic_on_warn (outside of testing environments)?
> > > > > > > > > All cloud providers and anyone else that wants to "kill the system that
> > > > > > > > > had a problem and have it reboot fast" in order to keep things working
> > > > > > > > > overall.
> > > > > > > > > 
> > > > > > > > If that's the case, then this situation would probably be one where a
> > > > > > > > cloud provider would want to crash it and come back. NFS grace periods
> > > > > > > > can suck though.
> > > > > > > > 
> > > > > > > > > > I'm
> > > > > > > > > > suggesting a WARN_ON because not finding an entry at this point
> > > > > > > > > > represents a bug that we'd want reported.
> > > > > > > > > Your call, but we are generally discouraging adding new WARN_ON() for
> > > > > > > > > anything that userspace could ever trigger.  And if userspace can't
> > > > > > > > > trigger it, then it's a normal type of error that you need to handle
> > > > > > > > > anyway, right?
> > > > > > > > > 
> > > > > > > > > Anyway, your call, just letting you know.
> > > > > > > > > 
> > > > > > > > Understood.
> > > > > > > > 
> > > > > > > > > > The caller should hold a reference to the object that holds a vfsmount
> > > > > > > > > > reference. It relies on that vfsmount to do a copy. If it's gone at this
> > > > > > > > > > point where we're releasing that reference, then we're looking at a
> > > > > > > > > > refcounting bug of some sort.
> > > > > > > > > refcounting in the nfsd code, or outside of that?
> > > > > > > > > 
> > > > > > > > It'd be in the nfsd code, but might affect the vfsmount refcount. Inter-
> > > > > > > > server copy is quite the tenuous house of cards. ;)
> > > > > > > > 
> > > > > > > > > > I would expect anyone who sets panic_on_warn to _desire_ a panic in this
> > > > > > > > > > situation. After all, they asked for it. Presumably they want it to do
> > > > > > > > > > some coredump analysis or something?
> > > > > > > > > > 
> > > > > > > > > > It is debatable whether the stack trace at this point would be helpful
> > > > > > > > > > though, so you might consider a pr_warn or something less log-spammy.
> > > > > > > > > If you can recover from it, then yeah, pr_warn() is usually best.
> > > > > > > > > 
> > > > > > > > It does look like Dai went with pr_warn on his v2 patch.
> > > > > > > > 
> > > > > > > > We'd "recover" by leaking a vfsmount reference. The immediate crash
> > > > > > > > would be avoided, but it might make for interesting "fun" later when you
> > > > > > > > went to try and unmount the thing.
> > > > > > > This is a red flag for me. If the leak prevents the system from
> > > > > > > shutting down reliably, then we need to do something more than
> > > > > > > a pr_warn(), I would think.
> > > > > > > 
> > > > > > Sorry, I should correct myself.
> > > > > > 
> > > > > > We wouldn't (necessarily) leak a vfsmount reference. If the entry was no
> > > > > > longer on the list, then presumably it has already been cleaned up and
> > > > > > the vfsmount reference put.
> > > > > I think the issue here is not the vfsmount reference count. The issue is that
> > > > > we could not find a nfsd4_ssc_umount_item on the list that matches the
> > > > > vfsmount ss_mnt. So the question is what should we do in this case?
> > > > > 
> > > > > Prior to this patch, when we hit this scenario we just go ahead and
> > > > > unmount the ss_mnt there since it won't be unmounted by the laundromat
> > > > > (it's not on the delayed unmount list).
> > > > > 
> > > > > With this patch, we don't even unmount the ss_mnt, we just do a pr_warn.
> > > > > 
> > > > > I'd prefer to go back to the previous code to do the unmount and also
> > > > > do a pr_warn.
> > > > > 
> > > > > > It's still a bug though since we _should_ still have a reference to the
> > > > > > nfsd4_ssc_umount_item at this point. So this is really just a potential
> > > > > > use-after-free.
> > > > > The ss_mnt still might have a reference on the nfsd4_ssc_umount_item
> > > > > but we just can't find it on the list. Even though the possibility for
> > > > > this to happen is slim to none, we still have to check for it.
> > > > > 
> > > > > > FWIW, the object handling here is somewhat weird as the copy operation
> > > > > > holds a reference to the nfsd4_ssc_umount_item but passes around a
> > > > > > pointer to the vfsmount
> > > > > > 
> > > > > > I have to wonder if it'd be cleaner to have nfsd4_setup_inter_ssc pass
> > > > > > back a pointer to the nfsd4_ssc_umount_item, so you could pass that to
> > > > > > nfsd4_cleanup_inter_ssc and skip searching for it again at cleanup time.
> > > > > Yes, I think returning a pointer to the nfsd4_ssc_umount_item approach
> > > > > would be better.  We won't have to deal with the situation where we can't
> > > > > find an item on the list (even though it almost never happens).
> > > > > 
> > > > > Can we do this enhancement after fixing this use-after-free problem, in
> > > > > a separate patch series?
> > > ^^^
> > > I think that'd be best.
> > > 
> > > > Is there a reason not to fix it correctly now?
> > > > 
> > > > I'd rather not merge a fix that leaves the possibility of a leak.
> > > We're going to need to backport this to earlier kernels and it'll need
> > > to go in soon. I think it'd be best to take a minimal fix for the reported
> > > crash, and then Dai can have some time to do a larger cleanup.
> > Backporting is important, of course.
> > 
> > What I was hearing was that the simple fix couldn't be done without
> > introducing a leak or UAF.
> 
> No, this is not true. This fix is independent of the enhancement
> suggested by Jeff to deal with the way the vfsmount is passed around
> to avoid the condition where the ni item is not found on the list.
> 
> -Dai

Yep. Dai's patch should fix the reported bug. It just took me a bit to
figure that out because of the unconventional way that the reference to
the nsui is handled. That can and should be addressed in later patches.
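
To make that later cleanup concrete: the interface change Jeff suggests could
look roughly like the sketch below, where nfsd4_setup_inter_ssc() hands back
the nfsd4_ssc_umount_item that pins the source vfsmount and
nfsd4_cleanup_inter_ssc() simply drops that same item instead of searching the
list again. This is an illustration only; the parameter lists are
approximations built from the functions named in the thread, not the code that
was eventually merged.

/* Sketch only -- not the in-tree implementation. */

/*
 * Setup returns the delayed-unmount entry itself (its reference already
 * taken by nfsd4_ssc_setup_dul()), so the copy never carries a raw
 * vfsmount pointer.
 */
static __be32
nfsd4_setup_inter_ssc(struct svc_rqst *rqstp,
		      struct nfsd4_compound_state *cstate,
		      struct nfsd4_copy *copy,
		      struct nfsd4_ssc_umount_item **nsui);

/*
 * Cleanup drops the reference it was handed and leaves the unmount to
 * the laundromat once the entry goes idle; with no list search there is
 * no "entry not found" branch and no bare mntput().
 */
static void
nfsd4_cleanup_inter_ssc(struct nfsd4_ssc_umount_item *nsui,
			struct file *filp, struct nfsd_file *dst);
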
diff mbox series

Patch

diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index 8beb2bc4c328..756e42cf0d01 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -1463,13 +1463,6 @@  nfsd4_interssc_connect(struct nl4_server *nss, struct svc_rqst *rqstp,
 	return status;
 }
 
-static void
-nfsd4_interssc_disconnect(struct vfsmount *ss_mnt)
-{
-	nfs_do_sb_deactive(ss_mnt->mnt_sb);
-	mntput(ss_mnt);
-}
-
 /*
  * Verify COPY destination stateid.
  *
@@ -1572,11 +1565,6 @@  nfsd4_cleanup_inter_ssc(struct vfsmount *ss_mnt, struct file *filp,
 {
 }
 
-static void
-nfsd4_interssc_disconnect(struct vfsmount *ss_mnt)
-{
-}
-
 static struct file *nfs42_ssc_open(struct vfsmount *ss_mnt,
 				   struct nfs_fh *src_fh,
 				   nfs4_stateid *stateid)
@@ -1762,7 +1750,8 @@  static int nfsd4_do_async_copy(void *data)
 		struct file *filp;
 
 		filp = nfs42_ssc_open(copy->ss_mnt, &copy->c_fh,
-				      &copy->stateid);
+					&copy->stateid);
+
 		if (IS_ERR(filp)) {
 			switch (PTR_ERR(filp)) {
 			case -EBADF:
@@ -1771,7 +1760,7 @@  static int nfsd4_do_async_copy(void *data)
 			default:
 				nfserr = nfserr_offload_denied;
 			}
-			nfsd4_interssc_disconnect(copy->ss_mnt);
+			/* ss_mnt will be unmounted by the laundromat */
 			goto do_callback;
 		}
 		nfserr = nfsd4_do_copy(copy, filp, copy->nf_dst->nf_file,
@@ -1852,8 +1841,10 @@  nfsd4_copy(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
 	if (async_copy)
 		cleanup_async_copy(async_copy);
 	status = nfserrno(-ENOMEM);
-	if (nfsd4_ssc_is_inter(copy))
-		nfsd4_interssc_disconnect(copy->ss_mnt);
+	/*
+	 * source's vfsmount of inter-copy will be unmounted
+	 * by the laundromat
+	 */
 	goto out;
 }