[1/1] NFSv4.1 fix a kswap nfs4_state_manger race

On Mon, 2013-11-25 at 20:29 +0000, Adamson, Andy wrote:
> On Nov 25, 2013, at 3:20 PM, "Myklebust, Trond" <Trond.Myklebust@netapp.com>
>  wrote:
> 
> > 
> > On Nov 25, 2013, at 15:10, Adamson, Andy <William.Adamson@netapp.com> wrote:
> > 
> >> 
> >> On Nov 25, 2013, at 2:53 PM, "Myklebust, Trond" <Trond.Myklebust@netapp.com>
> >> wrote:
> >> 
> >>> 
> >>> On Nov 25, 2013, at 14:27, Adamson, Andy <William.Adamson@netapp.com> wrote:
> >>> 
> >>>> 
> >>>> On Nov 25, 2013, at 1:33 PM, "Myklebust, Trond" <Trond.Myklebust@netapp.com>
> >>>> wrote:
> >>>> 
> >>>>> 
> >>>>> On Nov 25, 2013, at 13:13, Myklebust, Trond <Trond.Myklebust@netapp.com> wrote:
> >>>>> 
> >>>>>> 
> >>>>>> On Nov 25, 2013, at 12:57, <andros@netapp.com> <andros@netapp.com> wrote:
> >>>>>> 
> >>>>>>> From: Andy Adamson <andros@netapp.com>
> >>>>>>> 
> >>>>>>> The state manager is recovering expired state and recovery OPENs are being
> >>>>>>> processed. If kswapd is pruning inodes at the same time, a deadlock can occur
> >>>>>>> when kswapd calls evict_inode on an NFSv4.1 inode with a layout, and the
> >>>>>>> resultant layoutreturn gets an error that the state mangager is to handle,
> >>>>>>> causing the layoutreturn to wait on the (NFS client) cl_rpcwaitq.
> >>>>>>> 
> >>>>>>> At the same time an open is waiting for the inode deletion to complete in
> >>>>>>> __wait_on_freeing_inode.
> >>>>>>> 
> >>>>>>> If the open is either the open called by the state manager, or an open from
> >>>>>>> the same open owner that is holding the NFSv4.0 sequence id which causes the
> >>>>>>> OPEN from the state manager to wait for the sequence id on the Seqid_waitqueue,
> >>>>>>> then the state is deadlocked with kswapd.
> >>>>>>> 
> >>>>>>> Do not handle LAYOUTRETURN errors when called from nfs4_evict_inode.
> >>>>>> 
> >>>>>> Why are we waiting for recovery in LAYOUTRETURN at all? Layouts are automatically lost when the server reboots or when the lease is otherwise lost.
> >>>>>> 
> >>>>>> IOW: Is there any reason why we need to special-case nfs4_evict_inode? Shouldn’t we just bail out on error in _all_ cases?
> >>>>> 
> >>>>> BTW: Is it possible that we might have a similar problem with delegreturn? That too can be called from nfs4_evict_inode…
> >>>> 
> >>>> Yes, good point.  kswapd could be waiting for a delegation to return which has an error along with the same scenario with sys_open and the state manager running.
> >>>> 
> >>>> With delegreturn, we most definately want to limit 'no error handling' to the evict inode case.
> >>> 
> >>> Ah… I forgot that the delegreturn in nfs4_evict_inode is asynchronous and doesn’t wait for completion, so it shouldn’t be a problem here.
> >> 
> >> Except we just changed that to fix a different state manager hang:
> >> 
> >> commit 4a82fd7c4e78a1b7a224f9ae8bb7e1fd95f670e0
> >> Author: Andy Adamson <andros@netapp.com>
> >> Date:   Fri Nov 15 16:36:16 2013 -0500
> >> 
> >>   NFSv4 wait on recovery for async session errors
> > 
> > Right, but that won’t prevent nfs4_evict_inode from completing,
> 
> Ah - I was thinking of the synchronous handlers call to nfs4_wait_clnt_recover - so yes, no problem
> 
> -->Andy
> 
> > and hence the OPEN that is waiting in nfs_fhget() can also complete, and so there is no deadlock with the state manager thread.

How about something like the attached...

[1/1] NFSv4.1 fix a kswap nfs4_state_manger race

Commit Message

Patch