[1/2,v3] NFSv4: Fix a livelock when CLOSE pre-emptively bumps state sequence

Message ID 787d0d4946efb286f4dc51051b048277c0dc697e.1600882430.git.bcodding@redhat.com
State New
Series a stateid race and a cleanup

Commit Message

Benjamin Coddington Sept. 23, 2020, 5:37 p.m. UTC
Since commit 0e0cb35b417f ("NFSv4: Handle NFS4ERR_OLD_STATEID in
CLOSE/OPEN_DOWNGRADE") the following livelock may occur if a CLOSE races
with the update of the nfs_state:

Process 1           Process 2           Server
=========           =========           ========
 OPEN file
                    OPEN file
                                        Reply OPEN (1)
                                        Reply OPEN (2)
 Update state (1)
 CLOSE file (1)
                                        Reply OLD_STATEID (1)
 CLOSE file (2)
                                        Reply CLOSE (-1)
                    Update state (2)
                    wait for state change
 OPEN file
                    wake
 CLOSE file
 OPEN file
                    wake
 CLOSE file
 ...
                    ...

As long as the first process continues updating state, the second process
will fail to exit the loop in nfs_set_open_stateid_locked().  This livelock
has been observed in generic/168.

Fix this by detecting the case in nfs_need_update_open_stateid() and
then exiting the loop if:
 - the state is NFS_OPEN_STATE, and
 - the stateid sequence is > 1, and
 - the stateid doesn't match the current open stateid

Fixes: 0e0cb35b417f ("NFSv4: Handle NFS4ERR_OLD_STATEID in CLOSE/OPEN_DOWNGRADE")
Cc: stable@vger.kernel.org # v5.4+
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
---
 fs/nfs/nfs4proc.c | 16 +++++++++-------
 1 file changed, 9 insertions(+), 7 deletions(-)

Comments

Trond Myklebust Sept. 23, 2020, 6:53 p.m. UTC | #1
On Wed, 2020-09-23 at 13:37 -0400, Benjamin Coddington wrote:
> Since commit 0e0cb35b417f ("NFSv4: Handle NFS4ERR_OLD_STATEID in
> CLOSE/OPEN_DOWNGRADE") the following livelock may occur if a CLOSE
> races
> with the update of the nfs_state:
> 
> Process 1           Process 2           Server
> =========           =========           ========
>  OPEN file
>                     OPEN file
>                                         Reply OPEN (1)
>                                         Reply OPEN (2)
>  Update state (1)
>  CLOSE file (1)
>                                         Reply OLD_STATEID (1)
>  CLOSE file (2)
>                                         Reply CLOSE (-1)
>                     Update state (2)
>                     wait for state change
>  OPEN file
>                     wake
>  CLOSE file
>  OPEN file
>                     wake
>  CLOSE file
>  ...
>                     ...
> 
> As long as the first process continues updating state, the second
> process
> will fail to exit the loop in nfs_set_open_stateid_locked().  This
> livelock
> has been observed in generic/168.
> 
> Fix this by detecting the case in nfs_need_update_open_stateid() and
> then exit the loop if:
>  - the state is NFS_OPEN_STATE, and
>  - the stateid sequence is > 1, and
>  - the stateid doesn't match the current open stateid
> 
> Fixes: 0e0cb35b417f ("NFSv4: Handle NFS4ERR_OLD_STATEID in
> CLOSE/OPEN_DOWNGRADE")
> Cc: stable@vger.kernel.org # v5.4+
> Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
> ---
>  fs/nfs/nfs4proc.c | 16 +++++++++-------
>  1 file changed, 9 insertions(+), 7 deletions(-)
> 
> diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
> index 6e95c85fe395..8c2bb91127ee 100644
> --- a/fs/nfs/nfs4proc.c
> +++ b/fs/nfs/nfs4proc.c
> @@ -1588,19 +1588,21 @@ static void nfs_test_and_clear_all_open_stateid(struct nfs4_state *state)
>  static bool nfs_need_update_open_stateid(struct nfs4_state *state,
>  		const nfs4_stateid *stateid)
>  {
> -	if (test_bit(NFS_OPEN_STATE, &state->flags) == 0 ||
> -	    !nfs4_stateid_match_other(stateid, &state->open_stateid)) {
> +	if (test_bit(NFS_OPEN_STATE, &state->flags)) {
> +		/* The common case - we're updating to a new sequence number */
> +		if (nfs4_stateid_match_other(stateid, &state->open_stateid) &&
> +			nfs4_stateid_is_newer(stateid, &state->open_stateid)) {
> +			nfs_state_log_out_of_order_open_stateid(state, stateid);
> +			return true;
> +		}
> +	} else {
> +		/* This is the first OPEN */
>  		if (stateid->seqid == cpu_to_be32(1))
>  			nfs_state_log_update_open_stateid(state);
>  		else
>  			set_bit(NFS_STATE_CHANGE_WAIT, &state->flags);

Isn't this going to cause a reopen of the file on the client if it ends
up processing the reply to the second OPEN after it processes the
successful CLOSE?

Isn't the real problem here rather that the reply to CLOSE needs to be
processed in order too?

>  		return true;
>  	}
> -
> -	if (nfs4_stateid_is_newer(stateid, &state->open_stateid)) {
> -		nfs_state_log_out_of_order_open_stateid(state, stateid);
> -		return true;
> -	}
>  	return false;
>  }
>
Benjamin Coddington Sept. 23, 2020, 7:29 p.m. UTC | #2
On 23 Sep 2020, at 14:53, Trond Myklebust wrote:

> On Wed, 2020-09-23 at 13:37 -0400, Benjamin Coddington wrote:
>> Since commit 0e0cb35b417f ("NFSv4: Handle NFS4ERR_OLD_STATEID in
>> CLOSE/OPEN_DOWNGRADE") the following livelock may occur if a CLOSE
>> races
>> with the update of the nfs_state:
>>
>> Process 1           Process 2           Server
>> =========           =========           ========
>>  OPEN file
>>                     OPEN file
>>                                         Reply OPEN (1)
>>                                         Reply OPEN (2)
>>  Update state (1)
>>  CLOSE file (1)
>>                                         Reply OLD_STATEID (1)
>>  CLOSE file (2)
>>                                         Reply CLOSE (-1)
>>                     Update state (2)
>>                     wait for state change
>>  OPEN file
>>                     wake
>>  CLOSE file
>>  OPEN file
>>                     wake
>>  CLOSE file
>>  ...
>>                     ...
>>
>> As long as the first process continues updating state, the second
>> process
>> will fail to exit the loop in nfs_set_open_stateid_locked().  This
>> livelock
>> has been observed in generic/168.
>>
>> Fix this by detecting the case in nfs_need_update_open_stateid() and
>> then exit the loop if:
>>  - the state is NFS_OPEN_STATE, and
>>  - the stateid sequence is > 1, and
>>  - the stateid doesn't match the current open stateid
>>
>> Fixes: 0e0cb35b417f ("NFSv4: Handle NFS4ERR_OLD_STATEID in
>> CLOSE/OPEN_DOWNGRADE")
>> Cc: stable@vger.kernel.org # v5.4+
>> Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
>> ---
>>  fs/nfs/nfs4proc.c | 16 +++++++++-------
>>  1 file changed, 9 insertions(+), 7 deletions(-)
>>
>> diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
>> index 6e95c85fe395..8c2bb91127ee 100644
>> --- a/fs/nfs/nfs4proc.c
>> +++ b/fs/nfs/nfs4proc.c
>> @@ -1588,19 +1588,21 @@ static void nfs_test_and_clear_all_open_stateid(struct nfs4_state *state)
>>  static bool nfs_need_update_open_stateid(struct nfs4_state *state,
>>  		const nfs4_stateid *stateid)
>>  {
>> -	if (test_bit(NFS_OPEN_STATE, &state->flags) == 0 ||
>> -	    !nfs4_stateid_match_other(stateid, &state->open_stateid)) {
>> +	if (test_bit(NFS_OPEN_STATE, &state->flags)) {
>> +		/* The common case - we're updating to a new sequence number */
>> +		if (nfs4_stateid_match_other(stateid, &state->open_stateid) &&
>> +			nfs4_stateid_is_newer(stateid, &state->open_stateid)) {
>> +			nfs_state_log_out_of_order_open_stateid(state, stateid);
>> +			return true;
>> +		}
>> +	} else {
>> +		/* This is the first OPEN */
>>  		if (stateid->seqid == cpu_to_be32(1))
>>  			nfs_state_log_update_open_stateid(state);
>>  		else
>>  			set_bit(NFS_STATE_CHANGE_WAIT, &state->flags);
>
> Isn't this going to cause a reopen of the file on the client if it ends
> up processing the reply to the second OPEN after it processes the
> successful CLOSE?

Yes, that's true - but that's a different bug that I haven't noticed or
considered.  This patch isn't introducing it.

> Isn't the real problem here rather that the reply to CLOSE needs to be
> processed in order too?

Not just the reply, the actual request as well.  If we have a way to
properly serialize processing of CLOSE responses, we could just not send the
CLOSE in the first place.

I'd rather not send the CLOSE if there's another OPEN in play, and if that's
the barrier to getting this particular bug fixed, I'll work on that.  What
mechanism can be used?  What if the client kept a separate "pending" stateid
that could be updated before each operation that would attempt to predict
what the server's resulting change would be?

Maybe better would be a counter that gets incremented for each transition
to/from NFS_OPEN_STATE so we can check if the stateid is in the same
generation and a counter for outstanding operations that are expected to
bump the sequence.
Trond Myklebust Sept. 23, 2020, 7:39 p.m. UTC | #3
On Wed, 2020-09-23 at 15:29 -0400, Benjamin Coddington wrote:
> On 23 Sep 2020, at 14:53, Trond Myklebust wrote:
> 
> > On Wed, 2020-09-23 at 13:37 -0400, Benjamin Coddington wrote:
> > > Since commit 0e0cb35b417f ("NFSv4: Handle NFS4ERR_OLD_STATEID in
> > > CLOSE/OPEN_DOWNGRADE") the following livelock may occur if a
> > > CLOSE
> > > races
> > > with the update of the nfs_state:
> > > 
> > > Process 1           Process 2           Server
> > > =========           =========           ========
> > >  OPEN file
> > >                     OPEN file
> > >                                         Reply OPEN (1)
> > >                                         Reply OPEN (2)
> > >  Update state (1)
> > >  CLOSE file (1)
> > >                                         Reply OLD_STATEID (1)
> > >  CLOSE file (2)
> > >                                         Reply CLOSE (-1)
> > >                     Update state (2)
> > >                     wait for state change
> > >  OPEN file
> > >                     wake
> > >  CLOSE file
> > >  OPEN file
> > >                     wake
> > >  CLOSE file
> > >  ...
> > >                     ...
> > > 
> > > As long as the first process continues updating state, the second
> > > process
> > > will fail to exit the loop in
> > > nfs_set_open_stateid_locked().  This
> > > livelock
> > > has been observed in generic/168.
> > > 
> > > Fix this by detecting the case in nfs_need_update_open_stateid()
> > > and
> > > then exit the loop if:
> > >  - the state is NFS_OPEN_STATE, and
> > >  - the stateid sequence is > 1, and
> > >  - the stateid doesn't match the current open stateid
> > > 
> > > Fixes: 0e0cb35b417f ("NFSv4: Handle NFS4ERR_OLD_STATEID in
> > > CLOSE/OPEN_DOWNGRADE")
> > > Cc: stable@vger.kernel.org # v5.4+
> > > Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
> > > ---
> > >  fs/nfs/nfs4proc.c | 16 +++++++++-------
> > >  1 file changed, 9 insertions(+), 7 deletions(-)
> > > 
> > > diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
> > > index 6e95c85fe395..8c2bb91127ee 100644
> > > --- a/fs/nfs/nfs4proc.c
> > > +++ b/fs/nfs/nfs4proc.c
> > > @@ -1588,19 +1588,21 @@ static void
> > > nfs_test_and_clear_all_open_stateid(struct nfs4_state *state)
> > >  static bool nfs_need_update_open_stateid(struct nfs4_state
> > > *state,
> > >  		const nfs4_stateid *stateid)
> > >  {
> > > -	if (test_bit(NFS_OPEN_STATE, &state->flags) == 0 ||
> > > -	    !nfs4_stateid_match_other(stateid, &state->open_stateid)) {
> > > +	if (test_bit(NFS_OPEN_STATE, &state->flags)) {
> > > +		/* The common case - we're updating to a new sequence
> > > number */
> > > +		if (nfs4_stateid_match_other(stateid, &state-
> > > > open_stateid) &&
> > > +			nfs4_stateid_is_newer(stateid, &state-
> > > > open_stateid)) {
> > > +			nfs_state_log_out_of_order_open_stateid(state,
> > > stateid);
> > > +			return true;
> > > +		}
> > > +	} else {
> > > +		/* This is the first OPEN */
> > >  		if (stateid->seqid == cpu_to_be32(1))
> > >  			nfs_state_log_update_open_stateid(state);
> > >  		else
> > >  			set_bit(NFS_STATE_CHANGE_WAIT, &state->flags);
> > 
> > Isn't this going to cause a reopen of the file on the client if it
> > ends
> > up processing the reply to the second OPEN after it processes the
> > successful CLOSE?
> 
> Yes, that's true - but that's a different bug that I haven't noticed
> or
> considered.  This patch isn't introducing it.
> 
> > Isn't the real problem here rather that the reply to CLOSE needs to
> > be
> > processed in order too?
> 
> Not just the reply, the actual request as well.  If we have a way to
> properly serialize processing of CLOSE responses, we could just not send the
> CLOSE in the first place.
> 
> I'd rather not send the CLOSE if there's another OPEN in play, and if
> that's
> the barrier to getting this particular bug fixed, I'll work on
> that.  What
> mechanism can be used?  What if the client kept a separate "pending"
> stateid
> that could be updated before each operation that would attempt to
> predict
> what the server's resulting change would be?
> 
> Maybe better would be a counter that gets incremented for each
> transition
> to/from NFS_OPEN_STATE so we can check if the stateid is in the same
> generation and a counter for outstanding operations that are expected
> to
> bump the sequence.
> 

The client can't predict what is going to happen w.r.t. an OPEN call.
If it does an open by name, it doesn't even know which file is going to
get opened. That's why we have the wait loop
in nfs_set_open_stateid_locked(). Why should we not do the same in
CLOSE and OPEN_DOWNGRADE? It's the same problem.
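
For illustration only (this is not code from the thread): the in-order
processing described above can be modelled as a small userspace check on
sequence ids. The names seqid_is_next(), classify_reply() and the
reply_action enum are hypothetical, the "other" part of the stateid is
assumed to match already, and the special handling of seqid zero on
wraparound is left out.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical check: a reply carrying reply_seqid is "next in line" only
 * if it is the immediate successor of the seqid the client currently holds.
 */
static bool seqid_is_next(uint32_t cur_seqid, uint32_t reply_seqid)
{
	return reply_seqid == cur_seqid + 1U;
}

enum reply_action { APPLY_NOW, WAIT_FOR_EARLIER, DROP_STALE };

/* Decide what to do with a reply, given the seqid currently held. */
static enum reply_action classify_reply(uint32_t cur_seqid, uint32_t reply_seqid)
{
	if (seqid_is_next(cur_seqid, reply_seqid))
		return APPLY_NOW;
	/* Wraparound-safe "reply is ahead of us": an earlier reply is missing. */
	if ((int32_t)(reply_seqid - cur_seqid) > 0)
		return WAIT_FOR_EARLIER;
	/* Same or older seqid: nothing new to apply. */
	return DROP_STALE;
}
```

In this model a reply that runs ahead (WAIT_FOR_EARLIER) would sleep until
the held seqid changes and then be classified again, which is the shape of
the wait loop mentioned above; applying the same rule to CLOSE and
OPEN_DOWNGRADE replies is the suggestion being made, not something this
sketch implements.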
Benjamin Coddington Sept. 23, 2020, 7:46 p.m. UTC | #4
On 23 Sep 2020, at 15:39, Trond Myklebust wrote:
> The client can't predict what is going to happen w.r.t. an OPEN call.
> If it does an open by name, it doesn't even know which file is going to
> get opened. That's why we have the wait loop
> in nfs_set_open_stateid_locked(). Why should we not do the same in
> CLOSE and OPEN_DOWNGRADE? It's the same problem.

I will give it a shot.  In the meantime, please consider adding this patch
which fixes a real bug today.  Thank you for your excellent advice and time.

Ben
Trond Myklebust Sept. 23, 2020, 7:55 p.m. UTC | #5
On Wed, 2020-09-23 at 15:46 -0400, Benjamin Coddington wrote:
> On 23 Sep 2020, at 15:39, Trond Myklebust wrote:
> > The client can't predict what is going to happen w.r.t. an OPEN
> > call.
> > If it does an open by name, it doesn't even know which file is
> > going to
> > get opened. That's why we have the wait loop
> > in nfs_set_open_stateid_locked(). Why should we not do the same in
> > CLOSE and OPEN_DOWNGRADE? It's the same problem.
> 
> I will give it a shot.  In the meantime, please consider adding this
> patch
> which fixes a real bug today.  Thank you for your excellent advice
> and time.
> 

I don't think we should take that patch, and certainly not as a stable
patch. I'd prefer to wait for the real fix.
Patch

diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 6e95c85fe395..8c2bb91127ee 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -1588,19 +1588,21 @@  static void nfs_test_and_clear_all_open_stateid(struct nfs4_state *state)
 static bool nfs_need_update_open_stateid(struct nfs4_state *state,
 		const nfs4_stateid *stateid)
 {
-	if (test_bit(NFS_OPEN_STATE, &state->flags) == 0 ||
-	    !nfs4_stateid_match_other(stateid, &state->open_stateid)) {
+	if (test_bit(NFS_OPEN_STATE, &state->flags)) {
+		/* The common case - we're updating to a new sequence number */
+		if (nfs4_stateid_match_other(stateid, &state->open_stateid) &&
+			nfs4_stateid_is_newer(stateid, &state->open_stateid)) {
+			nfs_state_log_out_of_order_open_stateid(state, stateid);
+			return true;
+		}
+	} else {
+		/* This is the first OPEN */
 		if (stateid->seqid == cpu_to_be32(1))
 			nfs_state_log_update_open_stateid(state);
 		else
 			set_bit(NFS_STATE_CHANGE_WAIT, &state->flags);
 		return true;
 	}
-
-	if (nfs4_stateid_is_newer(stateid, &state->open_stateid)) {
-		nfs_state_log_out_of_order_open_stateid(state, stateid);
-		return true;
-	}
 	return false;
 }
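
The behavioural change in this hunk may be easier to follow in a small
userspace model. The sketch below is not kernel code: model_state,
model_stateid and the helper names are hypothetical stand-ins, sequence ids
are kept in host byte order, and the logging helpers are omitted.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define OTHER_SIZE 12	/* size of the opaque "other" part of a stateid */

/* Simplified stand-ins for nfs4_stateid and nfs4_state (assumptions). */
struct model_stateid {
	uint32_t seqid;			/* host byte order in this model */
	unsigned char other[OTHER_SIZE];
};

struct model_state {
	bool open;		/* models the NFS_OPEN_STATE flag */
	bool change_wait;	/* models the NFS_STATE_CHANGE_WAIT flag */
	struct model_stateid open_stateid;
};

static bool match_other(const struct model_stateid *a,
			const struct model_stateid *b)
{
	return memcmp(a->other, b->other, OTHER_SIZE) == 0;
}

/* Wraparound-safe "a is newer than b" comparison on sequence ids. */
static bool is_newer(const struct model_stateid *a,
		     const struct model_stateid *b)
{
	return (int32_t)(a->seqid - b->seqid) > 0;
}

/* Decision logic before the patch. */
static bool need_update_old(struct model_state *state,
			    const struct model_stateid *stateid)
{
	if (!state->open || !match_other(stateid, &state->open_stateid)) {
		if (stateid->seqid != 1)
			state->change_wait = true;
		return true;
	}
	return is_newer(stateid, &state->open_stateid);
}

/* Decision logic after the patch. */
static bool need_update_new(struct model_state *state,
			    const struct model_stateid *stateid)
{
	if (state->open)
		return match_other(stateid, &state->open_stateid) &&
		       is_newer(stateid, &state->open_stateid);
	/* Not open yet: this is treated as the first OPEN. */
	if (stateid->seqid != 1)
		state->change_wait = true;
	return true;
}
```

For the case in the commit message (state already open, incoming stateid
with sequence 2 and a different "other"), need_update_old() sets the wait
flag and returns true, so the caller keeps waiting, while need_update_new()
returns false without touching the flag.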