[v7,13/31] NFSv4.1: Ensure we always run TEST/FREE_STATEID on locks
diff mbox

Message ID BDCCD810-781F-4DD6-91E8-279A2C3377EF@redhat.com
State New
Headers show

Commit Message

Benjamin Coddington Nov. 10, 2016, 8:18 p.m. UTC
On 10 Nov 2016, at 10:58, Benjamin Coddington wrote:

> Hi Anna,
>
> On 10 Nov 2016, at 10:01, Anna Schumaker wrote:
>> Do you have an estimate for when this patch will be ready?  I want to 
>> include it in my next bugfix pull request for 4.9.
>
> I haven't posted because I am still trying to get to the bottom of 
> another
> problem where the client gets stuck in a loop sending the same stateid 
> over
> and over on NFS4ERR_OLD_STATEID.  I want to make sure this problem 
> isn't
> caused by this fix -- which I don't think it is, but I'd rather make 
> sure.
> If I don't make any progress on this problem by the end of today, I'll 
> post
> what I have.
>
> Read on if interested in this new problem:
>
> It looks like racing opens with the same openowner can be returned out 
> of
> order by the server, so the client sees stateid seqid of 2 before 1.  
> Then a
> LOCK sent with seqid 1 is endlessly retried if sent while doing 
> recovery.
>
> It's hard to tell if I was able to capture all the moving parts to 
> describe
> this problem, though.  As it takes a very long time for me to 
> reproduce, and
> the packet captures were dropping frames.  I'm working on manually
> reproducing it now.

Anna,

I haven't gotten to the bottom of it, and so I'm not confident it isn't 
a
problem created by the fix I've been testing, which is:

                 if (test_bit(NFS_LOCK_INITIALIZED, &lsp->ls_flags)) {
                         struct rpc_cred *cred =
lsp->ls_state->owner->so_cred;

@@ -2588,7 +2591,10 @@ static int nfs41_check_expired_locks(struct
nfs4_state *state)
                                 break;
                         }
                 }
-       };
+               nfs4_put_lock_state(lsp);
+               spin_lock(&state->state_lock);
+       }
+       spin_unlock(&state->state_lock);
  out:
         return ret;
  }

http://people.redhat.com/bcodding/old_stateid_loop is tshark output of 
my
only good wirecapture of the problem.  Without this patch, generic/089
crashes long before this problem is reproduced, so I am stuck figuring 
it
out, I'm afraid.  Don't wait on my account.

I plan on trying a bit more to reproduce tomorrow, and if I cannot, I'll
write about it under separate cover.

Ben
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Anna Schumaker Nov. 10, 2016, 8:54 p.m. UTC | #1
On 11/10/2016 03:18 PM, Benjamin Coddington wrote:
> 
> On 10 Nov 2016, at 10:58, Benjamin Coddington wrote:
> 
>> Hi Anna,
>>
>> On 10 Nov 2016, at 10:01, Anna Schumaker wrote:
>>> Do you have an estimate for when this patch will be ready?  I want to include it in my next bugfix pull request for 4.9.
>>
>> I haven't posted because I am still trying to get to the bottom of another
>> problem where the client gets stuck in a loop sending the same stateid over
>> and over on NFS4ERR_OLD_STATEID.  I want to make sure this problem isn't
>> caused by this fix -- which I don't think it is, but I'd rather make sure.
>> If I don't make any progress on this problem by the end of today, I'll post
>> what I have.
>>
>> Read on if interested in this new problem:
>>
>> It looks like racing opens with the same openowner can be returned out of
>> order by the server, so the client sees stateid seqid of 2 before 1.  Then a
>> LOCK sent with seqid 1 is endlessly retried if sent while doing recovery.
>>
>> It's hard to tell if I was able to capture all the moving parts to describe
>> this problem, though.  As it takes a very long time for me to reproduce, and
>> the packet captures were dropping frames.  I'm working on manually
>> reproducing it now.
> 
> Anna,
> 
> I haven't gotten to the bottom of it, and so I'm not confident it isn't a
> problem created by the fix I've been testing, which is:
> 
> diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
> index e809498..2aa9d86 100644
> --- a/fs/nfs/nfs4proc.c
> +++ b/fs/nfs/nfs4proc.c
> @@ -2564,12 +2564,15 @@ static void nfs41_check_delegation_stateid(struct
> nfs4_state *state)
>  static int nfs41_check_expired_locks(struct nfs4_state *state)
>  {
>         int status, ret = NFS_OK;
> -       struct nfs4_lock_state *lsp;
> +       struct nfs4_lock_state *lsp, *tmp;
>         struct nfs_server *server = NFS_SERVER(state->inode);
> 
>         if (!test_bit(LK_STATE_IN_USE, &state->flags))
>                 goto out;
> -       list_for_each_entry(lsp, &state->lock_states, ls_locks) {
> +       spin_lock(&state->state_lock);
> +       list_for_each_entry_safe(lsp, tmp, &state->lock_states, ls_locks) {
> +               atomic_inc(&lsp->ls_count);
> +               spin_unlock(&state->state_lock);
>                 if (test_bit(NFS_LOCK_INITIALIZED, &lsp->ls_flags)) {
>                         struct rpc_cred *cred =
> lsp->ls_state->owner->so_cred;
> 
> @@ -2588,7 +2591,10 @@ static int nfs41_check_expired_locks(struct
> nfs4_state *state)
>                                 break;
>                         }
>                 }
> -       };
> +               nfs4_put_lock_state(lsp);
> +               spin_lock(&state->state_lock);
> +       }
> +       spin_unlock(&state->state_lock);
>  out:
>         return ret;
>  }
> 
> http://people.redhat.com/bcodding/old_stateid_loop is tshark output of my
> only good wirecapture of the problem.  Without this patch, generic/089
> crashes long before this problem is reproduced, so I am stuck figuring it
> out, I'm afraid.  Don't wait on my account.
> 
> I plan on trying a bit more to reproduce tomorrow, and if I cannot, I'll
> write about it under separate cover.

Sounds good.  Thanks for the update!

Anna

> 
> Ben

--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Patch
diff mbox

diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index e809498..2aa9d86 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -2564,12 +2564,15 @@  static void 
nfs41_check_delegation_stateid(struct
nfs4_state *state)
  static int nfs41_check_expired_locks(struct nfs4_state *state)
  {
         int status, ret = NFS_OK;
-       struct nfs4_lock_state *lsp;
+       struct nfs4_lock_state *lsp, *tmp;
         struct nfs_server *server = NFS_SERVER(state->inode);

         if (!test_bit(LK_STATE_IN_USE, &state->flags))
                 goto out;
-       list_for_each_entry(lsp, &state->lock_states, ls_locks) {
+       spin_lock(&state->state_lock);
+       list_for_each_entry_safe(lsp, tmp, &state->lock_states, 
ls_locks) {
+               atomic_inc(&lsp->ls_count);
+               spin_unlock(&state->state_lock);