[2/2] NFS: Limit the number of retries

Message ID	20201102162438.14034-3-chenwenle@huawei.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <SRS0=KohJ=EI=vger.kernel.org=linux-nfs-owner@kernel.org> From: Wenle Chen <solomonchenclever@gmail.com> To: trond.myklebust@hammerspace.com, anna.schumaker@netapp.com Cc: linux-nfs@vger.kernel.org, linux-kernel@vger.kernel.org, chenwenle@huawei.com, solachenclever@hotmail.com, nixiaoming@huawei.com, solachenclever@gmail.com Subject: [PATCH 2/2] NFS: Limit the number of retries Date: Tue, 3 Nov 2020 00:24:38 +0800 Message-Id: <20201102162438.14034-3-chenwenle@huawei.com> In-Reply-To: <20201102162438.14034-1-chenwenle@huawei.com> References: <20201102162438.14034-1-chenwenle@huawei.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Precedence: bulk
Series	NFS: Delay Retry Proposed Changes \| expand [0/2] NFS: Delay Retry Proposed Changes [1/2] NFS: Reduce redundant comparison [2/2] NFS: Limit the number of retries

Message ID

20201102162438.14034-3-chenwenle@huawei.com (mailing list archive)

State

New, archived

Headers

From: Wenle Chen <solomonchenclever@gmail.com>
To: trond.myklebust@hammerspace.com, anna.schumaker@netapp.com
Cc: linux-nfs@vger.kernel.org, linux-kernel@vger.kernel.org,
        chenwenle@huawei.com, solachenclever@hotmail.com,
        nixiaoming@huawei.com, solachenclever@gmail.com
Subject: [PATCH 2/2] NFS: Limit the number of retries
Date: Tue,  3 Nov 2020 00:24:38 +0800
Message-Id: <20201102162438.14034-3-chenwenle@huawei.com>
In-Reply-To: <20201102162438.14034-1-chenwenle@huawei.com>
References: <20201102162438.14034-1-chenwenle@huawei.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Precedence: bulk

Series

NFS: Delay Retry Proposed Changes | expand

Comments

Trond Myklebust Nov. 2, 2020, 5:45 p.m. UTC | #1

On Tue, 2020-11-03 at 00:24 +0800, Wenle Chen wrote:
>   We can't wait forever, even if the state
> is always delayed.
> 
> Signed-off-by: Wenle Chen <chenwenle@huawei.com>
> ---
>  fs/nfs/nfs4proc.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
> index f6b5dc792b33..bb2316bf13f6 100644
> --- a/fs/nfs/nfs4proc.c
> +++ b/fs/nfs/nfs4proc.c
> @@ -7390,15 +7390,17 @@ int nfs4_lock_delegation_recall(struct
> file_lock *fl, struct nfs4_state *state,
>  {
>         struct nfs_server *server = NFS_SERVER(state->inode);
>         int err;
> +       int retry = 3;
>  
>         err = nfs4_set_lock_state(state, fl);
>         if (err != 0)
>                 return err;
>         do {
>                 err = _nfs4_do_setlk(state, F_SETLK, fl,
> NFS_LOCK_NEW);
> -               if (err != -NFS4ERR_DELAY)
> +               if (err != -NFS4ERR_DELAY || retry == 0)
>                         break;
>                 ssleep(1);
> +               --retry;
>         } while (1);
>         return nfs4_handle_delegation_recall_error(server, state,
> stateid, fl, err);
>  }

This patch will just cause the locks to be silently lost, no?

Wenle Chen Nov. 4, 2020, 11:33 a.m. UTC | #2

Trond Myklebust 於 2020/11/3 上午1:45 寫道:
> On Tue, 2020-11-03 at 00:24 +0800, Wenle Chen wrote:
>>    We can't wait forever, even if the state
>> is always delayed.
>>
>> Signed-off-by: Wenle Chen <chenwenle@huawei.com>
>> ---
>>   fs/nfs/nfs4proc.c | 4 +++-
>>   1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
>> index f6b5dc792b33..bb2316bf13f6 100644
>> --- a/fs/nfs/nfs4proc.c
>> +++ b/fs/nfs/nfs4proc.c
>> @@ -7390,15 +7390,17 @@ int nfs4_lock_delegation_recall(struct
>> file_lock *fl, struct nfs4_state *state,
>>   {
>>          struct nfs_server *server = NFS_SERVER(state->inode);
>>          int err;
>> +       int retry = 3;
>>   
>>          err = nfs4_set_lock_state(state, fl);
>>          if (err != 0)
>>                  return err;
>>          do {
>>                  err = _nfs4_do_setlk(state, F_SETLK, fl,
>> NFS_LOCK_NEW);
>> -               if (err != -NFS4ERR_DELAY)
>> +               if (err != -NFS4ERR_DELAY || retry == 0)
>>                          break;
>>                  ssleep(1);
>> +               --retry;
>>          } while (1);
>>          return nfs4_handle_delegation_recall_error(server, state,
>> stateid, fl, err);
>>   }
> 
> This patch will just cause the locks to be silently lost, no?
> 
This loop was introduced in commit 3d7a9520f0c3e to simplify the delay 
retry loop. Before this, the function nfs4_lock_delegation_recall would 
return a -EAGAIN to do a whole retry loop.

When we retried three times and waited three seconds, it was still in 
delay. I think we can get a whole loop and check the other points if it 
was changed or not. It is just a proposal.

Olga Kornievskaia Nov. 4, 2020, 1:22 p.m. UTC | #3

On Wed, Nov 4, 2020 at 6:36 AM Wenle Chen <solomonchenclever@gmail.com> wrote:
>
>
>
> Trond Myklebust 於 2020/11/3 上午1:45 寫道:
> > On Tue, 2020-11-03 at 00:24 +0800, Wenle Chen wrote:
> >>    We can't wait forever, even if the state
> >> is always delayed.
> >>
> >> Signed-off-by: Wenle Chen <chenwenle@huawei.com>
> >> ---
> >>   fs/nfs/nfs4proc.c | 4 +++-
> >>   1 file changed, 3 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
> >> index f6b5dc792b33..bb2316bf13f6 100644
> >> --- a/fs/nfs/nfs4proc.c
> >> +++ b/fs/nfs/nfs4proc.c
> >> @@ -7390,15 +7390,17 @@ int nfs4_lock_delegation_recall(struct
> >> file_lock *fl, struct nfs4_state *state,
> >>   {
> >>          struct nfs_server *server = NFS_SERVER(state->inode);
> >>          int err;
> >> +       int retry = 3;
> >>
> >>          err = nfs4_set_lock_state(state, fl);
> >>          if (err != 0)
> >>                  return err;
> >>          do {
> >>                  err = _nfs4_do_setlk(state, F_SETLK, fl,
> >> NFS_LOCK_NEW);
> >> -               if (err != -NFS4ERR_DELAY)
> >> +               if (err != -NFS4ERR_DELAY || retry == 0)
> >>                          break;
> >>                  ssleep(1);
> >> +               --retry;
> >>          } while (1);
> >>          return nfs4_handle_delegation_recall_error(server, state,
> >> stateid, fl, err);
> >>   }
> >
> > This patch will just cause the locks to be silently lost, no?
> >
> This loop was introduced in commit 3d7a9520f0c3e to simplify the delay
> retry loop. Before this, the function nfs4_lock_delegation_recall would
> return a -EAGAIN to do a whole retry loop.

This commit was not simplifying retry but actually handling the error.
Without it the error isn't handled and client falsely thinks it holds
the lock. Limiting the number of retries as Trond points out would
lead to the same problem which in the end is data corruption.
Alternative would be to fail the application. However ERR_DELAY is a
transient error and the server would, when ready, return something
else. If server is broken and continues to do so then the server needs
to be fix (client isn't coded to the broken server). I don't see a
good argument for limiting the number of re-tries.

> When we retried three times and waited three seconds, it was still in
> delay. I think we can get a whole loop and check the other points if it
> was changed or not. It is just a proposal.

Wenle Chen Nov. 4, 2020, 3:51 p.m. UTC | #4

Olga Kornievskaia 於 2020/11/4 下午9:22 寫道:
> On Wed, Nov 4, 2020 at 6:36 AM Wenle Chen <solomonchenclever@gmail.com> wrote:
>>
>>
>>
>> Trond Myklebust 於 2020/11/3 上午1:45 寫道:
>>> On Tue, 2020-11-03 at 00:24 +0800, Wenle Chen wrote:
>>>>     We can't wait forever, even if the state
>>>> is always delayed.
>>>>
>>>> Signed-off-by: Wenle Chen <chenwenle@huawei.com>
>>>> ---
>>>>    fs/nfs/nfs4proc.c | 4 +++-
>>>>    1 file changed, 3 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
>>>> index f6b5dc792b33..bb2316bf13f6 100644
>>>> --- a/fs/nfs/nfs4proc.c
>>>> +++ b/fs/nfs/nfs4proc.c
>>>> @@ -7390,15 +7390,17 @@ int nfs4_lock_delegation_recall(struct
>>>> file_lock *fl, struct nfs4_state *state,
>>>>    {
>>>>           struct nfs_server *server = NFS_SERVER(state->inode);
>>>>           int err;
>>>> +       int retry = 3;
>>>>
>>>>           err = nfs4_set_lock_state(state, fl);
>>>>           if (err != 0)
>>>>                   return err;
>>>>           do {
>>>>                   err = _nfs4_do_setlk(state, F_SETLK, fl,
>>>> NFS_LOCK_NEW);
>>>> -               if (err != -NFS4ERR_DELAY)
>>>> +               if (err != -NFS4ERR_DELAY || retry == 0)
>>>>                           break;
>>>>                   ssleep(1);
>>>> +               --retry;
>>>>           } while (1);
>>>>           return nfs4_handle_delegation_recall_error(server, state,
>>>> stateid, fl, err);
>>>>    }
>>>
>>> This patch will just cause the locks to be silently lost, no?
>>>
>> This loop was introduced in commit 3d7a9520f0c3e to simplify the delay
>> retry loop. Before this, the function nfs4_lock_delegation_recall would
>> return a -EAGAIN to do a whole retry loop.
> 
> This commit was not simplifying retry but actually handling the error.
> Without it the error isn't handled and client falsely thinks it holds
> the lock. Limiting the number of retries as Trond points out would
> lead to the same problem which in the end is data corruption.
> Alternative would be to fail the application. However ERR_DELAY is a
> transient error and the server would, when ready, return something
> else. If server is broken and continues to do so then the server needs
> to be fix (client isn't coded to the broken server). I don't see a
> good argument for limiting the number of re-tries.
> 
>> When we retried three times and waited three seconds, it was still in
>> delay. I think we can get a whole loop and check the other points if it
>> was changed or not. It is just a proposal.
In the function nfs_end_delegation_return, it would get the return 
err=-EAGAIN and check the client is active and get a retry. I has so 
thought. Maybe I think wrong. I will understand more carefully. Thinks.

diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index f6b5dc792b33..bb2316bf13f6 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -7390,15 +7390,17 @@  int nfs4_lock_delegation_recall(struct file_lock *fl, struct nfs4_state *state,
 {
 	struct nfs_server *server = NFS_SERVER(state->inode);
 	int err;
+	int retry = 3;
 
 	err = nfs4_set_lock_state(state, fl);
 	if (err != 0)
 		return err;
 	do {
 		err = _nfs4_do_setlk(state, F_SETLK, fl, NFS_LOCK_NEW);
-		if (err != -NFS4ERR_DELAY)
+		if (err != -NFS4ERR_DELAY || retry == 0)
 			break;
 		ssleep(1);
+		--retry;
 	} while (1);
 	return nfs4_handle_delegation_recall_error(server, state, stateid, fl, err);
 }

[2/2] NFS: Limit the number of retries

Commit Message

Comments

Patch