[1/1] NFSD: fix error handling in callbacks

Message ID	20210309144127.57833-1-olga.kornievskaia@gmail.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <linux-nfs-owner@kernel.org> From: Olga Kornievskaia <olga.kornievskaia@gmail.com> To: bfields@redhat.com, chuck.lever@oracle.com Cc: linux-nfs@vger.kernel.org Subject: [PATCH 1/1] NFSD: fix error handling in callbacks Date: Tue, 9 Mar 2021 09:41:27 -0500 Message-Id: <20210309144127.57833-1-olga.kornievskaia@gmail.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Precedence: bulk
Series	[1/1] NFSD: fix error handling in callbacks \| expand [1/1] NFSD: fix error handling in callbacks

Olga Kornievskaia March 9, 2021, 2:41 p.m. UTC

From: Olga Kornievskaia <kolga@netapp.com>

When the server tries to do a callback and a client fails it due to
authentication problems, we need the server to set callback down
flag in RENEW so that client can recover.

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
---
 fs/nfsd/nfs4callback.c | 1 +
 1 file changed, 1 insertion(+)

Bruce Fields March 9, 2021, 3:37 p.m. UTC | #1

On Tue, Mar 09, 2021 at 09:41:27AM -0500, Olga Kornievskaia wrote:
> From: Olga Kornievskaia <kolga@netapp.com>
> 
> When the server tries to do a callback and a client fails it due to
> authentication problems, we need the server to set callback down
> flag in RENEW so that client can recover.

I was looking at this.  It looks to me like this should really be just:

	case 1:
		if (task->tk_status)
			nfsd4_mark_cb_down(clp, task->tk_status);

If tk_status showed an error, and the ->done method doesn't return 0 to
tell us it something worth retrying, then the callback failed
permanently, so we should mark the callback path down, regardless of the
exact error.

--b.

> 
> Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
> ---
>  fs/nfsd/nfs4callback.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
> index 052be5bf9ef5..7325592b456e 100644
> --- a/fs/nfsd/nfs4callback.c
> +++ b/fs/nfsd/nfs4callback.c
> @@ -1189,6 +1189,7 @@ static void nfsd4_cb_done(struct rpc_task *task, void *calldata)
>  		switch (task->tk_status) {
>  		case -EIO:
>  		case -ETIMEDOUT:
> +		case -EACCES:
>  			nfsd4_mark_cb_down(clp, task->tk_status);
>  		}
>  		break;
> -- 
> 2.27.0
>

Olga Kornievskaia March 9, 2021, 4:39 p.m. UTC | #2

On Tue, Mar 9, 2021 at 10:37 AM J. Bruce Fields <bfields@redhat.com> wrote:
>
> On Tue, Mar 09, 2021 at 09:41:27AM -0500, Olga Kornievskaia wrote:
> > From: Olga Kornievskaia <kolga@netapp.com>
> >
> > When the server tries to do a callback and a client fails it due to
> > authentication problems, we need the server to set callback down
> > flag in RENEW so that client can recover.
>
> I was looking at this.  It looks to me like this should really be just:
>
>         case 1:
>                 if (task->tk_status)
>                         nfsd4_mark_cb_down(clp, task->tk_status);
>
> If tk_status showed an error, and the ->done method doesn't return 0 to
> tell us it something worth retrying, then the callback failed
> permanently, so we should mark the callback path down, regardless of the
> exact error.

Ok. v2 coming (will change the title to make it 4.0 callback)

>
> --b.
>
> >
> > Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
> > ---
> >  fs/nfsd/nfs4callback.c | 1 +
> >  1 file changed, 1 insertion(+)
> >
> > diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
> > index 052be5bf9ef5..7325592b456e 100644
> > --- a/fs/nfsd/nfs4callback.c
> > +++ b/fs/nfsd/nfs4callback.c
> > @@ -1189,6 +1189,7 @@ static void nfsd4_cb_done(struct rpc_task *task, void *calldata)
> >               switch (task->tk_status) {
> >               case -EIO:
> >               case -ETIMEDOUT:
> > +             case -EACCES:
> >                       nfsd4_mark_cb_down(clp, task->tk_status);
> >               }
> >               break;
> > --
> > 2.27.0
> >
>

Olga Kornievskaia March 9, 2021, 4:46 p.m. UTC | #3

On Tue, Mar 9, 2021 at 11:39 AM Olga Kornievskaia
<olga.kornievskaia@gmail.com> wrote:
>
> On Tue, Mar 9, 2021 at 10:37 AM J. Bruce Fields <bfields@redhat.com> wrote:
> >
> > On Tue, Mar 09, 2021 at 09:41:27AM -0500, Olga Kornievskaia wrote:
> > > From: Olga Kornievskaia <kolga@netapp.com>
> > >
> > > When the server tries to do a callback and a client fails it due to
> > > authentication problems, we need the server to set callback down
> > > flag in RENEW so that client can recover.
> >
> > I was looking at this.  It looks to me like this should really be just:
> >
> >         case 1:
> >                 if (task->tk_status)
> >                         nfsd4_mark_cb_down(clp, task->tk_status);
> >
> > If tk_status showed an error, and the ->done method doesn't return 0 to
> > tell us it something worth retrying, then the callback failed
> > permanently, so we should mark the callback path down, regardless of the
> > exact error.
>
> Ok. v2 coming (will change the title to make it 4.0 callback)

Sigh, I didn't change the wording of the commit and left the
authentication problem which is not accurate enough for this patch (as
say connection errors are also covered by this patch). Do you need me
to change the wording of the commit and send v3?

>
> >
> > --b.
> >
> > >
> > > Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
> > > ---
> > >  fs/nfsd/nfs4callback.c | 1 +
> > >  1 file changed, 1 insertion(+)
> > >
> > > diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
> > > index 052be5bf9ef5..7325592b456e 100644
> > > --- a/fs/nfsd/nfs4callback.c
> > > +++ b/fs/nfsd/nfs4callback.c
> > > @@ -1189,6 +1189,7 @@ static void nfsd4_cb_done(struct rpc_task *task, void *calldata)
> > >               switch (task->tk_status) {
> > >               case -EIO:
> > >               case -ETIMEDOUT:
> > > +             case -EACCES:
> > >                       nfsd4_mark_cb_down(clp, task->tk_status);
> > >               }
> > >               break;
> > > --
> > > 2.27.0
> > >
> >

Chuck Lever March 9, 2021, 5:42 p.m. UTC | #4

> On Mar 9, 2021, at 11:46 AM, Olga Kornievskaia <olga.kornievskaia@gmail.com> wrote:
> 
> On Tue, Mar 9, 2021 at 11:39 AM Olga Kornievskaia
> <olga.kornievskaia@gmail.com> wrote:
>> 
>> On Tue, Mar 9, 2021 at 10:37 AM J. Bruce Fields <bfields@redhat.com> wrote:
>>> 
>>> On Tue, Mar 09, 2021 at 09:41:27AM -0500, Olga Kornievskaia wrote:
>>>> From: Olga Kornievskaia <kolga@netapp.com>
>>>> 
>>>> When the server tries to do a callback and a client fails it due to
>>>> authentication problems, we need the server to set callback down
>>>> flag in RENEW so that client can recover.
>>> 
>>> I was looking at this.  It looks to me like this should really be just:
>>> 
>>>        case 1:
>>>                if (task->tk_status)
>>>                        nfsd4_mark_cb_down(clp, task->tk_status);
>>> 
>>> If tk_status showed an error, and the ->done method doesn't return 0 to
>>> tell us it something worth retrying, then the callback failed
>>> permanently, so we should mark the callback path down, regardless of the
>>> exact error.
>> 
>> Ok. v2 coming (will change the title to make it 4.0 callback)
> 
> Sigh, I didn't change the wording of the commit and left the
> authentication problem which is not accurate enough for this patch (as
> say connection errors are also covered by this patch). Do you need me
> to change the wording of the commit and send v3?

Yes, please post a v3. Thanks.


>>> --b.
>>> 
>>>> 
>>>> Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
>>>> ---
>>>> fs/nfsd/nfs4callback.c | 1 +
>>>> 1 file changed, 1 insertion(+)
>>>> 
>>>> diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
>>>> index 052be5bf9ef5..7325592b456e 100644
>>>> --- a/fs/nfsd/nfs4callback.c
>>>> +++ b/fs/nfsd/nfs4callback.c
>>>> @@ -1189,6 +1189,7 @@ static void nfsd4_cb_done(struct rpc_task *task, void *calldata)
>>>>              switch (task->tk_status) {
>>>>              case -EIO:
>>>>              case -ETIMEDOUT:
>>>> +             case -EACCES:
>>>>                      nfsd4_mark_cb_down(clp, task->tk_status);
>>>>              }
>>>>              break;
>>>> --
>>>> 2.27.0

--
Chuck Lever

Scott Mayhew March 9, 2021, 7:45 p.m. UTC | #5

On Tue, 09 Mar 2021, J. Bruce Fields wrote:

> On Tue, Mar 09, 2021 at 09:41:27AM -0500, Olga Kornievskaia wrote:
> > From: Olga Kornievskaia <kolga@netapp.com>
> > 
> > When the server tries to do a callback and a client fails it due to
> > authentication problems, we need the server to set callback down
> > flag in RENEW so that client can recover.
> 
> I was looking at this.  It looks to me like this should really be just:
> 
> 	case 1:
> 		if (task->tk_status)
> 			nfsd4_mark_cb_down(clp, task->tk_status);
> 
> If tk_status showed an error, and the ->done method doesn't return 0 to
> tell us it something worth retrying, then the callback failed
> permanently, so we should mark the callback path down, regardless of the
> exact error.

That switch was added because the server was erroneously setting
SEQ4_STATUS_CB_PATH_DOWN in the event that a client returned an
NFS-level error for a CB_RECALL that the client had already done a
FREE_STATEID for.  Removing the switch will cause the server to go back
to that behaviour, won't it?

-Scott
> 
> --b.
> 
> > 
> > Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
> > ---
> >  fs/nfsd/nfs4callback.c | 1 +
> >  1 file changed, 1 insertion(+)
> > 
> > diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
> > index 052be5bf9ef5..7325592b456e 100644
> > --- a/fs/nfsd/nfs4callback.c
> > +++ b/fs/nfsd/nfs4callback.c
> > @@ -1189,6 +1189,7 @@ static void nfsd4_cb_done(struct rpc_task *task, void *calldata)
> >  		switch (task->tk_status) {
> >  		case -EIO:
> >  		case -ETIMEDOUT:
> > +		case -EACCES:
> >  			nfsd4_mark_cb_down(clp, task->tk_status);
> >  		}
> >  		break;
> > -- 
> > 2.27.0
> > 
>

Bruce Fields March 9, 2021, 8:12 p.m. UTC | #6

On Tue, Mar 09, 2021 at 02:45:19PM -0500, Scott Mayhew wrote:
> On Tue, 09 Mar 2021, J. Bruce Fields wrote:
> 
> > On Tue, Mar 09, 2021 at 09:41:27AM -0500, Olga Kornievskaia wrote:
> > > From: Olga Kornievskaia <kolga@netapp.com>
> > > 
> > > When the server tries to do a callback and a client fails it due to
> > > authentication problems, we need the server to set callback down
> > > flag in RENEW so that client can recover.
> > 
> > I was looking at this.  It looks to me like this should really be just:
> > 
> > 	case 1:
> > 		if (task->tk_status)
> > 			nfsd4_mark_cb_down(clp, task->tk_status);
> > 
> > If tk_status showed an error, and the ->done method doesn't return 0 to
> > tell us it something worth retrying, then the callback failed
> > permanently, so we should mark the callback path down, regardless of the
> > exact error.
> 
> That switch was added because the server was erroneously setting
> SEQ4_STATUS_CB_PATH_DOWN in the event that a client returned an
> NFS-level error for a CB_RECALL that the client had already done a
> FREE_STATEID for.  Removing the switch will cause the server to go back
> to that behaviour, won't it?

Oh, thanks for the history.

My knee-jerk reaction is: that sounds like a recall-specific issue, so
maybe that logic should be in nfsd4_cb_recall_done?

I guess I'm OK continuing instead to enumerate transport-layer errors in
nfsd4_cb_done, but do we know that EACCES makes it a complete list?

--b.

> 
> -Scott
> > 
> > --b.
> > 
> > > 
> > > Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
> > > ---
> > >  fs/nfsd/nfs4callback.c | 1 +
> > >  1 file changed, 1 insertion(+)
> > > 
> > > diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
> > > index 052be5bf9ef5..7325592b456e 100644
> > > --- a/fs/nfsd/nfs4callback.c
> > > +++ b/fs/nfsd/nfs4callback.c
> > > @@ -1189,6 +1189,7 @@ static void nfsd4_cb_done(struct rpc_task *task, void *calldata)
> > >  		switch (task->tk_status) {
> > >  		case -EIO:
> > >  		case -ETIMEDOUT:
> > > +		case -EACCES:
> > >  			nfsd4_mark_cb_down(clp, task->tk_status);
> > >  		}
> > >  		break;
> > > -- 
> > > 2.27.0
> > > 
> >

Trond Myklebust March 9, 2021, 8:22 p.m. UTC | #7

On Tue, 2021-03-09 at 10:37 -0500, J. Bruce Fields wrote:
> On Tue, Mar 09, 2021 at 09:41:27AM -0500, Olga Kornievskaia wrote:
> > From: Olga Kornievskaia <kolga@netapp.com>
> > 
> > When the server tries to do a callback and a client fails it due to
> > authentication problems, we need the server to set callback down
> > flag in RENEW so that client can recover.
> 
> I was looking at this.  It looks to me like this should really be
> just:
> 
>         case 1:
>                 if (task->tk_status)
>                         nfsd4_mark_cb_down(clp, task->tk_status);
> 
> If tk_status showed an error, and the ->done method doesn't return 0
> to
> tell us it something worth retrying, then the callback failed
> permanently, so we should mark the callback path down, regardless of
> the
> exact error.

I disagree. task->tk_status could be an unhandled NFSv4 error (see
nfsd4_cb_recall_done()). The client might, for instance, be in the
process of returning the delegation being recalled. Why should that
result in the callback channel being marked as down?

> 
> --b.
> 
> > 
> > Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
> > ---
> >  fs/nfsd/nfs4callback.c | 1 +
> >  1 file changed, 1 insertion(+)
> > 
> > diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
> > index 052be5bf9ef5..7325592b456e 100644
> > --- a/fs/nfsd/nfs4callback.c
> > +++ b/fs/nfsd/nfs4callback.c
> > @@ -1189,6 +1189,7 @@ static void nfsd4_cb_done(struct rpc_task
> > *task, void *calldata)
> >                 switch (task->tk_status) {
> >                 case -EIO:
> >                 case -ETIMEDOUT:
> > +               case -EACCES:
> >                         nfsd4_mark_cb_down(clp, task->tk_status);
> >                 }
> >                 break;
> > -- 
> > 2.27.0
> > 
>

Scott Mayhew March 9, 2021, 8:40 p.m. UTC | #8

On Tue, 09 Mar 2021, J. Bruce Fields wrote:

> On Tue, Mar 09, 2021 at 02:45:19PM -0500, Scott Mayhew wrote:
> > On Tue, 09 Mar 2021, J. Bruce Fields wrote:
> > 
> > > On Tue, Mar 09, 2021 at 09:41:27AM -0500, Olga Kornievskaia wrote:
> > > > From: Olga Kornievskaia <kolga@netapp.com>
> > > > 
> > > > When the server tries to do a callback and a client fails it due to
> > > > authentication problems, we need the server to set callback down
> > > > flag in RENEW so that client can recover.
> > > 
> > > I was looking at this.  It looks to me like this should really be just:
> > > 
> > > 	case 1:
> > > 		if (task->tk_status)
> > > 			nfsd4_mark_cb_down(clp, task->tk_status);
> > > 
> > > If tk_status showed an error, and the ->done method doesn't return 0 to
> > > tell us it something worth retrying, then the callback failed
> > > permanently, so we should mark the callback path down, regardless of the
> > > exact error.
> > 
> > That switch was added because the server was erroneously setting
> > SEQ4_STATUS_CB_PATH_DOWN in the event that a client returned an
> > NFS-level error for a CB_RECALL that the client had already done a
> > FREE_STATEID for.  Removing the switch will cause the server to go back
> > to that behaviour, won't it?
> 
> Oh, thanks for the history.
> 
> My knee-jerk reaction is: that sounds like a recall-specific issue, so
> maybe that logic should be in nfsd4_cb_recall_done?
> 
> I guess I'm OK continuing instead to enumerate transport-layer errors in
> nfsd4_cb_done, but do we know that EACCES makes it a complete list?

No idea.  I'm pretty sure EIO and ETIMEDOUT were the two errors that I
was seeing at the time.  I don't remember the exact test case but I
think I was hammering the server trying to reproduce seq misordered &
false retry errors.  And EACCES is what Olga's seeing when the server
uses the wrong principal for the callback cred.

-Scott
> 
> --b.
> 
> > 
> > -Scott
> > > 
> > > --b.
> > > 
> > > > 
> > > > Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
> > > > ---
> > > >  fs/nfsd/nfs4callback.c | 1 +
> > > >  1 file changed, 1 insertion(+)
> > > > 
> > > > diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
> > > > index 052be5bf9ef5..7325592b456e 100644
> > > > --- a/fs/nfsd/nfs4callback.c
> > > > +++ b/fs/nfsd/nfs4callback.c
> > > > @@ -1189,6 +1189,7 @@ static void nfsd4_cb_done(struct rpc_task *task, void *calldata)
> > > >  		switch (task->tk_status) {
> > > >  		case -EIO:
> > > >  		case -ETIMEDOUT:
> > > > +		case -EACCES:
> > > >  			nfsd4_mark_cb_down(clp, task->tk_status);
> > > >  		}
> > > >  		break;
> > > > -- 
> > > > 2.27.0
> > > > 
> > >

Olga Kornievskaia March 9, 2021, 8:41 p.m. UTC | #9

On Tue, Mar 9, 2021 at 3:22 PM Trond Myklebust <trondmy@hammerspace.com> wrote:
>
> On Tue, 2021-03-09 at 10:37 -0500, J. Bruce Fields wrote:
> > On Tue, Mar 09, 2021 at 09:41:27AM -0500, Olga Kornievskaia wrote:
> > > From: Olga Kornievskaia <kolga@netapp.com>
> > >
> > > When the server tries to do a callback and a client fails it due to
> > > authentication problems, we need the server to set callback down
> > > flag in RENEW so that client can recover.
> >
> > I was looking at this.  It looks to me like this should really be
> > just:
> >
> >         case 1:
> >                 if (task->tk_status)
> >                         nfsd4_mark_cb_down(clp, task->tk_status);
> >
> > If tk_status showed an error, and the ->done method doesn't return 0
> > to
> > tell us it something worth retrying, then the callback failed
> > permanently, so we should mark the callback path down, regardless of
> > the
> > exact error.
>
> I disagree. task->tk_status could be an unhandled NFSv4 error (see
> nfsd4_cb_recall_done()). The client might, for instance, be in the
> process of returning the delegation being recalled. Why should that
> result in the callback channel being marked as down?
>

Are you talking about say the connection going down and server should
just reconnect instead of recovering the callback channel. I assumed
that connection break is something that's not  recoverable by the
callback but perhaps I'm wrong.

> >
> > --b.
> >
> > >
> > > Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
> > > ---
> > >  fs/nfsd/nfs4callback.c | 1 +
> > >  1 file changed, 1 insertion(+)
> > >
> > > diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
> > > index 052be5bf9ef5..7325592b456e 100644
> > > --- a/fs/nfsd/nfs4callback.c
> > > +++ b/fs/nfsd/nfs4callback.c
> > > @@ -1189,6 +1189,7 @@ static void nfsd4_cb_done(struct rpc_task
> > > *task, void *calldata)
> > >                 switch (task->tk_status) {
> > >                 case -EIO:
> > >                 case -ETIMEDOUT:
> > > +               case -EACCES:
> > >                         nfsd4_mark_cb_down(clp, task->tk_status);
> > >                 }
> > >                 break;
> > > --
> > > 2.27.0
> > >
> >
>
> --
> Trond Myklebust
> Linux NFS client maintainer, Hammerspace
> trond.myklebust@hammerspace.com
>
>

Trond Myklebust March 9, 2021, 8:59 p.m. UTC | #10

On Tue, 2021-03-09 at 15:41 -0500, Olga Kornievskaia wrote:
> On Tue, Mar 9, 2021 at 3:22 PM Trond Myklebust <
> trondmy@hammerspace.com> wrote:
> > 
> > On Tue, 2021-03-09 at 10:37 -0500, J. Bruce Fields wrote:
> > > On Tue, Mar 09, 2021 at 09:41:27AM -0500, Olga Kornievskaia
> > > wrote:
> > > > From: Olga Kornievskaia <kolga@netapp.com>
> > > > 
> > > > When the server tries to do a callback and a client fails it
> > > > due to
> > > > authentication problems, we need the server to set callback
> > > > down
> > > > flag in RENEW so that client can recover.
> > > 
> > > I was looking at this.  It looks to me like this should really be
> > > just:
> > > 
> > >         case 1:
> > >                 if (task->tk_status)
> > >                         nfsd4_mark_cb_down(clp, task->tk_status);
> > > 
> > > If tk_status showed an error, and the ->done method doesn't
> > > return 0
> > > to
> > > tell us it something worth retrying, then the callback failed
> > > permanently, so we should mark the callback path down, regardless
> > > of
> > > the
> > > exact error.
> > 
> > I disagree. task->tk_status could be an unhandled NFSv4 error (see
> > nfsd4_cb_recall_done()). The client might, for instance, be in the
> > process of returning the delegation being recalled. Why should that
> > result in the callback channel being marked as down?
> > 
> 
> Are you talking about say the connection going down and server should
> just reconnect instead of recovering the callback channel. I assumed
> that connection break is something that's not  recoverable by the
> callback but perhaps I'm wrong.

No. I'm saying that nfsd4_cb_recall_done() will return a value of '1'
for both task->tk_status == -EBADHANDLE and -NFS4ERR_BAD_STATEID. I'm
not seeing why either of those errors should be handled by marking the
callback channel as being down.

Looking further, it seems that the same function will also return '1'
without checking the value of task->tk_status if the delegation has
been revoked or returned. So that would mean that even NFS4ERR_DELAY
could trigger the call to nfsd4_mark_cb_down() with the above change.

> 
> > > 
> > > --b.
> > > 
> > > > 
> > > > Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
> > > > ---
> > > >  fs/nfsd/nfs4callback.c | 1 +
> > > >  1 file changed, 1 insertion(+)
> > > > 
> > > > diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
> > > > index 052be5bf9ef5..7325592b456e 100644
> > > > --- a/fs/nfsd/nfs4callback.c
> > > > +++ b/fs/nfsd/nfs4callback.c
> > > > @@ -1189,6 +1189,7 @@ static void nfsd4_cb_done(struct rpc_task
> > > > *task, void *calldata)
> > > >                 switch (task->tk_status) {
> > > >                 case -EIO:
> > > >                 case -ETIMEDOUT:
> > > > +               case -EACCES:
> > > >                         nfsd4_mark_cb_down(clp, task-
> > > > >tk_status);
> > > >                 }
> > > >                 break;
> > > > --
> > > > 2.27.0
> > > > 
> > > 
> > 
> > --
> > Trond Myklebust
> > Linux NFS client maintainer, Hammerspace
> > trond.myklebust@hammerspace.com
> > 
> >

Bruce Fields March 10, 2021, 2:47 p.m. UTC | #11

On Tue, Mar 09, 2021 at 08:59:51PM +0000, Trond Myklebust wrote:
> On Tue, 2021-03-09 at 15:41 -0500, Olga Kornievskaia wrote:
> > On Tue, Mar 9, 2021 at 3:22 PM Trond Myklebust <
> > trondmy@hammerspace.com> wrote:
> > > 
> > > On Tue, 2021-03-09 at 10:37 -0500, J. Bruce Fields wrote:
> > > > On Tue, Mar 09, 2021 at 09:41:27AM -0500, Olga Kornievskaia
> > > > wrote:
> > > > > From: Olga Kornievskaia <kolga@netapp.com>
> > > > > 
> > > > > When the server tries to do a callback and a client fails it
> > > > > due to
> > > > > authentication problems, we need the server to set callback
> > > > > down
> > > > > flag in RENEW so that client can recover.
> > > > 
> > > > I was looking at this.  It looks to me like this should really be
> > > > just:
> > > > 
> > > >         case 1:
> > > >                 if (task->tk_status)
> > > >                         nfsd4_mark_cb_down(clp, task->tk_status);
> > > > 
> > > > If tk_status showed an error, and the ->done method doesn't
> > > > return 0
> > > > to
> > > > tell us it something worth retrying, then the callback failed
> > > > permanently, so we should mark the callback path down, regardless
> > > > of
> > > > the
> > > > exact error.
> > > 
> > > I disagree. task->tk_status could be an unhandled NFSv4 error (see
> > > nfsd4_cb_recall_done()). The client might, for instance, be in the
> > > process of returning the delegation being recalled. Why should that
> > > result in the callback channel being marked as down?
> > > 
> > 
> > Are you talking about say the connection going down and server should
> > just reconnect instead of recovering the callback channel. I assumed
> > that connection break is something that's not  recoverable by the
> > callback but perhaps I'm wrong.
> 
> No. I'm saying that nfsd4_cb_recall_done() will return a value of '1'
> for both task->tk_status == -EBADHANDLE and -NFS4ERR_BAD_STATEID. I'm
> not seeing why either of those errors should be handled by marking the
> callback channel as being down.
> 
> Looking further, it seems that the same function will also return '1'
> without checking the value of task->tk_status if the delegation has
> been revoked or returned. So that would mean that even NFS4ERR_DELAY
> could trigger the call to nfsd4_mark_cb_down() with the above change.

Yeah, OK, that's wrong, apologies.

I'm just a little worried about the attempt to enumerate transport level
errors in nfsd4_cb_done().  Are we sure that EIO, ETIMEDOUT, EACCESS is
the right list?

--b.

> 
> > 
> > > > 
> > > > --b.
> > > > 
> > > > > 
> > > > > Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
> > > > > ---
> > > > >  fs/nfsd/nfs4callback.c | 1 +
> > > > >  1 file changed, 1 insertion(+)
> > > > > 
> > > > > diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
> > > > > index 052be5bf9ef5..7325592b456e 100644
> > > > > --- a/fs/nfsd/nfs4callback.c
> > > > > +++ b/fs/nfsd/nfs4callback.c
> > > > > @@ -1189,6 +1189,7 @@ static void nfsd4_cb_done(struct rpc_task
> > > > > *task, void *calldata)
> > > > >                 switch (task->tk_status) {
> > > > >                 case -EIO:
> > > > >                 case -ETIMEDOUT:
> > > > > +               case -EACCES:
> > > > >                         nfsd4_mark_cb_down(clp, task-
> > > > > >tk_status);
> > > > >                 }
> > > > >                 break;
> > > > > --
> > > > > 2.27.0
> > > > > 
> > > > 
> > > 
> > > --
> > > Trond Myklebust
> > > Linux NFS client maintainer, Hammerspace
> > > trond.myklebust@hammerspace.com
> > > 
> > > 
> 
> -- 
> Trond Myklebust
> Linux NFS client maintainer, Hammerspace
> trond.myklebust@hammerspace.com
> 
>

Olga Kornievskaia March 10, 2021, 10:09 p.m. UTC | #12

On Wed, Mar 10, 2021 at 9:47 AM J. Bruce Fields <bfields@redhat.com> wrote:
>
> On Tue, Mar 09, 2021 at 08:59:51PM +0000, Trond Myklebust wrote:
> > On Tue, 2021-03-09 at 15:41 -0500, Olga Kornievskaia wrote:
> > > On Tue, Mar 9, 2021 at 3:22 PM Trond Myklebust <
> > > trondmy@hammerspace.com> wrote:
> > > >
> > > > On Tue, 2021-03-09 at 10:37 -0500, J. Bruce Fields wrote:
> > > > > On Tue, Mar 09, 2021 at 09:41:27AM -0500, Olga Kornievskaia
> > > > > wrote:
> > > > > > From: Olga Kornievskaia <kolga@netapp.com>
> > > > > >
> > > > > > When the server tries to do a callback and a client fails it
> > > > > > due to
> > > > > > authentication problems, we need the server to set callback
> > > > > > down
> > > > > > flag in RENEW so that client can recover.
> > > > >
> > > > > I was looking at this.  It looks to me like this should really be
> > > > > just:
> > > > >
> > > > >         case 1:
> > > > >                 if (task->tk_status)
> > > > >                         nfsd4_mark_cb_down(clp, task->tk_status);
> > > > >
> > > > > If tk_status showed an error, and the ->done method doesn't
> > > > > return 0
> > > > > to
> > > > > tell us it something worth retrying, then the callback failed
> > > > > permanently, so we should mark the callback path down, regardless
> > > > > of
> > > > > the
> > > > > exact error.
> > > >
> > > > I disagree. task->tk_status could be an unhandled NFSv4 error (see
> > > > nfsd4_cb_recall_done()). The client might, for instance, be in the
> > > > process of returning the delegation being recalled. Why should that
> > > > result in the callback channel being marked as down?
> > > >
> > >
> > > Are you talking about say the connection going down and server should
> > > just reconnect instead of recovering the callback channel. I assumed
> > > that connection break is something that's not  recoverable by the
> > > callback but perhaps I'm wrong.
> >
> > No. I'm saying that nfsd4_cb_recall_done() will return a value of '1'
> > for both task->tk_status == -EBADHANDLE and -NFS4ERR_BAD_STATEID. I'm
> > not seeing why either of those errors should be handled by marking the
> > callback channel as being down.
> >
> > Looking further, it seems that the same function will also return '1'
> > without checking the value of task->tk_status if the delegation has
> > been revoked or returned. So that would mean that even NFS4ERR_DELAY
> > could trigger the call to nfsd4_mark_cb_down() with the above change.
>
> Yeah, OK, that's wrong, apologies.
>
> I'm just a little worried about the attempt to enumerate transport level
> errors in nfsd4_cb_done().  Are we sure that EIO, ETIMEDOUT, EACCESS is
> the right list?

Looking at call_transmit_status error handling, I don't think
connection errors are returned. Instead the code tries to fix the
connection by retrying unless the rpc_timeout is reached and then only
EIO,TIMEDOUT is returned.

Can then my original patch be considered without resubmission?

>
> --b.
>
> >
> > >
> > > > >
> > > > > --b.
> > > > >
> > > > > >
> > > > > > Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
> > > > > > ---
> > > > > >  fs/nfsd/nfs4callback.c | 1 +
> > > > > >  1 file changed, 1 insertion(+)
> > > > > >
> > > > > > diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
> > > > > > index 052be5bf9ef5..7325592b456e 100644
> > > > > > --- a/fs/nfsd/nfs4callback.c
> > > > > > +++ b/fs/nfsd/nfs4callback.c
> > > > > > @@ -1189,6 +1189,7 @@ static void nfsd4_cb_done(struct rpc_task
> > > > > > *task, void *calldata)
> > > > > >                 switch (task->tk_status) {
> > > > > >                 case -EIO:
> > > > > >                 case -ETIMEDOUT:
> > > > > > +               case -EACCES:
> > > > > >                         nfsd4_mark_cb_down(clp, task-
> > > > > > >tk_status);
> > > > > >                 }
> > > > > >                 break;
> > > > > > --
> > > > > > 2.27.0
> > > > > >
> > > > >
> > > >
> > > > --
> > > > Trond Myklebust
> > > > Linux NFS client maintainer, Hammerspace
> > > > trond.myklebust@hammerspace.com
> > > >
> > > >
> >
> > --
> > Trond Myklebust
> > Linux NFS client maintainer, Hammerspace
> > trond.myklebust@hammerspace.com
> >
> >
>

Bruce Fields March 11, 2021, 2:58 p.m. UTC | #13

On Wed, Mar 10, 2021 at 05:09:20PM -0500, Olga Kornievskaia wrote:
> On Wed, Mar 10, 2021 at 9:47 AM J. Bruce Fields <bfields@redhat.com> wrote:
> >
> > On Tue, Mar 09, 2021 at 08:59:51PM +0000, Trond Myklebust wrote:
> > > On Tue, 2021-03-09 at 15:41 -0500, Olga Kornievskaia wrote:
> > > > On Tue, Mar 9, 2021 at 3:22 PM Trond Myklebust <
> > > > trondmy@hammerspace.com> wrote:
> > > > >
> > > > > On Tue, 2021-03-09 at 10:37 -0500, J. Bruce Fields wrote:
> > > > > > On Tue, Mar 09, 2021 at 09:41:27AM -0500, Olga Kornievskaia
> > > > > > wrote:
> > > > > > > From: Olga Kornievskaia <kolga@netapp.com>
> > > > > > >
> > > > > > > When the server tries to do a callback and a client fails it
> > > > > > > due to
> > > > > > > authentication problems, we need the server to set callback
> > > > > > > down
> > > > > > > flag in RENEW so that client can recover.
> > > > > >
> > > > > > I was looking at this.  It looks to me like this should really be
> > > > > > just:
> > > > > >
> > > > > >         case 1:
> > > > > >                 if (task->tk_status)
> > > > > >                         nfsd4_mark_cb_down(clp, task->tk_status);
> > > > > >
> > > > > > If tk_status showed an error, and the ->done method doesn't
> > > > > > return 0
> > > > > > to
> > > > > > tell us it something worth retrying, then the callback failed
> > > > > > permanently, so we should mark the callback path down, regardless
> > > > > > of
> > > > > > the
> > > > > > exact error.
> > > > >
> > > > > I disagree. task->tk_status could be an unhandled NFSv4 error (see
> > > > > nfsd4_cb_recall_done()). The client might, for instance, be in the
> > > > > process of returning the delegation being recalled. Why should that
> > > > > result in the callback channel being marked as down?
> > > > >
> > > >
> > > > Are you talking about say the connection going down and server should
> > > > just reconnect instead of recovering the callback channel. I assumed
> > > > that connection break is something that's not  recoverable by the
> > > > callback but perhaps I'm wrong.
> > >
> > > No. I'm saying that nfsd4_cb_recall_done() will return a value of '1'
> > > for both task->tk_status == -EBADHANDLE and -NFS4ERR_BAD_STATEID. I'm
> > > not seeing why either of those errors should be handled by marking the
> > > callback channel as being down.
> > >
> > > Looking further, it seems that the same function will also return '1'
> > > without checking the value of task->tk_status if the delegation has
> > > been revoked or returned. So that would mean that even NFS4ERR_DELAY
> > > could trigger the call to nfsd4_mark_cb_down() with the above change.
> >
> > Yeah, OK, that's wrong, apologies.
> >
> > I'm just a little worried about the attempt to enumerate transport level
> > errors in nfsd4_cb_done().  Are we sure that EIO, ETIMEDOUT, EACCESS is
> > the right list?
> 
> Looking at call_transmit_status error handling, I don't think
> connection errors are returned. Instead the code tries to fix the
> connection by retrying unless the rpc_timeout is reached and then only
> EIO,TIMEDOUT is returned.
> 
> Can then my original patch be considered without resubmission?

Sure, thanks for checking that.

--b.

> 
> >
> > --b.
> >
> > >
> > > >
> > > > > >
> > > > > > --b.
> > > > > >
> > > > > > >
> > > > > > > Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
> > > > > > > ---
> > > > > > >  fs/nfsd/nfs4callback.c | 1 +
> > > > > > >  1 file changed, 1 insertion(+)
> > > > > > >
> > > > > > > diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
> > > > > > > index 052be5bf9ef5..7325592b456e 100644
> > > > > > > --- a/fs/nfsd/nfs4callback.c
> > > > > > > +++ b/fs/nfsd/nfs4callback.c
> > > > > > > @@ -1189,6 +1189,7 @@ static void nfsd4_cb_done(struct rpc_task
> > > > > > > *task, void *calldata)
> > > > > > >                 switch (task->tk_status) {
> > > > > > >                 case -EIO:
> > > > > > >                 case -ETIMEDOUT:
> > > > > > > +               case -EACCES:
> > > > > > >                         nfsd4_mark_cb_down(clp, task-
> > > > > > > >tk_status);
> > > > > > >                 }
> > > > > > >                 break;
> > > > > > > --
> > > > > > > 2.27.0
> > > > > > >
> > > > > >
> > > > >
> > > > > --
> > > > > Trond Myklebust
> > > > > Linux NFS client maintainer, Hammerspace
> > > > > trond.myklebust@hammerspace.com
> > > > >
> > > > >
> > >
> > > --
> > > Trond Myklebust
> > > Linux NFS client maintainer, Hammerspace
> > > trond.myklebust@hammerspace.com
> > >
> > >
> >
>

Chuck Lever March 11, 2021, 3:10 p.m. UTC | #14

> On Mar 10, 2021, at 5:09 PM, Olga Kornievskaia <olga.kornievskaia@gmail.com> wrote:
> 
> On Wed, Mar 10, 2021 at 9:47 AM J. Bruce Fields <bfields@redhat.com> wrote:
>> 
>> On Tue, Mar 09, 2021 at 08:59:51PM +0000, Trond Myklebust wrote:
>>> On Tue, 2021-03-09 at 15:41 -0500, Olga Kornievskaia wrote:
>>>> On Tue, Mar 9, 2021 at 3:22 PM Trond Myklebust <
>>>> trondmy@hammerspace.com> wrote:
>>>>> 
>>>>> On Tue, 2021-03-09 at 10:37 -0500, J. Bruce Fields wrote:
>>>>>> On Tue, Mar 09, 2021 at 09:41:27AM -0500, Olga Kornievskaia
>>>>>> wrote:
>>>>>>> From: Olga Kornievskaia <kolga@netapp.com>
>>>>>>> 
>>>>>>> When the server tries to do a callback and a client fails it
>>>>>>> due to
>>>>>>> authentication problems, we need the server to set callback
>>>>>>> down
>>>>>>> flag in RENEW so that client can recover.
>>>>>> 
>>>>>> I was looking at this.  It looks to me like this should really be
>>>>>> just:
>>>>>> 
>>>>>>        case 1:
>>>>>>                if (task->tk_status)
>>>>>>                        nfsd4_mark_cb_down(clp, task->tk_status);
>>>>>> 
>>>>>> If tk_status showed an error, and the ->done method doesn't
>>>>>> return 0
>>>>>> to
>>>>>> tell us it something worth retrying, then the callback failed
>>>>>> permanently, so we should mark the callback path down, regardless
>>>>>> of
>>>>>> the
>>>>>> exact error.
>>>>> 
>>>>> I disagree. task->tk_status could be an unhandled NFSv4 error (see
>>>>> nfsd4_cb_recall_done()). The client might, for instance, be in the
>>>>> process of returning the delegation being recalled. Why should that
>>>>> result in the callback channel being marked as down?
>>>>> 
>>>> 
>>>> Are you talking about say the connection going down and server should
>>>> just reconnect instead of recovering the callback channel. I assumed
>>>> that connection break is something that's not  recoverable by the
>>>> callback but perhaps I'm wrong.
>>> 
>>> No. I'm saying that nfsd4_cb_recall_done() will return a value of '1'
>>> for both task->tk_status == -EBADHANDLE and -NFS4ERR_BAD_STATEID. I'm
>>> not seeing why either of those errors should be handled by marking the
>>> callback channel as being down.
>>> 
>>> Looking further, it seems that the same function will also return '1'
>>> without checking the value of task->tk_status if the delegation has
>>> been revoked or returned. So that would mean that even NFS4ERR_DELAY
>>> could trigger the call to nfsd4_mark_cb_down() with the above change.
>> 
>> Yeah, OK, that's wrong, apologies.
>> 
>> I'm just a little worried about the attempt to enumerate transport level
>> errors in nfsd4_cb_done().  Are we sure that EIO, ETIMEDOUT, EACCESS is
>> the right list?
> 
> Looking at call_transmit_status error handling, I don't think
> connection errors are returned. Instead the code tries to fix the
> connection by retrying unless the rpc_timeout is reached and then only
> EIO,TIMEDOUT is returned.
> 
> Can then my original patch be considered without resubmission?

Bruce has authorized v1 of this patch, but that one has the
uncorrected patch description. Post a v4?



>> --b.
>> 
>>> 
>>>> 
>>>>>> 
>>>>>> --b.
>>>>>> 
>>>>>>> 
>>>>>>> Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
>>>>>>> ---
>>>>>>> fs/nfsd/nfs4callback.c | 1 +
>>>>>>> 1 file changed, 1 insertion(+)
>>>>>>> 
>>>>>>> diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
>>>>>>> index 052be5bf9ef5..7325592b456e 100644
>>>>>>> --- a/fs/nfsd/nfs4callback.c
>>>>>>> +++ b/fs/nfsd/nfs4callback.c
>>>>>>> @@ -1189,6 +1189,7 @@ static void nfsd4_cb_done(struct rpc_task
>>>>>>> *task, void *calldata)
>>>>>>>                switch (task->tk_status) {
>>>>>>>                case -EIO:
>>>>>>>                case -ETIMEDOUT:
>>>>>>> +               case -EACCES:
>>>>>>>                        nfsd4_mark_cb_down(clp, task-
>>>>>>>> tk_status);
>>>>>>>                }
>>>>>>>                break;
>>>>>>> --
>>>>>>> 2.27.0
>>>>>>> 
>>>>>> 
>>>>> 
>>>>> --
>>>>> Trond Myklebust
>>>>> Linux NFS client maintainer, Hammerspace
>>>>> trond.myklebust@hammerspace.com
>>>>> 
>>>>> 
>>> 
>>> --
>>> Trond Myklebust
>>> Linux NFS client maintainer, Hammerspace
>>> trond.myklebust@hammerspace.com

--
Chuck Lever

Olga Kornievskaia March 11, 2021, 3:16 p.m. UTC | #15

On Thu, Mar 11, 2021 at 10:10 AM Chuck Lever III <chuck.lever@oracle.com> wrote:
>
>
>
> > On Mar 10, 2021, at 5:09 PM, Olga Kornievskaia <olga.kornievskaia@gmail.com> wrote:
> >
> > On Wed, Mar 10, 2021 at 9:47 AM J. Bruce Fields <bfields@redhat.com> wrote:
> >>
> >> On Tue, Mar 09, 2021 at 08:59:51PM +0000, Trond Myklebust wrote:
> >>> On Tue, 2021-03-09 at 15:41 -0500, Olga Kornievskaia wrote:
> >>>> On Tue, Mar 9, 2021 at 3:22 PM Trond Myklebust <
> >>>> trondmy@hammerspace.com> wrote:
> >>>>>
> >>>>> On Tue, 2021-03-09 at 10:37 -0500, J. Bruce Fields wrote:
> >>>>>> On Tue, Mar 09, 2021 at 09:41:27AM -0500, Olga Kornievskaia
> >>>>>> wrote:
> >>>>>>> From: Olga Kornievskaia <kolga@netapp.com>
> >>>>>>>
> >>>>>>> When the server tries to do a callback and a client fails it
> >>>>>>> due to
> >>>>>>> authentication problems, we need the server to set callback
> >>>>>>> down
> >>>>>>> flag in RENEW so that client can recover.
> >>>>>>
> >>>>>> I was looking at this.  It looks to me like this should really be
> >>>>>> just:
> >>>>>>
> >>>>>>        case 1:
> >>>>>>                if (task->tk_status)
> >>>>>>                        nfsd4_mark_cb_down(clp, task->tk_status);
> >>>>>>
> >>>>>> If tk_status showed an error, and the ->done method doesn't
> >>>>>> return 0
> >>>>>> to
> >>>>>> tell us it something worth retrying, then the callback failed
> >>>>>> permanently, so we should mark the callback path down, regardless
> >>>>>> of
> >>>>>> the
> >>>>>> exact error.
> >>>>>
> >>>>> I disagree. task->tk_status could be an unhandled NFSv4 error (see
> >>>>> nfsd4_cb_recall_done()). The client might, for instance, be in the
> >>>>> process of returning the delegation being recalled. Why should that
> >>>>> result in the callback channel being marked as down?
> >>>>>
> >>>>
> >>>> Are you talking about say the connection going down and server should
> >>>> just reconnect instead of recovering the callback channel. I assumed
> >>>> that connection break is something that's not  recoverable by the
> >>>> callback but perhaps I'm wrong.
> >>>
> >>> No. I'm saying that nfsd4_cb_recall_done() will return a value of '1'
> >>> for both task->tk_status == -EBADHANDLE and -NFS4ERR_BAD_STATEID. I'm
> >>> not seeing why either of those errors should be handled by marking the
> >>> callback channel as being down.
> >>>
> >>> Looking further, it seems that the same function will also return '1'
> >>> without checking the value of task->tk_status if the delegation has
> >>> been revoked or returned. So that would mean that even NFS4ERR_DELAY
> >>> could trigger the call to nfsd4_mark_cb_down() with the above change.
> >>
> >> Yeah, OK, that's wrong, apologies.
> >>
> >> I'm just a little worried about the attempt to enumerate transport level
> >> errors in nfsd4_cb_done().  Are we sure that EIO, ETIMEDOUT, EACCESS is
> >> the right list?
> >
> > Looking at call_transmit_status error handling, I don't think
> > connection errors are returned. Instead the code tries to fix the
> > connection by retrying unless the rpc_timeout is reached and then only
> > EIO,TIMEDOUT is returned.
> >
> > Can then my original patch be considered without resubmission?
>
> Bruce has authorized v1 of this patch, but that one has the
> uncorrected patch description. Post a v4?

v1's description is accurate. It reflects that only authentication
errors are handled.

>
>
>
> >> --b.
> >>
> >>>
> >>>>
> >>>>>>
> >>>>>> --b.
> >>>>>>
> >>>>>>>
> >>>>>>> Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
> >>>>>>> ---
> >>>>>>> fs/nfsd/nfs4callback.c | 1 +
> >>>>>>> 1 file changed, 1 insertion(+)
> >>>>>>>
> >>>>>>> diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
> >>>>>>> index 052be5bf9ef5..7325592b456e 100644
> >>>>>>> --- a/fs/nfsd/nfs4callback.c
> >>>>>>> +++ b/fs/nfsd/nfs4callback.c
> >>>>>>> @@ -1189,6 +1189,7 @@ static void nfsd4_cb_done(struct rpc_task
> >>>>>>> *task, void *calldata)
> >>>>>>>                switch (task->tk_status) {
> >>>>>>>                case -EIO:
> >>>>>>>                case -ETIMEDOUT:
> >>>>>>> +               case -EACCES:
> >>>>>>>                        nfsd4_mark_cb_down(clp, task-
> >>>>>>>> tk_status);
> >>>>>>>                }
> >>>>>>>                break;
> >>>>>>> --
> >>>>>>> 2.27.0
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>> --
> >>>>> Trond Myklebust
> >>>>> Linux NFS client maintainer, Hammerspace
> >>>>> trond.myklebust@hammerspace.com
> >>>>>
> >>>>>
> >>>
> >>> --
> >>> Trond Myklebust
> >>> Linux NFS client maintainer, Hammerspace
> >>> trond.myklebust@hammerspace.com
>
> --
> Chuck Lever
>
>
>

Chuck Lever March 11, 2021, 3:18 p.m. UTC | #16

> On Mar 11, 2021, at 10:16 AM, Olga Kornievskaia <olga.kornievskaia@gmail.com> wrote:
> 
> On Thu, Mar 11, 2021 at 10:10 AM Chuck Lever III <chuck.lever@oracle.com> wrote:
>> 
>> 
>> 
>>> On Mar 10, 2021, at 5:09 PM, Olga Kornievskaia <olga.kornievskaia@gmail.com> wrote:
>>> 
>>> On Wed, Mar 10, 2021 at 9:47 AM J. Bruce Fields <bfields@redhat.com> wrote:
>>>> 
>>>> On Tue, Mar 09, 2021 at 08:59:51PM +0000, Trond Myklebust wrote:
>>>>> On Tue, 2021-03-09 at 15:41 -0500, Olga Kornievskaia wrote:
>>>>>> On Tue, Mar 9, 2021 at 3:22 PM Trond Myklebust <
>>>>>> trondmy@hammerspace.com> wrote:
>>>>>>> 
>>>>>>> On Tue, 2021-03-09 at 10:37 -0500, J. Bruce Fields wrote:
>>>>>>>> On Tue, Mar 09, 2021 at 09:41:27AM -0500, Olga Kornievskaia
>>>>>>>> wrote:
>>>>>>>>> From: Olga Kornievskaia <kolga@netapp.com>
>>>>>>>>> 
>>>>>>>>> When the server tries to do a callback and a client fails it
>>>>>>>>> due to
>>>>>>>>> authentication problems, we need the server to set callback
>>>>>>>>> down
>>>>>>>>> flag in RENEW so that client can recover.
>>>>>>>> 
>>>>>>>> I was looking at this.  It looks to me like this should really be
>>>>>>>> just:
>>>>>>>> 
>>>>>>>>       case 1:
>>>>>>>>               if (task->tk_status)
>>>>>>>>                       nfsd4_mark_cb_down(clp, task->tk_status);
>>>>>>>> 
>>>>>>>> If tk_status showed an error, and the ->done method doesn't
>>>>>>>> return 0
>>>>>>>> to
>>>>>>>> tell us it something worth retrying, then the callback failed
>>>>>>>> permanently, so we should mark the callback path down, regardless
>>>>>>>> of
>>>>>>>> the
>>>>>>>> exact error.
>>>>>>> 
>>>>>>> I disagree. task->tk_status could be an unhandled NFSv4 error (see
>>>>>>> nfsd4_cb_recall_done()). The client might, for instance, be in the
>>>>>>> process of returning the delegation being recalled. Why should that
>>>>>>> result in the callback channel being marked as down?
>>>>>>> 
>>>>>> 
>>>>>> Are you talking about say the connection going down and server should
>>>>>> just reconnect instead of recovering the callback channel. I assumed
>>>>>> that connection break is something that's not  recoverable by the
>>>>>> callback but perhaps I'm wrong.
>>>>> 
>>>>> No. I'm saying that nfsd4_cb_recall_done() will return a value of '1'
>>>>> for both task->tk_status == -EBADHANDLE and -NFS4ERR_BAD_STATEID. I'm
>>>>> not seeing why either of those errors should be handled by marking the
>>>>> callback channel as being down.
>>>>> 
>>>>> Looking further, it seems that the same function will also return '1'
>>>>> without checking the value of task->tk_status if the delegation has
>>>>> been revoked or returned. So that would mean that even NFS4ERR_DELAY
>>>>> could trigger the call to nfsd4_mark_cb_down() with the above change.
>>>> 
>>>> Yeah, OK, that's wrong, apologies.
>>>> 
>>>> I'm just a little worried about the attempt to enumerate transport level
>>>> errors in nfsd4_cb_done().  Are we sure that EIO, ETIMEDOUT, EACCESS is
>>>> the right list?
>>> 
>>> Looking at call_transmit_status error handling, I don't think
>>> connection errors are returned. Instead the code tries to fix the
>>> connection by retrying unless the rpc_timeout is reached and then only
>>> EIO,TIMEDOUT is returned.
>>> 
>>> Can then my original patch be considered without resubmission?
>> 
>> Bruce has authorized v1 of this patch, but that one has the
>> uncorrected patch description. Post a v4?
> 
> v1's description is accurate. It reflects that only authentication
> errors are handled.

Nit: The short description isn't specific to NFSv4.0, as the v3
one was.


>>>> --b.
>>>> 
>>>>> 
>>>>>> 
>>>>>>>> 
>>>>>>>> --b.
>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
>>>>>>>>> ---
>>>>>>>>> fs/nfsd/nfs4callback.c | 1 +
>>>>>>>>> 1 file changed, 1 insertion(+)
>>>>>>>>> 
>>>>>>>>> diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
>>>>>>>>> index 052be5bf9ef5..7325592b456e 100644
>>>>>>>>> --- a/fs/nfsd/nfs4callback.c
>>>>>>>>> +++ b/fs/nfsd/nfs4callback.c
>>>>>>>>> @@ -1189,6 +1189,7 @@ static void nfsd4_cb_done(struct rpc_task
>>>>>>>>> *task, void *calldata)
>>>>>>>>>               switch (task->tk_status) {
>>>>>>>>>               case -EIO:
>>>>>>>>>               case -ETIMEDOUT:
>>>>>>>>> +               case -EACCES:
>>>>>>>>>                       nfsd4_mark_cb_down(clp, task-
>>>>>>>>>> tk_status);
>>>>>>>>>               }
>>>>>>>>>               break;
>>>>>>>>> --
>>>>>>>>> 2.27.0
>>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>>> --
>>>>>>> Trond Myklebust
>>>>>>> Linux NFS client maintainer, Hammerspace
>>>>>>> trond.myklebust@hammerspace.com
>>>>>>> 
>>>>>>> 
>>>>> 
>>>>> --
>>>>> Trond Myklebust
>>>>> Linux NFS client maintainer, Hammerspace
>>>>> trond.myklebust@hammerspace.com
>> 
>> --
>> Chuck Lever

--
Chuck Lever

[1/1] NFSD: fix error handling in callbacks

Commit Message

Comments

Patch