Message ID | 20210916182212.81608-4-dai.ngo@oracle.com |
---|---|
State | New, archived |
Series | nfsd: Initial implementation of NFSv4 Courteous Server |
Bruce, Dai -

> On Sep 16, 2021, at 2:22 PM, Dai Ngo <dai.ngo@oracle.com> wrote:
>
> When the back channel enters SEQ4_STATUS_CB_PATH_DOWN state, the client
> recovers by sending BIND_CONN_TO_SESSION but the server fails to recover
> the back channel and leaves it as NFSD4_CB_DOWN.
>
> Fix by enhancing nfsd4_bind_conn_to_session to probe the back channel
> by calling nfsd4_probe_callback.
>
> Signed-off-by: Dai Ngo <dai.ngo@oracle.com>

I'm wondering if this one is appropriate to pull into v5.15-rc.

--
Chuck Lever
On Thu, Sep 16, 2021 at 07:00:20PM +0000, Chuck Lever III wrote:
> Bruce, Dai -
>
> > On Sep 16, 2021, at 2:22 PM, Dai Ngo <dai.ngo@oracle.com> wrote:
> >
> > When the back channel enters SEQ4_STATUS_CB_PATH_DOWN state, the client
> > recovers by sending BIND_CONN_TO_SESSION but the server fails to recover
> > the back channel and leaves it as NFSD4_CB_DOWN.
> >
> > Fix by enhancing nfsd4_bind_conn_to_session to probe the back channel
> > by calling nfsd4_probe_callback.
> >
> > Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
>
> I'm wondering if this one is appropriate to pull into v5.15-rc.

I think so.

Dai, do you have a pynfs test for this case?

--b.
On 9/16/21 12:55 PM, Bruce Fields wrote:
> On Thu, Sep 16, 2021 at 07:00:20PM +0000, Chuck Lever III wrote:
>> Bruce, Dai -
>>
>>> On Sep 16, 2021, at 2:22 PM, Dai Ngo <dai.ngo@oracle.com> wrote:
>>>
>>> When the back channel enters SEQ4_STATUS_CB_PATH_DOWN state, the client
>>> recovers by sending BIND_CONN_TO_SESSION but the server fails to recover
>>> the back channel and leaves it as NFSD4_CB_DOWN.
>>>
>>> Fix by enhancing nfsd4_bind_conn_to_session to probe the back channel
>>> by calling nfsd4_probe_callback.
>>>
>>> Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
>> I'm wondering if this one is appropriate to pull into v5.15-rc.
> I think so.
>
> Dai, do you have a pynfs test for this case?

I don't, but I can create a pynfs test to reproduce the problem.

-Dai
On 9/16/21 1:15 PM, dai.ngo@oracle.com wrote:
>
> On 9/16/21 12:55 PM, Bruce Fields wrote:
>> On Thu, Sep 16, 2021 at 07:00:20PM +0000, Chuck Lever III wrote:
>>> Bruce, Dai -
>>>
>>>> On Sep 16, 2021, at 2:22 PM, Dai Ngo <dai.ngo@oracle.com> wrote:
>>>>
>>>> When the back channel enters SEQ4_STATUS_CB_PATH_DOWN state, the client
>>>> recovers by sending BIND_CONN_TO_SESSION but the server fails to recover
>>>> the back channel and leaves it as NFSD4_CB_DOWN.
>>>>
>>>> Fix by enhancing nfsd4_bind_conn_to_session to probe the back channel
>>>> by calling nfsd4_probe_callback.
>>>>
>>>> Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
>>> I'm wondering if this one is appropriate to pull into v5.15-rc.
>> I think so.
>>
>> Dai, do you have a pynfs test for this case?
>
> I don't, but I can create a pynfs test to reproduce the problem.

Here are the steps to reproduce the stuck SEQ4_STATUS_CB_PATH_DOWN problem
using 'tcpkill':

Client: 5.13.0-rc2
Server: 5.15.0-rc1

1. [root@nfsvmd07 ~]# mount -o vers=4.1 nfsvme14:/root/xfs /tmp/mnt
2. [root@nfsvmd07 ~]# tcpkill host nfsvme14 and port 2049
3. [root@nfsvmd07 ~]# ls /tmp/mnt
4. CTRL-C to stop tcpkill
5. [root@nfsvmd07 ~]# ls /tmp/mnt

The problem can be observed in the wire trace, where the back channel is
stuck in SEQ4_STATUS_CB_PATH_DOWN, causing the client to keep sending
BIND_CONN_TO_SESSION (BCTS).

Note: this problem can only be reproduced with a client running 5.13 or
older. Clients running 5.14 or newer do not have this problem.

The reason is that in 5.13, when the client re-establishes the TCP
connection, it reuses the port number of the previous connection, which was
destroyed by tcpkill (the client sends RST to the server). This causes the
server to set the state of the back channel to SEQ4_STATUS_CB_PATH_DOWN.

In 5.14, the client uses a new port number when it re-establishes the
connection. This results in the server returning
NFS4ERR_CONN_NOT_BOUND_TO_SESSION in the reply to the stand-alone SEQUENCE,
which causes the client to send BCTS once and successfully re-establish the
back channel.

I can provide the pcap files of a good and a bad run of the test if you are
interested. I don't have a pynfs test for this case.

-Dai
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 54e5317f00f1..63b4d0e6fc29 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -3580,7 +3580,7 @@ static struct nfsd4_conn *__nfsd4_find_conn(struct svc_xprt *xpt, struct nfsd4_s
 }
 
 static __be32 nfsd4_match_existing_connection(struct svc_rqst *rqst,
-		struct nfsd4_session *session, u32 req)
+		struct nfsd4_session *session, u32 req, struct nfsd4_conn **conn)
 {
 	struct nfs4_client *clp = session->se_client;
 	struct svc_xprt *xpt = rqst->rq_xprt;
@@ -3603,6 +3603,8 @@ static __be32 nfsd4_match_existing_connection(struct svc_rqst *rqst,
 	else
 		status = nfserr_inval;
 	spin_unlock(&clp->cl_lock);
+	if (status == nfs_ok && conn)
+		*conn = c;
 	return status;
 }
 
@@ -3627,8 +3629,16 @@ __be32 nfsd4_bind_conn_to_session(struct svc_rqst *rqstp,
 	status = nfserr_wrong_cred;
 	if (!nfsd4_mach_creds_match(session->se_client, rqstp))
 		goto out;
-	status = nfsd4_match_existing_connection(rqstp, session, bcts->dir);
-	if (status == nfs_ok || status == nfserr_inval)
+	status = nfsd4_match_existing_connection(rqstp, session,
+			bcts->dir, &conn);
+	if (status == nfs_ok) {
+		if (bcts->dir == NFS4_CDFC4_FORE_OR_BOTH ||
+				bcts->dir == NFS4_CDFC4_BACK)
+			conn->cn_flags |= NFS4_CDFC4_BACK;
+		nfsd4_probe_callback(session->se_client);
+		goto out;
+	}
+	if (status == nfserr_inval)
 		goto out;
 	status = nfsd4_map_bcts_dir(&bcts->dir);
 	if (status)
When the back channel enters SEQ4_STATUS_CB_PATH_DOWN state, the client
recovers by sending BIND_CONN_TO_SESSION but the server fails to recover
the back channel and leaves it as NFSD4_CB_DOWN.

Fix by enhancing nfsd4_bind_conn_to_session to probe the back channel
by calling nfsd4_probe_callback.

Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
---
 fs/nfsd/nfs4state.c | 16 +++++++++++++---
 1 file changed, 13 insertions(+), 3 deletions(-)