diff mbox series

[RFC,4/4] ceph: queue request when CLEANRECOVER is set

Message ID 20200925140851.320673-5-jlayton@kernel.org
State New, archived
Headers show
Series ceph: fix spurious recover_session=clean errors | expand

Commit Message

Jeff Layton Sept. 25, 2020, 2:08 p.m. UTC
Ilya noticed that the first access to a blacklisted mount would often
get back -EACCES, but then subsequent calls would be OK. The problem is
in __do_request. If the session is marked as REJECTED, a hard error is
returned instead of waiting for a new session to come into being.

When the session is REJECTED and the mount was done with
recover_session=clean, queue the request to the waiting_for_map queue,
which will be awoken after tearing down the old session.

URL: https://tracker.ceph.com/issues/47385
Reported-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/ceph/mds_client.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

Comments

Yan, Zheng Sept. 29, 2020, 8:31 a.m. UTC | #1
On Fri, Sep 25, 2020 at 10:08 PM Jeff Layton <jlayton@kernel.org> wrote:
>
> Ilya noticed that the first access to a blacklisted mount would often
> get back -EACCES, but then subsequent calls would be OK. The problem is
> in __do_request. If the session is marked as REJECTED, a hard error is
> returned instead of waiting for a new session to come into being.
>
> When the session is REJECTED and the mount was done with
> recover_session=clean, queue the request to the waiting_for_map queue,
> which will be awoken after tearing down the old session.
>
> URL: https://tracker.ceph.com/issues/47385
> Reported-by: Ilya Dryomov <idryomov@gmail.com>
> Signed-off-by: Jeff Layton <jlayton@kernel.org>
> ---
>  fs/ceph/mds_client.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
> index fd16db6ecb0a..b07e7adf146f 100644
> --- a/fs/ceph/mds_client.c
> +++ b/fs/ceph/mds_client.c
> @@ -2819,7 +2819,10 @@ static void __do_request(struct ceph_mds_client *mdsc,
>         if (session->s_state != CEPH_MDS_SESSION_OPEN &&
>             session->s_state != CEPH_MDS_SESSION_HUNG) {
>                 if (session->s_state == CEPH_MDS_SESSION_REJECTED) {
> -                       err = -EACCES;
> +                       if (ceph_test_mount_opt(mdsc->fsc, CLEANRECOVER))
> +                               list_add(&req->r_wait, &mdsc->waiting_for_map);
> +                       else
> +                               err = -EACCES;

During recovering session, client drops all dirty caps and abort all
osd requests.  It does not make sense , some operations are waiting,
the others get aborted.

>                         goto out_session;
>                 }
>                 /*
> --
> 2.26.2
>
Jeff Layton Sept. 29, 2020, 12:46 p.m. UTC | #2
On Tue, 2020-09-29 at 16:31 +0800, Yan, Zheng wrote:
> On Fri, Sep 25, 2020 at 10:08 PM Jeff Layton <jlayton@kernel.org> wrote:
> > Ilya noticed that the first access to a blacklisted mount would often
> > get back -EACCES, but then subsequent calls would be OK. The problem is
> > in __do_request. If the session is marked as REJECTED, a hard error is
> > returned instead of waiting for a new session to come into being.
> > 
> > When the session is REJECTED and the mount was done with
> > recover_session=clean, queue the request to the waiting_for_map queue,
> > which will be awoken after tearing down the old session.
> > 
> > URL: https://tracker.ceph.com/issues/47385
> > Reported-by: Ilya Dryomov <idryomov@gmail.com>
> > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> > ---
> >  fs/ceph/mds_client.c | 5 ++++-
> >  1 file changed, 4 insertions(+), 1 deletion(-)
> > 
> > diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
> > index fd16db6ecb0a..b07e7adf146f 100644
> > --- a/fs/ceph/mds_client.c
> > +++ b/fs/ceph/mds_client.c
> > @@ -2819,7 +2819,10 @@ static void __do_request(struct ceph_mds_client *mdsc,
> >         if (session->s_state != CEPH_MDS_SESSION_OPEN &&
> >             session->s_state != CEPH_MDS_SESSION_HUNG) {
> >                 if (session->s_state == CEPH_MDS_SESSION_REJECTED) {
> > -                       err = -EACCES;
> > +                       if (ceph_test_mount_opt(mdsc->fsc, CLEANRECOVER))
> > +                               list_add(&req->r_wait, &mdsc->waiting_for_map);
> > +                       else
> > +                               err = -EACCES;
> 
> During recovering session, client drops all dirty caps and abort all
> osd requests.  It does not make sense , some operations are waiting,
> the others get aborted.
> 


It makes sense to drop the caps and fail writeback of pages that were
dirty. The issue here is what to do with MDS (metadata) requests that
come in just after we notice the blocklisting but before the session has
been reestablished. Most of those aren't going to have any dependency on
the state of the pagecache.

They also (for the most part) won't have a dependency on caps. The main
exception I see is async unlink (async creates will be saved by the fact
we'll be dropping our delegated inode number range). An async unlink
could end up stalling across a recovery. The new MDS probably won't have
granted Du caps by the time the call goes out. We could cancel that but
likely would have already returned success on the unlink() call.

Granted, with all of this we're _way_ outside the realm of POSIX
behavior, so ultimately the right behavior is whatever we decide it
should be. Anyone who turns this stuff on should be prepared for some of
the operations leading up to the blocklisting to vaporize.
Jeff Layton Sept. 29, 2020, 7:55 p.m. UTC | #3
On Fri, 2020-09-25 at 10:08 -0400, Jeff Layton wrote:
> Ilya noticed that the first access to a blacklisted mount would often
> get back -EACCES, but then subsequent calls would be OK. The problem is
> in __do_request. If the session is marked as REJECTED, a hard error is
> returned instead of waiting for a new session to come into being.
> 
> When the session is REJECTED and the mount was done with
> recover_session=clean, queue the request to the waiting_for_map queue,
> which will be awoken after tearing down the old session.
> 
> URL: https://tracker.ceph.com/issues/47385
> Reported-by: Ilya Dryomov <idryomov@gmail.com>
> Signed-off-by: Jeff Layton <jlayton@kernel.org>
> ---
>  fs/ceph/mds_client.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
> index fd16db6ecb0a..b07e7adf146f 100644
> --- a/fs/ceph/mds_client.c
> +++ b/fs/ceph/mds_client.c
> @@ -2819,7 +2819,10 @@ static void __do_request(struct ceph_mds_client *mdsc,
>  	if (session->s_state != CEPH_MDS_SESSION_OPEN &&
>  	    session->s_state != CEPH_MDS_SESSION_HUNG) {
>  		if (session->s_state == CEPH_MDS_SESSION_REJECTED) {
> -			err = -EACCES;
> +			if (ceph_test_mount_opt(mdsc->fsc, CLEANRECOVER))
> +				list_add(&req->r_wait, &mdsc->waiting_for_map);
> +			else
> +				err = -EACCES;
>  			goto out_session;
>  		}
>  		/*

I think this is wrong when CEPH_MDS_R_ASYNC is set. I'll send a v2 set
if we end up staying with this approach.
diff mbox series

Patch

diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
index fd16db6ecb0a..b07e7adf146f 100644
--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -2819,7 +2819,10 @@  static void __do_request(struct ceph_mds_client *mdsc,
 	if (session->s_state != CEPH_MDS_SESSION_OPEN &&
 	    session->s_state != CEPH_MDS_SESSION_HUNG) {
 		if (session->s_state == CEPH_MDS_SESSION_REJECTED) {
-			err = -EACCES;
+			if (ceph_test_mount_opt(mdsc->fsc, CLEANRECOVER))
+				list_add(&req->r_wait, &mdsc->waiting_for_map);
+			else
+				err = -EACCES;
 			goto out_session;
 		}
 		/*