mbox series

[RFC,0/4] ceph: fix spurious recover_session=clean errors

Message ID 20200925140851.320673-1-jlayton@kernel.org (mailing list archive)
Headers show
Series ceph: fix spurious recover_session=clean errors | expand

Message

Jeff Layton Sept. 25, 2020, 2:08 p.m. UTC
Ilya noticed that he would get spurious EACCES errors on calls done just
after blocklisting the client on mounts with recover_session=clean. The
session would get marked as REJECTED and that caused in-flight calls to
die with EACCES. This patchset seems to smooth over the problem, but I'm
not fully convinced it's the right approach.

The potential issue I see is that the client could take cap references to
do a call on a session that has been blocklisted. We then queue the
message and reestablish the session, but we may not have been granted
the same caps by the MDS at that point.

If this is a problem, then we probably need to rework it so that we
return a distinct error code in this situation and have the upper layers
issue a completely new mds request (with new cap refs, etc.)

Obviously, that's a much more invasive approach though, so it would be
nice to avoid that if this would suffice.

Jeff Layton (4):
  ceph: don't WARN when removing caps due to blocklisting
  ceph: don't mark mount as SHUTDOWN when recovering session
  ceph: remove timeout on allowing reconnect after blocklisting
  ceph: queue request when CLEANRECOVER is set

 fs/ceph/caps.c       |  2 +-
 fs/ceph/mds_client.c | 10 ++++------
 fs/ceph/super.c      | 13 +++++++++----
 fs/ceph/super.h      |  1 -
 4 files changed, 14 insertions(+), 12 deletions(-)

Comments

Yan, Zheng Sept. 29, 2020, 8:28 a.m. UTC | #1
On Fri, Sep 25, 2020 at 10:08 PM Jeff Layton <jlayton@kernel.org> wrote:
>
> Ilya noticed that he would get spurious EACCES errors on calls done just
> after blocklisting the client on mounts with recover_session=clean. The
> session would get marked as REJECTED and that caused in-flight calls to
> die with EACCES. This patchset seems to smooth over the problem, but I'm
> not fully convinced it's the right approach.
>

the root is cause is that client does not recover session instantly
after getting rejected by mds. Before session gets recovered, client
continues to return error.


> The potential issue I see is that the client could take cap references to
> do a call on a session that has been blocklisted. We then queue the
> message and reestablish the session, but we may not have been granted
> the same caps by the MDS at that point.
>
> If this is a problem, then we probably need to rework it so that we
> return a distinct error code in this situation and have the upper layers
> issue a completely new mds request (with new cap refs, etc.)
>
> Obviously, that's a much more invasive approach though, so it would be
> nice to avoid that if this would suffice.
>
> Jeff Layton (4):
>   ceph: don't WARN when removing caps due to blocklisting
>   ceph: don't mark mount as SHUTDOWN when recovering session
>   ceph: remove timeout on allowing reconnect after blocklisting
>   ceph: queue request when CLEANRECOVER is set
>
>  fs/ceph/caps.c       |  2 +-
>  fs/ceph/mds_client.c | 10 ++++------
>  fs/ceph/super.c      | 13 +++++++++----
>  fs/ceph/super.h      |  1 -
>  4 files changed, 14 insertions(+), 12 deletions(-)
>
> --
> 2.26.2
>
Ilya Dryomov Sept. 29, 2020, 8:54 a.m. UTC | #2
On Tue, Sep 29, 2020 at 10:28 AM Yan, Zheng <ukernel@gmail.com> wrote:
>
> On Fri, Sep 25, 2020 at 10:08 PM Jeff Layton <jlayton@kernel.org> wrote:
> >
> > Ilya noticed that he would get spurious EACCES errors on calls done just
> > after blocklisting the client on mounts with recover_session=clean. The
> > session would get marked as REJECTED and that caused in-flight calls to
> > die with EACCES. This patchset seems to smooth over the problem, but I'm
> > not fully convinced it's the right approach.
> >
>
> the root is cause is that client does not recover session instantly
> after getting rejected by mds. Before session gets recovered, client
> continues to return error.

Hi Zheng,

I don't think it's about whether that happens instantly or not.
In the example from [1], the first "ls" would fail even if issued
minutes after the session reject message and the reconnect.  From
the user's POV it is well after the automatic recovery promised by
recover_session=clean.

[1] https://tracker.ceph.com/issues/47385

Thanks,

                Ilya

>
>
> > The potential issue I see is that the client could take cap references to
> > do a call on a session that has been blocklisted. We then queue the
> > message and reestablish the session, but we may not have been granted
> > the same caps by the MDS at that point.
> >
> > If this is a problem, then we probably need to rework it so that we
> > return a distinct error code in this situation and have the upper layers
> > issue a completely new mds request (with new cap refs, etc.)
> >
> > Obviously, that's a much more invasive approach though, so it would be
> > nice to avoid that if this would suffice.
> >
> > Jeff Layton (4):
> >   ceph: don't WARN when removing caps due to blocklisting
> >   ceph: don't mark mount as SHUTDOWN when recovering session
> >   ceph: remove timeout on allowing reconnect after blocklisting
> >   ceph: queue request when CLEANRECOVER is set
> >
> >  fs/ceph/caps.c       |  2 +-
> >  fs/ceph/mds_client.c | 10 ++++------
> >  fs/ceph/super.c      | 13 +++++++++----
> >  fs/ceph/super.h      |  1 -
> >  4 files changed, 14 insertions(+), 12 deletions(-)
> >
> > --
> > 2.26.2
> >
Yan, Zheng Sept. 29, 2020, 10:44 a.m. UTC | #3
On Tue, Sep 29, 2020 at 4:55 PM Ilya Dryomov <idryomov@gmail.com> wrote:
>
> On Tue, Sep 29, 2020 at 10:28 AM Yan, Zheng <ukernel@gmail.com> wrote:
> >
> > On Fri, Sep 25, 2020 at 10:08 PM Jeff Layton <jlayton@kernel.org> wrote:
> > >
> > > Ilya noticed that he would get spurious EACCES errors on calls done just
> > > after blocklisting the client on mounts with recover_session=clean. The
> > > session would get marked as REJECTED and that caused in-flight calls to
> > > die with EACCES. This patchset seems to smooth over the problem, but I'm
> > > not fully convinced it's the right approach.
> > >
> >
> > the root is cause is that client does not recover session instantly
> > after getting rejected by mds. Before session gets recovered, client
> > continues to return error.
>
> Hi Zheng,
>
> I don't think it's about whether that happens instantly or not.
> In the example from [1], the first "ls" would fail even if issued
> minutes after the session reject message and the reconnect.  From
> the user's POV it is well after the automatic recovery promised by
> recover_session=clean.
>
> [1] https://tracker.ceph.com/issues/47385

Reconnect should close all old session. It's likely because that
client didn't detect it's blacklisted.

>
> Thanks,
>
>                 Ilya
>
> >
> >
> > > The potential issue I see is that the client could take cap references to
> > > do a call on a session that has been blocklisted. We then queue the
> > > message and reestablish the session, but we may not have been granted
> > > the same caps by the MDS at that point.
> > >
> > > If this is a problem, then we probably need to rework it so that we
> > > return a distinct error code in this situation and have the upper layers
> > > issue a completely new mds request (with new cap refs, etc.)
> > >
> > > Obviously, that's a much more invasive approach though, so it would be
> > > nice to avoid that if this would suffice.
> > >
> > > Jeff Layton (4):
> > >   ceph: don't WARN when removing caps due to blocklisting
> > >   ceph: don't mark mount as SHUTDOWN when recovering session
> > >   ceph: remove timeout on allowing reconnect after blocklisting
> > >   ceph: queue request when CLEANRECOVER is set
> > >
> > >  fs/ceph/caps.c       |  2 +-
> > >  fs/ceph/mds_client.c | 10 ++++------
> > >  fs/ceph/super.c      | 13 +++++++++----
> > >  fs/ceph/super.h      |  1 -
> > >  4 files changed, 14 insertions(+), 12 deletions(-)
> > >
> > > --
> > > 2.26.2
> > >
Ilya Dryomov Sept. 29, 2020, 10:58 a.m. UTC | #4
On Tue, Sep 29, 2020 at 12:44 PM Yan, Zheng <ukernel@gmail.com> wrote:
>
> On Tue, Sep 29, 2020 at 4:55 PM Ilya Dryomov <idryomov@gmail.com> wrote:
> >
> > On Tue, Sep 29, 2020 at 10:28 AM Yan, Zheng <ukernel@gmail.com> wrote:
> > >
> > > On Fri, Sep 25, 2020 at 10:08 PM Jeff Layton <jlayton@kernel.org> wrote:
> > > >
> > > > Ilya noticed that he would get spurious EACCES errors on calls done just
> > > > after blocklisting the client on mounts with recover_session=clean. The
> > > > session would get marked as REJECTED and that caused in-flight calls to
> > > > die with EACCES. This patchset seems to smooth over the problem, but I'm
> > > > not fully convinced it's the right approach.
> > > >
> > >
> > > the root is cause is that client does not recover session instantly
> > > after getting rejected by mds. Before session gets recovered, client
> > > continues to return error.
> >
> > Hi Zheng,
> >
> > I don't think it's about whether that happens instantly or not.
> > In the example from [1], the first "ls" would fail even if issued
> > minutes after the session reject message and the reconnect.  From
> > the user's POV it is well after the automatic recovery promised by
> > recover_session=clean.
> >
> > [1] https://tracker.ceph.com/issues/47385
>
> Reconnect should close all old session. It's likely because that
> client didn't detect it's blacklisted.

Sorry, I should have pasted dmesg there as well.  It _does_ detect
blacklisting -- notice that I wrote "after the session reject message
and the reconnect".

Thanks,

                Ilya
Jeff Layton Sept. 29, 2020, 12:48 p.m. UTC | #5
On Tue, 2020-09-29 at 12:58 +0200, Ilya Dryomov wrote:
> On Tue, Sep 29, 2020 at 12:44 PM Yan, Zheng <ukernel@gmail.com> wrote:
> > On Tue, Sep 29, 2020 at 4:55 PM Ilya Dryomov <idryomov@gmail.com> wrote:
> > > On Tue, Sep 29, 2020 at 10:28 AM Yan, Zheng <ukernel@gmail.com> wrote:
> > > > On Fri, Sep 25, 2020 at 10:08 PM Jeff Layton <jlayton@kernel.org> wrote:
> > > > > Ilya noticed that he would get spurious EACCES errors on calls done just
> > > > > after blocklisting the client on mounts with recover_session=clean. The
> > > > > session would get marked as REJECTED and that caused in-flight calls to
> > > > > die with EACCES. This patchset seems to smooth over the problem, but I'm
> > > > > not fully convinced it's the right approach.
> > > > > 
> > > > 
> > > > the root is cause is that client does not recover session instantly
> > > > after getting rejected by mds. Before session gets recovered, client
> > > > continues to return error.
> > > 
> > > Hi Zheng,
> > > 
> > > I don't think it's about whether that happens instantly or not.
> > > In the example from [1], the first "ls" would fail even if issued
> > > minutes after the session reject message and the reconnect.  From
> > > the user's POV it is well after the automatic recovery promised by
> > > recover_session=clean.
> > > 
> > > [1] https://tracker.ceph.com/issues/47385
> > 
> > Reconnect should close all old session. It's likely because that
> > client didn't detect it's blacklisted.
> 
> Sorry, I should have pasted dmesg there as well.  It _does_ detect
> blacklisting -- notice that I wrote "after the session reject message
> and the reconnect".
> 

Yep, this is pretty easy to reproduce too (as Ilya points out in the tracker).

I'm open to other ways of smoothing this over. If we end up with a small
window where errors can occur, then so be it, but I think we can
probably do better than we have now.
Jeff Layton Sept. 29, 2020, 7:50 p.m. UTC | #6
On Tue, 2020-09-29 at 18:44 +0800, Yan, Zheng wrote:
> On Tue, Sep 29, 2020 at 4:55 PM Ilya Dryomov <idryomov@gmail.com> wrote:
> > On Tue, Sep 29, 2020 at 10:28 AM Yan, Zheng <ukernel@gmail.com> wrote:
> > > On Fri, Sep 25, 2020 at 10:08 PM Jeff Layton <jlayton@kernel.org> wrote:
> > > > Ilya noticed that he would get spurious EACCES errors on calls done just
> > > > after blocklisting the client on mounts with recover_session=clean. The
> > > > session would get marked as REJECTED and that caused in-flight calls to
> > > > die with EACCES. This patchset seems to smooth over the problem, but I'm
> > > > not fully convinced it's the right approach.
> > > > 
> > > 
> > > the root is cause is that client does not recover session instantly
> > > after getting rejected by mds. Before session gets recovered, client
> > > continues to return error.
> > 
> > Hi Zheng,
> > 
> > I don't think it's about whether that happens instantly or not.
> > In the example from [1], the first "ls" would fail even if issued
> > minutes after the session reject message and the reconnect.  From
> > the user's POV it is well after the automatic recovery promised by
> > recover_session=clean.
> > 
> > [1] https://tracker.ceph.com/issues/47385
> 
> Reconnect should close all old session. It's likely because that
> client didn't detect it's blacklisted.
> 

I should have described this better -- let me explain:

It did detect that it was blocklisted (almost immediately) because the
MDS shuts down the session. I think it immediately sends a
SESSION_REJECT message when blacklisting and indicates that it has been
blocklisted.

At that point the session is CEPH_MDS_SESSION_REJECTED. The next MDS
calls through would see that it was in that state and would return
-EACCES. Eventually, the delayed work runs and then the session gets
reconnected, and further calls proceed normally.

So, I think this is just a timing thing for the most part. The workqueue
job runs on a delay of round_jiffies_relative(HZ * 5);, and that's long
enough for the disruption to be noticeable.

While this was happening during 'ls' for Ilya, it could happen in
anything that involves sending a request to the MDS. I think we want to
prevent new opens from erroring out during this window if we can.

The real question is whether this is safe in all cases. For instance, if
the call that we're idling is dependent on holding certain caps, then
it's possible we will have lost them when we got REJECTED.

Hmm...so that means patch 4/4 is probably wrong. I'll comment further in
a reply to that patch.

> > Thanks,
> > 
> >                 Ilya
> > 
> > > 
> > > > The potential issue I see is that the client could take cap references to
> > > > do a call on a session that has been blocklisted. We then queue the
> > > > message and reestablish the session, but we may not have been granted
> > > > the same caps by the MDS at that point.
> > > > 
> > > > If this is a problem, then we probably need to rework it so that we
> > > > return a distinct error code in this situation and have the upper layers
> > > > issue a completely new mds request (with new cap refs, etc.)
> > > > 
> > > > Obviously, that's a much more invasive approach though, so it would be
> > > > nice to avoid that if this would suffice.
> > > > 
> > > > Jeff Layton (4):
> > > >   ceph: don't WARN when removing caps due to blocklisting
> > > >   ceph: don't mark mount as SHUTDOWN when recovering session
> > > >   ceph: remove timeout on allowing reconnect after blocklisting
> > > >   ceph: queue request when CLEANRECOVER is set
> > > > 
> > > >  fs/ceph/caps.c       |  2 +-
> > > >  fs/ceph/mds_client.c | 10 ++++------
> > > >  fs/ceph/super.c      | 13 +++++++++----
> > > >  fs/ceph/super.h      |  1 -
> > > >  4 files changed, 14 insertions(+), 12 deletions(-)
> > > > 
> > > > --
> > > > 2.26.2
> > > >
Yan, Zheng Sept. 30, 2020, 8:45 a.m. UTC | #7
On Wed, Sep 30, 2020 at 3:50 AM Jeff Layton <jlayton@kernel.org> wrote:
>
> On Tue, 2020-09-29 at 18:44 +0800, Yan, Zheng wrote:
> > On Tue, Sep 29, 2020 at 4:55 PM Ilya Dryomov <idryomov@gmail.com> wrote:
> > > On Tue, Sep 29, 2020 at 10:28 AM Yan, Zheng <ukernel@gmail.com> wrote:
> > > > On Fri, Sep 25, 2020 at 10:08 PM Jeff Layton <jlayton@kernel.org> wrote:
> > > > > Ilya noticed that he would get spurious EACCES errors on calls done just
> > > > > after blocklisting the client on mounts with recover_session=clean. The
> > > > > session would get marked as REJECTED and that caused in-flight calls to
> > > > > die with EACCES. This patchset seems to smooth over the problem, but I'm
> > > > > not fully convinced it's the right approach.
> > > > >
> > > >
> > > > the root is cause is that client does not recover session instantly
> > > > after getting rejected by mds. Before session gets recovered, client
> > > > continues to return error.
> > >
> > > Hi Zheng,
> > >
> > > I don't think it's about whether that happens instantly or not.
> > > In the example from [1], the first "ls" would fail even if issued
> > > minutes after the session reject message and the reconnect.  From
> > > the user's POV it is well after the automatic recovery promised by
> > > recover_session=clean.
> > >
> > > [1] https://tracker.ceph.com/issues/47385
> >
> > Reconnect should close all old session. It's likely because that
> > client didn't detect it's blacklisted.
> >
>
> I should have described this better -- let me explain:
>
> It did detect that it was blocklisted (almost immediately) because the
> MDS shuts down the session. I think it immediately sends a
> SESSION_REJECT message when blacklisting and indicates that it has been
> blocklisted.
>
> At that point the session is CEPH_MDS_SESSION_REJECTED. The next MDS
> calls through would see that it was in that state and would return
> -EACCES. Eventually, the delayed work runs and then the session gets
> reconnected, and further calls proceed normally.
>
> So, I think this is just a timing thing for the most part. The workqueue
> job runs on a delay of round_jiffies_relative(HZ * 5);, and that's long
> enough for the disruption to be noticeable.
>
> While this was happening during 'ls' for Ilya, it could happen in
> anything that involves sending a request to the MDS. I think we want to
> prevent new opens from erroring out during this window if we can.
>
> The real question is whether this is safe in all cases. For instance, if
> the call that we're idling is dependent on holding certain caps, then
> it's possible we will have lost them when we got REJECTED.
>

The session in rejected state is new session. It should hold no caps.

> Hmm...so that means patch 4/4 is probably wrong. I'll comment further in
> a reply to that patch.
>
> > > Thanks,
> > >
> > >                 Ilya
> > >
> > > >
> > > > > The potential issue I see is that the client could take cap references to
> > > > > do a call on a session that has been blocklisted. We then queue the
> > > > > message and reestablish the session, but we may not have been granted
> > > > > the same caps by the MDS at that point.
> > > > >
> > > > > If this is a problem, then we probably need to rework it so that we
> > > > > return a distinct error code in this situation and have the upper layers
> > > > > issue a completely new mds request (with new cap refs, etc.)
> > > > >
> > > > > Obviously, that's a much more invasive approach though, so it would be
> > > > > nice to avoid that if this would suffice.
> > > > >
> > > > > Jeff Layton (4):
> > > > >   ceph: don't WARN when removing caps due to blocklisting
> > > > >   ceph: don't mark mount as SHUTDOWN when recovering session
> > > > >   ceph: remove timeout on allowing reconnect after blocklisting
> > > > >   ceph: queue request when CLEANRECOVER is set
> > > > >
> > > > >  fs/ceph/caps.c       |  2 +-
> > > > >  fs/ceph/mds_client.c | 10 ++++------
> > > > >  fs/ceph/super.c      | 13 +++++++++----
> > > > >  fs/ceph/super.h      |  1 -
> > > > >  4 files changed, 14 insertions(+), 12 deletions(-)
> > > > >
> > > > > --
> > > > > 2.26.2
> > > > >
>
> --
> Jeff Layton <jlayton@kernel.org>
>
Jeff Layton Sept. 30, 2020, 5:55 p.m. UTC | #8
On Wed, 2020-09-30 at 16:45 +0800, Yan, Zheng wrote:
> On Wed, Sep 30, 2020 at 3:50 AM Jeff Layton <jlayton@kernel.org> wrote:
> > On Tue, 2020-09-29 at 18:44 +0800, Yan, Zheng wrote:
> > > On Tue, Sep 29, 2020 at 4:55 PM Ilya Dryomov <idryomov@gmail.com> wrote:
> > > > On Tue, Sep 29, 2020 at 10:28 AM Yan, Zheng <ukernel@gmail.com> wrote:
> > > > > On Fri, Sep 25, 2020 at 10:08 PM Jeff Layton <jlayton@kernel.org> wrote:
> > > > > > Ilya noticed that he would get spurious EACCES errors on calls done just
> > > > > > after blocklisting the client on mounts with recover_session=clean. The
> > > > > > session would get marked as REJECTED and that caused in-flight calls to
> > > > > > die with EACCES. This patchset seems to smooth over the problem, but I'm
> > > > > > not fully convinced it's the right approach.
> > > > > > 
> > > > > 
> > > > > the root is cause is that client does not recover session instantly
> > > > > after getting rejected by mds. Before session gets recovered, client
> > > > > continues to return error.
> > > > 
> > > > Hi Zheng,
> > > > 
> > > > I don't think it's about whether that happens instantly or not.
> > > > In the example from [1], the first "ls" would fail even if issued
> > > > minutes after the session reject message and the reconnect.  From
> > > > the user's POV it is well after the automatic recovery promised by
> > > > recover_session=clean.
> > > > 
> > > > [1] https://tracker.ceph.com/issues/47385
> > > 
> > > Reconnect should close all old session. It's likely because that
> > > client didn't detect it's blacklisted.
> > > 
> > 
> > I should have described this better -- let me explain:
> > 
> > It did detect that it was blocklisted (almost immediately) because the
> > MDS shuts down the session. I think it immediately sends a
> > SESSION_REJECT message when blacklisting and indicates that it has been
> > blocklisted.
> > 
> > At that point the session is CEPH_MDS_SESSION_REJECTED. The next MDS
> > calls through would see that it was in that state and would return
> > -EACCES. Eventually, the delayed work runs and then the session gets
> > reconnected, and further calls proceed normally.
> > 
> > So, I think this is just a timing thing for the most part. The workqueue
> > job runs on a delay of round_jiffies_relative(HZ * 5);, and that's long
> > enough for the disruption to be noticeable.
> > 
> > While this was happening during 'ls' for Ilya, it could happen in
> > anything that involves sending a request to the MDS. I think we want to
> > prevent new opens from erroring out during this window if we can.
> > 
> > The real question is whether this is safe in all cases. For instance, if
> > the call that we're idling is dependent on holding certain caps, then
> > it's possible we will have lost them when we got REJECTED.
> > 
> 
> The session in rejected state is new session. It should hold no caps.
> 

Right.

We're actually OK here wrt to async requests as they will return
EJUKEBOX and the caller will redrive a synchronous request. Other
MClientRequest calls don't require that the client hold any caps,
AFAICT, so idling them until we can establish a new session should be
OK, no?

> > Hmm...so that means patch 4/4 is probably wrong. I'll comment further in
> > a reply to that patch.
> > 
> > > > Thanks,
> > > > 
> > > >                 Ilya
> > > > 
> > > > > > The potential issue I see is that the client could take cap references to
> > > > > > do a call on a session that has been blocklisted. We then queue the
> > > > > > message and reestablish the session, but we may not have been granted
> > > > > > the same caps by the MDS at that point.
> > > > > > 
> > > > > > If this is a problem, then we probably need to rework it so that we
> > > > > > return a distinct error code in this situation and have the upper layers
> > > > > > issue a completely new mds request (with new cap refs, etc.)
> > > > > > 
> > > > > > Obviously, that's a much more invasive approach though, so it would be
> > > > > > nice to avoid that if this would suffice.
> > > > > > 
> > > > > > Jeff Layton (4):
> > > > > >   ceph: don't WARN when removing caps due to blocklisting
> > > > > >   ceph: don't mark mount as SHUTDOWN when recovering session
> > > > > >   ceph: remove timeout on allowing reconnect after blocklisting
> > > > > >   ceph: queue request when CLEANRECOVER is set
> > > > > > 
> > > > > >  fs/ceph/caps.c       |  2 +-
> > > > > >  fs/ceph/mds_client.c | 10 ++++------
> > > > > >  fs/ceph/super.c      | 13 +++++++++----
> > > > > >  fs/ceph/super.h      |  1 -
> > > > > >  4 files changed, 14 insertions(+), 12 deletions(-)
> > > > > > 
> > > > > > --
> > > > > > 2.26.2
> > > > > > 
> > 
> > --
> > Jeff Layton <jlayton@kernel.org>
> >