[v4,0/5] ceph: fix spurious recover_session=clean errors

Message ID 20201007121700.10489-1-jlayton@kernel.org (mailing list archive)

Message

Jeff Layton Oct. 7, 2020, 12:16 p.m. UTC
v4: test for CEPH_MOUNT_RECOVER in more places
v3: add RECOVER mount_state and allow dumping pagecache when it's set
    shrink size of mount_state field
v2: fix handling of async requests in patch to queue requests

This is the fourth revision of this patchset. The main difference from
v3 is that this one converts more "==" tests for SHUTDOWN state into
">=", so that the RECOVER state is treated the same way.
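
To illustrate the "==" versus ">=" point, here is a minimal sketch. The
enum and helper names are hypothetical stand-ins, not the kernel's
definitions, and it assumes RECOVER is ordered after SHUTDOWN, which is
what the ">=" conversion relies on.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the mount states; only the ordering matters. */
enum sketch_mount_state {
	SKETCH_MOUNT_MOUNTING,
	SKETCH_MOUNT_MOUNTED,
	SKETCH_MOUNT_UNMOUNTING,
	SKETCH_MOUNT_UNMOUNTED,
	SKETCH_MOUNT_SHUTDOWN,
	SKETCH_MOUNT_RECOVER,	/* assumption: sorts after SHUTDOWN */
};

/* Old-style test: matches SHUTDOWN only, so RECOVER is not covered. */
static bool sketch_is_shutdown_eq(int mount_state)
{
	return mount_state == SKETCH_MOUNT_SHUTDOWN;
}

/* New-style test: matches SHUTDOWN and any state ordered after it. */
static bool sketch_is_shutdown_ge(int mount_state)
{
	return mount_state >= SKETCH_MOUNT_SHUTDOWN;
}
```

With this ordering, a site that used the "==" form would treat a mount in
RECOVER like a healthy one, while the ">=" form handles it like SHUTDOWN.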

Original cover letter:

Ilya noticed that he would get spurious EACCES errors on calls done just
after blocklisting the client on mounts with recover_session=clean. The
session would get marked as REJECTED and that caused in-flight calls to
die with EACCES. This patchset seems to smooth over the problem, but I'm
not fully convinced it's the right approach.
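
A rough sketch of the behavior change described above, assuming a
simplified model: a request that hits a REJECTED session is queued when
clean recovery is enabled, and otherwise fails with EACCES as before.
All types and names here are hypothetical and only model the decision,
not the real request path.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, simplified session states. */
enum sketch_session_state { SKETCH_SESSION_OPEN, SKETCH_SESSION_REJECTED };

/* What to do with a request: send it, park it, or fail it. */
enum sketch_action { SKETCH_SEND, SKETCH_QUEUE, SKETCH_FAIL };

struct sketch_decision {
	enum sketch_action action;
	int err;	/* negative errno when action is SKETCH_FAIL, else 0 */
};

/* Decide how to handle a request given the session state and whether
 * the mount uses recover_session=clean (modeled as a plain bool). */
static struct sketch_decision
sketch_submit(enum sketch_session_state state, bool clean_recover)
{
	struct sketch_decision d = { SKETCH_SEND, 0 };

	if (state == SKETCH_SESSION_REJECTED) {
		if (clean_recover) {
			/* Wait for the session to be reestablished. */
			d.action = SKETCH_QUEUE;
		} else {
			/* No recovery configured: fail the call. */
			d.action = SKETCH_FAIL;
			d.err = -EACCES;
		}
	}
	return d;
}
```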

The potential issue I see is that the client could take cap references to
do a call on a session that has been blocklisted. We then queue the
message and reestablish the session, but we may not have been granted
the same caps by the MDS at that point.

If this is a problem, then we probably need to rework it so that we
return a distinct error code in this situation and have the upper layers
issue a completely new mds request (with new cap refs, etc.)

Obviously, that's a much more invasive approach though, so it would be
nice to avoid that if this would suffice.
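
For comparison, the more invasive alternative could look roughly like
the sketch below: the lower layer returns a distinct error, and the
caller reissues the request with fresh cap references. This is purely
illustrative. The error value, struct and function names are all
hypothetical, and the first attempt is hard-wired to fail so the retry
path is visible.

```c
/* Hypothetical stand-in for "a distinct error code"; no real errno is implied. */
#define SKETCH_ERR_SESSION_RESET 10001

struct sketch_req {
	int attempts;		/* how many times the request was issued */
	int cap_refs_taken;	/* how many times cap refs were (re)acquired */
};

/* Pretend the first attempt lands on a blocklisted session. */
static int sketch_do_request(struct sketch_req *r)
{
	r->attempts++;
	return r->attempts == 1 ? -SKETCH_ERR_SESSION_RESET : 0;
}

/* Upper layer: on the distinct error, build a completely new request
 * (modeled here as taking cap refs again) and retry, up to max_retries. */
static int sketch_call(struct sketch_req *r, int max_retries)
{
	int err;

	do {
		r->cap_refs_taken++;	/* take fresh cap refs for this attempt */
		err = sketch_do_request(r);
		/* cap refs for this attempt would be dropped here */
	} while (err == -SKETCH_ERR_SESSION_RESET && max_retries-- > 0);

	return err;
}
```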

Jeff Layton (5):
  ceph: don't WARN when removing caps due to blocklisting
  ceph: make fsc->mount_state an int
  ceph: add new RECOVER mount_state when recovering session
  ceph: remove timeout on allowing reconnect after blocklisting
  ceph: queue MDS requests to REJECTED sessions when CLEANRECOVER is set

 fs/ceph/addr.c               |  4 ++--
 fs/ceph/caps.c               |  4 ++--
 fs/ceph/inode.c              |  2 +-
 fs/ceph/mds_client.c         | 27 ++++++++++++++++-----------
 fs/ceph/super.c              | 14 ++++++++++----
 fs/ceph/super.h              |  3 +--
 include/linux/ceph/libceph.h |  1 +
 7 files changed, 33 insertions(+), 22 deletions(-)

Comments

Xiubo Li Oct. 20, 2020, 7:03 a.m. UTC | #1
Reviewed-by: Xiubo Li <xiubli@redhat.com>

Yan, Zheng Oct. 21, 2020, 1:51 p.m. UTC | #2
series
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
