[0/9] ceph: auto reconnect after blacklisted

Message ID	20190703124442.6614-1-zyan@redhat.com (mailing list archive)
Headers	show Return-Path: <ceph-devel-owner@kernel.org> From: "Yan, Zheng" <zyan@redhat.com> To: ceph-devel@vger.kernel.org Cc: idryomov@redhat.com, jlayton@redhat.com, "Yan, Zheng" <zyan@redhat.com> Subject: [PATCH 0/9] ceph: auto reconnect after blacklisted Date: Wed, 3 Jul 2019 20:44:33 +0800 Message-Id: <20190703124442.6614-1-zyan@redhat.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Sender: ceph-devel-owner@vger.kernel.org Precedence: bulk
Series	ceph: auto reconnect after blacklisted \| expand [0/9] ceph: auto reconnect after blacklisted [1/9] libceph: add function that reset client's entity addr [2/9] libceph: add function that clears osd client's abort_err [3/9] ceph: allow closing session in restarting/reconnect state [4/9] ceph: track and report error of async metadata operation [5/9] ceph: pass filp to ceph_get_caps() [6/9] ceph: return -EIO if read/write against filp that lost file locks [7/9] ceph: add 'force_reconnect' option for remount [8/9] ceph: invalidate all write mode filp after reconnect [9/9] ceph: auto reconnect after blacklisted

Yan, Zheng July 3, 2019, 12:44 p.m. UTC

This series add support for auto reconnect after blacklisted.

Auto reconnect is controlled by recover_session=<clean|no> mount option.
Clean mode is enabled by default. In this mode, client drops dirty date
and dirty metadata, All writable file handles are invalidated. Read-only
file handles continue to work and caches are dropped if necessary.

If an inode contains any lost file lock, read and write are not allowed.
until all lost file locks are released.

Yan, Zheng (9):
  libceph: add function that reset client's entity addr
  libceph: add function that clears osd client's abort_err
  ceph: allow closing session in restarting/reconnect state
  ceph: track and report error of async metadata operation
  ceph: pass filp to ceph_get_caps()
  ceph: return -EIO if read/write against filp that lost file locks
  ceph: add 'force_reconnect' option for remount
  ceph: invalidate all write mode filp after reconnect
  ceph: auto reconnect after blacklisted

 fs/ceph/addr.c                  | 30 +++++++----
 fs/ceph/caps.c                  | 84 ++++++++++++++++++++----------
 fs/ceph/file.c                  | 50 ++++++++++--------
 fs/ceph/inode.c                 |  2 +
 fs/ceph/locks.c                 |  8 ++-
 fs/ceph/mds_client.c            | 92 ++++++++++++++++++++++++++-------
 fs/ceph/mds_client.h            |  6 +--
 fs/ceph/super.c                 | 91 ++++++++++++++++++++++++++++++--
 fs/ceph/super.h                 | 23 +++++++--
 include/linux/ceph/libceph.h    |  1 +
 include/linux/ceph/messenger.h  |  1 +
 include/linux/ceph/mon_client.h |  1 +
 include/linux/ceph/osd_client.h |  2 +
 net/ceph/ceph_common.c          | 38 +++++++++-----
 net/ceph/messenger.c            |  5 ++
 net/ceph/mon_client.c           |  7 +++
 net/ceph/osd_client.c           | 24 +++++++++
 17 files changed, 365 insertions(+), 100 deletions(-)

Jeff Layton July 3, 2019, 4:01 p.m. UTC | #1

On Wed, 2019-07-03 at 20:44 +0800, Yan, Zheng wrote:
> This series add support for auto reconnect after blacklisted.
> 
> Auto reconnect is controlled by recover_session=<clean|no> mount option.
> Clean mode is enabled by default. In this mode, client drops dirty date
> and dirty metadata, All writable file handles are invalidated. Read-only
> file handles continue to work and caches are dropped if necessary.
> If an inode contains any lost file lock, read and write are not allowed.
> until all lost file locks are released.

Just giving this a quick glance:

Based on the last email discussion about this, I thought that you were
going to provide a mount option that someone could enable that would
basically allow the client to "soldier on" in the face of being
blacklisted and then unblacklisted, without needing to remount anything.

This set seems to keep the force_reconnect option (patch #7) though, so
I'm quite confused at this point. What exactly is the goal of here?

There's also nothing in the changelogs or comments about
recover_session=brute, which seems like it ought to at least be
mentioned.

At this point, I'm going to say NAK on this set until there is some
accompanying documentation about how you intend for this be used and by
whom. A patch for the mount.ceph(8) manpage would be a good place to
start.

> Yan, Zheng (9):
>   libceph: add function that reset client's entity addr
>   libceph: add function that clears osd client's abort_err
>   ceph: allow closing session in restarting/reconnect state
>   ceph: track and report error of async metadata operation
>   ceph: pass filp to ceph_get_caps()
>   ceph: return -EIO if read/write against filp that lost file locks
>   ceph: add 'force_reconnect' option for remount
>   ceph: invalidate all write mode filp after reconnect
>   ceph: auto reconnect after blacklisted
> 
>  fs/ceph/addr.c                  | 30 +++++++----
>  fs/ceph/caps.c                  | 84 ++++++++++++++++++++----------
>  fs/ceph/file.c                  | 50 ++++++++++--------
>  fs/ceph/inode.c                 |  2 +
>  fs/ceph/locks.c                 |  8 ++-
>  fs/ceph/mds_client.c            | 92 ++++++++++++++++++++++++++-------
>  fs/ceph/mds_client.h            |  6 +--
>  fs/ceph/super.c                 | 91 ++++++++++++++++++++++++++++++--
>  fs/ceph/super.h                 | 23 +++++++--
>  include/linux/ceph/libceph.h    |  1 +
>  include/linux/ceph/messenger.h  |  1 +
>  include/linux/ceph/mon_client.h |  1 +
>  include/linux/ceph/osd_client.h |  2 +
>  net/ceph/ceph_common.c          | 38 +++++++++-----
>  net/ceph/messenger.c            |  5 ++
>  net/ceph/mon_client.c           |  7 +++
>  net/ceph/osd_client.c           | 24 +++++++++
>  17 files changed, 365 insertions(+), 100 deletions(-)
>

Jeff Layton July 3, 2019, 7:25 p.m. UTC | #2

On Wed, 2019-07-03 at 12:01 -0400, Jeff Layton wrote:
> On Wed, 2019-07-03 at 20:44 +0800, Yan, Zheng wrote:
> > This series add support for auto reconnect after blacklisted.
> > 
> > Auto reconnect is controlled by recover_session=<clean|no> mount option.
> > Clean mode is enabled by default. In this mode, client drops dirty date
> > and dirty metadata, All writable file handles are invalidated. Read-only
> > file handles continue to work and caches are dropped if necessary.
> > If an inode contains any lost file lock, read and write are not allowed.
> > until all lost file locks are released.
> 
> Just giving this a quick glance:
> 
> Based on the last email discussion about this, I thought that you were
> going to provide a mount option that someone could enable that would
> basically allow the client to "soldier on" in the face of being
> blacklisted and then unblacklisted, without needing to remount anything.
> 
> This set seems to keep the force_reconnect option (patch #7) though, so
> I'm quite confused at this point. What exactly is the goal of here?
> 

s/of here/of this/

> There's also nothing in the changelogs or comments about
> recover_session=brute, which seems like it ought to at least be
> mentioned.
> 
> At this point, I'm going to say NAK on this set until there is some
> accompanying documentation about how you intend for this be used and by
> whom. A patch for the mount.ceph(8) manpage would be a good place to
> start.
> 

So to be clear, I'd suggest (at a minimum):

* writing an RFC patch for the mount.ceph manpage that explains what
these options are and the expected behavior when they are used. In fact,
I'd start with this before you do any more work on the code. We need to
agree on the user-facing interface before anything else.

* consider dropping the force_reconnect option. It seems like this
shouldn't be needed if you've mounted with recover_session=brute. If you
do decide to keep it, then this also needs to be clearly documented, or
someone will attempt to use this to unwedge their client at some point
without fully understanding the ramifications.

* the cover letter should have a clear explanation of what this patchset
is intended to do, and why you're adding it. Yes, I know we've discussed
this on the list before, but in 5 years when someone new is revisiting
the behavior and we've all moved on to do other things and paged this
out of our brains, we want them to be able to find that rationale, and
the associated discussion around it by searching the mailing list
archives.

* I'll also note that this is going to cause a behavior change. Clients
that have been blacklisted and then unblacklisted are going to see
different behavior if they're running on a kernel with this set. That
may be expected and what we want, but it should be clearly stated in the
changelog of the patch that makes this change. The manpage should
probably have a nice detailed section on all of this as well, so that we
can explain the expected behavior on various kernel versions.

> > Yan, Zheng (9):
> >   libceph: add function that reset client's entity addr
> >   libceph: add function that clears osd client's abort_err
> >   ceph: allow closing session in restarting/reconnect state
> >   ceph: track and report error of async metadata operation
> >   ceph: pass filp to ceph_get_caps()
> >   ceph: return -EIO if read/write against filp that lost file locks
> >   ceph: add 'force_reconnect' option for remount
> >   ceph: invalidate all write mode filp after reconnect
> >   ceph: auto reconnect after blacklisted
> > 
> >  fs/ceph/addr.c                  | 30 +++++++----
> >  fs/ceph/caps.c                  | 84 ++++++++++++++++++++----------
> >  fs/ceph/file.c                  | 50 ++++++++++--------
> >  fs/ceph/inode.c                 |  2 +
> >  fs/ceph/locks.c                 |  8 ++-
> >  fs/ceph/mds_client.c            | 92 ++++++++++++++++++++++++++-------
> >  fs/ceph/mds_client.h            |  6 +--
> >  fs/ceph/super.c                 | 91 ++++++++++++++++++++++++++++++--
> >  fs/ceph/super.h                 | 23 +++++++--
> >  include/linux/ceph/libceph.h    |  1 +
> >  include/linux/ceph/messenger.h  |  1 +
> >  include/linux/ceph/mon_client.h |  1 +
> >  include/linux/ceph/osd_client.h |  2 +
> >  net/ceph/ceph_common.c          | 38 +++++++++-----
> >  net/ceph/messenger.c            |  5 ++
> >  net/ceph/mon_client.c           |  7 +++
> >  net/ceph/osd_client.c           | 24 +++++++++
> >  17 files changed, 365 insertions(+), 100 deletions(-)
> >

Yan, Zheng July 4, 2019, 1:30 a.m. UTC | #3

On 7/4/19 12:01 AM, Jeff Layton wrote:
> On Wed, 2019-07-03 at 20:44 +0800, Yan, Zheng wrote:
>> This series add support for auto reconnect after blacklisted.
>>
>> Auto reconnect is controlled by recover_session=<clean|no> mount option.
>> Clean mode is enabled by default. In this mode, client drops dirty date
>> and dirty metadata, All writable file handles are invalidated. Read-only
>> file handles continue to work and caches are dropped if necessary.
>> If an inode contains any lost file lock, read and write are not allowed.
>> until all lost file locks are released.
> 
> Just giving this a quick glance:
> 
> Based on the last email discussion about this, I thought that you were
> going to provide a mount option that someone could enable that would
> basically allow the client to "soldier on" in the face of being
> blacklisted and then unblacklisted, without needing to remount anything.
> 
> This set seems to keep the force_reconnect option (patch #7) though, so
> I'm quite confused at this point. What exactly is the goal of here?
> 

because auto reconnect can be disabled, force_reconnect is the manual 
way to fix blacklistd mount.

> There's also nothing in the changelogs or comments about
> recover_session=brute, which seems like it ought to at least be
> mentioned.

brute code is not enabled yet
> 
> At this point, I'm going to say NAK on this set until there is some
> accompanying documentation about how you intend for this be used and by
> whom. A patch for the mount.ceph(8) manpage would be a good place to
> start.
> 
>> Yan, Zheng (9):
>>    libceph: add function that reset client's entity addr
>>    libceph: add function that clears osd client's abort_err
>>    ceph: allow closing session in restarting/reconnect state
>>    ceph: track and report error of async metadata operation
>>    ceph: pass filp to ceph_get_caps()
>>    ceph: return -EIO if read/write against filp that lost file locks
>>    ceph: add 'force_reconnect' option for remount
>>    ceph: invalidate all write mode filp after reconnect
>>    ceph: auto reconnect after blacklisted
>>
>>   fs/ceph/addr.c                  | 30 +++++++----
>>   fs/ceph/caps.c                  | 84 ++++++++++++++++++++----------
>>   fs/ceph/file.c                  | 50 ++++++++++--------
>>   fs/ceph/inode.c                 |  2 +
>>   fs/ceph/locks.c                 |  8 ++-
>>   fs/ceph/mds_client.c            | 92 ++++++++++++++++++++++++++-------
>>   fs/ceph/mds_client.h            |  6 +--
>>   fs/ceph/super.c                 | 91 ++++++++++++++++++++++++++++++--
>>   fs/ceph/super.h                 | 23 +++++++--
>>   include/linux/ceph/libceph.h    |  1 +
>>   include/linux/ceph/messenger.h  |  1 +
>>   include/linux/ceph/mon_client.h |  1 +
>>   include/linux/ceph/osd_client.h |  2 +
>>   net/ceph/ceph_common.c          | 38 +++++++++-----
>>   net/ceph/messenger.c            |  5 ++
>>   net/ceph/mon_client.c           |  7 +++
>>   net/ceph/osd_client.c           | 24 +++++++++
>>   17 files changed, 365 insertions(+), 100 deletions(-)
>>
>

Jeff Layton July 4, 2019, 2:26 p.m. UTC | #4

On Thu, 2019-07-04 at 09:30 +0800, Yan, Zheng wrote:
> On 7/4/19 12:01 AM, Jeff Layton wrote:
> > On Wed, 2019-07-03 at 20:44 +0800, Yan, Zheng wrote:
> > > This series add support for auto reconnect after blacklisted.
> > > 
> > > Auto reconnect is controlled by recover_session=<clean|no> mount option.
> > > Clean mode is enabled by default. In this mode, client drops dirty date
> > > and dirty metadata, All writable file handles are invalidated. Read-only
> > > file handles continue to work and caches are dropped if necessary.
> > > If an inode contains any lost file lock, read and write are not allowed.
> > > until all lost file locks are released.
> > 
> > Just giving this a quick glance:
> > 
> > Based on the last email discussion about this, I thought that you were
> > going to provide a mount option that someone could enable that would
> > basically allow the client to "soldier on" in the face of being
> > blacklisted and then unblacklisted, without needing to remount anything.
> > 
> > This set seems to keep the force_reconnect option (patch #7) though, so
> > I'm quite confused at this point. What exactly is the goal of here?
> > 
> 
> because auto reconnect can be disabled, force_reconnect is the manual 
> way to fix blacklistd mount.
> 

Why not instead allow remounting with a different recover_session= mode?
Then you wouldn't need this option that's only valid during a remount.
That seems like a more natural way to use a new mount option.

> > There's also nothing in the changelogs or comments about
> > recover_session=brute, which seems like it ought to at least be
> > mentioned.
> 
> brute code is not enabled yet

Got it -- I missed that that the mount option for it was commented out.

Given that this is a user interface change, I think it'd be best to not
merge merge this until it's complete. Otherwise we'll have to deal with
intermediate kernel versions that don't implement some parts of the new
interface. That's makes it more difficult for admins to use (and for us
to document).

> > At this point, I'm going to say NAK on this set until there is some
> > accompanying documentation about how you intend for this be used and by
> > whom. A patch for the mount.ceph(8) manpage would be a good place to
> > start.
> > 
> > > Yan, Zheng (9):
> > >    libceph: add function that reset client's entity addr
> > >    libceph: add function that clears osd client's abort_err
> > >    ceph: allow closing session in restarting/reconnect state
> > >    ceph: track and report error of async metadata operation
> > >    ceph: pass filp to ceph_get_caps()
> > >    ceph: return -EIO if read/write against filp that lost file locks
> > >    ceph: add 'force_reconnect' option for remount
> > >    ceph: invalidate all write mode filp after reconnect
> > >    ceph: auto reconnect after blacklisted
> > > 
> > >   fs/ceph/addr.c                  | 30 +++++++----
> > >   fs/ceph/caps.c                  | 84 ++++++++++++++++++++----------
> > >   fs/ceph/file.c                  | 50 ++++++++++--------
> > >   fs/ceph/inode.c                 |  2 +
> > >   fs/ceph/locks.c                 |  8 ++-
> > >   fs/ceph/mds_client.c            | 92 ++++++++++++++++++++++++++-------
> > >   fs/ceph/mds_client.h            |  6 +--
> > >   fs/ceph/super.c                 | 91 ++++++++++++++++++++++++++++++--
> > >   fs/ceph/super.h                 | 23 +++++++--
> > >   include/linux/ceph/libceph.h    |  1 +
> > >   include/linux/ceph/messenger.h  |  1 +
> > >   include/linux/ceph/mon_client.h |  1 +
> > >   include/linux/ceph/osd_client.h |  2 +
> > >   net/ceph/ceph_common.c          | 38 +++++++++-----
> > >   net/ceph/messenger.c            |  5 ++
> > >   net/ceph/mon_client.c           |  7 +++
> > >   net/ceph/osd_client.c           | 24 +++++++++
> > >   17 files changed, 365 insertions(+), 100 deletions(-)
> > >

Yan, Zheng July 5, 2019, 1:17 a.m. UTC | #5

On Thu, Jul 4, 2019 at 10:30 PM Jeff Layton <jlayton@redhat.com> wrote:
>
> On Thu, 2019-07-04 at 09:30 +0800, Yan, Zheng wrote:
> > On 7/4/19 12:01 AM, Jeff Layton wrote:
> > > On Wed, 2019-07-03 at 20:44 +0800, Yan, Zheng wrote:
> > > > This series add support for auto reconnect after blacklisted.
> > > >
> > > > Auto reconnect is controlled by recover_session=<clean|no> mount option.
> > > > Clean mode is enabled by default. In this mode, client drops dirty date
> > > > and dirty metadata, All writable file handles are invalidated. Read-only
> > > > file handles continue to work and caches are dropped if necessary.
> > > > If an inode contains any lost file lock, read and write are not allowed.
> > > > until all lost file locks are released.
> > >
> > > Just giving this a quick glance:
> > >
> > > Based on the last email discussion about this, I thought that you were
> > > going to provide a mount option that someone could enable that would
> > > basically allow the client to "soldier on" in the face of being
> > > blacklisted and then unblacklisted, without needing to remount anything.
> > >
> > > This set seems to keep the force_reconnect option (patch #7) though, so
> > > I'm quite confused at this point. What exactly is the goal of here?
> > >
> >
> > because auto reconnect can be disabled, force_reconnect is the manual
> > way to fix blacklistd mount.
> >
>
> Why not instead allow remounting with a different recover_session= mode?
> Then you wouldn't need this option that's only valid during a remount.
> That seems like a more natural way to use a new mount option.
>

you mean something like 'recover_session=now' for remount?


> > > There's also nothing in the changelogs or comments about
> > > recover_session=brute, which seems like it ought to at least be
> > > mentioned.
> >
> > brute code is not enabled yet
>
> Got it -- I missed that that the mount option for it was commented out.
>
> Given that this is a user interface change, I think it'd be best to not
> merge merge this until it's complete. Otherwise we'll have to deal with
> intermediate kernel versions that don't implement some parts of the new
> interface. That's makes it more difficult for admins to use (and for us
> to document).
>
> > > At this point, I'm going to say NAK on this set until there is some
> > > accompanying documentation about how you intend for this be used and by
> > > whom. A patch for the mount.ceph(8) manpage would be a good place to
> > > start.
> > >
> > > > Yan, Zheng (9):
> > > >    libceph: add function that reset client's entity addr
> > > >    libceph: add function that clears osd client's abort_err
> > > >    ceph: allow closing session in restarting/reconnect state
> > > >    ceph: track and report error of async metadata operation
> > > >    ceph: pass filp to ceph_get_caps()
> > > >    ceph: return -EIO if read/write against filp that lost file locks
> > > >    ceph: add 'force_reconnect' option for remount
> > > >    ceph: invalidate all write mode filp after reconnect
> > > >    ceph: auto reconnect after blacklisted
> > > >
> > > >   fs/ceph/addr.c                  | 30 +++++++----
> > > >   fs/ceph/caps.c                  | 84 ++++++++++++++++++++----------
> > > >   fs/ceph/file.c                  | 50 ++++++++++--------
> > > >   fs/ceph/inode.c                 |  2 +
> > > >   fs/ceph/locks.c                 |  8 ++-
> > > >   fs/ceph/mds_client.c            | 92 ++++++++++++++++++++++++++-------
> > > >   fs/ceph/mds_client.h            |  6 +--
> > > >   fs/ceph/super.c                 | 91 ++++++++++++++++++++++++++++++--
> > > >   fs/ceph/super.h                 | 23 +++++++--
> > > >   include/linux/ceph/libceph.h    |  1 +
> > > >   include/linux/ceph/messenger.h  |  1 +
> > > >   include/linux/ceph/mon_client.h |  1 +
> > > >   include/linux/ceph/osd_client.h |  2 +
> > > >   net/ceph/ceph_common.c          | 38 +++++++++-----
> > > >   net/ceph/messenger.c            |  5 ++
> > > >   net/ceph/mon_client.c           |  7 +++
> > > >   net/ceph/osd_client.c           | 24 +++++++++
> > > >   17 files changed, 365 insertions(+), 100 deletions(-)
> > > >
>
> --
> Jeff Layton <jlayton@redhat.com>
>

Jeff Layton July 5, 2019, 10:16 a.m. UTC | #6

On Fri, 2019-07-05 at 09:17 +0800, Yan, Zheng wrote:
> On Thu, Jul 4, 2019 at 10:30 PM Jeff Layton <jlayton@redhat.com> wrote:
> > On Thu, 2019-07-04 at 09:30 +0800, Yan, Zheng wrote:
> > > On 7/4/19 12:01 AM, Jeff Layton wrote:
> > > > On Wed, 2019-07-03 at 20:44 +0800, Yan, Zheng wrote:
> > > > > This series add support for auto reconnect after blacklisted.
> > > > > 
> > > > > Auto reconnect is controlled by recover_session=<clean|no> mount option.
> > > > > Clean mode is enabled by default. In this mode, client drops dirty date
> > > > > and dirty metadata, All writable file handles are invalidated. Read-only
> > > > > file handles continue to work and caches are dropped if necessary.
> > > > > If an inode contains any lost file lock, read and write are not allowed.
> > > > > until all lost file locks are released.
> > > > 
> > > > Just giving this a quick glance:
> > > > 
> > > > Based on the last email discussion about this, I thought that you were
> > > > going to provide a mount option that someone could enable that would
> > > > basically allow the client to "soldier on" in the face of being
> > > > blacklisted and then unblacklisted, without needing to remount anything.
> > > > 
> > > > This set seems to keep the force_reconnect option (patch #7) though, so
> > > > I'm quite confused at this point. What exactly is the goal of here?
> > > > 
> > > 
> > > because auto reconnect can be disabled, force_reconnect is the manual
> > > way to fix blacklistd mount.
> > > 
> > 
> > Why not instead allow remounting with a different recover_session= mode?
> > Then you wouldn't need this option that's only valid during a remount.
> > That seems like a more natural way to use a new mount option.
> > 
> 
> you mean something like 'recover_session=now' for remount?
> 
> 

No, I meant something like:

    -o remount,recover_session=brute

IOW, allow the admin to just change the mount to use a different
recover_session= mode once things are stuck.
 
> > > > There's also nothing in the changelogs or comments about
> > > > recover_session=brute, which seems like it ought to at least be
> > > > mentioned.
> > > 
> > > brute code is not enabled yet
> > 
> > Got it -- I missed that that the mount option for it was commented out.
> > 
> > Given that this is a user interface change, I think it'd be best to not
> > merge merge this until it's complete. Otherwise we'll have to deal with
> > intermediate kernel versions that don't implement some parts of the new
> > interface. That's makes it more difficult for admins to use (and for us
> > to document).
> > 
> > > > At this point, I'm going to say NAK on this set until there is some
> > > > accompanying documentation about how you intend for this be used and by
> > > > whom. A patch for the mount.ceph(8) manpage would be a good place to
> > > > start.
> > > > 
> > > > > Yan, Zheng (9):
> > > > >    libceph: add function that reset client's entity addr
> > > > >    libceph: add function that clears osd client's abort_err
> > > > >    ceph: allow closing session in restarting/reconnect state
> > > > >    ceph: track and report error of async metadata operation
> > > > >    ceph: pass filp to ceph_get_caps()
> > > > >    ceph: return -EIO if read/write against filp that lost file locks
> > > > >    ceph: add 'force_reconnect' option for remount
> > > > >    ceph: invalidate all write mode filp after reconnect
> > > > >    ceph: auto reconnect after blacklisted
> > > > > 
> > > > >   fs/ceph/addr.c                  | 30 +++++++----
> > > > >   fs/ceph/caps.c                  | 84 ++++++++++++++++++++----------
> > > > >   fs/ceph/file.c                  | 50 ++++++++++--------
> > > > >   fs/ceph/inode.c                 |  2 +
> > > > >   fs/ceph/locks.c                 |  8 ++-
> > > > >   fs/ceph/mds_client.c            | 92 ++++++++++++++++++++++++++-------
> > > > >   fs/ceph/mds_client.h            |  6 +--
> > > > >   fs/ceph/super.c                 | 91 ++++++++++++++++++++++++++++++--
> > > > >   fs/ceph/super.h                 | 23 +++++++--
> > > > >   include/linux/ceph/libceph.h    |  1 +
> > > > >   include/linux/ceph/messenger.h  |  1 +
> > > > >   include/linux/ceph/mon_client.h |  1 +
> > > > >   include/linux/ceph/osd_client.h |  2 +
> > > > >   net/ceph/ceph_common.c          | 38 +++++++++-----
> > > > >   net/ceph/messenger.c            |  5 ++
> > > > >   net/ceph/mon_client.c           |  7 +++
> > > > >   net/ceph/osd_client.c           | 24 +++++++++
> > > > >   17 files changed, 365 insertions(+), 100 deletions(-)
> > > > > 
> > 
> > --
> > Jeff Layton <jlayton@redhat.com>
> >

Yan, Zheng July 5, 2019, 11:26 a.m. UTC | #7

On Fri, Jul 5, 2019 at 6:16 PM Jeff Layton <jlayton@redhat.com> wrote:
>
> On Fri, 2019-07-05 at 09:17 +0800, Yan, Zheng wrote:
> > On Thu, Jul 4, 2019 at 10:30 PM Jeff Layton <jlayton@redhat.com> wrote:
> > > On Thu, 2019-07-04 at 09:30 +0800, Yan, Zheng wrote:
> > > > On 7/4/19 12:01 AM, Jeff Layton wrote:
> > > > > On Wed, 2019-07-03 at 20:44 +0800, Yan, Zheng wrote:
> > > > > > This series add support for auto reconnect after blacklisted.
> > > > > >
> > > > > > Auto reconnect is controlled by recover_session=<clean|no> mount option.
> > > > > > Clean mode is enabled by default. In this mode, client drops dirty date
> > > > > > and dirty metadata, All writable file handles are invalidated. Read-only
> > > > > > file handles continue to work and caches are dropped if necessary.
> > > > > > If an inode contains any lost file lock, read and write are not allowed.
> > > > > > until all lost file locks are released.
> > > > >
> > > > > Just giving this a quick glance:
> > > > >
> > > > > Based on the last email discussion about this, I thought that you were
> > > > > going to provide a mount option that someone could enable that would
> > > > > basically allow the client to "soldier on" in the face of being
> > > > > blacklisted and then unblacklisted, without needing to remount anything.
> > > > >
> > > > > This set seems to keep the force_reconnect option (patch #7) though, so
> > > > > I'm quite confused at this point. What exactly is the goal of here?
> > > > >
> > > >
> > > > because auto reconnect can be disabled, force_reconnect is the manual
> > > > way to fix blacklistd mount.
> > > >
> > >
> > > Why not instead allow remounting with a different recover_session= mode?
> > > Then you wouldn't need this option that's only valid during a remount.
> > > That seems like a more natural way to use a new mount option.
> > >
> >
> > you mean something like 'recover_session=now' for remount?
> >
> >
>
> No, I meant something like:
>
>     -o remount,recover_session=brute
>

This is confusing. user may just want to change auto reconnect mode
for backlist event in the future, does not want to force reconnect.

> IOW, allow the admin to just change the mount to use a different
> recover_session= mode once things are stuck.
>
> > > > > There's also nothing in the changelogs or comments about
> > > > > recover_session=brute, which seems like it ought to at least be
> > > > > mentioned.
> > > >
> > > > brute code is not enabled yet
> > >
> > > Got it -- I missed that that the mount option for it was commented out.
> > >
> > > Given that this is a user interface change, I think it'd be best to not
> > > merge merge this until it's complete. Otherwise we'll have to deal with
> > > intermediate kernel versions that don't implement some parts of the new
> > > interface. That's makes it more difficult for admins to use (and for us
> > > to document).
> > >
> > > > > At this point, I'm going to say NAK on this set until there is some
> > > > > accompanying documentation about how you intend for this be used and by
> > > > > whom. A patch for the mount.ceph(8) manpage would be a good place to
> > > > > start.
> > > > >
> > > > > > Yan, Zheng (9):
> > > > > >    libceph: add function that reset client's entity addr
> > > > > >    libceph: add function that clears osd client's abort_err
> > > > > >    ceph: allow closing session in restarting/reconnect state
> > > > > >    ceph: track and report error of async metadata operation
> > > > > >    ceph: pass filp to ceph_get_caps()
> > > > > >    ceph: return -EIO if read/write against filp that lost file locks
> > > > > >    ceph: add 'force_reconnect' option for remount
> > > > > >    ceph: invalidate all write mode filp after reconnect
> > > > > >    ceph: auto reconnect after blacklisted
> > > > > >
> > > > > >   fs/ceph/addr.c                  | 30 +++++++----
> > > > > >   fs/ceph/caps.c                  | 84 ++++++++++++++++++++----------
> > > > > >   fs/ceph/file.c                  | 50 ++++++++++--------
> > > > > >   fs/ceph/inode.c                 |  2 +
> > > > > >   fs/ceph/locks.c                 |  8 ++-
> > > > > >   fs/ceph/mds_client.c            | 92 ++++++++++++++++++++++++++-------
> > > > > >   fs/ceph/mds_client.h            |  6 +--
> > > > > >   fs/ceph/super.c                 | 91 ++++++++++++++++++++++++++++++--
> > > > > >   fs/ceph/super.h                 | 23 +++++++--
> > > > > >   include/linux/ceph/libceph.h    |  1 +
> > > > > >   include/linux/ceph/messenger.h  |  1 +
> > > > > >   include/linux/ceph/mon_client.h |  1 +
> > > > > >   include/linux/ceph/osd_client.h |  2 +
> > > > > >   net/ceph/ceph_common.c          | 38 +++++++++-----
> > > > > >   net/ceph/messenger.c            |  5 ++
> > > > > >   net/ceph/mon_client.c           |  7 +++
> > > > > >   net/ceph/osd_client.c           | 24 +++++++++
> > > > > >   17 files changed, 365 insertions(+), 100 deletions(-)
> > > > > >
> > >
> > > --
> > > Jeff Layton <jlayton@redhat.com>
> > >
>
> --
> Jeff Layton <jlayton@redhat.com>
>

Jeff Layton July 5, 2019, 1:22 p.m. UTC | #8

On Fri, 2019-07-05 at 19:26 +0800, Yan, Zheng wrote:
> On Fri, Jul 5, 2019 at 6:16 PM Jeff Layton <jlayton@redhat.com> wrote:
> > On Fri, 2019-07-05 at 09:17 +0800, Yan, Zheng wrote:
> > > On Thu, Jul 4, 2019 at 10:30 PM Jeff Layton <jlayton@redhat.com> wrote:
> > > > On Thu, 2019-07-04 at 09:30 +0800, Yan, Zheng wrote:
> > > > > On 7/4/19 12:01 AM, Jeff Layton wrote:
> > > > > > On Wed, 2019-07-03 at 20:44 +0800, Yan, Zheng wrote:
> > > > > > > This series add support for auto reconnect after blacklisted.
> > > > > > > 
> > > > > > > Auto reconnect is controlled by recover_session=<clean|no> mount option.
> > > > > > > Clean mode is enabled by default. In this mode, client drops dirty date
> > > > > > > and dirty metadata, All writable file handles are invalidated. Read-only
> > > > > > > file handles continue to work and caches are dropped if necessary.
> > > > > > > If an inode contains any lost file lock, read and write are not allowed.
> > > > > > > until all lost file locks are released.
> > > > > > 
> > > > > > Just giving this a quick glance:
> > > > > > 
> > > > > > Based on the last email discussion about this, I thought that you were
> > > > > > going to provide a mount option that someone could enable that would
> > > > > > basically allow the client to "soldier on" in the face of being
> > > > > > blacklisted and then unblacklisted, without needing to remount anything.
> > > > > > 
> > > > > > This set seems to keep the force_reconnect option (patch #7) though, so
> > > > > > I'm quite confused at this point. What exactly is the goal of here?
> > > > > > 
> > > > > 
> > > > > because auto reconnect can be disabled, force_reconnect is the manual
> > > > > way to fix blacklistd mount.
> > > > > 
> > > > 
> > > > Why not instead allow remounting with a different recover_session= mode?
> > > > Then you wouldn't need this option that's only valid during a remount.
> > > > That seems like a more natural way to use a new mount option.
> > > > 
> > > 
> > > you mean something like 'recover_session=now' for remount?
> > > 
> > > 
> > 
> > No, I meant something like:
> > 
> >     -o remount,recover_session=brute
> > 
> 
> This is confusing. user may just want to change auto reconnect mode
> for backlist event in the future, does not want to force reconnect.
> 

Why do we need to allow the admin to manually force a reconnect? If you
(hypothetically) change the mode to "brute" then it should do it on its
own when it detects that it's in this situation, no?

> > IOW, allow the admin to just change the mount to use a different
> > recover_session= mode once things are stuck.
> > 
> > > > > > There's also nothing in the changelogs or comments about
> > > > > > recover_session=brute, which seems like it ought to at least be
> > > > > > mentioned.
> > > > > 
> > > > > brute code is not enabled yet
> > > > 
> > > > Got it -- I missed that that the mount option for it was commented out.
> > > > 
> > > > Given that this is a user interface change, I think it'd be best to not
> > > > merge merge this until it's complete. Otherwise we'll have to deal with
> > > > intermediate kernel versions that don't implement some parts of the new
> > > > interface. That's makes it more difficult for admins to use (and for us
> > > > to document).
> > > > 
> > > > > > At this point, I'm going to say NAK on this set until there is some
> > > > > > accompanying documentation about how you intend for this be used and by
> > > > > > whom. A patch for the mount.ceph(8) manpage would be a good place to
> > > > > > start.
> > > > > > 
> > > > > > > Yan, Zheng (9):
> > > > > > >    libceph: add function that reset client's entity addr
> > > > > > >    libceph: add function that clears osd client's abort_err
> > > > > > >    ceph: allow closing session in restarting/reconnect state
> > > > > > >    ceph: track and report error of async metadata operation
> > > > > > >    ceph: pass filp to ceph_get_caps()
> > > > > > >    ceph: return -EIO if read/write against filp that lost file locks
> > > > > > >    ceph: add 'force_reconnect' option for remount
> > > > > > >    ceph: invalidate all write mode filp after reconnect
> > > > > > >    ceph: auto reconnect after blacklisted
> > > > > > > 
> > > > > > >   fs/ceph/addr.c                  | 30 +++++++----
> > > > > > >   fs/ceph/caps.c                  | 84 ++++++++++++++++++++----------
> > > > > > >   fs/ceph/file.c                  | 50 ++++++++++--------
> > > > > > >   fs/ceph/inode.c                 |  2 +
> > > > > > >   fs/ceph/locks.c                 |  8 ++-
> > > > > > >   fs/ceph/mds_client.c            | 92 ++++++++++++++++++++++++++-------
> > > > > > >   fs/ceph/mds_client.h            |  6 +--
> > > > > > >   fs/ceph/super.c                 | 91 ++++++++++++++++++++++++++++++--
> > > > > > >   fs/ceph/super.h                 | 23 +++++++--
> > > > > > >   include/linux/ceph/libceph.h    |  1 +
> > > > > > >   include/linux/ceph/messenger.h  |  1 +
> > > > > > >   include/linux/ceph/mon_client.h |  1 +
> > > > > > >   include/linux/ceph/osd_client.h |  2 +
> > > > > > >   net/ceph/ceph_common.c          | 38 +++++++++-----
> > > > > > >   net/ceph/messenger.c            |  5 ++
> > > > > > >   net/ceph/mon_client.c           |  7 +++
> > > > > > >   net/ceph/osd_client.c           | 24 +++++++++
> > > > > > >   17 files changed, 365 insertions(+), 100 deletions(-)
> > > > > > > 
> > > > 
> > > > --
> > > > Jeff Layton <jlayton@redhat.com>
> > > > 
> > 
> > --
> > Jeff Layton <jlayton@redhat.com>
> >

Yan, Zheng July 8, 2019, 8:43 a.m. UTC | #9

On Fri, Jul 5, 2019 at 9:22 PM Jeff Layton <jlayton@redhat.com> wrote:
>
> On Fri, 2019-07-05 at 19:26 +0800, Yan, Zheng wrote:
> > On Fri, Jul 5, 2019 at 6:16 PM Jeff Layton <jlayton@redhat.com> wrote:
> > > On Fri, 2019-07-05 at 09:17 +0800, Yan, Zheng wrote:
> > > > On Thu, Jul 4, 2019 at 10:30 PM Jeff Layton <jlayton@redhat.com> wrote:
> > > > > On Thu, 2019-07-04 at 09:30 +0800, Yan, Zheng wrote:
> > > > > > On 7/4/19 12:01 AM, Jeff Layton wrote:
> > > > > > > On Wed, 2019-07-03 at 20:44 +0800, Yan, Zheng wrote:
> > > > > > > > This series add support for auto reconnect after blacklisted.
> > > > > > > >
> > > > > > > > Auto reconnect is controlled by recover_session=<clean|no> mount option.
> > > > > > > > Clean mode is enabled by default. In this mode, client drops dirty date
> > > > > > > > and dirty metadata, All writable file handles are invalidated. Read-only
> > > > > > > > file handles continue to work and caches are dropped if necessary.
> > > > > > > > If an inode contains any lost file lock, read and write are not allowed.
> > > > > > > > until all lost file locks are released.
> > > > > > >
> > > > > > > Just giving this a quick glance:
> > > > > > >
> > > > > > > Based on the last email discussion about this, I thought that you were
> > > > > > > going to provide a mount option that someone could enable that would
> > > > > > > basically allow the client to "soldier on" in the face of being
> > > > > > > blacklisted and then unblacklisted, without needing to remount anything.
> > > > > > >
> > > > > > > This set seems to keep the force_reconnect option (patch #7) though, so
> > > > > > > I'm quite confused at this point. What exactly is the goal of here?
> > > > > > >
> > > > > >
> > > > > > because auto reconnect can be disabled, force_reconnect is the manual
> > > > > > way to fix blacklistd mount.
> > > > > >
> > > > >
> > > > > Why not instead allow remounting with a different recover_session= mode?
> > > > > Then you wouldn't need this option that's only valid during a remount.
> > > > > That seems like a more natural way to use a new mount option.
> > > > >
> > > >
> > > > you mean something like 'recover_session=now' for remount?
> > > >
> > > >
> > >
> > > No, I meant something like:
> > >
> > >     -o remount,recover_session=brute
> > >
> >
> > This is confusing. user may just want to change auto reconnect mode
> > for backlist event in the future, does not want to force reconnect.
> >
>
> Why do we need to allow the admin to manually force a reconnect? If you
> (hypothetically) change the mode to "brute" then it should do it on its
> own when it detects that it's in this situation, no?
>

First, auto reconnect is limited to once every 30 seconds. Second,
client may fail to detect that itself is blacklisted. So I think we
still need a way to force client to reconnect

> > > IOW, allow the admin to just change the mount to use a different
> > > recover_session= mode once things are stuck.
> > >
> > > > > > > There's also nothing in the changelogs or comments about
> > > > > > > recover_session=brute, which seems like it ought to at least be
> > > > > > > mentioned.
> > > > > >
> > > > > > brute code is not enabled yet
> > > > >
> > > > > Got it -- I missed that that the mount option for it was commented out.
> > > > >
> > > > > Given that this is a user interface change, I think it'd be best to not
> > > > > merge merge this until it's complete. Otherwise we'll have to deal with
> > > > > intermediate kernel versions that don't implement some parts of the new
> > > > > interface. That's makes it more difficult for admins to use (and for us
> > > > > to document).
> > > > >
> > > > > > > At this point, I'm going to say NAK on this set until there is some
> > > > > > > accompanying documentation about how you intend for this be used and by
> > > > > > > whom. A patch for the mount.ceph(8) manpage would be a good place to
> > > > > > > start.
> > > > > > >
> > > > > > > > Yan, Zheng (9):
> > > > > > > >    libceph: add function that reset client's entity addr
> > > > > > > >    libceph: add function that clears osd client's abort_err
> > > > > > > >    ceph: allow closing session in restarting/reconnect state
> > > > > > > >    ceph: track and report error of async metadata operation
> > > > > > > >    ceph: pass filp to ceph_get_caps()
> > > > > > > >    ceph: return -EIO if read/write against filp that lost file locks
> > > > > > > >    ceph: add 'force_reconnect' option for remount
> > > > > > > >    ceph: invalidate all write mode filp after reconnect
> > > > > > > >    ceph: auto reconnect after blacklisted
> > > > > > > >
> > > > > > > >   fs/ceph/addr.c                  | 30 +++++++----
> > > > > > > >   fs/ceph/caps.c                  | 84 ++++++++++++++++++++----------
> > > > > > > >   fs/ceph/file.c                  | 50 ++++++++++--------
> > > > > > > >   fs/ceph/inode.c                 |  2 +
> > > > > > > >   fs/ceph/locks.c                 |  8 ++-
> > > > > > > >   fs/ceph/mds_client.c            | 92 ++++++++++++++++++++++++++-------
> > > > > > > >   fs/ceph/mds_client.h            |  6 +--
> > > > > > > >   fs/ceph/super.c                 | 91 ++++++++++++++++++++++++++++++--
> > > > > > > >   fs/ceph/super.h                 | 23 +++++++--
> > > > > > > >   include/linux/ceph/libceph.h    |  1 +
> > > > > > > >   include/linux/ceph/messenger.h  |  1 +
> > > > > > > >   include/linux/ceph/mon_client.h |  1 +
> > > > > > > >   include/linux/ceph/osd_client.h |  2 +
> > > > > > > >   net/ceph/ceph_common.c          | 38 +++++++++-----
> > > > > > > >   net/ceph/messenger.c            |  5 ++
> > > > > > > >   net/ceph/mon_client.c           |  7 +++
> > > > > > > >   net/ceph/osd_client.c           | 24 +++++++++
> > > > > > > >   17 files changed, 365 insertions(+), 100 deletions(-)
> > > > > > > >
> > > > >
> > > > > --
> > > > > Jeff Layton <jlayton@redhat.com>
> > > > >
> > >
> > > --
> > > Jeff Layton <jlayton@redhat.com>
> > >
>
> --
> Jeff Layton <jlayton@redhat.com>
>

Jeff Layton July 8, 2019, 10:59 a.m. UTC | #10

On Mon, 2019-07-08 at 16:43 +0800, Yan, Zheng wrote:
> On Fri, Jul 5, 2019 at 9:22 PM Jeff Layton <jlayton@redhat.com> wrote:
> > On Fri, 2019-07-05 at 19:26 +0800, Yan, Zheng wrote:
> > > On Fri, Jul 5, 2019 at 6:16 PM Jeff Layton <jlayton@redhat.com> wrote:
> > > > On Fri, 2019-07-05 at 09:17 +0800, Yan, Zheng wrote:
> > > > > On Thu, Jul 4, 2019 at 10:30 PM Jeff Layton <jlayton@redhat.com> wrote:
> > > > > > On Thu, 2019-07-04 at 09:30 +0800, Yan, Zheng wrote:
> > > > > > > On 7/4/19 12:01 AM, Jeff Layton wrote:
> > > > > > > > On Wed, 2019-07-03 at 20:44 +0800, Yan, Zheng wrote:
> > > > > > > > > This series add support for auto reconnect after blacklisted.
> > > > > > > > > 
> > > > > > > > > Auto reconnect is controlled by recover_session=<clean|no> mount option.
> > > > > > > > > Clean mode is enabled by default. In this mode, client drops dirty date
> > > > > > > > > and dirty metadata, All writable file handles are invalidated. Read-only
> > > > > > > > > file handles continue to work and caches are dropped if necessary.
> > > > > > > > > If an inode contains any lost file lock, read and write are not allowed.
> > > > > > > > > until all lost file locks are released.
> > > > > > > > 
> > > > > > > > Just giving this a quick glance:
> > > > > > > > 
> > > > > > > > Based on the last email discussion about this, I thought that you were
> > > > > > > > going to provide a mount option that someone could enable that would
> > > > > > > > basically allow the client to "soldier on" in the face of being
> > > > > > > > blacklisted and then unblacklisted, without needing to remount anything.
> > > > > > > > 
> > > > > > > > This set seems to keep the force_reconnect option (patch #7) though, so
> > > > > > > > I'm quite confused at this point. What exactly is the goal of here?
> > > > > > > > 
> > > > > > > 
> > > > > > > because auto reconnect can be disabled, force_reconnect is the manual
> > > > > > > way to fix blacklistd mount.
> > > > > > > 
> > > > > > 
> > > > > > Why not instead allow remounting with a different recover_session= mode?
> > > > > > Then you wouldn't need this option that's only valid during a remount.
> > > > > > That seems like a more natural way to use a new mount option.
> > > > > > 
> > > > > 
> > > > > you mean something like 'recover_session=now' for remount?
> > > > > 
> > > > > 
> > > > 
> > > > No, I meant something like:
> > > > 
> > > >     -o remount,recover_session=brute
> > > > 
> > > 
> > > This is confusing. user may just want to change auto reconnect mode
> > > for backlist event in the future, does not want to force reconnect.
> > > 
> > 
> > Why do we need to allow the admin to manually force a reconnect? If you
> > (hypothetically) change the mode to "brute" then it should do it on its
> > own when it detects that it's in this situation, no?
> > 
> 
> First, auto reconnect is limited to once every 30 seconds. Second,
> client may fail to detect that itself is blacklisted. So I think we
> still need a way to force client to reconnect
> 

How does it detect that it has been blacklisted? Does it do that by
looking at the OSD maps? I'd like to better understand how the client
would recognize this automatically and why it might miss it.

If we do end up needing some sort of control to forcibly reconnect, then
it seems like we might be better off with something besides a mount
option. sysfs control file maybe?

> > > > IOW, allow the admin to just change the mount to use a different
> > > > recover_session= mode once things are stuck.
> > > > 
> > > > > > > > There's also nothing in the changelogs or comments about
> > > > > > > > recover_session=brute, which seems like it ought to at least be
> > > > > > > > mentioned.
> > > > > > > 
> > > > > > > brute code is not enabled yet
> > > > > > 
> > > > > > Got it -- I missed that that the mount option for it was commented out.
> > > > > > 
> > > > > > Given that this is a user interface change, I think it'd be best to not
> > > > > > merge merge this until it's complete. Otherwise we'll have to deal with
> > > > > > intermediate kernel versions that don't implement some parts of the new
> > > > > > interface. That's makes it more difficult for admins to use (and for us
> > > > > > to document).
> > > > > > 
> > > > > > > > At this point, I'm going to say NAK on this set until there is some
> > > > > > > > accompanying documentation about how you intend for this be used and by
> > > > > > > > whom. A patch for the mount.ceph(8) manpage would be a good place to
> > > > > > > > start.
> > > > > > > > 
> > > > > > > > > Yan, Zheng (9):
> > > > > > > > >    libceph: add function that reset client's entity addr
> > > > > > > > >    libceph: add function that clears osd client's abort_err
> > > > > > > > >    ceph: allow closing session in restarting/reconnect state
> > > > > > > > >    ceph: track and report error of async metadata operation
> > > > > > > > >    ceph: pass filp to ceph_get_caps()
> > > > > > > > >    ceph: return -EIO if read/write against filp that lost file locks
> > > > > > > > >    ceph: add 'force_reconnect' option for remount
> > > > > > > > >    ceph: invalidate all write mode filp after reconnect
> > > > > > > > >    ceph: auto reconnect after blacklisted
> > > > > > > > > 
> > > > > > > > >   fs/ceph/addr.c                  | 30 +++++++----
> > > > > > > > >   fs/ceph/caps.c                  | 84 ++++++++++++++++++++----------
> > > > > > > > >   fs/ceph/file.c                  | 50 ++++++++++--------
> > > > > > > > >   fs/ceph/inode.c                 |  2 +
> > > > > > > > >   fs/ceph/locks.c                 |  8 ++-
> > > > > > > > >   fs/ceph/mds_client.c            | 92 ++++++++++++++++++++++++++-------
> > > > > > > > >   fs/ceph/mds_client.h            |  6 +--
> > > > > > > > >   fs/ceph/super.c                 | 91 ++++++++++++++++++++++++++++++--
> > > > > > > > >   fs/ceph/super.h                 | 23 +++++++--
> > > > > > > > >   include/linux/ceph/libceph.h    |  1 +
> > > > > > > > >   include/linux/ceph/messenger.h  |  1 +
> > > > > > > > >   include/linux/ceph/mon_client.h |  1 +
> > > > > > > > >   include/linux/ceph/osd_client.h |  2 +
> > > > > > > > >   net/ceph/ceph_common.c          | 38 +++++++++-----
> > > > > > > > >   net/ceph/messenger.c            |  5 ++
> > > > > > > > >   net/ceph/mon_client.c           |  7 +++
> > > > > > > > >   net/ceph/osd_client.c           | 24 +++++++++
> > > > > > > > >   17 files changed, 365 insertions(+), 100 deletions(-)
> > > > > > > > > 
> > > > > > 
> > > > > > --
> > > > > > Jeff Layton <jlayton@redhat.com>
> > > > > > 
> > > > 
> > > > --
> > > > Jeff Layton <jlayton@redhat.com>
> > > > 
> > 
> > --
> > Jeff Layton <jlayton@redhat.com>
> >

Yan, Zheng July 8, 2019, 11:34 a.m. UTC | #11

On Mon, Jul 8, 2019 at 6:59 PM Jeff Layton <jlayton@redhat.com> wrote:
>
> On Mon, 2019-07-08 at 16:43 +0800, Yan, Zheng wrote:
> > On Fri, Jul 5, 2019 at 9:22 PM Jeff Layton <jlayton@redhat.com> wrote:
> > > On Fri, 2019-07-05 at 19:26 +0800, Yan, Zheng wrote:
> > > > On Fri, Jul 5, 2019 at 6:16 PM Jeff Layton <jlayton@redhat.com> wrote:
> > > > > On Fri, 2019-07-05 at 09:17 +0800, Yan, Zheng wrote:
> > > > > > On Thu, Jul 4, 2019 at 10:30 PM Jeff Layton <jlayton@redhat.com> wrote:
> > > > > > > On Thu, 2019-07-04 at 09:30 +0800, Yan, Zheng wrote:
> > > > > > > > On 7/4/19 12:01 AM, Jeff Layton wrote:
> > > > > > > > > On Wed, 2019-07-03 at 20:44 +0800, Yan, Zheng wrote:
> > > > > > > > > > This series add support for auto reconnect after blacklisted.
> > > > > > > > > >
> > > > > > > > > > Auto reconnect is controlled by recover_session=<clean|no> mount option.
> > > > > > > > > > Clean mode is enabled by default. In this mode, client drops dirty date
> > > > > > > > > > and dirty metadata, All writable file handles are invalidated. Read-only
> > > > > > > > > > file handles continue to work and caches are dropped if necessary.
> > > > > > > > > > If an inode contains any lost file lock, read and write are not allowed.
> > > > > > > > > > until all lost file locks are released.
> > > > > > > > >
> > > > > > > > > Just giving this a quick glance:
> > > > > > > > >
> > > > > > > > > Based on the last email discussion about this, I thought that you were
> > > > > > > > > going to provide a mount option that someone could enable that would
> > > > > > > > > basically allow the client to "soldier on" in the face of being
> > > > > > > > > blacklisted and then unblacklisted, without needing to remount anything.
> > > > > > > > >
> > > > > > > > > This set seems to keep the force_reconnect option (patch #7) though, so
> > > > > > > > > I'm quite confused at this point. What exactly is the goal of here?
> > > > > > > > >
> > > > > > > >
> > > > > > > > because auto reconnect can be disabled, force_reconnect is the manual
> > > > > > > > way to fix blacklistd mount.
> > > > > > > >
> > > > > > >
> > > > > > > Why not instead allow remounting with a different recover_session= mode?
> > > > > > > Then you wouldn't need this option that's only valid during a remount.
> > > > > > > That seems like a more natural way to use a new mount option.
> > > > > > >
> > > > > >
> > > > > > you mean something like 'recover_session=now' for remount?
> > > > > >
> > > > > >
> > > > >
> > > > > No, I meant something like:
> > > > >
> > > > >     -o remount,recover_session=brute
> > > > >
> > > >
> > > > This is confusing. user may just want to change auto reconnect mode
> > > > for backlist event in the future, does not want to force reconnect.
> > > >
> > >
> > > Why do we need to allow the admin to manually force a reconnect? If you
> > > (hypothetically) change the mode to "brute" then it should do it on its
> > > own when it detects that it's in this situation, no?
> > >
> >
> > First, auto reconnect is limited to once every 30 seconds. Second,
> > client may fail to detect that itself is blacklisted. So I think we
> > still need a way to force client to reconnect
> >
>
> How does it detect that it has been blacklisted? Does it do that by
> looking at the OSD maps? I'd like to better understand how the client
> would recognize this automatically and why it might miss it.
>

By checking osd request reply and session reject message from mds.

> If we do end up needing some sort of control to forcibly reconnect, then
> it seems like we might be better off with something besides a mount
> option. sysfs control file maybe?
>

why

> > > > > IOW, allow the admin to just change the mount to use a different
> > > > > recover_session= mode once things are stuck.
> > > > >
> > > > > > > > > There's also nothing in the changelogs or comments about
> > > > > > > > > recover_session=brute, which seems like it ought to at least be
> > > > > > > > > mentioned.
> > > > > > > >
> > > > > > > > brute code is not enabled yet
> > > > > > >
> > > > > > > Got it -- I missed that that the mount option for it was commented out.
> > > > > > >
> > > > > > > Given that this is a user interface change, I think it'd be best to not
> > > > > > > merge merge this until it's complete. Otherwise we'll have to deal with
> > > > > > > intermediate kernel versions that don't implement some parts of the new
> > > > > > > interface. That's makes it more difficult for admins to use (and for us
> > > > > > > to document).
> > > > > > >
> > > > > > > > > At this point, I'm going to say NAK on this set until there is some
> > > > > > > > > accompanying documentation about how you intend for this be used and by
> > > > > > > > > whom. A patch for the mount.ceph(8) manpage would be a good place to
> > > > > > > > > start.
> > > > > > > > >
> > > > > > > > > > Yan, Zheng (9):
> > > > > > > > > >    libceph: add function that reset client's entity addr
> > > > > > > > > >    libceph: add function that clears osd client's abort_err
> > > > > > > > > >    ceph: allow closing session in restarting/reconnect state
> > > > > > > > > >    ceph: track and report error of async metadata operation
> > > > > > > > > >    ceph: pass filp to ceph_get_caps()
> > > > > > > > > >    ceph: return -EIO if read/write against filp that lost file locks
> > > > > > > > > >    ceph: add 'force_reconnect' option for remount
> > > > > > > > > >    ceph: invalidate all write mode filp after reconnect
> > > > > > > > > >    ceph: auto reconnect after blacklisted
> > > > > > > > > >
> > > > > > > > > >   fs/ceph/addr.c                  | 30 +++++++----
> > > > > > > > > >   fs/ceph/caps.c                  | 84 ++++++++++++++++++++----------
> > > > > > > > > >   fs/ceph/file.c                  | 50 ++++++++++--------
> > > > > > > > > >   fs/ceph/inode.c                 |  2 +
> > > > > > > > > >   fs/ceph/locks.c                 |  8 ++-
> > > > > > > > > >   fs/ceph/mds_client.c            | 92 ++++++++++++++++++++++++++-------
> > > > > > > > > >   fs/ceph/mds_client.h            |  6 +--
> > > > > > > > > >   fs/ceph/super.c                 | 91 ++++++++++++++++++++++++++++++--
> > > > > > > > > >   fs/ceph/super.h                 | 23 +++++++--
> > > > > > > > > >   include/linux/ceph/libceph.h    |  1 +
> > > > > > > > > >   include/linux/ceph/messenger.h  |  1 +
> > > > > > > > > >   include/linux/ceph/mon_client.h |  1 +
> > > > > > > > > >   include/linux/ceph/osd_client.h |  2 +
> > > > > > > > > >   net/ceph/ceph_common.c          | 38 +++++++++-----
> > > > > > > > > >   net/ceph/messenger.c            |  5 ++
> > > > > > > > > >   net/ceph/mon_client.c           |  7 +++
> > > > > > > > > >   net/ceph/osd_client.c           | 24 +++++++++
> > > > > > > > > >   17 files changed, 365 insertions(+), 100 deletions(-)
> > > > > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > Jeff Layton <jlayton@redhat.com>
> > > > > > >
> > > > >
> > > > > --
> > > > > Jeff Layton <jlayton@redhat.com>
> > > > >
> > >
> > > --
> > > Jeff Layton <jlayton@redhat.com>
> > >
>
> --
> Jeff Layton <jlayton@redhat.com>
>

Jeff Layton July 8, 2019, 11:43 a.m. UTC | #12

On Mon, 2019-07-08 at 19:34 +0800, Yan, Zheng wrote:
> On Mon, Jul 8, 2019 at 6:59 PM Jeff Layton <jlayton@redhat.com> wrote:
> > On Mon, 2019-07-08 at 16:43 +0800, Yan, Zheng wrote:
> > > On Fri, Jul 5, 2019 at 9:22 PM Jeff Layton <jlayton@redhat.com> wrote:
> > > > On Fri, 2019-07-05 at 19:26 +0800, Yan, Zheng wrote:
> > > > > On Fri, Jul 5, 2019 at 6:16 PM Jeff Layton <jlayton@redhat.com> wrote:
> > > > > > On Fri, 2019-07-05 at 09:17 +0800, Yan, Zheng wrote:
> > > > > > > On Thu, Jul 4, 2019 at 10:30 PM Jeff Layton <jlayton@redhat.com> wrote:
> > > > > > > > On Thu, 2019-07-04 at 09:30 +0800, Yan, Zheng wrote:
> > > > > > > > > On 7/4/19 12:01 AM, Jeff Layton wrote:
> > > > > > > > > > On Wed, 2019-07-03 at 20:44 +0800, Yan, Zheng wrote:
> > > > > > > > > > > This series add support for auto reconnect after blacklisted.
> > > > > > > > > > > 
> > > > > > > > > > > Auto reconnect is controlled by recover_session=<clean|no> mount option.
> > > > > > > > > > > Clean mode is enabled by default. In this mode, client drops dirty date
> > > > > > > > > > > and dirty metadata, All writable file handles are invalidated. Read-only
> > > > > > > > > > > file handles continue to work and caches are dropped if necessary.
> > > > > > > > > > > If an inode contains any lost file lock, read and write are not allowed.
> > > > > > > > > > > until all lost file locks are released.
> > > > > > > > > > 
> > > > > > > > > > Just giving this a quick glance:
> > > > > > > > > > 
> > > > > > > > > > Based on the last email discussion about this, I thought that you were
> > > > > > > > > > going to provide a mount option that someone could enable that would
> > > > > > > > > > basically allow the client to "soldier on" in the face of being
> > > > > > > > > > blacklisted and then unblacklisted, without needing to remount anything.
> > > > > > > > > > 
> > > > > > > > > > This set seems to keep the force_reconnect option (patch #7) though, so
> > > > > > > > > > I'm quite confused at this point. What exactly is the goal of here?
> > > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > because auto reconnect can be disabled, force_reconnect is the manual
> > > > > > > > > way to fix blacklistd mount.
> > > > > > > > > 
> > > > > > > > 
> > > > > > > > Why not instead allow remounting with a different recover_session= mode?
> > > > > > > > Then you wouldn't need this option that's only valid during a remount.
> > > > > > > > That seems like a more natural way to use a new mount option.
> > > > > > > > 
> > > > > > > 
> > > > > > > you mean something like 'recover_session=now' for remount?
> > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > > No, I meant something like:
> > > > > > 
> > > > > >     -o remount,recover_session=brute
> > > > > > 
> > > > > 
> > > > > This is confusing. user may just want to change auto reconnect mode
> > > > > for backlist event in the future, does not want to force reconnect.
> > > > > 
> > > > 
> > > > Why do we need to allow the admin to manually force a reconnect? If you
> > > > (hypothetically) change the mode to "brute" then it should do it on its
> > > > own when it detects that it's in this situation, no?
> > > > 
> > > 
> > > First, auto reconnect is limited to once every 30 seconds. Second,
> > > client may fail to detect that itself is blacklisted. So I think we
> > > still need a way to force client to reconnect
> > > 
> > 
> > How does it detect that it has been blacklisted? Does it do that by
> > looking at the OSD maps? I'd like to better understand how the client
> > would recognize this automatically and why it might miss it.
> > 
> 
> By checking osd request reply and session reject message from mds.
> 

Ok, so is the issue is that the client may become blacklisted and
unblacklisted before it sends anything to either server?
 
> > If we do end up needing some sort of control to forcibly reconnect, then
> > it seems like we might be better off with something besides a mount
> > option. sysfs control file maybe?
> > 
> 
> why
> 

Because mount options are generally there to control the behavior of the
filesystem at mount time, and not to cue a mount to do some one-shot
activity. That sort of thing is usually done via other mechanisms
(sysfs, ioctl, etc.).
 
> > > > > > IOW, allow the admin to just change the mount to use a different
> > > > > > recover_session= mode once things are stuck.
> > > > > > 
> > > > > > > > > > There's also nothing in the changelogs or comments about
> > > > > > > > > > recover_session=brute, which seems like it ought to at least be
> > > > > > > > > > mentioned.
> > > > > > > > > 
> > > > > > > > > brute code is not enabled yet
> > > > > > > > 
> > > > > > > > Got it -- I missed that that the mount option for it was commented out.
> > > > > > > > 
> > > > > > > > Given that this is a user interface change, I think it'd be best to not
> > > > > > > > merge merge this until it's complete. Otherwise we'll have to deal with
> > > > > > > > intermediate kernel versions that don't implement some parts of the new
> > > > > > > > interface. That's makes it more difficult for admins to use (and for us
> > > > > > > > to document).
> > > > > > > > 
> > > > > > > > > > At this point, I'm going to say NAK on this set until there is some
> > > > > > > > > > accompanying documentation about how you intend for this be used and by
> > > > > > > > > > whom. A patch for the mount.ceph(8) manpage would be a good place to
> > > > > > > > > > start.
> > > > > > > > > > 
> > > > > > > > > > > Yan, Zheng (9):
> > > > > > > > > > >    libceph: add function that reset client's entity addr
> > > > > > > > > > >    libceph: add function that clears osd client's abort_err
> > > > > > > > > > >    ceph: allow closing session in restarting/reconnect state
> > > > > > > > > > >    ceph: track and report error of async metadata operation
> > > > > > > > > > >    ceph: pass filp to ceph_get_caps()
> > > > > > > > > > >    ceph: return -EIO if read/write against filp that lost file locks
> > > > > > > > > > >    ceph: add 'force_reconnect' option for remount
> > > > > > > > > > >    ceph: invalidate all write mode filp after reconnect
> > > > > > > > > > >    ceph: auto reconnect after blacklisted
> > > > > > > > > > > 
> > > > > > > > > > >   fs/ceph/addr.c                  | 30 +++++++----
> > > > > > > > > > >   fs/ceph/caps.c                  | 84 ++++++++++++++++++++----------
> > > > > > > > > > >   fs/ceph/file.c                  | 50 ++++++++++--------
> > > > > > > > > > >   fs/ceph/inode.c                 |  2 +
> > > > > > > > > > >   fs/ceph/locks.c                 |  8 ++-
> > > > > > > > > > >   fs/ceph/mds_client.c            | 92 ++++++++++++++++++++++++++-------
> > > > > > > > > > >   fs/ceph/mds_client.h            |  6 +--
> > > > > > > > > > >   fs/ceph/super.c                 | 91 ++++++++++++++++++++++++++++++--
> > > > > > > > > > >   fs/ceph/super.h                 | 23 +++++++--
> > > > > > > > > > >   include/linux/ceph/libceph.h    |  1 +
> > > > > > > > > > >   include/linux/ceph/messenger.h  |  1 +
> > > > > > > > > > >   include/linux/ceph/mon_client.h |  1 +
> > > > > > > > > > >   include/linux/ceph/osd_client.h |  2 +
> > > > > > > > > > >   net/ceph/ceph_common.c          | 38 +++++++++-----
> > > > > > > > > > >   net/ceph/messenger.c            |  5 ++
> > > > > > > > > > >   net/ceph/mon_client.c           |  7 +++
> > > > > > > > > > >   net/ceph/osd_client.c           | 24 +++++++++
> > > > > > > > > > >   17 files changed, 365 insertions(+), 100 deletions(-)
> > > > > > > > > > > 
> > > > > > > > 
> > > > > > > > --
> > > > > > > > Jeff Layton <jlayton@redhat.com>
> > > > > > > > 
> > > > > > 
> > > > > > --
> > > > > > Jeff Layton <jlayton@redhat.com>
> > > > > > 
> > > > 
> > > > --
> > > > Jeff Layton <jlayton@redhat.com>
> > > > 
> > 
> > --
> > Jeff Layton <jlayton@redhat.com>
> >

Yan, Zheng July 8, 2019, 11:55 a.m. UTC | #13

On Mon, Jul 8, 2019 at 7:43 PM Jeff Layton <jlayton@redhat.com> wrote:
>
> On Mon, 2019-07-08 at 19:34 +0800, Yan, Zheng wrote:
> > On Mon, Jul 8, 2019 at 6:59 PM Jeff Layton <jlayton@redhat.com> wrote:
> > > On Mon, 2019-07-08 at 16:43 +0800, Yan, Zheng wrote:
> > > > On Fri, Jul 5, 2019 at 9:22 PM Jeff Layton <jlayton@redhat.com> wrote:
> > > > > On Fri, 2019-07-05 at 19:26 +0800, Yan, Zheng wrote:
> > > > > > On Fri, Jul 5, 2019 at 6:16 PM Jeff Layton <jlayton@redhat.com> wrote:
> > > > > > > On Fri, 2019-07-05 at 09:17 +0800, Yan, Zheng wrote:
> > > > > > > > On Thu, Jul 4, 2019 at 10:30 PM Jeff Layton <jlayton@redhat.com> wrote:
> > > > > > > > > On Thu, 2019-07-04 at 09:30 +0800, Yan, Zheng wrote:
> > > > > > > > > > On 7/4/19 12:01 AM, Jeff Layton wrote:
> > > > > > > > > > > On Wed, 2019-07-03 at 20:44 +0800, Yan, Zheng wrote:
> > > > > > > > > > > > This series add support for auto reconnect after blacklisted.
> > > > > > > > > > > >
> > > > > > > > > > > > Auto reconnect is controlled by recover_session=<clean|no> mount option.
> > > > > > > > > > > > Clean mode is enabled by default. In this mode, client drops dirty date
> > > > > > > > > > > > and dirty metadata, All writable file handles are invalidated. Read-only
> > > > > > > > > > > > file handles continue to work and caches are dropped if necessary.
> > > > > > > > > > > > If an inode contains any lost file lock, read and write are not allowed.
> > > > > > > > > > > > until all lost file locks are released.
> > > > > > > > > > >
> > > > > > > > > > > Just giving this a quick glance:
> > > > > > > > > > >
> > > > > > > > > > > Based on the last email discussion about this, I thought that you were
> > > > > > > > > > > going to provide a mount option that someone could enable that would
> > > > > > > > > > > basically allow the client to "soldier on" in the face of being
> > > > > > > > > > > blacklisted and then unblacklisted, without needing to remount anything.
> > > > > > > > > > >
> > > > > > > > > > > This set seems to keep the force_reconnect option (patch #7) though, so
> > > > > > > > > > > I'm quite confused at this point. What exactly is the goal of here?
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > because auto reconnect can be disabled, force_reconnect is the manual
> > > > > > > > > > way to fix blacklistd mount.
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > Why not instead allow remounting with a different recover_session= mode?
> > > > > > > > > Then you wouldn't need this option that's only valid during a remount.
> > > > > > > > > That seems like a more natural way to use a new mount option.
> > > > > > > > >
> > > > > > > >
> > > > > > > > you mean something like 'recover_session=now' for remount?
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > > > No, I meant something like:
> > > > > > >
> > > > > > >     -o remount,recover_session=brute
> > > > > > >
> > > > > >
> > > > > > This is confusing. user may just want to change auto reconnect mode
> > > > > > for backlist event in the future, does not want to force reconnect.
> > > > > >
> > > > >
> > > > > Why do we need to allow the admin to manually force a reconnect? If you
> > > > > (hypothetically) change the mode to "brute" then it should do it on its
> > > > > own when it detects that it's in this situation, no?
> > > > >
> > > >
> > > > First, auto reconnect is limited to once every 30 seconds. Second,
> > > > client may fail to detect that itself is blacklisted. So I think we
> > > > still need a way to force client to reconnect
> > > >
> > >
> > > How does it detect that it has been blacklisted? Does it do that by
> > > looking at the OSD maps? I'd like to better understand how the client
> > > would recognize this automatically and why it might miss it.
> > >
> >
> > By checking osd request reply and session reject message from mds.
> >
>
> Ok, so is the issue is that the client may become blacklisted and
> unblacklisted before it sends anything to either server?
>

No. The issue is that old version mds does not send session reject
message or no 'error_str=blacklisted' in session reject message.

> > > If we do end up needing some sort of control to forcibly reconnect, then
> > > it seems like we might be better off with something besides a mount
> > > option. sysfs control file maybe?
> > >
> >
> > why
> >
>
> Because mount options are generally there to control the behavior of the
> filesystem at mount time, and not to cue a mount to do some one-shot
> activity. That sort of thing is usually done via other mechanisms
> (sysfs, ioctl, etc.).
>
> > > > > > > IOW, allow the admin to just change the mount to use a different
> > > > > > > recover_session= mode once things are stuck.
> > > > > > >
> > > > > > > > > > > There's also nothing in the changelogs or comments about
> > > > > > > > > > > recover_session=brute, which seems like it ought to at least be
> > > > > > > > > > > mentioned.
> > > > > > > > > >
> > > > > > > > > > brute code is not enabled yet
> > > > > > > > >
> > > > > > > > > Got it -- I missed that that the mount option for it was commented out.
> > > > > > > > >
> > > > > > > > > Given that this is a user interface change, I think it'd be best to not
> > > > > > > > > merge merge this until it's complete. Otherwise we'll have to deal with
> > > > > > > > > intermediate kernel versions that don't implement some parts of the new
> > > > > > > > > interface. That's makes it more difficult for admins to use (and for us
> > > > > > > > > to document).
> > > > > > > > >
> > > > > > > > > > > At this point, I'm going to say NAK on this set until there is some
> > > > > > > > > > > accompanying documentation about how you intend for this be used and by
> > > > > > > > > > > whom. A patch for the mount.ceph(8) manpage would be a good place to
> > > > > > > > > > > start.
> > > > > > > > > > >
> > > > > > > > > > > > Yan, Zheng (9):
> > > > > > > > > > > >    libceph: add function that reset client's entity addr
> > > > > > > > > > > >    libceph: add function that clears osd client's abort_err
> > > > > > > > > > > >    ceph: allow closing session in restarting/reconnect state
> > > > > > > > > > > >    ceph: track and report error of async metadata operation
> > > > > > > > > > > >    ceph: pass filp to ceph_get_caps()
> > > > > > > > > > > >    ceph: return -EIO if read/write against filp that lost file locks
> > > > > > > > > > > >    ceph: add 'force_reconnect' option for remount
> > > > > > > > > > > >    ceph: invalidate all write mode filp after reconnect
> > > > > > > > > > > >    ceph: auto reconnect after blacklisted
> > > > > > > > > > > >
> > > > > > > > > > > >   fs/ceph/addr.c                  | 30 +++++++----
> > > > > > > > > > > >   fs/ceph/caps.c                  | 84 ++++++++++++++++++++----------
> > > > > > > > > > > >   fs/ceph/file.c                  | 50 ++++++++++--------
> > > > > > > > > > > >   fs/ceph/inode.c                 |  2 +
> > > > > > > > > > > >   fs/ceph/locks.c                 |  8 ++-
> > > > > > > > > > > >   fs/ceph/mds_client.c            | 92 ++++++++++++++++++++++++++-------
> > > > > > > > > > > >   fs/ceph/mds_client.h            |  6 +--
> > > > > > > > > > > >   fs/ceph/super.c                 | 91 ++++++++++++++++++++++++++++++--
> > > > > > > > > > > >   fs/ceph/super.h                 | 23 +++++++--
> > > > > > > > > > > >   include/linux/ceph/libceph.h    |  1 +
> > > > > > > > > > > >   include/linux/ceph/messenger.h  |  1 +
> > > > > > > > > > > >   include/linux/ceph/mon_client.h |  1 +
> > > > > > > > > > > >   include/linux/ceph/osd_client.h |  2 +
> > > > > > > > > > > >   net/ceph/ceph_common.c          | 38 +++++++++-----
> > > > > > > > > > > >   net/ceph/messenger.c            |  5 ++
> > > > > > > > > > > >   net/ceph/mon_client.c           |  7 +++
> > > > > > > > > > > >   net/ceph/osd_client.c           | 24 +++++++++
> > > > > > > > > > > >   17 files changed, 365 insertions(+), 100 deletions(-)
> > > > > > > > > > > >
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > > Jeff Layton <jlayton@redhat.com>
> > > > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > Jeff Layton <jlayton@redhat.com>
> > > > > > >
> > > > >
> > > > > --
> > > > > Jeff Layton <jlayton@redhat.com>
> > > > >
> > >
> > > --
> > > Jeff Layton <jlayton@redhat.com>
> > >
>
> --
> Jeff Layton <jlayton@redhat.com>
>

Jeff Layton July 8, 2019, 1:45 p.m. UTC | #14

On Mon, 2019-07-08 at 19:55 +0800, Yan, Zheng wrote:
> On Mon, Jul 8, 2019 at 7:43 PM Jeff Layton <jlayton@redhat.com> wrote:
> > On Mon, 2019-07-08 at 19:34 +0800, Yan, Zheng wrote:
> > > On Mon, Jul 8, 2019 at 6:59 PM Jeff Layton <jlayton@redhat.com> wrote:
> > > > On Mon, 2019-07-08 at 16:43 +0800, Yan, Zheng wrote:
> > > > > On Fri, Jul 5, 2019 at 9:22 PM Jeff Layton <jlayton@redhat.com> wrote:
> > > > > > On Fri, 2019-07-05 at 19:26 +0800, Yan, Zheng wrote:
> > > > > > > On Fri, Jul 5, 2019 at 6:16 PM Jeff Layton <jlayton@redhat.com> wrote:
> > > > > > > > On Fri, 2019-07-05 at 09:17 +0800, Yan, Zheng wrote:
> > > > > > > > > On Thu, Jul 4, 2019 at 10:30 PM Jeff Layton <jlayton@redhat.com> wrote:
> > > > > > > > > > On Thu, 2019-07-04 at 09:30 +0800, Yan, Zheng wrote:
> > > > > > > > > > > On 7/4/19 12:01 AM, Jeff Layton wrote:
> > > > > > > > > > > > On Wed, 2019-07-03 at 20:44 +0800, Yan, Zheng wrote:
> > > > > > > > > > > > > This series add support for auto reconnect after blacklisted.
> > > > > > > > > > > > > 
> > > > > > > > > > > > > Auto reconnect is controlled by recover_session=<clean|no> mount option.
> > > > > > > > > > > > > Clean mode is enabled by default. In this mode, client drops dirty date
> > > > > > > > > > > > > and dirty metadata, All writable file handles are invalidated. Read-only
> > > > > > > > > > > > > file handles continue to work and caches are dropped if necessary.
> > > > > > > > > > > > > If an inode contains any lost file lock, read and write are not allowed.
> > > > > > > > > > > > > until all lost file locks are released.
> > > > > > > > > > > > 
> > > > > > > > > > > > Just giving this a quick glance:
> > > > > > > > > > > > 
> > > > > > > > > > > > Based on the last email discussion about this, I thought that you were
> > > > > > > > > > > > going to provide a mount option that someone could enable that would
> > > > > > > > > > > > basically allow the client to "soldier on" in the face of being
> > > > > > > > > > > > blacklisted and then unblacklisted, without needing to remount anything.
> > > > > > > > > > > > 
> > > > > > > > > > > > This set seems to keep the force_reconnect option (patch #7) though, so
> > > > > > > > > > > > I'm quite confused at this point. What exactly is the goal of here?
> > > > > > > > > > > > 
> > > > > > > > > > > 
> > > > > > > > > > > because auto reconnect can be disabled, force_reconnect is the manual
> > > > > > > > > > > way to fix blacklistd mount.
> > > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > Why not instead allow remounting with a different recover_session= mode?
> > > > > > > > > > Then you wouldn't need this option that's only valid during a remount.
> > > > > > > > > > That seems like a more natural way to use a new mount option.
> > > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > you mean something like 'recover_session=now' for remount?
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > 
> > > > > > > > No, I meant something like:
> > > > > > > > 
> > > > > > > >     -o remount,recover_session=brute
> > > > > > > > 
> > > > > > > 
> > > > > > > This is confusing. user may just want to change auto reconnect mode
> > > > > > > for backlist event in the future, does not want to force reconnect.
> > > > > > > 
> > > > > > 
> > > > > > Why do we need to allow the admin to manually force a reconnect? If you
> > > > > > (hypothetically) change the mode to "brute" then it should do it on its
> > > > > > own when it detects that it's in this situation, no?
> > > > > > 
> > > > > 
> > > > > First, auto reconnect is limited to once every 30 seconds. Second,
> > > > > client may fail to detect that itself is blacklisted. So I think we
> > > > > still need a way to force client to reconnect
> > > > > 
> > > > 
> > > > How does it detect that it has been blacklisted? Does it do that by
> > > > looking at the OSD maps? I'd like to better understand how the client
> > > > would recognize this automatically and why it might miss it.
> > > > 
> > > 
> > > By checking osd request reply and session reject message from mds.
> > > 
> > 
> > Ok, so is the issue is that the client may become blacklisted and
> > unblacklisted before it sends anything to either server?
> > 
> 
> No. The issue is that old version mds does not send session reject
> message or no 'error_str=blacklisted' in session reject message.

Is that the only way to detect that this has happened? What if we were
to simply force a reconnect on any remount? Would that break anything?

> > > > If we do end up needing some sort of control to forcibly reconnect, then
> > > > it seems like we might be better off with something besides a mount
> > > > option. sysfs control file maybe?
> > > > 
> > > 
> > > why
> > > 
> > 
> > Because mount options are generally there to control the behavior of the
> > filesystem at mount time, and not to cue a mount to do some one-shot
> > activity. That sort of thing is usually done via other mechanisms
> > (sysfs, ioctl, etc.).
> > 
> > > > > > > > IOW, allow the admin to just change the mount to use a different
> > > > > > > > recover_session= mode once things are stuck.
> > > > > > > > 
> > > > > > > > > > > > There's also nothing in the changelogs or comments about
> > > > > > > > > > > > recover_session=brute, which seems like it ought to at least be
> > > > > > > > > > > > mentioned.
> > > > > > > > > > > 
> > > > > > > > > > > brute code is not enabled yet
> > > > > > > > > > 
> > > > > > > > > > Got it -- I missed that that the mount option for it was commented out.
> > > > > > > > > > 
> > > > > > > > > > Given that this is a user interface change, I think it'd be best to not
> > > > > > > > > > merge merge this until it's complete. Otherwise we'll have to deal with
> > > > > > > > > > intermediate kernel versions that don't implement some parts of the new
> > > > > > > > > > interface. That's makes it more difficult for admins to use (and for us
> > > > > > > > > > to document).
> > > > > > > > > > 
> > > > > > > > > > > > At this point, I'm going to say NAK on this set until there is some
> > > > > > > > > > > > accompanying documentation about how you intend for this be used and by
> > > > > > > > > > > > whom. A patch for the mount.ceph(8) manpage would be a good place to
> > > > > > > > > > > > start.
> > > > > > > > > > > > 
> > > > > > > > > > > > > Yan, Zheng (9):
> > > > > > > > > > > > >    libceph: add function that reset client's entity addr
> > > > > > > > > > > > >    libceph: add function that clears osd client's abort_err
> > > > > > > > > > > > >    ceph: allow closing session in restarting/reconnect state
> > > > > > > > > > > > >    ceph: track and report error of async metadata operation
> > > > > > > > > > > > >    ceph: pass filp to ceph_get_caps()
> > > > > > > > > > > > >    ceph: return -EIO if read/write against filp that lost file locks
> > > > > > > > > > > > >    ceph: add 'force_reconnect' option for remount
> > > > > > > > > > > > >    ceph: invalidate all write mode filp after reconnect
> > > > > > > > > > > > >    ceph: auto reconnect after blacklisted
> > > > > > > > > > > > > 
> > > > > > > > > > > > >   fs/ceph/addr.c                  | 30 +++++++----
> > > > > > > > > > > > >   fs/ceph/caps.c                  | 84 ++++++++++++++++++++----------
> > > > > > > > > > > > >   fs/ceph/file.c                  | 50 ++++++++++--------
> > > > > > > > > > > > >   fs/ceph/inode.c                 |  2 +
> > > > > > > > > > > > >   fs/ceph/locks.c                 |  8 ++-
> > > > > > > > > > > > >   fs/ceph/mds_client.c            | 92 ++++++++++++++++++++++++++-------
> > > > > > > > > > > > >   fs/ceph/mds_client.h            |  6 +--
> > > > > > > > > > > > >   fs/ceph/super.c                 | 91 ++++++++++++++++++++++++++++++--
> > > > > > > > > > > > >   fs/ceph/super.h                 | 23 +++++++--
> > > > > > > > > > > > >   include/linux/ceph/libceph.h    |  1 +
> > > > > > > > > > > > >   include/linux/ceph/messenger.h  |  1 +
> > > > > > > > > > > > >   include/linux/ceph/mon_client.h |  1 +
> > > > > > > > > > > > >   include/linux/ceph/osd_client.h |  2 +
> > > > > > > > > > > > >   net/ceph/ceph_common.c          | 38 +++++++++-----
> > > > > > > > > > > > >   net/ceph/messenger.c            |  5 ++
> > > > > > > > > > > > >   net/ceph/mon_client.c           |  7 +++
> > > > > > > > > > > > >   net/ceph/osd_client.c           | 24 +++++++++
> > > > > > > > > > > > >   17 files changed, 365 insertions(+), 100 deletions(-)
> > > > > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > --
> > > > > > > > > > Jeff Layton <jlayton@redhat.com>
> > > > > > > > > > 
> > > > > > > > 
> > > > > > > > --
> > > > > > > > Jeff Layton <jlayton@redhat.com>
> > > > > > > > 
> > > > > > 
> > > > > > --
> > > > > > Jeff Layton <jlayton@redhat.com>
> > > > > > 
> > > > 
> > > > --
> > > > Jeff Layton <jlayton@redhat.com>
> > > > 
> > 
> > --
> > Jeff Layton <jlayton@redhat.com>
> >

Yan, Zheng July 9, 2019, 2:14 a.m. UTC | #15

On Mon, Jul 8, 2019 at 9:45 PM Jeff Layton <jlayton@redhat.com> wrote:
>
> On Mon, 2019-07-08 at 19:55 +0800, Yan, Zheng wrote:
> > On Mon, Jul 8, 2019 at 7:43 PM Jeff Layton <jlayton@redhat.com> wrote:
> > > On Mon, 2019-07-08 at 19:34 +0800, Yan, Zheng wrote:
> > > > On Mon, Jul 8, 2019 at 6:59 PM Jeff Layton <jlayton@redhat.com> wrote:
> > > > > On Mon, 2019-07-08 at 16:43 +0800, Yan, Zheng wrote:
> > > > > > On Fri, Jul 5, 2019 at 9:22 PM Jeff Layton <jlayton@redhat.com> wrote:
> > > > > > > On Fri, 2019-07-05 at 19:26 +0800, Yan, Zheng wrote:
> > > > > > > > On Fri, Jul 5, 2019 at 6:16 PM Jeff Layton <jlayton@redhat.com> wrote:
> > > > > > > > > On Fri, 2019-07-05 at 09:17 +0800, Yan, Zheng wrote:
> > > > > > > > > > On Thu, Jul 4, 2019 at 10:30 PM Jeff Layton <jlayton@redhat.com> wrote:
> > > > > > > > > > > On Thu, 2019-07-04 at 09:30 +0800, Yan, Zheng wrote:
> > > > > > > > > > > > On 7/4/19 12:01 AM, Jeff Layton wrote:
> > > > > > > > > > > > > On Wed, 2019-07-03 at 20:44 +0800, Yan, Zheng wrote:
> > > > > > > > > > > > > > This series add support for auto reconnect after blacklisted.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Auto reconnect is controlled by recover_session=<clean|no> mount option.
> > > > > > > > > > > > > > Clean mode is enabled by default. In this mode, client drops dirty date
> > > > > > > > > > > > > > and dirty metadata, All writable file handles are invalidated. Read-only
> > > > > > > > > > > > > > file handles continue to work and caches are dropped if necessary.
> > > > > > > > > > > > > > If an inode contains any lost file lock, read and write are not allowed.
> > > > > > > > > > > > > > until all lost file locks are released.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Just giving this a quick glance:
> > > > > > > > > > > > >
> > > > > > > > > > > > > Based on the last email discussion about this, I thought that you were
> > > > > > > > > > > > > going to provide a mount option that someone could enable that would
> > > > > > > > > > > > > basically allow the client to "soldier on" in the face of being
> > > > > > > > > > > > > blacklisted and then unblacklisted, without needing to remount anything.
> > > > > > > > > > > > >
> > > > > > > > > > > > > This set seems to keep the force_reconnect option (patch #7) though, so
> > > > > > > > > > > > > I'm quite confused at this point. What exactly is the goal of here?
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > because auto reconnect can be disabled, force_reconnect is the manual
> > > > > > > > > > > > way to fix blacklistd mount.
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Why not instead allow remounting with a different recover_session= mode?
> > > > > > > > > > > Then you wouldn't need this option that's only valid during a remount.
> > > > > > > > > > > That seems like a more natural way to use a new mount option.
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > you mean something like 'recover_session=now' for remount?
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > No, I meant something like:
> > > > > > > > >
> > > > > > > > >     -o remount,recover_session=brute
> > > > > > > > >
> > > > > > > >
> > > > > > > > This is confusing. user may just want to change auto reconnect mode
> > > > > > > > for backlist event in the future, does not want to force reconnect.
> > > > > > > >
> > > > > > >
> > > > > > > Why do we need to allow the admin to manually force a reconnect? If you
> > > > > > > (hypothetically) change the mode to "brute" then it should do it on its
> > > > > > > own when it detects that it's in this situation, no?
> > > > > > >
> > > > > >
> > > > > > First, auto reconnect is limited to once every 30 seconds. Second,
> > > > > > client may fail to detect that itself is blacklisted. So I think we
> > > > > > still need a way to force client to reconnect
> > > > > >
> > > > >
> > > > > How does it detect that it has been blacklisted? Does it do that by
> > > > > looking at the OSD maps? I'd like to better understand how the client
> > > > > would recognize this automatically and why it might miss it.
> > > > >
> > > >
> > > > By checking osd request reply and session reject message from mds.
> > > >
> > >
> > > Ok, so is the issue is that the client may become blacklisted and
> > > unblacklisted before it sends anything to either server?
> > >
> >
> > No. The issue is that old version mds does not send session reject
> > message or no 'error_str=blacklisted' in session reject message.
>
> Is that the only way to detect that this has happened? What if we were
> to simply force a reconnect on any remount? Would that break anything?
>

why?  reconnect causes all sorts of integrity issues

> > > > > If we do end up needing some sort of control to forcibly reconnect, then
> > > > > it seems like we might be better off with something besides a mount
> > > > > option. sysfs control file maybe?
> > > > >
> > > >
> > > > why
> > > >
> > >
> > > Because mount options are generally there to control the behavior of the
> > > filesystem at mount time, and not to cue a mount to do some one-shot
> > > activity. That sort of thing is usually done via other mechanisms
> > > (sysfs, ioctl, etc.).
> > >
> > > > > > > > > IOW, allow the admin to just change the mount to use a different
> > > > > > > > > recover_session= mode once things are stuck.
> > > > > > > > >
> > > > > > > > > > > > > There's also nothing in the changelogs or comments about
> > > > > > > > > > > > > recover_session=brute, which seems like it ought to at least be
> > > > > > > > > > > > > mentioned.
> > > > > > > > > > > >
> > > > > > > > > > > > brute code is not enabled yet
> > > > > > > > > > >
> > > > > > > > > > > Got it -- I missed that that the mount option for it was commented out.
> > > > > > > > > > >
> > > > > > > > > > > Given that this is a user interface change, I think it'd be best to not
> > > > > > > > > > > merge merge this until it's complete. Otherwise we'll have to deal with
> > > > > > > > > > > intermediate kernel versions that don't implement some parts of the new
> > > > > > > > > > > interface. That's makes it more difficult for admins to use (and for us
> > > > > > > > > > > to document).
> > > > > > > > > > >
> > > > > > > > > > > > > At this point, I'm going to say NAK on this set until there is some
> > > > > > > > > > > > > accompanying documentation about how you intend for this be used and by
> > > > > > > > > > > > > whom. A patch for the mount.ceph(8) manpage would be a good place to
> > > > > > > > > > > > > start.
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Yan, Zheng (9):
> > > > > > > > > > > > > >    libceph: add function that reset client's entity addr
> > > > > > > > > > > > > >    libceph: add function that clears osd client's abort_err
> > > > > > > > > > > > > >    ceph: allow closing session in restarting/reconnect state
> > > > > > > > > > > > > >    ceph: track and report error of async metadata operation
> > > > > > > > > > > > > >    ceph: pass filp to ceph_get_caps()
> > > > > > > > > > > > > >    ceph: return -EIO if read/write against filp that lost file locks
> > > > > > > > > > > > > >    ceph: add 'force_reconnect' option for remount
> > > > > > > > > > > > > >    ceph: invalidate all write mode filp after reconnect
> > > > > > > > > > > > > >    ceph: auto reconnect after blacklisted
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >   fs/ceph/addr.c                  | 30 +++++++----
> > > > > > > > > > > > > >   fs/ceph/caps.c                  | 84 ++++++++++++++++++++----------
> > > > > > > > > > > > > >   fs/ceph/file.c                  | 50 ++++++++++--------
> > > > > > > > > > > > > >   fs/ceph/inode.c                 |  2 +
> > > > > > > > > > > > > >   fs/ceph/locks.c                 |  8 ++-
> > > > > > > > > > > > > >   fs/ceph/mds_client.c            | 92 ++++++++++++++++++++++++++-------
> > > > > > > > > > > > > >   fs/ceph/mds_client.h            |  6 +--
> > > > > > > > > > > > > >   fs/ceph/super.c                 | 91 ++++++++++++++++++++++++++++++--
> > > > > > > > > > > > > >   fs/ceph/super.h                 | 23 +++++++--
> > > > > > > > > > > > > >   include/linux/ceph/libceph.h    |  1 +
> > > > > > > > > > > > > >   include/linux/ceph/messenger.h  |  1 +
> > > > > > > > > > > > > >   include/linux/ceph/mon_client.h |  1 +
> > > > > > > > > > > > > >   include/linux/ceph/osd_client.h |  2 +
> > > > > > > > > > > > > >   net/ceph/ceph_common.c          | 38 +++++++++-----
> > > > > > > > > > > > > >   net/ceph/messenger.c            |  5 ++
> > > > > > > > > > > > > >   net/ceph/mon_client.c           |  7 +++
> > > > > > > > > > > > > >   net/ceph/osd_client.c           | 24 +++++++++
> > > > > > > > > > > > > >   17 files changed, 365 insertions(+), 100 deletions(-)
> > > > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > --
> > > > > > > > > > > Jeff Layton <jlayton@redhat.com>
> > > > > > > > > > >
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > > Jeff Layton <jlayton@redhat.com>
> > > > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > Jeff Layton <jlayton@redhat.com>
> > > > > > >
> > > > >
> > > > > --
> > > > > Jeff Layton <jlayton@redhat.com>
> > > > >
> > >
> > > --
> > > Jeff Layton <jlayton@redhat.com>
> > >
>
> --
> Jeff Layton <jlayton@redhat.com>
>

Jeff Layton July 9, 2019, 10:18 a.m. UTC | #16

On Tue, 2019-07-09 at 10:14 +0800, Yan, Zheng wrote:
> On Mon, Jul 8, 2019 at 9:45 PM Jeff Layton <jlayton@redhat.com> wrote:
> > On Mon, 2019-07-08 at 19:55 +0800, Yan, Zheng wrote:
> > > On Mon, Jul 8, 2019 at 7:43 PM Jeff Layton <jlayton@redhat.com> wrote:
> > > > On Mon, 2019-07-08 at 19:34 +0800, Yan, Zheng wrote:
> > > > > On Mon, Jul 8, 2019 at 6:59 PM Jeff Layton <jlayton@redhat.com> wrote:
> > > > > > On Mon, 2019-07-08 at 16:43 +0800, Yan, Zheng wrote:
> > > > > > > On Fri, Jul 5, 2019 at 9:22 PM Jeff Layton <jlayton@redhat.com> wrote:
> > > > > > > > On Fri, 2019-07-05 at 19:26 +0800, Yan, Zheng wrote:
> > > > > > > > > On Fri, Jul 5, 2019 at 6:16 PM Jeff Layton <jlayton@redhat.com> wrote:
> > > > > > > > > > On Fri, 2019-07-05 at 09:17 +0800, Yan, Zheng wrote:
> > > > > > > > > > > On Thu, Jul 4, 2019 at 10:30 PM Jeff Layton <jlayton@redhat.com> wrote:
> > > > > > > > > > > > On Thu, 2019-07-04 at 09:30 +0800, Yan, Zheng wrote:
> > > > > > > > > > > > > On 7/4/19 12:01 AM, Jeff Layton wrote:
> > > > > > > > > > > > > > On Wed, 2019-07-03 at 20:44 +0800, Yan, Zheng wrote:
> > > > > > > > > > > > > > > This series add support for auto reconnect after blacklisted.
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > Auto reconnect is controlled by recover_session=<clean|no> mount option.
> > > > > > > > > > > > > > > Clean mode is enabled by default. In this mode, client drops dirty date
> > > > > > > > > > > > > > > and dirty metadata, All writable file handles are invalidated. Read-only
> > > > > > > > > > > > > > > file handles continue to work and caches are dropped if necessary.
> > > > > > > > > > > > > > > If an inode contains any lost file lock, read and write are not allowed.
> > > > > > > > > > > > > > > until all lost file locks are released.
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > Just giving this a quick glance:
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > Based on the last email discussion about this, I thought that you were
> > > > > > > > > > > > > > going to provide a mount option that someone could enable that would
> > > > > > > > > > > > > > basically allow the client to "soldier on" in the face of being
> > > > > > > > > > > > > > blacklisted and then unblacklisted, without needing to remount anything.
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > This set seems to keep the force_reconnect option (patch #7) though, so
> > > > > > > > > > > > > > I'm quite confused at this point. What exactly is the goal of here?
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > 
> > > > > > > > > > > > > because auto reconnect can be disabled, force_reconnect is the manual
> > > > > > > > > > > > > way to fix blacklistd mount.
> > > > > > > > > > > > > 
> > > > > > > > > > > > 
> > > > > > > > > > > > Why not instead allow remounting with a different recover_session= mode?
> > > > > > > > > > > > Then you wouldn't need this option that's only valid during a remount.
> > > > > > > > > > > > That seems like a more natural way to use a new mount option.
> > > > > > > > > > > > 
> > > > > > > > > > > 
> > > > > > > > > > > you mean something like 'recover_session=now' for remount?
> > > > > > > > > > > 
> > > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > No, I meant something like:
> > > > > > > > > > 
> > > > > > > > > >     -o remount,recover_session=brute
> > > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > This is confusing. user may just want to change auto reconnect mode
> > > > > > > > > for backlist event in the future, does not want to force reconnect.
> > > > > > > > > 
> > > > > > > > 
> > > > > > > > Why do we need to allow the admin to manually force a reconnect? If you
> > > > > > > > (hypothetically) change the mode to "brute" then it should do it on its
> > > > > > > > own when it detects that it's in this situation, no?
> > > > > > > > 
> > > > > > > 
> > > > > > > First, auto reconnect is limited to once every 30 seconds. Second,
> > > > > > > client may fail to detect that itself is blacklisted. So I think we
> > > > > > > still need a way to force client to reconnect
> > > > > > > 
> > > > > > 
> > > > > > How does it detect that it has been blacklisted? Does it do that by
> > > > > > looking at the OSD maps? I'd like to better understand how the client
> > > > > > would recognize this automatically and why it might miss it.
> > > > > > 
> > > > > 
> > > > > By checking osd request reply and session reject message from mds.
> > > > > 
> > > > 
> > > > Ok, so is the issue is that the client may become blacklisted and
> > > > unblacklisted before it sends anything to either server?
> > > > 
> > > 
> > > No. The issue is that old version mds does not send session reject
> > > message or no 'error_str=blacklisted' in session reject message.
> > 
> > Is that the only way to detect that this has happened? What if we were
> > to simply force a reconnect on any remount? Would that break anything?
> > 
> 
> why?  reconnect causes all sorts of integrity issues
> 

Care to elaborate?

My understanding was that the fact that the MDS journaled everything
meant that the client would be able to reclaim all of its state if the
MDS crashed and restarted, or we had a momentary loss of connection. Is
that not the case?

Either way, remounts should be _very_ rare events, almost always
performed manually by an administrator. I suggested this under the
assumption that an immediate reconnection might just be a small blip in
performance. If there are data integrity issues when this occurs then
that seems like a bigger problem.

Yan, Zheng July 9, 2019, 1:31 p.m. UTC | #17

On Tue, Jul 9, 2019 at 6:18 PM Jeff Layton <jlayton@redhat.com> wrote:
>
> On Tue, 2019-07-09 at 10:14 +0800, Yan, Zheng wrote:
> > On Mon, Jul 8, 2019 at 9:45 PM Jeff Layton <jlayton@redhat.com> wrote:
> > > On Mon, 2019-07-08 at 19:55 +0800, Yan, Zheng wrote:
> > > > On Mon, Jul 8, 2019 at 7:43 PM Jeff Layton <jlayton@redhat.com> wrote:
> > > > > On Mon, 2019-07-08 at 19:34 +0800, Yan, Zheng wrote:
> > > > > > On Mon, Jul 8, 2019 at 6:59 PM Jeff Layton <jlayton@redhat.com> wrote:
> > > > > > > On Mon, 2019-07-08 at 16:43 +0800, Yan, Zheng wrote:
> > > > > > > > On Fri, Jul 5, 2019 at 9:22 PM Jeff Layton <jlayton@redhat.com> wrote:
> > > > > > > > > On Fri, 2019-07-05 at 19:26 +0800, Yan, Zheng wrote:
> > > > > > > > > > On Fri, Jul 5, 2019 at 6:16 PM Jeff Layton <jlayton@redhat.com> wrote:
> > > > > > > > > > > On Fri, 2019-07-05 at 09:17 +0800, Yan, Zheng wrote:
> > > > > > > > > > > > On Thu, Jul 4, 2019 at 10:30 PM Jeff Layton <jlayton@redhat.com> wrote:
> > > > > > > > > > > > > On Thu, 2019-07-04 at 09:30 +0800, Yan, Zheng wrote:
> > > > > > > > > > > > > > On 7/4/19 12:01 AM, Jeff Layton wrote:
> > > > > > > > > > > > > > > On Wed, 2019-07-03 at 20:44 +0800, Yan, Zheng wrote:
> > > > > > > > > > > > > > > > This series add support for auto reconnect after blacklisted.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Auto reconnect is controlled by recover_session=<clean|no> mount option.
> > > > > > > > > > > > > > > > Clean mode is enabled by default. In this mode, client drops dirty date
> > > > > > > > > > > > > > > > and dirty metadata, All writable file handles are invalidated. Read-only
> > > > > > > > > > > > > > > > file handles continue to work and caches are dropped if necessary.
> > > > > > > > > > > > > > > > If an inode contains any lost file lock, read and write are not allowed.
> > > > > > > > > > > > > > > > until all lost file locks are released.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Just giving this a quick glance:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Based on the last email discussion about this, I thought that you were
> > > > > > > > > > > > > > > going to provide a mount option that someone could enable that would
> > > > > > > > > > > > > > > basically allow the client to "soldier on" in the face of being
> > > > > > > > > > > > > > > blacklisted and then unblacklisted, without needing to remount anything.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > This set seems to keep the force_reconnect option (patch #7) though, so
> > > > > > > > > > > > > > > I'm quite confused at this point. What exactly is the goal of here?
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > because auto reconnect can be disabled, force_reconnect is the manual
> > > > > > > > > > > > > > way to fix blacklistd mount.
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > Why not instead allow remounting with a different recover_session= mode?
> > > > > > > > > > > > > Then you wouldn't need this option that's only valid during a remount.
> > > > > > > > > > > > > That seems like a more natural way to use a new mount option.
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > you mean something like 'recover_session=now' for remount?
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > No, I meant something like:
> > > > > > > > > > >
> > > > > > > > > > >     -o remount,recover_session=brute
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > This is confusing. user may just want to change auto reconnect mode
> > > > > > > > > > for backlist event in the future, does not want to force reconnect.
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > Why do we need to allow the admin to manually force a reconnect? If you
> > > > > > > > > (hypothetically) change the mode to "brute" then it should do it on its
> > > > > > > > > own when it detects that it's in this situation, no?
> > > > > > > > >
> > > > > > > >
> > > > > > > > First, auto reconnect is limited to once every 30 seconds. Second,
> > > > > > > > client may fail to detect that itself is blacklisted. So I think we
> > > > > > > > still need a way to force client to reconnect
> > > > > > > >
> > > > > > >
> > > > > > > How does it detect that it has been blacklisted? Does it do that by
> > > > > > > looking at the OSD maps? I'd like to better understand how the client
> > > > > > > would recognize this automatically and why it might miss it.
> > > > > > >
> > > > > >
> > > > > > By checking osd request reply and session reject message from mds.
> > > > > >
> > > > >
> > > > > Ok, so is the issue is that the client may become blacklisted and
> > > > > unblacklisted before it sends anything to either server?
> > > > >
> > > >
> > > > No. The issue is that old version mds does not send session reject
> > > > message or no 'error_str=blacklisted' in session reject message.
> > >
> > > Is that the only way to detect that this has happened? What if we were
> > > to simply force a reconnect on any remount? Would that break anything?
> > >
> >
> > why?  reconnect causes all sorts of integrity issues
> >
>
> Care to elaborate?
>
> My understanding was that the fact that the MDS journaled everything
> meant that the client would be able to reclaim all of its state if the
> MDS crashed and restarted, or we had a momentary loss of connection. Is
> that not the case?
>
> Either way, remounts should be _very_ rare events, almost always
> performed manually by an administrator. I suggested this under the
> assumption that an immediate reconnection might just be a small blip in
> performance. If there are data integrity issues when this occurs then
> that seems like a bigger problem.

If reconnect means 're-open mds sessions',  mds lose track of caps and
file locks after reconnect.  It's similar to the situation that client
get blacklisted.

> --
> Jeff Layton <jlayton@redhat.com>
>

Jeff Layton July 9, 2019, 2:17 p.m. UTC | #18

On Tue, 2019-07-09 at 21:31 +0800, Yan, Zheng wrote:
> On Tue, Jul 9, 2019 at 6:18 PM Jeff Layton <jlayton@redhat.com> wrote:
> > On Tue, 2019-07-09 at 10:14 +0800, Yan, Zheng wrote:
> > > On Mon, Jul 8, 2019 at 9:45 PM Jeff Layton <jlayton@redhat.com> wrote:
> > > > On Mon, 2019-07-08 at 19:55 +0800, Yan, Zheng wrote:
> > > > > On Mon, Jul 8, 2019 at 7:43 PM Jeff Layton <jlayton@redhat.com> wrote:
> > > > > > On Mon, 2019-07-08 at 19:34 +0800, Yan, Zheng wrote:
> > > > > > > On Mon, Jul 8, 2019 at 6:59 PM Jeff Layton <jlayton@redhat.com> wrote:
> > > > > > > > On Mon, 2019-07-08 at 16:43 +0800, Yan, Zheng wrote:
> > > > > > > > > On Fri, Jul 5, 2019 at 9:22 PM Jeff Layton <jlayton@redhat.com> wrote:
> > > > > > > > > > On Fri, 2019-07-05 at 19:26 +0800, Yan, Zheng wrote:
> > > > > > > > > > > On Fri, Jul 5, 2019 at 6:16 PM Jeff Layton <jlayton@redhat.com> wrote:
> > > > > > > > > > > > On Fri, 2019-07-05 at 09:17 +0800, Yan, Zheng wrote:
> > > > > > > > > > > > > On Thu, Jul 4, 2019 at 10:30 PM Jeff Layton <jlayton@redhat.com> wrote:
> > > > > > > > > > > > > > On Thu, 2019-07-04 at 09:30 +0800, Yan, Zheng wrote:
> > > > > > > > > > > > > > > On 7/4/19 12:01 AM, Jeff Layton wrote:
> > > > > > > > > > > > > > > > On Wed, 2019-07-03 at 20:44 +0800, Yan, Zheng wrote:
> > > > > > > > > > > > > > > > > This series add support for auto reconnect after blacklisted.
> > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > Auto reconnect is controlled by recover_session=<clean|no> mount option.
> > > > > > > > > > > > > > > > > Clean mode is enabled by default. In this mode, client drops dirty date
> > > > > > > > > > > > > > > > > and dirty metadata, All writable file handles are invalidated. Read-only
> > > > > > > > > > > > > > > > > file handles continue to work and caches are dropped if necessary.
> > > > > > > > > > > > > > > > > If an inode contains any lost file lock, read and write are not allowed.
> > > > > > > > > > > > > > > > > until all lost file locks are released.
> > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > Just giving this a quick glance:
> > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > Based on the last email discussion about this, I thought that you were
> > > > > > > > > > > > > > > > going to provide a mount option that someone could enable that would
> > > > > > > > > > > > > > > > basically allow the client to "soldier on" in the face of being
> > > > > > > > > > > > > > > > blacklisted and then unblacklisted, without needing to remount anything.
> > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > This set seems to keep the force_reconnect option (patch #7) though, so
> > > > > > > > > > > > > > > > I'm quite confused at this point. What exactly is the goal of here?
> > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > because auto reconnect can be disabled, force_reconnect is the manual
> > > > > > > > > > > > > > > way to fix blacklistd mount.
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > Why not instead allow remounting with a different recover_session= mode?
> > > > > > > > > > > > > > Then you wouldn't need this option that's only valid during a remount.
> > > > > > > > > > > > > > That seems like a more natural way to use a new mount option.
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > 
> > > > > > > > > > > > > you mean something like 'recover_session=now' for remount?
> > > > > > > > > > > > > 
> > > > > > > > > > > > > 
> > > > > > > > > > > > 
> > > > > > > > > > > > No, I meant something like:
> > > > > > > > > > > > 
> > > > > > > > > > > >     -o remount,recover_session=brute
> > > > > > > > > > > > 
> > > > > > > > > > > 
> > > > > > > > > > > This is confusing. user may just want to change auto reconnect mode
> > > > > > > > > > > for backlist event in the future, does not want to force reconnect.
> > > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > Why do we need to allow the admin to manually force a reconnect? If you
> > > > > > > > > > (hypothetically) change the mode to "brute" then it should do it on its
> > > > > > > > > > own when it detects that it's in this situation, no?
> > > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > First, auto reconnect is limited to once every 30 seconds. Second,
> > > > > > > > > client may fail to detect that itself is blacklisted. So I think we
> > > > > > > > > still need a way to force client to reconnect
> > > > > > > > > 
> > > > > > > > 
> > > > > > > > How does it detect that it has been blacklisted? Does it do that by
> > > > > > > > looking at the OSD maps? I'd like to better understand how the client
> > > > > > > > would recognize this automatically and why it might miss it.
> > > > > > > > 
> > > > > > > 
> > > > > > > By checking osd request reply and session reject message from mds.
> > > > > > > 
> > > > > > 
> > > > > > Ok, so is the issue is that the client may become blacklisted and
> > > > > > unblacklisted before it sends anything to either server?
> > > > > > 
> > > > > 
> > > > > No. The issue is that old version mds does not send session reject
> > > > > message or no 'error_str=blacklisted' in session reject message.
> > > > 
> > > > Is that the only way to detect that this has happened? What if we were
> > > > to simply force a reconnect on any remount? Would that break anything?
> > > > 
> > > 
> > > why?  reconnect causes all sorts of integrity issues
> > > 
> > 
> > Care to elaborate?
> > 
> > My understanding was that the fact that the MDS journaled everything
> > meant that the client would be able to reclaim all of its state if the
> > MDS crashed and restarted, or we had a momentary loss of connection. Is
> > that not the case?
> > 
> > Either way, remounts should be _very_ rare events, almost always
> > performed manually by an administrator. I suggested this under the
> > assumption that an immediate reconnection might just be a small blip in
> > performance. If there are data integrity issues when this occurs then
> > that seems like a bigger problem.
> 
> If reconnect means 're-open mds sessions',  mds lose track of caps and
> file locks after reconnect.  It's similar to the situation that client
> get blacklisted.
> 

I don't have a great grasp of the way state recovery works with cephfs,
so please bear with me here...

Suppose I have a client with a bunch of caps and file locks, and the MDS
crashes and is restarted. Will the client be able to reclaim those
caps/locks in some fashion? If so, how is that different from the
situation where the client reconnects its session spuriously?

I'm quite leery of giving admins a knob that may cause data integrity
problems. No other network filesystem requires something like this
force_reconnect button, so I'm rather interested to see if we can come
up with a more conventional way to achieve what you want.

Yan, Zheng July 9, 2019, 3:09 p.m. UTC | #19

On Tue, Jul 9, 2019 at 10:17 PM Jeff Layton <jlayton@redhat.com> wrote:
>
> On Tue, 2019-07-09 at 21:31 +0800, Yan, Zheng wrote:
> > On Tue, Jul 9, 2019 at 6:18 PM Jeff Layton <jlayton@redhat.com> wrote:
> > > On Tue, 2019-07-09 at 10:14 +0800, Yan, Zheng wrote:
> > > > On Mon, Jul 8, 2019 at 9:45 PM Jeff Layton <jlayton@redhat.com> wrote:
> > > > > On Mon, 2019-07-08 at 19:55 +0800, Yan, Zheng wrote:
> > > > > > On Mon, Jul 8, 2019 at 7:43 PM Jeff Layton <jlayton@redhat.com> wrote:
> > > > > > > On Mon, 2019-07-08 at 19:34 +0800, Yan, Zheng wrote:
> > > > > > > > On Mon, Jul 8, 2019 at 6:59 PM Jeff Layton <jlayton@redhat.com> wrote:
> > > > > > > > > On Mon, 2019-07-08 at 16:43 +0800, Yan, Zheng wrote:
> > > > > > > > > > On Fri, Jul 5, 2019 at 9:22 PM Jeff Layton <jlayton@redhat.com> wrote:
> > > > > > > > > > > On Fri, 2019-07-05 at 19:26 +0800, Yan, Zheng wrote:
> > > > > > > > > > > > On Fri, Jul 5, 2019 at 6:16 PM Jeff Layton <jlayton@redhat.com> wrote:
> > > > > > > > > > > > > On Fri, 2019-07-05 at 09:17 +0800, Yan, Zheng wrote:
> > > > > > > > > > > > > > On Thu, Jul 4, 2019 at 10:30 PM Jeff Layton <jlayton@redhat.com> wrote:
> > > > > > > > > > > > > > > On Thu, 2019-07-04 at 09:30 +0800, Yan, Zheng wrote:
> > > > > > > > > > > > > > > > On 7/4/19 12:01 AM, Jeff Layton wrote:
> > > > > > > > > > > > > > > > > On Wed, 2019-07-03 at 20:44 +0800, Yan, Zheng wrote:
> > > > > > > > > > > > > > > > > > This series add support for auto reconnect after blacklisted.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Auto reconnect is controlled by recover_session=<clean|no> mount option.
> > > > > > > > > > > > > > > > > > Clean mode is enabled by default. In this mode, client drops dirty date
> > > > > > > > > > > > > > > > > > and dirty metadata, All writable file handles are invalidated. Read-only
> > > > > > > > > > > > > > > > > > file handles continue to work and caches are dropped if necessary.
> > > > > > > > > > > > > > > > > > If an inode contains any lost file lock, read and write are not allowed.
> > > > > > > > > > > > > > > > > > until all lost file locks are released.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Just giving this a quick glance:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Based on the last email discussion about this, I thought that you were
> > > > > > > > > > > > > > > > > going to provide a mount option that someone could enable that would
> > > > > > > > > > > > > > > > > basically allow the client to "soldier on" in the face of being
> > > > > > > > > > > > > > > > > blacklisted and then unblacklisted, without needing to remount anything.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > This set seems to keep the force_reconnect option (patch #7) though, so
> > > > > > > > > > > > > > > > > I'm quite confused at this point. What exactly is the goal of here?
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > because auto reconnect can be disabled, force_reconnect is the manual
> > > > > > > > > > > > > > > > way to fix blacklistd mount.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Why not instead allow remounting with a different recover_session= mode?
> > > > > > > > > > > > > > > Then you wouldn't need this option that's only valid during a remount.
> > > > > > > > > > > > > > > That seems like a more natural way to use a new mount option.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > you mean something like 'recover_session=now' for remount?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > No, I meant something like:
> > > > > > > > > > > > >
> > > > > > > > > > > > >     -o remount,recover_session=brute
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > This is confusing. user may just want to change auto reconnect mode
> > > > > > > > > > > > for backlist event in the future, does not want to force reconnect.
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Why do we need to allow the admin to manually force a reconnect? If you
> > > > > > > > > > > (hypothetically) change the mode to "brute" then it should do it on its
> > > > > > > > > > > own when it detects that it's in this situation, no?
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > First, auto reconnect is limited to once every 30 seconds. Second,
> > > > > > > > > > client may fail to detect that itself is blacklisted. So I think we
> > > > > > > > > > still need a way to force client to reconnect
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > How does it detect that it has been blacklisted? Does it do that by
> > > > > > > > > looking at the OSD maps? I'd like to better understand how the client
> > > > > > > > > would recognize this automatically and why it might miss it.
> > > > > > > > >
> > > > > > > >
> > > > > > > > By checking osd request reply and session reject message from mds.
> > > > > > > >
> > > > > > >
> > > > > > > Ok, so is the issue is that the client may become blacklisted and
> > > > > > > unblacklisted before it sends anything to either server?
> > > > > > >
> > > > > >
> > > > > > No. The issue is that old version mds does not send session reject
> > > > > > message or no 'error_str=blacklisted' in session reject message.
> > > > >
> > > > > Is that the only way to detect that this has happened? What if we were
> > > > > to simply force a reconnect on any remount? Would that break anything?
> > > > >
> > > >
> > > > why?  reconnect causes all sorts of integrity issues
> > > >
> > >
> > > Care to elaborate?
> > >
> > > My understanding was that the fact that the MDS journaled everything
> > > meant that the client would be able to reclaim all of its state if the
> > > MDS crashed and restarted, or we had a momentary loss of connection. Is
> > > that not the case?
> > >
> > > Either way, remounts should be _very_ rare events, almost always
> > > performed manually by an administrator. I suggested this under the
> > > assumption that an immediate reconnection might just be a small blip in
> > > performance. If there are data integrity issues when this occurs then
> > > that seems like a bigger problem.
> >
> > If reconnect means 're-open mds sessions',  mds lose track of caps and
> > file locks after reconnect.  It's similar to the situation that client
> > get blacklisted.
> >
>
> I don't have a great grasp of the way state recovery works with cephfs,
> so please bear with me here...
>
> Suppose I have a client with a bunch of caps and file locks, and the MDS
> crashes and is restarted. Will the client be able to reclaim those
> caps/locks in some fashion? If so, how is that different from the
> situation where the client reconnects its session spuriously?
>

client can only reclaim caps/locks when mds is in reconnect state.
'rre-open sessions' is likely to happen when mds is active.

> I'm quite leery of giving admins a knob that may cause data integrity
> problems. No other network filesystem requires something like this
> force_reconnect button, so I'm rather interested to see if we can come
> up with a more conventional way to achieve what you want.


> --
> Jeff Layton <jlayton@redhat.com>
>

Yan, Zheng July 10, 2019, 2:07 a.m. UTC | #20

On Tue, Jul 9, 2019 at 10:17 PM Jeff Layton <jlayton@redhat.com> wrote:
>
> On Tue, 2019-07-09 at 21:31 +0800, Yan, Zheng wrote:
> > On Tue, Jul 9, 2019 at 6:18 PM Jeff Layton <jlayton@redhat.com> wrote:
> > > On Tue, 2019-07-09 at 10:14 +0800, Yan, Zheng wrote:
> > > > On Mon, Jul 8, 2019 at 9:45 PM Jeff Layton <jlayton@redhat.com> wrote:
> > > > > On Mon, 2019-07-08 at 19:55 +0800, Yan, Zheng wrote:
> > > > > > On Mon, Jul 8, 2019 at 7:43 PM Jeff Layton <jlayton@redhat.com> wrote:
> > > > > > > On Mon, 2019-07-08 at 19:34 +0800, Yan, Zheng wrote:
> > > > > > > > On Mon, Jul 8, 2019 at 6:59 PM Jeff Layton <jlayton@redhat.com> wrote:
> > > > > > > > > On Mon, 2019-07-08 at 16:43 +0800, Yan, Zheng wrote:
> > > > > > > > > > On Fri, Jul 5, 2019 at 9:22 PM Jeff Layton <jlayton@redhat.com> wrote:
> > > > > > > > > > > On Fri, 2019-07-05 at 19:26 +0800, Yan, Zheng wrote:
> > > > > > > > > > > > On Fri, Jul 5, 2019 at 6:16 PM Jeff Layton <jlayton@redhat.com> wrote:
> > > > > > > > > > > > > On Fri, 2019-07-05 at 09:17 +0800, Yan, Zheng wrote:
> > > > > > > > > > > > > > On Thu, Jul 4, 2019 at 10:30 PM Jeff Layton <jlayton@redhat.com> wrote:
> > > > > > > > > > > > > > > On Thu, 2019-07-04 at 09:30 +0800, Yan, Zheng wrote:
> > > > > > > > > > > > > > > > On 7/4/19 12:01 AM, Jeff Layton wrote:
> > > > > > > > > > > > > > > > > On Wed, 2019-07-03 at 20:44 +0800, Yan, Zheng wrote:
> > > > > > > > > > > > > > > > > > This series add support for auto reconnect after blacklisted.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Auto reconnect is controlled by recover_session=<clean|no> mount option.
> > > > > > > > > > > > > > > > > > Clean mode is enabled by default. In this mode, client drops dirty date
> > > > > > > > > > > > > > > > > > and dirty metadata, All writable file handles are invalidated. Read-only
> > > > > > > > > > > > > > > > > > file handles continue to work and caches are dropped if necessary.
> > > > > > > > > > > > > > > > > > If an inode contains any lost file lock, read and write are not allowed.
> > > > > > > > > > > > > > > > > > until all lost file locks are released.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Just giving this a quick glance:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Based on the last email discussion about this, I thought that you were
> > > > > > > > > > > > > > > > > going to provide a mount option that someone could enable that would
> > > > > > > > > > > > > > > > > basically allow the client to "soldier on" in the face of being
> > > > > > > > > > > > > > > > > blacklisted and then unblacklisted, without needing to remount anything.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > This set seems to keep the force_reconnect option (patch #7) though, so
> > > > > > > > > > > > > > > > > I'm quite confused at this point. What exactly is the goal of here?
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > because auto reconnect can be disabled, force_reconnect is the manual
> > > > > > > > > > > > > > > > way to fix blacklistd mount.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Why not instead allow remounting with a different recover_session= mode?
> > > > > > > > > > > > > > > Then you wouldn't need this option that's only valid during a remount.
> > > > > > > > > > > > > > > That seems like a more natural way to use a new mount option.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > you mean something like 'recover_session=now' for remount?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > No, I meant something like:
> > > > > > > > > > > > >
> > > > > > > > > > > > >     -o remount,recover_session=brute
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > This is confusing. user may just want to change auto reconnect mode
> > > > > > > > > > > > for backlist event in the future, does not want to force reconnect.
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Why do we need to allow the admin to manually force a reconnect? If you
> > > > > > > > > > > (hypothetically) change the mode to "brute" then it should do it on its
> > > > > > > > > > > own when it detects that it's in this situation, no?
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > First, auto reconnect is limited to once every 30 seconds. Second,
> > > > > > > > > > client may fail to detect that itself is blacklisted. So I think we
> > > > > > > > > > still need a way to force client to reconnect
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > How does it detect that it has been blacklisted? Does it do that by
> > > > > > > > > looking at the OSD maps? I'd like to better understand how the client
> > > > > > > > > would recognize this automatically and why it might miss it.
> > > > > > > > >
> > > > > > > >
> > > > > > > > By checking osd request reply and session reject message from mds.
> > > > > > > >
> > > > > > >
> > > > > > > Ok, so is the issue is that the client may become blacklisted and
> > > > > > > unblacklisted before it sends anything to either server?
> > > > > > >
> > > > > >
> > > > > > No. The issue is that old version mds does not send session reject
> > > > > > message or no 'error_str=blacklisted' in session reject message.
> > > > >
> > > > > Is that the only way to detect that this has happened? What if we were
> > > > > to simply force a reconnect on any remount? Would that break anything?
> > > > >
> > > >
> > > > why?  reconnect causes all sorts of integrity issues
> > > >
> > >
> > > Care to elaborate?
> > >
> > > My understanding was that the fact that the MDS journaled everything
> > > meant that the client would be able to reclaim all of its state if the
> > > MDS crashed and restarted, or we had a momentary loss of connection. Is
> > > that not the case?
> > >
> > > Either way, remounts should be _very_ rare events, almost always
> > > performed manually by an administrator. I suggested this under the
> > > assumption that an immediate reconnection might just be a small blip in
> > > performance. If there are data integrity issues when this occurs then
> > > that seems like a bigger problem.
> >
> > If reconnect means 're-open mds sessions',  mds lose track of caps and
> > file locks after reconnect.  It's similar to the situation that client
> > get blacklisted.
> >
>
> I don't have a great grasp of the way state recovery works with cephfs,
> so please bear with me here...
>
> Suppose I have a client with a bunch of caps and file locks, and the MDS
> crashes and is restarted. Will the client be able to reclaim those
> caps/locks in some fashion? If so, how is that different from the
> situation where the client reconnects its session spuriously?
>
> I'm quite leery of giving admins a knob that may cause data integrity
> problems. No other network filesystem requires something like this
> force_reconnect button, so I'm rather interested to see if we can come
> up with a more conventional way to achieve what you want.

I guess other network filesystems have loosy cache consistency. In
multiple client case, they don't guarantee the order of flushing dirty
data/metadata.  Cephfs can do the same (client retry flush dirty
data/metadata after reconnect) by disabling blacklist_on_eviction.

> --
> Jeff Layton <jlayton@redhat.com>
>

Jeff Layton July 10, 2019, 10:27 a.m. UTC | #21

On Wed, 2019-07-10 at 10:07 +0800, Yan, Zheng wrote:
> On Tue, Jul 9, 2019 at 10:17 PM Jeff Layton <jlayton@redhat.com> wrote:
> > On Tue, 2019-07-09 at 21:31 +0800, Yan, Zheng wrote:
> > > On Tue, Jul 9, 2019 at 6:18 PM Jeff Layton <jlayton@redhat.com> wrote:
> > > > On Tue, 2019-07-09 at 10:14 +0800, Yan, Zheng wrote:
> > > > > On Mon, Jul 8, 2019 at 9:45 PM Jeff Layton <jlayton@redhat.com> wrote:
> > > > > > On Mon, 2019-07-08 at 19:55 +0800, Yan, Zheng wrote:
> > > > > > > On Mon, Jul 8, 2019 at 7:43 PM Jeff Layton <jlayton@redhat.com> wrote:
> > > > > > > > On Mon, 2019-07-08 at 19:34 +0800, Yan, Zheng wrote:
> > > > > > > > > On Mon, Jul 8, 2019 at 6:59 PM Jeff Layton <jlayton@redhat.com> wrote:
> > > > > > > > > > On Mon, 2019-07-08 at 16:43 +0800, Yan, Zheng wrote:
> > > > > > > > > > > On Fri, Jul 5, 2019 at 9:22 PM Jeff Layton <jlayton@redhat.com> wrote:
> > > > > > > > > > > > On Fri, 2019-07-05 at 19:26 +0800, Yan, Zheng wrote:
> > > > > > > > > > > > > On Fri, Jul 5, 2019 at 6:16 PM Jeff Layton <jlayton@redhat.com> wrote:
> > > > > > > > > > > > > > On Fri, 2019-07-05 at 09:17 +0800, Yan, Zheng wrote:
> > > > > > > > > > > > > > > On Thu, Jul 4, 2019 at 10:30 PM Jeff Layton <jlayton@redhat.com> wrote:
> > > > > > > > > > > > > > > > On Thu, 2019-07-04 at 09:30 +0800, Yan, Zheng wrote:
> > > > > > > > > > > > > > > > > On 7/4/19 12:01 AM, Jeff Layton wrote:
> > > > > > > > > > > > > > > > > > On Wed, 2019-07-03 at 20:44 +0800, Yan, Zheng wrote:
> > > > > > > > > > > > > > > > > > > This series add support for auto reconnect after blacklisted.
> > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > Auto reconnect is controlled by recover_session=<clean|no> mount option.
> > > > > > > > > > > > > > > > > > > Clean mode is enabled by default. In this mode, client drops dirty date
> > > > > > > > > > > > > > > > > > > and dirty metadata, All writable file handles are invalidated. Read-only
> > > > > > > > > > > > > > > > > > > file handles continue to work and caches are dropped if necessary.
> > > > > > > > > > > > > > > > > > > If an inode contains any lost file lock, read and write are not allowed.
> > > > > > > > > > > > > > > > > > > until all lost file locks are released.
> > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > Just giving this a quick glance:
> > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > Based on the last email discussion about this, I thought that you were
> > > > > > > > > > > > > > > > > > going to provide a mount option that someone could enable that would
> > > > > > > > > > > > > > > > > > basically allow the client to "soldier on" in the face of being
> > > > > > > > > > > > > > > > > > blacklisted and then unblacklisted, without needing to remount anything.
> > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > This set seems to keep the force_reconnect option (patch #7) though, so
> > > > > > > > > > > > > > > > > > I'm quite confused at this point. What exactly is the goal of here?
> > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > because auto reconnect can be disabled, force_reconnect is the manual
> > > > > > > > > > > > > > > > > way to fix blacklistd mount.
> > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > Why not instead allow remounting with a different recover_session= mode?
> > > > > > > > > > > > > > > > Then you wouldn't need this option that's only valid during a remount.
> > > > > > > > > > > > > > > > That seems like a more natural way to use a new mount option.
> > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > you mean something like 'recover_session=now' for remount?
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > No, I meant something like:
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > >     -o remount,recover_session=brute
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > 
> > > > > > > > > > > > > This is confusing. user may just want to change auto reconnect mode
> > > > > > > > > > > > > for backlist event in the future, does not want to force reconnect.
> > > > > > > > > > > > > 
> > > > > > > > > > > > 
> > > > > > > > > > > > Why do we need to allow the admin to manually force a reconnect? If you
> > > > > > > > > > > > (hypothetically) change the mode to "brute" then it should do it on its
> > > > > > > > > > > > own when it detects that it's in this situation, no?
> > > > > > > > > > > > 
> > > > > > > > > > > 
> > > > > > > > > > > First, auto reconnect is limited to once every 30 seconds. Second,
> > > > > > > > > > > client may fail to detect that itself is blacklisted. So I think we
> > > > > > > > > > > still need a way to force client to reconnect
> > > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > How does it detect that it has been blacklisted? Does it do that by
> > > > > > > > > > looking at the OSD maps? I'd like to better understand how the client
> > > > > > > > > > would recognize this automatically and why it might miss it.
> > > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > By checking osd request reply and session reject message from mds.
> > > > > > > > > 
> > > > > > > > 
> > > > > > > > Ok, so is the issue is that the client may become blacklisted and
> > > > > > > > unblacklisted before it sends anything to either server?
> > > > > > > > 
> > > > > > > 
> > > > > > > No. The issue is that old version mds does not send session reject
> > > > > > > message or no 'error_str=blacklisted' in session reject message.
> > > > > > 
> > > > > > Is that the only way to detect that this has happened? What if we were
> > > > > > to simply force a reconnect on any remount? Would that break anything?
> > > > > > 
> > > > > 
> > > > > why?  reconnect causes all sorts of integrity issues
> > > > > 
> > > > 
> > > > Care to elaborate?
> > > > 
> > > > My understanding was that the fact that the MDS journaled everything
> > > > meant that the client would be able to reclaim all of its state if the
> > > > MDS crashed and restarted, or we had a momentary loss of connection. Is
> > > > that not the case?
> > > > 
> > > > Either way, remounts should be _very_ rare events, almost always
> > > > performed manually by an administrator. I suggested this under the
> > > > assumption that an immediate reconnection might just be a small blip in
> > > > performance. If there are data integrity issues when this occurs then
> > > > that seems like a bigger problem.
> > > 
> > > If reconnect means 're-open mds sessions',  mds lose track of caps and
> > > file locks after reconnect.  It's similar to the situation that client
> > > get blacklisted.
> > > 
> > 
> > I don't have a great grasp of the way state recovery works with cephfs,
> > so please bear with me here...
> > 
> > Suppose I have a client with a bunch of caps and file locks, and the MDS
> > crashes and is restarted. Will the client be able to reclaim those
> > caps/locks in some fashion? If so, how is that different from the
> > situation where the client reconnects its session spuriously?
> > 
> > I'm quite leery of giving admins a knob that may cause data integrity
> > problems. No other network filesystem requires something like this
> > force_reconnect button, so I'm rather interested to see if we can come
> > up with a more conventional way to achieve what you want.
> 
> I guess other network filesystems have loosy cache consistency. In
> multiple client case, they don't guarantee the order of flushing dirty
> data/metadata.  Cephfs can do the same (client retry flush dirty
> data/metadata after reconnect) by disabling blacklist_on_eviction.
> 

True, but I'm not sure that's really relevant here.
blacklist_on_eviction is an MDS parameter, so that would have to be
done for all clients or none of them, correct?

I think your earlier email convinced me that we _do_ need a knob of
some sort to force the client to reconnect when dealing with an older
MDS that doesn't allow the client to detect when it has been
blacklisted.

I think though that a mount option is probably the wrong place to do
that. Would it be possible to change that to done via some other
method? Maybe a new force_reconnect file under debugfs that the admin
could write to to trigger that?

To be clear, I think we do still want to have the recover_session=
mount option too -- the debugfs file (or whatever) would just be needed
to handle the legacy MDS case.

All that said, I'd still like to see a manpage update before we merge
any of this as well. Let's make sure we have the user interface correct
as that's the really difficult part to change later if we decide it's
not right.

[0/9] ceph: auto reconnect after blacklisted

Message

Comments