[0/3] mdsmap: fix mds choosing

Message ID 20191120082902.38666-1-xiubli@redhat.com (mailing list archive)

Message

Xiubo Li Nov. 20, 2019, 8:28 a.m. UTC
From: Xiubo Li <xiubli@redhat.com>

Xiubo Li (3):
  mdsmap: add more debug info when decoding
  mdsmap: fix mdsmap cluster available check based on laggy number
  mdsmap: only choose one MDS who is in up:active state without laggy

 fs/ceph/mds_client.c |  6 ++++--
 fs/ceph/mdsmap.c     | 27 ++++++++++++++++++---------
 2 files changed, 22 insertions(+), 11 deletions(-)

Comments

Jeffrey Layton Nov. 20, 2019, 1:50 p.m. UTC | #1
On Wed, 2019-11-20 at 03:28 -0500, xiubli@redhat.com wrote:
> From: Xiubo Li <xiubli@redhat.com>
> 
> Xiubo Li (3):
>   mdsmap: add more debug info when decoding
>   mdsmap: fix mdsmap cluster available check based on laggy number
>   mdsmap: only choose one MDS who is in up:active state without laggy
> 
>  fs/ceph/mds_client.c |  6 ++++--
>  fs/ceph/mdsmap.c     | 27 ++++++++++++++++++---------
>  2 files changed, 22 insertions(+), 11 deletions(-)
> 

These all look good to me. I'll plan to merge them for v5.5, unless
anyone else sees issues with them.

Thanks!

Yan, Zheng Nov. 21, 2019, 2:42 a.m. UTC | #2
On 11/20/19 9:50 PM, Jeff Layton wrote:
> On Wed, 2019-11-20 at 03:28 -0500, xiubli@redhat.com wrote:
>> From: Xiubo Li <xiubli@redhat.com>
>>
>> Xiubo Li (3):
>>    mdsmap: add more debug info when decoding
>>    mdsmap: fix mdsmap cluster available check based on laggy number
>>    mdsmap: only choose one MDS who is in up:active state without laggy
>>
>>   fs/ceph/mds_client.c |  6 ++++--
>>   fs/ceph/mdsmap.c     | 27 ++++++++++++++++++---------
>>   2 files changed, 22 insertions(+), 11 deletions(-)
>>
> 
> These all look good to me. I'll plan to merge them for v5.5, unless
> anyone else sees issues with them.
> 
> Thanks!
> 

The main problem with this series is that we need to distinguish between an
MDS crash and a transiently laggy MDS.

Xiubo Li Nov. 21, 2019, 11:28 a.m. UTC | #3
On 2019/11/21 10:42, Yan, Zheng wrote:
> On 11/20/19 9:50 PM, Jeff Layton wrote:
>> On Wed, 2019-11-20 at 03:28 -0500, xiubli@redhat.com wrote:
>>> From: Xiubo Li <xiubli@redhat.com>
>>>
>>> Xiubo Li (3):
>>>    mdsmap: add more debug info when decoding
>>>    mdsmap: fix mdsmap cluster available check based on laggy number
>>>    mdsmap: only choose one MDS who is in up:active state without laggy
>>>
>>>   fs/ceph/mds_client.c |  6 ++++--
>>>   fs/ceph/mdsmap.c     | 27 ++++++++++++++++++---------
>>>   2 files changed, 22 insertions(+), 11 deletions(-)
>>>
>>
>> These all look good to me. I'll plan to merge them for v5.5, unless
>> anyone else sees issues with them.
>>
>> Thanks!
>>
>
> The main problem with this series is that we need to distinguish between an
> MDS crash and a transiently laggy MDS.

How about we first try to find an MDS that is up:active and not laggy and, if
we can't find one, fall back to one that is up:active but laggy?

For the auth mds case, we will ignore the laggy stuff.
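
The fallback order proposed above could be sketched roughly as follows. This
is a minimal standalone illustration under stated assumptions, not the kernel
code: `struct mds_entry`, `enum mds_state`, and `choose_mds()` are hypothetical
names, and the sketch returns the first match where real code might pick one
at random. The auth MDS case, where the laggy state is ignored, is not handled
here.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; not the kernel's mdsmap structures. */
enum mds_state { MDS_STATE_STANDBY, MDS_STATE_ACTIVE };

struct mds_entry {
	enum mds_state state;
	bool laggy;
};

/*
 * Return the index of a usable MDS, or -1 if none is up:active.
 * Pass 0 accepts only non-laggy active MDSs; pass 1 also accepts laggy ones.
 */
static int choose_mds(const struct mds_entry *mds, size_t n)
{
	for (int allow_laggy = 0; allow_laggy <= 1; allow_laggy++) {
		for (size_t i = 0; i < n; i++) {
			if (mds[i].state != MDS_STATE_ACTIVE)
				continue;	/* skip anything not up:active */
			if (mds[i].laggy && !allow_laggy)
				continue;	/* laggy only allowed on the fallback pass */
			return (int)i;
		}
	}
	return -1;
}
```

With this ordering, a laggy but active MDS is only selected when no healthy
active MDS exists, so a transient lag does not make the cluster look
unavailable.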


BRs

Jeffrey Layton Nov. 21, 2019, 5:28 p.m. UTC | #4
On Thu, 2019-11-21 at 19:28 +0800, Xiubo Li wrote:
> On 2019/11/21 10:42, Yan, Zheng wrote:
> > On 11/20/19 9:50 PM, Jeff Layton wrote:
> > > On Wed, 2019-11-20 at 03:28 -0500, xiubli@redhat.com wrote:
> > > > From: Xiubo Li <xiubli@redhat.com>
> > > > 
> > > > Xiubo Li (3):
> > > >    mdsmap: add more debug info when decoding
> > > >    mdsmap: fix mdsmap cluster available check based on laggy number
> > > >    mdsmap: only choose one MDS who is in up:active state without laggy
> > > > 
> > > >   fs/ceph/mds_client.c |  6 ++++--
> > > >   fs/ceph/mdsmap.c     | 27 ++++++++++++++++++---------
> > > >   2 files changed, 22 insertions(+), 11 deletions(-)
> > > > 
> > > 
> > > These all look good to me. I'll plan to merge them for v5.5, unless
> > > anyone else sees issues with them.
> > > 
> > > Thanks!
> > > 
> > 
> > The main problem with this series is that we need to distinguish between an
> > MDS crash and a transiently laggy MDS.
> 
> How about we first try to find an MDS that is up:active and not laggy and, if
> we can't find one, fall back to one that is up:active but laggy?
> 
> For the auth mds case, we will ignore the laggy stuff.
> 

Ok. I've dropped this series for now with the expectation that you'll
re-post when you have something ready.

Cheers,