diff mbox series

[2/3] mdsmap: fix mdsmap cluster available check based on laggy number

Message ID 20191120082902.38666-3-xiubli@redhat.com (mailing list archive)
State New, archived
Headers show
Series mdsmap: fix mds choosing | expand

Commit Message

Xiubo Li Nov. 20, 2019, 8:29 a.m. UTC
From: Xiubo Li <xiubli@redhat.com>

In case the max_mds > 1 in MDS cluster and there is no any standby
MDS and all the max_mds MDSs are in up:active state, if one of the
up:active MDSs is dead, the m->m_num_laggy in kclient will be 1.
Then the mount will fail without considering other healthy MDSs.

Only when all the MDSs in the cluster are laggy will treat the
cluster as not be available.

Signed-off-by: Xiubo Li <xiubli@redhat.com>
---
 fs/ceph/mdsmap.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Jeff Layton Nov. 21, 2019, 5:30 p.m. UTC | #1
On Wed, 2019-11-20 at 03:29 -0500, xiubli@redhat.com wrote:
> From: Xiubo Li <xiubli@redhat.com>
> 
> In case the max_mds > 1 in MDS cluster and there is no any standby
> MDS and all the max_mds MDSs are in up:active state, if one of the
> up:active MDSs is dead, the m->m_num_laggy in kclient will be 1.
> Then the mount will fail without considering other healthy MDSs.
> 
> Only when all the MDSs in the cluster are laggy will treat the
> cluster as not be available.
> 
> Signed-off-by: Xiubo Li <xiubli@redhat.com>
> ---
>  fs/ceph/mdsmap.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/ceph/mdsmap.c b/fs/ceph/mdsmap.c
> index 471bac335fae..8b4f93e5b468 100644
> --- a/fs/ceph/mdsmap.c
> +++ b/fs/ceph/mdsmap.c
> @@ -396,7 +396,7 @@ bool ceph_mdsmap_is_cluster_available(struct ceph_mdsmap *m)
>  		return false;
>  	if (m->m_damaged)
>  		return false;
> -	if (m->m_num_laggy > 0)
> +	if (m->m_num_laggy == m->m_num_mds)
>  		return false;
>  	for (i = 0; i < m->m_num_mds; i++) {
>  		if (m->m_info[i].state == CEPH_MDS_STATE_ACTIVE)

Given that laggy servers are still expected to be "in" the cluster,
should we just eliminate this check altogether? It seems like we'd still
want to allow a mount to occur even if the cluster is lagging.
Xiubo Li Nov. 22, 2019, 6:56 a.m. UTC | #2
On 2019/11/22 1:30, Jeff Layton wrote:
> On Wed, 2019-11-20 at 03:29 -0500, xiubli@redhat.com wrote:
>> From: Xiubo Li <xiubli@redhat.com>
>>
>> In case the max_mds > 1 in MDS cluster and there is no any standby
>> MDS and all the max_mds MDSs are in up:active state, if one of the
>> up:active MDSs is dead, the m->m_num_laggy in kclient will be 1.
>> Then the mount will fail without considering other healthy MDSs.
>>
>> Only when all the MDSs in the cluster are laggy will treat the
>> cluster as not be available.
>>
>> Signed-off-by: Xiubo Li <xiubli@redhat.com>
>> ---
>>   fs/ceph/mdsmap.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/fs/ceph/mdsmap.c b/fs/ceph/mdsmap.c
>> index 471bac335fae..8b4f93e5b468 100644
>> --- a/fs/ceph/mdsmap.c
>> +++ b/fs/ceph/mdsmap.c
>> @@ -396,7 +396,7 @@ bool ceph_mdsmap_is_cluster_available(struct ceph_mdsmap *m)
>>   		return false;
>>   	if (m->m_damaged)
>>   		return false;
>> -	if (m->m_num_laggy > 0)
>> +	if (m->m_num_laggy == m->m_num_mds)
>>   		return false;
>>   	for (i = 0; i < m->m_num_mds; i++) {
>>   		if (m->m_info[i].state == CEPH_MDS_STATE_ACTIVE)
> Given that laggy servers are still expected to be "in" the cluster,
> should we just eliminate this check altogether? It seems like we'd still
> want to allow a mount to occur even if the cluster is lagging.

For this we need one way to distinguish between mds crash and transient 
mds laggy, for now in both cases the mds will keep staying "in" the 
cluster and be in "up:active & laggy" state.
Jeff Layton Nov. 22, 2019, 1:55 p.m. UTC | #3
On Fri, 2019-11-22 at 14:56 +0800, Xiubo Li wrote:
> On 2019/11/22 1:30, Jeff Layton wrote:
> > On Wed, 2019-11-20 at 03:29 -0500, xiubli@redhat.com wrote:
> > > From: Xiubo Li <xiubli@redhat.com>
> > > 
> > > In case the max_mds > 1 in MDS cluster and there is no any standby
> > > MDS and all the max_mds MDSs are in up:active state, if one of the
> > > up:active MDSs is dead, the m->m_num_laggy in kclient will be 1.
> > > Then the mount will fail without considering other healthy MDSs.
> > > 
> > > Only when all the MDSs in the cluster are laggy will treat the
> > > cluster as not be available.
> > > 
> > > Signed-off-by: Xiubo Li <xiubli@redhat.com>
> > > ---
> > >   fs/ceph/mdsmap.c | 2 +-
> > >   1 file changed, 1 insertion(+), 1 deletion(-)
> > > 
> > > diff --git a/fs/ceph/mdsmap.c b/fs/ceph/mdsmap.c
> > > index 471bac335fae..8b4f93e5b468 100644
> > > --- a/fs/ceph/mdsmap.c
> > > +++ b/fs/ceph/mdsmap.c
> > > @@ -396,7 +396,7 @@ bool ceph_mdsmap_is_cluster_available(struct ceph_mdsmap *m)
> > >   		return false;
> > >   	if (m->m_damaged)
> > >   		return false;
> > > -	if (m->m_num_laggy > 0)
> > > +	if (m->m_num_laggy == m->m_num_mds)
> > >   		return false;
> > >   	for (i = 0; i < m->m_num_mds; i++) {
> > >   		if (m->m_info[i].state == CEPH_MDS_STATE_ACTIVE)
> > Given that laggy servers are still expected to be "in" the cluster,
> > should we just eliminate this check altogether? It seems like we'd still
> > want to allow a mount to occur even if the cluster is lagging.
> 
> For this we need one way to distinguish between mds crash and transient 
> mds laggy, for now in both cases the mds will keep staying "in" the 
> cluster and be in "up:active & laggy" state.

I would doubt there's any way to do that reliably, and in any case
detection of that state will always involve some delay.

ceph_mdsmap_is_cluster_available() is only called when mounting though.
We wouldn't want to choose a laggy server over one that isn't, but I
don't think we want to fail to mount just because all of the servers
appear to be laggy. We should consider such servers to be potentially
available but not preferred.
Xiubo Li Nov. 24, 2019, 1:41 a.m. UTC | #4
On 2019/11/22 21:55, Jeff Layton wrote:
> On Fri, 2019-11-22 at 14:56 +0800, Xiubo Li wrote:
>> On 2019/11/22 1:30, Jeff Layton wrote:
>>> On Wed, 2019-11-20 at 03:29 -0500, xiubli@redhat.com wrote:
>>>> From: Xiubo Li <xiubli@redhat.com>
>>>>
>>>> In case the max_mds > 1 in MDS cluster and there is no any standby
>>>> MDS and all the max_mds MDSs are in up:active state, if one of the
>>>> up:active MDSs is dead, the m->m_num_laggy in kclient will be 1.
>>>> Then the mount will fail without considering other healthy MDSs.
>>>>
>>>> Only when all the MDSs in the cluster are laggy will treat the
>>>> cluster as not be available.
>>>>
>>>> Signed-off-by: Xiubo Li <xiubli@redhat.com>
>>>> ---
>>>>    fs/ceph/mdsmap.c | 2 +-
>>>>    1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/fs/ceph/mdsmap.c b/fs/ceph/mdsmap.c
>>>> index 471bac335fae..8b4f93e5b468 100644
>>>> --- a/fs/ceph/mdsmap.c
>>>> +++ b/fs/ceph/mdsmap.c
>>>> @@ -396,7 +396,7 @@ bool ceph_mdsmap_is_cluster_available(struct ceph_mdsmap *m)
>>>>    		return false;
>>>>    	if (m->m_damaged)
>>>>    		return false;
>>>> -	if (m->m_num_laggy > 0)
>>>> +	if (m->m_num_laggy == m->m_num_mds)
>>>>    		return false;
>>>>    	for (i = 0; i < m->m_num_mds; i++) {
>>>>    		if (m->m_info[i].state == CEPH_MDS_STATE_ACTIVE)
>>> Given that laggy servers are still expected to be "in" the cluster,
>>> should we just eliminate this check altogether? It seems like we'd still
>>> want to allow a mount to occur even if the cluster is lagging.
>> For this we need one way to distinguish between mds crash and transient
>> mds laggy, for now in both cases the mds will keep staying "in" the
>> cluster and be in "up:active & laggy" state.
> I would doubt there's any way to do that reliably, and in any case
> detection of that state will always involve some delay.

Yeah, checked it and It seems will be.

>
> ceph_mdsmap_is_cluster_available() is only called when mounting though.
> We wouldn't want to choose a laggy server over one that isn't, but I
> don't think we want to fail to mount just because all of the servers
> appear to be laggy. We should consider such servers to be potentially
> available but not preferred.

This makes sense.

BRs
diff mbox series

Patch

diff --git a/fs/ceph/mdsmap.c b/fs/ceph/mdsmap.c
index 471bac335fae..8b4f93e5b468 100644
--- a/fs/ceph/mdsmap.c
+++ b/fs/ceph/mdsmap.c
@@ -396,7 +396,7 @@  bool ceph_mdsmap_is_cluster_available(struct ceph_mdsmap *m)
 		return false;
 	if (m->m_damaged)
 		return false;
-	if (m->m_num_laggy > 0)
+	if (m->m_num_laggy == m->m_num_mds)
 		return false;
 	for (i = 0; i < m->m_num_mds; i++) {
 		if (m->m_info[i].state == CEPH_MDS_STATE_ACTIVE)