[v3] ceph: check availability of mds cluster on mount after wait timeout

Message ID 20191211012940.18128-1-xiubli@redhat.com (mailing list archive)
State New, archived

Commit Message

Xiubo Li Dec. 11, 2019, 1:29 a.m. UTC
From: Xiubo Li <xiubli@redhat.com>

If all the MDS daemons are down when a mount is attempted for the
first time, the mount fails with an I/O error after the mount request
times out.

Likewise, if the cluster suddenly becomes laggy and the mount request
is fired off just before the kclient gets the new mdsmap, the mount
also fails with an I/O error.

Add a useful hint message by checking the cluster state before
failing the mount operation.

Signed-off-by: Xiubo Li <xiubli@redhat.com>
---

V3:
- Rebase to the new mount API version.

 fs/ceph/mds_client.c | 3 +--
 fs/ceph/super.c      | 5 +++++
 2 files changed, 6 insertions(+), 2 deletions(-)
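
The error remapping described above can be modeled in userspace. This is only a rough sketch: struct fake_mdsmap, cluster_available() and mount_error_path() are hypothetical stand-ins invented for illustration, and the real ceph_mdsmap_is_cluster_available() inspects more mdsmap state than the two fields assumed here.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical stand-in for the kernel's mdsmap. It only records how many
 * MDS ranks are up and whether the map is considered laggy.
 */
struct fake_mdsmap {
	int num_up;
	bool laggy;
};

/* Assumed, simplified version of the availability check. */
static bool cluster_available(const struct fake_mdsmap *m)
{
	return m && m->num_up > 0 && !m->laggy;
}

/*
 * Models the failure path of the mount: if the cluster is not available,
 * print the hint and replace whatever error was pending with
 * -EHOSTUNREACH. Otherwise the original error is returned unchanged.
 */
static int mount_error_path(const struct fake_mdsmap *m, int err)
{
	if (!cluster_available(m)) {
		fprintf(stderr, "No mds server is up or the cluster is laggy\n");
		err = -EHOSTUNREACH;
	}
	return err;
}
```

With this model, a timed-out mount (-EIO) against a map with no MDS up comes back as -EHOSTUNREACH, while the same failure against a healthy map keeps its original error code.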

Comments

Jeff Layton Dec. 11, 2019, 1:17 p.m. UTC | #1
On Tue, 2019-12-10 at 20:29 -0500, xiubli@redhat.com wrote:
> From: Xiubo Li <xiubli@redhat.com>
> 
> If all the MDS daemons are down when a mount is attempted for the
> first time, the mount fails with an I/O error after the mount request
> times out.
> 
> Likewise, if the cluster suddenly becomes laggy and the mount request
> is fired off just before the kclient gets the new mdsmap, the mount
> also fails with an I/O error.
> 
> Add a useful hint message by checking the cluster state before
> failing the mount operation.
> 
> Signed-off-by: Xiubo Li <xiubli@redhat.com>
> ---
> 
> V3:
> - Rebase to the new mount API version.
> 
>  fs/ceph/mds_client.c | 3 +--
>  fs/ceph/super.c      | 5 +++++
>  2 files changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
> index 7d3ec051f179..bf507120659e 100644
> --- a/fs/ceph/mds_client.c
> +++ b/fs/ceph/mds_client.c
> @@ -2576,8 +2576,7 @@ static void __do_request(struct ceph_mds_client *mdsc,
>  		if (!(mdsc->fsc->mount_options->flags &
>  		      CEPH_MOUNT_OPT_MOUNTWAIT) &&
>  		    !ceph_mdsmap_is_cluster_available(mdsc->mdsmap)) {
> -			err = -ENOENT;
> -			pr_info("probably no mds server is up\n");
> +			err = -EHOSTUNREACH;
>  			goto finish;
>  		}
>  	}
> diff --git a/fs/ceph/super.c b/fs/ceph/super.c
> index 9c9a7c68eea3..6f33a265ccf1 100644
> --- a/fs/ceph/super.c
> +++ b/fs/ceph/super.c
> @@ -1068,6 +1068,11 @@ static int ceph_get_tree(struct fs_context *fc)
>  	return 0;
>  
>  out_splat:
> +	if (!ceph_mdsmap_is_cluster_available(fsc->mdsc->mdsmap)) {
> +		pr_info("No mds server is up or the cluster is laggy\n");
> +		err = -EHOSTUNREACH;
> +	}
> +
>  	ceph_mdsc_close_sessions(fsc->mdsc);
>  	deactivate_locked_super(sb);
>  	goto out_final;

Looks reasonable. Merged into testing branch with a revamped changelog.
Please have a look at the testing branch and make sure the changelog is
OK with you.

Thanks,
Xiubo Li Dec. 12, 2019, 12:14 a.m. UTC | #2
On 2019/12/11 21:17, Jeff Layton wrote:
> Looks reasonable. Merged into testing branch with a revamped changelog.
> Please have a look at the testing branch and make sure the changelog is
> OK with you.

Yeah, that looks good to me.

Thanks.



Patch

diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
index 7d3ec051f179..bf507120659e 100644
--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -2576,8 +2576,7 @@ static void __do_request(struct ceph_mds_client *mdsc,
 		if (!(mdsc->fsc->mount_options->flags &
 		      CEPH_MOUNT_OPT_MOUNTWAIT) &&
 		    !ceph_mdsmap_is_cluster_available(mdsc->mdsmap)) {
-			err = -ENOENT;
-			pr_info("probably no mds server is up\n");
+			err = -EHOSTUNREACH;
 			goto finish;
 		}
 	}
diff --git a/fs/ceph/super.c b/fs/ceph/super.c
index 9c9a7c68eea3..6f33a265ccf1 100644
--- a/fs/ceph/super.c
+++ b/fs/ceph/super.c
@@ -1068,6 +1068,11 @@ static int ceph_get_tree(struct fs_context *fc)
 	return 0;
 
 out_splat:
+	if (!ceph_mdsmap_is_cluster_available(fsc->mdsc->mdsmap)) {
+		pr_info("No mds server is up or the cluster is laggy\n");
+		err = -EHOSTUNREACH;
+	}
+
 	ceph_mdsc_close_sessions(fsc->mdsc);
 	deactivate_locked_super(sb);
 	goto out_final;
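
From the caller's side, the practical effect of the patch is a more specific error code. The sketch below shows how a userspace mount helper might turn it into a message. It assumes that the error returned by ceph_get_tree() reaches the caller of mount(2) unchanged as errno; describe_mount_errno() is a hypothetical helper and not part of any existing tool.

```c
#include <errno.h>
#include <string.h>

/*
 * Hypothetical helper: map the errno of a failed CephFS mount to a short
 * explanation. EHOSTUNREACH is the code set by this patch when the MDS
 * cluster is unavailable; anything else falls back to strerror().
 */
static const char *describe_mount_errno(int err)
{
	switch (err) {
	case EHOSTUNREACH:
		return "no MDS is up or the cluster is laggy";
	case EIO:
		return "I/O error (cause not identified by the kernel client)";
	default:
		return strerror(err);
	}
}
```

A helper of this kind would be called with the positive errno value after mount(2) returns -1, for example describe_mount_errno(errno).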