diff mbox series

[v4,4/5] ceph: record updated mon_addr on remount

Message ID 20210714100554.85978-5-vshankar@redhat.com (mailing list archive)
State New, archived
Headers show
Series ceph: new mount device syntax | expand

Commit Message

Venky Shankar July 14, 2021, 10:05 a.m. UTC
Note that the new monitors are just shown in /proc/mounts.
Ceph does not (re)connect to new monitors yet.

Signed-off-by: Venky Shankar <vshankar@redhat.com>
---
 fs/ceph/super.c | 7 +++++++
 1 file changed, 7 insertions(+)

Comments

Jeff Layton July 14, 2021, 4:17 p.m. UTC | #1
On Wed, 2021-07-14 at 15:35 +0530, Venky Shankar wrote:
> Note that the new monitors are just shown in /proc/mounts.
> Ceph does not (re)connect to new monitors yet.
> 
> Signed-off-by: Venky Shankar <vshankar@redhat.com>
> ---
>  fs/ceph/super.c | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/fs/ceph/super.c b/fs/ceph/super.c
> index d8c6168b7fcd..d3a5a3729c5b 100644
> --- a/fs/ceph/super.c
> +++ b/fs/ceph/super.c
> @@ -1268,6 +1268,13 @@ static int ceph_reconfigure_fc(struct fs_context *fc)
>  	else
>  		ceph_clear_mount_opt(fsc, ASYNC_DIROPS);
>  
> +	if (strcmp(fsc->mount_options->mon_addr, fsopt->mon_addr)) {
> +		kfree(fsc->mount_options->mon_addr);
> +		fsc->mount_options->mon_addr = fsopt->mon_addr;
> +		fsopt->mon_addr = NULL;
> +		printk(KERN_NOTICE "ceph: monitor addresses recorded, but not used for reconnection");

It's currently more in-vogue to use pr_notice() for this. I'll plan to
make that (minor) change before I merge. No need to resend.

> +	}
> +
>  	sync_filesystem(fc->root->d_sb);
>  	return 0;
>  }
Luís Henriques July 14, 2021, 4:35 p.m. UTC | #2
On Wed, Jul 14, 2021 at 12:17:33PM -0400, Jeff Layton wrote:
> On Wed, 2021-07-14 at 15:35 +0530, Venky Shankar wrote:
> > Note that the new monitors are just shown in /proc/mounts.
> > Ceph does not (re)connect to new monitors yet.
> > 
> > Signed-off-by: Venky Shankar <vshankar@redhat.com>
> > ---
> >  fs/ceph/super.c | 7 +++++++
> >  1 file changed, 7 insertions(+)
> > 
> > diff --git a/fs/ceph/super.c b/fs/ceph/super.c
> > index d8c6168b7fcd..d3a5a3729c5b 100644
> > --- a/fs/ceph/super.c
> > +++ b/fs/ceph/super.c
> > @@ -1268,6 +1268,13 @@ static int ceph_reconfigure_fc(struct fs_context *fc)
> >  	else
> >  		ceph_clear_mount_opt(fsc, ASYNC_DIROPS);
> >  
> > +	if (strcmp(fsc->mount_options->mon_addr, fsopt->mon_addr)) {
> > +		kfree(fsc->mount_options->mon_addr);
> > +		fsc->mount_options->mon_addr = fsopt->mon_addr;
> > +		fsopt->mon_addr = NULL;
> > +		printk(KERN_NOTICE "ceph: monitor addresses recorded, but not used for reconnection");
> 
> It's currently more in-vogue to use pr_notice() for this. I'll plan to
> make that (minor) change before I merge. No need to resend.

Yeah, this was the only comment I had too.  I saw some issues in the
previous revision but the changes to ceph_parse_source() seem to fix it in
this revision.

The other annoying thing I found isn't related with this patchset but with
a change that's been done some time ago by Xiubo (added to CC): it looks
like that if we have an invalid parameter (for example, wrong secret)
we'll always get -EHOSTUNREACH.

See below a possible fix (although I'm not entirely sure that's the correct
one).

Cheers,
--
Luís

From a988d24d8e72fc4933459f3dd5d303cbc9a566ed Mon Sep 17 00:00:00 2001
From: Luis Henriques <lhenriques@suse.de>
Date: Wed, 14 Jul 2021 16:56:36 +0100
Subject: [PATCH] ceph: don't hide error code if we don't have mdsmap

Since commit 97820058fb28 ("ceph: check availability of mds cluster on mount
after wait timeout") we're returning -EHOSTUNREACH, even if the error isn't
related with the MDSs availability.  For example, we'll get it even if we're
trying to mounting a filesystem with an invalid username or secret.

Only return this error if we get -EIO.

Fixes: 97820058fb28 ("ceph: check availability of mds cluster on mount after wait timeout")
Signed-off-by: Luis Henriques <lhenriques@suse.de>
---
 fs/ceph/super.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/ceph/super.c b/fs/ceph/super.c
index 086a1ceec9d8..67d70059ce9f 100644
--- a/fs/ceph/super.c
+++ b/fs/ceph/super.c
@@ -1230,7 +1230,8 @@ static int ceph_get_tree(struct fs_context *fc)
 	return 0;
 
 out_splat:
-	if (!ceph_mdsmap_is_cluster_available(fsc->mdsc->mdsmap)) {
+	if ((err == -EIO) &&
+	    !ceph_mdsmap_is_cluster_available(fsc->mdsc->mdsmap)) {
 		pr_info("No mds server is up or the cluster is laggy\n");
 		err = -EHOSTUNREACH;
 	}
Jeff Layton July 14, 2021, 6:05 p.m. UTC | #3
On Wed, 2021-07-14 at 15:35 +0530, Venky Shankar wrote:
> Note that the new monitors are just shown in /proc/mounts.
> Ceph does not (re)connect to new monitors yet.
> 
> Signed-off-by: Venky Shankar <vshankar@redhat.com>
> ---
>  fs/ceph/super.c | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/fs/ceph/super.c b/fs/ceph/super.c
> index d8c6168b7fcd..d3a5a3729c5b 100644
> --- a/fs/ceph/super.c
> +++ b/fs/ceph/super.c
> @@ -1268,6 +1268,13 @@ static int ceph_reconfigure_fc(struct fs_context *fc)
>  	else
>  		ceph_clear_mount_opt(fsc, ASYNC_DIROPS);
>  
> +	if (strcmp(fsc->mount_options->mon_addr, fsopt->mon_addr)) {

One other problem with this patch...

The above needs to use strcmp_null(). I'll plan to make that change too.

> +		kfree(fsc->mount_options->mon_addr);
> +		fsc->mount_options->mon_addr = fsopt->mon_addr;
> +		fsopt->mon_addr = NULL;
> +		printk(KERN_NOTICE "ceph: monitor addresses recorded, but not used for reconnection");
> +	}
> +
>  	sync_filesystem(fc->root->d_sb);
>  	return 0;
>  }
Jeff Layton July 14, 2021, 6:15 p.m. UTC | #4
On Wed, 2021-07-14 at 17:35 +0100, Luis Henriques wrote:
> On Wed, Jul 14, 2021 at 12:17:33PM -0400, Jeff Layton wrote:
> > On Wed, 2021-07-14 at 15:35 +0530, Venky Shankar wrote:
> > > Note that the new monitors are just shown in /proc/mounts.
> > > Ceph does not (re)connect to new monitors yet.
> > > 
> > > Signed-off-by: Venky Shankar <vshankar@redhat.com>
> > > ---
> > >  fs/ceph/super.c | 7 +++++++
> > >  1 file changed, 7 insertions(+)
> > > 
> > > diff --git a/fs/ceph/super.c b/fs/ceph/super.c
> > > index d8c6168b7fcd..d3a5a3729c5b 100644
> > > --- a/fs/ceph/super.c
> > > +++ b/fs/ceph/super.c
> > > @@ -1268,6 +1268,13 @@ static int ceph_reconfigure_fc(struct fs_context *fc)
> > >  	else
> > >  		ceph_clear_mount_opt(fsc, ASYNC_DIROPS);
> > >  
> > > +	if (strcmp(fsc->mount_options->mon_addr, fsopt->mon_addr)) {
> > > +		kfree(fsc->mount_options->mon_addr);
> > > +		fsc->mount_options->mon_addr = fsopt->mon_addr;
> > > +		fsopt->mon_addr = NULL;
> > > +		printk(KERN_NOTICE "ceph: monitor addresses recorded, but not used for reconnection");
> > 
> > It's currently more in-vogue to use pr_notice() for this. I'll plan to
> > make that (minor) change before I merge. No need to resend.
> 
> Yeah, this was the only comment I had too.  I saw some issues in the
> previous revision but the changes to ceph_parse_source() seem to fix it in
> this revision.
> 
> The other annoying thing I found isn't related with this patchset but with
> a change that's been done some time ago by Xiubo (added to CC): it looks
> like that if we have an invalid parameter (for example, wrong secret)
> we'll always get -EHOSTUNREACH.
> 
> See below a possible fix (although I'm not entirely sure that's the correct
> one).
> 
> Cheers,
> --
> Luís
> 
> From a988d24d8e72fc4933459f3dd5d303cbc9a566ed Mon Sep 17 00:00:00 2001
> From: Luis Henriques <lhenriques@suse.de>
> Date: Wed, 14 Jul 2021 16:56:36 +0100
> Subject: [PATCH] ceph: don't hide error code if we don't have mdsmap
> 
> Since commit 97820058fb28 ("ceph: check availability of mds cluster on mount
> after wait timeout") we're returning -EHOSTUNREACH, even if the error isn't
> related with the MDSs availability.  For example, we'll get it even if we're
> trying to mounting a filesystem with an invalid username or secret.
> 
> Only return this error if we get -EIO.
> 
> Fixes: 97820058fb28 ("ceph: check availability of mds cluster on mount after wait timeout")
> Signed-off-by: Luis Henriques <lhenriques@suse.de>
> ---
>  fs/ceph/super.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/ceph/super.c b/fs/ceph/super.c
> index 086a1ceec9d8..67d70059ce9f 100644
> --- a/fs/ceph/super.c
> +++ b/fs/ceph/super.c
> @@ -1230,7 +1230,8 @@ static int ceph_get_tree(struct fs_context *fc)
>  	return 0;
>  
>  out_splat:
> -	if (!ceph_mdsmap_is_cluster_available(fsc->mdsc->mdsmap)) {
> +	if ((err == -EIO) &&
> +	    !ceph_mdsmap_is_cluster_available(fsc->mdsc->mdsmap)) {
>  		pr_info("No mds server is up or the cluster is laggy\n");
>  		err = -EHOSTUNREACH;
>  	}

Yeah, I've noticed that message pop up under all sorts of circumstances
and it is an annoyance. I'm happy to consider such a patch if you send
it separately.

That said, I'm honestly not sure this message is really helpful, and
overriding errors like this at a high level seems sort of sketchy. Maybe
we should just drop that message, or figure out a way to limit it to
_just_ that situation.

--
Jeff Layton <jlayton@redhat.com>
Venky Shankar July 15, 2021, 5:52 a.m. UTC | #5
On Wed, Jul 14, 2021 at 9:47 PM Jeff Layton <jlayton@redhat.com> wrote:
>
> On Wed, 2021-07-14 at 15:35 +0530, Venky Shankar wrote:
> > Note that the new monitors are just shown in /proc/mounts.
> > Ceph does not (re)connect to new monitors yet.
> >
> > Signed-off-by: Venky Shankar <vshankar@redhat.com>
> > ---
> >  fs/ceph/super.c | 7 +++++++
> >  1 file changed, 7 insertions(+)
> >
> > diff --git a/fs/ceph/super.c b/fs/ceph/super.c
> > index d8c6168b7fcd..d3a5a3729c5b 100644
> > --- a/fs/ceph/super.c
> > +++ b/fs/ceph/super.c
> > @@ -1268,6 +1268,13 @@ static int ceph_reconfigure_fc(struct fs_context *fc)
> >       else
> >               ceph_clear_mount_opt(fsc, ASYNC_DIROPS);
> >
> > +     if (strcmp(fsc->mount_options->mon_addr, fsopt->mon_addr)) {
> > +             kfree(fsc->mount_options->mon_addr);
> > +             fsc->mount_options->mon_addr = fsopt->mon_addr;
> > +             fsopt->mon_addr = NULL;
> > +             printk(KERN_NOTICE "ceph: monitor addresses recorded, but not used for reconnection");
>
> It's currently more in-vogue to use pr_notice() for this. I'll plan to
> make that (minor) change before I merge. No need to resend.

Got it. ACK.

>
> > +     }
> > +
> >       sync_filesystem(fc->root->d_sb);
> >       return 0;
> >  }
>
> --
> Jeff Layton <jlayton@redhat.com>
>
Luís Henriques July 15, 2021, 11:42 a.m. UTC | #6
On Wed, Jul 14, 2021 at 02:15:50PM -0400, Jeff Layton wrote:
> On Wed, 2021-07-14 at 17:35 +0100, Luis Henriques wrote:
> > On Wed, Jul 14, 2021 at 12:17:33PM -0400, Jeff Layton wrote:
> > > On Wed, 2021-07-14 at 15:35 +0530, Venky Shankar wrote:
> > > > Note that the new monitors are just shown in /proc/mounts.
> > > > Ceph does not (re)connect to new monitors yet.
> > > > 
> > > > Signed-off-by: Venky Shankar <vshankar@redhat.com>
> > > > ---
> > > >  fs/ceph/super.c | 7 +++++++
> > > >  1 file changed, 7 insertions(+)
> > > > 
> > > > diff --git a/fs/ceph/super.c b/fs/ceph/super.c
> > > > index d8c6168b7fcd..d3a5a3729c5b 100644
> > > > --- a/fs/ceph/super.c
> > > > +++ b/fs/ceph/super.c
> > > > @@ -1268,6 +1268,13 @@ static int ceph_reconfigure_fc(struct fs_context *fc)
> > > >  	else
> > > >  		ceph_clear_mount_opt(fsc, ASYNC_DIROPS);
> > > >  
> > > > +	if (strcmp(fsc->mount_options->mon_addr, fsopt->mon_addr)) {
> > > > +		kfree(fsc->mount_options->mon_addr);
> > > > +		fsc->mount_options->mon_addr = fsopt->mon_addr;
> > > > +		fsopt->mon_addr = NULL;
> > > > +		printk(KERN_NOTICE "ceph: monitor addresses recorded, but not used for reconnection");
> > > 
> > > It's currently more in-vogue to use pr_notice() for this. I'll plan to
> > > make that (minor) change before I merge. No need to resend.
> > 
> > Yeah, this was the only comment I had too.  I saw some issues in the
> > previous revision but the changes to ceph_parse_source() seem to fix it in
> > this revision.
> > 
> > The other annoying thing I found isn't related with this patchset but with
> > a change that's been done some time ago by Xiubo (added to CC): it looks
> > like that if we have an invalid parameter (for example, wrong secret)
> > we'll always get -EHOSTUNREACH.
> > 
> > See below a possible fix (although I'm not entirely sure that's the correct
> > one).
> > 
> > Cheers,
> > --
> > Luís
> > 
> > From a988d24d8e72fc4933459f3dd5d303cbc9a566ed Mon Sep 17 00:00:00 2001
> > From: Luis Henriques <lhenriques@suse.de>
> > Date: Wed, 14 Jul 2021 16:56:36 +0100
> > Subject: [PATCH] ceph: don't hide error code if we don't have mdsmap
> > 
> > Since commit 97820058fb28 ("ceph: check availability of mds cluster on mount
> > after wait timeout") we're returning -EHOSTUNREACH, even if the error isn't
> > related with the MDSs availability.  For example, we'll get it even if we're
> > trying to mounting a filesystem with an invalid username or secret.
> > 
> > Only return this error if we get -EIO.
> > 
> > Fixes: 97820058fb28 ("ceph: check availability of mds cluster on mount after wait timeout")
> > Signed-off-by: Luis Henriques <lhenriques@suse.de>
> > ---
> >  fs/ceph/super.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> > 
> > diff --git a/fs/ceph/super.c b/fs/ceph/super.c
> > index 086a1ceec9d8..67d70059ce9f 100644
> > --- a/fs/ceph/super.c
> > +++ b/fs/ceph/super.c
> > @@ -1230,7 +1230,8 @@ static int ceph_get_tree(struct fs_context *fc)
> >  	return 0;
> >  
> >  out_splat:
> > -	if (!ceph_mdsmap_is_cluster_available(fsc->mdsc->mdsmap)) {
> > +	if ((err == -EIO) &&
> > +	    !ceph_mdsmap_is_cluster_available(fsc->mdsc->mdsmap)) {
> >  		pr_info("No mds server is up or the cluster is laggy\n");
> >  		err = -EHOSTUNREACH;
> >  	}
> 
> Yeah, I've noticed that message pop up under all sorts of circumstances
> and it is an annoyance. I'm happy to consider such a patch if you send
> it separately.
> 
> That said, I'm honestly not sure this message is really helpful, and
> overriding errors like this at a high level seems sort of sketchy. Maybe
> we should just drop that message, or figure out a way to limit it to
> _just_ that situation.

I agree that the message isn't really useful but the -EHOSTUNREACH is a
bit more confusing if we get it in the mount command when we simply have a
typo in the parameters.

Anyway, after looking closer, I couldn't find a way to reach this code
and have -EHOSTUNREACH to make sense.  When the MDSs are down/unreachable
we have already err set to this error code.  This would mean that the
correct fix would be to simply drop this 'if' statement.  Xiubo, do you
remember how would this be possible?

Cheers,
--
Luís
diff mbox series

Patch

diff --git a/fs/ceph/super.c b/fs/ceph/super.c
index d8c6168b7fcd..d3a5a3729c5b 100644
--- a/fs/ceph/super.c
+++ b/fs/ceph/super.c
@@ -1268,6 +1268,13 @@  static int ceph_reconfigure_fc(struct fs_context *fc)
 	else
 		ceph_clear_mount_opt(fsc, ASYNC_DIROPS);
 
+	if (strcmp(fsc->mount_options->mon_addr, fsopt->mon_addr)) {
+		kfree(fsc->mount_options->mon_addr);
+		fsc->mount_options->mon_addr = fsopt->mon_addr;
+		fsopt->mon_addr = NULL;
+		printk(KERN_NOTICE "ceph: monitor addresses recorded, but not used for reconnection");
+	}
+
 	sync_filesystem(fc->root->d_sb);
 	return 0;
 }