diff mbox series

libmultipath: unset detect_checker for clariion / Unity arrays

Message ID 1654641901-12324-1-git-send-email-bmarzins@redhat.com (mailing list archive)
State Not Applicable, archived
Delegated to: Mike Snitzer
Headers show
Series libmultipath: unset detect_checker for clariion / Unity arrays | expand

Commit Message

Benjamin Marzinski June 7, 2022, 10:45 p.m. UTC
Dell EMC would like to always use the emc_clariion checker. Currently
detect_checker will switch the checker to TUR for Unity arrays.
This can cause problems on some setups with replicated Unity LUNs,
which are handled correctly the the emc_checker, but not the TUR
checker.

Cc: vincent.chen1@dell.com
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
 libmultipath/hwtable.c | 1 +
 1 file changed, 1 insertion(+)

Comments

Martin Wilck June 8, 2022, 7:56 a.m. UTC | #1
On Tue, 2022-06-07 at 17:45 -0500, Benjamin Marzinski wrote:
> Dell EMC would like to always use the emc_clariion checker. Currently
> detect_checker will switch the checker to TUR for Unity arrays.
> This can cause problems on some setups with replicated Unity LUNs,
> which are handled correctly the the emc_checker, but not the TUR
> checker.
> 
> Cc: vincent.chen1@dell.com
> Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>

This points us to a flaw in our logic.

It was wrong in the first place to have autodetection take precedence,
even over "overrides". The effect for users is that whenever
"path_checker" is set, "detect_checker no" must also be set, which
is highly surprising and adds no benefit.

We should assume that if a device has an explicit checker configured
either in the device configuration, overrides, or the hwtable, this
checker must be used, and fall back to autodetection only if this is
not the case.

So while this patch is alright, I'd prefer a patch that fixes the
logic.

(The same could be said for detect_prio, but I don't want to make
outrageous demands).

Martin





> ---
>  libmultipath/hwtable.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/libmultipath/hwtable.c b/libmultipath/hwtable.c
> index 39daadc2..415bf683 100644
> --- a/libmultipath/hwtable.c
> +++ b/libmultipath/hwtable.c
> @@ -350,6 +350,7 @@ static struct hwentry default_hw[] = {
>                 .no_path_retry = (300 / DEFAULT_CHECKINT),
>                 .checker_name  = EMC_CLARIION,
>                 .prio_name     = PRIO_EMC,
> +               .detect_checker = DETECT_CHECKER_OFF,
>         },
>         {
>                 /* Invista / VPLEX */

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel
Chen, Yanfei June 8, 2022, 8:27 a.m. UTC | #2
I agree with thinking,   it's better to check if device has a specific checker in its stanza firstly.

-----Original Message-----
From: Martin Wilck <martin.wilck@suse.com>
Sent: Wednesday, June 8, 2022 3:56 PM
To: bmarzins@redhat.com; christophe.varoqui@opensvc.com
Cc: dm-devel@redhat.com; Chen, Yanfei
Subject: Re: [PATCH] libmultipath: unset detect_checker for clariion / Unity arrays


[EXTERNAL EMAIL]

On Tue, 2022-06-07 at 17:45 -0500, Benjamin Marzinski wrote:
> Dell EMC would like to always use the emc_clariion checker. Currently
> detect_checker will switch the checker to TUR for Unity arrays.
> This can cause problems on some setups with replicated Unity LUNs,
> which are handled correctly the the emc_checker, but not the TUR
> checker.
>
> Cc: vincent.chen1@dell.com
> Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>

This points us to a flaw in our logic.

It was wrong in the first place to have autodetection take precedence, even over "overrides". The effect for users is that whenever "path_checker" is set, "detect_checker no" must also be set, which is highly surprising and adds no benefit.

We should assume that if a device has an explicit checker configured either in the device configuration, overrides, or the hwtable, this checker must be used, and fall back to autodetection only if this is not the case.

So while this patch is alright, I'd prefer a patch that fixes the logic.

(The same could be said for detect_prio, but I don't want to make outrageous demands).

Martin





> ---
>  libmultipath/hwtable.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/libmultipath/hwtable.c b/libmultipath/hwtable.c index
> 39daadc2..415bf683 100644
> --- a/libmultipath/hwtable.c
> +++ b/libmultipath/hwtable.c
> @@ -350,6 +350,7 @@ static struct hwentry default_hw[] = {
>                 .no_path_retry = (300 / DEFAULT_CHECKINT),
>                 .checker_name  = EMC_CLARIION,
>                 .prio_name     = PRIO_EMC,
> +               .detect_checker = DETECT_CHECKER_OFF,
>         },
>         {
>                 /* Invista / VPLEX */


Internal Use - Confidential

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel
Benjamin Marzinski June 8, 2022, 2:40 p.m. UTC | #3
On Wed, Jun 08, 2022 at 07:56:27AM +0000, Martin Wilck wrote:
> On Tue, 2022-06-07 at 17:45 -0500, Benjamin Marzinski wrote:
> > Dell EMC would like to always use the emc_clariion checker. Currently
> > detect_checker will switch the checker to TUR for Unity arrays.
> > This can cause problems on some setups with replicated Unity LUNs,
> > which are handled correctly the the emc_checker, but not the TUR
> > checker.
> > 
> > Cc: vincent.chen1@dell.com
> > Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
> 
> This points us to a flaw in our logic.
> 
> It was wrong in the first place to have autodetection take precedence,
> even over "overrides". The effect for users is that whenever
> "path_checker" is set, "detect_checker no" must also be set, which
> is highly surprising and adds no benefit.
> 
> We should assume that if a device has an explicit checker configured
> either in the device configuration, overrides, or the hwtable, this
> checker must be used, and fall back to autodetection only if this is
> not the case.
> 
> So while this patch is alright, I'd prefer a patch that fixes the
> logic.

I'm not sure that this is a good idea. IIRC, the current method was an
intentional choice.  The idea was that if a checker was autodetected, we
would use that. If not, we would fall back to the user defined checker.
For the most part this is useful for arrays that could be run in ALUA or
non-alua mode.  Changing the priority of detect_checker will suddenly
change how these arrays are configured.  Users that configured a
failback checker for cases where their arrays weren't in ALUA mode would
suddenly find that their arrays were always using the fallback method. 

Now I'm not saying that the original logic was the best option. But I am
saying that it hasn't caused many issues over the years that it's been
in existance. There are a number of arrays in the builtin hardware table
that define checkers. I assume that they either don't support ALUA, or
they are happy with only using their configured checker when their
arrays aren't in ALUA mode. I would rather fix the remaining cases where
the existing priority gives the wrong answer than suddenly change how
things work, in a way that could suddenly break things for an unknown
(but quite possibly large) number of existing users.

> (The same could be said for detect_prio, but I don't want to make
> outrageous demands).

The above arguments apply here, only moreso.
 
-Ben

> Martin
> 
> 
> 
> 
> 
> > ---
> >  libmultipath/hwtable.c | 1 +
> >  1 file changed, 1 insertion(+)
> > 
> > diff --git a/libmultipath/hwtable.c b/libmultipath/hwtable.c
> > index 39daadc2..415bf683 100644
> > --- a/libmultipath/hwtable.c
> > +++ b/libmultipath/hwtable.c
> > @@ -350,6 +350,7 @@ static struct hwentry default_hw[] = {
> >                 .no_path_retry = (300 / DEFAULT_CHECKINT),
> >                 .checker_name  = EMC_CLARIION,
> >                 .prio_name     = PRIO_EMC,
> > +               .detect_checker = DETECT_CHECKER_OFF,
> >         },
> >         {
> >                 /* Invista / VPLEX */
--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel
Martin Wilck June 8, 2022, 4:47 p.m. UTC | #4
On Wed, 2022-06-08 at 09:40 -0500, Benjamin Marzinski wrote:
> On Wed, Jun 08, 2022 at 07:56:27AM +0000, Martin Wilck wrote:
> > On Tue, 2022-06-07 at 17:45 -0500, Benjamin Marzinski wrote:
> > > Dell EMC would like to always use the emc_clariion checker.
> > > Currently
> > > detect_checker will switch the checker to TUR for Unity arrays.
> > > This can cause problems on some setups with replicated Unity
> > > LUNs,
> > > which are handled correctly the the emc_checker, but not the TUR
> > > checker.
> > > 
> > > Cc: vincent.chen1@dell.com
> > > Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
> > 
> > This points us to a flaw in our logic.
> > 
> > It was wrong in the first place to have autodetection take
> > precedence,
> > even over "overrides". The effect for users is that whenever
> > "path_checker" is set, "detect_checker no" must also be set, which
> > is highly surprising and adds no benefit.
> > 
> > We should assume that if a device has an explicit checker
> > configured
> > either in the device configuration, overrides, or the hwtable, this
> > checker must be used, and fall back to autodetection only if this
> > is
> > not the case.
> > 
> > So while this patch is alright, I'd prefer a patch that fixes the
> > logic.
> 
> I'm not sure that this is a good idea. IIRC, the current method was
> an
> intentional choice.  The idea was that if a checker was autodetected,
> we
> would use that. If not, we would fall back to the user defined
> checker.
> For the most part this is useful for arrays that could be run in ALUA
> or
> non-alua mode.  Changing the priority of detect_checker will suddenly
> change how these arrays are configured.  Users that configured a
> failback checker for cases where their arrays weren't in ALUA mode
> would
> suddenly find that their arrays were always using the fallback
> method. 

> Now I'm not saying that the original logic was the best option. But I
> am
> saying that it hasn't caused many issues over the years that it's
> been
> in existance. There are a number of arrays in the builtin hardware
> table
> that define checkers. I assume that they either don't support ALUA,
> or
> they are happy with only using their configured checker when
> their<Thinking about it, 
> arrays aren't in ALUA mode.

Most of the built-in hwtable entries set the RDAC checker. For these,
the result will be unchanged if we lower the precedence of
detect_checker. There are two entries setting DIRECTIO for non-SCSI
devices (DASD and Huawei NVMe): no regression risk there. Then there's
Clariion CX/AX/VNX, for which the change in behavior is intended.
Finally, there are two entries for ancient HPE devices using HP_SW. If
we change the precedence, these devices might switch from TUR to HP_SW
(if they support ALUA, dunno).

>  I would rather fix the remaining cases where
> the existing priority gives the wrong answer than suddenly change how
> things work, in a way that could suddenly break things for an unknown
> (but quite possibly large) number of existing users.

According to the above, only people who'd use very old HPE storage
arrays with latest upstream multipath-tools would be affected.

Remain those users that you mentioned — people who operate in ALUA mode
but keep some explicit multipath.conf settings around as fallback "when
their arrays aren't in ALUA mode" (for whatever reason). These users
would now observe that their arrays operate in fallback mode, even if
ALUA was enabled.

I'm not sure if this matters much. OTOH, it's rather common for people
to forget to set "detect_checker no" and wonder why their
multipath.conf settings don't take effect.

> 
> > (The same could be said for detect_prio, but I don't want to make
> > outrageous demands).
> 
> The above arguments apply here, only moreso.

Besides the devices already mentioned for detect_checker, there's the
Hitachi Vantara AMS setting PRIO_HDS, and ONTAP using PRIO_ONTAP. The
big difference is that in detect_prio(), RDAC arrays are configured to
use ALUA rather than RDAC. So if we switch the precedence, all those
arrays would switch from ALUA to RDAC. I agree this would be surprising
and undesirable. Are these arrays still configurable to not support
ALUA? If no, we could just remove the RDAC entries. Same for ONTAP.

Anyway, _if_ we change the precedence rules, we should do it for both
detect_prio and detect_checker, otherwise our behavior might appear
more inconsistent than it's now.

For now, I'll ack your patch and let this discussion sink in a bit.
Comments from users welcome!

Regards
Martin

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel
Martin Wilck June 8, 2022, 4:48 p.m. UTC | #5
On Tue, 2022-06-07 at 17:45 -0500, Benjamin Marzinski wrote:
> Dell EMC would like to always use the emc_clariion checker. Currently
> detect_checker will switch the checker to TUR for Unity arrays.
> This can cause problems on some setups with replicated Unity LUNs,
> which are handled correctly the the emc_checker, but not the TUR
> checker.
> 
> Cc: vincent.chen1@dell.com
> Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>

(while we're discussing possible alternatives)

Reviewed-by: Martin Wilck <mwilck@suse.com>

> ---
>  libmultipath/hwtable.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/libmultipath/hwtable.c b/libmultipath/hwtable.c
> index 39daadc2..415bf683 100644
> --- a/libmultipath/hwtable.c
> +++ b/libmultipath/hwtable.c
> @@ -350,6 +350,7 @@ static struct hwentry default_hw[] = {
>                 .no_path_retry = (300 / DEFAULT_CHECKINT),
>                 .checker_name  = EMC_CLARIION,
>                 .prio_name     = PRIO_EMC,
> +               .detect_checker = DETECT_CHECKER_OFF,
>         },
>         {
>                 /* Invista / VPLEX */

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel
Benjamin Marzinski June 9, 2022, 4:36 p.m. UTC | #6
On Wed, Jun 08, 2022 at 04:47:37PM +0000, Martin Wilck wrote:
> On Wed, 2022-06-08 at 09:40 -0500, Benjamin Marzinski wrote:
> > On Wed, Jun 08, 2022 at 07:56:27AM +0000, Martin Wilck wrote:
> > > On Tue, 2022-06-07 at 17:45 -0500, Benjamin Marzinski wrote:
> > > > Dell EMC would like to always use the emc_clariion checker.
> > > > Currently
> > > > detect_checker will switch the checker to TUR for Unity arrays.
> > > > This can cause problems on some setups with replicated Unity
> > > > LUNs,
> > > > which are handled correctly the the emc_checker, but not the TUR
> > > > checker.
> > > > 
> > > > Cc: vincent.chen1@dell.com
> > > > Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
> > > 
> > > This points us to a flaw in our logic.
> > > 
> > > It was wrong in the first place to have autodetection take
> > > precedence,
> > > even over "overrides". The effect for users is that whenever
> > > "path_checker" is set, "detect_checker no" must also be set, which
> > > is highly surprising and adds no benefit.
> > > 
> > > We should assume that if a device has an explicit checker
> > > configured
> > > either in the device configuration, overrides, or the hwtable, this
> > > checker must be used, and fall back to autodetection only if this
> > > is
> > > not the case.
> > > 
> > > So while this patch is alright, I'd prefer a patch that fixes the
> > > logic.
> > 
> > I'm not sure that this is a good idea. IIRC, the current method was
> > an
> > intentional choice.  The idea was that if a checker was autodetected,
> > we
> > would use that. If not, we would fall back to the user defined
> > checker.
> > For the most part this is useful for arrays that could be run in ALUA
> > or
> > non-alua mode.  Changing the priority of detect_checker will suddenly
> > change how these arrays are configured.  Users that configured a
> > failback checker for cases where their arrays weren't in ALUA mode
> > would
> > suddenly find that their arrays were always using the fallback
> > method. 
> 
> > Now I'm not saying that the original logic was the best option. But I
> > am
> > saying that it hasn't caused many issues over the years that it's
> > been
> > in existance. There are a number of arrays in the builtin hardware
> > table
> > that define checkers. I assume that they either don't support ALUA,
> > or
> > they are happy with only using their configured checker when
> > their<Thinking about it, 
> > arrays aren't in ALUA mode.
> 
> Most of the built-in hwtable entries set the RDAC checker. For these,
> the result will be unchanged if we lower the precedence of
> detect_checker. There are two entries setting DIRECTIO for non-SCSI
> devices (DASD and Huawei NVMe): no regression risk there. Then there's
> Clariion CX/AX/VNX, for which the change in behavior is intended.
> Finally, there are two entries for ancient HPE devices using HP_SW. If
> we change the precedence, these devices might switch from TUR to HP_SW
> (if they support ALUA, dunno).

As well as any user configurations that set this.  I realize that it has
been a while since the days when most of the default device configs
explicitly set a checker, but I don't have any idea how many users are
still running with a user config like that.
 
> >  I would rather fix the remaining cases where
> > the existing priority gives the wrong answer than suddenly change how
> > things work, in a way that could suddenly break things for an unknown
> > (but quite possibly large) number of existing users.
> 
> According to the above, only people who'd use very old HPE storage
> arrays with latest upstream multipath-tools would be affected.
> 
> Remain those users that you mentioned — people who operate in ALUA mode
> but keep some explicit multipath.conf settings around as fallback "when
> their arrays aren't in ALUA mode" (for whatever reason). These users
> would now observe that their arrays operate in fallback mode, even if
> ALUA was enabled.
> 
> I'm not sure if this matters much. OTOH, it's rather common for people
> to forget to set "detect_checker no" and wonder why their
> multipath.conf settings don't take effect.

But like my patch showed, it's a simple fix. If users wanted to define a
fallback method for arrays that optionally support ALUA (and I admit
that I have no idea how many arrays like this are still being used) it's
a lot tricker if the devices section path_checker takes precedence. They
can't set the fallback checker there. They would have to set it in the
defaults section. This is problematic, since you are changing the
checker on arrays you may not have intended.

> > 
> > > (The same could be said for detect_prio, but I don't want to make
> > > outrageous demands).
> > 
> > The above arguments apply here, only moreso.
> 
> Besides the devices already mentioned for detect_checker, there's the
> Hitachi Vantara AMS setting PRIO_HDS, and ONTAP using PRIO_ONTAP. The
> big difference is that in detect_prio(), RDAC arrays are configured to
> use ALUA rather than RDAC. So if we switch the precedence, all those
> arrays would switch from ALUA to RDAC. I agree this would be surprising
> and undesirable. Are these arrays still configurable to not support
> ALUA? If no, we could just remove the RDAC entries. Same for ONTAP.
> 
> Anyway, _if_ we change the precedence rules, we should do it for both
> detect_prio and detect_checker, otherwise our behavior might appear
> more inconsistent than it's now.

Agreed. the hardware_handler also works in a similar way (with retaining
the kernel detected one by default, and using the device's section
hardware_handler option as a backup). I don't think we want to change
that one either.

-Ben

> For now, I'll ack your patch and let this discussion sink in a bit.
> Comments from users welcome!
> 
> Regards
> Martin
> 
--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel
Xose Vazquez Perez July 15, 2022, 9:08 p.m. UTC | #7
This patch is still pending.

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel
Martin Wilck July 15, 2022, 10:10 p.m. UTC | #8
On Fri, 2022-07-15 at 23:08 +0200, Xose Vazquez Perez wrote:
> This patch is still pending.

Thanks for the reminder, pushed.

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel
diff mbox series

Patch

diff --git a/libmultipath/hwtable.c b/libmultipath/hwtable.c
index 39daadc2..415bf683 100644
--- a/libmultipath/hwtable.c
+++ b/libmultipath/hwtable.c
@@ -350,6 +350,7 @@  static struct hwentry default_hw[] = {
 		.no_path_retry = (300 / DEFAULT_CHECKINT),
 		.checker_name  = EMC_CLARIION,
 		.prio_name     = PRIO_EMC,
+		.detect_checker = DETECT_CHECKER_OFF,
 	},
 	{
 		/* Invista / VPLEX */