diff mbox

libmultipath: update INFINIDAT builtin config

Message ID 392CDE41-4C36-41B1-8811-B7EC93F0864C@infinidat.com (mailing list archive)
State Not Applicable, archived
Delegated to: christophe varoqui
Headers show

Commit Message

Arnon Yaari Aug. 22, 2017, 1:37 p.m. UTC
Based on the manufacturer documentation:
https://support.infinidat.com/hc/en-us/articles/202319222

Signed-off-by: Arnon Yaari <arnony@infinidat.com>
---
 libmultipath/hwtable.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

Comments

Xose Vazquez Perez Aug. 25, 2017, 12:26 a.m. UTC | #1
On 08/22/2017 03:37 PM, Arnon Yaari wrote:

> Based on the manufacturer documentation:
> https://support.infinidat.com/hc/en-us/articles/202319222
> 
> Signed-off-by: Arnon Yaari <arnony@infinidat.com>

NACK.

> ---
>  libmultipath/hwtable.c | 12 +++++++++++-
>  1 file changed, 11 insertions(+), 1 deletion(-)> diff --git a/libmultipath/hwtable.c b/libmultipath/hwtable.c
> index 9e14ec1e..1ea48d58 100644
> --- a/libmultipath/hwtable.c
> +++ b/libmultipath/hwtable.c
> @@ -969,8 +969,18 @@ static struct hwentry default_hw[] = {
>  		.vendor        = "NFINIDAT",
>  		.product       = "InfiniBox",
>  		.pgpolicy      = GROUP_BY_PRIO,

> -		.pgfailback    = -FAILBACK_IMMEDIATE,
> +		.pgfailback    = 30,

Why is it needed?

>  		.prio_name     = PRIO_ALUA,
> +		.checker_name  = TUR,

Default value.

> +		.selector      = "round-robin 0",

round-robin is the dumbest, queue-length is smarter and
service-time is the smartest and default selector.

> +		.features      = "0",

Default value.

> +		.rr_weight     = RR_WEIGHT_PRIO,

Useless with service-time

> +		.no_path_retry = NO_PATH_RETRY_FAIL,

Default value.

> +		.minio         = 1,

Useless with service-time

> +		.minio_rq      = 1,

Useless with service-time

> +		.flush_on_last_del = FLUSH_ENABLED,

Why is it needed?

> +		.fast_io_fail  = 15,

Why is it needed?

> +		.dev_loss      = 15,

Why is it needed?

>  	},
>  	/*
>  	 * Nimble Storage
> 
There is minimal info on how to send patches at the header of hwtable.c
$ grep -A64 "Tuning suggestions" libmultipath/hwtable.c | less -S


Thank you.

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
Xose Vazquez Perez Aug. 28, 2017, 11:02 p.m. UTC | #2
On 08/28/2017 03:59 PM, Arnon Yaari wrote:

>> On 25 Aug 2017, at 3:26, Xose Vazquez Perez <xose.vazquez@gmail.com> wrote:

>> On 08/22/2017 03:37 PM, Arnon Yaari wrote:

>>> -		.pgfailback    = -FAILBACK_IMMEDIATE,
>>> +		.pgfailback    = 30,

>> Why is it needed?

> All the values suggested here are the values that the company’s interoperability
> team found work best for the product on Linux. The timeout values are there
> because in some upgrade scenarios we don’t want to fail devices until the
> upgrade finishes.

Were they tested with a *current* kernel and multipath-tools?

> I don’t think there is a reason to add in-line comments in the code to
> explain the values (none of the other vendors are doing that).

It should be documented in the body of the git patch(changelog).

Anything other than default values must be documented.
This is a standard request: https://marc.info/?t=149850399700003

>>> +		.selector      = "round-robin 0",
>>
>> round-robin is the dumbest, queue-length is smarter and
>> service-time is the smartest and default selector.
> 
> Regardless of which algorithm is “dumbest” or “smartest”, round-robin
> is a legitimate path selection algorithm that the vendor decided works
> best with its product. The vendor knows its product best :)
> The product’s customers set the values suggested here (manually or
> using tools provided by the vendor), whether they are upstream or not.

In an *ideal environment* , "round-robin" "queue-length" and "service-time"
work the same way, doing a round robin distribution.

But when things get ugly, "round-robin" could lost data.

"service-time" works much better in any environment, even asymmetrical.
In highly congested links, with different link speeds, faulty paths, ...

path_selector is like TCP congestion control algorithm.

>>> +		.no_path_retry = NO_PATH_RETRY_FAIL,
>>
>> Default value.
> 
> Isn’t the default NO_PATH_RETRY_UNDEF?


Practically it's the same: https://marc.info/?l=dm-devel&m=147854542623733


Thank you.

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
Arnon Yaari Aug. 29, 2017, 11:26 a.m. UTC | #3
> On 29 Aug 2017, at 2:02, Xose Vazquez Perez <xose.vazquez@gmail.com> wrote:
> 
> On 08/28/2017 03:59 PM, Arnon Yaari wrote:
> 
>>> On 25 Aug 2017, at 3:26, Xose Vazquez Perez <xose.vazquez@gmail.com> wrote:
> 
>>> On 08/22/2017 03:37 PM, Arnon Yaari wrote:
> 
>>>> -		.pgfailback    = -FAILBACK_IMMEDIATE,
>>>> +		.pgfailback    = 30,
> 
>>> Why is it needed?
> 
>> All the values suggested here are the values that the company’s interoperability
>> team found work best for the product on Linux. The timeout values are there
>> because in some upgrade scenarios we don’t want to fail devices until the
>> upgrade finishes.
> 
> Were they tested with a *current* kernel and multipath-tools?

Of course. We do extensive interoperability testing on RHEL, CentOS, Ubuntu, SLES, on old and all new versions.
The way our "hot upgrade" process works can cause device loss if these values are not in place.

> 
>> I don’t think there is a reason to add in-line comments in the code to
>> explain the values (none of the other vendors are doing that).
> 
> It should be documented in the body of the git patch(changelog).
> 
> Anything other than default values must be documented.
> This is a standard request: https://marc.info/?t=149850399700003 <https://marc.info/?t=149850399700003>

All vendor config commits I see just link to the vendor support docs, which makes sense because the vendor config is the authority on what needs to be set, and the docs themselves are the explanation (perhaps we need to add the reasons to the docs...).
I will change the patch message anyway.

> 
>>>> +		.selector      = "round-robin 0",
>>> 
>>> round-robin is the dumbest, queue-length is smarter and
>>> service-time is the smartest and default selector.
>> 
>> Regardless of which algorithm is “dumbest” or “smartest”, round-robin
>> is a legitimate path selection algorithm that the vendor decided works
>> best with its product. The vendor knows its product best :)
>> The product’s customers set the values suggested here (manually or
>> using tools provided by the vendor), whether they are upstream or not.
> 
> In an *ideal environment* , "round-robin" "queue-length" and "service-time"
> work the same way, doing a round robin distribution.
> 
> But when things get ugly, "round-robin" could lost data.
> 
> "service-time" works much better in any environment, even asymmetrical.
> In highly congested links, with different link speeds, faulty paths, ...
> 
> path_selector is like TCP congestion control algorithm.

We may decide to change this behavior in the future. I will pass your input to our product team. I don’t expect it’s something we’ll change in a short time frame without more discussion and testing.
We don’t see how round-robin can lose data. Can you please elaborate?
In any case, this is our current configuration that is set in customer environments. For better or worse, not accepting this doesn’t change anything other than inconveniencing users.

> 
>>>> +		.no_path_retry = NO_PATH_RETRY_FAIL,
>>> 
>>> Default value.
>> 
>> Isn’t the default NO_PATH_RETRY_UNDEF?
> 
> 
> Practically it's the same: https://marc.info/?l=dm-devel&m=147854542623733 <https://marc.info/?l=dm-devel&m=147854542623733>

I’m not convinced it’s the same. The message you linked says it takes what is in the “features” line, which can be changed.
Even if it’s implicitly the same, it’s not really the default. We prefer to specify this explicitly.

> 
> 
> Thank you.
--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
diff mbox

Patch

diff --git a/libmultipath/hwtable.c b/libmultipath/hwtable.c
index 9e14ec1e..1ea48d58 100644
--- a/libmultipath/hwtable.c
+++ b/libmultipath/hwtable.c
@@ -969,8 +969,18 @@  static struct hwentry default_hw[] = {
 		.vendor        = "NFINIDAT",
 		.product       = "InfiniBox",
 		.pgpolicy      = GROUP_BY_PRIO,
-		.pgfailback    = -FAILBACK_IMMEDIATE,
+		.pgfailback    = 30,
 		.prio_name     = PRIO_ALUA,
+		.checker_name  = TUR,
+		.selector      = "round-robin 0",
+		.features      = "0",
+		.rr_weight     = RR_WEIGHT_PRIO,
+		.no_path_retry = NO_PATH_RETRY_FAIL,
+		.minio         = 1,
+		.minio_rq      = 1,
+		.flush_on_last_del = FLUSH_ENABLED,
+		.fast_io_fail  = 15,
+		.dev_loss      = 15,
 	},
 	/*
 	 * Nimble Storage