Message ID | 9bb062bf-8a08-f659-a540-f68ddf0ef5da@huawei.com (mailing list archive) |
---|---|
State | Not Applicable, archived |
Delegated to: | christophe varoqui |
Headers | show |
Hello Benjamin and Christophe, How about this patch? Thanks a lot. Best. On 2017/5/19 17:45, Yang Feng wrote: > Hi Benjamin, > > Thank you very much for your comments. > Please find my replys and the up-to-date patch. > Best regards! > >> >> First, one overall question. We have dynamic path selectors available to >> deal with paths that are just simply slower that other paths, but can >> still be used together. Is there specific hardware or a specific setup >> where this isn't good enough and we really need to seperate these paths >> into different pathgroups, but we can't find out deterministically how >> the groups should be set up? It just seems like there could be a less >> hacky solution to this problem, but perhaps there are some situations >> where this is truly the best option. I'm just wondering what those are.1. In the Storage-Backup environment of HyperCluster,includs one storage array near > to the host and one remote storage array, and the two storage arrays have the same hardware. > The same LUN is writed or readed by the two storage arrays. > However, usually, the average latency of the paths of the remote storage array is much higher than the > near storage array's. > apparently, the prioritizer can be a good automatic solution. > And the current selectors don't solve it, IOs will send to the paths of the remote storage array, IOPS will be influenced unavoidably. > 2. In the environment of single storage array, the prioritizer can automatically separate the paths who's latency is much higher, > IOs will not send to this paths. > But the current selectors don't solve this problem, IOPS will be influenced unavoidably. > >>> + >>> +/* interval_unit_str and interval_unit_type keep the same assignment sequence */ >>> +static const char interval_unit_str[][MAX_CHAR_SIZE] = { >>> + CHAR_USEC, CHAR_MSEC, CHAR_SEC >> >> This is a nit, but for constant strings, could you please use "char >> *var" instead of "char var[]", to be consistent with the rest of the >> multipath code. > Thanks, as the following patch. > >>> + if ((args == NULL) || (interval == NULL) >>> + || (consnum == NULL) || (type == NULL)) >>> + return 0; >>> + >>> + /* int type */ >>> + if ((size < 1) || (size > MAX_CHAR_SIZE-1)) >>> + return 0; >> >> You should probably have log messages for these error returns. > Thanks, as the following patch. > >>> + >>> + memcpy(source, args, size+1); >>> + if (strstr(source, vertica) == NULL) >>> + return 0; >>> + >>> + *type = get_interval_type(source, typestr); >>> + if (*type == INTERVAL_INVALID) >>> + { >>> + condlog(0, "delay_interval type is invalid"); >>> + return 0; >>> + } >> >> I'm confused here. How do you get to use the default interval. Shouldn't >> you accept "20s|" and "|30" and as valid inputs that use the defaults >> for the part they don't specify. > OK,the default arguments value is removed. If get inputs failed, return default priority "0". > As the following patch. >> >>> + tokenbefore = strtok(source, vertica); >>> + tokenafter = strtok(NULL, vertica); >>> + typestr[1] = '\0'; >>> + tokenbefore = strtok(tokenbefore, typestr); >>> + if ((tokenbefore == NULL) || (tokenafter == NULL)) >>> + return 0; >>> + >>> + tmp = tokenbefore; >>> + while (*tmp != '\0') >>> + if (!isdigit(*tmp++)) >>> + { >>> + condlog(0, "delay_interval string include invalid char"); >>> + return 0; >>> + } >>> + >>> + tmp = tokenafter; >>> + while (*tmp != '\0') >>> + if (!isdigit(*tmp++)) >>> + { >>> + condlog(0, "cons_num string include invalid char"); >>> + return 0; >>> + } >>> + >>> + *interval = atoi(tokenbefore); >> >> Why do you keep track of the type and the interval seperately? Can't you >> just find out the type, and use that to multiply the interval once you >> read it, and then just use that value, instead of keeping track of two >> values across multiple functions? > Thanks, as the following patch. > >>> + >>> + if (pp->fd < 0) >>> + return -PRIO_NO_INFORMATION; >>> + >>> + if (get_delay_pref_arg(args, &delay_interval, &cons_num, &type) == 0) >>> + { >>> + condlog(3, "%s: get delay arg fail", pp->dev); >> >> Why use the word "fail" in this message? Not setting prio_args to get >> the defaults seems like a perfectly valid choice. > The defaults are not used. Insteadly, return default priority "0". See below. > >>> diff --git a/libmultipath/prioritizers/delayedpath.h b/libmultipath/prioritizers/delayedpath.h >>> new file mode 100644 >>> index 0000000..d8213e9 >>> --- /dev/null >>> +++ b/libmultipath/prioritizers/delayedpath.h >>> @@ -0,0 +1,17 @@ >>> +#ifndef _DELAYEDPATH_H >>> +#define _DELAYEDPATH_H >>> + >>> +#define PRIO_DELAYED_PATH "delayedpath" >> >> In order for the rest of the code to refer to this prioritizer, this >> define should be in prio.h with the other prioritizer names, and as long >> as delayedpath.c includes prio.h, there's no need to put it in >> delayedpath.h. > OK, as the following patch. >> >>> + >>> +#define PRIO_NO_INFORMATION 5 >> >> The rest of the multipath code only cares if getprio returns a negative >> number of not. It doesn't check what the specific negative number is. I >> realize the the alua prioritizer returns a set of error codes, but they >> aren't used, or even usable in their present form. If we wanted to have >> better error reporting, we should set up a universal set of error codes >> in prio.h, and have all prioritizers use them, instead of having each >> prioritizer define its own error codes. There's no reason why your >> prioritizer needs to return this error code instead of -1. > OK, as the following patch. >> >>> + >>> +#define USEC_PER_SEC 1000000LL >>> +#define USEC_PER_MSEC 1000LL >>> +#define USEC_PER_USEC 1LL >>> + >>> +static inline long long timeval_to_us(const struct timespec *tv) >>> +{ >>> + return ((long long) tv->tv_sec * USEC_PER_SEC) + (tv->tv_nsec >> 10); >>> +} >> >> No other file besides delayedpath.c will likely be including this .h >> file, so I don't see any purpose for these being defined here. In fact, >> I don't see why you can't just have a .c file without a .h file like the >> majority of prioritizers. I'm pretty sure that none of the prioritizers >> really need their own .h file. > OK, as the following patch. >> >>> +#endif >>> diff --git a/multipath/multipath.conf.5 b/multipath/multipath.conf.5 >>> index 5939688..f1e126e 100644 >>> --- a/multipath/multipath.conf.5 >>> +++ b/multipath/multipath.conf.5 >>> @@ -293,6 +293,10 @@ Generate a random priority between 1 and 10. >>> Generate the path priority based on the regular expression and the >>> priority provided as argument. Requires prio_args keyword. >>> .TP >>> +.I delayedpath >>> +Generate the path priority based on a time-delay algorithm. >>> +Requires prio_args keyword. >> >> Really it doesn't require prio_args if you want to use the default >> values, and should probably say so. > The default args is discarded, as the following patch. > >>> +.I delayed >>> +Needs a value of the form >>> +\fI"<delay_interval|cons_num>"\fR >>> +.RS >>> +.TP 8 >>> +.I delay_interval >>> +The interval values of average IO-time-delay between two different neighbour ranks of path priority, used to partition different priority ranks. >>> +Form: XXs, or XXXus, or XXXms. Unit: Second, or Microsecond, or Millisecond. Valid Values: Integer, s [1, 60], ms [1, 1000), us [1, 1000), >>> +For example: 10s, or 100us, or 100ms. The default is: 10ms. >>> +.TP >>> +.I cons_num >>> +The number of read IOs sent to the current path continuously, used to calculate the average IO-time-delay. Valid Values: Integer, [3, 1000]. >>> +For example: 30. The default is: 20. >>> +.RE >>> +.TP 12 >> >> Looking at the "weighted" prio_args definition just above your "delayed" >> definition, the pipe character "|" is being used to say that any of a >> set of options is allowed. Your definition has it being a literal >> character, but it's still inside the angle brackets that usually >> delineate a variable. perhaps "<delay_interval>|<io_num>" would be >> easier to understand, or even "[delayed_interval]|[io_num]" if you can >> omit these to use the defaults. > OK, as the following patch. > The up-to-date patch as follows: > > --- >>From 58d718fdd34550bd9c4a32c6e9a87099c1e45a9f Mon Sep 17 00:00:00 2001 > From: Yang Feng <philip.yang@huawei.com> > Date: Fri, 19 May 2017 16:09:07 +0800 > Subject: [PATCH] libmultipath/prioritizers: Prioritizer for device mapper multipath, where the corresponding priority > values of specific paths are provided by a latency algorithm. And the latency algorithm is dependent on the following > arguments(latency_interval and io_num). The principle of the algorithm is illustrated as follows: > 1. By sending a certain number "cons_num" of read IOs to the current path continuously, the IOs' average latency can be calculated. > 2. According to the average latency of each path and the weight value "latency_interval", the priority "rc" of each path can be provided. > > latency_interval latency_interval latency_interval latency_interval > |------------------|------------------|------------------|...|------------------| > | priority rank 1 | priority rank 2 | priority rank 3 |...| priority rank x | > |------------------|------------------|------------------|...|------------------| > Priority Rank Partitioning > --- > libmultipath/prioritizers/Makefile | 6 +- > libmultipath/prioritizers/path_latency.c | 271 +++++++++++++++++++++++++++++++ > multipath/multipath.conf.5 | 18 ++ > libmultipath/prio.h | 1 + > 4 files changed, 295 insertions(+), 1 deletion(-) > create mode 100644 libmultipath/prioritizers/path_latency.c > > diff --git a/libmultipath/prioritizers/Makefile b/libmultipath/prioritizers/Makefile > index 36b42e4..d2f20f6 100644 > --- a/libmultipath/prioritizers/Makefile > +++ b/libmultipath/prioritizers/Makefile > @@ -18,13 +18,17 @@ LIBS = \ > libpriorandom.so \ > libpriordac.so \ > libprioweightedpath.so \ > - libpriosysfs.so > + libpriopath_latency.so \ > + libpriosysfs.so > > all: $(LIBS) > > libprioalua.so: alua.o alua_rtpg.o > $(CC) $(LDFLAGS) $(SHARED_FLAGS) -o $@ $^ > > +libpriopath_latency.so: path_latency.o ../checkers/libsg.o > + $(CC) $(LDFLAGS) $(SHARED_FLAGS) -o $@ $^ -lm > + > libprio%.so: %.o > $(CC) $(LDFLAGS) $(SHARED_FLAGS) -o $@ $^ > > diff --git a/libmultipath/prioritizers/path_latency.c b/libmultipath/prioritizers/path_latency.c > new file mode 100644 > index 0000000..a666b6c > --- /dev/null > +++ b/libmultipath/prioritizers/path_latency.c > @@ -0,0 +1,271 @@ > +/* > + * (C) Copyright HUAWEI Technology Corp. 2017, 2021 All Rights Reserved. > + * > + * main.c > + * > + * Prioritizer for device mapper multipath, where the corresponding priority > + * values of specific paths are provided by a latency algorithm. And the > + * latency algorithm is dependent on arguments. > + * > + * The principle of the algorithm as follows: > + * 1. By sending a certain number "io_num" of read IOs to the current path > + * continuously, the IOs' average latency can be calculated. > + * 2. According to the average latency of each path and the weight value > + * "latency_interval", the priority "rc" of each path can be provided. > + * > + * Author(s): Yang Feng <philip.yang@huawei.com> > + * Zou Ming <zouming.zouming@huawei.com> > + * > + * This file is released under the GPL. > + */ > +#include <stdio.h> > +#include <math.h> > +#include <ctype.h> > +#include <time.h> > + > +#include "debug.h" > +#include "prio.h" > +#include "structs.h" > +#include "../checkers/libsg.h" > + > +#define THRES_USEC_VALUE 120000000LL /*unit: us, =120s*/ > + > +#define MAX_IO_NUM 200 > +#define MIN_IO_NUM 10 > + > +#define MAX_LATENCY_INTERVAL 60 /*unit: s*/ > +#define MIN_LATENCY_INTERVAL 1 /*unit: us, or ms, or s*/ > + > +#define DEFAULT_PRIORITY 0 > + > +#define MAX_CHAR_SIZE 30 > + > +#define CHAR_USEC "us" > +#define CHAR_MSEC "ms" > +#define CHAR_SEC "s" > + > +enum interval_type { > + INTERVAL_USEC, > + INTERVAL_MSEC, > + INTERVAL_SEC, > + INTERVAL_INVALID > +}; > + > +/* interval_unit_str and interval_unit_type keep the same assignment sequence */ > +static const char *interval_unit_str[MAX_CHAR_SIZE] = { > + CHAR_USEC, CHAR_MSEC, CHAR_SEC > +}; > +static const int interval_unit_type[] = { > + INTERVAL_USEC, INTERVAL_MSEC, INTERVAL_SEC > +}; > + > +#define USEC_PER_SEC 1000000LL > +#define USEC_PER_MSEC 1000LL > +#define USEC_PER_USEC 1LL > + > +static const int conversion_ratio[] = { > + [INTERVAL_USEC] = USEC_PER_USEC, > + [INTERVAL_MSEC] = USEC_PER_MSEC, > + [INTERVAL_SEC] = USEC_PER_SEC, > + [INTERVAL_INVALID] = 0 > +}; > + > +static long long path_latency[MAX_IO_NUM]; > + > +static inline long long timeval_to_us(const struct timespec *tv) > +{ > + return ((long long) tv->tv_sec * USEC_PER_SEC) + (tv->tv_nsec >> 10); > +} > + > +static int do_readsector0(int fd, unsigned int timeout) > +{ > + unsigned char buf[4096]; > + unsigned char sbuf[SENSE_BUFF_LEN]; > + int ret; > + > + ret = sg_read(fd, &buf[0], 4096, &sbuf[0], > + SENSE_BUFF_LEN, timeout); > + > + return ret; > +} > + > +int check_args_valid(int io_num, long long latency_interval, int type) > +{ > + if ((io_num < MIN_IO_NUM) || (io_num > MAX_IO_NUM)) > + { > + condlog(0, "args io_num is more than the valid values range"); > + return 0; > + } > + > + /* s:[1, 60], ms:[1, 60000], us:[1, 60000000] */ > + if ((latency_interval < MIN_LATENCY_INTERVAL) || (latency_interval > (MAX_LATENCY_INTERVAL * USEC_PER_SEC / conversion_ratio[type]))) > + { > + condlog(0, "args latency_interval is more than the valid values range"); > + return 0; > + } > + > + return 1; > +} > + > +static int get_interval_type(char *type) > +{ > + int index; > + > + for (index = 0; index < sizeof(interval_unit_str)/MAX_CHAR_SIZE; index++) > + { > + if (strcmp(type, interval_unit_str[index]) == 0) > + { > + return interval_unit_type[index]; > + } > + } > + > + return INTERVAL_INVALID; > +} > + > +long long get_conversion_ratio(int type) > +{ > + return conversion_ratio[type]; > +} > + > +/* In multipath.conf, args form: io_num|latency_interval. For example, > +* args is "20|10ms", this function can get 20, 10. > +*/ > +static int get_interval_and_ionum(char *args, > + int *ionum, > + long long *interval) > +{ > + char source[MAX_CHAR_SIZE]; > + char vertica = '|'; > + char *endstrbefore = NULL; > + char *endstrafter = NULL; > + int type; > + unsigned int size = strlen(args); > + long long ratio; > + > + if ((args == NULL) || (ionum == NULL) || (interval == NULL)) > + { > + condlog(0, "args string is NULL"); > + return 0; > + } > + > + if ((size < 1) || (size > MAX_CHAR_SIZE-1)) > + { > + condlog(0, "args string's size is too long"); > + return 0; > + } > + > + memcpy(source, args, size+1); > + > + if (!isdigit(source[0])) > + { > + condlog(0, "args io_num string's first char is not digit"); > + return 0; > + } > + > + *ionum = (int)strtoul(source, &endstrbefore, 10); > + if (endstrbefore[0] != vertica) > + { > + condlog(0, "segmentation char is invalid"); > + return 0; > + } > + > + if (!isdigit(endstrbefore[1])) > + { > + condlog(0, "args latency_interval string's first char is not digit"); > + return 0; > + } > + > + *interval = (long long)strtol(&endstrbefore[1], &endstrafter, 10); > + type = get_interval_type(endstrafter); > + if (type == INTERVAL_INVALID) > + { > + condlog(0, "args latency_interval type is invalid"); > + return 0; > + } > + > + if (check_args_valid(*ionum, *interval, type) == 0) > + { > + return 0; > + } > + > + ratio = get_conversion_ratio(type); > + *interval *= (long long)ratio; > + > + return 1; > +} > + > +long long calc_standard_deviation(long long *path_latency, int size, long long avglatency) > +{ > + int index; > + long long total = 0; > + > + for (index = 0; index < size; index++) > + { > + total += (path_latency[index] - avglatency) * (path_latency[index] - avglatency); > + } > + > + total /= (size-1); > + > + return (long long)sqrt((double)total); > +} > + > +int getprio (struct path *pp, char *args, unsigned int timeout) > +{ > + int rc, temp; > + int index = 0; > + int io_num; > + long long latency_interval; > + long long avglatency; > + long long standard_deviation; > + long long toldelay = 0; > + long long before, after; > + struct timespec tv; > + > + if (pp->fd < 0) > + return -1; > + > + if (get_interval_and_ionum(args, &io_num, &latency_interval) == 0) > + { > + condlog(0, "%s: get path_latency args fail", pp->dev); > + return DEFAULT_PRIORITY; > + } > + > + memset(path_latency, 0, sizeof(path_latency)); > + > + temp = io_num; > + while (temp-- > 0) > + { > + (void)clock_gettime(CLOCK_MONOTONIC, &tv); > + before = timeval_to_us(&tv); > + > + if (do_readsector0(pp->fd, timeout) == 2) > + { > + condlog(0, "%s: path down", pp->dev); > + return -1; > + } > + > + (void)clock_gettime(CLOCK_MONOTONIC, &tv); > + after = timeval_to_us(&tv); > + > + path_latency[index] = after - before; > + toldelay += path_latency[index++]; > + } > + > + avglatency = toldelay/(long long)io_num; > + condlog(4, "%s: average latency is (%lld)", pp->dev, avglatency); > + > + if (avglatency > THRES_USEC_VALUE) > + { > + condlog(0, "%s: average latency (%lld) is more than thresold", pp->dev, avglatency); > + return DEFAULT_PRIORITY; > + } > + > + /* warn the user if the latency_interval set is smaller than (2 * standard deviation), or equal */ > + standard_deviation = calc_standard_deviation(path_latency, index, avglatency); > + if (latency_interval <= (2 * standard_deviation)) > + condlog(3, "%s: args latency_interval set is smaller than 2 * standard deviation (%lld us), or equal", > + pp->dev, standard_deviation); > + > + rc = (int)(THRES_USEC_VALUE - (avglatency/(long long)latency_interval)); > + return rc; > +} > diff --git a/multipath/multipath.conf.5 b/multipath/multipath.conf.5 > index 5939688..3dd0d77 100644 > --- a/multipath/multipath.conf.5 > +++ b/multipath/multipath.conf.5 > @@ -293,6 +293,10 @@ Generate a random priority between 1 and 10. > Generate the path priority based on the regular expression and the > priority provided as argument. Requires prio_args keyword. > .TP > +.I path_latency > +Generate the path priority based on a latency algorithm. > +Requires prio_args keyword. > +.TP > .I datacore > .\" XXX > ???. Requires prio_args keyword. > @@ -333,6 +337,20 @@ these values can be looked up through sysfs or by running \fImultipathd show pat > "%N:%R:%n:%r"\fR. For example: 0x200100e08ba0aea0:0x210100e08ba0aea0:.*:.* , .*:.*:iqn.2009-10.com.redhat.msp.lab.ask-06:.* > .RE > .TP 12 > +.I path_latency > +Needs a value of the form > +\fI"<latency_interval>|<io_num>"\fR > +.RS > +.TP 8 > +.I latency_interval > +The interval values of average latency between two different neighbour ranks of path priority, used to partition different priority ranks. > +Form: XXs, or XXXus, or XXXms. Unit: Second, or Microsecond, or Millisecond. Valid Values: Integer, s [1, 60], ms [1, 60000], us [1, 60000000], > +For example: If latency_interval=10ms, the paths will be grouped in priority groups with path latency 0-10ms, 10-20ms, 20-30ms, etc.. > +.TP > +.I io_num > +The number of read IOs sent to the current path continuously, used to calculate the average path latency. Valid Values: Integer, [10, 200]. > +.RE > +.TP 12 > .I alua > If \fIexclusive_pref_bit\fR is set, paths with the \fIpreferred path\fR bit > set will always be in their own path group. > diff --git a/libmultipath/prio.h b/libmultipath/prio.h > index 0193c52..c97fe39 100644 > --- a/libmultipath/prio.h > +++ b/libmultipath/prio.h > @@ -29,6 +29,7 @@ struct path; > #define PRIO_RDAC "rdac" > #define PRIO_WEIGHTED_PATH "weightedpath" > #define PRIO_SYSFS "sysfs" > +#define PRIO_PATH_LATENCY "path_latency" > > /* > * Value used to mark the fact prio was not defined > -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel
Hello Benjamin, How about this patch? Look forward your replies. Thank you very much. Best regards! On 2017/5/19 17:45, Yang Feng wrote: > Hi Benjamin, > > Thank you very much for your comments. > Please find my replys and the up-to-date patch. > Best regards! > >> >> First, one overall question. We have dynamic path selectors available to >> deal with paths that are just simply slower that other paths, but can >> still be used together. Is there specific hardware or a specific setup >> where this isn't good enough and we really need to seperate these paths >> into different pathgroups, but we can't find out deterministically how >> the groups should be set up? It just seems like there could be a less >> hacky solution to this problem, but perhaps there are some situations >> where this is truly the best option. I'm just wondering what those are.1. In the Storage-Backup environment of HyperCluster,includs one storage array near > to the host and one remote storage array, and the two storage arrays have the same hardware. > The same LUN is writed or readed by the two storage arrays. > However, usually, the average latency of the paths of the remote storage array is much higher than the > near storage array's. > apparently, the prioritizer can be a good automatic solution. > And the current selectors don't solve it, IOs will send to the paths of the remote storage array, IOPS will be influenced unavoidably. > 2. In the environment of single storage array, the prioritizer can automatically separate the paths who's latency is much higher, > IOs will not send to this paths. > But the current selectors don't solve this problem, IOPS will be influenced unavoidably. > >>> + >>> +/* interval_unit_str and interval_unit_type keep the same assignment sequence */ >>> +static const char interval_unit_str[][MAX_CHAR_SIZE] = { >>> + CHAR_USEC, CHAR_MSEC, CHAR_SEC >> >> This is a nit, but for constant strings, could you please use "char >> *var" instead of "char var[]", to be consistent with the rest of the >> multipath code. > Thanks, as the following patch. > >>> + if ((args == NULL) || (interval == NULL) >>> + || (consnum == NULL) || (type == NULL)) >>> + return 0; >>> + >>> + /* int type */ >>> + if ((size < 1) || (size > MAX_CHAR_SIZE-1)) >>> + return 0; >> >> You should probably have log messages for these error returns. > Thanks, as the following patch. > >>> + >>> + memcpy(source, args, size+1); >>> + if (strstr(source, vertica) == NULL) >>> + return 0; >>> + >>> + *type = get_interval_type(source, typestr); >>> + if (*type == INTERVAL_INVALID) >>> + { >>> + condlog(0, "delay_interval type is invalid"); >>> + return 0; >>> + } >> >> I'm confused here. How do you get to use the default interval. Shouldn't >> you accept "20s|" and "|30" and as valid inputs that use the defaults >> for the part they don't specify. > OK,the default arguments value is removed. If get inputs failed, return default priority "0". > As the following patch. >> >>> + tokenbefore = strtok(source, vertica); >>> + tokenafter = strtok(NULL, vertica); >>> + typestr[1] = '\0'; >>> + tokenbefore = strtok(tokenbefore, typestr); >>> + if ((tokenbefore == NULL) || (tokenafter == NULL)) >>> + return 0; >>> + >>> + tmp = tokenbefore; >>> + while (*tmp != '\0') >>> + if (!isdigit(*tmp++)) >>> + { >>> + condlog(0, "delay_interval string include invalid char"); >>> + return 0; >>> + } >>> + >>> + tmp = tokenafter; >>> + while (*tmp != '\0') >>> + if (!isdigit(*tmp++)) >>> + { >>> + condlog(0, "cons_num string include invalid char"); >>> + return 0; >>> + } >>> + >>> + *interval = atoi(tokenbefore); >> >> Why do you keep track of the type and the interval seperately? Can't you >> just find out the type, and use that to multiply the interval once you >> read it, and then just use that value, instead of keeping track of two >> values across multiple functions? > Thanks, as the following patch. > >>> + >>> + if (pp->fd < 0) >>> + return -PRIO_NO_INFORMATION; >>> + >>> + if (get_delay_pref_arg(args, &delay_interval, &cons_num, &type) == 0) >>> + { >>> + condlog(3, "%s: get delay arg fail", pp->dev); >> >> Why use the word "fail" in this message? Not setting prio_args to get >> the defaults seems like a perfectly valid choice. > The defaults are not used. Insteadly, return default priority "0". See below. > >>> diff --git a/libmultipath/prioritizers/delayedpath.h b/libmultipath/prioritizers/delayedpath.h >>> new file mode 100644 >>> index 0000000..d8213e9 >>> --- /dev/null >>> +++ b/libmultipath/prioritizers/delayedpath.h >>> @@ -0,0 +1,17 @@ >>> +#ifndef _DELAYEDPATH_H >>> +#define _DELAYEDPATH_H >>> + >>> +#define PRIO_DELAYED_PATH "delayedpath" >> >> In order for the rest of the code to refer to this prioritizer, this >> define should be in prio.h with the other prioritizer names, and as long >> as delayedpath.c includes prio.h, there's no need to put it in >> delayedpath.h. > OK, as the following patch. >> >>> + >>> +#define PRIO_NO_INFORMATION 5 >> >> The rest of the multipath code only cares if getprio returns a negative >> number of not. It doesn't check what the specific negative number is. I >> realize the the alua prioritizer returns a set of error codes, but they >> aren't used, or even usable in their present form. If we wanted to have >> better error reporting, we should set up a universal set of error codes >> in prio.h, and have all prioritizers use them, instead of having each >> prioritizer define its own error codes. There's no reason why your >> prioritizer needs to return this error code instead of -1. > OK, as the following patch. >> >>> + >>> +#define USEC_PER_SEC 1000000LL >>> +#define USEC_PER_MSEC 1000LL >>> +#define USEC_PER_USEC 1LL >>> + >>> +static inline long long timeval_to_us(const struct timespec *tv) >>> +{ >>> + return ((long long) tv->tv_sec * USEC_PER_SEC) + (tv->tv_nsec >> 10); >>> +} >> >> No other file besides delayedpath.c will likely be including this .h >> file, so I don't see any purpose for these being defined here. In fact, >> I don't see why you can't just have a .c file without a .h file like the >> majority of prioritizers. I'm pretty sure that none of the prioritizers >> really need their own .h file. > OK, as the following patch. >> >>> +#endif >>> diff --git a/multipath/multipath.conf.5 b/multipath/multipath.conf.5 >>> index 5939688..f1e126e 100644 >>> --- a/multipath/multipath.conf.5 >>> +++ b/multipath/multipath.conf.5 >>> @@ -293,6 +293,10 @@ Generate a random priority between 1 and 10. >>> Generate the path priority based on the regular expression and the >>> priority provided as argument. Requires prio_args keyword. >>> .TP >>> +.I delayedpath >>> +Generate the path priority based on a time-delay algorithm. >>> +Requires prio_args keyword. >> >> Really it doesn't require prio_args if you want to use the default >> values, and should probably say so. > The default args is discarded, as the following patch. > >>> +.I delayed >>> +Needs a value of the form >>> +\fI"<delay_interval|cons_num>"\fR >>> +.RS >>> +.TP 8 >>> +.I delay_interval >>> +The interval values of average IO-time-delay between two different neighbour ranks of path priority, used to partition different priority ranks. >>> +Form: XXs, or XXXus, or XXXms. Unit: Second, or Microsecond, or Millisecond. Valid Values: Integer, s [1, 60], ms [1, 1000), us [1, 1000), >>> +For example: 10s, or 100us, or 100ms. The default is: 10ms. >>> +.TP >>> +.I cons_num >>> +The number of read IOs sent to the current path continuously, used to calculate the average IO-time-delay. Valid Values: Integer, [3, 1000]. >>> +For example: 30. The default is: 20. >>> +.RE >>> +.TP 12 >> >> Looking at the "weighted" prio_args definition just above your "delayed" >> definition, the pipe character "|" is being used to say that any of a >> set of options is allowed. Your definition has it being a literal >> character, but it's still inside the angle brackets that usually >> delineate a variable. perhaps "<delay_interval>|<io_num>" would be >> easier to understand, or even "[delayed_interval]|[io_num]" if you can >> omit these to use the defaults. > OK, as the following patch. > The up-to-date patch as follows: > > --- >>From 58d718fdd34550bd9c4a32c6e9a87099c1e45a9f Mon Sep 17 00:00:00 2001 > From: Yang Feng <philip.yang@huawei.com> > Date: Fri, 19 May 2017 16:09:07 +0800 > Subject: [PATCH] libmultipath/prioritizers: Prioritizer for device mapper multipath, where the corresponding priority > values of specific paths are provided by a latency algorithm. And the latency algorithm is dependent on the following > arguments(latency_interval and io_num). The principle of the algorithm is illustrated as follows: > 1. By sending a certain number "cons_num" of read IOs to the current path continuously, the IOs' average latency can be calculated. > 2. According to the average latency of each path and the weight value "latency_interval", the priority "rc" of each path can be provided. > > latency_interval latency_interval latency_interval latency_interval > |------------------|------------------|------------------|...|------------------| > | priority rank 1 | priority rank 2 | priority rank 3 |...| priority rank x | > |------------------|------------------|------------------|...|------------------| > Priority Rank Partitioning > --- > libmultipath/prioritizers/Makefile | 6 +- > libmultipath/prioritizers/path_latency.c | 271 +++++++++++++++++++++++++++++++ > multipath/multipath.conf.5 | 18 ++ > libmultipath/prio.h | 1 + > 4 files changed, 295 insertions(+), 1 deletion(-) > create mode 100644 libmultipath/prioritizers/path_latency.c > > diff --git a/libmultipath/prioritizers/Makefile b/libmultipath/prioritizers/Makefile > index 36b42e4..d2f20f6 100644 > --- a/libmultipath/prioritizers/Makefile > +++ b/libmultipath/prioritizers/Makefile > @@ -18,13 +18,17 @@ LIBS = \ > libpriorandom.so \ > libpriordac.so \ > libprioweightedpath.so \ > - libpriosysfs.so > + libpriopath_latency.so \ > + libpriosysfs.so > > all: $(LIBS) > > libprioalua.so: alua.o alua_rtpg.o > $(CC) $(LDFLAGS) $(SHARED_FLAGS) -o $@ $^ > > +libpriopath_latency.so: path_latency.o ../checkers/libsg.o > + $(CC) $(LDFLAGS) $(SHARED_FLAGS) -o $@ $^ -lm > + > libprio%.so: %.o > $(CC) $(LDFLAGS) $(SHARED_FLAGS) -o $@ $^ > > diff --git a/libmultipath/prioritizers/path_latency.c b/libmultipath/prioritizers/path_latency.c > new file mode 100644 > index 0000000..a666b6c > --- /dev/null > +++ b/libmultipath/prioritizers/path_latency.c > @@ -0,0 +1,271 @@ > +/* > + * (C) Copyright HUAWEI Technology Corp. 2017, 2021 All Rights Reserved. > + * > + * main.c > + * > + * Prioritizer for device mapper multipath, where the corresponding priority > + * values of specific paths are provided by a latency algorithm. And the > + * latency algorithm is dependent on arguments. > + * > + * The principle of the algorithm as follows: > + * 1. By sending a certain number "io_num" of read IOs to the current path > + * continuously, the IOs' average latency can be calculated. > + * 2. According to the average latency of each path and the weight value > + * "latency_interval", the priority "rc" of each path can be provided. > + * > + * Author(s): Yang Feng <philip.yang@huawei.com> > + * Zou Ming <zouming.zouming@huawei.com> > + * > + * This file is released under the GPL. > + */ > +#include <stdio.h> > +#include <math.h> > +#include <ctype.h> > +#include <time.h> > + > +#include "debug.h" > +#include "prio.h" > +#include "structs.h" > +#include "../checkers/libsg.h" > + > +#define THRES_USEC_VALUE 120000000LL /*unit: us, =120s*/ > + > +#define MAX_IO_NUM 200 > +#define MIN_IO_NUM 10 > + > +#define MAX_LATENCY_INTERVAL 60 /*unit: s*/ > +#define MIN_LATENCY_INTERVAL 1 /*unit: us, or ms, or s*/ > + > +#define DEFAULT_PRIORITY 0 > + > +#define MAX_CHAR_SIZE 30 > + > +#define CHAR_USEC "us" > +#define CHAR_MSEC "ms" > +#define CHAR_SEC "s" > + > +enum interval_type { > + INTERVAL_USEC, > + INTERVAL_MSEC, > + INTERVAL_SEC, > + INTERVAL_INVALID > +}; > + > +/* interval_unit_str and interval_unit_type keep the same assignment sequence */ > +static const char *interval_unit_str[MAX_CHAR_SIZE] = { > + CHAR_USEC, CHAR_MSEC, CHAR_SEC > +}; > +static const int interval_unit_type[] = { > + INTERVAL_USEC, INTERVAL_MSEC, INTERVAL_SEC > +}; > + > +#define USEC_PER_SEC 1000000LL > +#define USEC_PER_MSEC 1000LL > +#define USEC_PER_USEC 1LL > + > +static const int conversion_ratio[] = { > + [INTERVAL_USEC] = USEC_PER_USEC, > + [INTERVAL_MSEC] = USEC_PER_MSEC, > + [INTERVAL_SEC] = USEC_PER_SEC, > + [INTERVAL_INVALID] = 0 > +}; > + > +static long long path_latency[MAX_IO_NUM]; > + > +static inline long long timeval_to_us(const struct timespec *tv) > +{ > + return ((long long) tv->tv_sec * USEC_PER_SEC) + (tv->tv_nsec >> 10); > +} > + > +static int do_readsector0(int fd, unsigned int timeout) > +{ > + unsigned char buf[4096]; > + unsigned char sbuf[SENSE_BUFF_LEN]; > + int ret; > + > + ret = sg_read(fd, &buf[0], 4096, &sbuf[0], > + SENSE_BUFF_LEN, timeout); > + > + return ret; > +} > + > +int check_args_valid(int io_num, long long latency_interval, int type) > +{ > + if ((io_num < MIN_IO_NUM) || (io_num > MAX_IO_NUM)) > + { > + condlog(0, "args io_num is more than the valid values range"); > + return 0; > + } > + > + /* s:[1, 60], ms:[1, 60000], us:[1, 60000000] */ > + if ((latency_interval < MIN_LATENCY_INTERVAL) || (latency_interval > (MAX_LATENCY_INTERVAL * USEC_PER_SEC / conversion_ratio[type]))) > + { > + condlog(0, "args latency_interval is more than the valid values range"); > + return 0; > + } > + > + return 1; > +} > + > +static int get_interval_type(char *type) > +{ > + int index; > + > + for (index = 0; index < sizeof(interval_unit_str)/MAX_CHAR_SIZE; index++) > + { > + if (strcmp(type, interval_unit_str[index]) == 0) > + { > + return interval_unit_type[index]; > + } > + } > + > + return INTERVAL_INVALID; > +} > + > +long long get_conversion_ratio(int type) > +{ > + return conversion_ratio[type]; > +} > + > +/* In multipath.conf, args form: io_num|latency_interval. For example, > +* args is "20|10ms", this function can get 20, 10. > +*/ > +static int get_interval_and_ionum(char *args, > + int *ionum, > + long long *interval) > +{ > + char source[MAX_CHAR_SIZE]; > + char vertica = '|'; > + char *endstrbefore = NULL; > + char *endstrafter = NULL; > + int type; > + unsigned int size = strlen(args); > + long long ratio; > + > + if ((args == NULL) || (ionum == NULL) || (interval == NULL)) > + { > + condlog(0, "args string is NULL"); > + return 0; > + } > + > + if ((size < 1) || (size > MAX_CHAR_SIZE-1)) > + { > + condlog(0, "args string's size is too long"); > + return 0; > + } > + > + memcpy(source, args, size+1); > + > + if (!isdigit(source[0])) > + { > + condlog(0, "args io_num string's first char is not digit"); > + return 0; > + } > + > + *ionum = (int)strtoul(source, &endstrbefore, 10); > + if (endstrbefore[0] != vertica) > + { > + condlog(0, "segmentation char is invalid"); > + return 0; > + } > + > + if (!isdigit(endstrbefore[1])) > + { > + condlog(0, "args latency_interval string's first char is not digit"); > + return 0; > + } > + > + *interval = (long long)strtol(&endstrbefore[1], &endstrafter, 10); > + type = get_interval_type(endstrafter); > + if (type == INTERVAL_INVALID) > + { > + condlog(0, "args latency_interval type is invalid"); > + return 0; > + } > + > + if (check_args_valid(*ionum, *interval, type) == 0) > + { > + return 0; > + } > + > + ratio = get_conversion_ratio(type); > + *interval *= (long long)ratio; > + > + return 1; > +} > + > +long long calc_standard_deviation(long long *path_latency, int size, long long avglatency) > +{ > + int index; > + long long total = 0; > + > + for (index = 0; index < size; index++) > + { > + total += (path_latency[index] - avglatency) * (path_latency[index] - avglatency); > + } > + > + total /= (size-1); > + > + return (long long)sqrt((double)total); > +} > + > +int getprio (struct path *pp, char *args, unsigned int timeout) > +{ > + int rc, temp; > + int index = 0; > + int io_num; > + long long latency_interval; > + long long avglatency; > + long long standard_deviation; > + long long toldelay = 0; > + long long before, after; > + struct timespec tv; > + > + if (pp->fd < 0) > + return -1; > + > + if (get_interval_and_ionum(args, &io_num, &latency_interval) == 0) > + { > + condlog(0, "%s: get path_latency args fail", pp->dev); > + return DEFAULT_PRIORITY; > + } > + > + memset(path_latency, 0, sizeof(path_latency)); > + > + temp = io_num; > + while (temp-- > 0) > + { > + (void)clock_gettime(CLOCK_MONOTONIC, &tv); > + before = timeval_to_us(&tv); > + > + if (do_readsector0(pp->fd, timeout) == 2) > + { > + condlog(0, "%s: path down", pp->dev); > + return -1; > + } > + > + (void)clock_gettime(CLOCK_MONOTONIC, &tv); > + after = timeval_to_us(&tv); > + > + path_latency[index] = after - before; > + toldelay += path_latency[index++]; > + } > + > + avglatency = toldelay/(long long)io_num; > + condlog(4, "%s: average latency is (%lld)", pp->dev, avglatency); > + > + if (avglatency > THRES_USEC_VALUE) > + { > + condlog(0, "%s: average latency (%lld) is more than thresold", pp->dev, avglatency); > + return DEFAULT_PRIORITY; > + } > + > + /* warn the user if the latency_interval set is smaller than (2 * standard deviation), or equal */ > + standard_deviation = calc_standard_deviation(path_latency, index, avglatency); > + if (latency_interval <= (2 * standard_deviation)) > + condlog(3, "%s: args latency_interval set is smaller than 2 * standard deviation (%lld us), or equal", > + pp->dev, standard_deviation); > + > + rc = (int)(THRES_USEC_VALUE - (avglatency/(long long)latency_interval)); > + return rc; > +} > diff --git a/multipath/multipath.conf.5 b/multipath/multipath.conf.5 > index 5939688..3dd0d77 100644 > --- a/multipath/multipath.conf.5 > +++ b/multipath/multipath.conf.5 > @@ -293,6 +293,10 @@ Generate a random priority between 1 and 10. > Generate the path priority based on the regular expression and the > priority provided as argument. Requires prio_args keyword. > .TP > +.I path_latency > +Generate the path priority based on a latency algorithm. > +Requires prio_args keyword. > +.TP > .I datacore > .\" XXX > ???. Requires prio_args keyword. > @@ -333,6 +337,20 @@ these values can be looked up through sysfs or by running \fImultipathd show pat > "%N:%R:%n:%r"\fR. For example: 0x200100e08ba0aea0:0x210100e08ba0aea0:.*:.* , .*:.*:iqn.2009-10.com.redhat.msp.lab.ask-06:.* > .RE > .TP 12 > +.I path_latency > +Needs a value of the form > +\fI"<latency_interval>|<io_num>"\fR > +.RS > +.TP 8 > +.I latency_interval > +The interval values of average latency between two different neighbour ranks of path priority, used to partition different priority ranks. > +Form: XXs, or XXXus, or XXXms. Unit: Second, or Microsecond, or Millisecond. Valid Values: Integer, s [1, 60], ms [1, 60000], us [1, 60000000], > +For example: If latency_interval=10ms, the paths will be grouped in priority groups with path latency 0-10ms, 10-20ms, 20-30ms, etc.. > +.TP > +.I io_num > +The number of read IOs sent to the current path continuously, used to calculate the average path latency. Valid Values: Integer, [10, 200]. > +.RE > +.TP 12 > .I alua > If \fIexclusive_pref_bit\fR is set, paths with the \fIpreferred path\fR bit > set will always be in their own path group. > diff --git a/libmultipath/prio.h b/libmultipath/prio.h > index 0193c52..c97fe39 100644 > --- a/libmultipath/prio.h > +++ b/libmultipath/prio.h > @@ -29,6 +29,7 @@ struct path; > #define PRIO_RDAC "rdac" > #define PRIO_WEIGHTED_PATH "weightedpath" > #define PRIO_SYSFS "sysfs" > +#define PRIO_PATH_LATENCY "path_latency" > > /* > * Value used to mark the fact prio was not defined > -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel
diff --git a/libmultipath/prioritizers/Makefile b/libmultipath/prioritizers/Makefile index 36b42e4..d2f20f6 100644 --- a/libmultipath/prioritizers/Makefile +++ b/libmultipath/prioritizers/Makefile @@ -18,13 +18,17 @@ LIBS = \ libpriorandom.so \ libpriordac.so \ libprioweightedpath.so \ - libpriosysfs.so + libpriopath_latency.so \ + libpriosysfs.so all: $(LIBS) libprioalua.so: alua.o alua_rtpg.o $(CC) $(LDFLAGS) $(SHARED_FLAGS) -o $@ $^ +libpriopath_latency.so: path_latency.o ../checkers/libsg.o + $(CC) $(LDFLAGS) $(SHARED_FLAGS) -o $@ $^ -lm + libprio%.so: %.o $(CC) $(LDFLAGS) $(SHARED_FLAGS) -o $@ $^ diff --git a/libmultipath/prioritizers/path_latency.c b/libmultipath/prioritizers/path_latency.c new file mode 100644 index 0000000..a666b6c --- /dev/null +++ b/libmultipath/prioritizers/path_latency.c @@ -0,0 +1,271 @@ +/* + * (C) Copyright HUAWEI Technology Corp. 2017, 2021 All Rights Reserved. + * + * main.c + * + * Prioritizer for device mapper multipath, where the corresponding priority + * values of specific paths are provided by a latency algorithm. And the + * latency algorithm is dependent on arguments. + * + * The principle of the algorithm as follows: + * 1. By sending a certain number "io_num" of read IOs to the current path + * continuously, the IOs' average latency can be calculated. + * 2. According to the average latency of each path and the weight value + * "latency_interval", the priority "rc" of each path can be provided. + * + * Author(s): Yang Feng <philip.yang@huawei.com> + * Zou Ming <zouming.zouming@huawei.com> + * + * This file is released under the GPL. + */ +#include <stdio.h> +#include <math.h> +#include <ctype.h> +#include <time.h> + +#include "debug.h" +#include "prio.h" +#include "structs.h" +#include "../checkers/libsg.h" + +#define THRES_USEC_VALUE 120000000LL /*unit: us, =120s*/ + +#define MAX_IO_NUM 200 +#define MIN_IO_NUM 10 + +#define MAX_LATENCY_INTERVAL 60 /*unit: s*/ +#define MIN_LATENCY_INTERVAL 1 /*unit: us, or ms, or s*/ + +#define DEFAULT_PRIORITY 0 + +#define MAX_CHAR_SIZE 30 + +#define CHAR_USEC "us" +#define CHAR_MSEC "ms" +#define CHAR_SEC "s" + +enum interval_type { + INTERVAL_USEC, + INTERVAL_MSEC, + INTERVAL_SEC, + INTERVAL_INVALID +}; + +/* interval_unit_str and interval_unit_type keep the same assignment sequence */ +static const char *interval_unit_str[MAX_CHAR_SIZE] = { + CHAR_USEC, CHAR_MSEC, CHAR_SEC +}; +static const int interval_unit_type[] = { + INTERVAL_USEC, INTERVAL_MSEC, INTERVAL_SEC +}; + +#define USEC_PER_SEC 1000000LL +#define USEC_PER_MSEC 1000LL +#define USEC_PER_USEC 1LL + +static const int conversion_ratio[] = { + [INTERVAL_USEC] = USEC_PER_USEC, + [INTERVAL_MSEC] = USEC_PER_MSEC, + [INTERVAL_SEC] = USEC_PER_SEC, + [INTERVAL_INVALID] = 0 +}; + +static long long path_latency[MAX_IO_NUM]; + +static inline long long timeval_to_us(const struct timespec *tv) +{ + return ((long long) tv->tv_sec * USEC_PER_SEC) + (tv->tv_nsec >> 10); +} + +static int do_readsector0(int fd, unsigned int timeout) +{ + unsigned char buf[4096]; + unsigned char sbuf[SENSE_BUFF_LEN]; + int ret; + + ret = sg_read(fd, &buf[0], 4096, &sbuf[0], + SENSE_BUFF_LEN, timeout); + + return ret; +} + +int check_args_valid(int io_num, long long latency_interval, int type) +{ + if ((io_num < MIN_IO_NUM) || (io_num > MAX_IO_NUM)) + { + condlog(0, "args io_num is more than the valid values range"); + return 0; + } + + /* s:[1, 60], ms:[1, 60000], us:[1, 60000000] */ + if ((latency_interval < MIN_LATENCY_INTERVAL) || (latency_interval > (MAX_LATENCY_INTERVAL * USEC_PER_SEC / conversion_ratio[type]))) + { + condlog(0, "args latency_interval is more than the valid values range"); + return 0; + } + + return 1; +} + +static int get_interval_type(char *type) +{ + int index; + + for (index = 0; index < sizeof(interval_unit_str)/MAX_CHAR_SIZE; index++) + { + if (strcmp(type, interval_unit_str[index]) == 0) + { + return interval_unit_type[index]; + } + } + + return INTERVAL_INVALID; +} + +long long get_conversion_ratio(int type) +{ + return conversion_ratio[type]; +} + +/* In multipath.conf, args form: io_num|latency_interval. For example, +* args is "20|10ms", this function can get 20, 10. +*/ +static int get_interval_and_ionum(char *args, + int *ionum, + long long *interval) +{ + char source[MAX_CHAR_SIZE]; + char vertica = '|'; + char *endstrbefore = NULL; + char *endstrafter = NULL; + int type; + unsigned int size = strlen(args); + long long ratio; + + if ((args == NULL) || (ionum == NULL) || (interval == NULL)) + { + condlog(0, "args string is NULL"); + return 0; + } + + if ((size < 1) || (size > MAX_CHAR_SIZE-1)) + { + condlog(0, "args string's size is too long"); + return 0; + } + + memcpy(source, args, size+1); + + if (!isdigit(source[0])) + { + condlog(0, "args io_num string's first char is not digit"); + return 0; + } + + *ionum = (int)strtoul(source, &endstrbefore, 10); + if (endstrbefore[0] != vertica) + { + condlog(0, "segmentation char is invalid"); + return 0; + } + + if (!isdigit(endstrbefore[1])) + { + condlog(0, "args latency_interval string's first char is not digit"); + return 0; + } + + *interval = (long long)strtol(&endstrbefore[1], &endstrafter, 10); + type = get_interval_type(endstrafter); + if (type == INTERVAL_INVALID) + { + condlog(0, "args latency_interval type is invalid"); + return 0; + } + + if (check_args_valid(*ionum, *interval, type) == 0) + { + return 0; + } + + ratio = get_conversion_ratio(type); + *interval *= (long long)ratio; + + return 1; +} + +long long calc_standard_deviation(long long *path_latency, int size, long long avglatency) +{ + int index; + long long total = 0; + + for (index = 0; index < size; index++) + { + total += (path_latency[index] - avglatency) * (path_latency[index] - avglatency); + } + + total /= (size-1); + + return (long long)sqrt((double)total); +} + +int getprio (struct path *pp, char *args, unsigned int timeout) +{ + int rc, temp; + int index = 0; + int io_num; + long long latency_interval; + long long avglatency; + long long standard_deviation; + long long toldelay = 0; + long long before, after; + struct timespec tv; + + if (pp->fd < 0) + return -1; + + if (get_interval_and_ionum(args, &io_num, &latency_interval) == 0) + { + condlog(0, "%s: get path_latency args fail", pp->dev); + return DEFAULT_PRIORITY; + } + + memset(path_latency, 0, sizeof(path_latency)); + + temp = io_num; + while (temp-- > 0) + { + (void)clock_gettime(CLOCK_MONOTONIC, &tv); + before = timeval_to_us(&tv); + + if (do_readsector0(pp->fd, timeout) == 2) + { + condlog(0, "%s: path down", pp->dev); + return -1; + } + + (void)clock_gettime(CLOCK_MONOTONIC, &tv); + after = timeval_to_us(&tv); + + path_latency[index] = after - before; + toldelay += path_latency[index++]; + } + + avglatency = toldelay/(long long)io_num; + condlog(4, "%s: average latency is (%lld)", pp->dev, avglatency); + + if (avglatency > THRES_USEC_VALUE) + { + condlog(0, "%s: average latency (%lld) is more than thresold", pp->dev, avglatency); + return DEFAULT_PRIORITY; + } + + /* warn the user if the latency_interval set is smaller than (2 * standard deviation), or equal */ + standard_deviation = calc_standard_deviation(path_latency, index, avglatency); + if (latency_interval <= (2 * standard_deviation)) + condlog(3, "%s: args latency_interval set is smaller than 2 * standard deviation (%lld us), or equal", + pp->dev, standard_deviation); + + rc = (int)(THRES_USEC_VALUE - (avglatency/(long long)latency_interval)); + return rc; +} diff --git a/multipath/multipath.conf.5 b/multipath/multipath.conf.5 index 5939688..3dd0d77 100644 --- a/multipath/multipath.conf.5 +++ b/multipath/multipath.conf.5 @@ -293,6 +293,10 @@ Generate a random priority between 1 and 10. Generate the path priority based on the regular expression and the priority provided as argument. Requires prio_args keyword. .TP +.I path_latency +Generate the path priority based on a latency algorithm. +Requires prio_args keyword. +.TP .I datacore .\" XXX ???. Requires prio_args keyword. @@ -333,6 +337,20 @@ these values can be looked up through sysfs or by running \fImultipathd show pat "%N:%R:%n:%r"\fR. For example: 0x200100e08ba0aea0:0x210100e08ba0aea0:.*:.* , .*:.*:iqn.2009-10.com.redhat.msp.lab.ask-06:.* .RE .TP 12 +.I path_latency +Needs a value of the form +\fI"<latency_interval>|<io_num>"\fR +.RS +.TP 8 +.I latency_interval +The interval values of average latency between two different neighbour ranks of path priority, used to partition different priority ranks. +Form: XXs, or XXXus, or XXXms. Unit: Second, or Microsecond, or Millisecond. Valid Values: Integer, s [1, 60], ms [1, 60000], us [1, 60000000], +For example: If latency_interval=10ms, the paths will be grouped in priority groups with path latency 0-10ms, 10-20ms, 20-30ms, etc.. +.TP +.I io_num +The number of read IOs sent to the current path continuously, used to calculate the average path latency. Valid Values: Integer, [10, 200]. +.RE +.TP 12 .I alua If \fIexclusive_pref_bit\fR is set, paths with the \fIpreferred path\fR bit set will always be in their own path group. diff --git a/libmultipath/prio.h b/libmultipath/prio.h index 0193c52..c97fe39 100644 --- a/libmultipath/prio.h +++ b/libmultipath/prio.h @@ -29,6 +29,7 @@ struct path; #define PRIO_RDAC "rdac" #define PRIO_WEIGHTED_PATH "weightedpath" #define PRIO_SYSFS "sysfs" +#define PRIO_PATH_LATENCY "path_latency" /* * Value used to mark the fact prio was not defined