Submitter | Martin Wilck |
---|---|

Date | Nov. 18, 2017, 12:11 a.m. |

Message ID | <20171118001134.26622-4-mwilck@suse.com> |

Download | mbox | patch |

Permalink | /patch/10064229/ |

State | Not Applicable, archived |

Delegated to: | christophe varoqui |

Headers | show |

## Comments

On 2017/11/18 8:11, Martin Wilck wrote: > The log standard deviation can be calculated much more simply > by realizing > > sum_n (x_i - avg(x))^2 == sum_n x_i^2 - n * avg(x)^2 > I derive the equation: sum_n {(x_i - avg(x))^2} = sum_n{x_i^2 -2*x_i*avg(x) + avg(x)^2} = sum_n{x_i^2} - 2*avg(x)*sum_n{x_i} + sum_n{avg(x)^2} = sum_n{x_i^2} - 2*avg(x)*avg(x) + n*avg(x)^2 = sum_n{x_i^2} + (n-2)*avg(x)^2 > Also, use timespecsub rather than the custom timeval_to_usec, > and avoid taking log(0). > Great. > + pp_pl_log(3, "%s: latency avg=%.2e uncertainty=%.1f prio=%d\n", latency avg -> latency geometric avg ? Because in most cases, avg means arithmetic avg , but in this case, it means geometric avg. > + pp->dev, exp(lg_avglatency * lg_base), > + exp(standard_deviation * lg_base), rc); How can you get the uncertainty of Log-normal distribution is the exp(standard_deviation * lg_base) ? Thanks. Regards Guan . > + > return rc; > } > -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel

Hello Guan, > On 2017/11/18 8:11, Martin Wilck wrote: > > The log standard deviation can be calculated much more simply > > by realizing > > > > sum_n (x_i - avg(x))^2 == sum_n x_i^2 - n * avg(x)^2 > > > > I derive the equation: > sum_n {(x_i - avg(x))^2} = sum_n{x_i^2 -2*x_i*avg(x) + avg(x)^2} > = sum_n{x_i^2} - 2*avg(x)*sum_n{x_i} + > sum_n{avg(x)^2} > = sum_n{x_i^2} - 2*avg(x)*avg(x) + > n*avg(x)^2 > = sum_n{x_i^2} + (n-2)*avg(x)^2 No, that's wrong: avg(x) = (1/n) * sum_n(x_i) => sum_n(x_i) = n * avg(x) Thus the 2nd term in the line before the last in your derivation is not "- 2*avg(x)*avg(x)", but "- 2*n*avg(x)*avg(x)", and the end result becomes sum_n(x_i^2) - n*avg(x)^2. > > > Also, use timespecsub rather than the custom timeval_to_usec, > > and avoid taking log(0). > > > > Great. > > > > + pp_pl_log(3, "%s: latency avg=%.2e uncertainty=%.1f > > prio=%d\n", > > latency avg -> latency geometric avg ? Because in most cases, > avg means arithmetic avg , but in this case, it means geometric avg. Yes, I meant the geometric average. I don't think we should bother the user with these subtleties. Well, maybe it would feel better if we'd use "geometric mean" rather than "avg" in the log message, alhough that might again irritate some people who don't know the term ... I really don't care much. > > + pp->dev, exp(lg_avglatency * lg_base), > > + exp(standard_deviation * lg_base), rc); > > How can you get the uncertainty of Log-normal distribution > is the exp(standard_deviation * lg_base) ? The "width" of the normal distribution is measured in terms of the standard deviation, sigma. In your patch, you correctly accounted for the confidence levels of the 2*sigma environment (https://en.wikipedia.org/wiki/68%E2%80%9395%E2%80%9399.7_rule). Here, we're assuming a log-normal distribution for the latency (it's a practical assumption, not a statistical assertion - in reality the latency probably rather follows an exponential or Poisson distribution but we don't need to go into that detail). That means we're assuming that log(latency) can be described by a normal distribution with a certain standard deviation sigma around the log of the geometric mean. Again, sigma is the "width" of that normal distribution. Thus with ~68% probability, the log of the the latency is in the 1-sigma interval around the average. Translating that back into "real" latency, with 68% likelyhood it will be in the interval [(1/F) * gm, F*gm], where gm is the geometric mean and F=exp(sigma). Therefore, F (which is exp(standard_deviation * lg_base)) can be used as an estimate of the "uncertainty factor" for the latency. Agreed? Regards Martin

Hi Martin, I have tested this patch and found something needed to be correct. My comments inline. On 2017/11/18 8:11, Martin Wilck wrote: > The log standard deviation can be calculated much more simply > by realizing > > sum_n (x_i - avg(x))^2 == sum_n x_i^2 - n * avg(x)^2 > > @@ -340,8 +323,14 @@ int getprio(struct path *pp, char *args, unsigned int timeout) > ".It is recommend to be set larger", > pp->dev, base_num); > > + standard_deviation = sqrt((sum_squares - lg_toldelay * lg_toldelay) > + / (io_num -1)); > This assignment is wrong. It gets a "NAN" for standard_deviation. It should be the following equation according to sum_n (x_i - avg(x))^2 == sum_n x_i^2 - n * avg(x)^2 standard_deviation = sqrt((sum_squares - io_num* lg_avglatency* lg_avglatency)/ (io_num -1)); = sqrt((sum_squares - lg_toldelay* lg_avglatency)/ (io_num -1)); > rc = calcPrio(lg_avglatency, lg_maxavglatency, lg_minavglatency); > > + pp_pl_log(3, "%s: latency avg=%.2e uncertainty=%.1f prio=%d\n", > + pp->dev, exp(lg_avglatency * lg_base), > + exp(standard_deviation * lg_base), rc); > + I still have the doubt about the computation of the "uncertainty" item. Because I have observed that the uncertainty is in the range of 1.3 ~ 1.6 whenever the base_num varies from 1.1 to 10. Do you mean the uncertainty as the "Error function" of (log) normal distribution? Here is the definition of https://en.wikipedia.org/wiki/Error_function I prefer to using "Error function" that describes accumulated probability of how prio locates in the (-inf, prio-1) and (prio+1, +inf). Thanks Guan -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel

On Thu, 2017-12-07 at 12:26 +0800, Guan Junxiong wrote: > Hi Martin, > > I have tested this patch and found something needed to be correct. My > comments inline. > > > > @@ -340,8 +323,14 @@ int getprio(struct path *pp, char *args, > > unsigned int timeout) > > ".It is recommend to be set larger", > > pp->dev, base_num); > > > > + standard_deviation = sqrt((sum_squares - lg_toldelay * > > lg_toldelay) > > + / (io_num -1)); > > > > This assignment is wrong. It gets a "NAN" for standard_deviation. > It should be the following equation according to sum_n (x_i - > avg(x))^2 == sum_n x_i^2 - n * avg(x)^2 > standard_deviation = sqrt((sum_squares - io_num* lg_avglatency* > lg_avglatency)/ (io_num -1)); > = sqrt((sum_squares - lg_toldelay* > lg_avglatency)/ (io_num -1)); That's of course correct. I'll submit an update asap. > I still have the doubt about the computation of the "uncertainty" > item. > Because I have observed that the uncertainty is in the range of 1.3 ~ > 1.6 whenever > the base_num varies from 1.1 to 10. > Do you mean the uncertainty as the "Error function" of (log) normal > distribution? > Here is the definition of > https://en.wikipedia.org/wiki/Error_function > > I prefer to using "Error function" that describes accumulated > probability of how prio > locates in the (-inf, prio-1) and (prio+1, +inf). There's a misunderstanding here. My "uncertainty factor" describes the width of the distribution of the *measured* latencies. It roughly represents the accuracy of the measurement (indicating that 68%, or error_function (sqrt(1/2)), of the measured values lie in the interval [avg/F, avg*F]). IOW, it tells you how stable your latencies for a certain path are. It has nothing to do with the artificial prio "bins" that the latency prioritizer sets up. Therefore it *has to be independent* of base_num. It just gives an indication how large base_num should be. I hope this makes it clear. Martin

## Patch

diff --git a/libmultipath/prioritizers/path_latency.c b/libmultipath/prioritizers/path_latency.c index ce5c95dd7075..44472b77dd86 100644 --- a/libmultipath/prioritizers/path_latency.c +++ b/libmultipath/prioritizers/path_latency.c @@ -34,6 +34,7 @@ #include "prio.h" #include "structs.h" #include "util.h" +#include "time-util.h" #define pp_pl_log(prio, fmt, args...) condlog(prio, "path_latency prio: " fmt, ##args) @@ -56,14 +57,6 @@ #define DEF_BLK_SIZE 4096 -static double lg_path_latency[MAX_IO_NUM]; - -static inline long long timeval_to_us(const struct timespec *tv) -{ - return ((long long)tv->tv_sec * USEC_PER_SEC) + - (tv->tv_nsec / NSEC_PER_USEC); -} - static int prepare_directio_read(int fd, int *blksz, char **pbuf, int *restore_flags) { @@ -199,22 +192,6 @@ out: return 0; } -double calc_standard_deviation(double *lg_path_latency, int size, - double lg_avglatency) -{ - int index; - double sum = 0; - - for (index = 0; index < size; index++) { - sum += (lg_path_latency[index] - lg_avglatency) * - (lg_path_latency[index] - lg_avglatency); - } - - sum /= (size - 1); - - return sqrt(sum); -} - /* * Do not scale the prioriy in a certain range such as [0, 1024] * because scaling will eliminate the effect of base_num. @@ -234,20 +211,16 @@ int calcPrio(double lg_avglatency, double lg_maxavglatency, int getprio(struct path *pp, char *args, unsigned int timeout) { int rc, temp; - int index = 0; int io_num = 0; double base_num = 0; double lg_avglatency, lg_maxavglatency, lg_minavglatency; double standard_deviation; double lg_toldelay = 0; - long long before, after; - struct timespec tv; int blksize; char *buf; int restore_flags = 0; double lg_base; - long long sum_latency = 0; - long long arith_mean_lat; + double sum_squares = 0; if (pp->fd < 0) return -1; @@ -260,7 +233,6 @@ int getprio(struct path *pp, char *args, unsigned int timeout) pp->dev, io_num, base_num); } - memset(lg_path_latency, 0, sizeof(lg_path_latency)); lg_base = log(base_num); lg_maxavglatency = log(MAX_AVG_LATENCY) / lg_base; lg_minavglatency = log(MIN_AVG_LATENCY) / lg_base; @@ -269,8 +241,10 @@ int getprio(struct path *pp, char *args, unsigned int timeout) temp = io_num; while (temp-- > 0) { - (void)clock_gettime(CLOCK_MONOTONIC, &tv); - before = timeval_to_us(&tv); + struct timespec tv_before, tv_after, tv_diff; + double diff, reldiff; + + (void)clock_gettime(CLOCK_MONOTONIC, &tv_before); if (do_directio_read(pp->fd, timeout, buf, blksize)) { pp_pl_log(0, "%s: path down", pp->dev); @@ -278,25 +252,34 @@ int getprio(struct path *pp, char *args, unsigned int timeout) return -1; } - (void)clock_gettime(CLOCK_MONOTONIC, &tv); - after = timeval_to_us(&tv); + (void)clock_gettime(CLOCK_MONOTONIC, &tv_after); + + timespecsub(&tv_after, &tv_before, &tv_diff); + diff = tv_diff.tv_sec * 1000 * 1000 + tv_diff.tv_nsec / 1000; + + if (diff == 0) + /* + * Avoid taking log(0). + * This unlikely case is treated as minimum - + * the sums don't increase + */ + continue; + + /* we scale by lg_base here */ + reldiff = log(diff) / lg_base; + /* * We assume that the latency complies with Log-normal * distribution. The logarithm of latency is in normal * distribution. */ - lg_path_latency[index] = log(after - before) / lg_base; - lg_toldelay += lg_path_latency[index++]; - sum_latency += after - before; + lg_toldelay += reldiff; + sum_squares += reldiff * reldiff; } cleanup_directio_read(pp->fd, buf, restore_flags); lg_avglatency = lg_toldelay / (long long)io_num; - arith_mean_lat = sum_latency / (long long)io_num; - pp_pl_log(4, "%s: arithmetic mean latency is (%lld us), geometric mean latency is (%lld us)", - pp->dev, arith_mean_lat, - (long long)pow(base_num, lg_avglatency)); if (lg_avglatency > lg_maxavglatency) { pp_pl_log(2, @@ -340,8 +323,14 @@ int getprio(struct path *pp, char *args, unsigned int timeout) ".It is recommend to be set larger", pp->dev, base_num); + standard_deviation = sqrt((sum_squares - lg_toldelay * lg_toldelay) + / (io_num -1)); rc = calcPrio(lg_avglatency, lg_maxavglatency, lg_minavglatency); + pp_pl_log(3, "%s: latency avg=%.2e uncertainty=%.1f prio=%d\n", + pp->dev, exp(lg_avglatency * lg_base), + exp(standard_deviation * lg_base), rc); + return rc; }

`The log standard deviation can be calculated much more simply by realizing sum_n (x_i - avg(x))^2 == sum_n x_i^2 - n * avg(x)^2 Also, use timespecsub rather than the custom timeval_to_usec, and avoid taking log(0). Signed-off-by: Martin Wilck <mwilck@suse.com> --- libmultipath/prioritizers/path_latency.c | 71 ++++++++++++++------------------ 1 file changed, 30 insertions(+), 41 deletions(-)`