Message ID | 20180801002559.36261-1-dennisszhou@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [v2] block: make iolatency avg_lat exponentially decay | expand |
Hi Dennis, Thank you for the patch! Yet something to improve: [auto build test ERROR on block/for-next] [also build test ERROR on next-20180731] [cannot apply to v4.18-rc7] [if your patch is applied to the wrong git tree, please drop us a note to help improve the system] url: https://github.com/0day-ci/linux/commits/Dennis-Zhou/block-make-iolatency-avg_lat-exponentially-decay/20180801-100533 base: https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git for-next config: i386-randconfig-i1-201830 (attached as .config) compiler: gcc-7 (Debian 7.3.0-16) 7.3.0 reproduce: # save the attached .config to linux build tree make ARCH=i386 All errors (new ones prefixed by >>): block/blk-iolatency.o: In function `iolatency_check_latencies': >> block/blk-iolatency.c:511: undefined reference to `__udivdi3' vim +511 block/blk-iolatency.c 478 479 static void iolatency_check_latencies(struct iolatency_grp *iolat, u64 now) 480 { 481 struct blkcg_gq *blkg = lat_to_blkg(iolat); 482 struct iolatency_grp *parent; 483 struct child_latency_info *lat_info; 484 struct blk_rq_stat stat; 485 unsigned long flags; 486 int cpu, exp_idx; 487 488 blk_rq_stat_init(&stat); 489 preempt_disable(); 490 for_each_online_cpu(cpu) { 491 struct blk_rq_stat *s; 492 s = per_cpu_ptr(iolat->stats, cpu); 493 blk_rq_stat_sum(&stat, s); 494 blk_rq_stat_init(s); 495 } 496 preempt_enable(); 497 498 parent = blkg_to_lat(blkg->parent); 499 if (!parent) 500 return; 501 502 lat_info = &parent->child_lat; 503 504 /* 505 * CALC_LOAD takes in a number stored in fixed point representation. 506 * Because we are using this for IO time in ns, the values stored 507 * are significantly larger than the FIXED_1 denominator (2048). 508 * Therefore, rounding errors in the calculation are negligible and 509 * can be ignored. 510 */ > 511 exp_idx = min_t(int, BLKIOLATENCY_NR_EXP_FACTORS - 1, 512 iolat->cur_win_nsec / BLKIOLATENCY_EXP_BUCKET_SIZE); 513 CALC_LOAD(iolat->total_lat_avg, iolatency_exp_factors[exp_idx], 514 stat.mean); 515 516 /* Everything is ok and we don't need to adjust the scale. */ 517 if (stat.mean <= iolat->min_lat_nsec && 518 atomic_read(&lat_info->scale_cookie) == DEFAULT_SCALE_COOKIE) 519 return; 520 521 /* Somebody beat us to the punch, just bail. */ 522 spin_lock_irqsave(&lat_info->lock, flags); 523 lat_info->nr_samples -= iolat->nr_samples; 524 lat_info->nr_samples += stat.nr_samples; 525 iolat->nr_samples = stat.nr_samples; 526 527 if ((lat_info->last_scale_event >= now || 528 now - lat_info->last_scale_event < BLKIOLATENCY_MIN_ADJUST_TIME) && 529 lat_info->scale_lat <= iolat->min_lat_nsec) 530 goto out; 531 532 if (stat.mean <= iolat->min_lat_nsec && 533 stat.nr_samples >= BLKIOLATENCY_MIN_GOOD_SAMPLES) { 534 if (lat_info->scale_grp == iolat) { 535 lat_info->last_scale_event = now; 536 scale_cookie_change(iolat->blkiolat, lat_info, true); 537 } 538 } else if (stat.mean > iolat->min_lat_nsec) { 539 lat_info->last_scale_event = now; 540 if (!lat_info->scale_grp || 541 lat_info->scale_lat > iolat->min_lat_nsec) { 542 WRITE_ONCE(lat_info->scale_lat, iolat->min_lat_nsec); 543 lat_info->scale_grp = iolat; 544 } 545 scale_cookie_change(iolat->blkiolat, lat_info, false); 546 } 547 out: 548 spin_unlock_irqrestore(&lat_info->lock, flags); 549 } 550 --- 0-DAY kernel test infrastructure Open Source Technology Center https://lists.01.org/pipermail/kbuild-all Intel Corporation
Hello, On Tue, Jul 31, 2018 at 05:25:59PM -0700, Dennis Zhou wrote: ... > + /* > + * CALC_LOAD takes in a number stored in fixed point representation. > + * Because we are using this for IO time in ns, the values stored > + * are significantly larger than the FIXED_1 denominator (2048). > + * Therefore, rounding errors in the calculation are negligible and > + * can be ignored. > + */ > + exp_idx = min_t(int, BLKIOLATENCY_NR_EXP_FACTORS - 1, > + iolat->cur_win_nsec / BLKIOLATENCY_EXP_BUCKET_SIZE); Build bot is complaining about naked 64bit div. Should use one of the div64*() helpers. Looks good to me. Once Johannes's concerns are addressed, please feel free to add Acked-by: Tejun Heo <tj@kernel.org> Thanks.
On Tue, Jul 31, 2018 at 05:25:59PM -0700, Dennis Zhou wrote: > From: "Dennis Zhou (Facebook)" <dennisszhou@gmail.com> > > Currently, avg_lat is calculated by accumulating the mean of every > window in a long running cumulative average. As time goes on, the metric > becomes less and less useful due to the accumulated history. > > This patch reuses the same calculation done in load averages to make the > avg_lat metric more lively. Unlike load averages, the avg only advances > when a window elapses (due to an io). Idle periods extend the most > recent window. Bucketing is used to limit the history of avg_lat by > binding it to the window size. So, the window range for 1/exp (decay > rate) is [1 min, 2.5 min) when windows elapse immediately. > > The current sample window size is exposed in the debug info to enable > calculation of the window range. > > Signed-off-by: Dennis Zhou <dennisszhou@gmail.com> Looks great, thanks Dennis. Once the div build error is fixed, please feel free to add: Acked-by: Johannes Weiner <hannes@cmpxchg.org>
On Wed, Aug 01, 2018 at 08:49:58AM -0700, Tejun Heo wrote: > Hello, > > On Tue, Jul 31, 2018 at 05:25:59PM -0700, Dennis Zhou wrote: > ... > > + /* > > + * CALC_LOAD takes in a number stored in fixed point representation. > > + * Because we are using this for IO time in ns, the values stored > > + * are significantly larger than the FIXED_1 denominator (2048). > > + * Therefore, rounding errors in the calculation are negligible and > > + * can be ignored. > > + */ > > + exp_idx = min_t(int, BLKIOLATENCY_NR_EXP_FACTORS - 1, > > + iolat->cur_win_nsec / BLKIOLATENCY_EXP_BUCKET_SIZE); > > Build bot is complaining about naked 64bit div. Should use one of the > div64*() helpers. > > Looks good to me. Once Johannes's concerns are addressed, please feel > free to add Ooh, one nitpick. total_lat_avg is a bit of a misnomer now. Maybe rename to lat_avg? Thanks.
On Tue, Jul 31, 2018 at 05:25:59PM -0700, Dennis Zhou wrote: > From: "Dennis Zhou (Facebook)" <dennisszhou@gmail.com> > > Currently, avg_lat is calculated by accumulating the mean of every > window in a long running cumulative average. As time goes on, the metric > becomes less and less useful due to the accumulated history. > > This patch reuses the same calculation done in load averages to make the > avg_lat metric more lively. Unlike load averages, the avg only advances > when a window elapses (due to an io). Idle periods extend the most > recent window. Bucketing is used to limit the history of avg_lat by > binding it to the window size. So, the window range for 1/exp (decay > rate) is [1 min, 2.5 min) when windows elapse immediately. > > The current sample window size is exposed in the debug info to enable > calculation of the window range. > > Signed-off-by: Dennis Zhou <dennisszhou@gmail.com> Fix the nits and you can add Acked-by: Josef Bacik <josef@toxicpanda.com> Thanks, Josef
Hi Tejun, On Wed, Aug 01, 2018 at 09:10:11AM -0700, Tejun Heo wrote: > On Wed, Aug 01, 2018 at 08:49:58AM -0700, Tejun Heo wrote: > > Hello, > > > > On Tue, Jul 31, 2018 at 05:25:59PM -0700, Dennis Zhou wrote: > > ... > > > + /* > > > + * CALC_LOAD takes in a number stored in fixed point representation. > > > + * Because we are using this for IO time in ns, the values stored > > > + * are significantly larger than the FIXED_1 denominator (2048). > > > + * Therefore, rounding errors in the calculation are negligible and > > > + * can be ignored. > > > + */ > > > + exp_idx = min_t(int, BLKIOLATENCY_NR_EXP_FACTORS - 1, > > > + iolat->cur_win_nsec / BLKIOLATENCY_EXP_BUCKET_SIZE); > > > > Build bot is complaining about naked 64bit div. Should use one of the > > div64*() helpers. > > > > Looks good to me. Once Johannes's concerns are addressed, please feel > > free to add > > Ooh, one nitpick. total_lat_avg is a bit of a misnomer now. Maybe > rename to lat_avg? > Yeah, that makes sense. I've renamed it to lat_avg and updated the Documentation file as well. Thanks, Dennis
diff --git a/block/blk-iolatency.c b/block/blk-iolatency.c index bb59b2929e0d..2a6bb7e31dda 100644 --- a/block/blk-iolatency.c +++ b/block/blk-iolatency.c @@ -69,6 +69,7 @@ #include <linux/module.h> #include <linux/timer.h> #include <linux/memcontrol.h> +#include <linux/sched/loadavg.h> #include <linux/sched/signal.h> #include <trace/events/block.h> #include "blk-rq-qos.h" @@ -127,7 +128,6 @@ struct iolatency_grp { /* total running average of our io latency. */ u64 total_lat_avg; - u64 total_lat_nr; /* Our current number of IO's for the last summation. */ u64 nr_samples; @@ -135,6 +135,27 @@ struct iolatency_grp { struct child_latency_info child_lat; }; +#define BLKIOLATENCY_MIN_WIN_SIZE (100 * NSEC_PER_MSEC) +#define BLKIOLATENCY_MAX_WIN_SIZE NSEC_PER_SEC +/* + * These are the constants used to fake the fixed-point moving average + * calculation just like load average. The call to CALC_LOAD folds + * (FIXED_1 (2048) - exp_factor) * new_sample into total_lat_avg. + * The sampling window size is bucketed to try to approximately calculate + * average latency such that 1/exp (decay rate) is [1 min, 2.5 min) when + * windows elapse immediately. + */ +#define BLKIOLATENCY_NR_EXP_FACTORS 5 +#define BLKIOLATENCY_EXP_BUCKET_SIZE (BLKIOLATENCY_MAX_WIN_SIZE / \ + (BLKIOLATENCY_NR_EXP_FACTORS - 1)) +static const u64 iolatency_exp_factors[BLKIOLATENCY_NR_EXP_FACTORS] = { + 2045, // exp(1/600) - 600 samples + 2039, // exp(1/240) - 240 samples + 2031, // exp(1/120) - 120 samples + 2023, // exp(1/80) - 80 samples + 2014, // exp(1/60) - 60 samples +}; + static inline struct iolatency_grp *pd_to_lat(struct blkg_policy_data *pd) { return pd ? container_of(pd, struct iolatency_grp, pd) : NULL; @@ -462,7 +483,7 @@ static void iolatency_check_latencies(struct iolatency_grp *iolat, u64 now) struct child_latency_info *lat_info; struct blk_rq_stat stat; unsigned long flags; - int cpu; + int cpu, exp_idx; blk_rq_stat_init(&stat); preempt_disable(); @@ -480,11 +501,17 @@ static void iolatency_check_latencies(struct iolatency_grp *iolat, u64 now) lat_info = &parent->child_lat; - iolat->total_lat_avg = - div64_u64((iolat->total_lat_avg * iolat->total_lat_nr) + - stat.mean, iolat->total_lat_nr + 1); - - iolat->total_lat_nr++; + /* + * CALC_LOAD takes in a number stored in fixed point representation. + * Because we are using this for IO time in ns, the values stored + * are significantly larger than the FIXED_1 denominator (2048). + * Therefore, rounding errors in the calculation are negligible and + * can be ignored. + */ + exp_idx = min_t(int, BLKIOLATENCY_NR_EXP_FACTORS - 1, + iolat->cur_win_nsec / BLKIOLATENCY_EXP_BUCKET_SIZE); + CALC_LOAD(iolat->total_lat_avg, iolatency_exp_factors[exp_idx], + stat.mean); /* Everything is ok and we don't need to adjust the scale. */ if (stat.mean <= iolat->min_lat_nsec && @@ -700,8 +727,9 @@ static void iolatency_set_min_lat_nsec(struct blkcg_gq *blkg, u64 val) u64 oldval = iolat->min_lat_nsec; iolat->min_lat_nsec = val; - iolat->cur_win_nsec = max_t(u64, val << 4, 100 * NSEC_PER_MSEC); - iolat->cur_win_nsec = min_t(u64, iolat->cur_win_nsec, NSEC_PER_SEC); + iolat->cur_win_nsec = max_t(u64, val << 4, BLKIOLATENCY_MIN_WIN_SIZE); + iolat->cur_win_nsec = min_t(u64, iolat->cur_win_nsec, + BLKIOLATENCY_MAX_WIN_SIZE); if (!oldval && val) atomic_inc(&blkiolat->enabled); @@ -811,13 +839,14 @@ static size_t iolatency_pd_stat(struct blkg_policy_data *pd, char *buf, { struct iolatency_grp *iolat = pd_to_lat(pd); unsigned long long avg_lat = div64_u64(iolat->total_lat_avg, NSEC_PER_USEC); + unsigned long long cur_win = div64_u64(iolat->cur_win_nsec, NSEC_PER_MSEC); if (iolat->rq_depth.max_depth == UINT_MAX) - return scnprintf(buf, size, " depth=max avg_lat=%llu", - avg_lat); + return scnprintf(buf, size, " depth=max avg_lat=%llu win=%llu", + avg_lat, cur_win); - return scnprintf(buf, size, " depth=%u avg_lat=%llu", - iolat->rq_depth.max_depth, avg_lat); + return scnprintf(buf, size, " depth=%u avg_lat=%llu win=%llu", + iolat->rq_depth.max_depth, avg_lat, cur_win); }