
[7/7] blk-mq: Adjust hybrid poll sleep time

Message ID 90ea71d810084eec70fb1632587b450b3037ce85.1556609582.git.asml.silence@gmail.com (mailing list archive)
State New, archived
Series Adjust hybrid polling sleep time

Commit Message

Pavel Begunkov April 30, 2019, 7:34 a.m. UTC
From: Pavel Begunkov <asml.silence@gmail.com>

Sleeping for (mean / 2) in adaptive polling is often too pessimistic.
Instead, use a variation of the 3-sigma rule, (mean - 4 * lmd), and tune
it at runtime using the percentage of missed (i.e. overslept) requests:
1. if more than ~3% of requests are missed, fall back to (mean / 2)
2. if more than ~0.4% are missed, scale the sleep time down by 1/4
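The sleep-time rule above can be sketched as a standalone user-space C function. This is a minimal sketch, not the kernel code: `poll_sleep_ns()` and `ilog2_u32()` are hypothetical names, the kernel's `ilog2()` is re-implemented by hand, and the mean, `lmd` and sample/miss counters are passed in directly instead of being read from `struct blk_rq_stat`. It follows the logic of `blk_mq_update_poll_info()` in the patch below.

```c
#include <stdint.h>
#include <limits.h>

/* Hypothetical stand-in for the kernel's ilog2(): floor(log2(v)) for v > 0. */
static int ilog2_u32(uint32_t v)
{
	int r = -1;

	while (v) {
		v >>= 1;
		r++;
	}
	return r;
}

#define POLL_FALLBACK_THRESHOLD	4	/* miss ratio > ~3.1%: use mean / 2 */
#define POLL_THROTTLE_THRESHOLD	7	/* miss ratio > ~0.4%: cut by 1/4 */

/* Hypothetical helper: sleep time in ns from the poll statistics. */
static uint64_t poll_sleep_ns(uint64_t mean, uint64_t lmd,
			      uint32_t nr_samples, uint32_t nr_misses)
{
	uint64_t half_mean = (mean + 1) / 2;
	uint64_t indent = lmd * 4;
	uint64_t sleep_ns;
	int ratio = INT_MAX;	/* no misses: treat ratio as "infinite" */

	if (nr_misses > nr_samples)
		nr_misses = nr_samples;
	if (!nr_samples)
		return 0;			/* no data: don't sleep */
	if (!lmd || mean <= indent)
		return half_mean;		/* estimate unusable: old behaviour */

	sleep_ns = mean - indent;		/* 3-sigma-like estimate */
	if (nr_misses)
		ratio = ilog2_u32(nr_samples / nr_misses);
	if (ratio <= POLL_FALLBACK_THRESHOLD)
		sleep_ns = half_mean;		/* too many misses: fall back */
	else if (ratio <= POLL_THROTTLE_THRESHOLD)
		sleep_ns -= sleep_ns / 4;	/* some misses: scale down */

	/* Never sleep less than the old (mean / 2) estimate. */
	return sleep_ns > half_mean ? sleep_ns : half_mean;
}
```

For example, with mean = 1000 ns and lmd = 50 ns the base estimate is 800 ns; 4 misses out of 1000 samples trigger the throttle (600 ns), while 40 misses trigger the fallback to (mean / 2) = 500 ns. The final clamp means a throttled estimate can never drop below the old (mean / 2) value.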

Pitfalls:
1. any missed request inflates the observed mean, which raises the sleep
time, which in turn causes more misses; the two feed each other, so
scale down quickly in this case
2. even if the sleep time is predicted well, the sleep loop itself can
oversleep considerably. Try to detect this and skip the miss accounting.

Tested on an NVMe SSD:
{4K,8K} read-only workloads give a similar latency distribution (up to
7 nines) and halve the CPU load (50% -> 25%). The new method even
slightly outperforms the old one (in terms of throughput and latency),
presumably because it alleviates the 2nd pitfall.
For a write-only workload it falls back to (mean / 2).

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 block/blk-mq.c | 44 +++++++++++++++++++++++++++++++++++++-------
 1 file changed, 37 insertions(+), 7 deletions(-)

Patch

diff --git a/block/blk-mq.c b/block/blk-mq.c
index ec7cde754c2f..efa44a617bea 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -3338,10 +3338,21 @@  static void blk_mq_poll_stats_start(struct request_queue *q)
 	blk_stat_activate_msecs(q->poll_cb, 100);
 }
 
+/*
+ * Thresholds are ilog2(nr_requests / nr_misses)
+ * To calculate tolerated miss ratio from it, use
+ * f(x) ~= 2 ^ -(x + 1)
+ *
+ * fallback ~ 3.1%
+ * throttle ~ 0.4%
+ */
+#define BLK_POLL_FALLBACK_THRESHOLD	4
+#define BLK_POLL_THROTTLE_THRESHOLD	7
+
 static void blk_mq_update_poll_info(struct poll_info *pi,
 				    struct blk_rq_stat *stat)
 {
-	u64 sleep_ns;
+	u64 half_mean, indent, sleep_ns;
 	u32 nr_misses, nr_samples;
 
 	nr_samples = stat->nr_samples;
@@ -3349,14 +3360,33 @@  static void blk_mq_update_poll_info(struct poll_info *pi,
 	if (nr_misses > nr_samples)
 		nr_misses = nr_samples;
 
-	if (!nr_samples)
+	half_mean = (stat->mean + 1) / 2;
+	indent = stat->lmd * 4;
+
+	if (!stat->nr_samples) {
 		sleep_ns = 0;
-	else
-		sleep_ns = (stat->mean + 1) / 2;
+	} else if (!stat->lmd || stat->mean <= indent) {
+		sleep_ns = half_mean;
+	} else {
+		int ratio = INT_MAX;
 
-	/*
-	 * Use miss ratio here to adjust sleep time
-	 */
+		sleep_ns = stat->mean - indent;
+
+		/*
+		 * If a completion is overslept, the observable time will
+		 * be greater than the actual, so increasing mean. It
+		 * also increases sleep time estimation, synergistically
+		 * backfiring on mean. Need to scale down / fallback early.
+		 */
+		if (nr_misses)
+			ratio = ilog2(nr_samples / nr_misses);
+		if (ratio <= BLK_POLL_FALLBACK_THRESHOLD)
+			sleep_ns = half_mean;
+		else if (ratio <= BLK_POLL_THROTTLE_THRESHOLD)
+			sleep_ns -= sleep_ns / 4;
+
+		sleep_ns = max(sleep_ns, half_mean);
+	}
 
 	pi->stat = *stat;
 	pi->sleep_ns = sleep_ns;
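The threshold comment in the patch, f(x) ~= 2 ^ -(x + 1), can be checked numerically. Because the ratio is ilog2() of an integer quotient, ilog2(nr_samples / nr_misses) <= x holds exactly when nr_samples / nr_misses < 2^(x + 1), i.e. when the miss ratio exceeds 2^-(x + 1). The sketch below is user-space C with hypothetical helper names (`ilog2_u32()`, `min_misses_for()`) and a hand-written replacement for the kernel's `ilog2()`; it searches for the smallest miss count that trips a given threshold.

```c
#include <stdint.h>

/* Hypothetical stand-in for the kernel's ilog2(): floor(log2(v)) for v > 0. */
static int ilog2_u32(uint32_t v)
{
	int r = -1;

	while (v) {
		v >>= 1;
		r++;
	}
	return r;
}

/*
 * Hypothetical helper: smallest nr_misses for which
 * ilog2(nr_samples / nr_misses) <= threshold, i.e. the first miss
 * count that trips the threshold. Returns 0 if nr_samples == 0.
 */
static uint32_t min_misses_for(uint32_t nr_samples, int threshold)
{
	for (uint32_t m = 1; m <= nr_samples; m++)
		if (ilog2_u32(nr_samples / m) <= threshold)
			return m;
	return 0;
}
```

For 10000 samples this gives 313 misses (~3.1%) for the fallback threshold (4) and 40 misses (0.4%) for the throttle threshold (7), matching the percentages stated in the comment above `BLK_POLL_FALLBACK_THRESHOLD`.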