From patchwork Tue Jun  5 13:29:48 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Josef Bacik <josef@toxicpanda.com>
X-Patchwork-Id: 10448235
From: Josef Bacik <josef@toxicpanda.com>
To: axboe@kernel.dk, kernel-team@fb.com, linux-block@vger.kernel.org,
	akpm@linux-foundation.org, hannes@cmpxchg.org,
	linux-kernel@vger.kernel.org, tj@kernel.org,
	linux-fsdevel@vger.kernel.org
Cc: Josef Bacik <jbacik@fb.com>
Subject: [PATCH 13/13] Documentation: add a doc for blk-iolatency
Date: Tue,  5 Jun 2018 09:29:48 -0400
Message-Id: <20180605132948.1664-14-josef@toxicpanda.com>
X-Mailer: git-send-email 2.14.3
In-Reply-To: <20180605132948.1664-1-josef@toxicpanda.com>
References: <20180605132948.1664-1-josef@toxicpanda.com>
Sender: linux-fsdevel-owner@vger.kernel.org
Precedence: bulk
List-ID: <linux-fsdevel.vger.kernel.org>
X-Mailing-List: linux-fsdevel@vger.kernel.org
X-Virus-Scanned: ClamAV using ClamSMTP

From: Josef Bacik <jbacik@fb.com>

Add basic documentation describing the interface, statistics, and
behavior of io.latency.

Signed-off-by: Josef Bacik <jbacik@fb.com>
---
 Documentation/cgroup-v2.txt | 79 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 79 insertions(+)

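Note for reviewers: the tuning advice in the new section (observe the average
latency in io.stat, then set the target 10-15% higher) and the io.latency line
format can be sketched as below.  This is a minimal illustration only, not part
of the patch.  The helper names, the 15% default, and the device numbers are
hypothetical, and the code only builds the string, it does not touch any cgroup
file.

```python
def suggested_target_us(observed_avg_us: int, headroom_pct: int = 15) -> int:
    """Return a latency target headroom_pct percent above the observed average.

    Both values are in microseconds.  Integer math is used and the result is
    rounded up so the target is never below the requested headroom.
    """
    if observed_avg_us <= 0:
        raise ValueError("observed average latency must be positive")
    if headroom_pct < 0:
        raise ValueError("headroom must not be negative")
    return (observed_avg_us * (100 + headroom_pct) + 99) // 100


def io_latency_line(major: int, minor: int, target_us: int) -> str:
    """Format a line in the "MAJOR:MINOR target=<microseconds>" form."""
    return f"{major}:{minor} target={target_us}"


if __name__ == "__main__":
    # Hypothetical numbers: device 8:0 with a 2000us observed average.
    target = suggested_target_us(2000)
    print(io_latency_line(8, 0, target))
```

The resulting line would then be written to the io.latency file of the group
being protected (groups A, B, or C in the diagram in the new section).
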
diff --git a/Documentation/cgroup-v2.txt b/Documentation/cgroup-v2.txt
index 74cdeaed9f7a..f6684ec99720 100644
--- a/Documentation/cgroup-v2.txt
+++ b/Documentation/cgroup-v2.txt
@@ -51,6 +51,9 @@ v1 is available under Documentation/cgroup-v1/.
      5-3. IO
        5-3-1. IO Interface Files
        5-3-2. Writeback
+       5-3-3. IO Latency
+         5-3-3-1. How IO Latency Throttling Works
+         5-3-3-2. IO Latency Interface Files
      5-4. PID
        5-4-1. PID Interface Files
      5-5. Device
@@ -1395,6 +1398,82 @@ writeback as follows.
 	vm.dirty[_background]_ratio.
 
 
+IO Latency
+~~~~~~~~~~
+
+This is a cgroup v2 controller for IO workload protection.  You provide a group
+with a latency target, and if the average latency exceeds that target the
+controller will throttle any peers that have a higher latency target than the
+protected workload.
+
+The limits are only applied at the peer level in the hierarchy.  This means that
+in the diagram below, only groups A, B, and C will influence each other, and
+groups D and F will influence each other.  Group G will influence nobody.
+
+			[root]
+		/	   |		\
+		A	   B		C
+	       /  \        |
+	      D    F	   G
+
+
+So the ideal way to configure this is to set io.latency in groups A, B, and C.
+Generally you do not want to set a value lower than the latency your device
+supports.  Experiment to find the value that works best for your workload.
+Start higher than the expected latency for your device and watch the avg_lat
+value in io.stat for your workload group to get an idea of the latency you see
+during normal operation.  Use that value as the basis for your real setting,
+setting it 10-15% higher than the value in io.stat.  Experimentation is key
+here because avg_lat is a running average, so it is the "statistics" portion
+of "lies, damned lies, and statistics."
+
+How IO Latency Throttling Works
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+io.latency is work conserving, so as long as everybody is meeting their latency
+target the controller doesn't do anything.  Once a group starts missing its
+target it begins throttling any peer group that has a higher target than itself.
+This throttling takes 2 forms:
+
+- Queue depth throttling.  This is the number of outstanding IOs a group is
+  allowed to have.  We will clamp down relatively quickly, starting at no limit
+  and going all the way down to 1 IO at a time.
+
+- Artificial delay induction.  There are certain types of IO that cannot be
+  throttled without possibly adversely affecting higher priority groups.  This
+  includes swapping and metadata IO.  These types of IO are allowed to occur
+  normally, however they are "charged" to the originating group.  If the
+  originating group is being throttled you will see the use_delay and delay
+  fields in io.stat increase.  The delay value is how many microseconds are
+  being added to any process that runs in this group.  Because this number can
+  grow quite large if there is a lot of swapping or metadata IO occurring we
+  limit the individual delay events to 1 second at a time.
+
+Once the victimized group starts meeting its latency target again it will start
+unthrottling any peer groups that were throttled previously.  If the victimized
+group simply stops doing IO the global counter will unthrottle appropriately.
+
+IO Latency Interface Files
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+  io.latency
+	This takes a format similar to the other controllers.
+
+		"MAJOR:MINOR target=<target time in microseconds>"
+
+  io.stat
+	If the controller is enabled you will see extra stats in io.stat in
+	addition to the normal ones.
+
+	  depth
+		This is the current queue depth for the group.
+
+	  avg_lat
+		The running average IO latency for this group in microseconds.
+		A running average is generally flawed, but will give an
+		administrator a general idea of the overall latency they can
+		expect for their workload on the given disk.
+
 PID
 ---