diff mbox series

[block-5.13] blk-iocost: fix weight updates of inner active iocgs

Message ID YJsxnLZV1MnBcqjj@slm.duckdns.org (mailing list archive)
State New, archived
Headers show
Series [block-5.13] blk-iocost: fix weight updates of inner active iocgs | expand

Commit Message

Tejun Heo May 12, 2021, 1:38 a.m. UTC
When the weight of an active iocg is updated, weight_updated() is called
which in turn calls __propagate_weights() to update the active and inuse
weights so that the effective hierarchical weights are update accordingly.

The current implementation is incorrect for inner active nodes. For an
active leaf iocg, inuse can be any value between 1 and active and the
difference represents how much the iocg is donating. When weight is updated,
as long as inuse is clamped between 1 and the new weight, we're alright and
this is what __propagate_weights() currently implements.

However, that's not how an active inner node's inuse is set. An inner node's
inuse is solely determined by the ratio between the sums of inuse's and
active's of its children - ie. they're results of propagating the leaves'
active and inuse weights upwards. __propagate_weights() incorrectly applies
the same clamping as for a leaf when an active inner node's weight is
updated. Consider a hierarchy which looks like the following with saturating
workloads in AA and BB.

     R
   /   \
  A     B
  |     |
 AA     BB

1. For both A and B, active=100, inuse=100, hwa=0.5, hwi=0.5.

2. echo 200 > A/io.weight

3. __propagate_weights() update A's active to 200 and leave inuse at 100 as
   it's already between 1 and the new active, making A:active=200,
   A:inuse=100. As R's active_sum is updated along with A's active,
   A:hwa=2/3, B:hwa=1/3. However, because the inuses didn't change, the
   hwi's remain unchanged at 0.5.

4. The weight of A is now twice that of B but AA and BB still have the same
   hwi of 0.5 and thus are doing the same amount of IOs.

Fix it by making __propgate_weights() always calculate the inuse of an
active inner iocg based on the ratio of child_inuse_sum to child_active_sum.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Dan Schatzberg <dschatzberg@fb.com>
Fixes: 7caa47151ab2 ("blkcg: implement blk-iocost")
Cc: stable@vger.kernel.org # v5.4+
---
 block/blk-iocost.c |   14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

Comments

Jens Axboe May 12, 2021, 2:50 a.m. UTC | #1
On 5/11/21 7:38 PM, Tejun Heo wrote:
> When the weight of an active iocg is updated, weight_updated() is called
> which in turn calls __propagate_weights() to update the active and inuse
> weights so that the effective hierarchical weights are update accordingly.
> 
> The current implementation is incorrect for inner active nodes. For an
> active leaf iocg, inuse can be any value between 1 and active and the
> difference represents how much the iocg is donating. When weight is updated,
> as long as inuse is clamped between 1 and the new weight, we're alright and
> this is what __propagate_weights() currently implements.
> 
> However, that's not how an active inner node's inuse is set. An inner node's
> inuse is solely determined by the ratio between the sums of inuse's and
> active's of its children - ie. they're results of propagating the leaves'
> active and inuse weights upwards. __propagate_weights() incorrectly applies
> the same clamping as for a leaf when an active inner node's weight is
> updated. Consider a hierarchy which looks like the following with saturating
> workloads in AA and BB.
> 
>      R
>    /   \
>   A     B
>   |     |
>  AA     BB
> 
> 1. For both A and B, active=100, inuse=100, hwa=0.5, hwi=0.5.
> 
> 2. echo 200 > A/io.weight
> 
> 3. __propagate_weights() update A's active to 200 and leave inuse at 100 as
>    it's already between 1 and the new active, making A:active=200,
>    A:inuse=100. As R's active_sum is updated along with A's active,
>    A:hwa=2/3, B:hwa=1/3. However, because the inuses didn't change, the
>    hwi's remain unchanged at 0.5.
> 
> 4. The weight of A is now twice that of B but AA and BB still have the same
>    hwi of 0.5 and thus are doing the same amount of IOs.
> 
> Fix it by making __propgate_weights() always calculate the inuse of an
> active inner iocg based on the ratio of child_inuse_sum to child_active_sum.

Applied, thanks.
diff mbox series

Patch

diff --git a/block/blk-iocost.c b/block/blk-iocost.c
index e0c4baa018578..c2d6bc88d3f15 100644
--- a/block/blk-iocost.c
+++ b/block/blk-iocost.c
@@ -1069,7 +1069,17 @@  static void __propagate_weights(struct ioc_gq *iocg, u32 active, u32 inuse,
 
 	lockdep_assert_held(&ioc->lock);
 
-	inuse = clamp_t(u32, inuse, 1, active);
+	/*
+	 * For an active leaf node, its inuse shouldn't be zero or exceed
+	 * @active. An active internal node's inuse is solely determined by the
+	 * inuse to active ratio of its children regardless of @inuse.
+	 */
+	if (list_empty(&iocg->active_list) && iocg->child_active_sum) {
+		inuse = DIV64_U64_ROUND_UP(active * iocg->child_inuse_sum,
+					   iocg->child_active_sum);
+	} else {
+		inuse = clamp_t(u32, inuse, 1, active);
+	}
 
 	iocg->last_inuse = iocg->inuse;
 	if (save)
@@ -1086,7 +1096,7 @@  static void __propagate_weights(struct ioc_gq *iocg, u32 active, u32 inuse,
 		/* update the level sums */
 		parent->child_active_sum += (s32)(active - child->active);
 		parent->child_inuse_sum += (s32)(inuse - child->inuse);
-		/* apply the udpates */
+		/* apply the updates */
 		child->active = active;
 		child->inuse = inuse;