
[RFC,v2,5/6] blk-throttle: support to destroy throtl_data when blk-throttle is disabled

Message ID 20240406080059.2248314-6-yukuai1@huaweicloud.com (mailing list archive)
State New, archived
Series blk-throttle: support enable and disable during runtime

Commit Message

Yu Kuai April 6, 2024, 8 a.m. UTC
From: Yu Kuai <yukuai3@huawei.com>

Currently, once blk-throttle is enabled, it can't be destroyed until disk
removal, even if it's disabled.

Also prepare to support building it as a kernel module.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 block/blk-throttle.c | 65 +++++++++++++++++++++++++++++---------------
 1 file changed, 43 insertions(+), 22 deletions(-)

Comments

Tejun Heo April 12, 2024, 6:05 p.m. UTC | #1
Hello,

On Sat, Apr 06, 2024 at 04:00:58PM +0800, Yu Kuai wrote:
> From: Yu Kuai <yukuai3@huawei.com>
> 
> Currently, once blk-throttle is enabled, it can't be destroyed until disk
> removal, even if it's disabled.
>
> Also prepare to support building it as a kernel module.

The benefit of doing this whenever the ruleset becomes empty seems marginal.
This isn't necessary to allow unloading blk-throttle, and
blkg_conf_exit_blkg() is also necessary because of this, right?

Thanks.
Yu Kuai April 13, 2024, 2:06 a.m. UTC | #2
Hi,

On 2024/04/13 2:05, Tejun Heo wrote:
> Hello,
> 
> On Sat, Apr 06, 2024 at 04:00:58PM +0800, Yu Kuai wrote:
>> From: Yu Kuai <yukuai3@huawei.com>
>>
>> Currently, once blk-throttle is enabled, it can't be destroyed until disk
>> removal, even if it's disabled.
>>
>> Also prepare to support building it as a kernel module.
> 
> The benefit of doing this whenever the ruleset becomes empty seems marginal.
> This isn't necessary to allow unloading blk-throttle, and
> blkg_conf_exit_blkg() is also necessary because of this, right?

Yes, this is why blkg_conf_exit_blkg() is necessary.

I think that we need to find an appropriate time to unload blk-throttle
other than deleting the gendisk. I also thought of adding a new user input
like "8:0 free" to do this. These are the solutions that I can think of
for now.

Thanks,
Kuai

> 
> Thanks.
>
Tejun Heo April 16, 2024, 5:09 p.m. UTC | #3
Hello,

On Sat, Apr 13, 2024 at 10:06:00AM +0800, Yu Kuai wrote:
> I think that we need to find an appropriate time to unload blk-throttle
> other than deleting the gendisk. I also thought of adding a new user input
> like "8:0 free" to do this. These are the solutions that I can think of
> for now.

Probably a better interface is for unloading to force blk-throtl to be
deactivated rather than asking the user to nuke all configs.

Thanks.
Yu Kuai April 17, 2024, 1:13 a.m. UTC | #4
Hi,

On 2024/04/17 1:09, Tejun Heo wrote:
> Hello,
> 
> On Sat, Apr 13, 2024 at 10:06:00AM +0800, Yu Kuai wrote:
>> I think that we need to find an appropriate time to unload blk-throttle
>> other than deleting the gendisk. I also thought of adding a new user input
>> like "8:0 free" to do this. These are the solutions that I can think of
>> for now.
> 
> Probably a better interface is for unloading to force blk-throtl to be
> deactivated rather than asking the user to nuke all configs.

I was thinking that rmmod in this case should return busy; for example,
if bfq is currently in use for some disk, rmmod bfq will return busy.

Is there any example where unloading deactivates resources that users
are still using?

Thanks,
Kuai

> 
> Thanks.
>
Tejun Heo April 17, 2024, 1:22 a.m. UTC | #5
Hello,

On Wed, Apr 17, 2024 at 09:13:34AM +0800, Yu Kuai wrote:
> > Probably a better interface is for unloading to force blk-throtl to be
> > deactivated rather than asking the user to nuke all configs.
> 
> I was thinking that rmmod in this case should return busy; for example,
> if bfq is currently in use for some disk, rmmod bfq will return busy.
>
> Is there any example where unloading deactivates resources that users
> are still using?

Hmm... yeah, I'm not sure. Pinning the module while in use is definitely
more conventional, so let's stick with that. It's usually achieved by
inc'ing the module's ref on each usage, so here, the module refs would be
counting the number of active rules, I guess.
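
A minimal sketch of that idea, with hypothetical helper names that are not
part of this patch, could look like:

#include <linux/errno.h>
#include <linux/module.h>

/*
 * Hypothetical sketch: take a module reference whenever a throttle rule
 * becomes active and drop it when the rule is cleared, so the module
 * counts as "in use" and rmmod fails while any rule exists.
 */
static int throtl_rule_get(void)
{
	/* only fails if the module is already on its way out */
	return try_module_get(THIS_MODULE) ? 0 : -ENODEV;
}

static void throtl_rule_put(void)
{
	module_put(THIS_MODULE);
}

The get/put calls would be tied to wherever a rule transitions between set
and cleared.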

I'm not sure about modularization tho mostly because we've historically had
a lot of lifetime issues around block and blkcg data structures and the
supposed gain here is rather minimal. We only have a handful of these
policies and they aren't that big.

If hot path overhead when not being used is a concern, lazy init solves most
of it, no?
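
As a rough sketch of the lazy-init direction, as it might live in
block/blk-throttle.c, reusing the existing blk_throtl_init() and a made-up
helper name:

/*
 * Hypothetical sketch: only allocate throtl_data when the first limit is
 * written, so disks that never configure blk-throttle pay nothing extra
 * in the bio submission path.
 */
static int blk_throtl_ensure_init(struct gendisk *disk)
{
	/* already set up by an earlier config write */
	if (disk->queue->td)
		return 0;

	return blk_throtl_init(disk);
}

tg_set_conf() and tg_set_limit() would call this before touching the blkg,
and __blk_throtl_bio() could bail out early while q->td is still NULL.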

Thanks.
Yu Kuai April 17, 2024, 1:39 a.m. UTC | #6
Hi,

On 2024/04/17 9:22, Tejun Heo wrote:
> Hello,
> 
> On Wed, Apr 17, 2024 at 09:13:34AM +0800, Yu Kuai wrote:
>>> Probably a better interface is for unloading to force blk-throtl to be
>>> deactivated rather than asking the user to nuke all configs.
>>
>> I was thinking that rmmod in this case should return busy; for example,
>> if bfq is currently in use for some disk, rmmod bfq will return busy.
>>
>> Is there any example where unloading deactivates resources that users
>> are still using?
> 
> Hmm... yeah, I'm not sure. Pinning the module while in use is definitely
> more conventional, so let's stick with that. It's usually achieved by
> inc'ing the module's ref on each usage, so here, the module refs would be
> counting the number of active rules, I guess.

Yes, agreed.
> 
> I'm not sure about modularization tho mostly because we've historically had
> a lot of lifetime issues around block and blkcg data structures and the
> supposed gain here is rather minimal. We only have a handful of these
> policies and they aren't that big.
> 
> If hot path overhead when not being used is a concern, lazy init solves most
> of it, no?

For performance, yes. Another point is that we can reduce kernel size
this way, without losing support for blk-cgroup policies.

Yes, it's just that blk-throttle is the most painful one, because it
exposes lots of implementation details to the block layer. Other
rq_qos-based policies should be much easier.

I guess I'll do lazy init first, then modularization for rq_qos, and
leave blk-throtl there for now. Perhaps adding a new throttle model to
iocost can replace blk-throtl in the future.

BTW, while testing iocost, I found that it can already achieve that, for
example, with the following configuration:

echo "$dev enable=1 min=100 max=100" > qos
echo "$dev wbps=4096 wseqiops=1 wrandiops=1" > model

In the test, I found that wbps and iops are actually limited to the
set values.

Thanks,
Kuai

> 
> Thanks.
>
Tejun Heo April 18, 2024, 2:05 a.m. UTC | #7
Hello,

On Wed, Apr 17, 2024 at 09:39:43AM +0800, Yu Kuai wrote:
...
> I guess I'll do lazy init first, then modularization for rq_qos, and
> leave blk-throtl there for now. Perhaps adding a new throttle model to
> iocost can replace blk-throtl in the future.

That sounds like a plan.

> BTW, while testing iocost, I found that it can already achieve that, for
> example, with the following configuration:
> 
> echo "$dev enable=1 min=100 max=100" > qos
> echo "$dev wbps=4096 wseqiops=1 wrandiops=1" > model
> 
> In the test, I found that wbps and iops are actually limited to the
> set values.

Yeah, it shouldn't be too difficult to add .max support to iocost so that
you can say something like "this cgroup subtree can't use more than 60% of
available capacity". That'd be really cool to have.

Thanks.

Patch

diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index b371442131fe..5c16be07a594 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -28,6 +28,7 @@ 
 
 /* A workqueue to queue throttle related work */
 static struct workqueue_struct *kthrotld_workqueue;
+static DECLARE_WAIT_QUEUE_HEAD(destroy_wait);
 
 #define rb_entry_tg(node)	rb_entry((node), struct throtl_grp, rb_node)
 
@@ -906,6 +907,11 @@  static void start_parent_slice_with_credit(struct throtl_grp *child_tg,
 
 }
 
+static bool td_has_io(struct throtl_data *td)
+{
+	return td->nr_queued[READ] + td->nr_queued[WRITE] != 0;
+}
+
 static void tg_dispatch_one_bio(struct throtl_grp *tg, bool rw)
 {
 	struct throtl_service_queue *sq = &tg->service_queue;
@@ -941,6 +947,8 @@  static void tg_dispatch_one_bio(struct throtl_grp *tg, bool rw)
 				     &parent_sq->queued[rw]);
 		BUG_ON(tg->td->nr_queued[rw] <= 0);
 		tg->td->nr_queued[rw]--;
+		if (!td_has_io(tg->td))
+			wake_up(&destroy_wait);
 	}
 
 	throtl_trim_slice(tg, rw);
@@ -1268,6 +1276,31 @@  static int blk_throtl_init(struct gendisk *disk)
 	return ret;
 }
 
+void blk_throtl_exit(struct gendisk *disk)
+{
+	struct request_queue *q = disk->queue;
+
+	if (!q->td)
+		return;
+
+	del_timer_sync(&q->td->service_queue.pending_timer);
+	cancel_work_sync(&q->td->dispatch_work);
+	blkcg_deactivate_policy(disk, &blkcg_policy_throtl);
+	kfree(q->td);
+	q->td = NULL;
+}
+
+static void blk_throtl_destroy(struct gendisk *disk)
+{
+	struct throtl_data *td = disk->queue->td;
+
+	/*
+	 * There are no rules, all throttled BIO should be dispatched
+	 * immediately.
+	 */
+	wait_event(destroy_wait, !td_has_io(td));
+	blk_throtl_exit(disk);
+}
 
 static ssize_t tg_set_conf(struct kernfs_open_file *of,
 			   char *buf, size_t nbytes, loff_t off, bool is_u64)
@@ -1308,7 +1341,11 @@  static ssize_t tg_set_conf(struct kernfs_open_file *of,
 	else
 		*(unsigned int *)((void *)tg + of_cft(of)->private) = v;
 
-	tg_conf_updated(tg, false);
+	blkg_conf_exit_blkg(&ctx);
+
+	if (!tg_conf_updated(tg, false))
+		blk_throtl_destroy(ctx.bdev->bd_disk);
+
 	ret = 0;
 out_finish:
 	blkg_conf_exit(&ctx);
@@ -1516,7 +1553,11 @@  static ssize_t tg_set_limit(struct kernfs_open_file *of,
 	tg->iops[READ] = v[2];
 	tg->iops[WRITE] = v[3];
 
-	tg_conf_updated(tg, false);
+	blkg_conf_exit_blkg(&ctx);
+
+	if (!tg_conf_updated(tg, false))
+		blk_throtl_destroy(ctx.bdev->bd_disk);
+
 	ret = 0;
 out_finish:
 	blkg_conf_exit(&ctx);
@@ -1533,13 +1574,6 @@  static struct cftype throtl_files[] = {
 	{ }	/* terminate */
 };
 
-static void throtl_shutdown_wq(struct request_queue *q)
-{
-	struct throtl_data *td = q->td;
-
-	cancel_work_sync(&td->dispatch_work);
-}
-
 struct blkcg_policy blkcg_policy_throtl = {
 	.dfl_cftypes		= throtl_files,
 	.legacy_cftypes		= throtl_legacy_files,
@@ -1688,19 +1722,6 @@  bool __blk_throtl_bio(struct bio *bio)
 	return throttled;
 }
 
-void blk_throtl_exit(struct gendisk *disk)
-{
-	struct request_queue *q = disk->queue;
-
-	if (!q->td)
-		return;
-
-	del_timer_sync(&q->td->service_queue.pending_timer);
-	throtl_shutdown_wq(q);
-	blkcg_deactivate_policy(disk, &blkcg_policy_throtl);
-	kfree(q->td);
-}
-
 static int __init throtl_init(void)
 {
 	kthrotld_workqueue = alloc_workqueue("kthrotld", WQ_MEM_RECLAIM, 0);