diff mbox series

[v5,2/2] block: cancel all throttled bios in del_gendisk()

Message ID 20211210083143.3181535-3-yukuai3@huawei.com (mailing list archive)
State New, archived
Headers show
Series cancel all throttled bios in del_gendisk() | expand

Commit Message

Yu Kuai Dec. 10, 2021, 8:31 a.m. UTC
Throttled bios can't be issued after del_gendisk() is done, thus
it's better to cancel them immediately rather than waiting for
throttle is done.

For example, if user thread is throttled with low bps while it's
issuing large io, and the device is deleted. The user thread will
wait for a long time for io to return.

Noted this patch is mainly from revertion of commit 32e3374304c7
("blk-throttle: remove tg_drain_bios") and commit b77412372b68
("blk-throttle: remove blk_throtl_drain").

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 block/blk-throttle.c | 70 ++++++++++++++++++++++++++++++++++++++++++++
 block/blk-throttle.h |  2 ++
 block/genhd.c        |  2 ++
 3 files changed, 74 insertions(+)

Comments

Michal Koutný Jan. 7, 2022, 3:05 p.m. UTC | #1
On Fri, Dec 10, 2021 at 04:31:43PM +0800, Yu Kuai <yukuai3@huawei.com> wrote:
> +	 * queue_lock is held, rcu lock is not needed here.
> +	 */
> +	blkg_for_each_descendant_post(blkg, pos_css, td->queue->root_blkg)
> +		tg_drain_bios(&blkg_to_tg(blkg)->service_queue);

FTR, I acknowledge this can work due to RCU peculiarities, however, I
don't understand why is it preferred against more robust explict
rcu_read_lock().

(All in all, this isn't a deal breaker and I'm not confident evaluating
the rest of the patch.)

Michal
Tejun Heo Jan. 7, 2022, 9:18 p.m. UTC | #2
Hello,

On Fri, Jan 07, 2022 at 04:05:19PM +0100, Michal Koutný wrote:
> On Fri, Dec 10, 2021 at 04:31:43PM +0800, Yu Kuai <yukuai3@huawei.com> wrote:
> > +	 * queue_lock is held, rcu lock is not needed here.
> > +	 */
> > +	blkg_for_each_descendant_post(blkg, pos_css, td->queue->root_blkg)
> > +		tg_drain_bios(&blkg_to_tg(blkg)->service_queue);
> 
> FTR, I acknowledge this can work due to RCU peculiarities, however, I
> don't understand why is it preferred against more robust explict
> rcu_read_lock().
> 
> (All in all, this isn't a deal breaker and I'm not confident evaluating
> the rest of the patch.)

Cc'ing Paul for RCU. Paul, this nit is around whether or not to use
rcu_read_lock() in an irq disabled section. I can see both sides of the
arguments - it's weird to put in an extra rcu_read_lock() when technically
unnecessary but it's also nice to have something explicit and structured to
mark parts which require RCU protection. Putting in a comment is okay but
consistency is difficult to achieve that way.

Maybe all these are unnecessary as lockdep would be able to catch them
anyway, or maybe we'd want something to explicitly mark RCU protected
sections. I don't know but whichever way, I think we need to establish a
convention.

Thanks.
Paul E. McKenney Jan. 7, 2022, 9:36 p.m. UTC | #3
On Fri, Jan 07, 2022 at 11:18:47AM -1000, Tejun Heo wrote:
> Hello,
> 
> On Fri, Jan 07, 2022 at 04:05:19PM +0100, Michal Koutný wrote:
> > On Fri, Dec 10, 2021 at 04:31:43PM +0800, Yu Kuai <yukuai3@huawei.com> wrote:
> > > +	 * queue_lock is held, rcu lock is not needed here.
> > > +	 */
> > > +	blkg_for_each_descendant_post(blkg, pos_css, td->queue->root_blkg)
> > > +		tg_drain_bios(&blkg_to_tg(blkg)->service_queue);
> > 
> > FTR, I acknowledge this can work due to RCU peculiarities, however, I
> > don't understand why is it preferred against more robust explict
> > rcu_read_lock().
> > 
> > (All in all, this isn't a deal breaker and I'm not confident evaluating
> > the rest of the patch.)
> 
> Cc'ing Paul for RCU. Paul, this nit is around whether or not to use
> rcu_read_lock() in an irq disabled section. I can see both sides of the
> arguments - it's weird to put in an extra rcu_read_lock() when technically
> unnecessary but it's also nice to have something explicit and structured to
> mark parts which require RCU protection. Putting in a comment is okay but
> consistency is difficult to achieve that way.
> 
> Maybe all these are unnecessary as lockdep would be able to catch them
> anyway, or maybe we'd want something to explicitly mark RCU protected
> sections. I don't know but whichever way, I think we need to establish a
> convention.

The easiest thing to do is to use rcu_dereference_sched() instead of
rcu_dereference().  This will cause lockdep to insist on preemption
(for example, interrupts) being disabled.

Or is this a case where a function containing rcu_dereference() is invoked
with interrupts disabled from some call sites and under rcu_read_lock()
protection from other call sites?  In this case, it is usually best to
include that redundant rcu_read_lock() [1].

							Thanx, Paul

[1]	If you are a glutton for punishment, or if you would otherwise
	have to add a cubic goatskin of rcu_read_lock() calls, you
	could instead write this priceless gem in place of the calls to
	rcu_dereference() in that common function:

	p = rcu_dereference_check(ptr, rcu_read_lock_sched_held());

	This would cause lockdep to be happy with either rcu_read_lock()
	or preemption being disabled.

	This is more precise, and would be preferable in some cases,
	for example, if there were lots of hotpath callsites with
	interrupts disabled.  "Do we add 25 pairs of rcu_read_lock()
	and rcu_read_unlock()?	Or do we add just the one ugly
	rcu_dereference_check()?"
Yu Kuai Jan. 10, 2022, 1:28 a.m. UTC | #4
在 2022/01/08 5:36, Paul E. McKenney 写道:
> On Fri, Jan 07, 2022 at 11:18:47AM -1000, Tejun Heo wrote:
>> Hello,
>>
>> On Fri, Jan 07, 2022 at 04:05:19PM +0100, Michal Koutný wrote:
>>> On Fri, Dec 10, 2021 at 04:31:43PM +0800, Yu Kuai <yukuai3@huawei.com> wrote:
>>>> +	 * queue_lock is held, rcu lock is not needed here.
>>>> +	 */
>>>> +	blkg_for_each_descendant_post(blkg, pos_css, td->queue->root_blkg)
>>>> +		tg_drain_bios(&blkg_to_tg(blkg)->service_queue);
>>>
>>> FTR, I acknowledge this can work due to RCU peculiarities, however, I
>>> don't understand why is it preferred against more robust explict
>>> rcu_read_lock().
>>>
>>> (All in all, this isn't a deal breaker and I'm not confident evaluating
>>> the rest of the patch.)
>>
>> Cc'ing Paul for RCU. Paul, this nit is around whether or not to use
>> rcu_read_lock() in an irq disabled section. I can see both sides of the
>> arguments - it's weird to put in an extra rcu_read_lock() when technically
>> unnecessary but it's also nice to have something explicit and structured to
>> mark parts which require RCU protection. Putting in a comment is okay but
>> consistency is difficult to achieve that way.
>>
>> Maybe all these are unnecessary as lockdep would be able to catch them
>> anyway, or maybe we'd want something to explicitly mark RCU protected
>> sections. I don't know but whichever way, I think we need to establish a
>> convention.
> 
> The easiest thing to do is to use rcu_dereference_sched() instead of
> rcu_dereference().  This will cause lockdep to insist on preemption
> (for example, interrupts) being disabled.
> 
> Or is this a case where a function containing rcu_dereference() is invoked
> with interrupts disabled from some call sites and under rcu_read_lock()
> protection from other call sites?  In this case, it is usually best to
> include that redundant rcu_read_lock() [1].

Hi,

This is the later case, so I guess I'll include that redundant
rcu_read_lock().

Thanks,
Kuai
> 
> 							Thanx, Paul
> 
> [1]	If you are a glutton for punishment, or if you would otherwise
> 	have to add a cubic goatskin of rcu_read_lock() calls, you
> 	could instead write this priceless gem in place of the calls to
> 	rcu_dereference() in that common function:
> 
> 	p = rcu_dereference_check(ptr, rcu_read_lock_sched_held());
> 
> 	This would cause lockdep to be happy with either rcu_read_lock()
> 	or preemption being disabled.
> 
> 	This is more precise, and would be preferable in some cases,
> 	for example, if there were lots of hotpath callsites with
> 	interrupts disabled.  "Do we add 25 pairs of rcu_read_lock()
> 	and rcu_read_unlock()?	Or do we add just the one ugly
> 	rcu_dereference_check()?"
> .
>
diff mbox series

Patch

diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index fa3454041def..ba2122c650a1 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -2257,6 +2257,76 @@  void blk_throtl_bio_endio(struct bio *bio)
 }
 #endif
 
+/*
+ * Dispatch all bios from all children tg's queued on @parent_sq.  On
+ * return, @parent_sq is guaranteed to not have any active children tg's
+ * and all bios from previously active tg's are on @parent_sq->bio_lists[].
+ */
+static void tg_drain_bios(struct throtl_service_queue *parent_sq)
+{
+	struct throtl_grp *tg;
+
+	while ((tg = throtl_rb_first(parent_sq))) {
+		struct throtl_service_queue *sq = &tg->service_queue;
+		struct bio *bio;
+
+		throtl_dequeue_tg(tg);
+
+		while ((bio = throtl_peek_queued(&sq->queued[READ])))
+			tg_dispatch_one_bio(tg, bio_data_dir(bio));
+		while ((bio = throtl_peek_queued(&sq->queued[WRITE])))
+			tg_dispatch_one_bio(tg, bio_data_dir(bio));
+	}
+}
+
+/**
+ * blk_throtl_cancel_bios - cancel throttled bios
+ * @q: request_queue to cancel throttled bios for
+ *
+ * This function is called to error all currently throttled bios on @q.
+ */
+void blk_throtl_cancel_bios(struct request_queue *q)
+{
+	struct throtl_data *td = q->td;
+	struct bio_list bio_list_on_stack;
+	struct blkcg_gq *blkg;
+	struct cgroup_subsys_state *pos_css;
+	struct bio *bio;
+	int rw;
+
+	bio_list_init(&bio_list_on_stack);
+
+	/*
+	 * hold queue_lock to prevent concurrent with dispatching
+	 * throttled bios by timer.
+	 */
+	spin_lock_irq(&q->queue_lock);
+
+	/*
+	 * Drain each tg while doing post-order walk on the blkg tree, so
+	 * that all bios are propagated to td->service_queue.  It'd be
+	 * better to walk service_queue tree directly but blkg walk is
+	 * easier.
+	 * queue_lock is held, rcu lock is not needed here.
+	 */
+	blkg_for_each_descendant_post(blkg, pos_css, td->queue->root_blkg)
+		tg_drain_bios(&blkg_to_tg(blkg)->service_queue);
+
+	/* finally, transfer bios from top-level tg's into the td */
+	tg_drain_bios(&td->service_queue);
+
+	/* all bios now should be in td->service_queue, cancel them */
+	for (rw = READ; rw <= WRITE; rw++)
+		while ((bio = throtl_pop_queued(&td->service_queue.queued[rw],
+						NULL)))
+			bio_list_add(&bio_list_on_stack, bio);
+
+	spin_unlock_irq(&q->queue_lock);
+	if (!bio_list_empty(&bio_list_on_stack))
+		while ((bio = bio_list_pop(&bio_list_on_stack)))
+			bio_io_error(bio);
+}
+
 int blk_throtl_init(struct request_queue *q)
 {
 	struct throtl_data *td;
diff --git a/block/blk-throttle.h b/block/blk-throttle.h
index 175f03abd9e4..9d67d5139954 100644
--- a/block/blk-throttle.h
+++ b/block/blk-throttle.h
@@ -160,12 +160,14 @@  static inline void blk_throtl_exit(struct request_queue *q) { }
 static inline void blk_throtl_register_queue(struct request_queue *q) { }
 static inline void blk_throtl_charge_bio_split(struct bio *bio) { }
 static inline bool blk_throtl_bio(struct bio *bio) { return false; }
+#define blk_throtl_cancel_bios(q)  do { } while (0)
 #else /* CONFIG_BLK_DEV_THROTTLING */
 int blk_throtl_init(struct request_queue *q);
 void blk_throtl_exit(struct request_queue *q);
 void blk_throtl_register_queue(struct request_queue *q);
 void blk_throtl_charge_bio_split(struct bio *bio);
 bool __blk_throtl_bio(struct bio *bio);
+void blk_throtl_cancel_bios(struct request_queue *q);
 static inline bool blk_throtl_bio(struct bio *bio)
 {
 	struct throtl_grp *tg = blkg_to_tg(bio->bi_blkg);
diff --git a/block/genhd.c b/block/genhd.c
index 3c139a1b6f04..5d6c8499b213 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -30,6 +30,7 @@ 
 #include "blk.h"
 #include "blk-mq-sched.h"
 #include "blk-rq-qos.h"
+#include "blk-throttle.h"
 
 static struct kobject *block_depr;
 
@@ -621,6 +622,7 @@  void del_gendisk(struct gendisk *disk)
 
 	blk_mq_freeze_queue_wait(q);
 
+	blk_throtl_cancel_bios(q);
 	rq_qos_exit(q);
 	blk_sync_queue(q);
 	blk_flush_integrity();