block-throttle: avoid double charge

Message ID aac85300238aec9e310b5092260a6ec6e6f02e04.1507918159.git.shli@fb.com (mailing list archive)
State New, archived

Commit Message

Shaohua Li Oct. 13, 2017, 6:10 p.m. UTC
If a bio is throttled and then split, the bio can be resubmitted and
enter throttling again, so part of the bio is charged multiple times.
If the cgroup has an IO limit, the double charge significantly hurts
performance. Bio splits have become quite common since the arbitrary
bio size change.
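
The accounting problem can be illustrated with a small user-space model.
This is only a sketch, not kernel code: struct model_group and the
model_* helpers are hypothetical names, and it assumes that the part
resubmitted after the split is the remainder left once the front
fragment has been carved off.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a cgroup's byte accounting; not a kernel type. */
struct model_group {
	size_t bytes_charged;
};

/*
 * Charge every submission unconditionally, which is what effectively
 * happens when nothing remembers that the bio was already throttled.
 */
static void model_submit(struct model_group *g, size_t bytes)
{
	g->bytes_charged += bytes;
}

/*
 * Submit a bio of `total` bytes, split off `front` bytes, and resubmit
 * the remainder so that it passes through throttling a second time.
 * Returns the total number of bytes charged to the group.
 */
static size_t model_split_and_resubmit(size_t total, size_t front)
{
	struct model_group g = { 0 };

	assert(front <= total);
	model_submit(&g, total);         /* first pass: whole bio charged */
	model_submit(&g, total - front); /* remainder charged a second time */
	return g.bytes_charged;
}
```

In this model, a 1 MiB bio split after its first 256 KiB is charged as
1.75 MiB, so the group reaches its limit earlier than the IO it actually
issued would justify.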

To fix this, we record the disk a bio was throttled against. When a bio
has gone through throttling and is issued, we record that disk and copy
it to any cloned bio, so a cloned bio (including a split bio) is not
throttled again. A stacked block device driver changes the cloned bio's
bi_disk. Once bi_disk changes, the recorded throttle disk no longer
matches and the bio must be throttled again. That is why a single bit
cannot indicate whether a cloned bio should be throttled.
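
The rule described above can be sketched in user space as follows. The
model_* names are hypothetical; only the comparison itself mirrors the
check added to blk_throtl_bio(), and the BIO_THROTTLED and has_rules
tests are left out for brevity.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct gendisk. */
struct model_disk {
	int id;
};

/* Hypothetical stand-in for the two struct bio fields involved. */
struct model_bio {
	const struct model_disk *disk;           /* models bi_disk */
	const struct model_disk *throttled_disk; /* models bi_throttled_disk */
};

/* True if the bio was already throttled against the disk it now targets. */
static bool model_already_throttled(const struct model_bio *bio)
{
	return bio->throttled_disk && bio->throttled_disk == bio->disk;
}

/* Record the disk when the bio leaves throttling and is issued. */
static void model_mark_issued(struct model_bio *bio)
{
	bio->throttled_disk = bio->disk;
}

/* A clone (or split) inherits both the target and the recorded disk. */
static struct model_bio model_clone(const struct model_bio *src)
{
	struct model_bio clone = *src;

	return clone;
}

/* A stacked driver retargets the clone at a lower-level disk. */
static void model_remap(struct model_bio *bio, const struct model_disk *lower)
{
	bio->disk = lower;
}
```

A clone taken after model_mark_issued() is skipped while it still targets
the same disk, but after model_remap() the recorded disk no longer
matches and the clone is throttled again. A single flag copied to the
clone could not tell these two cases apart.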

We only record the gendisk here. When a cloned bio is remapped, it is
very unlikely that only the partno changes.

Some form of this patch should probably go into stable for kernels since v4.2.

Cc: Tejun Heo <tj@kernel.org>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Shaohua Li <shli@fb.com>
---
 block/bio.c               |  3 +++
 block/blk-throttle.c      | 15 ++++++++++++---
 include/linux/blk_types.h |  4 ++++
 3 files changed, 19 insertions(+), 3 deletions(-)

Comments

Tejun Heo Nov. 13, 2017, 8:03 p.m. UTC | #1
Hello, Shaohua.

On Fri, Oct 13, 2017 at 11:10:29AM -0700, Shaohua Li wrote:
> If a bio is throttled and then split, the bio can be resubmitted and
> enter throttling again, so part of the bio is charged multiple times.
> If the cgroup has an IO limit, the double charge significantly hurts
> performance. Bio splits have become quite common since the arbitrary
> bio size change.

Missed the patch previously.  Sorry about that.

> Some form of this patch should probably go into stable for kernels since v4.2.

Seriously.

> @@ -2130,9 +2130,15 @@ bool blk_throtl_bio(struct request_queue *q, struct blkcg_gq *blkg,
>  
>  	WARN_ON_ONCE(!rcu_read_lock_held());
>  
> -	/* see throtl_charge_bio() */
> -	if (bio_flagged(bio, BIO_THROTTLED) || !tg->has_rules[rw])
> +	/*
> +	 * see throtl_charge_bio() for BIO_THROTTLED. If a bio is throttled
> +	 * against a disk but remapped to other disk, we should throttle it
> +	 * again
> +	 */
> +	if (bio_flagged(bio, BIO_THROTTLED) || !tg->has_rules[rw] ||
> +	    (bio->bi_throttled_disk && bio->bi_throttled_disk == bio->bi_disk))
>  		goto out;
> +	bio->bi_throttled_disk = NULL;

So, one question I have is whether we need both BIO_THROTTLED and
bi_throttled_disk.  Can't we replace BIO_THROTTLED w/
bi_throttled_disk?

Thanks.
Patch

diff --git a/block/bio.c b/block/bio.c
index 8338304..dce8314 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -597,6 +597,9 @@  void __bio_clone_fast(struct bio *bio, struct bio *bio_src)
 	 * so we don't set nor calculate new physical/hw segment counts here
 	 */
 	bio->bi_disk = bio_src->bi_disk;
+#ifdef CONFIG_BLK_DEV_THROTTLING
+	bio->bi_throttled_disk = bio_src->bi_throttled_disk;
+#endif
 	bio_set_flag(bio, BIO_CLONED);
 	bio->bi_opf = bio_src->bi_opf;
 	bio->bi_write_hint = bio_src->bi_write_hint;
diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index ee6d7b0..155549a 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -2130,9 +2130,15 @@  bool blk_throtl_bio(struct request_queue *q, struct blkcg_gq *blkg,
 
 	WARN_ON_ONCE(!rcu_read_lock_held());
 
-	/* see throtl_charge_bio() */
-	if (bio_flagged(bio, BIO_THROTTLED) || !tg->has_rules[rw])
+	/*
+	 * see throtl_charge_bio() for BIO_THROTTLED. If a bio is throttled
+	 * against a disk but remapped to other disk, we should throttle it
+	 * again
+	 */
+	if (bio_flagged(bio, BIO_THROTTLED) || !tg->has_rules[rw] ||
+	    (bio->bi_throttled_disk && bio->bi_throttled_disk == bio->bi_disk))
 		goto out;
+	bio->bi_throttled_disk = NULL;
 
 	spin_lock_irq(q->queue_lock);
 
@@ -2227,8 +2233,11 @@  bool blk_throtl_bio(struct request_queue *q, struct blkcg_gq *blkg,
 	 * don't want bios to leave with the flag set.  Clear the flag if
 	 * being issued.
 	 */
-	if (!throttled)
+	if (!throttled) {
 		bio_clear_flag(bio, BIO_THROTTLED);
+		/* if the bio is cloned, we don't throttle it again */
+		bio->bi_throttled_disk = bio->bi_disk;
+	}
 
 #ifdef CONFIG_BLK_DEV_THROTTLING_LOW
 	if (throttled || !td->track_bio_latency)
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index 3385c89..2507566 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -89,6 +89,10 @@  struct bio {
 	void			*bi_cg_private;
 	struct blk_issue_stat	bi_issue_stat;
 #endif
+#ifdef CONFIG_BLK_DEV_THROTTLING
+	/* record which disk the bio is throttled against */
+	struct gendisk		*bi_throttled_disk;
+#endif
 #endif
 	union {
 #if defined(CONFIG_BLK_DEV_INTEGRITY)