diff mbox series

[1/2] block: add resubmit_bio_noacct()

Message ID 20220110075141.389532-2-ming.lei@redhat.com (mailing list archive)
State Superseded, archived
Delegated to: Mike Snitzer
Headers show
Series block/dm: add resubmit_bio_noacct for fixing iops throttling | expand

Commit Message

Ming Lei Jan. 10, 2022, 7:51 a.m. UTC
Add block layer API of resubmit_bio_noacct() for handling blk-throttle
iops limit correctly. Typical use case is that bio split, and it isn't
good to export blk_throtl_charge_bio_split() for drivers, so add new API
for serving such purpose.

Cc: lining <lining2020x@163.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Chunguang Xu <brookxu@tencent.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-core.c       | 12 ++++++++++++
 block/blk-merge.c      |  4 +---
 include/linux/blkdev.h |  1 +
 3 files changed, 14 insertions(+), 3 deletions(-)

Comments

Christoph Hellwig Jan. 10, 2022, 5:35 p.m. UTC | #1
On Mon, Jan 10, 2022 at 03:51:40PM +0800, Ming Lei wrote:
> Add block layer API of resubmit_bio_noacct() for handling blk-throttle
> iops limit correctly. Typical use case is that bio split, and it isn't
> good to export blk_throtl_charge_bio_split() for drivers, so add new API
> for serving such purpose.

Umm, submit_bio_noacct is meant exactly for this case of resubmitting
a bio.  We should not need another API for that.

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel
Mike Snitzer Jan. 10, 2022, 7:03 p.m. UTC | #2
On Mon, Jan 10 2022 at 12:35P -0500,
Christoph Hellwig <hch@infradead.org> wrote:

> On Mon, Jan 10, 2022 at 03:51:40PM +0800, Ming Lei wrote:
> > Add block layer API of resubmit_bio_noacct() for handling blk-throttle
> > iops limit correctly. Typical use case is that bio split, and it isn't
> > good to export blk_throtl_charge_bio_split() for drivers, so add new API
> > for serving such purpose.
> 
> Umm, submit_bio_noacct is meant exactly for this case of resubmitting
> a bio.  We should not need another API for that.
> 

Ming is lifting code out of __blk_queue_split() for reuse (by DM in
this instance, because it has its own bio_split+bio_chain).

Are you saying submit_bio_noacct() should be made to call
blk_throtl_charge_bio_split() and blk_throtl_charge_bio_split() simply
return if not a split bio? (not sure bio has enough context to know,
other than looking at some side-effect change from bio_chain)

But Ming: your __blk_queue_split() change seems wrong.
Prior to your patch __blk_queue_split() did:

bio_chain(split, *bio);
submit_bio_noacct(*bio);
*bio = split;
blk_throtl_charge_bio_split(*bio);

After your patch (effectively):

bio_chain(split, *bio);
submit_bio_noacct(*bio);
blk_throtl_charge_bio_split(bio);
*bio = split;

Maybe that was intended? (or maybe it doesn't matter because bio_split
copies fields with bio_clone_fast())?  Regardless, it is subtle.

Should blk_throtl_charge_bio_split() just be pushed down to
bio_split()?

In general, such narrow hacks for how to properly resubmit split bios
are asking for further trouble.  As is, I'm having to triage new
reports of bio-based accounting issues (which has called into question
my hack/fix commit a1e1cb72d9649 ("dm: fix redundant IO accounting for
bios that need splitting") that papered over this bigger issue of
needing proper split IO accounting, so likely needs to be revisited).

We also have the much bigger issue of IO poll support (or
lack-there-of) for split bios.

Mike

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel
Ming Lei Jan. 11, 2022, 1:56 a.m. UTC | #3
On Mon, Jan 10, 2022 at 09:35:22AM -0800, Christoph Hellwig wrote:
> On Mon, Jan 10, 2022 at 03:51:40PM +0800, Ming Lei wrote:
> > Add block layer API of resubmit_bio_noacct() for handling blk-throttle
> > iops limit correctly. Typical use case is that bio split, and it isn't
> > good to export blk_throtl_charge_bio_split() for drivers, so add new API
> > for serving such purpose.
> 
> Umm, submit_bio_noacct is meant exactly for this case of resubmitting
> a bio.  We should not need another API for that.

Follows the background of the issue first:

1) IOPS limit throttle needs to cover split bio since iostat accounts
split bio actually, and user space setup iops limit throttle by the
feedback from iostat/diskstat;

2) block throttle is block layer internal stuff, and we shouldn't expose
blk_throtl_charge_bio_split() to driver.

Maybe rename the new API as submit_split_bio_noacct(), but we can't
reuse submit_bio_noacct() simply, otherwise blk_throtl_charge_bio_split()
needs to be exported.

Another ides is to clearing BIO_THROTTLED before calling submit_bio_noacct(),
meantime blk-throttle code needs to change to avoid double accounting of bio
bytes, so the caller of submit_bio_noacct() still needs some change.
This way can get smooth IOPS throttle, but needs to call __blk_throtl_bio
for split bio one more time.

Or other idea for this bio split vs. iops limit issue?


Thanks,
Ming

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel
Ming Lei Jan. 11, 2022, 2:26 a.m. UTC | #4
On Mon, Jan 10, 2022 at 02:03:16PM -0500, Mike Snitzer wrote:
> On Mon, Jan 10 2022 at 12:35P -0500,
> Christoph Hellwig <hch@infradead.org> wrote:
> 
> > On Mon, Jan 10, 2022 at 03:51:40PM +0800, Ming Lei wrote:
> > > Add block layer API of resubmit_bio_noacct() for handling blk-throttle
> > > iops limit correctly. Typical use case is that bio split, and it isn't
> > > good to export blk_throtl_charge_bio_split() for drivers, so add new API
> > > for serving such purpose.
> > 
> > Umm, submit_bio_noacct is meant exactly for this case of resubmitting
> > a bio.  We should not need another API for that.
> > 
> 
> Ming is lifting code out of __blk_queue_split() for reuse (by DM in
> this instance, because it has its own bio_split+bio_chain).
> 
> Are you saying submit_bio_noacct() should be made to call
> blk_throtl_charge_bio_split() and blk_throtl_charge_bio_split() simply
> return if not a split bio? (not sure bio has enough context to know,

DM or MD may submit split bio to underlying queue directly, so we can't
do that simply. Also some FS may call bio_split() too.

Or we simply let blk_throtl_bio cover everything? That is basically
what V1 did, and the only issue is that we can't run the account
in case that submit_bio_noacct() is called from blk_throtl_dispatch_work_fn().

> other than looking at some side-effect change from bio_chain)
> 
> But Ming: your __blk_queue_split() change seems wrong.
> Prior to your patch __blk_queue_split() did:
> 
> bio_chain(split, *bio);
> submit_bio_noacct(*bio);
> *bio = split;
> blk_throtl_charge_bio_split(*bio);
> 
> After your patch (effectively):
> 
> bio_chain(split, *bio);
> submit_bio_noacct(*bio);
> blk_throtl_charge_bio_split(bio);
> *bio = split;
> 
> Maybe that was intended? (or maybe it doesn't matter because bio_split
> copies fields with bio_clone_fast())?  Regardless, it is subtle.

It doesn't matter, blk_throtl_charge_bio_split() just accounts bio
number, here the split bio is submitted to the same queue.

> 
> Should blk_throtl_charge_bio_split() just be pushed down to
> bio_split()?

It is fragile, will the bio allocated from bio_split() always
submitted finally? or submitted to same queue?

> 
> In general, such narrow hacks for how to properly resubmit split bios
> are asking for further trouble.

I don't think it is hacks, it is one approach which has been verified as
workable in blk-mq.

> As is, I'm having to triage new
> reports of bio-based accounting issues (which has called into question
> my hack/fix commit a1e1cb72d9649 ("dm: fix redundant IO accounting for
> bios that need splitting") that papered over this bigger issue of
> needing proper split IO accounting, so likely needs to be revisited).

Here the issue is just about bio number accounting.

BTW, maybe you can follow blk-mq's way: just account after io is split,
such as, move start_io_acct() to the end of __split_and_process_non_flush(),
then you can just account io start once.


Thanks, 
Ming

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel
Christoph Hellwig Jan. 11, 2022, 8:36 a.m. UTC | #5
On Mon, Jan 10, 2022 at 02:03:16PM -0500, Mike Snitzer wrote:
> Ming is lifting code out of __blk_queue_split() for reuse (by DM in
> this instance, because it has its own bio_split+bio_chain).
> 
> Are you saying submit_bio_noacct() should be made to call
> blk_throtl_charge_bio_split() and blk_throtl_charge_bio_split() simply
> return if not a split bio? (not sure bio has enough context to know,
> other than looking at some side-effect change from bio_chain)

Yes.

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel
Christoph Hellwig Jan. 11, 2022, 8:42 a.m. UTC | #6
On Tue, Jan 11, 2022 at 09:56:36AM +0800, Ming Lei wrote:
> 2) block throttle is block layer internal stuff, and we shouldn't expose
> blk_throtl_charge_bio_split() to driver.
> 
> Maybe rename the new API as submit_split_bio_noacct(), but we can't
> reuse submit_bio_noacct() simply, otherwise blk_throtl_charge_bio_split()
> needs to be exported.

submit_bio_noacct should only ever be used for resubmitting bios that
came up from the upper layer, although they might not always be "split".
blk_throtl_charge_bio_split

> Another ides is to clearing BIO_THROTTLED before calling submit_bio_noacct(),
> meantime blk-throttle code needs to change to avoid double accounting of bio
> bytes, so the caller of submit_bio_noacct() still needs some change.
> This way can get smooth IOPS throttle, but needs to call __blk_throtl_bio
> for split bio one more time.
> 
> Or other idea for this bio split vs. iops limit issue?

Well, if you want a helper specificly for splits, add one that actally
specifically handles splits and makes the callers life easier, something
like:

void bio_submit_splice(struct bio *split, struct bio *orig)
{
	split->bi_opf |= REQ_NOMERGE;
	trace_block_split(split, orig->bi_iter.bi_sector);
	submit_bio_noacct(orig);
	blk_throtl_charge_bio_split(orig);
}

including a proper kerneldoc comment.

But I still fail to grasp how a split is so different from just
resubmitting a not split bio.

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel
diff mbox series

Patch

diff --git a/block/blk-core.c b/block/blk-core.c
index fd029c86d6ac..733fec7dc5d6 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -910,6 +910,18 @@  void submit_bio_noacct(struct bio *bio)
 }
 EXPORT_SYMBOL(submit_bio_noacct);
 
+/*
+ * Usually for submitting one bio which has been checked by
+ * submit_bio_checks already. The typical use case is for handling
+ * blk-throttle iops limit correctly.
+ */
+void resubmit_bio_noacct(struct bio *bio)
+{
+	submit_bio_noacct(bio);
+	blk_throtl_charge_bio_split(bio);
+}
+EXPORT_SYMBOL(resubmit_bio_noacct);
+
 /**
  * submit_bio - submit a bio to the block device layer for I/O
  * @bio: The &struct bio which describes the I/O
diff --git a/block/blk-merge.c b/block/blk-merge.c
index 4de34a332c9f..acc786d872e6 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -366,10 +366,8 @@  void __blk_queue_split(struct request_queue *q, struct bio **bio,
 
 		bio_chain(split, *bio);
 		trace_block_split(split, (*bio)->bi_iter.bi_sector);
-		submit_bio_noacct(*bio);
+		resubmit_bio_noacct(*bio);
 		*bio = split;
-
-		blk_throtl_charge_bio_split(*bio);
 	}
 }
 
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 22746b2d6825..cce2db9fae1f 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -600,6 +600,7 @@  static inline unsigned int blk_queue_depth(struct request_queue *q)
 extern int blk_register_queue(struct gendisk *disk);
 extern void blk_unregister_queue(struct gendisk *disk);
 void submit_bio_noacct(struct bio *bio);
+void resubmit_bio_noacct(struct bio *bio);
 
 extern int blk_lld_busy(struct request_queue *q);
 extern void blk_queue_split(struct bio **);