
[v5,01/11] block: make generic_make_request handle arbitrarily sized bios

Message ID 1438581502.26596.24.camel@hasee (mailing list archive)
State Accepted, archived
Delegated to: Mike Snitzer
Headers show

Commit Message

Ming Lin Aug. 3, 2015, 5:58 a.m. UTC
On Sat, 2015-08-01 at 12:33 -0400, Mike Snitzer wrote:
> On Sat, Aug 01 2015 at  2:58am -0400,
> Ming Lin <mlin@kernel.org> wrote:
> 
> > On Fri, 2015-07-31 at 17:38 -0400, Mike Snitzer wrote:
> > > 
> > > OK, once setup, to run the 2 tests in question directly you'd do
> > > something like:
> > > 
> > > dmtest run --suite thin-provisioning -n discard_a_fragmented_device
> > > 
> > > dmtest run --suite thin-provisioning -n discard_fully_provisioned_device_benchmark
> > > 
> > > Again, these tests pass without this patchset.
> > 
> > It's caused by patch 4.

Typo. I mean patch 5.

> > When discard size >=4G, the bio->bi_iter.bi_size overflows.
> 
> Thanks for tracking this down!

blkdev_issue_write_same() has the same problem.
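The overflow is easy to see in isolation: bi_size is a 32-bit byte count, so any single request of 4 GiB or more wraps when converted from sectors. A minimal userspace sketch (bytes_in_bi_size is a hypothetical helper for illustration, not kernel code):

```c
#include <assert.h>
#include <stdint.h>

/* bi_size is a 32-bit byte count.  Converting a sector count of
 * 2^23 (= 4 GiB) or more into bytes wraps around. */
static uint32_t bytes_in_bi_size(uint64_t nr_sects)
{
	return (uint32_t)(nr_sects << 9);	/* 512-byte sectors */
}
```

8388608 sectors (exactly 4 GiB) truncates to 0; 8388607 sectors is the largest count whose byte size still fits.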

> 
> > Below is the new patch.
> > 
> > Christoph,
> > Could you also help to review it?
> > 
> > Now we still do the "misaligned" check in blkdev_issue_discard(),
> > so the same code in blk_bio_discard_split() was removed.
> 
> But I don't agree with this approach.  One of the most meaningful
> benefits of late bio splitting is the upper layers shouldn't _need_ to
> depend on the intermediate devices' queue_limits being stacked properly.
> Your solution to mix discard granularity/alignment checks at the upper
> layer(s) but then split based on max_discard_sectors at the lower layer
> defeats that benefit for discards.
> 
> This will translate to all intermediate layers that might split
> discards needing to worry about granularity/alignment
> too (e.g. how dm-thinp will have to care because it must generate
> discard mappings with associated bios based on how blocks were mapped to
> thinp).

I think the important thing is late splitting for regular bios.
For discard/write_same bios, how about just not doing late splitting?

That is:
1. Remove "PATCH 5: block: remove split code in blkdev_issue_discard"
2. Add the changes below to PATCH 1


> 
> Also, it is unfortunate that IO that doesn't have a payload is being
> artificially split simply because bio->bi_iter.bi_size is 32bits.

Indeed.
Would it be possible to make it 64 bits? I guess not.

> 
> Mike


--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel

Comments

Christoph Hellwig Aug. 4, 2015, 11:36 a.m. UTC | #1
On Sun, Aug 02, 2015 at 10:58:22PM -0700, Ming Lin wrote:
> I think the important thing is late splitting for regular bios.
> For discard/write_same bios, how about just not doing late splitting?

I'd hate having to special case them even more.  Especially as the
discard splitting is nasty and we really don't want to send giant
discards by default anyway (see Jens' patches to limit discard size
by default).

So I'd recommend to keep everything as-is, just make sure we don't
overflow bi_size.

Ming Lin Aug. 5, 2015, 6:03 a.m. UTC | #2
On Tue, 2015-08-04 at 13:36 +0200, Christoph Hellwig wrote:
> On Sun, Aug 02, 2015 at 10:58:22PM -0700, Ming Lin wrote:
> > I think the important thing is late splitting for regular bios.
> > For discard/write_same bios, how about just not doing late splitting?
> 
> I'd hate having to special case them even more.  Especially as the
> discard splitting is nasty and we really don't want to send giant
> discards by default anyway (see Jens' patches to limit discard size
> by default).
> 
> So I'd recommend to keep everything as-is, just make sure we don't
> overflow bi_size.

Did you mean to remove "PATCH 4 block: remove split code in
blkdev_issue_discard" or to keep it?

Which of the two solutions below do you prefer?

- Solution 1

remove splits in blkdev_issue_{discard,write_same} and keep
blk_bio_{discard,write_same}_split

But for blkdev_issue_discard(), it's not enough to only make sure
bi_size doesn't overflow. For example, to discard 4G:

4G bytes = 8388608 sectors
UINT_MAX = 8388607 sectors

So blkdev_issue_discard() will send 2 discard bios.
First bio: sector 0 .. 8388606
Second bio: sector 8388607 .. 8388607

In this case, the 2 discard tests in device-mapper-test-suite still
fail, probably because the second bio's start sector is not aligned
with the discard_granularity.

So I have to take discard_granularity into account (assume 32
sectors); then blkdev_issue_discard() will send 2 discard bios:

First bio: sector 0 .. 8388575
Second bio: sector 8388576 .. 8388607

In this case, both discard tests passed.
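The granularity-aware capping described above can be sketched in plain C. aligned_req_sects is a hypothetical helper mirroring what blkdev_issue_discard() would need to do (sector_t is typedef'd locally; granularity is in sectors):

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t sector_t;

/* Cap the sector count of one discard bio so that, if another bio
 * must follow, the next start sector stays aligned to the discard
 * granularity. */
static sector_t aligned_req_sects(sector_t start, sector_t nr_sects,
				  sector_t max_sects,
				  unsigned int granularity)
{
	sector_t req = nr_sects < max_sects ? nr_sects : max_sects;

	if (req < nr_sects) {		/* another bio will follow */
		sector_t rem = (start + req) % granularity;

		if (req > rem)
			req -= rem;	/* stop at previous aligned sector */
	}
	return req;
}
```

With the numbers above (8388608 sectors total, 8388607 max, granularity 32), the first bio gets 8388576 sectors and the second gets 32 — exactly the split that passes both tests.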

- Solution 2

Special-case discard/write_same bios (you said you hate it).

That is to keep splits in blkdev_issue_{discard,write_same} and remove
blk_bio_{discard,write_same}_split

I think this is the cleaner way because blkdev_issue_{discard,write_same}
already make sure we don't overflow bi_size.

And blk_bio_{discard,write_same}_split actually duplicate the splits
in blkdev_issue_{discard,write_same}. It's OK to remove them.

Kent Overstreet Aug. 7, 2015, midnight UTC | #3
On Tue, Aug 04, 2015 at 01:36:26PM +0200, Christoph Hellwig wrote:
> On Sun, Aug 02, 2015 at 10:58:22PM -0700, Ming Lin wrote:
> > I think the important thing is late splitting for regular bios.
> > For discard/write_same bios, how about just not doing late splitting?
> 
> I'd hate having to special case them even more.  Especially as the
> discard splitting is nasty and we really don't want to send giant
> discards by default anyway (see Jens' patches to limit discard size
> by default).

Why is it that we don't want to send giant discards? Is it a latency issue?

Christoph Hellwig Aug. 7, 2015, 7:30 a.m. UTC | #4
I'm for solution 3:

 - keep blk_bio_{discard,write_same}_split, but ensure we never build
   a > 4GB bio in blkdev_issue_{discard,write_same}.

Note that this isn't special-casing: we can't build > 4GB bios for
data either; it's just implemented as a side effect right now instead
of being checked explicitly.
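Solution 3 amounts to one clamp in blkdev_issue_{discard,write_same}: never put more than UINT_MAX >> 9 sectors into one bio, rounded down to the granularity so later bios stay aligned. A sketch (max_discard_bio_sectors is an illustrative name, not an existing kernel helper):

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Largest per-bio sector count whose byte size still fits in the
 * 32-bit bi_size, rounded down to the discard granularity (given
 * in sectors). */
static uint64_t max_discard_bio_sectors(unsigned int granularity)
{
	uint64_t max = UINT_MAX >> 9;	/* 8388607 sectors */

	return max - (max % granularity);
}
```

With granularity 32 this yields 8388576 sectors per bio, so every subsequent bio starts on an aligned sector.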

Christoph Hellwig Aug. 7, 2015, 7:30 a.m. UTC | #5
On Thu, Aug 06, 2015 at 05:00:04PM -0700, Kent Overstreet wrote:
> Why is it that we don't want to send giant discards? Is it a latency issue?

Yes.  Take a look at the "Configurable max discard size" thread(s).


Patch

diff --git a/block/blk-merge.c b/block/blk-merge.c
index 1f5dfa0..90b085e 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -9,59 +9,6 @@ 
 
 #include "blk.h"
 
-static struct bio *blk_bio_discard_split(struct request_queue *q,
-					 struct bio *bio,
-					 struct bio_set *bs)
-{
-	unsigned int max_discard_sectors, granularity;
-	int alignment;
-	sector_t tmp;
-	unsigned split_sectors;
-
-	/* Zero-sector (unknown) and one-sector granularities are the same.  */
-	granularity = max(q->limits.discard_granularity >> 9, 1U);
-
-	max_discard_sectors = min(q->limits.max_discard_sectors, UINT_MAX >> 9);
-	max_discard_sectors -= max_discard_sectors % granularity;
-
-	if (unlikely(!max_discard_sectors)) {
-		/* XXX: warn */
-		return NULL;
-	}
-
-	if (bio_sectors(bio) <= max_discard_sectors)
-		return NULL;
-
-	split_sectors = max_discard_sectors;
-
-	/*
-	 * If the next starting sector would be misaligned, stop the discard at
-	 * the previous aligned sector.
-	 */
-	alignment = (q->limits.discard_alignment >> 9) % granularity;
-
-	tmp = bio->bi_iter.bi_sector + split_sectors - alignment;
-	tmp = sector_div(tmp, granularity);
-
-	if (split_sectors > tmp)
-		split_sectors -= tmp;
-
-	return bio_split(bio, split_sectors, GFP_NOIO, bs);
-}
-
-static struct bio *blk_bio_write_same_split(struct request_queue *q,
-					    struct bio *bio,
-					    struct bio_set *bs)
-{
-	if (!q->limits.max_write_same_sectors)
-		return NULL;
-
-	if (bio_sectors(bio) <= q->limits.max_write_same_sectors)
-		return NULL;
-
-	return bio_split(bio, q->limits.max_write_same_sectors, GFP_NOIO, bs);
-}
-
 static struct bio *blk_bio_segment_split(struct request_queue *q,
 					 struct bio *bio,
 					 struct bio_set *bs)
@@ -129,10 +76,8 @@  void blk_queue_split(struct request_queue *q, struct bio **bio,
 {
 	struct bio *split;
 
-	if ((*bio)->bi_rw & REQ_DISCARD)
-		split = blk_bio_discard_split(q, *bio, bs);
-	else if ((*bio)->bi_rw & REQ_WRITE_SAME)
-		split = blk_bio_write_same_split(q, *bio, bs);
+	if ((*bio)->bi_rw & REQ_DISCARD || (*bio)->bi_rw & REQ_WRITE_SAME)
+		split = NULL;
 	else
 		split = blk_bio_segment_split(q, *bio, q->bio_split);