Message ID | 1438412290.26596.14.camel@hasee (mailing list archive)
---|---
State | Accepted, archived
Delegated to: | Mike Snitzer
On Sat, Aug 01 2015 at 2:58am -0400,
Ming Lin <mlin@kernel.org> wrote:

> On Fri, 2015-07-31 at 17:38 -0400, Mike Snitzer wrote:
> >
> > OK, once setup, to run the 2 tests in question directly you'd do
> > something like:
> >
> > dmtest run --suite thin-provisioning -n discard_a_fragmented_device
> >
> > dmtest run --suite thin-provisioning -n discard_fully_provisioned_device_benchmark
> >
> > Again, these tests pass without this patchset.
>
> It's caused by patch 4.
> When discard size >= 4G, the bio->bi_iter.bi_size overflows.

Thanks for tracking this down!

> Below is the new patch.
>
> Christoph,
> Could you also help to review it?
>
> Now we still do the "misaligned" check in blkdev_issue_discard(),
> so the same code in blk_bio_discard_split() was removed.

But I don't agree with this approach.

One of the most meaningful benefits of late bio splitting is that the
upper layers shouldn't _need_ to depend on the intermediate devices'
queue_limits being stacked properly.  Your solution mixes discard
granularity/alignment checks into the upper layer(s) but then splits
based on max_discard_sectors at the lower layer, which defeats that
benefit for discards.

This will translate to all intermediate layers that might split
discards needing to worry about granularity/alignment too (e.g. how
dm-thinp will have to care because it must generate discard mappings
with associated bios based on how blocks were mapped to thinp).

Also, it is unfortunate that IO that doesn't have a payload is being
artificially split simply because bio->bi_iter.bi_size is 32 bits.

Mike
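To make the failure mode concrete: bi_size in the bio's bvec_iter is an
unsigned 32-bit byte count, so a single discard of 2^32 bytes (4 GiB)
wraps it to zero.  Below is a minimal user-space sketch of that
truncation; the struct is a stand-in written for illustration, not the
kernel's struct bvec_iter definition.

#include <stdint.h>
#include <stdio.h>

/* Stand-in for the kernel's bvec_iter: bi_size is a 32-bit byte count. */
struct bvec_iter_demo {
	uint64_t bi_sector;	/* device address in 512-byte sectors */
	uint32_t bi_size;	/* residual I/O count in bytes: only 32 bits */
};

int main(void)
{
	struct bvec_iter_demo iter;
	uint64_t discard_bytes = 4ULL << 30;	/* a 4 GiB discard */

	iter.bi_size = (uint32_t)discard_bytes;	/* 2^32 truncates to 0 */
	printf("requested %llu bytes, bi_size holds %u\n",
	       (unsigned long long)discard_bytes, iter.bi_size);
	return 0;
}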
>>>>> "Mike" == Mike Snitzer <snitzer@redhat.com> writes:
Mike> This will translate to all intermediate layers that might split
Mike> discards needing to worry about granularity/alignment too
Mike> (e.g. how dm-thinp will have to care because it must generate
Mike> discard mappings with associated bios based on how blocks were
Mike> mapped to thinp).
The fundamental issue here is that alignment and granularity should
never, ever have been enforced at the top of the stack. Horrendous idea
from the very beginning.
For the < handful of braindead devices that get confused when you do
partial or misaligned blocks, we should have had a quirk that did any
range adjusting at the bottom, in sd_setup_discard_cmnd().
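A rough sketch of the kind of bottom-of-stack quirk being described,
with the caveat that the helper name and signature are invented for
illustration and this is not the actual sd_setup_discard_cmnd() code:
shrink a misaligned range to the largest enclosed aligned region before
building the command, and skip the command if nothing aligned remains.

typedef unsigned long long sector_t;

/*
 * Hypothetical helper: trim [*sector, *sector + *nr_sects) so it starts
 * and ends on a discard_granularity boundary (offset by the alignment).
 * Assumes *sector >= alignment; granularity and alignment in sectors.
 */
int sd_adjust_discard_range(sector_t *sector, sector_t *nr_sects,
			    unsigned int granularity, unsigned int alignment)
{
	sector_t start = *sector;
	sector_t end = start + *nr_sects;

	/* Round the start up to the next aligned boundary... */
	sector_t aligned_start = start +
		(granularity - (start - alignment) % granularity) % granularity;
	/* ...and the end down, so only whole blocks are discarded. */
	sector_t aligned_end = end - (end - alignment) % granularity;

	if (aligned_end <= aligned_start)
		return -1;	/* nothing whole to discard; skip the command */

	*sector = aligned_start;
	*nr_sects = aligned_end - aligned_start;
	return 0;
}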
There's a reason I turned discard_zeroes_data off for UNMAP!
Wrt. the range size, I don't have a problem with capping at the 32-bit
bi_size limit. We probably don't want to send commands much bigger than
that anyway.
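For concreteness (arithmetic added here, not in the original message):
with 512-byte sectors, UINT_MAX >> 9 = 8388607 sectors, and
8388607 * 512 = 4294966784 bytes, i.e. 512 bytes shy of 4 GiB, the
largest whole-sector byte count that still fits in the 32-bit bi_size
field.  That is exactly the cap the patch below applies to req_sects:

#include <limits.h>
#include <stdio.h>

int main(void)
{
	/* The same cap the patch applies: req_sects <= UINT_MAX >> 9. */
	unsigned int max_sectors = UINT_MAX >> 9;	/* 8388607 sectors */
	unsigned long long max_bytes = (unsigned long long)max_sectors << 9;

	printf("cap: %u sectors = %llu bytes (UINT_MAX = %u)\n",
	       max_sectors, max_bytes, UINT_MAX);	/* 4294966784 < 2^32 */
	return 0;
}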
diff --git a/block/blk-lib.c b/block/blk-lib.c
index 7688ee3..b9e2fca 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -43,7 +43,7 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
 	DECLARE_COMPLETION_ONSTACK(wait);
 	struct request_queue *q = bdev_get_queue(bdev);
 	int type = REQ_WRITE | REQ_DISCARD;
-	unsigned int max_discard_sectors, granularity;
+	unsigned int granularity;
 	int alignment;
 	struct bio_batch bb;
 	struct bio *bio;
@@ -60,17 +60,6 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
 	granularity = max(q->limits.discard_granularity >> 9, 1U);
 	alignment = (bdev_discard_alignment(bdev) >> 9) % granularity;
 
-	/*
-	 * Ensure that max_discard_sectors is of the proper
-	 * granularity, so that requests stay aligned after a split.
-	 */
-	max_discard_sectors = min(q->limits.max_discard_sectors, UINT_MAX >> 9);
-	max_discard_sectors -= max_discard_sectors % granularity;
-	if (unlikely(!max_discard_sectors)) {
-		/* Avoid infinite loop below. Being cautious never hurts. */
-		return -EOPNOTSUPP;
-	}
-
 	if (flags & BLKDEV_DISCARD_SECURE) {
 		if (!blk_queue_secdiscard(q))
 			return -EOPNOTSUPP;
@@ -92,7 +81,8 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
 			break;
 		}
 
-		req_sects = min_t(sector_t, nr_sects, max_discard_sectors);
+		/* Make sure bi_size doesn't overflow */
+		req_sects = min_t(sector_t, nr_sects, UINT_MAX >> 9);
 
 		/*
 		 * If splitting a request, and the next starting sector would be