| Message ID | 1438990806.24452.8.camel@ssi (mailing list archive) |
|---|---|
| State | Accepted, archived |
| Delegated to: | Mike Snitzer |
On Fri, Aug 07, 2015 at 04:40:06PM -0700, Ming Lin wrote:
>
> On Fri, 2015-08-07 at 09:30 +0200, Christoph Hellwig wrote:
> > I'm for solution 3:
> >
> > - keep blk_bio_{discard,write_same}_split, but ensure we never build
> >   a > 4GB bio in blkdev_issue_{discard,write_same}.
>
> This has the problem I mentioned in solution 1.
> We also need to make sure the max discard size is of the proper granularity.
> See the example below.
>
> 4G: 8388608 sectors
> UINT_MAX: 8388607 sectors
>
> dm-thinp block size = default discard granularity = 128 sectors
>
> blkdev_issue_discard(sector=0, nr_sectors=8388608)
>
> 1. Only ensure bi_size does not overflow
>
> It doesn't work.
>
> [start_sector, end_sector]
> [0, 8388607]
>   [0, 8388606], then dm-thinp splits it into 2 bios
>     [0, 8388479]
>     [8388480, 8388606] ---> this has a problem in process_discard_bio(),
>       because the discard size (127 sectors) covers less than a block (128 sectors)
>   [8388607, 8388607] ---> same problem
>
> 2. Ensure bi_size does not overflow and the max discard size is of the proper granularity

Ideally we'd get upper layers out of the business of knowing about the queue
limits at all - that was the point of the patch series, after all.

Instead of using UINT_MAX, would it work to just make the max 1 << 31 sectors?
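The quoted example is easy to reproduce. Below is a small standalone C sketch (userspace, not kernel code; the constants are taken from the example above) showing why capping only at UINT_MAX >> 9 sectors leaves a misaligned tail, while also rounding the cap down to the granularity does not:

```c
#include <stdio.h>
#include <limits.h>

int main(void)
{
	unsigned long long nr_sects = 8388608;	/* 4G in 512-byte sectors */
	unsigned int max = UINT_MAX >> 9;	/* 8388607: the bi_size cap */
	unsigned int granularity = 128;		/* dm-thinp block size */

	/* Case 1: cap only by bi_size -> a 127-sector runt is left over
	 * once dm-thinp splits on its 128-sector block boundary. */
	printf("case 1: first bio [0, %u], runt of %u sectors\n",
	       max - 1, max % granularity);

	/* Case 2: round the cap down to the granularity as well. */
	unsigned int aligned = max - max % granularity;	/* 8388480 */
	printf("case 2: bios [0, %u] and [%u, %llu]\n",
	       aligned - 1, aligned, nr_sects - 1);
	return 0;
}
```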
On Fri, Aug 7, 2015 at 5:30 PM, Kent Overstreet <kent.overstreet@gmail.com> wrote:
> On Fri, Aug 07, 2015 at 04:40:06PM -0700, Ming Lin wrote:
>>
>> On Fri, 2015-08-07 at 09:30 +0200, Christoph Hellwig wrote:
>> > I'm for solution 3:
>> >
>> > - keep blk_bio_{discard,write_same}_split, but ensure we never build
>> >   a > 4GB bio in blkdev_issue_{discard,write_same}.
>>
>> This has the problem I mentioned in solution 1.
>> We also need to make sure the max discard size is of the proper granularity.
>> See the example below.
>>
>> 4G: 8388608 sectors
>> UINT_MAX: 8388607 sectors
>>
>> dm-thinp block size = default discard granularity = 128 sectors
>>
>> blkdev_issue_discard(sector=0, nr_sectors=8388608)
>>
>> 1. Only ensure bi_size does not overflow
>>
>> It doesn't work.
>>
>> [start_sector, end_sector]
>> [0, 8388607]
>>   [0, 8388606], then dm-thinp splits it into 2 bios
>>     [0, 8388479]
>>     [8388480, 8388606] ---> this has a problem in process_discard_bio(),
>>       because the discard size (127 sectors) covers less than a block (128 sectors)
>>   [8388607, 8388607] ---> same problem
>>
>> 2. Ensure bi_size does not overflow and the max discard size is of the proper granularity
>
> Ideally we'd get upper layers out of the business of knowing about the queue
> limits at all - that was the point of the patch series, after all.
>
> Instead of using UINT_MAX, would it work to just make the max 1 << 31 sectors?

1 << 31 = 2G bytes = 0x400000 sectors.

Yes, that works as long as it's a multiple of the granularity.
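For readers checking the numbers: the arithmetic above treats 1 << 31 as a byte count. A trivial standalone sketch (not kernel code) to verify it:

```c
#include <assert.h>

int main(void)
{
	unsigned long long max_bytes = 1ULL << 31;	/* 2G bytes */
	unsigned long long max_sects = max_bytes >> 9;	/* in 512-byte sectors */

	assert(max_sects == 0x400000);
	/* A power-of-two sector count is automatically a multiple of any
	 * smaller power-of-two granularity, e.g. dm-thinp's 128 sectors. */
	assert(max_sects % 128 == 0);
	return 0;
}
```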
On Fri, Aug 07, 2015 at 10:17:43PM -0700, Ming Lin wrote:
> On Fri, Aug 7, 2015 at 5:30 PM, Kent Overstreet
> > Ideally we'd get upper layers out of the business of knowing about the queue
> > limits at all - that was the point of the patch series, after all.
> >
> > Instead of using UINT_MAX, would it work to just make the max 1 << 31 sectors?
>
> 1 << 31 = 2G bytes = 0x400000 sectors.
>
> Yes, that works as long as it's a multiple of the granularity.

Is granularity required to be a power of two? One would hope, but looking at
the code that doesn't appear to be a requirement... ugh, that's terrible...
On 08/08/2015 01:40 AM, Ming Lin wrote:
>
> On Fri, 2015-08-07 at 09:30 +0200, Christoph Hellwig wrote:
>> I'm for solution 3:
>>
>> - keep blk_bio_{discard,write_same}_split, but ensure we never build
>>   a > 4GB bio in blkdev_issue_{discard,write_same}.
>
> This has the problem I mentioned in solution 1.
> We also need to make sure the max discard size is of the proper granularity.
> See the example below.
>
> 4G: 8388608 sectors
> UINT_MAX: 8388607 sectors
>
> dm-thinp block size = default discard granularity = 128 sectors
>
> blkdev_issue_discard(sector=0, nr_sectors=8388608)
>
> 1. Only ensure bi_size does not overflow
>
> It doesn't work.
>
> [start_sector, end_sector]
> [0, 8388607]
>   [0, 8388606], then dm-thinp splits it into 2 bios
>     [0, 8388479]
>     [8388480, 8388606] ---> this has a problem in process_discard_bio(),
>       because the discard size (127 sectors) covers less than a block (128 sectors)
>   [8388607, 8388607] ---> same problem
>
> 2. Ensure bi_size does not overflow and the max discard size is of the proper granularity
>
> It works.
>
> [start_sector, end_sector]
> [0, 8388607]
>   [0, 8388479]
>   [8388480, 8388607]
>
> So how about the patch below?
>
> commit 1ca2ad977255efb3c339f4ca16fb798ed5ec54f7
> Author: Ming Lin <ming.l@ssi.samsung.com>
> Date:   Fri Aug 7 15:07:07 2015 -0700
>
>     block: remove split code in blkdev_issue_{discard,write_same}
>
>     The split code in blkdev_issue_{discard,write_same} can go away
>     now that any driver that cares does the split. We have to make
>     sure bio size doesn't overflow.
>
>     For discard, we ensure max_discard_sectors is of the proper
>     granularity. So if the discard size is > 4G, blkdev_issue_discard()
>     always sends granularity-aligned requests to the lower level, except
>     that the last one may not be aligned.
>
>     Signed-off-by: Ming Lin <ming.l@ssi.samsung.com>
> ---
>  block/blk-lib.c | 37 +++++++++----------------------------
>  1 file changed, 9 insertions(+), 28 deletions(-)
>
> diff --git a/block/blk-lib.c b/block/blk-lib.c
> index 7688ee3..e178a07 100644
> --- a/block/blk-lib.c
> +++ b/block/blk-lib.c
> @@ -44,7 +44,6 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
>  	struct request_queue *q = bdev_get_queue(bdev);
>  	int type = REQ_WRITE | REQ_DISCARD;
>  	unsigned int max_discard_sectors, granularity;
> -	int alignment;
>  	struct bio_batch bb;
>  	struct bio *bio;
>  	int ret = 0;
> @@ -58,18 +57,15 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
>  
>  	/* Zero-sector (unknown) and one-sector granularities are the same. */
>  	granularity = max(q->limits.discard_granularity >> 9, 1U);
> -	alignment = (bdev_discard_alignment(bdev) >> 9) % granularity;
>  
>  	/*
> -	 * Ensure that max_discard_sectors is of the proper
> -	 * granularity, so that requests stay aligned after a split.
> -	 */
> -	max_discard_sectors = min(q->limits.max_discard_sectors, UINT_MAX >> 9);
> +	 * Ensure that max_discard_sectors doesn't overflow bi_size and is of
> +	 * the proper granularity, so if the discard size is > 4G,
> +	 * blkdev_issue_discard() always sends granularity-aligned requests
> +	 * to the lower level, except that the last one may not be aligned.
> +	 */
> +	max_discard_sectors = UINT_MAX >> 9;
>  	max_discard_sectors -= max_discard_sectors % granularity;
> -	if (unlikely(!max_discard_sectors)) {
> -		/* Avoid infinite loop below. Being cautious never hurts. */
> -		return -EOPNOTSUPP;
> -	}
>  
>  	if (flags & BLKDEV_DISCARD_SECURE) {
>  		if (!blk_queue_secdiscard(q))
> @@ -84,7 +80,7 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
>  	blk_start_plug(&plug);
>  	while (nr_sects) {
>  		unsigned int req_sects;
> -		sector_t end_sect, tmp;
> +		sector_t end_sect;
>  
>  		bio = bio_alloc(gfp_mask, 1);
>  		if (!bio) {
> @@ -93,20 +89,7 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
>  		}
>  
>  		req_sects = min_t(sector_t, nr_sects, max_discard_sectors);
> -
> -		/*
> -		 * If splitting a request, and the next starting sector would be
> -		 * misaligned, stop the discard at the previous aligned sector.
> -		 */
>  		end_sect = sector + req_sects;
> -		tmp = end_sect;
> -		if (req_sects < nr_sects &&
> -		    sector_div(tmp, granularity) != alignment) {
> -			end_sect = end_sect - alignment;
> -			sector_div(end_sect, granularity);
> -			end_sect = end_sect * granularity + alignment;
> -			req_sects = end_sect - sector;
> -		}
>  
>  		bio->bi_iter.bi_sector = sector;
>  		bio->bi_end_io = bio_batch_end_io;
> @@ -166,10 +149,8 @@ int blkdev_issue_write_same(struct block_device *bdev, sector_t sector,
>  	if (!q)
>  		return -ENXIO;
>  
> -	max_write_same_sectors = q->limits.max_write_same_sectors;
> -
> -	if (max_write_same_sectors == 0)
> -		return -EOPNOTSUPP;
> +	/* Ensure that max_write_same_sectors doesn't overflow bi_size */
> +	max_write_same_sectors = UINT_MAX >> 9;
>  
>  	atomic_set(&bb.done, 1);
>  	bb.flags = 1 << BIO_UPTODATE;

Wouldn't it be easier to move both max_write_same_sectors and
max_discard_sectors to 64 bit (i.e. to type sector_t) and be done with the
overflow?
Seems to me this is far too much coding around self-imposed restrictions...

Cheers,

Hannes
On Sat, Aug 08, 2015 at 10:52:24AM +0200, Hannes Reinecke wrote:
> Wouldn't it be easier to move both max_write_same_sectors and
> max_discard_sectors to 64 bit (i.e. to type sector_t) and be done with the
> overflow?
> Seems to me this is far too much coding around self-imposed restrictions...

It's bio->bi_iter.bi_size that would have to be increased to 64 bits. Which I
suppose wouldn't actually increase the size of struct bio (when sector_t is 64
bits), since struct bvec_iter has padding right now...
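For context, struct bvec_iter of that era looked roughly like the sketch below (layout reproduced from memory of ~v4.2 kernels, with sector_t typedef'd locally so the sketch compiles standalone; the size arithmetic assumes a 64-bit sector_t):

```c
typedef unsigned long long sector_t;	/* assuming 64-bit sector_t */

/* Payload is 8 + 4 + 4 + 4 = 20 bytes, padded to 24 for u64 alignment.
 * Widening bi_size to a u64 would give 8 + 8 + 4 + 4 = 24 bytes - the
 * same footprint, which is the padding being pointed at above. */
struct bvec_iter {
	sector_t	bi_sector;	/* device address in 512-byte sectors */
	unsigned int	bi_size;	/* residual I/O count, in bytes */
	unsigned int	bi_idx;		/* current index into the bvec array */
	unsigned int	bi_bvec_done;	/* bytes completed in current bvec */
};

_Static_assert(sizeof(struct bvec_iter) == 24, "padded to 24 bytes");
```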
On Fri, Aug 07, 2015 at 09:22:26PM -0800, Kent Overstreet wrote:
> Is granularity required to be a power of two? One would hope, but looking at
> the code that doesn't appear to be a requirement... ugh, that's terrible...

If devices have an odd granularity we'll just have to split more than
necessary. Let's hope no major hardware does that.
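To make the "split more than necessary" cost concrete, here is a hedged sketch comparing a power-of-two granularity with a made-up odd one (96 sectors is purely illustrative, not a known device):

```c
#include <stdio.h>

int main(void)
{
	unsigned int cap = 0x400000;		/* proposed 2G-byte cap, in sectors */
	unsigned int grans[] = { 128, 96 };	/* 96 is hypothetical */

	for (int i = 0; i < 2; i++) {
		unsigned int g = grans[i];
		/* Rounding the per-bio cap down to a non-power-of-two
		 * granularity leaves a remainder, so a large discard
		 * needs a few more bios than strictly necessary. */
		printf("granularity %3u: per-bio cap %u, sectors lost %u\n",
		       g, cap - cap % g, cap % g);
	}
	return 0;
}
```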
On 08/08/2015 11:02 AM, Kent Overstreet wrote:
> On Sat, Aug 08, 2015 at 10:52:24AM +0200, Hannes Reinecke wrote:
>> Wouldn't it be easier to move both max_write_same_sectors and
>> max_discard_sectors to 64 bit (i.e. to type sector_t) and be done with the
>> overflow?
>> Seems to me this is far too much coding around self-imposed restrictions...
>
> It's bio->bi_iter.bi_size that would have to be increased to 64 bits. Which I
> suppose wouldn't actually increase the size of struct bio (when sector_t is 64
> bits), since struct bvec_iter has padding right now...

Which I guess we should be doing anyway.
Devices are getting larger and larger, so in the long run bi_size needs to be
moved to 64 bits.

Cheers,

Hannes
diff --git a/block/blk-lib.c b/block/blk-lib.c
index 7688ee3..e178a07 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -44,7 +44,6 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
 	struct request_queue *q = bdev_get_queue(bdev);
 	int type = REQ_WRITE | REQ_DISCARD;
 	unsigned int max_discard_sectors, granularity;
-	int alignment;
 	struct bio_batch bb;
 	struct bio *bio;
 	int ret = 0;
@@ -58,18 +57,15 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
 
 	/* Zero-sector (unknown) and one-sector granularities are the same. */
 	granularity = max(q->limits.discard_granularity >> 9, 1U);
-	alignment = (bdev_discard_alignment(bdev) >> 9) % granularity;
 
 	/*
-	 * Ensure that max_discard_sectors is of the proper
-	 * granularity, so that requests stay aligned after a split.
-	 */
-	max_discard_sectors = min(q->limits.max_discard_sectors, UINT_MAX >> 9);
+	 * Ensure that max_discard_sectors doesn't overflow bi_size and is of
+	 * the proper granularity, so if the discard size is > 4G,
+	 * blkdev_issue_discard() always sends granularity-aligned requests
+	 * to the lower level, except that the last one may not be aligned.
+	 */
+	max_discard_sectors = UINT_MAX >> 9;
 	max_discard_sectors -= max_discard_sectors % granularity;
-	if (unlikely(!max_discard_sectors)) {
-		/* Avoid infinite loop below. Being cautious never hurts. */
-		return -EOPNOTSUPP;
-	}
 
 	if (flags & BLKDEV_DISCARD_SECURE) {
 		if (!blk_queue_secdiscard(q))
@@ -84,7 +80,7 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
 	blk_start_plug(&plug);
 	while (nr_sects) {
 		unsigned int req_sects;
-		sector_t end_sect, tmp;
+		sector_t end_sect;
 
 		bio = bio_alloc(gfp_mask, 1);
 		if (!bio) {
@@ -93,20 +89,7 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
 		}
 
 		req_sects = min_t(sector_t, nr_sects, max_discard_sectors);
-
-		/*
-		 * If splitting a request, and the next starting sector would be
-		 * misaligned, stop the discard at the previous aligned sector.
-		 */
 		end_sect = sector + req_sects;
-		tmp = end_sect;
-		if (req_sects < nr_sects &&
-		    sector_div(tmp, granularity) != alignment) {
-			end_sect = end_sect - alignment;
-			sector_div(end_sect, granularity);
-			end_sect = end_sect * granularity + alignment;
-			req_sects = end_sect - sector;
-		}
 
 		bio->bi_iter.bi_sector = sector;
 		bio->bi_end_io = bio_batch_end_io;
@@ -166,10 +149,8 @@ int blkdev_issue_write_same(struct block_device *bdev, sector_t sector,
 	if (!q)
 		return -ENXIO;
 
-	max_write_same_sectors = q->limits.max_write_same_sectors;
-
-	if (max_write_same_sectors == 0)
-		return -EOPNOTSUPP;
+	/* Ensure that max_write_same_sectors doesn't overflow bi_size */
+	max_write_same_sectors = UINT_MAX >> 9;
 
 	atomic_set(&bb.done, 1);
 	bb.flags = 1 << BIO_UPTODATE;
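As a sanity check on the patched loop, this minimal userspace rendering reproduces its splitting behaviour for the 4G example discussed above (same rounding rule as the patch; bio allocation and submission elided):

```c
#include <stdio.h>
#include <limits.h>

int main(void)
{
	unsigned long long sector = 0, nr_sects = 8388608;	/* 4G example */
	unsigned int granularity = 128;
	unsigned int max_discard_sectors = UINT_MAX >> 9;

	/* Round the per-bio cap down to the granularity, as the patch does. */
	max_discard_sectors -= max_discard_sectors % granularity;

	while (nr_sects) {
		unsigned int req_sects = nr_sects < max_discard_sectors ?
					 (unsigned int)nr_sects : max_discard_sectors;

		/* Prints [0, 8388479] then [8388480, 8388607]: every bio
		 * except possibly the last is granularity-aligned. */
		printf("bio: [%llu, %llu] (%u sectors)\n",
		       sector, sector + req_sects - 1, req_sects);
		sector += req_sects;
		nr_sects -= req_sects;
	}
	return 0;
}
```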