diff mbox

[5/5] block: Make blkdev_issue_discard() submit aligned discard requests

Message ID 570D5D50.6000408@sandisk.com (mailing list archive)
State New, archived
Headers show

Commit Message

Bart Van Assche April 12, 2016, 8:40 p.m. UTC
Split discard requests as follows:
* If the start sector is not aligned, an initial write request up
  to the first aligned sector.
* A discard request from the first aligned sector in the range up
  to the last aligned sector in the discarded range.
* If the end sector is not aligned, a final write request from the
  last aligned sector up to the end.

Note: if the start and/or end sectors are not aligned and if the
range is small enough the discard request will be submitted with
bi_size == 0.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Mike Snitzer <snitzer@redhat.com>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Dmitry Monakhov <dmonakhov@openvz.org>
Cc: Darrick J. Wong <darrick.wong@oracle.com>
Cc: Sagi Grimberg <sagi@grimberg.me>
---
 block/blk-lib.c   |  4 ++--
 block/blk-merge.c | 55 ++++++++++++++++++++++++++++++-------------------------
 block/blk.h       |  3 +++
 3 files changed, 35 insertions(+), 27 deletions(-)

Comments

Hannes Reinecke April 13, 2016, 2:23 p.m. UTC | #1
On 04/12/2016 10:40 PM, Bart Van Assche wrote:
> Split discard requests as follows:
> * If the start sector is not aligned, an initial write request up
>   to the first aligned sector.
> * A discard request from the first aligned sector in the range up
>   to the last aligned sector in the discarded range.
> * If the end sector is not aligned, a final write request from the
>   last aligned sector up to the end.
> 
> Note: if the start and/or end sectors are not aligned and if the
> range is small enough the discard request will be submitted with
> bi_size == 0.
> 
> Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
> Cc: Jan Kara <jack@suse.cz>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Mike Snitzer <snitzer@redhat.com>
> Cc: Martin K. Petersen <martin.petersen@oracle.com>
> Cc: Dmitry Monakhov <dmonakhov@openvz.org>
> Cc: Darrick J. Wong <darrick.wong@oracle.com>
> Cc: Sagi Grimberg <sagi@grimberg.me>
> ---
>  block/blk-lib.c   |  4 ++--
>  block/blk-merge.c | 55 ++++++++++++++++++++++++++++++-------------------------
>  block/blk.h       |  3 +++
>  3 files changed, 35 insertions(+), 27 deletions(-)
> 
Well.
I do understand the intent (and, in fact, I need a similar thing for
SMR :-), but I'm not sure if the implementation is correct.

From my understanding, 'discard' is telling the device to there are
no outstanding users, and the device may be re-arranging these
blocks as needed.
And the 'discard_zeroes_data' is just a hint that the blocks will be
zeroed while/after being discarded.

WRITE SAME approaches it from the other side, blanking out blocks
and optionally discards them.

So I wonder if we should plumb this into blkdev_issue_zeroout(), not
blk_issue_discard().

Or maybe both ...

Cheers,

Hannes
Bart Van Assche April 13, 2016, 3:22 p.m. UTC | #2
On 04/13/2016 07:23 AM, Hannes Reinecke wrote:
> So I wonder if we should plumb this into blkdev_issue_zeroout(), not
> blk_issue_discard().

That's an excellent question. Let's take a step back and look at the 
functionality that is exposed to user space. What should the behavior of 
the BLKDISCARD and BLKSECDISCARD ioctls be if the start and/or end 
sectors are not aligned? My patch ensures that these ioctls do something 
meaningful if the start and/or end sector are not aligned on a discard 
boundary. Is that the behavior we want or should rather we make these 
ioctls fail if the start and/or end sectors are not aligned?

Bart.

--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Hannes Reinecke April 13, 2016, 3:25 p.m. UTC | #3
On 04/13/2016 05:22 PM, Bart Van Assche wrote:
> On 04/13/2016 07:23 AM, Hannes Reinecke wrote:
>> So I wonder if we should plumb this into blkdev_issue_zeroout(), not
>> blk_issue_discard().
>
> That's an excellent question. Let's take a step back and look at the
> functionality that is exposed to user space. What should the behavior of
> the BLKDISCARD and BLKSECDISCARD ioctls be if the start and/or end
> sectors are not aligned? My patch ensures that these ioctls do something
> meaningful if the start and/or end sector are not aligned on a discard
> boundary. Is that the behavior we want or should rather we make these
> ioctls fail if the start and/or end sectors are not aligned?
>
Guess what, I've run into the same issue as I've tried to adapt 
blkdev_issue_discard for SMR drives.
There are always some areas on SMR drives which do not necessarily 
expose a discard functionality, so the same problem applies to that, too.

Lightning talk as LSF?

Cheers,

Hannes
Christoph Hellwig April 13, 2016, 3:32 p.m. UTC | #4
On Wed, Apr 13, 2016 at 08:22:12AM -0700, Bart Van Assche wrote:
> That's an excellent question. Let's take a step back and look at the 
> functionality that is exposed to user space. What should the behavior of 
> the BLKDISCARD and BLKSECDISCARD ioctls be if the start and/or end sectors 
> are not aligned? My patch ensures that these ioctls do something meaningful 
> if the start and/or end sector are not aligned on a discard boundary. Is 
> that the behavior we want or should rather we make these ioctls fail if the 
> start and/or end sectors are not aligned?

BLKDISCARD should just skip these ranges.  You'll need to use BLKZEROOUT
or the falloc support from Darrick if you actually want zeroes.

BLKSECDISCARD implies it should fully erase data by the name, for
which zeroing might actually not be enough.  I'd be tempted to fail it,
but would like to see input from the people who created it and/or are
using it.
--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/block/blk-lib.c b/block/blk-lib.c
index 9a93ca4..d78ded5 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -30,7 +30,7 @@  static void bio_batch_end_io(struct bio *bio)
  * Return the largest number that is less than or equal to @s and for which
  * the remainder of the division by @granularity is @alignment.
  */
-static sector_t blk_round_sect_down(sector_t s, u32 granularity, u32 alignment)
+sector_t blk_round_sect_down(sector_t s, u32 granularity, u32 alignment)
 {
 	sector_t tmp = s, res = s;
 	u32 remainder;
@@ -219,7 +219,7 @@  int blkdev_issue_write_same(struct block_device *bdev, sector_t sector,
 }
 EXPORT_SYMBOL(blkdev_issue_write_same);
 
-static void bio_add_zero_pages(struct bio *bio, sector_t nr_sects)
+void bio_add_zero_pages(struct bio *bio, sector_t nr_sects)
 {
 	unsigned int sz;
 	int ret;
diff --git a/block/blk-merge.c b/block/blk-merge.c
index 2613531..fd15606 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -16,42 +16,47 @@  static struct bio *blk_bio_discard_split(struct request_queue *q,
 					 struct bio_set *bs,
 					 unsigned *nsegs)
 {
+	struct bio *wr;
 	unsigned int max_discard_sectors, granularity;
 	int alignment;
-	sector_t tmp;
-	unsigned split_sectors;
+	sector_t start, start_r, end, end_r, skip;
 
 	*nsegs = 1;
 
 	/* Zero-sector (unknown) and one-sector granularities are the same.  */
 	granularity = max(q->limits.discard_granularity >> 9, 1U);
-
+	alignment = (q->limits.discard_alignment >> 9) % granularity;
 	max_discard_sectors = min(q->limits.max_discard_sectors, UINT_MAX >> 9);
-	max_discard_sectors -= max_discard_sectors % granularity;
-
-	if (unlikely(!max_discard_sectors)) {
-		/* XXX: warn */
-		return NULL;
-	}
-
-	if (bio_sectors(bio) <= max_discard_sectors)
-		return NULL;
-
-	split_sectors = max_discard_sectors;
+	WARN_ON_ONCE(max_discard_sectors == 0);
 
 	/*
-	 * If the next starting sector would be misaligned, stop the discard at
-	 * the previous aligned sector.
+	 * If the start or end sector are misaligned, issue a write same
+	 * same request if the discard_zeroes_data flag has been set.
 	 */
-	alignment = (q->limits.discard_alignment >> 9) % granularity;
-
-	tmp = bio->bi_iter.bi_sector + split_sectors - alignment;
-	tmp = sector_div(tmp, granularity);
-
-	if (split_sectors > tmp)
-		split_sectors -= tmp;
-
-	return bio_split(bio, split_sectors, GFP_NOIO, bs);
+	start = bio->bi_iter.bi_sector;
+	start_r = blk_round_sect_down(start, granularity, alignment);
+	end = start + min(max_discard_sectors, bio_sectors(bio));
+	end_r = blk_round_sect_down(end, granularity, alignment);
+	if (start == start_r && start < end_r) {
+		if (end == end_r && bio_sectors(bio) == end_r - start)
+			return NULL;
+		return bio_split(bio, end_r - start, GFP_NOIO, bs);
+	}
+	if (q->limits.discard_zeroes_data && start < end) {
+		end = min(end, start_r + granularity);
+		wr = bio_alloc_bioset(GFP_NOIO, end - start, bs);
+		if (WARN_ON_ONCE(!wr))
+			return NULL;
+		wr->bi_rw = REQ_WRITE;
+		wr->bi_iter.bi_sector = start;
+		wr->bi_bdev = bio->bi_bdev;
+		bio_add_zero_pages(wr, end - start);
+		bio_advance(bio, wr->bi_iter.bi_size);
+		return wr;
+	}
+	skip = (min(start_r + granularity, end) - start) << 9;
+	bio_advance(bio, skip);
+	return NULL;
 }
 
 static struct bio *blk_bio_write_same_split(struct request_queue *q,
diff --git a/block/blk.h b/block/blk.h
index 70e4aee..31b13f9 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -36,6 +36,9 @@  extern struct kmem_cache *request_cachep;
 extern struct kobj_type blk_queue_ktype;
 extern struct ida blk_queue_ida;
 
+sector_t blk_round_sect_down(sector_t s, u32 granularity, u32 alignment);
+void bio_add_zero_pages(struct bio *bio, sector_t nr_sects);
+
 static inline struct blk_flush_queue *blk_get_flush_queue(
 		struct request_queue *q, struct blk_mq_ctx *ctx)
 {