
[v3,30/30] block: Do not special-case plugging of zone write operations

Message ID 20240328004409.594888-31-dlemoal@kernel.org (mailing list archive)
State Superseded
Series Zone write plugging

Commit Message

Damien Le Moal March 28, 2024, 12:44 a.m. UTC
With the block layer zone write plugging being automatically done for
any write operation to a zone of a zoned block device, a regular request
plug handled through current->plug can only ever see at most a
single write request per zone. In such a case, any potential reordering
of the plugged requests will be harmless. We can thus remove the special
casing for write operations to zones and have these requests plugged as
well. This allows removing the function blk_mq_plug() and instead directly
using current->plug where needed.

Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
---
 block/blk-core.c       |  6 ------
 block/blk-merge.c      |  3 +--
 block/blk-mq.c         |  7 +------
 block/blk-mq.h         | 31 -------------------------------
 include/linux/blkdev.h | 12 ------------
 5 files changed, 2 insertions(+), 57 deletions(-)
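
For context before the review comments, here is a minimal sketch of the
submission ordering this commit relies on. This is not the actual kernel
code: blk_zone_plug_bio() is the zone write plug entry point added earlier
in this series, but its signature and the surrounding flow are simplified
here.

/*
 * Simplified sketch: zone write plugging runs before the regular
 * current->plug is consulted, so the plug can hold at most one
 * write request per zone at any time.
 */
static void submit_zoned_write_sketch(struct bio *bio)
{
	/*
	 * Zone write plugging (added earlier in this series): if a
	 * write to the same zone is already in flight, hold this BIO
	 * in the zone write plug instead of issuing it now.
	 */
	if (blk_zone_plug_bio(bio, 0))
		return;	/* BIO held in the zone write plug */

	/*
	 * Only one write per zone can reach this point, so adding the
	 * resulting request to current->plug cannot create two
	 * reorderable writes for the same zone.
	 */
	blk_mq_submit_bio(bio);
}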

Comments

Christoph Hellwig March 28, 2024, 4:54 a.m. UTC | #1
On Thu, Mar 28, 2024 at 09:44:09AM +0900, Damien Le Moal wrote:
> With the block layer zone write plugging being automatically done for
> any write operation to a zone of a zoned block device, a regular request
> plug handled through current->plug can only ever see at most a
> single write request per zone. In such a case, any potential reordering
> of the plugged requests will be harmless. We can thus remove the special
> casing for write operations to zones and have these requests plugged as
> well. This allows removing the function blk_mq_plug() and instead directly
> using current->plug where needed.

This looks good in general:

Reviewed-by: Christoph Hellwig <hch@lst.de>

But IIRC we recently had a report that plugs reorder I/Os, which would
be grave for the extent layout if we haven't fixed that yet, so we
should probably look into it first.
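
For reference, a minimal usage sketch of the per-task plugging API the
report refers to, using the mainline blk_start_plug()/blk_finish_plug()
helpers (the flush path is simplified, and bio_a/bio_b are placeholder
BIOs):

	struct blk_plug plug;

	blk_start_plug(&plug);	/* current->plug now points at &plug */
	submit_bio(bio_a);	/* requests batch up on the plug list */
	submit_bio(bio_b);	/* ... instead of dispatching one by one */
	blk_finish_plug(&plug);	/* flush the batch to the scheduler; per
				 * the report above, requests may not be
				 * inserted in submission order */
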
Damien Le Moal March 28, 2024, 6:43 a.m. UTC | #2
On 3/28/24 13:54, Christoph Hellwig wrote:
> On Thu, Mar 28, 2024 at 09:44:09AM +0900, Damien Le Moal wrote:
>> With the block layer zone write plugging being automatically done for
>> any write operation to a zone of a zoned block device, a regular request
>> plug handled through current->plug can only ever see at most a
>> single write request per zone. In such a case, any potential reordering
>> of the plugged requests will be harmless. We can thus remove the special
>> casing for write operations to zones and have these requests plugged as
>> well. This allows removing the function blk_mq_plug() and instead directly
>> using current->plug where needed.
> 
> This looks good in general:
> 
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> 
> But IIRC we recently had a report that plugs reorder I/Os, which would
> be grave for the extent layout if we haven't fixed that yet, so we
> should probably look into it first.

That is indeed not great, but irrelevant for zone writes as the regular BIO plug
comes after the zone write plugging. So the regular BIO plug can only see at most
one write request per zone. Even if that order changes, it will not result in
unaligned write errors like before. But the reordering may still be bad for
performance, especially on HDDs, so yes, we should definitely look into this.
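
To make that invariant concrete, a hypothetical two-write timeline (the
zone and request names are illustrative only):

/*
 * W1 and W2 are sequential writes to the same zone, submitted under a
 * regular plug.
 *
 *   submit W1: the zone write plug for that zone is empty -> W1
 *              proceeds and its request is added to current->plug.
 *   submit W2: W1 is still in flight for that zone -> W2 is held in
 *              the zone write plug and never reaches current->plug.
 *   W1 completes: the zone write plug releases W2, which is then
 *              submitted on its own.
 *
 * current->plug never holds W1 and W2 at the same time, so reordering
 * within the plug cannot produce an unaligned zone write.
 */
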
Christoph Hellwig March 28, 2024, 6:51 a.m. UTC | #3
On Thu, Mar 28, 2024 at 03:43:02PM +0900, Damien Le Moal wrote:
> That is indeed not great, but irrelevant for zone writes as the regular BIO plug
> comes after the zone write plugging. So the regular BIO plug can only see at most
> one write request per zone. Even if that order changes, it will not result in
> unaligned write errors like before. But the reordering may still be bad for
> performance, especially on HDDs, so yes, we should definitely look into this.

Irrelevant is not how I would frame it.  Yes, it will not affect
correctness.  But it will affect performance not just for the write
itself, but also in the long run as it affects the on-disk extent
layout.
Damien Le Moal March 28, 2024, 6:54 a.m. UTC | #4
On 3/28/24 15:51, Christoph Hellwig wrote:
> On Thu, Mar 28, 2024 at 03:43:02PM +0900, Damien Le Moal wrote:
>> That is indeed not great, but irrelevant for zone writes as the regular BIO plug
>> comes after the zone write plugging. So the regular BIO plug can only see at most
>> one write request per zone. Even if that order changes, it will not result in
>> unaligned write errors like before. But the reordering may still be bad for
>> performance, especially on HDDs, so yes, we should definitely look into this.
> 
> Irrelevant is not how I would frame it.  Yes, it will not affect
> correctness.  But it will affect performance not just for the write
> itself, but also in the long run as it affects the on-disk extent
> layout.

Agreed.
Bart Van Assche March 29, 2024, 6:58 p.m. UTC | #5
On 3/27/24 5:44 PM, Damien Le Moal wrote:
> With the block layer zone write plugging being automatically done for
> any write operation to a zone of a zoned block device, a regular request
> plug handled through current->plug can only ever see at most a
> single write request per zone. In such a case, any potential reordering
> of the plugged requests will be harmless. We can thus remove the special
> casing for write operations to zones and have these requests plugged as
> well. This allows removing the function blk_mq_plug() and instead directly
> using current->plug where needed.

Reviewed-by: Bart Van Assche <bvanassche@acm.org>

Patch

diff --git a/block/blk-core.c b/block/blk-core.c
index e1a5344c2257..47400a4fe851 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -907,12 +907,6 @@  int bio_poll(struct bio *bio, struct io_comp_batch *iob, unsigned int flags)
 	    !test_bit(QUEUE_FLAG_POLL, &q->queue_flags))
 		return 0;
 
-	/*
-	 * As the requests that require a zone lock are not plugged in the
-	 * first place, directly accessing the plug instead of using
-	 * blk_mq_plug() should not have any consequences during flushing for
-	 * zoned devices.
-	 */
 	blk_flush_plug(current->plug, false);
 
 	/*
diff --git a/block/blk-merge.c b/block/blk-merge.c
index b96466d2ba94..1a9a424212ee 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -1112,10 +1112,9 @@  static enum bio_merge_status blk_attempt_bio_merge(struct request_queue *q,
 bool blk_attempt_plug_merge(struct request_queue *q, struct bio *bio,
 		unsigned int nr_segs)
 {
-	struct blk_plug *plug;
+	struct blk_plug *plug = current->plug;
 	struct request *rq;
 
-	plug = blk_mq_plug(bio);
 	if (!plug || rq_list_empty(plug->mq_list))
 		return false;
 
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 770d636707dc..823ce64610e0 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1332,11 +1332,6 @@  void blk_execute_rq_nowait(struct request *rq, bool at_head)
 
 	blk_account_io_start(rq);
 
-	/*
-	 * As plugging can be enabled for passthrough requests on a zoned
-	 * device, directly accessing the plug instead of using blk_mq_plug()
-	 * should not have any consequences.
-	 */
 	if (current->plug && !at_head) {
 		blk_add_rq_to_plug(current->plug, rq);
 		return;
@@ -2924,7 +2919,7 @@  static void blk_mq_use_cached_rq(struct request *rq, struct blk_plug *plug,
 void blk_mq_submit_bio(struct bio *bio)
 {
 	struct request_queue *q = bdev_get_queue(bio->bi_bdev);
-	struct blk_plug *plug = blk_mq_plug(bio);
+	struct blk_plug *plug = current->plug;
 	const int is_sync = op_is_sync(bio->bi_opf);
 	struct blk_mq_hw_ctx *hctx;
 	unsigned int nr_segs = 1;
diff --git a/block/blk-mq.h b/block/blk-mq.h
index f75a9ecfebde..260beea8e332 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -365,37 +365,6 @@  static inline void blk_mq_clear_mq_map(struct blk_mq_queue_map *qmap)
 		qmap->mq_map[cpu] = 0;
 }
 
-/*
- * blk_mq_plug() - Get caller context plug
- * @bio : the bio being submitted by the caller context
- *
- * Plugging, by design, may delay the insertion of BIOs into the elevator in
- * order to increase BIO merging opportunities. This however can cause BIO
- * insertion order to change from the order in which submit_bio() is being
- * executed in the case of multiple contexts concurrently issuing BIOs to a
- * device, even if these context are synchronized to tightly control BIO issuing
- * order. While this is not a problem with regular block devices, this ordering
- * change can cause write BIO failures with zoned block devices as these
- * require sequential write patterns to zones. Prevent this from happening by
- * ignoring the plug state of a BIO issuing context if it is for a zoned block
- * device and the BIO to plug is a write operation.
- *
- * Return current->plug if the bio can be plugged and NULL otherwise
- */
-static inline struct blk_plug *blk_mq_plug( struct bio *bio)
-{
-	/* Zoned block device write operation case: do not plug the BIO */
-	if (IS_ENABLED(CONFIG_BLK_DEV_ZONED) &&
-	    bdev_op_is_zoned_write(bio->bi_bdev, bio_op(bio)))
-		return NULL;
-
-	/*
-	 * For regular block devices or read operations, use the context plug
-	 * which may be NULL if blk_start_plug() was not executed.
-	 */
-	return current->plug;
-}
-
 /* Free all requests on the list */
 static inline void blk_mq_free_requests(struct list_head *list)
 {
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index d2b8d7761269..022d78c5136f 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1301,18 +1301,6 @@  static inline unsigned int bdev_zone_no(struct block_device *bdev, sector_t sec)
 	return disk_zone_no(bdev->bd_disk, sec);
 }
 
-/* Whether write serialization is required for @op on zoned devices. */
-static inline bool op_needs_zoned_write_locking(enum req_op op)
-{
-	return op == REQ_OP_WRITE || op == REQ_OP_WRITE_ZEROES;
-}
-
-static inline bool bdev_op_is_zoned_write(struct block_device *bdev,
-					  enum req_op op)
-{
-	return bdev_is_zoned(bdev) && op_needs_zoned_write_locking(op);
-}
-
 static inline sector_t bdev_zone_sectors(struct block_device *bdev)
 {
 	struct request_queue *q = bdev_get_queue(bdev);