diff mbox series

[v3,04/30] block: Introduce blk_zone_update_request_bio()

Message ID 20240328004409.594888-5-dlemoal@kernel.org (mailing list archive)
State Superseded
Headers show
Series Zone write plugging | expand

Commit Message

Damien Le Moal March 28, 2024, 12:43 a.m. UTC
On completion of a zone append request, the request sector indicates the
location of the written data. This value must be returned to the user
through the BIO iter sector. This is done in 2 places: in
blk_complete_request() and in blk_update_request(). Introduce the inline
helper function blk_zone_update_request_bio() to avoid duplicating
this BIO update for zone append requests, and to compile out this
helper call when CONFIG_BLK_DEV_ZONED is not enabled.

Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
---
 block/blk-mq.c | 11 +++++------
 block/blk.h    | 19 ++++++++++++++++++-
 2 files changed, 23 insertions(+), 7 deletions(-)

Comments

Christoph Hellwig March 28, 2024, 4:14 a.m. UTC | #1
> -		if (req_op(req) == REQ_OP_ZONE_APPEND)
> -			bio->bi_iter.bi_sector = req->__sector;
> -
> -		if (!is_flush)
> +		if (!is_flush) {
> +			blk_zone_update_request_bio(req, bio);
>  			bio_endio(bio);
> +		}

As noted by Bart last time around, the blk_zone_update_request_bio
really should stay out of the !is_flush check, as otherwise we'd
break zone appends going through the flush state machine.

Otherwise this looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>
Damien Le Moal March 28, 2024, 5:20 a.m. UTC | #2
On 3/28/24 13:14, Christoph Hellwig wrote:
>> -		if (req_op(req) == REQ_OP_ZONE_APPEND)
>> -			bio->bi_iter.bi_sector = req->__sector;
>> -
>> -		if (!is_flush)
>> +		if (!is_flush) {
>> +			blk_zone_update_request_bio(req, bio);
>>  			bio_endio(bio);
>> +		}
> 
> As noted by Bart last time around, the blk_zone_update_request_bio
> really should stay out of the !is_flush check, as otherwise we'd
> break zone appends going through the flush state machine.

I do not think that is corect. Because is_flush indicates that RQF_FLUSH_SEQ is
set, that is, we are in the middle of a flush sequence. And flush sequence
progression is handled at the request level, not BIOs. Once the sequence
finishes, then and only then the BIO original endio should be done, meaning that
we will then take this path and actually do blk_zone_update_request_bio() and
bio_endio(). So I still think this is correct.

> 
> Otherwise this looks good:
> 
> Reviewed-by: Christoph Hellwig <hch@lst.de>
Christoph Hellwig March 28, 2024, 5:42 a.m. UTC | #3
On Thu, Mar 28, 2024 at 02:20:17PM +0900, Damien Le Moal wrote:
> I do not think that is corect. Because is_flush indicates that RQF_FLUSH_SEQ is
> set, that is, we are in the middle of a flush sequence. And flush sequence
> progression is handled at the request level, not BIOs. Once the sequence
> finishes, then and only then the BIO original endio should be done, meaning that
> we will then take this path and actually do blk_zone_update_request_bio() and
> bio_endio(). So I still think this is correct.

Well.

lk_flush_restore_request with the previous patch now restores rq->__sector,
and the blk_mq_end_request call following it will propagate it to the
original bio.  But blk_flush_restore_request grabs the sector from
rq->bio->bi_iter.bi_sector, and we need to actually get it there first,
which is done by the data I/O completion that has RQF_FLUSH_SEQ set.

I think we really need a good test case for zone append and FUA,
i.e. we need the append op for zonefs, which should exercise the
fua code if O_SYNC/O_DSYNC is set.
Damien Le Moal March 28, 2024, 5:54 a.m. UTC | #4
On 3/28/24 14:42, Christoph Hellwig wrote:
> On Thu, Mar 28, 2024 at 02:20:17PM +0900, Damien Le Moal wrote:
>> I do not think that is corect. Because is_flush indicates that RQF_FLUSH_SEQ is
>> set, that is, we are in the middle of a flush sequence. And flush sequence
>> progression is handled at the request level, not BIOs. Once the sequence
>> finishes, then and only then the BIO original endio should be done, meaning that
>> we will then take this path and actually do blk_zone_update_request_bio() and
>> bio_endio(). So I still think this is correct.
> 
> Well.
> 
> lk_flush_restore_request with the previous patch now restores rq->__sector,
> and the blk_mq_end_request call following it will propagate it to the
> original bio.  But blk_flush_restore_request grabs the sector from
> rq->bio->bi_iter.bi_sector, and we need to actually get it there first,
> which is done by the data I/O completion that has RQF_FLUSH_SEQ set.

Ah. Yes. There is no issue with the current code for regular writes, but we
would get the original sector and not the written sector in the case of zone
append. Will make the change.

> I think we really need a good test case for zone append and FUA,
> i.e. we need the append op for zonefs, which should exercise the
> fua code if O_SYNC/O_DSYNC is set.

Yep. There is currently no issuer of zone append + FUA. But once I get to add
that for zonefs and block dev files, we indeed will have good tstbed.

>
Bart Van Assche March 28, 2024, 9:31 p.m. UTC | #5
On 3/27/24 5:43 PM, Damien Le Moal wrote:
> On completion of a zone append request, the request sector indicates the
> location of the written data. This value must be returned to the user
> through the BIO iter sector. This is done in 2 places: in
> blk_complete_request() and in blk_update_request(). Introduce the inline
> helper function blk_zone_update_request_bio() to avoid duplicating
> this BIO update for zone append requests, and to compile out this
> helper call when CONFIG_BLK_DEV_ZONED is not enabled.

Reviewed-by: Bart Van Assche <bvanassche@acm.org>
diff mbox series

Patch

diff --git a/block/blk-mq.c b/block/blk-mq.c
index e55af6058cbf..70dfb4af65cf 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -820,11 +820,11 @@  static void blk_complete_request(struct request *req)
 		/* Completion has already been traced */
 		bio_clear_flag(bio, BIO_TRACE_COMPLETION);
 
-		if (req_op(req) == REQ_OP_ZONE_APPEND)
-			bio->bi_iter.bi_sector = req->__sector;
-
-		if (!is_flush)
+		if (!is_flush) {
+			blk_zone_update_request_bio(req, bio);
 			bio_endio(bio);
+		}
+
 		bio = next;
 	} while (bio);
 
@@ -923,8 +923,7 @@  bool blk_update_request(struct request *req, blk_status_t error,
 
 		/* Don't actually finish bio if it's part of flush sequence */
 		if (!bio->bi_iter.bi_size && !is_flush) {
-			if (req_op(req) == REQ_OP_ZONE_APPEND)
-				bio->bi_iter.bi_sector = req->__sector;
+			blk_zone_update_request_bio(req, bio);
 			bio_endio(bio);
 		}
 
diff --git a/block/blk.h b/block/blk.h
index 5cac4e29ae17..a12cde1d45de 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -409,12 +409,29 @@  static inline struct bio *blk_queue_bounce(struct bio *bio,
 
 #ifdef CONFIG_BLK_DEV_ZONED
 void disk_free_zone_bitmaps(struct gendisk *disk);
+static inline void blk_zone_update_request_bio(struct request *rq,
+					       struct bio *bio)
+{
+	/*
+	 * For zone append requests, the request sector indicates the location
+	 * at which the BIO data was written. Return this value to the BIO
+	 * issuer through the BIO iter sector.
+	 */
+	if (req_op(rq) == REQ_OP_ZONE_APPEND)
+		bio->bi_iter.bi_sector = rq->__sector;
+}
 int blkdev_report_zones_ioctl(struct block_device *bdev, unsigned int cmd,
 		unsigned long arg);
 int blkdev_zone_mgmt_ioctl(struct block_device *bdev, blk_mode_t mode,
 		unsigned int cmd, unsigned long arg);
 #else /* CONFIG_BLK_DEV_ZONED */
-static inline void disk_free_zone_bitmaps(struct gendisk *disk) {}
+static inline void disk_free_zone_bitmaps(struct gendisk *disk)
+{
+}
+static inline void blk_zone_update_request_bio(struct request *rq,
+					       struct bio *bio)
+{
+}
 static inline int blkdev_report_zones_ioctl(struct block_device *bdev,
 		unsigned int cmd, unsigned long arg)
 {