diff mbox series

block: fix partial zone append completion handling in req_bio_endio()

Message ID 20240110051559.223436-1-dlemoal@kernel.org (mailing list archive)
State New, archived
Headers show
Series block: fix partial zone append completion handling in req_bio_endio() | expand

Commit Message

Damien Le Moal Jan. 10, 2024, 5:15 a.m. UTC
Partial completions of zone append request is not allowed but if a zone
append completion indicates a number of completed bytes different from
the original BIO size, only the BIO status is set to error. This leads
to bio_advance() not setting the BIO size to 0 and thus to not call
bio_endio() at the end of req_bio_endio().

Make sure a partially completed zone append is failed and completed
immediately by forcing the completed number of bytes (nbytes) to be
equal to the BIO sizei, thus ensuring that bio_endio() is called.

Fixes: 297db731847e ("block: fix req_bio_endio append error handling")
Cc: stable@kernel.vger.org
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
---
 block/blk-mq.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

Comments

Christoph Hellwig Jan. 10, 2024, 6:08 a.m. UTC | #1
On Wed, Jan 10, 2024 at 02:15:59PM +0900, Damien Le Moal wrote:
> Partial completions of zone append request is not allowed but if a zone
> append completion indicates a number of completed bytes different from
> the original BIO size, only the BIO status is set to error. This leads
> to bio_advance() not setting the BIO size to 0 and thus to not call
> bio_endio() at the end of req_bio_endio().
> 
> Make sure a partially completed zone append is failed and completed
> immediately by forcing the completed number of bytes (nbytes) to be
> equal to the BIO sizei, thus ensuring that bio_endio() is called.

This really is a should not ever happen case.  But if it does happen
anyway for some reason, this is the right way to deal with it, so:

Reviewed-by: Christoph Hellwig <hch@lst.de>
Damien Le Moal Jan. 10, 2024, 6:18 a.m. UTC | #2
On 1/10/24 15:08, Christoph Hellwig wrote:
> On Wed, Jan 10, 2024 at 02:15:59PM +0900, Damien Le Moal wrote:
>> Partial completions of zone append request is not allowed but if a zone
>> append completion indicates a number of completed bytes different from
>> the original BIO size, only the BIO status is set to error. This leads
>> to bio_advance() not setting the BIO size to 0 and thus to not call
>> bio_endio() at the end of req_bio_endio().
>>
>> Make sure a partially completed zone append is failed and completed
>> immediately by forcing the completed number of bytes (nbytes) to be
>> equal to the BIO sizei, thus ensuring that bio_endio() is called.
> 
> This really is a should not ever happen case.  But if it does happen
> anyway for some reason, this is the right way to deal with it, so:

Yes, that is likely impossible for NVMe ZNS. But that is definitely in the realm
of the possible for SMR drives with zone append emulation using regular writes.
ATA drives will never partially complete, but SAS drives may do so.

> 
> Reviewed-by: Christoph Hellwig <hch@lst.de>

Thanks.
Johannes Thumshirn Jan. 10, 2024, 8:22 a.m. UTC | #3
On 10.01.24 06:16, Damien Le Moal wrote:
> equal to the BIO sizei, thus ensuring that bio_endio() is called.
         typo: size ~^

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Damien Le Moal Jan. 10, 2024, 9:29 a.m. UTC | #4
On 1/10/24 17:22, Johannes Thumshirn wrote:
> On 10.01.24 06:16, Damien Le Moal wrote:
>> equal to the BIO sizei, thus ensuring that bio_endio() is called.
>          typo: size ~^

Oops... Sent a fix. Thanks.

> 
> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
diff mbox series

Patch

diff --git a/block/blk-mq.c b/block/blk-mq.c
index c11c97afa0bc..cd59b172c8fc 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -772,11 +772,16 @@  static void req_bio_endio(struct request *rq, struct bio *bio,
 		/*
 		 * Partial zone append completions cannot be supported as the
 		 * BIO fragments may end up not being written sequentially.
+		 * For such case, force the completed nbytes to be equal to
+		 * the BIO size so that bio_advance() sets the BIO remaining
+		 * size to 0 and we end up calling bio_endio() before returning.
 		 */
-		if (bio->bi_iter.bi_size != nbytes)
+		if (bio->bi_iter.bi_size != nbytes) {
 			bio->bi_status = BLK_STS_IOERR;
-		else
+			nbytes = bio->bi_iter.bi_size;
+		} else {
 			bio->bi_iter.bi_sector = rq->__sector;
+		}
 	}
 
 	bio_advance(bio, nbytes);