Message ID | 20220925185348.120964-2-p.raghav@samsung.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | enable plugging only for reads in zoned block devices | expand |
On 9/26/22 03:53, Pankaj Raghav wrote: > Modify blk_mq_plug() to allow plugging only for read operations in zoned > block devices as there are alternative IO paths in the linux block > layer which can end up doing a write via driver private requests in > sequential write zones. > > Signed-off-by: Pankaj Raghav <p.raghav@samsung.com> > --- > block/blk-mq.h | 7 ++++--- > 1 file changed, 4 insertions(+), 3 deletions(-) > > diff --git a/block/blk-mq.h b/block/blk-mq.h > index 8ca453ac243d..005343df64ca 100644 > --- a/block/blk-mq.h > +++ b/block/blk-mq.h > @@ -305,14 +305,15 @@ static inline void blk_mq_clear_mq_map(struct blk_mq_queue_map *qmap) > * change can cause write BIO failures with zoned block devices as these > * require sequential write patterns to zones. Prevent this from happening by > * ignoring the plug state of a BIO issuing context if it is for a zoned block > - * device and the BIO to plug is a write operation. > + * device and the BIO to plug is not a read operation. > + * > * > * Return current->plug if the bio can be plugged and NULL otherwise > */ > static inline struct blk_plug *blk_mq_plug( struct bio *bio) > { > - /* Zoned block device write operation case: do not plug the BIO */ > - if (bdev_is_zoned(bio->bi_bdev) && op_is_write(bio_op(bio))) > + /* Allow plugging only for read operations in zoned block devices */ nit: s/in/with > + if (bdev_is_zoned(bio->bi_bdev) && bio_op(bio) != REQ_OP_READ) > return NULL; > > /* Otherwise, looks good to me. Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
On Sun, Sep 25, 2022 at 08:53:46PM +0200, Pankaj Raghav wrote: > Modify blk_mq_plug() to allow plugging only for read operations in zoned > block devices as there are alternative IO paths in the linux block > layer which can end up doing a write via driver private requests in > sequential write zones. We should be able to plug for all operations that are not REQ_OP_ZONE_APPEND just fine. I also really can't parse your commit log at all, what alternative paths are you talking about here?
On 9/26/22 8:37 AM, Christoph Hellwig wrote: > On Sun, Sep 25, 2022 at 08:53:46PM +0200, Pankaj Raghav wrote: >> Modify blk_mq_plug() to allow plugging only for read operations in zoned >> block devices as there are alternative IO paths in the linux block >> layer which can end up doing a write via driver private requests in >> sequential write zones. > > We should be able to plug for all operations that are not > REQ_OP_ZONE_APPEND just fine. Agree, I think we just want to make this about someone doing a series of appends. If you mix-and-match with passthrough you will have a bad time anyway.
On Mon, Sep 26, 2022 at 08:40:54AM -0600, Jens Axboe wrote: > On 9/26/22 8:37 AM, Christoph Hellwig wrote: > > On Sun, Sep 25, 2022 at 08:53:46PM +0200, Pankaj Raghav wrote: > >> Modify blk_mq_plug() to allow plugging only for read operations in zoned > >> block devices as there are alternative IO paths in the linux block > >> layer which can end up doing a write via driver private requests in > >> sequential write zones. > > > > We should be able to plug for all operations that are not > > REQ_OP_ZONE_APPEND just fine. > > Agree, I think we just want to make this about someone doing a series > of appends. If you mix-and-match with passthrough you will have a bad > time anyway. Err, sorry - what I wrote about is compelte garbage. I initially wanted to say you can plug for REQ_OP_ZONE_APPEND just fine, and then realized that we also want various other ones that have the write bit set batched. So I suspect we really want to explicitly check for REQ_OP_WRITE here.
On 9/26/22 8:43 AM, Christoph Hellwig wrote: > On Mon, Sep 26, 2022 at 08:40:54AM -0600, Jens Axboe wrote: >> On 9/26/22 8:37 AM, Christoph Hellwig wrote: >>> On Sun, Sep 25, 2022 at 08:53:46PM +0200, Pankaj Raghav wrote: >>>> Modify blk_mq_plug() to allow plugging only for read operations in zoned >>>> block devices as there are alternative IO paths in the linux block >>>> layer which can end up doing a write via driver private requests in >>>> sequential write zones. >>> >>> We should be able to plug for all operations that are not >>> REQ_OP_ZONE_APPEND just fine. >> >> Agree, I think we just want to make this about someone doing a series >> of appends. If you mix-and-match with passthrough you will have a bad >> time anyway. > > Err, sorry - what I wrote about is compelte garbage. I initially > wanted to say you can plug for REQ_OP_ZONE_APPEND just fine, and then > realized that we also want various other ones that have the write bit > set batched. So I suspect we really want to explicitly check for > REQ_OP_WRITE here. My memory was a bit hazy, since we have separate ops for the driver in/out, I think just checking for REQ_OP_WRITE is indeed the right choice. That's the single case we need to care about.
On 2022-09-26 18:32, Jens Axboe wrote: > On 9/26/22 8:43 AM, Christoph Hellwig wrote: >> On Mon, Sep 26, 2022 at 08:40:54AM -0600, Jens Axboe wrote: >>> On 9/26/22 8:37 AM, Christoph Hellwig wrote: >>>> On Sun, Sep 25, 2022 at 08:53:46PM +0200, Pankaj Raghav wrote: >>>>> Modify blk_mq_plug() to allow plugging only for read operations in zoned >>>>> block devices as there are alternative IO paths in the linux block >>>>> layer which can end up doing a write via driver private requests in >>>>> sequential write zones. >>>> >>>> We should be able to plug for all operations that are not >>>> REQ_OP_ZONE_APPEND just fine. >>> >>> Agree, I think we just want to make this about someone doing a series >>> of appends. If you mix-and-match with passthrough you will have a bad >>> time anyway. >> >> Err, sorry - what I wrote about is compelte garbage. I initially >> wanted to say you can plug for REQ_OP_ZONE_APPEND just fine, and then >> realized that we also want various other ones that have the write bit >> set batched. So I suspect we really want to explicitly check for >> REQ_OP_WRITE here. > > My memory was a bit hazy, since we have separate ops for the driver > in/out, I think just checking for REQ_OP_WRITE is indeed the right > choice. That's the single case we need to care about. > Ah. You are right. I missed it as well. There is even a comment from Christoph: * - if the least significant bit is set transfers are TO the device * - if the least significant bit is not set transfers are FROM the device I guess the second patch should be enough to apply plugging when applicable for uring_cmd based nvme passthrough requests.
On 9/26/22 1:20 PM, Pankaj Raghav wrote: > On 2022-09-26 18:32, Jens Axboe wrote: >> On 9/26/22 8:43 AM, Christoph Hellwig wrote: >>> On Mon, Sep 26, 2022 at 08:40:54AM -0600, Jens Axboe wrote: >>>> On 9/26/22 8:37 AM, Christoph Hellwig wrote: >>>>> On Sun, Sep 25, 2022 at 08:53:46PM +0200, Pankaj Raghav wrote: >>>>>> Modify blk_mq_plug() to allow plugging only for read operations in zoned >>>>>> block devices as there are alternative IO paths in the linux block >>>>>> layer which can end up doing a write via driver private requests in >>>>>> sequential write zones. >>>>> >>>>> We should be able to plug for all operations that are not >>>>> REQ_OP_ZONE_APPEND just fine. >>>> >>>> Agree, I think we just want to make this about someone doing a series >>>> of appends. If you mix-and-match with passthrough you will have a bad >>>> time anyway. >>> >>> Err, sorry - what I wrote about is compelte garbage. I initially >>> wanted to say you can plug for REQ_OP_ZONE_APPEND just fine, and then >>> realized that we also want various other ones that have the write bit >>> set batched. So I suspect we really want to explicitly check for >>> REQ_OP_WRITE here. >> >> My memory was a bit hazy, since we have separate ops for the driver >> in/out, I think just checking for REQ_OP_WRITE is indeed the right >> choice. That's the single case we need to care about. >> > Ah. You are right. I missed it as well. There is even a comment from > Christoph: > > * - if the least significant bit is set transfers are TO the device > * - if the least significant bit is not set transfers are FROM the device > > I guess the second patch should be enough to apply plugging when > applicable for uring_cmd based nvme passthrough requests. Do we even need the 2nd patch? If we're just doing passthrough for the blk_execute_nowait() API, then the condition should never trigger? If so, then it would be a cleanup just to ensure we're using a consistent API for getting the plug, which may be worthwhile to do separately for sure.
>> I guess the second patch should be enough to apply plugging when >> applicable for uring_cmd based nvme passthrough requests. > > Do we even need the 2nd patch? If we're just doing passthrough for the > blk_execute_nowait() API, then the condition should never trigger? I think this was the question I raised in your first version of the series. If we do a NVMe write using the passthrough interface, then we will be using REQ_OP_DRV_OUT op, which is: REQ_OP_DRV_OUT = (__force blk_opf_t)35, // write bit is set The condition in blk_mq_plug() will trigger as we only check if it is a _write_ (op & (__force blk_opf_t)1) to the device. Am I missing something? > If so, then it would be a cleanup just to ensure we're using a consistent > API for getting the plug, which may be worthwhile to do separately for > sure. > -- Pankaj
On 9/27/22 9:20 AM, Pankaj Raghav wrote: >>> I guess the second patch should be enough to apply plugging when >>> applicable for uring_cmd based nvme passthrough requests. >> >> Do we even need the 2nd patch? If we're just doing passthrough for the >> blk_execute_nowait() API, then the condition should never trigger? > > I think this was the question I raised in your first version of the series. > > If we do a NVMe write using the passthrough interface, then we will be > using REQ_OP_DRV_OUT op, which is: > > REQ_OP_DRV_OUT = (__force blk_opf_t)35, // write bit is set > > The condition in blk_mq_plug() will trigger as we only check if it is a > _write_ (op & (__force blk_opf_t)1) to the device. Am I missing something? Ah yes, good point. We used to have this notion of 'fs' request, don't think we do anymore. Because it really should just be: if (zoned && (op & REQ_OP_WRITE) && fs_request) return NULL; for that condition imho. I guess we could make it: if (zoned && (op & REQ_OP_WRITE) && !(op & REQ_OP_DRV_OUT)) return NULL;
On Tue, Sep 27, 2022 at 10:04:19AM -0600, Jens Axboe wrote: > Ah yes, good point. We used to have this notion of 'fs' request, don't > think we do anymore. Because it really should just be: A fs request is a !passthrough request. > if (zoned && (op & REQ_OP_WRITE) && fs_request) > return NULL; > > for that condition imho. I guess we could make it: > > if (zoned && (op & REQ_OP_WRITE) && !(op & REQ_OP_DRV_OUT)) > return NULL; Well, the only opcodes we do zone locking for is REQ_OP_WRITE and REQ_OP_WRITE_ZEROES. So this should be: if (zoned && (op == REQ_OP_WRITE || op == REQ_OP_WRITE_ZEROES)) return NULL;
On 9/27/22 10:51 AM, Christoph Hellwig wrote: > On Tue, Sep 27, 2022 at 10:04:19AM -0600, Jens Axboe wrote: >> Ah yes, good point. We used to have this notion of 'fs' request, don't >> think we do anymore. Because it really should just be: > > A fs request is a !passthrough request. Right, that's the condition I made below too. >> if (zoned && (op & REQ_OP_WRITE) && fs_request) >> return NULL; >> >> for that condition imho. I guess we could make it: >> >> if (zoned && (op & REQ_OP_WRITE) && !(op & REQ_OP_DRV_OUT)) >> return NULL; > > Well, the only opcodes we do zone locking for is REQ_OP_WRITE and > REQ_OP_WRITE_ZEROES. So this should be: > > if (zoned && (op == REQ_OP_WRITE || op == REQ_OP_WRITE_ZEROES)) > return NULL; I'd rather just make it explicit and use that. Pankaj, do you want to spin a v2 with that?
On 9/28/22 01:52, Jens Axboe wrote: > On 9/27/22 10:51 AM, Christoph Hellwig wrote: >> On Tue, Sep 27, 2022 at 10:04:19AM -0600, Jens Axboe wrote: >>> Ah yes, good point. We used to have this notion of 'fs' request, don't >>> think we do anymore. Because it really should just be: >> >> A fs request is a !passthrough request. > > Right, that's the condition I made below too. > >>> if (zoned && (op & REQ_OP_WRITE) && fs_request) >>> return NULL; >>> >>> for that condition imho. I guess we could make it: >>> >>> if (zoned && (op & REQ_OP_WRITE) && !(op & REQ_OP_DRV_OUT)) >>> return NULL; >> >> Well, the only opcodes we do zone locking for is REQ_OP_WRITE and >> REQ_OP_WRITE_ZEROES. So this should be: >> >> if (zoned && (op == REQ_OP_WRITE || op == REQ_OP_WRITE_ZEROES)) >> return NULL; > > I'd rather just make it explicit and use that. Pankaj, do you want > to spin a v2 with that? It would be nice to reuse the bio equivalent of blk_req_needs_zone_write_lock(). The test would be: if (bio_needs_zone_write_locking()) return NULL; With something like: static inline bool bio_needs_zone_write_locking() { if (!bdev_is_zoned(bio->bi_bdev)) return false; switch (bio_op(bio)) { case REQ_OP_WRITE_ZEROES: case REQ_OP_WRITE: return true; default: return false; } } Which also has the advantage that going forward, we could refine this to plug writes to conventional zones (as these can be plugged).
On 9/28/22 08:07, Damien Le Moal wrote: > On 9/28/22 01:52, Jens Axboe wrote: >> On 9/27/22 10:51 AM, Christoph Hellwig wrote: >>> On Tue, Sep 27, 2022 at 10:04:19AM -0600, Jens Axboe wrote: >>>> Ah yes, good point. We used to have this notion of 'fs' request, don't >>>> think we do anymore. Because it really should just be: >>> >>> A fs request is a !passthrough request. >> >> Right, that's the condition I made below too. >> >>>> if (zoned && (op & REQ_OP_WRITE) && fs_request) >>>> return NULL; >>>> >>>> for that condition imho. I guess we could make it: >>>> >>>> if (zoned && (op & REQ_OP_WRITE) && !(op & REQ_OP_DRV_OUT)) >>>> return NULL; >>> >>> Well, the only opcodes we do zone locking for is REQ_OP_WRITE and >>> REQ_OP_WRITE_ZEROES. So this should be: >>> >>> if (zoned && (op == REQ_OP_WRITE || op == REQ_OP_WRITE_ZEROES)) >>> return NULL; >> >> I'd rather just make it explicit and use that. Pankaj, do you want >> to spin a v2 with that? > > It would be nice to reuse the bio equivalent of > blk_req_needs_zone_write_lock(). > > The test would be: > > if (bio_needs_zone_write_locking()) > return NULL; Note that we could also add a "IS_ENABLED(CONFIG_BLK_DEV_ZONED) &&" to the condition or stub the helper to have this hunk disappear for the !CONFIG_BLK_DEV_ZONED case. > > With something like: > > static inline bool bio_needs_zone_write_locking() > { > if (!bdev_is_zoned(bio->bi_bdev)) > return false; > > switch (bio_op(bio)) { > case REQ_OP_WRITE_ZEROES: > > case REQ_OP_WRITE: > > return true; > default: > > return false; > > } > } > > Which also has the advantage that going forward, we could refine this to > plug writes to conventional zones (as these can be plugged). >
On 9/27/22 5:07 PM, Damien Le Moal wrote: > On 9/28/22 01:52, Jens Axboe wrote: >> On 9/27/22 10:51 AM, Christoph Hellwig wrote: >>> On Tue, Sep 27, 2022 at 10:04:19AM -0600, Jens Axboe wrote: >>>> Ah yes, good point. We used to have this notion of 'fs' request, don't >>>> think we do anymore. Because it really should just be: >>> >>> A fs request is a !passthrough request. >> >> Right, that's the condition I made below too. >> >>>> if (zoned && (op & REQ_OP_WRITE) && fs_request) >>>> return NULL; >>>> >>>> for that condition imho. I guess we could make it: >>>> >>>> if (zoned && (op & REQ_OP_WRITE) && !(op & REQ_OP_DRV_OUT)) >>>> return NULL; >>> >>> Well, the only opcodes we do zone locking for is REQ_OP_WRITE and >>> REQ_OP_WRITE_ZEROES. So this should be: >>> >>> if (zoned && (op == REQ_OP_WRITE || op == REQ_OP_WRITE_ZEROES)) >>> return NULL; >> >> I'd rather just make it explicit and use that. Pankaj, do you want >> to spin a v2 with that? > > It would be nice to reuse the bio equivalent of > blk_req_needs_zone_write_lock(). > > The test would be: > > if (bio_needs_zone_write_locking()) > return NULL; > > With something like: > > static inline bool bio_needs_zone_write_locking() > { > if (!bdev_is_zoned(bio->bi_bdev)) > return false; > > switch (bio_op(bio)) { > case REQ_OP_WRITE_ZEROES: > > case REQ_OP_WRITE: > > return true; > default: > > return false; > > } > } I'd be fine with that (using a shared helper), but let's please just make it: static inline bool op_is_zoned_write(bdev, op) { if (!bdev_is_zoned(bio->bi_bdev)) return false; return op == REQ_OP_WRITE_ZEROES || op == REQ_OP_WRITE; } and avoid a switch for this basic case and name it a bit more logically too. Not married to the above name, but the helper should not imply anything about zone locking. That's for the caller.
On 9/27/22 5:10 PM, Damien Le Moal wrote: > On 9/28/22 08:07, Damien Le Moal wrote: >> On 9/28/22 01:52, Jens Axboe wrote: >>> On 9/27/22 10:51 AM, Christoph Hellwig wrote: >>>> On Tue, Sep 27, 2022 at 10:04:19AM -0600, Jens Axboe wrote: >>>>> Ah yes, good point. We used to have this notion of 'fs' request, don't >>>>> think we do anymore. Because it really should just be: >>>> >>>> A fs request is a !passthrough request. >>> >>> Right, that's the condition I made below too. >>> >>>>> if (zoned && (op & REQ_OP_WRITE) && fs_request) >>>>> return NULL; >>>>> >>>>> for that condition imho. I guess we could make it: >>>>> >>>>> if (zoned && (op & REQ_OP_WRITE) && !(op & REQ_OP_DRV_OUT)) >>>>> return NULL; >>>> >>>> Well, the only opcodes we do zone locking for is REQ_OP_WRITE and >>>> REQ_OP_WRITE_ZEROES. So this should be: >>>> >>>> if (zoned && (op == REQ_OP_WRITE || op == REQ_OP_WRITE_ZEROES)) >>>> return NULL; >>> >>> I'd rather just make it explicit and use that. Pankaj, do you want >>> to spin a v2 with that? >> >> It would be nice to reuse the bio equivalent of >> blk_req_needs_zone_write_lock(). >> >> The test would be: >> >> if (bio_needs_zone_write_locking()) >> return NULL; > > Note that we could also add a "IS_ENABLED(CONFIG_BLK_DEV_ZONED) &&" to > the condition or stub the helper to have this hunk disappear for the > !CONFIG_BLK_DEV_ZONED case. Indeed, that would be nice.
On 9/28/22 08:12, Jens Axboe wrote: > On 9/27/22 5:07 PM, Damien Le Moal wrote: >> On 9/28/22 01:52, Jens Axboe wrote: >>> On 9/27/22 10:51 AM, Christoph Hellwig wrote: >>>> On Tue, Sep 27, 2022 at 10:04:19AM -0600, Jens Axboe wrote: >>>>> Ah yes, good point. We used to have this notion of 'fs' request, don't >>>>> think we do anymore. Because it really should just be: >>>> >>>> A fs request is a !passthrough request. >>> >>> Right, that's the condition I made below too. >>> >>>>> if (zoned && (op & REQ_OP_WRITE) && fs_request) >>>>> return NULL; >>>>> >>>>> for that condition imho. I guess we could make it: >>>>> >>>>> if (zoned && (op & REQ_OP_WRITE) && !(op & REQ_OP_DRV_OUT)) >>>>> return NULL; >>>> >>>> Well, the only opcodes we do zone locking for is REQ_OP_WRITE and >>>> REQ_OP_WRITE_ZEROES. So this should be: >>>> >>>> if (zoned && (op == REQ_OP_WRITE || op == REQ_OP_WRITE_ZEROES)) >>>> return NULL; >>> >>> I'd rather just make it explicit and use that. Pankaj, do you want >>> to spin a v2 with that? >> >> It would be nice to reuse the bio equivalent of >> blk_req_needs_zone_write_lock(). >> >> The test would be: >> >> if (bio_needs_zone_write_locking()) >> return NULL; >> >> With something like: >> >> static inline bool bio_needs_zone_write_locking() >> { >> if (!bdev_is_zoned(bio->bi_bdev)) >> return false; >> >> switch (bio_op(bio)) { >> case REQ_OP_WRITE_ZEROES: >> >> case REQ_OP_WRITE: >> >> return true; >> default: >> >> return false; >> >> } >> } > > I'd be fine with that (using a shared helper), but let's please just > make it: > > static inline bool op_is_zoned_write(bdev, op) > { > if (!bdev_is_zoned(bio->bi_bdev)) > return false; > > return op == REQ_OP_WRITE_ZEROES || op == REQ_OP_WRITE; Works for me. Nit: should have REQ_OP_WRITE first as that is the most common case. > } > > and avoid a switch for this basic case and name it a bit more logically > too. Not married to the above name, but the helper should not imply > anything about zone locking. That's for the caller. blk_req_needs_zone_write_lock() would become: bool blk_req_needs_zone_write_lock(struct request *rq) { if (blk_rq_is_passthrough(rq)) return false; if (!rq->q->disk->seq_zones_wlock) return false; return op_is_zoned_write(rq->q->disk->part0, req_op(rq)); }
On 2022-09-27 18:52, Jens Axboe wrote: >> Well, the only opcodes we do zone locking for is REQ_OP_WRITE and >> REQ_OP_WRITE_ZEROES. So this should be: >> >> if (zoned && (op == REQ_OP_WRITE || op == REQ_OP_WRITE_ZEROES)) >> return NULL; > > I'd rather just make it explicit and use that. Pankaj, do you want > to spin a v2 with that? > Based on all the suggestions: block: adapt blk_mq_plug() to not plug for writes that require a zone lock The current implementation of blk_mq_plug() disables plugging for all operations that involves a transfer to the device as we just check if the last bit in op_is_write() function. Modify blk_mq_plug() to disable plugging only for REQ_OP_WRITE and REQ_OP_WRITE_ZEROS as they might require a zone lock. Suggested-by: Christoph Hellwig <hch@infradead.org> Suggested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Signed-off-by: Pankaj Raghav <p.raghav@samsung.com> diff --git a/block/blk-mq.h b/block/blk-mq.h index 8ca453ac243d..297289cdd521 100644 --- a/block/blk-mq.h +++ b/block/blk-mq.h @@ -312,7 +312,8 @@ static inline void blk_mq_clear_mq_map(struct blk_mq_queue_map *qmap) static inline struct blk_plug *blk_mq_plug( struct bio *bio) { /* Zoned block device write operation case: do not plug the BIO */ - if (bdev_is_zoned(bio->bi_bdev) && op_is_write(bio_op(bio))) + if (IS_ENABLED(CONFIG_BLK_DEV_ZONED) && + blk_op_is_zoned_write(bio->bi_bdev, bio_op(bio))) return NULL; /* diff --git a/block/blk-zoned.c b/block/blk-zoned.c index a264621d4905..fa926424edb6 100644 --- a/block/blk-zoned.c +++ b/block/blk-zoned.c @@ -63,13 +63,10 @@ bool blk_req_needs_zone_write_lock(struct request *rq) if (!rq->q->disk->seq_zones_wlock) return false; - switch (req_op(rq)) { - case REQ_OP_WRITE_ZEROES: - case REQ_OP_WRITE: + if (blk_op_is_zoned_write(rq->q->disk->part0, req_op(rq))) return blk_rq_zone_is_seq(rq); - default: - return false; - } + + return false; } EXPORT_SYMBOL_GPL(blk_req_needs_zone_write_lock); diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index 8038c5fbde40..719025028fa4 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -1300,6 +1300,15 @@ static inline bool bdev_is_zoned(struct block_device *bdev) return false; } +static inline bool blk_op_is_zoned_write(struct block_device *bdev, + blk_opf_t op) +{ + if (!bdev_is_zoned(bdev)) + return false; + + return op == REQ_OP_WRITE || op == REQ_OP_WRITE_ZEROES; +} + static inline sector_t bdev_zone_sectors(struct block_device *bdev) { struct request_queue *q = bdev_get_queue(bdev); Does this look fine?
On 2022/09/28 20:57, Pankaj Raghav wrote: > On 2022-09-27 18:52, Jens Axboe wrote: >>> Well, the only opcodes we do zone locking for is REQ_OP_WRITE and >>> REQ_OP_WRITE_ZEROES. So this should be: >>> >>> if (zoned && (op == REQ_OP_WRITE || op == REQ_OP_WRITE_ZEROES)) >>> return NULL; >> >> I'd rather just make it explicit and use that. Pankaj, do you want >> to spin a v2 with that? >> > > Based on all the suggestions: > > block: adapt blk_mq_plug() to not plug for writes that require a zone lock > > The current implementation of blk_mq_plug() disables plugging for all > operations that involves a transfer to the device as we just check if > the last bit in op_is_write() function. > > Modify blk_mq_plug() to disable plugging only for REQ_OP_WRITE and > REQ_OP_WRITE_ZEROS as they might require a zone lock. > > Suggested-by: Christoph Hellwig <hch@infradead.org> > Suggested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> > Signed-off-by: Pankaj Raghav <p.raghav@samsung.com> > > diff --git a/block/blk-mq.h b/block/blk-mq.h > index 8ca453ac243d..297289cdd521 100644 > --- a/block/blk-mq.h > +++ b/block/blk-mq.h > @@ -312,7 +312,8 @@ static inline void blk_mq_clear_mq_map(struct blk_mq_queue_map *qmap) > static inline struct blk_plug *blk_mq_plug( struct bio *bio) > { > /* Zoned block device write operation case: do not plug the BIO */ > - if (bdev_is_zoned(bio->bi_bdev) && op_is_write(bio_op(bio))) > + if (IS_ENABLED(CONFIG_BLK_DEV_ZONED) && > + blk_op_is_zoned_write(bio->bi_bdev, bio_op(bio))) > return NULL; > > /* > diff --git a/block/blk-zoned.c b/block/blk-zoned.c > index a264621d4905..fa926424edb6 100644 > --- a/block/blk-zoned.c > +++ b/block/blk-zoned.c > @@ -63,13 +63,10 @@ bool blk_req_needs_zone_write_lock(struct request *rq) > if (!rq->q->disk->seq_zones_wlock) > return false; > > - switch (req_op(rq)) { > - case REQ_OP_WRITE_ZEROES: > - case REQ_OP_WRITE: > + if (blk_op_is_zoned_write(rq->q->disk->part0, req_op(rq))) > return blk_rq_zone_is_seq(rq); > - default: > - return false; > - } > + > + return false; > } > EXPORT_SYMBOL_GPL(blk_req_needs_zone_write_lock); > > diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h > index 8038c5fbde40..719025028fa4 100644 > --- a/include/linux/blkdev.h > +++ b/include/linux/blkdev.h > @@ -1300,6 +1300,15 @@ static inline bool bdev_is_zoned(struct block_device *bdev) > return false; > } > > +static inline bool blk_op_is_zoned_write(struct block_device *bdev, > + blk_opf_t op) To be consistent, I personally would prefer bdev_op_is_zoned_write() as the name here (the first argument is a bdev). > +{ > + if (!bdev_is_zoned(bdev)) > + return false; > + > + return op == REQ_OP_WRITE || op == REQ_OP_WRITE_ZEROES; > +} > + > static inline sector_t bdev_zone_sectors(struct block_device *bdev) > { > struct request_queue *q = bdev_get_queue(bdev); > > > Does this look fine?
diff --git a/block/blk-mq.h b/block/blk-mq.h index 8ca453ac243d..005343df64ca 100644 --- a/block/blk-mq.h +++ b/block/blk-mq.h @@ -305,14 +305,15 @@ static inline void blk_mq_clear_mq_map(struct blk_mq_queue_map *qmap) * change can cause write BIO failures with zoned block devices as these * require sequential write patterns to zones. Prevent this from happening by * ignoring the plug state of a BIO issuing context if it is for a zoned block - * device and the BIO to plug is a write operation. + * device and the BIO to plug is not a read operation. + * * * Return current->plug if the bio can be plugged and NULL otherwise */ static inline struct blk_plug *blk_mq_plug( struct bio *bio) { - /* Zoned block device write operation case: do not plug the BIO */ - if (bdev_is_zoned(bio->bi_bdev) && op_is_write(bio_op(bio))) + /* Allow plugging only for read operations in zoned block devices */ + if (bdev_is_zoned(bio->bi_bdev) && bio_op(bio) != REQ_OP_READ) return NULL; /*
Modify blk_mq_plug() to allow plugging only for read operations in zoned block devices as there are alternative IO paths in the linux block layer which can end up doing a write via driver private requests in sequential write zones. Signed-off-by: Pankaj Raghav <p.raghav@samsung.com> --- block/blk-mq.h | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-)