diff mbox series

[1/2] block: modify blk_mq_plug() to allow only reads for zoned block devices

Message ID 20220925185348.120964-2-p.raghav@samsung.com (mailing list archive)
State New, archived
Headers show
Series enable plugging only for reads in zoned block devices | expand

Commit Message

Pankaj Raghav Sept. 25, 2022, 6:53 p.m. UTC
Modify blk_mq_plug() to allow plugging only for read operations in zoned
block devices as there are alternative IO paths in the linux block
layer which can end up doing a write via driver private requests in
sequential write zones.

Signed-off-by: Pankaj Raghav <p.raghav@samsung.com>
---
 block/blk-mq.h | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

Comments

Damien Le Moal Sept. 25, 2022, 10:55 p.m. UTC | #1
On 9/26/22 03:53, Pankaj Raghav wrote:
> Modify blk_mq_plug() to allow plugging only for read operations in zoned
> block devices as there are alternative IO paths in the linux block
> layer which can end up doing a write via driver private requests in
> sequential write zones.
> 
> Signed-off-by: Pankaj Raghav <p.raghav@samsung.com>
> ---
>   block/blk-mq.h | 7 ++++---
>   1 file changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/block/blk-mq.h b/block/blk-mq.h
> index 8ca453ac243d..005343df64ca 100644
> --- a/block/blk-mq.h
> +++ b/block/blk-mq.h
> @@ -305,14 +305,15 @@ static inline void blk_mq_clear_mq_map(struct blk_mq_queue_map *qmap)
>    * change can cause write BIO failures with zoned block devices as these
>    * require sequential write patterns to zones. Prevent this from happening by
>    * ignoring the plug state of a BIO issuing context if it is for a zoned block
> - * device and the BIO to plug is a write operation.
> + * device and the BIO to plug is not a read operation.
> + *
>    *
>    * Return current->plug if the bio can be plugged and NULL otherwise
>    */
>   static inline struct blk_plug *blk_mq_plug( struct bio *bio)
>   {
> -	/* Zoned block device write operation case: do not plug the BIO */
> -	if (bdev_is_zoned(bio->bi_bdev) && op_is_write(bio_op(bio)))
> +	/* Allow plugging only for read operations in zoned block devices */

nit: s/in/with

> +	if (bdev_is_zoned(bio->bi_bdev) && bio_op(bio) != REQ_OP_READ)
>   		return NULL;
>   
>   	/*

Otherwise, looks good to me.

Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Christoph Hellwig Sept. 26, 2022, 2:37 p.m. UTC | #2
On Sun, Sep 25, 2022 at 08:53:46PM +0200, Pankaj Raghav wrote:
> Modify blk_mq_plug() to allow plugging only for read operations in zoned
> block devices as there are alternative IO paths in the linux block
> layer which can end up doing a write via driver private requests in
> sequential write zones.

We should be able to plug for all operations that are not
REQ_OP_ZONE_APPEND just fine.

I also really can't parse your commit log at all, what alternative
paths are you talking about here?
Jens Axboe Sept. 26, 2022, 2:40 p.m. UTC | #3
On 9/26/22 8:37 AM, Christoph Hellwig wrote:
> On Sun, Sep 25, 2022 at 08:53:46PM +0200, Pankaj Raghav wrote:
>> Modify blk_mq_plug() to allow plugging only for read operations in zoned
>> block devices as there are alternative IO paths in the linux block
>> layer which can end up doing a write via driver private requests in
>> sequential write zones.
> 
> We should be able to plug for all operations that are not
> REQ_OP_ZONE_APPEND just fine.

Agree, I think we just want to make this about someone doing a series
of appends. If you mix-and-match with passthrough you will have a bad
time anyway.
Christoph Hellwig Sept. 26, 2022, 2:43 p.m. UTC | #4
On Mon, Sep 26, 2022 at 08:40:54AM -0600, Jens Axboe wrote:
> On 9/26/22 8:37 AM, Christoph Hellwig wrote:
> > On Sun, Sep 25, 2022 at 08:53:46PM +0200, Pankaj Raghav wrote:
> >> Modify blk_mq_plug() to allow plugging only for read operations in zoned
> >> block devices as there are alternative IO paths in the linux block
> >> layer which can end up doing a write via driver private requests in
> >> sequential write zones.
> > 
> > We should be able to plug for all operations that are not
> > REQ_OP_ZONE_APPEND just fine.
> 
> Agree, I think we just want to make this about someone doing a series
> of appends. If you mix-and-match with passthrough you will have a bad
> time anyway.

Err, sorry - what I wrote about is compelte garbage.  I initially
wanted to say you can plug for REQ_OP_ZONE_APPEND just fine, and then
realized that we also want various other ones that have the write bit
set batched.  So I suspect we really want to explicitly check for
REQ_OP_WRITE here.
Jens Axboe Sept. 26, 2022, 4:32 p.m. UTC | #5
On 9/26/22 8:43 AM, Christoph Hellwig wrote:
> On Mon, Sep 26, 2022 at 08:40:54AM -0600, Jens Axboe wrote:
>> On 9/26/22 8:37 AM, Christoph Hellwig wrote:
>>> On Sun, Sep 25, 2022 at 08:53:46PM +0200, Pankaj Raghav wrote:
>>>> Modify blk_mq_plug() to allow plugging only for read operations in zoned
>>>> block devices as there are alternative IO paths in the linux block
>>>> layer which can end up doing a write via driver private requests in
>>>> sequential write zones.
>>>
>>> We should be able to plug for all operations that are not
>>> REQ_OP_ZONE_APPEND just fine.
>>
>> Agree, I think we just want to make this about someone doing a series
>> of appends. If you mix-and-match with passthrough you will have a bad
>> time anyway.
> 
> Err, sorry - what I wrote about is compelte garbage.  I initially
> wanted to say you can plug for REQ_OP_ZONE_APPEND just fine, and then
> realized that we also want various other ones that have the write bit
> set batched.  So I suspect we really want to explicitly check for
> REQ_OP_WRITE here.

My memory was a bit hazy, since we have separate ops for the driver
in/out, I think just checking for REQ_OP_WRITE is indeed the right
choice. That's the single case we need to care about.
Pankaj Raghav Sept. 26, 2022, 7:20 p.m. UTC | #6
On 2022-09-26 18:32, Jens Axboe wrote:
> On 9/26/22 8:43 AM, Christoph Hellwig wrote:
>> On Mon, Sep 26, 2022 at 08:40:54AM -0600, Jens Axboe wrote:
>>> On 9/26/22 8:37 AM, Christoph Hellwig wrote:
>>>> On Sun, Sep 25, 2022 at 08:53:46PM +0200, Pankaj Raghav wrote:
>>>>> Modify blk_mq_plug() to allow plugging only for read operations in zoned
>>>>> block devices as there are alternative IO paths in the linux block
>>>>> layer which can end up doing a write via driver private requests in
>>>>> sequential write zones.
>>>>
>>>> We should be able to plug for all operations that are not
>>>> REQ_OP_ZONE_APPEND just fine.
>>>
>>> Agree, I think we just want to make this about someone doing a series
>>> of appends. If you mix-and-match with passthrough you will have a bad
>>> time anyway.
>>
>> Err, sorry - what I wrote about is compelte garbage.  I initially
>> wanted to say you can plug for REQ_OP_ZONE_APPEND just fine, and then
>> realized that we also want various other ones that have the write bit
>> set batched.  So I suspect we really want to explicitly check for
>> REQ_OP_WRITE here.
> 
> My memory was a bit hazy, since we have separate ops for the driver
> in/out, I think just checking for REQ_OP_WRITE is indeed the right
> choice. That's the single case we need to care about.
> 
Ah. You are right. I missed it as well. There is even a comment from Christoph:

 *   - if the least significant bit is set transfers are TO the device
 *   - if the least significant bit is not set transfers are FROM the device

I guess the second patch should be enough to apply plugging when applicable
for uring_cmd based nvme passthrough requests.
Jens Axboe Sept. 26, 2022, 7:25 p.m. UTC | #7
On 9/26/22 1:20 PM, Pankaj Raghav wrote:
> On 2022-09-26 18:32, Jens Axboe wrote:
>> On 9/26/22 8:43 AM, Christoph Hellwig wrote:
>>> On Mon, Sep 26, 2022 at 08:40:54AM -0600, Jens Axboe wrote:
>>>> On 9/26/22 8:37 AM, Christoph Hellwig wrote:
>>>>> On Sun, Sep 25, 2022 at 08:53:46PM +0200, Pankaj Raghav wrote:
>>>>>> Modify blk_mq_plug() to allow plugging only for read operations in zoned
>>>>>> block devices as there are alternative IO paths in the linux block
>>>>>> layer which can end up doing a write via driver private requests in
>>>>>> sequential write zones.
>>>>>
>>>>> We should be able to plug for all operations that are not
>>>>> REQ_OP_ZONE_APPEND just fine.
>>>>
>>>> Agree, I think we just want to make this about someone doing a series
>>>> of appends. If you mix-and-match with passthrough you will have a bad
>>>> time anyway.
>>>
>>> Err, sorry - what I wrote about is compelte garbage.  I initially
>>> wanted to say you can plug for REQ_OP_ZONE_APPEND just fine, and then
>>> realized that we also want various other ones that have the write bit
>>> set batched.  So I suspect we really want to explicitly check for
>>> REQ_OP_WRITE here.
>>
>> My memory was a bit hazy, since we have separate ops for the driver
>> in/out, I think just checking for REQ_OP_WRITE is indeed the right
>> choice. That's the single case we need to care about.
>>
> Ah. You are right. I missed it as well. There is even a comment from
> Christoph:
> 
>  *   - if the least significant bit is set transfers are TO the device
>  *   - if the least significant bit is not set transfers are FROM the device
> 
> I guess the second patch should be enough to apply plugging when
> applicable for uring_cmd based nvme passthrough requests.

Do we even need the 2nd patch? If we're just doing passthrough for the
blk_execute_nowait() API, then the condition should never trigger? If
so, then it would be a cleanup just to ensure we're using a consistent
API for getting the plug, which may be worthwhile to do separately for
sure.
Pankaj Raghav Sept. 27, 2022, 3:20 p.m. UTC | #8
>> I guess the second patch should be enough to apply plugging when
>> applicable for uring_cmd based nvme passthrough requests.
> 
> Do we even need the 2nd patch? If we're just doing passthrough for the
> blk_execute_nowait() API, then the condition should never trigger? 

I think this was the question I raised in your first version of the series.

If we do a NVMe write using the passthrough interface, then we will be
using REQ_OP_DRV_OUT op, which is:

REQ_OP_DRV_OUT		= (__force blk_opf_t)35, // write bit is set

The condition in blk_mq_plug() will trigger as we only check if it is a
_write_ (op & (__force blk_opf_t)1) to the device. Am I missing something?

> If so, then it would be a cleanup just to ensure we're using a consistent
> API for getting the plug, which may be worthwhile to do separately for
> sure.
> 

--
Pankaj
Jens Axboe Sept. 27, 2022, 4:04 p.m. UTC | #9
On 9/27/22 9:20 AM, Pankaj Raghav wrote:
>>> I guess the second patch should be enough to apply plugging when
>>> applicable for uring_cmd based nvme passthrough requests.
>>
>> Do we even need the 2nd patch? If we're just doing passthrough for the
>> blk_execute_nowait() API, then the condition should never trigger? 
> 
> I think this was the question I raised in your first version of the series.
> 
> If we do a NVMe write using the passthrough interface, then we will be
> using REQ_OP_DRV_OUT op, which is:
> 
> REQ_OP_DRV_OUT		= (__force blk_opf_t)35, // write bit is set
> 
> The condition in blk_mq_plug() will trigger as we only check if it is a
> _write_ (op & (__force blk_opf_t)1) to the device. Am I missing something?

Ah yes, good point. We used to have this notion of 'fs' request, don't
think we do anymore. Because it really should just be:

if (zoned && (op & REQ_OP_WRITE) && fs_request)
         return NULL;

for that condition imho. I guess we could make it:

if (zoned && (op & REQ_OP_WRITE) && !(op & REQ_OP_DRV_OUT))
         return NULL;
Christoph Hellwig Sept. 27, 2022, 4:51 p.m. UTC | #10
On Tue, Sep 27, 2022 at 10:04:19AM -0600, Jens Axboe wrote:
> Ah yes, good point. We used to have this notion of 'fs' request, don't
> think we do anymore. Because it really should just be:

A fs request is a !passthrough request.

> if (zoned && (op & REQ_OP_WRITE) && fs_request)
>          return NULL;
> 
> for that condition imho. I guess we could make it:
> 
> if (zoned && (op & REQ_OP_WRITE) && !(op & REQ_OP_DRV_OUT))
>          return NULL;

Well, the only opcodes we do zone locking for is REQ_OP_WRITE and
REQ_OP_WRITE_ZEROES.  So this should be:

	if (zoned && (op == REQ_OP_WRITE || op == REQ_OP_WRITE_ZEROES))
		return NULL;
Jens Axboe Sept. 27, 2022, 4:52 p.m. UTC | #11
On 9/27/22 10:51 AM, Christoph Hellwig wrote:
> On Tue, Sep 27, 2022 at 10:04:19AM -0600, Jens Axboe wrote:
>> Ah yes, good point. We used to have this notion of 'fs' request, don't
>> think we do anymore. Because it really should just be:
> 
> A fs request is a !passthrough request.

Right, that's the condition I made below too.

>> if (zoned && (op & REQ_OP_WRITE) && fs_request)
>>          return NULL;
>>
>> for that condition imho. I guess we could make it:
>>
>> if (zoned && (op & REQ_OP_WRITE) && !(op & REQ_OP_DRV_OUT))
>>          return NULL;
> 
> Well, the only opcodes we do zone locking for is REQ_OP_WRITE and
> REQ_OP_WRITE_ZEROES.  So this should be:
> 
> 	if (zoned && (op == REQ_OP_WRITE || op == REQ_OP_WRITE_ZEROES))
> 		return NULL;

I'd rather just make it explicit and use that. Pankaj, do you want
to spin a v2 with that?
Damien Le Moal Sept. 27, 2022, 11:07 p.m. UTC | #12
On 9/28/22 01:52, Jens Axboe wrote:
> On 9/27/22 10:51 AM, Christoph Hellwig wrote:
>> On Tue, Sep 27, 2022 at 10:04:19AM -0600, Jens Axboe wrote:
>>> Ah yes, good point. We used to have this notion of 'fs' request, don't
>>> think we do anymore. Because it really should just be:
>>
>> A fs request is a !passthrough request.
> 
> Right, that's the condition I made below too.
> 
>>> if (zoned && (op & REQ_OP_WRITE) && fs_request)
>>>          return NULL;
>>>
>>> for that condition imho. I guess we could make it:
>>>
>>> if (zoned && (op & REQ_OP_WRITE) && !(op & REQ_OP_DRV_OUT))
>>>          return NULL;
>>
>> Well, the only opcodes we do zone locking for is REQ_OP_WRITE and
>> REQ_OP_WRITE_ZEROES.  So this should be:
>>
>> 	if (zoned && (op == REQ_OP_WRITE || op == REQ_OP_WRITE_ZEROES))
>> 		return NULL;
> 
> I'd rather just make it explicit and use that. Pankaj, do you want
> to spin a v2 with that?

It would be nice to reuse the bio equivalent of
blk_req_needs_zone_write_lock().

The test would be:

	if (bio_needs_zone_write_locking())
		return NULL;

With something like:

static inline bool bio_needs_zone_write_locking()
{
	 if (!bdev_is_zoned(bio->bi_bdev))
		return false;

	switch (bio_op(bio)) {
        case REQ_OP_WRITE_ZEROES:

        case REQ_OP_WRITE:

                return true;
        default:

                return false;

        }
}

Which also has the advantage that going forward, we could refine this to
plug writes to conventional zones (as these can be plugged).
Damien Le Moal Sept. 27, 2022, 11:10 p.m. UTC | #13
On 9/28/22 08:07, Damien Le Moal wrote:
> On 9/28/22 01:52, Jens Axboe wrote:
>> On 9/27/22 10:51 AM, Christoph Hellwig wrote:
>>> On Tue, Sep 27, 2022 at 10:04:19AM -0600, Jens Axboe wrote:
>>>> Ah yes, good point. We used to have this notion of 'fs' request, don't
>>>> think we do anymore. Because it really should just be:
>>>
>>> A fs request is a !passthrough request.
>>
>> Right, that's the condition I made below too.
>>
>>>> if (zoned && (op & REQ_OP_WRITE) && fs_request)
>>>>          return NULL;
>>>>
>>>> for that condition imho. I guess we could make it:
>>>>
>>>> if (zoned && (op & REQ_OP_WRITE) && !(op & REQ_OP_DRV_OUT))
>>>>          return NULL;
>>>
>>> Well, the only opcodes we do zone locking for is REQ_OP_WRITE and
>>> REQ_OP_WRITE_ZEROES.  So this should be:
>>>
>>> 	if (zoned && (op == REQ_OP_WRITE || op == REQ_OP_WRITE_ZEROES))
>>> 		return NULL;
>>
>> I'd rather just make it explicit and use that. Pankaj, do you want
>> to spin a v2 with that?
> 
> It would be nice to reuse the bio equivalent of
> blk_req_needs_zone_write_lock().
> 
> The test would be:
> 
> 	if (bio_needs_zone_write_locking())
> 		return NULL;

Note that we could also add a "IS_ENABLED(CONFIG_BLK_DEV_ZONED) &&" to the
condition or stub the helper to have this hunk disappear for the
!CONFIG_BLK_DEV_ZONED case.

> 
> With something like:
> 
> static inline bool bio_needs_zone_write_locking()
> {
> 	 if (!bdev_is_zoned(bio->bi_bdev))
> 		return false;
> 
> 	switch (bio_op(bio)) {
>         case REQ_OP_WRITE_ZEROES:
> 
>         case REQ_OP_WRITE:
> 
>                 return true;
>         default:
> 
>                 return false;
> 
>         }
> }
> 
> Which also has the advantage that going forward, we could refine this to
> plug writes to conventional zones (as these can be plugged).
>
Jens Axboe Sept. 27, 2022, 11:12 p.m. UTC | #14
On 9/27/22 5:07 PM, Damien Le Moal wrote:
> On 9/28/22 01:52, Jens Axboe wrote:
>> On 9/27/22 10:51 AM, Christoph Hellwig wrote:
>>> On Tue, Sep 27, 2022 at 10:04:19AM -0600, Jens Axboe wrote:
>>>> Ah yes, good point. We used to have this notion of 'fs' request, don't
>>>> think we do anymore. Because it really should just be:
>>>
>>> A fs request is a !passthrough request.
>>
>> Right, that's the condition I made below too.
>>
>>>> if (zoned && (op & REQ_OP_WRITE) && fs_request)
>>>>          return NULL;
>>>>
>>>> for that condition imho. I guess we could make it:
>>>>
>>>> if (zoned && (op & REQ_OP_WRITE) && !(op & REQ_OP_DRV_OUT))
>>>>          return NULL;
>>>
>>> Well, the only opcodes we do zone locking for is REQ_OP_WRITE and
>>> REQ_OP_WRITE_ZEROES.  So this should be:
>>>
>>> 	if (zoned && (op == REQ_OP_WRITE || op == REQ_OP_WRITE_ZEROES))
>>> 		return NULL;
>>
>> I'd rather just make it explicit and use that. Pankaj, do you want
>> to spin a v2 with that?
> 
> It would be nice to reuse the bio equivalent of
> blk_req_needs_zone_write_lock().
> 
> The test would be:
> 
> 	if (bio_needs_zone_write_locking())
> 		return NULL;
> 
> With something like:
> 
> static inline bool bio_needs_zone_write_locking()
> {
> 	 if (!bdev_is_zoned(bio->bi_bdev))
> 		return false;
> 
> 	switch (bio_op(bio)) {
>         case REQ_OP_WRITE_ZEROES:
> 
>         case REQ_OP_WRITE:
> 
>                 return true;
>         default:
> 
>                 return false;
> 
>         }
> }

I'd be fine with that (using a shared helper), but let's please just
make it:

static inline bool op_is_zoned_write(bdev, op)
{
	 if (!bdev_is_zoned(bio->bi_bdev))
		return false;

	return op == REQ_OP_WRITE_ZEROES || op == REQ_OP_WRITE;
}

and avoid a switch for this basic case and name it a bit more logically
too. Not married to the above name, but the helper should not imply
anything about zone locking. That's for the caller.
Jens Axboe Sept. 27, 2022, 11:13 p.m. UTC | #15
On 9/27/22 5:10 PM, Damien Le Moal wrote:
> On 9/28/22 08:07, Damien Le Moal wrote:
>> On 9/28/22 01:52, Jens Axboe wrote:
>>> On 9/27/22 10:51 AM, Christoph Hellwig wrote:
>>>> On Tue, Sep 27, 2022 at 10:04:19AM -0600, Jens Axboe wrote:
>>>>> Ah yes, good point. We used to have this notion of 'fs' request, don't
>>>>> think we do anymore. Because it really should just be:
>>>>
>>>> A fs request is a !passthrough request.
>>>
>>> Right, that's the condition I made below too.
>>>
>>>>> if (zoned && (op & REQ_OP_WRITE) && fs_request)
>>>>>          return NULL;
>>>>>
>>>>> for that condition imho. I guess we could make it:
>>>>>
>>>>> if (zoned && (op & REQ_OP_WRITE) && !(op & REQ_OP_DRV_OUT))
>>>>>          return NULL;
>>>>
>>>> Well, the only opcodes we do zone locking for is REQ_OP_WRITE and
>>>> REQ_OP_WRITE_ZEROES.  So this should be:
>>>>
>>>> 	if (zoned && (op == REQ_OP_WRITE || op == REQ_OP_WRITE_ZEROES))
>>>> 		return NULL;
>>>
>>> I'd rather just make it explicit and use that. Pankaj, do you want
>>> to spin a v2 with that?
>>
>> It would be nice to reuse the bio equivalent of
>> blk_req_needs_zone_write_lock().
>>
>> The test would be:
>>
>> 	if (bio_needs_zone_write_locking())
>> 		return NULL;
> 
> Note that we could also add a "IS_ENABLED(CONFIG_BLK_DEV_ZONED) &&" to
> the condition or stub the helper to have this hunk disappear for the
> !CONFIG_BLK_DEV_ZONED case.

Indeed, that would be nice.
Damien Le Moal Sept. 27, 2022, 11:35 p.m. UTC | #16
On 9/28/22 08:12, Jens Axboe wrote:
> On 9/27/22 5:07 PM, Damien Le Moal wrote:
>> On 9/28/22 01:52, Jens Axboe wrote:
>>> On 9/27/22 10:51 AM, Christoph Hellwig wrote:
>>>> On Tue, Sep 27, 2022 at 10:04:19AM -0600, Jens Axboe wrote:
>>>>> Ah yes, good point. We used to have this notion of 'fs' request, don't
>>>>> think we do anymore. Because it really should just be:
>>>>
>>>> A fs request is a !passthrough request.
>>>
>>> Right, that's the condition I made below too.
>>>
>>>>> if (zoned && (op & REQ_OP_WRITE) && fs_request)
>>>>>          return NULL;
>>>>>
>>>>> for that condition imho. I guess we could make it:
>>>>>
>>>>> if (zoned && (op & REQ_OP_WRITE) && !(op & REQ_OP_DRV_OUT))
>>>>>          return NULL;
>>>>
>>>> Well, the only opcodes we do zone locking for is REQ_OP_WRITE and
>>>> REQ_OP_WRITE_ZEROES.  So this should be:
>>>>
>>>> 	if (zoned && (op == REQ_OP_WRITE || op == REQ_OP_WRITE_ZEROES))
>>>> 		return NULL;
>>>
>>> I'd rather just make it explicit and use that. Pankaj, do you want
>>> to spin a v2 with that?
>>
>> It would be nice to reuse the bio equivalent of
>> blk_req_needs_zone_write_lock().
>>
>> The test would be:
>>
>> 	if (bio_needs_zone_write_locking())
>> 		return NULL;
>>
>> With something like:
>>
>> static inline bool bio_needs_zone_write_locking()
>> {
>> 	 if (!bdev_is_zoned(bio->bi_bdev))
>> 		return false;
>>
>> 	switch (bio_op(bio)) {
>>         case REQ_OP_WRITE_ZEROES:
>>
>>         case REQ_OP_WRITE:
>>
>>                 return true;
>>         default:
>>
>>                 return false;
>>
>>         }
>> }
> 
> I'd be fine with that (using a shared helper), but let's please just
> make it:
> 
> static inline bool op_is_zoned_write(bdev, op)
> {
> 	 if (!bdev_is_zoned(bio->bi_bdev))
> 		return false;
> 
> 	return op == REQ_OP_WRITE_ZEROES || op == REQ_OP_WRITE;

Works for me. Nit: should have REQ_OP_WRITE first as that is the most
common case.

> }
> 
> and avoid a switch for this basic case and name it a bit more logically
> too. Not married to the above name, but the helper should not imply
> anything about zone locking. That's for the caller.

blk_req_needs_zone_write_lock() would become:

bool blk_req_needs_zone_write_lock(struct request *rq)

{

	if (blk_rq_is_passthrough(rq))
		return false;

	if (!rq->q->disk->seq_zones_wlock)
		return false;

        return op_is_zoned_write(rq->q->disk->part0, req_op(rq));


}
Pankaj Raghav Sept. 28, 2022, 11:57 a.m. UTC | #17
On 2022-09-27 18:52, Jens Axboe wrote:
>> Well, the only opcodes we do zone locking for is REQ_OP_WRITE and
>> REQ_OP_WRITE_ZEROES.  So this should be:
>>
>> 	if (zoned && (op == REQ_OP_WRITE || op == REQ_OP_WRITE_ZEROES))
>> 		return NULL;
> 
> I'd rather just make it explicit and use that. Pankaj, do you want
> to spin a v2 with that?
> 

Based on all the suggestions:

block: adapt blk_mq_plug() to not plug for writes that require a zone lock

The current implementation of blk_mq_plug() disables plugging for all
operations that involves a transfer to the device as we just check if
the last bit in op_is_write() function.

Modify blk_mq_plug() to disable plugging only for REQ_OP_WRITE and
REQ_OP_WRITE_ZEROS as they might require a zone lock.

Suggested-by: Christoph Hellwig <hch@infradead.org>
Suggested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Signed-off-by: Pankaj Raghav <p.raghav@samsung.com>

diff --git a/block/blk-mq.h b/block/blk-mq.h
index 8ca453ac243d..297289cdd521 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -312,7 +312,8 @@ static inline void blk_mq_clear_mq_map(struct blk_mq_queue_map *qmap)
 static inline struct blk_plug *blk_mq_plug( struct bio *bio)
 {
 	/* Zoned block device write operation case: do not plug the BIO */
-	if (bdev_is_zoned(bio->bi_bdev) && op_is_write(bio_op(bio)))
+	if (IS_ENABLED(CONFIG_BLK_DEV_ZONED) &&
+	    blk_op_is_zoned_write(bio->bi_bdev, bio_op(bio)))
 		return NULL;

 	/*
diff --git a/block/blk-zoned.c b/block/blk-zoned.c
index a264621d4905..fa926424edb6 100644
--- a/block/blk-zoned.c
+++ b/block/blk-zoned.c
@@ -63,13 +63,10 @@ bool blk_req_needs_zone_write_lock(struct request *rq)
 	if (!rq->q->disk->seq_zones_wlock)
 		return false;

-	switch (req_op(rq)) {
-	case REQ_OP_WRITE_ZEROES:
-	case REQ_OP_WRITE:
+	if (blk_op_is_zoned_write(rq->q->disk->part0, req_op(rq)))
 		return blk_rq_zone_is_seq(rq);
-	default:
-		return false;
-	}
+
+	return false;
 }
 EXPORT_SYMBOL_GPL(blk_req_needs_zone_write_lock);

diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 8038c5fbde40..719025028fa4 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1300,6 +1300,15 @@ static inline bool bdev_is_zoned(struct block_device *bdev)
 	return false;
 }

+static inline bool blk_op_is_zoned_write(struct block_device *bdev,
+					 blk_opf_t op)
+{
+	if (!bdev_is_zoned(bdev))
+		return false;
+
+	return op == REQ_OP_WRITE || op == REQ_OP_WRITE_ZEROES;
+}
+
 static inline sector_t bdev_zone_sectors(struct block_device *bdev)
 {
 	struct request_queue *q = bdev_get_queue(bdev);


Does this look fine?
Damien Le Moal Sept. 28, 2022, 10:19 p.m. UTC | #18
On 2022/09/28 20:57, Pankaj Raghav wrote:
> On 2022-09-27 18:52, Jens Axboe wrote:
>>> Well, the only opcodes we do zone locking for is REQ_OP_WRITE and
>>> REQ_OP_WRITE_ZEROES.  So this should be:
>>>
>>> 	if (zoned && (op == REQ_OP_WRITE || op == REQ_OP_WRITE_ZEROES))
>>> 		return NULL;
>>
>> I'd rather just make it explicit and use that. Pankaj, do you want
>> to spin a v2 with that?
>>
> 
> Based on all the suggestions:
> 
> block: adapt blk_mq_plug() to not plug for writes that require a zone lock
> 
> The current implementation of blk_mq_plug() disables plugging for all
> operations that involves a transfer to the device as we just check if
> the last bit in op_is_write() function.
> 
> Modify blk_mq_plug() to disable plugging only for REQ_OP_WRITE and
> REQ_OP_WRITE_ZEROS as they might require a zone lock.
> 
> Suggested-by: Christoph Hellwig <hch@infradead.org>
> Suggested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
> Signed-off-by: Pankaj Raghav <p.raghav@samsung.com>
> 
> diff --git a/block/blk-mq.h b/block/blk-mq.h
> index 8ca453ac243d..297289cdd521 100644
> --- a/block/blk-mq.h
> +++ b/block/blk-mq.h
> @@ -312,7 +312,8 @@ static inline void blk_mq_clear_mq_map(struct blk_mq_queue_map *qmap)
>  static inline struct blk_plug *blk_mq_plug( struct bio *bio)
>  {
>  	/* Zoned block device write operation case: do not plug the BIO */
> -	if (bdev_is_zoned(bio->bi_bdev) && op_is_write(bio_op(bio)))
> +	if (IS_ENABLED(CONFIG_BLK_DEV_ZONED) &&
> +	    blk_op_is_zoned_write(bio->bi_bdev, bio_op(bio)))
>  		return NULL;
> 
>  	/*
> diff --git a/block/blk-zoned.c b/block/blk-zoned.c
> index a264621d4905..fa926424edb6 100644
> --- a/block/blk-zoned.c
> +++ b/block/blk-zoned.c
> @@ -63,13 +63,10 @@ bool blk_req_needs_zone_write_lock(struct request *rq)
>  	if (!rq->q->disk->seq_zones_wlock)
>  		return false;
> 
> -	switch (req_op(rq)) {
> -	case REQ_OP_WRITE_ZEROES:
> -	case REQ_OP_WRITE:
> +	if (blk_op_is_zoned_write(rq->q->disk->part0, req_op(rq)))
>  		return blk_rq_zone_is_seq(rq);
> -	default:
> -		return false;
> -	}
> +
> +	return false;
>  }
>  EXPORT_SYMBOL_GPL(blk_req_needs_zone_write_lock);
> 
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index 8038c5fbde40..719025028fa4 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -1300,6 +1300,15 @@ static inline bool bdev_is_zoned(struct block_device *bdev)
>  	return false;
>  }
> 
> +static inline bool blk_op_is_zoned_write(struct block_device *bdev,
> +					 blk_opf_t op)

To be consistent, I personally would prefer bdev_op_is_zoned_write() as the name
here (the first argument is a bdev).

> +{
> +	if (!bdev_is_zoned(bdev))
> +		return false;
> +
> +	return op == REQ_OP_WRITE || op == REQ_OP_WRITE_ZEROES;
> +}
> +
>  static inline sector_t bdev_zone_sectors(struct block_device *bdev)
>  {
>  	struct request_queue *q = bdev_get_queue(bdev);
> 
> 
> Does this look fine?
diff mbox series

Patch

diff --git a/block/blk-mq.h b/block/blk-mq.h
index 8ca453ac243d..005343df64ca 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -305,14 +305,15 @@  static inline void blk_mq_clear_mq_map(struct blk_mq_queue_map *qmap)
  * change can cause write BIO failures with zoned block devices as these
  * require sequential write patterns to zones. Prevent this from happening by
  * ignoring the plug state of a BIO issuing context if it is for a zoned block
- * device and the BIO to plug is a write operation.
+ * device and the BIO to plug is not a read operation.
+ *
  *
  * Return current->plug if the bio can be plugged and NULL otherwise
  */
 static inline struct blk_plug *blk_mq_plug( struct bio *bio)
 {
-	/* Zoned block device write operation case: do not plug the BIO */
-	if (bdev_is_zoned(bio->bi_bdev) && op_is_write(bio_op(bio)))
+	/* Allow plugging only for read operations in zoned block devices */
+	if (bdev_is_zoned(bio->bi_bdev) && bio_op(bio) != REQ_OP_READ)
 		return NULL;
 
 	/*