[V2] blk-mq: insert rq with DONTPREP to hctx dispatch list when requeue

Message ID 1549936585-1702-1-git-send-email-jianchao.w.wang@oracle.com
State New

Commit Message

jianchao.wang Feb. 12, 2019, 1:56 a.m. UTC
When a request is requeued with RQF_DONTPREP set, it already contains
driver-specific data, so insert it into the hctx dispatch list to
avoid any merging. Taking SCSI as an example, here is the trace event
log (captured with no I/O scheduler, because with a scheduler
RQF_STARTED would prevent merging):

   kworker/0:1H-339   [000] ...1  2037.209289: block_rq_insert: 8,0 R 4096 () 32768 + 8 [kworker/0:1H]
scsi_inert_test-1987  [000] ....  2037.220465: block_bio_queue: 8,0 R 32776 + 8 [scsi_inert_test]
scsi_inert_test-1987  [000] ...2  2037.220466: block_bio_backmerge: 8,0 R 32776 + 8 [scsi_inert_test]
   kworker/0:1H-339   [000] ....  2047.220913: block_rq_issue: 8,0 R 8192 () 32768 + 16 [kworker/0:1H]
scsi_inert_test-1996  [000] ..s1  2047.221007: block_rq_complete: 8,0 R () 32768 + 8 [0]
scsi_inert_test-1996  [000] .Ns1  2047.221045: block_rq_requeue: 8,0 R () 32776 + 8 [0]
   kworker/0:1H-339   [000] ...1  2047.221054: block_rq_insert: 8,0 R 4096 () 32776 + 8 [kworker/0:1H]
   kworker/0:1H-339   [000] ...1  2047.221056: block_rq_issue: 8,0 R 4096 () 32776 + 8 [kworker/0:1H]
scsi_inert_test-1986  [000] ..s1  2047.221119: block_rq_complete: 8,0 R () 32776 + 8 [0]

(32768 + 8) was requeued by scsi_queue_insert and had RQF_DONTPREP set.
It was then merged with (32776 + 8) and issued. Because of RQF_DONTPREP,
the sdb only covered (32768 + 8), so only that part was completed.
Luckily, scsi_io_completion detected this and requeued the remaining
part, so no data was corrupted. However, the requeue of (32776 + 8) is
unexpected.

Suggested-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jianchao Wang <jianchao.w.wang@oracle.com>
---
V2:
 - refactor the code based on Jens' suggestion

 block/blk-mq.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

Comments

Jens Axboe Feb. 12, 2019, 2:51 a.m. UTC | #1
On 2/11/19 6:56 PM, Jianchao Wang wrote:
> When requeue, if RQF_DONTPREP, rq has contained some driver
> specific data, so insert it to hctx dispatch list to avoid any
> merge. Take scsi as example, here is the trace event log (no
> io scheduler, because RQF_STARTED would prevent merging),
> 
>    kworker/0:1H-339   [000] ...1  2037.209289: block_rq_insert: 8,0 R 4096 () 32768 + 8 [kworker/0:1H]
> scsi_inert_test-1987  [000] ....  2037.220465: block_bio_queue: 8,0 R 32776 + 8 [scsi_inert_test]
> scsi_inert_test-1987  [000] ...2  2037.220466: block_bio_backmerge: 8,0 R 32776 + 8 [scsi_inert_test]
>    kworker/0:1H-339   [000] ....  2047.220913: block_rq_issue: 8,0 R 8192 () 32768 + 16 [kworker/0:1H]
> scsi_inert_test-1996  [000] ..s1  2047.221007: block_rq_complete: 8,0 R () 32768 + 8 [0]
> scsi_inert_test-1996  [000] .Ns1  2047.221045: block_rq_requeue: 8,0 R () 32776 + 8 [0]
>    kworker/0:1H-339   [000] ...1  2047.221054: block_rq_insert: 8,0 R 4096 () 32776 + 8 [kworker/0:1H]
>    kworker/0:1H-339   [000] ...1  2047.221056: block_rq_issue: 8,0 R 4096 () 32776 + 8 [kworker/0:1H]
> scsi_inert_test-1986  [000] ..s1  2047.221119: block_rq_complete: 8,0 R () 32776 + 8 [0]
> 
> (32768 + 8) was requeued by scsi_queue_insert and had RQF_DONTPREP.
> Then it was merged with (32776 + 8) and issued. Due to RQF_DONTPREP,
> the sdb only contained the part of (32768 + 8), then only that part
> was completed. The lucky thing was that scsi_io_completion detected
> it and requeued the remaining part. So we didn't get corrupted data.
> However, the requeue of (32776 + 8) is not expected.

Looks good to me, I'll add this for 5.0.
Ming Lei Feb. 15, 2019, 2 a.m. UTC | #2
On Tue, Feb 12, 2019 at 09:56:25AM +0800, Jianchao Wang wrote:
> When requeue, if RQF_DONTPREP, rq has contained some driver
> specific data, so insert it to hctx dispatch list to avoid any
> merge. Take scsi as example, here is the trace event log (no
> io scheduler, because RQF_STARTED would prevent merging),
> 
>    kworker/0:1H-339   [000] ...1  2037.209289: block_rq_insert: 8,0 R 4096 () 32768 + 8 [kworker/0:1H]
> scsi_inert_test-1987  [000] ....  2037.220465: block_bio_queue: 8,0 R 32776 + 8 [scsi_inert_test]
> scsi_inert_test-1987  [000] ...2  2037.220466: block_bio_backmerge: 8,0 R 32776 + 8 [scsi_inert_test]
>    kworker/0:1H-339   [000] ....  2047.220913: block_rq_issue: 8,0 R 8192 () 32768 + 16 [kworker/0:1H]
> scsi_inert_test-1996  [000] ..s1  2047.221007: block_rq_complete: 8,0 R () 32768 + 8 [0]
> scsi_inert_test-1996  [000] .Ns1  2047.221045: block_rq_requeue: 8,0 R () 32776 + 8 [0]
>    kworker/0:1H-339   [000] ...1  2047.221054: block_rq_insert: 8,0 R 4096 () 32776 + 8 [kworker/0:1H]
>    kworker/0:1H-339   [000] ...1  2047.221056: block_rq_issue: 8,0 R 4096 () 32776 + 8 [kworker/0:1H]
> scsi_inert_test-1986  [000] ..s1  2047.221119: block_rq_complete: 8,0 R () 32776 + 8 [0]
> 
> (32768 + 8) was requeued by scsi_queue_insert and had RQF_DONTPREP.

scsi_mq_requeue_cmd() uninitializes the request before requeuing, but
__scsi_queue_insert() doesn't do that.


> Then it was merged with (32776 + 8) and issued. Due to RQF_DONTPREP,
> the sdb only contained the part of (32768 + 8), then only that part
> was completed. The lucky thing was that scsi_io_completion detected
> it and requeued the remaining part. So we didn't get corrupted data.
> However, the requeue of (32776 + 8) is not expected.
> 
> Suggested-by: Jens Axboe <axboe@kernel.dk>
> Signed-off-by: Jianchao Wang <jianchao.w.wang@oracle.com>
> ---
> V2:
>  - refactor the code based on Jens' suggestion
> 
>  block/blk-mq.c | 12 ++++++++++--
>  1 file changed, 10 insertions(+), 2 deletions(-)
> 
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 8f5b533..9437a5e 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -737,12 +737,20 @@ static void blk_mq_requeue_work(struct work_struct *work)
>  	spin_unlock_irq(&q->requeue_lock);
>  
>  	list_for_each_entry_safe(rq, next, &rq_list, queuelist) {
> -		if (!(rq->rq_flags & RQF_SOFTBARRIER))
> +		if (!(rq->rq_flags & (RQF_SOFTBARRIER | RQF_DONTPREP)))
>  			continue;
>  
>  		rq->rq_flags &= ~RQF_SOFTBARRIER;
>  		list_del_init(&rq->queuelist);
> -		blk_mq_sched_insert_request(rq, true, false, false);
> +		/*
> +		 * If RQF_DONTPREP, rq has contained some driver specific
> +		 * data, so insert it to hctx dispatch list to avoid any
> +		 * merge.
> +		 */
> +		if (rq->rq_flags & RQF_DONTPREP)
> +			blk_mq_request_bypass_insert(rq, false);
> +		else
> +			blk_mq_sched_insert_request(rq, true, false, false);
>  	}

Suppose it is a WRITE request to a zoned device; this way might break
the write ordering.


Thanks,
Ming
jianchao.wang Feb. 15, 2019, 2:34 a.m. UTC | #3
Hi Ming

Thanks for your kind response.

On 2/15/19 10:00 AM, Ming Lei wrote:
> On Tue, Feb 12, 2019 at 09:56:25AM +0800, Jianchao Wang wrote:
>> When requeue, if RQF_DONTPREP, rq has contained some driver
>> specific data, so insert it to hctx dispatch list to avoid any
>> merge. Take scsi as example, here is the trace event log (no
>> io scheduler, because RQF_STARTED would prevent merging),
>>
>>    kworker/0:1H-339   [000] ...1  2037.209289: block_rq_insert: 8,0 R 4096 () 32768 + 8 [kworker/0:1H]
>> scsi_inert_test-1987  [000] ....  2037.220465: block_bio_queue: 8,0 R 32776 + 8 [scsi_inert_test]
>> scsi_inert_test-1987  [000] ...2  2037.220466: block_bio_backmerge: 8,0 R 32776 + 8 [scsi_inert_test]
>>    kworker/0:1H-339   [000] ....  2047.220913: block_rq_issue: 8,0 R 8192 () 32768 + 16 [kworker/0:1H]
>> scsi_inert_test-1996  [000] ..s1  2047.221007: block_rq_complete: 8,0 R () 32768 + 8 [0]
>> scsi_inert_test-1996  [000] .Ns1  2047.221045: block_rq_requeue: 8,0 R () 32776 + 8 [0]
>>    kworker/0:1H-339   [000] ...1  2047.221054: block_rq_insert: 8,0 R 4096 () 32776 + 8 [kworker/0:1H]
>>    kworker/0:1H-339   [000] ...1  2047.221056: block_rq_issue: 8,0 R 4096 () 32776 + 8 [kworker/0:1H]
>> scsi_inert_test-1986  [000] ..s1  2047.221119: block_rq_complete: 8,0 R () 32776 + 8 [0]
>>
>> (32768 + 8) was requeued by scsi_queue_insert and had RQF_DONTPREP.
> 
> scsi_mq_requeue_cmd() does uninit the request before requeuing, but
> __scsi_queue_insert doesn't do that.

Yes.
The SCSI layer uses both of them.

> 
> 
>> Then it was merged with (32776 + 8) and issued. Due to RQF_DONTPREP,
>> the sdb only contained the part of (32768 + 8), then only that part
>> was completed. The lucky thing was that scsi_io_completion detected
>> it and requeued the remaining part. So we didn't get corrupted data.
>> However, the requeue of (32776 + 8) is not expected.
>>
>> Suggested-by: Jens Axboe <axboe@kernel.dk>
>> Signed-off-by: Jianchao Wang <jianchao.w.wang@oracle.com>
>> ---
>> V2:
>>  - refactor the code based on Jens' suggestion
>>
>>  block/blk-mq.c | 12 ++++++++++--
>>  1 file changed, 10 insertions(+), 2 deletions(-)
>>
>> diff --git a/block/blk-mq.c b/block/blk-mq.c
>> index 8f5b533..9437a5e 100644
>> --- a/block/blk-mq.c
>> +++ b/block/blk-mq.c
>> @@ -737,12 +737,20 @@ static void blk_mq_requeue_work(struct work_struct *work)
>>  	spin_unlock_irq(&q->requeue_lock);
>>  
>>  	list_for_each_entry_safe(rq, next, &rq_list, queuelist) {
>> -		if (!(rq->rq_flags & RQF_SOFTBARRIER))
>> +		if (!(rq->rq_flags & (RQF_SOFTBARRIER | RQF_DONTPREP)))
>>  			continue;
>>  
>>  		rq->rq_flags &= ~RQF_SOFTBARRIER;
>>  		list_del_init(&rq->queuelist);
>> -		blk_mq_sched_insert_request(rq, true, false, false);
>> +		/*
>> +		 * If RQF_DONTPREP, rq has contained some driver specific
>> +		 * data, so insert it to hctx dispatch list to avoid any
>> +		 * merge.
>> +		 */
>> +		if (rq->rq_flags & RQF_DONTPREP)
>> +			blk_mq_request_bypass_insert(rq, false);
>> +		else
>> +			blk_mq_sched_insert_request(rq, true, false, false);
>>  	}
> 
> Suppose it is one WRITE request to zone device, this way might break
> the order.

I'm not sure about this.
Since the request has been dispatched, it should hold the zone write lock.
Also, mq-deadline doesn't have a .requeue_request, so the zone write lock
wouldn't be released during requeue.

IMO, this requeue action is similar to what blk_mq_dispatch_rq_list() does.
The latter also issues requests to the underlying driver and requeues them
on the dispatch list if it gets BLK_STS_RESOURCE or BLK_STS_DEV_RESOURCE.

In addition, RQF_STARTED is set by the I/O scheduler's .dispatch_request,
and it stops merging because RQF_NOMERGE_FLAGS contains it.

Thanks
Jianchao
Ming Lei Feb. 15, 2019, 3:14 a.m. UTC | #4
On Fri, Feb 15, 2019 at 10:34:39AM +0800, jianchao.wang wrote:
> Hi Ming
> 
> Thanks for your kindly response.
> 
> On 2/15/19 10:00 AM, Ming Lei wrote:
> > On Tue, Feb 12, 2019 at 09:56:25AM +0800, Jianchao Wang wrote:
> >> When requeue, if RQF_DONTPREP, rq has contained some driver
> >> specific data, so insert it to hctx dispatch list to avoid any
> >> merge. Take scsi as example, here is the trace event log (no
> >> io scheduler, because RQF_STARTED would prevent merging),
> >>
> >>    kworker/0:1H-339   [000] ...1  2037.209289: block_rq_insert: 8,0 R 4096 () 32768 + 8 [kworker/0:1H]
> >> scsi_inert_test-1987  [000] ....  2037.220465: block_bio_queue: 8,0 R 32776 + 8 [scsi_inert_test]
> >> scsi_inert_test-1987  [000] ...2  2037.220466: block_bio_backmerge: 8,0 R 32776 + 8 [scsi_inert_test]
> >>    kworker/0:1H-339   [000] ....  2047.220913: block_rq_issue: 8,0 R 8192 () 32768 + 16 [kworker/0:1H]
> >> scsi_inert_test-1996  [000] ..s1  2047.221007: block_rq_complete: 8,0 R () 32768 + 8 [0]
> >> scsi_inert_test-1996  [000] .Ns1  2047.221045: block_rq_requeue: 8,0 R () 32776 + 8 [0]
> >>    kworker/0:1H-339   [000] ...1  2047.221054: block_rq_insert: 8,0 R 4096 () 32776 + 8 [kworker/0:1H]
> >>    kworker/0:1H-339   [000] ...1  2047.221056: block_rq_issue: 8,0 R 4096 () 32776 + 8 [kworker/0:1H]
> >> scsi_inert_test-1986  [000] ..s1  2047.221119: block_rq_complete: 8,0 R () 32776 + 8 [0]
> >>
> >> (32768 + 8) was requeued by scsi_queue_insert and had RQF_DONTPREP.
> > 
> > scsi_mq_requeue_cmd() does uninit the request before requeuing, but
> > __scsi_queue_insert doesn't do that.
> 
> Yes.
> scsi layer use both of them.
> 
> > 
> > 
> >> Then it was merged with (32776 + 8) and issued. Due to RQF_DONTPREP,
> >> the sdb only contained the part of (32768 + 8), then only that part
> >> was completed. The lucky thing was that scsi_io_completion detected
> >> it and requeued the remaining part. So we didn't get corrupted data.
> >> However, the requeue of (32776 + 8) is not expected.
> >>
> >> Suggested-by: Jens Axboe <axboe@kernel.dk>
> >> Signed-off-by: Jianchao Wang <jianchao.w.wang@oracle.com>
> >> ---
> >> V2:
> >>  - refactor the code based on Jens' suggestion
> >>
> >>  block/blk-mq.c | 12 ++++++++++--
> >>  1 file changed, 10 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/block/blk-mq.c b/block/blk-mq.c
> >> index 8f5b533..9437a5e 100644
> >> --- a/block/blk-mq.c
> >> +++ b/block/blk-mq.c
> >> @@ -737,12 +737,20 @@ static void blk_mq_requeue_work(struct work_struct *work)
> >>  	spin_unlock_irq(&q->requeue_lock);
> >>  
> >>  	list_for_each_entry_safe(rq, next, &rq_list, queuelist) {
> >> -		if (!(rq->rq_flags & RQF_SOFTBARRIER))
> >> +		if (!(rq->rq_flags & (RQF_SOFTBARRIER | RQF_DONTPREP)))
> >>  			continue;
> >>  
> >>  		rq->rq_flags &= ~RQF_SOFTBARRIER;
> >>  		list_del_init(&rq->queuelist);
> >> -		blk_mq_sched_insert_request(rq, true, false, false);
> >> +		/*
> >> +		 * If RQF_DONTPREP, rq has contained some driver specific
> >> +		 * data, so insert it to hctx dispatch list to avoid any
> >> +		 * merge.
> >> +		 */
> >> +		if (rq->rq_flags & RQF_DONTPREP)
> >> +			blk_mq_request_bypass_insert(rq, false);
> >> +		else
> >> +			blk_mq_sched_insert_request(rq, true, false, false);
> >>  	}
> > 
> > Suppose it is one WRITE request to zone device, this way might break
> > the order.
> 
> I'm not sure about this.
> Since the request is dispatched, it should hold and zone write lock.
> And also mq-deadline doesn't have a .requeue_request, zone write lock
> wouldn't be released during requeue.

You are right, it looks like I misunderstood the zone write lock. Sorry
for the noise.

> 
> IMO, this requeue action is similar to what blk_mq_dispatch_rq_list() does.
> The latter also issues requests to the underlying driver and requeues them
> on the dispatch list if it gets BLK_STS_RESOURCE or BLK_STS_DEV_RESOURCE.
> 
> And in addition, RQF_STARTED is set by io scheduler .dispatch_request and
> it could be stop merging as RQF_NOMERGE_FLAGS contains it. 

Yes, that is correct.

Then another question is:

Why not always requeue requests this way, so that it can be simplified
into one code path?

1) In the legacy block code, blk_requeue_request() doesn't insert the
request into the scheduler queue; it simply puts the request on
q->queue_head.

2) blk_mq_requeue_request() is basically run from completion context to
handle very unusual cases (partial completion, error, timeout, ...),
and there should be no benefit in scheduling/merging a requeued request.

3) RQF_DONTPREP is like a driver-private flag; before this patch it was
read and written only by drivers.

Thanks,
Ming
jianchao.wang Feb. 15, 2019, 3:41 a.m. UTC | #5
On 2/15/19 11:14 AM, Ming Lei wrote:
> On Fri, Feb 15, 2019 at 10:34:39AM +0800, jianchao.wang wrote:
>> Hi Ming
>>
>> Thanks for your kindly response.
>>
>> On 2/15/19 10:00 AM, Ming Lei wrote:
>>> On Tue, Feb 12, 2019 at 09:56:25AM +0800, Jianchao Wang wrote:
>>>> When requeue, if RQF_DONTPREP, rq has contained some driver
>>>> specific data, so insert it to hctx dispatch list to avoid any
>>>> merge. Take scsi as example, here is the trace event log (no
>>>> io scheduler, because RQF_STARTED would prevent merging),
>>>>
>>>>    kworker/0:1H-339   [000] ...1  2037.209289: block_rq_insert: 8,0 R 4096 () 32768 + 8 [kworker/0:1H]
>>>> scsi_inert_test-1987  [000] ....  2037.220465: block_bio_queue: 8,0 R 32776 + 8 [scsi_inert_test]
>>>> scsi_inert_test-1987  [000] ...2  2037.220466: block_bio_backmerge: 8,0 R 32776 + 8 [scsi_inert_test]
>>>>    kworker/0:1H-339   [000] ....  2047.220913: block_rq_issue: 8,0 R 8192 () 32768 + 16 [kworker/0:1H]
>>>> scsi_inert_test-1996  [000] ..s1  2047.221007: block_rq_complete: 8,0 R () 32768 + 8 [0]
>>>> scsi_inert_test-1996  [000] .Ns1  2047.221045: block_rq_requeue: 8,0 R () 32776 + 8 [0]
>>>>    kworker/0:1H-339   [000] ...1  2047.221054: block_rq_insert: 8,0 R 4096 () 32776 + 8 [kworker/0:1H]
>>>>    kworker/0:1H-339   [000] ...1  2047.221056: block_rq_issue: 8,0 R 4096 () 32776 + 8 [kworker/0:1H]
>>>> scsi_inert_test-1986  [000] ..s1  2047.221119: block_rq_complete: 8,0 R () 32776 + 8 [0]
>>>>
>>>> (32768 + 8) was requeued by scsi_queue_insert and had RQF_DONTPREP.
>>>
>>> scsi_mq_requeue_cmd() does uninit the request before requeuing, but
>>> __scsi_queue_insert doesn't do that.
>>
>> Yes.
>> scsi layer use both of them.
>>
>>>
>>>
>>>> Then it was merged with (32776 + 8) and issued. Due to RQF_DONTPREP,
>>>> the sdb only contained the part of (32768 + 8), then only that part
>>>> was completed. The lucky thing was that scsi_io_completion detected
>>>> it and requeued the remaining part. So we didn't get corrupted data.
>>>> However, the requeue of (32776 + 8) is not expected.
>>>>
>>>> Suggested-by: Jens Axboe <axboe@kernel.dk>
>>>> Signed-off-by: Jianchao Wang <jianchao.w.wang@oracle.com>
>>>> ---
>>>> V2:
>>>>  - refactor the code based on Jens' suggestion
>>>>
>>>>  block/blk-mq.c | 12 ++++++++++--
>>>>  1 file changed, 10 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/block/blk-mq.c b/block/blk-mq.c
>>>> index 8f5b533..9437a5e 100644
>>>> --- a/block/blk-mq.c
>>>> +++ b/block/blk-mq.c
>>>> @@ -737,12 +737,20 @@ static void blk_mq_requeue_work(struct work_struct *work)
>>>>  	spin_unlock_irq(&q->requeue_lock);
>>>>  
>>>>  	list_for_each_entry_safe(rq, next, &rq_list, queuelist) {
>>>> -		if (!(rq->rq_flags & RQF_SOFTBARRIER))
>>>> +		if (!(rq->rq_flags & (RQF_SOFTBARRIER | RQF_DONTPREP)))
>>>>  			continue;
>>>>  
>>>>  		rq->rq_flags &= ~RQF_SOFTBARRIER;
>>>>  		list_del_init(&rq->queuelist);
>>>> -		blk_mq_sched_insert_request(rq, true, false, false);
>>>> +		/*
>>>> +		 * If RQF_DONTPREP, rq has contained some driver specific
>>>> +		 * data, so insert it to hctx dispatch list to avoid any
>>>> +		 * merge.
>>>> +		 */
>>>> +		if (rq->rq_flags & RQF_DONTPREP)
>>>> +			blk_mq_request_bypass_insert(rq, false);
>>>> +		else
>>>> +			blk_mq_sched_insert_request(rq, true, false, false);
>>>>  	}
>>>
>>> Suppose it is one WRITE request to zone device, this way might break
>>> the order.
>>
>> I'm not sure about this.
>> Since the request is dispatched, it should hold and zone write lock.
>> And also mq-deadline doesn't have a .requeue_request, zone write lock
>> wouldn't be released during requeue.
> 
> You are right, looks I misunderstood the zone write lock, sorry for
> the noise.
> 
>>
>> IMO, this requeue action is similar to what blk_mq_dispatch_rq_list() does.
>> The latter also issues requests to the underlying driver and requeues them
>> on the dispatch list if it gets BLK_STS_RESOURCE or BLK_STS_DEV_RESOURCE.
>>
>> And in addition, RQF_STARTED is set by io scheduler .dispatch_request and
>> it could be stop merging as RQF_NOMERGE_FLAGS contains it. 
> 
> Yes, that is correct.
> 
> Then another question is:
> 
> Why don't always requeue request in this way so that it can be simplified
> into one code path?
> 
> 1) in block legacy code, blk_requeue_request() doesn't insert the
> request into scheduler queue, and simply put the request into
> q->queue_head.
> 
> 2) blk_mq_requeue_request() is basically run from completion context for
> handling very unusual cases(partial completion, error, timeout, ...),
> and there shouldn't have benefit to schedule/merge requeued request.

Actually, I was also confused about the questions above when I looked into the code before :)

> 
> 3) RQF_DONTPREP is like a driver private flag, and read/write by driver
> only before this patch.

Yes, indeed.
And it tells us there is driver specific data in the request.

Thanks
Jianchao

Patch

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 8f5b533..9437a5e 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -737,12 +737,20 @@  static void blk_mq_requeue_work(struct work_struct *work)
 	spin_unlock_irq(&q->requeue_lock);
 
 	list_for_each_entry_safe(rq, next, &rq_list, queuelist) {
-		if (!(rq->rq_flags & RQF_SOFTBARRIER))
+		if (!(rq->rq_flags & (RQF_SOFTBARRIER | RQF_DONTPREP)))
 			continue;
 
 		rq->rq_flags &= ~RQF_SOFTBARRIER;
 		list_del_init(&rq->queuelist);
-		blk_mq_sched_insert_request(rq, true, false, false);
+		/*
+		 * If RQF_DONTPREP, rq has contained some driver specific
+		 * data, so insert it to hctx dispatch list to avoid any
+		 * merge.
+		 */
+		if (rq->rq_flags & RQF_DONTPREP)
+			blk_mq_request_bypass_insert(rq, false);
+		else
+			blk_mq_sched_insert_request(rq, true, false, false);
 	}
 
 	while (!list_empty(&rq_list)) {