blk-mq: insert rq with DONTPREP to hctx dispatch list when requeue

Message ID 1549863665-1691-1-git-send-email-jianchao.w.wang@oracle.com
State New

Commit Message

jianchao.wang Feb. 11, 2019, 5:41 a.m. UTC
When a request is requeued with RQF_DONTPREP set, it already contains
driver-specific data, so insert it into the hctx dispatch list to
prevent any merge. Taking SCSI as an example, here is the trace event
log (taken with no I/O scheduler, because RQF_STARTED would prevent
merging):

   kworker/0:1H-339   [000] ...1  2037.209289: block_rq_insert: 8,0 R 4096 () 32768 + 8 [kworker/0:1H]
scsi_inert_test-1987  [000] ....  2037.220465: block_bio_queue: 8,0 R 32776 + 8 [scsi_inert_test]
scsi_inert_test-1987  [000] ...2  2037.220466: block_bio_backmerge: 8,0 R 32776 + 8 [scsi_inert_test]
   kworker/0:1H-339   [000] ....  2047.220913: block_rq_issue: 8,0 R 8192 () 32768 + 16 [kworker/0:1H]
scsi_inert_test-1996  [000] ..s1  2047.221007: block_rq_complete: 8,0 R () 32768 + 8 [0]
scsi_inert_test-1996  [000] .Ns1  2047.221045: block_rq_requeue: 8,0 R () 32776 + 8 [0]
   kworker/0:1H-339   [000] ...1  2047.221054: block_rq_insert: 8,0 R 4096 () 32776 + 8 [kworker/0:1H]
   kworker/0:1H-339   [000] ...1  2047.221056: block_rq_issue: 8,0 R 4096 () 32776 + 8 [kworker/0:1H]
scsi_inert_test-1986  [000] ..s1  2047.221119: block_rq_complete: 8,0 R () 32776 + 8 [0]

(32768 + 8) was requeued by scsi_queue_insert and had RQF_DONTPREP set.
It was then merged with (32776 + 8) and issued. Because of RQF_DONTPREP,
the sdb still described only (32768 + 8), so only that part was
completed. Luckily, scsi_io_completion detected this and requeued the
remaining part, so no data was corrupted. However, the requeue of
(32776 + 8) is not expected.

Signed-off-by: Jianchao Wang <jianchao.w.wang@oracle.com>
---
 block/blk-mq.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)
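
The sector and byte counts in the trace above can be checked with a small user-space sketch. It is not kernel code: struct toy_rq and the toy_* helpers are hypothetical names made up for illustration, and the only assumption taken from the trace is that "32768 + 8" means 8 sectors of 512 bytes starting at sector 32768.

```c
#include <assert.h>

#define TOY_SECTOR_SIZE 512u	/* the trace counts 512-byte sectors */

/* Hypothetical stand-in for a request: start sector plus length in sectors. */
struct toy_rq {
	unsigned long long sector;
	unsigned int nr_sectors;
};

/* Size of the request in bytes, as printed by block_rq_insert/issue. */
static unsigned int toy_bytes(struct toy_rq rq)
{
	return rq.nr_sectors * TOY_SECTOR_SIZE;
}

/* Back-merge: 'tail' must start exactly where 'rq' ends. */
static struct toy_rq toy_back_merge(struct toy_rq rq, struct toy_rq tail)
{
	assert(rq.sector + rq.nr_sectors == tail.sector);
	rq.nr_sectors += tail.nr_sectors;
	return rq;
}

/* Complete 'done_bytes' from the front and return what is left over. */
static struct toy_rq toy_partial_complete(struct toy_rq rq, unsigned int done_bytes)
{
	unsigned int done = done_bytes / TOY_SECTOR_SIZE;

	assert(done <= rq.nr_sectors);
	rq.sector += done;
	rq.nr_sectors -= done;
	return rq;
}
```

Merging {32768, 8} with {32776, 8} gives 32768 + 16 (8192 bytes). Completing only the 4096 bytes that were prepared before the merge leaves 32776 + 8, which is the range reported by block_rq_requeue.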

Comments

Jens Axboe Feb. 11, 2019, 3:59 p.m. UTC | #1
On 2/10/19 10:41 PM, Jianchao Wang wrote:
> When requeue, if RQF_DONTPREP, rq has contained some driver
> specific data, so insert it to hctx dispatch list to avoid any
> merge. Take scsi as example, here is the trace event log (no
> io scheduler, because RQF_STARTED would prevent merging),
> 
>    kworker/0:1H-339   [000] ...1  2037.209289: block_rq_insert: 8,0 R 4096 () 32768 + 8 [kworker/0:1H]
> scsi_inert_test-1987  [000] ....  2037.220465: block_bio_queue: 8,0 R 32776 + 8 [scsi_inert_test]
> scsi_inert_test-1987  [000] ...2  2037.220466: block_bio_backmerge: 8,0 R 32776 + 8 [scsi_inert_test]
>    kworker/0:1H-339   [000] ....  2047.220913: block_rq_issue: 8,0 R 8192 () 32768 + 16 [kworker/0:1H]
> scsi_inert_test-1996  [000] ..s1  2047.221007: block_rq_complete: 8,0 R () 32768 + 8 [0]
> scsi_inert_test-1996  [000] .Ns1  2047.221045: block_rq_requeue: 8,0 R () 32776 + 8 [0]
>    kworker/0:1H-339   [000] ...1  2047.221054: block_rq_insert: 8,0 R 4096 () 32776 + 8 [kworker/0:1H]
>    kworker/0:1H-339   [000] ...1  2047.221056: block_rq_issue: 8,0 R 4096 () 32776 + 8 [kworker/0:1H]
> scsi_inert_test-1986  [000] ..s1  2047.221119: block_rq_complete: 8,0 R () 32776 + 8 [0]
> 
> (32768 + 8) was requeued by scsi_queue_insert and had RQF_DONTPREP.
> Then it was merged with (32776 + 8) and issued. Due to RQF_DONTPREP,
> the sdb only contained the part of (32768 + 8), then only that part
> was completed. The lucky thing was that scsi_io_completion detected
> it and requeued the remaining part. So we didn't get corrupted data.
> However, the requeue of (32776 + 8) is not expected.

Good catch, but how about something like this? Makes it more integrated,
I think that's cleaner.


diff --git a/block/blk-mq.c b/block/blk-mq.c
index 44d471ff8754..4c26bbb4330f 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -737,12 +737,20 @@ static void blk_mq_requeue_work(struct work_struct *work)
 	spin_unlock_irq(&q->requeue_lock);
 
 	list_for_each_entry_safe(rq, next, &rq_list, queuelist) {
-		if (!(rq->rq_flags & RQF_SOFTBARRIER))
+		/*
+		 * If RQF_DONTPREP is set, rq may contain some driver
+		 * specific data. Insert it to hctx dispatch list to avoid
+		 * any merge.
+		 */
+		if (!(rq->rq_flags & (RQF_SOFTBARRIER | RQF_DONTPREP)))
 			continue;
 
 		rq->rq_flags &= ~RQF_SOFTBARRIER;
 		list_del_init(&rq->queuelist);
-		blk_mq_sched_insert_request(rq, true, false, false);
+		if (rq->rq_flags & RQF_SOFTBARRIER)
+			blk_mq_sched_insert_request(rq, true, false, false);
+		else
+			blk_mq_request_bypass_insert(rq, false);
 	}
 
 	while (!list_empty(&rq_list)) {
Jens Axboe Feb. 11, 2019, 11:15 p.m. UTC | #2
On 2/11/19 8:59 AM, Jens Axboe wrote:
> On 2/10/19 10:41 PM, Jianchao Wang wrote:
>> When requeue, if RQF_DONTPREP, rq has contained some driver
>> specific data, so insert it to hctx dispatch list to avoid any
>> merge. Take scsi as example, here is the trace event log (no
>> io scheduler, because RQF_STARTED would prevent merging),
>>
>>    kworker/0:1H-339   [000] ...1  2037.209289: block_rq_insert: 8,0 R 4096 () 32768 + 8 [kworker/0:1H]
>> scsi_inert_test-1987  [000] ....  2037.220465: block_bio_queue: 8,0 R 32776 + 8 [scsi_inert_test]
>> scsi_inert_test-1987  [000] ...2  2037.220466: block_bio_backmerge: 8,0 R 32776 + 8 [scsi_inert_test]
>>    kworker/0:1H-339   [000] ....  2047.220913: block_rq_issue: 8,0 R 8192 () 32768 + 16 [kworker/0:1H]
>> scsi_inert_test-1996  [000] ..s1  2047.221007: block_rq_complete: 8,0 R () 32768 + 8 [0]
>> scsi_inert_test-1996  [000] .Ns1  2047.221045: block_rq_requeue: 8,0 R () 32776 + 8 [0]
>>    kworker/0:1H-339   [000] ...1  2047.221054: block_rq_insert: 8,0 R 4096 () 32776 + 8 [kworker/0:1H]
>>    kworker/0:1H-339   [000] ...1  2047.221056: block_rq_issue: 8,0 R 4096 () 32776 + 8 [kworker/0:1H]
>> scsi_inert_test-1986  [000] ..s1  2047.221119: block_rq_complete: 8,0 R () 32776 + 8 [0]
>>
>> (32768 + 8) was requeued by scsi_queue_insert and had RQF_DONTPREP.
>> Then it was merged with (32776 + 8) and issued. Due to RQF_DONTPREP,
>> the sdb only contained the part of (32768 + 8), then only that part
>> was completed. The lucky thing was that scsi_io_completion detected
>> it and requeued the remaining part. So we didn't get corrupted data.
>> However, the requeue of (32776 + 8) is not expected.
> 
> Good catch, but how about something like this? Makes it more integrated,
> I think that's cleaner.

This is probably better (and safer):


diff --git a/block/blk-mq.c b/block/blk-mq.c
index 8f5b533764ca..b3908eb3881c 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -737,12 +737,21 @@ static void blk_mq_requeue_work(struct work_struct *work)
 	spin_unlock_irq(&q->requeue_lock);
 
 	list_for_each_entry_safe(rq, next, &rq_list, queuelist) {
-		if (!(rq->rq_flags & RQF_SOFTBARRIER))
+		if (!(rq->rq_flags & (RQF_SOFTBARRIER | RQF_DONTPREP)))
 			continue;
 
 		rq->rq_flags &= ~RQF_SOFTBARRIER;
 		list_del_init(&rq->queuelist);
-		blk_mq_sched_insert_request(rq, true, false, false);
+
+		/*
+		 * If RQF_DONTPREP is set, rq may contain some driver
+		 * specific data. Insert it to hctx dispatch list to avoid
+		 * any merge.
+		 */
+		if (rq->rq_flags & RQF_DONTPREP)
+			blk_mq_sched_insert_request(rq, true, false, false);
+		else
+			blk_mq_request_bypass_insert(rq, false);
 	}
 
 	while (!list_empty(&rq_list)) {
Jens Axboe Feb. 11, 2019, 11:20 p.m. UTC | #3
On 2/11/19 4:15 PM, Jens Axboe wrote:
> On 2/11/19 8:59 AM, Jens Axboe wrote:
>> On 2/10/19 10:41 PM, Jianchao Wang wrote:
>>> When requeue, if RQF_DONTPREP, rq has contained some driver
>>> specific data, so insert it to hctx dispatch list to avoid any
>>> merge. Take scsi as example, here is the trace event log (no
>>> io scheduler, because RQF_STARTED would prevent merging),
>>>
>>>    kworker/0:1H-339   [000] ...1  2037.209289: block_rq_insert: 8,0 R 4096 () 32768 + 8 [kworker/0:1H]
>>> scsi_inert_test-1987  [000] ....  2037.220465: block_bio_queue: 8,0 R 32776 + 8 [scsi_inert_test]
>>> scsi_inert_test-1987  [000] ...2  2037.220466: block_bio_backmerge: 8,0 R 32776 + 8 [scsi_inert_test]
>>>    kworker/0:1H-339   [000] ....  2047.220913: block_rq_issue: 8,0 R 8192 () 32768 + 16 [kworker/0:1H]
>>> scsi_inert_test-1996  [000] ..s1  2047.221007: block_rq_complete: 8,0 R () 32768 + 8 [0]
>>> scsi_inert_test-1996  [000] .Ns1  2047.221045: block_rq_requeue: 8,0 R () 32776 + 8 [0]
>>>    kworker/0:1H-339   [000] ...1  2047.221054: block_rq_insert: 8,0 R 4096 () 32776 + 8 [kworker/0:1H]
>>>    kworker/0:1H-339   [000] ...1  2047.221056: block_rq_issue: 8,0 R 4096 () 32776 + 8 [kworker/0:1H]
>>> scsi_inert_test-1986  [000] ..s1  2047.221119: block_rq_complete: 8,0 R () 32776 + 8 [0]
>>>
>>> (32768 + 8) was requeued by scsi_queue_insert and had RQF_DONTPREP.
>>> Then it was merged with (32776 + 8) and issued. Due to RQF_DONTPREP,
>>> the sdb only contained the part of (32768 + 8), then only that part
>>> was completed. The lucky thing was that scsi_io_completion detected
>>> it and requeued the remaining part. So we didn't get corrupted data.
>>> However, the requeue of (32776 + 8) is not expected.
>>
>> Good catch, but how about something like this? Makes it more integrated,
>> I think that's cleaner.
> 
> This is probably better (and safer):

Here's the one I wanted to send, not a half done one. Maybe I'll be
luckier this time around?


diff --git a/block/blk-mq.c b/block/blk-mq.c
index 8f5b533764ca..35e6aba52808 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -737,12 +737,21 @@ static void blk_mq_requeue_work(struct work_struct *work)
 	spin_unlock_irq(&q->requeue_lock);
 
 	list_for_each_entry_safe(rq, next, &rq_list, queuelist) {
-		if (!(rq->rq_flags & RQF_SOFTBARRIER))
+		if (!(rq->rq_flags & (RQF_SOFTBARRIER | RQF_DONTPREP)))
 			continue;
 
 		rq->rq_flags &= ~RQF_SOFTBARRIER;
 		list_del_init(&rq->queuelist);
-		blk_mq_sched_insert_request(rq, true, false, false);
+
+		/*
+		 * If RQF_DONTPREP is set, rq may contain some driver
+		 * specific data. Insert it to hctx dispatch list to avoid
+		 * any merge.
+		 */
+		if (rq->rq_flags & RQF_DONTPREP)
+			blk_mq_request_bypass_insert(rq, false);
+		else
+			blk_mq_sched_insert_request(rq, true, false, false);
 	}
 
 	while (!list_empty(&rq_list)) {
jianchao.wang Feb. 12, 2019, 1:56 a.m. UTC | #4
Hi Jens

Thanks for your kind response.

On 2/12/19 7:20 AM, Jens Axboe wrote:
> On 2/11/19 4:15 PM, Jens Axboe wrote:
>> On 2/11/19 8:59 AM, Jens Axboe wrote:
>>> On 2/10/19 10:41 PM, Jianchao Wang wrote:
>>>> When requeue, if RQF_DONTPREP, rq has contained some driver
>>>> specific data, so insert it to hctx dispatch list to avoid any
>>>> merge. Take scsi as example, here is the trace event log (no
>>>> io scheduler, because RQF_STARTED would prevent merging),
>>>>
>>>>    kworker/0:1H-339   [000] ...1  2037.209289: block_rq_insert: 8,0 R 4096 () 32768 + 8 [kworker/0:1H]
>>>> scsi_inert_test-1987  [000] ....  2037.220465: block_bio_queue: 8,0 R 32776 + 8 [scsi_inert_test]
>>>> scsi_inert_test-1987  [000] ...2  2037.220466: block_bio_backmerge: 8,0 R 32776 + 8 [scsi_inert_test]
>>>>    kworker/0:1H-339   [000] ....  2047.220913: block_rq_issue: 8,0 R 8192 () 32768 + 16 [kworker/0:1H]
>>>> scsi_inert_test-1996  [000] ..s1  2047.221007: block_rq_complete: 8,0 R () 32768 + 8 [0]
>>>> scsi_inert_test-1996  [000] .Ns1  2047.221045: block_rq_requeue: 8,0 R () 32776 + 8 [0]
>>>>    kworker/0:1H-339   [000] ...1  2047.221054: block_rq_insert: 8,0 R 4096 () 32776 + 8 [kworker/0:1H]
>>>>    kworker/0:1H-339   [000] ...1  2047.221056: block_rq_issue: 8,0 R 4096 () 32776 + 8 [kworker/0:1H]
>>>> scsi_inert_test-1986  [000] ..s1  2047.221119: block_rq_complete: 8,0 R () 32776 + 8 [0]
>>>>
>>>> (32768 + 8) was requeued by scsi_queue_insert and had RQF_DONTPREP.
>>>> Then it was merged with (32776 + 8) and issued. Due to RQF_DONTPREP,
>>>> the sdb only contained the part of (32768 + 8), then only that part
>>>> was completed. The lucky thing was that scsi_io_completion detected
>>>> it and requeued the remaining part. So we didn't get corrupted data.
>>>> However, the requeue of (32776 + 8) is not expected.
>>>
>>> Good catch, but how about something like this? Makes it more integrated,
>>> I think that's cleaner.
>>
>> This is probably better (and safer):
> 
> Here's the one I wanted to send, not a half done one. Maybe I'll be
> luckier this time around?
> 
> 
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 8f5b533764ca..35e6aba52808 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -737,12 +737,21 @@ static void blk_mq_requeue_work(struct work_struct *work)
>  	spin_unlock_irq(&q->requeue_lock);
>  
>  	list_for_each_entry_safe(rq, next, &rq_list, queuelist) {
> -		if (!(rq->rq_flags & RQF_SOFTBARRIER))
> +		if (!(rq->rq_flags & (RQF_SOFTBARRIER | RQF_DONTPREP)))
>  			continue;
>  
>  		rq->rq_flags &= ~RQF_SOFTBARRIER;
>  		list_del_init(&rq->queuelist);
> -		blk_mq_sched_insert_request(rq, true, false, false);
> +
> +		/*
> +		 * If RQF_DONTPREP is set, rq may contain some driver
> +		 * specific data. Insert it to hctx dispatch list to avoid
> +		 * any merge.
> +		 */
> +		if (rq->rq_flags & RQF_DONTPREP)
> +			blk_mq_request_bypass_insert(rq, false);
> +		else
> +			blk_mq_sched_insert_request(rq, true, false, false);
>  	}
>  
>  	while (!list_empty(&rq_list)) {
> 

This version tests OK.
I will send out a V2 based on it.

Thanks
Jianchao
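
The three loop bodies proposed in comments #1 to #3 differ only in the order of the flag tests, which is easy to misread in diff form. The following is a rough user-space model of that decision logic, not kernel code: the TOY_* flag bits have placeholder values, and the toy_* names and the enum are hypothetical. Each function takes the request flags by value and returns which insert path the corresponding hunk would take.

```c
#include <assert.h>

/* Illustrative flag bits; the values are placeholders, not the kernel's. */
#define TOY_SOFTBARRIER	(1u << 0)
#define TOY_DONTPREP	(1u << 1)

enum toy_action {
	TOY_LEAVE_ON_LIST,	/* 'continue': left for the later loop */
	TOY_SCHED_INSERT,	/* models blk_mq_sched_insert_request() */
	TOY_BYPASS_INSERT,	/* models blk_mq_request_bypass_insert() */
};

/* Comment #1: SOFTBARRIER is tested after it has been cleared. */
static enum toy_action toy_requeue_v1(unsigned int flags)
{
	if (!(flags & (TOY_SOFTBARRIER | TOY_DONTPREP)))
		return TOY_LEAVE_ON_LIST;
	flags &= ~TOY_SOFTBARRIER;
	if (flags & TOY_SOFTBARRIER)	/* can never be true here */
		return TOY_SCHED_INSERT;
	return TOY_BYPASS_INSERT;
}

/* Comment #2: the two branches are swapped relative to the code comment. */
static enum toy_action toy_requeue_v2(unsigned int flags)
{
	if (!(flags & (TOY_SOFTBARRIER | TOY_DONTPREP)))
		return TOY_LEAVE_ON_LIST;
	flags &= ~TOY_SOFTBARRIER;
	if (flags & TOY_DONTPREP)
		return TOY_SCHED_INSERT;
	return TOY_BYPASS_INSERT;
}

/* Comment #3: DONTPREP requests bypass, the rest go through the scheduler. */
static enum toy_action toy_requeue_v3(unsigned int flags)
{
	if (!(flags & (TOY_SOFTBARRIER | TOY_DONTPREP)))
		return TOY_LEAVE_ON_LIST;
	flags &= ~TOY_SOFTBARRIER;
	if (flags & TOY_DONTPREP)
		return TOY_BYPASS_INSERT;
	return TOY_SCHED_INSERT;
}
```

In this model, a request with only TOY_SOFTBARRIER set takes the bypass path under the first two variants and the scheduler path under the third. A request with TOY_DONTPREP set takes the bypass path under the first and third variants, but not under the second.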

Patch

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 8f5b533..2d93eb5 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -737,6 +737,18 @@  static void blk_mq_requeue_work(struct work_struct *work)
 	spin_unlock_irq(&q->requeue_lock);
 
 	list_for_each_entry_safe(rq, next, &rq_list, queuelist) {
+		/*
+		 * If RQF_DONTPREP, rq has contained some driver specific
+		 * data, so insert it to hctx dispatch list to avoid any
+		 * merge.
+		 */
+		if (rq->rq_flags & RQF_DONTPREP) {
+			rq->rq_flags &= ~RQF_SOFTBARRIER;
+			list_del_init(&rq->queuelist);
+			blk_mq_request_bypass_insert(rq, false);
+			continue;
+		}
+
 		if (!(rq->rq_flags & RQF_SOFTBARRIER))
 			continue;