diff mbox series

[v4] SCSI: fix queue cleanup race before scsi_requeue_run_queue is done

Message ID 1565667334-22071-1-git-send-email-zhengbin13@huawei.com (mailing list archive)
State Superseded
Headers show
Series [v4] SCSI: fix queue cleanup race before scsi_requeue_run_queue is done | expand

Commit Message

Zheng Bin Aug. 13, 2019, 3:35 a.m. UTC
KASAN reports a use-after-free in 4.19-stable,
which won't happen after commit 47cdee29ef9d
("block: move blk_exit_queue into __blk_release_queue").
However, backport this patch to 4.19-stable will be a lot of work and
the risk is great. Moreover, we should make sure scsi_requeue_run_queue
is done before blk_cleanup_queue in master too.

BUG: KASAN: use-after-free in dd_has_work+0x50/0xe8
Read of size 8 at addr ffff808b57c6f168 by task kworker/53:1H/6910

CPU: 53 PID: 6910 Comm: kworker/53:1H Kdump: loaded Tainted: G
Hardware name: Huawei TaiShan 2280 /BC11SPCD, BIOS 1.59 01/31/2019
Workqueue: kblockd scsi_requeue_run_queue
Call trace:
 dump_backtrace+0x0/0x270
 show_stack+0x24/0x30
 dump_stack+0xb4/0xe4
 print_address_description+0x68/0x278
 kasan_report+0x204/0x330
 __asan_load8+0x88/0xb0
 dd_has_work+0x50/0xe8
 blk_mq_run_hw_queue+0x19c/0x218
 blk_mq_run_hw_queues+0x7c/0xb0
 scsi_run_queue+0x3ec/0x520
 scsi_requeue_run_queue+0x2c/0x38
 process_one_work+0x2e4/0x6d8
 worker_thread+0x6c/0x6a8
 kthread+0x1b4/0x1c0
 ret_from_fork+0x10/0x18

Allocated by task 46843:
 kasan_kmalloc+0xe0/0x190
 kmem_cache_alloc_node_trace+0x10c/0x258
 dd_init_queue+0x68/0x190
 blk_mq_init_sched+0x1cc/0x300
 elevator_init_mq+0x90/0xe0
 blk_mq_init_allocated_queue+0x700/0x728
 blk_mq_init_queue+0x48/0x90
 scsi_mq_alloc_queue+0x34/0xb0
 scsi_alloc_sdev+0x340/0x530
 scsi_probe_and_add_lun+0x46c/0x1260
 __scsi_scan_target+0x1b8/0x7b0
 scsi_scan_target+0x140/0x150
 fc_scsi_scan_rport+0x164/0x178 [scsi_transport_fc]
 process_one_work+0x2e4/0x6d8
 worker_thread+0x6c/0x6a8
 kthread+0x1b4/0x1c0
 ret_from_fork+0x10/0x18

Freed by task 46843:
 __kasan_slab_free+0x120/0x228
 kasan_slab_free+0x10/0x18
 kfree+0x88/0x218
 dd_exit_queue+0x5c/0x78
 blk_mq_exit_sched+0x104/0x130
 elevator_exit+0xa8/0xc8
 blk_exit_queue+0x48/0x78
 blk_cleanup_queue+0x170/0x248
 __scsi_remove_device+0x84/0x1b0
 scsi_probe_and_add_lun+0xd00/0x1260
 __scsi_scan_target+0x1b8/0x7b0
 scsi_scan_target+0x140/0x150
 fc_scsi_scan_rport+0x164/0x178 [scsi_transport_fc]
 process_one_work+0x2e4/0x6d8
 worker_thread+0x6c/0x6a8
 kthread+0x1b4/0x1c0
 ret_from_fork+0x10/0x18

Fixes: 8dc765d438f1 ("SCSI: fix queue cleanup race before queue initialization is done")
Signed-off-by: zhengbin <zhengbin13@huawei.com>
---
 drivers/scsi/scsi_lib.c   | 15 +++++++++++----
 drivers/scsi/scsi_sysfs.c |  3 ++-
 2 files changed, 13 insertions(+), 5 deletions(-)

--
2.7.4

Comments

Bart Van Assche Aug. 15, 2019, 3:40 p.m. UTC | #1
On 8/14/19 6:50 PM, zhengbin (A) wrote:
> ping

Sending a "ping" after 46 hours is way too soon and only causes 
irritation. What would help though is more information about how this 
patch has been tested. Does it e.g. survive the srp tests in blktests?

Thanks,

Bart.
Zheng Bin Aug. 16, 2019, 12:53 a.m. UTC | #2
On 2019/8/15 23:40, Bart Van Assche wrote:

> On 8/14/19 6:50 PM, zhengbin (A) wrote:
>> ping
>
> Sending a "ping" after 46 hours is way too soon and only causes irritation. What would help though is more information about how this patch has been tested. Does it e.g. survive the srp tests in blktests?

Very sorry for the noise, it won't happen again next time.

I have done abnormal operations while running sdtester, it is ok within this patch after a day.

Without this, it will cause oops because of use-after-free after about 4 hours.

>
> Thanks,
>
> Bart.
>
> .
>
Bart Van Assche Aug. 16, 2019, 5:09 p.m. UTC | #3
On 8/12/19 8:35 PM, zhengbin wrote:
> KASAN reports a use-after-free in 4.19-stable,
> which won't happen after commit 47cdee29ef9d
> ("block: move blk_exit_queue into __blk_release_queue").

This patch doesn't apply on top of kernel v4.19.67:

$ git am ~/\[PATCH\ v4\]\ SCSI\:\ fix\ queue\ cleanup\ race\ before\ 
scsi_requeue_run_queue\ is\ done.eml
Applying: SCSI: fix queue cleanup race before scsi_requeue_run_queue is done
error: patch failed: drivers/scsi/scsi_lib.c:531
error: drivers/scsi/scsi_lib.c: patch does not apply
Patch failed at 0001 SCSI: fix queue cleanup race before 
scsi_requeue_run_queue is done

$ patch -p1 < ~/\[PATCH\ v4\]\ SCSI\:\ fix\ queue\ cleanup\ race\ 
before\ scsi_requeue_run_queue\ is\ done.eml
(Stripping trailing CRs from patch; use --binary to disable.)
patching file drivers/scsi/scsi_lib.c
Hunk #1 succeeded at 548 with fuzz 1 (offset 17 lines).
Hunk #2 FAILED at 618.
1 out of 2 hunks FAILED -- saving rejects to file 
drivers/scsi/scsi_lib.c.rej
(Stripping trailing CRs from patch; use --binary to disable.)
patching file drivers/scsi/scsi_sysfs.c
Hunk #1 succeeded at 1392 (offset -18 lines).

Bart.
Zheng Bin Aug. 19, 2019, 1:59 a.m. UTC | #4
On 2019/8/17 1:09, Bart Van Assche wrote:
> On 8/12/19 8:35 PM, zhengbin wrote:
>> KASAN reports a use-after-free in 4.19-stable,
>> which won't happen after commit 47cdee29ef9d
>> ("block: move blk_exit_queue into __blk_release_queue").
>
> This patch doesn't apply on top of kernel v4.19.67:
>
> $ git am ~/\[PATCH\ v4\]\ SCSI\:\ fix\ queue\ cleanup\ race\ before\ scsi_requeue_run_queue\ is\ done.eml
> Applying: SCSI: fix queue cleanup race before scsi_requeue_run_queue is done
> error: patch failed: drivers/scsi/scsi_lib.c:531
> error: drivers/scsi/scsi_lib.c: patch does not apply
> Patch failed at 0001 SCSI: fix queue cleanup race before scsi_requeue_run_queue is done
>
> $ patch -p1 < ~/\[PATCH\ v4\]\ SCSI\:\ fix\ queue\ cleanup\ race\ before\ scsi_requeue_run_queue\ is\ done.eml
> (Stripping trailing CRs from patch; use --binary to disable.)
> patching file drivers/scsi/scsi_lib.c
> Hunk #1 succeeded at 548 with fuzz 1 (offset 17 lines).
> Hunk #2 FAILED at 618.
> 1 out of 2 hunks FAILED -- saving rejects to file drivers/scsi/scsi_lib.c.rej
> (Stripping trailing CRs from patch; use --binary to disable.)
> patching file drivers/scsi/scsi_sysfs.c
> Hunk #1 succeeded at 1392 (offset -18 lines).

This patch is for master, not for 4.19-stable. In SCSI, master has only blk-mq, while 4.19-stable has blk-sq(single queue) & mq.

I will send a patch for 4.19-stable later.

>
> Bart.
>
> .
>
diff mbox series

Patch

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 11e64b5..b3503f9 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -531,6 +531,11 @@  void scsi_requeue_run_queue(struct work_struct *work)
 	sdev = container_of(work, struct scsi_device, requeue_work);
 	q = sdev->request_queue;
 	scsi_run_queue(q);
+	/*
+	 * need to put q_usage_counter which
+	 * is got in scsi_end_request.
+	 */
+	percpu_ref_put(&q->q_usage_counter);
 }

 void scsi_run_host_queues(struct Scsi_Host *shost)
@@ -613,12 +618,14 @@  static bool scsi_end_request(struct request *req, blk_status_t error,
 	__blk_mq_end_request(req, error);

 	if (scsi_target(sdev)->single_lun ||
-	    !list_empty(&sdev->host->starved_list))
-		kblockd_schedule_work(&sdev->requeue_work);
-	else
+	    !list_empty(&sdev->host->starved_list)) {
+		if (!kblockd_schedule_work(&sdev->requeue_work))
+			percpu_ref_put(&q->q_usage_counter);
+	} else {
 		blk_mq_run_hw_queues(q, true);
+		percpu_ref_put(&q->q_usage_counter);
+	}

-	percpu_ref_put(&q->q_usage_counter);
 	return false;
 }

diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
index 64c96c7..03e3567 100644
--- a/drivers/scsi/scsi_sysfs.c
+++ b/drivers/scsi/scsi_sysfs.c
@@ -1410,7 +1410,8 @@  void __scsi_remove_device(struct scsi_device *sdev)
 	mutex_unlock(&sdev->state_mutex);

 	blk_cleanup_queue(sdev->request_queue);
-	cancel_work_sync(&sdev->requeue_work);
+	if (cancel_work_sync(&sdev->requeue_work))
+		percpu_ref_put(&sdev->request_queue->q_usage_counter);

 	if (sdev->host->hostt->slave_destroy)
 		sdev->host->hostt->slave_destroy(sdev);