diff mbox series

[07/11] scsi: ufs: Fix a deadlock in the error handler

Message ID 20211110004440.3389311-8-bvanassche@acm.org (mailing list archive)
State Superseded
Headers show
Series UFS patches for kernel v5.17 | expand

Commit Message

Bart Van Assche Nov. 10, 2021, 12:44 a.m. UTC
The following deadlock has been observed on a test setup:
* All tags allocated.
* The SCSI error handler calls ufshcd_eh_host_reset_handler()
* ufshcd_eh_host_reset_handler() queues work that calls ufshcd_err_handler()
* ufshcd_err_handler() locks up as follows:

Workqueue: ufs_eh_wq_0 ufshcd_err_handler.cfi_jt
Call trace:
 __switch_to+0x298/0x5d8
 __schedule+0x6cc/0xa94
 schedule+0x12c/0x298
 blk_mq_get_tag+0x210/0x480
 __blk_mq_alloc_request+0x1c8/0x284
 blk_get_request+0x74/0x134
 ufshcd_exec_dev_cmd+0x68/0x640
 ufshcd_verify_dev_init+0x68/0x35c
 ufshcd_probe_hba+0x12c/0x1cb8
 ufshcd_host_reset_and_restore+0x88/0x254
 ufshcd_reset_and_restore+0xd0/0x354
 ufshcd_err_handler+0x408/0xc58
 process_one_work+0x24c/0x66c
 worker_thread+0x3e8/0xa4c
 kthread+0x150/0x1b4
 ret_from_fork+0x10/0x30

Fix this lockup by making ufshcd_exec_dev_cmd() allocate a reserved
request.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 drivers/scsi/ufs/ufshcd.c | 14 +++++---------
 1 file changed, 5 insertions(+), 9 deletions(-)

Comments

Christoph Hellwig Nov. 10, 2021, 6:42 a.m. UTC | #1
Hi Bart,

as pointed out last time: LLDDS have no business directy allocating
tags.  Please work with Hannes and John on the proper APIs, as metnioned
last time as well.
Avri Altman Nov. 11, 2021, 7:33 a.m. UTC | #2
> The following deadlock has been observed on a test setup:
> * All tags allocated.
> * The SCSI error handler calls ufshcd_eh_host_reset_handler()
> * ufshcd_eh_host_reset_handler() queues work that calls ufshcd_err_handler()
> * ufshcd_err_handler() locks up as follows:
> 
> Workqueue: ufs_eh_wq_0 ufshcd_err_handler.cfi_jt Call trace:
>  __switch_to+0x298/0x5d8
>  __schedule+0x6cc/0xa94
>  schedule+0x12c/0x298
>  blk_mq_get_tag+0x210/0x480
>  __blk_mq_alloc_request+0x1c8/0x284
>  blk_get_request+0x74/0x134
>  ufshcd_exec_dev_cmd+0x68/0x640
>  ufshcd_verify_dev_init+0x68/0x35c
>  ufshcd_probe_hba+0x12c/0x1cb8
>  ufshcd_host_reset_and_restore+0x88/0x254
>  ufshcd_reset_and_restore+0xd0/0x354
>  ufshcd_err_handler+0x408/0xc58
>  process_one_work+0x24c/0x66c
>  worker_thread+0x3e8/0xa4c
>  kthread+0x150/0x1b4
>  ret_from_fork+0x10/0x30
> 
> Fix this lockup by making ufshcd_exec_dev_cmd() allocate a reserved request.
> 
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> ---
>  drivers/scsi/ufs/ufshcd.c | 14 +++++---------
>  1 file changed, 5 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c index
> 8400d8e9a6f7..8f5640647054 100644
> --- a/drivers/scsi/ufs/ufshcd.c
> +++ b/drivers/scsi/ufs/ufshcd.c
> @@ -2940,12 +2940,7 @@ static int ufshcd_exec_dev_cmd(struct ufs_hba
> *hba,
> 
>         down_read(&hba->clk_scaling_lock);
> 
> -       /*
> -        * Get free slot, sleep if slots are unavailable.
> -        * Even though we use wait_event() which sleeps indefinitely,
> -        * the maximum wait time is bounded by SCSI request timeout.
> -        */
> -       req = blk_get_request(q, REQ_OP_DRV_OUT, 0);
> +       req = blk_mq_alloc_request(q, REQ_OP_DRV_OUT,
> + BLK_MQ_REQ_RESERVED);
>         if (IS_ERR(req)) {
>                 err = PTR_ERR(req);
>                 goto out_unlock;
> @@ -8152,7 +8147,8 @@ static struct scsi_host_template
> ufshcd_driver_template = {
>         .this_id                = -1,
>         .sg_tablesize           = SG_ALL,
>         .cmd_per_lun            = UFSHCD_CMD_PER_LUN,
UFSHCD_CMD_PER_LUN needs to be < 32 as well?

Thanks,
Avri
> -       .can_queue              = UFSHCD_CAN_QUEUE,
> +       .can_queue              = UFSHCD_CAN_QUEUE - 1,
> +       .reserved_tags          = 1,
>         .max_segment_size       = PRDT_DATA_BYTE_COUNT_MAX,
>         .max_host_blocked       = 1,
>         .track_queue_depth      = 1,
> @@ -9513,8 +9509,8 @@ int ufshcd_init(struct ufs_hba *hba, void __iomem
> *mmio_base, unsigned int irq)
>         /* Configure LRB */
>         ufshcd_host_memory_configure(hba);
> 
> -       host->can_queue = hba->nutrs;
> -       host->cmd_per_lun = hba->nutrs;
> +       host->can_queue = hba->nutrs - 1;
> +       host->cmd_per_lun = hba->nutrs - 1;
>         host->max_id = UFSHCD_MAX_ID;
>         host->max_lun = UFS_MAX_LUNS;
>         host->max_channel = UFSHCD_MAX_CHANNEL;
Bart Van Assche Nov. 15, 2021, 6:28 p.m. UTC | #3
On 11/9/21 10:42 PM, Christoph Hellwig wrote:
> as pointed out last time: LLDDS have no business directy allocating
> tags.  Please work with Hannes and John on the proper APIs, as metnioned
> last time as well.

Hi Christoph,

I will do this.

Thanks,

Bart.
Bart Van Assche Nov. 15, 2021, 6:29 p.m. UTC | #4
On 11/10/21 11:33 PM, Avri Altman wrote:
>> @@ -8152,7 +8147,8 @@ static struct scsi_host_template
>> ufshcd_driver_template = {
>>          .this_id                = -1,
>>          .sg_tablesize           = SG_ALL,
>>          .cmd_per_lun            = UFSHCD_CMD_PER_LUN,
 >
> UFSHCD_CMD_PER_LUN needs to be < 32 as well?

I will make that change although that change should not affect the 
behavior of the UFS driver since the SCSI core limits cmd_per_lun to 
host->can_queue.

Thanks,

Bart.
diff mbox series

Patch

diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
index 8400d8e9a6f7..8f5640647054 100644
--- a/drivers/scsi/ufs/ufshcd.c
+++ b/drivers/scsi/ufs/ufshcd.c
@@ -2940,12 +2940,7 @@  static int ufshcd_exec_dev_cmd(struct ufs_hba *hba,
 
 	down_read(&hba->clk_scaling_lock);
 
-	/*
-	 * Get free slot, sleep if slots are unavailable.
-	 * Even though we use wait_event() which sleeps indefinitely,
-	 * the maximum wait time is bounded by SCSI request timeout.
-	 */
-	req = blk_get_request(q, REQ_OP_DRV_OUT, 0);
+	req = blk_mq_alloc_request(q, REQ_OP_DRV_OUT, BLK_MQ_REQ_RESERVED);
 	if (IS_ERR(req)) {
 		err = PTR_ERR(req);
 		goto out_unlock;
@@ -8152,7 +8147,8 @@  static struct scsi_host_template ufshcd_driver_template = {
 	.this_id		= -1,
 	.sg_tablesize		= SG_ALL,
 	.cmd_per_lun		= UFSHCD_CMD_PER_LUN,
-	.can_queue		= UFSHCD_CAN_QUEUE,
+	.can_queue		= UFSHCD_CAN_QUEUE - 1,
+	.reserved_tags		= 1,
 	.max_segment_size	= PRDT_DATA_BYTE_COUNT_MAX,
 	.max_host_blocked	= 1,
 	.track_queue_depth	= 1,
@@ -9513,8 +9509,8 @@  int ufshcd_init(struct ufs_hba *hba, void __iomem *mmio_base, unsigned int irq)
 	/* Configure LRB */
 	ufshcd_host_memory_configure(hba);
 
-	host->can_queue = hba->nutrs;
-	host->cmd_per_lun = hba->nutrs;
+	host->can_queue = hba->nutrs - 1;
+	host->cmd_per_lun = hba->nutrs - 1;
 	host->max_id = UFSHCD_MAX_ID;
 	host->max_lun = UFS_MAX_LUNS;
 	host->max_channel = UFSHCD_MAX_CHANNEL;