diff mbox series

qla2xxx: Use correct number of vectors for online CPUs

Message ID 20190111174047.22994-1-hmadhani@marvell.com (mailing list archive)
State Mainlined
Commit f0783d43dde4bb349fcd667df0afabbdbab8b477
Headers show
Series qla2xxx: Use correct number of vectors for online CPUs | expand

Commit Message

Himanshu Madhani Jan. 11, 2019, 5:40 p.m. UTC
From: Ming Lei <ming.lei@redhat.com>

When SCSI-MQ is enabled, in some case system would present
nr_possible_cpus() which is greater than requested vectors
by the driver. This results into driver being able to get
larger number of MSI-X vectors than actual online CPUs.
Driver then uses pci_alloc_irq_vectors_affinity() to assign
1:1 mapping and affinity for each MSI-x vector to CPUs. When
the command is submitted using MSI-x vector, assigned to
offline CPU, it results in an ABTS and system hang. This hang
is result of a driver not being able to process interrupt on a
vector assigned to an Off-line CPUs

This patch fixes this issue by setting irq_offset value for the
blk_mq_pci_map_queues() to use only those CPUs which has CPU mask
affinity assigned and are online. By using the irq_offset value,
driver will allow online cpumask to decide which vectors are used
in blk_mq_pci_map_queues().

Fixes: 5601236b6f794 ("scsi: qla2xxx: Add Block Multi Queue functionality.")
Cc: <stable@vger.kernel.org> #4.19
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Himanshu Madhani <hmadhani@marvell.com>
Tested-by: Himanshu Madhani <hmadhani@marvell.com>
Signed-off-by: Himanshu Madhani <hmadhani@marvell.com>
---
Hi Martin, 

With SCSI-MQ set as default starting 4.19, we found regression where systems with lower
online CPUs but higer possible CPUs in a system resulted in the driver assigning MSIx
vectors to offline CPU causing system hang.

Please apply this patch to 5.0/fixes branch at your earliest convenience for inclusion
in 5.0-rc3. 

Thanks,
Himanshu
---
 drivers/scsi/qla2xxx/qla_def.h | 2 ++
 drivers/scsi/qla2xxx/qla_isr.c | 1 +
 drivers/scsi/qla2xxx/qla_os.c  | 2 +-
 3 files changed, 4 insertions(+), 1 deletion(-)

Comments

Ewan Milne Jan. 11, 2019, 5:59 p.m. UTC | #1
On Fri, 2019-01-11 at 09:40 -0800, Himanshu Madhani wrote:
> From: Ming Lei <ming.lei@redhat.com>
> 
> When SCSI-MQ is enabled, in some case system would present
> nr_possible_cpus() which is greater than requested vectors
> by the driver. This results into driver being able to get
> larger number of MSI-X vectors than actual online CPUs.
> Driver then uses pci_alloc_irq_vectors_affinity() to assign
> 1:1 mapping and affinity for each MSI-x vector to CPUs. When
> the command is submitted using MSI-x vector, assigned to
> offline CPU, it results in an ABTS and system hang. This hang
> is result of a driver not being able to process interrupt on a
> vector assigned to an Off-line CPUs
> 
> This patch fixes this issue by setting irq_offset value for the
> blk_mq_pci_map_queues() to use only those CPUs which has CPU mask
> affinity assigned and are online. By using the irq_offset value,
> driver will allow online cpumask to decide which vectors are used
> in blk_mq_pci_map_queues().
> 
> Fixes: 5601236b6f794 ("scsi: qla2xxx: Add Block Multi Queue functionality.")
> Cc: <stable@vger.kernel.org> #4.19
> Signed-off-by: Ming Lei <ming.lei@redhat.com>
> Reviewed-by: Himanshu Madhani <hmadhani@marvell.com>
> Tested-by: Himanshu Madhani <hmadhani@marvell.com>
> Signed-off-by: Himanshu Madhani <hmadhani@marvell.com>
> ---
> Hi Martin, 
> 
> With SCSI-MQ set as default starting 4.19, we found regression where systems with lower
> online CPUs but higer possible CPUs in a system resulted in the driver assigning MSIx
> vectors to offline CPU causing system hang.
> 
> Please apply this patch to 5.0/fixes branch at your earliest convenience for inclusion
> in 5.0-rc3. 
> 
> Thanks,
> Himanshu
> ---
>  drivers/scsi/qla2xxx/qla_def.h | 2 ++
>  drivers/scsi/qla2xxx/qla_isr.c | 1 +
>  drivers/scsi/qla2xxx/qla_os.c  | 2 +-
>  3 files changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/scsi/qla2xxx/qla_def.h b/drivers/scsi/qla2xxx/qla_def.h
> index 26b93c563f92..d1fc4958222a 100644
> --- a/drivers/scsi/qla2xxx/qla_def.h
> +++ b/drivers/scsi/qla2xxx/qla_def.h
> @@ -4394,6 +4394,8 @@ typedef struct scsi_qla_host {
>  	uint16_t	n2n_id;
>  	struct list_head gpnid_list;
>  	struct fab_scan scan;
> +
> +	unsigned int irq_offset;
>  } scsi_qla_host_t;
>  
>  struct qla27xx_image_status {
> diff --git a/drivers/scsi/qla2xxx/qla_isr.c b/drivers/scsi/qla2xxx/qla_isr.c
> index 30d3090842f8..8507c43b918c 100644
> --- a/drivers/scsi/qla2xxx/qla_isr.c
> +++ b/drivers/scsi/qla2xxx/qla_isr.c
> @@ -3446,6 +3446,7 @@ qla24xx_enable_msix(struct qla_hw_data *ha, struct rsp_que *rsp)
>  			    "Adjusted Max no of queues pairs: %d.\n", ha->max_qpairs);
>  		}
>  	}
> +	vha->irq_offset = desc.pre_vectors;
>  	ha->msix_entries = kcalloc(ha->msix_count,
>  				   sizeof(struct qla_msix_entry),
>  				   GFP_KERNEL);
> diff --git a/drivers/scsi/qla2xxx/qla_os.c b/drivers/scsi/qla2xxx/qla_os.c
> index ea69dafc9774..c6ef83d0d99b 100644
> --- a/drivers/scsi/qla2xxx/qla_os.c
> +++ b/drivers/scsi/qla2xxx/qla_os.c
> @@ -6939,7 +6939,7 @@ static int qla2xxx_map_queues(struct Scsi_Host *shost)
>  	if (USER_CTRL_IRQ(vha->hw))
>  		rc = blk_mq_map_queues(qmap);
>  	else
> -		rc = blk_mq_pci_map_queues(qmap, vha->hw->pdev, 0);
> +		rc = blk_mq_pci_map_queues(qmap, vha->hw->pdev, vha->irq_offset);
>  	return rc;
>  }
>  

I have such a machine, SCSI-MQ doesn't work with its 8Gb adapter without this.

Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Martin K. Petersen Jan. 12, 2019, 2:57 a.m. UTC | #2
Himanshu,

> This patch fixes this issue by setting irq_offset value for the
> blk_mq_pci_map_queues() to use only those CPUs which has CPU mask
> affinity assigned and are online. By using the irq_offset value,
> driver will allow online cpumask to decide which vectors are used in
> blk_mq_pci_map_queues().

Applied to 5.0/scsi-fixes, thanks!
diff mbox series

Patch

diff --git a/drivers/scsi/qla2xxx/qla_def.h b/drivers/scsi/qla2xxx/qla_def.h
index 26b93c563f92..d1fc4958222a 100644
--- a/drivers/scsi/qla2xxx/qla_def.h
+++ b/drivers/scsi/qla2xxx/qla_def.h
@@ -4394,6 +4394,8 @@  typedef struct scsi_qla_host {
 	uint16_t	n2n_id;
 	struct list_head gpnid_list;
 	struct fab_scan scan;
+
+	unsigned int irq_offset;
 } scsi_qla_host_t;
 
 struct qla27xx_image_status {
diff --git a/drivers/scsi/qla2xxx/qla_isr.c b/drivers/scsi/qla2xxx/qla_isr.c
index 30d3090842f8..8507c43b918c 100644
--- a/drivers/scsi/qla2xxx/qla_isr.c
+++ b/drivers/scsi/qla2xxx/qla_isr.c
@@ -3446,6 +3446,7 @@  qla24xx_enable_msix(struct qla_hw_data *ha, struct rsp_que *rsp)
 			    "Adjusted Max no of queues pairs: %d.\n", ha->max_qpairs);
 		}
 	}
+	vha->irq_offset = desc.pre_vectors;
 	ha->msix_entries = kcalloc(ha->msix_count,
 				   sizeof(struct qla_msix_entry),
 				   GFP_KERNEL);
diff --git a/drivers/scsi/qla2xxx/qla_os.c b/drivers/scsi/qla2xxx/qla_os.c
index ea69dafc9774..c6ef83d0d99b 100644
--- a/drivers/scsi/qla2xxx/qla_os.c
+++ b/drivers/scsi/qla2xxx/qla_os.c
@@ -6939,7 +6939,7 @@  static int qla2xxx_map_queues(struct Scsi_Host *shost)
 	if (USER_CTRL_IRQ(vha->hw))
 		rc = blk_mq_map_queues(qmap);
 	else
-		rc = blk_mq_pci_map_queues(qmap, vha->hw->pdev, 0);
+		rc = blk_mq_pci_map_queues(qmap, vha->hw->pdev, vha->irq_offset);
 	return rc;
 }