
[V5,1/5] scsi: hpsa: fix selection of reply queue

Message ID 20180313094243.8710-2-ming.lei@redhat.com (mailing list archive)
State Accepted
Headers show

Commit Message

Ming Lei March 13, 2018, 9:42 a.m. UTC
Since commit 84676c1f21e8 ("genirq/affinity: assign vectors to all possible
CPUs"), an MSI-X vector can be created without any online CPU mapped to it,
in which case a command's completion may never be notified.

This patch sets up a mapping between CPUs and reply queues according to the
IRQ affinity information retrieved by pci_irq_get_affinity(), and uses this
mapping table to choose the reply queue when queuing a command.

The chosen reply queue is then guaranteed to be active, which fixes an IO
hang caused by using an inactive reply queue that has no online CPU mapped.

Cc: Hannes Reinecke <hare@suse.de>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: James Bottomley <james.bottomley@hansenpartnership.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Don Brace <don.brace@microsemi.com>
Cc: Kashyap Desai <kashyap.desai@broadcom.com>
Cc: Laurence Oberman <loberman@redhat.com>
Cc: Meelis Roos <mroos@linux.ee>
Cc: Artem Bityutskiy <artem.bityutskiy@intel.com>
Cc: Mike Snitzer <snitzer@redhat.com>
Tested-by: Laurence Oberman <loberman@redhat.com>
Tested-by: Don Brace <don.brace@microsemi.com>
Tested-by: Artem Bityutskiy <artem.bityutskiy@intel.com>
Acked-by: Don Brace <don.brace@microsemi.com>
Fixes: 84676c1f21e8 ("genirq/affinity: assign vectors to all possible CPUs")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 drivers/scsi/hpsa.c | 73 +++++++++++++++++++++++++++++++++++++++--------------
 drivers/scsi/hpsa.h |  1 +
 2 files changed, 55 insertions(+), 19 deletions(-)

Comments

Christoph Hellwig March 14, 2018, 8:57 a.m. UTC | #1
I still don't like the code duplication, but I guess I can fix this
up in one of the next merge windows myself.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Bityutskiy, Artem March 14, 2018, 3:07 p.m. UTC | #2
On Tue, 2018-03-13 at 17:42 +0800, Ming Lei wrote:
> From 84676c1f21 (genirq/affinity: assign vectors to all possible CPUs),
> one msix vector can be created without any online CPU mapped, then one
> command's completion may not be notified.
>
> This patch setups mapping between cpu and reply queue according to irq
> affinity info retrived by pci_irq_get_affinity(), and uses this mapping
> table to choose reply queue for queuing one command.
>
> Then the chosen reply queue has to be active, and fixes IO hang caused
> by using inactive reply queue which doesn't have any online CPU mapped.
>
> Cc: Hannes Reinecke <hare@suse.de>
> Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
> Cc: James Bottomley <james.bottomley@hansenpartnership.com>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Don Brace <don.brace@microsemi.com>
> Cc: Kashyap Desai <kashyap.desai@broadcom.com>
> Cc: Laurence Oberman <loberman@redhat.com>
> Cc: Meelis Roos <mroos@linux.ee>
> Cc: Artem Bityutskiy <artem.bityutskiy@intel.com>
> Cc: Mike Snitzer <snitzer@redhat.com>
> Tested-by: Laurence Oberman <loberman@redhat.com>
> Tested-by: Don Brace <don.brace@microsemi.com>
> Tested-by: Artem Bityutskiy <artem.bityutskiy@intel.com>
> Acked-by: Don Brace <don.brace@microsemi.com>
> Fixes: 84676c1f21e8 ("genirq/affinity: assign vectors to all possible CPUs")
> Signed-off-by: Ming Lei <ming.lei@redhat.com>

Checked v5 on my Skylake Xeon, and with this patch the regression that I
reported is fixed.

Tested-by: Artem Bityutskiy <artem.bityutskiy@intel.com>

Link: https://lkml.kernel.org/r/1519311270.2535.53.camel@intel.com
---------------------------------------------------------------------
Intel Finland Oy
Registered Address: PL 281, 00181 Helsinki 
Business Identity Code: 0357606 - 4 
Domiciled in Helsinki 

This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.
Bityutskiy, Artem March 19, 2018, 11:48 a.m. UTC | #3
On Tue, 2018-03-13 at 17:42 +0800, Ming Lei wrote:
> From 84676c1f21 (genirq/affinity: assign vectors to all possible CPUs),
> one msix vector can be created without any online CPU mapped, then one
> command's completion may not be notified.

Hello Martin, James,

we have this patch-set and it fixes the megaraid regression in v4.16. Do
you plan to merge it for v4.16-rcX? I am worried - there seems to be no
sign that this is going to be merged. The patches are not in linux-next.

-- 
Best Regards,
Artem Bityutskiy
Jens Axboe March 19, 2018, 2:31 p.m. UTC | #4
On 3/19/18 5:48 AM, Bityutskiy, Artem wrote:
> On Tue, 2018-03-13 at 17:42 +0800, Ming Lei wrote:
>> From 84676c1f21 (genirq/affinity: assign vectors to all possible CPUs),
>> one msix vector can be created without any online CPU mapped, then one
>> command's completion may not be notified.
> 
> Hello Martin, James,
> 
> we have this patch-set and it fixes the megaraid regression in v4.16. Do
> you plan to merge it for v4.16-rcX? I am worried - there seems to be no
> sign that this is going to be merged. The patches are not in linux-next.

I'm assuming that Martin will eventually queue this up. But probably
for 4.17, then we can always flag it for a backport to stable once
it's been thoroughly tested.
Artem Bityutskiy March 19, 2018, 2:42 p.m. UTC | #5
On Mon, 2018-03-19 at 08:31 -0600, Jens Axboe wrote:
> I'm assuming that Martin will eventually queue this up. But probably
> for 4.17, then we can always flag it for a backport to stable once
> it's been thoroughly tested.

Jens, thanks for reply.

I wonder if folks agree that in this case we should revert 

84676c1f21e8 genirq/affinity: assign vectors to all possible CPUs

for v4.16.

If this were a minor niche use-case regression, the -stable scenario
would probably be OK. But the patch seems to miss the fact that the
kernel's "possible CPUs" notion may be way off, and the side effects are bad.

Christoph, Thomas, what do you think?

Thanks,
Artem.
Kashyap Desai March 19, 2018, 2:55 p.m. UTC | #6
> -----Original Message-----
> From: Artem Bityutskiy [mailto:dedekind1@gmail.com]
> Sent: Monday, March 19, 2018 8:12 PM
> To: hch@lst.de; Thomas Gleixner
> Cc: linux-block@vger.kernel.org; snitzer@redhat.com; hare@suse.de;
> mroos@linux.ee; linux-scsi@vger.kernel.org; don.brace@microsemi.com;
> pbonzini@redhat.com; loberman@redhat.com;
> kashyap.desai@broadcom.com; Jens Axboe; martin.petersen@oracle.com;
> James.Bottomley@HansenPartnership.com; ming.lei@redhat.com
> Subject: Re: [PATCH V5 1/5] scsi: hpsa: fix selection of reply queue
>
> On Mon, 2018-03-19 at 08:31 -0600, Jens Axboe wrote:
> > I'm assuming that Martin will eventually queue this up. But probably
> > for 4.17, then we can always flag it for a backport to stable once
> > it's been thoroughly tested.
>
> Jens, thanks for reply.
>
> I wonder if folks agree that in this case we should revert
>
> 84676c1f21e8 genirq/affinity: assign vectors to all possible CPUs
>
> for v4.16.
>
> If this was a minor niche use-case regression the -stable scenario would
> probably be OK. But the patch seem to miss the fact that kernel's
> "possible
> CPUs" notion may be way off and side effects are bad.

There is also a performance issue, as posted at the link below, if we just
use "84676c1f21e8 genirq/affinity: assign vectors to all possible CPUs".

https://www.spinics.net/lists/linux-scsi/msg118301.html

The performance drop was resolved using the patch set posted by Ming that
is under discussion, available at the link below.

https://marc.info/?l=linux-block&m=152050646332092&w=2

Kashyap

>
> Christoph, Thomas, what do you think?
>
> Thanks,
> Artem.
Ming Lei March 19, 2018, 3:26 p.m. UTC | #7
On Mon, Mar 19, 2018 at 04:42:09PM +0200, Artem Bityutskiy wrote:
> On Mon, 2018-03-19 at 08:31 -0600, Jens Axboe wrote:
> > I'm assuming that Martin will eventually queue this up. But probably
> > for 4.17, then we can always flag it for a backport to stable once
> > it's been thoroughly tested.
> 
> Jens, thanks for reply.
> 
> I wonder if folks agree that in this case we should revert 
> 
> 84676c1f21e8 genirq/affinity: assign vectors to all possible CPUs
> 
> for v4.16.

Even if 84676c1f21e8 is reverted, IO failure or hang can still be triggered
easily when doing CPU online/offline tests.

So this patchset is really needed.

Thanks,
Ming
Martin K. Petersen March 20, 2018, 3:16 a.m. UTC | #8
Artem,

> we have this patch-set and it fixes megaraid regression in v4.16. Do
> you plan to mege it to v4.16-rcX? I am worried - there seem to be no
> sight that this is going to me merged. They are not in the linux-next.

I merged them into scsi-fixes last week.

I happen to push a combined fixes+queue branch to linux-next to get more
zeroday coverage. However, most of the time linux-next carries 4.x+1
material only.

Patch

diff --git a/drivers/scsi/hpsa.c b/drivers/scsi/hpsa.c
index 5293e6827ce5..3a9eca163db8 100644
--- a/drivers/scsi/hpsa.c
+++ b/drivers/scsi/hpsa.c
@@ -1045,11 +1045,7 @@  static void set_performant_mode(struct ctlr_info *h, struct CommandList *c,
 		c->busaddr |= 1 | (h->blockFetchTable[c->Header.SGList] << 1);
 		if (unlikely(!h->msix_vectors))
 			return;
-		if (likely(reply_queue == DEFAULT_REPLY_QUEUE))
-			c->Header.ReplyQueue =
-				raw_smp_processor_id() % h->nreply_queues;
-		else
-			c->Header.ReplyQueue = reply_queue % h->nreply_queues;
+		c->Header.ReplyQueue = reply_queue;
 	}
 }
 
@@ -1063,10 +1059,7 @@  static void set_ioaccel1_performant_mode(struct ctlr_info *h,
 	 * Tell the controller to post the reply to the queue for this
 	 * processor.  This seems to give the best I/O throughput.
 	 */
-	if (likely(reply_queue == DEFAULT_REPLY_QUEUE))
-		cp->ReplyQueue = smp_processor_id() % h->nreply_queues;
-	else
-		cp->ReplyQueue = reply_queue % h->nreply_queues;
+	cp->ReplyQueue = reply_queue;
 	/*
 	 * Set the bits in the address sent down to include:
 	 *  - performant mode bit (bit 0)
@@ -1087,10 +1080,7 @@  static void set_ioaccel2_tmf_performant_mode(struct ctlr_info *h,
 	/* Tell the controller to post the reply to the queue for this
 	 * processor.  This seems to give the best I/O throughput.
 	 */
-	if (likely(reply_queue == DEFAULT_REPLY_QUEUE))
-		cp->reply_queue = smp_processor_id() % h->nreply_queues;
-	else
-		cp->reply_queue = reply_queue % h->nreply_queues;
+	cp->reply_queue = reply_queue;
 	/* Set the bits in the address sent down to include:
 	 *  - performant mode bit not used in ioaccel mode 2
 	 *  - pull count (bits 0-3)
@@ -1109,10 +1099,7 @@  static void set_ioaccel2_performant_mode(struct ctlr_info *h,
 	 * Tell the controller to post the reply to the queue for this
 	 * processor.  This seems to give the best I/O throughput.
 	 */
-	if (likely(reply_queue == DEFAULT_REPLY_QUEUE))
-		cp->reply_queue = smp_processor_id() % h->nreply_queues;
-	else
-		cp->reply_queue = reply_queue % h->nreply_queues;
+	cp->reply_queue = reply_queue;
 	/*
 	 * Set the bits in the address sent down to include:
 	 *  - performant mode bit not used in ioaccel mode 2
@@ -1157,6 +1144,8 @@  static void __enqueue_cmd_and_start_io(struct ctlr_info *h,
 {
 	dial_down_lockup_detection_during_fw_flash(h, c);
 	atomic_inc(&h->commands_outstanding);
+
+	reply_queue = h->reply_map[raw_smp_processor_id()];
 	switch (c->cmd_type) {
 	case CMD_IOACCEL1:
 		set_ioaccel1_performant_mode(h, c, reply_queue);
@@ -7376,6 +7365,26 @@  static void hpsa_disable_interrupt_mode(struct ctlr_info *h)
 	h->msix_vectors = 0;
 }
 
+static void hpsa_setup_reply_map(struct ctlr_info *h)
+{
+	const struct cpumask *mask;
+	unsigned int queue, cpu;
+
+	for (queue = 0; queue < h->msix_vectors; queue++) {
+		mask = pci_irq_get_affinity(h->pdev, queue);
+		if (!mask)
+			goto fallback;
+
+		for_each_cpu(cpu, mask)
+			h->reply_map[cpu] = queue;
+	}
+	return;
+
+fallback:
+	for_each_possible_cpu(cpu)
+		h->reply_map[cpu] = 0;
+}
+
 /* If MSI/MSI-X is supported by the kernel we will try to enable it on
  * controllers that are capable. If not, we use legacy INTx mode.
  */
@@ -7771,6 +7780,10 @@  static int hpsa_pci_init(struct ctlr_info *h)
 	err = hpsa_interrupt_mode(h);
 	if (err)
 		goto clean1;
+
+	/* setup mapping between CPU and reply queue */
+	hpsa_setup_reply_map(h);
+
 	err = hpsa_pci_find_memory_BAR(h->pdev, &h->paddr);
 	if (err)
 		goto clean2;	/* intmode+region, pci */
@@ -8480,6 +8493,28 @@  static struct workqueue_struct *hpsa_create_controller_wq(struct ctlr_info *h,
 	return wq;
 }
 
+static void hpda_free_ctlr_info(struct ctlr_info *h)
+{
+	kfree(h->reply_map);
+	kfree(h);
+}
+
+static struct ctlr_info *hpda_alloc_ctlr_info(void)
+{
+	struct ctlr_info *h;
+
+	h = kzalloc(sizeof(*h), GFP_KERNEL);
+	if (!h)
+		return NULL;
+
+	h->reply_map = kzalloc(sizeof(*h->reply_map) * nr_cpu_ids, GFP_KERNEL);
+	if (!h->reply_map) {
+		kfree(h);
+		return NULL;
+	}
+	return h;
+}
+
 static int hpsa_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 {
 	int dac, rc;
@@ -8517,7 +8552,7 @@  static int hpsa_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 	 * the driver.  See comments in hpsa.h for more info.
 	 */
 	BUILD_BUG_ON(sizeof(struct CommandList) % COMMANDLIST_ALIGNMENT);
-	h = kzalloc(sizeof(*h), GFP_KERNEL);
+	h = hpda_alloc_ctlr_info();
 	if (!h) {
 		dev_err(&pdev->dev, "Failed to allocate controller head\n");
 		return -ENOMEM;
@@ -8916,7 +8951,7 @@  static void hpsa_remove_one(struct pci_dev *pdev)
 	h->lockup_detected = NULL;			/* init_one 2 */
 	/* (void) pci_disable_pcie_error_reporting(pdev); */	/* init_one 1 */
 
-	kfree(h);					/* init_one 1 */
+	hpda_free_ctlr_info(h);				/* init_one 1 */
 }
 
 static int hpsa_suspend(__attribute__((unused)) struct pci_dev *pdev,
diff --git a/drivers/scsi/hpsa.h b/drivers/scsi/hpsa.h
index 018f980a701c..fb9f5e7f8209 100644
--- a/drivers/scsi/hpsa.h
+++ b/drivers/scsi/hpsa.h
@@ -158,6 +158,7 @@  struct bmic_controller_parameters {
 #pragma pack()
 
 struct ctlr_info {
+	unsigned int *reply_map;
 	int	ctlr;
 	char	devname[8];
 	char    *product_name;