diff mbox series

[3/4] nvme: pci: pass IRQF_RESCUE_THREAD to request_threaded_irq

Message ID 20190827085344.30799-4-ming.lei@redhat.com (mailing list archive)
State Not Applicable
Headers show
Series genirq/nvme: add IRQF_RESCUE_THREAD for avoiding IRQ flood | expand

Commit Message

Ming Lei Aug. 27, 2019, 8:53 a.m. UTC
If one vector is spread across several CPUs, the interrupt is usually
handled on only one of those CPUs. Meanwhile, IO can be issued to the
single hw queue from different CPUs concurrently, which can easily cause
an IRQ flood and CPU lockup.

Pass IRQF_RESCUE_THREAD in the above case to ask genirq to handle the
interrupt in the rescue thread when an IRQ flood is detected.

Cc: Long Li <longli@microsoft.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Jens Axboe <axboe@fb.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: John Garry <john.garry@huawei.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Hannes Reinecke <hare@suse.com>
Cc: linux-nvme@lists.infradead.org
Cc: linux-scsi@vger.kernel.org
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 drivers/nvme/host/pci.c | 17 +++++++++++++++--
 1 file changed, 15 insertions(+), 2 deletions(-)

Comments

Johannes Thumshirn Aug. 27, 2019, 9:06 a.m. UTC | #1
On 27/08/2019 10:53, Ming Lei wrote:
[...]
> +		char *devname;
> +		const struct cpumask *mask;
> +		unsigned long irqflags = IRQF_SHARED;
> +		int vector = pci_irq_vector(pdev, nvmeq->cq_vector);
> +
> +		devname = kasprintf(GFP_KERNEL, "nvme%dq%d", nr, nvmeq->qid);
> +		if (!devname)
> +			return -ENOMEM;
> +
> +		mask = pci_irq_get_affinity(pdev, nvmeq->cq_vector);
> +		if (mask && cpumask_weight(mask) > 1)
> +			irqflags |= IRQF_RESCUE_THREAD;
> +
> +		return request_threaded_irq(vector, nvme_irq, NULL, irqflags,
> +				devname, nvmeq);

This will leak 'devname' which gets allocated by kasprintf() a few lines
above.
Ming Lei Aug. 27, 2019, 9:09 a.m. UTC | #2
On Tue, Aug 27, 2019 at 11:06:20AM +0200, Johannes Thumshirn wrote:
> On 27/08/2019 10:53, Ming Lei wrote:
> [...]
> > +		char *devname;
> > +		const struct cpumask *mask;
> > +		unsigned long irqflags = IRQF_SHARED;
> > +		int vector = pci_irq_vector(pdev, nvmeq->cq_vector);
> > +
> > +		devname = kasprintf(GFP_KERNEL, "nvme%dq%d", nr, nvmeq->qid);
> > +		if (!devname)
> > +			return -ENOMEM;
> > +
> > +		mask = pci_irq_get_affinity(pdev, nvmeq->cq_vector);
> > +		if (mask && cpumask_weight(mask) > 1)
> > +			irqflags |= IRQF_RESCUE_THREAD;
> > +
> > +		return request_threaded_irq(vector, nvme_irq, NULL, irqflags,
> > +				devname, nvmeq);
> 
> This will leak 'devname' which gets allocated by kasprintf() a few lines
> above.

It won't, please see pci_free_irq() in which free_irq() returns the
'devname' passed in.

Thanks,
Ming
Johannes Thumshirn Aug. 27, 2019, 9:12 a.m. UTC | #3
On 27/08/2019 11:09, Ming Lei wrote:
[...]
> 
> It won't, please see pci_free_irq() in which free_irq() returns the
> 'devname' passed in.

Ah ok, I missed that.
Keith Busch Aug. 27, 2019, 2:34 p.m. UTC | #4
On Tue, Aug 27, 2019 at 05:09:27PM +0800, Ming Lei wrote:
> On Tue, Aug 27, 2019 at 11:06:20AM +0200, Johannes Thumshirn wrote:
> > On 27/08/2019 10:53, Ming Lei wrote:
> > [...]
> > > +		char *devname;
> > > +		const struct cpumask *mask;
> > > +		unsigned long irqflags = IRQF_SHARED;
> > > +		int vector = pci_irq_vector(pdev, nvmeq->cq_vector);
> > > +
> > > +		devname = kasprintf(GFP_KERNEL, "nvme%dq%d", nr, nvmeq->qid);
> > > +		if (!devname)
> > > +			return -ENOMEM;
> > > +
> > > +		mask = pci_irq_get_affinity(pdev, nvmeq->cq_vector);
> > > +		if (mask && cpumask_weight(mask) > 1)
> > > +			irqflags |= IRQF_RESCUE_THREAD;
> > > +
> > > +		return request_threaded_irq(vector, nvme_irq, NULL, irqflags,
> > > +				devname, nvmeq);
> > 
> > This will leak 'devname' which gets allocated by kasprintf() a few lines
> > above.
> 
> It won't, please see pci_free_irq() in which free_irq() returns the
> 'devname' passed in.

Only if request_threaded_irq() doesn't return an error.

I think you should probably just have pci_irq_get_affinity() take a flags
argument, or make a new function like __pci_irq_get_affinity() that
pci_irq_get_affinity() can call with a default flag.
Keith Busch Aug. 27, 2019, 2:44 p.m. UTC | #5
On Tue, Aug 27, 2019 at 08:34:21AM -0600, Keith Busch wrote:
> I think you should probably just have pci_irq_get_affinity() take a flags
> argument, or make a new function like __pci_irq_get_affinity() that
> pci_irq_get_affinity() can call with a default flag.

Sorry, copied the wrong function name; I meant pci_request_irq(),
not pci_irq_get_affinity().
Bart Van Assche Aug. 27, 2019, 3:10 p.m. UTC | #6
On 8/27/19 1:53 AM, Ming Lei wrote:
> If one vector is spread on several CPUs, usually the interrupt is only
> handled on one of these CPUs.

Is that perhaps a limitation of x86 interrupt handling hardware? See 
also the description of physical and logical destination mode of the 
local APIC in the Intel documentation.

Does that limitation also apply to other platforms than x86?

Thanks,

Bart.
Ming Lei Aug. 28, 2019, 1:45 a.m. UTC | #7
On Tue, Aug 27, 2019 at 08:10:42AM -0700, Bart Van Assche wrote:
> On 8/27/19 1:53 AM, Ming Lei wrote:
> > If one vector is spread on several CPUs, usually the interrupt is only
> > handled on one of these CPUs.
> 
> Is that perhaps a limitation of x86 interrupt handling hardware? See also
> the description of physical and logical destination mode of the local APIC
> in the Intel documentation.
> 
> Does that limitation also apply to other platforms than x86?

Please see the following excellent explanation from Thomas.

	https://lkml.org/lkml/2018/4/4/734

Especially the following words:

	So at some point we ripped out the multi target support on X86 and moved
	everything to single target delivery mode.
	
	Other architectures never supported multi target delivery either due to
	hardware restrictions or for similar reasons why X86 dropped it. There
	might be a few architectures which support it, but I have no overview at
	the moment.


thanks,
Ming
diff mbox series

Patch

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 45a80b708ef4..0b8d49470230 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -1501,8 +1501,21 @@  static int queue_request_irq(struct nvme_queue *nvmeq)
 		return pci_request_irq(pdev, nvmeq->cq_vector, nvme_irq_check,
 				nvme_irq, nvmeq, "nvme%dq%d", nr, nvmeq->qid);
 	} else {
-		return pci_request_irq(pdev, nvmeq->cq_vector, nvme_irq,
-				NULL, nvmeq, "nvme%dq%d", nr, nvmeq->qid);
+		char *devname;
+		const struct cpumask *mask;
+		unsigned long irqflags = IRQF_SHARED;
+		int vector = pci_irq_vector(pdev, nvmeq->cq_vector);
+
+		devname = kasprintf(GFP_KERNEL, "nvme%dq%d", nr, nvmeq->qid);
+		if (!devname)
+			return -ENOMEM;
+
+		mask = pci_irq_get_affinity(pdev, nvmeq->cq_vector);
+		if (mask && cpumask_weight(mask) > 1)
+			irqflags |= IRQF_RESCUE_THREAD;
+
+		return request_threaded_irq(vector, nvme_irq, NULL, irqflags,
+				devname, nvmeq);
 	}
 }