Message ID | 20190827085344.30799-4-ming.lei@redhat.com |
---|---|
State | Not Applicable |
Series | genirq/nvme: add IRQF_RESCUE_THREAD for avoiding IRQ flood |
On 27/08/2019 10:53, Ming Lei wrote:
[...]
> +	char *devname;
> +	const struct cpumask *mask;
> +	unsigned long irqflags = IRQF_SHARED;
> +	int vector = pci_irq_vector(pdev, nvmeq->cq_vector);
> +
> +	devname = kasprintf(GFP_KERNEL, "nvme%dq%d", nr, nvmeq->qid);
> +	if (!devname)
> +		return -ENOMEM;
> +
> +	mask = pci_irq_get_affinity(pdev, nvmeq->cq_vector);
> +	if (mask && cpumask_weight(mask) > 1)
> +		irqflags |= IRQF_RESCUE_THREAD;
> +
> +	return request_threaded_irq(vector, nvme_irq, NULL, irqflags,
> +			devname, nvmeq);

This will leak 'devname' which gets allocated by kasprintf() a few lines
above.
On Tue, Aug 27, 2019 at 11:06:20AM +0200, Johannes Thumshirn wrote:
> On 27/08/2019 10:53, Ming Lei wrote:
> [...]
> > +	char *devname;
> > +	const struct cpumask *mask;
> > +	unsigned long irqflags = IRQF_SHARED;
> > +	int vector = pci_irq_vector(pdev, nvmeq->cq_vector);
> > +
> > +	devname = kasprintf(GFP_KERNEL, "nvme%dq%d", nr, nvmeq->qid);
> > +	if (!devname)
> > +		return -ENOMEM;
> > +
> > +	mask = pci_irq_get_affinity(pdev, nvmeq->cq_vector);
> > +	if (mask && cpumask_weight(mask) > 1)
> > +		irqflags |= IRQF_RESCUE_THREAD;
> > +
> > +	return request_threaded_irq(vector, nvme_irq, NULL, irqflags,
> > +			devname, nvmeq);
>
> This will leak 'devname' which gets allocated by kasprintf() a few lines
> above.

It won't, please see pci_free_irq() in which free_irq() returns the
'devname' passed in.

Thanks,
Ming
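For context, this works because free_irq() returns the dev_name cookie that was originally passed to request_threaded_irq(), and pci_free_irq() frees it. At the time of this thread, the helper in drivers/pci/irq.c is essentially a one-liner, roughly:

```c
#include <linux/interrupt.h>
#include <linux/pci.h>
#include <linux/slab.h>

/*
 * Roughly what pci_free_irq() looks like in drivers/pci/irq.c around
 * this kernel version: free_irq() hands back the devname cookie given
 * to request_threaded_irq(), and pci_free_irq() kfree()s it, so the
 * kasprintf() allocation is reclaimed on the normal teardown path.
 */
void pci_free_irq(struct pci_dev *dev, unsigned int nr, void *dev_id)
{
	kfree(free_irq(pci_irq_vector(dev, nr), dev_id));
}
```

So the allocation is reclaimed on the regular teardown path; the remaining question, picked up below, is what happens when the request itself fails.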
On 27/08/2019 11:09, Ming Lei wrote:
[...]
>
> It won't, please see pci_free_irq() in which free_irq() returns the
> 'devname' passed in.

Ah ok, I missed that.
On Tue, Aug 27, 2019 at 05:09:27PM +0800, Ming Lei wrote:
> On Tue, Aug 27, 2019 at 11:06:20AM +0200, Johannes Thumshirn wrote:
> > On 27/08/2019 10:53, Ming Lei wrote:
> > [...]
> > > +	char *devname;
> > > +	const struct cpumask *mask;
> > > +	unsigned long irqflags = IRQF_SHARED;
> > > +	int vector = pci_irq_vector(pdev, nvmeq->cq_vector);
> > > +
> > > +	devname = kasprintf(GFP_KERNEL, "nvme%dq%d", nr, nvmeq->qid);
> > > +	if (!devname)
> > > +		return -ENOMEM;
> > > +
> > > +	mask = pci_irq_get_affinity(pdev, nvmeq->cq_vector);
> > > +	if (mask && cpumask_weight(mask) > 1)
> > > +		irqflags |= IRQF_RESCUE_THREAD;
> > > +
> > > +	return request_threaded_irq(vector, nvme_irq, NULL, irqflags,
> > > +			devname, nvmeq);
> >
> > This will leak 'devname' which gets allocated by kasprintf() a few lines
> > above.
>
> It won't, please see pci_free_irq() in which free_irq() returns the
> 'devname' passed in.

Only if request_threaded_irq() doesn't return an error.

I think you should probably just have pci_irq_get_affinity() take a flags
argument, or make a new function like __pci_irq_get_affinity() that
pci_irq_get_affinity() can call with a default flag.
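In other words, free_irq() can only hand the name back for a vector that was actually requested; if request_threaded_irq() itself fails, nothing ever frees the string. A minimal sketch of the missing error handling, reusing the locals from the hunk under review (the `ret` local is introduced here for illustration, it is not in the posted patch):

```c
/*
 * Minimal sketch of the error-path fix implied by the review,
 * using 'vector', 'irqflags', 'devname' and 'nvmeq' from the hunk
 * above: if the request fails, no irqaction took ownership of
 * 'devname', so it has to be freed by hand.
 */
int ret = request_threaded_irq(vector, nvme_irq, NULL, irqflags,
			       devname, nvmeq);
if (ret)
	kfree(devname);
return ret;
```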
On Tue, Aug 27, 2019 at 08:34:21AM -0600, Keith Busch wrote:
> I think you should probably just have pci_irq_get_affinity() take a flags
> argument, or make a new function like __pci_irq_get_affinity() that
> pci_irq_get_affinity() can call with a default flag.

Sorry, copied the wrong function name; I meant pci_request_irq(), not
pci_irq_get_affinity().
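The suggestion, then, is to push the flag handling into the PCI helper rather than open-code the name allocation in the driver. A hedged sketch of such a variant follows; the name `__pci_request_irq()` and the exact plumbing are assumptions here, not code from this series:

```c
#include <linux/interrupt.h>
#include <linux/kernel.h>
#include <linux/pci.h>
#include <linux/slab.h>

/*
 * Hedged sketch of Keith's suggestion: a pci_request_irq() variant
 * that takes the IRQ flags explicitly, so a caller can OR in
 * something like IRQF_RESCUE_THREAD without duplicating the devname
 * bookkeeping (and without leaking it on the error path).
 */
static int __pci_request_irq(struct pci_dev *dev, unsigned int nr,
			     irq_handler_t handler, irq_handler_t thread_fn,
			     void *dev_id, unsigned long irqflags,
			     const char *fmt, ...)
{
	char *devname;
	va_list ap;
	int ret;

	va_start(ap, fmt);
	devname = kvasprintf(GFP_KERNEL, fmt, ap);
	va_end(ap);
	if (!devname)
		return -ENOMEM;

	ret = request_threaded_irq(pci_irq_vector(dev, nr), handler,
				   thread_fn, irqflags | IRQF_SHARED,
				   devname, dev_id);
	if (ret)
		kfree(devname);	/* no irqaction took ownership */
	return ret;
}
```

With something like this, pci_request_irq() would become a thin wrapper passing its current default flags, and the NVMe hunk above could collapse back to a single helper call.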
On 8/27/19 1:53 AM, Ming Lei wrote:
> If one vector is spread on several CPUs, usually the interrupt is only
> handled on one of these CPUs.

Is that perhaps a limitation of x86 interrupt handling hardware? See also
the description of physical and logical destination mode of the local APIC
in the Intel documentation.

Does that limitation also apply to platforms other than x86?

Thanks,

Bart.
On Tue, Aug 27, 2019 at 08:10:42AM -0700, Bart Van Assche wrote:
> On 8/27/19 1:53 AM, Ming Lei wrote:
> > If one vector is spread on several CPUs, usually the interrupt is only
> > handled on one of these CPUs.
>
> Is that perhaps a limitation of x86 interrupt handling hardware? See also
> the description of physical and logical destination mode of the local APIC
> in the Intel documentation.
>
> Does that limitation also apply to platforms other than x86?

Please see the following excellent explanation from Thomas:

https://lkml.org/lkml/2018/4/4/734

Especially the following words:

	So at some point we ripped out the multi target support on X86 and
	moved everything to single target delivery mode.

	Other architectures never supported multi target delivery either
	due to hardware restrictions or for similar reasons why X86 dropped
	it. There might be a few architectures which support it, but I have
	no overview at the moment.

Thanks,
Ming
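To make the single-target behaviour concrete: even when the configured affinity mask spans several CPUs, the kernel tracks the CPU the interrupt is actually steered to in a separate effective affinity mask. An illustrative snippet (not from the patch; assumes a kernel with CONFIG_GENERIC_IRQ_EFFECTIVE_AFF_MASK, otherwise the effective mask falls back to the configured one):

```c
#include <linux/cpumask.h>
#include <linux/irq.h>
#include <linux/printk.h>

/*
 * Illustrative only: compare the configured affinity with the
 * effective affinity. On x86 with single-target delivery, the
 * effective mask contains exactly one CPU even when the configured
 * mask has several.
 */
static void compare_irq_affinity(unsigned int irq)
{
	struct irq_data *d = irq_get_irq_data(irq);

	if (!d)
		return;

	pr_info("irq %u: configured weight %u, effective weight %u\n",
		irq,
		cpumask_weight(irq_data_get_affinity_mask(d)),
		cpumask_weight(irq_data_get_effective_affinity_mask(d)));
}
```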
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 45a80b708ef4..0b8d49470230 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -1501,8 +1501,21 @@ static int queue_request_irq(struct nvme_queue *nvmeq)
 		return pci_request_irq(pdev, nvmeq->cq_vector, nvme_irq_check,
 				nvme_irq, nvmeq, "nvme%dq%d", nr, nvmeq->qid);
 	} else {
-		return pci_request_irq(pdev, nvmeq->cq_vector, nvme_irq,
-				NULL, nvmeq, "nvme%dq%d", nr, nvmeq->qid);
+		char *devname;
+		const struct cpumask *mask;
+		unsigned long irqflags = IRQF_SHARED;
+		int vector = pci_irq_vector(pdev, nvmeq->cq_vector);
+
+		devname = kasprintf(GFP_KERNEL, "nvme%dq%d", nr, nvmeq->qid);
+		if (!devname)
+			return -ENOMEM;
+
+		mask = pci_irq_get_affinity(pdev, nvmeq->cq_vector);
+		if (mask && cpumask_weight(mask) > 1)
+			irqflags |= IRQF_RESCUE_THREAD;
+
+		return request_threaded_irq(vector, nvme_irq, NULL, irqflags,
+				devname, nvmeq);
 	}
 }
 
If one vector is spread on several CPUs, usually the interrupt is only
handled on one of those CPUs. Meanwhile, IO can be issued to the single hw
queue from different CPUs concurrently, which can easily cause an IRQ
flood and CPU lockup.

Pass IRQF_RESCUE_THREAD in the above case to ask genirq to handle the
interrupt in the rescue thread when an IRQ flood is detected.

Cc: Long Li <longli@microsoft.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Jens Axboe <axboe@fb.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: John Garry <john.garry@huawei.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Hannes Reinecke <hare@suse.com>
Cc: linux-nvme@lists.infradead.org
Cc: linux-scsi@vger.kernel.org
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 drivers/nvme/host/pci.c | 17 +++++++++++++++--
 1 file changed, 15 insertions(+), 2 deletions(-)