
[RFC] pci: add pci_irq_get_affinity_vector()

Message ID 1478591241-123356-1-git-send-email-hare@suse.de (mailing list archive)
State New, archived

Commit Message

Hannes Reinecke Nov. 8, 2016, 7:47 a.m. UTC
Add a reverse-mapping function to return the interrupt vector for
any CPU if interrupt affinity is enabled.

Signed-off-by: Hannes Reinecke <hare@suse.com>
---
 drivers/pci/msi.c   | 36 ++++++++++++++++++++++++++++++++++++
 include/linux/pci.h |  1 +
 2 files changed, 37 insertions(+)

Comments

Thomas Gleixner Nov. 8, 2016, 2:48 p.m. UTC | #1
On Tue, 8 Nov 2016, Hannes Reinecke wrote:

> Add a reverse-mapping function to return the interrupt vector for
> any CPU if interrupt affinity is enabled.
> 
> Signed-off-by: Hannes Reinecke <hare@suse.com>
> ---
>  drivers/pci/msi.c   | 36 ++++++++++++++++++++++++++++++++++++
>  include/linux/pci.h |  1 +
>  2 files changed, 37 insertions(+)
> 
> diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
> index bfdd074..de5ed32 100644
> --- a/drivers/pci/msi.c
> +++ b/drivers/pci/msi.c
> @@ -1302,6 +1302,42 @@ const struct cpumask *pci_irq_get_affinity(struct pci_dev *dev, int nr)
>  }
>  EXPORT_SYMBOL(pci_irq_get_affinity);
>  
> +/**
> + * pci_irq_get_affinity_vector - return the vector number for a given CPU
> + * @dev:	PCI device to operate on
> + * @cpu:	cpu number
> + *
> + * Returns the vector number for CPU @cpu or a negative error number
> + * if interrupt affinity is not set.
> + */
> +int pci_irq_get_affinity_vector(struct pci_dev *dev, int cpu)
> +{
> +	if (dev->msix_enabled) {
> +		struct msi_desc *entry;
> +
> +		for_each_pci_msi_entry(entry, dev) {
> +			if (cpumask_test_cpu(cpu, entry->affinity))

entry->affinity can be NULL
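
A minimal sketch of what that guard could look like (illustrative only,
written against the patch as posted, not a revised submission):

	for_each_pci_msi_entry(entry, dev) {
		/* vectors outside the affinity spread have no mask */
		if (entry->affinity &&
		    cpumask_test_cpu(cpu, entry->affinity))
			return entry->irq;
	}
	return -EINVAL;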

Christoph Hellwig Nov. 8, 2016, 2:56 p.m. UTC | #2
On Tue, Nov 08, 2016 at 08:47:21AM +0100, Hannes Reinecke wrote:
> Add a reverse-mapping function to return the interrupt vector for
> any CPU if interrupt affinity is enabled.

What's the use case of it?

Also as-is this won't work due to the non-affinity vectors that
have the affinity set to all cpus.  It will get even worse if we have
to support things like virtio_net that have multiple interrupts per
CPU due to the send and receive virtqueues.
Hannes Reinecke Nov. 8, 2016, 3:08 p.m. UTC | #3
On 11/08/2016 03:56 PM, Christoph Hellwig wrote:
> On Tue, Nov 08, 2016 at 08:47:21AM +0100, Hannes Reinecke wrote:
>> Add a reverse-mapping function to return the interrupt vector for
>> any CPU if interrupt affinity is enabled.
>
> What's the use case of it?
>
> Also as-is this won't work due to the non-affinity vectors that
> have the affinity set to all cpus.  It will get even worse if we have
> to support things like virtio_net that have multiple interrupts per
> CPU due to the send and receive virtqueues.
>
The use-case here is that one needs to feed the MSI-X index into the 
driver command structure. While we can extract that number trivially 
with scsi-mq, for scsi-sq we have no such means.

So if we start assigning interrupt affinity by default we need to 
figure out the MSI-X index for a given SCSI command.
Currently most of these drivers keep an internal CPU map, which I'd 
love to get rid of.
Hence this patch.
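
For illustration, the sort of per-driver map in question (a sketch 
with made-up names, not lifted from any particular driver):

	/* filled at init time, one slot per possible CPU */
	hba->cpu_msix_map[cpu] = msix_index;

	/* in the scsi-sq queuecommand path */
	cmd->msix_index = hba->cpu_msix_map[raw_smp_processor_id()];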

And before you complain: Yes, this patch is wrong; it returns the vector 
and not the index (which is what I'm after).
I found that on my test machine :-(
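
Roughly what the corrected loop would look like -- returning the 
descriptor's position rather than the Linux irq number, with the NULL 
check Thomas pointed out folded in (a sketch; counting descriptors 
matches the entry index for a straight 0..n-1 MSI-X allocation):

	int idx = 0;

	for_each_pci_msi_entry(entry, dev) {
		if (entry->affinity &&
		    cpumask_test_cpu(cpu, entry->affinity))
			return idx;
		idx++;
	}
	return -EINVAL;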

The main impetus of this RFC is to figure out if such a function would 
have a chance of getting upstream, or if I have to continue using 
cpumaps in the drivers.

Cheers,

Hannes
Christoph Hellwig Nov. 8, 2016, 3:11 p.m. UTC | #4
On Tue, Nov 08, 2016 at 04:08:51PM +0100, Hannes Reinecke wrote:
> The use-case here is that one needs to feed the MSI-X index into the driver 
> command structure. While we can extract that number trivially with scsi-mq, 
> for scsi-sq we have no such means.

> The main impetus of this RFC is to figure out if such a function would have 
> a chance of getting upstream, or if I have to continue using cpumaps in the 
> drivers.

There should be no need for a cpumap, nor should there be any need
for a lookup.  A driver will need the vector index for some admin ops,
but it can store it in its driver-private queue structure (e.g. take
a look at the cq_vector field in NVMe).  Drivers really should not need
this during I/O, but if for some weird reason they do, that driver-specific
field is trivially reachable through the hw_ctx which gets passed to
->queue_rq.
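
Roughly the pattern in question (a sketch with made-up foo_* names; 
the real thing is struct nvme_queue and nvme_queue_rq):

	struct foo_queue {
		u16 cq_vector;	/* MSI-X index, recorded at queue setup */
	};

	static int foo_queue_rq(struct blk_mq_hw_ctx *hctx,
				const struct blk_mq_queue_data *bd)
	{
		struct foo_queue *fooq = hctx->driver_data;
		struct foo_cmd *cmd = blk_mq_rq_to_pdu(bd->rq);

		/* no reverse lookup: the index travels with the queue */
		cmd->msix_index = fooq->cq_vector;
		return BLK_MQ_RQ_QUEUE_OK;
	}
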
Hannes Reinecke Nov. 8, 2016, 3:20 p.m. UTC | #5
On 11/08/2016 04:11 PM, Christoph Hellwig wrote:
> On Tue, Nov 08, 2016 at 04:08:51PM +0100, Hannes Reinecke wrote:
>> The use-case here is that one needs to feed the MSI-X index into the driver
>> command structure. While we can extract that number trivially with scsi-mq,
>> for scsi-sq we have no such means.
>
>> The main impetus of this RFC is to figure out if such a function would have
>> a chance of getting upstream, or if I have to continue using cpumaps in the
>> drivers.
>
> There should be no need for a cpumap, nor should there be any need
> for a lookup.  A driver will need the vector index for some admin ops,
> but it can store it in its driver-private queue structure (e.g. take
> a look at the cq_vector field in NVMe).  Drivers really should not need
> this during I/O, but if for some weird reason they do, that driver-specific
> field is trivially reachable through the hw_ctx which gets passed to
> ->queue_rq.

I did mention that this is trivial for scsi-mq, right?
The issue here is scsi-sq.
(Much as you despise it).

As long as scsi-mq is not the standard we _have_ to provide a way of 
retaining the original functionality, which allowed for interrupt 
distribution even with scsi-sq.
Hence either each driver keeps this functionality (looking up the 
MSI-X index for a given SCSI command) internally, or we provide a 
common function allowing drivers to look it up.

And my patch aims to provide the latter.
If you don't agree with it or think it's pointless, fine; I'll go 
ahead and modify the drivers.

Cheers,

Hannes
Christoph Hellwig Nov. 8, 2016, 3:25 p.m. UTC | #6
If people want to use multiple queues they should use blk-mq, period.
And while we can't just rip out the existing code in lpfc that supports
multiple queues without blk-mq, we should not let any new users in
either.

Patch

diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
index bfdd074..de5ed32 100644
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -1302,6 +1302,42 @@ const struct cpumask *pci_irq_get_affinity(struct pci_dev *dev, int nr)
 }
 EXPORT_SYMBOL(pci_irq_get_affinity);
 
+/**
+ * pci_irq_get_affinity_vector - return the vector number for a given CPU
+ * @dev:	PCI device to operate on
+ * @cpu:	cpu number
+ *
+ * Returns the vector number for CPU @cpu or a negative error number
+ * if interrupt affinity is not set.
+ */
+int pci_irq_get_affinity_vector(struct pci_dev *dev, int cpu)
+{
+	if (dev->msix_enabled) {
+		struct msi_desc *entry;
+
+		for_each_pci_msi_entry(entry, dev) {
+			if (cpumask_test_cpu(cpu, entry->affinity))
+				return entry->irq;
+		}
+		return -EINVAL;
+	} else if (dev->msi_enabled) {
+		struct msi_desc *entry = first_pci_msi_entry(dev);
+		int nr;
+
+		if (!entry)
+			return -ENOENT;
+
+		for (nr = 0; nr < entry->nvec_used; nr++) {
+			if (cpumask_test_cpu(cpu, &entry->affinity[nr]))
+				return dev->irq + nr;
+		}
+		return -EINVAL;
+	} else {
+		return dev->irq;
+	}
+}
+EXPORT_SYMBOL(pci_irq_get_affinity_vector);
+
 struct pci_dev *msi_desc_to_pci_dev(struct msi_desc *desc)
 {
 	return to_pci_dev(desc->dev);
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 0e49f70..2dd0817 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1315,6 +1315,7 @@ int pci_alloc_irq_vectors(struct pci_dev *dev, unsigned int min_vecs,
 void pci_free_irq_vectors(struct pci_dev *dev);
 int pci_irq_vector(struct pci_dev *dev, unsigned int nr);
 const struct cpumask *pci_irq_get_affinity(struct pci_dev *pdev, int vec);
+int pci_irq_get_affinity_vector(struct pci_dev *pdev, int cpu);
 
 #else
 static inline int pci_msi_vec_count(struct pci_dev *dev) { return -ENOSYS; }