Message ID: 20221111135206.577311313@linutronix.de
State: Superseded
Series: genirq, PCI/MSI: Support for per device MSI and PCI/IMS - Part 3 implementation
Spelled "MSI-X" elsewhere (subject line). On Fri, Nov 11, 2022 at 02:58:47PM +0100, Thomas Gleixner wrote: > The upcoming mechanism to allocate MSI-X vectors after enabling MSI-X needs > to share some of the MSI-X descriptor setup. > > The regular descriptor setup on enable has the following code flow: > > 1) Allocate descriptor > 2) Setup descriptor with PCI specific data > 3) Insert descriptor > 4) Allocate interrupts which in turn scans the inserted > descriptors > > This cannot be easily changed because the PCI/MSI code needs to handle the > legacy architecture specific allocation model and the irq domain model > where quite some domains have the assumption that the above flow is how it > works. > > Ideally the code flow should look like this: > > 1) Invoke allocation at the MSI core > 2) MSI core allocates descriptor > 3) MSI core calls back into the irq domain which fills in > the domain specific parts > > This could be done for underlying parent MSI domains which support > post-enable allocation/free but that would create significantly different > code pathes for MSI/MSI-X enable. > > Though for dynamic allocation which wants to share the allocation code with > the upcoming PCI/IMS support its the right thing to do. s/its/it's/ > Split the MSIX descriptor setup into the preallocation part which just sets MSI-X > the index and fills in the horrible hack of virtual IRQs and the real PCI > specific MSI-X setup part which solely depends on the index in the > descriptor. This allows to provide a common dynami allocation interface at dynamic > the MSI core level for both PCI/MSI-X and PCI/IMS. > > Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Typos below. > --- > drivers/pci/msi/msi.c | 72 +++++++++++++++++++++++++++++++------------------- > drivers/pci/msi/msi.h | 2 + > 2 files changed, 47 insertions(+), 27 deletions(-) > > --- a/drivers/pci/msi/msi.c > +++ b/drivers/pci/msi/msi.c > @@ -569,34 +569,56 @@ static void __iomem *msix_map_region(str > return ioremap(phys_addr, nr_entries * PCI_MSIX_ENTRY_SIZE); > } > > -static int msix_setup_msi_descs(struct pci_dev *dev, void __iomem *base, > - struct msix_entry *entries, int nvec, > - struct irq_affinity_desc *masks) > +/** > + * msix_prepare_msi_desc - Prepare a half initialized MSI descriptor for operation > + * @dev: The PCI device for which the descriptor is prepared > + * @desc: The MSI descriptor for preparation > + * > + * This is seperate from msix_setup_msi_descs() below to handle dynamic separate > + * allocations for MSIX after initial enablement. MSI-X (and again below) > + * Ideally the whole MSIX setup would work that way, but there is no way to > + * support this for the legacy arch_setup_msi_irqs() mechanism and for the > + * fake irq domains like the x86 XEN one. Sigh... > + * > + * The descriptor is zeroed and only @desc::msi_index and @desc::affinity > + * are set. When called from msix_setup_msi_descs() then the is_virtual > + * attribute is initialized as well. > + * > + * Fill in the rest. 
> + */ > +void msix_prepare_msi_desc(struct pci_dev *dev, struct msi_desc *desc) > +{ > + desc->nvec_used = 1; > + desc->pci.msi_attrib.is_msix = 1; > + desc->pci.msi_attrib.is_64 = 1; > + desc->pci.msi_attrib.default_irq = dev->irq; > + desc->pci.mask_base = dev->msix_base; > + desc->pci.msi_attrib.can_mask = !pci_msi_ignore_mask && > + !desc->pci.msi_attrib.is_virtual; > + > + if (desc->pci.msi_attrib.can_mask) { > + void __iomem *addr = pci_msix_desc_addr(desc); > + > + desc->pci.msix_ctrl = readl(addr + PCI_MSIX_ENTRY_VECTOR_CTRL); > + } > +} > + > +static int msix_setup_msi_descs(struct pci_dev *dev, struct msix_entry *entries, > + int nvec, struct irq_affinity_desc *masks) > { > int ret = 0, i, vec_count = pci_msix_vec_count(dev); > struct irq_affinity_desc *curmsk; > struct msi_desc desc; > - void __iomem *addr; > > memset(&desc, 0, sizeof(desc)); > > - desc.nvec_used = 1; > - desc.pci.msi_attrib.is_msix = 1; > - desc.pci.msi_attrib.is_64 = 1; > - desc.pci.msi_attrib.default_irq = dev->irq; > - desc.pci.mask_base = base; > - > for (i = 0, curmsk = masks; i < nvec; i++, curmsk++) { > desc.msi_index = entries ? entries[i].entry : i; > desc.affinity = masks ? curmsk : NULL; > desc.pci.msi_attrib.is_virtual = desc.msi_index >= vec_count; > - desc.pci.msi_attrib.can_mask = !pci_msi_ignore_mask && > - !desc.pci.msi_attrib.is_virtual; > > - if (desc.pci.msi_attrib.can_mask) { > - addr = pci_msix_desc_addr(&desc); > - desc.pci.msix_ctrl = readl(addr + PCI_MSIX_ENTRY_VECTOR_CTRL); > - } > + msix_prepare_msi_desc(dev, &desc); > > ret = msi_insert_msi_desc(&dev->dev, &desc); > if (ret) > @@ -629,9 +651,8 @@ static void msix_mask_all(void __iomem * > writel(ctrl, base + PCI_MSIX_ENTRY_VECTOR_CTRL); > } > > -static int msix_setup_interrupts(struct pci_dev *dev, void __iomem *base, > - struct msix_entry *entries, int nvec, > - struct irq_affinity *affd) > +static int msix_setup_interrupts(struct pci_dev *dev, struct msix_entry *entries, > + int nvec, struct irq_affinity *affd) > { > struct irq_affinity_desc *masks = NULL; > int ret; > @@ -640,7 +661,7 @@ static int msix_setup_interrupts(struct > masks = irq_create_affinity_masks(nvec, affd); > > msi_lock_descs(&dev->dev); > - ret = msix_setup_msi_descs(dev, base, entries, nvec, masks); > + ret = msix_setup_msi_descs(dev, entries, nvec, masks); > if (ret) > goto out_free; > > @@ -678,7 +699,6 @@ static int msix_setup_interrupts(struct > static int msix_capability_init(struct pci_dev *dev, struct msix_entry *entries, > int nvec, struct irq_affinity *affd) > { > - void __iomem *base; > int ret, tsize; > u16 control; > > @@ -696,15 +716,13 @@ static int msix_capability_init(struct p > pci_read_config_word(dev, dev->msix_cap + PCI_MSIX_FLAGS, &control); > /* Request & Map MSI-X table region */ > tsize = msix_table_size(control); > - base = msix_map_region(dev, tsize); > - if (!base) { > + dev->msix_base = msix_map_region(dev, tsize); > + if (!dev->msix_base) { > ret = -ENOMEM; > goto out_disable; > } > > - dev->msix_base = base; > - > - ret = msix_setup_interrupts(dev, base, entries, nvec, affd); > + ret = msix_setup_interrupts(dev, entries, nvec, affd); > if (ret) > goto out_disable; > > @@ -719,7 +737,7 @@ static int msix_capability_init(struct p > * which takes the MSI-X mask bits into account even > * when MSI-X is disabled, which prevents MSI delivery. 
> */ > - msix_mask_all(base, tsize); > + msix_mask_all(dev->msix_base, tsize); > pci_msix_clear_and_set_ctrl(dev, PCI_MSIX_FLAGS_MASKALL, 0); > > pcibios_free_irq(dev); > --- a/drivers/pci/msi/msi.h > +++ b/drivers/pci/msi/msi.h > @@ -84,6 +84,8 @@ static inline __attribute_const__ u32 ms > return (1 << (1 << desc->pci.msi_attrib.multi_cap)) - 1; > } > > +void msix_prepare_msi_desc(struct pci_dev *dev, struct msi_desc *desc); > + > /* Subsystem variables */ > extern int pci_msi_enable; > >
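As a sketch of the "ideal" core-driven flow described in the changelog above: the MSI core allocates and zeroes the descriptor, then calls back into the irq domain to fill in the domain specific parts. The domain_prepare_desc() callback below is a hypothetical placeholder chosen to illustrate the shape; it is not part of this patch, and only msi_insert_msi_desc() comes from this series.

/* Hypothetical callback: fills in the irq domain specific parts */
static void domain_prepare_desc(struct irq_domain *domain, struct msi_desc *desc);

/* Illustrative only: core-driven descriptor allocation */
static int msi_core_alloc_desc(struct device *dev, struct irq_domain *domain,
			       unsigned int index)
{
	struct msi_desc desc;

	/* 2) MSI core allocates and zeroes the descriptor */
	memset(&desc, 0, sizeof(desc));
	desc.msi_index = index;

	/* 3) MSI core calls back into the irq domain for the specifics */
	domain_prepare_desc(domain, &desc);

	/* Insertion and subsequent scanning stay in the core */
	return msi_insert_msi_desc(dev, &desc);
}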
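The concrete reuse the split enables: a post-enable dynamic MSI-X allocation path can prepare and insert a descriptor from nothing but an index. A minimal sketch under assumptions: the wrapper name msix_alloc_dyn_desc() and its calling context are illustrative; only msix_prepare_msi_desc(), msi_insert_msi_desc() and the descriptor lock helpers come from this series.

/* Sketch: dynamic MSI-X descriptor allocation after initial enablement */
static int msix_alloc_dyn_desc(struct pci_dev *dev, unsigned int index)
{
	struct msi_desc desc;
	int ret;

	memset(&desc, 0, sizeof(desc));
	desc.msi_index = index;		/* the only required precondition */
	desc.affinity = NULL;

	msi_lock_descs(&dev->dev);
	/* PCI specific setup depends solely on desc.msi_index */
	msix_prepare_msi_desc(dev, &desc);
	ret = msi_insert_msi_desc(&dev->dev, &desc);
	msi_unlock_descs(&dev->dev);

	return ret;
}

Note that is_virtual stays zero here, matching the kernel-doc above: it is only initialized when called from msix_setup_msi_descs().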