[v12,2/9] PCI: endpoint: Add RC-to-EP doorbell support using platform MSI controller

Message ID 20241211-ep-msi-v12-2-33d4532fa520@nxp.com (mailing list archive)
State Superseded
Series PCI: EP: Add RC-to-EP doorbell with platform MSI controller

Commit Message

Frank Li Dec. 11, 2024, 8:57 p.m. UTC
Doorbell feature is implemented by mapping the EP's MSI interrupt
controller message address to a dedicated BAR in the EPC core. It is the
responsibility of the EPF driver to pass the actual message data to be
written by the host to the doorbell BAR region through its own logic.

Tested-by: Niklas Cassel <cassel@kernel.org>
Signed-off-by: Frank Li <Frank.Li@nxp.com>
---
Change from v10 to v12
- none

Change from v9 to v10
- Create an MSI domain for each function device.
- Remove the single-function limitation. My hardware supports only one
function, so the multi-function case is untested.
- Use "msi-map" to describe the MSI information:

  msi-map = <func_no << 8  | vfunc_no, &its, start_stream_id,  size>;

Change from v8 to v9
- sort header file
- use pci_epc_get(dev_name(msi_desc_to_dev(desc)));
- check epf number at pci_epf_alloc_doorbell
- Add comments for the missing msi-parent case

change from v5 to v8
-none

Change from v4 to v5
- Remove request_irq() in pci_epc_alloc_doorbell() and leave it to the EP
function driver, so the EP function driver can register different callback
functions for different doorbell events and set the irq affinity to
different CPU cores.
- Improve the error message when MSI allocation fails.

Change from v3 to v4
- Use msi_get_virq() instead of msi_for_each_desc().
- Add new struct pci_epf_doorbell_msg to hold the MSI msg, virq and irq name.
- Move mutex lock to epc function.
- Initialize variables at declaration.
- Pass epf down to the epc*() functions to simplify the code.
---
 drivers/pci/endpoint/Makefile     |   2 +-
 drivers/pci/endpoint/pci-ep-msi.c | 148 ++++++++++++++++++++++++++++++++++++++
 include/linux/pci-ep-msi.h        |  15 ++++
 include/linux/pci-epf.h           |  16 +++++
 4 files changed, 180 insertions(+), 1 deletion(-)

Comments

Thomas Gleixner Dec. 17, 2024, 10:56 p.m. UTC | #1
On Wed, Dec 11 2024 at 15:57, Frank Li wrote:
> +static int pci_epf_msi_prepare(struct irq_domain *domain, struct device *dev,
> +			       int nvec, msi_alloc_info_t *arg)
> +{
> +	struct pci_epf *epf = to_pci_epf(dev);
> +	struct msi_domain_info *msi_info;
> +	struct pci_epc *epc = epf->epc;
> +
> +	memset(arg, 0, sizeof(*arg));
> +	arg->scratchpad[0].ul = of_msi_map_id(epc->dev.parent, NULL,
> +					      (epf->func_no << 8) | epf->vfunc_no);
> +
> +	/*
> +	 * @domain->msi_domain_info->hwsize contains the size of the device
> +	 * domain, but vector allocation happens one by one.
> +	 */
> +	msi_info = msi_get_domain_info(domain);
> +	if (msi_info->hwsize > nvec)
> +		nvec = msi_info->hwsize;
> +
> +	/* Allocate at least 32 MSIs, and always as a power of 2 */
> +	nvec = max_t(int, 32, roundup_pow_of_two(nvec));
> +
> +	msi_info = msi_get_domain_info(domain->parent);
> +	return msi_info->ops->msi_prepare(domain->parent, dev, nvec, arg);

While I was trying to make sense of the change log of patch [1/9] I
looked at this function to understand why this needs an override.

This is a copy of its_msi_prepare() except for the scratchpad[0].ul
part. But that's a GIC-V3 implementation specific detail, which has
absolutely no business in code which claims to be a generic library for
PCI endpoints.
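
For reference, the vector-count part of the quoted function can be reproduced
in userspace C. This is only a sketch: roundup_pow2() is a stand-in for the
kernel's roundup_pow_of_two(), and ep_msi_nvec() is a hypothetical name that
does not exist in the patch or in the kernel.

```c
#include <assert.h>

/* Userspace stand-in for the kernel's roundup_pow_of_two(). */
static inline unsigned int roundup_pow2(unsigned int n)
{
	unsigned int p = 1;

	/* Keep the shift below from overflowing. */
	assert(n > 0 && n <= 0x80000000u);
	while (p < n)
		p <<= 1;
	return p;
}

/*
 * Hypothetical helper mirroring the quoted logic: take the larger of the
 * requested vector count and the domain size, round it up to a power of
 * two, and never return fewer than 32.
 */
static inline int ep_msi_nvec(int nvec, int hwsize)
{
	int rounded;

	if (hwsize > nvec)
		nvec = hwsize;

	assert(nvec > 0);
	rounded = (int)roundup_pow2((unsigned int)nvec);
	return rounded < 32 ? 32 : rounded;
}
```

With these definitions a request for a single vector still reserves 32, and a
domain size of 100 is rounded up to 128.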

Worse you created a GIC-V3 only PCI endpoint library under the
assumption that the underlying ITS/MSI implementation is immutable. Of
course there is no safety net either to validate that the underlying
parent domain is actually GIC-V3-ITS. That's wrong in every aspect.

So let's take a step back and analyze what is actually required to make
this a proper generic library.

The endpoint function device needs its own device ID which is required
to set up a device specific translation in the interrupt remapping unit.

Now you decided that this is bound to a DT mapping, which is odd to
begin with. What's DT specific about this? The circumstance that your
hardware is DT based and the endpoint controller ID map needs to be
retrieved from there? How is this generic in any way? How is this
supposed to work with ACPI enumerated hardware? Not to ask the question
how this should work with non GIC-V3-ITS based hardware.

That's all but generic, it's an ad hoc hack to support your particular
setup implemented by layering violations.

In fact the mapping ID is composed by the parent mapping ID and the
function numbers, right?

The general PCIe convention here is:

    domain:bus:slot.func

That's well defined and if you look at real devices then lspci shows:

0000:3d:00.1 Ethernet controller: Ethernet Connection for 10GBASE-T
0000:3d:06.0 Ethernet controller: Ethernet Virtual Function
0000:...
0000:3d:06.7 Ethernet controller: Ethernet Virtual Function
0000:3d:07.0 Ethernet controller: Ethernet Virtual Function
0000:...
0000:3d:07.7 Ethernet controller: Ethernet Virtual Function

In PCI address representation:

   domain:bus:slot:function

which is usually condensed into a single word based on the range limits
of function, device and bus:

   function:    bit 0-2         (max. 8)
   device:      bit 3-7         (max. 32)
   bus:         bit 8-15        (max. 256)
   domain:      bit 16-31       (mostly theoretical)

Endpoint devices should follow exactly the same scheme, no?
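
The bit layout above can be written down as a small userspace C sketch. The
pci_addr_*() helpers are hypothetical names used only for illustration; for
the device/function byte the kernel itself provides PCI_DEVFN(), PCI_SLOT()
and PCI_FUNC().

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a kernel API): pack domain:bus:slot.func into
 * one 32-bit word using the bit ranges listed above.
 */
static inline uint32_t pci_addr_pack(uint16_t domain, uint8_t bus,
				     uint8_t dev, uint8_t func)
{
	assert(dev < 32);	/* device:   bits 3-7 */
	assert(func < 8);	/* function: bits 0-2 */

	return ((uint32_t)domain << 16) | ((uint32_t)bus << 8) |
	       ((uint32_t)dev << 3) | func;
}

/* Accessors for the individual fields of the packed word. */
static inline uint8_t pci_addr_func(uint32_t addr)
{
	return addr & 0x7;
}

static inline uint8_t pci_addr_dev(uint32_t addr)
{
	return (addr >> 3) & 0x1f;
}

static inline uint8_t pci_addr_bus(uint32_t addr)
{
	return (addr >> 8) & 0xff;
}

static inline uint16_t pci_addr_domain(uint32_t addr)
{
	return addr >> 16;
}
```

For the lspci entry 0000:3d:06.7 shown earlier, pci_addr_pack(0, 0x3d, 6, 7)
yields 0x3d37.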

Now looking at your ID retrieval:

> +	arg->scratchpad[0].ul = of_msi_map_id(epc->dev.parent, NULL,
> +					      (epf->func_no << 8) | epf->vfunc_no);

I really have to ask why this is making up its own representation
instead of simply using the standard PCI B/D/F conventions?

Whatever the reason is, fact is that the actual interrupt domain support
needs to be done differently. There is no way that the endpoint library
makes assumption about the underlying interrupt domain and copies a
function just because. This has to be completely agnostic, no if, no
but.

So the consequence is that the underlying MSI parent domains needs to
know about the endpoint requirements, which is how all MSI variants are
modeled, i.e. with a MSI domain bus.

That also solves the problem of immutable MSI messages without any
further magic. Interrupt domains, which do not provide them, won't
provide the endpoint MSI domain bus and therefore the lookup of the
parent MSI domain for the endpoint fails.

The uncompilable mockup below should give you a hint.

Thanks,

        tglx
---
 drivers/irqchip/irq-gic-v3-its-msi-parent.c |   50 ++++++++++++++++++++--------
 drivers/irqchip/irq-msi-lib.c               |    5 ++
 drivers/irqchip/irq-msi-lib.h               |   12 +++++-
 include/linux/irqdomain_defs.h              |    2 +
 4 files changed, 51 insertions(+), 18 deletions(-)

--- a/drivers/irqchip/irq-gic-v3-its-msi-parent.c
+++ b/drivers/irqchip/irq-gic-v3-its-msi-parent.c
@@ -126,20 +126,9 @@ int __weak iort_pmsi_get_dev_id(struct d
 	return -1;
 }
 
-static int its_pmsi_prepare(struct irq_domain *domain, struct device *dev,
-			    int nvec, msi_alloc_info_t *info)
+static int __its_pmsi_prepare(struct irq_domain *domain, struct device *dev,
+			      int nvec, msi_alloc_info_t *info, u32 dev_id)
 {
-	struct msi_domain_info *msi_info;
-	u32 dev_id;
-	int ret;
-
-	if (dev->of_node)
-		ret = of_pmsi_get_dev_id(domain->parent, dev, &dev_id);
-	else
-		ret = iort_pmsi_get_dev_id(dev, &dev_id);
-	if (ret)
-		return ret;
-
 	/* ITS specific DeviceID, as the core ITS ignores dev. */
 	info->scratchpad[0].ul = dev_id;
 
@@ -159,6 +148,36 @@ static int its_pmsi_prepare(struct irq_d
 					  dev, nvec, info);
 }
 
+static int its_pci_ep_msi_prepare(struct irq_domain *domain, struct device *dev,
+				  int nvec, msi_alloc_info_t *info)
+{
+	u32 dev_id = dev_get_pci_ep_id(dev);
+	struct msi_domain_info *msi_info;
+	int ret = -ENOTSUPP;
+
+	if (dev->of_node)
+		ret = do_magic_ep_id_map();
+	if (ret)
+		return ret;
+	return __its_pmsi_prepare(domain, dev, nvec, info, dev_id);
+}
+
+static int its_pmsi_prepare(struct irq_domain *domain, struct device *dev,
+			    int nvec, msi_alloc_info_t *info)
+{
+	struct msi_domain_info *msi_info;
+	u32 dev_id;
+	int ret;
+
+	if (dev->of_node)
+		ret = of_pmsi_get_dev_id(domain->parent, dev, &dev_id);
+	else
+		ret = iort_pmsi_get_dev_id(dev, &dev_id);
+	if (ret)
+		return ret;
+	return __its_pmsi_prepare(domain, dev, nvec, info, dev_id);
+}
+
 static bool its_init_dev_msi_info(struct device *dev, struct irq_domain *domain,
 				  struct irq_domain *real_parent, struct msi_domain_info *info)
 {
@@ -183,6 +202,9 @@ static bool its_init_dev_msi_info(struct
 		 */
 		info->ops->msi_prepare = its_pci_msi_prepare;
 		break;
+	case DOMAIN_BUS_PCI_DEVICE_EP_MSI:
+		info->ops->msi_prepare = its_pci_ep_msi_prepare;
+		break;
 	case DOMAIN_BUS_DEVICE_MSI:
 	case DOMAIN_BUS_WIRED_TO_MSI:
 		/*
@@ -204,7 +226,7 @@ const struct msi_parent_ops gic_v3_its_m
 	.supported_flags	= ITS_MSI_FLAGS_SUPPORTED,
 	.required_flags		= ITS_MSI_FLAGS_REQUIRED,
 	.bus_select_token	= DOMAIN_BUS_NEXUS,
-	.bus_select_mask	= MATCH_PCI_MSI | MATCH_PLATFORM_MSI,
+	.bus_select_mask	= MATCH_PCI_MSI | MATCH_PLATFORM_PCI_EP_MSI | MATCH_PLATFORM_MSI,
 	.prefix			= "ITS-",
 	.init_dev_msi_info	= its_init_dev_msi_info,
 };
--- a/drivers/irqchip/irq-msi-lib.c
+++ b/drivers/irqchip/irq-msi-lib.c
@@ -55,8 +55,11 @@ bool msi_lib_init_dev_msi_info(struct de
 	case DOMAIN_BUS_PCI_DEVICE_MSIX:
 		if (WARN_ON_ONCE(!IS_ENABLED(CONFIG_PCI_MSI)))
 			return false;
-
 		break;
+	case DOMAIN_BUS_DEVICE_PCI_EP_MSI:
+		if (WARN_ON_ONCE(!IS_ENABLED(CONFIG_PCI_ENDPOINT)))
+			return false;
+		fallthrough;
 	case DOMAIN_BUS_DEVICE_MSI:
 		/*
 		 * Per device MSI should never have any MSI feature bits
--- a/drivers/irqchip/irq-msi-lib.h
+++ b/drivers/irqchip/irq-msi-lib.h
@@ -10,12 +10,18 @@
 #include <linux/msi.h>
 
 #ifdef CONFIG_PCI_MSI
-#define MATCH_PCI_MSI		BIT(DOMAIN_BUS_PCI_MSI)
+#define MATCH_PCI_MSI			BIT(DOMAIN_BUS_PCI_MSI)
 #else
-#define MATCH_PCI_MSI		(0)
+#define MATCH_PCI_MSI			(0)
 #endif
 
-#define MATCH_PLATFORM_MSI	BIT(DOMAIN_BUS_PLATFORM_MSI)
+#ifdef CONFIG_PCI_ENDPOINT
+#define MATCH_PLATFORM_PCI_EP_MSI	BIT(DOMAIN_BUS_PLATFORM_PCI_EP_MSI)
+#else
+#define MATCH_PLATFORM_PCI_EP_MSI	(0)
+#endif
+
+#define MATCH_PLATFORM_MSI		BIT(DOMAIN_BUS_PLATFORM_MSI)
 
 int msi_lib_irq_domain_select(struct irq_domain *d, struct irq_fwspec *fwspec,
 			      enum irq_domain_bus_token bus_token);
--- a/include/linux/irqdomain_defs.h
+++ b/include/linux/irqdomain_defs.h
@@ -15,6 +15,7 @@ enum irq_domain_bus_token {
 	DOMAIN_BUS_GENERIC_MSI,
 	DOMAIN_BUS_PCI_MSI,
 	DOMAIN_BUS_PLATFORM_MSI,
+	DOMAIN_BUS_PLATFORM_PCI_EP_MSI,
 	DOMAIN_BUS_NEXUS,
 	DOMAIN_BUS_IPI,
 	DOMAIN_BUS_FSL_MC_MSI,
@@ -27,6 +28,7 @@ enum irq_domain_bus_token {
 	DOMAIN_BUS_AMDVI,
 	DOMAIN_BUS_DEVICE_MSI,
 	DOMAIN_BUS_WIRED_TO_MSI,
+	DOMAIN_BUS_DEVICE_PCI_EP_MSI,
 };
 
 #endif /* _LINUX_IRQDOMAIN_DEFS_H */
Frank Li Dec. 18, 2024, 5:44 p.m. UTC | #2
On Tue, Dec 17, 2024 at 11:56:18PM +0100, Thomas Gleixner wrote:
> On Wed, Dec 11 2024 at 15:57, Frank Li wrote:
> > +static int pci_epf_msi_prepare(struct irq_domain *domain, struct device *dev,
> > +			       int nvec, msi_alloc_info_t *arg)
> > +{
> > +	struct pci_epf *epf = to_pci_epf(dev);
> > +	struct msi_domain_info *msi_info;
> > +	struct pci_epc *epc = epf->epc;
> > +
> > +	memset(arg, 0, sizeof(*arg));
> > +	arg->scratchpad[0].ul = of_msi_map_id(epc->dev.parent, NULL,
> > +					      (epf->func_no << 8) | epf->vfunc_no);
> > +
> > +	/*
> > +	 * @domain->msi_domain_info->hwsize contains the size of the device
> > +	 * domain, but vector allocation happens one by one.
> > +	 */
> > +	msi_info = msi_get_domain_info(domain);
> > +	if (msi_info->hwsize > nvec)
> > +		nvec = msi_info->hwsize;
> > +
> > +	/* Allocate at least 32 MSIs, and always as a power of 2 */
> > +	nvec = max_t(int, 32, roundup_pow_of_two(nvec));
> > +
> > +	msi_info = msi_get_domain_info(domain->parent);
> > +	return msi_info->ops->msi_prepare(domain->parent, dev, nvec, arg);
>
> While I was trying to make sense of the change log of patch [1/9] I
> looked at this function to understand why this needs an override.
>
> This is a copy of its_msi_prepare() except for the scratchpad[0].ul
> part. But that's a GIC-V3 implementation specific detail, which has
> absolutely no business in code which claims to be a generic library for
> PCI endpoints.
>
> Worse you created a GIC-V3 only PCI endpoint library under the
> assumption that the underlying ITS/MSI implementation is immutable. Of
> course there is no safety net either to validate that the underlying
> parent domain is actually GIC-V3-ITS. That's wrong in every aspect.
>
> So let's take a step back and analyze what is actually required to make
> this a proper generic library.
>
> The endpoint function device needs its own device ID which is required
> to set up a device specific translation in the interrupt remapping unit.
>
> Now you decided that this is bound to a DT mapping, which is odd to
> begin with. What's DT specific about this? The circumstance that your
> hardware is DT based and the endpoint controller ID map needs to be
> retrieved from there? How is this generic in any way? How is this
> supposed to work with ACPI enumerated hardware? Not to ask the question
> how this should work with non GIC-V3-ITS based hardware.
>
> That's all but generic, it's an ad hoc hack to support your particular
> setup implemented by layering violations.
>
> In fact the mapping ID is composed by the parent mapping ID and the
> function numbers, right?
>
> The general PCIe convention here is:
>
>     domain:bus:slot.func
>
> That's well defined and if you look at real devices then lspci shows:
>
> 0000:3d:00.1 Ethernet controller: Ethernet Connection for 10GBASE-T
> 0000:3d:06.0 Ethernet controller: Ethernet Virtual Function
> 0000:...
> 0000:3d:06.7 Ethernet controller: Ethernet Virtual Function
> 0000:3d:07.0 Ethernet controller: Ethernet Virtual Function
> 0000:...
> 0000:3d:07.7 Ethernet controller: Ethernet Virtual Function
>
> In PCI address representation:
>
>    domain:bus:slot:function
>
> which is usually condensed into a single word based on the range limits
> of function, device and bus:
>
>    function:    bit 0-2         (max. 8)
>    device:      bit 3-7         (max. 32)
>    bus:         bit 8-15        (max. 256)
>    domain:      bit 16-31       (mostly theoretical)
>
> Endpoint devices should follow exactly the same scheme, no?

BDF can't be reused because the bus and device numbers have no meaning on
the EP side. For example, a PCI EP controller may have 8 physical functions,
called EP.1, EP.2, ...

EP.n may be connected to PCI host 1, where the bus number is 1, so the BDF is
	0000:01:00.1: same EP device

but if it is connected to another PCI host on bus 3, the BDF is
	0000:03:00.1: same EP device

The EP side doesn't care whether it is connected to RC bus 1 or bus 3.

	EP.1 uses MSI irq 3
	EP.2 uses MSI irq 4

Regardless of whether the EP is connected to bus 1 or bus 3, the doorbell
always triggers EP side irq 3 for EP.1 and irq 4 for EP.2.

So only the function and vfunction numbers should be used as the request ID.
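
A userspace C sketch of this point, under the assumption that the host-side
ID is the usual bus/device/function packing. Both helper names are
hypothetical; ep_func_id() only reproduces the (func_no << 8) | vfunc_no
expression from the patch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * ID as composed in the patch: physical function number from bit 8
 * upwards, virtual function number in bits 0-7. No bus number involved.
 */
static inline uint32_t ep_func_id(uint8_t func_no, uint8_t vfunc_no)
{
	return ((uint32_t)func_no << 8) | vfunc_no;
}

/*
 * Host-side view for comparison: bus in bits 8-15, device in bits 3-7,
 * function in bits 0-2. The bus number is assigned by the host.
 */
static inline uint32_t host_bdf(uint8_t bus, uint8_t dev, uint8_t func)
{
	assert(dev < 32 && func < 8);

	return ((uint32_t)bus << 8) | ((uint32_t)dev << 3) | func;
}
```

Here host_bdf(1, 0, 1) is 0x101 and host_bdf(3, 0, 1) is 0x301, so the
host-side ID changes with the bus the EP is attached to, while
ep_func_id(1, 0) is 0x100 in both cases.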

I worry that some hardware (which I have not met yet) directly uses the BDF
from the RC side, because that is a simple hardware implementation for a
dual-role PCI controller.

>
> Now looking at your ID retrieval:
>
> > +	arg->scratchpad[0].ul = of_msi_map_id(epc->dev.parent, NULL,
> > +					      (epf->func_no << 8) | epf->vfunc_no);
>
> I really have to ask why this is making up its own representation
> instead of simply using the standard PCI B/D/F conventions?

See above!

Additionally, different vfuncs need different doorbell MSI domains.
I am not sure how virtual functions map to BDF. If they map to D, we can
simply mask B. I want to reserve it to avoid an ABI change in the future when
vfunc support is added, since my hardware doesn't support EP side vfuncs yet.

Anyway, let me implement what your below suggestion then fine tune this.

>
> Whatever the reason is, fact is that the actual interrupt domain support
> needs to be done differently. There is no way that the endpoint library
> makes assumption about the underlying interrupt domain and copies a
> function just because. This has to be completely agnostic, no if, no
> but.
>
> So the consequence is that the underlying MSI parent domains needs to
> know about the endpoint requirements, which is how all MSI variants are
> modeled, i.e. with a MSI domain bus.
>
> That also solves the problem of immutable MSI messages without any
> further magic. Interrupt domains, which do not provide them, won't
> provide the endpoint MSI domain bus and therefore the lookup of the
> parent MSI domain for the endpoint fails.
>
> The uncompilable mockup below should give you a hint.

Thanks for your example, let me try to implement it.

Frank

>
> Thanks,
>
>         tglx
> ---
>  drivers/irqchip/irq-gic-v3-its-msi-parent.c |   50 ++++++++++++++++++++--------
>  drivers/irqchip/irq-msi-lib.c               |    5 ++
>  drivers/irqchip/irq-msi-lib.h               |   12 +++++-
>  include/linux/irqdomain_defs.h              |    2 +
>  4 files changed, 51 insertions(+), 18 deletions(-)
>
> --- a/drivers/irqchip/irq-gic-v3-its-msi-parent.c
> +++ b/drivers/irqchip/irq-gic-v3-its-msi-parent.c
> @@ -126,20 +126,9 @@ int __weak iort_pmsi_get_dev_id(struct d
>  	return -1;
>  }
>
> -static int its_pmsi_prepare(struct irq_domain *domain, struct device *dev,
> -			    int nvec, msi_alloc_info_t *info)
> +static int __its_pmsi_prepare(struct irq_domain *domain, struct device *dev,
> +			      int nvec, msi_alloc_info_t *info, u32 dev_id)
>  {
> -	struct msi_domain_info *msi_info;
> -	u32 dev_id;
> -	int ret;
> -
> -	if (dev->of_node)
> -		ret = of_pmsi_get_dev_id(domain->parent, dev, &dev_id);
> -	else
> -		ret = iort_pmsi_get_dev_id(dev, &dev_id);
> -	if (ret)
> -		return ret;
> -
>  	/* ITS specific DeviceID, as the core ITS ignores dev. */
>  	info->scratchpad[0].ul = dev_id;
>
> @@ -159,6 +148,36 @@ static int its_pmsi_prepare(struct irq_d
>  					  dev, nvec, info);
>  }
>
> +static int its_pci_ep_msi_prepare(struct irq_domain *domain, struct device *dev,
> +				  int nvec, msi_alloc_info_t *info)
> +{
> +	u32 dev_id = dev_get_pci_ep_id(dev);
> +	struct msi_domain_info *msi_info;
> +	int ret = -ENOTSUPP;
> +
> +	if (dev->of_node)
> +		ret = do_magic_ep_id_map();
> +	if (ret)
> +		return ret;
> +	return __its_pmsi_prepare(domain, dev, nvec, info, dev_id);
> +}
> +
> +static int its_pmsi_prepare(struct irq_domain *domain, struct device *dev,
> +			    int nvec, msi_alloc_info_t *info)
> +{
> +	struct msi_domain_info *msi_info;
> +	u32 dev_id;
> +	int ret;
> +
> +	if (dev->of_node)
> +		ret = of_pmsi_get_dev_id(domain->parent, dev, &dev_id);
> +	else
> +		ret = iort_pmsi_get_dev_id(dev, &dev_id);
> +	if (ret)
> +		return ret;
> +	return __its_pmsi_prepare(domain, dev, nvec, info, dev_id);
> +}
> +
>  static bool its_init_dev_msi_info(struct device *dev, struct irq_domain *domain,
>  				  struct irq_domain *real_parent, struct msi_domain_info *info)
>  {
> @@ -183,6 +202,9 @@ static bool its_init_dev_msi_info(struct
>  		 */
>  		info->ops->msi_prepare = its_pci_msi_prepare;
>  		break;
> +	case DOMAIN_BUS_PCI_DEVICE_EP_MSI:
> +		info->ops->msi_prepare = its_pci_ep_msi_prepare;
> +		break;
>  	case DOMAIN_BUS_DEVICE_MSI:
>  	case DOMAIN_BUS_WIRED_TO_MSI:
>  		/*
> @@ -204,7 +226,7 @@ const struct msi_parent_ops gic_v3_its_m
>  	.supported_flags	= ITS_MSI_FLAGS_SUPPORTED,
>  	.required_flags		= ITS_MSI_FLAGS_REQUIRED,
>  	.bus_select_token	= DOMAIN_BUS_NEXUS,
> -	.bus_select_mask	= MATCH_PCI_MSI | MATCH_PLATFORM_MSI,
> +	.bus_select_mask	= MATCH_PCI_MSI | MATCH_PLATFORM_PCI_EP_MSI | MATCH_PLATFORM_MSI,
>  	.prefix			= "ITS-",
>  	.init_dev_msi_info	= its_init_dev_msi_info,
>  };
> --- a/drivers/irqchip/irq-msi-lib.c
> +++ b/drivers/irqchip/irq-msi-lib.c
> @@ -55,8 +55,11 @@ bool msi_lib_init_dev_msi_info(struct de
>  	case DOMAIN_BUS_PCI_DEVICE_MSIX:
>  		if (WARN_ON_ONCE(!IS_ENABLED(CONFIG_PCI_MSI)))
>  			return false;
> -
>  		break;
> +	case DOMAIN_BUS_DEVICE_PCI_EP_MSI:
> +		if (WARN_ON_ONCE(!IS_ENABLED(CONFIG_PCI_ENDPOINT)))
> +			return false;
> +		fallthrough;
>  	case DOMAIN_BUS_DEVICE_MSI:
>  		/*
>  		 * Per device MSI should never have any MSI feature bits
> --- a/drivers/irqchip/irq-msi-lib.h
> +++ b/drivers/irqchip/irq-msi-lib.h
> @@ -10,12 +10,18 @@
>  #include <linux/msi.h>
>
>  #ifdef CONFIG_PCI_MSI
> -#define MATCH_PCI_MSI		BIT(DOMAIN_BUS_PCI_MSI)
> +#define MATCH_PCI_MSI			BIT(DOMAIN_BUS_PCI_MSI)
>  #else
> -#define MATCH_PCI_MSI		(0)
> +#define MATCH_PCI_MSI			(0)
>  #endif
>
> -#define MATCH_PLATFORM_MSI	BIT(DOMAIN_BUS_PLATFORM_MSI)
> +#ifdef CONFIG_PCI_ENDPOINT
> +#define MATCH_PLATFORM_PCI_EP_MSI	BIT(DOMAIN_BUS_PLATFORM_PCI_EP_MSI)
> +#else
> +#define MATCH_PLATFORM_PCI_EP_MSI	(0)
> +#endif
> +
> +#define MATCH_PLATFORM_MSI		BIT(DOMAIN_BUS_PLATFORM_MSI)
>
>  int msi_lib_irq_domain_select(struct irq_domain *d, struct irq_fwspec *fwspec,
>  			      enum irq_domain_bus_token bus_token);
> --- a/include/linux/irqdomain_defs.h
> +++ b/include/linux/irqdomain_defs.h
> @@ -15,6 +15,7 @@ enum irq_domain_bus_token {
>  	DOMAIN_BUS_GENERIC_MSI,
>  	DOMAIN_BUS_PCI_MSI,
>  	DOMAIN_BUS_PLATFORM_MSI,
> +	DOMAIN_BUS_PLATFORM_PCI_EP_MSI,
>  	DOMAIN_BUS_NEXUS,
>  	DOMAIN_BUS_IPI,
>  	DOMAIN_BUS_FSL_MC_MSI,
> @@ -27,6 +28,7 @@ enum irq_domain_bus_token {
>  	DOMAIN_BUS_AMDVI,
>  	DOMAIN_BUS_DEVICE_MSI,
>  	DOMAIN_BUS_WIRED_TO_MSI,
> +	DOMAIN_BUS_DEVICE_PCI_EP_MSI,
>  };
>
>  #endif /* _LINUX_IRQDOMAIN_DEFS_H */

Patch

diff --git a/drivers/pci/endpoint/Makefile b/drivers/pci/endpoint/Makefile
index 95b2fe47e3b06..a1ccce440c2c5 100644
--- a/drivers/pci/endpoint/Makefile
+++ b/drivers/pci/endpoint/Makefile
@@ -5,4 +5,4 @@ 
 
 obj-$(CONFIG_PCI_ENDPOINT_CONFIGFS)	+= pci-ep-cfs.o
 obj-$(CONFIG_PCI_ENDPOINT)		+= pci-epc-core.o pci-epf-core.o\
-					   pci-epc-mem.o functions/
+					   pci-epc-mem.o pci-ep-msi.o functions/
diff --git a/drivers/pci/endpoint/pci-ep-msi.c b/drivers/pci/endpoint/pci-ep-msi.c
new file mode 100644
index 0000000000000..b0a91fde202f3
--- /dev/null
+++ b/drivers/pci/endpoint/pci-ep-msi.c
@@ -0,0 +1,148 @@ 
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * PCI Endpoint *Controller* (EPC) MSI library
+ *
+ * Copyright (C) 2024 NXP
+ * Author: Frank Li <Frank.Li@nxp.com>
+ */
+
+#include <linux/device.h>
+#include <linux/module.h>
+#include <linux/msi.h>
+#include <linux/of_irq.h>
+#include <linux/pci-epc.h>
+#include <linux/pci-epf.h>
+#include <linux/pci-ep-cfs.h>
+#include <linux/pci-ep-msi.h>
+#include <linux/slab.h>
+
+static void pci_epf_write_msi_msg(struct irq_data *d, struct msi_msg *msg)
+{
+	struct msi_desc *desc = irq_data_get_msi_desc(d);
+	struct pci_epf *epf = to_pci_epf(desc->dev);
+
+	if (epf && epf->db_msg && desc->msi_index < epf->num_db)
+		memcpy(&epf->db_msg[desc->msi_index].msg, msg, sizeof(*msg));
+}
+
+static void pci_epf_msi_set_desc(msi_alloc_info_t *arg, struct msi_desc *desc)
+{
+	arg->desc = desc;
+	arg->hwirq = desc->msi_index;
+}
+
+static int pci_epf_msi_prepare(struct irq_domain *domain, struct device *dev,
+			       int nvec, msi_alloc_info_t *arg)
+{
+	struct pci_epf *epf = to_pci_epf(dev);
+	struct msi_domain_info *msi_info;
+	struct pci_epc *epc = epf->epc;
+
+	memset(arg, 0, sizeof(*arg));
+	arg->scratchpad[0].ul = of_msi_map_id(epc->dev.parent, NULL,
+					      (epf->func_no << 8) | epf->vfunc_no);
+
+	/*
+	 * @domain->msi_domain_info->hwsize contains the size of the device
+	 * domain, but vector allocation happens one by one.
+	 */
+	msi_info = msi_get_domain_info(domain);
+	if (msi_info->hwsize > nvec)
+		nvec = msi_info->hwsize;
+
+	/* Allocate at least 32 MSIs, and always as a power of 2 */
+	nvec = max_t(int, 32, roundup_pow_of_two(nvec));
+
+	msi_info = msi_get_domain_info(domain->parent);
+	return msi_info->ops->msi_prepare(domain->parent, dev, nvec, arg);
+}
+
+static const struct msi_domain_template pci_epf_msi_template = {
+	.chip = {
+		.name			= "EP-MSI",
+		.irq_mask		= irq_chip_mask_parent,
+		.irq_unmask		= irq_chip_unmask_parent,
+		.irq_write_msi_msg	= pci_epf_write_msi_msg,
+		/* The rest is filled in by the MSI parent */
+	},
+
+	.ops = {
+		.msi_prepare		= pci_epf_msi_prepare,
+		.set_desc		= pci_epf_msi_set_desc,
+	},
+
+	.info = {
+		.bus_token		= DOMAIN_BUS_DEVICE_MSI,
+	},
+};
+
+static int pci_epf_device_msi_init_and_alloc_irqs(struct device *dev, unsigned int nvec)
+{
+	struct irq_domain *domain = dev->msi.domain;
+
+	if (!domain)
+		return -EINVAL;
+
+	if (!msi_create_device_irq_domain(dev, MSI_DEFAULT_DOMAIN,
+					  &pci_epf_msi_template, nvec, NULL, NULL))
+		return -ENODEV;
+
+	return msi_domain_alloc_irqs_range(dev, MSI_DEFAULT_DOMAIN, 0, nvec - 1);
+}
+
+int pci_epf_alloc_doorbell(struct pci_epf *epf, u16 num_db)
+{
+	struct pci_epc *epc = epf->epc;
+	struct device *dev = &epf->dev;
+	struct irq_domain *dom;
+	void *msg;
+	u32 rid;
+	int ret;
+	int i;
+
+	rid = (epf->func_no << 8) | epf->vfunc_no;
+	dom = of_msi_map_get_device_domain(epc->dev.parent, rid, DOMAIN_BUS_PLATFORM_MSI);
+	if (!dom) {
+		dev_err(dev, "Can't find msi domain\n");
+		return -EINVAL;
+	}
+
+	dev_set_msi_domain(dev, dom);
+
+	msg = kcalloc(num_db, sizeof(struct pci_epf_doorbell_msg), GFP_KERNEL);
+	if (!msg)
+		return -ENOMEM;
+
+	epf->num_db = num_db;
+	epf->db_msg = msg;
+
+	ret = pci_epf_device_msi_init_and_alloc_irqs(dev, num_db);
+	if (ret) {
+		/*
+		 * The pcie_ep DT node has to specify 'msi-parent' for EP
+		 * doorbell support to work. Right now only GIC ITS is
+		 * supported. If you have GIC ITS and reached this print,
+		 * perhaps you are missing 'msi-map' in DT.
+		 */
+		dev_err(dev, "Failed to allocate MSI\n");
+		kfree(msg);
+		return -ENOMEM;
+	}
+
+	for (i = 0; i < num_db; i++)
+		epf->db_msg[i].virq = msi_get_virq(dev, i);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(pci_epf_alloc_doorbell);
+
+void pci_epf_free_doorbell(struct pci_epf *epf)
+{
+	msi_domain_free_irqs_all(&epf->dev, MSI_DEFAULT_DOMAIN);
+	msi_remove_device_irq_domain(&epf->dev, MSI_DEFAULT_DOMAIN);
+
+	kfree(epf->db_msg);
+	epf->db_msg = NULL;
+	epf->num_db = 0;
+}
+EXPORT_SYMBOL_GPL(pci_epf_free_doorbell);
diff --git a/include/linux/pci-ep-msi.h b/include/linux/pci-ep-msi.h
new file mode 100644
index 0000000000000..f0cfecf491199
--- /dev/null
+++ b/include/linux/pci-ep-msi.h
@@ -0,0 +1,15 @@ 
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * PCI Endpoint *Function* side MSI header file
+ *
+ * Copyright (C) 2024 NXP
+ * Author: Frank Li <Frank.Li@nxp.com>
+ */
+
+#ifndef __PCI_EP_MSI__
+#define __PCI_EP_MSI__
+
+int pci_epf_alloc_doorbell(struct pci_epf *epf, u16 nums);
+void pci_epf_free_doorbell(struct pci_epf *epf);
+
+#endif /* __PCI_EP_MSI__ */
diff --git a/include/linux/pci-epf.h b/include/linux/pci-epf.h
index 18a3aeb62ae4e..5374e6515ffa0 100644
--- a/include/linux/pci-epf.h
+++ b/include/linux/pci-epf.h
@@ -12,6 +12,7 @@ 
 #include <linux/configfs.h>
 #include <linux/device.h>
 #include <linux/mod_devicetable.h>
+#include <linux/msi.h>
 #include <linux/pci.h>
 
 struct pci_epf;
@@ -125,6 +126,17 @@  struct pci_epf_bar {
 	int		flags;
 };
 
+/**
+ * struct pci_epf_doorbell_msg - represents doorbell message
+ * @msg: MSI message
+ * @virq: irq number of this doorbell MSI message
+ * @name: irq name for doorbell interrupt
+ */
+struct pci_epf_doorbell_msg {
+	struct msi_msg msg;
+	int virq;
+};
+
 /**
  * struct pci_epf - represents the PCI EPF device
  * @dev: the PCI EPF device
@@ -152,6 +164,8 @@  struct pci_epf_bar {
  * @vfunction_num_map: bitmap to manage virtual function number
  * @pci_vepf: list of virtual endpoint functions associated with this function
  * @event_ops: Callbacks for capturing the EPC events
+ * @db_msg: data for MSI from RC side
+ * @num_db: number of doorbells
  */
 struct pci_epf {
 	struct device		dev;
@@ -182,6 +196,8 @@  struct pci_epf {
 	unsigned long		vfunction_num_map;
 	struct list_head	pci_vepf;
 	const struct pci_epc_event_ops *event_ops;
+	struct pci_epf_doorbell_msg *db_msg;
+	u16 num_db;
 };
 
 /**