diff mbox series

[v13,4/9] irqchip/gic-v3-its: Add DOMAIN_BUS_DEVICE_PCI_EP_MSI support

Message ID 20241218-ep-msi-v13-4-646e2192dc24@nxp.com (mailing list archive)
State New
Headers show
Series PCI: EP: Add RC-to-EP doorbell with platform MSI controller | expand

Commit Message

Frank Li Dec. 18, 2024, 11:08 p.m. UTC
┌────────────────────────────────┐
           │                                │
           │     PCI Endpoint Controller    │
           │                                │
           │   ┌─────┐  ┌─────┐     ┌─────┐ │
PCI Bus    │   │     │  │     │     │     │ │
─────────► │   │Func1│  │Func2│ ... │Func │ │
Doorbell   │   │     │  │     │     │<n>  │ │
           │   │     │  │     │     │     │ │
           │   └──┬──┘  └──┬──┘     └──┬──┘ │
           │      │        │           │    │
           └──────┼────────┼───────────┼────┘
                  │        │           │
                  ▼        ▼           ▼
               ┌────────────────────────┐
               │    MSI Controller      │
               └────────────────────────┘

Add domain DOMAIN_BUS_DEVICE_PCI_EP_MSI to allocate MSI domain for Endpoint
function in PCI Endpoint (EP) controller, So PCI Root Complex (RC) can
write MSI message to MSI controller to trigger doorbell IRQ for difference
EP functions.

Signed-off-by: Frank Li <Frank.Li@nxp.com>
---
change from v12 to v13
- new patch
---
 drivers/irqchip/irq-gic-v3-its-msi-parent.c | 19 ++++++++++++++++++-
 1 file changed, 18 insertions(+), 1 deletion(-)

Comments

Marc Zyngier Dec. 19, 2024, 10:52 a.m. UTC | #1
On Wed, 18 Dec 2024 23:08:39 +0000,
Frank Li <Frank.Li@nxp.com> wrote:
> 
>            ┌────────────────────────────────┐
>            │                                │
>            │     PCI Endpoint Controller    │
>            │                                │
>            │   ┌─────┐  ┌─────┐     ┌─────┐ │
> PCI Bus    │   │     │  │     │     │     │ │
> ─────────► │   │Func1│  │Func2│ ... │Func │ │
> Doorbell   │   │     │  │     │     │<n>  │ │
>            │   │     │  │     │     │     │ │
>            │   └──┬──┘  └──┬──┘     └──┬──┘ │
>            │      │        │           │    │
>            └──────┼────────┼───────────┼────┘
>                   │        │           │
>                   ▼        ▼           ▼
>                ┌────────────────────────┐
>                │    MSI Controller      │
>                └────────────────────────┘
> 
> Add domain DOMAIN_BUS_DEVICE_PCI_EP_MSI to allocate MSI domain for Endpoint
> function in PCI Endpoint (EP) controller, So PCI Root Complex (RC) can
> write MSI message to MSI controller to trigger doorbell IRQ for difference
> EP functions.
> 
> Signed-off-by: Frank Li <Frank.Li@nxp.com>
> ---
> change from v12 to v13
> - new patch

This might be v13, but after all this time, I have no idea what you
are trying to do. You keep pasting this non-ASCII drawing in commit
messages, but I still have no idea what this PCI Bus Doorbell
represents.

I appreciate the knowledge shortage is on my end, but it would
definitely help if someone would take the time to explain what this is
all about.

From what I gather, the ITS is actually on an end-point, and get
writes from the host, but that doesn't answer much.

> ---
>  drivers/irqchip/irq-gic-v3-its-msi-parent.c | 19 ++++++++++++++++++-
>  1 file changed, 18 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/irqchip/irq-gic-v3-its-msi-parent.c b/drivers/irqchip/irq-gic-v3-its-msi-parent.c
> index b2a4b67545b82..16e7d53f0b133 100644
> --- a/drivers/irqchip/irq-gic-v3-its-msi-parent.c
> +++ b/drivers/irqchip/irq-gic-v3-its-msi-parent.c
> @@ -5,6 +5,7 @@
>  // Copyright (C) 2022 Intel
>  
>  #include <linux/acpi_iort.h>
> +#include <linux/pci-ep-msi.h>
>  #include <linux/pci.h>
>  
>  #include "irq-gic-common.h"
> @@ -173,6 +174,19 @@ static int its_pmsi_prepare(struct irq_domain *domain, struct device *dev,
>  	return its_pmsi_prepare_devid(domain, dev, nvec, info, dev_id);
>  }
>  
> +static int its_pci_ep_msi_prepare(struct irq_domain *domain, struct device *dev,
> +				  int nvec, msi_alloc_info_t *info)
> +{
> +	u32 dev_id;
> +	int ret;
> +
> +	ret = pci_epf_msi_domain_get_msi_rid(dev, &dev_id);

What this doesn't express is *how* are the writes conveyed to the ITS.
Specifically, the DevID is normally sampled as sideband information at
during the write transaction.

Obviously, you can't do that over PCI. So there is a lot of
undisclosed assumption about how the ITS is integrated, and how it
samples the DevID.

My conclusion is that this is not as generic as it seems to be. It is
definitely tied to implementation-specific behaviours, none of which
are explained.

Thanks,

	M.
Frank Li Dec. 19, 2024, 5:02 p.m. UTC | #2
On Thu, Dec 19, 2024 at 10:52:30AM +0000, Marc Zyngier wrote:
> On Wed, 18 Dec 2024 23:08:39 +0000,
> Frank Li <Frank.Li@nxp.com> wrote:
> >
> >            ┌────────────────────────────────┐
> >            │                                │
> >            │     PCI Endpoint Controller    │
> >            │                                │
> >            │   ┌─────┐  ┌─────┐     ┌─────┐ │
> > PCI Bus    │   │     │  │     │     │     │ │
> > ─────────► │   │Func1│  │Func2│ ... │Func │ │
> > Doorbell   │   │     │  │     │     │<n>  │ │
> >            │   │     │  │     │     │     │ │
> >            │   └──┬──┘  └──┬──┘     └──┬──┘ │
> >            │      │        │           │    │
> >            └──────┼────────┼───────────┼────┘
> >                   │        │           │
> >                   ▼        ▼           ▼
> >                ┌────────────────────────┐
> >                │    MSI Controller      │
> >                └────────────────────────┘
> >
> > Add domain DOMAIN_BUS_DEVICE_PCI_EP_MSI to allocate MSI domain for Endpoint
> > function in PCI Endpoint (EP) controller, So PCI Root Complex (RC) can
> > write MSI message to MSI controller to trigger doorbell IRQ for difference
> > EP functions.
> >
> > Signed-off-by: Frank Li <Frank.Li@nxp.com>
> > ---
> > change from v12 to v13
> > - new patch
>
> This might be v13, but after all this time, I have no idea what you
> are trying to do. You keep pasting this non-ASCII drawing in commit
> messages, but I still have no idea what this PCI Bus Doorbell
> represents.

PCI Bus/Doorbell is two words. Basic over picture is a PCI EP devices (such
as imx95), which run linux and PCI Endpoint framework. i.MX95 connect to
PCI Host, such as PC (x86).

i.MX95 can use standard PCI MSI framework to issue a irq to X86. but there
are not reverse direction. X86 try write some MMIO register ( mapped PCI
bar0). But i.MX95 don't know it have been modified. So currently solution
is create a polling thread to check every 10ms.

So this patches try resolve this problem at the platform, which have MSI
controller such as ITS.

after this patches, i.MX95 can create a PCI Bar1, which map to MSI
controller register space,  when X86 write data to Bar1 (call as doorbell),
a irq will be triggered at i.MX95.

Doorbell in diagram means 'push doorbell' (write data to bar<n>).

>
> I appreciate the knowledge shortage is on my end, but it would
> definitely help if someone would take the time to explain what this is
> all about.

I am not sure if diagram in corver letter can help this, or above
descriptions is enough.


┌────────────┐   ┌───────────────────────────────────┐   ┌────────────────┐
│            │   │                                   │   │                │
│            │   │ PCI Endpoint                      │   │ PCI Host       │
│            │   │                                   │   │                │
│            │◄──┤ 1.platform_msi_domain_alloc_irqs()│   │                │
│            │   │                                   │   │                │
│ MSI        ├──►│ 2.write_msi_msg()                 ├──►├─BAR<n>         │
│ Controller │   │   update doorbell register address│   │                │
│            │   │   for BAR                         │   │                │
│            │   │                                   │   │ 3. Write BAR<n>│
│            │◄──┼───────────────────────────────────┼───┤                │
│            │   │                                   │   │                │
│            ├──►│ 4.Irq Handle                      │   │                │
│            │   │                                   │   │                │
│            │   │                                   │   │                │
└────────────┘   └───────────────────────────────────┘   └────────────────┘
(* some detail have been changed and don't affect understand overall
picture)

>
> From what I gather, the ITS is actually on an end-point, and get
> writes from the host, but that doesn't answer much.

Yes, baisc it is correct. PCI RC -> PCIe Bus TLP -> PCI Endpoint Ctrl ->
AXI transaction -> ITS MMIO map register -> CPU IRQ.

The major problem how to distingiush from difference PCI Endpoint function
driver. There are have many EP functions as much as 8, which quite similar
standard PCI, one PCIe device can have 8 physical functions.

>
> > ---
> >  drivers/irqchip/irq-gic-v3-its-msi-parent.c | 19 ++++++++++++++++++-
> >  1 file changed, 18 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/irqchip/irq-gic-v3-its-msi-parent.c b/drivers/irqchip/irq-gic-v3-its-msi-parent.c
> > index b2a4b67545b82..16e7d53f0b133 100644
> > --- a/drivers/irqchip/irq-gic-v3-its-msi-parent.c
> > +++ b/drivers/irqchip/irq-gic-v3-its-msi-parent.c
> > @@ -5,6 +5,7 @@
> >  // Copyright (C) 2022 Intel
> >
> >  #include <linux/acpi_iort.h>
> > +#include <linux/pci-ep-msi.h>
> >  #include <linux/pci.h>
> >
> >  #include "irq-gic-common.h"
> > @@ -173,6 +174,19 @@ static int its_pmsi_prepare(struct irq_domain *domain, struct device *dev,
> >  	return its_pmsi_prepare_devid(domain, dev, nvec, info, dev_id);
> >  }
> >
> > +static int its_pci_ep_msi_prepare(struct irq_domain *domain, struct device *dev,
> > +				  int nvec, msi_alloc_info_t *info)
> > +{
> > +	u32 dev_id;
> > +	int ret;
> > +
> > +	ret = pci_epf_msi_domain_get_msi_rid(dev, &dev_id);
>
> What this doesn't express is *how* are the writes conveyed to the ITS.
> Specifically, the DevID is normally sampled as sideband information at
> during the write transaction.

Like PCI host, there msi-map in dts file, which descript how map PCI RID
to DevID, such as
	msi-map = <0 $its 0x80 8>;

This informtion should be descripted in DTS or ACPI ...

>
> Obviously, you can't do that over PCI. So there is a lot of
> undisclosed assumption about how the ITS is integrated, and how it
> samples the DevID.

Yes, it should be platform PCI endpoint ctrl driver jobs.  Platform EP
driver should implement this type of covert. Such as i.MX95, there are
hardware call LUT in PCI ctrl,  which can convert PCI' request ID to DevID
here.

On going patch may help understand these
https://lore.kernel.org/linux-pci/20241210-imx95_lut-v8-0-2e730b2e5fde@nxp.com/

If use latest ITS MSI64 should be simple, only need descript it at DTS
(I have not hardware to test this case yet).
pci-ep {
	...
	msi-map = <0 &its, 0x<8_0000, 0xff>;
			      ^, ctrl ID.
	msi-mask = <0xff>;
	...
}

>
> My conclusion is that this is not as generic as it seems to be. It is
> definitely tied to implementation-specific behaviours, none of which
> are explained.

Compared to standard PCI MSI, which also have implementation-specific
behaviours, which convert PCI request ID to DevID Or stream ID.
https://lore.kernel.org/linux-pci/20241210-imx95_lut-v8-0-2e730b2e5fde@nxp.com/
(I have struggle this for almost one year for this implementation-specific
part)

Well defined and mature PCI standard, MSI still need two parts, common part
and "implementation-specific" part.

Common part of standard PCI is at several place, such its driver/msi
libary/ kernel msi code ...

"implementation-specific" part is in PCI host bridge driver, such as
drivers/pci/controller/dwc/pcie-qcom.c

This solution already test by Tested-by: Niklas Cassel <cassel@kernel.org>
who use another dwc controller, which they already implemented
"implementation-specific" by only update dts to provide hardware
information.(I guest he use ITS's MSI64)

Because it is new patches, I have not added Niklas's test-by tag. There
are not big functional change since Nikas test. The major change is make
msi part better align current MSI framework according to Thomas's
suggestion.

Frank
>
> Thanks,
>
> 	M.
>
> --
> Without deviation from the norm, progress is not possible.
Niklas Cassel Dec. 19, 2024, 8:10 p.m. UTC | #3
On Thu, Dec 19, 2024 at 12:02:02PM -0500, Frank Li wrote:
> On Thu, Dec 19, 2024 at 10:52:30AM +0000, Marc Zyngier wrote:
> > On Wed, 18 Dec 2024 23:08:39 +0000,

[...]

> If use latest ITS MSI64 should be simple, only need descript it at DTS
> (I have not hardware to test this case yet).
> pci-ep {
> 	...
> 	msi-map = <0 &its, 0x<8_0000, 0xff>;
> 			      ^, ctrl ID.
> 	msi-mask = <0xff>;
> 	...
> }

[...]

> This solution already test by Tested-by: Niklas Cassel <cassel@kernel.org>
> who use another dwc controller, which they already implemented
> "implementation-specific" by only update dts to provide hardware
> information.(I guest he use ITS's MSI64)
> 
> Because it is new patches, I have not added Niklas's test-by tag. There
> are not big functional change since Nikas test. The major change is make
> msi part better align current MSI framework according to Thomas's
> suggestion.

Frank, I tested this series (a few revisions back) on the rockchip rk3588,
which just like imx95, uses a DWC based PCIe EP controller, and ARM GIC ITS,
but unlike imx95, it does not require any additional look up table registers
to be configured.


While the rk3588 PCIe host controller node has:
msi-map = <0x0000 &its1 0x0000 0x1000>;
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/arm64/boot/dts/rockchip/rk3588-extra.dtsi?h=v6.13-rc3#n164

The rk3588 PCIe endpoint controller node, which is the only one relevant
in this case, only has:
msi-parent = <&its1 0x0000>;
no msi-map.
https://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip.git/commit/?h=v6.14-armsoc/dts64&id=b6f09f497b07008aa65c31341138cecafa78222c


Kind regards,
Niklas
Frank Li Dec. 19, 2024, 8:17 p.m. UTC | #4
On Thu, Dec 19, 2024 at 09:10:16PM +0100, Niklas Cassel wrote:
> On Thu, Dec 19, 2024 at 12:02:02PM -0500, Frank Li wrote:
> > On Thu, Dec 19, 2024 at 10:52:30AM +0000, Marc Zyngier wrote:
> > > On Wed, 18 Dec 2024 23:08:39 +0000,
>
> [...]
>
> > If use latest ITS MSI64 should be simple, only need descript it at DTS
> > (I have not hardware to test this case yet).
> > pci-ep {
> > 	...
> > 	msi-map = <0 &its, 0x<8_0000, 0xff>;
> > 			      ^, ctrl ID.
> > 	msi-mask = <0xff>;
> > 	...
> > }
>
> [...]
>
> > This solution already test by Tested-by: Niklas Cassel <cassel@kernel.org>
> > who use another dwc controller, which they already implemented
> > "implementation-specific" by only update dts to provide hardware
> > information.(I guest he use ITS's MSI64)
> >
> > Because it is new patches, I have not added Niklas's test-by tag. There
> > are not big functional change since Nikas test. The major change is make
> > msi part better align current MSI framework according to Thomas's
> > suggestion.
>
> Frank, I tested this series (a few revisions back) on the rockchip rk3588,
> which just like imx95, uses a DWC based PCIe EP controller, and ARM GIC ITS,
> but unlike imx95, it does not require any additional look up table registers
> to be configured.
>
>
> While the rk3588 PCIe host controller node has:
> msi-map = <0x0000 &its1 0x0000 0x1000>;
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/arm64/boot/dts/rockchip/rk3588-extra.dtsi?h=v6.13-rc3#n164
>
> The rk3588 PCIe endpoint controller node, which is the only one relevant
> in this case, only has:
> msi-parent = <&its1 0x0000>;
> no msi-map.
> https://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip.git/commit/?h=v6.14-armsoc/dts64&id=b6f09f497b07008aa65c31341138cecafa78222c

Thank you very much. I update msi part, so endpoint controller node align
host controller node.

It should be
msi-map = <0x0000 &its1 0x0000 0x1000>;

So if your hardware support multi physical function, your can create more
than one pci_test func. Previous version only support one EP func.

Frank

>
>
> Kind regards,
> Niklas
Niklas Cassel Dec. 19, 2024, 8:43 p.m. UTC | #5
On Thu, Dec 19, 2024 at 03:17:15PM -0500, Frank Li wrote:
> 
> Thank you very much. I update msi part, so endpoint controller node align
> host controller node.
> 
> It should be
> msi-map = <0x0000 &its1 0x0000 0x1000>;
> 
> So if your hardware support multi physical function, your can create more
> than one pci_test func. Previous version only support one EP func.

I see. That seems like an improvement.
I will need to ask Rockchip maintainer to drop my msi-parent patch for PCIe
EP node then. (Which is currently queued up in for-6.14)

However, for the PCIe host node, rk3588 has:
iommu-map = <0x0000 &mmu600_pcie 0x0000 0x1000>;

For the PCIe endpoint node, rk3588 has:
iommus = <&mmu600_pcie 0x0000>;

https://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip.git/commit/?h=v6.14-armsoc/dts64&id=da92d3dfc871e821a1bface3ba5afcf8cda19805

Is it fine that for the PCIe EP node, we specify iommu mapping using:
iommus = <&mmu600_pcie 0x0000>;
but the ITS/MSI map will be:
msi-map = <0x0000 &its1 0x0000 0x1000>;

isn't this a bit inconsistent?

The physical function is the "F" in the BDF.
Does this mean that:
iommus = <&mmu600_pcie 0x0000>;
the IOMMU will not be able to distinguish different PCI physical functions
from the same PCI device? So two different physical functions on the same
PCI device share the same IOMMU mappings?


Kind regards,
Niklas
Frank Li Dec. 19, 2024, 8:53 p.m. UTC | #6
On Thu, Dec 19, 2024 at 09:43:15PM +0100, Niklas Cassel wrote:
> On Thu, Dec 19, 2024 at 03:17:15PM -0500, Frank Li wrote:
> >
> > Thank you very much. I update msi part, so endpoint controller node align
> > host controller node.
> >
> > It should be
> > msi-map = <0x0000 &its1 0x0000 0x1000>;
> >
> > So if your hardware support multi physical function, your can create more
> > than one pci_test func. Previous version only support one EP func.
>
> I see. That seems like an improvement.
> I will need to ask Rockchip maintainer to drop my msi-parent patch for PCIe
> EP node then. (Which is currently queued up in for-6.14)
>
> However, for the PCIe host node, rk3588 has:
> iommu-map = <0x0000 &mmu600_pcie 0x0000 0x1000>;
>
> For the PCIe endpoint node, rk3588 has:
> iommus = <&mmu600_pcie 0x0000>;
>
> https://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip.git/commit/?h=v6.14-armsoc/dts64&id=da92d3dfc871e821a1bface3ba5afcf8cda19805
>
> Is it fine that for the PCIe EP node, we specify iommu mapping using:
> iommus = <&mmu600_pcie 0x0000>;
> but the ITS/MSI map will be:
> msi-map = <0x0000 &its1 0x0000 0x1000>;

For doorbell feature, it should be fine.

>
> isn't this a bit inconsistent?

Ideally, iommus need do similar map.

>
> The physical function is the "F" in the BDF.
> Does this mean that:
> iommus = <&mmu600_pcie 0x0000>;
> the IOMMU will not be able to distinguish different PCI physical functions
> from the same PCI device?

You are right. All physical functions will share one IOMMU space.

> So two different physical functions on the same
> PCI device share the same IOMMU mappings?

Yes,
Function should be okay. Only miss isolation protection. func1 and access
func2's dma memory address. At most system it should be fine.

Frank

>
>
> Kind regards,
> Niklas
diff mbox series

Patch

diff --git a/drivers/irqchip/irq-gic-v3-its-msi-parent.c b/drivers/irqchip/irq-gic-v3-its-msi-parent.c
index b2a4b67545b82..16e7d53f0b133 100644
--- a/drivers/irqchip/irq-gic-v3-its-msi-parent.c
+++ b/drivers/irqchip/irq-gic-v3-its-msi-parent.c
@@ -5,6 +5,7 @@ 
 // Copyright (C) 2022 Intel
 
 #include <linux/acpi_iort.h>
+#include <linux/pci-ep-msi.h>
 #include <linux/pci.h>
 
 #include "irq-gic-common.h"
@@ -173,6 +174,19 @@  static int its_pmsi_prepare(struct irq_domain *domain, struct device *dev,
 	return its_pmsi_prepare_devid(domain, dev, nvec, info, dev_id);
 }
 
+static int its_pci_ep_msi_prepare(struct irq_domain *domain, struct device *dev,
+				  int nvec, msi_alloc_info_t *info)
+{
+	u32 dev_id;
+	int ret;
+
+	ret = pci_epf_msi_domain_get_msi_rid(dev, &dev_id);
+	if (ret)
+		return ret;
+
+	return its_pmsi_prepare_devid(domain, dev, nvec, info, dev_id);
+}
+
 static bool its_init_dev_msi_info(struct device *dev, struct irq_domain *domain,
 				  struct irq_domain *real_parent, struct msi_domain_info *info)
 {
@@ -205,6 +219,9 @@  static bool its_init_dev_msi_info(struct device *dev, struct irq_domain *domain,
 		 */
 		info->ops->msi_prepare = its_pmsi_prepare;
 		break;
+	case DOMAIN_BUS_DEVICE_PCI_EP_MSI:
+		info->ops->msi_prepare = its_pci_ep_msi_prepare;
+		break;
 	default:
 		/* Confused. How did the lib return true? */
 		WARN_ON_ONCE(1);
@@ -218,7 +235,7 @@  const struct msi_parent_ops gic_v3_its_msi_parent_ops = {
 	.supported_flags	= ITS_MSI_FLAGS_SUPPORTED,
 	.required_flags		= ITS_MSI_FLAGS_REQUIRED,
 	.bus_select_token	= DOMAIN_BUS_NEXUS,
-	.bus_select_mask	= MATCH_PCI_MSI | MATCH_PLATFORM_MSI,
+	.bus_select_mask	= MATCH_PCI_MSI | MATCH_PLATFORM_MSI | MATCH_PLATFORM_PCI_EP_MSI,
 	.prefix			= "ITS-",
 	.init_dev_msi_info	= its_init_dev_msi_info,
 };