[v6,4/4] pcie: endpoint: pci-epf-vntb: add endpoint MSI support

Message ID 20220818151127.2449064-5-Frank.Li@nxp.com (mailing list archive)
State Superseded
Series [v6,1/4] irqchip: allow pass down .pm field at IRQCHIP_PLATFORM_DRIVER_END

Commit Message

Frank Li Aug. 18, 2022, 3:11 p.m. UTC
                        ┌───────┐          ┌──────────┐
                        │       │          │          │
      ┌─────────────┐   │       │          │ PCI Host │
      │ MSI         │◄┐ │       │          │          │
      │ Controller  │ │ │       │          │          │
      └─────────────┘ └─┼───────┼──────────┼─BAR0     │
                        │ PCI   │          │ BAR1     │
                        │ Func  │          │ BAR2     │
                        │       │          │ BAR3     │
                        │       │          │ BAR4     │
                        │       ├─────────►│          │
                        └───────┘          └──────────┘

Linux supports PCI endpoint functions. The PCI host writes BAR<n> space
just like memory, so the EP side has no way to know that the memory was
changed by the host driver.

The PCI spec does not define a standard method for the host to notify
the EP; it only defines MSI/MSI-X for the EP to notify the RC of status
changes.

The basic idea is to trigger an IRQ when the PCI host writes to a
memory address; that is exactly what an MSI controller provides. The EP
driver just needs to request a platform MSI interrupt, and struct
msi_msg will pass down a memory address and data. The EP driver maps
that memory address into one of the PCI BAR<n> regions, and the PCI
host writes to that address to trigger an IRQ on the EP side.
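
For illustration, a minimal sketch of that flow (the callback name is
hypothetical; the actual code is in the patch below):

        /* The MSI core calls this back once per allocated vector; msg
         * carries the doorbell address/data pair chosen by the MSI
         * controller driver.
         */
        static void epf_db_write_msi_msg(struct msi_desc *desc,
                                         struct msi_msg *msg)
        {
                u64 addr = ((u64)msg->address_hi << 32) | msg->address_lo;

                /* Save addr and msg->data; the EP driver later maps
                 * addr into a PCI BAR so that a host write to the BAR
                 * raises this vector's IRQ on the EP.
                 */
        }

        /* Request one platform MSI vector per doorbell. */
        ret = platform_msi_domain_alloc_irqs(dev, db_count,
                                             epf_db_write_msi_msg);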

Add MSI support to pci-epf-vntb: query whether the system has an MSI
controller and set up the doorbell address according to struct msi_msg.

The PCI host can then write this doorbell address to trigger the EP
side's IRQ.

If no MSI controller exists, fall back to software polling.

Signed-off-by: Frank Li <Frank.Li@nxp.com>
---
 drivers/pci/endpoint/functions/pci-epf-vntb.c | 134 +++++++++++++++---
 1 file changed, 112 insertions(+), 22 deletions(-)

Comments

Bjorn Helgaas Aug. 18, 2022, 9:30 p.m. UTC | #1
Please pay attention to the style of previous subject lines and copy
them:

  $ git log --oneline drivers/pci/endpoint/functions/pci-epf-vntb.c | cat
  b8c0aa9b16bb NTB: EPF: Tidy up some bounds checks
  3305f43cb6a8 NTB: EPF: Fix error code in epf_ntb_bind()
  ae9f38adac26 PCI: endpoint: pci-epf-vntb: reduce several globals to statics
  8e4bfbe644a6 PCI: endpoint: pci-epf-vntb: fix error handle in epf_ntb_mw_bar_init()
  7b14a5e96128 NTB: EPF: set pointer addr to null using NULL rather than 0
  e35f56bb0330 PCI: endpoint: Support NTB transfer between RC and EP

Nobody has paid much attention to consistency here in the past, but we
will in the future.

Maybe "PCI: endpoint: Add NTB MSI support" or similar?

On Thu, Aug 18, 2022 at 10:11:27AM -0500, Frank Li wrote:
>                         ┌───────┐          ┌──────────┐
>                         │       │          │          │
>       ┌─────────────┐   │       │          │ PCI Host │
>       │ MSI         │◄┐ │       │          │          │
>       │ Controller  │ │ │       │          │          │
>       └─────────────┘ └─┼───────┼──────────┼─BAR0     │
>                         │ PCI   │          │ BAR1     │
>                         │ Func  │          │ BAR2     │
>                         │       │          │ BAR3     │
>                         │       │          │ BAR4     │
>                         │       ├─────────►│          │
>                         └───────┘          └──────────┘
> 
> Linux supports endpoint functions. PCI Host write BAR<n> space like write
> to memory. The EP side can't know memory changed by the host driver.

The diagram is pretty but I don't quite understand what this is
telling me.  I assume "PCI Func" is the "EP side"?  If so, label it
appropriately.

Is "PCI Host" referring to a host CPU?  I guess not, since you include
BARs in the box.

What are the arrows?  I assume one is an MMIO write to a BAR, since
you mention that below.

> PCI Spec has not defined a standard method to do that. Only define MSI(x)
> to let EP notified RC status change.
> 
> The basic idea is to trigger an IRQ when PCI RC writes to a memory
> address. That's what MSI controller provided. EP drivers just need to
> request a platform MSI interrupt, struct msi_msg *msg will pass down a
> memory address and data. EP driver will map such memory address to one of
> PCI BAR<n>.  Host just writes such an address to trigger EP side irq.

I think "PCI RC writes to memory" and "Host writes such an address"
are referring to the same write.  If so, use the same words both
times, not "PCI RC" once and "host" the other.

s/irq/IRQ/, as you did above.  I don't want to have to figure out
whether "irq" is the same as "IRQ", so spell it the same way all the
time.

> Add MSI support for pci-epf-vntb. pci-epf-vntb driver query if system
> have MSI controller. Setup doorbell address according to struct msi_msg.

s/pci-epf-vntb driver query/Query/
s/have/has an/
s/Setup/Set up/

> So PCIe host can write this doorbell address to triger EP side's irq.

I guess "PCIe host" is something else that means the same as "PCI RC"
or "host" above?  Use consistent terminology.  This doesn't seem like
something specific to PCIe.  If it's not, use "PCI" to be generic or
omit it altogether.

s/triger/trigger/
s/irq/IRQ/ again

> If no MSI controller exist, fall back to software polling.

s/exist/exists/

> @@ -253,7 +256,7 @@ static void epf_ntb_cmd_handler(struct work_struct *work)
>  
>  	ntb = container_of(work, struct epf_ntb, cmd_handler.work);
>  
> -	for (i = 1; i < ntb->db_count; i++) {
> +	for (i = 1; i < ntb->db_count && !ntb->epf_db_phy; i++) {

This loop condition is hard to read.  It would be simpler as:

  if (!ntb->epf_db_phy) {
    for (i = 1; i < ntb->db_count; i++) {
      ...
    }
  }

> @@ -520,35 +543,33 @@ static int epf_ntb_db_bar_init(struct epf_ntb *ntb)
>  	struct device *dev = &ntb->epf->dev;
>  	int ret;
>  	struct pci_epf_bar *epf_bar;
> -	void __iomem *mw_addr;
> +	void __iomem *mw_addr = NULL;
>  	enum pci_barno barno;
> -	size_t size = 4 * ntb->db_count;
> +	size_t size;
>  
>  	epc_features = pci_epc_get_features(ntb->epf->epc,
>  					    ntb->epf->func_no,
>  					    ntb->epf->vfunc_no);
>  	align = epc_features->align;
> -
> -	if (size < 128)
> -		size = 128;
> -
> -	if (align)
> -		size = ALIGN(size, align);
> -	else
> -		size = roundup_pow_of_two(size);
> +	size = epf_ntb_db_size(ntb);
>  
>  	barno = ntb->epf_ntb_bar[BAR_DB];
> +	epf_bar = &ntb->epf->bar[barno];
>  
> -	mw_addr = pci_epf_alloc_space(ntb->epf, size, barno, align, 0);
> -	if (!mw_addr) {
> -		dev_err(dev, "Failed to allocate OB address\n");
> -		return -ENOMEM;
> +	if (!ntb->epf_db_phy) {
> +		mw_addr = pci_epf_alloc_space(ntb->epf, size, barno, align, 0);
> +		if (!mw_addr) {
> +			dev_err(dev, "Failed to allocate OB address\n");
> +			return -ENOMEM;
> +		}
> +	} else {
> +		epf_bar->phys_addr = ntb->epf_db_phy;
> +		epf_bar->barno = barno;
> +		epf_bar->size = size;

I think inverted tests are hard to read, and setting mw_addr here
instead of at the declaration will make the cases more parallel, so
maybe omit the initialization above and do this:

  if (ntb->epf_db_phy) {
    mw_addr = NULL;
    epf_bar->phys_addr = ntb->epf_db_phy;
    ...
  } else {
    mw_addr = pci_epf_alloc_space(ntb->epf, size, barno, align, 0);
    ...
  }

> +static void epf_ntb_epc_msi_init(struct epf_ntb *ntb)
> +{
> +	struct device *dev = &ntb->epf->dev;
> +	struct irq_domain *domain;
> +	int virq;
> +	int ret;
> +	int i;
> +
> +	domain = dev_get_msi_domain(ntb->epf->epc->dev.parent);
> +	if (!domain)
> +		return;
> +
> +	dev_set_msi_domain(dev, domain);
> +
> +	if (platform_msi_domain_alloc_irqs(&ntb->epf->dev,
> +		ntb->db_count,
> +		epf_ntb_write_msi_msg)) {
> +		dev_info(dev, "Can't allocate MSI, fall back to poll mode\n");
> +		return;
> +	}
> +
> +	dev_info(dev, "vntb use MSI as doorbell\n");
> +
> +	for (i = 0; i < ntb->db_count; i++) {
> +		virq = msi_get_virq(dev, i);
> +		ret = devm_request_irq(dev, virq,
> +			       epf_ntb_interrupt_handler, 0,
> +			       "ntb", ntb);
> +
> +		if (ret)
> +			dev_err(dev, "devm_request_irq() failure\n");

You don't return anything to indicate success or failure.  Does the
caller care if this fails?  A message is only for debugging; it's not
a way to tell the caller anything.

> +
> +		if (!i)
> +			ntb->msi_virqbase = virq;

I don't understand what you're doing here, but it looks weird to set
ntb->msi_virqbase even if devm_request_irq() fails.

> +	}
> +}
Manivannan Sadhasivam Aug. 31, 2022, 10:42 a.m. UTC | #2
On Thu, Aug 18, 2022 at 10:11:27AM -0500, Frank Li wrote:
>                         ┌───────┐          ┌──────────┐
>                         │       │          │          │
>       ┌─────────────┐   │       │          │ PCI Host │
>       │ MSI         │◄┐ │       │          │          │
>       │ Controller  │ │ │       │          │          │
>       └─────────────┘ └─┼───────┼──────────┼─BAR0     │
>                         │ PCI   │          │ BAR1     │
>                         │ Func  │          │ BAR2     │
>                         │       │          │ BAR3     │
>                         │       │          │ BAR4     │
>                         │       ├─────────►│          │
>                         └───────┘          └──────────┘
> 

This diagram doesn't say which side is the host and which one is the
endpoint, and it doesn't convey any useful information.

> Linux supports endpoint functions. PCI Host write BAR<n> space like write
> to memory. The EP side can't know memory changed by the host driver.
> 

I think you can just say that there is no defined way for the host to
raise IRQs to the endpoint.

> PCI Spec has not defined a standard method to do that. Only define MSI(x)
> to let EP notified RC status change.
> 

MSI is from EP, right? Throughout the driver you should call it
"doorbell" and not MSI.

> The basic idea is to trigger an IRQ when PCI RC writes to a memory
> address. That's what MSI controller provided. EP drivers just need to
> request a platform MSI interrupt, struct msi_msg *msg will pass down a
> memory address and data. EP driver will map such memory address to one of
> PCI BAR<n>.  Host just writes such an address to trigger EP side irq.
> 

IIUC (by looking at other patches in the series), the memory the PCI
host assigns to a BAR region is mapped to the platform interrupt
controller in the PCI endpoint, such that whenever the PCI host writes
to the BAR region, it triggers an IRQ in the endpoint.

This kind of setup is available on other platforms like Qualcomm, where
the mapping between a register region exposed in BAR0 and the interrupt
controller is done in the hardware itself. So whenever the PCI host
writes to that register in BAR0, an IRQ is delivered to the endpoint.

> Add MSI support for pci-epf-vntb. pci-epf-vntb driver query if system
> have MSI controller. Setup doorbell address according to struct msi_msg.
> 
> So PCIe host can write this doorbell address to triger EP side's irq.
> 
> If no MSI controller exist, fall back to software polling.
> 
> Signed-off-by: Frank Li <Frank.Li@nxp.com>
> ---
>  drivers/pci/endpoint/functions/pci-epf-vntb.c | 134 +++++++++++++++---
>  1 file changed, 112 insertions(+), 22 deletions(-)
> 
> diff --git a/drivers/pci/endpoint/functions/pci-epf-vntb.c b/drivers/pci/endpoint/functions/pci-epf-vntb.c
> index 1466dd1904175..ad4f7ec8a39fc 100644
> --- a/drivers/pci/endpoint/functions/pci-epf-vntb.c
> +++ b/drivers/pci/endpoint/functions/pci-epf-vntb.c
> @@ -44,6 +44,7 @@
>  #include <linux/pci-epc.h>
>  #include <linux/pci-epf.h>
>  #include <linux/ntb.h>
> +#include <linux/msi.h>
>  
>  static struct workqueue_struct *kpcintb_workqueue;
>  
> @@ -143,6 +144,8 @@ struct epf_ntb {
>  	void __iomem *vpci_mw_addr[MAX_MW];
>  
>  	struct delayed_work cmd_handler;
> +
> +	int msi_virqbase;

db_base?

>  };
>  
>  #define to_epf_ntb(epf_group) container_of((epf_group), struct epf_ntb, group)
> @@ -253,7 +256,7 @@ static void epf_ntb_cmd_handler(struct work_struct *work)
>  
>  	ntb = container_of(work, struct epf_ntb, cmd_handler.work);
>  
> -	for (i = 1; i < ntb->db_count; i++) {
> +	for (i = 1; i < ntb->db_count && !ntb->epf_db_phy; i++) {

epf_db_phy is a weird name. "phy" usually means the PHY controller
(physical layer) in the kernel. If you are referring to the physical
address of the doorbell, then you could use "phys".

>  		if (readl(ntb->epf_db + i * 4)) {
>  			if (readl(ntb->epf_db + i * 4))
>  				ntb->db |= 1 << (i - 1);
> @@ -454,11 +457,9 @@ static int epf_ntb_config_spad_bar_alloc(struct epf_ntb *ntb)
>  	ctrl->num_mws = ntb->num_mws;
>  	ntb->spad_size = spad_size;
>  
> -	ctrl->db_entry_size = 4;
> -
>  	for (i = 0; i < ntb->db_count; i++) {
>  		ntb->reg->db_data[i] = 1 + i;
> -		ntb->reg->db_offset[i] = 0;
> +		ntb->reg->db_offset[i] = 4 * i;
>  	}
>  
>  	return 0;
> @@ -509,6 +510,28 @@ static int epf_ntb_configure_interrupt(struct epf_ntb *ntb)
>  	return 0;
>  }
>  
> +static int epf_ntb_db_size(struct epf_ntb *ntb)
> +{
> +	const struct pci_epc_features *epc_features;
> +	size_t size = 4 * ntb->db_count;
> +	u32 align;
> +
> +	epc_features = pci_epc_get_features(ntb->epf->epc,
> +					    ntb->epf->func_no,
> +					    ntb->epf->vfunc_no);
> +	align = epc_features->align;
> +
> +	if (size < 128)
> +		size = 128;
> +
> +	if (align)
> +		size = ALIGN(size, align);
> +	else
> +		size = roundup_pow_of_two(size);
> +
> +	return size;
> +}
> +
>  /**
>   * epf_ntb_db_bar_init() - Configure Doorbell window BARs
>   * @ntb: NTB device that facilitates communication between HOST and vHOST
> @@ -520,35 +543,33 @@ static int epf_ntb_db_bar_init(struct epf_ntb *ntb)
>  	struct device *dev = &ntb->epf->dev;
>  	int ret;
>  	struct pci_epf_bar *epf_bar;
> -	void __iomem *mw_addr;
> +	void __iomem *mw_addr = NULL;
>  	enum pci_barno barno;
> -	size_t size = 4 * ntb->db_count;
> +	size_t size;
>  
>  	epc_features = pci_epc_get_features(ntb->epf->epc,
>  					    ntb->epf->func_no,
>  					    ntb->epf->vfunc_no);
>  	align = epc_features->align;
> -
> -	if (size < 128)
> -		size = 128;
> -
> -	if (align)
> -		size = ALIGN(size, align);
> -	else
> -		size = roundup_pow_of_two(size);
> +	size = epf_ntb_db_size(ntb);
>  
>  	barno = ntb->epf_ntb_bar[BAR_DB];
> +	epf_bar = &ntb->epf->bar[barno];
>  
> -	mw_addr = pci_epf_alloc_space(ntb->epf, size, barno, align, 0);
> -	if (!mw_addr) {
> -		dev_err(dev, "Failed to allocate OB address\n");
> -		return -ENOMEM;
> +	if (!ntb->epf_db_phy) {
> +		mw_addr = pci_epf_alloc_space(ntb->epf, size, barno, align, 0);
> +		if (!mw_addr) {
> +			dev_err(dev, "Failed to allocate OB address\n");

Expand OB.

> +			return -ENOMEM;
> +		}
> +	} else {
> +		epf_bar->phys_addr = ntb->epf_db_phy;
> +		epf_bar->barno = barno;
> +		epf_bar->size = size;
>  	}
>  
>  	ntb->epf_db = mw_addr;
>  
> -	epf_bar = &ntb->epf->bar[barno];
> -
>  	ret = pci_epc_set_bar(ntb->epf->epc, ntb->epf->func_no, ntb->epf->vfunc_no, epf_bar);
>  	if (ret) {
>  		dev_err(dev, "Doorbell BAR set failed\n");
> @@ -704,6 +725,74 @@ static int epf_ntb_init_epc_bar(struct epf_ntb *ntb)
>  	return 0;
>  }
>  
> +static void epf_ntb_write_msi_msg(struct msi_desc *desc, struct msi_msg *msg)
> +{
> +	struct epf_ntb *ntb = dev_get_drvdata(desc->dev);
> +	struct epf_ntb_ctrl *reg = ntb->reg;
> +	int size = epf_ntb_db_size(ntb);
> +	u64 addr;
> +
> +	addr = msg->address_hi;
> +	addr <<= 32;
> +	addr |= msg->address_lo;
> +
> +	reg->db_data[desc->msi_index] = msg->data;
> +
> +	if (desc->msi_index == 0)
> +		ntb->epf_db_phy = round_down(addr, size);
> +
> +	reg->db_offset[desc->msi_index] = addr - ntb->epf_db_phy;
> +}
> +
> +static irqreturn_t epf_ntb_interrupt_handler(int irq, void *data)
> +{
> +	struct epf_ntb *ntb = data;
> +	int index;
> +
> +	index = irq - ntb->msi_virqbase;
> +	ntb->db |= 1 << (index - 1);
> +	ntb_db_event(&ntb->ntb, index);
> +
> +	return IRQ_HANDLED;
> +}
> +
> +static void epf_ntb_epc_msi_init(struct epf_ntb *ntb)
> +{
> +	struct device *dev = &ntb->epf->dev;
> +	struct irq_domain *domain;
> +	int virq;
> +	int ret;
> +	int i;
> +
> +	domain = dev_get_msi_domain(ntb->epf->epc->dev.parent);
> +	if (!domain)
> +		return;
> +
> +	dev_set_msi_domain(dev, domain);
> +
> +	if (platform_msi_domain_alloc_irqs(&ntb->epf->dev,
> +		ntb->db_count,
> +		epf_ntb_write_msi_msg)) {
> +		dev_info(dev, "Can't allocate MSI, fall back to poll mode\n");
> +		return;
> +	}
> +
> +	dev_info(dev, "vntb use MSI as doorbell\n");
> +

Why are you using the interrupt controller as the MSI controller? Why not just
a plain interrupt controller?

> +	for (i = 0; i < ntb->db_count; i++) {
> +		virq = msi_get_virq(dev, i);
> +		ret = devm_request_irq(dev, virq,
> +			       epf_ntb_interrupt_handler, 0,
> +			       "ntb", ntb);

"ntb" as a IRQ name seems quite generic. You might want to prefix it with epf
or vntb...

Thanks,
Mani

> +
> +		if (ret)
> +			dev_err(dev, "devm_request_irq() failure\n");
> +
> +		if (!i)
> +			ntb->msi_virqbase = virq;
> +	}
> +}
> +
>  /**
>   * epf_ntb_epc_init() - Initialize NTB interface
>   * @ntb: NTB device that facilitates communication between HOST and vHOST2
> @@ -1299,14 +1388,15 @@ static int epf_ntb_bind(struct pci_epf *epf)
>  		goto err_bar_alloc;
>  	}
>  
> +	epf_set_drvdata(epf, ntb);
> +	epf_ntb_epc_msi_init(ntb);
> +
>  	ret = epf_ntb_epc_init(ntb);
>  	if (ret) {
>  		dev_err(dev, "Failed to initialize EPC\n");
>  		goto err_bar_alloc;
>  	}
>  
> -	epf_set_drvdata(epf, ntb);
> -
>  	pci_space[0] = (ntb->vntb_pid << 16) | ntb->vntb_vid;
>  	pci_vntb_table[0].vendor = ntb->vntb_vid;
>  	pci_vntb_table[0].device = ntb->vntb_pid;
> -- 
> 2.35.1
>
Frank Li Aug. 31, 2022, 4:19 p.m. UTC | #3
> -----Original Message-----
> From: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
> Sent: Wednesday, August 31, 2022 5:42 AM
> Subject: [EXT] Re: [PATCH v6 4/4] pcie: endpoint: pci-epf-vntb: add endpoint MSI support
> 
> On Thu, Aug 18, 2022 at 10:11:27AM -0500, Frank Li wrote:
> >                         ┌───────┐          ┌──────────┐
> >                         │       │          │          │
> >       ┌─────────────┐   │       │          │ PCI Host │
> >       │ MSI         │◄┐ │       │          │          │
> >       │ Controller  │ │ │       │          │          │
> >       └─────────────┘ └─┼───────┼──────────┼─BAR0     │
> >                         │ PCI   │          │ BAR1     │
> >                         │ Func  │          │ BAR2     │
> >                         │       │          │ BAR3     │
> >                         │       │          │ BAR4     │
> >                         │       ├─────────►│          │
> >                         └───────┘          └──────────┘
> >
> 
> This diagram doesn't say which side is host and which one is endpoint.
> And not conveying any useful information.

[Frank Li] In v2 this diagram was in the cover letter; Bjorn suggested moving
it here. You have good background knowledge, but it should be helpful for new
people who are just touching this area.

I already marked "PCI Func" and "PCI Host".

> 
> > Linux supports endpoint functions. PCI Host write BAR<n> space like write
> > to memory. The EP side can't know memory changed by the host driver.
> >
> 
> I think you just say, that there is no defined way of raising IRQs by host
> to the endpoint.
> 
> > PCI Spec has not defined a standard method to do that. Only define MSI(x)
> > to let EP notified RC status change.
> >
> 
> MSI is from EP, right? Throughout the driver you should call it as "doorbell"
> and not MSI.

[Frank Li] What I wanted to say is that the PCI standard defines MSI/MSI-X so
the EP can notify the RC, but there is no standard way for the reverse
direction. MSI should be correct here.

> 
> > The basic idea is to trigger an IRQ when PCI RC writes to a memory
> > address. That's what MSI controller provided. EP drivers just need to
> > request a platform MSI interrupt, struct msi_msg *msg will pass down a
> > memory address and data. EP driver will map such memory address to one
> of
> > PCI BAR<n>.  Host just writes such an address to trigger EP side irq.
> >
> 
> IIUC (by looking at other patches in the series), the memory assigned for BAR
> region by the PCI host is mapped to the platform interrupt controller in
> PCI Endpoint. Such that, whenever the PCI host writes to the BAR region, it
> will trigger an IRQ in the Endpoint.
> 
> This kind of setup is available in other platforms like Qualcomm where the
> mapping of a register region available in BAR0 and interrupt controller is
> done in the hardware itself. So whenever the PCI host writes to that register
> in BAR0, an IRQ will be delivered to the endpoint.

[Frank Li] Yes, but not all platforms have it, and the EP framework does not
provide an API to get such a register region. I think the platform MSI API is
a pretty good fit. Many systems have a GIC ITS, so EP function drivers can use
that. Our test platform does not have an ITS yet, so we added a simple MU-MSI
driver to do the same. I think the Qualcomm platform can use a similar method,
so all EP function drivers get a common way to receive notifications from the
PCI host.

[...]

> > +static void epf_ntb_epc_msi_init(struct epf_ntb *ntb)
> > +{
> > +     struct device *dev = &ntb->epf->dev;
> > +     struct irq_domain *domain;
> > +     int virq;
> > +     int ret;
> > +     int i;
> > +
> > +     domain = dev_get_msi_domain(ntb->epf->epc->dev.parent);
> > +     if (!domain)
> > +             return;
> > +
> > +     dev_set_msi_domain(dev, domain);
> > +
> > +     if (platform_msi_domain_alloc_irqs(&ntb->epf->dev,
> > +             ntb->db_count,
> > +             epf_ntb_write_msi_msg)) {
> > +             dev_info(dev, "Can't allocate MSI, fall back to poll mode\n");
> > +             return;
> > +     }
> > +
> > +     dev_info(dev, "vntb use MSI as doorbell\n");
> > +
> 
> Why are you using the interrupt controller as the MSI controller? Why not
> just
> a plain interrupt controller?

[Frank Li] What do you mean? I think only an MSI controller supports triggering an IRQ from a memory write.

[...]

Manivannan Sadhasivam Sept. 2, 2022, 1:38 a.m. UTC | #4
On Wed, Aug 31, 2022 at 04:19:17PM +0000, Frank Li wrote:
> 
> 
> > -----Original Message-----
> > From: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
> > Sent: Wednesday, August 31, 2022 5:42 AM
> > Subject: [EXT] Re: [PATCH v6 4/4] pcie: endpoint: pci-epf-vntb: add endpoint MSI support
> > 
> > On Thu, Aug 18, 2022 at 10:11:27AM -0500, Frank Li wrote:
> > >                         ┌───────┐          ┌──────────┐
> > >                         │       │          │          │
> > >       ┌─────────────┐   │       │          │ PCI Host │
> > >       │ MSI         │◄┐ │       │          │          │
> > >       │ Controller  │ │ │       │          │          │
> > >       └─────────────┘ └─┼───────┼──────────┼─BAR0     │
> > >                         │ PCI   │          │ BAR1     │
> > >                         │ Func  │          │ BAR2     │
> > >                         │       │          │ BAR3     │
> > >                         │       │          │ BAR4     │
> > >                         │       ├─────────►│          │
> > >                         └───────┘          └──────────┘
> > >
> > 
> > This diagram doesn't say which side is host and which one is endpoint.
> > And not conveying any useful information.
> 
> [Frank Li] At V2 version, this diagram is in cover letter.  Bjorn suggest move to here.
> I think you have good background knowledge.  But it should be helpful for new
> People,  who just touch this area. 
> 

Having a block diagram always helps, but my point is that this diagram doesn't
convey the immediate knowledge it is supposed to. For instance, there is no
partition between host and endpoint, and you did not add any explanation about
it in the text below. So in the next version, please incorporate those.

> I already mark "PCI Func" and "PCI Host".  
> 

Sorry, that's not helpful and you need to improve it.

> > 
> > > Linux supports endpoint functions. PCI Host write BAR<n> space like write
> > > to memory. The EP side can't know memory changed by the host driver.
> > >
> > 
> > I think you just say, that there is no defined way of raising IRQs by host
> > to the endpoint.
> > 
> > > PCI Spec has not defined a standard method to do that. Only define MSI(x)
> > > to let EP notified RC status change.
> > >
> > 
> > MSI is from EP, right? Throughout the driver you should call it as "doorbell"
> > and not MSI.
> 
> [Frank Li] What's I want said is that PCI standard define MSI(x) to let EP notify RC. 
> But there are not standard way for reverse direction.  MSI should be correct here.
> 

Right. But also use "MSI/MSI-X" instead of "MSI(x)".

> > 
> > > The basic idea is to trigger an IRQ when PCI RC writes to a memory
> > > address. That's what MSI controller provided. EP drivers just need to
> > > request a platform MSI interrupt, struct msi_msg *msg will pass down a
> > > memory address and data. EP driver will map such memory address to one
> > of
> > > PCI BAR<n>.  Host just writes such an address to trigger EP side irq.
> > >
> > 
> > IIUC (by looking at other patches in the series), the memory assigned for BAR
> > region by the PCI host is mapped to the platform interrupt controller in
> > PCI Endpoint. Such that, whenever the PCI host writes to the BAR region, it
> > will trigger an IRQ in the Endpoint.
> > 
> > This kind of setup is available in other platforms like Qualcomm where the
> > mapping of a register region available in BAR0 and interrupt controller is
> > done in the hardware itself. So whenever the PCI host writes to that register
> > in BAR0, an IRQ will be delivered to the endpoint.
> 
> [Frank Li] Yes,  not all platform have it. And EP driver have not provide a API
> to get register region.  I think platform msi API is pretty good API. 
> Many system have GIC ITS,  so EP function driver can use it.  Our test platform
> have not ITS yet,  so we added a simple MU-MSI driver to do it. I think qualcomm
> platform can use similar method.  So all EP function driver can use common method
> to get notification from PCI host.
> 

What is the common method here? If you want to make this doorbell feature
common across all EPF drivers, then you need to provide EPF APIs.

[...]

> > > +static void epf_ntb_epc_msi_init(struct epf_ntb *ntb)
> > > +{
> > > +     struct device *dev = &ntb->epf->dev;
> > > +     struct irq_domain *domain;
> > > +     int virq;
> > > +     int ret;
> > > +     int i;
> > > +
> > > +     domain = dev_get_msi_domain(ntb->epf->epc->dev.parent);
> > > +     if (!domain)
> > > +             return;
> > > +
> > > +     dev_set_msi_domain(dev, domain);
> > > +
> > > +     if (platform_msi_domain_alloc_irqs(&ntb->epf->dev,
> > > +             ntb->db_count,
> > > +             epf_ntb_write_msi_msg)) {
> > > +             dev_info(dev, "Can't allocate MSI, fall back to poll mode\n");
> > > +             return;
> > > +     }
> > > +
> > > +     dev_info(dev, "vntb use MSI as doorbell\n");
> > > +
> > 
> > Why are you using the interrupt controller as the MSI controller? Why not
> > just
> > a plain interrupt controller?
> 
> [Frank Li] what's your means?   I think only MSI controller support write memory to trigger irq.
> 

From the EPF driver's perspective, only the IRQs need to be requested, right?
So why can't you expose the MU as a generic irqchip driver instead of an MSI
controller?

Thanks,
Mani
Frank Li Sept. 2, 2022, 1:48 a.m. UTC | #5
> -----Original Message-----
> From: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
> Sent: Thursday, September 1, 2022 8:39 PM
> Subject: Re: [EXT] Re: [PATCH v6 4/4] pcie: endpoint: pci-epf-vntb: add endpoint MSI support
> 
> On Wed, Aug 31, 2022 at 04:19:17PM +0000, Frank Li wrote:

[...]

> > >
> > > > The basic idea is to trigger an IRQ when PCI RC writes to a memory
> > > > address. That's what MSI controller provided. EP drivers just need to
> > > > request a platform MSI interrupt, struct msi_msg *msg will pass down a
> > > > memory address and data. EP driver will map such memory address to
> one
> > > of
> > > > PCI BAR<n>.  Host just writes such an address to trigger EP side irq.
> > > >
> > >
> > > IIUC (by looking at other patches in the series), the memory assigned for
> BAR
> > > region by the PCI host is mapped to the platform interrupt controller in
> > > PCI Endpoint. Such that, whenever the PCI host writes to the BAR region,
> it
> > > will trigger an IRQ in the Endpoint.
> > >
> > > This kind of setup is available in other platforms like Qualcomm where
> the
> > > mapping of a register region available in BAR0 and interrupt controller is
> > > done in the hardware itself. So whenever the PCI host writes to that
> register
> > > in BAR0, an IRQ will be delivered to the endpoint.
> >
> > [Frank Li] Yes,  not all platform have it. And EP driver have not provide a API
> > to get register region.  I think platform msi API is pretty good API.
> > Many system have GIC ITS,  so EP function driver can use it.  Our test
> platform
> > have not ITS yet,  so we added a simple MU-MSI driver to do it. I think
> qualcomm
> > platform can use similar method.  So all EP function driver can use common
> method
> > to get notification from PCI host.
> >
> 
> What is the common method here? If you want to make this doorbell feature
> common across all EPF drivers, then you need to provide EPF APIs.

[Frank Li] The existing MSI API already matches the requirement; the EPF
driver just reuses it. This patch provides a demo of how to use the platform
MSI API to build this doorbell.

> 

[...]

> > > > +static void epf_ntb_epc_msi_init(struct epf_ntb *ntb)
> > > > +{
> > > > +     struct device *dev = &ntb->epf->dev;
> > > > +     struct irq_domain *domain;
> > > > +     int virq;
> > > > +     int ret;
> > > > +     int i;
> > > > +
> > > > +     domain = dev_get_msi_domain(ntb->epf->epc->dev.parent);
> > > > +     if (!domain)
> > > > +             return;
> > > > +
> > > > +     dev_set_msi_domain(dev, domain);
> > > > +
> > > > +     if (platform_msi_domain_alloc_irqs(&ntb->epf->dev,
> > > > +             ntb->db_count,
> > > > +             epf_ntb_write_msi_msg)) {
> > > > +             dev_info(dev, "Can't allocate MSI, fall back to poll mode\n");
> > > > +             return;
> > > > +     }
> > > > +
> > > > +     dev_info(dev, "vntb use MSI as doorbell\n");
> > > > +
> > >
> > > Why are you using the interrupt controller as the MSI controller? Why not
> > > just
> > > a plain interrupt controller?
> >
> > [Frank Li] what's your means?   I think only MSI controller support write
> memory to trigger irq.
> >
> 
> From EPF driver perspective, only the IRQs need to be requested, right? So
> why
> cannot you expose MU as a generic irqchip driver, instead of a MSI controller?	

[Frank Li] No. The EPF needs two pieces of information:
1. An IRQ number.
2. A physical address, which is mapped into a PCI BAR so the PCI host can
write it and trigger the EP side IRQ.
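
For illustration, a minimal sketch of how those two pieces are used
(condensed from the code in this patch):

        /* 1. IRQ number: request each virq the MSI core allocated. */
        virq = msi_get_virq(dev, i);
        ret = devm_request_irq(dev, virq, epf_ntb_interrupt_handler,
                               0, "ntb", ntb);

        /* 2. Physical address: expose the MSI doorbell page through a
         * BAR, so that a host MMIO write lands on the MSI controller.
         */
        epf_bar->phys_addr = ntb->epf_db_phy;   /* from struct msi_msg */
        epf_bar->barno = barno;
        epf_bar->size = size;
        ret = pci_epc_set_bar(ntb->epf->epc, ntb->epf->func_no,
                              ntb->epf->vfunc_no, epf_bar);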

Frank Li Sept. 2, 2022, 5:03 p.m. UTC | #6
> -----Original Message-----
> From: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
> Sent: Wednesday, August 31, 2022 5:42 AM
> Subject: [EXT] Re: [PATCH v6 4/4] pcie: endpoint: pci-epf-vntb: add endpoint MSI support
> 
> On Thu, Aug 18, 2022 at 10:11:27AM -0500, Frank Li wrote:

[...]

> > @@ -143,6 +144,8 @@ struct epf_ntb {
> >       void __iomem *vpci_mw_addr[MAX_MW];
> >
> >       struct delayed_work cmd_handler;
> > +
> > +     int msi_virqbase;
> 
> db_base?

[Frank Li] It is not the doorbell base address; "db_base" would look like a
doorbell base address.

Actually, it is the base of the virtual IRQ numbers. The MSI core allocates
several IRQs, for example 4 (0 for link info, 1 for queue 0, 2 for queue 1,
3 for queue 2, ...), and hands back 4 virtual IRQ numbers, such as 100, 101,
102, and 103.

100 is the base and is saved to msi_virqbase.

We work out which doorbell fired from (virq - msi_virqbase).

So I think msi_virqbase is the better name.
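
For illustration, a minimal sketch of the lookup (using the example
numbers above; condensed from the handler in this patch):

        /* Suppose virqs 100..103 were allocated and 100 was saved as
         * ntb->msi_virqbase in epf_ntb_epc_msi_init().
         */
        static irqreturn_t epf_ntb_interrupt_handler(int irq, void *data)
        {
                struct epf_ntb *ntb = data;
                int index = irq - ntb->msi_virqbase; /* e.g. 102 - 100 = 2 */

                ntb->db |= 1 << (index - 1);    /* doorbell bit for queue 1 */
                ntb_db_event(&ntb->ntb, index);

                return IRQ_HANDLED;
        }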


[...]

Patch

diff --git a/drivers/pci/endpoint/functions/pci-epf-vntb.c b/drivers/pci/endpoint/functions/pci-epf-vntb.c
index 1466dd1904175..ad4f7ec8a39fc 100644
--- a/drivers/pci/endpoint/functions/pci-epf-vntb.c
+++ b/drivers/pci/endpoint/functions/pci-epf-vntb.c
@@ -44,6 +44,7 @@ 
 #include <linux/pci-epc.h>
 #include <linux/pci-epf.h>
 #include <linux/ntb.h>
+#include <linux/msi.h>
 
 static struct workqueue_struct *kpcintb_workqueue;
 
@@ -143,6 +144,8 @@  struct epf_ntb {
 	void __iomem *vpci_mw_addr[MAX_MW];
 
 	struct delayed_work cmd_handler;
+
+	int msi_virqbase;
 };
 
 #define to_epf_ntb(epf_group) container_of((epf_group), struct epf_ntb, group)
@@ -253,7 +256,7 @@  static void epf_ntb_cmd_handler(struct work_struct *work)
 
 	ntb = container_of(work, struct epf_ntb, cmd_handler.work);
 
-	for (i = 1; i < ntb->db_count; i++) {
+	for (i = 1; i < ntb->db_count && !ntb->epf_db_phy; i++) {
 		if (readl(ntb->epf_db + i * 4)) {
 			if (readl(ntb->epf_db + i * 4))
 				ntb->db |= 1 << (i - 1);
@@ -454,11 +457,9 @@  static int epf_ntb_config_spad_bar_alloc(struct epf_ntb *ntb)
 	ctrl->num_mws = ntb->num_mws;
 	ntb->spad_size = spad_size;
 
-	ctrl->db_entry_size = 4;
-
 	for (i = 0; i < ntb->db_count; i++) {
 		ntb->reg->db_data[i] = 1 + i;
-		ntb->reg->db_offset[i] = 0;
+		ntb->reg->db_offset[i] = 4 * i;
 	}
 
 	return 0;
@@ -509,6 +510,28 @@  static int epf_ntb_configure_interrupt(struct epf_ntb *ntb)
 	return 0;
 }
 
+static int epf_ntb_db_size(struct epf_ntb *ntb)
+{
+	const struct pci_epc_features *epc_features;
+	size_t size = 4 * ntb->db_count;
+	u32 align;
+
+	epc_features = pci_epc_get_features(ntb->epf->epc,
+					    ntb->epf->func_no,
+					    ntb->epf->vfunc_no);
+	align = epc_features->align;
+
+	if (size < 128)
+		size = 128;
+
+	if (align)
+		size = ALIGN(size, align);
+	else
+		size = roundup_pow_of_two(size);
+
+	return size;
+}
+
 /**
  * epf_ntb_db_bar_init() - Configure Doorbell window BARs
  * @ntb: NTB device that facilitates communication between HOST and vHOST
@@ -520,35 +543,33 @@  static int epf_ntb_db_bar_init(struct epf_ntb *ntb)
 	struct device *dev = &ntb->epf->dev;
 	int ret;
 	struct pci_epf_bar *epf_bar;
-	void __iomem *mw_addr;
+	void __iomem *mw_addr = NULL;
 	enum pci_barno barno;
-	size_t size = 4 * ntb->db_count;
+	size_t size;
 
 	epc_features = pci_epc_get_features(ntb->epf->epc,
 					    ntb->epf->func_no,
 					    ntb->epf->vfunc_no);
 	align = epc_features->align;
-
-	if (size < 128)
-		size = 128;
-
-	if (align)
-		size = ALIGN(size, align);
-	else
-		size = roundup_pow_of_two(size);
+	size = epf_ntb_db_size(ntb);
 
 	barno = ntb->epf_ntb_bar[BAR_DB];
+	epf_bar = &ntb->epf->bar[barno];
 
-	mw_addr = pci_epf_alloc_space(ntb->epf, size, barno, align, 0);
-	if (!mw_addr) {
-		dev_err(dev, "Failed to allocate OB address\n");
-		return -ENOMEM;
+	if (!ntb->epf_db_phy) {
+		mw_addr = pci_epf_alloc_space(ntb->epf, size, barno, align, 0);
+		if (!mw_addr) {
+			dev_err(dev, "Failed to allocate OB address\n");
+			return -ENOMEM;
+		}
+	} else {
+		epf_bar->phys_addr = ntb->epf_db_phy;
+		epf_bar->barno = barno;
+		epf_bar->size = size;
 	}
 
 	ntb->epf_db = mw_addr;
 
-	epf_bar = &ntb->epf->bar[barno];
-
 	ret = pci_epc_set_bar(ntb->epf->epc, ntb->epf->func_no, ntb->epf->vfunc_no, epf_bar);
 	if (ret) {
 		dev_err(dev, "Doorbell BAR set failed\n");
@@ -704,6 +725,74 @@  static int epf_ntb_init_epc_bar(struct epf_ntb *ntb)
 	return 0;
 }
 
+static void epf_ntb_write_msi_msg(struct msi_desc *desc, struct msi_msg *msg)
+{
+	struct epf_ntb *ntb = dev_get_drvdata(desc->dev);
+	struct epf_ntb_ctrl *reg = ntb->reg;
+	int size = epf_ntb_db_size(ntb);
+	u64 addr;
+
+	addr = msg->address_hi;
+	addr <<= 32;
+	addr |= msg->address_lo;
+
+	reg->db_data[desc->msi_index] = msg->data;
+
+	if (desc->msi_index == 0)
+		ntb->epf_db_phy = round_down(addr, size);
+
+	reg->db_offset[desc->msi_index] = addr - ntb->epf_db_phy;
+}
+
+static irqreturn_t epf_ntb_interrupt_handler(int irq, void *data)
+{
+	struct epf_ntb *ntb = data;
+	int index;
+
+	index = irq - ntb->msi_virqbase;
+	ntb->db |= 1 << (index - 1);
+	ntb_db_event(&ntb->ntb, index);
+
+	return IRQ_HANDLED;
+}
+
+static void epf_ntb_epc_msi_init(struct epf_ntb *ntb)
+{
+	struct device *dev = &ntb->epf->dev;
+	struct irq_domain *domain;
+	int virq;
+	int ret;
+	int i;
+
+	domain = dev_get_msi_domain(ntb->epf->epc->dev.parent);
+	if (!domain)
+		return;
+
+	dev_set_msi_domain(dev, domain);
+
+	if (platform_msi_domain_alloc_irqs(&ntb->epf->dev,
+		ntb->db_count,
+		epf_ntb_write_msi_msg)) {
+		dev_info(dev, "Can't allocate MSI, fall back to poll mode\n");
+		return;
+	}
+
+	dev_info(dev, "vntb use MSI as doorbell\n");
+
+	for (i = 0; i < ntb->db_count; i++) {
+		virq = msi_get_virq(dev, i);
+		ret = devm_request_irq(dev, virq,
+			       epf_ntb_interrupt_handler, 0,
+			       "ntb", ntb);
+
+		if (ret)
+			dev_err(dev, "devm_request_irq() failure\n");
+
+		if (!i)
+			ntb->msi_virqbase = virq;
+	}
+}
+
 /**
  * epf_ntb_epc_init() - Initialize NTB interface
  * @ntb: NTB device that facilitates communication between HOST and vHOST2
@@ -1299,14 +1388,15 @@  static int epf_ntb_bind(struct pci_epf *epf)
 		goto err_bar_alloc;
 	}
 
+	epf_set_drvdata(epf, ntb);
+	epf_ntb_epc_msi_init(ntb);
+
 	ret = epf_ntb_epc_init(ntb);
 	if (ret) {
 		dev_err(dev, "Failed to initialize EPC\n");
 		goto err_bar_alloc;
 	}
 
-	epf_set_drvdata(epf, ntb);
-
 	pci_space[0] = (ntb->vntb_pid << 16) | ntb->vntb_vid;
 	pci_vntb_table[0].vendor = ntb->vntb_vid;
 	pci_vntb_table[0].device = ntb->vntb_pid;