diff mbox series

[v5,20/20] PCI: dwc: Add Baikal-T1 PCIe controller support

Message ID 20220822184701.25246-21-Sergey.Semin@baikalelectronics.ru (mailing list archive)
State Superseded
Delegated to: Lorenzo Pieralisi
Headers show
Series PCI: dwc: Add generic resources and Baikal-T1 support | expand

Commit Message

Serge Semin Aug. 22, 2022, 6:47 p.m. UTC
Baikal-T1 SoC is equipped with DWC PCIe v4.60a host controller. It can be
trained to work up to Gen.3 speed over up to x4 lanes. The host controller
is attached to the DW PCIe 3.0 PCS via the PIPE-4 interface, which in its
turn is connected to the DWC 10G PHY. The whole system is supposed to be
fed up with four clock sources: DBI peripheral clock, AXI application
clocks and external PHY/core reference clock generating the 100MHz signal.
In addition to that the platform provide a way to reset each part of the
controller: sticky/non-sticky bits, host controller core, PIPE interface,
PCS/PHY and Hot/Power reset signal. The driver also provides a way to
handle the GPIO-based PERST# signal.

Note due to the Baikal-T1 MMIO peculiarity we have to implement the DBI
interface accessors which make sure the IO operations are dword-aligned.

Signed-off-by: Serge Semin <Sergey.Semin@baikalelectronics.ru>

---

Changelog v2:
- Rename 'syscon' property to 'baikal,bt1-syscon'.

Changelog v3:
- Use the clocks/resets handlers defined in the DW PCIe core descriptor.
  (@Rob)
- Redefine PCI host bridge config space accessors with the generic
  pci_generic_config_read32() and pci_generic_config_write32() methods.
  (@Rob)

Changelog v4:
- Drop PCIBIOS_* macros usage. (@Rob)
- Add "static const" to the dw_pcie_ops and dw_pcie_host_ops structure
  instances. (@Bjorn)
- Rename bt1_pcie_dw_ops to bt1_pcie_ops. (@Bjorn)
- Rename bt1_pcie_ops to bt1_pci_ops. (@Bjorn)
- Use start_link/stop_link suffixes in the corresponding callbacks.
  (@Bjorn)
- Change the get_res() method suffix to being get_resources(). (@Bjorn)
- Change *_{add,del}_dw_port() method to *_{add,del}_port(). (@Bjorn)
- Drop dma_coerce_mask_and_coherent() applied to the PCI host bridge
  kernel device instance. (@Bjorn)
- Add the comment above the dma_set_mask_and_coherent() method usage
  regarding the controller eDMA feature. (@Bjorn)
- Fix the comment above the core reset controls assertion. (@Bjorn)
- Replace delays and timeout numeric literals with macros. (@Bjorn)
---
 drivers/pci/controller/dwc/Kconfig    |   9 +
 drivers/pci/controller/dwc/Makefile   |   1 +
 drivers/pci/controller/dwc/pcie-bt1.c | 653 ++++++++++++++++++++++++++
 3 files changed, 663 insertions(+)
 create mode 100644 drivers/pci/controller/dwc/pcie-bt1.c

Comments

Lorenzo Pieralisi Aug. 29, 2022, 3:28 p.m. UTC | #1
[+Robin, Will - please jump to DMA mask set-up]

On Mon, Aug 22, 2022 at 09:47:01PM +0300, Serge Semin wrote:
> Baikal-T1 SoC is equipped with DWC PCIe v4.60a host controller. It can be
> trained to work up to Gen.3 speed over up to x4 lanes. The host controller
> is attached to the DW PCIe 3.0 PCS via the PIPE-4 interface, which in its
> turn is connected to the DWC 10G PHY. The whole system is supposed to be
> fed up with four clock sources: DBI peripheral clock, AXI application
> clocks and external PHY/core reference clock generating the 100MHz signal.
> In addition to that the platform provide a way to reset each part of the
> controller: sticky/non-sticky bits, host controller core, PIPE interface,
> PCS/PHY and Hot/Power reset signal. The driver also provides a way to
> handle the GPIO-based PERST# signal.
> 
> Note due to the Baikal-T1 MMIO peculiarity we have to implement the DBI
> interface accessors which make sure the IO operations are dword-aligned.
> 
> Signed-off-by: Serge Semin <Sergey.Semin@baikalelectronics.ru>
> 
> ---
> 
> Changelog v2:
> - Rename 'syscon' property to 'baikal,bt1-syscon'.
> 
> Changelog v3:
> - Use the clocks/resets handlers defined in the DW PCIe core descriptor.
>   (@Rob)
> - Redefine PCI host bridge config space accessors with the generic
>   pci_generic_config_read32() and pci_generic_config_write32() methods.
>   (@Rob)
> 
> Changelog v4:
> - Drop PCIBIOS_* macros usage. (@Rob)
> - Add "static const" to the dw_pcie_ops and dw_pcie_host_ops structure
>   instances. (@Bjorn)
> - Rename bt1_pcie_dw_ops to bt1_pcie_ops. (@Bjorn)
> - Rename bt1_pcie_ops to bt1_pci_ops. (@Bjorn)
> - Use start_link/stop_link suffixes in the corresponding callbacks.
>   (@Bjorn)
> - Change the get_res() method suffix to being get_resources(). (@Bjorn)
> - Change *_{add,del}_dw_port() method to *_{add,del}_port(). (@Bjorn)
> - Drop dma_coerce_mask_and_coherent() applied to the PCI host bridge
>   kernel device instance. (@Bjorn)
> - Add the comment above the dma_set_mask_and_coherent() method usage
>   regarding the controller eDMA feature. (@Bjorn)
> - Fix the comment above the core reset controls assertion. (@Bjorn)
> - Replace delays and timeout numeric literals with macros. (@Bjorn)
> ---
>  drivers/pci/controller/dwc/Kconfig    |   9 +
>  drivers/pci/controller/dwc/Makefile   |   1 +
>  drivers/pci/controller/dwc/pcie-bt1.c | 653 ++++++++++++++++++++++++++
>  3 files changed, 663 insertions(+)
>  create mode 100644 drivers/pci/controller/dwc/pcie-bt1.c
> 
> diff --git a/drivers/pci/controller/dwc/Kconfig b/drivers/pci/controller/dwc/Kconfig
> index 62ce3abf0f19..771b8b146623 100644
> --- a/drivers/pci/controller/dwc/Kconfig
> +++ b/drivers/pci/controller/dwc/Kconfig
> @@ -222,6 +222,15 @@ config PCIE_ARTPEC6_EP
>  	  Enables support for the PCIe controller in the ARTPEC-6 SoC to work in
>  	  endpoint mode. This uses the DesignWare core.
>  
> +config PCIE_BT1
> +	tristate "Baikal-T1 PCIe controller"
> +	depends on MIPS_BAIKAL_T1 || COMPILE_TEST
> +	depends on PCI_MSI_IRQ_DOMAIN
> +	select PCIE_DW_HOST
> +	help
> +	  Enables support for the PCIe controller in the Baikal-T1 SoC to work
> +	  in host mode. It's based on the Synopsys DWC PCIe v4.60a IP-core.
> +
>  config PCIE_ROCKCHIP_DW_HOST
>  	bool "Rockchip DesignWare PCIe controller"
>  	select PCIE_DW
> diff --git a/drivers/pci/controller/dwc/Makefile b/drivers/pci/controller/dwc/Makefile
> index 8ba7b67f5e50..bf5c311875a1 100644
> --- a/drivers/pci/controller/dwc/Makefile
> +++ b/drivers/pci/controller/dwc/Makefile
> @@ -3,6 +3,7 @@ obj-$(CONFIG_PCIE_DW) += pcie-designware.o
>  obj-$(CONFIG_PCIE_DW_HOST) += pcie-designware-host.o
>  obj-$(CONFIG_PCIE_DW_EP) += pcie-designware-ep.o
>  obj-$(CONFIG_PCIE_DW_PLAT) += pcie-designware-plat.o
> +obj-$(CONFIG_PCIE_BT1) += pcie-bt1.o
>  obj-$(CONFIG_PCI_DRA7XX) += pci-dra7xx.o
>  obj-$(CONFIG_PCI_EXYNOS) += pci-exynos.o
>  obj-$(CONFIG_PCIE_FU740) += pcie-fu740.o
> diff --git a/drivers/pci/controller/dwc/pcie-bt1.c b/drivers/pci/controller/dwc/pcie-bt1.c
> new file mode 100644
> index 000000000000..86b230575ddc
> --- /dev/null
> +++ b/drivers/pci/controller/dwc/pcie-bt1.c
> @@ -0,0 +1,653 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Copyright (C) 2021 BAIKAL ELECTRONICS, JSC
> + *
> + * Authors:
> + *   Vadim Vlasov <Vadim.Vlasov@baikalelectronics.ru>
> + *   Serge Semin <Sergey.Semin@baikalelectronics.ru>
> + *
> + * Baikal-T1 PCIe controller driver
> + */
> +
> +#include <linux/bitfield.h>
> +#include <linux/bits.h>
> +#include <linux/clk.h>
> +#include <linux/delay.h>
> +#include <linux/gpio/consumer.h>
> +#include <linux/kernel.h>
> +#include <linux/mfd/syscon.h>
> +#include <linux/module.h>
> +#include <linux/pci.h>
> +#include <linux/platform_device.h>
> +#include <linux/regmap.h>
> +#include <linux/reset.h>
> +#include <linux/types.h>
> +
> +#include "pcie-designware.h"
> +
> +/* Baikal-T1 System CCU control registers */
> +#define BT1_CCU_PCIE_CLKC			0x140
> +#define BT1_CCU_PCIE_REQ_PCS_CLK		BIT(16)
> +#define BT1_CCU_PCIE_REQ_MAC_CLK		BIT(17)
> +#define BT1_CCU_PCIE_REQ_PIPE_CLK		BIT(18)
> +
> +#define BT1_CCU_PCIE_RSTC			0x144
> +#define BT1_CCU_PCIE_REQ_LINK_RST		BIT(13)
> +#define BT1_CCU_PCIE_REQ_SMLH_RST		BIT(14)
> +#define BT1_CCU_PCIE_REQ_PHY_RST		BIT(16)
> +#define BT1_CCU_PCIE_REQ_CORE_RST		BIT(24)
> +#define BT1_CCU_PCIE_REQ_STICKY_RST		BIT(26)
> +#define BT1_CCU_PCIE_REQ_NSTICKY_RST		BIT(27)
> +
> +#define BT1_CCU_PCIE_PMSC			0x148
> +#define BT1_CCU_PCIE_LTSSM_STATE_MASK		GENMASK(5, 0)
> +#define BT1_CCU_PCIE_LTSSM_DET_QUIET		0x00
> +#define BT1_CCU_PCIE_LTSSM_DET_ACT		0x01
> +#define BT1_CCU_PCIE_LTSSM_POLL_ACT		0x02
> +#define BT1_CCU_PCIE_LTSSM_POLL_COMP		0x03
> +#define BT1_CCU_PCIE_LTSSM_POLL_CONF		0x04
> +#define BT1_CCU_PCIE_LTSSM_PRE_DET_QUIET	0x05
> +#define BT1_CCU_PCIE_LTSSM_DET_WAIT		0x06
> +#define BT1_CCU_PCIE_LTSSM_CFG_LNKWD_START	0x07
> +#define BT1_CCU_PCIE_LTSSM_CFG_LNKWD_ACEPT	0x08
> +#define BT1_CCU_PCIE_LTSSM_CFG_LNNUM_WAIT	0x09
> +#define BT1_CCU_PCIE_LTSSM_CFG_LNNUM_ACEPT	0x0a
> +#define BT1_CCU_PCIE_LTSSM_CFG_COMPLETE		0x0b
> +#define BT1_CCU_PCIE_LTSSM_CFG_IDLE		0x0c
> +#define BT1_CCU_PCIE_LTSSM_RCVR_LOCK		0x0d
> +#define BT1_CCU_PCIE_LTSSM_RCVR_SPEED		0x0e
> +#define BT1_CCU_PCIE_LTSSM_RCVR_RCVRCFG		0x0f
> +#define BT1_CCU_PCIE_LTSSM_RCVR_IDLE		0x10
> +#define BT1_CCU_PCIE_LTSSM_L0			0x11
> +#define BT1_CCU_PCIE_LTSSM_L0S			0x12
> +#define BT1_CCU_PCIE_LTSSM_L123_SEND_IDLE	0x13
> +#define BT1_CCU_PCIE_LTSSM_L1_IDLE		0x14
> +#define BT1_CCU_PCIE_LTSSM_L2_IDLE		0x15
> +#define BT1_CCU_PCIE_LTSSM_L2_WAKE		0x16
> +#define BT1_CCU_PCIE_LTSSM_DIS_ENTRY		0x17
> +#define BT1_CCU_PCIE_LTSSM_DIS_IDLE		0x18
> +#define BT1_CCU_PCIE_LTSSM_DISABLE		0x19
> +#define BT1_CCU_PCIE_LTSSM_LPBK_ENTRY		0x1a
> +#define BT1_CCU_PCIE_LTSSM_LPBK_ACTIVE		0x1b
> +#define BT1_CCU_PCIE_LTSSM_LPBK_EXIT		0x1c
> +#define BT1_CCU_PCIE_LTSSM_LPBK_EXIT_TOUT	0x1d
> +#define BT1_CCU_PCIE_LTSSM_HOT_RST_ENTRY	0x1e
> +#define BT1_CCU_PCIE_LTSSM_HOT_RST		0x1f
> +#define BT1_CCU_PCIE_LTSSM_RCVR_EQ0		0x20
> +#define BT1_CCU_PCIE_LTSSM_RCVR_EQ1		0x21
> +#define BT1_CCU_PCIE_LTSSM_RCVR_EQ2		0x22
> +#define BT1_CCU_PCIE_LTSSM_RCVR_EQ3		0x23

You could make this an enum and define only the states
that are actually used.

[...]

> +
> +/* Generic Baikal-T1 PCIe interface resources */
> +#define BT1_PCIE_NUM_APP_CLKS			ARRAY_SIZE(bt1_pcie_app_clks)
> +#define BT1_PCIE_NUM_CORE_CLKS			ARRAY_SIZE(bt1_pcie_core_clks)
> +#define BT1_PCIE_NUM_APP_RSTS			ARRAY_SIZE(bt1_pcie_app_rsts)
> +#define BT1_PCIE_NUM_CORE_RSTS			ARRAY_SIZE(bt1_pcie_core_rsts)
> +
> +/* PCIe bus setup delays and timeouts */
> +#define BT1_PCIE_RST_DELAY_MS			100
> +#define BT1_PCIE_RUN_DELAY_US			100
> +#define BT1_PCIE_REQ_DELAY_US			1
> +#define BT1_PCIE_REQ_TIMEOUT_US			1000
> +#define BT1_PCIE_LNK_DELAY_US			1000
> +#define BT1_PCIE_LNK_TIMEOUT_US			1000000
> +
> +static const enum dw_pcie_app_clk bt1_pcie_app_clks[] = {
> +	DW_PCIE_DBI_CLK, DW_PCIE_MSTR_CLK, DW_PCIE_SLV_CLK,
> +};
> +
> +static const enum dw_pcie_core_clk bt1_pcie_core_clks[] = {
> +	DW_PCIE_REF_CLK,
> +};
> +
> +static const enum dw_pcie_app_rst bt1_pcie_app_rsts[] = {
> +	DW_PCIE_MSTR_RST, DW_PCIE_SLV_RST,
> +};
> +
> +static const enum dw_pcie_core_rst bt1_pcie_core_rsts[] = {
> +	DW_PCIE_NON_STICKY_RST, DW_PCIE_STICKY_RST, DW_PCIE_CORE_RST,
> +	DW_PCIE_PIPE_RST, DW_PCIE_PHY_RST, DW_PCIE_HOT_RST, DW_PCIE_PWR_RST,
> +};

I wonder whether we could allocate the rst/clks in DWC dynamically,
by using these configuration arrays.

> +
> +struct bt1_pcie {
> +	struct dw_pcie dw;
> +	struct platform_device *pdev;
> +	struct regmap *sys_regs;
> +};
> +#define to_bt1_pcie(_dw) container_of(_dw, struct bt1_pcie, dw)
> +
> +/*
> + * Baikal-T1 MMIO space must be read/written by the dword-aligned
> + * instructions. Note the methods are optimized to have the dword operations
> + * performed with minimum overhead as the most frequently used ones.
> + */
> +static int bt1_pcie_read_mmio(void __iomem *addr, int size, u32 *val)
> +{
> +	unsigned int ofs = (uintptr_t)addr & 0x3;
> +
> +	if (!IS_ALIGNED((uintptr_t)addr, size))
> +		return -EINVAL;
> +
> +	*val = readl(addr - ofs) >> ofs * BITS_PER_BYTE;

Is it always safe to read more than requested ?

> +	if (size == 4) {
> +		return 0;
> +	} else if (size == 2) {
> +		*val &= 0xffff;
> +		return 0;
> +	} else if (size == 1) {
> +		*val &= 0xff;
> +		return 0;
> +	}
> +
> +	return -EINVAL;
> +}
> +
> +static int bt1_pcie_write_mmio(void __iomem *addr, int size, u32 val)
> +{
> +	unsigned int ofs = (uintptr_t)addr & 0x3;
> +	u32 tmp, mask;
> +
> +	if (!IS_ALIGNED((uintptr_t)addr, size))
> +		return -EINVAL;
> +
> +	if (size == 4) {
> +		writel(val, addr);
> +		return 0;
> +	} else if (size == 2 || size == 1) {
> +		mask = GENMASK(size * BITS_PER_BYTE - 1, 0);
> +		tmp = readl(addr - ofs) & ~(mask << ofs * BITS_PER_BYTE);
> +		tmp |= (val & mask) << ofs * BITS_PER_BYTE;
> +		writel(tmp, addr - ofs);
> +		return 0;
> +	}

Same question read/modify/write, is it always safe to do it
regardless of size ?

> +
> +	return -EINVAL;
> +}
> +
> +static u32 bt1_pcie_read_dbi(struct dw_pcie *pci, void __iomem *base, u32 reg,
> +			     size_t size)
> +{
> +	int ret;
> +	u32 val;
> +
> +	ret = bt1_pcie_read_mmio(base + reg, size, &val);
> +	if (ret) {
> +		dev_err(pci->dev, "Read DBI address failed\n");
> +		return ~0U;

Is this a special magic value the DWC core is expecting ?

Does it clash with a _valid_ return value ?

> +	}
> +
> +	return val;
> +}
> +
> +static void bt1_pcie_write_dbi(struct dw_pcie *pci, void __iomem *base, u32 reg,
> +			       size_t size, u32 val)
> +{
> +	int ret;
> +
> +	ret = bt1_pcie_write_mmio(base + reg, size, val);
> +	if (ret)
> +		dev_err(pci->dev, "Write DBI address failed\n");
> +}
> +
> +static void bt1_pcie_write_dbi2(struct dw_pcie *pci, void __iomem *base, u32 reg,
> +				size_t size, u32 val)
> +{
> +	struct bt1_pcie *btpci = to_bt1_pcie(pci);
> +	int ret;
> +
> +	regmap_update_bits(btpci->sys_regs, BT1_CCU_PCIE_GENC,
> +			   BT1_CCU_PCIE_DBI2_MODE, BT1_CCU_PCIE_DBI2_MODE);
> +
> +	ret = bt1_pcie_write_mmio(base + reg, size, val);
> +	if (ret)
> +		dev_err(pci->dev, "Write DBI2 address failed\n");
> +
> +	regmap_update_bits(btpci->sys_regs, BT1_CCU_PCIE_GENC,
> +			   BT1_CCU_PCIE_DBI2_MODE, 0);

IIUC the regmap_update_bits() set up decoding for DBI2. Hopefully the
DBI/DBI2 writes are sequentialized, this is a question valid also
for other DWC controllers.

What I want to say is, the regmap update in this function sets the
DWC HW in a way that can decode DBI2 (please correct me if I am wrong),
between the two _update_bits() no DBI access should happen because
it just would not work.

It is a question.

> +static int bt1_pcie_host_init(struct dw_pcie_rp *pp)
> +{
> +	struct dw_pcie *pci = to_dw_pcie_from_pp(pp);
> +	struct bt1_pcie *btpci = to_bt1_pcie(pci);
> +	int ret;
> +
> +	ret = bt1_pcie_get_resources(btpci);
> +	if (ret)
> +		return ret;
> +
> +	bt1_pcie_full_stop_bus(btpci, true);
> +
> +	return bt1_pcie_cold_start_bus(btpci);
> +}
> +
> +static void bt1_pcie_host_deinit(struct dw_pcie_rp *pp)
> +{
> +	struct dw_pcie *pci = to_dw_pcie_from_pp(pp);
> +	struct bt1_pcie *btpci = to_bt1_pcie(pci);
> +
> +	bt1_pcie_full_stop_bus(btpci, false);
> +}
> +
> +static const struct dw_pcie_host_ops bt1_pcie_host_ops = {
> +	.host_init = bt1_pcie_host_init,
> +	.host_deinit = bt1_pcie_host_deinit,
> +};
> +
> +static struct bt1_pcie *bt1_pcie_create_data(struct platform_device *pdev)
> +{
> +	struct bt1_pcie *btpci;
> +
> +	btpci = devm_kzalloc(&pdev->dev, sizeof(*btpci), GFP_KERNEL);
> +	if (!btpci)
> +		return ERR_PTR(-ENOMEM);
> +
> +	btpci->pdev = pdev;
> +
> +	platform_set_drvdata(pdev, btpci);
> +
> +	return btpci;
> +}
> +
> +static int bt1_pcie_add_port(struct bt1_pcie *btpci)
> +{
> +	struct device *dev = &btpci->pdev->dev;
> +	int ret;
> +
> +	/*
> +	 * DW PCIe Root Port controller is equipped with eDMA capable of
> +	 * working with the 64-bit memory addresses.
> +	 */
> +	ret = dma_set_mask_and_coherent(dev, DMA_BIT_MASK(64));
> +	if (ret)
> +		return ret;

Is this the right place to set the DMA mask for the host controller
embedded DMA controller (actually, the dev pointer is the _host_
controller device) ?

How this is going to play when combined with:

https://lore.kernel.org/linux-pci/1e63a581-14ae-b4b5-a5bf-ca8f09c33af6@arm.com

It is getting a bit confusing. I believe the code in the link
above sets the mask so that through the DMA API we are capable
of getting an MSI doorbell virtual address whose physical address
can be addressed by the endpoint; this through the DMA API.

This patch is setting the DMA mask for a different reason, namely
setting the host controller embedded DMA controller addressing
capabilities.

AFAICS - both approaches set the mask for the same device - now
the question is about which one is legitimate and how to handle
the other.

> +
> +	btpci->dw.version = DW_PCIE_VER_460A;
> +	btpci->dw.dev = dev;
> +	btpci->dw.ops = &bt1_pcie_ops;
> +
> +	btpci->dw.pp.num_vectors = MAX_MSI_IRQS;
> +	btpci->dw.pp.ops = &bt1_pcie_host_ops;
> +
> +	dw_pcie_cap_set(&btpci->dw, REQ_RES);
> +
> +	ret = dw_pcie_host_init(&btpci->dw.pp);
> +	if (ret)
> +		dev_err_probe(dev, ret, "Failed to initialize DWC PCIe host\n");
> +
> +	return ret;
> +}
> +
> +static void bt1_pcie_del_port(struct bt1_pcie *btpci)
> +{
> +	dw_pcie_host_deinit(&btpci->dw.pp);
> +}
> +
> +static int bt1_pcie_probe(struct platform_device *pdev)
> +{
> +	struct bt1_pcie *btpci;
> +
> +	btpci = bt1_pcie_create_data(pdev);

Do we really need a function for that ? I am not too
bothered but I think it is overkill.

Thanks,
Lorenzo

> +	if (IS_ERR(btpci))
> +		return PTR_ERR(btpci);
> +
> +	return bt1_pcie_add_port(btpci);
> +}
> +
> +static int bt1_pcie_remove(struct platform_device *pdev)
> +{
> +	struct bt1_pcie *btpci = platform_get_drvdata(pdev);
> +
> +	bt1_pcie_del_port(btpci);
> +
> +	return 0;
> +}
> +
> +static const struct of_device_id bt1_pcie_of_match[] = {
> +	{ .compatible = "baikal,bt1-pcie" },
> +	{},
> +};
> +MODULE_DEVICE_TABLE(of, bt1_pcie_of_match);
> +
> +static struct platform_driver bt1_pcie_driver = {
> +	.probe = bt1_pcie_probe,
> +	.remove = bt1_pcie_remove,
> +	.driver = {
> +		.name	= "bt1-pcie",
> +		.of_match_table = bt1_pcie_of_match,
> +	},
> +};
> +module_platform_driver(bt1_pcie_driver);
> +
> +MODULE_AUTHOR("Serge Semin <Sergey.Semin@baikalelectronics.ru>");
> +MODULE_DESCRIPTION("Baikal-T1 PCIe driver");
> +MODULE_LICENSE("GPL");
> -- 
> 2.35.1
>
William McVicker Aug. 29, 2022, 5:32 p.m. UTC | #2
On 08/29/2022, Lorenzo Pieralisi wrote:
> [+Robin, Will - please jump to DMA mask set-up]
> 
> On Mon, Aug 22, 2022 at 09:47:01PM +0300, Serge Semin wrote:
> > Baikal-T1 SoC is equipped with DWC PCIe v4.60a host controller. It can be
> > trained to work up to Gen.3 speed over up to x4 lanes. The host controller
> > is attached to the DW PCIe 3.0 PCS via the PIPE-4 interface, which in its
> > turn is connected to the DWC 10G PHY. The whole system is supposed to be
> > fed up with four clock sources: DBI peripheral clock, AXI application
> > clocks and external PHY/core reference clock generating the 100MHz signal.
> > In addition to that the platform provide a way to reset each part of the
> > controller: sticky/non-sticky bits, host controller core, PIPE interface,
> > PCS/PHY and Hot/Power reset signal. The driver also provides a way to
> > handle the GPIO-based PERST# signal.
> > 
> > Note due to the Baikal-T1 MMIO peculiarity we have to implement the DBI
> > interface accessors which make sure the IO operations are dword-aligned.
> > 
> > Signed-off-by: Serge Semin <Sergey.Semin@baikalelectronics.ru>
> > 
> > ---
> > 
> > Changelog v2:
> > - Rename 'syscon' property to 'baikal,bt1-syscon'.
> > 
> > Changelog v3:
> > - Use the clocks/resets handlers defined in the DW PCIe core descriptor.
> >   (@Rob)
> > - Redefine PCI host bridge config space accessors with the generic
> >   pci_generic_config_read32() and pci_generic_config_write32() methods.
> >   (@Rob)
> > 
> > Changelog v4:
> > - Drop PCIBIOS_* macros usage. (@Rob)
> > - Add "static const" to the dw_pcie_ops and dw_pcie_host_ops structure
> >   instances. (@Bjorn)
> > - Rename bt1_pcie_dw_ops to bt1_pcie_ops. (@Bjorn)
> > - Rename bt1_pcie_ops to bt1_pci_ops. (@Bjorn)
> > - Use start_link/stop_link suffixes in the corresponding callbacks.
> >   (@Bjorn)
> > - Change the get_res() method suffix to being get_resources(). (@Bjorn)
> > - Change *_{add,del}_dw_port() method to *_{add,del}_port(). (@Bjorn)
> > - Drop dma_coerce_mask_and_coherent() applied to the PCI host bridge
> >   kernel device instance. (@Bjorn)
> > - Add the comment above the dma_set_mask_and_coherent() method usage
> >   regarding the controller eDMA feature. (@Bjorn)
> > - Fix the comment above the core reset controls assertion. (@Bjorn)
> > - Replace delays and timeout numeric literals with macros. (@Bjorn)
> > ---
> >  drivers/pci/controller/dwc/Kconfig    |   9 +
> >  drivers/pci/controller/dwc/Makefile   |   1 +
> >  drivers/pci/controller/dwc/pcie-bt1.c | 653 ++++++++++++++++++++++++++
> >  3 files changed, 663 insertions(+)
> >  create mode 100644 drivers/pci/controller/dwc/pcie-bt1.c
> > 
> > diff --git a/drivers/pci/controller/dwc/Kconfig b/drivers/pci/controller/dwc/Kconfig
> > index 62ce3abf0f19..771b8b146623 100644
> > --- a/drivers/pci/controller/dwc/Kconfig
> > +++ b/drivers/pci/controller/dwc/Kconfig
> > @@ -222,6 +222,15 @@ config PCIE_ARTPEC6_EP
> >  	  Enables support for the PCIe controller in the ARTPEC-6 SoC to work in
> >  	  endpoint mode. This uses the DesignWare core.
> >  
> > +config PCIE_BT1
> > +	tristate "Baikal-T1 PCIe controller"
> > +	depends on MIPS_BAIKAL_T1 || COMPILE_TEST
> > +	depends on PCI_MSI_IRQ_DOMAIN
> > +	select PCIE_DW_HOST
> > +	help
> > +	  Enables support for the PCIe controller in the Baikal-T1 SoC to work
> > +	  in host mode. It's based on the Synopsys DWC PCIe v4.60a IP-core.
> > +
> >  config PCIE_ROCKCHIP_DW_HOST
> >  	bool "Rockchip DesignWare PCIe controller"
> >  	select PCIE_DW
> > diff --git a/drivers/pci/controller/dwc/Makefile b/drivers/pci/controller/dwc/Makefile
> > index 8ba7b67f5e50..bf5c311875a1 100644
> > --- a/drivers/pci/controller/dwc/Makefile
> > +++ b/drivers/pci/controller/dwc/Makefile
> > @@ -3,6 +3,7 @@ obj-$(CONFIG_PCIE_DW) += pcie-designware.o
> >  obj-$(CONFIG_PCIE_DW_HOST) += pcie-designware-host.o
> >  obj-$(CONFIG_PCIE_DW_EP) += pcie-designware-ep.o
> >  obj-$(CONFIG_PCIE_DW_PLAT) += pcie-designware-plat.o
> > +obj-$(CONFIG_PCIE_BT1) += pcie-bt1.o
> >  obj-$(CONFIG_PCI_DRA7XX) += pci-dra7xx.o
> >  obj-$(CONFIG_PCI_EXYNOS) += pci-exynos.o
> >  obj-$(CONFIG_PCIE_FU740) += pcie-fu740.o
> > diff --git a/drivers/pci/controller/dwc/pcie-bt1.c b/drivers/pci/controller/dwc/pcie-bt1.c
> > new file mode 100644
> > index 000000000000..86b230575ddc
> > --- /dev/null
> > +++ b/drivers/pci/controller/dwc/pcie-bt1.c
> > @@ -0,0 +1,653 @@
> > +// SPDX-License-Identifier: GPL-2.0-only
> > +/*
> > + * Copyright (C) 2021 BAIKAL ELECTRONICS, JSC
> > + *
> > + * Authors:
> > + *   Vadim Vlasov <Vadim.Vlasov@baikalelectronics.ru>
> > + *   Serge Semin <Sergey.Semin@baikalelectronics.ru>
> > + *
> > + * Baikal-T1 PCIe controller driver
> > + */
> > +
> > +#include <linux/bitfield.h>
> > +#include <linux/bits.h>
> > +#include <linux/clk.h>
> > +#include <linux/delay.h>
> > +#include <linux/gpio/consumer.h>
> > +#include <linux/kernel.h>
> > +#include <linux/mfd/syscon.h>
> > +#include <linux/module.h>
> > +#include <linux/pci.h>
> > +#include <linux/platform_device.h>
> > +#include <linux/regmap.h>
> > +#include <linux/reset.h>
> > +#include <linux/types.h>
> > +
> > +#include "pcie-designware.h"
> > +
> > +/* Baikal-T1 System CCU control registers */
> > +#define BT1_CCU_PCIE_CLKC			0x140
> > +#define BT1_CCU_PCIE_REQ_PCS_CLK		BIT(16)
> > +#define BT1_CCU_PCIE_REQ_MAC_CLK		BIT(17)
> > +#define BT1_CCU_PCIE_REQ_PIPE_CLK		BIT(18)
> > +
> > +#define BT1_CCU_PCIE_RSTC			0x144
> > +#define BT1_CCU_PCIE_REQ_LINK_RST		BIT(13)
> > +#define BT1_CCU_PCIE_REQ_SMLH_RST		BIT(14)
> > +#define BT1_CCU_PCIE_REQ_PHY_RST		BIT(16)
> > +#define BT1_CCU_PCIE_REQ_CORE_RST		BIT(24)
> > +#define BT1_CCU_PCIE_REQ_STICKY_RST		BIT(26)
> > +#define BT1_CCU_PCIE_REQ_NSTICKY_RST		BIT(27)
> > +
> > +#define BT1_CCU_PCIE_PMSC			0x148
> > +#define BT1_CCU_PCIE_LTSSM_STATE_MASK		GENMASK(5, 0)
> > +#define BT1_CCU_PCIE_LTSSM_DET_QUIET		0x00
> > +#define BT1_CCU_PCIE_LTSSM_DET_ACT		0x01
> > +#define BT1_CCU_PCIE_LTSSM_POLL_ACT		0x02
> > +#define BT1_CCU_PCIE_LTSSM_POLL_COMP		0x03
> > +#define BT1_CCU_PCIE_LTSSM_POLL_CONF		0x04
> > +#define BT1_CCU_PCIE_LTSSM_PRE_DET_QUIET	0x05
> > +#define BT1_CCU_PCIE_LTSSM_DET_WAIT		0x06
> > +#define BT1_CCU_PCIE_LTSSM_CFG_LNKWD_START	0x07
> > +#define BT1_CCU_PCIE_LTSSM_CFG_LNKWD_ACEPT	0x08
> > +#define BT1_CCU_PCIE_LTSSM_CFG_LNNUM_WAIT	0x09
> > +#define BT1_CCU_PCIE_LTSSM_CFG_LNNUM_ACEPT	0x0a
> > +#define BT1_CCU_PCIE_LTSSM_CFG_COMPLETE		0x0b
> > +#define BT1_CCU_PCIE_LTSSM_CFG_IDLE		0x0c
> > +#define BT1_CCU_PCIE_LTSSM_RCVR_LOCK		0x0d
> > +#define BT1_CCU_PCIE_LTSSM_RCVR_SPEED		0x0e
> > +#define BT1_CCU_PCIE_LTSSM_RCVR_RCVRCFG		0x0f
> > +#define BT1_CCU_PCIE_LTSSM_RCVR_IDLE		0x10
> > +#define BT1_CCU_PCIE_LTSSM_L0			0x11
> > +#define BT1_CCU_PCIE_LTSSM_L0S			0x12
> > +#define BT1_CCU_PCIE_LTSSM_L123_SEND_IDLE	0x13
> > +#define BT1_CCU_PCIE_LTSSM_L1_IDLE		0x14
> > +#define BT1_CCU_PCIE_LTSSM_L2_IDLE		0x15
> > +#define BT1_CCU_PCIE_LTSSM_L2_WAKE		0x16
> > +#define BT1_CCU_PCIE_LTSSM_DIS_ENTRY		0x17
> > +#define BT1_CCU_PCIE_LTSSM_DIS_IDLE		0x18
> > +#define BT1_CCU_PCIE_LTSSM_DISABLE		0x19
> > +#define BT1_CCU_PCIE_LTSSM_LPBK_ENTRY		0x1a
> > +#define BT1_CCU_PCIE_LTSSM_LPBK_ACTIVE		0x1b
> > +#define BT1_CCU_PCIE_LTSSM_LPBK_EXIT		0x1c
> > +#define BT1_CCU_PCIE_LTSSM_LPBK_EXIT_TOUT	0x1d
> > +#define BT1_CCU_PCIE_LTSSM_HOT_RST_ENTRY	0x1e
> > +#define BT1_CCU_PCIE_LTSSM_HOT_RST		0x1f
> > +#define BT1_CCU_PCIE_LTSSM_RCVR_EQ0		0x20
> > +#define BT1_CCU_PCIE_LTSSM_RCVR_EQ1		0x21
> > +#define BT1_CCU_PCIE_LTSSM_RCVR_EQ2		0x22
> > +#define BT1_CCU_PCIE_LTSSM_RCVR_EQ3		0x23
> 
> You could make this an enum and define only the states
> that are actually used.
> 
> [...]
> 
> > +
> > +/* Generic Baikal-T1 PCIe interface resources */
> > +#define BT1_PCIE_NUM_APP_CLKS			ARRAY_SIZE(bt1_pcie_app_clks)
> > +#define BT1_PCIE_NUM_CORE_CLKS			ARRAY_SIZE(bt1_pcie_core_clks)
> > +#define BT1_PCIE_NUM_APP_RSTS			ARRAY_SIZE(bt1_pcie_app_rsts)
> > +#define BT1_PCIE_NUM_CORE_RSTS			ARRAY_SIZE(bt1_pcie_core_rsts)
> > +
> > +/* PCIe bus setup delays and timeouts */
> > +#define BT1_PCIE_RST_DELAY_MS			100
> > +#define BT1_PCIE_RUN_DELAY_US			100
> > +#define BT1_PCIE_REQ_DELAY_US			1
> > +#define BT1_PCIE_REQ_TIMEOUT_US			1000
> > +#define BT1_PCIE_LNK_DELAY_US			1000
> > +#define BT1_PCIE_LNK_TIMEOUT_US			1000000
> > +
> > +static const enum dw_pcie_app_clk bt1_pcie_app_clks[] = {
> > +	DW_PCIE_DBI_CLK, DW_PCIE_MSTR_CLK, DW_PCIE_SLV_CLK,
> > +};
> > +
> > +static const enum dw_pcie_core_clk bt1_pcie_core_clks[] = {
> > +	DW_PCIE_REF_CLK,
> > +};
> > +
> > +static const enum dw_pcie_app_rst bt1_pcie_app_rsts[] = {
> > +	DW_PCIE_MSTR_RST, DW_PCIE_SLV_RST,
> > +};
> > +
> > +static const enum dw_pcie_core_rst bt1_pcie_core_rsts[] = {
> > +	DW_PCIE_NON_STICKY_RST, DW_PCIE_STICKY_RST, DW_PCIE_CORE_RST,
> > +	DW_PCIE_PIPE_RST, DW_PCIE_PHY_RST, DW_PCIE_HOT_RST, DW_PCIE_PWR_RST,
> > +};
> 
> I wonder whether we could allocate the rst/clks in DWC dynamically,
> by using these configuration arrays.
> 
> > +
> > +struct bt1_pcie {
> > +	struct dw_pcie dw;
> > +	struct platform_device *pdev;
> > +	struct regmap *sys_regs;
> > +};
> > +#define to_bt1_pcie(_dw) container_of(_dw, struct bt1_pcie, dw)
> > +
> > +/*
> > + * Baikal-T1 MMIO space must be read/written by the dword-aligned
> > + * instructions. Note the methods are optimized to have the dword operations
> > + * performed with minimum overhead as the most frequently used ones.
> > + */
> > +static int bt1_pcie_read_mmio(void __iomem *addr, int size, u32 *val)
> > +{
> > +	unsigned int ofs = (uintptr_t)addr & 0x3;
> > +
> > +	if (!IS_ALIGNED((uintptr_t)addr, size))
> > +		return -EINVAL;
> > +
> > +	*val = readl(addr - ofs) >> ofs * BITS_PER_BYTE;
> 
> Is it always safe to read more than requested ?
> 
> > +	if (size == 4) {
> > +		return 0;
> > +	} else if (size == 2) {
> > +		*val &= 0xffff;
> > +		return 0;
> > +	} else if (size == 1) {
> > +		*val &= 0xff;
> > +		return 0;
> > +	}
> > +
> > +	return -EINVAL;
> > +}
> > +
> > +static int bt1_pcie_write_mmio(void __iomem *addr, int size, u32 val)
> > +{
> > +	unsigned int ofs = (uintptr_t)addr & 0x3;
> > +	u32 tmp, mask;
> > +
> > +	if (!IS_ALIGNED((uintptr_t)addr, size))
> > +		return -EINVAL;
> > +
> > +	if (size == 4) {
> > +		writel(val, addr);
> > +		return 0;
> > +	} else if (size == 2 || size == 1) {
> > +		mask = GENMASK(size * BITS_PER_BYTE - 1, 0);
> > +		tmp = readl(addr - ofs) & ~(mask << ofs * BITS_PER_BYTE);
> > +		tmp |= (val & mask) << ofs * BITS_PER_BYTE;
> > +		writel(tmp, addr - ofs);
> > +		return 0;
> > +	}
> 
> Same question read/modify/write, is it always safe to do it
> regardless of size ?
> 
> > +
> > +	return -EINVAL;
> > +}
> > +
> > +static u32 bt1_pcie_read_dbi(struct dw_pcie *pci, void __iomem *base, u32 reg,
> > +			     size_t size)
> > +{
> > +	int ret;
> > +	u32 val;
> > +
> > +	ret = bt1_pcie_read_mmio(base + reg, size, &val);
> > +	if (ret) {
> > +		dev_err(pci->dev, "Read DBI address failed\n");
> > +		return ~0U;
> 
> Is this a special magic value the DWC core is expecting ?
> 
> Does it clash with a _valid_ return value ?
> 
> > +	}
> > +
> > +	return val;
> > +}
> > +
> > +static void bt1_pcie_write_dbi(struct dw_pcie *pci, void __iomem *base, u32 reg,
> > +			       size_t size, u32 val)
> > +{
> > +	int ret;
> > +
> > +	ret = bt1_pcie_write_mmio(base + reg, size, val);
> > +	if (ret)
> > +		dev_err(pci->dev, "Write DBI address failed\n");
> > +}
> > +
> > +static void bt1_pcie_write_dbi2(struct dw_pcie *pci, void __iomem *base, u32 reg,
> > +				size_t size, u32 val)
> > +{
> > +	struct bt1_pcie *btpci = to_bt1_pcie(pci);
> > +	int ret;
> > +
> > +	regmap_update_bits(btpci->sys_regs, BT1_CCU_PCIE_GENC,
> > +			   BT1_CCU_PCIE_DBI2_MODE, BT1_CCU_PCIE_DBI2_MODE);
> > +
> > +	ret = bt1_pcie_write_mmio(base + reg, size, val);
> > +	if (ret)
> > +		dev_err(pci->dev, "Write DBI2 address failed\n");
> > +
> > +	regmap_update_bits(btpci->sys_regs, BT1_CCU_PCIE_GENC,
> > +			   BT1_CCU_PCIE_DBI2_MODE, 0);
> 
> IIUC the regmap_update_bits() set up decoding for DBI2. Hopefully the
> DBI/DBI2 writes are sequentialized, this is a question valid also
> for other DWC controllers.
> 
> What I want to say is, the regmap update in this function sets the
> DWC HW in a way that can decode DBI2 (please correct me if I am wrong),
> between the two _update_bits() no DBI access should happen because
> it just would not work.
> 
> It is a question.
> 
> > +static int bt1_pcie_host_init(struct dw_pcie_rp *pp)
> > +{
> > +	struct dw_pcie *pci = to_dw_pcie_from_pp(pp);
> > +	struct bt1_pcie *btpci = to_bt1_pcie(pci);
> > +	int ret;
> > +
> > +	ret = bt1_pcie_get_resources(btpci);
> > +	if (ret)
> > +		return ret;
> > +
> > +	bt1_pcie_full_stop_bus(btpci, true);
> > +
> > +	return bt1_pcie_cold_start_bus(btpci);
> > +}
> > +
> > +static void bt1_pcie_host_deinit(struct dw_pcie_rp *pp)
> > +{
> > +	struct dw_pcie *pci = to_dw_pcie_from_pp(pp);
> > +	struct bt1_pcie *btpci = to_bt1_pcie(pci);
> > +
> > +	bt1_pcie_full_stop_bus(btpci, false);
> > +}
> > +
> > +static const struct dw_pcie_host_ops bt1_pcie_host_ops = {
> > +	.host_init = bt1_pcie_host_init,
> > +	.host_deinit = bt1_pcie_host_deinit,
> > +};
> > +
> > +static struct bt1_pcie *bt1_pcie_create_data(struct platform_device *pdev)
> > +{
> > +	struct bt1_pcie *btpci;
> > +
> > +	btpci = devm_kzalloc(&pdev->dev, sizeof(*btpci), GFP_KERNEL);
> > +	if (!btpci)
> > +		return ERR_PTR(-ENOMEM);
> > +
> > +	btpci->pdev = pdev;
> > +
> > +	platform_set_drvdata(pdev, btpci);
> > +
> > +	return btpci;
> > +}
> > +
> > +static int bt1_pcie_add_port(struct bt1_pcie *btpci)
> > +{
> > +	struct device *dev = &btpci->pdev->dev;
> > +	int ret;
> > +
> > +	/*
> > +	 * DW PCIe Root Port controller is equipped with eDMA capable of
> > +	 * working with the 64-bit memory addresses.
> > +	 */
> > +	ret = dma_set_mask_and_coherent(dev, DMA_BIT_MASK(64));
> > +	if (ret)
> > +		return ret;
> 
> Is this the right place to set the DMA mask for the host controller
> embedded DMA controller (actually, the dev pointer is the _host_
> controller device) ?
> 
> How this is going to play when combined with:
> 
> https://lore.kernel.org/linux-pci/1e63a581-14ae-b4b5-a5bf-ca8f09c33af6@arm.com
> 
> It is getting a bit confusing. I believe the code in the link
> above sets the mask so that through the DMA API we are capable
> of getting an MSI doorbell virtual address whose physical address
> can be addressed by the endpoint; this through the DMA API.
> 
> This patch is setting the DMA mask for a different reason, namely
> setting the host controller embedded DMA controller addressing
> capabilities.
> 
> AFAICS - both approaches set the mask for the same device - now
> the question is about which one is legitimate and how to handle
> the other.

Since the mask is being set before calling dw_pcie_host_init() and the
msi_host_init op is not set, setting the mask here won't have any affect due to
the fact that dw_pcie_msi_host_init() unconditionally sets the DMA mask. AFAIK,
you don't technically need a 64-bit address, right? If your platform disables
32-bit allocations, then you'll need to set the MSI capabilities flag to
support a 64-bit allocation for when the 32-bit allocation fails in the host
controller driver.

Here is how I'm doing that with the Exynos PCIe driver during probe before
calling dw_pcie_host_init():

	dw_pcie_dbi_ro_wr_en(exynos_pcie->pci);
	offset = dw_pcie_find_capability(exynos_pcie->pci, PCI_CAP_ID_MSI);
	cap = dw_pcie_readw_dbi(exynos_pcie->pci, offset + PCI_MSI_FLAGS);

	/* Enable MSI with 64-bit support */
	cap |= PCI_MSI_FLAGS_ENABLE | PCI_MSI_FLAGS_64BIT;
	dw_pcie_writew_dbi(exynos_pcie->pci, offset + PCI_MSI_FLAGS, cap);
	dw_pcie_dbi_ro_wr_dis(exynos_pcie->pci);

I hope that helps!

Regards,
Will

> 
> > +
> > +	btpci->dw.version = DW_PCIE_VER_460A;
> > +	btpci->dw.dev = dev;
> > +	btpci->dw.ops = &bt1_pcie_ops;
> > +
> > +	btpci->dw.pp.num_vectors = MAX_MSI_IRQS;
> > +	btpci->dw.pp.ops = &bt1_pcie_host_ops;
> > +
> > +	dw_pcie_cap_set(&btpci->dw, REQ_RES);
> > +
> > +	ret = dw_pcie_host_init(&btpci->dw.pp);
> > +	if (ret)
> > +		dev_err_probe(dev, ret, "Failed to initialize DWC PCIe host\n");
> > +
> > +	return ret;
> > +}
> > +
> > +static void bt1_pcie_del_port(struct bt1_pcie *btpci)
> > +{
> > +	dw_pcie_host_deinit(&btpci->dw.pp);
> > +}
> > +
> > +static int bt1_pcie_probe(struct platform_device *pdev)
> > +{
> > +	struct bt1_pcie *btpci;
> > +
> > +	btpci = bt1_pcie_create_data(pdev);
> 
> Do we really need a function for that ? I am not too
> bothered but I think it is overkill.
> 
> Thanks,
> Lorenzo
> 
> > +	if (IS_ERR(btpci))
> > +		return PTR_ERR(btpci);
> > +
> > +	return bt1_pcie_add_port(btpci);
> > +}
> > +
> > +static int bt1_pcie_remove(struct platform_device *pdev)
> > +{
> > +	struct bt1_pcie *btpci = platform_get_drvdata(pdev);
> > +
> > +	bt1_pcie_del_port(btpci);
> > +
> > +	return 0;
> > +}
> > +
> > +static const struct of_device_id bt1_pcie_of_match[] = {
> > +	{ .compatible = "baikal,bt1-pcie" },
> > +	{},
> > +};
> > +MODULE_DEVICE_TABLE(of, bt1_pcie_of_match);
> > +
> > +static struct platform_driver bt1_pcie_driver = {
> > +	.probe = bt1_pcie_probe,
> > +	.remove = bt1_pcie_remove,
> > +	.driver = {
> > +		.name	= "bt1-pcie",
> > +		.of_match_table = bt1_pcie_of_match,
> > +	},
> > +};
> > +module_platform_driver(bt1_pcie_driver);
> > +
> > +MODULE_AUTHOR("Serge Semin <Sergey.Semin@baikalelectronics.ru>");
> > +MODULE_DESCRIPTION("Baikal-T1 PCIe driver");
> > +MODULE_LICENSE("GPL");
> > -- 
> > 2.35.1
> >
Robin Murphy Aug. 31, 2022, 8:36 a.m. UTC | #3
On 2022-08-29 16:28, Lorenzo Pieralisi wrote:
[...]
>> +static int bt1_pcie_add_port(struct bt1_pcie *btpci)
>> +{
>> +	struct device *dev = &btpci->pdev->dev;
>> +	int ret;
>> +
>> +	/*
>> +	 * DW PCIe Root Port controller is equipped with eDMA capable of
>> +	 * working with the 64-bit memory addresses.
>> +	 */
>> +	ret = dma_set_mask_and_coherent(dev, DMA_BIT_MASK(64));
>> +	if (ret)
>> +		return ret;
> 
> Is this the right place to set the DMA mask for the host controller
> embedded DMA controller (actually, the dev pointer is the _host_
> controller device) ?
> 
> How this is going to play when combined with:
> 
> https://lore.kernel.org/linux-pci/1e63a581-14ae-b4b5-a5bf-ca8f09c33af6@arm.com
> 
> It is getting a bit confusing. I believe the code in the link
> above sets the mask so that through the DMA API we are capable
> of getting an MSI doorbell virtual address whose physical address
> can be addressed by the endpoint; this through the DMA API.
> 
> This patch is setting the DMA mask for a different reason, namely
> setting the host controller embedded DMA controller addressing
> capabilities.
> 
> AFAICS - both approaches set the mask for the same device - now
> the question is about which one is legitimate and how to handle
> the other.

Assuming the dw-edma-pcie driver is the relevant one, that already sets 
its own masks on its own device, so I also don't see why this is here.

Thanks,
Robin.
Robin Murphy Aug. 31, 2022, 8:54 a.m. UTC | #4
On 2022-08-31 09:36, Robin Murphy wrote:
> On 2022-08-29 16:28, Lorenzo Pieralisi wrote:
> [...]
>>> +static int bt1_pcie_add_port(struct bt1_pcie *btpci)
>>> +{
>>> +    struct device *dev = &btpci->pdev->dev;
>>> +    int ret;
>>> +
>>> +    /*
>>> +     * DW PCIe Root Port controller is equipped with eDMA capable of
>>> +     * working with the 64-bit memory addresses.
>>> +     */
>>> +    ret = dma_set_mask_and_coherent(dev, DMA_BIT_MASK(64));
>>> +    if (ret)
>>> +        return ret;
>>
>> Is this the right place to set the DMA mask for the host controller
>> embedded DMA controller (actually, the dev pointer is the _host_
>> controller device) ?
>>
>> How this is going to play when combined with:
>>
>> https://lore.kernel.org/linux-pci/1e63a581-14ae-b4b5-a5bf-ca8f09c33af6@arm.com
>>
>> It is getting a bit confusing. I believe the code in the link
>> above sets the mask so that through the DMA API we are capable
>> of getting an MSI doorbell virtual address whose physical address
>> can be addressed by the endpoint; this through the DMA API.
>>
>> This patch is setting the DMA mask for a different reason, namely
>> setting the host controller embedded DMA controller addressing
>> capabilities.
>>
>> AFAICS - both approaches set the mask for the same device - now
>> the question is about which one is legitimate and how to handle
>> the other.
> 
> Assuming the dw-edma-pcie driver is the relevant one, that already sets 
> its own masks on its own device, so I also don't see why this is here.

Ah, I just found the patch at [1], which further implies that this is 
indeed completely bogus.

Robin.

[1] 
https://lore.kernel.org/dmaengine/20220822185332.26149-23-Sergey.Semin@baikalelectronics.ru/
Serge Semin Sept. 12, 2022, 12:02 a.m. UTC | #5
On Mon, Aug 29, 2022 at 05:28:01PM +0200, Lorenzo Pieralisi wrote:
> [+Robin, Will - please jump to DMA mask set-up]
> 
> On Mon, Aug 22, 2022 at 09:47:01PM +0300, Serge Semin wrote:
> > Baikal-T1 SoC is equipped with DWC PCIe v4.60a host controller. It can be
> > trained to work up to Gen.3 speed over up to x4 lanes. The host controller
> > is attached to the DW PCIe 3.0 PCS via the PIPE-4 interface, which in its
> > turn is connected to the DWC 10G PHY. The whole system is supposed to be
> > fed up with four clock sources: DBI peripheral clock, AXI application
> > clocks and external PHY/core reference clock generating the 100MHz signal.
> > In addition to that the platform provide a way to reset each part of the
> > controller: sticky/non-sticky bits, host controller core, PIPE interface,
> > PCS/PHY and Hot/Power reset signal. The driver also provides a way to
> > handle the GPIO-based PERST# signal.
> > 
> > Note due to the Baikal-T1 MMIO peculiarity we have to implement the DBI
> > interface accessors which make sure the IO operations are dword-aligned.
> > 
> > Signed-off-by: Serge Semin <Sergey.Semin@baikalelectronics.ru>
> > 
> > ---
> > 
> > Changelog v2:
> > - Rename 'syscon' property to 'baikal,bt1-syscon'.
> > 
> > Changelog v3:
> > - Use the clocks/resets handlers defined in the DW PCIe core descriptor.
> >   (@Rob)
> > - Redefine PCI host bridge config space accessors with the generic
> >   pci_generic_config_read32() and pci_generic_config_write32() methods.
> >   (@Rob)
> > 
> > Changelog v4:
> > - Drop PCIBIOS_* macros usage. (@Rob)
> > - Add "static const" to the dw_pcie_ops and dw_pcie_host_ops structure
> >   instances. (@Bjorn)
> > - Rename bt1_pcie_dw_ops to bt1_pcie_ops. (@Bjorn)
> > - Rename bt1_pcie_ops to bt1_pci_ops. (@Bjorn)
> > - Use start_link/stop_link suffixes in the corresponding callbacks.
> >   (@Bjorn)
> > - Change the get_res() method suffix to being get_resources(). (@Bjorn)
> > - Change *_{add,del}_dw_port() method to *_{add,del}_port(). (@Bjorn)
> > - Drop dma_coerce_mask_and_coherent() applied to the PCI host bridge
> >   kernel device instance. (@Bjorn)
> > - Add the comment above the dma_set_mask_and_coherent() method usage
> >   regarding the controller eDMA feature. (@Bjorn)
> > - Fix the comment above the core reset controls assertion. (@Bjorn)
> > - Replace delays and timeout numeric literals with macros. (@Bjorn)
> > ---
> >  drivers/pci/controller/dwc/Kconfig    |   9 +
> >  drivers/pci/controller/dwc/Makefile   |   1 +
> >  drivers/pci/controller/dwc/pcie-bt1.c | 653 ++++++++++++++++++++++++++
> >  3 files changed, 663 insertions(+)
> >  create mode 100644 drivers/pci/controller/dwc/pcie-bt1.c
> > 
> > diff --git a/drivers/pci/controller/dwc/Kconfig b/drivers/pci/controller/dwc/Kconfig
> > index 62ce3abf0f19..771b8b146623 100644
> > --- a/drivers/pci/controller/dwc/Kconfig
> > +++ b/drivers/pci/controller/dwc/Kconfig
> > @@ -222,6 +222,15 @@ config PCIE_ARTPEC6_EP
> >  	  Enables support for the PCIe controller in the ARTPEC-6 SoC to work in
> >  	  endpoint mode. This uses the DesignWare core.
> >  
> > +config PCIE_BT1
> > +	tristate "Baikal-T1 PCIe controller"
> > +	depends on MIPS_BAIKAL_T1 || COMPILE_TEST
> > +	depends on PCI_MSI_IRQ_DOMAIN
> > +	select PCIE_DW_HOST
> > +	help
> > +	  Enables support for the PCIe controller in the Baikal-T1 SoC to work
> > +	  in host mode. It's based on the Synopsys DWC PCIe v4.60a IP-core.
> > +
> >  config PCIE_ROCKCHIP_DW_HOST
> >  	bool "Rockchip DesignWare PCIe controller"
> >  	select PCIE_DW
> > diff --git a/drivers/pci/controller/dwc/Makefile b/drivers/pci/controller/dwc/Makefile
> > index 8ba7b67f5e50..bf5c311875a1 100644
> > --- a/drivers/pci/controller/dwc/Makefile
> > +++ b/drivers/pci/controller/dwc/Makefile
> > @@ -3,6 +3,7 @@ obj-$(CONFIG_PCIE_DW) += pcie-designware.o
> >  obj-$(CONFIG_PCIE_DW_HOST) += pcie-designware-host.o
> >  obj-$(CONFIG_PCIE_DW_EP) += pcie-designware-ep.o
> >  obj-$(CONFIG_PCIE_DW_PLAT) += pcie-designware-plat.o
> > +obj-$(CONFIG_PCIE_BT1) += pcie-bt1.o
> >  obj-$(CONFIG_PCI_DRA7XX) += pci-dra7xx.o
> >  obj-$(CONFIG_PCI_EXYNOS) += pci-exynos.o
> >  obj-$(CONFIG_PCIE_FU740) += pcie-fu740.o
> > diff --git a/drivers/pci/controller/dwc/pcie-bt1.c b/drivers/pci/controller/dwc/pcie-bt1.c
> > new file mode 100644
> > index 000000000000..86b230575ddc
> > --- /dev/null
> > +++ b/drivers/pci/controller/dwc/pcie-bt1.c
> > @@ -0,0 +1,653 @@
> > +// SPDX-License-Identifier: GPL-2.0-only
> > +/*
> > + * Copyright (C) 2021 BAIKAL ELECTRONICS, JSC
> > + *
> > + * Authors:
> > + *   Vadim Vlasov <Vadim.Vlasov@baikalelectronics.ru>
> > + *   Serge Semin <Sergey.Semin@baikalelectronics.ru>
> > + *
> > + * Baikal-T1 PCIe controller driver
> > + */
> > +
> > +#include <linux/bitfield.h>
> > +#include <linux/bits.h>
> > +#include <linux/clk.h>
> > +#include <linux/delay.h>
> > +#include <linux/gpio/consumer.h>
> > +#include <linux/kernel.h>
> > +#include <linux/mfd/syscon.h>
> > +#include <linux/module.h>
> > +#include <linux/pci.h>
> > +#include <linux/platform_device.h>
> > +#include <linux/regmap.h>
> > +#include <linux/reset.h>
> > +#include <linux/types.h>
> > +
> > +#include "pcie-designware.h"
> > +
> > +/* Baikal-T1 System CCU control registers */
> > +#define BT1_CCU_PCIE_CLKC			0x140
> > +#define BT1_CCU_PCIE_REQ_PCS_CLK		BIT(16)
> > +#define BT1_CCU_PCIE_REQ_MAC_CLK		BIT(17)
> > +#define BT1_CCU_PCIE_REQ_PIPE_CLK		BIT(18)
> > +
> > +#define BT1_CCU_PCIE_RSTC			0x144
> > +#define BT1_CCU_PCIE_REQ_LINK_RST		BIT(13)
> > +#define BT1_CCU_PCIE_REQ_SMLH_RST		BIT(14)
> > +#define BT1_CCU_PCIE_REQ_PHY_RST		BIT(16)
> > +#define BT1_CCU_PCIE_REQ_CORE_RST		BIT(24)
> > +#define BT1_CCU_PCIE_REQ_STICKY_RST		BIT(26)
> > +#define BT1_CCU_PCIE_REQ_NSTICKY_RST		BIT(27)
> > +
> > +#define BT1_CCU_PCIE_PMSC			0x148
> > +#define BT1_CCU_PCIE_LTSSM_STATE_MASK		GENMASK(5, 0)
> > +#define BT1_CCU_PCIE_LTSSM_DET_QUIET		0x00
> > +#define BT1_CCU_PCIE_LTSSM_DET_ACT		0x01
> > +#define BT1_CCU_PCIE_LTSSM_POLL_ACT		0x02
> > +#define BT1_CCU_PCIE_LTSSM_POLL_COMP		0x03
> > +#define BT1_CCU_PCIE_LTSSM_POLL_CONF		0x04
> > +#define BT1_CCU_PCIE_LTSSM_PRE_DET_QUIET	0x05
> > +#define BT1_CCU_PCIE_LTSSM_DET_WAIT		0x06
> > +#define BT1_CCU_PCIE_LTSSM_CFG_LNKWD_START	0x07
> > +#define BT1_CCU_PCIE_LTSSM_CFG_LNKWD_ACEPT	0x08
> > +#define BT1_CCU_PCIE_LTSSM_CFG_LNNUM_WAIT	0x09
> > +#define BT1_CCU_PCIE_LTSSM_CFG_LNNUM_ACEPT	0x0a
> > +#define BT1_CCU_PCIE_LTSSM_CFG_COMPLETE		0x0b
> > +#define BT1_CCU_PCIE_LTSSM_CFG_IDLE		0x0c
> > +#define BT1_CCU_PCIE_LTSSM_RCVR_LOCK		0x0d
> > +#define BT1_CCU_PCIE_LTSSM_RCVR_SPEED		0x0e
> > +#define BT1_CCU_PCIE_LTSSM_RCVR_RCVRCFG		0x0f
> > +#define BT1_CCU_PCIE_LTSSM_RCVR_IDLE		0x10
> > +#define BT1_CCU_PCIE_LTSSM_L0			0x11
> > +#define BT1_CCU_PCIE_LTSSM_L0S			0x12
> > +#define BT1_CCU_PCIE_LTSSM_L123_SEND_IDLE	0x13
> > +#define BT1_CCU_PCIE_LTSSM_L1_IDLE		0x14
> > +#define BT1_CCU_PCIE_LTSSM_L2_IDLE		0x15
> > +#define BT1_CCU_PCIE_LTSSM_L2_WAKE		0x16
> > +#define BT1_CCU_PCIE_LTSSM_DIS_ENTRY		0x17
> > +#define BT1_CCU_PCIE_LTSSM_DIS_IDLE		0x18
> > +#define BT1_CCU_PCIE_LTSSM_DISABLE		0x19
> > +#define BT1_CCU_PCIE_LTSSM_LPBK_ENTRY		0x1a
> > +#define BT1_CCU_PCIE_LTSSM_LPBK_ACTIVE		0x1b
> > +#define BT1_CCU_PCIE_LTSSM_LPBK_EXIT		0x1c
> > +#define BT1_CCU_PCIE_LTSSM_LPBK_EXIT_TOUT	0x1d
> > +#define BT1_CCU_PCIE_LTSSM_HOT_RST_ENTRY	0x1e
> > +#define BT1_CCU_PCIE_LTSSM_HOT_RST		0x1f
> > +#define BT1_CCU_PCIE_LTSSM_RCVR_EQ0		0x20
> > +#define BT1_CCU_PCIE_LTSSM_RCVR_EQ1		0x21
> > +#define BT1_CCU_PCIE_LTSSM_RCVR_EQ2		0x22
> > +#define BT1_CCU_PCIE_LTSSM_RCVR_EQ3		0x23
> 

> You could make this an enum and define only the states
> that are actually used.

I'd prefer to leave it as is.

> 
> [...]
> 
> > +
> > +/* Generic Baikal-T1 PCIe interface resources */
> > +#define BT1_PCIE_NUM_APP_CLKS			ARRAY_SIZE(bt1_pcie_app_clks)
> > +#define BT1_PCIE_NUM_CORE_CLKS			ARRAY_SIZE(bt1_pcie_core_clks)
> > +#define BT1_PCIE_NUM_APP_RSTS			ARRAY_SIZE(bt1_pcie_app_rsts)
> > +#define BT1_PCIE_NUM_CORE_RSTS			ARRAY_SIZE(bt1_pcie_core_rsts)
> > +
> > +/* PCIe bus setup delays and timeouts */
> > +#define BT1_PCIE_RST_DELAY_MS			100
> > +#define BT1_PCIE_RUN_DELAY_US			100
> > +#define BT1_PCIE_REQ_DELAY_US			1
> > +#define BT1_PCIE_REQ_TIMEOUT_US			1000
> > +#define BT1_PCIE_LNK_DELAY_US			1000
> > +#define BT1_PCIE_LNK_TIMEOUT_US			1000000
> > +
> > +static const enum dw_pcie_app_clk bt1_pcie_app_clks[] = {
> > +	DW_PCIE_DBI_CLK, DW_PCIE_MSTR_CLK, DW_PCIE_SLV_CLK,
> > +};
> > +
> > +static const enum dw_pcie_core_clk bt1_pcie_core_clks[] = {
> > +	DW_PCIE_REF_CLK,
> > +};
> > +
> > +static const enum dw_pcie_app_rst bt1_pcie_app_rsts[] = {
> > +	DW_PCIE_MSTR_RST, DW_PCIE_SLV_RST,
> > +};
> > +
> > +static const enum dw_pcie_core_rst bt1_pcie_core_rsts[] = {
> > +	DW_PCIE_NON_STICKY_RST, DW_PCIE_STICKY_RST, DW_PCIE_CORE_RST,
> > +	DW_PCIE_PIPE_RST, DW_PCIE_PHY_RST, DW_PCIE_HOT_RST, DW_PCIE_PWR_RST,
> > +};
> 

> I wonder whether we could allocate the rst/clks in DWC dynamically,
> by using these configuration arrays.

The resets and clocks are now requested by means of the bulk API in the
dw_pcie_get_resources() method. So they are already dynamically
allocated. What these config arrays are needed for is to make sure the
required resets and clocks have been retrieved and in order to access the
requested handlers.

> 
> > +
> > +struct bt1_pcie {
> > +	struct dw_pcie dw;
> > +	struct platform_device *pdev;
> > +	struct regmap *sys_regs;
> > +};
> > +#define to_bt1_pcie(_dw) container_of(_dw, struct bt1_pcie, dw)
> > +
> > +/*
> > + * Baikal-T1 MMIO space must be read/written by the dword-aligned
> > + * instructions. Note the methods are optimized to have the dword operations
> > + * performed with minimum overhead as the most frequently used ones.
> > + */
> > +static int bt1_pcie_read_mmio(void __iomem *addr, int size, u32 *val)
> > +{
> > +	unsigned int ofs = (uintptr_t)addr & 0x3;
> > +
> > +	if (!IS_ALIGNED((uintptr_t)addr, size))
> > +		return -EINVAL;
> > +
> > +	*val = readl(addr - ofs) >> ofs * BITS_PER_BYTE;
> 

> Is it always safe to read more than requested ?

This question is kind of contradicting. No matter whether it's safe or
not we just can't perform the IOs with size other than of the dword
size. Doing otherwise will cause the bus access error.

> 
> > +	if (size == 4) {
> > +		return 0;
> > +	} else if (size == 2) {
> > +		*val &= 0xffff;
> > +		return 0;
> > +	} else if (size == 1) {
> > +		*val &= 0xff;
> > +		return 0;
> > +	}
> > +
> > +	return -EINVAL;
> > +}
> > +
> > +static int bt1_pcie_write_mmio(void __iomem *addr, int size, u32 val)
> > +{
> > +	unsigned int ofs = (uintptr_t)addr & 0x3;
> > +	u32 tmp, mask;
> > +
> > +	if (!IS_ALIGNED((uintptr_t)addr, size))
> > +		return -EINVAL;
> > +
> > +	if (size == 4) {
> > +		writel(val, addr);
> > +		return 0;
> > +	} else if (size == 2 || size == 1) {
> > +		mask = GENMASK(size * BITS_PER_BYTE - 1, 0);
> > +		tmp = readl(addr - ofs) & ~(mask << ofs * BITS_PER_BYTE);
> > +		tmp |= (val & mask) << ofs * BITS_PER_BYTE;
> > +		writel(tmp, addr - ofs);
> > +		return 0;
> > +	}
> 

> Same question read/modify/write, is it always safe to do it
> regardless of size ?

ditto

> 
> > +
> > +	return -EINVAL;
> > +}
> > +
> > +static u32 bt1_pcie_read_dbi(struct dw_pcie *pci, void __iomem *base, u32 reg,
> > +			     size_t size)
> > +{
> > +	int ret;
> > +	u32 val;
> > +
> > +	ret = bt1_pcie_read_mmio(base + reg, size, &val);
> > +	if (ret) {
> > +		dev_err(pci->dev, "Read DBI address failed\n");
> > +		return ~0U;
> 

> Is this a special magic value the DWC core is expecting ?
> 
> Does it clash with a _valid_ return value ?

It's a normal return value if the PCIe IO wasn't successful. In this
particular case there is no actual PCIe-bus IO though, but there are
conditions which can cause the errors. So the error status is still
sanity checked. This part was already commented by Rob here:
https://lore.kernel.org/linux-pci/20220615171045.GD1413880-robh@kernel.org/
my response was:
https://lore.kernel.org/linux-pci/20220619203904.h7q2eb7e4ctsujsk@mobilestation/

> 
> > +	}
> > +
> > +	return val;
> > +}
> > +
> > +static void bt1_pcie_write_dbi(struct dw_pcie *pci, void __iomem *base, u32 reg,
> > +			       size_t size, u32 val)
> > +{
> > +	int ret;
> > +
> > +	ret = bt1_pcie_write_mmio(base + reg, size, val);
> > +	if (ret)
> > +		dev_err(pci->dev, "Write DBI address failed\n");
> > +}
> > +
> > +static void bt1_pcie_write_dbi2(struct dw_pcie *pci, void __iomem *base, u32 reg,
> > +				size_t size, u32 val)
> > +{
> > +	struct bt1_pcie *btpci = to_bt1_pcie(pci);
> > +	int ret;
> > +
> > +	regmap_update_bits(btpci->sys_regs, BT1_CCU_PCIE_GENC,
> > +			   BT1_CCU_PCIE_DBI2_MODE, BT1_CCU_PCIE_DBI2_MODE);
> > +
> > +	ret = bt1_pcie_write_mmio(base + reg, size, val);
> > +	if (ret)
> > +		dev_err(pci->dev, "Write DBI2 address failed\n");
> > +
> > +	regmap_update_bits(btpci->sys_regs, BT1_CCU_PCIE_GENC,
> > +			   BT1_CCU_PCIE_DBI2_MODE, 0);
> 

> IIUC the regmap_update_bits() set up decoding for DBI2.

Right and then switches it back off.

> Hopefully the
> DBI/DBI2 writes are sequentialized, this is a question valid also
> for other DWC controllers.

In general you are right, but not in particular case of the DW PCIe
Root Ports. So the concurrent access to DBI and DBI2 won't cause any
problem.

> 
> What I want to say is, the regmap update in this function sets the
> DWC HW in a way that can decode DBI2 (please correct me if I am wrong),

Right.

> between the two _update_bits() no DBI access should happen because
> it just would not work.

No. Because in case of the DW PCIe Root Ports, DBI and DBI2 are almost
identical. The difference is only in two CSR fields which turn to be
R/W in DBI2 instead of being RO in DBI. Other than that the DBI and
DBI2 spaces are identical. That's why we don't need any software-based
synchronization between the DBI/DBI2 accesses.

Moreover we won't need to worry about the synchronisation at all if
DBI2 is mapped via a separate reg-space (see dw_pcie.dbi_base2 field)
because any concurrency is resolved behind the scene by means of the
DBI bus HW implementation.

> 
> It is a question.

The situation gets to be more complex in case of DW PCIe End-points
because some of the DBI CSRs change semantics in DBI2. At the very
least it concerns the TYPE0_HDR.{BAR0-BAR5} registers, which determine
the corresponding BARx size and whether it is enabled in DBI2 (see the
reset_bar() and set_bar() methods implementation in
drivers/pci/controller/dwc/pcie-designware-ep.c). But my controller is
the Root Port controller, so the denoted peculiarity doesn't concern
it.

> 
> > +static int bt1_pcie_host_init(struct dw_pcie_rp *pp)
> > +{
> > +	struct dw_pcie *pci = to_dw_pcie_from_pp(pp);
> > +	struct bt1_pcie *btpci = to_bt1_pcie(pci);
> > +	int ret;
> > +
> > +	ret = bt1_pcie_get_resources(btpci);
> > +	if (ret)
> > +		return ret;
> > +
> > +	bt1_pcie_full_stop_bus(btpci, true);
> > +
> > +	return bt1_pcie_cold_start_bus(btpci);
> > +}
> > +
> > +static void bt1_pcie_host_deinit(struct dw_pcie_rp *pp)
> > +{
> > +	struct dw_pcie *pci = to_dw_pcie_from_pp(pp);
> > +	struct bt1_pcie *btpci = to_bt1_pcie(pci);
> > +
> > +	bt1_pcie_full_stop_bus(btpci, false);
> > +}
> > +
> > +static const struct dw_pcie_host_ops bt1_pcie_host_ops = {
> > +	.host_init = bt1_pcie_host_init,
> > +	.host_deinit = bt1_pcie_host_deinit,
> > +};
> > +
> > +static struct bt1_pcie *bt1_pcie_create_data(struct platform_device *pdev)
> > +{
> > +	struct bt1_pcie *btpci;
> > +
> > +	btpci = devm_kzalloc(&pdev->dev, sizeof(*btpci), GFP_KERNEL);
> > +	if (!btpci)
> > +		return ERR_PTR(-ENOMEM);
> > +
> > +	btpci->pdev = pdev;
> > +
> > +	platform_set_drvdata(pdev, btpci);
> > +
> > +	return btpci;
> > +}
> > +
> > +static int bt1_pcie_add_port(struct bt1_pcie *btpci)
> > +{
> > +	struct device *dev = &btpci->pdev->dev;
> > +	int ret;
> > +
> > +	/*
> > +	 * DW PCIe Root Port controller is equipped with eDMA capable of
> > +	 * working with the 64-bit memory addresses.
> > +	 */
> > +	ret = dma_set_mask_and_coherent(dev, DMA_BIT_MASK(64));
> > +	if (ret)
> > +		return ret;
> 

> Is this the right place to set the DMA mask for the host controller
> embedded DMA controller (actually, the dev pointer is the _host_
> controller device) ?

Yes it's. The DMA controller is embedded into the PCIe Root Port
controller. It CSRs are accessed via either the same CSR space or via
a separate space but synchronously clocked by the same clock source
(it's called unrolled iATU/eDMA mode). The memory range the
controller is capable to reach is platform specific. So the glue
driver is the best place to set the device DMA-mask. (For instance the
DW PCIe master AXI-bus width is selected by means of the
MASTER_BUS_ADDR_WIDTH parameter of the DW PCIe IP-core.)

> 
> How this is going to play when combined with:
> 
> https://lore.kernel.org/linux-pci/1e63a581-14ae-b4b5-a5bf-ca8f09c33af6@arm.com
> 
> It is getting a bit confusing. I believe the code in the link
> above sets the mask so that through the DMA API we are capable
> of getting an MSI doorbell virtual address whose physical address
> can be addressed by the endpoint; this through the DMA API.

I don't really understand why the code in the link above tries to
analyze the MSI capability of the DW PCIe Root Port in the framework
of the dw_pcie_msi_host_init() method. The method utilizes the iMSI-RX
engine which is specific to the DW PCIe AXI-bus controller
implementation. It has nothing to do with the PCIe MSI capability
normally available over the standard PCIe config space.

As Rob correctly noted here
https://lore.kernel.org/all/CAL_JsqJh=d-B51b6yPBRq0tOwbChN=AFPr-a19U1QdQZAE7c1A@mail.gmail.com
MSI TLPs never reaches the system memory. (But I would add that this
only concerns the iMSI-RX engine.) So no matter which memory
allocated and where, the only thing that matters is the PCIe-bus
address specified to the PCIE_MSI_ADDR_LO and PCIE_MSI_ADDR_HI CSRs,
which are the DW PCIe-specific and both are always available thus
supporting the 64-bit messages in any case. So if we had a way to just
reserve a PCIe-bus address range which at the same time wouldn't have
a system memory behind, we could have used the reserved range to
initialize the iMSI-RX MSI-address without need to allocate any
DMA-able memory at all. That's why the commit 07940c369a6b ("PCI: dwc:
Fix MSI page leakage in suspend/resume") was absolutely correct.

> 
> This patch is setting the DMA mask for a different reason, namely
> setting the host controller embedded DMA controller addressing
> capabilities.

AFAIU what is done in that patch is incorrect.

> 
> AFAICS - both approaches set the mask for the same device - now
> the question is about which one is legitimate and how to handle
> the other.

That's simple. Mine is legitimate for sure. Another one isn't.

> 
> > +
> > +	btpci->dw.version = DW_PCIE_VER_460A;
> > +	btpci->dw.dev = dev;
> > +	btpci->dw.ops = &bt1_pcie_ops;
> > +
> > +	btpci->dw.pp.num_vectors = MAX_MSI_IRQS;
> > +	btpci->dw.pp.ops = &bt1_pcie_host_ops;
> > +
> > +	dw_pcie_cap_set(&btpci->dw, REQ_RES);
> > +
> > +	ret = dw_pcie_host_init(&btpci->dw.pp);
> > +	if (ret)
> > +		dev_err_probe(dev, ret, "Failed to initialize DWC PCIe host\n");
> > +
> > +	return ret;
> > +}
> > +
> > +static void bt1_pcie_del_port(struct bt1_pcie *btpci)
> > +{
> > +	dw_pcie_host_deinit(&btpci->dw.pp);
> > +}
> > +
> > +static int bt1_pcie_probe(struct platform_device *pdev)
> > +{
> > +	struct bt1_pcie *btpci;
> > +
> > +	btpci = bt1_pcie_create_data(pdev);
> 

> Do we really need a function for that ? I am not too
> bothered but I think it is overkill.

I prefer splitting the probe method up into a set of small and
coherent methods. It IMO improves the code readability for just no
price since the compiler will embed the single-time used static
methods anyway.

-Sergey

> 
> Thanks,
> Lorenzo
> 
> > +	if (IS_ERR(btpci))
> > +		return PTR_ERR(btpci);
> > +
> > +	return bt1_pcie_add_port(btpci);
> > +}
> > +
> > +static int bt1_pcie_remove(struct platform_device *pdev)
> > +{
> > +	struct bt1_pcie *btpci = platform_get_drvdata(pdev);
> > +
> > +	bt1_pcie_del_port(btpci);
> > +
> > +	return 0;
> > +}
> > +
> > +static const struct of_device_id bt1_pcie_of_match[] = {
> > +	{ .compatible = "baikal,bt1-pcie" },
> > +	{},
> > +};
> > +MODULE_DEVICE_TABLE(of, bt1_pcie_of_match);
> > +
> > +static struct platform_driver bt1_pcie_driver = {
> > +	.probe = bt1_pcie_probe,
> > +	.remove = bt1_pcie_remove,
> > +	.driver = {
> > +		.name	= "bt1-pcie",
> > +		.of_match_table = bt1_pcie_of_match,
> > +	},
> > +};
> > +module_platform_driver(bt1_pcie_driver);
> > +
> > +MODULE_AUTHOR("Serge Semin <Sergey.Semin@baikalelectronics.ru>");
> > +MODULE_DESCRIPTION("Baikal-T1 PCIe driver");
> > +MODULE_LICENSE("GPL");
> > -- 
> > 2.35.1
> >
Serge Semin Sept. 12, 2022, 12:20 a.m. UTC | #6
On Mon, Aug 29, 2022 at 05:32:24PM +0000, William McVicker wrote:
> On 08/29/2022, Lorenzo Pieralisi wrote:
> > [+Robin, Will - please jump to DMA mask set-up]
> > 
> > On Mon, Aug 22, 2022 at 09:47:01PM +0300, Serge Semin wrote:
> > > Baikal-T1 SoC is equipped with DWC PCIe v4.60a host controller. It can be
> > > trained to work up to Gen.3 speed over up to x4 lanes. The host controller
> > > is attached to the DW PCIe 3.0 PCS via the PIPE-4 interface, which in its
> > > turn is connected to the DWC 10G PHY. The whole system is supposed to be
> > > fed up with four clock sources: DBI peripheral clock, AXI application
> > > clocks and external PHY/core reference clock generating the 100MHz signal.
> > > In addition to that the platform provide a way to reset each part of the
> > > controller: sticky/non-sticky bits, host controller core, PIPE interface,
> > > PCS/PHY and Hot/Power reset signal. The driver also provides a way to
> > > handle the GPIO-based PERST# signal.
> > > 
> > > Note due to the Baikal-T1 MMIO peculiarity we have to implement the DBI
> > > interface accessors which make sure the IO operations are dword-aligned.
> > > 
> > > Signed-off-by: Serge Semin <Sergey.Semin@baikalelectronics.ru>
> > > 
> > > ---
> > > 
> > > Changelog v2:
> > > - Rename 'syscon' property to 'baikal,bt1-syscon'.
> > > 
> > > Changelog v3:
> > > - Use the clocks/resets handlers defined in the DW PCIe core descriptor.
> > >   (@Rob)
> > > - Redefine PCI host bridge config space accessors with the generic
> > >   pci_generic_config_read32() and pci_generic_config_write32() methods.
> > >   (@Rob)
> > > 
> > > Changelog v4:
> > > - Drop PCIBIOS_* macros usage. (@Rob)
> > > - Add "static const" to the dw_pcie_ops and dw_pcie_host_ops structure
> > >   instances. (@Bjorn)
> > > - Rename bt1_pcie_dw_ops to bt1_pcie_ops. (@Bjorn)
> > > - Rename bt1_pcie_ops to bt1_pci_ops. (@Bjorn)
> > > - Use start_link/stop_link suffixes in the corresponding callbacks.
> > >   (@Bjorn)
> > > - Change the get_res() method suffix to being get_resources(). (@Bjorn)
> > > - Change *_{add,del}_dw_port() method to *_{add,del}_port(). (@Bjorn)
> > > - Drop dma_coerce_mask_and_coherent() applied to the PCI host bridge
> > >   kernel device instance. (@Bjorn)
> > > - Add the comment above the dma_set_mask_and_coherent() method usage
> > >   regarding the controller eDMA feature. (@Bjorn)
> > > - Fix the comment above the core reset controls assertion. (@Bjorn)
> > > - Replace delays and timeout numeric literals with macros. (@Bjorn)
> > > ---
> > >  drivers/pci/controller/dwc/Kconfig    |   9 +
> > >  drivers/pci/controller/dwc/Makefile   |   1 +
> > >  drivers/pci/controller/dwc/pcie-bt1.c | 653 ++++++++++++++++++++++++++
> > >  3 files changed, 663 insertions(+)
> > >  create mode 100644 drivers/pci/controller/dwc/pcie-bt1.c
> > > 
> > > diff --git a/drivers/pci/controller/dwc/Kconfig b/drivers/pci/controller/dwc/Kconfig
> > > index 62ce3abf0f19..771b8b146623 100644
> > > --- a/drivers/pci/controller/dwc/Kconfig
> > > +++ b/drivers/pci/controller/dwc/Kconfig
> > > @@ -222,6 +222,15 @@ config PCIE_ARTPEC6_EP
> > >  	  Enables support for the PCIe controller in the ARTPEC-6 SoC to work in
> > >  	  endpoint mode. This uses the DesignWare core.
> > >  
> > > +config PCIE_BT1
> > > +	tristate "Baikal-T1 PCIe controller"
> > > +	depends on MIPS_BAIKAL_T1 || COMPILE_TEST
> > > +	depends on PCI_MSI_IRQ_DOMAIN
> > > +	select PCIE_DW_HOST
> > > +	help
> > > +	  Enables support for the PCIe controller in the Baikal-T1 SoC to work
> > > +	  in host mode. It's based on the Synopsys DWC PCIe v4.60a IP-core.
> > > +
> > >  config PCIE_ROCKCHIP_DW_HOST
> > >  	bool "Rockchip DesignWare PCIe controller"
> > >  	select PCIE_DW
> > > diff --git a/drivers/pci/controller/dwc/Makefile b/drivers/pci/controller/dwc/Makefile
> > > index 8ba7b67f5e50..bf5c311875a1 100644
> > > --- a/drivers/pci/controller/dwc/Makefile
> > > +++ b/drivers/pci/controller/dwc/Makefile
> > > @@ -3,6 +3,7 @@ obj-$(CONFIG_PCIE_DW) += pcie-designware.o
> > >  obj-$(CONFIG_PCIE_DW_HOST) += pcie-designware-host.o
> > >  obj-$(CONFIG_PCIE_DW_EP) += pcie-designware-ep.o
> > >  obj-$(CONFIG_PCIE_DW_PLAT) += pcie-designware-plat.o
> > > +obj-$(CONFIG_PCIE_BT1) += pcie-bt1.o
> > >  obj-$(CONFIG_PCI_DRA7XX) += pci-dra7xx.o
> > >  obj-$(CONFIG_PCI_EXYNOS) += pci-exynos.o
> > >  obj-$(CONFIG_PCIE_FU740) += pcie-fu740.o
> > > diff --git a/drivers/pci/controller/dwc/pcie-bt1.c b/drivers/pci/controller/dwc/pcie-bt1.c
> > > new file mode 100644
> > > index 000000000000..86b230575ddc
> > > --- /dev/null
> > > +++ b/drivers/pci/controller/dwc/pcie-bt1.c
> > > @@ -0,0 +1,653 @@
> > > +// SPDX-License-Identifier: GPL-2.0-only
> > > +/*
> > > + * Copyright (C) 2021 BAIKAL ELECTRONICS, JSC
> > > + *
> > > + * Authors:
> > > + *   Vadim Vlasov <Vadim.Vlasov@baikalelectronics.ru>
> > > + *   Serge Semin <Sergey.Semin@baikalelectronics.ru>
> > > + *
> > > + * Baikal-T1 PCIe controller driver
> > > + */
> > > +
> > > +#include <linux/bitfield.h>
> > > +#include <linux/bits.h>
> > > +#include <linux/clk.h>
> > > +#include <linux/delay.h>
> > > +#include <linux/gpio/consumer.h>
> > > +#include <linux/kernel.h>
> > > +#include <linux/mfd/syscon.h>
> > > +#include <linux/module.h>
> > > +#include <linux/pci.h>
> > > +#include <linux/platform_device.h>
> > > +#include <linux/regmap.h>
> > > +#include <linux/reset.h>
> > > +#include <linux/types.h>
> > > +
> > > +#include "pcie-designware.h"
> > > +
> > > +/* Baikal-T1 System CCU control registers */
> > > +#define BT1_CCU_PCIE_CLKC			0x140
> > > +#define BT1_CCU_PCIE_REQ_PCS_CLK		BIT(16)
> > > +#define BT1_CCU_PCIE_REQ_MAC_CLK		BIT(17)
> > > +#define BT1_CCU_PCIE_REQ_PIPE_CLK		BIT(18)
> > > +
> > > +#define BT1_CCU_PCIE_RSTC			0x144
> > > +#define BT1_CCU_PCIE_REQ_LINK_RST		BIT(13)
> > > +#define BT1_CCU_PCIE_REQ_SMLH_RST		BIT(14)
> > > +#define BT1_CCU_PCIE_REQ_PHY_RST		BIT(16)
> > > +#define BT1_CCU_PCIE_REQ_CORE_RST		BIT(24)
> > > +#define BT1_CCU_PCIE_REQ_STICKY_RST		BIT(26)
> > > +#define BT1_CCU_PCIE_REQ_NSTICKY_RST		BIT(27)
> > > +
> > > +#define BT1_CCU_PCIE_PMSC			0x148
> > > +#define BT1_CCU_PCIE_LTSSM_STATE_MASK		GENMASK(5, 0)
> > > +#define BT1_CCU_PCIE_LTSSM_DET_QUIET		0x00
> > > +#define BT1_CCU_PCIE_LTSSM_DET_ACT		0x01
> > > +#define BT1_CCU_PCIE_LTSSM_POLL_ACT		0x02
> > > +#define BT1_CCU_PCIE_LTSSM_POLL_COMP		0x03
> > > +#define BT1_CCU_PCIE_LTSSM_POLL_CONF		0x04
> > > +#define BT1_CCU_PCIE_LTSSM_PRE_DET_QUIET	0x05
> > > +#define BT1_CCU_PCIE_LTSSM_DET_WAIT		0x06
> > > +#define BT1_CCU_PCIE_LTSSM_CFG_LNKWD_START	0x07
> > > +#define BT1_CCU_PCIE_LTSSM_CFG_LNKWD_ACEPT	0x08
> > > +#define BT1_CCU_PCIE_LTSSM_CFG_LNNUM_WAIT	0x09
> > > +#define BT1_CCU_PCIE_LTSSM_CFG_LNNUM_ACEPT	0x0a
> > > +#define BT1_CCU_PCIE_LTSSM_CFG_COMPLETE		0x0b
> > > +#define BT1_CCU_PCIE_LTSSM_CFG_IDLE		0x0c
> > > +#define BT1_CCU_PCIE_LTSSM_RCVR_LOCK		0x0d
> > > +#define BT1_CCU_PCIE_LTSSM_RCVR_SPEED		0x0e
> > > +#define BT1_CCU_PCIE_LTSSM_RCVR_RCVRCFG		0x0f
> > > +#define BT1_CCU_PCIE_LTSSM_RCVR_IDLE		0x10
> > > +#define BT1_CCU_PCIE_LTSSM_L0			0x11
> > > +#define BT1_CCU_PCIE_LTSSM_L0S			0x12
> > > +#define BT1_CCU_PCIE_LTSSM_L123_SEND_IDLE	0x13
> > > +#define BT1_CCU_PCIE_LTSSM_L1_IDLE		0x14
> > > +#define BT1_CCU_PCIE_LTSSM_L2_IDLE		0x15
> > > +#define BT1_CCU_PCIE_LTSSM_L2_WAKE		0x16
> > > +#define BT1_CCU_PCIE_LTSSM_DIS_ENTRY		0x17
> > > +#define BT1_CCU_PCIE_LTSSM_DIS_IDLE		0x18
> > > +#define BT1_CCU_PCIE_LTSSM_DISABLE		0x19
> > > +#define BT1_CCU_PCIE_LTSSM_LPBK_ENTRY		0x1a
> > > +#define BT1_CCU_PCIE_LTSSM_LPBK_ACTIVE		0x1b
> > > +#define BT1_CCU_PCIE_LTSSM_LPBK_EXIT		0x1c
> > > +#define BT1_CCU_PCIE_LTSSM_LPBK_EXIT_TOUT	0x1d
> > > +#define BT1_CCU_PCIE_LTSSM_HOT_RST_ENTRY	0x1e
> > > +#define BT1_CCU_PCIE_LTSSM_HOT_RST		0x1f
> > > +#define BT1_CCU_PCIE_LTSSM_RCVR_EQ0		0x20
> > > +#define BT1_CCU_PCIE_LTSSM_RCVR_EQ1		0x21
> > > +#define BT1_CCU_PCIE_LTSSM_RCVR_EQ2		0x22
> > > +#define BT1_CCU_PCIE_LTSSM_RCVR_EQ3		0x23
> > 
> > You could make this an enum and define only the states
> > that are actually used.
> > 
> > [...]
> > 
> > > +
> > > +/* Generic Baikal-T1 PCIe interface resources */
> > > +#define BT1_PCIE_NUM_APP_CLKS			ARRAY_SIZE(bt1_pcie_app_clks)
> > > +#define BT1_PCIE_NUM_CORE_CLKS			ARRAY_SIZE(bt1_pcie_core_clks)
> > > +#define BT1_PCIE_NUM_APP_RSTS			ARRAY_SIZE(bt1_pcie_app_rsts)
> > > +#define BT1_PCIE_NUM_CORE_RSTS			ARRAY_SIZE(bt1_pcie_core_rsts)
> > > +
> > > +/* PCIe bus setup delays and timeouts */
> > > +#define BT1_PCIE_RST_DELAY_MS			100
> > > +#define BT1_PCIE_RUN_DELAY_US			100
> > > +#define BT1_PCIE_REQ_DELAY_US			1
> > > +#define BT1_PCIE_REQ_TIMEOUT_US			1000
> > > +#define BT1_PCIE_LNK_DELAY_US			1000
> > > +#define BT1_PCIE_LNK_TIMEOUT_US			1000000
> > > +
> > > +static const enum dw_pcie_app_clk bt1_pcie_app_clks[] = {
> > > +	DW_PCIE_DBI_CLK, DW_PCIE_MSTR_CLK, DW_PCIE_SLV_CLK,
> > > +};
> > > +
> > > +static const enum dw_pcie_core_clk bt1_pcie_core_clks[] = {
> > > +	DW_PCIE_REF_CLK,
> > > +};
> > > +
> > > +static const enum dw_pcie_app_rst bt1_pcie_app_rsts[] = {
> > > +	DW_PCIE_MSTR_RST, DW_PCIE_SLV_RST,
> > > +};
> > > +
> > > +static const enum dw_pcie_core_rst bt1_pcie_core_rsts[] = {
> > > +	DW_PCIE_NON_STICKY_RST, DW_PCIE_STICKY_RST, DW_PCIE_CORE_RST,
> > > +	DW_PCIE_PIPE_RST, DW_PCIE_PHY_RST, DW_PCIE_HOT_RST, DW_PCIE_PWR_RST,
> > > +};
> > 
> > I wonder whether we could allocate the rst/clks in DWC dynamically,
> > by using these configuration arrays.
> > 
> > > +
> > > +struct bt1_pcie {
> > > +	struct dw_pcie dw;
> > > +	struct platform_device *pdev;
> > > +	struct regmap *sys_regs;
> > > +};
> > > +#define to_bt1_pcie(_dw) container_of(_dw, struct bt1_pcie, dw)
> > > +
> > > +/*
> > > + * Baikal-T1 MMIO space must be read/written by the dword-aligned
> > > + * instructions. Note the methods are optimized to have the dword operations
> > > + * performed with minimum overhead as the most frequently used ones.
> > > + */
> > > +static int bt1_pcie_read_mmio(void __iomem *addr, int size, u32 *val)
> > > +{
> > > +	unsigned int ofs = (uintptr_t)addr & 0x3;
> > > +
> > > +	if (!IS_ALIGNED((uintptr_t)addr, size))
> > > +		return -EINVAL;
> > > +
> > > +	*val = readl(addr - ofs) >> ofs * BITS_PER_BYTE;
> > 
> > Is it always safe to read more than requested ?
> > 
> > > +	if (size == 4) {
> > > +		return 0;
> > > +	} else if (size == 2) {
> > > +		*val &= 0xffff;
> > > +		return 0;
> > > +	} else if (size == 1) {
> > > +		*val &= 0xff;
> > > +		return 0;
> > > +	}
> > > +
> > > +	return -EINVAL;
> > > +}
> > > +
> > > +static int bt1_pcie_write_mmio(void __iomem *addr, int size, u32 val)
> > > +{
> > > +	unsigned int ofs = (uintptr_t)addr & 0x3;
> > > +	u32 tmp, mask;
> > > +
> > > +	if (!IS_ALIGNED((uintptr_t)addr, size))
> > > +		return -EINVAL;
> > > +
> > > +	if (size == 4) {
> > > +		writel(val, addr);
> > > +		return 0;
> > > +	} else if (size == 2 || size == 1) {
> > > +		mask = GENMASK(size * BITS_PER_BYTE - 1, 0);
> > > +		tmp = readl(addr - ofs) & ~(mask << ofs * BITS_PER_BYTE);
> > > +		tmp |= (val & mask) << ofs * BITS_PER_BYTE;
> > > +		writel(tmp, addr - ofs);
> > > +		return 0;
> > > +	}
> > 
> > Same question read/modify/write, is it always safe to do it
> > regardless of size ?
> > 
> > > +
> > > +	return -EINVAL;
> > > +}
> > > +
> > > +static u32 bt1_pcie_read_dbi(struct dw_pcie *pci, void __iomem *base, u32 reg,
> > > +			     size_t size)
> > > +{
> > > +	int ret;
> > > +	u32 val;
> > > +
> > > +	ret = bt1_pcie_read_mmio(base + reg, size, &val);
> > > +	if (ret) {
> > > +		dev_err(pci->dev, "Read DBI address failed\n");
> > > +		return ~0U;
> > 
> > Is this a special magic value the DWC core is expecting ?
> > 
> > Does it clash with a _valid_ return value ?
> > 
> > > +	}
> > > +
> > > +	return val;
> > > +}
> > > +
> > > +static void bt1_pcie_write_dbi(struct dw_pcie *pci, void __iomem *base, u32 reg,
> > > +			       size_t size, u32 val)
> > > +{
> > > +	int ret;
> > > +
> > > +	ret = bt1_pcie_write_mmio(base + reg, size, val);
> > > +	if (ret)
> > > +		dev_err(pci->dev, "Write DBI address failed\n");
> > > +}
> > > +
> > > +static void bt1_pcie_write_dbi2(struct dw_pcie *pci, void __iomem *base, u32 reg,
> > > +				size_t size, u32 val)
> > > +{
> > > +	struct bt1_pcie *btpci = to_bt1_pcie(pci);
> > > +	int ret;
> > > +
> > > +	regmap_update_bits(btpci->sys_regs, BT1_CCU_PCIE_GENC,
> > > +			   BT1_CCU_PCIE_DBI2_MODE, BT1_CCU_PCIE_DBI2_MODE);
> > > +
> > > +	ret = bt1_pcie_write_mmio(base + reg, size, val);
> > > +	if (ret)
> > > +		dev_err(pci->dev, "Write DBI2 address failed\n");
> > > +
> > > +	regmap_update_bits(btpci->sys_regs, BT1_CCU_PCIE_GENC,
> > > +			   BT1_CCU_PCIE_DBI2_MODE, 0);
> > 
> > IIUC the regmap_update_bits() set up decoding for DBI2. Hopefully the
> > DBI/DBI2 writes are sequentialized, this is a question valid also
> > for other DWC controllers.
> > 
> > What I want to say is, the regmap update in this function sets the
> > DWC HW in a way that can decode DBI2 (please correct me if I am wrong),
> > between the two _update_bits() no DBI access should happen because
> > it just would not work.
> > 
> > It is a question.
> > 
> > > +static int bt1_pcie_host_init(struct dw_pcie_rp *pp)
> > > +{
> > > +	struct dw_pcie *pci = to_dw_pcie_from_pp(pp);
> > > +	struct bt1_pcie *btpci = to_bt1_pcie(pci);
> > > +	int ret;
> > > +
> > > +	ret = bt1_pcie_get_resources(btpci);
> > > +	if (ret)
> > > +		return ret;
> > > +
> > > +	bt1_pcie_full_stop_bus(btpci, true);
> > > +
> > > +	return bt1_pcie_cold_start_bus(btpci);
> > > +}
> > > +
> > > +static void bt1_pcie_host_deinit(struct dw_pcie_rp *pp)
> > > +{
> > > +	struct dw_pcie *pci = to_dw_pcie_from_pp(pp);
> > > +	struct bt1_pcie *btpci = to_bt1_pcie(pci);
> > > +
> > > +	bt1_pcie_full_stop_bus(btpci, false);
> > > +}
> > > +
> > > +static const struct dw_pcie_host_ops bt1_pcie_host_ops = {
> > > +	.host_init = bt1_pcie_host_init,
> > > +	.host_deinit = bt1_pcie_host_deinit,
> > > +};
> > > +
> > > +static struct bt1_pcie *bt1_pcie_create_data(struct platform_device *pdev)
> > > +{
> > > +	struct bt1_pcie *btpci;
> > > +
> > > +	btpci = devm_kzalloc(&pdev->dev, sizeof(*btpci), GFP_KERNEL);
> > > +	if (!btpci)
> > > +		return ERR_PTR(-ENOMEM);
> > > +
> > > +	btpci->pdev = pdev;
> > > +
> > > +	platform_set_drvdata(pdev, btpci);
> > > +
> > > +	return btpci;
> > > +}
> > > +
> > > +static int bt1_pcie_add_port(struct bt1_pcie *btpci)
> > > +{
> > > +	struct device *dev = &btpci->pdev->dev;
> > > +	int ret;
> > > +
> > > +	/*
> > > +	 * DW PCIe Root Port controller is equipped with eDMA capable of
> > > +	 * working with the 64-bit memory addresses.
> > > +	 */
> > > +	ret = dma_set_mask_and_coherent(dev, DMA_BIT_MASK(64));
> > > +	if (ret)
> > > +		return ret;
> > 
> > Is this the right place to set the DMA mask for the host controller
> > embedded DMA controller (actually, the dev pointer is the _host_
> > controller device) ?
> > 
> > How this is going to play when combined with:
> > 
> > https://lore.kernel.org/linux-pci/1e63a581-14ae-b4b5-a5bf-ca8f09c33af6@arm.com
> > 
> > It is getting a bit confusing. I believe the code in the link
> > above sets the mask so that through the DMA API we are capable
> > of getting an MSI doorbell virtual address whose physical address
> > can be addressed by the endpoint; this through the DMA API.
> > 
> > This patch is setting the DMA mask for a different reason, namely
> > setting the host controller embedded DMA controller addressing
> > capabilities.
> > 
> > AFAICS - both approaches set the mask for the same device - now
> > the question is about which one is legitimate and how to handle
> > the other.
> 

> Since the mask is being set before calling dw_pcie_host_init() and the
> msi_host_init op is not set, setting the mask here won't have any affect due to
> the fact that dw_pcie_msi_host_init() unconditionally sets the DMA mask.

Good note. Indeed the dw_pcie_msi_host_init() method forces the device
DMA-mask being 32-bits. It's absolutely wrong. First the iMSI-RX
engine doesn't perform any DMA. Second the 32-bit mask set by default
anyway. Third that most likely causes significant performance drop (or
even the mapping failures) since the DMAs performed by the PCIe-bus
peripheral devices to the memory above 4GB will be bounced buffered to
the 4GB limit if SWIOTLB/CMA is enabled.

> AFAIK,
> you don't technically need a 64-bit address, right?

No. I actually need it. It's a DMA-engine after all.

> If your platform disables
> 32-bit allocations, then you'll need to set the MSI capabilities flag to
> support a 64-bit allocation for when the 32-bit allocation fails in the host
> controller driver.
> 
> Here is how I'm doing that with the Exynos PCIe driver during probe before
> calling dw_pcie_host_init():
> 
> 	dw_pcie_dbi_ro_wr_en(exynos_pcie->pci);
> 	offset = dw_pcie_find_capability(exynos_pcie->pci, PCI_CAP_ID_MSI);
> 	cap = dw_pcie_readw_dbi(exynos_pcie->pci, offset + PCI_MSI_FLAGS);
> 
> 	/* Enable MSI with 64-bit support */
> 	cap |= PCI_MSI_FLAGS_ENABLE | PCI_MSI_FLAGS_64BIT;
> 	dw_pcie_writew_dbi(exynos_pcie->pci, offset + PCI_MSI_FLAGS, cap);
> 	dw_pcie_dbi_ro_wr_dis(exynos_pcie->pci);

Don't really see how that helps to you. The native MSI capability
isn't used neither in the Exynos driver nor in the DW PCIe driver
core.

> 
> I hope that helps!

Thanks for the suggestion regarding the DMA-mask anyway.

-Sergey

> 
> Regards,
> Will
> 
> > 
> > > +
> > > +	btpci->dw.version = DW_PCIE_VER_460A;
> > > +	btpci->dw.dev = dev;
> > > +	btpci->dw.ops = &bt1_pcie_ops;
> > > +
> > > +	btpci->dw.pp.num_vectors = MAX_MSI_IRQS;
> > > +	btpci->dw.pp.ops = &bt1_pcie_host_ops;
> > > +
> > > +	dw_pcie_cap_set(&btpci->dw, REQ_RES);
> > > +
> > > +	ret = dw_pcie_host_init(&btpci->dw.pp);
> > > +	if (ret)
> > > +		dev_err_probe(dev, ret, "Failed to initialize DWC PCIe host\n");
> > > +
> > > +	return ret;
> > > +}
> > > +
> > > +static void bt1_pcie_del_port(struct bt1_pcie *btpci)
> > > +{
> > > +	dw_pcie_host_deinit(&btpci->dw.pp);
> > > +}
> > > +
> > > +static int bt1_pcie_probe(struct platform_device *pdev)
> > > +{
> > > +	struct bt1_pcie *btpci;
> > > +
> > > +	btpci = bt1_pcie_create_data(pdev);
> > 
> > Do we really need a function for that ? I am not too
> > bothered but I think it is overkill.
> > 
> > Thanks,
> > Lorenzo
> > 
> > > +	if (IS_ERR(btpci))
> > > +		return PTR_ERR(btpci);
> > > +
> > > +	return bt1_pcie_add_port(btpci);
> > > +}
> > > +
> > > +static int bt1_pcie_remove(struct platform_device *pdev)
> > > +{
> > > +	struct bt1_pcie *btpci = platform_get_drvdata(pdev);
> > > +
> > > +	bt1_pcie_del_port(btpci);
> > > +
> > > +	return 0;
> > > +}
> > > +
> > > +static const struct of_device_id bt1_pcie_of_match[] = {
> > > +	{ .compatible = "baikal,bt1-pcie" },
> > > +	{},
> > > +};
> > > +MODULE_DEVICE_TABLE(of, bt1_pcie_of_match);
> > > +
> > > +static struct platform_driver bt1_pcie_driver = {
> > > +	.probe = bt1_pcie_probe,
> > > +	.remove = bt1_pcie_remove,
> > > +	.driver = {
> > > +		.name	= "bt1-pcie",
> > > +		.of_match_table = bt1_pcie_of_match,
> > > +	},
> > > +};
> > > +module_platform_driver(bt1_pcie_driver);
> > > +
> > > +MODULE_AUTHOR("Serge Semin <Sergey.Semin@baikalelectronics.ru>");
> > > +MODULE_DESCRIPTION("Baikal-T1 PCIe driver");
> > > +MODULE_LICENSE("GPL");
> > > -- 
> > > 2.35.1
> > >
Serge Semin Sept. 12, 2022, 12:22 a.m. UTC | #7
On Wed, Aug 31, 2022 at 09:36:46AM +0100, Robin Murphy wrote:
> On 2022-08-29 16:28, Lorenzo Pieralisi wrote:
> [...]
> > > +static int bt1_pcie_add_port(struct bt1_pcie *btpci)
> > > +{
> > > +	struct device *dev = &btpci->pdev->dev;
> > > +	int ret;
> > > +
> > > +	/*
> > > +	 * DW PCIe Root Port controller is equipped with eDMA capable of
> > > +	 * working with the 64-bit memory addresses.
> > > +	 */
> > > +	ret = dma_set_mask_and_coherent(dev, DMA_BIT_MASK(64));
> > > +	if (ret)
> > > +		return ret;
> > 
> > Is this the right place to set the DMA mask for the host controller
> > embedded DMA controller (actually, the dev pointer is the _host_
> > controller device) ?
> > 
> > How this is going to play when combined with:
> > 
> > https://lore.kernel.org/linux-pci/1e63a581-14ae-b4b5-a5bf-ca8f09c33af6@arm.com
> > 
> > It is getting a bit confusing. I believe the code in the link
> > above sets the mask so that through the DMA API we are capable
> > of getting an MSI doorbell virtual address whose physical address
> > can be addressed by the endpoint; this through the DMA API.
> > 
> > This patch is setting the DMA mask for a different reason, namely
> > setting the host controller embedded DMA controller addressing
> > capabilities.
> > 
> > AFAICS - both approaches set the mask for the same device - now
> > the question is about which one is legitimate and how to handle
> > the other.
> 

> Assuming the dw-edma-pcie driver is the relevant one, that already sets its
> own masks on its own device, so I also don't see why this is here.

No. dw-edma-pcie has nothing to do with this driver.

-Sergey

> 
> Thanks,
> Robin.
Serge Semin Sept. 12, 2022, 12:25 a.m. UTC | #8
On Wed, Aug 31, 2022 at 09:54:14AM +0100, Robin Murphy wrote:
> On 2022-08-31 09:36, Robin Murphy wrote:
> > On 2022-08-29 16:28, Lorenzo Pieralisi wrote:
> > [...]
> > > > +static int bt1_pcie_add_port(struct bt1_pcie *btpci)
> > > > +{
> > > > +    struct device *dev = &btpci->pdev->dev;
> > > > +    int ret;
> > > > +
> > > > +    /*
> > > > +     * DW PCIe Root Port controller is equipped with eDMA capable of
> > > > +     * working with the 64-bit memory addresses.
> > > > +     */
> > > > +    ret = dma_set_mask_and_coherent(dev, DMA_BIT_MASK(64));
> > > > +    if (ret)
> > > > +        return ret;
> > > 
> > > Is this the right place to set the DMA mask for the host controller
> > > embedded DMA controller (actually, the dev pointer is the _host_
> > > controller device) ?
> > > 
> > > How this is going to play when combined with:
> > > 
> > > https://lore.kernel.org/linux-pci/1e63a581-14ae-b4b5-a5bf-ca8f09c33af6@arm.com
> > > 
> > > It is getting a bit confusing. I believe the code in the link
> > > above sets the mask so that through the DMA API we are capable
> > > of getting an MSI doorbell virtual address whose physical address
> > > can be addressed by the endpoint; this through the DMA API.
> > > 
> > > This patch is setting the DMA mask for a different reason, namely
> > > setting the host controller embedded DMA controller addressing
> > > capabilities.
> > > 
> > > AFAICS - both approaches set the mask for the same device - now
> > > the question is about which one is legitimate and how to handle
> > > the other.
> > 
> > Assuming the dw-edma-pcie driver is the relevant one, that already sets
> > its own masks on its own device, so I also don't see why this is here.
> 

> Ah, I just found the patch at [1], which further implies that this is indeed
> completely bogus.

Really? Elaborate please. What you said in the comment to that patch
has nothing to do with the change you comment here.

-Sergey

> 
> Robin.
> 
> [1] https://lore.kernel.org/dmaengine/20220822185332.26149-23-Sergey.Semin@baikalelectronics.ru/
Lorenzo Pieralisi Sept. 17, 2022, 10:44 a.m. UTC | #9
On Mon, Sep 12, 2022 at 03:02:11AM +0300, Serge Semin wrote:

[...]

> I prefer splitting the probe method up into a set of small and
> coherent methods. It IMO improves the code readability for just no
> price since the compiler will embed the single-time used static
> methods anyway.

I will get back to this thread at -rc7 - we will decide a merge
strategy then.

Lorenzo

> -Sergey
> 
> > 
> > Thanks,
> > Lorenzo
> > 
> > > +	if (IS_ERR(btpci))
> > > +		return PTR_ERR(btpci);
> > > +
> > > +	return bt1_pcie_add_port(btpci);
> > > +}
> > > +
> > > +static int bt1_pcie_remove(struct platform_device *pdev)
> > > +{
> > > +	struct bt1_pcie *btpci = platform_get_drvdata(pdev);
> > > +
> > > +	bt1_pcie_del_port(btpci);
> > > +
> > > +	return 0;
> > > +}
> > > +
> > > +static const struct of_device_id bt1_pcie_of_match[] = {
> > > +	{ .compatible = "baikal,bt1-pcie" },
> > > +	{},
> > > +};
> > > +MODULE_DEVICE_TABLE(of, bt1_pcie_of_match);
> > > +
> > > +static struct platform_driver bt1_pcie_driver = {
> > > +	.probe = bt1_pcie_probe,
> > > +	.remove = bt1_pcie_remove,
> > > +	.driver = {
> > > +		.name	= "bt1-pcie",
> > > +		.of_match_table = bt1_pcie_of_match,
> > > +	},
> > > +};
> > > +module_platform_driver(bt1_pcie_driver);
> > > +
> > > +MODULE_AUTHOR("Serge Semin <Sergey.Semin@baikalelectronics.ru>");
> > > +MODULE_DESCRIPTION("Baikal-T1 PCIe driver");
> > > +MODULE_LICENSE("GPL");
> > > -- 
> > > 2.35.1
> > >
Lorenzo Pieralisi Sept. 26, 2022, 10:17 a.m. UTC | #10
On Mon, Sep 12, 2022 at 03:02:11AM +0300, Serge Semin wrote:

[...]

> > > +/*
> > > + * Baikal-T1 MMIO space must be read/written by the dword-aligned
> > > + * instructions. Note the methods are optimized to have the dword operations
> > > + * performed with minimum overhead as the most frequently used ones.
> > > + */
> > > +static int bt1_pcie_read_mmio(void __iomem *addr, int size, u32 *val)
> > > +{
> > > +	unsigned int ofs = (uintptr_t)addr & 0x3;
> > > +
> > > +	if (!IS_ALIGNED((uintptr_t)addr, size))
> > > +		return -EINVAL;
> > > +
> > > +	*val = readl(addr - ofs) >> ofs * BITS_PER_BYTE;
> > 
> 
> > Is it always safe to read more than requested ?
> 
> This question is kind of contradicting. No matter whether it's safe or
> not we just can't perform the IOs with size other than of the dword
> size. Doing otherwise will cause the bus access error.

It is not contradicting. You are reading more than the requested
size, which can have side effects.

I understand there is no other way around it - still it would be good
to understand whether that can compromise the driver functionality.

> > > +	if (size == 4) {
> > > +		return 0;
> > > +	} else if (size == 2) {
> > > +		*val &= 0xffff;
> > > +		return 0;
> > > +	} else if (size == 1) {
> > > +		*val &= 0xff;
> > > +		return 0;
> > > +	}
> > > +
> > > +	return -EINVAL;
> > > +}
> > > +
> > > +static int bt1_pcie_write_mmio(void __iomem *addr, int size, u32 val)
> > > +{
> > > +	unsigned int ofs = (uintptr_t)addr & 0x3;
> > > +	u32 tmp, mask;
> > > +
> > > +	if (!IS_ALIGNED((uintptr_t)addr, size))
> > > +		return -EINVAL;
> > > +
> > > +	if (size == 4) {
> > > +		writel(val, addr);
> > > +		return 0;
> > > +	} else if (size == 2 || size == 1) {
> > > +		mask = GENMASK(size * BITS_PER_BYTE - 1, 0);
> > > +		tmp = readl(addr - ofs) & ~(mask << ofs * BITS_PER_BYTE);
> > > +		tmp |= (val & mask) << ofs * BITS_PER_BYTE;
> > > +		writel(tmp, addr - ofs);
> > > +		return 0;
> > > +	}
> > 
> 
> > Same question read/modify/write, is it always safe to do it
> > regardless of size ?
> 
> ditto

See above.

> > 
> > > +
> > > +	return -EINVAL;
> > > +}
> > > +
> > > +static u32 bt1_pcie_read_dbi(struct dw_pcie *pci, void __iomem *base, u32 reg,
> > > +			     size_t size)
> > > +{
> > > +	int ret;
> > > +	u32 val;
> > > +
> > > +	ret = bt1_pcie_read_mmio(base + reg, size, &val);
> > > +	if (ret) {
> > > +		dev_err(pci->dev, "Read DBI address failed\n");
> > > +		return ~0U;
> > 
> 
> > Is this a special magic value the DWC core is expecting ?
> > 
> > Does it clash with a _valid_ return value ?
> 
> It's a normal return value if the PCIe IO wasn't successful.

I don't understand what you mean sorry. I understand you want to log
the error - what I don't get is why you change val to ~0U - why ~0U
and to what use, the function reading dbi can't use that value to
detect an error anyway, it would read whatever value is returned by
this function - regardless of the error condition.

> In this particular case there is no actual PCIe-bus IO though, but
> there are conditions which can cause the errors. So the error status
> is still sanity checked. This part was already commented by Rob here:
> https://lore.kernel.org/linux-pci/20220615171045.GD1413880-robh@kernel.org/
> my response was:
> https://lore.kernel.org/linux-pci/20220619203904.h7q2eb7e4ctsujsk@mobilestation/
> 
> > 
> > > +	}
> > > +
> > > +	return val;
> > > +}
> > > +
> > > +static void bt1_pcie_write_dbi(struct dw_pcie *pci, void __iomem *base, u32 reg,
> > > +			       size_t size, u32 val)
> > > +{
> > > +	int ret;
> > > +
> > > +	ret = bt1_pcie_write_mmio(base + reg, size, val);
> > > +	if (ret)
> > > +		dev_err(pci->dev, "Write DBI address failed\n");
> > > +}
> > > +
> > > +static void bt1_pcie_write_dbi2(struct dw_pcie *pci, void __iomem *base, u32 reg,
> > > +				size_t size, u32 val)
> > > +{
> > > +	struct bt1_pcie *btpci = to_bt1_pcie(pci);
> > > +	int ret;
> > > +
> > > +	regmap_update_bits(btpci->sys_regs, BT1_CCU_PCIE_GENC,
> > > +			   BT1_CCU_PCIE_DBI2_MODE, BT1_CCU_PCIE_DBI2_MODE);
> > > +
> > > +	ret = bt1_pcie_write_mmio(base + reg, size, val);
> > > +	if (ret)
> > > +		dev_err(pci->dev, "Write DBI2 address failed\n");
> > > +
> > > +	regmap_update_bits(btpci->sys_regs, BT1_CCU_PCIE_GENC,
> > > +			   BT1_CCU_PCIE_DBI2_MODE, 0);
> > 
> 
> > IIUC the regmap_update_bits() set up decoding for DBI2.
> 
> Right and then switches it back off.
> 
> > Hopefully the
> > DBI/DBI2 writes are sequentialized, this is a question valid also
> > for other DWC controllers.
> 
> In general you are right, but not in particular case of the DW PCIe
> Root Ports. So the concurrent access to DBI and DBI2 won't cause any
> problem.
> 
> > 
> > What I want to say is, the regmap update in this function sets the
> > DWC HW in a way that can decode DBI2 (please correct me if I am wrong),
> 
> Right.
> 
> > between the two _update_bits() no DBI access should happen because
> > it just would not work.
> 
> No. Because in case of the DW PCIe Root Ports, DBI and DBI2 are almost
> identical. The difference is only in two CSR fields which turn to be
> R/W in DBI2 instead of being RO in DBI. Other than that the DBI and
> DBI2 spaces are identical. That's why we don't need any software-based
> synchronization between the DBI/DBI2 accesses.
> 
> Moreover we won't need to worry about the synchronisation at all if
> DBI2 is mapped via a separate reg-space (see dw_pcie.dbi_base2 field)
> because any concurrency is resolved behind the scene by means of the
> DBI bus HW implementation.
> 
> > 
> > It is a question.
> 
> The situation gets to be more complex in case of DW PCIe End-points
> because some of the DBI CSRs change semantics in DBI2. At the very
> least it concerns the TYPE0_HDR.{BAR0-BAR5} registers, which determine
> the corresponding BARx size and whether it is enabled in DBI2 (see the
> reset_bar() and set_bar() methods implementation in
> drivers/pci/controller/dwc/pcie-designware-ep.c). But my controller is
> the Root Port controller, so the denoted peculiarity doesn't concern
> it.
> 
> > 
> > > +static int bt1_pcie_host_init(struct dw_pcie_rp *pp)
> > > +{
> > > +	struct dw_pcie *pci = to_dw_pcie_from_pp(pp);
> > > +	struct bt1_pcie *btpci = to_bt1_pcie(pci);
> > > +	int ret;
> > > +
> > > +	ret = bt1_pcie_get_resources(btpci);
> > > +	if (ret)
> > > +		return ret;
> > > +
> > > +	bt1_pcie_full_stop_bus(btpci, true);
> > > +
> > > +	return bt1_pcie_cold_start_bus(btpci);
> > > +}
> > > +
> > > +static void bt1_pcie_host_deinit(struct dw_pcie_rp *pp)
> > > +{
> > > +	struct dw_pcie *pci = to_dw_pcie_from_pp(pp);
> > > +	struct bt1_pcie *btpci = to_bt1_pcie(pci);
> > > +
> > > +	bt1_pcie_full_stop_bus(btpci, false);
> > > +}
> > > +
> > > +static const struct dw_pcie_host_ops bt1_pcie_host_ops = {
> > > +	.host_init = bt1_pcie_host_init,
> > > +	.host_deinit = bt1_pcie_host_deinit,
> > > +};
> > > +
> > > +static struct bt1_pcie *bt1_pcie_create_data(struct platform_device *pdev)
> > > +{
> > > +	struct bt1_pcie *btpci;
> > > +
> > > +	btpci = devm_kzalloc(&pdev->dev, sizeof(*btpci), GFP_KERNEL);
> > > +	if (!btpci)
> > > +		return ERR_PTR(-ENOMEM);
> > > +
> > > +	btpci->pdev = pdev;
> > > +
> > > +	platform_set_drvdata(pdev, btpci);
> > > +
> > > +	return btpci;
> > > +}
> > > +
> > > +static int bt1_pcie_add_port(struct bt1_pcie *btpci)
> > > +{
> > > +	struct device *dev = &btpci->pdev->dev;
> > > +	int ret;
> > > +
> > > +	/*
> > > +	 * DW PCIe Root Port controller is equipped with eDMA capable of
> > > +	 * working with the 64-bit memory addresses.
> > > +	 */
> > > +	ret = dma_set_mask_and_coherent(dev, DMA_BIT_MASK(64));
> > > +	if (ret)
> > > +		return ret;
> > 
> 
> > Is this the right place to set the DMA mask for the host controller
> > embedded DMA controller (actually, the dev pointer is the _host_
> > controller device) ?
> 
> Yes it's. The DMA controller is embedded into the PCIe Root Port
> controller. It CSRs are accessed via either the same CSR space or via
> a separate space but synchronously clocked by the same clock source
> (it's called unrolled iATU/eDMA mode). The memory range the
> controller is capable to reach is platform specific. So the glue
> driver is the best place to set the device DMA-mask. (For instance the
> DW PCIe master AXI-bus width is selected by means of the
> MASTER_BUS_ADDR_WIDTH parameter of the DW PCIe IP-core.)

I need to defer this question to Robin - I think the DMA mask for the
DMA controller device should be set in the respective device driver
(which isn't the host controller driver).

> > How this is going to play when combined with:
> > 
> > https://lore.kernel.org/linux-pci/1e63a581-14ae-b4b5-a5bf-ca8f09c33af6@arm.com
> > 
> > It is getting a bit confusing. I believe the code in the link
> > above sets the mask so that through the DMA API we are capable
> > of getting an MSI doorbell virtual address whose physical address
> > can be addressed by the endpoint; this through the DMA API.
> 
> I don't really understand why the code in the link above tries to
> analyze the MSI capability of the DW PCIe Root Port in the framework
> of the dw_pcie_msi_host_init() method. The method utilizes the iMSI-RX
> engine which is specific to the DW PCIe AXI-bus controller
> implementation. It has nothing to do with the PCIe MSI capability
> normally available over the standard PCIe config space.
> 
> As Rob correctly noted here
> https://lore.kernel.org/all/CAL_JsqJh=d-B51b6yPBRq0tOwbChN=AFPr-a19U1QdQZAE7c1A@mail.gmail.com
> MSI TLPs never reaches the system memory. (But I would add that this
> only concerns the iMSI-RX engine.) So no matter which memory
> allocated and where, the only thing that matters is the PCIe-bus
> address specified to the PCIE_MSI_ADDR_LO and PCIE_MSI_ADDR_HI CSRs,
> which are the DW PCIe-specific and both are always available thus
> supporting the 64-bit messages in any case. So if we had a way to just
> reserve a PCIe-bus address range which at the same time wouldn't have
> a system memory behind, we could have used the reserved range to
> initialize the iMSI-RX MSI-address without need to allocate any
> DMA-able memory at all. That's why the commit 07940c369a6b ("PCI: dwc:
> Fix MSI page leakage in suspend/resume") was absolutely correct.

Again - I would appreciate if Will/Robin can comment on this given
that it is down to DWC controller internals and their relation
with the DMA core layer.

Thanks,
Lorenzo

> > This patch is setting the DMA mask for a different reason, namely
> > setting the host controller embedded DMA controller addressing
> > capabilities.
> 
> AFAIU what is done in that patch is incorrect.
> 
> > 
> > AFAICS - both approaches set the mask for the same device - now
> > the question is about which one is legitimate and how to handle
> > the other.
> 
> That's simple. Mine is legitimate for sure. Another one isn't.
> 
> > 
> > > +
> > > +	btpci->dw.version = DW_PCIE_VER_460A;
> > > +	btpci->dw.dev = dev;
> > > +	btpci->dw.ops = &bt1_pcie_ops;
> > > +
> > > +	btpci->dw.pp.num_vectors = MAX_MSI_IRQS;
> > > +	btpci->dw.pp.ops = &bt1_pcie_host_ops;
> > > +
> > > +	dw_pcie_cap_set(&btpci->dw, REQ_RES);
> > > +
> > > +	ret = dw_pcie_host_init(&btpci->dw.pp);
> > > +	if (ret)
> > > +		dev_err_probe(dev, ret, "Failed to initialize DWC PCIe host\n");
> > > +
> > > +	return ret;
> > > +}
> > > +
> > > +static void bt1_pcie_del_port(struct bt1_pcie *btpci)
> > > +{
> > > +	dw_pcie_host_deinit(&btpci->dw.pp);
> > > +}
> > > +
> > > +static int bt1_pcie_probe(struct platform_device *pdev)
> > > +{
> > > +	struct bt1_pcie *btpci;
> > > +
> > > +	btpci = bt1_pcie_create_data(pdev);
> > 
> 
> > Do we really need a function for that ? I am not too
> > bothered but I think it is overkill.
> 
> I prefer splitting the probe method up into a set of small and
> coherent methods. It IMO improves the code readability for just no
> price since the compiler will embed the single-time used static
> methods anyway.
> 
> -Sergey
> 
> > 
> > Thanks,
> > Lorenzo
> > 
> > > +	if (IS_ERR(btpci))
> > > +		return PTR_ERR(btpci);
> > > +
> > > +	return bt1_pcie_add_port(btpci);
> > > +}
> > > +
> > > +static int bt1_pcie_remove(struct platform_device *pdev)
> > > +{
> > > +	struct bt1_pcie *btpci = platform_get_drvdata(pdev);
> > > +
> > > +	bt1_pcie_del_port(btpci);
> > > +
> > > +	return 0;
> > > +}
> > > +
> > > +static const struct of_device_id bt1_pcie_of_match[] = {
> > > +	{ .compatible = "baikal,bt1-pcie" },
> > > +	{},
> > > +};
> > > +MODULE_DEVICE_TABLE(of, bt1_pcie_of_match);
> > > +
> > > +static struct platform_driver bt1_pcie_driver = {
> > > +	.probe = bt1_pcie_probe,
> > > +	.remove = bt1_pcie_remove,
> > > +	.driver = {
> > > +		.name	= "bt1-pcie",
> > > +		.of_match_table = bt1_pcie_of_match,
> > > +	},
> > > +};
> > > +module_platform_driver(bt1_pcie_driver);
> > > +
> > > +MODULE_AUTHOR("Serge Semin <Sergey.Semin@baikalelectronics.ru>");
> > > +MODULE_DESCRIPTION("Baikal-T1 PCIe driver");
> > > +MODULE_LICENSE("GPL");
> > > -- 
> > > 2.35.1
> > >
Serge Semin Sept. 26, 2022, 12:49 p.m. UTC | #11
@Christoph, @Marek, @Bjorn, @Rob could you please join to the
DMA-mask related discussion. @Lorenzo can't decide which driver should
initialize the device DMA-mask.

On Mon, Sep 26, 2022 at 12:17:27PM +0200, Lorenzo Pieralisi wrote:
> On Mon, Sep 12, 2022 at 03:02:11AM +0300, Serge Semin wrote:
> 
> [...]
> 
> > > > +/*
> > > > + * Baikal-T1 MMIO space must be read/written by the dword-aligned
> > > > + * instructions. Note the methods are optimized to have the dword operations
> > > > + * performed with minimum overhead as the most frequently used ones.
> > > > + */
> > > > +static int bt1_pcie_read_mmio(void __iomem *addr, int size, u32 *val)
> > > > +{
> > > > +	unsigned int ofs = (uintptr_t)addr & 0x3;
> > > > +
> > > > +	if (!IS_ALIGNED((uintptr_t)addr, size))
> > > > +		return -EINVAL;
> > > > +
> > > > +	*val = readl(addr - ofs) >> ofs * BITS_PER_BYTE;
> > > 
> > 
> > > Is it always safe to read more than requested ?
> > 
> > This question is kind of contradicting. No matter whether it's safe or
> > not we just can't perform the IOs with size other than of the dword
> > size. Doing otherwise will cause the bus access error.
> 

> It is not contradicting. You are reading more than the requested
> size, which can have side effects.
> 
> I understand there is no other way around it - still it would be good
> to understand whether that can compromise the driver functionality.

In the framework of the current DW PCIe driver functionality it's
safe. Moreover all the DW PCIe Port-Logic CSRs are of the DWORD
size anyway.

> 
> > > > +	if (size == 4) {
> > > > +		return 0;
> > > > +	} else if (size == 2) {
> > > > +		*val &= 0xffff;
> > > > +		return 0;
> > > > +	} else if (size == 1) {
> > > > +		*val &= 0xff;
> > > > +		return 0;
> > > > +	}
> > > > +
> > > > +	return -EINVAL;
> > > > +}
> > > > +
> > > > +static int bt1_pcie_write_mmio(void __iomem *addr, int size, u32 val)
> > > > +{
> > > > +	unsigned int ofs = (uintptr_t)addr & 0x3;
> > > > +	u32 tmp, mask;
> > > > +
> > > > +	if (!IS_ALIGNED((uintptr_t)addr, size))
> > > > +		return -EINVAL;
> > > > +
> > > > +	if (size == 4) {
> > > > +		writel(val, addr);
> > > > +		return 0;
> > > > +	} else if (size == 2 || size == 1) {
> > > > +		mask = GENMASK(size * BITS_PER_BYTE - 1, 0);
> > > > +		tmp = readl(addr - ofs) & ~(mask << ofs * BITS_PER_BYTE);
> > > > +		tmp |= (val & mask) << ofs * BITS_PER_BYTE;
> > > > +		writel(tmp, addr - ofs);
> > > > +		return 0;
> > > > +	}
> > > 
> > 
> > > Same question read/modify/write, is it always safe to do it
> > > regardless of size ?
> > 
> > ditto
> 
> See above.

The same answer.

> 
> > > 
> > > > +
> > > > +	return -EINVAL;
> > > > +}
> > > > +
> > > > +static u32 bt1_pcie_read_dbi(struct dw_pcie *pci, void __iomem *base, u32 reg,
> > > > +			     size_t size)
> > > > +{
> > > > +	int ret;
> > > > +	u32 val;
> > > > +
> > > > +	ret = bt1_pcie_read_mmio(base + reg, size, &val);
> > > > +	if (ret) {
> > > > +		dev_err(pci->dev, "Read DBI address failed\n");
> > > > +		return ~0U;
> > > 
> > 
> > > Is this a special magic value the DWC core is expecting ?
> > > 
> > > Does it clash with a _valid_ return value ?
> > 
> > It's a normal return value if the PCIe IO wasn't successful.
> 

> I don't understand what you mean sorry. I understand you want to log
> the error - what I don't get is why you change val to ~0U - why ~0U
> and to what use, the function reading dbi can't use that value to
> detect an error anyway, it would read whatever value is returned by
> this function - regardless of the error condition.

Consider the returned FFs as a unsupported request returned for TLPs
sent to invalid or unreachable PCIe device. Though in this particular
case there is no actual TLP emitted, but just an MMIO operation
performed. Why is that needed? Mainly to somehow indicate a
malfunction access. It's also needed to detect the DW PCIe core
inconsistencies like requesting the IOs of the unsupported sizes or
from the size-unaligned addresses. See the way the dw_pcie_write() and
dw_pcie_read() are implemented in the DW PCIe core driver. The only
difference is that the later methods return garbage in case of the
errors and I don't use the PCIBIOS_SUCCESSFUL and
PCIBIOS_BAD_REGISTER_NUMBER macros in my driver by the @Rob earlier
request.

> 
> > In this particular case there is no actual PCIe-bus IO though, but
> > there are conditions which can cause the errors. So the error status
> > is still sanity checked. This part was already commented by Rob here:
> > https://lore.kernel.org/linux-pci/20220615171045.GD1413880-robh@kernel.org/
> > my response was:
> > https://lore.kernel.org/linux-pci/20220619203904.h7q2eb7e4ctsujsk@mobilestation/
> > 
> > > 
> > > > +	}
> > > > +
> > > > +	return val;
> > > > +}
> > > > +
> > > > +static void bt1_pcie_write_dbi(struct dw_pcie *pci, void __iomem *base, u32 reg,
> > > > +			       size_t size, u32 val)
> > > > +{
> > > > +	int ret;
> > > > +
> > > > +	ret = bt1_pcie_write_mmio(base + reg, size, val);
> > > > +	if (ret)
> > > > +		dev_err(pci->dev, "Write DBI address failed\n");
> > > > +}
> > > > +
> > > > +static void bt1_pcie_write_dbi2(struct dw_pcie *pci, void __iomem *base, u32 reg,
> > > > +				size_t size, u32 val)
> > > > +{
> > > > +	struct bt1_pcie *btpci = to_bt1_pcie(pci);
> > > > +	int ret;
> > > > +
> > > > +	regmap_update_bits(btpci->sys_regs, BT1_CCU_PCIE_GENC,
> > > > +			   BT1_CCU_PCIE_DBI2_MODE, BT1_CCU_PCIE_DBI2_MODE);
> > > > +
> > > > +	ret = bt1_pcie_write_mmio(base + reg, size, val);
> > > > +	if (ret)
> > > > +		dev_err(pci->dev, "Write DBI2 address failed\n");
> > > > +
> > > > +	regmap_update_bits(btpci->sys_regs, BT1_CCU_PCIE_GENC,
> > > > +			   BT1_CCU_PCIE_DBI2_MODE, 0);
> > > 
> > 
> > > IIUC the regmap_update_bits() set up decoding for DBI2.
> > 
> > Right and then switches it back off.
> > 
> > > Hopefully the
> > > DBI/DBI2 writes are sequentialized, this is a question valid also
> > > for other DWC controllers.
> > 
> > In general you are right, but not in particular case of the DW PCIe
> > Root Ports. So the concurrent access to DBI and DBI2 won't cause any
> > problem.
> > 
> > > 
> > > What I want to say is, the regmap update in this function sets the
> > > DWC HW in a way that can decode DBI2 (please correct me if I am wrong),
> > 
> > Right.
> > 
> > > between the two _update_bits() no DBI access should happen because
> > > it just would not work.
> > 
> > No. Because in case of the DW PCIe Root Ports, DBI and DBI2 are almost
> > identical. The difference is only in two CSR fields which turn to be
> > R/W in DBI2 instead of being RO in DBI. Other than that the DBI and
> > DBI2 spaces are identical. That's why we don't need any software-based
> > synchronization between the DBI/DBI2 accesses.
> > 
> > Moreover we won't need to worry about the synchronisation at all if
> > DBI2 is mapped via a separate reg-space (see dw_pcie.dbi_base2 field)
> > because any concurrency is resolved behind the scene by means of the
> > DBI bus HW implementation.
> > 
> > > 
> > > It is a question.
> > 
> > The situation gets to be more complex in case of DW PCIe End-points
> > because some of the DBI CSRs change semantics in DBI2. At the very
> > least it concerns the TYPE0_HDR.{BAR0-BAR5} registers, which determine
> > the corresponding BARx size and whether it is enabled in DBI2 (see the
> > reset_bar() and set_bar() methods implementation in
> > drivers/pci/controller/dwc/pcie-designware-ep.c). But my controller is
> > the Root Port controller, so the denoted peculiarity doesn't concern
> > it.
> > 
> > > 
> > > > +static int bt1_pcie_host_init(struct dw_pcie_rp *pp)
> > > > +{
> > > > +	struct dw_pcie *pci = to_dw_pcie_from_pp(pp);
> > > > +	struct bt1_pcie *btpci = to_bt1_pcie(pci);
> > > > +	int ret;
> > > > +
> > > > +	ret = bt1_pcie_get_resources(btpci);
> > > > +	if (ret)
> > > > +		return ret;
> > > > +
> > > > +	bt1_pcie_full_stop_bus(btpci, true);
> > > > +
> > > > +	return bt1_pcie_cold_start_bus(btpci);
> > > > +}
> > > > +
> > > > +static void bt1_pcie_host_deinit(struct dw_pcie_rp *pp)
> > > > +{
> > > > +	struct dw_pcie *pci = to_dw_pcie_from_pp(pp);
> > > > +	struct bt1_pcie *btpci = to_bt1_pcie(pci);
> > > > +
> > > > +	bt1_pcie_full_stop_bus(btpci, false);
> > > > +}
> > > > +
> > > > +static const struct dw_pcie_host_ops bt1_pcie_host_ops = {
> > > > +	.host_init = bt1_pcie_host_init,
> > > > +	.host_deinit = bt1_pcie_host_deinit,
> > > > +};
> > > > +
> > > > +static struct bt1_pcie *bt1_pcie_create_data(struct platform_device *pdev)
> > > > +{
> > > > +	struct bt1_pcie *btpci;
> > > > +
> > > > +	btpci = devm_kzalloc(&pdev->dev, sizeof(*btpci), GFP_KERNEL);
> > > > +	if (!btpci)
> > > > +		return ERR_PTR(-ENOMEM);
> > > > +
> > > > +	btpci->pdev = pdev;
> > > > +
> > > > +	platform_set_drvdata(pdev, btpci);
> > > > +
> > > > +	return btpci;
> > > > +}
> > > > +
> > > > +static int bt1_pcie_add_port(struct bt1_pcie *btpci)
> > > > +{
> > > > +	struct device *dev = &btpci->pdev->dev;
> > > > +	int ret;
> > > > +
> > > > +	/*
> > > > +	 * DW PCIe Root Port controller is equipped with eDMA capable of
> > > > +	 * working with the 64-bit memory addresses.
> > > > +	 */
> > > > +	ret = dma_set_mask_and_coherent(dev, DMA_BIT_MASK(64));
> > > > +	if (ret)
> > > > +		return ret;
> > > 
> > 
> > > Is this the right place to set the DMA mask for the host controller
> > > embedded DMA controller (actually, the dev pointer is the _host_
> > > controller device) ?
> > 
> > Yes it's. The DMA controller is embedded into the PCIe Root Port
> > controller. It CSRs are accessed via either the same CSR space or via
> > a separate space but synchronously clocked by the same clock source
> > (it's called unrolled iATU/eDMA mode). The memory range the
> > controller is capable to reach is platform specific. So the glue
> > driver is the best place to set the device DMA-mask. (For instance the
> > DW PCIe master AXI-bus width is selected by means of the
> > MASTER_BUS_ADDR_WIDTH parameter of the DW PCIe IP-core.)
> 

> I need to defer this question to Robin - I think the DMA mask for the
> DMA controller device should be set in the respective device driver
> (which isn't the host controller driver).

Which device driver? This driver is a vendor-specific DW PCIe
controller driver. It calls a function to register the PCIe host
device based on the DW PCIe platform device, which in its turn
registers the DMA-engine device based on the same platform device. The
platform device turns to be parental for both of them. It's used to
properly map the memory ranges for both of these devices. Anyway if
you say that some other device driver is supposed to initialize the
mask why on earth the pcie-designware-host.c driver sets the DMA mask
to the platform device then?

Note DMA-mask indicates the device DMA capability. It isn't something what
a generic driver can always detect especially in case of a platform
device. So the corresponding LLDD is the best place for it to be set
or overwritten anyway.

> 
> > > How this is going to play when combined with:
> > > 
> > > https://lore.kernel.org/linux-pci/1e63a581-14ae-b4b5-a5bf-ca8f09c33af6@arm.com
> > > 
> > > It is getting a bit confusing. I believe the code in the link
> > > above sets the mask so that through the DMA API we are capable
> > > of getting an MSI doorbell virtual address whose physical address
> > > can be addressed by the endpoint; this through the DMA API.
> > 
> > I don't really understand why the code in the link above tries to
> > analyze the MSI capability of the DW PCIe Root Port in the framework
> > of the dw_pcie_msi_host_init() method. The method utilizes the iMSI-RX
> > engine which is specific to the DW PCIe AXI-bus controller
> > implementation. It has nothing to do with the PCIe MSI capability
> > normally available over the standard PCIe config space.
> > 
> > As Rob correctly noted here
> > https://lore.kernel.org/all/CAL_JsqJh=d-B51b6yPBRq0tOwbChN=AFPr-a19U1QdQZAE7c1A@mail.gmail.com
> > MSI TLPs never reaches the system memory. (But I would add that this
> > only concerns the iMSI-RX engine.) So no matter which memory
> > allocated and where, the only thing that matters is the PCIe-bus
> > address specified to the PCIE_MSI_ADDR_LO and PCIE_MSI_ADDR_HI CSRs,
> > which are the DW PCIe-specific and both are always available thus
> > supporting the 64-bit messages in any case. So if we had a way to just
> > reserve a PCIe-bus address range which at the same time wouldn't have
> > a system memory behind, we could have used the reserved range to
> > initialize the iMSI-RX MSI-address without need to allocate any
> > DMA-able memory at all. That's why the commit 07940c369a6b ("PCI: dwc:
> > Fix MSI page leakage in suspend/resume") was absolutely correct.
> 

> Again - I would appreciate if Will/Robin can comment on this given
> that it is down to DWC controller internals and their relation
> with the DMA core layer.

Again what William suggested was not that much useful. The only thing
which can and should be done based on his message is dropping the
dma_set_mask() method call from dw_pcie_msi_host_init() since the mask
is set to 32-bits by default anyway. Moreover as I said to William in
my response earlier having the DMA mask explicitly set to 32-bits for
all DW PCIe devices may cause the DMA performance degradation since in
that case the DMA above 4GB will be either bounce buffered to the
lower addresses or will fail on the memory mapping stage. But that is
a subject of another fixup-patch. Meanwhile this patch sets the
DMA-mask in accordance with the platform device capability no matter
what part of it performs DMA (Note it can be not only embedded eDMA,
but inbound iATU too).

-Sergey

> 
> Thanks,
> Lorenzo
> 
> > > This patch is setting the DMA mask for a different reason, namely
> > > setting the host controller embedded DMA controller addressing
> > > capabilities.
> > 
> > AFAIU what is done in that patch is incorrect.
> > 
> > > 
> > > AFAICS - both approaches set the mask for the same device - now
> > > the question is about which one is legitimate and how to handle
> > > the other.
> > 
> > That's simple. Mine is legitimate for sure. Another one isn't.
> > 
> > > 
> > > > +
> > > > +	btpci->dw.version = DW_PCIE_VER_460A;
> > > > +	btpci->dw.dev = dev;
> > > > +	btpci->dw.ops = &bt1_pcie_ops;
> > > > +
> > > > +	btpci->dw.pp.num_vectors = MAX_MSI_IRQS;
> > > > +	btpci->dw.pp.ops = &bt1_pcie_host_ops;
> > > > +
> > > > +	dw_pcie_cap_set(&btpci->dw, REQ_RES);
> > > > +
> > > > +	ret = dw_pcie_host_init(&btpci->dw.pp);
> > > > +	if (ret)
> > > > +		dev_err_probe(dev, ret, "Failed to initialize DWC PCIe host\n");
> > > > +
> > > > +	return ret;
> > > > +}
> > > > +
> > > > +static void bt1_pcie_del_port(struct bt1_pcie *btpci)
> > > > +{
> > > > +	dw_pcie_host_deinit(&btpci->dw.pp);
> > > > +}
> > > > +
> > > > +static int bt1_pcie_probe(struct platform_device *pdev)
> > > > +{
> > > > +	struct bt1_pcie *btpci;
> > > > +
> > > > +	btpci = bt1_pcie_create_data(pdev);
> > > 
> > 
> > > Do we really need a function for that ? I am not too
> > > bothered but I think it is overkill.
> > 
> > I prefer splitting the probe method up into a set of small and
> > coherent methods. It IMO improves the code readability for just no
> > price since the compiler will embed the single-time used static
> > methods anyway.
> > 
> > -Sergey
> > 
> > > 
> > > Thanks,
> > > Lorenzo
> > > 
> > > > +	if (IS_ERR(btpci))
> > > > +		return PTR_ERR(btpci);
> > > > +
> > > > +	return bt1_pcie_add_port(btpci);
> > > > +}
> > > > +
> > > > +static int bt1_pcie_remove(struct platform_device *pdev)
> > > > +{
> > > > +	struct bt1_pcie *btpci = platform_get_drvdata(pdev);
> > > > +
> > > > +	bt1_pcie_del_port(btpci);
> > > > +
> > > > +	return 0;
> > > > +}
> > > > +
> > > > +static const struct of_device_id bt1_pcie_of_match[] = {
> > > > +	{ .compatible = "baikal,bt1-pcie" },
> > > > +	{},
> > > > +};
> > > > +MODULE_DEVICE_TABLE(of, bt1_pcie_of_match);
> > > > +
> > > > +static struct platform_driver bt1_pcie_driver = {
> > > > +	.probe = bt1_pcie_probe,
> > > > +	.remove = bt1_pcie_remove,
> > > > +	.driver = {
> > > > +		.name	= "bt1-pcie",
> > > > +		.of_match_table = bt1_pcie_of_match,
> > > > +	},
> > > > +};
> > > > +module_platform_driver(bt1_pcie_driver);
> > > > +
> > > > +MODULE_AUTHOR("Serge Semin <Sergey.Semin@baikalelectronics.ru>");
> > > > +MODULE_DESCRIPTION("Baikal-T1 PCIe driver");
> > > > +MODULE_LICENSE("GPL");
> > > > -- 
> > > > 2.35.1
> > > >
Robin Murphy Sept. 26, 2022, 1:09 p.m. UTC | #12
On 2022-09-12 01:25, Serge Semin wrote:
> On Wed, Aug 31, 2022 at 09:54:14AM +0100, Robin Murphy wrote:
>> On 2022-08-31 09:36, Robin Murphy wrote:
>>> On 2022-08-29 16:28, Lorenzo Pieralisi wrote:
>>> [...]
>>>>> +static int bt1_pcie_add_port(struct bt1_pcie *btpci)
>>>>> +{
>>>>> +��� struct device *dev = &btpci->pdev->dev;
>>>>> +��� int ret;
>>>>> +
>>>>> +��� /*
>>>>> +���� * DW PCIe Root Port controller is equipped with eDMA capable of
>>>>> +���� * working with the 64-bit memory addresses.
>>>>> +���� */
>>>>> +��� ret = dma_set_mask_and_coherent(dev, DMA_BIT_MASK(64));
>>>>> +��� if (ret)
>>>>> +������� return ret;
>>>>
>>>> Is this the right place to set the DMA mask for the host controller
>>>> embedded DMA controller (actually, the dev pointer is the _host_
>>>> controller device) ?
>>>>
>>>> How this is going to play when combined with:
>>>>
>>>> https://lore.kernel.org/linux-pci/1e63a581-14ae-b4b5-a5bf-ca8f09c33af6@arm.com
>>>>
>>>> It is getting a bit confusing. I believe the code in the link
>>>> above sets the mask so that through the DMA API we are capable
>>>> of getting an MSI doorbell virtual address whose physical address
>>>> can be addressed by the endpoint; this through the DMA API.
>>>>
>>>> This patch is setting the DMA mask for a different reason, namely
>>>> setting the host controller embedded DMA controller addressing
>>>> capabilities.
>>>>
>>>> AFAICS - both approaches set the mask for the same device - now
>>>> the question is about which one is legitimate and how to handle
>>>> the other.
>>>
>>> Assuming the dw-edma-pcie driver is the relevant one, that already sets
>>> its own masks on its own device, so I also don't see why this is here.
>>
> 
>> Ah, I just found the patch at [1], which further implies that this is indeed
>> completely bogus.
> 
> Really? Elaborate please. What you said in the comment to that patch
> has nothing to do with the change you comment here.

It has everything to do with it; if the other driver did the right 
thing, this change wouldn't even be here. Everything you've said has 
implied that the DMA engine driver cares about the AXI side of the 
bridge, which is represented by the platform device. Thus it should set 
the platform device's DMA mask, and use the platform device for DMA API 
calls, and thus there should be no conflict with the host controller 
driver's use of the PCI device's DMA mask to reserve a DMA address in 
PCI memory space on the other side of the bridge, nor any translation 
across the bridge itself.

Thanks,
Robin.
Serge Semin Sept. 26, 2022, 1:31 p.m. UTC | #13
On Mon, Sep 26, 2022 at 02:09:59PM +0100, Robin Murphy wrote:
> On 2022-09-12 01:25, Serge Semin wrote:
> > On Wed, Aug 31, 2022 at 09:54:14AM +0100, Robin Murphy wrote:
> > > On 2022-08-31 09:36, Robin Murphy wrote:
> > > > On 2022-08-29 16:28, Lorenzo Pieralisi wrote:
> > > > [...]
> > > > > > +static int bt1_pcie_add_port(struct bt1_pcie *btpci)
> > > > > > +{
> > > > > > +��� struct device *dev = &btpci->pdev->dev;
> > > > > > +��� int ret;
> > > > > > +
> > > > > > +��� /*
> > > > > > +���� * DW PCIe Root Port controller is equipped with eDMA capable of
> > > > > > +���� * working with the 64-bit memory addresses.
> > > > > > +���� */
> > > > > > +��� ret = dma_set_mask_and_coherent(dev, DMA_BIT_MASK(64));
> > > > > > +��� if (ret)
> > > > > > +������� return ret;
> > > > > 
> > > > > Is this the right place to set the DMA mask for the host controller
> > > > > embedded DMA controller (actually, the dev pointer is the _host_
> > > > > controller device) ?
> > > > > 
> > > > > How this is going to play when combined with:
> > > > > 
> > > > > https://lore.kernel.org/linux-pci/1e63a581-14ae-b4b5-a5bf-ca8f09c33af6@arm.com
> > > > > 
> > > > > It is getting a bit confusing. I believe the code in the link
> > > > > above sets the mask so that through the DMA API we are capable
> > > > > of getting an MSI doorbell virtual address whose physical address
> > > > > can be addressed by the endpoint; this through the DMA API.
> > > > > 
> > > > > This patch is setting the DMA mask for a different reason, namely
> > > > > setting the host controller embedded DMA controller addressing
> > > > > capabilities.
> > > > > 
> > > > > AFAICS - both approaches set the mask for the same device - now
> > > > > the question is about which one is legitimate and how to handle
> > > > > the other.
> > > > 
> > > > Assuming the dw-edma-pcie driver is the relevant one, that already sets
> > > > its own masks on its own device, so I also don't see why this is here.
> > > 
> > 
> > > Ah, I just found the patch at [1], which further implies that this is indeed
> > > completely bogus.
> > 
> > Really? Elaborate please. What you said in the comment to that patch
> > has nothing to do with the change you comment here.
> 

> It has everything to do with it; if the other driver did the right thing,
> this change wouldn't even be here.

What "right" thing do you imply? What the other driver should have done?

> Everything you've said has implied that
> the DMA engine driver cares about the AXI side of the bridge, which is
> represented by the platform device.

Both DW PCIe host controller and embedded eDMA drivers care about the
AXI-master-side of the device. The only driver which can be aware of
the interface config parameters is the platform driver. This patch
introduces a platform driver which sets the relevant DMA-mask.

> Thus it should set the platform device's
> DMA mask, and use the platform device for DMA API calls, and thus there
> should be no conflict with the host controller driver's use of the PCI
> device's DMA mask to reserve a DMA address in PCI memory space on the other
> side of the bridge, nor any translation across the bridge itself.

How do you expect the eDMA driver would detect the platform device
capability like DMAable memory range?

Note here we are talking about the DMAable memory ranges. Meanwhile
the eDMA-patch [1] you were commenting was necessary due to the
PCI-specific "dma-ranges" property setting. That's why I told you that
this and that parts are irrelevant.

[1] https://lore.kernel.org/dmaengine/20220822185332.26149-23-Sergey.Semin@baikalelectronics.ru/

-Sergey

> 
> Thanks,
> Robin.
Christoph Hellwig Sept. 26, 2022, 2:31 p.m. UTC | #14
On Mon, Sep 26, 2022 at 03:49:24PM +0300, Serge Semin wrote:
> @Christoph, @Marek, @Bjorn, @Rob could you please join to the
> DMA-mask related discussion. @Lorenzo can't decide which driver should
> initialize the device DMA-mask.

The driver that does the actual DMA mapping or allocation functions
need to set it.  But even with your comments on the questions I'm
still confused what struct device you are even talking about.  Can
you explain this a bit better?
Serge Semin Sept. 26, 2022, 8:53 p.m. UTC | #15
On Mon, Sep 26, 2022 at 04:31:28PM +0200, Christoph Hellwig wrote:
> On Mon, Sep 26, 2022 at 03:49:24PM +0300, Serge Semin wrote:
> > @Christoph, @Marek, @Bjorn, @Rob could you please join to the
> > DMA-mask related discussion. @Lorenzo can't decide which driver should
> > initialize the device DMA-mask.
> 

> The driver that does the actual DMA mapping or allocation functions
> need to set it.  But even with your comments on the questions I'm
> still confused what struct device you are even talking about.  Can
> you explain this a bit better?

We are talking about the DW PCIe Root Port controller with DW eDMA engine
embedded. It' simplified structure can be represented as follows:

         +---------------+     +--------+
         | System memory |     | CPU(s) |
         +---------------+     +--------+
                ^  |              |  ^
                | ... System bus ... |
               ... |              | ...
                |  v              v  |
 +------------+------+--------+----------+------+
 | DW PCIe RP | AXI-m|        | AXI-s/DBI|      |
 |            +------+        +----------+      |
 |                ^              ^     |        |
 |         +------+----+         |    CSRs      |
 |         v           v         v              |
 |     +-------+  +---------+ +----------+      |
 |     | eDMA  |  | in-iATU | | out-iATU |      |
 |     +-------+  +---------+ +----------+      |
 |         ^           ^           ^            |
 |         +--------+--+---+-------+            |
 +------------------| PIPE |--------------------+
                    +------+
                      | ^
                      v |
                   PCIe bus

The DW PCIe controller device is instantiated as a platform device
defined in the system DT source file. The device is probed by the
DW PCIe low-level driver, which after the platform-specific setups
initiates the generic DW PCIe host-controller registration. On the way
of that procedure the DW PCIe core tries to auto-detect the DW eDMA
engine availability. If the engine is found, the DW eDMA probe method
is called in order to register the DMA-engine device. After that the
PCIe host bridge is registered. Both the PCIe host-bridge and
DMA-engine devices will have the DW PCIe platform device as parent.

Getting back to the sketch above. Here is a short description of the
content:
1. DW eDMA is capable of performing the data transfers from/to System
memory to/from PCIe bus memory.
2. in-iATU is the Inbound Address Translation Unit, which is
responsible for the PCIe bus peripheral devices to access the system
memory. The "dma-ranges" DT-property is used to initialize the
PCIe<->Sys memory mapping. (@William note the In-iATU setup doesn't
affect the eDMA transfers.)
3. out-iATU is responsible for the CPU(s) to access the PCIe bus
peripheral devices memory/cfg-space.

So eDMA and in-iATU are using the same AXI-master interface to access
the system memory. Thus the DMAable memory capability is the same for
both of them (Though in-iATU may have some specific mapping based on
the "dma-ranges" DT-property setup). Neither DW eDMA nor DW PCIe Root
Port CSRs region have any register to auto-detect the AXI-m interface
address bus width. It's selected during the IP-core synthesize and is
platform-specific. The question is: "What driver/code is supposed to
set the DMA-mask of the DW PCIe platform device?" Seeing the parental
platform device is used to perform the memory-mapping for both DW eDMA
clients and PCIe-bus peripheral device drivers, and seeing the AXI-m
interface parameters aren't auto-detectable and are platform-specific,
the only place it should be done in is the DW PCIe low-level device
driver. I don't really see any alternative... What is your opinion?

-Sergey
William McVicker Sept. 26, 2022, 11:08 p.m. UTC | #16
On 09/26/2022, Serge Semin wrote:
> On Mon, Sep 26, 2022 at 04:31:28PM +0200, Christoph Hellwig wrote:
> > On Mon, Sep 26, 2022 at 03:49:24PM +0300, Serge Semin wrote:
> > > @Christoph, @Marek, @Bjorn, @Rob could you please join to the
> > > DMA-mask related discussion. @Lorenzo can't decide which driver should
> > > initialize the device DMA-mask.
> > 
> 
> > The driver that does the actual DMA mapping or allocation functions
> > need to set it.  But even with your comments on the questions I'm
> > still confused what struct device you are even talking about.  Can
> > you explain this a bit better?
> 
> We are talking about the DW PCIe Root Port controller with DW eDMA engine
> embedded. It' simplified structure can be represented as follows:
> 
>          +---------------+     +--------+
>          | System memory |     | CPU(s) |
>          +---------------+     +--------+
>                 ^  |              |  ^
>                 | ... System bus ... |
>                ... |              | ...
>                 |  v              v  |
>  +------------+------+--------+----------+------+
>  | DW PCIe RP | AXI-m|        | AXI-s/DBI|      |
>  |            +------+        +----------+      |
>  |                ^              ^     |        |
>  |         +------+----+         |    CSRs      |
>  |         v           v         v              |
>  |     +-------+  +---------+ +----------+      |
>  |     | eDMA  |  | in-iATU | | out-iATU |      |
>  |     +-------+  +---------+ +----------+      |
>  |         ^           ^           ^            |
>  |         +--------+--+---+-------+            |
>  +------------------| PIPE |--------------------+
>                     +------+
>                       | ^
>                       v |
>                    PCIe bus
> 
> The DW PCIe controller device is instantiated as a platform device
> defined in the system DT source file. The device is probed by the
> DW PCIe low-level driver, which after the platform-specific setups
> initiates the generic DW PCIe host-controller registration. On the way
> of that procedure the DW PCIe core tries to auto-detect the DW eDMA
> engine availability. If the engine is found, the DW eDMA probe method
> is called in order to register the DMA-engine device. After that the
> PCIe host bridge is registered. Both the PCIe host-bridge and
> DMA-engine devices will have the DW PCIe platform device as parent.
> 
> Getting back to the sketch above. Here is a short description of the
> content:
> 1. DW eDMA is capable of performing the data transfers from/to System
> memory to/from PCIe bus memory.
> 2. in-iATU is the Inbound Address Translation Unit, which is
> responsible for the PCIe bus peripheral devices to access the system
> memory. The "dma-ranges" DT-property is used to initialize the
> PCIe<->Sys memory mapping. (@William note the In-iATU setup doesn't
> affect the eDMA transfers.)
> 3. out-iATU is responsible for the CPU(s) to access the PCIe bus
> peripheral devices memory/cfg-space.
> 
> So eDMA and in-iATU are using the same AXI-master interface to access
> the system memory. Thus the DMAable memory capability is the same for
> both of them (Though in-iATU may have some specific mapping based on
> the "dma-ranges" DT-property setup). Neither DW eDMA nor DW PCIe Root
> Port CSRs region have any register to auto-detect the AXI-m interface
> address bus width. It's selected during the IP-core synthesize and is
> platform-specific. The question is: "What driver/code is supposed to
> set the DMA-mask of the DW PCIe platform device?" Seeing the parental
> platform device is used to perform the memory-mapping for both DW eDMA
> clients and PCIe-bus peripheral device drivers, and seeing the AXI-m
> interface parameters aren't auto-detectable and are platform-specific,
> the only place it should be done in is the DW PCIe low-level device
> driver. I don't really see any alternative... What is your opinion?
> 
> -Sergey

I believe this eDMA implementation is new for an upstream DW PCIe device
driver, right? If so, this will require some refactoring of the DMA mask code,
but you need to also make sure you don't break the MSI target address use case
that prompted this 32-bit DMA mask change -- [1]. My changes were directly
related to allowing the DW PCIe device driver to fallback to a 64-bit DMA mask
for the MSI target address if there are no 32-bit allocations available. For
that use-case, using a 32-bit mask doesn't have any perf impact here since
there is no actual DMAs happening.

Would it be possible for the DW PCIe device driver to set a capabilities flag
that the PCIe host controller can read and set the mask accordingly. This way
you don't need to go fix up any drivers that require a 32-bit DMA'able address
for the MSI target address. For example, I see several of the PCI capability
features have 64-bit flags, e.g. PCI_MSI_FLAGS_64BIT and PCI_X_STATUS_64BIT. If
not, then you're going to have to re-work the host controller driver and DW
PCIe device drivers that require a 32-bit MSI target address.

[1] https://lore.kernel.org/all/20201117165312.25847-1-vidyas@nvidia.com/

Thanks,
Will
Serge Semin Sept. 28, 2022, 10:36 a.m. UTC | #17
On Mon, Sep 26, 2022 at 11:08:18PM +0000, William McVicker wrote:
> On 09/26/2022, Serge Semin wrote:
> > On Mon, Sep 26, 2022 at 04:31:28PM +0200, Christoph Hellwig wrote:
> > > On Mon, Sep 26, 2022 at 03:49:24PM +0300, Serge Semin wrote:
> > > > @Christoph, @Marek, @Bjorn, @Rob could you please join to the
> > > > DMA-mask related discussion. @Lorenzo can't decide which driver should
> > > > initialize the device DMA-mask.
> > > 
> > 
> > > The driver that does the actual DMA mapping or allocation functions
> > > need to set it.  But even with your comments on the questions I'm
> > > still confused what struct device you are even talking about.  Can
> > > you explain this a bit better?
> > 
> > We are talking about the DW PCIe Root Port controller with DW eDMA engine
> > embedded. It' simplified structure can be represented as follows:
> > 
> >          +---------------+     +--------+
> >          | System memory |     | CPU(s) |
> >          +---------------+     +--------+
> >                 ^  |              |  ^
> >                 | ... System bus ... |
> >                ... |              | ...
> >                 |  v              v  |
> >  +------------+------+--------+----------+------+
> >  | DW PCIe RP | AXI-m|        | AXI-s/DBI|      |
> >  |            +------+        +----------+      |
> >  |                ^              ^     |        |
> >  |         +------+----+         |    CSRs      |
> >  |         v           v         v              |
> >  |     +-------+  +---------+ +----------+      |
> >  |     | eDMA  |  | in-iATU | | out-iATU |      |
> >  |     +-------+  +---------+ +----------+      |
> >  |         ^           ^           ^            |
> >  |         +--------+--+---+-------+            |
> >  +------------------| PIPE |--------------------+
> >                     +------+
> >                       | ^
> >                       v |
> >                    PCIe bus
> > 
> > The DW PCIe controller device is instantiated as a platform device
> > defined in the system DT source file. The device is probed by the
> > DW PCIe low-level driver, which after the platform-specific setups
> > initiates the generic DW PCIe host-controller registration. On the way
> > of that procedure the DW PCIe core tries to auto-detect the DW eDMA
> > engine availability. If the engine is found, the DW eDMA probe method
> > is called in order to register the DMA-engine device. After that the
> > PCIe host bridge is registered. Both the PCIe host-bridge and
> > DMA-engine devices will have the DW PCIe platform device as parent.
> > 
> > Getting back to the sketch above. Here is a short description of the
> > content:
> > 1. DW eDMA is capable of performing the data transfers from/to System
> > memory to/from PCIe bus memory.
> > 2. in-iATU is the Inbound Address Translation Unit, which is
> > responsible for the PCIe bus peripheral devices to access the system
> > memory. The "dma-ranges" DT-property is used to initialize the
> > PCIe<->Sys memory mapping. (@William note the In-iATU setup doesn't
> > affect the eDMA transfers.)
> > 3. out-iATU is responsible for the CPU(s) to access the PCIe bus
> > peripheral devices memory/cfg-space.
> > 
> > So eDMA and in-iATU are using the same AXI-master interface to access
> > the system memory. Thus the DMAable memory capability is the same for
> > both of them (Though in-iATU may have some specific mapping based on
> > the "dma-ranges" DT-property setup). Neither DW eDMA nor DW PCIe Root
> > Port CSRs region have any register to auto-detect the AXI-m interface
> > address bus width. It's selected during the IP-core synthesize and is
> > platform-specific. The question is: "What driver/code is supposed to
> > set the DMA-mask of the DW PCIe platform device?" Seeing the parental
> > platform device is used to perform the memory-mapping for both DW eDMA
> > clients and PCIe-bus peripheral device drivers, and seeing the AXI-m
> > interface parameters aren't auto-detectable and are platform-specific,
> > the only place it should be done in is the DW PCIe low-level device
> > driver. I don't really see any alternative... What is your opinion?
> > 
> > -Sergey
> 

> I believe this eDMA implementation is new for an upstream DW PCIe device
> driver, right? If so, this will require some refactoring of the DMA mask code,
> but you need to also make sure you don't break the MSI target address use case
> that prompted this 32-bit DMA mask change -- [1].

As far as I can see the commit
https://lore.kernel.org/all/20201117165312.25847-1-vidyas@nvidia.com/
isn't marked as fixes or whatever. If so it gets to be pointless due to this
https://elixir.bootlin.com/linux/latest/source/drivers/of/platform.c#L183
and this
https://elixir.bootlin.com/linux/latest/source/drivers/base/platform.c#L529
and seeing none of the DW PCIe RP/EP platform drivers change the
device DMA-mask of the being probed platform device. So the mask must
have been of 32-bits anyway even without that commit.

Moreover as Rob already told you here
https://lore.kernel.org/all/CAL_JsqJh=d-B51b6yPBRq0tOwbChN=AFPr-a19U1QdQZAE7c1A@mail.gmail.com/
and I mentioned in my response here
https://lore.kernel.org/linux-pci/20220912000211.ct6asuhhmnatje5e@mobilestation/
DW PCie MSI TLPs never reach the system memory. The TLP PCIe-bus target
address is checked in the host bridge. If it matches to the one
initialized in the iMSI-RX engine CSRs the MSI IRQ will be raised.
None system memory IO will be actually performed. Thus changing the
device DMA-capability due to something which actually doesn't cause
any DMA at the very least inappropriate.

The last but not least changing the DMA-mask in the common code which
isn't aware of the device/platform capability is also at the very least
inappropriate.

> My changes were directly
> related to allowing the DW PCIe device driver to fallback to a 64-bit DMA mask
> for the MSI target address if there are no 32-bit allocations available. For
> that use-case, using a 32-bit mask doesn't have any perf impact here since
> there is no actual DMAs happening.

Regarding your changes. I'll give you my comments in that thread, but
here is a short summary. One more time. There is no actually DMA
performed on MSI due to the way the iMSI-RX works. So setting the
device DMA-mask based on that is inappropriate. Secondly the coherent
memory might be very expensive on some platforms
(see Documentation/core-api/dma-api.rst). And it's on MIPS32 for
instance. Thus using dma_alloc_coherent()
for something other than for real DMA is also inappropriate. What
should have been done instead:
1. Drop any dma_set_mask*() invocations.
1. Preserve the alloc_page() method usage.
2. Pass GFP_DMA32 to the alloc_page() function only if
PCI_MSI_FLAGS_64BIT is set.

The suggestion above is the best choice seeing we can't reserve some
part of the PCI-bus memory without allocating the real system memory
behind as @Robin noted here in the last paragraph:
https://lore.kernel.org/linux-pci/1e63a581-14ae-b4b5-a5bf-ca8f09c33af6@arm.com/

-Sergey

> 
> Would it be possible for the DW PCIe device driver to set a capabilities flag
> that the PCIe host controller can read and set the mask accordingly. This way
> you don't need to go fix up any drivers that require a 32-bit DMA'able address
> for the MSI target address. For example, I see several of the PCI capability
> features have 64-bit flags, e.g. PCI_MSI_FLAGS_64BIT and PCI_X_STATUS_64BIT. If
> not, then you're going to have to re-work the host controller driver and DW
> PCIe device drivers that require a 32-bit MSI target address.
> 
> [1] https://lore.kernel.org/all/20201117165312.25847-1-vidyas@nvidia.com/
> 
> Thanks,
> Will
>
William McVicker Sept. 28, 2022, 5:59 p.m. UTC | #18
On 09/28/2022, Serge Semin wrote:
> On Mon, Sep 26, 2022 at 11:08:18PM +0000, William McVicker wrote:
> > On 09/26/2022, Serge Semin wrote:
> > > On Mon, Sep 26, 2022 at 04:31:28PM +0200, Christoph Hellwig wrote:
> > > > On Mon, Sep 26, 2022 at 03:49:24PM +0300, Serge Semin wrote:
> > > > > @Christoph, @Marek, @Bjorn, @Rob could you please join to the
> > > > > DMA-mask related discussion. @Lorenzo can't decide which driver should
> > > > > initialize the device DMA-mask.
> > > > 
> > > 
> > > > The driver that does the actual DMA mapping or allocation functions
> > > > need to set it.  But even with your comments on the questions I'm
> > > > still confused what struct device you are even talking about.  Can
> > > > you explain this a bit better?
> > > 
> > > We are talking about the DW PCIe Root Port controller with DW eDMA engine
> > > embedded. It' simplified structure can be represented as follows:
> > > 
> > >          +---------------+     +--------+
> > >          | System memory |     | CPU(s) |
> > >          +---------------+     +--------+
> > >                 ^  |              |  ^
> > >                 | ... System bus ... |
> > >                ... |              | ...
> > >                 |  v              v  |
> > >  +------------+------+--------+----------+------+
> > >  | DW PCIe RP | AXI-m|        | AXI-s/DBI|      |
> > >  |            +------+        +----------+      |
> > >  |                ^              ^     |        |
> > >  |         +------+----+         |    CSRs      |
> > >  |         v           v         v              |
> > >  |     +-------+  +---------+ +----------+      |
> > >  |     | eDMA  |  | in-iATU | | out-iATU |      |
> > >  |     +-------+  +---------+ +----------+      |
> > >  |         ^           ^           ^            |
> > >  |         +--------+--+---+-------+            |
> > >  +------------------| PIPE |--------------------+
> > >                     +------+
> > >                       | ^
> > >                       v |
> > >                    PCIe bus
> > > 
> > > The DW PCIe controller device is instantiated as a platform device
> > > defined in the system DT source file. The device is probed by the
> > > DW PCIe low-level driver, which after the platform-specific setups
> > > initiates the generic DW PCIe host-controller registration. On the way
> > > of that procedure the DW PCIe core tries to auto-detect the DW eDMA
> > > engine availability. If the engine is found, the DW eDMA probe method
> > > is called in order to register the DMA-engine device. After that the
> > > PCIe host bridge is registered. Both the PCIe host-bridge and
> > > DMA-engine devices will have the DW PCIe platform device as parent.
> > > 
> > > Getting back to the sketch above. Here is a short description of the
> > > content:
> > > 1. DW eDMA is capable of performing the data transfers from/to System
> > > memory to/from PCIe bus memory.
> > > 2. in-iATU is the Inbound Address Translation Unit, which is
> > > responsible for the PCIe bus peripheral devices to access the system
> > > memory. The "dma-ranges" DT-property is used to initialize the
> > > PCIe<->Sys memory mapping. (@William note the In-iATU setup doesn't
> > > affect the eDMA transfers.)
> > > 3. out-iATU is responsible for the CPU(s) to access the PCIe bus
> > > peripheral devices memory/cfg-space.
> > > 
> > > So eDMA and in-iATU are using the same AXI-master interface to access
> > > the system memory. Thus the DMAable memory capability is the same for
> > > both of them (Though in-iATU may have some specific mapping based on
> > > the "dma-ranges" DT-property setup). Neither DW eDMA nor DW PCIe Root
> > > Port CSRs region have any register to auto-detect the AXI-m interface
> > > address bus width. It's selected during the IP-core synthesize and is
> > > platform-specific. The question is: "What driver/code is supposed to
> > > set the DMA-mask of the DW PCIe platform device?" Seeing the parental
> > > platform device is used to perform the memory-mapping for both DW eDMA
> > > clients and PCIe-bus peripheral device drivers, and seeing the AXI-m
> > > interface parameters aren't auto-detectable and are platform-specific,
> > > the only place it should be done in is the DW PCIe low-level device
> > > driver. I don't really see any alternative... What is your opinion?
> > > 
> > > -Sergey
> > 
> 
> > I believe this eDMA implementation is new for an upstream DW PCIe device
> > driver, right? If so, this will require some refactoring of the DMA mask code,
> > but you need to also make sure you don't break the MSI target address use case
> > that prompted this 32-bit DMA mask change -- [1].
> 
> As far as I can see the commit
> https://lore.kernel.org/all/20201117165312.25847-1-vidyas@nvidia.com/
> isn't marked as fixes or whatever. If so it gets to be pointless due to this
> https://elixir.bootlin.com/linux/latest/source/drivers/of/platform.c#L183
> and this
> https://elixir.bootlin.com/linux/latest/source/drivers/base/platform.c#L529
> and seeing none of the DW PCIe RP/EP platform drivers change the
> device DMA-mask of the being probed platform device. So the mask must
> have been of 32-bits anyway even without that commit.
> 
> Moreover as Rob already told you here
> https://lore.kernel.org/all/CAL_JsqJh=d-B51b6yPBRq0tOwbChN=AFPr-a19U1QdQZAE7c1A@mail.gmail.com/
> and I mentioned in my response here
> https://lore.kernel.org/linux-pci/20220912000211.ct6asuhhmnatje5e@mobilestation/
> DW PCie MSI TLPs never reach the system memory. The TLP PCIe-bus target
> address is checked in the host bridge. If it matches to the one
> initialized in the iMSI-RX engine CSRs the MSI IRQ will be raised.
> None system memory IO will be actually performed. Thus changing the
> device DMA-capability due to something which actually doesn't cause
> any DMA at the very least inappropriate.

Thanks for pointing out the DMA mask references during platform device
allocation. I wasn't aware of that. However, I still have issues with using
ZONE_DMA32. See comments on how we can address this here:
https://lore.kernel.org/linux-pci/YzSJ2ioEeRhHC6zn@google.com/

> 
> The last but not least changing the DMA-mask in the common code which
> isn't aware of the device/platform capability is also at the very least
> inappropriate.
> 
> > My changes were directly
> > related to allowing the DW PCIe device driver to fallback to a 64-bit DMA mask
> > for the MSI target address if there are no 32-bit allocations available. For
> > that use-case, using a 32-bit mask doesn't have any perf impact here since
> > there is no actual DMAs happening.
> 
> Regarding your changes. I'll give you my comments in that thread, but
> here is a short summary. One more time. There is no actually DMA
> performed on MSI due to the way the iMSI-RX works. So setting the
> device DMA-mask based on that is inappropriate. Secondly the coherent
> memory might be very expensive on some platforms
> (see Documentation/core-api/dma-api.rst). And it's on MIPS32 for
> instance. Thus using dma_alloc_coherent()
> for something other than for real DMA is also inappropriate. What
> should have been done instead:
> 1. Drop any dma_set_mask*() invocations.

I'm fine with this, but others will need to approve of that.

> 1. Preserve the alloc_page() method usage.
> 2. Pass GFP_DMA32 to the alloc_page() function only if
> PCI_MSI_FLAGS_64BIT is set.
> 
> The suggestion above is the best choice seeing we can't reserve some
> part of the PCI-bus memory without allocating the real system memory
> behind as @Robin noted here in the last paragraph:
> https://lore.kernel.org/linux-pci/1e63a581-14ae-b4b5-a5bf-ca8f09c33af6@arm.com/

I'm not okay with this as it re-introduces the dependency on ZONE_DMA32.
I responded with more details here with regards to why and how we can work
around the ARCH issues with dma_alloc_coherent():
https://lore.kernel.org/linux-pci/YzSJ2ioEeRhHC6zn@google.com/

Thanks,
Will

> 
> -Sergey
> 
> > 
> > Would it be possible for the DW PCIe device driver to set a capabilities flag
> > that the PCIe host controller can read and set the mask accordingly. This way
> > you don't need to go fix up any drivers that require a 32-bit DMA'able address
> > for the MSI target address. For example, I see several of the PCI capability
> > features have 64-bit flags, e.g. PCI_MSI_FLAGS_64BIT and PCI_X_STATUS_64BIT. If
> > not, then you're going to have to re-work the host controller driver and DW
> > PCIe device drivers that require a 32-bit MSI target address.
> > 
> > [1] https://lore.kernel.org/all/20201117165312.25847-1-vidyas@nvidia.com/
> > 
> > Thanks,
> > Will
> >
diff mbox series

Patch

diff --git a/drivers/pci/controller/dwc/Kconfig b/drivers/pci/controller/dwc/Kconfig
index 62ce3abf0f19..771b8b146623 100644
--- a/drivers/pci/controller/dwc/Kconfig
+++ b/drivers/pci/controller/dwc/Kconfig
@@ -222,6 +222,15 @@  config PCIE_ARTPEC6_EP
 	  Enables support for the PCIe controller in the ARTPEC-6 SoC to work in
 	  endpoint mode. This uses the DesignWare core.
 
+config PCIE_BT1
+	tristate "Baikal-T1 PCIe controller"
+	depends on MIPS_BAIKAL_T1 || COMPILE_TEST
+	depends on PCI_MSI_IRQ_DOMAIN
+	select PCIE_DW_HOST
+	help
+	  Enables support for the PCIe controller in the Baikal-T1 SoC to work
+	  in host mode. It's based on the Synopsys DWC PCIe v4.60a IP-core.
+
 config PCIE_ROCKCHIP_DW_HOST
 	bool "Rockchip DesignWare PCIe controller"
 	select PCIE_DW
diff --git a/drivers/pci/controller/dwc/Makefile b/drivers/pci/controller/dwc/Makefile
index 8ba7b67f5e50..bf5c311875a1 100644
--- a/drivers/pci/controller/dwc/Makefile
+++ b/drivers/pci/controller/dwc/Makefile
@@ -3,6 +3,7 @@  obj-$(CONFIG_PCIE_DW) += pcie-designware.o
 obj-$(CONFIG_PCIE_DW_HOST) += pcie-designware-host.o
 obj-$(CONFIG_PCIE_DW_EP) += pcie-designware-ep.o
 obj-$(CONFIG_PCIE_DW_PLAT) += pcie-designware-plat.o
+obj-$(CONFIG_PCIE_BT1) += pcie-bt1.o
 obj-$(CONFIG_PCI_DRA7XX) += pci-dra7xx.o
 obj-$(CONFIG_PCI_EXYNOS) += pci-exynos.o
 obj-$(CONFIG_PCIE_FU740) += pcie-fu740.o
diff --git a/drivers/pci/controller/dwc/pcie-bt1.c b/drivers/pci/controller/dwc/pcie-bt1.c
new file mode 100644
index 000000000000..86b230575ddc
--- /dev/null
+++ b/drivers/pci/controller/dwc/pcie-bt1.c
@@ -0,0 +1,653 @@ 
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2021 BAIKAL ELECTRONICS, JSC
+ *
+ * Authors:
+ *   Vadim Vlasov <Vadim.Vlasov@baikalelectronics.ru>
+ *   Serge Semin <Sergey.Semin@baikalelectronics.ru>
+ *
+ * Baikal-T1 PCIe controller driver
+ */
+
+#include <linux/bitfield.h>
+#include <linux/bits.h>
+#include <linux/clk.h>
+#include <linux/delay.h>
+#include <linux/gpio/consumer.h>
+#include <linux/kernel.h>
+#include <linux/mfd/syscon.h>
+#include <linux/module.h>
+#include <linux/pci.h>
+#include <linux/platform_device.h>
+#include <linux/regmap.h>
+#include <linux/reset.h>
+#include <linux/types.h>
+
+#include "pcie-designware.h"
+
+/* Baikal-T1 System CCU control registers */
+#define BT1_CCU_PCIE_CLKC			0x140
+#define BT1_CCU_PCIE_REQ_PCS_CLK		BIT(16)
+#define BT1_CCU_PCIE_REQ_MAC_CLK		BIT(17)
+#define BT1_CCU_PCIE_REQ_PIPE_CLK		BIT(18)
+
+#define BT1_CCU_PCIE_RSTC			0x144
+#define BT1_CCU_PCIE_REQ_LINK_RST		BIT(13)
+#define BT1_CCU_PCIE_REQ_SMLH_RST		BIT(14)
+#define BT1_CCU_PCIE_REQ_PHY_RST		BIT(16)
+#define BT1_CCU_PCIE_REQ_CORE_RST		BIT(24)
+#define BT1_CCU_PCIE_REQ_STICKY_RST		BIT(26)
+#define BT1_CCU_PCIE_REQ_NSTICKY_RST		BIT(27)
+
+#define BT1_CCU_PCIE_PMSC			0x148
+#define BT1_CCU_PCIE_LTSSM_STATE_MASK		GENMASK(5, 0)
+#define BT1_CCU_PCIE_LTSSM_DET_QUIET		0x00
+#define BT1_CCU_PCIE_LTSSM_DET_ACT		0x01
+#define BT1_CCU_PCIE_LTSSM_POLL_ACT		0x02
+#define BT1_CCU_PCIE_LTSSM_POLL_COMP		0x03
+#define BT1_CCU_PCIE_LTSSM_POLL_CONF		0x04
+#define BT1_CCU_PCIE_LTSSM_PRE_DET_QUIET	0x05
+#define BT1_CCU_PCIE_LTSSM_DET_WAIT		0x06
+#define BT1_CCU_PCIE_LTSSM_CFG_LNKWD_START	0x07
+#define BT1_CCU_PCIE_LTSSM_CFG_LNKWD_ACEPT	0x08
+#define BT1_CCU_PCIE_LTSSM_CFG_LNNUM_WAIT	0x09
+#define BT1_CCU_PCIE_LTSSM_CFG_LNNUM_ACEPT	0x0a
+#define BT1_CCU_PCIE_LTSSM_CFG_COMPLETE		0x0b
+#define BT1_CCU_PCIE_LTSSM_CFG_IDLE		0x0c
+#define BT1_CCU_PCIE_LTSSM_RCVR_LOCK		0x0d
+#define BT1_CCU_PCIE_LTSSM_RCVR_SPEED		0x0e
+#define BT1_CCU_PCIE_LTSSM_RCVR_RCVRCFG		0x0f
+#define BT1_CCU_PCIE_LTSSM_RCVR_IDLE		0x10
+#define BT1_CCU_PCIE_LTSSM_L0			0x11
+#define BT1_CCU_PCIE_LTSSM_L0S			0x12
+#define BT1_CCU_PCIE_LTSSM_L123_SEND_IDLE	0x13
+#define BT1_CCU_PCIE_LTSSM_L1_IDLE		0x14
+#define BT1_CCU_PCIE_LTSSM_L2_IDLE		0x15
+#define BT1_CCU_PCIE_LTSSM_L2_WAKE		0x16
+#define BT1_CCU_PCIE_LTSSM_DIS_ENTRY		0x17
+#define BT1_CCU_PCIE_LTSSM_DIS_IDLE		0x18
+#define BT1_CCU_PCIE_LTSSM_DISABLE		0x19
+#define BT1_CCU_PCIE_LTSSM_LPBK_ENTRY		0x1a
+#define BT1_CCU_PCIE_LTSSM_LPBK_ACTIVE		0x1b
+#define BT1_CCU_PCIE_LTSSM_LPBK_EXIT		0x1c
+#define BT1_CCU_PCIE_LTSSM_LPBK_EXIT_TOUT	0x1d
+#define BT1_CCU_PCIE_LTSSM_HOT_RST_ENTRY	0x1e
+#define BT1_CCU_PCIE_LTSSM_HOT_RST		0x1f
+#define BT1_CCU_PCIE_LTSSM_RCVR_EQ0		0x20
+#define BT1_CCU_PCIE_LTSSM_RCVR_EQ1		0x21
+#define BT1_CCU_PCIE_LTSSM_RCVR_EQ2		0x22
+#define BT1_CCU_PCIE_LTSSM_RCVR_EQ3		0x23
+#define BT1_CCU_PCIE_SMLH_LINKUP		BIT(6)
+#define BT1_CCU_PCIE_RDLH_LINKUP		BIT(7)
+#define BT1_CCU_PCIE_PM_LINKSTATE_L0S		BIT(8)
+#define BT1_CCU_PCIE_PM_LINKSTATE_L1		BIT(9)
+#define BT1_CCU_PCIE_PM_LINKSTATE_L2		BIT(10)
+#define BT1_CCU_PCIE_L1_PENDING			BIT(12)
+#define BT1_CCU_PCIE_REQ_EXIT_L1		BIT(14)
+#define BT1_CCU_PCIE_LTSSM_RCVR_EQ		BIT(15)
+#define BT1_CCU_PCIE_PM_DSTAT_MASK		GENMASK(18, 16)
+#define BT1_CCU_PCIE_PM_PME_EN			BIT(20)
+#define BT1_CCU_PCIE_PM_PME_STATUS		BIT(21)
+#define BT1_CCU_PCIE_AUX_PM_EN			BIT(22)
+#define BT1_CCU_PCIE_AUX_PWR_DET		BIT(23)
+#define BT1_CCU_PCIE_WAKE_DET			BIT(24)
+#define BT1_CCU_PCIE_TURNOFF_REQ		BIT(30)
+#define BT1_CCU_PCIE_TURNOFF_ACK		BIT(31)
+
+#define BT1_CCU_PCIE_GENC			0x14c
+#define BT1_CCU_PCIE_LTSSM_EN			BIT(1)
+#define BT1_CCU_PCIE_DBI2_MODE			BIT(2)
+#define BT1_CCU_PCIE_MGMT_EN			BIT(3)
+#define BT1_CCU_PCIE_RXLANE_FLIP_EN		BIT(16)
+#define BT1_CCU_PCIE_TXLANE_FLIP_EN		BIT(17)
+#define BT1_CCU_PCIE_SLV_XFER_PEND		BIT(24)
+#define BT1_CCU_PCIE_RCV_XFER_PEND		BIT(25)
+#define BT1_CCU_PCIE_DBI_XFER_PEND		BIT(26)
+#define BT1_CCU_PCIE_DMA_XFER_PEND		BIT(27)
+
+#define BT1_CCU_PCIE_LTSSM_LINKUP(_pmsc) \
+({ \
+	int __state = FIELD_GET(BT1_CCU_PCIE_LTSSM_STATE_MASK, _pmsc); \
+	__state >= BT1_CCU_PCIE_LTSSM_L0 && __state <= BT1_CCU_PCIE_LTSSM_L2_WAKE; \
+})
+
+/* Baikal-T1 PCIe specific control registers */
+#define BT1_PCIE_AXI2MGM_LANENUM		0xd04
+#define BT1_PCIE_AXI2MGM_LANESEL_MASK		GENMASK(3, 0)
+
+#define BT1_PCIE_AXI2MGM_ADDRCTL		0xd08
+#define BT1_PCIE_AXI2MGM_PHYREG_ADDR_MASK	GENMASK(20, 0)
+#define BT1_PCIE_AXI2MGM_READ_FLAG		BIT(29)
+#define BT1_PCIE_AXI2MGM_DONE			BIT(30)
+#define BT1_PCIE_AXI2MGM_BUSY			BIT(31)
+
+#define BT1_PCIE_AXI2MGM_WRITEDATA		0xd0c
+#define BT1_PCIE_AXI2MGM_WDATA			GENMASK(15, 0)
+
+#define BT1_PCIE_AXI2MGM_READDATA		0xd10
+#define BT1_PCIE_AXI2MGM_RDATA			GENMASK(15, 0)
+
+/* Generic Baikal-T1 PCIe interface resources */
+#define BT1_PCIE_NUM_APP_CLKS			ARRAY_SIZE(bt1_pcie_app_clks)
+#define BT1_PCIE_NUM_CORE_CLKS			ARRAY_SIZE(bt1_pcie_core_clks)
+#define BT1_PCIE_NUM_APP_RSTS			ARRAY_SIZE(bt1_pcie_app_rsts)
+#define BT1_PCIE_NUM_CORE_RSTS			ARRAY_SIZE(bt1_pcie_core_rsts)
+
+/* PCIe bus setup delays and timeouts */
+#define BT1_PCIE_RST_DELAY_MS			100
+#define BT1_PCIE_RUN_DELAY_US			100
+#define BT1_PCIE_REQ_DELAY_US			1
+#define BT1_PCIE_REQ_TIMEOUT_US			1000
+#define BT1_PCIE_LNK_DELAY_US			1000
+#define BT1_PCIE_LNK_TIMEOUT_US			1000000
+
+static const enum dw_pcie_app_clk bt1_pcie_app_clks[] = {
+	DW_PCIE_DBI_CLK, DW_PCIE_MSTR_CLK, DW_PCIE_SLV_CLK,
+};
+
+static const enum dw_pcie_core_clk bt1_pcie_core_clks[] = {
+	DW_PCIE_REF_CLK,
+};
+
+static const enum dw_pcie_app_rst bt1_pcie_app_rsts[] = {
+	DW_PCIE_MSTR_RST, DW_PCIE_SLV_RST,
+};
+
+static const enum dw_pcie_core_rst bt1_pcie_core_rsts[] = {
+	DW_PCIE_NON_STICKY_RST, DW_PCIE_STICKY_RST, DW_PCIE_CORE_RST,
+	DW_PCIE_PIPE_RST, DW_PCIE_PHY_RST, DW_PCIE_HOT_RST, DW_PCIE_PWR_RST,
+};
+
+struct bt1_pcie {
+	struct dw_pcie dw;
+	struct platform_device *pdev;
+	struct regmap *sys_regs;
+};
+#define to_bt1_pcie(_dw) container_of(_dw, struct bt1_pcie, dw)
+
+/*
+ * Baikal-T1 MMIO space must be read/written by the dword-aligned
+ * instructions. Note the methods are optimized to have the dword operations
+ * performed with minimum overhead as the most frequently used ones.
+ */
+static int bt1_pcie_read_mmio(void __iomem *addr, int size, u32 *val)
+{
+	unsigned int ofs = (uintptr_t)addr & 0x3;
+
+	if (!IS_ALIGNED((uintptr_t)addr, size))
+		return -EINVAL;
+
+	*val = readl(addr - ofs) >> ofs * BITS_PER_BYTE;
+	if (size == 4) {
+		return 0;
+	} else if (size == 2) {
+		*val &= 0xffff;
+		return 0;
+	} else if (size == 1) {
+		*val &= 0xff;
+		return 0;
+	}
+
+	return -EINVAL;
+}
+
+static int bt1_pcie_write_mmio(void __iomem *addr, int size, u32 val)
+{
+	unsigned int ofs = (uintptr_t)addr & 0x3;
+	u32 tmp, mask;
+
+	if (!IS_ALIGNED((uintptr_t)addr, size))
+		return -EINVAL;
+
+	if (size == 4) {
+		writel(val, addr);
+		return 0;
+	} else if (size == 2 || size == 1) {
+		mask = GENMASK(size * BITS_PER_BYTE - 1, 0);
+		tmp = readl(addr - ofs) & ~(mask << ofs * BITS_PER_BYTE);
+		tmp |= (val & mask) << ofs * BITS_PER_BYTE;
+		writel(tmp, addr - ofs);
+		return 0;
+	}
+
+	return -EINVAL;
+}
+
+static u32 bt1_pcie_read_dbi(struct dw_pcie *pci, void __iomem *base, u32 reg,
+			     size_t size)
+{
+	int ret;
+	u32 val;
+
+	ret = bt1_pcie_read_mmio(base + reg, size, &val);
+	if (ret) {
+		dev_err(pci->dev, "Read DBI address failed\n");
+		return ~0U;
+	}
+
+	return val;
+}
+
+static void bt1_pcie_write_dbi(struct dw_pcie *pci, void __iomem *base, u32 reg,
+			       size_t size, u32 val)
+{
+	int ret;
+
+	ret = bt1_pcie_write_mmio(base + reg, size, val);
+	if (ret)
+		dev_err(pci->dev, "Write DBI address failed\n");
+}
+
+static void bt1_pcie_write_dbi2(struct dw_pcie *pci, void __iomem *base, u32 reg,
+				size_t size, u32 val)
+{
+	struct bt1_pcie *btpci = to_bt1_pcie(pci);
+	int ret;
+
+	regmap_update_bits(btpci->sys_regs, BT1_CCU_PCIE_GENC,
+			   BT1_CCU_PCIE_DBI2_MODE, BT1_CCU_PCIE_DBI2_MODE);
+
+	ret = bt1_pcie_write_mmio(base + reg, size, val);
+	if (ret)
+		dev_err(pci->dev, "Write DBI2 address failed\n");
+
+	regmap_update_bits(btpci->sys_regs, BT1_CCU_PCIE_GENC,
+			   BT1_CCU_PCIE_DBI2_MODE, 0);
+}
+
+static int bt1_pcie_start_link(struct dw_pcie *pci)
+{
+	struct bt1_pcie *btpci = to_bt1_pcie(pci);
+	u32 val;
+	int ret;
+
+	/*
+	 * Enable LTSSM and make sure it was able to establish both PHY and
+	 * data links. This procedure shall work fine to reach 2.5 GT/s speed.
+	 */
+	regmap_update_bits(btpci->sys_regs, BT1_CCU_PCIE_GENC,
+			   BT1_CCU_PCIE_LTSSM_EN, BT1_CCU_PCIE_LTSSM_EN);
+
+	ret = regmap_read_poll_timeout(btpci->sys_regs, BT1_CCU_PCIE_PMSC, val,
+				       (val & BT1_CCU_PCIE_SMLH_LINKUP),
+				       BT1_PCIE_LNK_DELAY_US, BT1_PCIE_LNK_TIMEOUT_US);
+	if (ret) {
+		dev_err(pci->dev, "LTSSM failed to set PHY link up\n");
+		return ret;
+	}
+
+	ret = regmap_read_poll_timeout(btpci->sys_regs, BT1_CCU_PCIE_PMSC, val,
+				       (val & BT1_CCU_PCIE_RDLH_LINKUP),
+				       BT1_PCIE_LNK_DELAY_US, BT1_PCIE_LNK_TIMEOUT_US);
+	if (ret) {
+		dev_err(pci->dev, "LTSSM failed to set data link up\n");
+		return ret;
+	}
+
+	/*
+	 * Activate direct speed change after the link is established in an
+	 * attempt to reach a higher bus performance (up to Gen.3 - 8.0 GT/s).
+	 * This is required at least to get 8.0 GT/s speed.
+	 */
+	val = dw_pcie_readl_dbi(pci, PCIE_LINK_WIDTH_SPEED_CONTROL);
+	val |= PORT_LOGIC_SPEED_CHANGE;
+	dw_pcie_writel_dbi(pci, PCIE_LINK_WIDTH_SPEED_CONTROL, val);
+
+	ret = regmap_read_poll_timeout(btpci->sys_regs, BT1_CCU_PCIE_PMSC, val,
+				       BT1_CCU_PCIE_LTSSM_LINKUP(val),
+				       BT1_PCIE_LNK_DELAY_US, BT1_PCIE_LNK_TIMEOUT_US);
+	if (ret)
+		dev_err(pci->dev, "LTSSM failed to get into L0 state\n");
+
+	return ret;
+}
+
+static void bt1_pcie_stop_link(struct dw_pcie *pci)
+{
+	struct bt1_pcie *btpci = to_bt1_pcie(pci);
+
+	regmap_update_bits(btpci->sys_regs, BT1_CCU_PCIE_GENC,
+			   BT1_CCU_PCIE_LTSSM_EN, 0);
+}
+
+static const struct dw_pcie_ops bt1_pcie_ops = {
+	.read_dbi = bt1_pcie_read_dbi,
+	.write_dbi = bt1_pcie_write_dbi,
+	.write_dbi2 = bt1_pcie_write_dbi2,
+	.start_link = bt1_pcie_start_link,
+	.stop_link = bt1_pcie_stop_link,
+};
+
+static struct pci_ops bt1_pci_ops = {
+	.map_bus = dw_pcie_own_conf_map_bus,
+	.read = pci_generic_config_read32,
+	.write = pci_generic_config_write32,
+};
+
+static int bt1_pcie_get_resources(struct bt1_pcie *btpci)
+{
+	struct device *dev = btpci->dw.dev;
+	int i;
+
+	/* DBI access is supposed to be performed by the dword-aligned IOs */
+	btpci->dw.pp.bridge->ops = &bt1_pci_ops;
+
+	/* These CSRs are in MMIO so we won't check the regmap-methods status */
+	btpci->sys_regs =
+		syscon_regmap_lookup_by_phandle(dev->of_node, "baikal,bt1-syscon");
+	if (IS_ERR(btpci->sys_regs))
+		return dev_err_probe(dev, PTR_ERR(btpci->sys_regs),
+				     "Failed to get syscon\n");
+
+	/* Make sure all the required resources have been specified */
+	for (i = 0; i < BT1_PCIE_NUM_APP_CLKS; i++) {
+		if (!btpci->dw.app_clks[bt1_pcie_app_clks[i]].clk) {
+			dev_err(dev, "App clocks set is incomplete\n");
+			return -ENOENT;
+		}
+	}
+
+	for (i = 0; i < BT1_PCIE_NUM_CORE_CLKS; i++) {
+		if (!btpci->dw.core_clks[bt1_pcie_core_clks[i]].clk) {
+			dev_err(dev, "Core clocks set is incomplete\n");
+			return -ENOENT;
+		}
+	}
+
+	for (i = 0; i < BT1_PCIE_NUM_APP_RSTS; i++) {
+		if (!btpci->dw.app_rsts[bt1_pcie_app_rsts[i]].rstc) {
+			dev_err(dev, "App resets set is incomplete\n");
+			return -ENOENT;
+		}
+	}
+
+	for (i = 0; i < BT1_PCIE_NUM_CORE_RSTS; i++) {
+		if (!btpci->dw.core_rsts[bt1_pcie_core_rsts[i]].rstc) {
+			dev_err(dev, "Core resets set is incomplete\n");
+			return -ENOENT;
+		}
+	}
+
+	return 0;
+}
+
+static void bt1_pcie_full_stop_bus(struct bt1_pcie *btpci, bool init)
+{
+	struct device *dev = btpci->dw.dev;
+	struct dw_pcie *pci = &btpci->dw;
+	int ret;
+
+	/* Disable LTSSM for sure */
+	regmap_update_bits(btpci->sys_regs, BT1_CCU_PCIE_GENC,
+			   BT1_CCU_PCIE_LTSSM_EN, 0);
+
+	/*
+	 * Application reset controls are trigger-based so assert the core
+	 * resets only.
+	 */
+	ret = reset_control_bulk_assert(DW_PCIE_NUM_CORE_RSTS, pci->core_rsts);
+	if (ret)
+		dev_err(dev, "Failed to assert core resets\n");
+
+	/*
+	 * Clocks are disabled by default at least in accordance with the clk
+	 * enable counter value on init stage.
+	 */
+	if (!init) {
+		clk_bulk_disable_unprepare(DW_PCIE_NUM_CORE_CLKS, pci->core_clks);
+
+		clk_bulk_disable_unprepare(DW_PCIE_NUM_APP_CLKS, pci->app_clks);
+	}
+
+	/* The peripheral devices are unavailable anyway so reset them too */
+	gpiod_set_value_cansleep(pci->pe_rst, 1);
+
+	/* Make sure all the resets are settled */
+	msleep(BT1_PCIE_RST_DELAY_MS);
+}
+
+/*
+ * Implements the cold reset procedure in accordance with the reference manual
+ * and available PM signals.
+ */
+static int bt1_pcie_cold_start_bus(struct bt1_pcie *btpci)
+{
+	struct device *dev = btpci->dw.dev;
+	struct dw_pcie *pci = &btpci->dw;
+	u32 val;
+	int ret;
+
+	/* First get out of the Power/Hot reset state */
+	ret = reset_control_deassert(pci->core_rsts[DW_PCIE_PWR_RST].rstc);
+	if (ret) {
+		dev_err(dev, "Failed to deassert PHY reset\n");
+		return ret;
+	}
+
+	ret = reset_control_deassert(pci->core_rsts[DW_PCIE_HOT_RST].rstc);
+	if (ret) {
+		dev_err(dev, "Failed to deassert hot reset\n");
+		goto err_assert_pwr_rst;
+	}
+
+	/* Wait for the PM-core to stop requesting the PHY reset */
+	ret = regmap_read_poll_timeout(btpci->sys_regs, BT1_CCU_PCIE_RSTC, val,
+				       !(val & BT1_CCU_PCIE_REQ_PHY_RST),
+				       BT1_PCIE_REQ_DELAY_US, BT1_PCIE_REQ_TIMEOUT_US);
+	if (ret) {
+		dev_err(dev, "Timed out waiting for PM to stop PHY resetting\n");
+		goto err_assert_hot_rst;
+	}
+
+	ret = reset_control_deassert(pci->core_rsts[DW_PCIE_PHY_RST].rstc);
+	if (ret) {
+		dev_err(dev, "Failed to deassert PHY reset\n");
+		goto err_assert_hot_rst;
+	}
+
+	/* Clocks can be now enabled, but the ref one is crucial at this stage */
+	ret = clk_bulk_prepare_enable(DW_PCIE_NUM_APP_CLKS, pci->app_clks);
+	if (ret) {
+		dev_err(dev, "Failed to enable app clocks\n");
+		goto err_assert_phy_rst;
+	}
+
+	ret = clk_bulk_prepare_enable(DW_PCIE_NUM_CORE_CLKS, pci->core_clks);
+	if (ret) {
+		dev_err(dev, "Failed to enable ref clocks\n");
+		goto err_disable_app_clk;
+	}
+
+	/* Wait for the PM to stop requesting the controller core reset */
+	ret = regmap_read_poll_timeout(btpci->sys_regs, BT1_CCU_PCIE_RSTC, val,
+				       !(val & BT1_CCU_PCIE_REQ_CORE_RST),
+				       BT1_PCIE_REQ_DELAY_US, BT1_PCIE_REQ_TIMEOUT_US);
+	if (ret) {
+		dev_err(dev, "Timed out waiting for PM to stop core resetting\n");
+		goto err_disable_core_clk;
+	}
+
+	/* PCS-PIPE interface and controller core can be now activated */
+	ret = reset_control_deassert(pci->core_rsts[DW_PCIE_PIPE_RST].rstc);
+	if (ret) {
+		dev_err(dev, "Failed to deassert PIPE reset\n");
+		goto err_disable_core_clk;
+	}
+
+	ret = reset_control_deassert(pci->core_rsts[DW_PCIE_CORE_RST].rstc);
+	if (ret) {
+		dev_err(dev, "Failed to deassert core reset\n");
+		goto err_assert_pipe_rst;
+	}
+
+	/* It's recommended to reset the core and application logic together */
+	ret = reset_control_bulk_reset(DW_PCIE_NUM_APP_RSTS, pci->app_rsts);
+	if (ret) {
+		dev_err(dev, "Failed to reset app domain\n");
+		goto err_assert_core_rst;
+	}
+
+	/* Sticky/Non-sticky CSR flags can be now unreset too */
+	ret = reset_control_deassert(pci->core_rsts[DW_PCIE_STICKY_RST].rstc);
+	if (ret) {
+		dev_err(dev, "Failed to deassert sticky reset\n");
+		goto err_assert_core_rst;
+	}
+
+	ret = reset_control_deassert(pci->core_rsts[DW_PCIE_NON_STICKY_RST].rstc);
+	if (ret) {
+		dev_err(dev, "Failed to deassert non-sticky reset\n");
+		goto err_assert_sticky_rst;
+	}
+
+	/* Activate the PCIe bus peripheral devices */
+	gpiod_set_value_cansleep(pci->pe_rst, 0);
+
+	/* Make sure the state is settled (LTSSM is still disabled though) */
+	usleep_range(BT1_PCIE_RUN_DELAY_US, BT1_PCIE_RUN_DELAY_US + 100);
+
+	return 0;
+
+err_assert_sticky_rst:
+	reset_control_assert(pci->core_rsts[DW_PCIE_STICKY_RST].rstc);
+
+err_assert_core_rst:
+	reset_control_assert(pci->core_rsts[DW_PCIE_CORE_RST].rstc);
+
+err_assert_pipe_rst:
+	reset_control_assert(pci->core_rsts[DW_PCIE_PIPE_RST].rstc);
+
+err_disable_core_clk:
+	clk_bulk_disable_unprepare(DW_PCIE_NUM_CORE_CLKS, pci->core_clks);
+
+err_disable_app_clk:
+	clk_bulk_disable_unprepare(DW_PCIE_NUM_APP_CLKS, pci->app_clks);
+
+err_assert_phy_rst:
+	reset_control_assert(pci->core_rsts[DW_PCIE_PHY_RST].rstc);
+
+err_assert_hot_rst:
+	reset_control_assert(pci->core_rsts[DW_PCIE_HOT_RST].rstc);
+
+err_assert_pwr_rst:
+	reset_control_assert(pci->core_rsts[DW_PCIE_PWR_RST].rstc);
+
+	return ret;
+}
+
+static int bt1_pcie_host_init(struct dw_pcie_rp *pp)
+{
+	struct dw_pcie *pci = to_dw_pcie_from_pp(pp);
+	struct bt1_pcie *btpci = to_bt1_pcie(pci);
+	int ret;
+
+	ret = bt1_pcie_get_resources(btpci);
+	if (ret)
+		return ret;
+
+	bt1_pcie_full_stop_bus(btpci, true);
+
+	return bt1_pcie_cold_start_bus(btpci);
+}
+
+static void bt1_pcie_host_deinit(struct dw_pcie_rp *pp)
+{
+	struct dw_pcie *pci = to_dw_pcie_from_pp(pp);
+	struct bt1_pcie *btpci = to_bt1_pcie(pci);
+
+	bt1_pcie_full_stop_bus(btpci, false);
+}
+
+static const struct dw_pcie_host_ops bt1_pcie_host_ops = {
+	.host_init = bt1_pcie_host_init,
+	.host_deinit = bt1_pcie_host_deinit,
+};
+
+static struct bt1_pcie *bt1_pcie_create_data(struct platform_device *pdev)
+{
+	struct bt1_pcie *btpci;
+
+	btpci = devm_kzalloc(&pdev->dev, sizeof(*btpci), GFP_KERNEL);
+	if (!btpci)
+		return ERR_PTR(-ENOMEM);
+
+	btpci->pdev = pdev;
+
+	platform_set_drvdata(pdev, btpci);
+
+	return btpci;
+}
+
+static int bt1_pcie_add_port(struct bt1_pcie *btpci)
+{
+	struct device *dev = &btpci->pdev->dev;
+	int ret;
+
+	/*
+	 * DW PCIe Root Port controller is equipped with eDMA capable of
+	 * working with the 64-bit memory addresses.
+	 */
+	ret = dma_set_mask_and_coherent(dev, DMA_BIT_MASK(64));
+	if (ret)
+		return ret;
+
+	btpci->dw.version = DW_PCIE_VER_460A;
+	btpci->dw.dev = dev;
+	btpci->dw.ops = &bt1_pcie_ops;
+
+	btpci->dw.pp.num_vectors = MAX_MSI_IRQS;
+	btpci->dw.pp.ops = &bt1_pcie_host_ops;
+
+	dw_pcie_cap_set(&btpci->dw, REQ_RES);
+
+	ret = dw_pcie_host_init(&btpci->dw.pp);
+	if (ret)
+		dev_err_probe(dev, ret, "Failed to initialize DWC PCIe host\n");
+
+	return ret;
+}
+
+static void bt1_pcie_del_port(struct bt1_pcie *btpci)
+{
+	dw_pcie_host_deinit(&btpci->dw.pp);
+}
+
+static int bt1_pcie_probe(struct platform_device *pdev)
+{
+	struct bt1_pcie *btpci;
+
+	btpci = bt1_pcie_create_data(pdev);
+	if (IS_ERR(btpci))
+		return PTR_ERR(btpci);
+
+	return bt1_pcie_add_port(btpci);
+}
+
+static int bt1_pcie_remove(struct platform_device *pdev)
+{
+	struct bt1_pcie *btpci = platform_get_drvdata(pdev);
+
+	bt1_pcie_del_port(btpci);
+
+	return 0;
+}
+
+static const struct of_device_id bt1_pcie_of_match[] = {
+	{ .compatible = "baikal,bt1-pcie" },
+	{},
+};
+MODULE_DEVICE_TABLE(of, bt1_pcie_of_match);
+
+static struct platform_driver bt1_pcie_driver = {
+	.probe = bt1_pcie_probe,
+	.remove = bt1_pcie_remove,
+	.driver = {
+		.name	= "bt1-pcie",
+		.of_match_table = bt1_pcie_of_match,
+	},
+};
+module_platform_driver(bt1_pcie_driver);
+
+MODULE_AUTHOR("Serge Semin <Sergey.Semin@baikalelectronics.ru>");
+MODULE_DESCRIPTION("Baikal-T1 PCIe driver");
+MODULE_LICENSE("GPL");