
[RFC,v2,2/2] soc: renesas: Add L2 cache management for RZ/Five SoC

Message ID 20221003223222.448551-3-prabhakar.mahadev-lad.rj@bp.renesas.com (mailing list archive)
State Superseded
Delegated to: Geert Uytterhoeven
Series AX45MP: Add support to non-coherent DMA

Commit Message

Lad, Prabhakar Oct. 3, 2022, 10:32 p.m. UTC
From: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>

On the AX45MP core, cache coherency is a specification option, so it may
not be supported. In this case DMA will fail. As a workaround, we first
allocate a global DMA coherent pool from which DMA allocations are taken
and marked as non-cacheable + bufferable using the PMA region as specified
in the device tree. Synchronization callbacks are implemented to
synchronize when doing DMA transactions.

The Andes AX45MP core has a Programmable Physical Memory Attributes (PMA)
block that allows dynamic adjustment of memory attributes at runtime.
It contains a configurable number of PMA entries implemented as CSR
registers to control the attributes of memory locations of interest.

Below are the memory attributes supported:
* Device, Non-bufferable
* Device, Bufferable
* Memory, Non-cacheable, Non-bufferable
* Memory, Non-cacheable, Bufferable
* Memory, Write-back, No-allocate
* Memory, Write-back, Read-allocate
* Memory, Write-back, Write-allocate
* Memory, Write-back, Read and Write-allocate

This patch adds support to configure the memory attributes of the memory
regions as passed from the l2 cache node and exposes the cache management
ops. Currently the OpenSBI code implements support for "Memory,
Non-cacheable, Non-bufferable" option with SBI_EXT_ANDES_SET_PMA.

More info about PMA (section 10.3):
http://www.andestech.com/wp-content/uploads/AX45MP-1C-Rev.-5.0.0-Datasheet.pdf

This feature is based on the work posted [0] by Vincent Chen
<vincentc@andestech.com> for the Andes AndeStart RISC-V CPU.

[0] https://lore.kernel.org/lkml/1540982130-28248-1-git-send-email-vincentc@andestech.com/

Signed-off-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
---
 arch/riscv/include/asm/cacheflush.h    |   8 +
 arch/riscv/include/asm/errata_list.h   |   2 +
 arch/riscv/include/asm/sbi.h           |   1 +
 arch/riscv/mm/dma-noncoherent.c        |  20 ++
 drivers/soc/renesas/Makefile           |   4 +
 drivers/soc/renesas/rzf/Makefile       |   3 +
 drivers/soc/renesas/rzf/ax45mp_cache.c | 365 +++++++++++++++++++++++++
 drivers/soc/renesas/rzf/rzf_sbi.h      |  27 ++
 8 files changed, 430 insertions(+)
 create mode 100644 drivers/soc/renesas/rzf/Makefile
 create mode 100644 drivers/soc/renesas/rzf/ax45mp_cache.c
 create mode 100644 drivers/soc/renesas/rzf/rzf_sbi.h

Comments

Conor Dooley Oct. 4, 2022, 5:42 p.m. UTC | #1
On Mon, Oct 03, 2022 at 11:32:22PM +0100, Prabhakar wrote:
> From: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
> 
> On the AX45MP core, cache coherency is a specification option so it may
> not be supported. In this case DMA will fail. As a workaround, firstly we
> allocate a global dma coherent pool from which DMA allocations are taken
> and marked as non-cacheable + bufferable using the PMA region as specified
> in the device tree. Synchronization callbacks are implemented to
> synchronize when doing DMA transactions.
> 
> The Andes AX45MP core has a Programmable Physical Memory Attributes (PMA)
> block that allows dynamic adjustment of memory attributes in the runtime.
> It contains a configurable amount of PMA entries implemented as CSR
> registers to control the attributes of memory locations in interest.
> 
> Below are the memory attributes supported:
> * Device, Non-bufferable
> * Device, bufferable
> * Memory, Non-cacheable, Non-bufferable
> * Memory, Non-cacheable, Bufferable
> * Memory, Write-back, No-allocate
> * Memory, Write-back, Read-allocate
> * Memory, Write-back, Write-allocate
> * Memory, Write-back, Read and Write-allocate
> 
> This patch adds support to configure the memory attributes of the memory
> regions as passed from the l2 cache node and exposes the cache management
> ops. Currently the OpenSBI code implements support for "Memory,
> Non-cacheable, Non-bufferable" option with SBI_EXT_ANDES_SET_PMA.
> 
> More info about PMA (section 10.3):
> http://www.andestech.com/wp-content/uploads/AX45MP-1C-Rev.-5.0.0-Datasheet.pdf
> 
> This feature is based on the work posted [0] by Vincent Chen
> <vincentc@andestech.com> for the Andes AndeStart RISC-V CPU.
> 
> [0] https://lore.kernel.org/lkml/1540982130-28248-1-git-send-email-vincentc@andestech.com/
> 
> Signed-off-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
> ---
>  arch/riscv/include/asm/cacheflush.h    |   8 +
>  arch/riscv/include/asm/errata_list.h   |   2 +
>  arch/riscv/include/asm/sbi.h           |   1 +
>  arch/riscv/mm/dma-noncoherent.c        |  20 ++

Stupid question maybe, but I assume you mixed the driver addition and
the changes to arch/riscv for the sake of easily creating the RFC?

>  drivers/soc/renesas/Makefile           |   4 +
>  drivers/soc/renesas/rzf/Makefile       |   3 +
>  drivers/soc/renesas/rzf/ax45mp_cache.c | 365 +++++++++++++++++++++++++
>  drivers/soc/renesas/rzf/rzf_sbi.h      |  27 ++
>  8 files changed, 430 insertions(+)
>  create mode 100644 drivers/soc/renesas/rzf/Makefile
>  create mode 100644 drivers/soc/renesas/rzf/ax45mp_cache.c
>  create mode 100644 drivers/soc/renesas/rzf/rzf_sbi.h
> 

I won't make any comments on the ALTERNATIVES usage & leave that to the
likes of Heiko rather than make a fool of myself! But to my untrained
eye, having to use #defines looks like you've strayed pretty far from
the light.. My understanding was that the whole point was to avoid
having any ifdef-ery!

> diff --git a/arch/riscv/include/asm/sbi.h b/arch/riscv/include/asm/sbi.h
> index 2a0ef738695e..10a7c855d125 100644
> --- a/arch/riscv/include/asm/sbi.h
> +++ b/arch/riscv/include/asm/sbi.h
> @@ -37,6 +37,7 @@ enum sbi_ext_id {
> 
>         /* Vendor extensions must lie within this range */
>         SBI_EXT_VENDOR_START = 0x09000000,
> +       SBI_EXT_ANDES = 0x0900031E,
>         SBI_EXT_VENDOR_END = 0x09FFFFFF,
>  };

Hmm, does this belong there? It certainly makes the comment look a
little odd! /If/ it goes into this file, I think it should be in a
separate section "heading" - but could it not be put into rzf_sbi.h?
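
To illustrate the second option, here is a minimal user-space sketch of
keeping the ID next to the other Andes definitions. It assumes the vendor
range values shown in the hunk above; the plain #defines and the
compile-time check are only for illustration and are not what the patch
does.

```c
#include <assert.h>

/* Vendor range, values as in the enum sbi_ext_id hunk quoted above */
#define SBI_EXT_VENDOR_START	0x09000000
#define SBI_EXT_VENDOR_END	0x09FFFFFF

/* Value taken from this patch, kept local to the driver header here */
#define SBI_EXT_ANDES		0x0900031E

/* Compile-time check that the ID still lies in the vendor range */
_Static_assert(SBI_EXT_ANDES >= SBI_EXT_VENDOR_START &&
	       SBI_EXT_ANDES <= SBI_EXT_VENDOR_END,
	       "SBI_EXT_ANDES must lie within the vendor extension range");
```

With the check in the header, a wrong ID would fail the build rather than
be found at runtime.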

> diff --git a/drivers/soc/renesas/rzf/ax45mp_cache.c b/drivers/soc/renesas/rzf/ax45mp_cache.c
> new file mode 100644
> index 000000000000..6eca32aef33e
> --- /dev/null
> +++ b/drivers/soc/renesas/rzf/ax45mp_cache.c
> @@ -0,0 +1,365 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * PMA setup and non-coherent cache functions for AX45MP
> + *

Given your comment in the commit message, should this also be carrying a
copyright from Andestech?

> + * Copyright (C) 2022 Renesas Electronics Corp.
> + */
> +
> +#include <linux/cacheinfo.h>
> +#include <linux/of_address.h>
> +

> +static void __iomem *l2c_base;
> +
> +/* -----------------------------------------------------------------------------

I'll (mostly) keep my nose out of style for soc/renesas, but this /* ---
style looks unusual!

> + * PMA setup
> + */

> +static long sbi_set_pma(void *arg)
> +static void ax45mp_configure_pma_regions(struct device_node *np, int count)
> +static void cpu_dcache_inval_range(unsigned long start,
> +void rzfive_cpu_dma_inval_range(void *vaddr, size_t size)

There's a real mix of function name prefixes in here. sbi_ aside, is
there a reason you didn't just stick to ax45mp_foo()? Apologies if
I missed something that should've been obvious.

> +static void cpu_dcache_wb_range(unsigned long start,
> +				unsigned long end,
> +				int line_size)
> +{
> +	bool ucctl_ok = false;
> +	unsigned long pa;
> +	int mhartid = 0;
> +#ifdef CONFIG_SMP
> +	mhartid = smp_processor_id();
> +#endif

Won't this produce complaints if you compile with CONFIG_SMP set?

> +
> +	ucctl_ok = cpu_cache_controlable();
> +
> +	while (end > start) {
> +		if (ucctl_ok) {
> +			csr_write(CCTL_REG_UCCTLBEGINADDR_NUM, start);
> +			csr_write(CCTL_REG_UCCTLCOMMAND_NUM, CCTL_L1D_VA_WB);
> +		}
> +
> +		if (l2c_base && (cpu_l2c_ctl_status() & L2_CACHE_CTL_CEN_MASK)) {
> +			pa = virt_to_phys((void *)start);
> +			writel(pa, (void *)(l2c_base + L2C_REG_CN_ACC_OFFSET(mhartid)));
> +			writel(CCTL_L2_PA_WB, (void *)(l2c_base + L2C_REG_CN_CMD_OFFSET(mhartid)));
> +			while ((cpu_l2c_get_cctl_status() &
> +				CCTL_L2_STATUS_CN_MASK(mhartid)) != CCTL_L2_STATUS_IDLE)
> +				;
> +		}
> +
> +		start += line_size;
> +	}
> +}
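
For reference, a rough sketch of how the per-core offsets and the status
mask used in the loop above work out. The constants are copied from the
defines in ax45mp_cache.c in this patch; the helper names are hypothetical
and only stand in for the L2C_REG_CN_*_OFFSET() and
CCTL_L2_STATUS_CN_MASK() macros.

```c
#include <assert.h>
#include <stdint.h>

/* Constants copied from the defines in ax45mp_cache.c (this patch) */
#define L2C_REG_C0_CMD_OFFSET		0x40
#define L2C_REG_C0_ACC_OFFSET		0x48
#define L2C_REG_PER_CORE_OFFSET		0x10
#define CCTL_L2_STATUS_C0_MASK		0xf
#define CCTL_L2_STATUS_PER_CORE_OFFSET	4

/* Hypothetical helpers mirroring the L2C_REG_CN_* macros */
static inline uint32_t l2c_cmd_offset(unsigned int core)
{
	/* Each core has its own command register, 0x10 bytes apart */
	return L2C_REG_C0_CMD_OFFSET + core * L2C_REG_PER_CORE_OFFSET;
}

static inline uint32_t l2c_acc_offset(unsigned int core)
{
	/* Same stride for the access (address) register */
	return L2C_REG_C0_ACC_OFFSET + core * L2C_REG_PER_CORE_OFFSET;
}

static inline uint32_t l2c_status_mask(unsigned int core)
{
	/* Each core owns a 4-bit field in the shared status register */
	return (uint32_t)CCTL_L2_STATUS_C0_MASK <<
	       (core * CCTL_L2_STATUS_PER_CORE_OFFSET);
}
```

So core 1 would write its address to offset 0x58, its command to offset
0x50, and then poll bits 7:4 of the status register until they read idle.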

Thanks,
Conor.
Guo Ren Oct. 5, 2022, 1:28 a.m. UTC | #2
On Tue, Oct 4, 2022 at 6:32 AM Prabhakar <prabhakar.csengg@gmail.com> wrote:
>
> From: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
>
> On the AX45MP core, cache coherency is a specification option so it may
> not be supported. In this case DMA will fail. As a workaround, firstly we
> allocate a global dma coherent pool from which DMA allocations are taken
> and marked as non-cacheable + bufferable using the PMA region as specified
> in the device tree. Synchronization callbacks are implemented to
> synchronize when doing DMA transactions.
>
> The Andes AX45MP core has a Programmable Physical Memory Attributes (PMA)
> block that allows dynamic adjustment of memory attributes in the runtime.
> It contains a configurable amount of PMA entries implemented as CSR
> registers to control the attributes of memory locations in interest.
>
> Below are the memory attributes supported:
> * Device, Non-bufferable
> * Device, bufferable
> * Memory, Non-cacheable, Non-bufferable
> * Memory, Non-cacheable, Bufferable
> * Memory, Write-back, No-allocate
> * Memory, Write-back, Read-allocate
> * Memory, Write-back, Write-allocate
> * Memory, Write-back, Read and Write-allocate
It seems Svpbmt's PMA, IO, and NC types don't fit your requirements. Could
you give a mapping of these attributes onto the Svpbmt types, and point
out what you need that Svpbmt can't provide?

Here is the Linux DMA type to Svpbmt map:
PMA -> Normal
IO -> ioremap, pgprot_noncached
NC -> pgprot_writecombine

How about AX45MP?

>
> This patch adds support to configure the memory attributes of the memory
> regions as passed from the l2 cache node and exposes the cache management
> ops. Currently the OpenSBI code implements support for "Memory,
> Non-cacheable, Non-bufferable" option with SBI_EXT_ANDES_SET_PMA.
>
> More info about PMA (section 10.3):
> http://www.andestech.com/wp-content/uploads/AX45MP-1C-Rev.-5.0.0-Datasheet.pdf
>
> This feature is based on the work posted [0] by Vincent Chen
> <vincentc@andestech.com> for the Andes AndeStart RISC-V CPU.
>
> [0] https://lore.kernel.org/lkml/1540982130-28248-1-git-send-email-vincentc@andestech.com/
>
> Signed-off-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
> ---
>  arch/riscv/include/asm/cacheflush.h    |   8 +
>  arch/riscv/include/asm/errata_list.h   |   2 +
>  arch/riscv/include/asm/sbi.h           |   1 +
>  arch/riscv/mm/dma-noncoherent.c        |  20 ++
>  drivers/soc/renesas/Makefile           |   4 +
>  drivers/soc/renesas/rzf/Makefile       |   3 +
>  drivers/soc/renesas/rzf/ax45mp_cache.c | 365 +++++++++++++++++++++++++
>  drivers/soc/renesas/rzf/rzf_sbi.h      |  27 ++
>  8 files changed, 430 insertions(+)
>  create mode 100644 drivers/soc/renesas/rzf/Makefile
>  create mode 100644 drivers/soc/renesas/rzf/ax45mp_cache.c
>  create mode 100644 drivers/soc/renesas/rzf/rzf_sbi.h
>
> diff --git a/arch/riscv/include/asm/cacheflush.h b/arch/riscv/include/asm/cacheflush.h
> index 273ece6b622f..a7c03321afa0 100644
> --- a/arch/riscv/include/asm/cacheflush.h
> +++ b/arch/riscv/include/asm/cacheflush.h
> @@ -63,6 +63,14 @@ void riscv_noncoherent_supported(void);
>  #define SYS_RISCV_FLUSH_ICACHE_LOCAL 1UL
>  #define SYS_RISCV_FLUSH_ICACHE_ALL   (SYS_RISCV_FLUSH_ICACHE_LOCAL)
>
> +#ifdef CONFIG_ARCH_R9A07G043
> +void rzfive_cpu_dma_inval_range(void *vaddr, size_t end);
> +void rzfive_cpu_dma_wb_range(void *vaddr, size_t end);
> +
> +#define ALT_CMO_OP(_op, _start, _size, _cachesize)     \
> +                  _op(_start, _size);
> +#endif
> +
>  #include <asm-generic/cacheflush.h>
>
>  #endif /* _ASM_RISCV_CACHEFLUSH_H */
> diff --git a/arch/riscv/include/asm/errata_list.h b/arch/riscv/include/asm/errata_list.h
> index 19a771085781..d9cbf60c3b65 100644
> --- a/arch/riscv/include/asm/errata_list.h
> +++ b/arch/riscv/include/asm/errata_list.h
> @@ -89,6 +89,7 @@ asm volatile(ALTERNATIVE(                                             \
>  #define ALT_THEAD_PMA(_val)
>  #endif
>
> +#ifdef CONFIG_ERRATA_THEAD_CMO
>  /*
>   * dcache.ipa rs1 (invalidate, physical address)
>   * | 31 - 25 | 24 - 20 | 19 - 15 | 14 - 12 | 11 - 7 | 6 - 0 |
> @@ -143,5 +144,6 @@ asm volatile(ALTERNATIVE_2(                                         \
>         : "a0")
>
>  #endif /* __ASSEMBLY__ */
> +#endif
>
>  #endif
> diff --git a/arch/riscv/include/asm/sbi.h b/arch/riscv/include/asm/sbi.h
> index 2a0ef738695e..10a7c855d125 100644
> --- a/arch/riscv/include/asm/sbi.h
> +++ b/arch/riscv/include/asm/sbi.h
> @@ -37,6 +37,7 @@ enum sbi_ext_id {
>
>         /* Vendor extensions must lie within this range */
>         SBI_EXT_VENDOR_START = 0x09000000,
> +       SBI_EXT_ANDES = 0x0900031E,
>         SBI_EXT_VENDOR_END = 0x09FFFFFF,
>  };
>
> diff --git a/arch/riscv/mm/dma-noncoherent.c b/arch/riscv/mm/dma-noncoherent.c
> index e3f9bdf47c5f..576601f180ea 100644
> --- a/arch/riscv/mm/dma-noncoherent.c
> +++ b/arch/riscv/mm/dma-noncoherent.c
> @@ -22,13 +22,25 @@ void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
>
>         switch (dir) {
>         case DMA_TO_DEVICE:
> +#ifdef CONFIG_ERRATA_THEAD_CMO
>                 ALT_CMO_OP(clean, vaddr, size, riscv_cbom_block_size);
> +#elif CONFIG_ARCH_R9A07G043
> +               ALT_CMO_OP(rzfive_cpu_dma_wb_range, vaddr, size, 0x0);
> +#endif
>                 break;
>         case DMA_FROM_DEVICE:
> +#ifdef CONFIG_ERRATA_THEAD_CMO
>                 ALT_CMO_OP(clean, vaddr, size, riscv_cbom_block_size);
> +#elif CONFIG_ARCH_R9A07G043
> +               ALT_CMO_OP(rzfive_cpu_dma_inval_range, vaddr, size, 0x0);
> +#endif
>                 break;
>         case DMA_BIDIRECTIONAL:
> +#ifdef CONFIG_ERRATA_THEAD_CMO
>                 ALT_CMO_OP(flush, vaddr, size, riscv_cbom_block_size);
> +#elif CONFIG_ARCH_R9A07G043
> +               ALT_CMO_OP(rzfive_cpu_dma_wb_range, vaddr, size, 0x0);
> +#endif
>                 break;
>         default:
>                 break;
> @@ -45,7 +57,11 @@ void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
>                 break;
>         case DMA_FROM_DEVICE:
>         case DMA_BIDIRECTIONAL:
> +#ifdef CONFIG_ERRATA_THEAD_CMO
>                 ALT_CMO_OP(flush, vaddr, size, riscv_cbom_block_size);
> +#elif CONFIG_ARCH_R9A07G043
> +               ALT_CMO_OP(rzfive_cpu_dma_inval_range, vaddr, size, 0x0);
> +#endif
>                 break;
>         default:
>                 break;
> @@ -54,14 +70,17 @@ void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
>
>  void arch_dma_prep_coherent(struct page *page, size_t size)
>  {
> +#ifdef CONFIG_ERRATA_THEAD_CMO
>         void *flush_addr = page_address(page);
>
>         ALT_CMO_OP(flush, flush_addr, size, riscv_cbom_block_size);
> +#endif
>  }
>
>  void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
>                 const struct iommu_ops *iommu, bool coherent)
>  {
> +#ifdef CONFIG_ERRATA_THEAD_CMO
>         WARN_TAINT(!coherent && riscv_cbom_block_size > ARCH_DMA_MINALIGN,
>                    TAINT_CPU_OUT_OF_SPEC,
>                    "%s %s: ARCH_DMA_MINALIGN smaller than riscv,cbom-block-size (%d < %d)",
> @@ -73,6 +92,7 @@ void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
>                    dev_driver_string(dev), dev_name(dev));
>
>         dev->dma_coherent = coherent;
> +#endif
>  }
>
>  #ifdef CONFIG_RISCV_ISA_ZICBOM
> diff --git a/drivers/soc/renesas/Makefile b/drivers/soc/renesas/Makefile
> index 535868c9c7e4..a20cc7ad5b12 100644
> --- a/drivers/soc/renesas/Makefile
> +++ b/drivers/soc/renesas/Makefile
> @@ -31,6 +31,10 @@ ifdef CONFIG_SMP
>  obj-$(CONFIG_ARCH_R9A06G032)   += r9a06g032-smp.o
>  endif
>
> +ifdef CONFIG_RISCV
> +obj-y += rzf/
> +endif
> +
>  # Family
>  obj-$(CONFIG_RST_RCAR)         += rcar-rst.o
>  obj-$(CONFIG_SYSC_RCAR)                += rcar-sysc.o
> diff --git a/drivers/soc/renesas/rzf/Makefile b/drivers/soc/renesas/rzf/Makefile
> new file mode 100644
> index 000000000000..e397ba2c733f
> --- /dev/null
> +++ b/drivers/soc/renesas/rzf/Makefile
> @@ -0,0 +1,3 @@
> +# SPDX-License-Identifier: GPL-2.0
> +
> +obj-$(CONFIG_ARCH_R9A07G043) += ax45mp_cache.o
> diff --git a/drivers/soc/renesas/rzf/ax45mp_cache.c b/drivers/soc/renesas/rzf/ax45mp_cache.c
> new file mode 100644
> index 000000000000..6eca32aef33e
> --- /dev/null
> +++ b/drivers/soc/renesas/rzf/ax45mp_cache.c
> @@ -0,0 +1,365 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * PMA setup and non-coherent cache functions for AX45MP
> + *
> + * Copyright (C) 2022 Renesas Electronics Corp.
> + */
> +
> +#include <linux/cacheinfo.h>
> +#include <linux/of_address.h>
> +
> +#include <asm/sbi.h>
> +
> +#include "rzf_sbi.h"
> +
> +/* D-cache operation */
> +#define CCTL_L1D_VA_INVAL              0
> +#define CCTL_L1D_VA_WB                 1
> +
> +/* L2 cache */
> +#define L2_CACHE_CTL_CEN_MASK          1
> +
> +/* L2 cache registers */
> +#define L2C_REG_CTL_OFFSET             0x8
> +#define L2C_REG_C0_CMD_OFFSET          0x40
> +#define L2C_REG_C0_ACC_OFFSET          0x48
> +#define L2C_REG_STATUS_OFFSET          0x80
> +
> +/* L2 CCTL status */
> +#define CCTL_L2_STATUS_IDLE            0
> +
> +/* L2 CCTL status cores mask */
> +#define CCTL_L2_STATUS_C0_MASK         0xf
> +
> +/* L2 cache operation */
> +#define CCTL_L2_PA_INVAL               0x8
> +#define CCTL_L2_PA_WB                  0x9
> +
> +#define L2C_HPM_PER_CORE_OFFSET                0x8
> +#define L2C_REG_PER_CORE_OFFSET                0x10
> +#define CCTL_L2_STATUS_PER_CORE_OFFSET 4
> +
> +#define L2C_REG_CN_CMD_OFFSET(n)       \
> +       (L2C_REG_C0_CMD_OFFSET + ((n) * L2C_REG_PER_CORE_OFFSET))
> +#define L2C_REG_CN_ACC_OFFSET(n)       \
> +       (L2C_REG_C0_ACC_OFFSET + ((n) * L2C_REG_PER_CORE_OFFSET))
> +#define CCTL_L2_STATUS_CN_MASK(n)      \
> +       (CCTL_L2_STATUS_C0_MASK << ((n) * CCTL_L2_STATUS_PER_CORE_OFFSET))
> +
> +#define MICM_CFG_ISZ_OFFSET            6
> +#define MICM_CFG_ISZ_MASK              (0x7  << MICM_CFG_ISZ_OFFSET)
> +
> +#define MDCM_CFG_DSZ_OFFSET            6
> +#define MDCM_CFG_DSZ_MASK              (0x7  << MDCM_CFG_DSZ_OFFSET)
> +
> +#define CCTL_REG_UCCTLBEGINADDR_NUM    0x80b
> +#define CCTL_REG_UCCTLCOMMAND_NUM      0x80c
> +
> +#define MCACHE_CTL_CCTL_SUEN_OFFSET    8
> +#define MMSC_CFG_CCTLCSR_OFFSET                16
> +#define MISA_20_OFFSET                 20
> +
> +#define MCACHE_CTL_CCTL_SUEN_MASK      (0x1 << MCACHE_CTL_CCTL_SUEN_OFFSET)
> +#define MMSC_CFG_CCTLCSR_MASK          (0x1 << MMSC_CFG_CCTLCSR_OFFSET)
> +#define MISA_20_MASK                   (0x1 << MISA_20_OFFSET)
> +
> +#define MAX_CACHE_LINE_SIZE    256
> +
> +#define ANDES_AX45MP_MAX_PMA_REGIONS   16
> +
> +struct pma_arg_t {
> +       phys_addr_t offset;
> +       unsigned long vaddr;
> +       size_t size;
> +       size_t entry_id;
> +};
> +
> +struct ax45mp_cache_info {
> +       bool init_done;
> +       int dcache_line_size;
> +};
> +
> +static DEFINE_PER_CPU(struct ax45mp_cache_info, cpu_cache_info) = {
> +       .init_done = 0,
> +       .dcache_line_size = SZ_64,
> +};
> +
> +static void __iomem *l2c_base;
> +
> +/* -----------------------------------------------------------------------------
> + * PMA setup
> + */
> +static long sbi_set_pma(void *arg)
> +{
> +       phys_addr_t offset = ((struct pma_arg_t *)arg)->offset;
> +       unsigned long vaddr = ((struct pma_arg_t *)arg)->vaddr;
> +       size_t entry_id = ((struct pma_arg_t *)arg)->entry_id;
> +       size_t size = ((struct pma_arg_t *)arg)->size;
> +       struct sbiret ret;
> +
> +       ret = sbi_ecall(SBI_EXT_ANDES, SBI_EXT_ANDES_SET_PMA, offset, vaddr, size, entry_id, 0, 0);
> +
> +       return ret.value;
> +}
> +
> +static unsigned long cpu_nocache_area_set(unsigned long start,
> +                                         unsigned long size,
> +                                         unsigned long entry_id)
> +{
> +       struct pma_arg_t pma_arg;
> +       unsigned long ret = 0;
> +
> +       pma_arg.offset = start;
> +       pma_arg.size = size;
> +       pma_arg.vaddr = start + size;
> +       pma_arg.entry_id = entry_id;
> +       ret = sbi_set_pma(&pma_arg);
> +
> +       return ret;
> +}
> +
> +static void ax45mp_configure_pma_regions(struct device_node *np, int count)
> +{
> +       u64 start, size;
> +       unsigned int i;
> +
> +       for (i = 0 ; i < count ; i++) {
> +               of_property_read_u64_index(np, "pma-regions", (i << 1), &start);
> +               of_property_read_u64_index(np, "pma-regions", (i << 1) + 1, &size);
> +               cpu_nocache_area_set(start, size, i);
> +       }
> +}
> +
> +/* -----------------------------------------------------------------------------
> + * L2 Cache operations
> + */
> +static uint32_t cpu_get_mcache_ctl_status(void)
> +{
> +       struct sbiret ret;
> +
> +       ret = sbi_ecall(SBI_EXT_ANDES, SBI_EXT_ANDES_GET_MCACHE_CTL_STATUS, 0, 0, 0, 0, 0, 0);
> +       return ret.value;
> +}
> +
> +static uint32_t cpu_get_micm_cfg_status(void)
> +{
> +       struct sbiret ret;
> +
> +       ret = sbi_ecall(SBI_EXT_ANDES, SBI_EXT_ANDES_GET_MICM_CTL_STATUS, 0, 0, 0, 0, 0, 0);
> +       return ret.value;
> +}
> +
> +static uint32_t cpu_get_mdcm_cfg_status(void)
> +{
> +       struct sbiret ret;
> +
> +       ret = sbi_ecall(SBI_EXT_ANDES, SBI_EXT_ANDES_GET_MDCM_CTL_STATUS, 0, 0, 0, 0, 0, 0);
> +       return ret.value;
> +}
> +
> +static uint32_t cpu_get_mmsc_cfg_status(void)
> +{
> +       struct sbiret ret;
> +
> +       ret = sbi_ecall(SBI_EXT_ANDES, SBI_EXT_ANDES_GET_MMSC_CTL_STATUS, 0, 0, 0, 0, 0, 0);
> +       return ret.value;
> +}
> +
> +static uint32_t cpu_get_misa_cfg_status(void)
> +{
> +       struct sbiret ret;
> +
> +       ret = sbi_ecall(SBI_EXT_ANDES, SBI_EXT_ANDES_GET_MISA_CTL_STATUS, 0, 0, 0, 0, 0, 0);
> +       return ret.value;
> +}
> +
> +static void fill_cpu_cache_info(struct ax45mp_cache_info *cpu_ci)
> +{
> +       struct cpu_cacheinfo *this_cpu_ci =
> +                       get_cpu_cacheinfo(smp_processor_id());
> +       struct cacheinfo *this_leaf = this_cpu_ci->info_list;
> +       unsigned int i;
> +
> +       for (i = 0; i < this_cpu_ci->num_leaves ; i++, this_leaf++) {
> +               if (this_leaf->type == CACHE_TYPE_DATA)
> +                       cpu_ci->dcache_line_size = this_leaf->coherency_line_size;
> +       }
> +
> +       cpu_ci->init_done = true;
> +}
> +
> +static inline int get_cache_line_size(void)
> +{
> +       struct ax45mp_cache_info *cpu_ci =
> +                       &per_cpu(cpu_cache_info, smp_processor_id());
> +
> +       if (unlikely(!cpu_ci->init_done))
> +               fill_cpu_cache_info(cpu_ci);
> +       return cpu_ci->dcache_line_size;
> +}
> +
> +static uint32_t cpu_l2c_get_cctl_status(void)
> +{
> +       return readl((void *)(l2c_base + L2C_REG_STATUS_OFFSET));
> +}
> +
> +static uint32_t cpu_l2c_ctl_status(void)
> +{
> +       return readl((void *)(l2c_base + L2C_REG_CTL_OFFSET));
> +}
> +
> +static bool cpu_cache_controlable(void)
> +{
> +       return (((cpu_get_micm_cfg_status() & MICM_CFG_ISZ_MASK) ||
> +                (cpu_get_mdcm_cfg_status() & MDCM_CFG_DSZ_MASK)) &&
> +               (cpu_get_misa_cfg_status() & MISA_20_MASK) &&
> +               (cpu_get_mmsc_cfg_status() & MMSC_CFG_CCTLCSR_MASK) &&
> +               (cpu_get_mcache_ctl_status() & MCACHE_CTL_CCTL_SUEN_MASK));
> +}
> +
> +static void cpu_dcache_wb_range(unsigned long start,
> +                               unsigned long end,
> +                               int line_size)
> +{
> +       bool ucctl_ok = false;
> +       unsigned long pa;
> +       int mhartid = 0;
> +#ifdef CONFIG_SMP
> +       mhartid = smp_processor_id();
> +#endif
> +
> +       ucctl_ok = cpu_cache_controlable();
> +
> +       while (end > start) {
> +               if (ucctl_ok) {
> +                       csr_write(CCTL_REG_UCCTLBEGINADDR_NUM, start);
> +                       csr_write(CCTL_REG_UCCTLCOMMAND_NUM, CCTL_L1D_VA_WB);
> +               }
> +
> +               if (l2c_base && (cpu_l2c_ctl_status() & L2_CACHE_CTL_CEN_MASK)) {
> +                       pa = virt_to_phys((void *)start);
> +                       writel(pa, (void *)(l2c_base + L2C_REG_CN_ACC_OFFSET(mhartid)));
> +                       writel(CCTL_L2_PA_WB, (void *)(l2c_base + L2C_REG_CN_CMD_OFFSET(mhartid)));
> +                       while ((cpu_l2c_get_cctl_status() &
> +                               CCTL_L2_STATUS_CN_MASK(mhartid)) != CCTL_L2_STATUS_IDLE)
> +                               ;
> +               }
> +
> +               start += line_size;
> +       }
> +}
> +
> +static void cpu_dcache_inval_range(unsigned long start,
> +                                  unsigned long end,
> +                                  int line_size)
> +{
> +       bool ucctl_ok = false;
> +       unsigned long pa;
> +       int mhartid = 0;
> +#ifdef CONFIG_SMP
> +       mhartid = smp_processor_id();
> +#endif
> +
> +       ucctl_ok = cpu_cache_controlable();
> +
> +       while (end > start) {
> +               if (ucctl_ok) {
> +                       csr_write(CCTL_REG_UCCTLBEGINADDR_NUM, start);
> +                       csr_write(CCTL_REG_UCCTLCOMMAND_NUM, CCTL_L1D_VA_INVAL);
> +               }
> +
> +               if (l2c_base && (cpu_l2c_ctl_status() & L2_CACHE_CTL_CEN_MASK)) {
> +                       pa = virt_to_phys((void *)start);
> +                       writel(pa, (void *)(l2c_base + L2C_REG_CN_ACC_OFFSET(mhartid)));
> +                       writel(CCTL_L2_PA_INVAL,
> +                              (void *)(l2c_base + L2C_REG_CN_CMD_OFFSET(mhartid)));
> +                       while ((cpu_l2c_get_cctl_status() &
> +                               CCTL_L2_STATUS_CN_MASK(mhartid)) != CCTL_L2_STATUS_IDLE)
> +                               ;
> +               }
> +
> +               start += line_size;
> +       }
> +}
> +
> +void rzfive_cpu_dma_inval_range(void *vaddr, size_t size)
> +{
> +       unsigned long line_size = get_cache_line_size();
> +       char cache_buf[2][MAX_CACHE_LINE_SIZE] = { 0 };
> +       unsigned long start = (unsigned long)vaddr;
> +       unsigned long end = start + size;
> +       unsigned long old_start = start;
> +       unsigned long old_end = end;
> +       unsigned long flags;
> +
> +       if (unlikely(start == end))
> +               return;
> +
> +       start = start & (~(line_size - 1));
> +       end = ((end + line_size - 1) & (~(line_size - 1)));
> +
> +       local_irq_save(flags);
> +       if (unlikely(start != old_start))
> +               memcpy(&cache_buf[0][0], (void *)start, line_size);
> +
> +       if (unlikely(end != old_end))
> +               memcpy(&cache_buf[1][0], (void *)(old_end & (~(line_size - 1))), line_size);
> +
> +       cpu_dcache_inval_range(start, end, line_size);
> +
> +       if (unlikely(start != old_start))
> +               memcpy((void *)start, &cache_buf[0][0], (old_start & (line_size - 1)));
> +
> +       if (unlikely(end != old_end))
> +               memcpy((void *)(old_end + 1),
> +                      &cache_buf[1][(old_end & (line_size - 1)) + 1],
> +                      end - old_end - 1);
> +
> +       local_irq_restore(flags);
> +}
> +EXPORT_SYMBOL(rzfive_cpu_dma_inval_range);
> +
> +void rzfive_cpu_dma_wb_range(void *vaddr, size_t size)
> +{
> +       unsigned long line_size = get_cache_line_size();
> +       unsigned long start = (unsigned long)vaddr;
> +       unsigned long end = start + size;
> +       unsigned long flags;
> +
> +       local_irq_save(flags);
> +       start = start & (~(line_size - 1));
> +       cpu_dcache_wb_range(start, end, line_size);
> +       local_irq_restore(flags);
> +}
> +EXPORT_SYMBOL(rzfive_cpu_dma_wb_range);
> +
> +static const struct of_device_id ax45mp_cache_ids[] = {
> +       { .compatible = "andestech,ax45mp-cache" },
> +       { /* sentinel */ }
> +};
> +
> +static int __init ax45mp_cache_init(void)
> +{
> +       struct device_node *np;
> +       int count;
> +
> +       np = of_find_matching_node(NULL, ax45mp_cache_ids);
> +       if (!np)
> +               return -ENODEV;
> +
> +       l2c_base = of_iomap(np, 0);
> +       if (!l2c_base)
> +               return -ENOMEM;
> +
> +       count = of_property_count_elems_of_size(np, "pma-regions",
> +                                               sizeof(u32) * 4);
> +       if (count > ANDES_AX45MP_MAX_PMA_REGIONS) {
> +               iounmap(l2c_base);
> +               return -EINVAL;
> +       }
> +
> +       ax45mp_configure_pma_regions(np, count);
> +
> +       return 0;
> +}
> +arch_initcall(ax45mp_cache_init);
> diff --git a/drivers/soc/renesas/rzf/rzf_sbi.h b/drivers/soc/renesas/rzf/rzf_sbi.h
> new file mode 100644
> index 000000000000..854fee667276
> --- /dev/null
> +++ b/drivers/soc/renesas/rzf/rzf_sbi.h
> @@ -0,0 +1,27 @@
> +/* SPDX-License-Identifier: GPL-2.0+ */
> +
> +#ifndef __ARCH_R9A07G043_SBI_H
> +#define __ARCH_R9A07G043_SBI_H
> +
> +enum sbi_ext_andes_fid {
> +       SBI_EXT_ANDES_GET_MCACHE_CTL_STATUS = 0,
> +       SBI_EXT_ANDES_GET_MMISC_CTL_STATUS,
> +       SBI_EXT_ANDES_SET_MCACHE_CTL,
> +       SBI_EXT_ANDES_SET_MMISC_CTL,
> +       SBI_EXT_ANDES_ICACHE_OP,
> +       SBI_EXT_ANDES_DCACHE_OP,
> +       SBI_EXT_ANDES_L1CACHE_I_PREFETCH,
> +       SBI_EXT_ANDES_L1CACHE_D_PREFETCH,
> +       SBI_EXT_ANDES_NON_BLOCKING_LOAD_STORE,
> +       SBI_EXT_ANDES_WRITE_AROUND,
> +       SBI_EXT_ANDES_SET_PMA,
> +       SBI_EXT_ANDES_FREE_PMA,
> +       SBI_EXT_ANDES_PROBE_PMA,
> +       SBI_EXT_ANDES_DCACHE_WBINVAL_ALL,
> +       SBI_EXT_ANDES_GET_MICM_CTL_STATUS,
> +       SBI_EXT_ANDES_GET_MDCM_CTL_STATUS,
> +       SBI_EXT_ANDES_GET_MMSC_CTL_STATUS,
> +       SBI_EXT_ANDES_GET_MISA_CTL_STATUS,
> +};
> +
> +#endif
> --
> 2.25.1
>
Lad, Prabhakar Oct. 5, 2022, 8:44 a.m. UTC | #3
Hi Conor,

Thank you for the review.

On Tue, Oct 4, 2022 at 6:43 PM Conor Dooley <conor@kernel.org> wrote:
>
> On Mon, Oct 03, 2022 at 11:32:22PM +0100, Prabhakar wrote:
> > From: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
> >
> > On the AX45MP core, cache coherency is a specification option so it may
> > not be supported. In this case DMA will fail. As a workaround, firstly we
> > allocate a global dma coherent pool from which DMA allocations are taken
> > and marked as non-cacheable + bufferable using the PMA region as specified
> > in the device tree. Synchronization callbacks are implemented to
> > synchronize when doing DMA transactions.
> >
> > The Andes AX45MP core has a Programmable Physical Memory Attributes (PMA)
> > block that allows dynamic adjustment of memory attributes in the runtime.
> > It contains a configurable amount of PMA entries implemented as CSR
> > registers to control the attributes of memory locations in interest.
> >
> > Below are the memory attributes supported:
> > * Device, Non-bufferable
> > * Device, bufferable
> > * Memory, Non-cacheable, Non-bufferable
> > * Memory, Non-cacheable, Bufferable
> > * Memory, Write-back, No-allocate
> > * Memory, Write-back, Read-allocate
> > * Memory, Write-back, Write-allocate
> > * Memory, Write-back, Read and Write-allocate
> >
> > This patch adds support to configure the memory attributes of the memory
> > regions as passed from the l2 cache node and exposes the cache management
> > ops. Currently the OpenSBI code implements support for "Memory,
> > Non-cacheable, Non-bufferable" option with SBI_EXT_ANDES_SET_PMA.
> >
> > More info about PMA (section 10.3):
> > http://www.andestech.com/wp-content/uploads/AX45MP-1C-Rev.-5.0.0-Datasheet.pdf
> >
> > This feature is based on the work posted [0] by Vincent Chen
> > <vincentc@andestech.com> for the Andes AndeStart RISC-V CPU.
> >
> > [0] https://lore.kernel.org/lkml/1540982130-28248-1-git-send-email-vincentc@andestech.com/
> >
> > Signed-off-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
> > ---
> >  arch/riscv/include/asm/cacheflush.h    |   8 +
> >  arch/riscv/include/asm/errata_list.h   |   2 +
> >  arch/riscv/include/asm/sbi.h           |   1 +
> >  arch/riscv/mm/dma-noncoherent.c        |  20 ++
>
> Stupid question maybe, but I assume you mixed the driver addition and
> the changes to arch/riscv for the sake of easily creating the RFC?
>
Indeed.

> >  drivers/soc/renesas/Makefile           |   4 +
> >  drivers/soc/renesas/rzf/Makefile       |   3 +
> >  drivers/soc/renesas/rzf/ax45mp_cache.c | 365 +++++++++++++++++++++++++
> >  drivers/soc/renesas/rzf/rzf_sbi.h      |  27 ++
> >  8 files changed, 430 insertions(+)
> >  create mode 100644 drivers/soc/renesas/rzf/Makefile
> >  create mode 100644 drivers/soc/renesas/rzf/ax45mp_cache.c
> >  create mode 100644 drivers/soc/renesas/rzf/rzf_sbi.h
> >
>
> I won't make any comments on the ALTERNATIVES usage & leave that to the
> likes of Heiko rather than make a fool of myself! But to my untrained
> eye, having to use #defines looks like you've strayed pretty far from
> the light.. My understanding was that the whole point was to avoid
> having any ifdef-ery!
>
Agreed. As mentioned in the cover letter, we need an approach where we
can detect things at runtime without degrading the system, and get rid
of the ifdef-ery! (Suggestions welcome :))

> > diff --git a/arch/riscv/include/asm/sbi.h b/arch/riscv/include/asm/sbi.h
> > index 2a0ef738695e..10a7c855d125 100644
> > --- a/arch/riscv/include/asm/sbi.h
> > +++ b/arch/riscv/include/asm/sbi.h
> > @@ -37,6 +37,7 @@ enum sbi_ext_id {
> >
> >         /* Vendor extensions must lie within this range */
> >         SBI_EXT_VENDOR_START = 0x09000000,
> > +       SBI_EXT_ANDES = 0x0900031E,
> >         SBI_EXT_VENDOR_END = 0x09FFFFFF,
> >  };
>
> Hmm, does this belong there? It certainly makes the comment look a
> little odd! /If/ it goes into this file, I think it should be in a
> separate section "heading" - but could it not be put into rzf_sbi.h?
>
It can be moved to rzf_sbi.h

> > diff --git a/drivers/soc/renesas/rzf/ax45mp_cache.c b/drivers/soc/renesas/rzf/ax45mp_cache.c
> > new file mode 100644
> > index 000000000000..6eca32aef33e
> > --- /dev/null
> > +++ b/drivers/soc/renesas/rzf/ax45mp_cache.c
> > @@ -0,0 +1,365 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/*
> > + * PMA setup and non-coherent cache functions for AX45MP
> > + *
>
> Given your comment in the commit message, should this also be carrying a
> copyright from Andestech?
>
I was in two minds, as the code has changed a lot compared to the
original patch series. If you insist, I can include it.

> > + * Copyright (C) 2022 Renesas Electronics Corp.
> > + */
> > +
> > +#include <linux/cacheinfo.h>
> > +#include <linux/of_address.h>
> > +
>
> > +static void __iomem *l2c_base;
> > +
> > +/* -----------------------------------------------------------------------------
>
> I'll (mostly) keep my nose out of style for soc/renesas, but this /* ---
> style looks unusual!
>
It's not the typical style we use in soc/renesas; I just wanted to
separate the functions out.

> > + * PMA setup
> > + */
>
> > +static long sbi_set_pma(void *arg)
> > +static void ax45mp_configure_pma_regions(struct device_node *np, int count)
> > +static void cpu_dcache_inval_range(unsigned long start,
> > +void rzfive_cpu_dma_inval_range(void *vaddr, size_t size)
>
> There's a real mix of function name prefixes in here, sbi_ aside is
> there a reason you didn't just stick to ax45mp_foo()? Apologies if
> I missed something that should've been obvious
>
Agreed, I will follow ax45mp_foo() approach.

> > +static void cpu_dcache_wb_range(unsigned long start,
> > +                             unsigned long end,
> > +                             int line_size)
> > +{
> > +     bool ucctl_ok = false;
> > +     unsigned long pa;
> > +     int mhartid = 0;
> > +#ifdef CONFIG_SMP
> > +     mhartid = smp_processor_id();
> > +#endif
>
> Won't this produce complaints from your compiler if you compile with
> CONFIG_SMP set?
>
No, I don't see a build issue with SMP enabled. Do you see any reason
why it should fail?

Cheers,
Prabhakar
Conor Dooley Oct. 5, 2022, 8:58 a.m. UTC | #4
On 5 October 2022 09:44:56 IST, "Lad, Prabhakar" <prabhakar.csengg@gmail.com> wrote:
>Hi Conor,
>
>Thank you for the review.
>
>On Tue, Oct 4, 2022 at 6:43 PM Conor Dooley <conor@kernel.org> wrote:

>> > +static void cpu_dcache_wb_range(unsigned long start,
>> > +                             unsigned long end,
>> > +                             int line_size)
>> > +{
>> > +     bool ucctl_ok = false;
>> > +     unsigned long pa;
>> > +     int mhartid = 0;
>> > +#ifdef CONFIG_SMP
>> > +     mhartid = smp_processor_id();
>> > +#endif
>>
>> Won't this produce complaints from your if you compile with CONFIG_SMP
>> set?
>>
>No I dont see a build issue with SMP enabled, do you see any reason
>why it should fail?

Not fail but complain about the unused variable.
Conor Dooley Oct. 5, 2022, 9:17 a.m. UTC | #5
On 05/10/2022 09:58, Conor Dooley wrote:
> On 5 October 2022 09:44:56 IST, "Lad, Prabhakar" <prabhakar.csengg@gmail.com> wrote:
>> Hi Conor,
>>
>> Thank you for the review.
>>
>> On Tue, Oct 4, 2022 at 6:43 PM Conor Dooley <conor@kernel.org> wrote:
> 
>>>> +static void cpu_dcache_wb_range(unsigned long start,
>>>> +                             unsigned long end,
>>>> +                             int line_size)
>>>> +{
>>>> +     bool ucctl_ok = false;
>>>> +     unsigned long pa;
>>>> +     int mhartid = 0;
>>>> +#ifdef CONFIG_SMP
>>>> +     mhartid = smp_processor_id();
>>>> +#endif
>>>
>>> Won't this produce complaints from your if you compile with CONFIG_SMP
>>> set?
>>>
>> No I dont see a build issue with SMP enabled, do you see any reason
>> why it should fail?
> 
> Not fail but complain about the unused variable.
> 

Sorry, not the unused variable, but the unused 0 that it was initialised with.
Arnd Bergmann Oct. 5, 2022, 9:57 a.m. UTC | #6
On Tue, Oct 4, 2022, at 7:42 PM, Conor Dooley wrote:
> On Mon, Oct 03, 2022 at 11:32:22PM +0100, Prabhakar wrote:
>> 
>> Signed-off-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
>> ---
>>  arch/riscv/include/asm/cacheflush.h    |   8 +
>>  arch/riscv/include/asm/errata_list.h   |   2 +
>>  arch/riscv/include/asm/sbi.h           |   1 +
>>  arch/riscv/mm/dma-noncoherent.c        |  20 ++
>
> Stupid question maybe, but I assume you mixed the driver addition and
> the changes to arch/riscv for the sake of easily creating the RFC?
>
>>  drivers/soc/renesas/Makefile           |   4 +
>>  drivers/soc/renesas/rzf/Makefile       |   3 +
>>  drivers/soc/renesas/rzf/ax45mp_cache.c | 365 +++++++++++++++++++++++++
>>  drivers/soc/renesas/rzf/rzf_sbi.h      |  27 ++

My feeling is that L2 cache behavior should live in arch/riscv
rather than drivers/soc/, since this is not specific to a SoC
family but rather the CPU core. I would also expect that the
actual implementation and DT binding can be shared with
non-renesas SoCs using the same CPU core.


    Arnd
Lad, Prabhakar Oct. 5, 2022, 10:14 a.m. UTC | #7
Hi Arnd,

On Wed, Oct 5, 2022 at 10:58 AM Arnd Bergmann <arnd@arndb.de> wrote:
>
> On Tue, Oct 4, 2022, at 7:42 PM, Conor Dooley wrote:
> > On Mon, Oct 03, 2022 at 11:32:22PM +0100, Prabhakar wrote:
> >>
> >> Signed-off-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
> >> ---
> >>  arch/riscv/include/asm/cacheflush.h    |   8 +
> >>  arch/riscv/include/asm/errata_list.h   |   2 +
> >>  arch/riscv/include/asm/sbi.h           |   1 +
> >>  arch/riscv/mm/dma-noncoherent.c        |  20 ++
> >
> > Stupid question maybe, but I assume you mixed the driver addition and
> > the changes to arch/riscv for the sake of easily creating the RFC?
> >
> >>  drivers/soc/renesas/Makefile           |   4 +
> >>  drivers/soc/renesas/rzf/Makefile       |   3 +
> >>  drivers/soc/renesas/rzf/ax45mp_cache.c | 365 +++++++++++++++++++++++++
> >>  drivers/soc/renesas/rzf/rzf_sbi.h      |  27 ++
>
> My feeling is that L2 cache behavior should live in arch/riscv
> rather than drivers/soc/, since this is not specific to a SoC
> family but rather the CPU core. I would also expect that the
> actual implementation and DT binding can be shared with
> non-renesas SoCs using the same CPU core.
>
Totally agree, it is related to the CPU core and not the SoC. During
the BoF session it was agreed that code for unratified extensions
shouldn't go under arch/riscv. Since the code has vendor-specific SBI
calls, the RISC-V maintainers asked for it to be moved to SoC-specific
code so that maintenance of the code falls to the SoC vendors.

Cheers,
Prabhakar
Lad, Prabhakar Oct. 5, 2022, 10:20 a.m. UTC | #8
Hi Conor,

On Wed, Oct 5, 2022 at 10:17 AM <Conor.Dooley@microchip.com> wrote:
>
> On 05/10/2022 09:58, Conor Dooley wrote:
> > On 5 October 2022 09:44:56 IST, "Lad, Prabhakar" <prabhakar.csengg@gmail.com> wrote:
> >> Hi Conor,
> >>
> >> Thank you for the review.
> >>
> >> On Tue, Oct 4, 2022 at 6:43 PM Conor Dooley <conor@kernel.org> wrote:
> >
> >>>> +static void cpu_dcache_wb_range(unsigned long start,
> >>>> +                             unsigned long end,
> >>>> +                             int line_size)
> >>>> +{
> >>>> +     bool ucctl_ok = false;
> >>>> +     unsigned long pa;
> >>>> +     int mhartid = 0;
> >>>> +#ifdef CONFIG_SMP
> >>>> +     mhartid = smp_processor_id();
> >>>> +#endif
> >>>
> >>> Won't this produce complaints from your if you compile with CONFIG_SMP
> >>> set?
> >>>
> >> No I dont see a build issue with SMP enabled, do you see any reason
> >> why it should fail?
> >
> > Not fail but complain about the unused variable.
> >
>
> Not unused variable, sorry but the unused 0 that it was initialised with*

No, it doesn't complain (I don't think compilers complain about such
unused assignments, but maybe I'm wrong). BTW I am using GCC 9.4.0. Do you
think I need to update it?

Cheers,
Prabhakar
Conor Dooley Oct. 5, 2022, 10:29 a.m. UTC | #9
On Wed, Oct 05, 2022 at 11:20:40AM +0100, Lad, Prabhakar wrote:
> Hi Conor,
> 
> On Wed, Oct 5, 2022 at 10:17 AM <Conor.Dooley@microchip.com> wrote:
> >
> > On 05/10/2022 09:58, Conor Dooley wrote:
> > > On 5 October 2022 09:44:56 IST, "Lad, Prabhakar" <prabhakar.csengg@gmail.com> wrote:
> > >> Hi Conor,
> > >>
> > >> Thank you for the review.
> > >>
> > >> On Tue, Oct 4, 2022 at 6:43 PM Conor Dooley <conor@kernel.org> wrote:
> > >
> > >>>> +static void cpu_dcache_wb_range(unsigned long start,
> > >>>> +                             unsigned long end,
> > >>>> +                             int line_size)
> > >>>> +{
> > >>>> +     bool ucctl_ok = false;
> > >>>> +     unsigned long pa;
> > >>>> +     int mhartid = 0;
> > >>>> +#ifdef CONFIG_SMP
> > >>>> +     mhartid = smp_processor_id();
> > >>>> +#endif
> > >>>
> > >>> Won't this produce complaints from your if you compile with CONFIG_SMP
> > >>> set?
> > >>>
> > >> No I dont see a build issue with SMP enabled, do you see any reason
> > >> why it should fail?
> > >
> > > Not fail but complain about the unused variable.
> > >
> >
> > Not unused variable, sorry but the unused 0 that it was initialised with*
> 
> No, it doesn't complain (I dont think compilers complain of such
> unused assignments, maybe I'm wrong). BTW I am using GCC 9.4.0. Do you
> think I need to update it?

Maybe it's sparse that generates those warnings, I never know which it
is...
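
For reference, the pattern being discussed can be sketched outside the
kernel. This is only a minimal sketch, assuming that smp_processor_id()
evaluates to 0 on builds without CONFIG_SMP; if that holds, neither the
#ifdef nor the separate zero initialiser is needed. The names
fake_smp_processor_id() and current_hart() are hypothetical stand-ins and
not kernel API.

```c
/*
 * Userspace sketch only: fake_smp_processor_id() is a hypothetical
 * stand-in for the kernel's smp_processor_id().
 */
#ifdef CONFIG_SMP
static int fake_cpu = 2;		/* pretend we run on hart 2 */
#define fake_smp_processor_id()	(fake_cpu)
#else
#define fake_smp_processor_id()	0	/* UP build: always CPU 0 */
#endif

/* Single initialisation: no #ifdef at the call site and no dead store. */
static int current_hart(void)
{
	int mhartid = fake_smp_processor_id();

	return mhartid;
}
```

Built without CONFIG_SMP defined, current_hart() returns 0, which is the
value the original code stored in the initialiser.
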
Lad, Prabhakar Oct. 5, 2022, 12:53 p.m. UTC | #10
Hi Guo,

On Wed, Oct 5, 2022 at 2:29 AM Guo Ren <guoren@kernel.org> wrote:
>
> On Tue, Oct 4, 2022 at 6:32 AM Prabhakar <prabhakar.csengg@gmail.com> wrote:
> >
> > From: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
> >
> > On the AX45MP core, cache coherency is a specification option so it may
> > not be supported. In this case DMA will fail. As a workaround, firstly we
> > allocate a global dma coherent pool from which DMA allocations are taken
> > and marked as non-cacheable + bufferable using the PMA region as specified
> > in the device tree. Synchronization callbacks are implemented to
> > synchronize when doing DMA transactions.
> >
> > The Andes AX45MP core has a Programmable Physical Memory Attributes (PMA)
> > block that allows dynamic adjustment of memory attributes in the runtime.
> > It contains a configurable amount of PMA entries implemented as CSR
> > registers to control the attributes of memory locations in interest.
> >
> > Below are the memory attributes supported:
> > * Device, Non-bufferable
> > * Device, bufferable
> > * Memory, Non-cacheable, Non-bufferable
> > * Memory, Non-cacheable, Bufferable
> > * Memory, Write-back, No-allocate
> > * Memory, Write-back, Read-allocate
> > * Memory, Write-back, Write-allocate
> > * Memory, Write-back, Read and Write-allocate
> Seems Svpbmt's PMA, IO, and NC wouldn't fit your requirements, could
> give a map list of the types of Svpbmt? And give out what you needed,
> but Svpbmt can't.
>
Sorry, I didn't get what you meant here. Could you please elaborate?

> Here is the Linux dma type to Svpbmt map:
> PMA -> Normal
> IO -> ioremap, pgprot_noncached
> NC -> pgprot_writecombine
>
> How about AX45MP?
>
The Svpbmt extension is not supported on AX45MP (as reported by
riscv_isa_extension_available()).

Cheers,
Prabhakar
Guo Ren Oct. 5, 2022, 2:23 p.m. UTC | #11
On Wed, Oct 5, 2022 at 8:54 PM Lad, Prabhakar
<prabhakar.csengg@gmail.com> wrote:
>
> Hi Guo,
>
> On Wed, Oct 5, 2022 at 2:29 AM Guo Ren <guoren@kernel.org> wrote:
> >
> > On Tue, Oct 4, 2022 at 6:32 AM Prabhakar <prabhakar.csengg@gmail.com> wrote:
> > >
> > > From: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
> > >
> > > On the AX45MP core, cache coherency is a specification option so it may
> > > not be supported. In this case DMA will fail. As a workaround, firstly we
> > > allocate a global dma coherent pool from which DMA allocations are taken
> > > and marked as non-cacheable + bufferable using the PMA region as specified
> > > in the device tree. Synchronization callbacks are implemented to
> > > synchronize when doing DMA transactions.
> > >
> > > The Andes AX45MP core has a Programmable Physical Memory Attributes (PMA)
> > > block that allows dynamic adjustment of memory attributes in the runtime.
> > > It contains a configurable amount of PMA entries implemented as CSR
> > > registers to control the attributes of memory locations in interest.
> > >
> > > Below are the memory attributes supported:
> > > * Device, Non-bufferable
> > > * Device, bufferable
> > > * Memory, Non-cacheable, Non-bufferable
> > > * Memory, Non-cacheable, Bufferable
> > > * Memory, Write-back, No-allocate
> > > * Memory, Write-back, Read-allocate
> > > * Memory, Write-back, Write-allocate
> > > * Memory, Write-back, Read and Write-allocate
> > Seems Svpbmt's PMA, IO, and NC wouldn't fit your requirements, could
> > give a map list of the types of Svpbmt? And give out what you needed,
> > but Svpbmt can't.
> >
> Sorry I didn't get what you meant here, could you please elaborate.
I know there is no pbmt in AX45MP; I am just curious how many physical
memory attributes you would use in Linux. It seems only one type is used
in the series:
cpu_nocache_area_set -> sbi_ecall(SBI_EXT_ANDES,
SBI_EXT_ANDES_SET_PMA, offset, vaddr, size, entry_id, 0, 0);

I'm not sure how you make the DMA control descriptors of emmc/usb/gmac
work without pbmt when they have no cache coherency protocol. Do you
need to inject dma_sync for descriptor synchronization? What's the effect
of dynamic PMA in the patch series?

Thx.

>
> > Here is the Linux dma type to Svpbmt map:
> > PMA -> Normal
> > IO -> ioremap, pgprot_noncached
> > NC -> pgprot_writecombine
> >
> > How about AX45MP?
> >
> Svpbmt extension is not supported on AX45MP (reported by
> riscv_isa_extension_available())
>
> Cheers,
> Prabhakar
Lad, Prabhakar Oct. 5, 2022, 3:02 p.m. UTC | #12
Hi Guo,

On Wed, Oct 5, 2022 at 3:23 PM Guo Ren <guoren@kernel.org> wrote:
>
> On Wed, Oct 5, 2022 at 8:54 PM Lad, Prabhakar
> <prabhakar.csengg@gmail.com> wrote:
> >
> > Hi Guo,
> >
> > On Wed, Oct 5, 2022 at 2:29 AM Guo Ren <guoren@kernel.org> wrote:
> > >
> > > On Tue, Oct 4, 2022 at 6:32 AM Prabhakar <prabhakar.csengg@gmail.com> wrote:
> > > >
> > > > From: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
> > > >
> > > > On the AX45MP core, cache coherency is a specification option so it may
> > > > not be supported. In this case DMA will fail. As a workaround, firstly we
> > > > allocate a global dma coherent pool from which DMA allocations are taken
> > > > and marked as non-cacheable + bufferable using the PMA region as specified
> > > > in the device tree. Synchronization callbacks are implemented to
> > > > synchronize when doing DMA transactions.
> > > >
> > > > The Andes AX45MP core has a Programmable Physical Memory Attributes (PMA)
> > > > block that allows dynamic adjustment of memory attributes in the runtime.
> > > > It contains a configurable amount of PMA entries implemented as CSR
> > > > registers to control the attributes of memory locations in interest.
> > > >
> > > > Below are the memory attributes supported:
> > > > * Device, Non-bufferable
> > > > * Device, bufferable
> > > > * Memory, Non-cacheable, Non-bufferable
> > > > * Memory, Non-cacheable, Bufferable
> > > > * Memory, Write-back, No-allocate
> > > > * Memory, Write-back, Read-allocate
> > > > * Memory, Write-back, Write-allocate
> > > > * Memory, Write-back, Read and Write-allocate
> > > Seems Svpbmt's PMA, IO, and NC wouldn't fit your requirements, could
> > > give a map list of the types of Svpbmt? And give out what you needed,
> > > but Svpbmt can't.
> > >
> > Sorry I didn't get what you meant here, could you please elaborate.
> I know there is no pbmt in AX45MP, I am just curious how many physical
> memory attributes you would use in linux? It seems only one type used
> in the series:
> cpu_nocache_area_set -> sbi_ecall(SBI_EXT_ANDES,
> SBI_EXT_ANDES_SET_PMA, offset, vaddr, size, entry_id, 0, 0);
>
Yes, currently we only use "Memory, Non-cacheable, Bufferable". I was
wondering if we could pass these options as flags from DT, something
like below, so that it's not hard-coded in the driver.

/* PMA config */
#define AX45MP_PMACFG_ETYP                GENMASK(1, 0)
/* OFF: PMA entry is disabled */
#define AX45MP_PMACFG_ETYP_DISABLED            0
/* Naturally aligned power of 2 region */
#define AX45MP_PMACFG_ETYP_NAPOT            3

#define AX45MP_PMACFG_MTYP                GENMASK(5, 2)
/* Device, Non-bufferable */
#define AX45MP_PMACFG_MTYP_DEV_NON_BUF            (0 << 2)
/* Device, bufferable */
#define AX45MP_PMACFG_MTYP_DEV_BUF            (1 << 2)
/* Memory, Non-cacheable, Non-bufferable */
#define AX45MP_PMACFG_MTYP_MEM_NON_CACHE_NON_BUF    (2 << 2)
/* Memory, Non-cacheable, Bufferable */
#define AX45MP_PMACFG_MTYP_MEM_NON_CACHE_BUF        (3 << 2)
/* Memory, Write-back, No-allocate */
#define AX45MP_PMACFG_MTYP_MEM_WB_NA            (8 << 2)
/* Memory, Write-back, Read-allocate */
#define AX45MP_PMACFG_MTYP_MEM_WB_RA            (9 << 2)
/* Memory, Write-back, Write-allocate */
#define AX45MP_PMACFG_MTYP_MEM_WB_WA            (10 << 2)
/* Memory, Write-back, Read and Write-allocate */
#define AX45MP_PMACFG_MTYP_MEM_WB_R_WA            (11 << 2)

/* AMO instructions are supported */
#define AX45MP_PMACFG_NAMO_AMO_SUPPORT            (0 << 6)
/* AMO instructions are not supported */
#define AX45MP_PMACFG_NAMO_AMO_NO_SUPPORT        (1 << 6)


                pma-regions = <0x0 0x00000000 0x0 0x10000000 0x0
                               (AX45MP_PMACFG_ETYP_NAPOT |
                                AX45MP_PMACFG_MTYP_MEM_NON_CACHE_BUF |
                                AX45MP_PMACFG_NAMO_AMO_SUPPORT)>,
                              <0x0 0x10000000 0x0 0x04000000 0x0
                               (AX45MP_PMACFG_ETYP_NAPOT |
                                AX45MP_PMACFG_MTYP_MEM_NON_CACHE_BUF |
                                AX45MP_PMACFG_NAMO_AMO_SUPPORT)>,
                              <0x0 0x20000000 0x0 0x10000000 0x0
                               (AX45MP_PMACFG_ETYP_NAPOT |
                                AX45MP_PMACFG_MTYP_MEM_NON_CACHE_BUF |
                                AX45MP_PMACFG_NAMO_AMO_SUPPORT)>,
                              <0x0 0x58000000 0x0 0x08000000 0x0
                               (AX45MP_PMACFG_ETYP_NAPOT |
                                AX45MP_PMACFG_MTYP_MEM_NON_CACHE_BUF |
                                AX45MP_PMACFG_NAMO_AMO_SUPPORT)>;

Does the above sound good?
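
As a sanity check on the proposed flags, here is a minimal userspace
sketch of how the three fields combine into one config value. The
AX45MP_PMACFG_* values are copied from the defines above; the helper
pmacfg_compose() is hypothetical and only for illustration. With NAPOT,
"Memory, Non-cacheable, Bufferable" and AMO support, the result is 0xf,
the same literal used in the current pma-regions property.

```c
#include <stdint.h>

/* Field values copied from the proposed AX45MP_PMACFG_* defines. */
#define AX45MP_PMACFG_ETYP_NAPOT		3u
#define AX45MP_PMACFG_MTYP_MEM_NON_CACHE_BUF	(3u << 2)
#define AX45MP_PMACFG_NAMO_AMO_SUPPORT		(0u << 6)
#define AX45MP_PMACFG_NAMO_AMO_NO_SUPPORT	(1u << 6)

/*
 * Hypothetical helper: OR the entry type, memory type and AMO fields
 * into a single PMA config value.
 */
static uint32_t pmacfg_compose(uint32_t etyp, uint32_t mtyp, uint32_t namo)
{
	return etyp | mtyp | namo;
}
```

Passing AX45MP_PMACFG_NAMO_AMO_NO_SUPPORT instead would set bit 6 and
give 0x4f.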

> I'm not sure how you make emmc/usb/gmac's dma ctrl desc work around
> without pbmt when they don't have cache coherency protocol. Do you
> need to inject dma_sync for desc synchronization? What's the effect of
> dynamic PMA in the patch series?
>
Currently we have set up the PMA regions as below:

l2cache: cache-controller@13400000 {
                compatible = "andestech,ax45mp-cache", "cache";
                cache-size = <0x40000>;
                cache-line-size = <64>;
                cache-sets = <1024>;
                cache-unified;
                reg = <0x0 0x13400000 0x0 0x100000>;
                pma-regions = <0x0 0x00000000 0x0 0x10000000 0x0 0xf>,
                              <0x0 0x10000000 0x0 0x04000000 0x0 0xf>,
                              <0x0 0x20000000 0x0 0x10000000 0x0 0xf>,
                              <0x0 0x58000000 0x0 0x08000000 0x0 0xf>;
                interrupts = <SOC_PERIPHERAL_IRQ(476, IRQ_TYPE_LEVEL_HIGH)>;
        };

The last pma-regions entry, 0x58000000, is a DDR location. This memory
location is marked as a shared DMA pool with the below in DT:

    reserved-memory {
        #address-cells = <2>;
        #size-cells = <2>;
        ranges;

        reserved: linux,cma@58000000 {
            compatible = "shared-dma-pool";
            no-map;
            linux,dma-default;
            reg = <0x0 0x58000000 0x0 0x08000000>;
        };
    };

And for ARCH_R9A07G043 we automatically select DMA_GLOBAL_POOL, so the
IP blocks (emmc/usb/gmac) requesting DMA'able memory will
automatically fall into this region, which is non-cacheable but
bufferable (set in PMA), and everything else is taken care of by the
clean and flush callbacks. We don't have to inject dma_sync for
descriptor synchronization in the existing drivers (which are shared
with the Renesas RZ/G2L family).
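
The relationship between the two DT fragments can be checked with a
small userspace sketch. It assumes only the definition of NAPOT (size is
a power of two and the base is aligned to that size) and uses the region
values from the pma-regions property above. The struct and helper names
are hypothetical and not taken from the driver.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct pma_region {
	uint64_t base;
	uint64_t size;
};

/* Regions copied from the pma-regions property above. */
static const struct pma_region pma_regions[] = {
	{ 0x00000000, 0x10000000 },
	{ 0x10000000, 0x04000000 },
	{ 0x20000000, 0x10000000 },
	{ 0x58000000, 0x08000000 },
};

/* NAPOT: size is a non-zero power of two and base is aligned to size. */
static bool pma_region_is_napot(const struct pma_region *r)
{
	return r->size &&
	       !(r->size & (r->size - 1)) &&
	       !(r->base & (r->size - 1));
}

/* True if [base, base + size) lies entirely inside one PMA region. */
static bool pma_covers(uint64_t base, uint64_t size)
{
	size_t i;

	for (i = 0; i < sizeof(pma_regions) / sizeof(pma_regions[0]); i++) {
		const struct pma_region *r = &pma_regions[i];

		if (base >= r->base && base + size <= r->base + r->size)
			return true;
	}
	return false;
}
```

All four regions satisfy the NAPOT check, and the shared-dma-pool range
(base 0x58000000, size 0x08000000) is fully covered by the last entry.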

Cheers,
Prabhakar
Guo Ren Oct. 6, 2022, 12:59 a.m. UTC | #13
On Wed, Oct 5, 2022 at 11:03 PM Lad, Prabhakar
<prabhakar.csengg@gmail.com> wrote:
>
> Hi Guo,
>
> On Wed, Oct 5, 2022 at 3:23 PM Guo Ren <guoren@kernel.org> wrote:
> >
> > On Wed, Oct 5, 2022 at 8:54 PM Lad, Prabhakar
> > <prabhakar.csengg@gmail.com> wrote:
> > >
> > > Hi Guo,
> > >
> > > On Wed, Oct 5, 2022 at 2:29 AM Guo Ren <guoren@kernel.org> wrote:
> > > >
> > > > On Tue, Oct 4, 2022 at 6:32 AM Prabhakar <prabhakar.csengg@gmail.com> wrote:
> > > > >
> > > > > From: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
> > > > >
> > > > > On the AX45MP core, cache coherency is a specification option so it may
> > > > > not be supported. In this case DMA will fail. As a workaround, firstly we
> > > > > allocate a global dma coherent pool from which DMA allocations are taken
> > > > > and marked as non-cacheable + bufferable using the PMA region as specified
> > > > > in the device tree. Synchronization callbacks are implemented to
> > > > > synchronize when doing DMA transactions.
> > > > >
> > > > > The Andes AX45MP core has a Programmable Physical Memory Attributes (PMA)
> > > > > block that allows dynamic adjustment of memory attributes in the runtime.
> > > > > It contains a configurable amount of PMA entries implemented as CSR
> > > > > registers to control the attributes of memory locations in interest.
> > > > >
> > > > > Below are the memory attributes supported:
> > > > > * Device, Non-bufferable
> > > > > * Device, bufferable
> > > > > * Memory, Non-cacheable, Non-bufferable
> > > > > * Memory, Non-cacheable, Bufferable
> > > > > * Memory, Write-back, No-allocate
> > > > > * Memory, Write-back, Read-allocate
> > > > > * Memory, Write-back, Write-allocate
> > > > > * Memory, Write-back, Read and Write-allocate
> > > > Seems Svpbmt's PMA, IO, and NC wouldn't fit your requirements, could
> > > > give a map list of the types of Svpbmt? And give out what you needed,
> > > > but Svpbmt can't.
> > > >
> > > Sorry I didn't get what you meant here, could you please elaborate.
> > I know there is no pbmt in AX45MP, I am just curious how many physical
> > memory attributes you would use in linux? It seems only one type used
> > in the series:
> > cpu_nocache_area_set -> sbi_ecall(SBI_EXT_ANDES,
> > SBI_EXT_ANDES_SET_PMA, offset, vaddr, size, entry_id, 0, 0);
> >
> Yes, currently we only use "Memory, Non-cacheable, Bufferable". I was
> wondering if we could send these options as flags from DT something
> like below so that it's not hard coded in the code.
>
> /* PMA config */
> #define AX45MP_PMACFG_ETYP                GENMASK(1, 0)
> /* OFF: PMA entry is disabled */
> #define AX45MP_PMACFG_ETYP_DISABLED            0
> /* Naturally aligned power of 2 region */
> #define AX45MP_PMACFG_ETYP_NAPOT            3
>
> #define AX45MP_PMACFG_MTYP                GENMASK(5, 2)
> /* Device, Non-bufferable */
> #define AX45MP_PMACFG_MTYP_DEV_NON_BUF            (0 << 2)
> /* Device, bufferable */
> #define AX45MP_PMACFG_MTYP_DEV_BUF            (1 << 2)
> /* Memory, Non-cacheable, Non-bufferable */
> #define AX45MP_PMACFG_MTYP_MEM_NON_CACHE_NON_BUF    (2 << 2)
> /* Memory, Non-cacheable, Bufferable */
> #define AX45MP_PMACFG_MTYP_MEM_NON_CACHE_BUF        (3 << 2)
> /* Memory, Write-back, No-allocate */
> #define AX45MP_PMACFG_MTYP_MEM_WB_NA            (8 << 2)
> /* Memory, Write-back, Read-allocate */
> #define AX45MP_PMACFG_MTYP_MEM_WB_RA            (9 << 2)
> /* Memory, Write-back, Write-allocate */
> #define AX45MP_PMACFG_MTYP_MEM_WB_WA            (10 << 2)
> /* Memory, Write-back, Read and Write-allocate */
> #define AX45MP_PMACFG_MTYP_MEM_WB_R_WA            (11 << 2)
>
> /* AMO instructions are supported */
> #define AX45MP_PMACFG_NAMO_AMO_SUPPORT            (0 << 6)
> /* AMO instructions are not supported */
> #define AX45MP_PMACFG_NAMO_AMO_NO_SUPPORT        (1 << 6)
>
>
>                 pma-regions = <0x0 0x00000000 0x0 0x10000000 0x0
> AX45MP_PMACFG_ETYP_NAPOT |  AX45MP_PMACFG_MTYP_MEM_NON_CACHE_BUF |
> AX45MP_PMACFG_NAMO_AMO_SUPPORT>,
>                               <0x0 0x10000000 0x0 0x04000000 0x0
> AX45MP_PMACFG_ETYP_NAPOT |  AX45MP_PMACFG_MTYP_MEM_NON_CACHE_BUF |
> AX45MP_PMACFG_NAMO_AMO_SUPPORT >,
>                               <0x0 0x20000000 0x0 0x10000000 0x0
> AX45MP_PMACFG_ETYP_NAPOT |  AX45MP_PMACFG_MTYP_MEM_NON_CACHE_BUF |
> AX45MP_PMACFG_NAMO_AMO_SUPPORT>,
>                               <0x0 0x58000000 0x0 0x08000000 0x0
> AX45MP_PMACFG_ETYP_NAPOT |  AX45MP_PMACFG_MTYP_MEM_NON_CACHE_BUF |
> AX45MP_PMACFG_NAMO_AMO_SUPPORT>;
>
> Does the above sound good?
I've no idea. But as a workaround, I would give my Acked-by.

>
> > I'm not sure how you make emmc/usb/gmac's dma ctrl desc work around
> > without pbmt when they don't have cache coherency protocol. Do you
> > need to inject dma_sync for desc synchronization? What's the effect of
> > dynamic PMA in the patch series?
> >
> Currently we have setup the pma regions as below:
>
> l2cache: cache-controller@13400000 {
>                 compatible = "andestech,ax45mp-cache", "cache";
>                 cache-size = <0x40000>;
>                 cache-line-size = <64>;
>                 cache-sets = <1024>;
>                 cache-unified;
>                 reg = <0x0 0x13400000 0x0 0x100000>;
>                 pma-regions = <0x0 0x00000000 0x0 0x10000000 0x0 0xf>,
>                               <0x0 0x10000000 0x0 0x04000000 0x0 0xf>,
>                               <0x0 0x20000000 0x0 0x10000000 0x0 0xf>,
>                               <0x0 0x58000000 0x0 0x08000000 0x0 0xf>;
>                 interrupts = <SOC_PERIPHERAL_IRQ(476, IRQ_TYPE_LEVEL_HIGH)>;
>         };
>
> The last pma-regions entry 0x58000000 is a DDR location this memory
> locations is marked as shared DMA pool with below in DT,
>
>     reserved-memory {
>         #address-cells = <2>;
>         #size-cells = <2>;
>         ranges;
>
>         reserved: linux,cma@58000000 {
>             compatible = "shared-dma-pool";
>             no-map;
>             linux,dma-default;
>             reg = <0x0 0x58000000 0x0 0x08000000>;
>         };
>     };
>
> And for ARCH_R9A07G043 we automatically select DMA_GLOBAL_POOL, so the
> IP blocks (emmc/usb/gmac's) requesting DMA'able memory will
> automatically fall into this region which is non-cacheable but
> bufferable (set in PMA) and rest everything is taken care by clean and
> flush callbacks. We dont have  inject dma_sync for desc
> synchronization for existing drivers (which are shared with Renesas
> RZ/G2L family)
Better than I thought :). "Non-cacheable but bufferable" is "weakly
ordered" and also raises the bufferable signal of AXI transactions, right?
But some drivers assume the control descriptors are strongly ordered,
without bufferable, and don't put any mb() before/after IO control
operations.

>
> Cheers,
> Prabhakar
Lad, Prabhakar Oct. 6, 2022, 3:36 p.m. UTC | #14
Hi Guo,

On Thu, Oct 6, 2022 at 1:59 AM Guo Ren <guoren@kernel.org> wrote:
>
> On Wed, Oct 5, 2022 at 11:03 PM Lad, Prabhakar
> <prabhakar.csengg@gmail.com> wrote:
> >
> > Hi Guo,
> >
> > On Wed, Oct 5, 2022 at 3:23 PM Guo Ren <guoren@kernel.org> wrote:
> > >
> > > On Wed, Oct 5, 2022 at 8:54 PM Lad, Prabhakar
> > > <prabhakar.csengg@gmail.com> wrote:
> > > >
> > > > Hi Guo,
> > > >
> > > > On Wed, Oct 5, 2022 at 2:29 AM Guo Ren <guoren@kernel.org> wrote:
> > > > >
> > > > > On Tue, Oct 4, 2022 at 6:32 AM Prabhakar <prabhakar.csengg@gmail.com> wrote:
> > > > > >
> > > > > > From: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
> > > > > >
> > > > > > On the AX45MP core, cache coherency is a specification option so it may
> > > > > > not be supported. In this case DMA will fail. As a workaround, firstly we
> > > > > > allocate a global dma coherent pool from which DMA allocations are taken
> > > > > > and marked as non-cacheable + bufferable using the PMA region as specified
> > > > > > in the device tree. Synchronization callbacks are implemented to
> > > > > > synchronize when doing DMA transactions.
> > > > > >
> > > > > > The Andes AX45MP core has a Programmable Physical Memory Attributes (PMA)
> > > > > > block that allows dynamic adjustment of memory attributes in the runtime.
> > > > > > It contains a configurable amount of PMA entries implemented as CSR
> > > > > > registers to control the attributes of memory locations in interest.
> > > > > >
> > > > > > Below are the memory attributes supported:
> > > > > > * Device, Non-bufferable
> > > > > > * Device, bufferable
> > > > > > * Memory, Non-cacheable, Non-bufferable
> > > > > > * Memory, Non-cacheable, Bufferable
> > > > > > * Memory, Write-back, No-allocate
> > > > > > * Memory, Write-back, Read-allocate
> > > > > > * Memory, Write-back, Write-allocate
> > > > > > * Memory, Write-back, Read and Write-allocate
> > > > > Seems Svpbmt's PMA, IO, and NC wouldn't fit your requirements, could
> > > > > give a map list of the types of Svpbmt? And give out what you needed,
> > > > > but Svpbmt can't.
> > > > >
> > > > Sorry I didn't get what you meant here, could you please elaborate.
> > > I know there is no pbmt in AX45MP, I am just curious how many physical
> > > memory attributes you would use in linux? It seems only one type used
> > > in the series:
> > > cpu_nocache_area_set -> sbi_ecall(SBI_EXT_ANDES,
> > > SBI_EXT_ANDES_SET_PMA, offset, vaddr, size, entry_id, 0, 0);
> > >
> > Yes, currently we only use "Memory, Non-cacheable, Bufferable". I was
> > wondering if we could send these options as flags from DT something
> > like below so that it's not hard coded in the code.
> >
> > /* PMA config */
> > #define AX45MP_PMACFG_ETYP                GENMASK(1, 0)
> > /* OFF: PMA entry is disabled */
> > #define AX45MP_PMACFG_ETYP_DISABLED            0
> > /* Naturally aligned power of 2 region */
> > #define AX45MP_PMACFG_ETYP_NAPOT            3
> >
> > #define AX45MP_PMACFG_MTYP                GENMASK(5, 2)
> > /* Device, Non-bufferable */
> > #define AX45MP_PMACFG_MTYP_DEV_NON_BUF            (0 << 2)
> > /* Device, bufferable */
> > #define AX45MP_PMACFG_MTYP_DEV_BUF            (1 << 2)
> > /* Memory, Non-cacheable, Non-bufferable */
> > #define AX45MP_PMACFG_MTYP_MEM_NON_CACHE_NON_BUF    (2 << 2)
> > /* Memory, Non-cacheable, Bufferable */
> > #define AX45MP_PMACFG_MTYP_MEM_NON_CACHE_BUF        (3 << 2)
> > /* Memory, Write-back, No-allocate */
> > #define AX45MP_PMACFG_MTYP_MEM_WB_NA            (8 << 2)
> > /* Memory, Write-back, Read-allocate */
> > #define AX45MP_PMACFG_MTYP_MEM_WB_RA            (9 << 2)
> > /* Memory, Write-back, Write-allocate */
> > #define AX45MP_PMACFG_MTYP_MEM_WB_WA            (10 << 2)
> > /* Memory, Write-back, Read and Write-allocate */
> > #define AX45MP_PMACFG_MTYP_MEM_WB_R_WA            (11 << 2)
> >
> > /* AMO instructions are supported */
> > #define AX45MP_PMACFG_NAMO_AMO_SUPPORT            (0 << 6)
> > /* AMO instructions are not supported */
> > #define AX45MP_PMACFG_NAMO_AMO_NO_SUPPORT        (1 << 6)
> >
> >
> >                 pma-regions = <0x0 0x00000000 0x0 0x10000000 0x0
> > AX45MP_PMACFG_ETYP_NAPOT |  AX45MP_PMACFG_MTYP_MEM_NON_CACHE_BUF |
> > AX45MP_PMACFG_NAMO_AMO_SUPPORT>,
> >                               <0x0 0x10000000 0x0 0x04000000 0x0
> > AX45MP_PMACFG_ETYP_NAPOT |  AX45MP_PMACFG_MTYP_MEM_NON_CACHE_BUF |
> > AX45MP_PMACFG_NAMO_AMO_SUPPORT >,
> >                               <0x0 0x20000000 0x0 0x10000000 0x0
> > AX45MP_PMACFG_ETYP_NAPOT |  AX45MP_PMACFG_MTYP_MEM_NON_CACHE_BUF |
> > AX45MP_PMACFG_NAMO_AMO_SUPPORT>,
> >                               <0x0 0x58000000 0x0 0x08000000 0x0
> > AX45MP_PMACFG_ETYP_NAPOT |  AX45MP_PMACFG_MTYP_MEM_NON_CACHE_BUF |
> > AX45MP_PMACFG_NAMO_AMO_SUPPORT>;
> >
> > Does the above sound good?
> I've no idea. But for working around, I would give Acked-by.
>
> >
> > > I'm not sure how you make emmc/usb/gmac's dma ctrl desc work around
> > > without pbmt when they don't have cache coherency protocol. Do you
> > > need to inject dma_sync for desc synchronization? What's the effect of
> > > dynamic PMA in the patch series?
> > >
> > Currently we have setup the pma regions as below:
> >
> > l2cache: cache-controller@13400000 {
> >                 compatible = "andestech,ax45mp-cache", "cache";
> >                 cache-size = <0x40000>;
> >                 cache-line-size = <64>;
> >                 cache-sets = <1024>;
> >                 cache-unified;
> >                 reg = <0x0 0x13400000 0x0 0x100000>;
> >                 pma-regions = <0x0 0x00000000 0x0 0x10000000 0x0 0xf>,
> >                               <0x0 0x10000000 0x0 0x04000000 0x0 0xf>,
> >                               <0x0 0x20000000 0x0 0x10000000 0x0 0xf>,
> >                               <0x0 0x58000000 0x0 0x08000000 0x0 0xf>;
> >                 interrupts = <SOC_PERIPHERAL_IRQ(476, IRQ_TYPE_LEVEL_HIGH)>;
> >         };
> >
> > The last pma-regions entry, 0x58000000, is a DDR location; this memory
> > region is marked as a shared DMA pool with the DT node below:
> >
> >     reserved-memory {
> >         #address-cells = <2>;
> >         #size-cells = <2>;
> >         ranges;
> >
> >         reserved: linux,cma@58000000 {
> >             compatible = "shared-dma-pool";
> >             no-map;
> >             linux,dma-default;
> >             reg = <0x0 0x58000000 0x0 0x08000000>;
> >         };
> >     };
> >
> > And for ARCH_R9A07G043 we automatically select DMA_GLOBAL_POOL, so the
> > IP blocks (emmc/usb/gmac) requesting DMA'able memory will
> > automatically fall into this region, which is non-cacheable but
> > bufferable (set in PMA), and everything else is taken care of by the
> > clean and flush callbacks. We don't have to inject dma_sync for desc
> > synchronization into existing drivers (which are shared with the Renesas
> > RZ/G2L family).
> Better than I thought :). The "non-cacheable but bufferable" is "weak
> order," also raising the bufferable signal of AXI transactions. Right?
I've asked the HW team regarding this to confirm.

> But some drivers think ctrl desc is strong order without bufferable
> and don't put any mb() before/after IO control operations.
>
So far, in our testing of the affected block drivers (dmac/emmc/usb/eth),
we have not seen any issues.

Cheers,
Prabhakar
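
As a side note on the pma-regions proposal quoted above: the hard-coded
flags cell 0xf in the current DT appears to be exactly the value the
proposed macros would produce. The following is a minimal sketch in plain
userspace C, assuming the bit layout from the quoted defines. pma_flags()
is a hypothetical helper used only for illustration; it is not part of the
patch.

```c
#include <assert.h>
#include <stdint.h>

/* Bit layout copied from the defines quoted in this thread. */
#define AX45MP_PMACFG_ETYP_NAPOT                  3u
#define AX45MP_PMACFG_MTYP_MEM_NON_CACHE_NON_BUF  (2u << 2)
#define AX45MP_PMACFG_MTYP_MEM_NON_CACHE_BUF      (3u << 2)
#define AX45MP_PMACFG_NAMO_AMO_SUPPORT            (0u << 6)

/* Hypothetical helper: OR together entry type, memory type and AMO bit. */
static uint32_t pma_flags(uint32_t etyp, uint32_t mtyp, uint32_t namo)
{
	return etyp | mtyp | namo;
}

/* NAPOT (3) | non-cacheable bufferable (3 << 2 = 0xc) | AMO ok (0) = 0xf */
static_assert((AX45MP_PMACFG_ETYP_NAPOT |
	       AX45MP_PMACFG_MTYP_MEM_NON_CACHE_BUF |
	       AX45MP_PMACFG_NAMO_AMO_SUPPORT) == 0xf,
	      "symbolic flags should match the literal 0xf used in the DT");
```

With the same layout, "Memory, Non-cacheable, Non-bufferable" with NAPOT
would come out as 0xb.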
Lad, Prabhakar Oct. 11, 2022, 9:38 a.m. UTC | #15
Hi Guo,

On Thu, Oct 6, 2022 at 1:59 AM Guo Ren <guoren@kernel.org> wrote:
>
> On Wed, Oct 5, 2022 at 11:03 PM Lad, Prabhakar
> <prabhakar.csengg@gmail.com> wrote:
> >
> > Hi Guo,
> >
> > On Wed, Oct 5, 2022 at 3:23 PM Guo Ren <guoren@kernel.org> wrote:
> > >
> > > On Wed, Oct 5, 2022 at 8:54 PM Lad, Prabhakar
> > > <prabhakar.csengg@gmail.com> wrote:
> > > >
> > > > Hi Guo,
> > > >
> > > > On Wed, Oct 5, 2022 at 2:29 AM Guo Ren <guoren@kernel.org> wrote:
> > > > >
> > > > > On Tue, Oct 4, 2022 at 6:32 AM Prabhakar <prabhakar.csengg@gmail.com> wrote:
> > > > > >
> > > > > > From: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
> > > > > >
> > > > > > On the AX45MP core, cache coherency is a specification option so it may
> > > > > > not be supported. In this case DMA will fail. As a workaround, firstly we
> > > > > > allocate a global dma coherent pool from which DMA allocations are taken
> > > > > > and marked as non-cacheable + bufferable using the PMA region as specified
> > > > > > in the device tree. Synchronization callbacks are implemented to
> > > > > > synchronize when doing DMA transactions.
> > > > > >
> > > > > > The Andes AX45MP core has a Programmable Physical Memory Attributes (PMA)
> > > > > > block that allows dynamic adjustment of memory attributes in the runtime.
> > > > > > It contains a configurable amount of PMA entries implemented as CSR
> > > > > > registers to control the attributes of memory locations in interest.
> > > > > >
> > > > > > Below are the memory attributes supported:
> > > > > > * Device, Non-bufferable
> > > > > > * Device, bufferable
> > > > > > * Memory, Non-cacheable, Non-bufferable
> > > > > > * Memory, Non-cacheable, Bufferable
> > > > > > * Memory, Write-back, No-allocate
> > > > > > * Memory, Write-back, Read-allocate
> > > > > > * Memory, Write-back, Write-allocate
> > > > > > * Memory, Write-back, Read and Write-allocate
> > > > > Seems Svpbmt's PMA, IO, and NC wouldn't fit your requirements, could
> > > > > give a map list of the types of Svpbmt? And give out what you needed,
> > > > > but Svpbmt can't.
> > > > >
> > > > Sorry I didn't get what you meant here, could you please elaborate.
> > > I know there is no pbmt in AX45MP, I am just curious how many physical
> > > memory attributes you would use in linux? It seems only one type used
> > > in the series:
> > > cpu_nocache_area_set -> sbi_ecall(SBI_EXT_ANDES,
> > > SBI_EXT_ANDES_SET_PMA, offset, vaddr, size, entry_id, 0, 0);
> > >
> > Yes, currently we only use "Memory, Non-cacheable, Bufferable". I was
> > wondering if we could send these options as flags from DT something
> > like below so that it's not hard coded in the code.
> >
> > /* PMA config */
> > #define AX45MP_PMACFG_ETYP                GENMASK(1, 0)
> > /* OFF: PMA entry is disabled */
> > #define AX45MP_PMACFG_ETYP_DISABLED            0
> > /* Naturally aligned power of 2 region */
> > #define AX45MP_PMACFG_ETYP_NAPOT            3
> >
> > #define AX45MP_PMACFG_MTYP                GENMASK(5, 2)
> > /* Device, Non-bufferable */
> > #define AX45MP_PMACFG_MTYP_DEV_NON_BUF            (0 << 2)
> > /* Device, bufferable */
> > #define AX45MP_PMACFG_MTYP_DEV_BUF            (1 << 2)
> > /* Memory, Non-cacheable, Non-bufferable */
> > #define AX45MP_PMACFG_MTYP_MEM_NON_CACHE_NON_BUF    (2 << 2)
> > /* Memory, Non-cacheable, Bufferable */
> > #define AX45MP_PMACFG_MTYP_MEM_NON_CACHE_BUF        (3 << 2)
> > /* Memory, Write-back, No-allocate */
> > #define AX45MP_PMACFG_MTYP_MEM_WB_NA            (8 << 2)
> > /* Memory, Write-back, Read-allocate */
> > #define AX45MP_PMACFG_MTYP_MEM_WB_RA            (9 << 2)
> > /* Memory, Write-back, Write-allocate */
> > #define AX45MP_PMACFG_MTYP_MEM_WB_WA            (10 << 2)
> > /* Memory, Write-back, Read and Write-allocate */
> > #define AX45MP_PMACFG_MTYP_MEM_WB_R_WA            (11 << 2)
> >
> > /* AMO instructions are supported */
> > #define AX45MP_PMACFG_NAMO_AMO_SUPPORT            (0 << 6)
> > /* AMO instructions are not supported */
> > #define AX45MP_PMACFG_NAMO_AMO_NO_SUPPORT        (1 << 6)
> >
> >
> >                 pma-regions = <0x0 0x00000000 0x0 0x10000000 0x0
> > AX45MP_PMACFG_ETYP_NAPOT |  AX45MP_PMACFG_MTYP_MEM_NON_CACHE_BUF |
> > AX45MP_PMACFG_NAMO_AMO_SUPPORT>,
> >                               <0x0 0x10000000 0x0 0x04000000 0x0
> > AX45MP_PMACFG_ETYP_NAPOT |  AX45MP_PMACFG_MTYP_MEM_NON_CACHE_BUF |
> > AX45MP_PMACFG_NAMO_AMO_SUPPORT >,
> >                               <0x0 0x20000000 0x0 0x10000000 0x0
> > AX45MP_PMACFG_ETYP_NAPOT |  AX45MP_PMACFG_MTYP_MEM_NON_CACHE_BUF |
> > AX45MP_PMACFG_NAMO_AMO_SUPPORT>,
> >                               <0x0 0x58000000 0x0 0x08000000 0x0
> > AX45MP_PMACFG_ETYP_NAPOT |  AX45MP_PMACFG_MTYP_MEM_NON_CACHE_BUF |
> > AX45MP_PMACFG_NAMO_AMO_SUPPORT>;
> >
> > Does the above sound good?
> I've no idea. But for working around, I would give Acked-by.
>
> >
> > > I'm not sure how you make emmc/usb/gmac's dma ctrl desc work around
> > > without pbmt when they don't have cache coherency protocol. Do you
> > > need to inject dma_sync for desc synchronization? What's the effect of
> > > dynamic PMA in the patch series?
> > >
> > Currently we have setup the pma regions as below:
> >
> > l2cache: cache-controller@13400000 {
> >                 compatible = "andestech,ax45mp-cache", "cache";
> >                 cache-size = <0x40000>;
> >                 cache-line-size = <64>;
> >                 cache-sets = <1024>;
> >                 cache-unified;
> >                 reg = <0x0 0x13400000 0x0 0x100000>;
> >                 pma-regions = <0x0 0x00000000 0x0 0x10000000 0x0 0xf>,
> >                               <0x0 0x10000000 0x0 0x04000000 0x0 0xf>,
> >                               <0x0 0x20000000 0x0 0x10000000 0x0 0xf>,
> >                               <0x0 0x58000000 0x0 0x08000000 0x0 0xf>;
> >                 interrupts = <SOC_PERIPHERAL_IRQ(476, IRQ_TYPE_LEVEL_HIGH)>;
> >         };
> >
> > > The last pma-regions entry, 0x58000000, is a DDR location; this memory
> > > region is marked as a shared DMA pool with the DT node below:
> >
> >     reserved-memory {
> >         #address-cells = <2>;
> >         #size-cells = <2>;
> >         ranges;
> >
> >         reserved: linux,cma@58000000 {
> >             compatible = "shared-dma-pool";
> >             no-map;
> >             linux,dma-default;
> >             reg = <0x0 0x58000000 0x0 0x08000000>;
> >         };
> >     };
> >
> > > And for ARCH_R9A07G043 we automatically select DMA_GLOBAL_POOL, so the
> > > IP blocks (emmc/usb/gmac) requesting DMA'able memory will
> > > automatically fall into this region, which is non-cacheable but
> > > bufferable (set in PMA), and everything else is taken care of by the
> > > clean and flush callbacks. We don't have to inject dma_sync for desc
> > > synchronization into existing drivers (which are shared with the Renesas
> > > RZ/G2L family).
> Better than I thought :). The "non-cacheable but bufferable" is "weak
> order," also raising the bufferable signal of AXI transactions. Right?
Yes, I have confirmed with the HW team that it does raise the bufferable
signal of AXI transactions. With the drivers tested so far (ETH/USB/DMAC)
we haven't seen any issues.

Do you foresee any issues?

Cheers,
Prabhakar
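
For reference, every entry in the quoted pma-regions uses the NAPOT entry
type, which (per the quoted comment, "Naturally aligned power of 2 region")
implies the size is a power of two and the base is aligned to that size.
Below is a small sketch of that check in plain C. pma_region_is_napot() is
a hypothetical name, and any minimum region size the hardware may impose is
deliberately not modelled here.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns true if [base, base + size) is a
 * naturally aligned power-of-2 region, i.e. size is a non-zero power
 * of two and base is a multiple of size.
 */
static bool pma_region_is_napot(uint64_t base, uint64_t size)
{
	bool pow2 = size != 0 && (size & (size - 1)) == 0;

	return pow2 && (base & (size - 1)) == 0;
}
```

All four regions from the DT snippet, including the DMA pool at 0x58000000
with size 0x08000000, satisfy this check.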
Guo Ren Oct. 11, 2022, 1:10 p.m. UTC | #16
On Tue, Oct 11, 2022 at 5:39 PM Lad, Prabhakar
<prabhakar.csengg@gmail.com> wrote:
>
> Hi Guo,
>
> On Thu, Oct 6, 2022 at 1:59 AM Guo Ren <guoren@kernel.org> wrote:
> >
> > On Wed, Oct 5, 2022 at 11:03 PM Lad, Prabhakar
> > <prabhakar.csengg@gmail.com> wrote:
> > >
> > > Hi Guo,
> > >
> > > On Wed, Oct 5, 2022 at 3:23 PM Guo Ren <guoren@kernel.org> wrote:
> > > >
> > > > On Wed, Oct 5, 2022 at 8:54 PM Lad, Prabhakar
> > > > <prabhakar.csengg@gmail.com> wrote:
> > > > >
> > > > > Hi Guo,
> > > > >
> > > > > On Wed, Oct 5, 2022 at 2:29 AM Guo Ren <guoren@kernel.org> wrote:
> > > > > >
> > > > > > On Tue, Oct 4, 2022 at 6:32 AM Prabhakar <prabhakar.csengg@gmail.com> wrote:
> > > > > > >
> > > > > > > From: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
> > > > > > >
> > > > > > > On the AX45MP core, cache coherency is a specification option so it may
> > > > > > > not be supported. In this case DMA will fail. As a workaround, firstly we
> > > > > > > allocate a global dma coherent pool from which DMA allocations are taken
> > > > > > > and marked as non-cacheable + bufferable using the PMA region as specified
> > > > > > > in the device tree. Synchronization callbacks are implemented to
> > > > > > > synchronize when doing DMA transactions.
> > > > > > >
> > > > > > > The Andes AX45MP core has a Programmable Physical Memory Attributes (PMA)
> > > > > > > block that allows dynamic adjustment of memory attributes in the runtime.
> > > > > > > It contains a configurable amount of PMA entries implemented as CSR
> > > > > > > registers to control the attributes of memory locations in interest.
> > > > > > >
> > > > > > > Below are the memory attributes supported:
> > > > > > > * Device, Non-bufferable
> > > > > > > * Device, bufferable
> > > > > > > * Memory, Non-cacheable, Non-bufferable
> > > > > > > * Memory, Non-cacheable, Bufferable
> > > > > > > * Memory, Write-back, No-allocate
> > > > > > > * Memory, Write-back, Read-allocate
> > > > > > > * Memory, Write-back, Write-allocate
> > > > > > > * Memory, Write-back, Read and Write-allocate
> > > > > > Seems Svpbmt's PMA, IO, and NC wouldn't fit your requirements, could
> > > > > > give a map list of the types of Svpbmt? And give out what you needed,
> > > > > > but Svpbmt can't.
> > > > > >
> > > > > Sorry I didn't get what you meant here, could you please elaborate.
> > > > I know there is no pbmt in AX45MP, I am just curious how many physical
> > > > memory attributes you would use in linux? It seems only one type used
> > > > in the series:
> > > > cpu_nocache_area_set -> sbi_ecall(SBI_EXT_ANDES,
> > > > SBI_EXT_ANDES_SET_PMA, offset, vaddr, size, entry_id, 0, 0);
> > > >
> > > Yes, currently we only use "Memory, Non-cacheable, Bufferable". I was
> > > wondering if we could send these options as flags from DT something
> > > like below so that it's not hard coded in the code.
> > >
> > > /* PMA config */
> > > #define AX45MP_PMACFG_ETYP                GENMASK(1, 0)
> > > /* OFF: PMA entry is disabled */
> > > #define AX45MP_PMACFG_ETYP_DISABLED            0
> > > /* Naturally aligned power of 2 region */
> > > #define AX45MP_PMACFG_ETYP_NAPOT            3
> > >
> > > #define AX45MP_PMACFG_MTYP                GENMASK(5, 2)
> > > /* Device, Non-bufferable */
> > > #define AX45MP_PMACFG_MTYP_DEV_NON_BUF            (0 << 2)
> > > /* Device, bufferable */
> > > #define AX45MP_PMACFG_MTYP_DEV_BUF            (1 << 2)
> > > /* Memory, Non-cacheable, Non-bufferable */
> > > #define AX45MP_PMACFG_MTYP_MEM_NON_CACHE_NON_BUF    (2 << 2)
> > > /* Memory, Non-cacheable, Bufferable */
> > > #define AX45MP_PMACFG_MTYP_MEM_NON_CACHE_BUF        (3 << 2)
> > > /* Memory, Write-back, No-allocate */
> > > #define AX45MP_PMACFG_MTYP_MEM_WB_NA            (8 << 2)
> > > /* Memory, Write-back, Read-allocate */
> > > #define AX45MP_PMACFG_MTYP_MEM_WB_RA            (9 << 2)
> > > /* Memory, Write-back, Write-allocate */
> > > #define AX45MP_PMACFG_MTYP_MEM_WB_WA            (10 << 2)
> > > /* Memory, Write-back, Read and Write-allocate */
> > > #define AX45MP_PMACFG_MTYP_MEM_WB_R_WA            (11 << 2)
> > >
> > > /* AMO instructions are supported */
> > > #define AX45MP_PMACFG_NAMO_AMO_SUPPORT            (0 << 6)
> > > /* AMO instructions are not supported */
> > > #define AX45MP_PMACFG_NAMO_AMO_NO_SUPPORT        (1 << 6)
> > >
> > >
> > >                 pma-regions = <0x0 0x00000000 0x0 0x10000000 0x0
> > > AX45MP_PMACFG_ETYP_NAPOT |  AX45MP_PMACFG_MTYP_MEM_NON_CACHE_BUF |
> > > AX45MP_PMACFG_NAMO_AMO_SUPPORT>,
> > >                               <0x0 0x10000000 0x0 0x04000000 0x0
> > > AX45MP_PMACFG_ETYP_NAPOT |  AX45MP_PMACFG_MTYP_MEM_NON_CACHE_BUF |
> > > AX45MP_PMACFG_NAMO_AMO_SUPPORT >,
> > >                               <0x0 0x20000000 0x0 0x10000000 0x0
> > > AX45MP_PMACFG_ETYP_NAPOT |  AX45MP_PMACFG_MTYP_MEM_NON_CACHE_BUF |
> > > AX45MP_PMACFG_NAMO_AMO_SUPPORT>,
> > >                               <0x0 0x58000000 0x0 0x08000000 0x0
> > > AX45MP_PMACFG_ETYP_NAPOT |  AX45MP_PMACFG_MTYP_MEM_NON_CACHE_BUF |
> > > AX45MP_PMACFG_NAMO_AMO_SUPPORT>;
> > >
> > > Does the above sound good?
> > I've no idea. But for working around, I would give Acked-by.
> >
> > >
> > > > I'm not sure how you make emmc/usb/gmac's dma ctrl desc work around
> > > > without pbmt when they don't have cache coherency protocol. Do you
> > > > need to inject dma_sync for desc synchronization? What's the effect of
> > > > dynamic PMA in the patch series?
> > > >
> > > Currently we have setup the pma regions as below:
> > >
> > > l2cache: cache-controller@13400000 {
> > >                 compatible = "andestech,ax45mp-cache", "cache";
> > >                 cache-size = <0x40000>;
> > >                 cache-line-size = <64>;
> > >                 cache-sets = <1024>;
> > >                 cache-unified;
> > >                 reg = <0x0 0x13400000 0x0 0x100000>;
> > >                 pma-regions = <0x0 0x00000000 0x0 0x10000000 0x0 0xf>,
> > >                               <0x0 0x10000000 0x0 0x04000000 0x0 0xf>,
> > >                               <0x0 0x20000000 0x0 0x10000000 0x0 0xf>,
> > >                               <0x0 0x58000000 0x0 0x08000000 0x0 0xf>;
> > >                 interrupts = <SOC_PERIPHERAL_IRQ(476, IRQ_TYPE_LEVEL_HIGH)>;
> > >         };
> > >
> > > > The last pma-regions entry, 0x58000000, is a DDR location; this memory
> > > > region is marked as a shared DMA pool with the DT node below:
> > >
> > >     reserved-memory {
> > >         #address-cells = <2>;
> > >         #size-cells = <2>;
> > >         ranges;
> > >
> > >         reserved: linux,cma@58000000 {
> > >             compatible = "shared-dma-pool";
> > >             no-map;
> > >             linux,dma-default;
> > >             reg = <0x0 0x58000000 0x0 0x08000000>;
> > >         };
> > >     };
> > >
> > > > And for ARCH_R9A07G043 we automatically select DMA_GLOBAL_POOL, so the
> > > > IP blocks (emmc/usb/gmac) requesting DMA'able memory will
> > > > automatically fall into this region, which is non-cacheable but
> > > > bufferable (set in PMA), and everything else is taken care of by the
> > > > clean and flush callbacks. We don't have to inject dma_sync for desc
> > > > synchronization into existing drivers (which are shared with the Renesas
> > > > RZ/G2L family).
> > > Better than I thought :). The "non-cacheable but bufferable" is "weak
> > order," also raising the bufferable signal of AXI transactions. Right?
> > Yes, I have confirmed with the HW team that it does raise the bufferable
> > signal of AXI transactions. With the drivers tested so far (ETH/USB/DMAC)
> > we haven't seen any issues.
>
> Do you foresee any issues?
That depends on your interconnect design. Most simple interconnects
ignore the bufferable signal. Some NoC interconnects do buffer the
transactions, which means data can still be sitting in the interconnect
after the CPU store instruction has retired. If the CPU then kicks off
the DMA with an IO register write, the hardware may not guarantee the
ordering between the last data write and the register write that starts
the DMA, and the DMA may then miss the data.

This applies not only to the interconnect: "non-cacheable + weak order"
can also leave data in the CPU store buffer after the store instruction
has retired.
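
To illustrate the ordering concern described above, here is a minimal
userspace sketch of the "fill descriptor, barrier, ring doorbell" pattern.
Everything in it is an assumption made for illustration: struct demo_desc,
demo_wmb() and demo_kick_dma() are hypothetical names, the doorbell is an
ordinary variable standing in for a device register, and on non-RISC-V
builds a C11 fence is used only as a stand-in (it is not a substitute for a
real I/O barrier). In the kernel, a driver would normally rely on wmb() or
on the ordering provided by writel() rather than open-coding this.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical DMA descriptor placed in non-cacheable, bufferable memory. */
struct demo_desc {
	uint32_t addr;
	uint32_t len;
};

/* Stand-in write barrier: order earlier stores before later ones. */
static inline void demo_wmb(void)
{
#if defined(__riscv)
	/* Order prior memory/IO writes before later memory/IO writes. */
	__asm__ __volatile__("fence ow,ow" ::: "memory");
#else
	/* Assumption: portable placeholder so the sketch builds anywhere. */
	atomic_thread_fence(memory_order_seq_cst);
#endif
}

/*
 * Write the descriptor first, then issue the barrier, and only then
 * write the (simulated) doorbell register that starts the DMA.
 */
static void demo_kick_dma(volatile struct demo_desc *desc,
			  volatile uint32_t *doorbell,
			  uint32_t addr, uint32_t len)
{
	desc->addr = addr;
	desc->len = len;
	demo_wmb();
	*doorbell = 1;
}
```

Without the barrier between the descriptor stores and the doorbell store,
nothing in this sketch would prevent the doorbell write from becoming
visible to the device first, which is the failure mode described above.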

>
> Cheers,
> Prabhakar
Lad, Prabhakar Oct. 17, 2022, 9:39 a.m. UTC | #17
Hi Guo,

On Tue, Oct 11, 2022 at 2:10 PM Guo Ren <guoren@kernel.org> wrote:
>
> On Tue, Oct 11, 2022 at 5:39 PM Lad, Prabhakar
> <prabhakar.csengg@gmail.com> wrote:
> >
> > Hi Guo,
> >
> > On Thu, Oct 6, 2022 at 1:59 AM Guo Ren <guoren@kernel.org> wrote:
> > >
> > > On Wed, Oct 5, 2022 at 11:03 PM Lad, Prabhakar
> > > <prabhakar.csengg@gmail.com> wrote:
> > > >
> > > > Hi Guo,
> > > >
> > > > On Wed, Oct 5, 2022 at 3:23 PM Guo Ren <guoren@kernel.org> wrote:
> > > > >
> > > > > On Wed, Oct 5, 2022 at 8:54 PM Lad, Prabhakar
> > > > > <prabhakar.csengg@gmail.com> wrote:
> > > > > >
> > > > > > Hi Guo,
> > > > > >
> > > > > > On Wed, Oct 5, 2022 at 2:29 AM Guo Ren <guoren@kernel.org> wrote:
> > > > > > >
> > > > > > > On Tue, Oct 4, 2022 at 6:32 AM Prabhakar <prabhakar.csengg@gmail.com> wrote:
> > > > > > > >
> > > > > > > > From: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
> > > > > > > >
> > > > > > > > On the AX45MP core, cache coherency is a specification option so it may
> > > > > > > > not be supported. In this case DMA will fail. As a workaround, firstly we
> > > > > > > > allocate a global dma coherent pool from which DMA allocations are taken
> > > > > > > > and marked as non-cacheable + bufferable using the PMA region as specified
> > > > > > > > in the device tree. Synchronization callbacks are implemented to
> > > > > > > > synchronize when doing DMA transactions.
> > > > > > > >
> > > > > > > > The Andes AX45MP core has a Programmable Physical Memory Attributes (PMA)
> > > > > > > > block that allows dynamic adjustment of memory attributes in the runtime.
> > > > > > > > It contains a configurable amount of PMA entries implemented as CSR
> > > > > > > > registers to control the attributes of memory locations in interest.
> > > > > > > >
> > > > > > > > Below are the memory attributes supported:
> > > > > > > > * Device, Non-bufferable
> > > > > > > > * Device, bufferable
> > > > > > > > * Memory, Non-cacheable, Non-bufferable
> > > > > > > > * Memory, Non-cacheable, Bufferable
> > > > > > > > * Memory, Write-back, No-allocate
> > > > > > > > * Memory, Write-back, Read-allocate
> > > > > > > > * Memory, Write-back, Write-allocate
> > > > > > > > * Memory, Write-back, Read and Write-allocate
> > > > > > > Seems Svpbmt's PMA, IO, and NC wouldn't fit your requirements, could
> > > > > > > give a map list of the types of Svpbmt? And give out what you needed,
> > > > > > > but Svpbmt can't.
> > > > > > >
> > > > > > Sorry I didn't get what you meant here, could you please elaborate.
> > > > > I know there is no pbmt in AX45MP, I am just curious how many physical
> > > > > memory attributes you would use in linux? It seems only one type used
> > > > > in the series:
> > > > > cpu_nocache_area_set -> sbi_ecall(SBI_EXT_ANDES,
> > > > > SBI_EXT_ANDES_SET_PMA, offset, vaddr, size, entry_id, 0, 0);
> > > > >
> > > > Yes, currently we only use "Memory, Non-cacheable, Bufferable". I was
> > > > wondering if we could send these options as flags from DT something
> > > > like below so that it's not hard coded in the code.
> > > >
> > > > /* PMA config */
> > > > #define AX45MP_PMACFG_ETYP                GENMASK(1, 0)
> > > > /* OFF: PMA entry is disabled */
> > > > #define AX45MP_PMACFG_ETYP_DISABLED            0
> > > > /* Naturally aligned power of 2 region */
> > > > #define AX45MP_PMACFG_ETYP_NAPOT            3
> > > >
> > > > #define AX45MP_PMACFG_MTYP                GENMASK(5, 2)
> > > > /* Device, Non-bufferable */
> > > > #define AX45MP_PMACFG_MTYP_DEV_NON_BUF            (0 << 2)
> > > > /* Device, bufferable */
> > > > #define AX45MP_PMACFG_MTYP_DEV_BUF            (1 << 2)
> > > > /* Memory, Non-cacheable, Non-bufferable */
> > > > #define AX45MP_PMACFG_MTYP_MEM_NON_CACHE_NON_BUF    (2 << 2)
> > > > /* Memory, Non-cacheable, Bufferable */
> > > > #define AX45MP_PMACFG_MTYP_MEM_NON_CACHE_BUF        (3 << 2)
> > > > /* Memory, Write-back, No-allocate */
> > > > #define AX45MP_PMACFG_MTYP_MEM_WB_NA            (8 << 2)
> > > > /* Memory, Write-back, Read-allocate */
> > > > #define AX45MP_PMACFG_MTYP_MEM_WB_RA            (9 << 2)
> > > > /* Memory, Write-back, Write-allocate */
> > > > #define AX45MP_PMACFG_MTYP_MEM_WB_WA            (10 << 2)
> > > > /* Memory, Write-back, Read and Write-allocate */
> > > > #define AX45MP_PMACFG_MTYP_MEM_WB_R_WA            (11 << 2)
> > > >
> > > > /* AMO instructions are supported */
> > > > #define AX45MP_PMACFG_NAMO_AMO_SUPPORT            (0 << 6)
> > > > /* AMO instructions are not supported */
> > > > #define AX45MP_PMACFG_NAMO_AMO_NO_SUPPORT        (1 << 6)
> > > >
> > > >
> > > >                 pma-regions = <0x0 0x00000000 0x0 0x10000000 0x0
> > > > AX45MP_PMACFG_ETYP_NAPOT |  AX45MP_PMACFG_MTYP_MEM_NON_CACHE_BUF |
> > > > AX45MP_PMACFG_NAMO_AMO_SUPPORT>,
> > > >                               <0x0 0x10000000 0x0 0x04000000 0x0
> > > > AX45MP_PMACFG_ETYP_NAPOT |  AX45MP_PMACFG_MTYP_MEM_NON_CACHE_BUF |
> > > > AX45MP_PMACFG_NAMO_AMO_SUPPORT >,
> > > >                               <0x0 0x20000000 0x0 0x10000000 0x0
> > > > AX45MP_PMACFG_ETYP_NAPOT |  AX45MP_PMACFG_MTYP_MEM_NON_CACHE_BUF |
> > > > AX45MP_PMACFG_NAMO_AMO_SUPPORT>,
> > > >                               <0x0 0x58000000 0x0 0x08000000 0x0
> > > > AX45MP_PMACFG_ETYP_NAPOT |  AX45MP_PMACFG_MTYP_MEM_NON_CACHE_BUF |
> > > > AX45MP_PMACFG_NAMO_AMO_SUPPORT>;
> > > >
> > > > Does the above sound good?
> > > I've no idea. But for working around, I would give Acked-by.
> > >
> > > >
> > > > > I'm not sure how you make emmc/usb/gmac's dma ctrl desc work around
> > > > > without pbmt when they don't have cache coherency protocol. Do you
> > > > > need to inject dma_sync for desc synchronization? What's the effect of
> > > > > dynamic PMA in the patch series?
> > > > >
> > > > Currently we have setup the pma regions as below:
> > > >
> > > > l2cache: cache-controller@13400000 {
> > > >                 compatible = "andestech,ax45mp-cache", "cache";
> > > >                 cache-size = <0x40000>;
> > > >                 cache-line-size = <64>;
> > > >                 cache-sets = <1024>;
> > > >                 cache-unified;
> > > >                 reg = <0x0 0x13400000 0x0 0x100000>;
> > > >                 pma-regions = <0x0 0x00000000 0x0 0x10000000 0x0 0xf>,
> > > >                               <0x0 0x10000000 0x0 0x04000000 0x0 0xf>,
> > > >                               <0x0 0x20000000 0x0 0x10000000 0x0 0xf>,
> > > >                               <0x0 0x58000000 0x0 0x08000000 0x0 0xf>;
> > > >                 interrupts = <SOC_PERIPHERAL_IRQ(476, IRQ_TYPE_LEVEL_HIGH)>;
> > > >         };
> > > >
> > > > > The last pma-regions entry, 0x58000000, is a DDR location; this memory
> > > > > region is marked as a shared DMA pool with the DT node below:
> > > >
> > > >     reserved-memory {
> > > >         #address-cells = <2>;
> > > >         #size-cells = <2>;
> > > >         ranges;
> > > >
> > > >         reserved: linux,cma@58000000 {
> > > >             compatible = "shared-dma-pool";
> > > >             no-map;
> > > >             linux,dma-default;
> > > >             reg = <0x0 0x58000000 0x0 0x08000000>;
> > > >         };
> > > >     };
> > > >
> > > > > And for ARCH_R9A07G043 we automatically select DMA_GLOBAL_POOL, so the
> > > > > IP blocks (emmc/usb/gmac) requesting DMA'able memory will
> > > > > automatically fall into this region, which is non-cacheable but
> > > > > bufferable (set in PMA), and everything else is taken care of by the
> > > > > clean and flush callbacks. We don't have to inject dma_sync for desc
> > > > > synchronization into existing drivers (which are shared with the Renesas
> > > > > RZ/G2L family).
> > > > Better than I thought :). The "non-cacheable but bufferable" is "weak
> > > order," also raising the bufferable signal of AXI transactions. Right?
> > > Yes, I have confirmed with the HW team that it does raise the bufferable
> > > signal of AXI transactions. With the drivers tested so far (ETH/USB/DMAC)
> > > we haven't seen any issues.
> >
> > Do you foresee any issues?
> > That depends on your interconnect design. Most simple interconnects
> > ignore the bufferable signal. Some NoC interconnects do buffer the
> > transactions, which means data can still be sitting in the interconnect
> > after the CPU store instruction has retired. If the CPU then kicks off
> > the DMA with an IO register write, the hardware may not guarantee the
> > ordering between the last data write and the register write that starts
> > the DMA, and the DMA may then miss the data.
>
I haven't seen this issue. Maybe, to avoid it, the controller register
space could be marked as non-cacheable + non-bufferable in the PMA; that
way we could ensure the ordering.

What do you think?

Cheers,
Prabhakar
Guo Ren Oct. 17, 2022, 12:36 p.m. UTC | #18
On Mon, Oct 17, 2022 at 5:40 PM Lad, Prabhakar
<prabhakar.csengg@gmail.com> wrote:
>
> Hi Guo,
>
> On Tue, Oct 11, 2022 at 2:10 PM Guo Ren <guoren@kernel.org> wrote:
> >
> > On Tue, Oct 11, 2022 at 5:39 PM Lad, Prabhakar
> > <prabhakar.csengg@gmail.com> wrote:
> > >
> > > Hi Guo,
> > >
> > > On Thu, Oct 6, 2022 at 1:59 AM Guo Ren <guoren@kernel.org> wrote:
> > > >
> > > > On Wed, Oct 5, 2022 at 11:03 PM Lad, Prabhakar
> > > > <prabhakar.csengg@gmail.com> wrote:
> > > > >
> > > > > Hi Guo,
> > > > >
> > > > > On Wed, Oct 5, 2022 at 3:23 PM Guo Ren <guoren@kernel.org> wrote:
> > > > > >
> > > > > > On Wed, Oct 5, 2022 at 8:54 PM Lad, Prabhakar
> > > > > > <prabhakar.csengg@gmail.com> wrote:
> > > > > > >
> > > > > > > Hi Guo,
> > > > > > >
> > > > > > > On Wed, Oct 5, 2022 at 2:29 AM Guo Ren <guoren@kernel.org> wrote:
> > > > > > > >
> > > > > > > > On Tue, Oct 4, 2022 at 6:32 AM Prabhakar <prabhakar.csengg@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > From: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
> > > > > > > > >
> > > > > > > > > On the AX45MP core, cache coherency is a specification option so it may
> > > > > > > > > not be supported. In this case DMA will fail. As a workaround, firstly we
> > > > > > > > > allocate a global dma coherent pool from which DMA allocations are taken
> > > > > > > > > and marked as non-cacheable + bufferable using the PMA region as specified
> > > > > > > > > in the device tree. Synchronization callbacks are implemented to
> > > > > > > > > synchronize when doing DMA transactions.
> > > > > > > > >
> > > > > > > > > The Andes AX45MP core has a Programmable Physical Memory Attributes (PMA)
> > > > > > > > > block that allows dynamic adjustment of memory attributes in the runtime.
> > > > > > > > > It contains a configurable amount of PMA entries implemented as CSR
> > > > > > > > > registers to control the attributes of memory locations in interest.
> > > > > > > > >
> > > > > > > > > Below are the memory attributes supported:
> > > > > > > > > * Device, Non-bufferable
> > > > > > > > > * Device, bufferable
> > > > > > > > > * Memory, Non-cacheable, Non-bufferable
> > > > > > > > > * Memory, Non-cacheable, Bufferable
> > > > > > > > > * Memory, Write-back, No-allocate
> > > > > > > > > * Memory, Write-back, Read-allocate
> > > > > > > > > * Memory, Write-back, Write-allocate
> > > > > > > > > * Memory, Write-back, Read and Write-allocate
> > > > > > > > Seems Svpbmt's PMA, IO, and NC wouldn't fit your requirements, could
> > > > > > > > give a map list of the types of Svpbmt? And give out what you needed,
> > > > > > > > but Svpbmt can't.
> > > > > > > >
> > > > > > > Sorry I didn't get what you meant here, could you please elaborate.
> > > > > > I know there is no pbmt in AX45MP, I am just curious how many physical
> > > > > > memory attributes you would use in linux? It seems only one type used
> > > > > > in the series:
> > > > > > cpu_nocache_area_set -> sbi_ecall(SBI_EXT_ANDES,
> > > > > > SBI_EXT_ANDES_SET_PMA, offset, vaddr, size, entry_id, 0, 0);
> > > > > >
> > > > > > Yes, currently we only use "Memory, Non-cacheable, Bufferable". I was
> > > > > > wondering if we could pass these options as flags from DT, something
> > > > > > like below, so that they are not hard-coded in the driver.
> > > > >
> > > > > /* PMA config */
> > > > > #define AX45MP_PMACFG_ETYP                GENMASK(1, 0)
> > > > > /* OFF: PMA entry is disabled */
> > > > > #define AX45MP_PMACFG_ETYP_DISABLED            0
> > > > > /* Naturally aligned power of 2 region */
> > > > > #define AX45MP_PMACFG_ETYP_NAPOT            3
> > > > >
> > > > > #define AX45MP_PMACFG_MTYP                GENMASK(5, 2)
> > > > > /* Device, Non-bufferable */
> > > > > #define AX45MP_PMACFG_MTYP_DEV_NON_BUF            (0 << 2)
> > > > > /* Device, bufferable */
> > > > > #define AX45MP_PMACFG_MTYP_DEV_BUF            (1 << 2)
> > > > > /* Memory, Non-cacheable, Non-bufferable */
> > > > > #define AX45MP_PMACFG_MTYP_MEM_NON_CACHE_NON_BUF    (2 << 2)
> > > > > /* Memory, Non-cacheable, Bufferable */
> > > > > #define AX45MP_PMACFG_MTYP_MEM_NON_CACHE_BUF        (3 << 2)
> > > > > /* Memory, Write-back, No-allocate */
> > > > > #define AX45MP_PMACFG_MTYP_MEM_WB_NA            (8 << 2)
> > > > > /* Memory, Write-back, Read-allocate */
> > > > > #define AX45MP_PMACFG_MTYP_MEM_WB_RA            (9 << 2)
> > > > > /* Memory, Write-back, Write-allocate */
> > > > > #define AX45MP_PMACFG_MTYP_MEM_WB_WA            (10 << 2)
> > > > > /* Memory, Write-back, Read and Write-allocate */
> > > > > #define AX45MP_PMACFG_MTYP_MEM_WB_R_WA            (11 << 2)
> > > > >
> > > > > /* AMO instructions are supported */
> > > > > #define AX45MP_PMACFG_NAMO_AMO_SUPPORT            (0 << 6)
> > > > > /* AMO instructions are not supported */
> > > > > #define AX45MP_PMACFG_NAMO_AMO_NO_SUPPORT        (1 << 6)
> > > > >
> > > > >
> > > > >                 pma-regions = <0x0 0x00000000 0x0 0x10000000 0x0
> > > > > AX45MP_PMACFG_ETYP_NAPOT |  AX45MP_PMACFG_MTYP_MEM_NON_CACHE_BUF |
> > > > > AX45MP_PMACFG_NAMO_AMO_SUPPORT>,
> > > > >                               <0x0 0x10000000 0x0 0x04000000 0x0
> > > > > AX45MP_PMACFG_ETYP_NAPOT |  AX45MP_PMACFG_MTYP_MEM_NON_CACHE_BUF |
> > > > > AX45MP_PMACFG_NAMO_AMO_SUPPORT >,
> > > > >                               <0x0 0x20000000 0x0 0x10000000 0x0
> > > > > AX45MP_PMACFG_ETYP_NAPOT |  AX45MP_PMACFG_MTYP_MEM_NON_CACHE_BUF |
> > > > > AX45MP_PMACFG_NAMO_AMO_SUPPORT>,
> > > > >                               <0x0 0x58000000 0x0 0x08000000 0x0
> > > > > AX45MP_PMACFG_ETYP_NAPOT |  AX45MP_PMACFG_MTYP_MEM_NON_CACHE_BUF |
> > > > > AX45MP_PMACFG_NAMO_AMO_SUPPORT>;
> > > > >
> > > > > Does the above sound good?
> > > > > I've no idea. But as a workaround, I would give my Acked-by.
> > > >
> > > > >
> > > > > > > I'm not sure how you make the eMMC/USB/GMAC DMA control descriptors work
> > > > > > > without pbmt when they don't have a cache coherency protocol. Do you
> > > > > > > need to inject dma_sync for desc synchronization? What's the effect of
> > > > > > > dynamic PMA in the patch series?
> > > > > >
> > > > > > Currently we have set up the pma regions as below:
> > > > >
> > > > > l2cache: cache-controller@13400000 {
> > > > >                 compatible = "andestech,ax45mp-cache", "cache";
> > > > >                 cache-size = <0x40000>;
> > > > >                 cache-line-size = <64>;
> > > > >                 cache-sets = <1024>;
> > > > >                 cache-unified;
> > > > >                 reg = <0x0 0x13400000 0x0 0x100000>;
> > > > >                 pma-regions = <0x0 0x00000000 0x0 0x10000000 0x0 0xf>,
> > > > >                               <0x0 0x10000000 0x0 0x04000000 0x0 0xf>,
> > > > >                               <0x0 0x20000000 0x0 0x10000000 0x0 0xf>,
> > > > >                               <0x0 0x58000000 0x0 0x08000000 0x0 0xf>;
> > > > >                 interrupts = <SOC_PERIPHERAL_IRQ(476, IRQ_TYPE_LEVEL_HIGH)>;
> > > > >         };
> > > > >
> > > > > > The last pma-regions entry, 0x58000000, is a DDR location. This memory
> > > > > > location is marked as a shared DMA pool with the below in DT:
> > > > >
> > > > >     reserved-memory {
> > > > >         #address-cells = <2>;
> > > > >         #size-cells = <2>;
> > > > >         ranges;
> > > > >
> > > > >         reserved: linux,cma@58000000 {
> > > > >             compatible = "shared-dma-pool";
> > > > >             no-map;
> > > > >             linux,dma-default;
> > > > >             reg = <0x0 0x58000000 0x0 0x08000000>;
> > > > >         };
> > > > >     };
> > > > >
> > > > > > And for ARCH_R9A07G043 we automatically select DMA_GLOBAL_POOL, so the
> > > > > > IP blocks (eMMC/USB/GMAC) requesting DMA'able memory will
> > > > > > automatically fall into this region, which is non-cacheable but
> > > > > > bufferable (set in PMA), and everything else is taken care of by the
> > > > > > clean and flush callbacks. We don't have to inject dma_sync for desc
> > > > > > synchronization for existing drivers (which are shared with the Renesas
> > > > > > RZ/G2L family).
> > > > > Better than I thought :). The "non-cacheable but bufferable" is "weak
> > > > > order," also raising the bufferable signal of AXI transactions. Right?
> > > > Yes, I have confirmed with the HW team that it does raise the bufferable
> > > > signal of AXI transactions. With the drivers (ETH/USB/DMAC) we haven't
> > > > seen any issues so far.
> > >
> > > Do you foresee any issues?
> > > That depends on your interconnect design; most of the simple
> > > interconnects would ignore bufferable. Some NoC interconnects would
> > > buffer the transactions, which means data would be buffered in the
> > > interconnect after the CPU store instruction has retired. If the CPU kicks
> > > off the DMA with an IO reg write, the hw may not guarantee the ordering
> > > between the last data written and the IO reg write that starts the DMA.
> > > Then the DMA may miss the data.
> >
> I haven't seen this issue. Maybe to avoid this, the controller register
> space could be marked as non-cacheable + non-bufferable in the PMA; by
That sounds good, and it should be done, although it may not be needed.

> this way we could ensure ordering.
>
> What do you think?
>
> Cheers,
> Prabhakar
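
For reference, the 0xf in the last cell of each pma-regions entry quoted above matches what the proposed AX45MP_PMACFG_* flags would give for NAPOT + "Memory, Non-cacheable, Bufferable" with AMO supported. Below is a minimal user-space sketch of that arithmetic. The helper names pma_noncache_buf_flags() and pma_region_is_napot() are made up for illustration and are not part of the patch, and the NAPOT check only covers the "naturally aligned power of 2" rule from the comment, not any minimum-size constraint the hardware may impose.

```c
#include <stdbool.h>
#include <stdint.h>

/* Field encodings copied from the AX45MP_PMACFG_* proposal in this thread. */
#define PMACFG_ETYP_NAPOT		3u		/* bits [1:0] */
#define PMACFG_MTYP_MEM_NON_CACHE_BUF	(3u << 2)	/* bits [5:2] */
#define PMACFG_NAMO_AMO_SUPPORT		(0u << 6)	/* bit 6 */

/* Hypothetical helper: compose the flags cell of one pma-regions entry. */
static inline uint32_t pma_noncache_buf_flags(void)
{
	return PMACFG_ETYP_NAPOT | PMACFG_MTYP_MEM_NON_CACHE_BUF |
	       PMACFG_NAMO_AMO_SUPPORT;
}

/*
 * Hypothetical helper: a NAPOT region needs a power-of-two size and a base
 * address aligned to that size.
 */
static inline bool pma_region_is_napot(uint64_t base, uint64_t size)
{
	return size != 0 && (size & (size - 1)) == 0 &&
	       (base & (size - 1)) == 0;
}
```

With these definitions pma_noncache_buf_flags() evaluates to 0xf, and all four regions listed in the l2cache node (for example base 0x58000000, size 0x08000000) pass pma_region_is_napot().
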
Patch

diff --git a/arch/riscv/include/asm/cacheflush.h b/arch/riscv/include/asm/cacheflush.h
index 273ece6b622f..a7c03321afa0 100644
--- a/arch/riscv/include/asm/cacheflush.h
+++ b/arch/riscv/include/asm/cacheflush.h
@@ -63,6 +63,14 @@  void riscv_noncoherent_supported(void);
 #define SYS_RISCV_FLUSH_ICACHE_LOCAL 1UL
 #define SYS_RISCV_FLUSH_ICACHE_ALL   (SYS_RISCV_FLUSH_ICACHE_LOCAL)
 
+#ifdef CONFIG_ARCH_R9A07G043
+void rzfive_cpu_dma_inval_range(void *vaddr, size_t size);
+void rzfive_cpu_dma_wb_range(void *vaddr, size_t size);
+
+#define ALT_CMO_OP(_op, _start, _size, _cachesize)	\
+		   _op(_start, _size);
+#endif
+
 #include <asm-generic/cacheflush.h>
 
 #endif /* _ASM_RISCV_CACHEFLUSH_H */
diff --git a/arch/riscv/include/asm/errata_list.h b/arch/riscv/include/asm/errata_list.h
index 19a771085781..d9cbf60c3b65 100644
--- a/arch/riscv/include/asm/errata_list.h
+++ b/arch/riscv/include/asm/errata_list.h
@@ -89,6 +89,7 @@  asm volatile(ALTERNATIVE(						\
 #define ALT_THEAD_PMA(_val)
 #endif
 
+#ifdef CONFIG_ERRATA_THEAD_CMO
 /*
  * dcache.ipa rs1 (invalidate, physical address)
  * | 31 - 25 | 24 - 20 | 19 - 15 | 14 - 12 | 11 - 7 | 6 - 0 |
@@ -143,5 +144,6 @@  asm volatile(ALTERNATIVE_2(						\
 	: "a0")
 
 #endif /* __ASSEMBLY__ */
+#endif
 
 #endif
diff --git a/arch/riscv/include/asm/sbi.h b/arch/riscv/include/asm/sbi.h
index 2a0ef738695e..10a7c855d125 100644
--- a/arch/riscv/include/asm/sbi.h
+++ b/arch/riscv/include/asm/sbi.h
@@ -37,6 +37,7 @@  enum sbi_ext_id {
 
 	/* Vendor extensions must lie within this range */
 	SBI_EXT_VENDOR_START = 0x09000000,
+	SBI_EXT_ANDES = 0x0900031E,
 	SBI_EXT_VENDOR_END = 0x09FFFFFF,
 };
 
diff --git a/arch/riscv/mm/dma-noncoherent.c b/arch/riscv/mm/dma-noncoherent.c
index e3f9bdf47c5f..576601f180ea 100644
--- a/arch/riscv/mm/dma-noncoherent.c
+++ b/arch/riscv/mm/dma-noncoherent.c
@@ -22,13 +22,25 @@  void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
 
 	switch (dir) {
 	case DMA_TO_DEVICE:
+#ifdef CONFIG_ERRATA_THEAD_CMO
 		ALT_CMO_OP(clean, vaddr, size, riscv_cbom_block_size);
+#elif defined(CONFIG_ARCH_R9A07G043)
+		ALT_CMO_OP(rzfive_cpu_dma_wb_range, vaddr, size, 0x0);
+#endif
 		break;
 	case DMA_FROM_DEVICE:
+#ifdef CONFIG_ERRATA_THEAD_CMO
 		ALT_CMO_OP(clean, vaddr, size, riscv_cbom_block_size);
+#elif defined(CONFIG_ARCH_R9A07G043)
+		ALT_CMO_OP(rzfive_cpu_dma_inval_range, vaddr, size, 0x0);
+#endif
 		break;
 	case DMA_BIDIRECTIONAL:
+#ifdef CONFIG_ERRATA_THEAD_CMO
 		ALT_CMO_OP(flush, vaddr, size, riscv_cbom_block_size);
+#elif defined(CONFIG_ARCH_R9A07G043)
+		ALT_CMO_OP(rzfive_cpu_dma_wb_range, vaddr, size, 0x0);
+#endif
 		break;
 	default:
 		break;
@@ -45,7 +57,11 @@  void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
 		break;
 	case DMA_FROM_DEVICE:
 	case DMA_BIDIRECTIONAL:
+#ifdef CONFIG_ERRATA_THEAD_CMO
 		ALT_CMO_OP(flush, vaddr, size, riscv_cbom_block_size);
+#elif defined(CONFIG_ARCH_R9A07G043)
+		ALT_CMO_OP(rzfive_cpu_dma_inval_range, vaddr, size, 0x0);
+#endif
 		break;
 	default:
 		break;
@@ -54,14 +70,17 @@  void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
 
 void arch_dma_prep_coherent(struct page *page, size_t size)
 {
+#ifdef CONFIG_ERRATA_THEAD_CMO
 	void *flush_addr = page_address(page);
 
 	ALT_CMO_OP(flush, flush_addr, size, riscv_cbom_block_size);
+#endif
 }
 
 void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
 		const struct iommu_ops *iommu, bool coherent)
 {
+#ifdef CONFIG_ERRATA_THEAD_CMO
 	WARN_TAINT(!coherent && riscv_cbom_block_size > ARCH_DMA_MINALIGN,
 		   TAINT_CPU_OUT_OF_SPEC,
 		   "%s %s: ARCH_DMA_MINALIGN smaller than riscv,cbom-block-size (%d < %d)",
@@ -73,6 +92,7 @@  void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
 		   dev_driver_string(dev), dev_name(dev));
 
 	dev->dma_coherent = coherent;
+#endif
 }
 
 #ifdef CONFIG_RISCV_ISA_ZICBOM
diff --git a/drivers/soc/renesas/Makefile b/drivers/soc/renesas/Makefile
index 535868c9c7e4..a20cc7ad5b12 100644
--- a/drivers/soc/renesas/Makefile
+++ b/drivers/soc/renesas/Makefile
@@ -31,6 +31,10 @@  ifdef CONFIG_SMP
 obj-$(CONFIG_ARCH_R9A06G032)	+= r9a06g032-smp.o
 endif
 
+ifdef CONFIG_RISCV
+obj-y += rzf/
+endif
+
 # Family
 obj-$(CONFIG_RST_RCAR)		+= rcar-rst.o
 obj-$(CONFIG_SYSC_RCAR)		+= rcar-sysc.o
diff --git a/drivers/soc/renesas/rzf/Makefile b/drivers/soc/renesas/rzf/Makefile
new file mode 100644
index 000000000000..e397ba2c733f
--- /dev/null
+++ b/drivers/soc/renesas/rzf/Makefile
@@ -0,0 +1,3 @@ 
+# SPDX-License-Identifier: GPL-2.0
+
+obj-$(CONFIG_ARCH_R9A07G043) += ax45mp_cache.o
diff --git a/drivers/soc/renesas/rzf/ax45mp_cache.c b/drivers/soc/renesas/rzf/ax45mp_cache.c
new file mode 100644
index 000000000000..6eca32aef33e
--- /dev/null
+++ b/drivers/soc/renesas/rzf/ax45mp_cache.c
@@ -0,0 +1,365 @@ 
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * PMA setup and non-coherent cache functions for AX45MP
+ *
+ * Copyright (C) 2022 Renesas Electronics Corp.
+ */
+
+#include <linux/cacheinfo.h>
+#include <linux/of_address.h>
+
+#include <asm/sbi.h>
+
+#include "rzf_sbi.h"
+
+/* D-cache operation */
+#define CCTL_L1D_VA_INVAL		0
+#define CCTL_L1D_VA_WB			1
+
+/* L2 cache */
+#define L2_CACHE_CTL_CEN_MASK		1
+
+/* L2 cache registers */
+#define L2C_REG_CTL_OFFSET		0x8
+#define L2C_REG_C0_CMD_OFFSET		0x40
+#define L2C_REG_C0_ACC_OFFSET		0x48
+#define L2C_REG_STATUS_OFFSET		0x80
+
+/* L2 CCTL status */
+#define CCTL_L2_STATUS_IDLE		0
+
+/* L2 CCTL status cores mask */
+#define CCTL_L2_STATUS_C0_MASK		0xf
+
+/* L2 cache operation */
+#define CCTL_L2_PA_INVAL		0x8
+#define CCTL_L2_PA_WB			0x9
+
+#define L2C_HPM_PER_CORE_OFFSET		0x8
+#define L2C_REG_PER_CORE_OFFSET		0x10
+#define CCTL_L2_STATUS_PER_CORE_OFFSET	4
+
+#define L2C_REG_CN_CMD_OFFSET(n)	\
+	(L2C_REG_C0_CMD_OFFSET + ((n) * L2C_REG_PER_CORE_OFFSET))
+#define L2C_REG_CN_ACC_OFFSET(n)	\
+	(L2C_REG_C0_ACC_OFFSET + ((n) * L2C_REG_PER_CORE_OFFSET))
+#define CCTL_L2_STATUS_CN_MASK(n)	\
+	(CCTL_L2_STATUS_C0_MASK << ((n) * CCTL_L2_STATUS_PER_CORE_OFFSET))
+
+#define MICM_CFG_ISZ_OFFSET		6
+#define MICM_CFG_ISZ_MASK		(0x7  << MICM_CFG_ISZ_OFFSET)
+
+#define MDCM_CFG_DSZ_OFFSET		6
+#define MDCM_CFG_DSZ_MASK		(0x7  << MDCM_CFG_DSZ_OFFSET)
+
+#define CCTL_REG_UCCTLBEGINADDR_NUM	0x80b
+#define CCTL_REG_UCCTLCOMMAND_NUM	0x80c
+
+#define MCACHE_CTL_CCTL_SUEN_OFFSET	8
+#define MMSC_CFG_CCTLCSR_OFFSET		16
+#define MISA_20_OFFSET			20
+
+#define MCACHE_CTL_CCTL_SUEN_MASK	(0x1 << MCACHE_CTL_CCTL_SUEN_OFFSET)
+#define MMSC_CFG_CCTLCSR_MASK		(0x1 << MMSC_CFG_CCTLCSR_OFFSET)
+#define MISA_20_MASK			(0x1 << MISA_20_OFFSET)
+
+#define MAX_CACHE_LINE_SIZE	256
+
+#define ANDES_AX45MP_MAX_PMA_REGIONS	16
+
+struct pma_arg_t {
+	phys_addr_t offset;
+	unsigned long vaddr;
+	size_t size;
+	size_t entry_id;
+};
+
+struct ax45mp_cache_info {
+	bool init_done;
+	int dcache_line_size;
+};
+
+static DEFINE_PER_CPU(struct ax45mp_cache_info, cpu_cache_info) = {
+	.init_done = false,
+	.dcache_line_size = SZ_64,
+};
+
+static void __iomem *l2c_base;
+
+/* -----------------------------------------------------------------------------
+ * PMA setup
+ */
+static long sbi_set_pma(void *arg)
+{
+	phys_addr_t offset = ((struct pma_arg_t *)arg)->offset;
+	unsigned long vaddr = ((struct pma_arg_t *)arg)->vaddr;
+	size_t entry_id = ((struct pma_arg_t *)arg)->entry_id;
+	size_t size = ((struct pma_arg_t *)arg)->size;
+	struct sbiret ret;
+
+	ret = sbi_ecall(SBI_EXT_ANDES, SBI_EXT_ANDES_SET_PMA, offset, vaddr, size, entry_id, 0, 0);
+
+	return ret.value;
+}
+
+static unsigned long cpu_nocache_area_set(unsigned long start,
+					  unsigned long size,
+					  unsigned long entry_id)
+{
+	struct pma_arg_t pma_arg;
+	unsigned long ret = 0;
+
+	pma_arg.offset = start;
+	pma_arg.size = size;
+	pma_arg.vaddr = start + size;
+	pma_arg.entry_id = entry_id;
+	ret = sbi_set_pma(&pma_arg);
+
+	return ret;
+}
+
+static void ax45mp_configure_pma_regions(struct device_node *np, int count)
+{
+	u64 start, size;
+	int i;
+
+	for (i = 0 ; i < count ; i++) {
+		of_property_read_u64_index(np, "pma-regions", (i << 1), &start);
+		of_property_read_u64_index(np, "pma-regions", (i << 1) + 1, &size);
+		cpu_nocache_area_set(start, size, i);
+	}
+}
+
+/* -----------------------------------------------------------------------------
+ * L2 Cache operations
+ */
+static uint32_t cpu_get_mcache_ctl_status(void)
+{
+	struct sbiret ret;
+
+	ret = sbi_ecall(SBI_EXT_ANDES, SBI_EXT_ANDES_GET_MCACHE_CTL_STATUS, 0, 0, 0, 0, 0, 0);
+	return ret.value;
+}
+
+static uint32_t cpu_get_micm_cfg_status(void)
+{
+	struct sbiret ret;
+
+	ret = sbi_ecall(SBI_EXT_ANDES, SBI_EXT_ANDES_GET_MICM_CTL_STATUS, 0, 0, 0, 0, 0, 0);
+	return ret.value;
+}
+
+static uint32_t cpu_get_mdcm_cfg_status(void)
+{
+	struct sbiret ret;
+
+	ret = sbi_ecall(SBI_EXT_ANDES, SBI_EXT_ANDES_GET_MDCM_CTL_STATUS, 0, 0, 0, 0, 0, 0);
+	return ret.value;
+}
+
+static uint32_t cpu_get_mmsc_cfg_status(void)
+{
+	struct sbiret ret;
+
+	ret = sbi_ecall(SBI_EXT_ANDES, SBI_EXT_ANDES_GET_MMSC_CTL_STATUS, 0, 0, 0, 0, 0, 0);
+	return ret.value;
+}
+
+static uint32_t cpu_get_misa_cfg_status(void)
+{
+	struct sbiret ret;
+
+	ret = sbi_ecall(SBI_EXT_ANDES, SBI_EXT_ANDES_GET_MISA_CTL_STATUS, 0, 0, 0, 0, 0, 0);
+	return ret.value;
+}
+
+static void fill_cpu_cache_info(struct ax45mp_cache_info *cpu_ci)
+{
+	struct cpu_cacheinfo *this_cpu_ci =
+			get_cpu_cacheinfo(smp_processor_id());
+	struct cacheinfo *this_leaf = this_cpu_ci->info_list;
+	unsigned int i;
+
+	for (i = 0; i < this_cpu_ci->num_leaves ; i++, this_leaf++) {
+		if (this_leaf->type == CACHE_TYPE_DATA)
+			cpu_ci->dcache_line_size = this_leaf->coherency_line_size;
+	}
+
+	cpu_ci->init_done = true;
+}
+
+static inline int get_cache_line_size(void)
+{
+	struct ax45mp_cache_info *cpu_ci =
+			&per_cpu(cpu_cache_info, smp_processor_id());
+
+	if (unlikely(!cpu_ci->init_done))
+		fill_cpu_cache_info(cpu_ci);
+	return cpu_ci->dcache_line_size;
+}
+
+static uint32_t cpu_l2c_get_cctl_status(void)
+{
+	return readl(l2c_base + L2C_REG_STATUS_OFFSET);
+}
+
+static uint32_t cpu_l2c_ctl_status(void)
+{
+	return readl(l2c_base + L2C_REG_CTL_OFFSET);
+}
+
+static bool cpu_cache_controlable(void)
+{
+	return (((cpu_get_micm_cfg_status() & MICM_CFG_ISZ_MASK) ||
+		 (cpu_get_mdcm_cfg_status() & MDCM_CFG_DSZ_MASK)) &&
+		(cpu_get_misa_cfg_status() & MISA_20_MASK) &&
+		(cpu_get_mmsc_cfg_status() & MMSC_CFG_CCTLCSR_MASK) &&
+		(cpu_get_mcache_ctl_status() & MCACHE_CTL_CCTL_SUEN_MASK));
+}
+
+static void cpu_dcache_wb_range(unsigned long start,
+				unsigned long end,
+				int line_size)
+{
+	bool ucctl_ok = false;
+	unsigned long pa;
+	int mhartid = 0;
+#ifdef CONFIG_SMP
+	mhartid = smp_processor_id();
+#endif
+
+	ucctl_ok = cpu_cache_controlable();
+
+	while (end > start) {
+		if (ucctl_ok) {
+			csr_write(CCTL_REG_UCCTLBEGINADDR_NUM, start);
+			csr_write(CCTL_REG_UCCTLCOMMAND_NUM, CCTL_L1D_VA_WB);
+		}
+
+		if (l2c_base && (cpu_l2c_ctl_status() & L2_CACHE_CTL_CEN_MASK)) {
+			pa = virt_to_phys((void *)start);
+			writel(pa, l2c_base + L2C_REG_CN_ACC_OFFSET(mhartid));
+			writel(CCTL_L2_PA_WB, l2c_base + L2C_REG_CN_CMD_OFFSET(mhartid));
+			while ((cpu_l2c_get_cctl_status() &
+				CCTL_L2_STATUS_CN_MASK(mhartid)) != CCTL_L2_STATUS_IDLE)
+				;
+		}
+
+		start += line_size;
+	}
+}
+
+static void cpu_dcache_inval_range(unsigned long start,
+				   unsigned long end,
+				   int line_size)
+{
+	bool ucctl_ok = false;
+	unsigned long pa;
+	int mhartid = 0;
+#ifdef CONFIG_SMP
+	mhartid = smp_processor_id();
+#endif
+
+	ucctl_ok = cpu_cache_controlable();
+
+	while (end > start) {
+		if (ucctl_ok) {
+			csr_write(CCTL_REG_UCCTLBEGINADDR_NUM, start);
+			csr_write(CCTL_REG_UCCTLCOMMAND_NUM, CCTL_L1D_VA_INVAL);
+		}
+
+		if (l2c_base && (cpu_l2c_ctl_status() & L2_CACHE_CTL_CEN_MASK)) {
+			pa = virt_to_phys((void *)start);
+			writel(pa, l2c_base + L2C_REG_CN_ACC_OFFSET(mhartid));
+			writel(CCTL_L2_PA_INVAL,
+			       l2c_base + L2C_REG_CN_CMD_OFFSET(mhartid));
+			while ((cpu_l2c_get_cctl_status() &
+				CCTL_L2_STATUS_CN_MASK(mhartid)) != CCTL_L2_STATUS_IDLE)
+				;
+		}
+
+		start += line_size;
+	}
+}
+
+void rzfive_cpu_dma_inval_range(void *vaddr, size_t size)
+{
+	unsigned long line_size = get_cache_line_size();
+	char cache_buf[2][MAX_CACHE_LINE_SIZE] = { 0 };
+	unsigned long start = (unsigned long)vaddr;
+	unsigned long end = start + size;
+	unsigned long old_start = start;
+	unsigned long old_end = end;
+	unsigned long flags;
+
+	if (unlikely(start == end))
+		return;
+
+	start = start & (~(line_size - 1));
+	end = ((end + line_size - 1) & (~(line_size - 1)));
+
+	local_irq_save(flags);
+	if (unlikely(start != old_start))
+		memcpy(&cache_buf[0][0], (void *)start, line_size);
+
+	if (unlikely(end != old_end))
+		memcpy(&cache_buf[1][0], (void *)(old_end & (~(line_size - 1))), line_size);
+
+	cpu_dcache_inval_range(start, end, line_size);
+
+	if (unlikely(start != old_start))
+		memcpy((void *)start, &cache_buf[0][0], (old_start & (line_size - 1)));
+
+	if (unlikely(end != old_end))
+		memcpy((void *)old_end,
+		       &cache_buf[1][old_end & (line_size - 1)],
+		       end - old_end);
+
+	local_irq_restore(flags);
+}
+EXPORT_SYMBOL(rzfive_cpu_dma_inval_range);
+
+void rzfive_cpu_dma_wb_range(void *vaddr, size_t size)
+{
+	unsigned long line_size = get_cache_line_size();
+	unsigned long start = (unsigned long)vaddr;
+	unsigned long end = start + size;
+	unsigned long flags;
+
+	local_irq_save(flags);
+	start = start & (~(line_size - 1));
+	cpu_dcache_wb_range(start, end, line_size);
+	local_irq_restore(flags);
+}
+EXPORT_SYMBOL(rzfive_cpu_dma_wb_range);
+
+static const struct of_device_id ax45mp_cache_ids[] = {
+	{ .compatible = "andestech,ax45mp-cache" },
+	{ /* sentinel */ }
+};
+
+static int __init ax45mp_cache_init(void)
+{
+	struct device_node *np;
+	int count;
+
+	np = of_find_matching_node(NULL, ax45mp_cache_ids);
+	if (!np)
+		return -ENODEV;
+
+	l2c_base = of_iomap(np, 0);
+	if (!l2c_base)
+		return -ENOMEM;
+
+	count = of_property_count_elems_of_size(np, "pma-regions",
+						sizeof(u32) * 4);
+	if (count > ANDES_AX45MP_MAX_PMA_REGIONS) {
+		iounmap(l2c_base);
+		return -EINVAL;
+	}
+
+	ax45mp_configure_pma_regions(np, count);
+
+	return 0;
+}
+arch_initcall(ax45mp_cache_init);
diff --git a/drivers/soc/renesas/rzf/rzf_sbi.h b/drivers/soc/renesas/rzf/rzf_sbi.h
new file mode 100644
index 000000000000..854fee667276
--- /dev/null
+++ b/drivers/soc/renesas/rzf/rzf_sbi.h
@@ -0,0 +1,27 @@ 
+/* SPDX-License-Identifier: GPL-2.0+ */
+
+#ifndef __ARCH_R9A07G043_SBI_H
+#define __ARCH_R9A07G043_SBI_H
+
+enum sbi_ext_andes_fid {
+	SBI_EXT_ANDES_GET_MCACHE_CTL_STATUS = 0,
+	SBI_EXT_ANDES_GET_MMISC_CTL_STATUS,
+	SBI_EXT_ANDES_SET_MCACHE_CTL,
+	SBI_EXT_ANDES_SET_MMISC_CTL,
+	SBI_EXT_ANDES_ICACHE_OP,
+	SBI_EXT_ANDES_DCACHE_OP,
+	SBI_EXT_ANDES_L1CACHE_I_PREFETCH,
+	SBI_EXT_ANDES_L1CACHE_D_PREFETCH,
+	SBI_EXT_ANDES_NON_BLOCKING_LOAD_STORE,
+	SBI_EXT_ANDES_WRITE_AROUND,
+	SBI_EXT_ANDES_SET_PMA,
+	SBI_EXT_ANDES_FREE_PMA,
+	SBI_EXT_ANDES_PROBE_PMA,
+	SBI_EXT_ANDES_DCACHE_WBINVAL_ALL,
+	SBI_EXT_ANDES_GET_MICM_CTL_STATUS,
+	SBI_EXT_ANDES_GET_MDCM_CTL_STATUS,
+	SBI_EXT_ANDES_GET_MMSC_CTL_STATUS,
+	SBI_EXT_ANDES_GET_MISA_CTL_STATUS,
+};
+
+#endif
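
As a sanity check on the address arithmetic used in ax45mp_cache.c, the following user-space sketch reproduces the per-core L2 register offset and status mask macros, together with the cache-line rounding done in rzfive_cpu_dma_inval_range(). The macro values are copied from the patch; the function names l2c_cmd_offset(), l2c_status_mask() and cache_lines_touched() are hypothetical and exist only for this illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Values copied from ax45mp_cache.c in this patch. */
#define L2C_REG_C0_CMD_OFFSET		0x40
#define L2C_REG_PER_CORE_OFFSET		0x10
#define CCTL_L2_STATUS_C0_MASK		0xf
#define CCTL_L2_STATUS_PER_CORE_OFFSET	4

/* Same arithmetic as L2C_REG_CN_CMD_OFFSET(n). */
static inline uint32_t l2c_cmd_offset(unsigned int hart)
{
	return L2C_REG_C0_CMD_OFFSET + hart * L2C_REG_PER_CORE_OFFSET;
}

/* Same arithmetic as CCTL_L2_STATUS_CN_MASK(n). */
static inline uint32_t l2c_status_mask(unsigned int hart)
{
	return (uint32_t)CCTL_L2_STATUS_C0_MASK <<
	       (hart * CCTL_L2_STATUS_PER_CORE_OFFSET);
}

/*
 * Hypothetical helper: number of cache lines covered by [start, start + size)
 * once start is rounded down and the end is rounded up to line_size, which
 * must be a power of two.
 */
static inline size_t cache_lines_touched(uintptr_t start, size_t size,
					 size_t line_size)
{
	uintptr_t mask = ~(uintptr_t)(line_size - 1);
	uintptr_t first, last;

	if (size == 0)
		return 0;

	first = start & mask;
	last = (start + size + line_size - 1) & mask;

	return (last - first) / line_size;
}
```

For example, a 0x80-byte buffer at 0x1010 with 64-byte lines spans three lines (0x1000, 0x1040 and 0x1080), and hart 1 uses command register offset 0x50 and status mask 0xf0.
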