diff mbox

[2/3] ARM: mm: add support for HW coherent systems in PL310

Message ID 1395677872-32741-3-git-send-email-thomas.petazzoni@free-electrons.com (mailing list archive)
State New, archived
Headers show

Commit Message

Thomas Petazzoni March 24, 2014, 4:17 p.m. UTC
When a PL310 cache is used on a system that provides hardware
coherency, the outer cache sync operation is useless, and can be
skipped. Moreover, on some systems, it is harmful as it causes
deadlocks between the Marvell coherency mechanism, the Marvell PCIe
controller and the Cortex-A9.

This commit adds a new l2x0_of_init_coherent() variant of
l2x0_of_init(), which allows the caller to tell the L2 cache
initialization that the system has hardware coherency support enabled,
and that therefore the outer cache sync operation can be skipped if
possible.

Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
---
 arch/arm/include/asm/hardware/cache-l2x0.h |  1 +
 arch/arm/mm/cache-l2x0.c                   | 22 +++++++++++++++++++++-
 2 files changed, 22 insertions(+), 1 deletion(-)

Comments

Catalin Marinas April 8, 2014, 5:24 p.m. UTC | #1
On Mon, Mar 24, 2014 at 04:17:51PM +0000, Thomas Petazzoni wrote:
> --- a/arch/arm/mm/cache-l2x0.c
> +++ b/arch/arm/mm/cache-l2x0.c
> @@ -968,7 +968,7 @@ static const struct of_device_id l2x0_ids[] __initconst = {
>  	{}
>  };
>  
> -int __init l2x0_of_init(u32 aux_val, u32 aux_mask)
> +int __init l2x0_of_init_common(u32 aux_val, u32 aux_mask, bool is_coherent)
>  {
>  	struct device_node *np;
>  	const struct l2x0_of_data *data;
> @@ -1005,8 +1005,28 @@ int __init l2x0_of_init(u32 aux_val, u32 aux_mask)
>  
>  	of_init = true;
>  	memcpy(&outer_cache, &data->outer_cache, sizeof(outer_cache));
> +
> +	/*
> +	 * PL310 doesn't need an outer cache sync operation when the
> +	 * system is operating with hardware coherency enabled, as it
> +	 * is done directly in hardware.
> +	 */
> +	if (of_device_is_compatible(np, "arm,pl310-cache") && is_coherent)
> +		outer_cache.sync = NULL;
> +

For this particular case, you can add a specific l2x0_of_data structure
with the right compatible string for your platform where
outer_cache.sync is NULL, assuming that dma_alloc_coherent() returns
Cacheable memory (e.g. your platform uses arm_coherent_dma_ops). The
only use of outer_sync() is for non-SMP barriers in relation to Normal
NonCacheable buffers.

Even if the platform has coherent I/O, you still need the range L2 cache
ops to be available for secondary CPU booting. Unless this platform can
be configured in a way similar to the "marvell,aurora-system-cache" case
and you can make all the outer_cache ops NULL.

>  	l2x0_init(l2x0_base, aux_val, aux_mask);
>  
>  	return 0;
>  }
> +
> +int __init l2x0_of_init(u32 aux_val, u32 aux_mask)
> +{
> +	return l2x0_of_init_common(aux_val, aux_mask, false);
> +}
> +
> +int __init l2x0_of_init_coherent(u32 aux_val, u32 aux_mask)
> +{
> +	return l2x0_of_init_common(aux_val, aux_mask, true);
> +}

This could be a bit misleading. I would rather have a generic
pl310_data_dma_coherent structure (though even on coherent systems you
could still have drivers that prefer writecombine/NormalNC memory to
avoid thrashing the cache).
Thomas Petazzoni April 8, 2014, 6:12 p.m. UTC | #2
Dear Catalin Marinas,

On Tue, 8 Apr 2014 18:24:25 +0100, Catalin Marinas wrote:

> >  	of_init = true;
> >  	memcpy(&outer_cache, &data->outer_cache, sizeof(outer_cache));
> > +
> > +	/*
> > +	 * PL310 doesn't need an outer cache sync operation when the
> > +	 * system is operating with hardware coherency enabled, as it
> > +	 * is done directly in hardware.
> > +	 */
> > +	if (of_device_is_compatible(np, "arm,pl310-cache") && is_coherent)
> > +		outer_cache.sync = NULL;
> > +
> 
> For this particular case, you can add a specific l2x0_of_data structure
> with the right compatible string for your platform where
> outer_cache.sync is NULL,

In fact, I'm not sure using a separate compatible string is possible,
because there are situations where the hardware platform may be I/O
coherent, and some situations where it is not the case. For example, in
the current kernel, the platform is I/O coherent when CONFIG_SMP is
enabled, but not I/O coherent when CONFIG_SMP is disabled. And it's the
same hardware platform, so same Device Tree in both cases.

So using a different compatible string doesn't work here: we would have
to adjust the compatible string depending on whether CONFIG_SMP
or !CONFIG_SMP is used. Of course, this is doable at run time before
probing the L2 cache driver, using a small quirk, but that doesn't
sound really nice.

> assuming that dma_alloc_coherent() returns
> Cacheable memory (e.g. your platform uses arm_coherent_dma_ops).

We don't use arm_coherent_dma_ops, we have our own DMA operations, see
arch/arm/mach-mvebu/coherency.c (in mainline, only the support for the
PJ4B based Armada 370/XP is there, but the support for the Cortex-A9
based Armada 375/38x has been posted at
http://lists.infradead.org/pipermail/linux-arm-kernel/2014-April/244916.html).

> The
> only use of outer_sync() is for non-SMP barriers in relation to Normal
> NonCacheable buffers.
> 
> Even if the platform has coherent I/O, you still need the range L2 cache
> ops to be available for secondary CPU booting. Unless this platform can
> be configured in a way similar to the "marvell,aurora-system-cache" case
> and you can make all the outer_cache ops NULL.

Hum, I am not sure about that one. Maybe Tawfik or Nadav can comment on
this question?

> >  	l2x0_init(l2x0_base, aux_val, aux_mask);
> >  
> >  	return 0;
> >  }
> > +
> > +int __init l2x0_of_init(u32 aux_val, u32 aux_mask)
> > +{
> > +	return l2x0_of_init_common(aux_val, aux_mask, false);
> > +}
> > +
> > +int __init l2x0_of_init_coherent(u32 aux_val, u32 aux_mask)
> > +{
> > +	return l2x0_of_init_common(aux_val, aux_mask, true);
> > +}
> 
> This could be a bit misleading. I would rather have a generic
> pl310_data_dma_coherent structure (though even on coherent systems you
> could still have drivers that prefer writecombine/NormalNC memory to
> avoid thrashing the cache).

I'm not sure to follow your suggestion here. What structure type would
pl310_data_dma_coherent be? A "struct l2x0_of_data" like this:

static const struct l2x0_of_data pl310_dma_coherent_data = {
        .setup = pl310_of_setup,
        .save  = pl310_save,
        .outer_cache = {
                .resume      = pl310_resume,
                .inv_range   = l2x0_inv_range,
                .clean_range = l2x0_clean_range,
                .flush_range = l2x0_flush_range,
                .sync        = NULL,
                .flush_all   = l2x0_flush_all,
                .inv_all     = l2x0_inv_all,
                .disable     = l2x0_disable,
        },
};

static const struct of_device_id l2x0_ids[] __initconst = {
	[...]
        { .compatible = "arm,pl310-cache", .data = (void *)&pl310_data },
        { .compatible = "arm,pl310-cache-dma-coherent", .data = (void *)&pl310_dma_coherent_data },
	[...]
};

Is this what you have in mind? This looks ok to me, if we have a
solution for the CONFIG_SMP vs. !CONFIG_SMP problem.

Thomas
Catalin Marinas April 9, 2014, 11:15 a.m. UTC | #3
On Tue, Apr 08, 2014 at 07:12:12PM +0100, Thomas Petazzoni wrote:
> On Tue, 8 Apr 2014 18:24:25 +0100, Catalin Marinas wrote:
> > >  	of_init = true;
> > >  	memcpy(&outer_cache, &data->outer_cache, sizeof(outer_cache));
> > > +
> > > +	/*
> > > +	 * PL310 doesn't need an outer cache sync operation when the
> > > +	 * system is operating with hardware coherency enabled, as it
> > > +	 * is done directly in hardware.
> > > +	 */
> > > +	if (of_device_is_compatible(np, "arm,pl310-cache") && is_coherent)
> > > +		outer_cache.sync = NULL;
> > > +
> > 
> > For this particular case, you can add a specific l2x0_of_data structure
> > with the right compatible string for your platform where
> > outer_cache.sync is NULL,
> 
> In fact, I'm not sure using a separate compatible string is possible,
> because there are situations where the hardware platform may be I/O
> coherent, and some situations where it is not the case. For example, in
> the current kernel, the platform is I/O coherent when CONFIG_SMP is
> enabled, but not I/O coherent when CONFIG_SMP is disabled. And it's the
> same hardware platform, so same Device Tree in both cases.
> 
> So using a different compatible string doesn't work here: we would have
> to adjust the compatible string depending on whether CONFIG_SMP
> or !CONFIG_SMP is used. Of course, this is doable at run time before
> probing the L2 cache driver, using a small quirk, but that doesn't
> sound really nice.

You could use #ifdef CONFIG_SMP around the compatible string but I
agree, it doesn't look nice.

> > assuming that dma_alloc_coherent() returns
> > Cacheable memory (e.g. your platform uses arm_coherent_dma_ops).
> 
> We don't use arm_coherent_dma_ops, we have our own DMA operations, see
> arch/arm/mach-mvebu/coherency.c (in mainline, only the support for the
> PJ4B based Armada 370/XP is there, but the support for the Cortex-A9
> based Armada 375/38x has been posted at
> http://lists.infradead.org/pipermail/linux-arm-kernel/2014-April/244916.html).

Slightly confusing. So for dma_alloc you can use Cacheable memory, but
for the from_device cases, you need some mvebu_hwcc_sync_io_barrier()
(and no cache maintenance). Doesn't it mean that barriers like rmb()
need such barrier as well (in combination with dma_alloc'ed buffers)?

> > >  	l2x0_init(l2x0_base, aux_val, aux_mask);
> > >  
> > >  	return 0;
> > >  }
> > > +
> > > +int __init l2x0_of_init(u32 aux_val, u32 aux_mask)
> > > +{
> > > +	return l2x0_of_init_common(aux_val, aux_mask, false);
> > > +}
> > > +
> > > +int __init l2x0_of_init_coherent(u32 aux_val, u32 aux_mask)
> > > +{
> > > +	return l2x0_of_init_common(aux_val, aux_mask, true);
> > > +}
> > 
> > This could be a bit misleading. I would rather have a generic
> > pl310_data_dma_coherent structure (though even on coherent systems you
> > could still have drivers that prefer writecombine/NormalNC memory to
> > avoid thrashing the cache).
> 
> I'm not sure to follow your suggestion here. What structure type would
> pl310_data_dma_coherent be? A "struct l2x0_of_data" like this:
> 
> static const struct l2x0_of_data pl310_dma_coherent_data = {
>         .setup = pl310_of_setup,
>         .save  = pl310_save,
>         .outer_cache = {
>                 .resume      = pl310_resume,
>                 .inv_range   = l2x0_inv_range,
>                 .clean_range = l2x0_clean_range,
>                 .flush_range = l2x0_flush_range,
>                 .sync        = NULL,
>                 .flush_all   = l2x0_flush_all,
>                 .inv_all     = l2x0_inv_all,
>                 .disable     = l2x0_disable,
>         },
> };
> 
> static const struct of_device_id l2x0_ids[] __initconst = {
> 	[...]
>         { .compatible = "arm,pl310-cache", .data = (void *)&pl310_data },
>         { .compatible = "arm,pl310-cache-dma-coherent", .data = (void *)&pl310_dma_coherent_data },
> 	[...]
> };
> 
> Is this what you have in mind? This looks ok to me, if we have a
> solution for the CONFIG_SMP vs. !CONFIG_SMP problem.

I don't have a nice way. Basically for this board, even if the coherency
was fully set up by firmware (which BTW should be the case for arm64, we
don't take mach-* directories ;)), it depends on the kernel build, I
guess this is because of the Shareable vs NonShareable mappings. The
best I can come up with is a conditionally compiled "marvell,..."
compatible string but your original approach is probably nicer (and as I
said, with the assumption that no driver uses writecombine/dmacoherent
memory for DMA).
Catalin Marinas April 9, 2014, 2:06 p.m. UTC | #4
On Tue, Apr 08, 2014 at 07:12:12PM +0100, Thomas Petazzoni wrote:
> On Tue, 8 Apr 2014 18:24:25 +0100, Catalin Marinas wrote:
> 
> > >  	of_init = true;
> > >  	memcpy(&outer_cache, &data->outer_cache, sizeof(outer_cache));
> > > +
> > > +	/*
> > > +	 * PL310 doesn't need an outer cache sync operation when the
> > > +	 * system is operating with hardware coherency enabled, as it
> > > +	 * is done directly in hardware.
> > > +	 */
> > > +	if (of_device_is_compatible(np, "arm,pl310-cache") && is_coherent)
> > > +		outer_cache.sync = NULL;
> > > +
> > 
> > For this particular case, you can add a specific l2x0_of_data structure
> > with the right compatible string for your platform where
> > outer_cache.sync is NULL,
> 
> In fact, I'm not sure using a separate compatible string is possible,
> because there are situations where the hardware platform may be I/O
> coherent, and some situations where it is not the case. For example, in
> the current kernel, the platform is I/O coherent when CONFIG_SMP is
> enabled, but not I/O coherent when CONFIG_SMP is disabled. And it's the
> same hardware platform, so same Device Tree in both cases.

I think Russell has a better solution in his L2 cache cleanup series.
Patch 18/44 introduces a .fixup function which takes the outer_cache_fns
pointer and that's a place where your code can check whether coherency
is present or not (and turn .sync into NULL).
Thomas Petazzoni May 6, 2014, 10:07 a.m. UTC | #5
Dear Catalin Marinas,

On Wed, 9 Apr 2014 15:06:25 +0100, Catalin Marinas wrote:

> > In fact, I'm not sure using a separate compatible string is possible,
> > because there are situations where the hardware platform may be I/O
> > coherent, and some situations where it is not the case. For example, in
> > the current kernel, the platform is I/O coherent when CONFIG_SMP is
> > enabled, but not I/O coherent when CONFIG_SMP is disabled. And it's the
> > same hardware platform, so same Device Tree in both cases.
> 
> I think Russell has a better solution in his L2 cache cleanup series.
> Patch 18/44 introduces a .fixup function which takes the outer_cache_fns
> pointer and that's a place where your code can check whether coherency
> is present or not (and turn .sync into NULL).

I had a look at the .fixup proposal, but I don't see how a .fixup
function, which is completely internal to the L2 cache driver, could
guess whether the system is coherent or not.

Thomas
diff mbox

Patch

diff --git a/arch/arm/include/asm/hardware/cache-l2x0.h b/arch/arm/include/asm/hardware/cache-l2x0.h
index 6795ff7..d7db409 100644
--- a/arch/arm/include/asm/hardware/cache-l2x0.h
+++ b/arch/arm/include/asm/hardware/cache-l2x0.h
@@ -110,6 +110,7 @@ 
 extern void __init l2x0_init(void __iomem *base, u32 aux_val, u32 aux_mask);
 #if defined(CONFIG_CACHE_L2X0) && defined(CONFIG_OF)
 extern int l2x0_of_init(u32 aux_val, u32 aux_mask);
+extern int l2x0_of_init_coherent(u32 aux_val, u32 aux_mask);
 #else
 static inline int l2x0_of_init(u32 aux_val, u32 aux_mask)
 {
diff --git a/arch/arm/mm/cache-l2x0.c b/arch/arm/mm/cache-l2x0.c
index 7abde2c..76ec012 100644
--- a/arch/arm/mm/cache-l2x0.c
+++ b/arch/arm/mm/cache-l2x0.c
@@ -968,7 +968,7 @@  static const struct of_device_id l2x0_ids[] __initconst = {
 	{}
 };
 
-int __init l2x0_of_init(u32 aux_val, u32 aux_mask)
+int __init l2x0_of_init_common(u32 aux_val, u32 aux_mask, bool is_coherent)
 {
 	struct device_node *np;
 	const struct l2x0_of_data *data;
@@ -1005,8 +1005,28 @@  int __init l2x0_of_init(u32 aux_val, u32 aux_mask)
 
 	of_init = true;
 	memcpy(&outer_cache, &data->outer_cache, sizeof(outer_cache));
+
+	/*
+	 * PL310 doesn't need an outer cache sync operation when the
+	 * system is operating with hardware coherency enabled, as it
+	 * is done directly in hardware.
+	 */
+	if (of_device_is_compatible(np, "arm,pl310-cache") && is_coherent)
+		outer_cache.sync = NULL;
+
 	l2x0_init(l2x0_base, aux_val, aux_mask);
 
 	return 0;
 }
+
+int __init l2x0_of_init(u32 aux_val, u32 aux_mask)
+{
+	return l2x0_of_init_common(aux_val, aux_mask, false);
+}
+
+int __init l2x0_of_init_coherent(u32 aux_val, u32 aux_mask)
+{
+	return l2x0_of_init_common(aux_val, aux_mask, true);
+}
+
 #endif