diff mbox series

arm64: Make ARCH_DMA_MINALIGN configurable

Message ID 20210517074332.28280-1-vincent.whitchurch@axis.com (mailing list archive)
State New, archived
Headers show
Series arm64: Make ARCH_DMA_MINALIGN configurable | expand

Commit Message

Vincent Whitchurch May 17, 2021, 7:43 a.m. UTC
ARCH_DMA_MINALIGN is hardcoded to 128, but this wastes memory if the
kernel is only intended to be run on platforms with cache line sizes of
64 bytes.

Make this configurable (hidden under CONFIG_EXPERT).  Setting this to 64
bytes reduces the slab memory usage of my Cortex-A53-based system by
~6%, measured right after startup.

Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com>
---
 arch/arm64/Kconfig             | 10 ++++++++++
 arch/arm64/include/asm/cache.h |  2 +-
 2 files changed, 11 insertions(+), 1 deletion(-)

Comments

Catalin Marinas May 17, 2021, 11:04 a.m. UTC | #1
On Mon, May 17, 2021 at 09:43:32AM +0200, Vincent Whitchurch wrote:
> ARCH_DMA_MINALIGN is hardcoded to 128, but this wastes memory if the
> kernel is only intended to be run on platforms with cache line sizes of
> 64 bytes.
> 
> Make this configurable (hidden under CONFIG_EXPERT).  Setting this to 64
> bytes reduces the slab memory usage of my Cortex-A53-based system by
> ~6%, measured right after startup.

I agree that we waste some memory since the kmalloc caches start from
128 but I don't think a config option is the right.

An option would be to try not to rely on the hard-coded
ARCH_DMA_MINALIGN when the slab caches are created but use
cache_line_size(). It's a bit tricky as the cache_line_size() returned
value may be tweaked by DT or PPTT after the boot caches have been
created (see commit 7b8c87b297a7).

Another option I recall discussing with Arnd about two years ago was to
start with the default 128 at boot but add the smaller slab caches
later, once we have more information. This can be just another 64 byte
cache or even go all the way down to 8 byte if all the devices are
cache coherent.
Ard Biesheuvel May 17, 2021, 12:01 p.m. UTC | #2
On Mon, 17 May 2021 at 13:06, Catalin Marinas <catalin.marinas@arm.com> wrote:
>
> On Mon, May 17, 2021 at 09:43:32AM +0200, Vincent Whitchurch wrote:
> > ARCH_DMA_MINALIGN is hardcoded to 128, but this wastes memory if the
> > kernel is only intended to be run on platforms with cache line sizes of
> > 64 bytes.
> >
> > Make this configurable (hidden under CONFIG_EXPERT).  Setting this to 64
> > bytes reduces the slab memory usage of my Cortex-A53-based system by
> > ~6%, measured right after startup.
>
> I agree that we waste some memory since the kmalloc caches start from
> 128 but I don't think a config option is the right.
>
> An option would be to try not to rely on the hard-coded
> ARCH_DMA_MINALIGN when the slab caches are created but use
> cache_line_size(). It's a bit tricky as the cache_line_size() returned
> value may be tweaked by DT or PPTT after the boot caches have been
> created (see commit 7b8c87b297a7).
>
> Another option I recall discussing with Arnd about two years ago was to
> start with the default 128 at boot but add the smaller slab caches
> later, once we have more information. This can be just another 64 byte
> cache or even go all the way down to 8 byte if all the devices are
> cache coherent.
>

ARCH_SLAB_MINALIGN is also used to statically align (members of)
struct types, so doing this at runtime is going to have limited
effect.

If a) ThunderX is the only platform we care about (do we?) that has
128 byte cachelines, and b) DMA is cache coherent on such platforms,
couldn't we separate ARCH_SLAB_MINALIGN from ARCH_DMA_MINALIGN? I.e.,
set the first to 64 and keep the second at 128?
Arnd Bergmann May 17, 2021, 1:35 p.m. UTC | #3
On Mon, May 17, 2021 at 2:01 PM Ard Biesheuvel <ardb@kernel.org> wrote:
> On Mon, 17 May 2021 at 13:06, Catalin Marinas <catalin.marinas@arm.com> wrote:
> > On Mon, May 17, 2021 at 09:43:32AM +0200, Vincent Whitchurch wrote:
> >
> > Another option I recall discussing with Arnd about two years ago was to
> > start with the default 128 at boot but add the smaller slab caches
> > later, once we have more information. This can be just another 64 byte
> > cache or even go all the way down to 8 byte if all the devices are
> > cache coherent.
> >
>
> ARCH_SLAB_MINALIGN is also used to statically align (members of)
> struct types, so doing this at runtime is going to have limited
> effect.
>
> If a) ThunderX is the only platform we care about (do we?) that has
> 128 byte cachelines, and b) DMA is cache coherent on such platforms,
> couldn't we separate ARCH_SLAB_MINALIGN from ARCH_DMA_MINALIGN? I.e.,
> set the first to 64 and keep the second at 128?

What is the purpose of ARCH_DMA_MINALIGN then? If it's cache
coherent, does DMA buffer still need to be aligned to the cache line
size?

Are you sure that ThunderX is the only one with cache lines this large?

       Arnd
Catalin Marinas May 17, 2021, 2:15 p.m. UTC | #4
On Mon, May 17, 2021 at 02:01:54PM +0200, Ard Biesheuvel wrote:
> On Mon, 17 May 2021 at 13:06, Catalin Marinas <catalin.marinas@arm.com> wrote:
> > On Mon, May 17, 2021 at 09:43:32AM +0200, Vincent Whitchurch wrote:
> > > ARCH_DMA_MINALIGN is hardcoded to 128, but this wastes memory if the
> > > kernel is only intended to be run on platforms with cache line sizes of
> > > 64 bytes.
> > >
> > > Make this configurable (hidden under CONFIG_EXPERT).  Setting this to 64
> > > bytes reduces the slab memory usage of my Cortex-A53-based system by
> > > ~6%, measured right after startup.
> >
> > I agree that we waste some memory since the kmalloc caches start from
> > 128 but I don't think a config option is the right.
> >
> > An option would be to try not to rely on the hard-coded
> > ARCH_DMA_MINALIGN when the slab caches are created but use
> > cache_line_size(). It's a bit tricky as the cache_line_size() returned
> > value may be tweaked by DT or PPTT after the boot caches have been
> > created (see commit 7b8c87b297a7).
> >
> > Another option I recall discussing with Arnd about two years ago was to
> > start with the default 128 at boot but add the smaller slab caches
> > later, once we have more information. This can be just another 64 byte
> > cache or even go all the way down to 8 byte if all the devices are
> > cache coherent.
> 
> ARCH_SLAB_MINALIGN is also used to statically align (members of)
> struct types, so doing this at runtime is going to have limited
> effect.

You probably mean ARCH_KMALLOC_MINALIGN. We don't touch
ARCH_SLAB_MINALIGN unless KASAN_{SW,HW}_TAGS is enabled and it is still
maximum 16 (I wonder if it's safe to reduce this to 8 as it might be the
case with KASAN_SW_TAGS).

> If a) ThunderX is the only platform we care about (do we?) that has
> 128 byte cachelines, and b) DMA is cache coherent on such platforms,
> couldn't we separate ARCH_SLAB_MINALIGN from ARCH_DMA_MINALIGN? I.e.,
> set the first to 64 and keep the second at 128?

ARCH_KMALLOC_MINALIGN is indeed the same as ARCH_DMA_MINALIGN. We can't
do much about when the requirement is DMA. For example, struct devres
has a data[] array aligned to ARCH_KMALLOC_MINALIGN with a comment that
it may be needed for DMA.

I can't tell how many SoCs out there have 128 byte cache lines (it can
be in a system level cache that reacts to cache maintenance by VA) and
are not fully coherent. I know there is one with 256 byte caches but
luckily it's cache coherent.

Even if we can't fix the build-time structure alignment, I think there's
still sufficient benefit in allowing smaller slab caches.
Catalin Marinas May 17, 2021, 2:20 p.m. UTC | #5
On Mon, May 17, 2021 at 03:35:39PM +0200, Arnd Bergmann wrote:
> On Mon, May 17, 2021 at 2:01 PM Ard Biesheuvel <ardb@kernel.org> wrote:
> > On Mon, 17 May 2021 at 13:06, Catalin Marinas <catalin.marinas@arm.com> wrote:
> > > On Mon, May 17, 2021 at 09:43:32AM +0200, Vincent Whitchurch wrote:
> > >
> > > Another option I recall discussing with Arnd about two years ago was to
> > > start with the default 128 at boot but add the smaller slab caches
> > > later, once we have more information. This can be just another 64 byte
> > > cache or even go all the way down to 8 byte if all the devices are
> > > cache coherent.
> > >
> >
> > ARCH_SLAB_MINALIGN is also used to statically align (members of)
> > struct types, so doing this at runtime is going to have limited
> > effect.
> >
> > If a) ThunderX is the only platform we care about (do we?) that has
> > 128 byte cachelines, and b) DMA is cache coherent on such platforms,
> > couldn't we separate ARCH_SLAB_MINALIGN from ARCH_DMA_MINALIGN? I.e.,
> > set the first to 64 and keep the second at 128?
> 
> What is the purpose of ARCH_DMA_MINALIGN then? If it's cache
> coherent, does DMA buffer still need to be aligned to the cache line
> size?

For coherent DMA, we don't need the alignment from a correctness
perspective.

For performance, a driver can always pass SLAB_HWCACHE_ALIGN which ends
up using cache_line_size(), so it gets the actual hardware value.
diff mbox series

Patch

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 9f1d8566bbf9..c8716aa18001 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -274,6 +274,16 @@  config ARCH_MMAP_RND_COMPAT_BITS_MIN
 config ARCH_MMAP_RND_COMPAT_BITS_MAX
        default 16
 
+config ARM64_DMA_MINALIGN
+	int "DMA alignment for slab buffers" if EXPERT
+	default 128
+	help
+	  Set the number of bytes to which buffers allocated with kmalloc()
+	  need to be aligned to ensure that they will be usable for DMA
+	  (ARCH_DMA_MINALIGN).  Lower values may save memory on platforms with
+	  smaller cache line sizes, but may also make the kernel non-functional
+	  on platforms with cache line sizes larger than the chosen value.
+
 config NO_IOPORT_MAP
 	def_bool y if !PCI
 
diff --git a/arch/arm64/include/asm/cache.h b/arch/arm64/include/asm/cache.h
index a074459f8f2f..5fa68820ff57 100644
--- a/arch/arm64/include/asm/cache.h
+++ b/arch/arm64/include/asm/cache.h
@@ -47,7 +47,7 @@ 
  * cache before the transfer is done, causing old data to be seen by
  * the CPU.
  */
-#define ARCH_DMA_MINALIGN	(128)
+#define ARCH_DMA_MINALIGN	(CONFIG_ARM64_DMA_MINALIGN)
 
 #ifdef CONFIG_KASAN_SW_TAGS
 #define ARCH_SLAB_MINALIGN	(1ULL << KASAN_SHADOW_SCALE_SHIFT)