[v2,1/2] mm: slab: Introduce __GFP_PACKED for smaller kmalloc() alignments

Message ID 20221025205247.3264568-2-catalin.marinas@arm.com (mailing list archive)
State New, archived
Series mm: Allow kmalloc() allocations below ARCH_KMALLOC_MINALIGN

Commit Message

Catalin Marinas Oct. 25, 2022, 8:52 p.m. UTC
By default kmalloc() returns objects aligned to ARCH_KMALLOC_MINALIGN.
This can be somewhat large on architectures defining ARCH_DMA_MINALIGN
(e.g. 128 on arm64) and significant memory is wasted through small
kmalloc() allocations.

Reduce the minimum alignment for kmalloc() to the default
KMALLOC_MIN_SIZE (8 for slub, 32 for slab) but align the
requested size to the bigger ARCH_KMALLOC_MINALIGN unless a newly added
__GFP_PACKED flag is passed. With this gfp flag, the alignment is
reduced to KMALLOC_PACKED_ALIGN, at least sizeof(unsigned long long).

There's no slob support.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
---
 include/linux/gfp_types.h | 10 ++++++++--
 include/linux/slab.h      | 22 ++++++++++++++++++----
 mm/slab_common.c          |  3 ++-
 3 files changed, 28 insertions(+), 7 deletions(-)
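
For illustration, a minimal usage sketch of the flag as described above;
the structure and helper names (tiny_node, tiny_node_alloc) are
hypothetical and not part of the patch:

#include <linux/slab.h>
#include <linux/types.h>

/* A small bookkeeping object that is never handed to a device. */
struct tiny_node {
	struct list_head list;
	u32 flags;
};

static struct tiny_node *tiny_node_alloc(void)
{
	/*
	 * Without __GFP_PACKED this 24-byte object would have its size
	 * rounded up to ARCH_KMALLOC_MINALIGN (128 bytes on arm64); with
	 * the flag it can be served from the kmalloc-32 cache instead.
	 */
	return kmalloc(sizeof(struct tiny_node), GFP_KERNEL | __GFP_PACKED);
}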

Comments

Greg KH Oct. 26, 2022, 6:39 a.m. UTC | #1
On Tue, Oct 25, 2022 at 09:52:46PM +0100, Catalin Marinas wrote:
> By default kmalloc() returns objects aligned to ARCH_KMALLOC_MINALIGN.
> This can be somewhat large on architectures defining ARCH_DMA_MINALIGN
> (e.g. 128 on arm64) and significant memory is wasted through small
> kmalloc() allocations.
> 
> Reduce the minimum alignment for kmalloc() to the default
> KMALLOC_MIN_SIZE (8 for slub, 32 for slab) but align the
> requested size to the bigger ARCH_KMALLOC_MINALIGN unless a newly added
> __GFP_PACKED flag is passed. With this gfp flag, the alignment is
> reduced to KMALLOC_PACKED_ALIGN, at least sizeof(unsigned long long).

Can memory allocated with __GFP_PACKED be sent to DMA controllers?

If not, you should say that somewhere here or I'm going to get a bunch
of patches trying to add this flag to tiny USB urb allocations (where we
allocate 8 or 16 bytes) that are then going to fail on some hardware.

thanks,

greg k-h
Catalin Marinas Oct. 26, 2022, 8:39 a.m. UTC | #2
On Wed, Oct 26, 2022 at 08:39:04AM +0200, Greg Kroah-Hartman wrote:
> On Tue, Oct 25, 2022 at 09:52:46PM +0100, Catalin Marinas wrote:
> > By default kmalloc() returns objects aligned to ARCH_KMALLOC_MINALIGN.
> > This can be somewhat large on architectures defining ARCH_DMA_MINALIGN
> > (e.g. 128 on arm64) and significant memory is wasted through small
> > kmalloc() allocations.
> > 
> > Reduce the minimum alignment for kmalloc() to the default
> > KMALLOC_MIN_SIZE (8 for slub, 32 for slab) but align the
> > requested size to the bigger ARCH_KMALLOC_MINALIGN unless a newly added
> > __GFP_PACKED flag is passed. With this gfp flag, the alignment is
> > reduced to KMALLOC_PACKED_ALIGN, at least sizeof(unsigned long long).
> 
> Can memory allocated with __GFP_PACKED be sent to DMA controllers?
> 
> If not, you should say that somewhere here or I'm going to get a bunch
> of patches trying to add this flag to tiny USB urb allocations (where we
> allocate 8 or 16 bytes) that are then going to fail on some hardware.

Good point, I'll add a comment.

We can also add a check to the DMA API when debugging is enabled,
something like WARN_ON_ONCE(ksize(ptr) < cache_line_size()) for
non-coherent devices.
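
As a sketch, such a debug check might take roughly this shape; the
function name and call site are hypothetical, while dev_is_dma_coherent(),
ksize() and cache_line_size() are existing kernel helpers:

#include <linux/dma-map-ops.h>
#include <linux/slab.h>

static inline void dma_debug_check_kmalloc(struct device *dev,
					   const void *ptr)
{
	/*
	 * A packed object smaller than a cache line may share its cache
	 * line with a neighbour, which is unsafe for non-coherent DMA.
	 */
	if (!dev_is_dma_coherent(dev))
		WARN_ON_ONCE(ksize(ptr) < cache_line_size());
}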
Greg KH Oct. 26, 2022, 9:49 a.m. UTC | #3
On Wed, Oct 26, 2022 at 09:39:32AM +0100, Catalin Marinas wrote:
> On Wed, Oct 26, 2022 at 08:39:04AM +0200, Greg Kroah-Hartman wrote:
> > On Tue, Oct 25, 2022 at 09:52:46PM +0100, Catalin Marinas wrote:
> > > By default kmalloc() returns objects aligned to ARCH_KMALLOC_MINALIGN.
> > > This can be somewhat large on architectures defining ARCH_DMA_MINALIGN
> > > (e.g. 128 on arm64) and significant memory is wasted through small
> > > kmalloc() allocations.
> > > 
> > > Reduce the minimum alignment for kmalloc() to the default
> > > KMALLOC_MIN_SIZE (8 for slub, 32 for slab) but align the
> > > requested size to the bigger ARCH_KMALLOC_MINALIGN unless a newly added
> > > __GFP_PACKED flag is passed. With this gfp flag, the alignment is
> > > reduced to KMALLOC_PACKED_ALIGN, at least sizeof(unsigned long long).
> > 
> > Can memory allocated with __GFP_PACKED be sent to DMA controllers?
> > 
> > If not, you should say that somewhere here or I'm going to get a bunch
> > of patches trying to add this flag to tiny USB urb allocations (where we
> > allocate 8 or 16 bytes) that are then going to fail on some hardware.
> 
> Good point, I'll add a comment.
> 
> We can also add a check to the DMA API when debugging is enabled,
> something like WARN_ON_ONCE(ksize(ptr) < cache_line_size()) for
> non-coherent devices.

It's not the size of the object that matters, it's the alignment, right?
So shouldn't the check be simpler and just look at the alignment of the
pointer, which should be almost "free"?

thanks,

greg k-h
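
The pointer-only test suggested here could be as small as the following
sketch; the helper name is hypothetical, while IS_ALIGNED() and
cache_line_size() are existing kernel helpers:

#include <linux/cache.h>
#include <linux/kernel.h>

static inline void dma_debug_check_ptr_align(const void *ptr)
{
	/* only the start address is tested: cheap, but see the reply below */
	WARN_ON_ONCE(!IS_ALIGNED((unsigned long)ptr, cache_line_size()));
}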
Catalin Marinas Oct. 26, 2022, 9:58 a.m. UTC | #4
On Wed, Oct 26, 2022 at 11:49:05AM +0200, Greg Kroah-Hartman wrote:
> On Wed, Oct 26, 2022 at 09:39:32AM +0100, Catalin Marinas wrote:
> > On Wed, Oct 26, 2022 at 08:39:04AM +0200, Greg Kroah-Hartman wrote:
> > > On Tue, Oct 25, 2022 at 09:52:46PM +0100, Catalin Marinas wrote:
> > > > By default kmalloc() returns objects aligned to ARCH_KMALLOC_MINALIGN.
> > > > This can be somewhat large on architectures defining ARCH_DMA_MINALIGN
> > > > (e.g. 128 on arm64) and significant memory is wasted through small
> > > > kmalloc() allocations.
> > > > 
> > > > Reduce the minimum alignment for kmalloc() to the default
> > > > KMALLOC_MIN_SIZE (8 for slub, 32 for slab) but align the
> > > > requested size to the bigger ARCH_KMALLOC_MINALIGN unless a newly added
> > > > __GFP_PACKED flag is passed. With this gfp flag, the alignment is
> > > > reduced to KMALLOC_PACKED_ALIGN, at least sizeof(unsigned long long).
> > > 
> > > Can memory allocated with __GFP_PACKED be sent to DMA controllers?
> > > 
> > > If not, you should say that somewhere here or I'm going to get a bunch
> > > of patches trying to add this flag to tiny USB urb allocations (where we
> > > allocate 8 or 16 bytes) that are then going to fail on some hardware.
> > 
> > Good point, I'll add a comment.
> > 
> > We can also add a check to the DMA API when debugging is enabled,
> > something like WARN_ON_ONCE(ksize(ptr) < cache_line_size()) for
> > non-coherent devices.
> 
> It's not the size of the object that matters, it's the alignment, right?
> So shouldn't the check be simpler and just look at the alignment of the
> pointer, which should be almost "free"?

It's the alignment of the start (easily checked) but also of the end. For
small urb allocations, we need to know that they came from a slab page
where there's nothing in the adjacent bytes before the next cache line.
It would have been nice if the DMA API were called with both start and
size aligned, but that's not the case. I think such a check would fail
even for larger buffers like network packets.
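
A sketch of a both-ends check along these lines; kmalloc_dma_safe() is a
hypothetical name, and the end test relies on ksize() (the usable slab
object size), which is exactly why a pointer-only check is insufficient:

#include <linux/cache.h>
#include <linux/kernel.h>
#include <linux/slab.h>

static inline bool kmalloc_dma_safe(const void *ptr, size_t size)
{
	unsigned long align = cache_line_size();

	/*
	 * The buffer must start on a cache-line boundary and the slab
	 * object must cover every cache line the buffer touches, so that
	 * no neighbouring object shares its first or last line.
	 */
	return IS_ALIGNED((unsigned long)ptr, align) &&
	       ksize(ptr) >= ALIGN(size, align);
}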
Hyeonggon Yoo Oct. 27, 2022, 12:11 p.m. UTC | #5
On Tue, Oct 25, 2022 at 09:52:46PM +0100, Catalin Marinas wrote:
> By default kmalloc() returns objects aligned to ARCH_KMALLOC_MINALIGN.
> This can be somewhat large on architectures defining ARCH_DMA_MINALIGN
> (e.g. 128 on arm64) and significant memory is wasted through small
> kmalloc() allocations.
> 
> Reduce the minimum alignment for kmalloc() to the default
> KMALLOC_MIN_SIZE (8 for slub, 32 for slab) but align the
> requested size to the bigger ARCH_KMALLOC_MINALIGN unless a newly added
> __GFP_PACKED flag is passed.
>
> With this gfp flag, the alignment is
> reduced to KMALLOC_PACKED_ALIGN, at least sizeof(unsigned long long).
>
> There's no slob support.
> 

Thank you for pushing it forward!

> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
> ---
>  include/linux/gfp_types.h | 10 ++++++++--
>  include/linux/slab.h      | 22 ++++++++++++++++++----
>  mm/slab_common.c          |  3 ++-
>  3 files changed, 28 insertions(+), 7 deletions(-)
> 
> diff --git a/include/linux/gfp_types.h b/include/linux/gfp_types.h
> index d88c46ca82e1..305cb8cb6f8b 100644
> --- a/include/linux/gfp_types.h
> +++ b/include/linux/gfp_types.h
> @@ -55,8 +55,9 @@ typedef unsigned int __bitwise gfp_t;
>  #define ___GFP_SKIP_KASAN_UNPOISON	0
>  #define ___GFP_SKIP_KASAN_POISON	0
>  #endif
> +#define ___GFP_PACKED		0x8000000u
>  #ifdef CONFIG_LOCKDEP
> -#define ___GFP_NOLOCKDEP	0x8000000u
> +#define ___GFP_NOLOCKDEP	0x10000000u
>  #else
>  #define ___GFP_NOLOCKDEP	0
>  #endif
> @@ -243,6 +244,10 @@ typedef unsigned int __bitwise gfp_t;
>   *
>   * %__GFP_SKIP_KASAN_POISON makes KASAN skip poisoning on page deallocation.
>   * Typically, used for userspace pages. Only effective in HW_TAGS mode.
> + *
> + * %__GFP_PACKED returns a pointer aligned to the possibly smaller
> + * KMALLOC_PACKED_ALIGN rather than ARCH_KMALLOC_MINALIGN. Useful for small
> + * object allocation on architectures that define large ARCH_DMA_MINALIGN.
>   */
>  #define __GFP_NOWARN	((__force gfp_t)___GFP_NOWARN)
>  #define __GFP_COMP	((__force gfp_t)___GFP_COMP)
> @@ -251,12 +256,13 @@ typedef unsigned int __bitwise gfp_t;
>  #define __GFP_SKIP_ZERO ((__force gfp_t)___GFP_SKIP_ZERO)
>  #define __GFP_SKIP_KASAN_UNPOISON ((__force gfp_t)___GFP_SKIP_KASAN_UNPOISON)
>  #define __GFP_SKIP_KASAN_POISON   ((__force gfp_t)___GFP_SKIP_KASAN_POISON)
> +#define __GFP_PACKED	((__force gfp_t)___GFP_PACKED)
>  
>  /* Disable lockdep for GFP context tracking */
>  #define __GFP_NOLOCKDEP ((__force gfp_t)___GFP_NOLOCKDEP)
>  
>  /* Room for N __GFP_FOO bits */
> -#define __GFP_BITS_SHIFT (27 + IS_ENABLED(CONFIG_LOCKDEP))
> +#define __GFP_BITS_SHIFT (28 + IS_ENABLED(CONFIG_LOCKDEP))
>  #define __GFP_BITS_MASK ((__force gfp_t)((1 << __GFP_BITS_SHIFT) - 1))
>  
>  /**
> diff --git a/include/linux/slab.h b/include/linux/slab.h
> index 90877fcde70b..0f59585b5fbf 100644
> --- a/include/linux/slab.h
> +++ b/include/linux/slab.h
> @@ -223,8 +223,6 @@ void kmem_dump_obj(void *object);
>   */
>  #if defined(ARCH_DMA_MINALIGN) && ARCH_DMA_MINALIGN > 8
>  #define ARCH_KMALLOC_MINALIGN ARCH_DMA_MINALIGN
> -#define KMALLOC_MIN_SIZE ARCH_DMA_MINALIGN
> -#define KMALLOC_SHIFT_LOW ilog2(ARCH_DMA_MINALIGN)
>  #else
>  #define ARCH_KMALLOC_MINALIGN __alignof__(unsigned long long)
>  #endif
> @@ -310,6 +308,11 @@ static inline unsigned int arch_slab_minalign(void)
>  #define KMALLOC_MIN_SIZE (1 << KMALLOC_SHIFT_LOW)
>  #endif
>  
> +/*
> + * This alignment should be at least sizeof(unsigned long long).
> + */
> +#define KMALLOC_PACKED_ALIGN	(KMALLOC_MIN_SIZE)
> +

I think __assume_kmalloc_alignment should be changed as well,
to avoid the compiler making a wrong decision.

>  /*
>   * This restriction comes from byte sized index implementation.
>   * Page size is normally 2^12 bytes and, in this case, if we want to use
> @@ -382,6 +385,17 @@ static __always_inline enum kmalloc_cache_type kmalloc_type(gfp_t flags)
>  		return KMALLOC_CGROUP;
>  }
>  
> +/*
> + * Align the size to ARCH_KMALLOC_MINALIGN unless __GFP_PACKED is passed.
> + */
> +static __always_inline size_t kmalloc_size_align(size_t size, gfp_t flags)
> +{
> +	if (ARCH_KMALLOC_MINALIGN > KMALLOC_PACKED_ALIGN &&
> +	    !(flags & __GFP_PACKED))
> +		size = ALIGN(size, ARCH_KMALLOC_MINALIGN);
> +	return size;
> +}
> +
>  /*
>   * Figure out which kmalloc slab an allocation of a certain size
>   * belongs to.
> @@ -568,7 +582,7 @@ static __always_inline __alloc_size(1) void *kmalloc(size_t size, gfp_t flags)
>  		if (size > KMALLOC_MAX_CACHE_SIZE)
>  			return kmalloc_large(size, flags);
>  #ifndef CONFIG_SLOB
> -		index = kmalloc_index(size);
> +		index = kmalloc_index(kmalloc_size_align(size, flags));
>  
>  		if (!index)
>  			return ZERO_SIZE_PTR;
> @@ -590,7 +604,7 @@ static __always_inline __alloc_size(1) void *kmalloc_node(size_t size, gfp_t fla
>  		if (size > KMALLOC_MAX_CACHE_SIZE)
>  			return kmalloc_large_node(size, flags, node);
>  
> -		index = kmalloc_index(size);
> +		index = kmalloc_index(kmalloc_size_align(size, flags));
>  
>  		if (!index)
>  			return ZERO_SIZE_PTR;
>
> diff --git a/mm/slab_common.c b/mm/slab_common.c
> index 33b1886b06eb..0e4ea396cd4f 100644
> --- a/mm/slab_common.c
> +++ b/mm/slab_common.c
> @@ -627,7 +627,7 @@ void __init create_boot_cache(struct kmem_cache *s, const char *name,
>  		unsigned int useroffset, unsigned int usersize)
>  {
>  	int err;
> -	unsigned int align = ARCH_KMALLOC_MINALIGN;
> +	unsigned int align = KMALLOC_PACKED_ALIGN;
>  
>  	s->name = name;
>  	s->size = s->object_size = size;
> @@ -720,6 +720,7 @@ struct kmem_cache *kmalloc_slab(size_t size, gfp_t flags)
>  {
>  	unsigned int index;
>
> +	size = kmalloc_size_align(size, flags);
>  	if (size <= 192) {
>  		if (!size)
>  			return ZERO_SIZE_PTR;
>
Catalin Marinas Oct. 28, 2022, 7:32 a.m. UTC | #6
On Thu, Oct 27, 2022 at 09:11:49PM +0900, Hyeonggon Yoo wrote:
> On Tue, Oct 25, 2022 at 09:52:46PM +0100, Catalin Marinas wrote:
> > +/*
> > + * This alignment should be at least sizeof(unsigned long long).
> > + */
> > +#define KMALLOC_PACKED_ALIGN	(KMALLOC_MIN_SIZE)
> > +
> 
> I think __assume_kmalloc_alignment should be changed as well,
> to avoid the compiler making a wrong decision.

Good point. Thanks.
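
One possible shape of that follow-up, as a sketch (mainline defines
__assume_kmalloc_alignment as __assume_aligned(ARCH_KMALLOC_MINALIGN) in
include/linux/slab.h; whether this exact change was folded into a later
revision is not shown here):

/*
 * With __GFP_PACKED, kmalloc() may return objects aligned only to
 * KMALLOC_PACKED_ALIGN, so the compiler must not be told to assume the
 * stronger ARCH_KMALLOC_MINALIGN.
 */
#define __assume_kmalloc_alignment __assume_aligned(KMALLOC_PACKED_ALIGN)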

Patch

diff --git a/include/linux/gfp_types.h b/include/linux/gfp_types.h
index d88c46ca82e1..305cb8cb6f8b 100644
--- a/include/linux/gfp_types.h
+++ b/include/linux/gfp_types.h
@@ -55,8 +55,9 @@  typedef unsigned int __bitwise gfp_t;
 #define ___GFP_SKIP_KASAN_UNPOISON	0
 #define ___GFP_SKIP_KASAN_POISON	0
 #endif
+#define ___GFP_PACKED		0x8000000u
 #ifdef CONFIG_LOCKDEP
-#define ___GFP_NOLOCKDEP	0x8000000u
+#define ___GFP_NOLOCKDEP	0x10000000u
 #else
 #define ___GFP_NOLOCKDEP	0
 #endif
@@ -243,6 +244,10 @@  typedef unsigned int __bitwise gfp_t;
  *
  * %__GFP_SKIP_KASAN_POISON makes KASAN skip poisoning on page deallocation.
  * Typically, used for userspace pages. Only effective in HW_TAGS mode.
+ *
+ * %__GFP_PACKED returns a pointer aligned to the possibly smaller
+ * KMALLOC_PACKED_ALIGN rather than ARCH_KMALLOC_MINALIGN. Useful for small
+ * object allocation on architectures that define large ARCH_DMA_MINALIGN.
  */
 #define __GFP_NOWARN	((__force gfp_t)___GFP_NOWARN)
 #define __GFP_COMP	((__force gfp_t)___GFP_COMP)
@@ -251,12 +256,13 @@  typedef unsigned int __bitwise gfp_t;
 #define __GFP_SKIP_ZERO ((__force gfp_t)___GFP_SKIP_ZERO)
 #define __GFP_SKIP_KASAN_UNPOISON ((__force gfp_t)___GFP_SKIP_KASAN_UNPOISON)
 #define __GFP_SKIP_KASAN_POISON   ((__force gfp_t)___GFP_SKIP_KASAN_POISON)
+#define __GFP_PACKED	((__force gfp_t)___GFP_PACKED)
 
 /* Disable lockdep for GFP context tracking */
 #define __GFP_NOLOCKDEP ((__force gfp_t)___GFP_NOLOCKDEP)
 
 /* Room for N __GFP_FOO bits */
-#define __GFP_BITS_SHIFT (27 + IS_ENABLED(CONFIG_LOCKDEP))
+#define __GFP_BITS_SHIFT (28 + IS_ENABLED(CONFIG_LOCKDEP))
 #define __GFP_BITS_MASK ((__force gfp_t)((1 << __GFP_BITS_SHIFT) - 1))
 
 /**
diff --git a/include/linux/slab.h b/include/linux/slab.h
index 90877fcde70b..0f59585b5fbf 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -223,8 +223,6 @@  void kmem_dump_obj(void *object);
  */
 #if defined(ARCH_DMA_MINALIGN) && ARCH_DMA_MINALIGN > 8
 #define ARCH_KMALLOC_MINALIGN ARCH_DMA_MINALIGN
-#define KMALLOC_MIN_SIZE ARCH_DMA_MINALIGN
-#define KMALLOC_SHIFT_LOW ilog2(ARCH_DMA_MINALIGN)
 #else
 #define ARCH_KMALLOC_MINALIGN __alignof__(unsigned long long)
 #endif
@@ -310,6 +308,11 @@  static inline unsigned int arch_slab_minalign(void)
 #define KMALLOC_MIN_SIZE (1 << KMALLOC_SHIFT_LOW)
 #endif
 
+/*
+ * This alignment should be at least sizeof(unsigned long long).
+ */
+#define KMALLOC_PACKED_ALIGN	(KMALLOC_MIN_SIZE)
+
 /*
  * This restriction comes from byte sized index implementation.
  * Page size is normally 2^12 bytes and, in this case, if we want to use
@@ -382,6 +385,17 @@  static __always_inline enum kmalloc_cache_type kmalloc_type(gfp_t flags)
 		return KMALLOC_CGROUP;
 }
 
+/*
+ * Align the size to ARCH_KMALLOC_MINALIGN unless __GFP_PACKED is passed.
+ */
+static __always_inline size_t kmalloc_size_align(size_t size, gfp_t flags)
+{
+	if (ARCH_KMALLOC_MINALIGN > KMALLOC_PACKED_ALIGN &&
+	    !(flags & __GFP_PACKED))
+		size = ALIGN(size, ARCH_KMALLOC_MINALIGN);
+	return size;
+}
+
 /*
  * Figure out which kmalloc slab an allocation of a certain size
  * belongs to.
@@ -568,7 +582,7 @@  static __always_inline __alloc_size(1) void *kmalloc(size_t size, gfp_t flags)
 		if (size > KMALLOC_MAX_CACHE_SIZE)
 			return kmalloc_large(size, flags);
 #ifndef CONFIG_SLOB
-		index = kmalloc_index(size);
+		index = kmalloc_index(kmalloc_size_align(size, flags));
 
 		if (!index)
 			return ZERO_SIZE_PTR;
@@ -590,7 +604,7 @@  static __always_inline __alloc_size(1) void *kmalloc_node(size_t size, gfp_t fla
 		if (size > KMALLOC_MAX_CACHE_SIZE)
 			return kmalloc_large_node(size, flags, node);
 
-		index = kmalloc_index(size);
+		index = kmalloc_index(kmalloc_size_align(size, flags));
 
 		if (!index)
 			return ZERO_SIZE_PTR;
diff --git a/mm/slab_common.c b/mm/slab_common.c
index 33b1886b06eb..0e4ea396cd4f 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -627,7 +627,7 @@  void __init create_boot_cache(struct kmem_cache *s, const char *name,
 		unsigned int useroffset, unsigned int usersize)
 {
 	int err;
-	unsigned int align = ARCH_KMALLOC_MINALIGN;
+	unsigned int align = KMALLOC_PACKED_ALIGN;
 
 	s->name = name;
 	s->size = s->object_size = size;
@@ -720,6 +720,7 @@  struct kmem_cache *kmalloc_slab(size_t size, gfp_t flags)
 {
 	unsigned int index;
 
+	size = kmalloc_size_align(size, flags);
 	if (size <= 192) {
 		if (!size)
 			return ZERO_SIZE_PTR;
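
To make the size rounding concrete, a worked example assuming arm64 with
slub, where ARCH_KMALLOC_MINALIGN is 128 and KMALLOC_PACKED_ALIGN equals
KMALLOC_MIN_SIZE (8):

/*
 * kmalloc(24, GFP_KERNEL)
 *   -> kmalloc_size_align(24, GFP_KERNEL) = ALIGN(24, 128) = 128
 *   -> kmalloc_index(128) selects the kmalloc-128 cache, so the
 *      object keeps its 128-byte alignment as before
 *
 * kmalloc(24, GFP_KERNEL | __GFP_PACKED)
 *   -> kmalloc_size_align(24, flags) = 24 (no rounding)
 *   -> kmalloc_index(24) selects the kmalloc-32 cache, which only
 *      guarantees KMALLOC_PACKED_ALIGN (8-byte) alignment
 */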