
mm, slab: extend kmalloc() alignment for non power-of-two sizes

Message ID: 20240702155800.166503-2-vbabka@suse.cz
State: New
Series: mm, slab: extend kmalloc() alignment for non power-of-two sizes

Commit Message

Vlastimil Babka July 2, 2024, 3:58 p.m. UTC
Slab allocators have been guaranteeing natural alignment for
power-of-two sizes since commit 59bb47985c1d ("mm, sl[aou]b: guarantee
natural alignment for kmalloc(power-of-two)"), while any other sizes are
aligned only to ARCH_KMALLOC_MINALIGN bytes.

Rust's allocator API specifies size and alignment per allocation, which
have to satisfy the following rules, per Alice Ryhl [1]:

  1. The alignment is a power of two.
  2. The size is non-zero.
  3. When you round up the size to the next multiple of the alignment,
     then it must not overflow the signed type isize / ssize_t.
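
For illustration, rules 1 and 3 are exactly what the standard library's
Layout::from_size_align() constructor enforces; rule 2 is an additional
requirement on the kernel side. A minimal sketch (illustrative only, not
code from the kernel's Rust tree):

    use core::alloc::Layout;

    fn main() {
        // from_size_align() fails unless the alignment is a power of two
        // (rule 1) and the size, rounded up to the next multiple of the
        // alignment, fits in isize (rule 3).
        let layout = Layout::from_size_align(96, 32).expect("invalid layout");
        // Rule 2 (non-zero size) has to be checked separately.
        assert!(layout.size() > 0);
    }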

In order to map this to kmalloc()'s guarantees, some requested
allocation sizes have to be enlarged to the next power-of-two size [2].
For example, an allocation of size 96 with alignment of 32 has to be
enlarged to an allocation of size 128, because the existing kmalloc-96
bucket doesn't guarantee alignment above ARCH_KMALLOC_MINALIGN. However,
without slab debugging active, the layout of kmalloc-96 slabs already
naturally aligns the objects to 32 bytes, so extending the size to 128
bytes is wasteful.

To improve the situation, we can extend the kmalloc() alignment
guarantees in a way that

1) doesn't change the current slab layout (and thus does not increase
   internal fragmentation) when slab debugging is not active
2) reduces waste in the Rust allocator use case
3) is a superset of the current guarantee for power-of-two sizes.

The extended guarantee is that alignment is at least the largest
power-of-two divisor of the requested size. For power-of-two sizes the
largest divisor is the size itself, but let's keep this case documented
separately for clarity.

For the current kmalloc size buckets, this means kmalloc-96 will
guarantee alignment of 32 bytes and kmalloc-192 will guarantee 64 bytes.
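
The guaranteed alignment for any bucket is simply the lowest set bit of
its size, matching the C expression 1U << (ffs(size) - 1) used in the
patch below. A sketch in Rust (illustrative helper, not from the kernel
tree):

    /// Largest power-of-two divisor of a non-zero size:
    /// isolate the lowest set bit.
    fn lpot_divisor(size: usize) -> usize {
        debug_assert!(size != 0);
        size & size.wrapping_neg() // 96 -> 32, 192 -> 64, 128 -> 128
    }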

This covers rules 1 and 2 of Rust's API above, as long as the size is a
multiple of the alignment. The Rust layer should now only need to round
up the size to the next multiple of the alignment if it isn't one
already, while enforcing rule 3.
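
Rounding up to a multiple of the alignment is sufficient because, once
the alignment (a power of two) divides the size, the largest
power-of-two divisor of the size is at least the alignment. A sketch of
the rounding with the rule 3 check (hypothetical helper, not from the
kernel tree):

    /// Pad `size` to the next multiple of `align`, enforcing rule 3.
    /// E.g. padded_size(72, 16) == Some(80), and 80's largest
    /// power-of-two divisor is 16, so the slab guarantee suffices.
    fn padded_size(size: usize, align: usize) -> Option<usize> {
        debug_assert!(align.is_power_of_two());
        size.checked_next_multiple_of(align)
            .filter(|&s| s <= isize::MAX as usize)
    }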

Implementation-wise, this changes the alignment calculation in
create_boot_cache(). While at it, also do the calculation only for
caches with the SLAB_KMALLOC flag, because the function is also used to
create the initial kmem_cache and kmem_cache_node caches, where no such
alignment guarantee is necessary.

Link: https://lore.kernel.org/all/CAH5fLggjrbdUuT-H-5vbQfMazjRDpp2%2Bk3%3DYhPyS17ezEqxwcw@mail.gmail.com/ [1]
Link: https://lore.kernel.org/all/CAH5fLghsZRemYUwVvhk77o6y1foqnCeDzW4WZv6ScEWna2+_jw@mail.gmail.com/ [2]
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 Documentation/core-api/memory-allocation.rst | 6 ++++--
 include/linux/slab.h                         | 3 ++-
 mm/slab_common.c                             | 9 +++++----
 3 files changed, 11 insertions(+), 7 deletions(-)

Comments

Alice Ryhl July 2, 2024, 4:40 p.m. UTC | #1
On Tue, Jul 2, 2024 at 5:58 PM Vlastimil Babka <vbabka@suse.cz> wrote:
> [...]

Thanks!

Since this change is motivated by Rust, why not also include the Rust
change in this patch? You would need to remove the if inside
krealloc_aligned in rust/kernel/alloc/allocator.rs and update the
comments.

Alice
Roman Gushchin July 2, 2024, 7:30 p.m. UTC | #2
On Tue, Jul 02, 2024 at 05:58:01PM +0200, Vlastimil Babka wrote:
> [...]

Hello Vlastimil,

the idea and the implementation make total sense to me.

Do you have an estimate for the memory overhead it will typically introduce?
I don't think it will be too large, though, and it could actually be
compensated by potential performance gains due to better memory alignment.
What do you think?

Thanks!
Vlastimil Babka July 2, 2024, 8:25 p.m. UTC | #3
On 7/2/24 9:30 PM, Roman Gushchin wrote:
> On Tue, Jul 02, 2024 at 05:58:01PM +0200, Vlastimil Babka wrote:
> Hello Vlastimil,
> 
> the idea and the implementation make total sense to me.
> 
> Do you have an estimate for the memory overhead it will typically introduce?

There's no new overhead for the non-debug case, as the layout already
naturally has the alignment that is now guaranteed: e.g. in a kmalloc-96
slab the objects sit at offsets 0, 96, 192, ..., which are all multiples
of 32. Debug has its own overhead, so it's enabled only when needed, and
this will not add much more.

> I don't think it will be too large, though, and it could actually be
> compensated by potential performance gains due to better memory alignment.
> What do you think?

Yeah but again, the difference would be only in the debug case and
performance gains there are not so interesting :)

> Thanks!
Roman Gushchin July 2, 2024, 9:13 p.m. UTC | #4
On Tue, Jul 02, 2024 at 10:25:44PM +0200, Vlastimil Babka wrote:
> On 7/2/24 9:30 PM, Roman Gushchin wrote:
> > On Tue, Jul 02, 2024 at 05:58:01PM +0200, Vlastimil Babka wrote:
> > Hello Vlastimil,
> > 
> > the idea and the implementation make total sense to me.
> > 
> > Do you have an estimate for the memory overhead it will typically introduce?
> 
> There's no new overhead for the non-debug case, as the layout already
> naturally has the alignment that is now guaranteed: e.g. in a kmalloc-96
> slab the objects sit at offsets 0, 96, 192, ..., which are all multiples
> of 32. Debug has its own overhead, so it's enabled only when needed, and
> this will not add much more.

Got it, but can you please add this note, with some explanation of why it's
true, to the commit message? It's not obvious, and the commit message makes
almost the opposite impression:
    Slab allocators have been guaranteeing natural alignment for
    power-of-two sizes <...>, while any other sizes are
    aligned only to ARCH_KMALLOC_MINALIGN bytes.

Should it be "guaranteed to be aligned" vs. "are actually aligned"?

> 
> > I don't think it will be too large, though, and it could actually be
> > compensated by potential performance gains due to better memory alignment.
> > What do you think?
> 
> Yeah but again, the difference would be only in the debug case and
> performance gains there are not so interesting :)

Acked-by: Roman Gushchin <roman.gushchin@linux.dev>

Thanks!
Vlastimil Babka July 2, 2024, 9:18 p.m. UTC | #5
On 7/2/24 6:40 PM, Alice Ryhl wrote:
> On Tue, Jul 2, 2024 at 5:58 PM Vlastimil Babka <vbabka@suse.cz> wrote:
> Thanks!
> 
> Since this change is motivated by Rust, why not also include the Rust
> change in this patch? You would need to remove the if inside
> krealloc_aligned in rust/kernel/alloc/allocator.rs and update the
> comments.

Right, thanks. Does this look ok? (not tested as I don't have the
environment working)

diff --git a/rust/kernel/alloc/allocator.rs b/rust/kernel/alloc/allocator.rs
index 229642960cd1..c619acb8b285 100644
--- a/rust/kernel/alloc/allocator.rs
+++ b/rust/kernel/alloc/allocator.rs
@@ -18,23 +18,16 @@ pub(crate) unsafe fn krealloc_aligned(ptr: *mut u8, new_layout: Layout, flags: F
     // Customized layouts from `Layout::from_size_align()` can have size < align, so pad first.
     let layout = new_layout.pad_to_align();
 
+    // Note that `layout.size()` (after padding) is guaranteed to be a multiple of `layout.align()`
+    // which together with the slab guarantee means the `krealloc` will return a properly aligned
+    // object (see comments in `kmalloc()` for more information).
     let mut size = layout.size();
 
-    if layout.align() > bindings::ARCH_SLAB_MINALIGN {
-        // The alignment requirement exceeds the slab guarantee, thus try to enlarge the size
-        // to use the "power-of-two" size/alignment guarantee (see comments in `kmalloc()` for
-        // more information).
-        //
-        // Note that `layout.size()` (after padding) is guaranteed to be a multiple of
-        // `layout.align()`, so `next_power_of_two` gives enough alignment guarantee.
-        size = size.next_power_of_two();
-    }
-
     // SAFETY:
     // - `ptr` is either null or a pointer returned from a previous `k{re}alloc()` by the
     //   function safety requirement.
     // - `size` is greater than 0 since it's either a `layout.size()` (which cannot be zero
-    //   according to the function safety requirement) or a result from `next_power_of_two()`.
+    //   according to the function safety requirement)
     unsafe { bindings::krealloc(ptr as *const core::ffi::c_void, size, flags.0) as *mut u8 }
 }
Vlastimil Babka July 2, 2024, 9:20 p.m. UTC | #6
On 7/2/24 11:13 PM, Roman Gushchin wrote:
> On Tue, Jul 02, 2024 at 10:25:44PM +0200, Vlastimil Babka wrote:
>> On 7/2/24 9:30 PM, Roman Gushchin wrote:
>> > On Tue, Jul 02, 2024 at 05:58:01PM +0200, Vlastimil Babka wrote:
>> > Hello Vlastimil,
>> > 
>> > the idea and the implementation makes total sense to me.
>> > 
>> > Do you have an estimate for the memory overhead it will typically introduce?
>> 
>> There's no new overhead for the non-debug case, as the layout already
>> naturally has the alignment that is now guaranteed: e.g. in a kmalloc-96
>> slab the objects sit at offsets 0, 96, 192, ..., which are all multiples
>> of 32. Debug has its own overhead, so it's enabled only when needed, and
>> this will not add much more.
> 
> Got it, but can you please add this note, with some explanation of why it's
> true, to the commit message? It's not obvious, and the commit message makes

It's there but later:

1) doesn't change the current slab layout (and thus does not increase
   internal fragmentation) when slab debugging is not active

> almost the opposite impression:
>     Slab allocators have been guaranteeing natural alignment for
>     power-of-two sizes <...>, while any other sizes are
>     aligned only to ARCH_KMALLOC_MINALIGN bytes.
> 
> Should it be "guaranteed to be aligned" vs. "are actually aligned"?

Yes, that's the case. Will update.

>> 
>> > I don't think it will be too large, though, and it could actually be
>> > compensated by potential performance gains due to better memory alignment.
>> > What do you think?
>> 
>> Yeah but again, the difference would be only in the debug case and
>> performance gains there are not so interesting :)
> 
> Acked-by: Roman Gushchin <roman.gushchin@linux.dev>

Thanks!

> Thanks!
Boqun Feng July 2, 2024, 9:41 p.m. UTC | #7
On Tue, Jul 02, 2024 at 11:18:36PM +0200, Vlastimil Babka wrote:
> On 7/2/24 6:40 PM, Alice Ryhl wrote:
> > On Tue, Jul 2, 2024 at 5:58 PM Vlastimil Babka <vbabka@suse.cz> wrote:
> > Thanks!
> > 
> > Since this change is motivated by Rust, why not also include the Rust
> > change in this patch? You would need to remove the if inside
> > krealloc_aligned in rust/kernel/alloc/allocator.rs and update the
> > comments.
> 
> Right, thanks. Does this look ok? (not tested as I don't have the
> environment working)
> 

Thanks! This overall looks good to me.

> diff --git a/rust/kernel/alloc/allocator.rs b/rust/kernel/alloc/allocator.rs
> index 229642960cd1..c619acb8b285 100644
> --- a/rust/kernel/alloc/allocator.rs
> +++ b/rust/kernel/alloc/allocator.rs
> @@ -18,23 +18,16 @@ pub(crate) unsafe fn krealloc_aligned(ptr: *mut u8, new_layout: Layout, flags: F
>      // Customized layouts from `Layout::from_size_align()` can have size < align, so pad first.
>      let layout = new_layout.pad_to_align();
>  
> +    // Note that `layout.size()` (after padding) is guaranteed to be a multiple of `layout.align()`
> +    // which together with the slab guarantee means the `krealloc` will return a properly aligned
> +    // object (see comments in `kmalloc()` for more information).
>      let mut size = layout.size();

The `mut` needs to be removed, since there's no need to modify `size` afterwards.

>  
> -    if layout.align() > bindings::ARCH_SLAB_MINALIGN {
> -        // The alignment requirement exceeds the slab guarantee, thus try to enlarge the size
> -        // to use the "power-of-two" size/alignment guarantee (see comments in `kmalloc()` for
> -        // more information).
> -        //
> -        // Note that `layout.size()` (after padding) is guaranteed to be a multiple of
> -        // `layout.align()`, so `next_power_of_two` gives enough alignment guarantee.
> -        size = size.next_power_of_two();
> -    }
> -
>      // SAFETY:
>      // - `ptr` is either null or a pointer returned from a previous `k{re}alloc()` by the
>      //   function safety requirement.
>      // - `size` is greater than 0 since it's either a `layout.size()` (which cannot be zero
> -    //   according to the function safety requirement) or a result from `next_power_of_two()`.
> +    //   according to the function safety requirement)

The word "either" needs to be removed as well.

With these changes,

Reviewed-by: Boqun Feng <boqun.feng@gmail.com>

Regards,
Boqun

>      unsafe { bindings::krealloc(ptr as *const core::ffi::c_void, size, flags.0) as *mut u8 }
>  }

Patch

diff --git a/Documentation/core-api/memory-allocation.rst b/Documentation/core-api/memory-allocation.rst
index 1c58d883b273..8b84eb4bdae7 100644
--- a/Documentation/core-api/memory-allocation.rst
+++ b/Documentation/core-api/memory-allocation.rst
@@ -144,8 +144,10 @@  configuration, but it is a good practice to use `kmalloc` for objects
 smaller than page size.
 
 The address of a chunk allocated with `kmalloc` is aligned to at least
-ARCH_KMALLOC_MINALIGN bytes.  For sizes which are a power of two, the
-alignment is also guaranteed to be at least the respective size.
+ARCH_KMALLOC_MINALIGN bytes. For sizes which are a power of two, the
+alignment is also guaranteed to be at least the respective size. For other
+sizes, the alignment is guaranteed to be at least the largest power-of-two
+divisor of the size.
 
 Chunks allocated with kmalloc() can be resized with krealloc(). Similarly
 to kmalloc_array(): a helper for resizing arrays is provided in the form of
diff --git a/include/linux/slab.h b/include/linux/slab.h
index ed6bee5ec2b6..640cea6e6323 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -604,7 +604,8 @@  void *__kmalloc_large_node_noprof(size_t size, gfp_t flags, int node)
  *
  * The allocated object address is aligned to at least ARCH_KMALLOC_MINALIGN
  * bytes. For @size of power of two bytes, the alignment is also guaranteed
- * to be at least to the size.
+ * to be at least to the size. For other sizes, the alignment is guaranteed to
+ * be at least the largest power-of-two divisor of @size.
  *
  * The @flags argument may be one of the GFP flags defined at
  * include/linux/gfp_types.h and described at
diff --git a/mm/slab_common.c b/mm/slab_common.c
index 1560a1546bb1..7272ef7bc55f 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -617,11 +617,12 @@  void __init create_boot_cache(struct kmem_cache *s, const char *name,
 	s->size = s->object_size = size;
 
 	/*
-	 * For power of two sizes, guarantee natural alignment for kmalloc
-	 * caches, regardless of SL*B debugging options.
+	 * kmalloc caches guarantee alignment of at least the largest
+	 * power-of-two divisor of the size. For power-of-two sizes,
+	 * it is the size itself.
 	 */
-	if (is_power_of_2(size))
-		align = max(align, size);
+	if (flags & SLAB_KMALLOC)
+		align = max(align, 1U << (ffs(size) - 1));
 	s->align = calculate_alignment(flags, align, size);
 
 #ifdef CONFIG_HARDENED_USERCOPY