
[v4] slob: add size header to all allocations

Message ID 20211122013026.909933-1-rkovhaev@gmail.com (mailing list archive)
State New
Headers show
Series [v4] slob: add size header to all allocations | expand

Commit Message

Rustam Kovhaev Nov. 22, 2021, 1:30 a.m. UTC
Let's prepend both kmalloc() and kmem_cache_alloc() allocations with a
size header.
This simplifies the slab API and guarantees that memory from both
kmem_cache_alloc() and kmalloc() can be freed by kfree().

meminfo right after the system boot, x86-64 on xfs, without the patch:
Slab:              34700 kB

the same, with the patch:
Slab:              35752 kB

Link: https://lore.kernel.org/lkml/20210929212347.1139666-1-rkovhaev@gmail.com
Signed-off-by: Rustam Kovhaev <rkovhaev@gmail.com>
---
v4:
 - Drop the align_offset parameter and pass size unchanged to
   slob_alloc()
 - Add SLOB_HDR_SIZE to size in slob_alloc()

v3:
 - Add SLOB_HDR_SIZE define
 - Remove references to minalign
 - Improve documentation wording

v2:
 - Allocate compound pages in slob_alloc_node()
 - Use slob_free_pages() in kfree()
 - Update documentation
---
 Documentation/core-api/memory-allocation.rst |  12 +-
 mm/slob.c                                    | 140 +++++++++----------
 2 files changed, 72 insertions(+), 80 deletions(-)

Comments

Christoph Lameter (Ampere) Nov. 22, 2021, 9:22 a.m. UTC | #1
On Sun, 21 Nov 2021, Rustam Kovhaev wrote:

> Let's prepend both kmalloc() and kmem_cache_alloc() allocations with the
> size header.
> It simplifies the slab API and guarantees that both kmem_cache_alloc()
> and kmalloc() memory could be freed by kfree().
>
> meminfo right after the system boot, x86-64 on xfs, without the patch:
> Slab:              34700 kB
>
> the same, with the patch:
> Slab:              35752 kB

> +#define SLOB_HDR_SIZE max(ARCH_KMALLOC_MINALIGN, ARCH_SLAB_MINALIGN)

OK, that is up to 128 bytes on some architectures, mostly 32 or 64 bytes.

> @@ -307,6 +303,7 @@ static void *slob_alloc(size_t size, gfp_t gfp, int align, int node,
>  	unsigned long flags;
>  	bool _unused;
>
> +	size += SLOB_HDR_SIZE;

And every object now has this overhead? 128 bytes extra in extreme cases
per object?


> -	if (size < PAGE_SIZE - minalign) {
> -		int align = minalign;
> +	if (size < PAGE_SIZE - SLOB_HDR_SIZE) {
> +		int align = SLOB_HDR_SIZE;

And the object is also aligned to 128-byte boundaries on some
architectures.

So a 4-byte object occupies 256 bytes in SLOB?

SLOB will no longer be a low-memory-overhead allocator then.
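
To make that worst case concrete, here is a minimal back-of-the-envelope sketch (plain userspace C, illustration only; the 128-byte figures are assumed arm64-style minalign values, not something taken from the patch):

#include <stdio.h>

int main(void)
{
	unsigned int hdr    = 128;	/* assumed SLOB_HDR_SIZE = max(minaligns) */
	unsigned int align  = 128;	/* object start aligned to the same value */
	unsigned int object = 4;	/* tiny allocation */

	/* header + object, rounded up to the next alignment boundary */
	unsigned int footprint = (hdr + object + align - 1) / align * align;

	printf("a %u-byte object consumes %u bytes\n", object, footprint);
	return 0;
}

This prints "a 4-byte object consumes 256 bytes" for the values above.
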
Vlastimil Babka Nov. 22, 2021, 9:40 a.m. UTC | #2
On 11/22/21 10:22, Christoph Lameter wrote:
> On Sun, 21 Nov 2021, Rustam Kovhaev wrote:
> 
>> Let's prepend both kmalloc() and kmem_cache_alloc() allocations with the
>> size header.
>> It simplifies the slab API and guarantees that both kmem_cache_alloc()
>> and kmalloc() memory could be freed by kfree().
>>
>> meminfo right after the system boot, x86-64 on xfs, without the patch:
>> Slab:              34700 kB
>>
>> the same, with the patch:
>> Slab:              35752 kB
> 
>> +#define SLOB_HDR_SIZE max(ARCH_KMALLOC_MINALIGN, ARCH_SLAB_MINALIGN)
> 
> Ok that is up to 128 bytes on some architectues. Mostly 32 or 64 bytes.
> 
>> @@ -307,6 +303,7 @@ static void *slob_alloc(size_t size, gfp_t gfp, int align, int node,
>>  	unsigned long flags;
>>  	bool _unused;
>>
>> +	size += SLOB_HDR_SIZE;
> 
> And every object now has this overhead? 128 bytes extra in extreme cases
> per object?
> 
> 
>> -	if (size < PAGE_SIZE - minalign) {
>> -		int align = minalign;
>> +	if (size < PAGE_SIZE - SLOB_HDR_SIZE) {
>> +		int align = SLOB_HDR_SIZE;
> 
> And the object is also aligned to 128 bytes boundaries on some
> architectures.
> 
> So a 4 byte object occupies 256 bytes in SLOB?
> 
> SLOB will no longer be a low memory overhead allocator then.

Hm, good point, I didn't realize those MINALIGN constants can be that large. I
think this overhead was already the case with SLOB for kmalloc caches, but
now it would be worse, as it would apply to all kmem caches.

But it seems there's no reason we couldn't do better? I.e. use the value of
SLOB_HDR_SIZE only to align the beginning of the actual object (and name the
define differently from SLOB_HDR_SIZE). But the size of the header, where we
store the object length, could be just a native word - 4 bytes on 32-bit, 8 on
64-bit. The address of the header shouldn't have a reason to also be aligned
to ARCH_KMALLOC_MINALIGN / ARCH_SLAB_MINALIGN, as only SLOB itself processes
it and not the slab consumers which rely on those alignments?
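
A rough sketch of the layout being suggested here (illustration only, not the patch; the helper names are made up): the object keeps its required alignment, while the size header is just one native word placed directly in front of it.

static void *store_size_header(void *aligned_obj, unsigned long size)
{
	/*
	 * Word-sized header right before the aligned object; the header
	 * itself needs no ARCH_KMALLOC_MINALIGN / ARCH_SLAB_MINALIGN
	 * alignment, since only SLOB ever reads it back.
	 */
	unsigned long *hdr = (unsigned long *)aligned_obj - 1;

	*hdr = size;
	return aligned_obj;
}

static unsigned long read_size_header(const void *aligned_obj)
{
	return *((const unsigned long *)aligned_obj - 1);
}
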
Christoph Lameter (Ampere) Nov. 22, 2021, 10:36 a.m. UTC | #3
On Mon, 22 Nov 2021, Vlastimil Babka wrote:

> But it seems there's no reason we couldn't do better? I.e. use the value of
> SLOB_HDR_SIZE only to align the beginning of actual object (and name the
> define different than SLOB_HDR_SIZE). But the size of the header, where we
> store the object lenght could be just a native word - 4 bytes on 32bit, 8 on
> 64bit. The address of the header shouldn't have a reason to be also aligned
> to ARCH_KMALLOC_MINALIGN / ARCH_SLAB_MINALIGN as only SLOB itself processes
> it and not the slab consumers which rely on those alignments?

Well the best way would be to put it at the end of the object in order to
avoid the alignment problem. This is a particular issue with SLOB because
it allows multiple types of objects in a single page frame.

If only one type of object were allowed, then the object size etc. could
be stored in the page struct.

So I guess placement at the beginning cannot be avoided. That in turn runs
into trouble with the DMA requirements on some platforms where the
beginning of the object has to be cache line aligned.

I don't know, but it seems that making SLOB that sophisticated is
counterproductive. Remove SLOB?
Vlastimil Babka Nov. 22, 2021, 10:45 a.m. UTC | #4
On 11/22/21 11:36, Christoph Lameter wrote:
> On Mon, 22 Nov 2021, Vlastimil Babka wrote:
> 
>> But it seems there's no reason we couldn't do better? I.e. use the value of
>> SLOB_HDR_SIZE only to align the beginning of actual object (and name the
>> define different than SLOB_HDR_SIZE). But the size of the header, where we
>> store the object lenght could be just a native word - 4 bytes on 32bit, 8 on
>> 64bit. The address of the header shouldn't have a reason to be also aligned
>> to ARCH_KMALLOC_MINALIGN / ARCH_SLAB_MINALIGN as only SLOB itself processes
>> it and not the slab consumers which rely on those alignments?
> 
> Well the best way would be to put it at the end of the object in order to
> avoid the alignment problem. This is a particular issue with SLOB because
> it allows multiple types of objects in a single page frame.
> 
> If only one type of object would be allowed then the object size etc can
> be stored in the page struct.
> 
> So I guess placement at the beginning cannot be avoided. That in turn runs
> into trouble with the DMA requirements on some platforms where the
> beginning of the object has to be cache line aligned.

It's no problem to have the real beginning of the object aligned and the
prepended header not. The code already does that before this patch for the
kmalloc power-of-two alignments, where e.g. the object can be aligned to 256
bytes, but the prepended header only to the smaller ARCH_KMALLOC_MINALIGN /
ARCH_SLAB_MINALIGN.

> I dont know but it seems that making slob that sophisticated is counter
> productive. Remove SLOB?

I wouldn't mind, but somebody might :)
Christoph Lameter (Ampere) Nov. 22, 2021, 11:40 a.m. UTC | #5
On Mon, 22 Nov 2021, Vlastimil Babka wrote:

> It's no problem to have the real beginning of the object aligned, and the
> prepended header not. The code already does that before this patch for the
> kmalloc power-of-two alignments, where e.g. the object can be aligned to 256
> bytes, but the prepended header to a smaller ARCH_KMALLOC_MINALIGN /
> ARCH_SLAB_MINALIGN.

Ok but then the first object in a page may still have those issues.

> > I dont know but it seems that making slob that sophisticated is counter
> > productive. Remove SLOB?
>
> I wouldn't mind, but somebody might :)

Well, run a space-efficiency analysis after this patch. If the memory used
is larger than with SLUB (with the configuration for minimal data footprint),
then there is no reason for SLOB to continue.
Vlastimil Babka Nov. 22, 2021, 11:49 a.m. UTC | #6
On 11/22/21 12:40, Christoph Lameter wrote:
> On Mon, 22 Nov 2021, Vlastimil Babka wrote:
> 
>> It's no problem to have the real beginning of the object aligned, and the
>> prepended header not. The code already does that before this patch for the
>> kmalloc power-of-two alignments, where e.g. the object can be aligned to 256
>> bytes, but the prepended header to a smaller ARCH_KMALLOC_MINALIGN /
>> ARCH_SLAB_MINALIGN.
> 
> Ok but then the first object in a page may still have those issues.

Hmm, that's right. I guess we should also distinguish ARCH_KMALLOC_MINALIGN
for kmalloc paths, and ARCH_SLAB_MINALIGN for kmem_cache_alloc paths. Seems
the latter is generally smaller, thus some holes left by kmalloc allocations
could be filled later by kmem_cache_alloc allocations.

> 
>> > I dont know but it seems that making slob that sophisticated is counter
>> > productive. Remove SLOB?
>>
>> I wouldn't mind, but somebody might :)
> 
> Well run a space efficiency analysis after this patch. If the memory used
> is larger than SLUB (with the configuration for minimal data footprint)
> then there is no reason for SLOB to continue.

Makes sense.
David Laight Nov. 23, 2021, 10:18 a.m. UTC | #7
From: Vlastimil Babka
> Sent: 22 November 2021 10:46
> 
> On 11/22/21 11:36, Christoph Lameter wrote:
> > On Mon, 22 Nov 2021, Vlastimil Babka wrote:
> >
> >> But it seems there's no reason we couldn't do better? I.e. use the value of
> >> SLOB_HDR_SIZE only to align the beginning of actual object (and name the
> >> define different than SLOB_HDR_SIZE). But the size of the header, where we
> >> store the object lenght could be just a native word - 4 bytes on 32bit, 8 on
> >> 64bit. The address of the header shouldn't have a reason to be also aligned
> >> to ARCH_KMALLOC_MINALIGN / ARCH_SLAB_MINALIGN as only SLOB itself processes
> >> it and not the slab consumers which rely on those alignments?
> >
> > Well the best way would be to put it at the end of the object in order to
> > avoid the alignment problem. This is a particular issue with SLOB because
> > it allows multiple types of objects in a single page frame.
> >
> > If only one type of object would be allowed then the object size etc can
> > be stored in the page struct.

Or just a single byte that is the index of the associated free-list structure.
For 32-bit, and for the smaller kmalloc() area, it may be reasonable to have
a separate array indexed by the page of the address.

> > So I guess placement at the beginning cannot be avoided. That in turn runs
> > into trouble with the DMA requirements on some platforms where the
> > beginning of the object has to be cache line aligned.
> 
> It's no problem to have the real beginning of the object aligned, and the
> prepended header not.

I'm not sure that helps.
The header can't share a cache line with the previous item (because that
item might be mapped for DMA), so it will always take a full cache line.

There might be some strange scheme where you put the size at the end
and the offset of the 'last end' into the page struct.
The DMA API should let you safely read the size from an allocated
buffer - but you can't modify it.

There is also all the code that allocates 'power of 2' sized buffers
under the assumption they are efficient - as soon as you add a size
field, that assumption just causes the sizes of items to (often) double.
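
A quick userspace illustration of that last point (assumed numbers, not measurements): rounding to the next power of two after adding even a small header doubles the footprint of an already power-of-two-sized request.

#include <stdio.h>

static unsigned long roundup_pow2(unsigned long x)
{
	unsigned long p = 1;

	while (p < x)
		p <<= 1;
	return p;
}

int main(void)
{
	unsigned long request = 4096;	/* a "nice" power-of-two buffer */
	unsigned long header  = 8;	/* even a word-sized size field */

	printf("without header: %lu, with header: %lu\n",
	       roundup_pow2(request), roundup_pow2(request + header));
	return 0;
}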

	David

> The code already does that before this patch for the
> kmalloc power-of-two alignments, where e.g. the object can be aligned to 256
> bytes, but the prepended header to a smaller ARCH_KMALLOC_MINALIGN /
> ARCH_SLAB_MINALIGN.
> 
> > I dont know but it seems that making slob that sophisticated is counter
> > productive. Remove SLOB?
> 
> I wouldn't mind, but somebody might :)
> 

Rustam Kovhaev Nov. 30, 2021, 7 a.m. UTC | #8
On Tue, Nov 23, 2021 at 10:18:27AM +0000, David Laight wrote:
> From: Vlastimil Babka
> > Sent: 22 November 2021 10:46
> > 
> > On 11/22/21 11:36, Christoph Lameter wrote:
> > > On Mon, 22 Nov 2021, Vlastimil Babka wrote:
> > >
> > >> But it seems there's no reason we couldn't do better? I.e. use the value of
> > >> SLOB_HDR_SIZE only to align the beginning of actual object (and name the
> > >> define different than SLOB_HDR_SIZE). But the size of the header, where we
> > >> store the object lenght could be just a native word - 4 bytes on 32bit, 8 on
> > >> 64bit. The address of the header shouldn't have a reason to be also aligned
> > >> to ARCH_KMALLOC_MINALIGN / ARCH_SLAB_MINALIGN as only SLOB itself processes
> > >> it and not the slab consumers which rely on those alignments?
> > >
> > > Well the best way would be to put it at the end of the object in order to
> > > avoid the alignment problem. This is a particular issue with SLOB because
> > > it allows multiple types of objects in a single page frame.
> > >
> > > If only one type of object would be allowed then the object size etc can
> > > be stored in the page struct.
> 
> Or just a single byte that is the index of the associated free list structure.
> For 32bit and for the smaller kmalloc() area it may be reasonable to have
> a separate array indexed by the page of the address.
> 
> > > So I guess placement at the beginning cannot be avoided. That in turn runs
> > > into trouble with the DMA requirements on some platforms where the
> > > beginning of the object has to be cache line aligned.
> > 
> > It's no problem to have the real beginning of the object aligned, and the
> > prepended header not.
> 
> I'm not sure that helps.
> The header can't share a cache line with the previous item (because it
> might be mapped for DMA) so will always take a full cache line.

I thought that the DMA API allocates buffers that are larger than page size.
The DMA pool seems to be able to give out smaller buffers, but underneath it
seems to be calling the page allocator.
The SLOB objects that have this header are all smaller than page size, so
they cannot end up in DMA code paths, or can they?

> 
> There might me some strange scheme where you put the size at the end
> and the offset of the 'last end' into the page struct.
> The DMA API should let you safely read the size from an allocated
> buffer - but you can't modify it.
> 
> There is also all the code that allocates 'power of 2' sized buffers
> under the assumption they are efficient - as soon as you add a size
> field that assumption just causes the sizes of item to (often) double.
> 
> 	David
> 
> > The code already does that before this patch for the
> > kmalloc power-of-two alignments, where e.g. the object can be aligned to 256
> > bytes, but the prepended header to a smaller ARCH_KMALLOC_MINALIGN /
> > ARCH_SLAB_MINALIGN.
> > 
> > > I dont know but it seems that making slob that sophisticated is counter
> > > productive. Remove SLOB?
> > 
> > I wouldn't mind, but somebody might :)
> > 
> 
David Laight Nov. 30, 2021, 9:23 a.m. UTC | #9
From: Rustam Kovhaev
> Sent: 30 November 2021 07:00
> 
> On Tue, Nov 23, 2021 at 10:18:27AM +0000, David Laight wrote:
> > From: Vlastimil Babka
> > > Sent: 22 November 2021 10:46
> > >
> > > On 11/22/21 11:36, Christoph Lameter wrote:
> > > > On Mon, 22 Nov 2021, Vlastimil Babka wrote:
> > > >
> > > >> But it seems there's no reason we couldn't do better? I.e. use the value of
> > > >> SLOB_HDR_SIZE only to align the beginning of actual object (and name the
> > > >> define different than SLOB_HDR_SIZE). But the size of the header, where we
> > > >> store the object lenght could be just a native word - 4 bytes on 32bit, 8 on
> > > >> 64bit. The address of the header shouldn't have a reason to be also aligned
> > > >> to ARCH_KMALLOC_MINALIGN / ARCH_SLAB_MINALIGN as only SLOB itself processes
> > > >> it and not the slab consumers which rely on those alignments?
> > > >
> > > > Well the best way would be to put it at the end of the object in order to
> > > > avoid the alignment problem. This is a particular issue with SLOB because
> > > > it allows multiple types of objects in a single page frame.
> > > >
...
> >
> > > > So I guess placement at the beginning cannot be avoided. That in turn runs
> > > > into trouble with the DMA requirements on some platforms where the
> > > > beginning of the object has to be cache line aligned.
> > >
> > > It's no problem to have the real beginning of the object aligned, and the
> > > prepended header not.
> >
> > I'm not sure that helps.
> > The header can't share a cache line with the previous item (because it
> > might be mapped for DMA) so will always take a full cache line.
> 
> I thought that DMA API allocates buffers that are larger than page size.
> DMA pool seems to be able to give out smaller buffers, but underneath it
> seems to be calling page allocator.
> The SLOB objects that have this header are all less than page size, and
> they cannot end up in DMA code paths, or can they?

The problem isn't dma_alloc_coherent(); it is when memory allocated
elsewhere is used for DMA.
On systems with non-coherent DMA accesses, the data cache has to be
flushed before all DMA transfers and invalidated after read DMA transfers.
The CPU must not dirty any of the cache lines associated with a read DMA.

This is on top of any requirements for the alignment of the returned address.

	David

Christoph Lameter (Ampere) Nov. 30, 2021, 9:41 a.m. UTC | #10
On Mon, 29 Nov 2021, Rustam Kovhaev wrote:

> > I'm not sure that helps.
> > The header can't share a cache line with the previous item (because it
> > might be mapped for DMA) so will always take a full cache line.
>
> I thought that DMA API allocates buffers that are larger than page size.
> DMA pool seems to be able to give out smaller buffers, but underneath it
> seems to be calling page allocator.
> The SLOB objects that have this header are all less than page size, and
> they cannot end up in DMA code paths, or can they?

kmalloc slab allocations must return DMA-able memory. If the underlying
hardware can only DMA to a cache-line boundary, then all objects must be
aligned that way.
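
For readers less familiar with the constraint, a minimal sketch of the streaming-DMA pattern being discussed (it assumes a struct device *dev from some driver context; not part of the patch): on a non-coherent system the CPU must not dirty any cache line overlapping the mapped buffer, which is why kmalloc() memory has to respect ARCH_DMA_MINALIGN.

#include <linux/dma-mapping.h>
#include <linux/errno.h>
#include <linux/slab.h>

static int rx_one_buffer(struct device *dev, size_t len)
{
	void *buf = kmalloc(len, GFP_KERNEL);	/* must be DMA-safe */
	dma_addr_t handle;

	if (!buf)
		return -ENOMEM;

	handle = dma_map_single(dev, buf, len, DMA_FROM_DEVICE);
	if (dma_mapping_error(dev, handle)) {
		kfree(buf);
		return -ENOMEM;
	}

	/*
	 * The device writes into buf here.  If another object shared one of
	 * these cache lines and the CPU wrote to it now, the dirty line
	 * could be written back over the freshly DMA'd data.
	 */

	dma_unmap_single(dev, handle, len, DMA_FROM_DEVICE);
	kfree(buf);
	return 0;
}
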
Vlastimil Babka Nov. 30, 2021, 2:55 p.m. UTC | #11
On 11/23/21 11:18, David Laight wrote:
> From: Vlastimil Babka
>> Sent: 22 November 2021 10:46
>> 
>> On 11/22/21 11:36, Christoph Lameter wrote:
>> > On Mon, 22 Nov 2021, Vlastimil Babka wrote:
>> >
>> >> But it seems there's no reason we couldn't do better? I.e. use the value of
>> >> SLOB_HDR_SIZE only to align the beginning of actual object (and name the
>> >> define different than SLOB_HDR_SIZE). But the size of the header, where we
>> >> store the object lenght could be just a native word - 4 bytes on 32bit, 8 on
>> >> 64bit. The address of the header shouldn't have a reason to be also aligned
>> >> to ARCH_KMALLOC_MINALIGN / ARCH_SLAB_MINALIGN as only SLOB itself processes
>> >> it and not the slab consumers which rely on those alignments?
>> >
>> > Well the best way would be to put it at the end of the object in order to
>> > avoid the alignment problem. This is a particular issue with SLOB because
>> > it allows multiple types of objects in a single page frame.
>> >
>> > If only one type of object would be allowed then the object size etc can
>> > be stored in the page struct.
> 
> Or just a single byte that is the index of the associated free list structure.
> For 32bit and for the smaller kmalloc() area it may be reasonable to have
> a separate array indexed by the page of the address.
> 
>> > So I guess placement at the beginning cannot be avoided. That in turn runs
>> > into trouble with the DMA requirements on some platforms where the
>> > beginning of the object has to be cache line aligned.
>> 
>> It's no problem to have the real beginning of the object aligned, and the
>> prepended header not.
> 
> I'm not sure that helps.
> The header can't share a cache line with the previous item (because it
> might be mapped for DMA) so will always take a full cache line.

So if this is true, then I think we already have a problem with SLOB today
(and AFAICS it's not even due to changes done by my 2019 commit 59bb47985c1d
("mm, sl[aou]b: guarantee natural alignment for kmalloc(power-of-two)" but
older).

Let's say we are on arm64 where (AFAICS):
ARCH_KMALLOC_MINALIGN = ARCH_DMA_MINALIGN = 128
ARCH_SLAB_MINALIGN = 64

The point is that ARCH_SLAB_MINALIGN is smaller than ARCH_DMA_MINALIGN.

Let's say we call kmalloc(64) and get a completely fresh page.
In SLOB, alloc(), or rather __do_kmalloc_node(), will calculate minalign as
max(ARCH_KMALLOC_MINALIGN, ARCH_SLAB_MINALIGN), thus 128.
It will call slob_alloc() with a size of size+minalign=64+128=192, and with
align and align_offset = 128.
Thus the allocation will use 128 bytes for the header, 64 for the object.
Both the header and object aligned to 128 bytes.
But the remaining 64 bytes of the second 128 bytes will remain free, as the
allocated size is 192 bytes:

| 128B header, aligned | 64B object | 64B free | rest also free |

If there's another kmalloc allocation, the 128-byte alignment due to
ARCH_KMALLOC_MINALIGN will prevent it from using these 64 bytes, so that's
fine. But if there's a kmem_cache_alloc() from a cache serving <=64B
objects, it will be aligned to ARCH_SLAB_MINALIGN and happily use those 64
bytes that share the 128-byte block where the previous kmalloc allocation lies.

So either I missed something, or we violate the rule that kmalloc() provides
blocks where ARCH_KMALLOC_MINALIGN is not just the alignment of their
beginning, but also a guarantee that nothing else touches the
N*ARCH_KMALLOC_MINALIGN area containing the allocated object.

> There might me some strange scheme where you put the size at the end
> and the offset of the 'last end' into the page struct.
> The DMA API should let you safely read the size from an allocated
> buffer - but you can't modify it.
> 
> There is also all the code that allocates 'power of 2' sized buffers
> under the assumption they are efficient - as soon as you add a size
> field that assumption just causes the sizes of item to (often) double.
> 
> 	David
> 
>> The code already does that before this patch for the
>> kmalloc power-of-two alignments, where e.g. the object can be aligned to 256
>> bytes, but the prepended header to a smaller ARCH_KMALLOC_MINALIGN /
>> ARCH_SLAB_MINALIGN.
>> 
>> > I dont know but it seems that making slob that sophisticated is counter
>> > productive. Remove SLOB?
>> 
>> I wouldn't mind, but somebody might :)
>> 
> 
>
David Laight Nov. 30, 2021, 3:21 p.m. UTC | #12
From: Vlastimil Babka
> Sent: 30 November 2021 14:56
> 
> On 11/23/21 11:18, David Laight wrote:
> > From: Vlastimil Babka
> >> Sent: 22 November 2021 10:46
> >>
> >> On 11/22/21 11:36, Christoph Lameter wrote:
> >> > On Mon, 22 Nov 2021, Vlastimil Babka wrote:
> >> >
> >> >> But it seems there's no reason we couldn't do better? I.e. use the value of
> >> >> SLOB_HDR_SIZE only to align the beginning of actual object (and name the
> >> >> define different than SLOB_HDR_SIZE). But the size of the header, where we
> >> >> store the object lenght could be just a native word - 4 bytes on 32bit, 8 on
> >> >> 64bit. The address of the header shouldn't have a reason to be also aligned
> >> >> to ARCH_KMALLOC_MINALIGN / ARCH_SLAB_MINALIGN as only SLOB itself processes
> >> >> it and not the slab consumers which rely on those alignments?
> >> >
> >> > Well the best way would be to put it at the end of the object in order to
> >> > avoid the alignment problem. This is a particular issue with SLOB because
> >> > it allows multiple types of objects in a single page frame.
> >> >
> >> > If only one type of object would be allowed then the object size etc can
> >> > be stored in the page struct.
> >
> > Or just a single byte that is the index of the associated free list structure.
> > For 32bit and for the smaller kmalloc() area it may be reasonable to have
> > a separate array indexed by the page of the address.
> >
> >> > So I guess placement at the beginning cannot be avoided. That in turn runs
> >> > into trouble with the DMA requirements on some platforms where the
> >> > beginning of the object has to be cache line aligned.
> >>
> >> It's no problem to have the real beginning of the object aligned, and the
> >> prepended header not.
> >
> > I'm not sure that helps.
> > The header can't share a cache line with the previous item (because it
> > might be mapped for DMA) so will always take a full cache line.
> 
> So if this is true, then I think we already have a problem with SLOB today
> (and AFAICS it's not even due to changes done by my 2019 commit 59bb47985c1d
> ("mm, sl[aou]b: guarantee natural alignment for kmalloc(power-of-two)" but
> older).
> 
> Let's say we are on arm64 where (AFAICS):
> ARCH_KMALLOC_MINALIGN = ARCH_DMA_MINALIGN = 128
> ARCH_SLAB_MINALIGN = 64

Is that valid?
Isn't SLAB being used to implement kmalloc() so the architecture
defined alignment must apply?

> The point is that ARCH_SLAB_MINALIGN is smaller than ARCH_DMA_MINALIGN.
> 
> Let's say we call kmalloc(64) and get a completely fresh page.
> In SLOB, alloc() or rather __do_kmalloc_node() will calculate minalign to
> max(ARCH_KMALLOC_MINALIGN, ARCH_SLAB_MINALIGN) thus 128.
> It will call slob_alloc() for size of size+minalign=64+128=192, align and
> align_offset = 128
> Thus the allocation will use 128 bytes for the header, 64 for the object.
> Both the header and object aligned to 128 bytes.
> But the remaining 64 bytes of the second 128 bytes will remain free, as the
> allocated size is 192 bytes:
> 
> | 128B header, aligned | 64B object | 64B free | rest also free |

That is horribly wasteful on memory :-)

> If there's another kmalloc allocation, the 128 bytes aligment due to
> ARCH_KMALLOC_MINALIGN will avoid it from using these 64 bytes, so that's
> fine. But if there's a kmem_cache_alloc() from a cache serving <=64B
> objects, it will be aligned to ARCH_SLAB_MINALIGN and happily use those 64
> bytes that share the 128 block where the previous kmalloc allocation lies.

If the memory returned by kmem_cache_alloc() can be used for DMA then
ARCH_DMA_MINALIGN has to apply to the returned buffers.
So, maybe, that cache can't exist?

I'd expect that ARCH_DMA_MINALIGN forces allocations to be a multiple
of that size.
More particularly the rest of the area can't be allocated to anything else.
So it ought to be valid to return the 2nd half of a 128 byte cache line
provided the first half isn't written while the allocation is active.

But that ARCH_KMALLOC_MINALIGN only applies to 'large' items?
Small items only need aligning to the power of 2 below their size.
So 8-byte items only need 8-byte alignment even though a larger
item might need (say) 64-byte alignment.

	David

Christoph Lameter (Ampere) Nov. 30, 2021, 3:26 p.m. UTC | #13
On Tue, 30 Nov 2021, Vlastimil Babka wrote:

> So either I missed something or we violate the rule that kmalloc() provides
> blocks where ARCH_KMALLOC_MINALIGN is not just the alignment of their
> beginning but also nothing else touches the N*ARCH_KMALLOC_MINALIGN area
> containing the allocated object.

Indeed.... The DMA API documentation in the kernel states:

Documentation/DMA-API-HOWTO.rst

2) ARCH_KMALLOC_MINALIGN

+   Architectures must ensure that kmalloc'ed buffer is
+   DMA-safe. Drivers and subsystems depend on it. If an architecture
+   isn't fully DMA-coherent (i.e. hardware doesn't ensure that data in
+   the CPU cache is identical to data in main memory),
+   ARCH_KMALLOC_MINALIGN must be set so that the memory allocator
+   makes sure that kmalloc'ed buffer doesn't share a cache line with
+   the others. See arch/arm/include/asm/cache.h as an example.


Note that this is only the case for kmalloc, not for a slab cache set up
separately from the kmalloc array. That is why ARCH_KMALLOC_MINALIGN
exists.
Vlastimil Babka Nov. 30, 2021, 3:39 p.m. UTC | #14
On 11/30/21 16:21, David Laight wrote:
> From: Vlastimil Babka
>> Sent: 30 November 2021 14:56
>> 
>> On 11/23/21 11:18, David Laight wrote:
>> > From: Vlastimil Babka
>> >
>> > Or just a single byte that is the index of the associated free list structure.
>> > For 32bit and for the smaller kmalloc() area it may be reasonable to have
>> > a separate array indexed by the page of the address.
>> >
>> >> > So I guess placement at the beginning cannot be avoided. That in turn runs
>> >> > into trouble with the DMA requirements on some platforms where the
>> >> > beginning of the object has to be cache line aligned.
>> >>
>> >> It's no problem to have the real beginning of the object aligned, and the
>> >> prepended header not.
>> >
>> > I'm not sure that helps.
>> > The header can't share a cache line with the previous item (because it
>> > might be mapped for DMA) so will always take a full cache line.
>> 
>> So if this is true, then I think we already have a problem with SLOB today
>> (and AFAICS it's not even due to changes done by my 2019 commit 59bb47985c1d
>> ("mm, sl[aou]b: guarantee natural alignment for kmalloc(power-of-two)" but
>> older).
>> 
>> Let's say we are on arm64 where (AFAICS):
>> ARCH_KMALLOC_MINALIGN = ARCH_DMA_MINALIGN = 128
>> ARCH_SLAB_MINALIGN = 64
> 
> Is that valid?
> Isn't SLAB being used to implement kmalloc() so the architecture
> defined alignment must apply?

SLAB is used to implement kmalloc(), yes, but that's an implementation
detail. I assume that we provide these DMA guarantees to all kmalloc() users,
as we don't know which will use it for DMA and which won't. But if somebody
creates their own specific SLAB cache, they have to decide explicitly whether
they are going to use DMA with those objects, and request such alignment if so.
If not, we can use a smaller alignment that's only required by e.g. the CPU.
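
A small sketch of that explicit opt-in (the cache name and struct are made up for illustration): a cache whose objects may be handed to DMA requests cache-line alignment itself when it is created.

#include <linux/cache.h>
#include <linux/errno.h>
#include <linux/slab.h>
#include <linux/types.h>

struct rx_desc {
	u32 len;
	u8  data[60];
};

static struct kmem_cache *rx_cache;

static int rx_cache_init(void)
{
	/* align = cache_line_size(), so a DMA'd object never shares a line */
	rx_cache = kmem_cache_create("rx_desc_cache", sizeof(struct rx_desc),
				     cache_line_size(), 0, NULL);
	return rx_cache ? 0 : -ENOMEM;
}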

>> The point is that ARCH_SLAB_MINALIGN is smaller than ARCH_DMA_MINALIGN.
>> 
>> Let's say we call kmalloc(64) and get a completely fresh page.
>> In SLOB, alloc() or rather __do_kmalloc_node() will calculate minalign to
>> max(ARCH_KMALLOC_MINALIGN, ARCH_SLAB_MINALIGN) thus 128.
>> It will call slob_alloc() for size of size+minalign=64+128=192, align and
>> align_offset = 128
>> Thus the allocation will use 128 bytes for the header, 64 for the object.
>> Both the header and object aligned to 128 bytes.
>> But the remaining 64 bytes of the second 128 bytes will remain free, as the
>> allocated size is 192 bytes:
>> 
>> | 128B header, aligned | 64B object | 64B free | rest also free |
> 
> That is horribly wasteful on memory :-)

Yes. I don't know how this historically came to be for SLOB, which was
supposed to minimize memory usage (at the expense of CPU efficiency). But
the point raised in this thread is that if we extend this to all
kmem_cache_alloc() allocations, to make them possible to free with kfree(),
we'll make it a lot worse :/

>> If there's another kmalloc allocation, the 128 bytes aligment due to
>> ARCH_KMALLOC_MINALIGN will avoid it from using these 64 bytes, so that's
>> fine. But if there's a kmem_cache_alloc() from a cache serving <=64B
>> objects, it will be aligned to ARCH_SLAB_MINALIGN and happily use those 64
>> bytes that share the 128 block where the previous kmalloc allocation lies.
> 
> If the memory returned by kmem_cache_alloc() can be used for DMA then
> ARCH_DMA_MINALIGN has to apply to the returned buffers.
> So, maybe, that cache can't exist?

See above, I assume that cache cannot be used for DMA. But if we are trying
to protect the kmalloc(64) DMA guarantees, then that shouldn't depend on the
guarantees of the second, unrelated allocation?

> I'd expect that ARCH_DMA_MINALIGN forces allocations to be a multiple
> of that size.

Doesn't seem so. That would indeed fix the problem, assuming it really is a
problem (yet it seems nobody has reported it occurring in practice).

> More particularly the rest of the area can't be allocated to anything else.
> So it ought to be valid to return the 2nd half of a 128 byte cache line
> provided the first half isn't written while the allocation is active.

As the allocator cannot know when the first half will be used for DMA by
whoever allocated it, we can only assume it can happen at any time, and not
return the 2nd half, ever?

> But that ARCH_KMALLOC_MINALIGN only applies to 'large' items?
> Small items only need aligning to the power of 2 below their size.
> So 8 bytes items only need 8 byte alignment even though a larger
> item might need (say) 64 byte alignment.

But if we never defined such a threshold (which would probably have to be
arch-independent), then we can't start making such assumptions today, as we
don't know which kmalloc() users expect DMA and which don't. It would have to
be a flag or something. And yeah, there is already a __GFP_DMA flag, but it
means something a bit different...

> 
> 	David
> 
>

Patch

diff --git a/Documentation/core-api/memory-allocation.rst b/Documentation/core-api/memory-allocation.rst
index 5954ddf6ee13..31806d5ebeec 100644
--- a/Documentation/core-api/memory-allocation.rst
+++ b/Documentation/core-api/memory-allocation.rst
@@ -170,7 +170,11 @@  should be used if a part of the cache might be copied to the userspace.
 After the cache is created kmem_cache_alloc() and its convenience
 wrappers can allocate memory from that cache.
 
-When the allocated memory is no longer needed it must be freed. You can
-use kvfree() for the memory allocated with `kmalloc`, `vmalloc` and
-`kvmalloc`. The slab caches should be freed with kmem_cache_free(). And
-don't forget to destroy the cache with kmem_cache_destroy().
+When the allocated memory is no longer needed it must be freed.
+Objects allocated by `kmalloc` can be freed with `kfree` or `kvfree`.
+Objects allocated by `kmem_cache_alloc` can be freed with `kmem_cache_free`
+or also by `kfree` or `kvfree`.
+Memory allocated by `vmalloc` can be freed with `vfree` or `kvfree`.
+Memory allocated by `kvmalloc` can be freed with `kvfree`.
+Caches created by `kmem_cache_create` should be freed with
+`kmem_cache_destroy`.
diff --git a/mm/slob.c b/mm/slob.c
index 03deee1e6a94..5cc49102e070 100644
--- a/mm/slob.c
+++ b/mm/slob.c
@@ -25,23 +25,18 @@ 
  * into the free list in address order, so this is effectively an
  * address-ordered first fit.
  *
- * Above this is an implementation of kmalloc/kfree. Blocks returned
- * from kmalloc are prepended with a 4-byte header with the kmalloc size.
- * If kmalloc is asked for objects of PAGE_SIZE or larger, it calls
- * alloc_pages() directly, allocating compound pages so the page order
- * does not have to be separately tracked.
- * These objects are detected in kfree() because PageSlab()
- * is false for them.
+ * Blocks that are less than (PAGE_SIZE - SLOB_HDR_SIZE) are prepended with
+ * a 4-byte header with the size. Larger blocks do not have the header and
+ * SLOB calls alloc_pages() directly, allocating compound pages so the
+ * page order does not have to be separately tracked. These objects are
+ * detected in kfree() because PageSlab() is false for them.
  *
  * SLAB is emulated on top of SLOB by simply calling constructors and
  * destructors for every SLAB allocation. Objects are returned with the
  * 4-byte alignment unless the SLAB_HWCACHE_ALIGN flag is set, in which
  * case the low-level allocator will fragment blocks to create the proper
- * alignment. Again, objects of page-size or greater are allocated by
- * calling alloc_pages(). As SLAB objects know their size, no separate
- * size bookkeeping is necessary and there is essentially no allocation
- * space overhead, and compound pages aren't needed for multi-page
- * allocations.
+ * alignment. Again, objects of (PAGE_SIZE - SLOB_HDR_SIZE) or greater are
+ * allocated by calling alloc_pages().
  *
  * NUMA support in SLOB is fairly simplistic, pushing most of the real
  * logic down to the page allocator, and simply doing the node accounting
@@ -88,6 +83,8 @@  typedef s16 slobidx_t;
 typedef s32 slobidx_t;
 #endif
 
+#define SLOB_HDR_SIZE max(ARCH_KMALLOC_MINALIGN, ARCH_SLAB_MINALIGN)
+
 struct slob_block {
 	slobidx_t units;
 };
@@ -207,12 +204,14 @@  static void *slob_new_pages(gfp_t gfp, int order, int node)
 	return page_address(page);
 }
 
-static void slob_free_pages(void *b, int order)
+static void slob_free_pages(struct page *sp, int order)
 {
-	struct page *sp = virt_to_page(b);
-
-	if (current->reclaim_state)
-		current->reclaim_state->reclaimed_slab += 1 << order;
+	if (PageSlab(sp)) {
+		__ClearPageSlab(sp);
+		page_mapcount_reset(sp);
+		if (current->reclaim_state)
+			current->reclaim_state->reclaimed_slab += 1 << order;
+	}
 
 	mod_node_page_state(page_pgdat(sp), NR_SLAB_UNRECLAIMABLE_B,
 			    -(PAGE_SIZE << order));
@@ -247,9 +246,7 @@  static void *slob_page_alloc(struct page *sp, size_t size, int align,
 		/*
 		 * 'aligned' will hold the address of the slob block so that the
 		 * address 'aligned'+'align_offset' is aligned according to the
-		 * 'align' parameter. This is for kmalloc() which prepends the
-		 * allocated block with its size, so that the block itself is
-		 * aligned when needed.
+		 * 'align' parameter.
 		 */
 		if (align) {
 			aligned = (slob_t *)
@@ -298,8 +295,7 @@  static void *slob_page_alloc(struct page *sp, size_t size, int align,
 /*
  * slob_alloc: entry point into the slob allocator.
  */
-static void *slob_alloc(size_t size, gfp_t gfp, int align, int node,
-							int align_offset)
+static void *slob_alloc(size_t size, gfp_t gfp, int align, int node)
 {
 	struct page *sp;
 	struct list_head *slob_list;
@@ -307,6 +303,7 @@  static void *slob_alloc(size_t size, gfp_t gfp, int align, int node,
 	unsigned long flags;
 	bool _unused;
 
+	size += SLOB_HDR_SIZE;
 	if (size < SLOB_BREAK1)
 		slob_list = &free_slob_small;
 	else if (size < SLOB_BREAK2)
@@ -330,7 +327,7 @@  static void *slob_alloc(size_t size, gfp_t gfp, int align, int node,
 		if (sp->units < SLOB_UNITS(size))
 			continue;
 
-		b = slob_page_alloc(sp, size, align, align_offset, &page_removed_from_list);
+		b = slob_page_alloc(sp, size, align, SLOB_HDR_SIZE, &page_removed_from_list);
 		if (!b)
 			continue;
 
@@ -367,31 +364,32 @@  static void *slob_alloc(size_t size, gfp_t gfp, int align, int node,
 		INIT_LIST_HEAD(&sp->slab_list);
 		set_slob(b, SLOB_UNITS(PAGE_SIZE), b + SLOB_UNITS(PAGE_SIZE));
 		set_slob_page_free(sp, slob_list);
-		b = slob_page_alloc(sp, size, align, align_offset, &_unused);
+		b = slob_page_alloc(sp, size, align, SLOB_HDR_SIZE, &_unused);
 		BUG_ON(!b);
 		spin_unlock_irqrestore(&slob_lock, flags);
 	}
 	if (unlikely(gfp & __GFP_ZERO))
 		memset(b, 0, size);
-	return b;
+	/* Write size in the header */
+	*(unsigned int *)b = size - SLOB_HDR_SIZE;
+	return (void *)b + SLOB_HDR_SIZE;
 }
 
 /*
  * slob_free: entry point into the slob allocator.
  */
-static void slob_free(void *block, int size)
+static void slob_free(void *block)
 {
 	struct page *sp;
-	slob_t *prev, *next, *b = (slob_t *)block;
+	slob_t *prev, *next, *b = block - SLOB_HDR_SIZE;
+	unsigned int size;
 	slobidx_t units;
 	unsigned long flags;
 	struct list_head *slob_list;
 
-	if (unlikely(ZERO_OR_NULL_PTR(block)))
-		return;
-	BUG_ON(!size);
-
-	sp = virt_to_page(block);
+	size = *(unsigned int *)b + SLOB_HDR_SIZE;
+	BUG_ON(!size || size >= PAGE_SIZE);
+	sp = virt_to_page(b);
 	units = SLOB_UNITS(size);
 
 	spin_lock_irqsave(&slob_lock, flags);
@@ -401,9 +399,7 @@  static void slob_free(void *block, int size)
 		if (slob_page_free(sp))
 			clear_slob_page_free(sp);
 		spin_unlock_irqrestore(&slob_lock, flags);
-		__ClearPageSlab(sp);
-		page_mapcount_reset(sp);
-		slob_free_pages(b, 0);
+		slob_free_pages(sp, 0);
 		return;
 	}
 
@@ -476,36 +472,29 @@  void kmem_obj_info(struct kmem_obj_info *kpp, void *object, struct page *page)
 static __always_inline void *
 __do_kmalloc_node(size_t size, gfp_t gfp, int node, unsigned long caller)
 {
-	unsigned int *m;
-	int minalign = max_t(size_t, ARCH_KMALLOC_MINALIGN, ARCH_SLAB_MINALIGN);
 	void *ret;
 
 	gfp &= gfp_allowed_mask;
 
 	might_alloc(gfp);
 
-	if (size < PAGE_SIZE - minalign) {
-		int align = minalign;
+	if (size < PAGE_SIZE - SLOB_HDR_SIZE) {
+		int align = SLOB_HDR_SIZE;
 
 		/*
 		 * For power of two sizes, guarantee natural alignment for
 		 * kmalloc()'d objects.
 		 */
 		if (is_power_of_2(size))
-			align = max(minalign, (int) size);
+			align = max(align, (int) size);
 
 		if (!size)
 			return ZERO_SIZE_PTR;
 
-		m = slob_alloc(size + minalign, gfp, align, node, minalign);
-
-		if (!m)
-			return NULL;
-		*m = size;
-		ret = (void *)m + minalign;
+		ret = slob_alloc(size, gfp, align, node);
 
 		trace_kmalloc_node(caller, ret,
-				   size, size + minalign, gfp, node);
+				   size, size + SLOB_HDR_SIZE, gfp, node);
 	} else {
 		unsigned int order = get_order(size);
 
@@ -553,26 +542,17 @@  void kfree(const void *block)
 	kmemleak_free(block);
 
 	sp = virt_to_page(block);
-	if (PageSlab(sp)) {
-		int align = max_t(size_t, ARCH_KMALLOC_MINALIGN, ARCH_SLAB_MINALIGN);
-		unsigned int *m = (unsigned int *)(block - align);
-		slob_free(m, *m + align);
-	} else {
-		unsigned int order = compound_order(sp);
-		mod_node_page_state(page_pgdat(sp), NR_SLAB_UNRECLAIMABLE_B,
-				    -(PAGE_SIZE << order));
-		__free_pages(sp, order);
-
-	}
+	if (PageSlab(sp))
+		slob_free((void *)block);
+	else
+		slob_free_pages(sp, compound_order(sp));
 }
 EXPORT_SYMBOL(kfree);
 
-/* can't use ksize for kmem_cache_alloc memory, only kmalloc */
 size_t __ksize(const void *block)
 {
 	struct page *sp;
-	int align;
-	unsigned int *m;
+	unsigned int size;
 
 	BUG_ON(!block);
 	if (unlikely(block == ZERO_SIZE_PTR))
@@ -582,9 +562,8 @@  size_t __ksize(const void *block)
 	if (unlikely(!PageSlab(sp)))
 		return page_size(sp);
 
-	align = max_t(size_t, ARCH_KMALLOC_MINALIGN, ARCH_SLAB_MINALIGN);
-	m = (unsigned int *)(block - align);
-	return SLOB_UNITS(*m) * SLOB_UNIT;
+	size = *(unsigned int *)(block - SLOB_HDR_SIZE);
+	return SLOB_UNITS(size) * SLOB_UNIT;
 }
 EXPORT_SYMBOL(__ksize);
 
@@ -606,16 +585,19 @@  static void *slob_alloc_node(struct kmem_cache *c, gfp_t flags, int node)
 
 	might_alloc(flags);
 
-	if (c->size < PAGE_SIZE) {
-		b = slob_alloc(c->size, flags, c->align, node, 0);
+	if (c->size < PAGE_SIZE - SLOB_HDR_SIZE) {
+		b = slob_alloc(c->size, flags, c->align, node);
 		trace_kmem_cache_alloc_node(_RET_IP_, b, c->object_size,
-					    SLOB_UNITS(c->size) * SLOB_UNIT,
+					    SLOB_UNITS(c->size + SLOB_HDR_SIZE) * SLOB_UNIT,
 					    flags, node);
 	} else {
-		b = slob_new_pages(flags, get_order(c->size), node);
+		unsigned int order = get_order(c->size);
+
+		if (likely(order))
+			flags |= __GFP_COMP;
+		b = slob_new_pages(flags, order, node);
 		trace_kmem_cache_alloc_node(_RET_IP_, b, c->object_size,
-					    PAGE_SIZE << get_order(c->size),
-					    flags, node);
+					    PAGE_SIZE << order, flags, node);
 	}
 
 	if (b && c->ctor) {
@@ -647,12 +629,18 @@  void *kmem_cache_alloc_node(struct kmem_cache *cachep, gfp_t gfp, int node)
 EXPORT_SYMBOL(kmem_cache_alloc_node);
 #endif
 
-static void __kmem_cache_free(void *b, int size)
+static void __kmem_cache_free(void *b)
 {
-	if (size < PAGE_SIZE)
-		slob_free(b, size);
+	struct page *sp;
+
+	if (unlikely(ZERO_OR_NULL_PTR(b)))
+		return;
+
+	sp = virt_to_page(b);
+	if (PageSlab(sp))
+		slob_free(b);
 	else
-		slob_free_pages(b, get_order(size));
+		slob_free_pages(sp, compound_order(sp));
 }
 
 static void kmem_rcu_free(struct rcu_head *head)
@@ -660,7 +648,7 @@  static void kmem_rcu_free(struct rcu_head *head)
 	struct slob_rcu *slob_rcu = (struct slob_rcu *)head;
 	void *b = (void *)slob_rcu - (slob_rcu->size - sizeof(struct slob_rcu));
 
-	__kmem_cache_free(b, slob_rcu->size);
+	__kmem_cache_free(b);
 }
 
 void kmem_cache_free(struct kmem_cache *c, void *b)
@@ -673,7 +661,7 @@  void kmem_cache_free(struct kmem_cache *c, void *b)
 		slob_rcu->size = c->size;
 		call_rcu(&slob_rcu->head, kmem_rcu_free);
 	} else {
-		__kmem_cache_free(b, c->size);
+		__kmem_cache_free(b);
 	}
 }
 EXPORT_SYMBOL(kmem_cache_free);
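
For completeness, a minimal usage sketch of the guarantee the patch adds to the documentation (the cache and struct names are made up): with the size header prepended to every allocation, SLOB can now free kmem_cache_alloc() memory through kfree() as well.

#include <linux/errno.h>
#include <linux/slab.h>

struct foo {
	int a, b;
};

static int demo(void)
{
	struct kmem_cache *foo_cache;
	struct foo *obj;

	foo_cache = kmem_cache_create("foo_cache", sizeof(struct foo), 0, 0, NULL);
	if (!foo_cache)
		return -ENOMEM;

	obj = kmem_cache_alloc(foo_cache, GFP_KERNEL);
	if (obj)
		kfree(obj);	/* valid for SLOB with this patch applied */

	kmem_cache_destroy(foo_cache);
	return 0;
}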