Message ID | 20211122013026.909933-1-rkovhaev@gmail.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | [v4] slob: add size header to all allocations | expand |
On Sun, 21 Nov 2021, Rustam Kovhaev wrote: > Let's prepend both kmalloc() and kmem_cache_alloc() allocations with the > size header. > It simplifies the slab API and guarantees that both kmem_cache_alloc() > and kmalloc() memory could be freed by kfree(). > > meminfo right after the system boot, x86-64 on xfs, without the patch: > Slab: 34700 kB > > the same, with the patch: > Slab: 35752 kB > +#define SLOB_HDR_SIZE max(ARCH_KMALLOC_MINALIGN, ARCH_SLAB_MINALIGN) Ok that is up to 128 bytes on some architectures. Mostly 32 or 64 bytes. > @@ -307,6 +303,7 @@ static void *slob_alloc(size_t size, gfp_t gfp, int align, int node, > unsigned long flags; > bool _unused; > > + size += SLOB_HDR_SIZE; And every object now has this overhead? 128 bytes extra in extreme cases per object? > - if (size < PAGE_SIZE - minalign) { > - int align = minalign; > + if (size < PAGE_SIZE - SLOB_HDR_SIZE) { > + int align = SLOB_HDR_SIZE; And the object is also aligned to 128-byte boundaries on some architectures. So a 4-byte object occupies 256 bytes in SLOB? SLOB will no longer be a low memory overhead allocator then.
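To put concrete numbers on the worst case described above, here is a small userspace sketch; the 128-byte header and 4-byte object are illustrative assumptions (roughly the arm64 worst case), not taken from any particular config:

/* Illustrative model of a worst case: a 128-byte SLOB_HDR_SIZE prepended
 * to every object, with the next object also starting on a 128-byte
 * boundary. Not kernel code. */
#include <stdio.h>

#define ALIGN_UP(x, a)	(((x) + (a) - 1) / (a) * (a))

int main(void)
{
	unsigned int hdr = 128;	/* assumed max(ARCH_KMALLOC_MINALIGN, ARCH_SLAB_MINALIGN) */
	unsigned int obj = 4;	/* tiny allocation */

	/* header + object, rounded up to where the next header may start */
	printf("4-byte object costs %u bytes\n", ALIGN_UP(hdr + obj, hdr));	/* prints 256 */
	return 0;
}

With a 32- or 64-byte minimum alignment the same arithmetic gives 64 or 128 bytes per 4-byte object, which is what the "mostly 32 or 64 bytes" remark amounts to.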
On 11/22/21 10:22, Christoph Lameter wrote: > On Sun, 21 Nov 2021, Rustam Kovhaev wrote: > >> Let's prepend both kmalloc() and kmem_cache_alloc() allocations with the >> size header. >> It simplifies the slab API and guarantees that both kmem_cache_alloc() >> and kmalloc() memory could be freed by kfree(). >> >> meminfo right after the system boot, x86-64 on xfs, without the patch: >> Slab: 34700 kB >> >> the same, with the patch: >> Slab: 35752 kB > >> +#define SLOB_HDR_SIZE max(ARCH_KMALLOC_MINALIGN, ARCH_SLAB_MINALIGN) > > Ok that is up to 128 bytes on some architectues. Mostly 32 or 64 bytes. > >> @@ -307,6 +303,7 @@ static void *slob_alloc(size_t size, gfp_t gfp, int align, int node, >> unsigned long flags; >> bool _unused; >> >> + size += SLOB_HDR_SIZE; > > And every object now has this overhead? 128 bytes extra in extreme cases > per object? > > >> - if (size < PAGE_SIZE - minalign) { >> - int align = minalign; >> + if (size < PAGE_SIZE - SLOB_HDR_SIZE) { >> + int align = SLOB_HDR_SIZE; > > And the object is also aligned to 128 bytes boundaries on some > architectures. > > So a 4 byte object occupies 256 bytes in SLOB? > > SLOB will no longer be a low memory overhead allocator then. Hm, good point, didn't realize those MINALIGN constants can be that large. I think this overhead was already the case with SLOB for kmalloc caches, but now it would be worse, as it would be for all kmem caches. But it seems there's no reason we couldn't do better? I.e. use the value of SLOB_HDR_SIZE only to align the beginning of the actual object (and name the define differently than SLOB_HDR_SIZE). But the size of the header, where we store the object length, could be just a native word - 4 bytes on 32bit, 8 on 64bit. The address of the header shouldn't have a reason to be also aligned to ARCH_KMALLOC_MINALIGN / ARCH_SLAB_MINALIGN, as only SLOB itself processes it and not the slab consumers which rely on those alignments?
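A sketch of the layout being proposed (illustrative userspace code, not slob.c; place_object() and the 64-byte alignment are invented for the example): the object start keeps whatever alignment the consumer needs, while the size header is a single native word tucked into the padding just before it.

#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Return an object of 'size' bytes aligned to 'align' (a power of two),
 * carved out of 'raw', with the size stored in one word just before it. */
static void *place_object(void *raw, size_t size, size_t align)
{
	uintptr_t obj = ((uintptr_t)raw + sizeof(size_t) + align - 1) & ~(align - 1);

	*((size_t *)obj - 1) = size;	/* word-sized header, not align-sized */
	return (void *)obj;
}

int main(void)
{
	static unsigned char buf[256] __attribute__((aligned(64)));
	void *obj = place_object(buf, 40, 64);

	printf("object at offset %td, stored size %zu\n",
	       (unsigned char *)obj - buf, *((size_t *)obj - 1));	/* offset 64, size 40 */
	return 0;
}

Whether the leading padding can be reused is a separate question; the point is only that the header itself does not need ARCH_KMALLOC_MINALIGN alignment.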
On Mon, 22 Nov 2021, Vlastimil Babka wrote: > But it seems there's no reason we couldn't do better? I.e. use the value of > SLOB_HDR_SIZE only to align the beginning of actual object (and name the > define different than SLOB_HDR_SIZE). But the size of the header, where we > store the object lenght could be just a native word - 4 bytes on 32bit, 8 on > 64bit. The address of the header shouldn't have a reason to be also aligned > to ARCH_KMALLOC_MINALIGN / ARCH_SLAB_MINALIGN as only SLOB itself processes > it and not the slab consumers which rely on those alignments? Well the best way would be to put it at the end of the object in order to avoid the alignment problem. This is a particular issue with SLOB because it allows multiple types of objects in a single page frame. If only one type of object would be allowed then the object size etc. can be stored in the page struct. So I guess placement at the beginning cannot be avoided. That in turn runs into trouble with the DMA requirements on some platforms where the beginning of the object has to be cache line aligned. I don't know but it seems that making slob that sophisticated is counterproductive. Remove SLOB?
On 11/22/21 11:36, Christoph Lameter wrote: > On Mon, 22 Nov 2021, Vlastimil Babka wrote: > >> But it seems there's no reason we couldn't do better? I.e. use the value of >> SLOB_HDR_SIZE only to align the beginning of actual object (and name the >> define different than SLOB_HDR_SIZE). But the size of the header, where we >> store the object lenght could be just a native word - 4 bytes on 32bit, 8 on >> 64bit. The address of the header shouldn't have a reason to be also aligned >> to ARCH_KMALLOC_MINALIGN / ARCH_SLAB_MINALIGN as only SLOB itself processes >> it and not the slab consumers which rely on those alignments? > > Well the best way would be to put it at the end of the object in order to > avoid the alignment problem. This is a particular issue with SLOB because > it allows multiple types of objects in a single page frame. > > If only one type of object would be allowed then the object size etc can > be stored in the page struct. > > So I guess placement at the beginning cannot be avoided. That in turn runs > into trouble with the DMA requirements on some platforms where the > beginning of the object has to be cache line aligned. It's no problem to have the real beginning of the object aligned, and the prepended header not. The code already does that before this patch for the kmalloc power-of-two alignments, where e.g. the object can be aligned to 256 bytes, but the prepended header to a smaller ARCH_KMALLOC_MINALIGN / ARCH_SLAB_MINALIGN. > I dont know but it seems that making slob that sophisticated is counter > productive. Remove SLOB? I wouldn't mind, but somebody might :)
On Mon, 22 Nov 2021, Vlastimil Babka wrote: > It's no problem to have the real beginning of the object aligned, and the > prepended header not. The code already does that before this patch for the > kmalloc power-of-two alignments, where e.g. the object can be aligned to 256 > bytes, but the prepended header to a smaller ARCH_KMALLOC_MINALIGN / > ARCH_SLAB_MINALIGN. Ok but then the first object in a page may still have those issues. > > I dont know but it seems that making slob that sophisticated is counter > > productive. Remove SLOB? > > I wouldn't mind, but somebody might :) Well run a space efficiency analysis after this patch. If the memory used is larger than SLUB (with the configuration for minimal data footprint) then there is no reason for SLOB to continue.
On 11/22/21 12:40, Christoph Lameter wrote: > On Mon, 22 Nov 2021, Vlastimil Babka wrote: > >> It's no problem to have the real beginning of the object aligned, and the >> prepended header not. The code already does that before this patch for the >> kmalloc power-of-two alignments, where e.g. the object can be aligned to 256 >> bytes, but the prepended header to a smaller ARCH_KMALLOC_MINALIGN / >> ARCH_SLAB_MINALIGN. > > Ok but then the first object in a page may still have those issues. Hmm, that's right. I guess we should also distinguish ARCH_KMALLOC_MINALIGN for kmalloc paths, and ARCH_SLAB_MINALIGN for kmem_cache_alloc paths. Seems the latter is generally smaller, thus some holes left by kmalloc allocations could be filled later by kmem_cache_alloc allocations. > >> > I dont know but it seems that making slob that sophisticated is counter >> > productive. Remove SLOB? >> >> I wouldn't mind, but somebody might :) > > Well run a space efficiency analysis after this patch. If the memory used > is larger than SLUB (with the configuration for minimal data footprint) > then there is no reason for SLOB to continue. Makes sense.
From: Vlastimil Babka > Sent: 22 November 2021 10:46 > > On 11/22/21 11:36, Christoph Lameter wrote: > > On Mon, 22 Nov 2021, Vlastimil Babka wrote: > > > >> But it seems there's no reason we couldn't do better? I.e. use the value of > >> SLOB_HDR_SIZE only to align the beginning of actual object (and name the > >> define different than SLOB_HDR_SIZE). But the size of the header, where we > >> store the object lenght could be just a native word - 4 bytes on 32bit, 8 on > >> 64bit. The address of the header shouldn't have a reason to be also aligned > >> to ARCH_KMALLOC_MINALIGN / ARCH_SLAB_MINALIGN as only SLOB itself processes > >> it and not the slab consumers which rely on those alignments? > > > > Well the best way would be to put it at the end of the object in order to > > avoid the alignment problem. This is a particular issue with SLOB because > > it allows multiple types of objects in a single page frame. > > > > If only one type of object would be allowed then the object size etc can > > be stored in the page struct. Or just a single byte that is the index of the associated free list structure. For 32bit and for the smaller kmalloc() area it may be reasonable to have a separate array indexed by the page of the address. > > So I guess placement at the beginning cannot be avoided. That in turn runs > > into trouble with the DMA requirements on some platforms where the > > beginning of the object has to be cache line aligned. > > It's no problem to have the real beginning of the object aligned, and the > prepended header not. I'm not sure that helps. The header can't share a cache line with the previous item (because it might be mapped for DMA) so will always take a full cache line. There might me some strange scheme where you put the size at the end and the offset of the 'last end' into the page struct. The DMA API should let you safely read the size from an allocated buffer - but you can't modify it. There is also all the code that allocates 'power of 2' sized buffers under the assumption they are efficient - as soon as you add a size field that assumption just causes the sizes of item to (often) double. David > The code already does that before this patch for the > kmalloc power-of-two alignments, where e.g. the object can be aligned to 256 > bytes, but the prepended header to a smaller ARCH_KMALLOC_MINALIGN / > ARCH_SLAB_MINALIGN. > > > I dont know but it seems that making slob that sophisticated is counter > > productive. Remove SLOB? > > I wouldn't mind, but somebody might :) > - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)
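A userspace sketch of the out-of-band bookkeeping idea above (page_class, class_size and the 4 KiB page size are invented for illustration): one byte of metadata per page maps an address to its size class, so no per-object header is needed at all.

#include <stdint.h>
#include <stdio.h>

#define PAGE_SHIFT	12
#define NPAGES		16

static const unsigned int class_size[] = { 8, 16, 32, 64, 128, 256 };
static uint8_t page_class[NPAGES];		/* one byte of metadata per page */

static unsigned int lookup_size(uintptr_t base, uintptr_t obj)
{
	return class_size[page_class[(obj - base) >> PAGE_SHIFT]];
}

int main(void)
{
	uintptr_t base = 0x100000;		/* pretend pool start */

	page_class[3] = 4;			/* page 3 serves 128-byte objects */
	printf("object size class: %u bytes\n",
	       lookup_size(base, base + (3 << PAGE_SHIFT) + 256));	/* prints 128 */
	return 0;
}

This only works if a page holds objects of a single size, which is exactly the property SLOB does not have today, as noted earlier in the thread.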
On Tue, Nov 23, 2021 at 10:18:27AM +0000, David Laight wrote: > From: Vlastimil Babka > > Sent: 22 November 2021 10:46 > > > > On 11/22/21 11:36, Christoph Lameter wrote: > > > On Mon, 22 Nov 2021, Vlastimil Babka wrote: > > > > > >> But it seems there's no reason we couldn't do better? I.e. use the value of > > >> SLOB_HDR_SIZE only to align the beginning of actual object (and name the > > >> define different than SLOB_HDR_SIZE). But the size of the header, where we > > >> store the object lenght could be just a native word - 4 bytes on 32bit, 8 on > > >> 64bit. The address of the header shouldn't have a reason to be also aligned > > >> to ARCH_KMALLOC_MINALIGN / ARCH_SLAB_MINALIGN as only SLOB itself processes > > >> it and not the slab consumers which rely on those alignments? > > > > > > Well the best way would be to put it at the end of the object in order to > > > avoid the alignment problem. This is a particular issue with SLOB because > > > it allows multiple types of objects in a single page frame. > > > > > > If only one type of object would be allowed then the object size etc can > > > be stored in the page struct. > > Or just a single byte that is the index of the associated free list structure. > For 32bit and for the smaller kmalloc() area it may be reasonable to have > a separate array indexed by the page of the address. > > > > So I guess placement at the beginning cannot be avoided. That in turn runs > > > into trouble with the DMA requirements on some platforms where the > > > beginning of the object has to be cache line aligned. > > > > It's no problem to have the real beginning of the object aligned, and the > > prepended header not. > > I'm not sure that helps. > The header can't share a cache line with the previous item (because it > might be mapped for DMA) so will always take a full cache line. I thought that DMA API allocates buffers that are larger than page size. DMA pool seems to be able to give out smaller buffers, but underneath it seems to be calling page allocator. The SLOB objects that have this header are all less than page size, and they cannot end up in DMA code paths, or can they? > > There might me some strange scheme where you put the size at the end > and the offset of the 'last end' into the page struct. > The DMA API should let you safely read the size from an allocated > buffer - but you can't modify it. > > There is also all the code that allocates 'power of 2' sized buffers > under the assumption they are efficient - as soon as you add a size > field that assumption just causes the sizes of item to (often) double. > > David > > > The code already does that before this patch for the > > kmalloc power-of-two alignments, where e.g. the object can be aligned to 256 > > bytes, but the prepended header to a smaller ARCH_KMALLOC_MINALIGN / > > ARCH_SLAB_MINALIGN. > > > > > I dont know but it seems that making slob that sophisticated is counter > > > productive. Remove SLOB? > > > > I wouldn't mind, but somebody might :) > > > > - > Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK > Registration No: 1397386 (Wales)
From: Rustam Kovhaev > Sent: 30 November 2021 07:00 > > On Tue, Nov 23, 2021 at 10:18:27AM +0000, David Laight wrote: > > From: Vlastimil Babka > > > Sent: 22 November 2021 10:46 > > > > > > On 11/22/21 11:36, Christoph Lameter wrote: > > > > On Mon, 22 Nov 2021, Vlastimil Babka wrote: > > > > > > > >> But it seems there's no reason we couldn't do better? I.e. use the value of > > > >> SLOB_HDR_SIZE only to align the beginning of actual object (and name the > > > >> define different than SLOB_HDR_SIZE). But the size of the header, where we > > > >> store the object lenght could be just a native word - 4 bytes on 32bit, 8 on > > > >> 64bit. The address of the header shouldn't have a reason to be also aligned > > > >> to ARCH_KMALLOC_MINALIGN / ARCH_SLAB_MINALIGN as only SLOB itself processes > > > >> it and not the slab consumers which rely on those alignments? > > > > > > > > Well the best way would be to put it at the end of the object in order to > > > > avoid the alignment problem. This is a particular issue with SLOB because > > > > it allows multiple types of objects in a single page frame. > > > > ... > > > > > > So I guess placement at the beginning cannot be avoided. That in turn runs > > > > into trouble with the DMA requirements on some platforms where the > > > > beginning of the object has to be cache line aligned. > > > > > > It's no problem to have the real beginning of the object aligned, and the > > > prepended header not. > > > > I'm not sure that helps. > > The header can't share a cache line with the previous item (because it > > might be mapped for DMA) so will always take a full cache line. > > I thought that DMA API allocates buffers that are larger than page size. > DMA pool seems to be able to give out smaller buffers, but underneath it > seems to be calling page allocator. > The SLOB objects that have this header are all less than page size, and > they cannot end up in DMA code paths, or can they? The problem isn't dma_alloc_coherent(); it is when memory allocated elsewhere is used for DMA. On systems with non-coherent DMA accesses, the data cache has to be flushed before all DMA transfers and invalidated after read DMA transfers. The CPU must not dirty any of the cache lines associated with a read DMA. This is on top of any requirements for the alignment of the returned address. David - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)
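A rough userspace model of that hazard (two plain arrays stand in for RAM and one cached line; there is no real DMA or cache control here): a header written by the CPU into the same cache line as a device-written buffer is lost when the line has to be invalidated to pick up the DMA data.

#include <stdio.h>
#include <string.h>

#define LINE 64

int main(void)
{
	unsigned char ram[LINE];	/* what the device reads/writes */
	unsigned char cache[LINE];	/* the CPU's copy of that cache line */

	memset(ram, 0, sizeof(ram));
	memcpy(cache, ram, LINE);	/* line gets cached */

	cache[0] = 42;			/* CPU stores a "header" byte: line is now dirty */
	memset(ram + 4, 0xab, LINE - 4);/* device DMA-writes bytes 4..63 of the line */

	/* To see the DMA data the CPU must invalidate (drop) its copy ... */
	memcpy(cache, ram, LINE);

	/* ... and the header byte written before the invalidate is gone. */
	printf("header byte after invalidate: %d\n", cache[0]);	/* prints 0 */
	return 0;
}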
On Mon, 29 Nov 2021, Rustam Kovhaev wrote: > > I'm not sure that helps. > > The header can't share a cache line with the previous item (because it > > might be mapped for DMA) so will always take a full cache line. > > I thought that DMA API allocates buffers that are larger than page size. > DMA pool seems to be able to give out smaller buffers, but underneath it > seems to be calling page allocator. > The SLOB objects that have this header are all less than page size, and > they cannot end up in DMA code paths, or can they? kmalloc slab allocations must return dmaable memory. If the underlying hardware can only dma to a cache line border then all objects must be aligned that way.
On 11/23/21 11:18, David Laight wrote: > From: Vlastimil Babka >> Sent: 22 November 2021 10:46 >> >> On 11/22/21 11:36, Christoph Lameter wrote: >> > On Mon, 22 Nov 2021, Vlastimil Babka wrote: >> > >> >> But it seems there's no reason we couldn't do better? I.e. use the value of >> >> SLOB_HDR_SIZE only to align the beginning of actual object (and name the >> >> define different than SLOB_HDR_SIZE). But the size of the header, where we >> >> store the object lenght could be just a native word - 4 bytes on 32bit, 8 on >> >> 64bit. The address of the header shouldn't have a reason to be also aligned >> >> to ARCH_KMALLOC_MINALIGN / ARCH_SLAB_MINALIGN as only SLOB itself processes >> >> it and not the slab consumers which rely on those alignments? >> > >> > Well the best way would be to put it at the end of the object in order to >> > avoid the alignment problem. This is a particular issue with SLOB because >> > it allows multiple types of objects in a single page frame. >> > >> > If only one type of object would be allowed then the object size etc can >> > be stored in the page struct. > > Or just a single byte that is the index of the associated free list structure. > For 32bit and for the smaller kmalloc() area it may be reasonable to have > a separate array indexed by the page of the address. > >> > So I guess placement at the beginning cannot be avoided. That in turn runs >> > into trouble with the DMA requirements on some platforms where the >> > beginning of the object has to be cache line aligned. >> >> It's no problem to have the real beginning of the object aligned, and the >> prepended header not. > > I'm not sure that helps. > The header can't share a cache line with the previous item (because it > might be mapped for DMA) so will always take a full cache line. So if this is true, then I think we already have a problem with SLOB today (and AFAICS it's not even due to changes done by my 2019 commit 59bb47985c1d ("mm, sl[aou]b: guarantee natural alignment for kmalloc(power-of-two)" but older). Let's say we are on arm64 where (AFAICS): ARCH_KMALLOC_MINALIGN = ARCH_DMA_MINALIGN = 128 ARCH_SLAB_MINALIGN = 64 The point is that ARCH_SLAB_MINALIGN is smaller than ARCH_DMA_MINALIGN. Let's say we call kmalloc(64) and get a completely fresh page. In SLOB, alloc() or rather __do_kmalloc_node() will calculate minalign to max(ARCH_KMALLOC_MINALIGN, ARCH_SLAB_MINALIGN) thus 128. It will call slob_alloc() for size of size+minalign=64+128=192, align and align_offset = 128 Thus the allocation will use 128 bytes for the header, 64 for the object. Both the header and object aligned to 128 bytes. But the remaining 64 bytes of the second 128 bytes will remain free, as the allocated size is 192 bytes: | 128B header, aligned | 64B object | 64B free | rest also free | If there's another kmalloc allocation, the 128 bytes aligment due to ARCH_KMALLOC_MINALIGN will avoid it from using these 64 bytes, so that's fine. But if there's a kmem_cache_alloc() from a cache serving <=64B objects, it will be aligned to ARCH_SLAB_MINALIGN and happily use those 64 bytes that share the 128 block where the previous kmalloc allocation lies. So either I missed something or we violate the rule that kmalloc() provides blocks where ARCH_KMALLOC_MINALIGN is not just the alignment of their beginning but also nothing else touches the N*ARCH_KMALLOC_MINALIGN area containing the allocated object. > There might me some strange scheme where you put the size at the end > and the offset of the 'last end' into the page struct. 
> The DMA API should let you safely read the size from an allocated > buffer - but you can't modify it. > > There is also all the code that allocates 'power of 2' sized buffers > under the assumption they are efficient - as soon as you add a size > field that assumption just causes the sizes of item to (often) double. > > David > >> The code already does that before this patch for the >> kmalloc power-of-two alignments, where e.g. the object can be aligned to 256 >> bytes, but the prepended header to a smaller ARCH_KMALLOC_MINALIGN / >> ARCH_SLAB_MINALIGN. >> >> > I dont know but it seems that making slob that sophisticated is counter >> > productive. Remove SLOB? >> >> I wouldn't mind, but somebody might :) >> > > - > Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK > Registration No: 1397386 (Wales) >
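The arithmetic from the arm64 example above, written out as a trivial userspace check (128 and 64 are the assumed ARCH_KMALLOC_MINALIGN and ARCH_SLAB_MINALIGN values; none of this is real allocator code):

#include <stdio.h>

int main(void)
{
	unsigned int kmalloc_minalign = 128;	/* assumed ARCH_KMALLOC_MINALIGN */
	unsigned int slab_minalign = 64;	/* assumed ARCH_SLAB_MINALIGN */
	unsigned int size = 64;			/* kmalloc(64) on a fresh page */

	unsigned int hdr_end = kmalloc_minalign;	/* header occupies [0, 128)   */
	unsigned int obj_end = hdr_end + size;		/* object occupies [128, 192) */
	unsigned int blk_end = 2 * kmalloc_minalign;	/* 128B block is  [128, 256)  */

	printf("allocation uses %u bytes, [%u, %u) stays free\n",
	       obj_end, obj_end, blk_end);

	/* A <=64B kmem_cache object only needs 64-byte alignment, and
	 * 192 % 64 == 0, so it may start at 192 and end up sharing the
	 * 128-byte block [128, 256) with the kmalloc()'d object. */
	printf("192 %% %u = %u\n", slab_minalign, obj_end % slab_minalign);
	return 0;
}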
From: Vlastimil Babka > Sent: 30 November 2021 14:56 > > On 11/23/21 11:18, David Laight wrote: > > From: Vlastimil Babka > >> Sent: 22 November 2021 10:46 > >> > >> On 11/22/21 11:36, Christoph Lameter wrote: > >> > On Mon, 22 Nov 2021, Vlastimil Babka wrote: > >> > > >> >> But it seems there's no reason we couldn't do better? I.e. use the value of > >> >> SLOB_HDR_SIZE only to align the beginning of actual object (and name the > >> >> define different than SLOB_HDR_SIZE). But the size of the header, where we > >> >> store the object lenght could be just a native word - 4 bytes on 32bit, 8 on > >> >> 64bit. The address of the header shouldn't have a reason to be also aligned > >> >> to ARCH_KMALLOC_MINALIGN / ARCH_SLAB_MINALIGN as only SLOB itself processes > >> >> it and not the slab consumers which rely on those alignments? > >> > > >> > Well the best way would be to put it at the end of the object in order to > >> > avoid the alignment problem. This is a particular issue with SLOB because > >> > it allows multiple types of objects in a single page frame. > >> > > >> > If only one type of object would be allowed then the object size etc can > >> > be stored in the page struct. > > > > Or just a single byte that is the index of the associated free list structure. > > For 32bit and for the smaller kmalloc() area it may be reasonable to have > > a separate array indexed by the page of the address. > > > >> > So I guess placement at the beginning cannot be avoided. That in turn runs > >> > into trouble with the DMA requirements on some platforms where the > >> > beginning of the object has to be cache line aligned. > >> > >> It's no problem to have the real beginning of the object aligned, and the > >> prepended header not. > > > > I'm not sure that helps. > > The header can't share a cache line with the previous item (because it > > might be mapped for DMA) so will always take a full cache line. > > So if this is true, then I think we already have a problem with SLOB today > (and AFAICS it's not even due to changes done by my 2019 commit 59bb47985c1d > ("mm, sl[aou]b: guarantee natural alignment for kmalloc(power-of-two)" but > older). > > Let's say we are on arm64 where (AFAICS): > ARCH_KMALLOC_MINALIGN = ARCH_DMA_MINALIGN = 128 > ARCH_SLAB_MINALIGN = 64 Is that valid? Isn't SLAB being used to implement kmalloc() so the architecture defined alignment must apply? > The point is that ARCH_SLAB_MINALIGN is smaller than ARCH_DMA_MINALIGN. > > Let's say we call kmalloc(64) and get a completely fresh page. > In SLOB, alloc() or rather __do_kmalloc_node() will calculate minalign to > max(ARCH_KMALLOC_MINALIGN, ARCH_SLAB_MINALIGN) thus 128. > It will call slob_alloc() for size of size+minalign=64+128=192, align and > align_offset = 128 > Thus the allocation will use 128 bytes for the header, 64 for the object. > Both the header and object aligned to 128 bytes. > But the remaining 64 bytes of the second 128 bytes will remain free, as the > allocated size is 192 bytes: > > | 128B header, aligned | 64B object | 64B free | rest also free | That is horribly wasteful on memory :-) > If there's another kmalloc allocation, the 128 bytes aligment due to > ARCH_KMALLOC_MINALIGN will avoid it from using these 64 bytes, so that's > fine. But if there's a kmem_cache_alloc() from a cache serving <=64B > objects, it will be aligned to ARCH_SLAB_MINALIGN and happily use those 64 > bytes that share the 128 block where the previous kmalloc allocation lies. 
If the memory returned by kmem_cache_alloc() can be used for DMA then ARCH_DMA_MINALIGN has to apply to the returned buffers. So, maybe, that cache can't exist? I'd expect that ARCH_DMA_MINALIGN forces allocations to be a multiple of that size. More particularly the rest of the area can't be allocated to anything else. So it ought to be valid to return the 2nd half of a 128 byte cache line provided the first half isn't written while the allocation is active. But that ARCH_KMALLOC_MINALIGN only applies to 'large' items? Small items only need aligning to the power of 2 below their size. So 8 bytes items only need 8 byte alignment even though a larger item might need (say) 64 byte alignment. David - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)
On Tue, 30 Nov 2021, Vlastimil Babka wrote: > So either I missed something or we violate the rule that kmalloc() provides > blocks where ARCH_KMALLOC_MINALIGN is not just the alignment of their > beginning but also nothing else touches the N*ARCH_KMALLOC_MINALIGN area > containing the allocated object. Indeed.... The DMA API documentation in the kernel states: Documentation/DMA-API-HOWTO.rst 2) ARCH_KMALLOC_MINALIGN + Architectures must ensure that kmalloc'ed buffer is + DMA-safe. Drivers and subsystems depend on it. If an architecture + isn't fully DMA-coherent (i.e. hardware doesn't ensure that data in + the CPU cache is identical to data in main memory), + ARCH_KMALLOC_MINALIGN must be set so that the memory allocator + makes sure that kmalloc'ed buffer doesn't share a cache line with + the others. See arch/arm/include/asm/cache.h as an example. Note that this is only the case for kmalloc. Not for a slab cache setup separately from the kmalloc array. That is why ARCH_KMALLOC_MINALIGN exists.
On 11/30/21 16:21, David Laight wrote: > From: Vlastimil Babka >> Sent: 30 November 2021 14:56 >> >> On 11/23/21 11:18, David Laight wrote: >> > From: Vlastimil Babka >> > >> > Or just a single byte that is the index of the associated free list structure. >> > For 32bit and for the smaller kmalloc() area it may be reasonable to have >> > a separate array indexed by the page of the address. >> > >> >> > So I guess placement at the beginning cannot be avoided. That in turn runs >> >> > into trouble with the DMA requirements on some platforms where the >> >> > beginning of the object has to be cache line aligned. >> >> >> >> It's no problem to have the real beginning of the object aligned, and the >> >> prepended header not. >> > >> > I'm not sure that helps. >> > The header can't share a cache line with the previous item (because it >> > might be mapped for DMA) so will always take a full cache line. >> >> So if this is true, then I think we already have a problem with SLOB today >> (and AFAICS it's not even due to changes done by my 2019 commit 59bb47985c1d >> ("mm, sl[aou]b: guarantee natural alignment for kmalloc(power-of-two)" but >> older). >> >> Let's say we are on arm64 where (AFAICS): >> ARCH_KMALLOC_MINALIGN = ARCH_DMA_MINALIGN = 128 >> ARCH_SLAB_MINALIGN = 64 > > Is that valid? > Isn't SLAB being used to implement kmalloc() so the architecture > defined alignment must apply? SLAB is used to implement kmalloc() yes, but that's an implementation detail. I assume that we provide these DMA guarantees to all kmalloc() users as we don't know which will use it for DMA and which not, but if somebody creates their specific SLAB cache, they have to decide explicitly if they are going to use DMA with those objects, and request such alignment if yes. If not, we can use smaller alignment that's only required by e.g. the CPU. >> The point is that ARCH_SLAB_MINALIGN is smaller than ARCH_DMA_MINALIGN. >> >> Let's say we call kmalloc(64) and get a completely fresh page. >> In SLOB, alloc() or rather __do_kmalloc_node() will calculate minalign to >> max(ARCH_KMALLOC_MINALIGN, ARCH_SLAB_MINALIGN) thus 128. >> It will call slob_alloc() for size of size+minalign=64+128=192, align and >> align_offset = 128 >> Thus the allocation will use 128 bytes for the header, 64 for the object. >> Both the header and object aligned to 128 bytes. >> But the remaining 64 bytes of the second 128 bytes will remain free, as the >> allocated size is 192 bytes: >> >> | 128B header, aligned | 64B object | 64B free | rest also free | > > That is horribly wasteful on memory :-) Yes. I don't know how this historically came to be for SLOB, which was supposed to minimize memory usage (at the expense of cpu efficiency). But the point raised in this thread that if we extend this to all kmem_cache_alloc() allocations to make them possible to free with kfree(), we'll make it a lot worse :/ >> If there's another kmalloc allocation, the 128 bytes aligment due to >> ARCH_KMALLOC_MINALIGN will avoid it from using these 64 bytes, so that's >> fine. But if there's a kmem_cache_alloc() from a cache serving <=64B >> objects, it will be aligned to ARCH_SLAB_MINALIGN and happily use those 64 >> bytes that share the 128 block where the previous kmalloc allocation lies. > > If the memory returned by kmem_cache_alloc() can be used for DMA then > ARCH_DMA_MINALIGN has to apply to the returned buffers. > So, maybe, that cache can't exist? See above, I assume that cache cannot be used for DMA. 
But if we are trying to protect the kmalloc(64) DMA guarantees, then that shouldn't depend on the guarantees of the second, unrelated allocation? > I'd expect that ARCH_DMA_MINALIGN forces allocations to be a multiple > of that size. Doesn't seem so. That would indeed fix the problem, assuming it really is a problem (yet it seems nobody reported it occurring in practice). > More particularly the rest of the area can't be allocated to anything else. > So it ought to be valid to return the 2nd half of a 128 byte cache line > provided the first half isn't written while the allocation is active. As the allocator cannot know when the first half will be used for DMA by whoever allocated it, we can only assume it can happen at any time, and not return the 2nd half, ever? > But that ARCH_KMALLOC_MINALIGN only applies to 'large' items? > Small items only need aligning to the power of 2 below their size. > So 8 bytes items only need 8 byte alignment even though a larger > item might need (say) 64 byte alignment. But if we never defined such a threshold (that would probably have to be arch independent) then we can't start making such assumptions today, as we don't know which kmalloc() users do expect DMA and which not? It would have to be a flag or something. And yeah there is already a __GFP_DMA flag, but it means something a bit different... > > David > > - > Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK > Registration No: 1397386 (Wales) >
diff --git a/Documentation/core-api/memory-allocation.rst b/Documentation/core-api/memory-allocation.rst
index 5954ddf6ee13..31806d5ebeec 100644
--- a/Documentation/core-api/memory-allocation.rst
+++ b/Documentation/core-api/memory-allocation.rst
@@ -170,7 +170,11 @@ should be used if a part of the cache might be copied to the userspace.
 After the cache is created kmem_cache_alloc() and its convenience
 wrappers can allocate memory from that cache.

-When the allocated memory is no longer needed it must be freed. You can
-use kvfree() for the memory allocated with `kmalloc`, `vmalloc` and
-`kvmalloc`. The slab caches should be freed with kmem_cache_free(). And
-don't forget to destroy the cache with kmem_cache_destroy().
+When the allocated memory is no longer needed it must be freed.
+Objects allocated by `kmalloc` can be freed with `kfree` or `kvfree`.
+Objects allocated by `kmem_cache_alloc` can be freed with `kmem_cache_free`
+or also by `kfree` or `kvfree`.
+Memory allocated by `vmalloc` can be freed with `vfree` or `kvfree`.
+Memory allocated by `kvmalloc` can be freed with `kvfree`.
+Caches created by `kmem_cache_create` should be freed with
+`kmem_cache_destroy`.
diff --git a/mm/slob.c b/mm/slob.c
index 03deee1e6a94..5cc49102e070 100644
--- a/mm/slob.c
+++ b/mm/slob.c
@@ -25,23 +25,18 @@
  * into the free list in address order, so this is effectively an
  * address-ordered first fit.
  *
- * Above this is an implementation of kmalloc/kfree. Blocks returned
- * from kmalloc are prepended with a 4-byte header with the kmalloc size.
- * If kmalloc is asked for objects of PAGE_SIZE or larger, it calls
- * alloc_pages() directly, allocating compound pages so the page order
- * does not have to be separately tracked.
- * These objects are detected in kfree() because PageSlab()
- * is false for them.
+ * Blocks that are less than (PAGE_SIZE - SLOB_HDR_SIZE) are prepended with
+ * a 4-byte header with the size. Larger blocks do not have the header and
+ * SLOB calls alloc_pages() directly, allocating compound pages so the
+ * page order does not have to be separately tracked. These objects are
+ * detected in kfree() because PageSlab() is false for them.
  *
  * SLAB is emulated on top of SLOB by simply calling constructors and
  * destructors for every SLAB allocation. Objects are returned with the
  * 4-byte alignment unless the SLAB_HWCACHE_ALIGN flag is set, in which
  * case the low-level allocator will fragment blocks to create the proper
- * alignment. Again, objects of page-size or greater are allocated by
- * calling alloc_pages(). As SLAB objects know their size, no separate
- * size bookkeeping is necessary and there is essentially no allocation
- * space overhead, and compound pages aren't needed for multi-page
- * allocations.
+ * alignment. Again, objects of (PAGE_SIZE - SLOB_HDR_SIZE) or greater are
+ * allocated by calling alloc_pages().
  *
  * NUMA support in SLOB is fairly simplistic, pushing most of the real
  * logic down to the page allocator, and simply doing the node accounting
@@ -88,6 +83,8 @@ typedef s16 slobidx_t;
 typedef s32 slobidx_t;
 #endif

+#define SLOB_HDR_SIZE max(ARCH_KMALLOC_MINALIGN, ARCH_SLAB_MINALIGN)
+
 struct slob_block {
 	slobidx_t units;
 };
@@ -207,12 +204,14 @@ static void *slob_new_pages(gfp_t gfp, int order, int node)
 	return page_address(page);
 }

-static void slob_free_pages(void *b, int order)
+static void slob_free_pages(struct page *sp, int order)
 {
-	struct page *sp = virt_to_page(b);
-
-	if (current->reclaim_state)
-		current->reclaim_state->reclaimed_slab += 1 << order;
+	if (PageSlab(sp)) {
+		__ClearPageSlab(sp);
+		page_mapcount_reset(sp);
+		if (current->reclaim_state)
+			current->reclaim_state->reclaimed_slab += 1 << order;
+	}

 	mod_node_page_state(page_pgdat(sp), NR_SLAB_UNRECLAIMABLE_B,
 			    -(PAGE_SIZE << order));
@@ -247,9 +246,7 @@ static void *slob_page_alloc(struct page *sp, size_t size, int align,
 		/*
 		 * 'aligned' will hold the address of the slob block so that the
 		 * address 'aligned'+'align_offset' is aligned according to the
-		 * 'align' parameter. This is for kmalloc() which prepends the
-		 * allocated block with its size, so that the block itself is
-		 * aligned when needed.
+		 * 'align' parameter.
 		 */
 		if (align) {
 			aligned = (slob_t *)
@@ -298,8 +295,7 @@ static void *slob_page_alloc(struct page *sp, size_t size, int align,
 /*
  * slob_alloc: entry point into the slob allocator.
  */
-static void *slob_alloc(size_t size, gfp_t gfp, int align, int node,
-							int align_offset)
+static void *slob_alloc(size_t size, gfp_t gfp, int align, int node)
 {
 	struct page *sp;
 	struct list_head *slob_list;
@@ -307,6 +303,7 @@ static void *slob_alloc(size_t size, gfp_t gfp, int align, int node,
 	unsigned long flags;
 	bool _unused;

+	size += SLOB_HDR_SIZE;
 	if (size < SLOB_BREAK1)
 		slob_list = &free_slob_small;
 	else if (size < SLOB_BREAK2)
@@ -330,7 +327,7 @@ static void *slob_alloc(size_t size, gfp_t gfp, int align, int node,
 		if (sp->units < SLOB_UNITS(size))
 			continue;

-		b = slob_page_alloc(sp, size, align, align_offset, &page_removed_from_list);
+		b = slob_page_alloc(sp, size, align, SLOB_HDR_SIZE, &page_removed_from_list);
 		if (!b)
 			continue;

@@ -367,31 +364,32 @@ static void *slob_alloc(size_t size, gfp_t gfp, int align, int node,
 		INIT_LIST_HEAD(&sp->slab_list);
 		set_slob(b, SLOB_UNITS(PAGE_SIZE), b + SLOB_UNITS(PAGE_SIZE));
 		set_slob_page_free(sp, slob_list);
-		b = slob_page_alloc(sp, size, align, align_offset, &_unused);
+		b = slob_page_alloc(sp, size, align, SLOB_HDR_SIZE, &_unused);
 		BUG_ON(!b);
 		spin_unlock_irqrestore(&slob_lock, flags);
 	}
 	if (unlikely(gfp & __GFP_ZERO))
 		memset(b, 0, size);
-	return b;
+	/* Write size in the header */
+	*(unsigned int *)b = size - SLOB_HDR_SIZE;
+	return (void *)b + SLOB_HDR_SIZE;
 }

 /*
  * slob_free: entry point into the slob allocator.
  */
-static void slob_free(void *block, int size)
+static void slob_free(void *block)
 {
 	struct page *sp;
-	slob_t *prev, *next, *b = (slob_t *)block;
+	slob_t *prev, *next, *b = block - SLOB_HDR_SIZE;
+	unsigned int size;
 	slobidx_t units;
 	unsigned long flags;
 	struct list_head *slob_list;

-	if (unlikely(ZERO_OR_NULL_PTR(block)))
-		return;
-	BUG_ON(!size);
-
-	sp = virt_to_page(block);
+	size = *(unsigned int *)b + SLOB_HDR_SIZE;
+	BUG_ON(!size || size >= PAGE_SIZE);
+	sp = virt_to_page(b);
 	units = SLOB_UNITS(size);

 	spin_lock_irqsave(&slob_lock, flags);
@@ -401,9 +399,7 @@ static void slob_free(void *block, int size)
 		if (slob_page_free(sp))
 			clear_slob_page_free(sp);
 		spin_unlock_irqrestore(&slob_lock, flags);
-		__ClearPageSlab(sp);
-		page_mapcount_reset(sp);
-		slob_free_pages(b, 0);
+		slob_free_pages(sp, 0);
 		return;
 	}

@@ -476,36 +472,29 @@ void kmem_obj_info(struct kmem_obj_info *kpp, void *object, struct page *page)
 static __always_inline void *
 __do_kmalloc_node(size_t size, gfp_t gfp, int node, unsigned long caller)
 {
-	unsigned int *m;
-	int minalign = max_t(size_t, ARCH_KMALLOC_MINALIGN, ARCH_SLAB_MINALIGN);
 	void *ret;

 	gfp &= gfp_allowed_mask;

 	might_alloc(gfp);

-	if (size < PAGE_SIZE - minalign) {
-		int align = minalign;
+	if (size < PAGE_SIZE - SLOB_HDR_SIZE) {
+		int align = SLOB_HDR_SIZE;

 		/*
 		 * For power of two sizes, guarantee natural alignment for
 		 * kmalloc()'d objects.
 		 */
 		if (is_power_of_2(size))
-			align = max(minalign, (int) size);
+			align = max(align, (int) size);

 		if (!size)
 			return ZERO_SIZE_PTR;

-		m = slob_alloc(size + minalign, gfp, align, node, minalign);
-
-		if (!m)
-			return NULL;
-		*m = size;
-		ret = (void *)m + minalign;
+		ret = slob_alloc(size, gfp, align, node);

 		trace_kmalloc_node(caller, ret,
-				   size, size + minalign, gfp, node);
+				   size, size + SLOB_HDR_SIZE, gfp, node);
 	} else {
 		unsigned int order = get_order(size);

@@ -553,26 +542,17 @@ void kfree(const void *block)
 	kmemleak_free(block);

 	sp = virt_to_page(block);
-	if (PageSlab(sp)) {
-		int align = max_t(size_t, ARCH_KMALLOC_MINALIGN, ARCH_SLAB_MINALIGN);
-		unsigned int *m = (unsigned int *)(block - align);
-		slob_free(m, *m + align);
-	} else {
-		unsigned int order = compound_order(sp);
-		mod_node_page_state(page_pgdat(sp), NR_SLAB_UNRECLAIMABLE_B,
-				    -(PAGE_SIZE << order));
-		__free_pages(sp, order);
-
-	}
+	if (PageSlab(sp))
+		slob_free((void *)block);
+	else
+		slob_free_pages(sp, compound_order(sp));
 }
 EXPORT_SYMBOL(kfree);

-/* can't use ksize for kmem_cache_alloc memory, only kmalloc */
 size_t __ksize(const void *block)
 {
 	struct page *sp;
-	int align;
-	unsigned int *m;
+	unsigned int size;

 	BUG_ON(!block);
 	if (unlikely(block == ZERO_SIZE_PTR))
@@ -582,9 +562,8 @@ size_t __ksize(const void *block)
 	if (unlikely(!PageSlab(sp)))
 		return page_size(sp);

-	align = max_t(size_t, ARCH_KMALLOC_MINALIGN, ARCH_SLAB_MINALIGN);
-	m = (unsigned int *)(block - align);
-	return SLOB_UNITS(*m) * SLOB_UNIT;
+	size = *(unsigned int *)(block - SLOB_HDR_SIZE);
+	return SLOB_UNITS(size) * SLOB_UNIT;
 }
 EXPORT_SYMBOL(__ksize);

@@ -606,16 +585,19 @@ static void *slob_alloc_node(struct kmem_cache *c, gfp_t flags, int node)

 	might_alloc(flags);

-	if (c->size < PAGE_SIZE) {
-		b = slob_alloc(c->size, flags, c->align, node, 0);
+	if (c->size < PAGE_SIZE - SLOB_HDR_SIZE) {
+		b = slob_alloc(c->size, flags, c->align, node);
 		trace_kmem_cache_alloc_node(_RET_IP_, b, c->object_size,
-					    SLOB_UNITS(c->size) * SLOB_UNIT,
+					    SLOB_UNITS(c->size + SLOB_HDR_SIZE) * SLOB_UNIT,
 					    flags, node);
 	} else {
-		b = slob_new_pages(flags, get_order(c->size), node);
+		unsigned int order = get_order(c->size);
+
+		if (likely(order))
+			flags |= __GFP_COMP;
+		b = slob_new_pages(flags, order, node);
 		trace_kmem_cache_alloc_node(_RET_IP_, b, c->object_size,
-					    PAGE_SIZE << get_order(c->size),
-					    flags, node);
+					    PAGE_SIZE << order, flags, node);
 	}

 	if (b && c->ctor) {
@@ -647,12 +629,18 @@ void *kmem_cache_alloc_node(struct kmem_cache *cachep, gfp_t gfp, int node)
 EXPORT_SYMBOL(kmem_cache_alloc_node);
 #endif

-static void __kmem_cache_free(void *b, int size)
+static void __kmem_cache_free(void *b)
 {
-	if (size < PAGE_SIZE)
-		slob_free(b, size);
+	struct page *sp;
+
+	if (unlikely(ZERO_OR_NULL_PTR(b)))
+		return;
+
+	sp = virt_to_page(b);
+	if (PageSlab(sp))
+		slob_free(b);
 	else
-		slob_free_pages(b, get_order(size));
+		slob_free_pages(sp, compound_order(sp));
 }

 static void kmem_rcu_free(struct rcu_head *head)
@@ -660,7 +648,7 @@ static void kmem_rcu_free(struct rcu_head *head)
 	struct slob_rcu *slob_rcu = (struct slob_rcu *)head;
 	void *b = (void *)slob_rcu - (slob_rcu->size - sizeof(struct slob_rcu));

-	__kmem_cache_free(b, slob_rcu->size);
+	__kmem_cache_free(b);
 }

 void kmem_cache_free(struct kmem_cache *c, void *b)
@@ -673,7 +661,7 @@ void kmem_cache_free(struct kmem_cache *c, void *b)
 		slob_rcu->size = c->size;
 		call_rcu(&slob_rcu->head, kmem_rcu_free);
 	} else {
-		__kmem_cache_free(b, c->size);
+		__kmem_cache_free(b);
 	}
 }
 EXPORT_SYMBOL(kmem_cache_free);
Let's prepend both kmalloc() and kmem_cache_alloc() allocations with the
size header.
It simplifies the slab API and guarantees that both kmem_cache_alloc()
and kmalloc() memory could be freed by kfree().

meminfo right after the system boot, x86-64 on xfs, without the patch:
Slab: 34700 kB

the same, with the patch:
Slab: 35752 kB

Link: https://lore.kernel.org/lkml/20210929212347.1139666-1-rkovhaev@gmail.com
Signed-off-by: Rustam Kovhaev <rkovhaev@gmail.com>
---
v4:
 - Drop the align_offset parameter and pass size unchanged to slob_alloc()
 - Add SLOB_HDR_SIZE to size in slob_alloc()
v3:
 - Add SLOB_HDR_SIZE define
 - Remove references to minalign
 - Improve documentation wording
v2:
 - Allocate compound pages in slob_alloc_node()
 - Use slob_free_pages() in kfree()
 - Update documentation
---
 Documentation/core-api/memory-allocation.rst |  12 +-
 mm/slob.c                                    | 140 +++++++++----------
 2 files changed, 72 insertions(+), 80 deletions(-)
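As an illustration of the guarantee the changelog claims, a minimal module-style sketch (foo_cache and struct foo are hypothetical; this is only meaningful with this patch applied on a CONFIG_SLOB kernel, since other configurations at the time did not promise that kfree() accepts kmem_cache_alloc() objects):

#include <linux/module.h>
#include <linux/slab.h>

struct foo { int a, b; };

static int __init foo_init(void)
{
	struct kmem_cache *cache;
	struct foo *p;

	cache = kmem_cache_create("foo_cache", sizeof(struct foo), 0, 0, NULL);
	if (!cache)
		return -ENOMEM;

	p = kmem_cache_alloc(cache, GFP_KERNEL);
	if (p)
		kfree(p);	/* with this patch, equivalent to kmem_cache_free(cache, p) */

	kmem_cache_destroy(cache);
	return 0;
}

static void __exit foo_exit(void)
{
}

module_init(foo_init);
module_exit(foo_exit);
MODULE_LICENSE("GPL");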