diff mbox series

[v2,2/2] mm, sl[aou]b: guarantee natural alignment for kmalloc(power-of-two)

Message ID 20190826111627.7505-3-vbabka@suse.cz (mailing list archive)
State New, archived
Headers show
Series guarantee natural alignment for kmalloc() | expand

Commit Message

Vlastimil Babka Aug. 26, 2019, 11:16 a.m. UTC
In most configurations, kmalloc() happens to return naturally aligned (i.e.
aligned to the block size itself) blocks for power of two sizes. That means
some kmalloc() users might unknowingly rely on that alignment, until stuff
breaks when the kernel is built with e.g.  CONFIG_SLUB_DEBUG or CONFIG_SLOB,
and blocks stop being aligned. Then developers have to devise workaround such
as own kmem caches with specified alignment [1], which is not always practical,
as recently evidenced in [2].

The topic has been discussed at LSF/MM 2019 [3]. Adding a 'kmalloc_aligned()'
variant would not help with code unknowingly relying on the implicit alignment.
For slab implementations it would either require creating more kmalloc caches,
or allocate a larger size and only give back part of it. That would be
wasteful, especially with a generic alignment parameter (in contrast with a
fixed alignment to size).

Ideally we should provide to mm users what they need without difficult
workarounds or own reimplementations, so let's make the kmalloc() alignment to
size explicitly guaranteed for power-of-two sizes under all configurations.
What this means for the three available allocators?

* SLAB object layout happens to be mostly unchanged by the patch. The
  implicitly provided alignment could be compromised with CONFIG_DEBUG_SLAB due
  to redzoning, however SLAB disables redzoning for caches with alignment
  larger than unsigned long long. Practically on at least x86 this includes
  kmalloc caches as they use cache line alignment, which is larger than that.
  Still, this patch ensures alignment on all arches and cache sizes.

* SLUB layout is also unchanged unless redzoning is enabled through
  CONFIG_SLUB_DEBUG and boot parameter for the particular kmalloc cache. With
  this patch, explicit alignment is guaranteed with redzoning as well. This
  will result in more memory being wasted, but that should be acceptable in a
  debugging scenario.

* SLOB has no implicit alignment so this patch adds it explicitly for
  kmalloc(). The potential downside is increased fragmentation. While
  pathological allocation scenarios are certainly possible, in my testing,
  after booting a x86_64 kernel+userspace with virtme, around 16MB memory
  was consumed by slab pages both before and after the patch, with difference
  in the noise.

[1] https://lore.kernel.org/linux-btrfs/c3157c8e8e0e7588312b40c853f65c02fe6c957a.1566399731.git.christophe.leroy@c-s.fr/
[2] https://lore.kernel.org/linux-fsdevel/20190225040904.5557-1-ming.lei@redhat.com/
[3] https://lwn.net/Articles/787740/

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 Documentation/core-api/memory-allocation.rst |  4 ++
 include/linux/slab.h                         |  4 ++
 mm/slab_common.c                             | 11 ++++-
 mm/slob.c                                    | 42 +++++++++++++++-----
 4 files changed, 49 insertions(+), 12 deletions(-)

Comments

Christoph Lameter (Ampere) Aug. 28, 2019, 6:45 p.m. UTC | #1
On Mon, 26 Aug 2019, Vlastimil Babka wrote:

> The topic has been discussed at LSF/MM 2019 [3]. Adding a 'kmalloc_aligned()'
> variant would not help with code unknowingly relying on the implicit alignment.
> For slab implementations it would either require creating more kmalloc caches,
> or allocate a larger size and only give back part of it. That would be
> wasteful, especially with a generic alignment parameter (in contrast with a
> fixed alignment to size).

The additional caches will be detected if similar to existing ones and
merged into one. So the overhead is not that significant.

> Ideally we should provide to mm users what they need without difficult
> workarounds or own reimplementations, so let's make the kmalloc() alignment to
> size explicitly guaranteed for power-of-two sizes under all configurations.

The objection remains that this will create exceptions for the general
notion that all kmalloc caches are aligned to KMALLOC_MINALIGN which may
be suprising and it limits the optimizations that slab allocators may use
for optimizing data use. The SLOB allocator was designed in such a way
that data wastage is limited. The changes here sabotage that goal and show
that future slab allocators may be similarly constrained with the
exceptional alignents implemented. Additional debugging features etc etc
must all support the exceptional alignment requirements.

> * SLUB layout is also unchanged unless redzoning is enabled through
>   CONFIG_SLUB_DEBUG and boot parameter for the particular kmalloc cache. With
>   this patch, explicit alignment is guaranteed with redzoning as well. This
>   will result in more memory being wasted, but that should be acceptable in a
>   debugging scenario.

Well ok. That sounds fine (apart from breaking the rules for slab object
alignment).

> * SLOB has no implicit alignment so this patch adds it explicitly for
>   kmalloc(). The potential downside is increased fragmentation. While
>   pathological allocation scenarios are certainly possible, in my testing,
>   after booting a x86_64 kernel+userspace with virtme, around 16MB memory
>   was consumed by slab pages both before and after the patch, with difference
>   in the noise.

This change to slob will cause a significant additional use of memory. The
advertised advantage of SLOB is that *minimal* memory will be used since
it is targeted for embedded systems. Different types of slab objects of
varying sizes can be allocated in the same memory page to reduce
allocation overhead.

Having these exceptional rules for aligning power of two sizes caches
will significantly increase the memory wastage in SLOB.

The result of this patch is just to use more memory to be safe from
certain pathologies where one subsystem was relying on an alignment that
was not specified. That is why this approach should not be called
�natural" but "implicit alignment". The one using the slab cache is not
aware that the slab allocator provides objects aligned in a special way
(which is in general not needed. There seems to be a single pathological
case that needs to be addressed and I thought that was due to some
brokenness in the hardware?).

It is better to ensure that subsystems that require special alignment
explicitly tell the allocator about this.

I still think implicit exceptions to alignments are a bad idea. Those need
to be explicity specified and that is possible using kmem_cache_create().
Matthew Wilcox (Oracle) Aug. 28, 2019, 7:46 p.m. UTC | #2
On Wed, Aug 28, 2019 at 06:45:07PM +0000, Christopher Lameter wrote:
> > Ideally we should provide to mm users what they need without difficult
> > workarounds or own reimplementations, so let's make the kmalloc() alignment to
> > size explicitly guaranteed for power-of-two sizes under all configurations.
> 
> The objection remains that this will create exceptions for the general
> notion that all kmalloc caches are aligned to KMALLOC_MINALIGN which may

Hmm?  kmalloc caches will be aligned to both KMALLOC_MINALIGN and the
natural alignment of the object.

> be suprising and it limits the optimizations that slab allocators may use
> for optimizing data use. The SLOB allocator was designed in such a way
> that data wastage is limited. The changes here sabotage that goal and show
> that future slab allocators may be similarly constrained with the
> exceptional alignents implemented. Additional debugging features etc etc
> must all support the exceptional alignment requirements.

While I sympathise with the poor programmer who has to write the
fourth implementation of the sl*b interface, it's more for the pain of
picking a new letter than the pain of needing to honour the alignment
of allocations.

There are many places in the kernel which assume alignment.  They break
when it's not supplied.  I believe we have a better overall system if
the MM developers provide stronger guarantees than the MM consumers have
to work around only weak guarantees.

> > * SLOB has no implicit alignment so this patch adds it explicitly for
> >   kmalloc(). The potential downside is increased fragmentation. While
> >   pathological allocation scenarios are certainly possible, in my testing,
> >   after booting a x86_64 kernel+userspace with virtme, around 16MB memory
> >   was consumed by slab pages both before and after the patch, with difference
> >   in the noise.
> 
> This change to slob will cause a significant additional use of memory. The
> advertised advantage of SLOB is that *minimal* memory will be used since
> it is targeted for embedded systems. Different types of slab objects of
> varying sizes can be allocated in the same memory page to reduce
> allocation overhead.

Did you not read the part where he said the difference was in the noise?

> The result of this patch is just to use more memory to be safe from
> certain pathologies where one subsystem was relying on an alignment that
> was not specified. That is why this approach should not be called
> �natural" but "implicit alignment". The one using the slab cache is not
> aware that the slab allocator provides objects aligned in a special way
> (which is in general not needed. There seems to be a single pathological
> case that needs to be addressed and I thought that was due to some
> brokenness in the hardware?).

It turns out there are lots of places which assume this, including the
pmem driver, the ramdisk driver and a few other similar drivers.

> It is better to ensure that subsystems that require special alignment
> explicitly tell the allocator about this.

But it's not the subsystems which have this limitation which do the
allocation; it's the subsystems who allocate the memory that they then
pass to the subsystems.  So you're forcing communication of these limits
up & down the stack.

> I still think implicit exceptions to alignments are a bad idea. Those need
> to be explicity specified and that is possible using kmem_cache_create().

I swear we covered this last time the topic came up, but XFS would need
to create special slab caches for each size between 512 and PAGE_SIZE.
Potentially larger, depending on whether the MM developers are willing to
guarantee that kmalloc(PAGE_SIZE * 2, GFP_KERNEL) will return a PAGE_SIZE
aligned block of memory indefinitely.
Dave Chinner Aug. 28, 2019, 10:24 p.m. UTC | #3
On Wed, Aug 28, 2019 at 12:46:08PM -0700, Matthew Wilcox wrote:
> On Wed, Aug 28, 2019 at 06:45:07PM +0000, Christopher Lameter wrote:
> > I still think implicit exceptions to alignments are a bad idea. Those need
> > to be explicity specified and that is possible using kmem_cache_create().
> 
> I swear we covered this last time the topic came up, but XFS would need
> to create special slab caches for each size between 512 and PAGE_SIZE.
> Potentially larger, depending on whether the MM developers are willing to
> guarantee that kmalloc(PAGE_SIZE * 2, GFP_KERNEL) will return a PAGE_SIZE
> aligned block of memory indefinitely.

Page size alignment of multi-page heap allocations is ncessary. The
current behaviour w/ KASAN is to offset so a 8KB allocation spans 3
pages and is not page aligned. That causes just as much in way
of alignment problems as unaligned objects in multi-object-per-page
slabs.

As I said in the lastest discussion of this problem on XFS (pmem
devices w/ KASAN enabled), all we -need- is a GFP flag that tells the
slab allocator to give us naturally aligned object or fail if it
can't. I don't care how that gets implemented (e.g. another set of
heap slabs like the -rcl slabs), I just don't want every high level
subsystem that allocates heap memory for IO buffers to have to
implement their own aligned slab caches.

Cheers,

Dave.
Michal Hocko Aug. 29, 2019, 7:39 a.m. UTC | #4
On Wed 28-08-19 12:46:08, Matthew Wilcox wrote:
> On Wed, Aug 28, 2019 at 06:45:07PM +0000, Christopher Lameter wrote:
[...]
> > be suprising and it limits the optimizations that slab allocators may use
> > for optimizing data use. The SLOB allocator was designed in such a way
> > that data wastage is limited. The changes here sabotage that goal and show
> > that future slab allocators may be similarly constrained with the
> > exceptional alignents implemented. Additional debugging features etc etc
> > must all support the exceptional alignment requirements.
> 
> While I sympathise with the poor programmer who has to write the
> fourth implementation of the sl*b interface, it's more for the pain of
> picking a new letter than the pain of needing to honour the alignment
> of allocations.
> 
> There are many places in the kernel which assume alignment.  They break
> when it's not supplied.  I believe we have a better overall system if
> the MM developers provide stronger guarantees than the MM consumers have
> to work around only weak guarantees.

I absolutely agree. A hypothetical benefit of a new implementation
doesn't outweigh the complexity the existing code has to jump over or
worse is not aware of and it is broken silently. My general experience
is that the later is more likely with a large variety of drivers we have
in the tree and odd things they do in general.
Vlastimil Babka Aug. 29, 2019, 7:56 a.m. UTC | #5
On 8/29/19 12:24 AM, Dave Chinner wrote:
> On Wed, Aug 28, 2019 at 12:46:08PM -0700, Matthew Wilcox wrote:
>> On Wed, Aug 28, 2019 at 06:45:07PM +0000, Christopher Lameter wrote:
>>> I still think implicit exceptions to alignments are a bad idea. Those need
>>> to be explicity specified and that is possible using kmem_cache_create().
>>
>> I swear we covered this last time the topic came up, but XFS would need
>> to create special slab caches for each size between 512 and PAGE_SIZE.
>> Potentially larger, depending on whether the MM developers are willing to
>> guarantee that kmalloc(PAGE_SIZE * 2, GFP_KERNEL) will return a PAGE_SIZE
>> aligned block of memory indefinitely.
> 
> Page size alignment of multi-page heap allocations is ncessary. The
> current behaviour w/ KASAN is to offset so a 8KB allocation spans 3
> pages and is not page aligned. That causes just as much in way
> of alignment problems as unaligned objects in multi-object-per-page
> slabs.

Ugh, multi-page (power of two) allocations *at the page allocator level*
simply have to be aligned, as that's how the buddy allocator has always
worked, and it would be madness to try relax that guarantee and require
an explicit flag at this point. The kmalloc wrapper with SLUB will pass
everything above 8KB directly to the page allocator, so that's fine too.
4k and 8k are the only (multi-)page sizes still managed as SLUB objects.
I would say that these sizes are the most striking example that it's
wrong not to align them without extra flags or special API variant.

> As I said in the lastest discussion of this problem on XFS (pmem
> devices w/ KASAN enabled), all we -need- is a GFP flag that tells the
> slab allocator to give us naturally aligned object or fail if it
> can't. I don't care how that gets implemented (e.g. another set of
> heap slabs like the -rcl slabs), I just don't want every high level

Given alignment is orthogonal to -rcl and dma-, would that be another
three sets? Or we assume that dma- would want it always, and complicate
the rules further? Funilly enough, SLOB would be the simplest case here.

> subsystem that allocates heap memory for IO buffers to have to
> implement their own aligned slab caches.

Definitely agree. I still hope we can do that even without a new flag/API.

> Cheers,
> 
> Dave.
>
Dave Chinner Aug. 30, 2019, 12:29 a.m. UTC | #6
On Thu, Aug 29, 2019 at 09:56:13AM +0200, Vlastimil Babka wrote:
> On 8/29/19 12:24 AM, Dave Chinner wrote:
> > On Wed, Aug 28, 2019 at 12:46:08PM -0700, Matthew Wilcox wrote:
> >> On Wed, Aug 28, 2019 at 06:45:07PM +0000, Christopher Lameter wrote:
> >>> I still think implicit exceptions to alignments are a bad idea. Those need
> >>> to be explicity specified and that is possible using kmem_cache_create().
> >>
> >> I swear we covered this last time the topic came up, but XFS would need
> >> to create special slab caches for each size between 512 and PAGE_SIZE.
> >> Potentially larger, depending on whether the MM developers are willing to
> >> guarantee that kmalloc(PAGE_SIZE * 2, GFP_KERNEL) will return a PAGE_SIZE
> >> aligned block of memory indefinitely.
> > 
> > Page size alignment of multi-page heap allocations is ncessary. The
> > current behaviour w/ KASAN is to offset so a 8KB allocation spans 3
> > pages and is not page aligned. That causes just as much in way
> > of alignment problems as unaligned objects in multi-object-per-page
> > slabs.
> 
> Ugh, multi-page (power of two) allocations *at the page allocator level*
> simply have to be aligned, as that's how the buddy allocator has always
> worked, and it would be madness to try relax that guarantee and require
> an explicit flag at this point. The kmalloc wrapper with SLUB will pass
> everything above 8KB directly to the page allocator, so that's fine too.
> 4k and 8k are the only (multi-)page sizes still managed as SLUB objects.

On a 4kB page size box, yes.

On a 64kB page size system, 4/8kB allocations are still sub-page
objects and will have alignment issues. Hence right now we can't
assume a 4/8/16/32kB allocation will be page size aligned
anywhere, because they are heap allocations on 64kB page sized
machines.

> I would say that these sizes are the most striking example that it's
> wrong not to align them without extra flags or special API variant.

Yup, just pointing out that they aren't guaranteed alignment right
now on x86-64.

> > As I said in the lastest discussion of this problem on XFS (pmem
> > devices w/ KASAN enabled), all we -need- is a GFP flag that tells the
> > slab allocator to give us naturally aligned object or fail if it
> > can't. I don't care how that gets implemented (e.g. another set of
> > heap slabs like the -rcl slabs), I just don't want every high level
> 
> Given alignment is orthogonal to -rcl and dma-, would that be another
> three sets? Or we assume that dma- would want it always, and complicate
> the rules further? Funilly enough, SLOB would be the simplest case here.

Not my problem. :) All I'm pointing out is that the minimum
functionality we require is specifying individual allocations as
needing alignment. I've just implemented that API in XFS, so
whatever happens in the allocation infrastructure from this point
onwards is really just implementation optimisation for us now....

Cheers,

Dave.
Christoph Lameter (Ampere) Aug. 30, 2019, 5:41 p.m. UTC | #7
On Thu, 29 Aug 2019, Michal Hocko wrote:

> > There are many places in the kernel which assume alignment.  They break
> > when it's not supplied.  I believe we have a better overall system if
> > the MM developers provide stronger guarantees than the MM consumers have
> > to work around only weak guarantees.
>
> I absolutely agree. A hypothetical benefit of a new implementation
> doesn't outweigh the complexity the existing code has to jump over or
> worse is not aware of and it is broken silently. My general experience
> is that the later is more likely with a large variety of drivers we have
> in the tree and odd things they do in general.


The current behavior without special alignment for these caches has been
in the wild for over a decade. And this is now coming up?

There is one case now it seems with a broken hardware that has issues and
we now move to an alignment requirement from the slabs with exceptions and
this and that?

If there is an exceptional alignment requirement then that needs to be
communicated to the allocator. A special flag or create a special
kmem_cache or something.
Matthew Wilcox (Oracle) Sept. 1, 2019, 12:52 a.m. UTC | #8
On Fri, Aug 30, 2019 at 05:41:46PM +0000, Christopher Lameter wrote:
> On Thu, 29 Aug 2019, Michal Hocko wrote:
> > > There are many places in the kernel which assume alignment.  They break
> > > when it's not supplied.  I believe we have a better overall system if
> > > the MM developers provide stronger guarantees than the MM consumers have
> > > to work around only weak guarantees.
> >
> > I absolutely agree. A hypothetical benefit of a new implementation
> > doesn't outweigh the complexity the existing code has to jump over or
> > worse is not aware of and it is broken silently. My general experience
> > is that the later is more likely with a large variety of drivers we have
> > in the tree and odd things they do in general.
> 
> The current behavior without special alignment for these caches has been
> in the wild for over a decade. And this is now coming up?

In the wild ... and rarely enabled.  When it is enabled, it may or may
not be noticed as data corruption, or tripping other debugging asserts.
Users then turn off the rare debugging option.

> There is one case now it seems with a broken hardware that has issues and
> we now move to an alignment requirement from the slabs with exceptions and
> this and that?

Perhaps you could try reading what hasa been written instead of sticking
to a strawman of your own invention?

> If there is an exceptional alignment requirement then that needs to be
> communicated to the allocator. A special flag or create a special
> kmem_cache or something.

The only way I'd agree to that is if we deliberately misalign every
allocation that doesn't have this special flag set.  Because right now,
breakage happens everywhere when these debug options are enabled, and
the very people who need to be helped are being hurt by the debugging.
Christoph Lameter (Ampere) Sept. 3, 2019, 8:13 p.m. UTC | #9
On Sat, 31 Aug 2019, Matthew Wilcox wrote:

> > The current behavior without special alignment for these caches has been
> > in the wild for over a decade. And this is now coming up?
>
> In the wild ... and rarely enabled.  When it is enabled, it may or may
> not be noticed as data corruption, or tripping other debugging asserts.
> Users then turn off the rare debugging option.

Its enabled in all full debug session as far as I know. Fedora for
example has been running this for ages to find breakage in device drivers
etc etc.

> > If there is an exceptional alignment requirement then that needs to be
> > communicated to the allocator. A special flag or create a special
> > kmem_cache or something.
>
> The only way I'd agree to that is if we deliberately misalign every
> allocation that doesn't have this special flag set.  Because right now,
> breakage happens everywhere when these debug options are enabled, and
> the very people who need to be helped are being hurt by the debugging.

That is customarily occurring for testing by adding "slub_debug" to the
kernel commandline (or adding debug kernel options) and since my
information is that this is done frequently (and has been for over a
decade now) I am having a hard time believing the stories of great
breakage here. These drivers were not tested with debugging on before?
Never ran with a debug kernel?
Matthew Wilcox (Oracle) Sept. 3, 2019, 8:53 p.m. UTC | #10
On Tue, Sep 03, 2019 at 08:13:45PM +0000, Christopher Lameter wrote:
> On Sat, 31 Aug 2019, Matthew Wilcox wrote:
> 
> > > The current behavior without special alignment for these caches has been
> > > in the wild for over a decade. And this is now coming up?
> >
> > In the wild ... and rarely enabled.  When it is enabled, it may or may
> > not be noticed as data corruption, or tripping other debugging asserts.
> > Users then turn off the rare debugging option.
> 
> Its enabled in all full debug session as far as I know. Fedora for
> example has been running this for ages to find breakage in device drivers
> etc etc.

Are you telling me nobody uses the ramdisk driver on fedora?  Because
that's one of the affected drivers.

> > > If there is an exceptional alignment requirement then that needs to be
> > > communicated to the allocator. A special flag or create a special
> > > kmem_cache or something.
> >
> > The only way I'd agree to that is if we deliberately misalign every
> > allocation that doesn't have this special flag set.  Because right now,
> > breakage happens everywhere when these debug options are enabled, and
> > the very people who need to be helped are being hurt by the debugging.
> 
> That is customarily occurring for testing by adding "slub_debug" to the
> kernel commandline (or adding debug kernel options) and since my
> information is that this is done frequently (and has been for over a
> decade now) I am having a hard time believing the stories of great
> breakage here. These drivers were not tested with debugging on before?
> Never ran with a debug kernel?

Whatever is being done is clearly not enough to trigger the bug.  So how
about it?  Create an option to slab/slub to always return misaligned
memory.
Christoph Hellwig Sept. 4, 2019, 5:19 a.m. UTC | #11
On Tue, Sep 03, 2019 at 01:53:12PM -0700, Matthew Wilcox wrote:
> > Its enabled in all full debug session as far as I know. Fedora for
> > example has been running this for ages to find breakage in device drivers
> > etc etc.
> 
> Are you telling me nobody uses the ramdisk driver on fedora?  Because
> that's one of the affected drivers.

For pmem/brd misaligned memory alone doesn't seem to be the problem.
Misaligned memory that cross a page barrier is.  And at least XFS
before my log recovery changes only used kmalloc for smaller than
page size allocation, so this case probably didn't hit.  But other
cases where alignment and not just not crossing a page boundary
occurred and we had problems with those before.  It just too a long
time for people to root cause them.
Ming Lei Sept. 4, 2019, 6:40 a.m. UTC | #12
On Wed, Sep 04, 2019 at 07:19:33AM +0200, Christoph Hellwig wrote:
> On Tue, Sep 03, 2019 at 01:53:12PM -0700, Matthew Wilcox wrote:
> > > Its enabled in all full debug session as far as I know. Fedora for
> > > example has been running this for ages to find breakage in device drivers
> > > etc etc.
> > 
> > Are you telling me nobody uses the ramdisk driver on fedora?  Because
> > that's one of the affected drivers.
> 
> For pmem/brd misaligned memory alone doesn't seem to be the problem.
> Misaligned memory that cross a page barrier is.  And at least XFS
> before my log recovery changes only used kmalloc for smaller than
> page size allocation, so this case probably didn't hit.

BTW, does sl[aou]b guarantee that smaller than page size allocation via kmalloc()
won't cross page boundary any time?

Thanks,
Ming
Vlastimil Babka Sept. 4, 2019, 7:20 a.m. UTC | #13
On 9/4/19 8:40 AM, Ming Lei wrote:
> On Wed, Sep 04, 2019 at 07:19:33AM +0200, Christoph Hellwig wrote:
>> On Tue, Sep 03, 2019 at 01:53:12PM -0700, Matthew Wilcox wrote:
>>>> Its enabled in all full debug session as far as I know. Fedora for
>>>> example has been running this for ages to find breakage in device drivers
>>>> etc etc.
>>>
>>> Are you telling me nobody uses the ramdisk driver on fedora?  Because
>>> that's one of the affected drivers.
>>
>> For pmem/brd misaligned memory alone doesn't seem to be the problem.
>> Misaligned memory that cross a page barrier is.  And at least XFS
>> before my log recovery changes only used kmalloc for smaller than
>> page size allocation, so this case probably didn't hit.
> 
> BTW, does sl[aou]b guarantee that smaller than page size allocation via kmalloc()
> won't cross page boundary any time?

AFAIK no, it's the same as with the alignment. After this patch, that
guarantee would follow from the alignment one for power of two sizes,
but sizes 96 and 192 could still cross page boundary.

> Thanks,
> Ming
>
Christoph Lameter (Ampere) Sept. 4, 2019, 7:31 p.m. UTC | #14
On Tue, 3 Sep 2019, Matthew Wilcox wrote

> > Its enabled in all full debug session as far as I know. Fedora for
> > example has been running this for ages to find breakage in device drivers
> > etc etc.
>
> Are you telling me nobody uses the ramdisk driver on fedora?  Because
> that's one of the affected drivers.

How do I know? I dont run these tests.

> > decade now) I am having a hard time believing the stories of great
> > breakage here. These drivers were not tested with debugging on before?
> > Never ran with a debug kernel?
>
> Whatever is being done is clearly not enough to trigger the bug.  So how
> about it?  Create an option to slab/slub to always return misaligned
> memory.

It already exists

Add "slub_debug" on the kernel command line.
Vlastimil Babka Sept. 23, 2019, 4:36 p.m. UTC | #15
On 8/26/19 1:16 PM, Vlastimil Babka wrote:
> In most configurations, kmalloc() happens to return naturally aligned (i.e.
> aligned to the block size itself) blocks for power of two sizes. That means
> some kmalloc() users might unknowingly rely on that alignment, until stuff
> breaks when the kernel is built with e.g.  CONFIG_SLUB_DEBUG or CONFIG_SLOB,
> and blocks stop being aligned. Then developers have to devise workaround such
> as own kmem caches with specified alignment [1], which is not always practical,
> as recently evidenced in [2].
> 
> The topic has been discussed at LSF/MM 2019 [3]. Adding a 'kmalloc_aligned()'
> variant would not help with code unknowingly relying on the implicit alignment.
> For slab implementations it would either require creating more kmalloc caches,
> or allocate a larger size and only give back part of it. That would be
> wasteful, especially with a generic alignment parameter (in contrast with a
> fixed alignment to size).
> 
> Ideally we should provide to mm users what they need without difficult
> workarounds or own reimplementations, so let's make the kmalloc() alignment to
> size explicitly guaranteed for power-of-two sizes under all configurations.
> What this means for the three available allocators?
> 
> * SLAB object layout happens to be mostly unchanged by the patch. The
>   implicitly provided alignment could be compromised with CONFIG_DEBUG_SLAB due
>   to redzoning, however SLAB disables redzoning for caches with alignment
>   larger than unsigned long long. Practically on at least x86 this includes
>   kmalloc caches as they use cache line alignment, which is larger than that.
>   Still, this patch ensures alignment on all arches and cache sizes.
> 
> * SLUB layout is also unchanged unless redzoning is enabled through
>   CONFIG_SLUB_DEBUG and boot parameter for the particular kmalloc cache. With
>   this patch, explicit alignment is guaranteed with redzoning as well. This
>   will result in more memory being wasted, but that should be acceptable in a
>   debugging scenario.
> 
> * SLOB has no implicit alignment so this patch adds it explicitly for
>   kmalloc(). The potential downside is increased fragmentation. While
>   pathological allocation scenarios are certainly possible, in my testing,
>   after booting a x86_64 kernel+userspace with virtme, around 16MB memory
>   was consumed by slab pages both before and after the patch, with difference
>   in the noise.
> 
> [1] https://lore.kernel.org/linux-btrfs/c3157c8e8e0e7588312b40c853f65c02fe6c957a.1566399731.git.christophe.leroy@c-s.fr/
> [2] https://lore.kernel.org/linux-fsdevel/20190225040904.5557-1-ming.lei@redhat.com/
> [3] https://lwn.net/Articles/787740/
> 
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>

So if anyone thinks this is a good idea, please express it (preferably
in a formal way such as Acked-by), otherwise it seems the patch will be
dropped (due to a private NACK, apparently).

Otherwise I don't think there can be an objective conclusion. On the one
hand we avoid further problems and workarounds due to misalignment (or
objects allocated beyond page boundary, which was only recently
mentioned), on the other hand we potentially make future changes to
SLAB/SLUB or hypotetical new implementation either more complicated, or
less effective due to extra fragmentation. Different people can have
different opinions on what's more important.

Let me however explain why I think we don't have to fear the future
implementation complications that much. There was an argument IIRC that
extra non-debug metadata could start to be prepended/appended to an
object in the future (i.e. RCU freeing head?).

1) Caches can be already created with explicit alignment, so a naive
pre/appending implementation would already waste memory on such caches.
2) Even without explicit alignment, a single slab cache for 512k objects
with few bytes added to each object would waste almost 512k as the
objects wouldn't fit precisely in an (order-X) page. The percentage
wasted depends on X.
3) Roman recently posted a patchset [1] that basically adds a cgroup
pointer to each object. The implementation doesn't append it to objects
naively however, but adds a separately allocated array. Alignment is
thus unchanged.

[1] https://lore.kernel.org/linux-mm/20190905214553.1643060-1-guro@fb.com/
David Sterba Sept. 23, 2019, 5:17 p.m. UTC | #16
On Mon, Sep 23, 2019 at 06:36:32PM +0200, Vlastimil Babka wrote:
> So if anyone thinks this is a good idea, please express it (preferably
> in a formal way such as Acked-by), otherwise it seems the patch will be
> dropped (due to a private NACK, apparently).

As a user of the allocator interface in filesystem, I'd like to see a
more generic way to address the alignment guarantees so we don't have to
apply workarounds like 3acd48507dc43eeeb each time we find that we
missed something. (Where 'missed' might be another sort of weird memory
corruption hard to trigger.)

The workaround got applied because I was not sure about the timeframe of
merge of this patch, also to remove pressure for merge in case there are
more private acks and nacks to be sent. In the end I'd be fine with
reverting the workaround in order to use the generic code again.

Thanks.
Darrick J. Wong Sept. 23, 2019, 5:51 p.m. UTC | #17
On Mon, Sep 23, 2019 at 07:17:10PM +0200, David Sterba wrote:
> On Mon, Sep 23, 2019 at 06:36:32PM +0200, Vlastimil Babka wrote:
> > So if anyone thinks this is a good idea, please express it (preferably
> > in a formal way such as Acked-by), otherwise it seems the patch will be
> > dropped (due to a private NACK, apparently).

Oh, I didn't realize  ^^^^^^^^^^^^ that *some* of us are allowed the
privilege of gutting a patch via private NAK without any of that open
development discussion incovenience. <grumble>

As far as XFS is concerned I merged Dave's series that checks the
alignment of io memory allocations and falls back to vmalloc if the
alignment won't work, because I got tired of scrolling past the endless
discussion and bug reports and inaction spanning months.

Now this private NAK stuff helps me feel vindicated for merging it
despite my misgivings because now I can declare that "XFS will just work
around all the stupid broken sh*t it finds in the rest of the kernel".

--D

> As a user of the allocator interface in filesystem, I'd like to see a
> more generic way to address the alignment guarantees so we don't have to
> apply workarounds like 3acd48507dc43eeeb each time we find that we
> missed something. (Where 'missed' might be another sort of weird memory
> corruption hard to trigger.)
> 
> The workaround got applied because I was not sure about the timeframe of
> merge of this patch, also to remove pressure for merge in case there are
> more private acks and nacks to be sent. In the end I'd be fine with
> reverting the workaround in order to use the generic code again.
> 
> Thanks.
Roman Gushchin Sept. 23, 2019, 5:54 p.m. UTC | #18
On Mon, Sep 23, 2019 at 06:36:32PM +0200, Vlastimil Babka wrote:
> On 8/26/19 1:16 PM, Vlastimil Babka wrote:
> > In most configurations, kmalloc() happens to return naturally aligned (i.e.
> > aligned to the block size itself) blocks for power of two sizes. That means
> > some kmalloc() users might unknowingly rely on that alignment, until stuff
> > breaks when the kernel is built with e.g.  CONFIG_SLUB_DEBUG or CONFIG_SLOB,
> > and blocks stop being aligned. Then developers have to devise workaround such
> > as own kmem caches with specified alignment [1], which is not always practical,
> > as recently evidenced in [2].
> > 
> > The topic has been discussed at LSF/MM 2019 [3]. Adding a 'kmalloc_aligned()'
> > variant would not help with code unknowingly relying on the implicit alignment.
> > For slab implementations it would either require creating more kmalloc caches,
> > or allocate a larger size and only give back part of it. That would be
> > wasteful, especially with a generic alignment parameter (in contrast with a
> > fixed alignment to size).
> > 
> > Ideally we should provide to mm users what they need without difficult
> > workarounds or own reimplementations, so let's make the kmalloc() alignment to
> > size explicitly guaranteed for power-of-two sizes under all configurations.
> > What this means for the three available allocators?
> > 
> > * SLAB object layout happens to be mostly unchanged by the patch. The
> >   implicitly provided alignment could be compromised with CONFIG_DEBUG_SLAB due
> >   to redzoning, however SLAB disables redzoning for caches with alignment
> >   larger than unsigned long long. Practically on at least x86 this includes
> >   kmalloc caches as they use cache line alignment, which is larger than that.
> >   Still, this patch ensures alignment on all arches and cache sizes.
> > 
> > * SLUB layout is also unchanged unless redzoning is enabled through
> >   CONFIG_SLUB_DEBUG and boot parameter for the particular kmalloc cache. With
> >   this patch, explicit alignment is guaranteed with redzoning as well. This
> >   will result in more memory being wasted, but that should be acceptable in a
> >   debugging scenario.
> > 
> > * SLOB has no implicit alignment so this patch adds it explicitly for
> >   kmalloc(). The potential downside is increased fragmentation. While
> >   pathological allocation scenarios are certainly possible, in my testing,
> >   after booting a x86_64 kernel+userspace with virtme, around 16MB memory
> >   was consumed by slab pages both before and after the patch, with difference
> >   in the noise.
> > 
> > [1] https://lore.kernel.org/linux-btrfs/c3157c8e8e0e7588312b40c853f65c02fe6c957a.1566399731.git.christophe.leroy@c-s.fr/
> > [2] https://lore.kernel.org/linux-fsdevel/20190225040904.5557-1-ming.lei@redhat.com/
> > [3] https://urldefense.proofpoint.com/v2/url?u=https-3A__lwn.net_Articles_787740_&d=DwICaQ&c=5VD0RTtNlTh3ycd41b3MUw&r=jJYgtDM7QT-W-Fz_d29HYQ&m=UUX1YoPTOOycNowHuRP2ZnqwSwZFjAFrkQFrqstidZ0&s=Kt_XTKlh2qxbC_7ME44MV3_QWFVRHlI1p2EQZYP0uqY&e= 
> > 
> > Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> 
> So if anyone thinks this is a good idea, please express it (preferably
> in a formal way such as Acked-by), otherwise it seems the patch will be
> dropped (due to a private NACK, apparently).
> 
> Otherwise I don't think there can be an objective conclusion. On the one
> hand we avoid further problems and workarounds due to misalignment (or
> objects allocated beyond page boundary, which was only recently
> mentioned), on the other hand we potentially make future changes to
> SLAB/SLUB or hypotetical new implementation either more complicated, or
> less effective due to extra fragmentation. Different people can have
> different opinions on what's more important.
> 
> Let me however explain why I think we don't have to fear the future
> implementation complications that much. There was an argument IIRC that
> extra non-debug metadata could start to be prepended/appended to an
> object in the future (i.e. RCU freeing head?).
> 
> 1) Caches can be already created with explicit alignment, so a naive
> pre/appending implementation would already waste memory on such caches.
> 2) Even without explicit alignment, a single slab cache for 512k objects
> with few bytes added to each object would waste almost 512k as the
> objects wouldn't fit precisely in an (order-X) page. The percentage
> wasted depends on X.
> 3) Roman recently posted a patchset [1] that basically adds a cgroup
> pointer to each object. The implementation doesn't append it to objects
> naively however, but adds a separately allocated array. Alignment is
> thus unchanged.

To be fair here, we *might* want to put this pointer just after/before
the object to reduce the number of cache misses. I've put it into
a separate place mostly for simplicity reasons.

It's not an objection though, just a note.

Thanks!

> 
> [1] https://lore.kernel.org/linux-mm/20190905214553.1643060-1-guro@fb.com/
>
Matthew Wilcox (Oracle) Sept. 23, 2019, 6:57 p.m. UTC | #19
On Mon, Aug 26, 2019 at 01:16:27PM +0200, Vlastimil Babka wrote:
> @@ -98,6 +98,10 @@ limited. The actual limit depends on the hardware and the kernel
>  configuration, but it is a good practice to use `kmalloc` for objects
>  smaller than page size.
>  
> +The address of a chunk allocated with `kmalloc` is aligned to at least
> +ARCH_KMALLOC_MINALIGN bytes. For sizes of power of two bytes, the

"For sizes which are a power of two, the"

> +alignment is also guaranteed to be at least to the respective size.

s/to //

Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Christoph Lameter (Ampere) Sept. 24, 2019, 8:47 p.m. UTC | #20
On Mon, 23 Sep 2019, Darrick J. Wong wrote:

> On Mon, Sep 23, 2019 at 07:17:10PM +0200, David Sterba wrote:
> > On Mon, Sep 23, 2019 at 06:36:32PM +0200, Vlastimil Babka wrote:
> > > So if anyone thinks this is a good idea, please express it (preferably
> > > in a formal way such as Acked-by), otherwise it seems the patch will be
> > > dropped (due to a private NACK, apparently).
>
> Oh, I didn't realize  ^^^^^^^^^^^^ that *some* of us are allowed the
> privilege of gutting a patch via private NAK without any of that open
> development discussion incovenience. <grumble>

There was a public discussion about this issue and from what I can tell
the outcome was that the allocator already provides what you want. Which
was a mechanism to misalign objects and detect these issues. This
mechanism has been in use for over a decade.
Matthew Wilcox (Oracle) Sept. 24, 2019, 8:51 p.m. UTC | #21
On Tue, Sep 24, 2019 at 08:47:52PM +0000, cl@linux.com wrote:
> On Mon, 23 Sep 2019, Darrick J. Wong wrote:
> 
> > On Mon, Sep 23, 2019 at 07:17:10PM +0200, David Sterba wrote:
> > > On Mon, Sep 23, 2019 at 06:36:32PM +0200, Vlastimil Babka wrote:
> > > > So if anyone thinks this is a good idea, please express it (preferably
> > > > in a formal way such as Acked-by), otherwise it seems the patch will be
> > > > dropped (due to a private NACK, apparently).
> >
> > Oh, I didn't realize  ^^^^^^^^^^^^ that *some* of us are allowed the
> > privilege of gutting a patch via private NAK without any of that open
> > development discussion incovenience. <grumble>
> 
> There was a public discussion about this issue and from what I can tell
> the outcome was that the allocator already provides what you want. Which
> was a mechanism to misalign objects and detect these issues. This
> mechanism has been in use for over a decade.

You missed the important part, which was *ENABLED BY DEFAULT*.  People
who are enabling a debugging option to debug their issues, should not
have to first debug all the other issues that enabling that debugging
option uncovers!
Christoph Lameter (Ampere) Sept. 24, 2019, 8:52 p.m. UTC | #22
On Mon, 23 Sep 2019, David Sterba wrote:

> As a user of the allocator interface in filesystem, I'd like to see a
> more generic way to address the alignment guarantees so we don't have to
> apply workarounds like 3acd48507dc43eeeb each time we find that we
> missed something. (Where 'missed' might be another sort of weird memory
> corruption hard to trigger.)

The alignment guarantees are clearly documented and objects are misaligned
in debugging kernels.

Looking at 3acd48507dc43eeeb:Looks like no one tested that patch with a
debug kernel or full debugging on until it hit mainline. Not good.

The consequence for the lack of proper testing is to make the production
kernel contain the debug measures?
Christoph Lameter (Ampere) Sept. 24, 2019, 8:55 p.m. UTC | #23
n Tue, 24 Sep 2019, Matthew Wilcox wrote:

> > There was a public discussion about this issue and from what I can tell
> > the outcome was that the allocator already provides what you want. Which
> > was a mechanism to misalign objects and detect these issues. This
> > mechanism has been in use for over a decade.
>
> You missed the important part, which was *ENABLED BY DEFAULT*.  People
> who are enabling a debugging option to debug their issues, should not
> have to first debug all the other issues that enabling that debugging
> option uncovers!

Why would you have to debug all other issues? You could put your patch on
top of the latest stable or distro kernel for testing.

And I thought the rc phase was there for everyone to work on the bugs of
each other?
Vlastimil Babka Sept. 24, 2019, 9:19 p.m. UTC | #24
On 9/23/19 7:51 PM, Darrick J. Wong wrote:
> On Mon, Sep 23, 2019 at 07:17:10PM +0200, David Sterba wrote:
>> On Mon, Sep 23, 2019 at 06:36:32PM +0200, Vlastimil Babka wrote:
>>> So if anyone thinks this is a good idea, please express it (preferably
>>> in a formal way such as Acked-by), otherwise it seems the patch will be
>>> dropped (due to a private NACK, apparently).
> 
> Oh, I didn't realize  ^^^^^^^^^^^^ that *some* of us are allowed the
> privilege of gutting a patch via private NAK without any of that open
> development discussion incovenience. <grumble>
> 
> As far as XFS is concerned I merged Dave's series that checks the
> alignment of io memory allocations and falls back to vmalloc if the
> alignment won't work, because I got tired of scrolling past the endless
> discussion and bug reports and inaction spanning months.

I think it's a big fail of kmalloc API that you have to do that, and
especially with vmalloc, which has the overhead of setting up page
tables, and it's a waste for allocation requests smaller than page size.
I wish we could have nice things.
Dave Chinner Sept. 24, 2019, 9:53 p.m. UTC | #25
On Tue, Sep 24, 2019 at 11:19:29PM +0200, Vlastimil Babka wrote:
> On 9/23/19 7:51 PM, Darrick J. Wong wrote:
> > On Mon, Sep 23, 2019 at 07:17:10PM +0200, David Sterba wrote:
> >> On Mon, Sep 23, 2019 at 06:36:32PM +0200, Vlastimil Babka wrote:
> >>> So if anyone thinks this is a good idea, please express it (preferably
> >>> in a formal way such as Acked-by), otherwise it seems the patch will be
> >>> dropped (due to a private NACK, apparently).
> > 
> > Oh, I didn't realize  ^^^^^^^^^^^^ that *some* of us are allowed the
> > privilege of gutting a patch via private NAK without any of that open
> > development discussion incovenience. <grumble>
> > 
> > As far as XFS is concerned I merged Dave's series that checks the
> > alignment of io memory allocations and falls back to vmalloc if the
> > alignment won't work, because I got tired of scrolling past the endless
> > discussion and bug reports and inaction spanning months.
> 
> I think it's a big fail of kmalloc API that you have to do that, and
> especially with vmalloc, which has the overhead of setting up page
> tables, and it's a waste for allocation requests smaller than page size.
> I wish we could have nice things.

I don't think the problem here is the code. The problem here is that
we have a dysfunctional development community and there are no
processes we can follow to ensure architectural problems in core
subsystems are addressed in a timely manner...

And this criticism isn't just of the mm/ here - this alignment
problem is exacerbated by exactly the same issue on the block layer
side. i.e. the block layer and drivers have -zero- bounds checking
to catch these sorts of things and the block layer maintainer will
not accept patches for runtime checks that would catch these issues
and make them instantly visible to us.

These are not code problems: we can fix the problems with code (and
I have done so to demonstrate "this is how we do what you say is
impossible").  The problem here is people in positions of
control/power are repeatedly demonstrating an inability to
compromise to reach a solution that works for everyone.

It's far better for us just to work around bullshit like this in XFS
now, then when the core subsystems get they act together years down
the track we can remove the workaround from XFS. Users don't care
how we fix the problem, they just want it fixed. If that means we
have to route around dysfunctional developer groups, then we'll just
have to do that....

Cheers,

Dave.
Darrick J. Wong Sept. 24, 2019, 10:21 p.m. UTC | #26
On Wed, Sep 25, 2019 at 07:53:53AM +1000, Dave Chinner wrote:
> On Tue, Sep 24, 2019 at 11:19:29PM +0200, Vlastimil Babka wrote:
> > On 9/23/19 7:51 PM, Darrick J. Wong wrote:
> > > On Mon, Sep 23, 2019 at 07:17:10PM +0200, David Sterba wrote:
> > >> On Mon, Sep 23, 2019 at 06:36:32PM +0200, Vlastimil Babka wrote:
> > >>> So if anyone thinks this is a good idea, please express it (preferably
> > >>> in a formal way such as Acked-by), otherwise it seems the patch will be
> > >>> dropped (due to a private NACK, apparently).
> > > 
> > > Oh, I didn't realize  ^^^^^^^^^^^^ that *some* of us are allowed the
> > > privilege of gutting a patch via private NAK without any of that open
> > > development discussion incovenience. <grumble>
> > > 
> > > As far as XFS is concerned I merged Dave's series that checks the
> > > alignment of io memory allocations and falls back to vmalloc if the
> > > alignment won't work, because I got tired of scrolling past the endless
> > > discussion and bug reports and inaction spanning months.
> > 
> > I think it's a big fail of kmalloc API that you have to do that, and
> > especially with vmalloc, which has the overhead of setting up page
> > tables, and it's a waste for allocation requests smaller than page size.
> > I wish we could have nice things.
> 
> I don't think the problem here is the code. The problem here is that
> we have a dysfunctional development community and there are no
> processes we can follow to ensure architectural problems in core
> subsystems are addressed in a timely manner...
> 
> And this criticism isn't just of the mm/ here - this alignment
> problem is exacerbated by exactly the same issue on the block layer
> side. i.e. the block layer and drivers have -zero- bounds checking
> to catch these sorts of things and the block layer maintainer will
> not accept patches for runtime checks that would catch these issues
> and make them instantly visible to us.
> 
> These are not code problems: we can fix the problems with code (and
> I have done so to demonstrate "this is how we do what you say is
> impossible").  The problem here is people in positions of
> control/power are repeatedly demonstrating an inability to
> compromise to reach a solution that works for everyone.
> 
> It's far better for us just to work around bullshit like this in XFS
> now, then when the core subsystems get they act together years down
> the track we can remove the workaround from XFS. Users don't care
> how we fix the problem, they just want it fixed. If that means we
> have to route around dysfunctional developer groups, then we'll just
> have to do that....

Seconded.

--D

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
Andrew Morton Sept. 24, 2019, 11:54 p.m. UTC | #27
On Tue, 24 Sep 2019 20:52:52 +0000 (UTC) cl@linux.com wrote:

> On Mon, 23 Sep 2019, David Sterba wrote:
> 
> > As a user of the allocator interface in filesystem, I'd like to see a
> > more generic way to address the alignment guarantees so we don't have to
> > apply workarounds like 3acd48507dc43eeeb each time we find that we
> > missed something. (Where 'missed' might be another sort of weird memory
> > corruption hard to trigger.)
> 
> The alignment guarantees are clearly documented and objects are misaligned
> in debugging kernels.
> 
> Looking at 3acd48507dc43eeeb:Looks like no one tested that patch with a
> debug kernel or full debugging on until it hit mainline. Not good.
> 
> The consequence for the lack of proper testing is to make the production
> kernel contain the debug measures?

This isn't a debug measure - it's making the interface do that which
people evidently expect it to do.  Minor point.

I agree it's a bit regrettable to do this but it does appear that the
change will make the kernel overall a better place given the reality of
kernel development.

Given this, have you reviewed the patch for overall implementation
correctness?

I'm wondering if we can avoid at least some of the patch's overhead if
slab debugging is disabled - the allocators are already returning
suitably aligned memory, so why add the new code in that case?
Vlastimil Babka Sept. 25, 2019, 7:17 a.m. UTC | #28
On 9/25/19 1:54 AM, Andrew Morton wrote:
> On Tue, 24 Sep 2019 20:52:52 +0000 (UTC) cl@linux.com wrote:
> 
>> On Mon, 23 Sep 2019, David Sterba wrote:
>>
>>> As a user of the allocator interface in filesystem, I'd like to see a
>>> more generic way to address the alignment guarantees so we don't have to
>>> apply workarounds like 3acd48507dc43eeeb each time we find that we
>>> missed something. (Where 'missed' might be another sort of weird memory
>>> corruption hard to trigger.)
>>
>> The alignment guarantees are clearly documented and objects are misaligned
>> in debugging kernels.
>>
>> Looking at 3acd48507dc43eeeb:Looks like no one tested that patch with a
>> debug kernel or full debugging on until it hit mainline. Not good.
>>
>> The consequence for the lack of proper testing is to make the production
>> kernel contain the debug measures?
> 
> This isn't a debug measure - it's making the interface do that which
> people evidently expect it to do.  Minor point.

Yes, detecting issues due to misalignment is one thing, but then there
are the workarounds necessary to achieve it (for multiple sizes, so no
single kmem_cache_create(..., alignment)), as XFS folks demonstrated.

> I agree it's a bit regrettable to do this but it does appear that the
> change will make the kernel overall a better place given the reality of
> kernel development.

Thanks.

> Given this, have you reviewed the patch for overall implementation
> correctness?
> 
> I'm wondering if we can avoid at least some of the patch's overhead if
> slab debugging is disabled - the allocators are already returning
> suitably aligned memory, so why add the new code in that case?

Most of the new code is for SLOB, which has no debugging and yet
misaligns. For SLUB and SLAB, it's just passing alignment argument to
kmem_cache_create() for kmalloc caches, which means just extra few
instructions during boot, and no extra code during kmalloc/kfree itself.
Christoph Lameter (Ampere) Sept. 26, 2019, 12:14 a.m. UTC | #29
On Tue, 24 Sep 2019, Andrew Morton wrote:

> I agree it's a bit regrettable to do this but it does appear that the
> change will make the kernel overall a better place given the reality of
> kernel development.

No it wont.

- It will only work for special cases like the kmalloc array
without extras like metadata at the end of objects.

- It will be an inconsistency in the alignments provided by the allocator.

- It will cause us in the future to constantly consider these exceptional
alignments in the maintenance of the allocators.

- These alignments are only needed in exceptional cases but with the patch
we will provide the alignment by default even if the allocating subsystem
does not need it.

- We have mechanisms to detect alignment problems using debug kernels and
debug options that have been available for years. These were not used for
testing in these cases it seems before the patches hit mainline. Once in
mainly someone ran a debug kernel and found the issue.

> Given this, have you reviewed the patch for overall implementation
> correctness?

Yes, the patch is fine.

> I'm wondering if we can avoid at least some of the patch's overhead if
> slab debugging is disabled - the allocators are already returning
> suitably aligned memory, so why add the new code in that case?

As far as I know this patch is not needed given that we have had the
standards for alignments for a long time now.

Why would the allocators provide specially aligned memory just based on
the size of an object? This is weird and unexpected behavior.
Christoph Lameter (Ampere) Sept. 26, 2019, 12:16 a.m. UTC | #30
On Wed, 25 Sep 2019, Vlastimil Babka wrote:

> Most of the new code is for SLOB, which has no debugging and yet
> misaligns. For SLUB and SLAB, it's just passing alignment argument to
> kmem_cache_create() for kmalloc caches, which means just extra few
> instructions during boot, and no extra code during kmalloc/kfree itself.

SLOB follows the standards for alignments in slab allocators and will
correctly align if you ask the allocator for a properly aligned object.
Vlastimil Babka Sept. 26, 2019, 7:41 a.m. UTC | #31
On 9/26/19 2:14 AM, Christopher Lameter wrote:
> On Tue, 24 Sep 2019, Andrew Morton wrote:
> 
>> I agree it's a bit regrettable to do this but it does appear that the
>> change will make the kernel overall a better place given the reality of
>> kernel development.
> 
> No it wont.
> 
> - It will only work for special cases like the kmalloc array
> without extras like metadata at the end of objects.

I don't understand what you mean here? The kmalloc caches are special
because they don't have metadata at the end of objects? Others do?

> - It will be an inconsistency in the alignments provided by the allocator.

I don't see a scenario where this will cause a kmalloc user problems.
Can you describe a scenario where a kmalloc users would have some
assumptions about alignment, but due to this change, those assumptions
will be incorrect, and how exactly would it break their code?

> - It will cause us in the future to constantly consider these exceptional
> alignments in the maintenance of the allocators.

Caches can be already created with explicit alignment. This patch just
means there are more of them.

> - These alignments are only needed in exceptional cases but with the patch
> we will provide the alignment by default even if the allocating subsystem
> does not need it.

True. This is where we have to make the decision whether to make things
simpler for those that don't realize they need the alignment, and
whether that's worth the cost. We have evidence of those cases, and the
cost is currently zero in the common cases (SLAB, SLUB without debug
runtime-enabled).

> - We have mechanisms to detect alignment problems using debug kernels and
> debug options that have been available for years. These were not used for
> testing in these cases it seems before the patches hit mainline. Once in
> mainly someone ran a debug kernel and found the issue.

Debugging options are useful if you know there's a bug and you want to
find it. AFAIK the various bots/CIs that do e.g. randconfig, or enable
debug options explicitly, run those kernels in a VM, so I guess that's
why potential breakage due to alignment can lurk in a hw-specific driver.

>> Given this, have you reviewed the patch for overall implementation
>> correctness?
> 
> Yes, the patch is fine.
> 
>> I'm wondering if we can avoid at least some of the patch's overhead if
>> slab debugging is disabled - the allocators are already returning
>> suitably aligned memory, so why add the new code in that case?
> 
> As far as I know this patch is not needed given that we have had the
> standards for alignments for a long time now.
> 
> Why would the allocators provide specially aligned memory just based on
> the size of an object? This is weird and unexpected behavior.

For some, it's expected.
David Sterba Sept. 26, 2019, 1:02 p.m. UTC | #32
On Tue, Sep 24, 2019 at 08:55:02PM +0000, cl@linux.com wrote:
> n Tue, 24 Sep 2019, Matthew Wilcox wrote:
> 
> > > There was a public discussion about this issue and from what I can tell
> > > the outcome was that the allocator already provides what you want. Which
> > > was a mechanism to misalign objects and detect these issues. This
> > > mechanism has been in use for over a decade.
> >
> > You missed the important part, which was *ENABLED BY DEFAULT*.  People
> > who are enabling a debugging option to debug their issues, should not
> > have to first debug all the other issues that enabling that debugging
> > option uncovers!
> 
> Why would you have to debug all other issues? You could put your patch on
> top of the latest stable or distro kernel for testing.

This does not work in development branches. They're based on some stable
point but otherwise there's a lot of new code that usually has bugs and
it's quite important be able to understand where the bug comes from.

And the debugging instrumentation is there to add more sanity checks and
canaries to catch overflows, assertions etc. If it's unreliable, then
there's no point using it during development, IOW fix all bugs first and
then see if there are more left after turning the debugging.
Christoph Lameter (Ampere) Sept. 28, 2019, 1:12 a.m. UTC | #33
On Thu, 26 Sep 2019, Vlastimil Babka wrote:

> > - It will only work for special cases like the kmalloc array
> > without extras like metadata at the end of objects.
>
> I don't understand what you mean here? The kmalloc caches are special
> because they don't have metadata at the end of objects? Others do?

Yes.

> > - These alignments are only needed in exceptional cases but with the patch
> > we will provide the alignment by default even if the allocating subsystem
> > does not need it.
>
> True. This is where we have to make the decision whether to make things
> simpler for those that don't realize they need the alignment, and
> whether that's worth the cost. We have evidence of those cases, and the
> cost is currently zero in the common cases (SLAB, SLUB without debug
> runtime-enabled).

The cost is zero for a particular layout of the objects in a page using a
particular allocator and hardware configuration.

However, the layout may be different due to another allocator that prefers
to arrange things differently (SLOB puts multiple objects of different
types in the same page to save memory), if we need to add data to these
objects (debugging info, new metadata about the object, maybe the memcg
pointer, maybe other things that may come up), or other innovative
approaches (such as putting data of different kmem caches that are
commonly used together in the same page to improve locality).

The cost is an unnecessary petrification of the data layout of the memory
allocators.

> > - We have mechanisms to detect alignment problems using debug kernels and
> > debug options that have been available for years. These were not used for
> > testing in these cases it seems before the patches hit mainline. Once in
> > mainly someone ran a debug kernel and found the issue.
>
> Debugging options are useful if you know there's a bug and you want to
> find it. AFAIK the various bots/CIs that do e.g. randconfig, or enable
> debug options explicitly, run those kernels in a VM, so I guess that's
> why potential breakage due to alignment can lurk in a hw-specific driver.

That is not my experience. You need to run debugging to verify that a
patch does not cause locking problems, memory corruption etc etc. And
upstream code is tested by various people with debugging kernels so they
will locate the bugs that others introduce. This is usually not because
there was a focus on a particular bug. If you have a hw specific thing
that is not generally tested and skip the debugging tests well yes then we
have a problem.

What I have seen with developers is that they feel the debugging steps are
unnecessary for conveniences sake. I have seen build environments that had
proper steps for verification with a debug kernel. However, someone
disabled them "some months ago" and "nothing happened". Then strange
failures in production systems occur.
Christoph Hellwig Sept. 30, 2019, 8:06 a.m. UTC | #34
On Mon, Sep 23, 2019 at 06:36:32PM +0200, Vlastimil Babka wrote:
> So if anyone thinks this is a good idea, please express it (preferably
> in a formal way such as Acked-by), otherwise it seems the patch will be
> dropped (due to a private NACK, apparently).

I think we absolutely need something like this, and I'm sick and tired
of the people just claiming there is no problem.

From the user POV I don't care if aligned allocations need a new
GFP_ALIGNED flag or not, but as far as I can tell the latter will
probably cause more overhead in practice than not having it.

So unless someone comes up with a better counter proposal to provide
aligned kmalloc of some form that doesn't require a giant amount of
boilerplate code in the users:

Acked^2-by: Christoph Hellwig <hch@lst.de>
Michal Hocko Sept. 30, 2019, 9:23 a.m. UTC | #35
On Mon 23-09-19 18:36:32, Vlastimil Babka wrote:
> On 8/26/19 1:16 PM, Vlastimil Babka wrote:
> > In most configurations, kmalloc() happens to return naturally aligned (i.e.
> > aligned to the block size itself) blocks for power of two sizes. That means
> > some kmalloc() users might unknowingly rely on that alignment, until stuff
> > breaks when the kernel is built with e.g.  CONFIG_SLUB_DEBUG or CONFIG_SLOB,
> > and blocks stop being aligned. Then developers have to devise workaround such
> > as own kmem caches with specified alignment [1], which is not always practical,
> > as recently evidenced in [2].
> > 
> > The topic has been discussed at LSF/MM 2019 [3]. Adding a 'kmalloc_aligned()'
> > variant would not help with code unknowingly relying on the implicit alignment.
> > For slab implementations it would either require creating more kmalloc caches,
> > or allocate a larger size and only give back part of it. That would be
> > wasteful, especially with a generic alignment parameter (in contrast with a
> > fixed alignment to size).
> > 
> > Ideally we should provide to mm users what they need without difficult
> > workarounds or own reimplementations, so let's make the kmalloc() alignment to
> > size explicitly guaranteed for power-of-two sizes under all configurations.
> > What this means for the three available allocators?
> > 
> > * SLAB object layout happens to be mostly unchanged by the patch. The
> >   implicitly provided alignment could be compromised with CONFIG_DEBUG_SLAB due
> >   to redzoning, however SLAB disables redzoning for caches with alignment
> >   larger than unsigned long long. Practically on at least x86 this includes
> >   kmalloc caches as they use cache line alignment, which is larger than that.
> >   Still, this patch ensures alignment on all arches and cache sizes.
> > 
> > * SLUB layout is also unchanged unless redzoning is enabled through
> >   CONFIG_SLUB_DEBUG and boot parameter for the particular kmalloc cache. With
> >   this patch, explicit alignment is guaranteed with redzoning as well. This
> >   will result in more memory being wasted, but that should be acceptable in a
> >   debugging scenario.
> > 
> > * SLOB has no implicit alignment so this patch adds it explicitly for
> >   kmalloc(). The potential downside is increased fragmentation. While
> >   pathological allocation scenarios are certainly possible, in my testing,
> >   after booting a x86_64 kernel+userspace with virtme, around 16MB memory
> >   was consumed by slab pages both before and after the patch, with difference
> >   in the noise.
> > 
> > [1] https://lore.kernel.org/linux-btrfs/c3157c8e8e0e7588312b40c853f65c02fe6c957a.1566399731.git.christophe.leroy@c-s.fr/
> > [2] https://lore.kernel.org/linux-fsdevel/20190225040904.5557-1-ming.lei@redhat.com/
> > [3] https://lwn.net/Articles/787740/
> > 
> > Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> 
> So if anyone thinks this is a good idea, please express it (preferably
> in a formal way such as Acked-by), otherwise it seems the patch will be
> dropped (due to a private NACK, apparently).

Sigh.

An existing code to workaround the lack of alignment guarantee just show
that this is necessary. And there wasn't any real technical argument
against except for a highly theoretical optimizations/new allocator that
would be tight by the guarantee.

Therefore
Acked-by: Michal Hocko <mhocko@suse.com>
Kirill A . Shutemov Sept. 30, 2019, 9:32 a.m. UTC | #36
On Mon, Sep 30, 2019 at 11:23:34AM +0200, Michal Hocko wrote:
> On Mon 23-09-19 18:36:32, Vlastimil Babka wrote:
> > On 8/26/19 1:16 PM, Vlastimil Babka wrote:
> > > In most configurations, kmalloc() happens to return naturally aligned (i.e.
> > > aligned to the block size itself) blocks for power of two sizes. That means
> > > some kmalloc() users might unknowingly rely on that alignment, until stuff
> > > breaks when the kernel is built with e.g.  CONFIG_SLUB_DEBUG or CONFIG_SLOB,
> > > and blocks stop being aligned. Then developers have to devise workaround such
> > > as own kmem caches with specified alignment [1], which is not always practical,
> > > as recently evidenced in [2].
> > > 
> > > The topic has been discussed at LSF/MM 2019 [3]. Adding a 'kmalloc_aligned()'
> > > variant would not help with code unknowingly relying on the implicit alignment.
> > > For slab implementations it would either require creating more kmalloc caches,
> > > or allocate a larger size and only give back part of it. That would be
> > > wasteful, especially with a generic alignment parameter (in contrast with a
> > > fixed alignment to size).
> > > 
> > > Ideally we should provide to mm users what they need without difficult
> > > workarounds or own reimplementations, so let's make the kmalloc() alignment to
> > > size explicitly guaranteed for power-of-two sizes under all configurations.
> > > What this means for the three available allocators?
> > > 
> > > * SLAB object layout happens to be mostly unchanged by the patch. The
> > >   implicitly provided alignment could be compromised with CONFIG_DEBUG_SLAB due
> > >   to redzoning, however SLAB disables redzoning for caches with alignment
> > >   larger than unsigned long long. Practically on at least x86 this includes
> > >   kmalloc caches as they use cache line alignment, which is larger than that.
> > >   Still, this patch ensures alignment on all arches and cache sizes.
> > > 
> > > * SLUB layout is also unchanged unless redzoning is enabled through
> > >   CONFIG_SLUB_DEBUG and boot parameter for the particular kmalloc cache. With
> > >   this patch, explicit alignment is guaranteed with redzoning as well. This
> > >   will result in more memory being wasted, but that should be acceptable in a
> > >   debugging scenario.
> > > 
> > > * SLOB has no implicit alignment so this patch adds it explicitly for
> > >   kmalloc(). The potential downside is increased fragmentation. While
> > >   pathological allocation scenarios are certainly possible, in my testing,
> > >   after booting a x86_64 kernel+userspace with virtme, around 16MB memory
> > >   was consumed by slab pages both before and after the patch, with difference
> > >   in the noise.
> > > 
> > > [1] https://lore.kernel.org/linux-btrfs/c3157c8e8e0e7588312b40c853f65c02fe6c957a.1566399731.git.christophe.leroy@c-s.fr/
> > > [2] https://lore.kernel.org/linux-fsdevel/20190225040904.5557-1-ming.lei@redhat.com/
> > > [3] https://lwn.net/Articles/787740/
> > > 
> > > Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> > 
> > So if anyone thinks this is a good idea, please express it (preferably
> > in a formal way such as Acked-by), otherwise it seems the patch will be
> > dropped (due to a private NACK, apparently).
> 
> Sigh.
> 
> An existing code to workaround the lack of alignment guarantee just show
> that this is necessary. And there wasn't any real technical argument
> against except for a highly theoretical optimizations/new allocator that
> would be tight by the guarantee.
> 
> Therefore
> Acked-by: Michal Hocko <mhocko@suse.com>

Agreed.

Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Matthew Wilcox (Oracle) Sept. 30, 2019, 1:32 p.m. UTC | #37
On Sat, Sep 28, 2019 at 01:12:49AM +0000, Christopher Lameter wrote:
> However, the layout may be different due to another allocator that prefers
> to arrange things differently (SLOB puts multiple objects of different
> types in the same page to save memory), if we need to add data to these
> objects (debugging info, new metadata about the object, maybe the memcg
> pointer, maybe other things that may come up), or other innovative
> approaches (such as putting data of different kmem caches that are
> commonly used together in the same page to improve locality).

If we ever do start putting objects of different sizes that are commonly
allocated together in the same page (eg inodes & dentries), then those
aren't going to be random kmalloc() allocation; they're going to be
special kmem caches that can specify "I don't care about alignment".

Also, we haven't done that.  We've had a slab allocator for twenty years,
and nobody's tried to do that.  Maybe the co-allocation would be a net
loss (I suspect).  Or the gain is too small for the added complexity.
Whatever way, this is a strawman.

> The cost is an unnecessary petrification of the data layout of the memory
> allocators.

Yes, it is.  And it's a cost I'm willing to pay in order to get the
guarantee of alignment.
diff mbox series

Patch

diff --git a/Documentation/core-api/memory-allocation.rst b/Documentation/core-api/memory-allocation.rst
index 7744aa3bf2e0..27c54854b508 100644
--- a/Documentation/core-api/memory-allocation.rst
+++ b/Documentation/core-api/memory-allocation.rst
@@ -98,6 +98,10 @@  limited. The actual limit depends on the hardware and the kernel
 configuration, but it is a good practice to use `kmalloc` for objects
 smaller than page size.
 
+The address of a chunk allocated with `kmalloc` is aligned to at least
+ARCH_KMALLOC_MINALIGN bytes. For sizes of power of two bytes, the
+alignment is also guaranteed to be at least to the respective size.
+
 For large allocations you can use :c:func:`vmalloc` and
 :c:func:`vzalloc`, or directly request pages from the page
 allocator. The memory allocated by `vmalloc` and related functions is
diff --git a/include/linux/slab.h b/include/linux/slab.h
index 56c9c7eed34e..0d4c26395785 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -493,6 +493,10 @@  static __always_inline void *kmalloc_large(size_t size, gfp_t flags)
  * kmalloc is the normal method of allocating memory
  * for objects smaller than page size in the kernel.
  *
+ * The allocated object address is aligned to at least ARCH_KMALLOC_MINALIGN
+ * bytes. For @size of power of two bytes, the alignment is also guaranteed
+ * to be at least to the size.
+ *
  * The @flags argument may be one of the GFP flags defined at
  * include/linux/gfp.h and described at
  * :ref:`Documentation/core-api/mm-api.rst <mm-api-gfp-flags>`
diff --git a/mm/slab_common.c b/mm/slab_common.c
index 929c02a90fba..b9ba93ad5c7f 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -993,10 +993,19 @@  void __init create_boot_cache(struct kmem_cache *s, const char *name,
 		unsigned int useroffset, unsigned int usersize)
 {
 	int err;
+	unsigned int align = ARCH_KMALLOC_MINALIGN;
 
 	s->name = name;
 	s->size = s->object_size = size;
-	s->align = calculate_alignment(flags, ARCH_KMALLOC_MINALIGN, size);
+
+	/*
+	 * For power of two sizes, guarantee natural alignment for kmalloc
+	 * caches, regardless of SL*B debugging options.
+	 */
+	if (is_power_of_2(size))
+		align = max(align, size);
+	s->align = calculate_alignment(flags, align, size);
+
 	s->useroffset = useroffset;
 	s->usersize = usersize;
 
diff --git a/mm/slob.c b/mm/slob.c
index 3dcde9cf2b17..07a39047aa54 100644
--- a/mm/slob.c
+++ b/mm/slob.c
@@ -224,6 +224,7 @@  static void slob_free_pages(void *b, int order)
  * @sp: Page to look in.
  * @size: Size of the allocation.
  * @align: Allocation alignment.
+ * @align_offset: Offset in the allocated block that will be aligned.
  * @page_removed_from_list: Return parameter.
  *
  * Tries to find a chunk of memory at least @size bytes big within @page.
@@ -234,7 +235,7 @@  static void slob_free_pages(void *b, int order)
  *         true (set to false otherwise).
  */
 static void *slob_page_alloc(struct page *sp, size_t size, int align,
-			     bool *page_removed_from_list)
+			      int align_offset, bool *page_removed_from_list)
 {
 	slob_t *prev, *cur, *aligned = NULL;
 	int delta = 0, units = SLOB_UNITS(size);
@@ -243,8 +244,17 @@  static void *slob_page_alloc(struct page *sp, size_t size, int align,
 	for (prev = NULL, cur = sp->freelist; ; prev = cur, cur = slob_next(cur)) {
 		slobidx_t avail = slob_units(cur);
 
+		/*
+		 * 'aligned' will hold the address of the slob block so that the
+		 * address 'aligned'+'align_offset' is aligned according to the
+		 * 'align' parameter. This is for kmalloc() which prepends the
+		 * allocated block with its size, so that the block itself is
+		 * aligned when needed.
+		 */
 		if (align) {
-			aligned = (slob_t *)ALIGN((unsigned long)cur, align);
+			aligned = (slob_t *)
+				(ALIGN((unsigned long)cur + align_offset, align)
+				 - align_offset);
 			delta = aligned - cur;
 		}
 		if (avail >= units + delta) { /* room enough? */
@@ -288,7 +298,8 @@  static void *slob_page_alloc(struct page *sp, size_t size, int align,
 /*
  * slob_alloc: entry point into the slob allocator.
  */
-static void *slob_alloc(size_t size, gfp_t gfp, int align, int node)
+static void *slob_alloc(size_t size, gfp_t gfp, int align, int node,
+							int align_offset)
 {
 	struct page *sp;
 	struct list_head *slob_list;
@@ -319,7 +330,7 @@  static void *slob_alloc(size_t size, gfp_t gfp, int align, int node)
 		if (sp->units < SLOB_UNITS(size))
 			continue;
 
-		b = slob_page_alloc(sp, size, align, &page_removed_from_list);
+		b = slob_page_alloc(sp, size, align, align_offset, &page_removed_from_list);
 		if (!b)
 			continue;
 
@@ -356,7 +367,7 @@  static void *slob_alloc(size_t size, gfp_t gfp, int align, int node)
 		INIT_LIST_HEAD(&sp->slab_list);
 		set_slob(b, SLOB_UNITS(PAGE_SIZE), b + SLOB_UNITS(PAGE_SIZE));
 		set_slob_page_free(sp, slob_list);
-		b = slob_page_alloc(sp, size, align, &_unused);
+		b = slob_page_alloc(sp, size, align, align_offset, &_unused);
 		BUG_ON(!b);
 		spin_unlock_irqrestore(&slob_lock, flags);
 	}
@@ -458,7 +469,7 @@  static __always_inline void *
 __do_kmalloc_node(size_t size, gfp_t gfp, int node, unsigned long caller)
 {
 	unsigned int *m;
-	int align = max_t(size_t, ARCH_KMALLOC_MINALIGN, ARCH_SLAB_MINALIGN);
+	int minalign = max_t(size_t, ARCH_KMALLOC_MINALIGN, ARCH_SLAB_MINALIGN);
 	void *ret;
 
 	gfp &= gfp_allowed_mask;
@@ -466,19 +477,28 @@  __do_kmalloc_node(size_t size, gfp_t gfp, int node, unsigned long caller)
 	fs_reclaim_acquire(gfp);
 	fs_reclaim_release(gfp);
 
-	if (size < PAGE_SIZE - align) {
+	if (size < PAGE_SIZE - minalign) {
+		int align = minalign;
+
+		/*
+		 * For power of two sizes, guarantee natural alignment for
+		 * kmalloc()'d objects.
+		 */
+		if (is_power_of_2(size))
+			align = max(minalign, (int) size);
+
 		if (!size)
 			return ZERO_SIZE_PTR;
 
-		m = slob_alloc(size + align, gfp, align, node);
+		m = slob_alloc(size + minalign, gfp, align, node, minalign);
 
 		if (!m)
 			return NULL;
 		*m = size;
-		ret = (void *)m + align;
+		ret = (void *)m + minalign;
 
 		trace_kmalloc_node(caller, ret,
-				   size, size + align, gfp, node);
+				   size, size + minalign, gfp, node);
 	} else {
 		unsigned int order = get_order(size);
 
@@ -579,7 +599,7 @@  static void *slob_alloc_node(struct kmem_cache *c, gfp_t flags, int node)
 	fs_reclaim_release(flags);
 
 	if (c->size < PAGE_SIZE) {
-		b = slob_alloc(c->size, flags, c->align, node);
+		b = slob_alloc(c->size, flags, c->align, node, 0);
 		trace_kmem_cache_alloc_node(_RET_IP_, b, c->object_size,
 					    SLOB_UNITS(c->size) * SLOB_UNIT,
 					    flags, node);