diff mbox series

[025/155] slub: relocate freelist pointer to middle of object

Message ID 20200402040427.WyxceElzI%akpm@linux-foundation.org (mailing list archive)
State New, archived
Headers show
Series [001/155] tools/accounting/getdelays.c: fix netlink attribute length | expand

Commit Message

Andrew Morton April 2, 2020, 4:04 a.m. UTC
From: Kees Cook <keescook@chromium.org>
Subject: slub: relocate freelist pointer to middle of object

In a recent discussion[1] with Vitaly Nikolenko and Silvio Cesare, it
became clear that moving the freelist pointer away from the edge of
allocations would likely improve the overall defensive posture of the
inline freelist pointer.  My benchmarks show no meaningful change to
performance (they seem to show it being faster), so this looks like a
reasonable change to make.

Instead of having the freelist pointer at the very beginning of an
allocation (offset 0) or at the very end of an allocation (effectively
offset -sizeof(void *) from the next allocation), move it away from the
edges of the allocation and into the middle.  This provides some
protection against small-sized neighboring overflows (or underflows), for
which the freelist pointer is commonly the target.  (Large or well
controlled overwrites are much more likely to attack live object contents,
instead of attempting freelist corruption.)

The vaunted kernel build benchmark, across 5 runs. Before:

	Mean: 250.05
	Std Dev: 1.85

and after, which appears mysteriously faster:

	Mean: 247.13
	Std Dev: 0.76

Attempts at running "sysbench --test=memory" show the change to be well in
the noise (sysbench seems to be pretty unstable here -- it's not really
measuring allocation).

Hackbench is more allocation-heavy, and while the std dev is above the
difference, it looks like may manifest as an improvement as well:

20 runs of "hackbench -g 20 -l 1000", before:

	Mean: 36.322
	Std Dev: 0.577

and after:

	Mean: 36.056
	Std Dev: 0.598

[1] https://twitter.com/vnik5287/status/1235113523098685440

Link: http://lkml.kernel.org/r/202003051624.AAAC9AECC@keescook
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Vitaly Nikolenko <vnik@duasynt.com>
Cc: Silvio Cesare <silvio.cesare@gmail.com>
Cc: Christoph Lameter <cl@linux.com>Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/slub.c |    7 +++++++
 1 file changed, 7 insertions(+)

Comments

Marco Elver April 15, 2020, 4:47 p.m. UTC | #1
Hello,

On Wed, 01 Apr 2020, Andrew Morton wrote:

> From: Kees Cook <keescook@chromium.org>
> Subject: slub: relocate freelist pointer to middle of object
> 
> In a recent discussion[1] with Vitaly Nikolenko and Silvio Cesare, it
> became clear that moving the freelist pointer away from the edge of
> allocations would likely improve the overall defensive posture of the
> inline freelist pointer.  My benchmarks show no meaningful change to
> performance (they seem to show it being faster), so this looks like a
> reasonable change to make.
> 
> Instead of having the freelist pointer at the very beginning of an
> allocation (offset 0) or at the very end of an allocation (effectively
> offset -sizeof(void *) from the next allocation), move it away from the
> edges of the allocation and into the middle.  This provides some
> protection against small-sized neighboring overflows (or underflows), for
> which the freelist pointer is commonly the target.  (Large or well
> controlled overwrites are much more likely to attack live object contents,
> instead of attempting freelist corruption.)
> 
> The vaunted kernel build benchmark, across 5 runs. Before:
> 
> 	Mean: 250.05
> 	Std Dev: 1.85
> 
> and after, which appears mysteriously faster:
> 
> 	Mean: 247.13
> 	Std Dev: 0.76
> 
> Attempts at running "sysbench --test=memory" show the change to be well in
> the noise (sysbench seems to be pretty unstable here -- it's not really
> measuring allocation).
> 
> Hackbench is more allocation-heavy, and while the std dev is above the
> difference, it looks like may manifest as an improvement as well:
> 
> 20 runs of "hackbench -g 20 -l 1000", before:
> 
> 	Mean: 36.322
> 	Std Dev: 0.577
> 
> and after:
> 
> 	Mean: 36.056
> 	Std Dev: 0.598
> 
> [1] https://twitter.com/vnik5287/status/1235113523098685440
> 
> Link: http://lkml.kernel.org/r/202003051624.AAAC9AECC@keescook
> Signed-off-by: Kees Cook <keescook@chromium.org>
> Acked-by: Christoph Lameter <cl@linux.com>
> Cc: Vitaly Nikolenko <vnik@duasynt.com>
> Cc: Silvio Cesare <silvio.cesare@gmail.com>
> Cc: Christoph Lameter <cl@linux.com>Cc: Pekka Enberg <penberg@kernel.org>
> Cc: David Rientjes <rientjes@google.com>
> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> ---
> 
>  mm/slub.c |    7 +++++++
>  1 file changed, 7 insertions(+)
> 

With kernel v5.7-rc1 I am unable to boot when using the SLUB allocator
and red zoning (slub_debug=Z), but otherwise a default config. Bisect
points to this patch, and when reverting it, the kernel boots again.

Splat:
	[...]
	[    0.328713] rcu: Hierarchical RCU implementation.
	[    0.329169] rcu:     RCU event tracing is enabled.
	[    0.329611] rcu:     RCU restricting CPUs from NR_CPUS=64 to nr_cpu_ids=8.
	[    0.330251] rcu: RCU calculated value of scheduler-enlistment delay is 100 jiffies.
	[    0.330984] rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=8
	[    0.332130] NR_IRQS: 4352, nr_irqs: 488, preallocated irqs: 16
	[    0.332713] general protection fault, probably for non-canonical address 0xccccccccccccccd4: 0000 [#1] SMP PTI
	[    0.333680] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.7.0-rc1+ #3
	[    0.334280] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014
	[    0.335079] RIP: 0010:deactivate_slab.isra.0+0x5b/0x460
	[    0.335582] Code: 48 8b b4 c7 e0 00 00 00 49 8b 44 24 20 31 ff 48 85 c0 40 0f 95 c7 83 c7 0f 89 7c 24 18 48 85 d2 0f 84 a0 00 00 00 41 8b 4e 20 <48> 8b 3c 0b 48 85 ff 0f 84 8c 00 00 00 49 8b 54 24 28 48 89 04 0b
	[    0.337385] RSP: 0000:ffffffffb7e03c80 EFLAGS: 00010086
	[    0.337907] RAX: 0000000000000000 RBX: cccccccccccccccc RCX: 0000000000000008
	[    0.338688] RDX: cccccccccccccccc RSI: ffff91241c800f40 RDI: 000000000000000f
	[    0.339473] RBP: ffffffffb7e03d20 R08: ffff91241fc2d230 R09: 0000000000000000
	[    0.340256] R10: ffff91241c89c010 R11: 0000000000000000 R12: ffffcf2f20722700
	[    0.341041] R13: cccccccccccccccc R14: ffff91241c802180 R15: ffffcf2f20722700
	[    0.341833] FS:  0000000000000000(0000) GS:ffff91241fc00000(0000) knlGS:0000000000000000
	[    0.342727] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
	[    0.343359] CR2: ffff911e74e01000 CR3: 000000027460a001 CR4: 00000000000606b0
	[    0.344146] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
	[    0.344929] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
	[    0.345727] Call Trace:
	[    0.345999]  ? setup_object_debug.isra.0+0x1d/0x40
	[    0.346525]  ? new_slab+0x195/0x340
	[    0.346909]  ? init_object+0x2f/0x80
	[    0.347305]  ___slab_alloc+0x526/0x570
	[    0.347717]  ? kasprintf+0x4e/0x70
	[    0.348092]  ? init_object+0x2f/0x80
	[    0.348488]  ? string+0x42/0x50
	[    0.348834]  ? kasprintf+0x4e/0x70
	[    0.349217]  __kmalloc_track_caller+0x1d2/0x200
	[    0.349720]  kvasprintf+0x64/0xc0
	[    0.350085]  kasprintf+0x4e/0x70
	[    0.350442]  ? kmem_cache_alloc_trace+0x188/0x1b0
	[    0.350962]  __irq_domain_alloc_fwnode+0x8f/0xd0
	[    0.351474]  arch_early_irq_init+0x16/0x90
	[    0.351923]  start_kernel+0x2aa/0x4c2
	[    0.352325]  secondary_startup_64+0xb6/0xc0
	[    0.352784] Modules linked in:
	[    0.353124] random: get_random_bytes called from print_oops_end_marker+0x21/0x40 with crng_init=0
	[    0.353126] ---[ end trace 186486c23e10986d ]---
	[    0.354613] RIP: 0010:deactivate_slab.isra.0+0x5b/0x460
	[    0.355186] Code: 48 8b b4 c7 e0 00 00 00 49 8b 44 24 20 31 ff 48 85 c0 40 0f 95 c7 83 c7 0f 89 7c 24 18 48 85 d2 0f 84 a0 00 00 00 41 8b 4e 20 <48> 8b 3c 0b 48 85 ff 0f 84 8c 00 00 00 49 8b 54 24 28 48 89 04 0b
	[    0.357255] RSP: 0000:ffffffffb7e03c80 EFLAGS: 00010086
	[    0.357829] RAX: 0000000000000000 RBX: cccccccccccccccc RCX: 0000000000000008
	[    0.358613] RDX: cccccccccccccccc RSI: ffff91241c800f40 RDI: 000000000000000f
	[    0.359398] RBP: ffffffffb7e03d20 R08: ffff91241fc2d230 R09: 0000000000000000
	[    0.360181] R10: ffff91241c89c010 R11: 0000000000000000 R12: ffffcf2f20722700
	[    0.360965] R13: cccccccccccccccc R14: ffff91241c802180 R15: ffffcf2f20722700
	[    0.361755] FS:  0000000000000000(0000) GS:ffff91241fc00000(0000) knlGS:0000000000000000
	[    0.362645] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
	[    0.363275] CR2: ffff911e74e01000 CR3: 000000027460a001 CR4: 00000000000606b0
	[    0.364060] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
	[    0.364844] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
	[    0.365636] Kernel panic - not syncing: Attempted to kill the idle task!
	[    0.366393] ---[ end Kernel panic - not syncing: Attempted to kill the idle task! ]---

Can you reproduce this? Let me know if you need more information.

Thanks,
-- Marco

> --- a/mm/slub.c~slub-relocate-freelist-pointer-to-middle-of-object
> +++ a/mm/slub.c
> @@ -3581,6 +3581,13 @@ static int calculate_sizes(struct kmem_c
>  		 */
>  		s->offset = size;
>  		size += sizeof(void *);
> +	} else if (size > sizeof(void *)) {
> +		/*
> +		 * Store freelist pointer near middle of object to keep
> +		 * it away from the edges of the object to avoid small
> +		 * sized over/underflows from neighboring allocations.
> +		 */
> +		s->offset = ALIGN(size / 2, sizeof(void *));
>  	}
>  
>  #ifdef CONFIG_SLUB_DEBUG
> _
Kees Cook April 15, 2020, 5:07 p.m. UTC | #2
On Wed, Apr 15, 2020 at 06:47:26PM +0200, Marco Elver wrote:
> On Wed, 01 Apr 2020, Andrew Morton wrote:
> > From: Kees Cook <keescook@chromium.org>
> > Subject: slub: relocate freelist pointer to middle of object
> > [...]
> 
> With kernel v5.7-rc1 I am unable to boot when using the SLUB allocator
> and red zoning (slub_debug=Z), but otherwise a default config. Bisect
> points to this patch, and when reverting it, the kernel boots again.
> 
> Splat:
> 	[...]
> 	[    0.328713] rcu: Hierarchical RCU implementation.
> 	[    0.329169] rcu:     RCU event tracing is enabled.
> 	[    0.329611] rcu:     RCU restricting CPUs from NR_CPUS=64 to nr_cpu_ids=8.
> 	[    0.330251] rcu: RCU calculated value of scheduler-enlistment delay is 100 jiffies.
> 	[    0.330984] rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=8
> 	[    0.332130] NR_IRQS: 4352, nr_irqs: 488, preallocated irqs: 16
> 	[    0.332713] general protection fault, probably for non-canonical address 0xccccccccccccccd4: 0000 [#1] SMP PTI
> 	[    0.333680] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.7.0-rc1+ #3
> 	[    0.334280] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014
> 	[    0.335079] RIP: 0010:deactivate_slab.isra.0+0x5b/0x460

Thanks for the report! It seems something isn't using get_freepointer()
(and is missing the s->offset calculation). I will try to track it
down...

> Can you reproduce this? Let me know if you need more information.

Yup! I see a crash in the same place with slub_debug=Z. Since I'm
building with CONFIG_SLAB_FREELIST_HARDENED=y, I see a random number
instead of 0xccccccccccccccd4. I'll keep digging...
Kees Cook April 15, 2020, 6 p.m. UTC | #3
On Wed, Apr 15, 2020 at 06:47:26PM +0200, Marco Elver wrote:
> With kernel v5.7-rc1 I am unable to boot when using the SLUB allocator
> and red zoning (slub_debug=Z), but otherwise a default config. Bisect
> points to this patch, and when reverting it, the kernel boots again.

Okay, I believe this patch should fix the problem:
https://lore.kernel.org/lkml/202004151054.BD695840@keescook
diff mbox series

Patch

--- a/mm/slub.c~slub-relocate-freelist-pointer-to-middle-of-object
+++ a/mm/slub.c
@@ -3581,6 +3581,13 @@  static int calculate_sizes(struct kmem_c
 		 */
 		s->offset = size;
 		size += sizeof(void *);
+	} else if (size > sizeof(void *)) {
+		/*
+		 * Store freelist pointer near middle of object to keep
+		 * it away from the edges of the object to avoid small
+		 * sized over/underflows from neighboring allocations.
+		 */
+		s->offset = ALIGN(size / 2, sizeof(void *));
 	}
 
 #ifdef CONFIG_SLUB_DEBUG