randomize_kstack: Improve stack alignment codegen

Message ID	20240702211612.work.576-kees@kernel.org (mailing list archive)
State	Mainlined
Commit	872bb37f6829d4f7f3ed5afe2786add3d4384b4b
Headers	show Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 953DF1534E8; Tue, 2 Jul 2024 21:16:18 +0000 (UTC) From: Kees Cook <kees@kernel.org> To: Mark Rutland <mark.rutland@arm.com> Cc: Kees Cook <kees@kernel.org>, "liuyuntao (F)" <liuyuntao12@huawei.com>, "Gustavo A. R. Silva" <gustavoars@kernel.org>, linux-hardening@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH] randomize_kstack: Improve stack alignment codegen Date: Tue, 2 Jul 2024 14:16:16 -0700 Message-Id: <20240702211612.work.576-kees@kernel.org> Precedence: bulk MIME-Version: 1.0 Content-Transfer-Encoding: 8bit
Series	randomize_kstack: Improve stack alignment codegen \| expand randomize_kstack: Improve stack alignment codegen

Message ID

20240702211612.work.576-kees@kernel.org (mailing list archive)

State

Mainlined

Commit

872bb37f6829d4f7f3ed5afe2786add3d4384b4b

Headers

From: Kees Cook <kees@kernel.org>
To: Mark Rutland <mark.rutland@arm.com>
Cc: Kees Cook <kees@kernel.org>,
	"liuyuntao (F)" <liuyuntao12@huawei.com>,
	"Gustavo A. R. Silva" <gustavoars@kernel.org>,
	linux-hardening@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: [PATCH] randomize_kstack: Improve stack alignment codegen
Date: Tue,  2 Jul 2024 14:16:16 -0700
Message-Id: <20240702211612.work.576-kees@kernel.org>
Precedence: bulk
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit

Series

randomize_kstack: Improve stack alignment codegen | expand

Commit Message

Kees Cook July 2, 2024, 9:16 p.m. UTC

The codgen for adding architecture-specific stack alignment to the
effective alloca() usage is somewhat inefficient and allows a bit to get
carried beyond the desired entropy range. This isn't really a problem,
but it's unexpected and the codegen is kind of bad.

Quoting Mark[1], the disassembly for arm64's invoke_syscall() looks like:

	// offset = raw_cpu_read(kstack_offset)
	mov     x4, sp
	adrp    x0, kstack_offset
	mrs     x5, tpidr_el1
	add     x0, x0, #:lo12:kstack_offset
	ldr     w0, [x0, x5]

	// offset = KSTACK_OFFSET_MAX(offset)
	and     x0, x0, #0x3ff

	// alloca(offset)
	add     x0, x0, #0xf
	and     x0, x0, #0x7f0
	sub     sp, x4, x0

... which in C would be:

	offset = raw_cpu_read(kstack_offset)
	offset &= 0x3ff;			// [0x0, 0x3ff]
	offset += 0xf;				// [0xf, 0x40e]
	offset &= 0x7f0;			// [0x0,

... so when *all* bits [3:0] are 0, they'll have no impact, and when
*any* of bits [3:0] are 1 they'll trigger a carry into bit 4, which
could ripple all the way up and spill into bit 10.

Switch the masking in KSTACK_OFFSET_MAX() to explicitly clear the bottom
bits to avoid the rounding by using 0b1111110000 instead of 0b1111111111:

	// offset = raw_cpu_read(kstack_offset)
	mov     x4, sp
	adrp    x0, 0 <kstack_offset>
	mrs     x5, tpidr_el1
	add     x0, x0, #:lo12:kstack_offset
	ldr     w0, [x0, x5]

	// offset = KSTACK_OFFSET_MAX(offset)
	and     x0, x0, #0x3f0

	// alloca(offset)
	sub     sp, x4, x0

Suggested-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/lkml/ZnVfOnIuFl2kNWkT@J2N7QTR9R3/ [1]
Signed-off-by: Kees Cook <kees@kernel.org>
---
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: "liuyuntao (F)" <liuyuntao12@huawei.com>
Cc: "Gustavo A. R. Silva" <gustavoars@kernel.org>
Cc: linux-hardening@vger.kernel.org
---
 include/linux/randomize_kstack.h | 18 ++++++++++++------
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/include/linux/randomize_kstack.h b/include/linux/randomize_kstack.h
index 6d92b68efbf6..1d982dbdd0d0 100644
--- a/include/linux/randomize_kstack.h
+++ b/include/linux/randomize_kstack.h
@@ -32,13 +32,19 @@  DECLARE_PER_CPU(u32, kstack_offset);
 #endif
 
 /*
- * Use, at most, 10 bits of entropy. We explicitly cap this to keep the
- * "VLA" from being unbounded (see above). 10 bits leaves enough room for
- * per-arch offset masks to reduce entropy (by removing higher bits, since
- * high entropy may overly constrain usable stack space), and for
- * compiler/arch-specific stack alignment to remove the lower bits.
+ * Use, at most, 6 bits of entropy (on 64-bit; 8 on 32-bit). This cap is
+ * to keep the "VLA" from being unbounded (see above). Additionally clear
+ * the bottom 4 bits (on 64-bit systems, 2 for 32-bit), since stack
+ * alignment will always be at least word size. This makes the compiler
+ * code gen better when it is applying the actual per-arch alignment to
+ * the final offset. The resulting randomness is reasonable without overly
+ * constraining usable stack space.
  */
-#define KSTACK_OFFSET_MAX(x)	((x) & 0x3FF)
+#ifdef CONFIG_64BIT
+#define KSTACK_OFFSET_MAX(x)	((x) & 0b1111110000)
+#else
+#define KSTACK_OFFSET_MAX(x)	((x) & 0b1111111100)
+#endif
 
 /**
  * add_random_kstack_offset - Increase stack utilization by previously

randomize_kstack: Improve stack alignment codegen

Commit Message

Patch