Message ID | 20200324203231.64324-6-keescook@chromium.org (mailing list archive)
State      | New, archived
Series     | Optionally randomize kernel stack offset each syscall
On Tue, Mar 24, 2020 at 01:32:31PM -0700, Kees Cook wrote:
> Allow for a randomized stack offset on a per-syscall basis, with roughly
> 5 bits of entropy.
>
> Signed-off-by: Kees Cook <keescook@chromium.org>

Just to check, do you have an idea of the impact on arm64? Patch 3 had
figures for x86 where it reads the TSC, and it's unclear to me how
get_random_int() compares to that.

Otherwise, this looks sound to me; I'd just like to know whether the
overhead is in the same ballpark.

Thanks,
Mark.

> ---
>  arch/arm64/Kconfig          |  1 +
>  arch/arm64/kernel/syscall.c | 10 ++++++++++
>  2 files changed, 11 insertions(+)
>
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index 0b30e884e088..4d5aa4959f72 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -127,6 +127,7 @@ config ARM64
>  	select HAVE_ARCH_MMAP_RND_BITS
>  	select HAVE_ARCH_MMAP_RND_COMPAT_BITS if COMPAT
>  	select HAVE_ARCH_PREL32_RELOCATIONS
> +	select HAVE_ARCH_RANDOMIZE_KSTACK_OFFSET
>  	select HAVE_ARCH_SECCOMP_FILTER
>  	select HAVE_ARCH_STACKLEAK
>  	select HAVE_ARCH_THREAD_STRUCT_WHITELIST
> diff --git a/arch/arm64/kernel/syscall.c b/arch/arm64/kernel/syscall.c
> index a12c0c88d345..238dbd753b44 100644
> --- a/arch/arm64/kernel/syscall.c
> +++ b/arch/arm64/kernel/syscall.c
> @@ -5,6 +5,7 @@
>  #include <linux/errno.h>
>  #include <linux/nospec.h>
>  #include <linux/ptrace.h>
> +#include <linux/randomize_kstack.h>
>  #include <linux/syscalls.h>
>
>  #include <asm/daifflags.h>
> @@ -42,6 +43,8 @@ static void invoke_syscall(struct pt_regs *regs, unsigned int scno,
>  {
>  	long ret;
>
> +	add_random_kstack_offset();
> +
>  	if (scno < sc_nr) {
>  		syscall_fn_t syscall_fn;
>  		syscall_fn = syscall_table[array_index_nospec(scno, sc_nr)];
> @@ -51,6 +54,13 @@ static void invoke_syscall(struct pt_regs *regs, unsigned int scno,
>  	}
>
>  	regs->regs[0] = ret;
> +
> +	/*
> +	 * Since the compiler chooses a 4 bit alignment for the stack,
> +	 * let's save one additional bit (9 total), which gets us up
> +	 * near 5 bits of entropy.
> +	 */
> +	choose_random_kstack_offset(get_random_int() & 0x1FF);
>  }
>
>  static inline bool has_syscall_work(unsigned long flags)
> --
> 2.20.1
>
On Wed, Mar 25, 2020 at 01:21:27PM +0000, Mark Rutland wrote:
> On Tue, Mar 24, 2020 at 01:32:31PM -0700, Kees Cook wrote:
> > Allow for a randomized stack offset on a per-syscall basis, with roughly
> > 5 bits of entropy.
> >
> > Signed-off-by: Kees Cook <keescook@chromium.org>
>
> Just to check, do you have an idea of the impact on arm64? Patch 3 had
> figures for x86 where it reads the TSC, and it's unclear to me how
> get_random_int() compares to that.

I didn't do a measurement on arm64 since I don't have a good bare-metal
test environment. I know Andy Lutomirski has plans for making
get_random_int() as fast as possible, so that's why I used it here. I
couldn't figure out if there was a comparable instruction like rdtsc in
aarch64 (it seems there's a cycle counter, but I found nothing in the
kernel that seemed to actually use it)?

> Otherwise, this looks sound to me; I'd just like to know whether the
> overhead is in the same ballpark.

Thanks!

-Kees
On Wed, Mar 25, 2020 at 01:22:07PM -0700, Kees Cook wrote:
> On Wed, Mar 25, 2020 at 01:21:27PM +0000, Mark Rutland wrote:
> > On Tue, Mar 24, 2020 at 01:32:31PM -0700, Kees Cook wrote:
> > > Allow for a randomized stack offset on a per-syscall basis, with roughly
> > > 5 bits of entropy.
> > >
> > > Signed-off-by: Kees Cook <keescook@chromium.org>
> >
> > Just to check, do you have an idea of the impact on arm64? Patch 3 had
> > figures for x86 where it reads the TSC, and it's unclear to me how
> > get_random_int() compares to that.
>
> I didn't do a measurement on arm64 since I don't have a good bare-metal
> test environment. I know Andy Lutomirski has plans for making
> get_random_int() as fast as possible, so that's why I used it here.

Ok. I suspect I also won't get the chance to test that in the next few
days, but if I do I'll try to share the results.

My concern here was that get_random_int() has to grab a spinlock and
mess with IRQ masking, so has the potential to block for much longer,
but that might not be an issue in practice, and I don't think that
should block these patches.

> I couldn't figure out if there was a comparable instruction like rdtsc
> in aarch64 (it seems there's a cycle counter, but I found nothing in
> the kernel that seemed to actually use it)?

AArch64 doesn't have a direct equivalent. The generic counter
(CNTxCT_EL0) is the closest thing, but its nominal frequency is
typically much lower than the nominal CPU clock frequency (unlike TSC
where they're the same). The cycle counter (PMCCNTR_EL0) is part of the
PMU, and can't be relied on in the same way (e.g. as perf reprograms it
to generate overflow events, and it can stop for things like WFI/WFE).

Thanks,
Mark.
On Thu, Mar 26, 2020 at 11:15:21AM +0000, Mark Rutland wrote:
> On Wed, Mar 25, 2020 at 01:22:07PM -0700, Kees Cook wrote:
> > On Wed, Mar 25, 2020 at 01:21:27PM +0000, Mark Rutland wrote:
> > > On Tue, Mar 24, 2020 at 01:32:31PM -0700, Kees Cook wrote:
> > > > Allow for a randomized stack offset on a per-syscall basis, with roughly
> > > > 5 bits of entropy.
> > > >
> > > > Signed-off-by: Kees Cook <keescook@chromium.org>
> > >
> > > Just to check, do you have an idea of the impact on arm64? Patch 3 had
> > > figures for x86 where it reads the TSC, and it's unclear to me how
> > > get_random_int() compares to that.
> >
> > I didn't do a measurement on arm64 since I don't have a good bare-metal
> > test environment. I know Andy Lutomirski has plans for making
> > get_random_int() as fast as possible, so that's why I used it here.
>
> Ok. I suspect I also won't get the chance to test that in the next few
> days, but if I do I'll try to share the results.

Okay, thanks! I can try a rough estimate under emulation, but I assume
that'll be mostly useless. :)

> My concern here was that get_random_int() has to grab a spinlock and
> mess with IRQ masking, so has the potential to block for much longer,
> but that might not be an issue in practice, and I don't think that
> should block these patches.

Gotcha. I was already surprised by how "heavy" the per-cpu access was
when I looked at the resulting assembly (there looked to be preempt
stuff, etc). But my hope was that this is configurable so people can
measure for themselves if they want it, and most people who want this
feature have a high tolerance for performance trade-offs. ;)

> > I couldn't figure out if there was a comparable instruction like rdtsc
> > in aarch64 (it seems there's a cycle counter, but I found nothing in
> > the kernel that seemed to actually use it)?
>
> AArch64 doesn't have a direct equivalent. The generic counter
> (CNTxCT_EL0) is the closest thing, but its nominal frequency is
> typically much lower than the nominal CPU clock frequency (unlike TSC
> where they're the same). The cycle counter (PMCCNTR_EL0) is part of the
> PMU, and can't be relied on in the same way (e.g. as perf reprograms it
> to generate overflow events, and it can stop for things like WFI/WFE).

Okay, cool; thanks for the details! It's always nice to confirm I didn't
miss some glaringly obvious solution. ;)

For a potential v2, should I add your reviewed-by or wait for your
timing analysis, etc?
On Thu, Mar 26, 2020 at 09:31:32AM -0700, Kees Cook wrote:
> On Thu, Mar 26, 2020 at 11:15:21AM +0000, Mark Rutland wrote:
> > On Wed, Mar 25, 2020 at 01:22:07PM -0700, Kees Cook wrote:
> > > On Wed, Mar 25, 2020 at 01:21:27PM +0000, Mark Rutland wrote:
> > > > On Tue, Mar 24, 2020 at 01:32:31PM -0700, Kees Cook wrote:
> > > > > Allow for a randomized stack offset on a per-syscall basis, with roughly
> > > > > 5 bits of entropy.
> > > > >
> > > > > Signed-off-by: Kees Cook <keescook@chromium.org>
> > > >
> > > > Just to check, do you have an idea of the impact on arm64? Patch 3 had
> > > > figures for x86 where it reads the TSC, and it's unclear to me how
> > > > get_random_int() compares to that.
> > >
> > > I didn't do a measurement on arm64 since I don't have a good bare-metal
> > > test environment. I know Andy Lutomirski has plans for making
> > > get_random_int() as fast as possible, so that's why I used it here.
> >
> > Ok. I suspect I also won't get the chance to test that in the next few
> > days, but if I do I'll try to share the results.
>
> Okay, thanks! I can try a rough estimate under emulation, but I assume
> that'll be mostly useless. :)
>
> > My concern here was that get_random_int() has to grab a spinlock and
> > mess with IRQ masking, so has the potential to block for much longer,
> > but that might not be an issue in practice, and I don't think that
> > should block these patches.
>
> Gotcha. I was already surprised by how "heavy" the per-cpu access was
> when I looked at the resulting assembly (there looked to be preempt
> stuff, etc). But my hope was that this is configurable so people can
> measure for themselves if they want it, and most people who want this
> feature have a high tolerance for performance trade-offs. ;)
>
> > > I couldn't figure out if there was a comparable instruction like rdtsc
> > > in aarch64 (it seems there's a cycle counter, but I found nothing in
> > > the kernel that seemed to actually use it)?
> >
> > AArch64 doesn't have a direct equivalent. The generic counter
> > (CNTxCT_EL0) is the closest thing, but its nominal frequency is
> > typically much lower than the nominal CPU clock frequency (unlike TSC
> > where they're the same). The cycle counter (PMCCNTR_EL0) is part of the
> > PMU, and can't be relied on in the same way (e.g. as perf reprograms it
> > to generate overflow events, and it can stop for things like WFI/WFE).
>
> Okay, cool; thanks for the details! It's always nice to confirm I didn't
> miss some glaringly obvious solution. ;)
>
> For a potential v2, should I add your reviewed-by or wait for your
> timing analysis, etc?

I'd rather not give an R-b until I've seen numbers, but please don't
block waiting for that. For the moment, feel free to add:

Acked-by: Mark Rutland <mark.rutland@arm.com>

... and it's down to Will and Catalin to make the call for arm64.

Thanks,
Mark.
On Tue, Mar 24, 2020 at 01:32:31PM -0700, Kees Cook wrote:
> Allow for a randomized stack offset on a per-syscall basis, with roughly
> 5 bits of entropy.
>
> Signed-off-by: Kees Cook <keescook@chromium.org>
> ---
>  arch/arm64/Kconfig          |  1 +
>  arch/arm64/kernel/syscall.c | 10 ++++++++++
>  2 files changed, 11 insertions(+)
>
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index 0b30e884e088..4d5aa4959f72 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -127,6 +127,7 @@ config ARM64
>  	select HAVE_ARCH_MMAP_RND_BITS
>  	select HAVE_ARCH_MMAP_RND_COMPAT_BITS if COMPAT
>  	select HAVE_ARCH_PREL32_RELOCATIONS
> +	select HAVE_ARCH_RANDOMIZE_KSTACK_OFFSET
>  	select HAVE_ARCH_SECCOMP_FILTER
>  	select HAVE_ARCH_STACKLEAK
>  	select HAVE_ARCH_THREAD_STRUCT_WHITELIST
> diff --git a/arch/arm64/kernel/syscall.c b/arch/arm64/kernel/syscall.c
> index a12c0c88d345..238dbd753b44 100644
> --- a/arch/arm64/kernel/syscall.c
> +++ b/arch/arm64/kernel/syscall.c
> @@ -5,6 +5,7 @@
>  #include <linux/errno.h>
>  #include <linux/nospec.h>
>  #include <linux/ptrace.h>
> +#include <linux/randomize_kstack.h>
>  #include <linux/syscalls.h>
>
>  #include <asm/daifflags.h>
> @@ -42,6 +43,8 @@ static void invoke_syscall(struct pt_regs *regs, unsigned int scno,
>  {
>  	long ret;
>
> +	add_random_kstack_offset();
> +
>  	if (scno < sc_nr) {
>  		syscall_fn_t syscall_fn;
>  		syscall_fn = syscall_table[array_index_nospec(scno, sc_nr)];
> @@ -51,6 +54,13 @@ static void invoke_syscall(struct pt_regs *regs, unsigned int scno,
>  	}
>
>  	regs->regs[0] = ret;
> +
> +	/*
> +	 * Since the compiler chooses a 4 bit alignment for the stack,
> +	 * let's save one additional bit (9 total), which gets us up
> +	 * near 5 bits of entropy.
> +	 */
> +	choose_random_kstack_offset(get_random_int() & 0x1FF);

Hmm, this comment doesn't make any sense to me. I mean, I get that 0x1ff
is 9 bits, and that is 4+5 but so what?

Will
On Mon, Apr 20, 2020 at 09:54:58PM +0100, Will Deacon wrote:
> On Tue, Mar 24, 2020 at 01:32:31PM -0700, Kees Cook wrote:
> > +	/*
> > +	 * Since the compiler chooses a 4 bit alignment for the stack,
> > +	 * let's save one additional bit (9 total), which gets us up
> > +	 * near 5 bits of entropy.
> > +	 */
> > +	choose_random_kstack_offset(get_random_int() & 0x1FF);
>
> Hmm, this comment doesn't make any sense to me. I mean, I get that 0x1ff
> is 9 bits, and that is 4+5 but so what?

Er, well, yes. I guess I was just trying to explain why there were 9
bits saved here and to document what I was seeing the compiler actually
doing with the values. (And it serves as a comparison to the x86 comment
which is explaining similar calculations in the face of x86_64 vs ia32.)

Would something like this be better?

	/*
	 * Since the compiler uses 4 bit alignment for the stack (1 more than
	 * x86_64), let's try to match the 5ish-bit entropy seen in x86_64,
	 * instead of having needlessly lower entropy. As a result, keep the
	 * low 9 bits.
	 */
On Mon, Apr 20, 2020 at 03:34:57PM -0700, Kees Cook wrote:
> On Mon, Apr 20, 2020 at 09:54:58PM +0100, Will Deacon wrote:
> > On Tue, Mar 24, 2020 at 01:32:31PM -0700, Kees Cook wrote:
> > > +	/*
> > > +	 * Since the compiler chooses a 4 bit alignment for the stack,
> > > +	 * let's save one additional bit (9 total), which gets us up
> > > +	 * near 5 bits of entropy.
> > > +	 */
> > > +	choose_random_kstack_offset(get_random_int() & 0x1FF);
> >
> > Hmm, this comment doesn't make any sense to me. I mean, I get that 0x1ff
> > is 9 bits, and that is 4+5 but so what?
>
> Er, well, yes. I guess I was just trying to explain why there were 9
> bits saved here and to document what I was seeing the compiler actually
> doing with the values. (And it serves as a comparison to the x86 comment
> which is explaining similar calculations in the face of x86_64 vs ia32.)
>
> Would something like this be better?
>
> 	/*
> 	 * Since the compiler uses 4 bit alignment for the stack (1 more than
> 	 * x86_64), let's try to match the 5ish-bit entropy seen in x86_64,
> 	 * instead of having needlessly lower entropy. As a result, keep the
> 	 * low 9 bits.
> 	 */

Yes, thank you! I was missing the comparison to x86_64 and so the one
"additional" bit didn't make sense to me. With the new comment:

Acked-by: Will Deacon <will@kernel.org>

I'm assuming you're merging this via some other tree, but let me know if
you need anything else from me.

Cheers,

Will
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 0b30e884e088..4d5aa4959f72 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -127,6 +127,7 @@ config ARM64
 	select HAVE_ARCH_MMAP_RND_BITS
 	select HAVE_ARCH_MMAP_RND_COMPAT_BITS if COMPAT
 	select HAVE_ARCH_PREL32_RELOCATIONS
+	select HAVE_ARCH_RANDOMIZE_KSTACK_OFFSET
 	select HAVE_ARCH_SECCOMP_FILTER
 	select HAVE_ARCH_STACKLEAK
 	select HAVE_ARCH_THREAD_STRUCT_WHITELIST
diff --git a/arch/arm64/kernel/syscall.c b/arch/arm64/kernel/syscall.c
index a12c0c88d345..238dbd753b44 100644
--- a/arch/arm64/kernel/syscall.c
+++ b/arch/arm64/kernel/syscall.c
@@ -5,6 +5,7 @@
 #include <linux/errno.h>
 #include <linux/nospec.h>
 #include <linux/ptrace.h>
+#include <linux/randomize_kstack.h>
 #include <linux/syscalls.h>

 #include <asm/daifflags.h>
@@ -42,6 +43,8 @@ static void invoke_syscall(struct pt_regs *regs, unsigned int scno,
 {
 	long ret;

+	add_random_kstack_offset();
+
 	if (scno < sc_nr) {
 		syscall_fn_t syscall_fn;
 		syscall_fn = syscall_table[array_index_nospec(scno, sc_nr)];
@@ -51,6 +54,13 @@ static void invoke_syscall(struct pt_regs *regs, unsigned int scno,
 	}

 	regs->regs[0] = ret;
+
+	/*
+	 * Since the compiler chooses a 4 bit alignment for the stack,
+	 * let's save one additional bit (9 total), which gets us up
+	 * near 5 bits of entropy.
+	 */
+	choose_random_kstack_offset(get_random_int() & 0x1FF);
 }

 static inline bool has_syscall_work(unsigned long flags)
Allow for a randomized stack offset on a per-syscall basis, with roughly
5 bits of entropy.

Signed-off-by: Kees Cook <keescook@chromium.org>
---
 arch/arm64/Kconfig          |  1 +
 arch/arm64/kernel/syscall.c | 10 ++++++++++
 2 files changed, 11 insertions(+)