Message ID | 20240327061258.2370291-1-sorear@fastmail.com |
---|---|
State | Accepted |
Commit | d14fa1fcf69db9d070e75f1c4425211fa619dfc8 |
Series | riscv: process: Fix kernel gp leakage |
Hi Stefan,

On Wed, Mar 27, 2024 at 2:14 PM Stefan O'Rear <sorear@fastmail.com> wrote:
>
> childregs represents the registers which are active for the new thread
> in user context. For a kernel thread, childregs->gp is never used since
> the kernel gp is not touched by switch_to. For a user mode helper, the
> gp value can be observed in user space after execve or possibly by other
> means.
>
> [...]

Can you help express in more detail what the problem was before fixing it?

Thanks,
Yunhui
On Wed, Mar 27, 2024, at 4:43 AM, yunhui cui wrote:
> Hi Stefan,
>
> On Wed, Mar 27, 2024 at 2:14 PM Stefan O'Rear <sorear@fastmail.com> wrote:
>> [...]
>
> Can you help express in more detail what the problem was before fixing it?

It's a KASLR bypass, since gp_in_global is the address of the kernel symbol
__global_pointer$.

The /* Kernel thread */ comment is somewhat inaccurate in that it is also used
for user_mode_helper threads, which exec a user process, e.g. /sbin/init or
when /proc/sys/kernel/core_pattern is a pipe. Such threads do not have
PF_KTHREAD set and are valid targets for ptrace etc. even before they exec.
childregs is the *user* context during syscall execution and it is observable
from userspace in at least five ways:

1. kernel_execve does not currently clear integer registers, so the starting
   register state for PID 1 and other user processes started by the kernel has
   sp = user stack, gp = kernel __global_pointer$, all other integer registers
   zeroed by the memset in the patch comment.

   This is a bug in its own right, but I'm unwilling to bet that it is the only
   way to exploit the issue addressed by this patch.

2. ptrace(PTRACE_GETREGSET): you can PTRACE_ATTACH to a user_mode_helper thread
   before it execs, but ptrace requires SIGSTOP to be delivered which can only
   happen at user/kernel boundaries.

3. /proc/*/task/*/syscall: this is perfectly happy to read pt_regs for
   user_mode_helpers before the exec completes, but gp is not one of the
   registers it returns.

4. PERF_SAMPLE_REGS_USER: LOCKDOWN_PERF normally prevents access to kernel
   addresses via PERF_SAMPLE_REGS_INTR, but due to this bug kernel addresses
   are also exposed via PERF_SAMPLE_REGS_USER which is permitted under
   LOCKDOWN_PERF. I have not attempted to write exploit code.

5. Much of the tracing infrastructure allows access to user registers. I have
   not attempted to determine which forms of tracing allow access to user
   registers without already allowing access to kernel registers.

Does this help? How much of this should be in the commit message?

-s

> Thanks,
> Yunhui
>
> _______________________________________________
> linux-riscv mailing list
> linux-riscv@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-riscv
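To make point 3 concrete, here is a minimal sketch of a parser for one line of /proc/<pid>/task/<tid>/syscall. It assumes the format printed by proc_pid_syscall() in fs/proc/base.c as we understand it: either `running`, a long form with the syscall number, six arguments, the user sp and the user pc, or a short form `-1 sp pc` when the task is blocked outside a syscall. The struct and function names are hypothetical. The point of the sketch is that the layout has no slot for gp, so this path cannot expose the leaked value.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical container for one line of /proc/<pid>/task/<tid>/syscall.
 * The file reports the syscall number, its six arguments, the user stack
 * pointer and the user program counter; there is no field for gp. */
struct syscall_snapshot {
	long nr;               /* syscall number, -1 if blocked outside a syscall */
	unsigned long args[6]; /* syscall arguments, valid only if nargs == 6 */
	unsigned long sp;      /* user stack pointer */
	unsigned long pc;      /* user program counter */
	int nargs;             /* 6 for the long form, 0 for the short form */
};

/* Parses one line. Returns 0 on success, 1 if the task is currently
 * running (no snapshot available) and -1 if the line is not understood. */
static int parse_syscall_line(const char *line, struct syscall_snapshot *s)
{
	assert(line != NULL && s != NULL);
	memset(s, 0, sizeof(*s));

	if (strncmp(line, "running", 7) == 0)
		return 1;

	/* Long form: "nr 0xa0 0xa1 0xa2 0xa3 0xa4 0xa5 0xsp 0xpc" */
	if (sscanf(line, "%ld 0x%lx 0x%lx 0x%lx 0x%lx 0x%lx 0x%lx 0x%lx 0x%lx",
		   &s->nr, &s->args[0], &s->args[1], &s->args[2], &s->args[3],
		   &s->args[4], &s->args[5], &s->sp, &s->pc) == 9) {
		s->nargs = 6;
		return 0;
	}

	/* Short form, used when not inside a syscall: "-1 0xsp 0xpc" */
	memset(s, 0, sizeof(*s));
	if (sscanf(line, "%ld 0x%lx 0x%lx", &s->nr, &s->sp, &s->pc) == 3)
		return 0;

	return -1;
}
```

In practice the line would be read from the file with fgets(); whichever form is returned, a snapshot of a user_mode_helper taken before its exec completes yields sp and pc, never gp.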
Hi Stefan,

On Thu, Mar 28, 2024 at 12:54 AM Stefan O'Rear <sorear@fastmail.com> wrote:
>
> On Wed, Mar 27, 2024, at 4:43 AM, yunhui cui wrote:
> > [...]
> > Can you help express in more detail what the problem was before fixing it?
>
> It's a KASLR bypass, since gp_in_global is the address of the kernel symbol
> __global_pointer$.
>
> [...]
>
> Does this help? How much of this should be in the commit message?

Fine enough, Thanks.

Thanks,
Yunhui
Hi Stefan,

On 27/03/2024 17:53, Stefan O'Rear wrote:
> On Wed, Mar 27, 2024, at 4:43 AM, yunhui cui wrote:
>> Hi Stefan,
>>
>> On Wed, Mar 27, 2024 at 2:14 PM Stefan O'Rear <sorear@fastmail.com> wrote:
>>> [...]
>> Can you help express in more detail what the problem was before fixing it?
> It's a KASLR bypass, since gp_in_global is the address of the kernel symbol
> __global_pointer$.
>
> [...]
>
> 1. kernel_execve does not currently clear integer registers, so the starting
> register state for PID 1 and other user processes started by the kernel has
> sp = user stack, gp = kernel __global_pointer$, all other integer registers
> zeroed by the memset in the patch comment.


So as I did not know this path really well, I played a bit and I can confirm
that usermode processes reach userspace with the gp = kernel:

    Thread 1 hit Breakpoint 12, 0x00007fff82487fc4 in ?? ()
    1: x/i $pc
    => 0x7fff82487fc4: mv a0,sp
    3: /x $gp = 0xffffffff817fee50


> [...]
>
> Does this help? How much of this should be in the commit message?


I'd put them all, but up to you, at least the first usecase that I was able
to reproduce should be added to the commit log.
You can add:

Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>

And this should go to -fixes.

Thanks,

Alex
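Why a leaked gp amounts to a KASLR bypass can be shown with a little address arithmetic. This sketch assumes that KASLR slides the whole kernel image by a single offset and that the attacker knows the link-time address of __global_pointer$ for the running kernel build (for example from its System.map). The helper names are hypothetical, and so are the link-time addresses used below; only 0xffffffff817fee50 and 0x00007fff82487fc4 come from Alexandre's gdb session.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* On RV64 the kernel lives in the upper half of the virtual address space,
 * so a (sign-extended) kernel address has bit 63 set. */
static bool looks_like_kernel_address(uint64_t v)
{
	return (v >> 63) != 0;
}

/* Assuming KASLR shifts the whole kernel image by one offset, the offset is
 * the leaked runtime address of a symbol minus its link-time address. */
static uint64_t kaslr_offset(uint64_t leaked_runtime, uint64_t link_time)
{
	assert(looks_like_kernel_address(leaked_runtime));
	return leaked_runtime - link_time;
}

/* Every other kernel symbol then moves by the same amount. */
static uint64_t relocate(uint64_t link_time_sym, uint64_t offset)
{
	return link_time_sym + offset;
}
```

With a hypothetical link-time __global_pointer$ of 0xffffffff815fee50, kaslr_offset(0xffffffff817fee50, 0xffffffff815fee50) gives 0x200000, so a symbol linked at (hypothetical) 0xffffffff80a00000 would sit at 0xffffffff80c00000. The user-space pc 0x00007fff82487fc4 from the same session fails looks_like_kernel_address(), as expected.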
On Tue, 02 Apr 2024 02:21:15 PDT (-0700), alex@ghiti.fr wrote:
> Hi Stefan,
>
> On 27/03/2024 17:53, Stefan O'Rear wrote:
>> On Wed, Mar 27, 2024, at 4:43 AM, yunhui cui wrote:
>> [...]
>> Does this help? How much of this should be in the commit message?
>
>
> I'd put them all, but up to you, at least the first usecase that I was
> able to reproduce should be added to the commit log.

I just pasted it all in the commit, it seems generally useful for people
running into the commit. With the Link tags maybe it's less important these
days, but I always just err on the side of putting more stuff in the commit
messages.

> You can add:
>
> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
>
> And this should go to -fixes.

It's queued up for the tester.

> Thanks,
>
> Alex
Hello:

This patch was applied to riscv/linux.git (fixes)
by Palmer Dabbelt <palmer@rivosinc.com>:

On Wed, 27 Mar 2024 02:12:58 -0400 you wrote:
> childregs represents the registers which are active for the new thread
> in user context. For a kernel thread, childregs->gp is never used since
> the kernel gp is not touched by switch_to. For a user mode helper, the
> gp value can be observed in user space after execve or possibly by other
> means.
>
> Fixes: 7db91e57a0ac ("RISC-V: Task implementation")
> Signed-off-by: Stefan O'Rear <sorear@fastmail.com>
>
> [...]

Here is the summary with links:
  - riscv: process: Fix kernel gp leakage
    https://git.kernel.org/riscv/c/d14fa1fcf69d

You are awesome, thank you!
diff --git a/arch/riscv/kernel/process.c b/arch/riscv/kernel/process.c
index 92922dbd5b5c..51042f48da17 100644
--- a/arch/riscv/kernel/process.c
+++ b/arch/riscv/kernel/process.c
@@ -27,8 +27,6 @@
 #include <asm/vector.h>
 #include <asm/cpufeature.h>
 
-register unsigned long gp_in_global __asm__("gp");
-
 #if defined(CONFIG_STACKPROTECTOR) && !defined(CONFIG_STACKPROTECTOR_PER_TASK)
 #include <linux/stackprotector.h>
 unsigned long __stack_chk_guard __read_mostly;
@@ -207,7 +205,6 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
 	if (unlikely(args->fn)) {
 		/* Kernel thread */
 		memset(childregs, 0, sizeof(struct pt_regs));
-		childregs->gp = gp_in_global;
 		/* Supervisor/Machine, irqs on: */
 		childregs->status = SR_PP | SR_PIE;
 
childregs represents the registers which are active for the new thread
in user context. For a kernel thread, childregs->gp is never used since
the kernel gp is not touched by switch_to. For a user mode helper, the
gp value can be observed in user space after execve or possibly by other
means.

Fixes: 7db91e57a0ac ("RISC-V: Task implementation")
Signed-off-by: Stefan O'Rear <sorear@fastmail.com>
---
 arch/riscv/kernel/process.c | 3 ---
 1 file changed, 3 deletions(-)
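The mechanism described in the commit message can be modelled in a few lines of portable C. This is a toy sketch, not kernel code: struct toy_pt_regs, toy_copy_thread_fn() and toy_start_thread() are hypothetical names, TOY_KERNEL_GP simply reuses the value Alexandre observed under gdb, and the model assumes, as Stefan describes in the thread, that exec from a user-mode helper only sets up pc and sp and leaves the other integer registers, including gp, as copy_thread() left them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, heavily simplified stand-in for struct pt_regs: only the
 * registers that matter for this patch are modelled. */
struct toy_pt_regs {
	uint64_t epc; /* user program counter */
	uint64_t sp;  /* user stack pointer */
	uint64_t gp;  /* global pointer */
};

/* Stand-in for the kernel's __global_pointer$ (value seen under gdb). */
#define TOY_KERNEL_GP 0xffffffff817fee50ULL

/* Models the args->fn branch of copy_thread(), which serves both kernel
 * threads and user-mode helpers. leak_gp = true reproduces the pre-fix
 * "childregs->gp = gp_in_global;" line. */
static void toy_copy_thread_fn(struct toy_pt_regs *childregs, bool leak_gp)
{
	memset(childregs, 0, sizeof(*childregs));
	if (leak_gp)
		childregs->gp = TOY_KERNEL_GP;
}

/* Models exec from a user-mode helper: only pc and sp are set, the other
 * integer registers keep whatever copy_thread() put there. */
static void toy_start_thread(struct toy_pt_regs *regs, uint64_t pc, uint64_t sp)
{
	assert(regs != NULL);
	regs->epc = pc;
	regs->sp = sp;
}
```

Calling toy_copy_thread_fn(&r, true) and then toy_start_thread() leaves r.gp equal to TOY_KERNEL_GP, which is what user space saw at its entry point before the fix; with leak_gp = false, matching this patch, r.gp stays 0 until the program's own startup code loads its gp.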