Message ID | 20220929222936.14584-1-rick.p.edgecombe@intel.com (mailing list archive) |
---|---|
Headers | show |
Series | Shadowstacks for userspace | expand |
On Thu, Sep 29, 2022 at 03:28:57PM -0700, Rick Edgecombe wrote: > This is an overdue followup to the “Shadow stacks for userspace” CET series. > Thanks for all the comments on the first version [0]. They drove a decent > amount of changes for v2. Since it has been awhile, I’ll try to summarize the > areas that got major changes since last time. Smaller changes are listed in > each patch. Thanks for the write-up! > [...] > GUP > --- > Shadow stack memory is generally treated as writable by the kernel, but > it behaves differently then other writable memory with respect to GUP. > FOLL_WRITE will not GUP shadow stack memory unless FOLL_FORCE is also > set. Shadow stack memory is writable from the perspective of being > changeable by userspace, but it is also protected memory from > userspace’s perspective. So preventing it from being writable via > FOLL_WRITE help’s make it harder for userspace to arbitrarily write to > it. However, like read-only memory, FOLL_FORCE can still write through > it. This means shadow stacks can be written to via things like > “/proc/self/mem”. Apps that want extra security will have to prevent > access to kernel features that can write with FOLL_FORCE. This seems like a problem to me -- the point of SS is that there cannot be a way to write to them without specific instruction sequences. The fact that /proc/self/mem bypasses memory protections was an old design mistake that keeps leading to surprising behaviors. It would be much nicer to draw the line somewhere and just say that FOLL_FORCE doesn't work on VM_SHADOW_STACK. Why must FOLL_FORCE be allowed to write to SS? > [...] > Shadow stack signal format > -------------------------- > So to handle alt shadow stacks we need to push some data onto a stack. To > prevent SROP we need to push something to the shadow stack that the kernel can > [...] > shadow stack return address or a shadow stack tokens. To make sure it can’t be > used, data is pushed with the high bit (bit 63) set. This bit is a linear > address bit in both the token format and a normal return address, so it should > not conflict with anything. It puts any return address in the kernel half of > the address space, so would never be created naturally by a userspace program. > It will not be a valid restore token either, as the kernel address will never > be pointing to the previous frame in the shadow stack. > > When a signal hits, the format pushed to the stack that is handling the signal > is four 8 byte values (since we are 64 bit only): > |1...old SSP|1...alt stack size|1...alt stack base|0| Do these end up being non-canonical addresses? (To avoid confusion with "real" kernel addresses?) -Kees
On Mon, Oct 3, 2022 at 7:04 PM Kees Cook <keescook@chromium.org> wrote: > On Thu, Sep 29, 2022 at 03:28:57PM -0700, Rick Edgecombe wrote: > > This is an overdue followup to the “Shadow stacks for userspace” CET series. > > Thanks for all the comments on the first version [0]. They drove a decent > > amount of changes for v2. Since it has been awhile, I’ll try to summarize the > > areas that got major changes since last time. Smaller changes are listed in > > each patch. > > Thanks for the write-up! > > > [...] > > GUP > > --- > > Shadow stack memory is generally treated as writable by the kernel, but > > it behaves differently then other writable memory with respect to GUP. > > FOLL_WRITE will not GUP shadow stack memory unless FOLL_FORCE is also > > set. Shadow stack memory is writable from the perspective of being > > changeable by userspace, but it is also protected memory from > > userspace’s perspective. So preventing it from being writable via > > FOLL_WRITE help’s make it harder for userspace to arbitrarily write to > > it. However, like read-only memory, FOLL_FORCE can still write through > > it. This means shadow stacks can be written to via things like > > “/proc/self/mem”. Apps that want extra security will have to prevent > > access to kernel features that can write with FOLL_FORCE. > > This seems like a problem to me -- the point of SS is that there cannot be > a way to write to them without specific instruction sequences. The fact > that /proc/self/mem bypasses memory protections was an old design mistake > that keeps leading to surprising behaviors. It would be much nicer to > draw the line somewhere and just say that FOLL_FORCE doesn't work on > VM_SHADOW_STACK. Why must FOLL_FORCE be allowed to write to SS? But once you have FOLL_FORCE, you can also just write over stuff like executable code instead of writing over the stack. I don't think allowing FOLL_FORCE writes over shadow stacks from /proc/$pid/mem is making things worse in any way, and it's probably helpful for stuff like debuggers. If you don't want /proc/$pid/mem to be able to do stuff like that, then IMO the way to go is to change when /proc/$pid/mem uses FOLL_FORCE, or to limit overall write access to /proc/$pid/mem.
On Mon, 2022-10-03 at 10:04 -0700, Kees Cook wrote: > > Shadow stack signal format > > -------------------------- > > So to handle alt shadow stacks we need to push some data onto a > > stack. To > > prevent SROP we need to push something to the shadow stack that the > > kernel can > > [...] > > shadow stack return address or a shadow stack tokens. To make sure > > it can’t be > > used, data is pushed with the high bit (bit 63) set. This bit is a > > linear > > address bit in both the token format and a normal return address, > > so it should > > not conflict with anything. It puts any return address in the > > kernel half of > > the address space, so would never be created naturally by a > > userspace program. > > It will not be a valid restore token either, as the kernel address > > will never > > be pointing to the previous frame in the shadow stack. > > > > When a signal hits, the format pushed to the stack that is handling > > the signal > > is four 8 byte values (since we are 64 bit only): > > > 1...old SSP|1...alt stack size|1...alt stack base|0| > > Do these end up being non-canonical addresses? (To avoid confusion > with > "real" kernel addresses?) Usually, but not necessarily with LAM. LAM cannot mask bit 63 though. So hypothetically they could become "real" kernel addresses some day. To keep them in the user half but still make sure they are not usable, you would either have to encode the bits over a lot of entries which would use extra space, or shrink the available address space, which could cause compatibility problems. Do you think it's an issue?
On Mon, Oct 03, 2022 at 06:33:52PM +0000, Edgecombe, Rick P wrote: > On Mon, 2022-10-03 at 10:04 -0700, Kees Cook wrote: > > > Shadow stack signal format > > > -------------------------- > > > So to handle alt shadow stacks we need to push some data onto a > > > stack. To > > > prevent SROP we need to push something to the shadow stack that the > > > kernel can > > > [...] > > > shadow stack return address or a shadow stack tokens. To make sure > > > it can’t be > > > used, data is pushed with the high bit (bit 63) set. This bit is a > > > linear > > > address bit in both the token format and a normal return address, > > > so it should > > > not conflict with anything. It puts any return address in the > > > kernel half of > > > the address space, so would never be created naturally by a > > > userspace program. > > > It will not be a valid restore token either, as the kernel address > > > will never > > > be pointing to the previous frame in the shadow stack. > > > > > > When a signal hits, the format pushed to the stack that is handling > > > the signal > > > is four 8 byte values (since we are 64 bit only): > > > > 1...old SSP|1...alt stack size|1...alt stack base|0| > > > > Do these end up being non-canonical addresses? (To avoid confusion > > with > > "real" kernel addresses?) > > Usually, but not necessarily with LAM. LAM cannot mask bit 63 though. > So hypothetically they could become "real" kernel addresses some day. > To keep them in the user half but still make sure they are not usable, > you would either have to encode the bits over a lot of entries which > would use extra space, or shrink the available address space, which > could cause compatibility problems. > > Do you think it's an issue? Nah; I think it's a good solution. I was just trying to make sure I understood it correctly. Thanks!
On Mon, Oct 03, 2022 at 07:25:03PM +0200, Jann Horn wrote: > On Mon, Oct 3, 2022 at 7:04 PM Kees Cook <keescook@chromium.org> wrote: > > On Thu, Sep 29, 2022 at 03:28:57PM -0700, Rick Edgecombe wrote: > > > This is an overdue followup to the “Shadow stacks for userspace” CET series. > > > Thanks for all the comments on the first version [0]. They drove a decent > > > amount of changes for v2. Since it has been awhile, I’ll try to summarize the > > > areas that got major changes since last time. Smaller changes are listed in > > > each patch. > > > > Thanks for the write-up! > > > > > [...] > > > GUP > > > --- > > > Shadow stack memory is generally treated as writable by the kernel, but > > > it behaves differently then other writable memory with respect to GUP. > > > FOLL_WRITE will not GUP shadow stack memory unless FOLL_FORCE is also > > > set. Shadow stack memory is writable from the perspective of being > > > changeable by userspace, but it is also protected memory from > > > userspace’s perspective. So preventing it from being writable via > > > FOLL_WRITE help’s make it harder for userspace to arbitrarily write to > > > it. However, like read-only memory, FOLL_FORCE can still write through > > > it. This means shadow stacks can be written to via things like > > > “/proc/self/mem”. Apps that want extra security will have to prevent > > > access to kernel features that can write with FOLL_FORCE. > > > > This seems like a problem to me -- the point of SS is that there cannot be > > a way to write to them without specific instruction sequences. The fact > > that /proc/self/mem bypasses memory protections was an old design mistake > > that keeps leading to surprising behaviors. It would be much nicer to > > draw the line somewhere and just say that FOLL_FORCE doesn't work on > > VM_SHADOW_STACK. Why must FOLL_FORCE be allowed to write to SS? > > But once you have FOLL_FORCE, you can also just write over stuff like > executable code instead of writing over the stack. I don't think > allowing FOLL_FORCE writes over shadow stacks from /proc/$pid/mem is > making things worse in any way, and it's probably helpful for stuff > like debuggers. > > If you don't want /proc/$pid/mem to be able to do stuff like that, > then IMO the way to go is to change when /proc/$pid/mem uses > FOLL_FORCE, or to limit overall write access to /proc/$pid/mem. Yeah, all reasonable. I just wish we could ditch FOLL_FORCE; it continues to weird me out how powerful that fd's side-effects are.
From: Kees Cook <keescook@chromium.org> ... > > > > If you don't want /proc/$pid/mem to be able to do stuff like that, > > then IMO the way to go is to change when /proc/$pid/mem uses > > FOLL_FORCE, or to limit overall write access to /proc/$pid/mem. > > Yeah, all reasonable. I just wish we could ditch FOLL_FORCE; it continues > to weird me out how powerful that fd's side-effects are. Could you remove FOLL_FORCE from /proc/$pid/mem and add a /proc/$pid/mem_force that enable FOLL_FORCE but requires root (or similar) access. Although I suspect gdb may like to have write access to code? David - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)
On Tue, Oct 04, 2022 at 09:57:48AM +0000, David Laight wrote: > From: Kees Cook <keescook@chromium.org> > ... > > > > > > If you don't want /proc/$pid/mem to be able to do stuff like that, > > > then IMO the way to go is to change when /proc/$pid/mem uses > > > FOLL_FORCE, or to limit overall write access to /proc/$pid/mem. > > > > Yeah, all reasonable. I just wish we could ditch FOLL_FORCE; it continues > > to weird me out how powerful that fd's side-effects are. > > Could you remove FOLL_FORCE from /proc/$pid/mem and add a > /proc/$pid/mem_force that enable FOLL_FORCE but requires root > (or similar) access. > > Although I suspect gdb may like to have write access to > code? As Jann has reminded me out of band, while FOLL_FORCE is still worrisome, it's really /proc/$pid/mem access at all without an active ptrace attachment (and to self). Here's my totally untested idea to require access to /proc/$pid/mem having an established ptrace relationship: diff --git a/include/linux/ptrace.h b/include/linux/ptrace.h index c952c5ba8fab..0393741eeabb 100644 --- a/include/linux/ptrace.h +++ b/include/linux/ptrace.h @@ -64,6 +64,7 @@ extern void exit_ptrace(struct task_struct *tracer, struct list_head *dead); #define PTRACE_MODE_NOAUDIT 0x04 #define PTRACE_MODE_FSCREDS 0x08 #define PTRACE_MODE_REALCREDS 0x10 +#define PTRACE_MODE_ATTACHED 0x20 /* shorthands for READ/ATTACH and FSCREDS/REALCREDS combinations */ #define PTRACE_MODE_READ_FSCREDS (PTRACE_MODE_READ | PTRACE_MODE_FSCREDS) diff --git a/fs/proc/base.c b/fs/proc/base.c index 93f7e3d971e4..fadec587d133 100644 --- a/fs/proc/base.c +++ b/fs/proc/base.c @@ -826,7 +826,7 @@ static int __mem_open(struct inode *inode, struct file *file, unsigned int mode) static int mem_open(struct inode *inode, struct file *file) { - int ret = __mem_open(inode, file, PTRACE_MODE_ATTACH); + int ret = __mem_open(inode, file, PTRACE_MODE_ATTACHED); /* OK to pass negative loff_t, we can catch out-of-range */ file->f_mode |= FMODE_UNSIGNED_OFFSET; diff --git a/kernel/ptrace.c b/kernel/ptrace.c index 1893d909e45c..c97e6d734ae5 100644 --- a/kernel/ptrace.c +++ b/kernel/ptrace.c @@ -304,6 +304,12 @@ static int __ptrace_may_access(struct task_struct *task, unsigned int mode) * or halting the specified task is impossible. */ + /* + * If an existing ptrace relationship is required, not even + * introspection is allowed. + */ + if ((mode & PTRACE_MODE_ATTACHED) && ptrace_parent(task) != current) + return -EPERM; /* Don't let security modules deny introspection */ if (same_thread_group(task, current)) return 0;