Message ID | 20220130211838.8382-23-rick.p.edgecombe@intel.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | Shadow stacks for userspace | expand |
On 1/30/22 13:18, Rick Edgecombe wrote: > Shadow stack accesses are writes from handle_mm_fault() perspective. So to > generate the correct PTE, maybe_mkwrite() will rely on the presence of > VM_SHADOW_STACK or VM_WRITE in the vma. > > In future patches, when VM_SHADOW_STACK is actually creatable by > userspace, a problem could happen if a user calls > mprotect( , , PROT_WRITE) on VM_SHADOW_STACK shadow stack memory. The code > would then be confused in the event of shadow stack accesses, and create a > writable PTE for a shadow stack access. Then the process would fault in a > loop. > > Prevent this from happening by blocking this kind of memory (VM_WRITE and > VM_SHADOW_STACK) from being created, instead of complicating the fault > handler logic to handle it. > > Add an x86 arch_validate_flags() implementation to handle the check. > Rename the uapi/asm/mman.h header guard to be able to use it for > arch/x86/include/asm/mman.h where the arch_validate_flags() will be. It would be great if this also said: There is an existing arch_validate_flags() hook for mmap() and mprotect() which allows architectures to reject unwanted ->vm_flags combinations. Add an implementation for x86. That's somewhat implied from what is there already, but making it more clear would be nice. There's a much higher bar to add a new arch hook than to just implement an existing one. > diff --git a/arch/x86/include/asm/mman.h b/arch/x86/include/asm/mman.h > new file mode 100644 > index 000000000000..b44fe31deb3a > --- /dev/null > +++ b/arch/x86/include/asm/mman.h > @@ -0,0 +1,21 @@ > +/* SPDX-License-Identifier: GPL-2.0 */ > +#ifndef _ASM_X86_MMAN_H > +#define _ASM_X86_MMAN_H > + > +#include <linux/mm.h> > +#include <uapi/asm/mman.h> > + > +#ifdef CONFIG_X86_SHADOW_STACK > +static inline bool arch_validate_flags(unsigned long vm_flags) > +{ > + if ((vm_flags & VM_SHADOW_STACK) && (vm_flags & VM_WRITE)) > + return false; > + > + return true; > +} The design decision here seems to be that VM_SHADOW_STACK is itself a pseudo-VM_WRITE flag. Like you said: "Shadow stack accesses are writes from handle_mm_fault()". Very early on, this series seems to have made the decision that shadow stacks are writable and need lots of write handling behavior, *BUT* shouldn't have VM_WRITE set. As a whole, that seems odd. The alternative would be *requiring* VM_WRITE and VM_SHADOW_STACK be set together. I guess the downside is that pte_mkwrite() would need to be made to work on shadow stack PTEs. That particular design decision was never discussed. I think it has a really big impact on the rest of the series. What do you think? Was it a good idea? Or would the alternative be more complicated than what you have now?
On Fri, 2022-02-11 at 14:19 -0800, Dave Hansen wrote: > On 1/30/22 13:18, Rick Edgecombe wrote: > > Shadow stack accesses are writes from handle_mm_fault() > > perspective. So to > > generate the correct PTE, maybe_mkwrite() will rely on the presence > > of > > VM_SHADOW_STACK or VM_WRITE in the vma. > > > > In future patches, when VM_SHADOW_STACK is actually creatable by > > userspace, a problem could happen if a user calls > > mprotect( , , PROT_WRITE) on VM_SHADOW_STACK shadow stack memory. > > The code > > would then be confused in the event of shadow stack accesses, and > > create a > > writable PTE for a shadow stack access. Then the process would > > fault in a > > loop. > > > > Prevent this from happening by blocking this kind of memory > > (VM_WRITE and > > VM_SHADOW_STACK) from being created, instead of complicating the > > fault > > handler logic to handle it. > > > > Add an x86 arch_validate_flags() implementation to handle the > > check. > > Rename the uapi/asm/mman.h header guard to be able to use it for > > arch/x86/include/asm/mman.h where the arch_validate_flags() will > > be. > > It would be great if this also said: > > There is an existing arch_validate_flags() hook for mmap() and > mprotect() which allows architectures to reject unwanted > ->vm_flags combinations. Add an implementation for x86. > > That's somewhat implied from what is there already, but making it > more > clear would be nice. There's a much higher bar to add a new arch > hook > than to just implement an existing one. Ok, makes sense. > > > > diff --git a/arch/x86/include/asm/mman.h > > b/arch/x86/include/asm/mman.h > > new file mode 100644 > > index 000000000000..b44fe31deb3a > > --- /dev/null > > +++ b/arch/x86/include/asm/mman.h > > @@ -0,0 +1,21 @@ > > +/* SPDX-License-Identifier: GPL-2.0 */ > > +#ifndef _ASM_X86_MMAN_H > > +#define _ASM_X86_MMAN_H > > + > > +#include <linux/mm.h> > > +#include <uapi/asm/mman.h> > > + > > +#ifdef CONFIG_X86_SHADOW_STACK > > +static inline bool arch_validate_flags(unsigned long vm_flags) > > +{ > > + if ((vm_flags & VM_SHADOW_STACK) && (vm_flags & VM_WRITE)) > > + return false; > > + > > + return true; > > +} > > The design decision here seems to be that VM_SHADOW_STACK is itself a > pseudo-VM_WRITE flag. Like you said: "Shadow stack accesses are > writes > from handle_mm_fault()". > > Very early on, this series seems to have made the decision that > shadow > stacks are writable and need lots of write handling behavior, *BUT* > shouldn't have VM_WRITE set. As a whole, that seems odd. > > The alternative would be *requiring* VM_WRITE and VM_SHADOW_STACK be > set > together. I guess the downside is that pte_mkwrite() would need to > be > made to work on shadow stack PTEs. > > That particular design decision was never discussed. I think it has > a > really big impact on the rest of the series. What do you think? Was > it > a good idea? Or would the alternative be more complicated than what > you > have now? First of all, thanks again for the deep review of the MM piece. I'm still pondering the overall problem, which is why I haven't responded to those yet. I had originally thought that the MM changes were a bit hard to follow. I was also somewhat amazed at how naturally normal COW worked. I was wondering where the big COW stuff would be happening. In the way that COW was sort of tucked away, overloading writability seemed sort of aligned. But the names are very confusing, and this patch probably should have been a hint that there are problems design wise. For writability, especially with WRSS, I do think it's a bit unnatural to think of shadow stack memory as anything but writable. Especially when it comes to COW. But shadow stack accesses are not always writes, incssp for example. The code will create shadow stack memory for shadow stack access loads, which of course isn't writing anything, but is required to make the instruction work. So it calls mkwrite(), which is weird. But... it does need to leave it in a state that is kind of writable, so makes a little sense I guess. I was wondering if maybe the mm code can't be fully sensible for shadow stacks without creating maybe_mkshstk() and adding it everywhere in a whole new fault path. Then you have reads, writes and shadow stack accesses that each have their own logic. It might require so many additions that better names and comments are preferable. I don't know though, still trying to come up with a good opinion.
diff --git a/arch/x86/include/asm/mman.h b/arch/x86/include/asm/mman.h new file mode 100644 index 000000000000..b44fe31deb3a --- /dev/null +++ b/arch/x86/include/asm/mman.h @@ -0,0 +1,21 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _ASM_X86_MMAN_H +#define _ASM_X86_MMAN_H + +#include <linux/mm.h> +#include <uapi/asm/mman.h> + +#ifdef CONFIG_X86_SHADOW_STACK +static inline bool arch_validate_flags(unsigned long vm_flags) +{ + if ((vm_flags & VM_SHADOW_STACK) && (vm_flags & VM_WRITE)) + return false; + + return true; +} + +#define arch_validate_flags(vm_flags) arch_validate_flags(vm_flags) + +#endif /* CONFIG_X86_SHADOW_STACK */ + +#endif /* _ASM_X86_MMAN_H */ diff --git a/arch/x86/include/uapi/asm/mman.h b/arch/x86/include/uapi/asm/mman.h index d4a8d0424bfb..9704e27c4d24 100644 --- a/arch/x86/include/uapi/asm/mman.h +++ b/arch/x86/include/uapi/asm/mman.h @@ -1,6 +1,6 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ -#ifndef _ASM_X86_MMAN_H -#define _ASM_X86_MMAN_H +#ifndef _UAPI_ASM_X86_MMAN_H +#define _UAPI_ASM_X86_MMAN_H #define MAP_32BIT 0x40 /* only give out 32bit addresses */ @@ -28,4 +28,4 @@ #include <asm-generic/mman.h> -#endif /* _ASM_X86_MMAN_H */ +#endif /* _UAPI_ASM_X86_MMAN_H */
Shadow stack accesses are writes from handle_mm_fault() perspective. So to generate the correct PTE, maybe_mkwrite() will rely on the presence of VM_SHADOW_STACK or VM_WRITE in the vma. In future patches, when VM_SHADOW_STACK is actually creatable by userspace, a problem could happen if a user calls mprotect( , , PROT_WRITE) on VM_SHADOW_STACK shadow stack memory. The code would then be confused in the event of shadow stack accesses, and create a writable PTE for a shadow stack access. Then the process would fault in a loop. Prevent this from happening by blocking this kind of memory (VM_WRITE and VM_SHADOW_STACK) from being created, instead of complicating the fault handler logic to handle it. Add an x86 arch_validate_flags() implementation to handle the check. Rename the uapi/asm/mman.h header guard to be able to use it for arch/x86/include/asm/mman.h where the arch_validate_flags() will be. Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com> --- v1: - New patch. arch/x86/include/asm/mman.h | 21 +++++++++++++++++++++ arch/x86/include/uapi/asm/mman.h | 6 +++--- 2 files changed, 24 insertions(+), 3 deletions(-) create mode 100644 arch/x86/include/asm/mman.h