Message ID | 20221203003606.6838-19-rick.p.edgecombe@intel.com (mailing list archive)
---|---
State | New
Series | Shadow stacks for userspace
On Fri, Dec 02, 2022 at 04:35:45PM -0800, Rick Edgecombe wrote:
> From: Yu-cheng Yu <yu-cheng.yu@intel.com>
>
> The x86 Control-flow Enforcement Technology (CET) feature includes a new
> type of memory called shadow stack. This shadow stack memory has some
> unusual properties, which require some core mm changes to function
> properly.
>
> With the introduction of shadow stack memory there are two ways a pte can
> be writable: regular writable memory and shadow stack memory.
>
> In past patches, maybe_mkwrite() has been updated to apply pte_mkwrite()
> or pte_mkwrite_shstk() depending on the VMA flag. This covers most cases
> where a PTE is made writable. However, there are places where
> pte_mkwrite() is called directly, and the logic should now also create a
> shadow stack PTE in the case of a shadow stack VMA.
>
> - do_anonymous_page() and migrate_vma_insert_page() check VM_WRITE
>   directly and call pte_mkwrite(). Teach them about pte_mkwrite_shstk().
>
> - When userfaultfd is creating a PTE after userspace handles the fault
>   it calls pte_mkwrite() directly. Teach it about pte_mkwrite_shstk().
>
> To make the code cleaner, introduce is_shstk_write(), which simplifies
> checking for VM_WRITE | VM_SHADOW_STACK together.
>
> In other cases where pte_mkwrite() is called directly, the VMA will not
> be VM_SHADOW_STACK, and so shadow stack memory should not be created.
> - In the case of pte_savedwrite(), shadow stack VMAs are excluded.
> - In the case of the "dirty_accountable" optimization in mprotect(),
>   shadow stack VMAs won't be VM_SHARED, so it is not necessary.
>
> Tested-by: Pengfei Xu <pengfei.xu@intel.com>
> Tested-by: John Allen <john.allen@amd.com>
> Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>

Reviewed-by: Kees Cook <keescook@chromium.org>
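For readers following along, a compact sketch of the three-way dispatch the commit message describes. The helper name make_writable_pte() is hypothetical (the patch open-codes this pattern at each call site, as the hunks below show); is_shstk_write(), pte_mkwrite() and pte_mkwrite_shstk() are the kernel functions from the series. Kernel-context code, not standalone:

```c
/*
 * Sketch only, not part of the patch. On x86 CET, a shadow stack PTE is
 * encoded as Write=0,Dirty=1, so a shadow stack VMA must use
 * pte_mkwrite_shstk() rather than pte_mkwrite().
 */
static pte_t make_writable_pte(pte_t entry, unsigned long vm_flags)
{
	if (is_shstk_write(vm_flags))
		/* VM_WRITE | VM_SHADOW_STACK: shadow stack encoding */
		entry = pte_mkwrite_shstk(pte_mkdirty(entry));
	else if (vm_flags & VM_WRITE)
		/* ordinary writable memory */
		entry = pte_mkwrite(pte_mkdirty(entry));

	/* !VM_WRITE: leave the PTE non-writable */
	return entry;
}
```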
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index e4530b39f378..a89dfa9174ae 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -918,6 +918,9 @@ static inline pgd_t pti_set_user_pgtbl(pgd_t *pgdp, pgd_t pgd)
 }
 #endif /* CONFIG_PAGE_TABLE_ISOLATION */
 
+#define is_shstk_write is_shstk_write
+extern bool is_shstk_write(unsigned long vm_flags);
+
 #endif	/* __ASSEMBLY__ */
diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index 8525f2876fb4..f0e536bea3ca 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -876,3 +876,9 @@ int pmd_free_pte_page(pmd_t *pmd, unsigned long addr)
 
 #endif /* CONFIG_X86_64 */
 #endif /* CONFIG_HAVE_ARCH_HUGE_VMAP */
+
+bool is_shstk_write(unsigned long vm_flags)
+{
+	return (vm_flags & (VM_SHADOW_STACK | VM_WRITE)) ==
+	       (VM_SHADOW_STACK | VM_WRITE);
+}
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index d8096578610a..b4a9d9936463 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -1586,6 +1586,13 @@ static inline bool arch_has_pfn_modify_check(void)
 }
 #endif /* !_HAVE_ARCH_PFN_MODIFY_ALLOWED */
 
+#ifndef is_shstk_write
+static inline bool is_shstk_write(unsigned long vm_flags)
+{
+	return false;
+}
+#endif
+
 /*
  * Architecture PAGE_KERNEL_* fallbacks
  *
diff --git a/mm/memory.c b/mm/memory.c
index 8a6d5c823f91..c02b6421241d 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4128,7 +4128,10 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
 
 	entry = mk_pte(page, vma->vm_page_prot);
 	entry = pte_sw_mkyoung(entry);
-	if (vma->vm_flags & VM_WRITE)
+
+	if (is_shstk_write(vma->vm_flags))
+		entry = pte_mkwrite_shstk(pte_mkdirty(entry));
+	else if (vma->vm_flags & VM_WRITE)
 		entry = pte_mkwrite(pte_mkdirty(entry));
 
 	vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd, vmf->address,
diff --git a/mm/migrate_device.c b/mm/migrate_device.c
index 721b2365dbca..53d417683e01 100644
--- a/mm/migrate_device.c
+++ b/mm/migrate_device.c
@@ -645,7 +645,9 @@ static void migrate_vma_insert_page(struct migrate_vma *migrate,
 			goto abort;
 		}
 		entry = mk_pte(page, vma->vm_page_prot);
-		if (vma->vm_flags & VM_WRITE)
+		if (is_shstk_write(vma->vm_flags))
+			entry = pte_mkwrite_shstk(pte_mkdirty(entry));
+		else if (vma->vm_flags & VM_WRITE)
 			entry = pte_mkwrite(pte_mkdirty(entry));
 	}
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index 3a8ff47943d5..1f6d102d069b 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -63,6 +63,7 @@ int mfill_atomic_install_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd,
 	int ret;
 	pte_t _dst_pte, *dst_pte;
 	bool writable = dst_vma->vm_flags & VM_WRITE;
+	bool shstk = dst_vma->vm_flags & VM_SHADOW_STACK;
 	bool vm_shared = dst_vma->vm_flags & VM_SHARED;
 	bool page_in_cache = page_mapping(page);
 	spinlock_t *ptl;
@@ -83,9 +84,12 @@ int mfill_atomic_install_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd,
 		writable = false;
 	}
 
-	if (writable)
-		_dst_pte = pte_mkwrite(_dst_pte);
-	else
+	if (writable) {
+		if (shstk)
+			_dst_pte = pte_mkwrite_shstk(_dst_pte);
+		else
+			_dst_pte = pte_mkwrite(_dst_pte);
+	} else
 		/*
 		 * We need this to make sure write bit removed; as mk_pte()
 		 * could return a pte with write bit set.
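A side note on the header hunks above: defining the macro `is_shstk_write` to its own name in the arch header is the standard kernel idiom that lets the generic header's `#ifndef` fallback compile out. A minimal standalone illustration of that mechanism, simplified for user space (the flag bit values here are hypothetical; the real ones live in include/linux/mm.h):

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical flag values, for illustration only. */
#define VM_WRITE	0x00000002UL
#define VM_SHADOW_STACK	0x00800000UL	/* hypothetical bit position */

/* "Arch header": defining the macro to its own name signals that this
 * architecture supplies a real implementation. */
#define is_shstk_write is_shstk_write
static bool is_shstk_write(unsigned long vm_flags)
{
	return (vm_flags & (VM_SHADOW_STACK | VM_WRITE)) ==
	       (VM_SHADOW_STACK | VM_WRITE);
}

/* "Generic header": this fallback compiles out here because the macro is
 * already defined; on architectures without shadow stack it kicks in. */
#ifndef is_shstk_write
static bool is_shstk_write(unsigned long vm_flags)
{
	return false;
}
#endif

int main(void)
{
	printf("%d\n", is_shstk_write(VM_WRITE | VM_SHADOW_STACK)); /* 1 */
	printf("%d\n", is_shstk_write(VM_WRITE));                   /* 0 */
	return 0;
}
```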