
[v8,15/27] mm: Handle shadow stack page fault

Message ID 20190813205225.12032-16-yu-cheng.yu@intel.com (mailing list archive)
State New, archived
Series Control-flow Enforcement: Shadow Stack

Commit Message

Yu-cheng Yu Aug. 13, 2019, 8:52 p.m. UTC
When a task does fork(), its shadow stack (SHSTK) must be duplicated
for the child.  This patch implements a flow similar to copy-on-write
of an anonymous page, but for SHSTK.

A SHSTK PTE must be RO and dirty.  This dirty bit requirement is used
to effect the copying.  In copy_one_pte(), clear the dirty bit from a
SHSTK PTE to cause a page fault upon the next SHSTK access.  At that
time, fix the PTE and copy/re-use the page.

Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
---
 arch/x86/mm/pgtable.c         | 15 +++++++++++++++
 include/asm-generic/pgtable.h | 15 +++++++++++++++
 mm/memory.c                   |  7 ++++++-
 3 files changed, 36 insertions(+), 1 deletion(-)
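The fork-time flow described in the commit message can be made concrete with a small user-space model. This is a sketch under stated assumptions, not the patch's code: the PTE bit values, `struct page_model`, and the `fork_shstk_pte()`/`shstk_fault()` helpers are hypothetical. A plain reference count stands in for the kernel's decision between reusing and copying the page, and the dirty bit is simply cleared here rather than moved to a software bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical PTE bits for this model only (not the x86 encoding). */
#define PTE_WRITE (1u << 0)
#define PTE_DIRTY (1u << 1)	/* hardware dirty bit */

typedef uint32_t pte_model_t;

/* Hypothetical page: refcount stands in for the kernel's reuse checks. */
struct page_model {
	int refcount;
};

/* A shadow-stack PTE is the read-only + dirty combination. */
static bool is_shstk_pte(pte_model_t pte)
{
	return !(pte & PTE_WRITE) && (pte & PTE_DIRTY);
}

/* fork(): clear the dirty bit (write is already clear) in the parent and
 * give the child the same PTE, so the next shadow-stack access faults in
 * both. Models the copy_one_pte() step. */
static void fork_shstk_pte(pte_model_t *parent, pte_model_t *child,
			   struct page_model *page)
{
	*parent &= ~(PTE_WRITE | PTE_DIRTY);
	*child = *parent;
	page->refcount++;	/* the page is now shared */
}

/* Shadow-stack fault: reuse the page if this is the only user, otherwise
 * switch to the spare (copied) page; either way restore RO + dirty.
 * Returns the page this PTE maps afterwards. */
static struct page_model *shstk_fault(pte_model_t *pte,
				      struct page_model *page,
				      struct page_model *spare)
{
	struct page_model *mapped = page;

	if (page->refcount > 1) {
		page->refcount--;	/* drop our share of the old page */
		spare->refcount = 1;	/* the copy belongs to us alone */
		mapped = spare;
	}
	*pte = (*pte & ~PTE_WRITE) | PTE_DIRTY;	/* valid SHSTK PTE again */
	return mapped;
}
```

The first task to fault (the child, in the test) gets a copy; the last one left simply reuses the original page, which is the same copy-or-reuse split as anonymous COW.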

Comments

Andy Lutomirski Aug. 13, 2019, 10:55 p.m. UTC | #1
On Tue, Aug 13, 2019 at 2:02 PM Yu-cheng Yu <yu-cheng.yu@intel.com> wrote:
>
> When a task does fork(), its shadow stack (SHSTK) must be duplicated
> for the child.  This patch implements a flow similar to copy-on-write
> of an anonymous page, but for SHSTK.
>
> A SHSTK PTE must be RO and dirty.  This dirty bit requirement is used
> to effect the copying.  In copy_one_pte(), clear the dirty bit from a
> SHSTK PTE to cause a page fault upon the next SHSTK access.  At that
> time, fix the PTE and copy/re-use the page.

Is using VM_SHSTK and special-casing all of this really better than
using a special mapping or other pseudo-file-backed VMA and putting
all the magic in the vm_operations?

--Andy
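Andy's alternative is a question of where the shadow-stack logic lives. A toy contrast of the two dispatch styles follows; everything in it is hypothetical (`struct vma_model`, `struct vma_ops_model`, `VMA_FLAG_SHSTK`). It only mirrors the shape of the idea, with generic fault code either testing a flag such as VM_SHSTK or calling a handler supplied by the mapping, and it is not the kernel's `vm_operations_struct` API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for a VMA and its operations. */
#define VMA_FLAG_SHSTK 0x1u

enum fault_result { FAULT_GENERIC, FAULT_SHSTK };

struct vma_model;

struct vma_ops_model {
	enum fault_result (*fault)(struct vma_model *vma);
};

struct vma_model {
	unsigned int flags;
	const struct vma_ops_model *ops;	/* NULL for plain anonymous memory */
};

/* Style used by the patch: generic code special-cases a VMA flag. */
static enum fault_result fault_flag_style(struct vma_model *vma)
{
	if (vma->flags & VMA_FLAG_SHSTK)
		return FAULT_SHSTK;
	return FAULT_GENERIC;
}

/* Style Andy suggests: the special mapping carries its own handler. */
static enum fault_result shstk_fault_op(struct vma_model *vma)
{
	(void)vma;
	return FAULT_SHSTK;
}

static const struct vma_ops_model shstk_ops = { .fault = shstk_fault_op };

/* Generic code only dispatches; it needs no shadow-stack knowledge. */
static enum fault_result fault_ops_style(struct vma_model *vma)
{
	if (vma->ops && vma->ops->fault)
		return vma->ops->fault(vma);
	return FAULT_GENERIC;
}
```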
Yu-cheng Yu Aug. 14, 2019, 4:27 p.m. UTC | #2
On Tue, 2019-08-13 at 15:55 -0700, Andy Lutomirski wrote:
> On Tue, Aug 13, 2019 at 2:02 PM Yu-cheng Yu <yu-cheng.yu@intel.com> wrote:
> > [...]
> 
> Is using VM_SHSTK and special-casing all of this really better than
> using a special mapping or other pseudo-file-backed VMA and putting
> all the magic in the vm_operations?

A special mapping is cleaner.  However, we also need to exclude normal [RO +
dirty] pages from shadow stack.

Yu-cheng
Dave Hansen Aug. 14, 2019, 4:48 p.m. UTC | #3
On 8/14/19 9:27 AM, Yu-cheng Yu wrote:
> On Tue, 2019-08-13 at 15:55 -0700, Andy Lutomirski wrote:
>> On Tue, Aug 13, 2019 at 2:02 PM Yu-cheng Yu <yu-cheng.yu@intel.com> wrote:
>>> [...]
>> Is using VM_SHSTK and special-casing all of this really better than
>> using a special mapping or other pseudo-file-backed VMA and putting
>> all the magic in the vm_operations?
> A special mapping is cleaner.  However, we also need to exclude normal [RO +
> dirty] pages from shadow stack.

I don't understand what you are saying.

Are you saying that we need this VM_SHSTK flag in order to exclude
RO+HW-Dirty pages from being created in non-shadow-stack VMAs?
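The constraint behind Dave's question comes from the hardware: with CET shadow stacks, a PTE that is read-only but has the hardware Dirty bit set is treated as a shadow-stack page. A minimal sketch of that rule and the invariant it implies, assuming made-up bit values (`PTE_WRITE`, `PTE_DIRTY`) rather than the real x86 layout:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit values for this model; not the real x86 PTE layout. */
#define PTE_WRITE 0x1u
#define PTE_DIRTY 0x2u	/* hardware dirty bit */

/* The encoding the CPU treats as shadow stack: read-only + HW dirty. */
static bool hw_sees_shstk(uint32_t pte)
{
	return !(pte & PTE_WRITE) && (pte & PTE_DIRTY);
}

/* The invariant under discussion: outside shadow-stack VMAs, no PTE may
 * use that encoding, or ordinary memory would become shadow stack. */
static bool pte_allowed_in_vma(uint32_t pte, bool vma_is_shstk)
{
	return vma_is_shstk || !hw_sees_shstk(pte);
}
```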
Yu-cheng Yu Aug. 14, 2019, 5 p.m. UTC | #4
On Wed, 2019-08-14 at 09:48 -0700, Dave Hansen wrote:
> On 8/14/19 9:27 AM, Yu-cheng Yu wrote:
> > On Tue, 2019-08-13 at 15:55 -0700, Andy Lutomirski wrote:
> > > On Tue, Aug 13, 2019 at 2:02 PM Yu-cheng Yu <yu-cheng.yu@intel.com> wrote:
> > > > [...]
> > > 
> > > Is using VM_SHSTK and special-casing all of this really better than
> > > using a special mapping or other pseudo-file-backed VMA and putting
> > > all the magic in the vm_operations?
> > 
> > A special mapping is cleaner.  However, we also need to exclude normal [RO +
> > dirty] pages from shadow stack.
> 
> I don't understand what you are saying.
> 
> Are you saying that we need this VM_SHSTK flag in order to exclude
> RO+HW-Dirty pages from being created in non-shadow-stack VMAs?

We use VM_SHSTK for page fault handling (the special-casing).  With a
special mapping, all of this would become cleaner (though it takes more code).
However, we would still need most of the PTE macros (e.g. ptep_set_wrprotect,
PAGE_DIRTY_SW, etc.).

Yu-cheng
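The PTE macros mentioned above exist so that ordinary read-only PTEs never carry the hardware Dirty bit. The sketch below models that idea with a software dirty bit; the helper names (`model_pte_wrprotect()` and the others) and the bit values are hypothetical, and the series' actual macros may differ in detail.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit values for this model only. */
#define PTE_WRITE    0x1u
#define PTE_DIRTY_HW 0x2u	/* hardware dirty bit */
#define PTE_DIRTY_SW 0x4u	/* software-defined dirty bit */

/* Write-protect: move hardware-dirty state to the software bit so the
 * result is never read-only + hardware-dirty. */
static uint32_t model_pte_wrprotect(uint32_t pte)
{
	if (pte & PTE_DIRTY_HW) {
		pte &= ~PTE_DIRTY_HW;
		pte |= PTE_DIRTY_SW;
	}
	return pte & ~PTE_WRITE;
}

/* Mark dirty: writable PTEs may use the hardware bit, read-only ones
 * get the software bit so they do not turn into shadow stack. */
static uint32_t model_pte_mkdirty(uint32_t pte)
{
	return pte | ((pte & PTE_WRITE) ? PTE_DIRTY_HW : PTE_DIRTY_SW);
}

/* Only shadow-stack VMAs get the RO + hardware-dirty encoding. */
static uint32_t model_pte_mkdirty_shstk(uint32_t pte)
{
	pte &= ~(PTE_WRITE | PTE_DIRTY_SW);
	return pte | PTE_DIRTY_HW;
}

/* Either bit means "dirty" to the rest of the model. */
static int model_pte_dirty(uint32_t pte)
{
	return (pte & (PTE_DIRTY_HW | PTE_DIRTY_SW)) != 0;
}
```

The point of the split is that dirtiness is never lost: write-protecting a dirty page keeps it dirty for writeback purposes while removing the encoding the CPU would read as shadow stack.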

Patch

diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index 44816ff6411f..0c10d0c5e329 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -876,3 +876,18 @@  int pmd_free_pte_page(pmd_t *pmd, unsigned long addr)
 
 #endif /* CONFIG_X86_64 */
 #endif	/* CONFIG_HAVE_ARCH_HUGE_VMAP */
+
+#ifdef CONFIG_X86_INTEL_SHADOW_STACK_USER
+inline pte_t pte_set_vma_features(pte_t pte, struct vm_area_struct *vma)
+{
+	if (vma->vm_flags & VM_SHSTK)
+		return pte_mkdirty_shstk(pte);
+	else
+		return pte;
+}
+
+inline bool arch_copy_pte_mapping(vm_flags_t vm_flags)
+{
+	return (vm_flags & VM_SHSTK);
+}
+#endif /* CONFIG_X86_INTEL_SHADOW_STACK_USER */
diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
index 75d9d68a6de7..89b0fa132f1f 100644
--- a/include/asm-generic/pgtable.h
+++ b/include/asm-generic/pgtable.h
@@ -1188,4 +1188,19 @@  static inline bool arch_has_pfn_modify_check(void)
 #define mm_pmd_folded(mm)	__is_defined(__PAGETABLE_PMD_FOLDED)
 #endif
 
+#ifndef CONFIG_ARCH_HAS_SHSTK
+static inline pte_t pte_set_vma_features(pte_t pte, struct vm_area_struct *vma)
+{
+	return pte;
+}
+
+static inline bool arch_copy_pte_mapping(vm_flags_t vm_flags)
+{
+	return false;
+}
+#else
+pte_t pte_set_vma_features(pte_t pte, struct vm_area_struct *vma);
+bool arch_copy_pte_mapping(vm_flags_t vm_flags);
+#endif
+
 #endif /* _ASM_GENERIC_PGTABLE_H */
diff --git a/mm/memory.c b/mm/memory.c
index e2bb51b6242e..be93a73b5152 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -754,7 +754,8 @@  copy_one_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 	 * If it's a COW mapping, write protect it both
 	 * in the parent and the child
 	 */
-	if (is_cow_mapping(vm_flags) && pte_write(pte)) {
+	if ((is_cow_mapping(vm_flags) && pte_write(pte)) ||
+	    arch_copy_pte_mapping(vm_flags)) {
 		ptep_set_wrprotect(src_mm, addr, src_pte);
 		pte = pte_wrprotect(pte);
 	}
@@ -2273,6 +2274,7 @@  static inline void wp_page_reuse(struct vm_fault *vmf)
 	flush_cache_page(vma, vmf->address, pte_pfn(vmf->orig_pte));
 	entry = pte_mkyoung(vmf->orig_pte);
 	entry = maybe_mkwrite(pte_mkdirty(entry), vma);
+	entry = pte_set_vma_features(entry, vma);
 	if (ptep_set_access_flags(vma, vmf->address, vmf->pte, entry, 1))
 		update_mmu_cache(vma, vmf->address, vmf->pte);
 	pte_unmap_unlock(vmf->pte, vmf->ptl);
@@ -2348,6 +2350,7 @@  static vm_fault_t wp_page_copy(struct vm_fault *vmf)
 		flush_cache_page(vma, vmf->address, pte_pfn(vmf->orig_pte));
 		entry = mk_pte(new_page, vma->vm_page_prot);
 		entry = maybe_mkwrite(pte_mkdirty(entry), vma);
+		entry = pte_set_vma_features(entry, vma);
 		/*
 		 * Clear the pte entry and flush it first, before updating the
 		 * pte with the new entry. This will avoid a race condition
@@ -2866,6 +2869,7 @@  vm_fault_t do_swap_page(struct vm_fault *vmf)
 	pte = mk_pte(page, vma->vm_page_prot);
 	if ((vmf->flags & FAULT_FLAG_WRITE) && reuse_swap_page(page, NULL)) {
 		pte = maybe_mkwrite(pte_mkdirty(pte), vma);
+		pte = pte_set_vma_features(pte, vma);
 		vmf->flags &= ~FAULT_FLAG_WRITE;
 		ret |= VM_FAULT_WRITE;
 		exclusive = RMAP_EXCLUSIVE;
@@ -3008,6 +3012,7 @@  static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
 	entry = mk_pte(page, vma->vm_page_prot);
 	if (vma->vm_flags & VM_WRITE)
 		entry = pte_mkwrite(pte_mkdirty(entry));
+	entry = pte_set_vma_features(entry, vma);
 
 	vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd, vmf->address,
 			&vmf->ptl);
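The two hooks this patch adds can be exercised outside the kernel with a small model. Flag and bit values, and the `m_`-prefixed names, are hypothetical; `m_is_cow_mapping()` uses the same test as the kernel's `is_cow_mapping()`, while the other functions mirror `arch_copy_pte_mapping()`, the patched condition in `copy_one_pte()`, and `pte_set_vma_features()`, with the VMA reduced to its flags.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag and bit values for this model; the kernel's differ. */
#define M_VM_SHARED   0x1u
#define M_VM_MAYWRITE 0x2u
#define M_VM_SHSTK    0x4u

#define M_PTE_WRITE 0x1u
#define M_PTE_DIRTY 0x2u

/* Same test as is_cow_mapping(): private and possibly writable. */
static bool m_is_cow_mapping(uint32_t vm_flags)
{
	return (vm_flags & (M_VM_SHARED | M_VM_MAYWRITE)) == M_VM_MAYWRITE;
}

/* Mirrors arch_copy_pte_mapping(). */
static bool m_arch_copy_pte_mapping(uint32_t vm_flags)
{
	return (vm_flags & M_VM_SHSTK) != 0;
}

/* Patched copy_one_pte() condition: write-protect writable COW PTEs and
 * every shadow-stack PTE. SHSTK PTEs are already read-only, so the
 * pte_write() test alone would never select them. */
static bool m_should_wrprotect_on_fork(uint32_t vm_flags, uint32_t pte)
{
	return (m_is_cow_mapping(vm_flags) && (pte & M_PTE_WRITE)) ||
	       m_arch_copy_pte_mapping(vm_flags);
}

/* Mirrors pte_set_vma_features(): on the fault paths, give the new PTE
 * the RO + dirty shadow-stack encoding when the VMA is VM_SHSTK. */
static uint32_t m_pte_set_vma_features(uint32_t pte, uint32_t vm_flags)
{
	if (vm_flags & M_VM_SHSTK)
		return (pte & ~M_PTE_WRITE) | M_PTE_DIRTY;
	return pte;
}
```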