diff mbox series

[21/35] mm/mprotect: Exclude shadow stack from preserve_write

Message ID 20220130211838.8382-22-rick.p.edgecombe@intel.com (mailing list archive)
State New
Headers show
Series Shadow stacks for userspace | expand

Commit Message

Edgecombe, Rick P Jan. 30, 2022, 9:18 p.m. UTC
From: Yu-cheng Yu <yu-cheng.yu@intel.com>

In change_pte_range(), when a PTE is changed for prot_numa, _PAGE_RW is
preserved to avoid the additional write fault after the NUMA hinting fault.
However, pte_write() now includes both normal writable and shadow stack
(RW=0, Dirty=1) PTEs, but the latter does not have _PAGE_RW and has no need
to preserve it.

Exclude shadow stack from preserve_write test, and apply the same change to
change_huge_pmd().

Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
---

v25:
 - Move is_shadow_stack_mapping() to a separate line.

v24:
 - Change arch_shadow_stack_mapping() to is_shadow_stack_mapping().

 mm/huge_memory.c | 7 +++++++
 mm/mprotect.c    | 7 +++++++
 2 files changed, 14 insertions(+)

Comments

Dave Hansen Feb. 10, 2022, 7:27 p.m. UTC | #1
On 1/30/22 13:18, Rick Edgecombe wrote:
> In change_pte_range(), when a PTE is changed for prot_numa, _PAGE_RW is
> preserved to avoid the additional write fault after the NUMA hinting fault.
> However, pte_write() now includes both normal writable and shadow stack
> (RW=0, Dirty=1) PTEs, but the latter does not have _PAGE_RW and has no need
> to preserve it.

This series creates an interesting situation: it causes a logical
disconnection between things that were tightly coupled before.  For
instance, before this series, _PAGE_RW=1 and "writable" really were
synonyms.  They meant the same thing.

One of the complexities in this series is differentiating the two.  For
instance, a shadow stack page can be written to, even though it has
_PAGE_RW=0.

This particular patch seems to be hacking around the problem that a
p*_mkwrite() doesn't work on shadow stack PTE/PMDs.  First, that makes
me wonder what *actually* happens if we do a plain pte_mkwrite() on a
shadow stack PTE.  I *think* it will take the [Write=0,Dirty=1] PTE and

       pte = pte_set_flags(pte, _PAGE_RW);

so we'll end up with [Write=1,Dirty=1], which is bad.

Let's say pte_mkwrite() can't be fixed.  We should probably make it
VM_BUG_ON() if it's ever asked to muck with a shadow stack PTE.

It's also weird because we have this pte_write()==1 PTE in a !VM_WRITE
VMA.  Then, we're trying to pte_mkwrite() under this !VM_WRITE VMA.

	pte_write() <-- returns true for on shadow stack PTE!
	pte_mkwrite() <-- illegal on shadow stack PTE

I need to think about this a little more.  I don't have a solution.
But, as-is, it seems untenable.  The rules are just too counter
intuitive to live.
diff mbox series

Patch

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 1c7167e6f223..01375e39b52b 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1750,6 +1750,13 @@  int change_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
 		return 0;
 
 	preserve_write = prot_numa && pmd_write(*pmd);
+
+	/*
+	 * Preserve only normal writable huge PMD, but not shadow
+	 * stack (RW=0, Dirty=1).
+	 */
+	if (is_shadow_stack_mapping(vma->vm_flags))
+		preserve_write = false;
 	ret = 1;
 
 #ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION
diff --git a/mm/mprotect.c b/mm/mprotect.c
index b0012c13a00e..faac710f0891 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -77,6 +77,13 @@  static unsigned long change_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
 			pte_t ptent;
 			bool preserve_write = prot_numa && pte_write(oldpte);
 
+			/*
+			 * Preserve only normal writable PTE, but not shadow
+			 * stack (RW=0, Dirty=1).
+			 */
+			if (is_shadow_stack_mapping(vma->vm_flags))
+				preserve_write = false;
+
 			/*
 			 * Avoid trapping faults against the zero or KSM
 			 * pages. See similar comment in change_huge_pmd.