diff mbox

[RFC,v2,16/27] mm: Modify can_follow_write_pte/pmd for shadow stack

Message ID 20180710222639.8241-17-yu-cheng.yu@intel.com (mailing list archive)
State New, archived
Headers show

Commit Message

Yu, Yu-cheng July 10, 2018, 10:26 p.m. UTC
There are three possible shadow stack PTE settings:

  Normal SHSTK PTE: (R/O + DIRTY_HW)
  SHSTK PTE COW'ed: (R/O + DIRTY_HW)
  SHSTK PTE shared as R/O data: (R/O + DIRTY_SW)

Update can_follow_write_pte/pmd for the shadow stack.

Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
---
 mm/gup.c         | 11 ++++++++---
 mm/huge_memory.c | 10 +++++++---
 2 files changed, 15 insertions(+), 6 deletions(-)

Comments

Dave Hansen July 10, 2018, 11:37 p.m. UTC | #1
On 07/10/2018 03:26 PM, Yu-cheng Yu wrote:
> There are three possible shadow stack PTE settings:
> 
>   Normal SHSTK PTE: (R/O + DIRTY_HW)
>   SHSTK PTE COW'ed: (R/O + DIRTY_HW)
>   SHSTK PTE shared as R/O data: (R/O + DIRTY_SW)
> 
> Update can_follow_write_pte/pmd for the shadow stack.

First of all, thanks for the excellent patch headers.  It's nice to have
that reference every time even though it's repeated.

> -static inline bool can_follow_write_pte(pte_t pte, unsigned int flags)
> +static inline bool can_follow_write_pte(pte_t pte, unsigned int flags,
> +					bool shstk)
>  {
> +	bool pte_cowed = shstk ? is_shstk_pte(pte):pte_dirty(pte);
> +
>  	return pte_write(pte) ||
> -		((flags & FOLL_FORCE) && (flags & FOLL_COW) && pte_dirty(pte));
> +		((flags & FOLL_FORCE) && (flags & FOLL_COW) && pte_cowed);
>  }

Can we just pass the VMA in here?  This use is OK-ish, but I generally
detest true/false function arguments because you can't tell what they
are when they show up without a named variable.

But...  Why does this even matter?  Your own example showed that all
shadowstack PTEs have either DIRTY_HW or DIRTY_SW set, and pte_dirty()
checks both.

That makes this check seem a bit superfluous.
Peter Zijlstra July 11, 2018, 9:29 a.m. UTC | #2
On Tue, Jul 10, 2018 at 03:26:28PM -0700, Yu-cheng Yu wrote:
> There are three possible shadow stack PTE settings:
> 
>   Normal SHSTK PTE: (R/O + DIRTY_HW)
>   SHSTK PTE COW'ed: (R/O + DIRTY_HW)
>   SHSTK PTE shared as R/O data: (R/O + DIRTY_SW)

I count _2_ distinct states there.

> Update can_follow_write_pte/pmd for the shadow stack.

So the below disallows can_follow_write when shstk && _PAGE_DIRTY_SW,
but this here Changelog doesn't explain why. Doesn't even get close.

Also, the code is a right mess :/ Can't we try harder to not let this
shadow stack stuff escape arch code.
Yu, Yu-cheng July 11, 2018, 5:05 p.m. UTC | #3
On Tue, 2018-07-10 at 16:37 -0700, Dave Hansen wrote:
> On 07/10/2018 03:26 PM, Yu-cheng Yu wrote:
> > 
> > There are three possible shadow stack PTE settings:
> > 
> >   Normal SHSTK PTE: (R/O + DIRTY_HW)
> >   SHSTK PTE COW'ed: (R/O + DIRTY_HW)
> >   SHSTK PTE shared as R/O data: (R/O + DIRTY_SW)
> > 
> > Update can_follow_write_pte/pmd for the shadow stack.
> First of all, thanks for the excellent patch headers.  It's nice to
> have
> that reference every time even though it's repeated.
> 
> > 
> > -static inline bool can_follow_write_pte(pte_t pte, unsigned int
> > flags)
> > +static inline bool can_follow_write_pte(pte_t pte, unsigned int
> > flags,
> > +					bool shstk)
> >  {
> > +	bool pte_cowed = shstk ? is_shstk_pte(pte):pte_dirty(pte);
> > +
> >  	return pte_write(pte) ||
> > -		((flags & FOLL_FORCE) && (flags & FOLL_COW) &&
> > pte_dirty(pte));
> > +		((flags & FOLL_FORCE) && (flags & FOLL_COW) &&
> > pte_cowed);
> >  }
> Can we just pass the VMA in here?  This use is OK-ish, but I
> generally
> detest true/false function arguments because you can't tell what they
> are when they show up without a named variable.
> 
> But...  Why does this even matter?  Your own example showed that all
> shadowstack PTEs have either DIRTY_HW or DIRTY_SW set, and
> pte_dirty()
> checks both.
> 
> That makes this check seem a bit superfluous.

My understanding is that we don't want to follow write pte if the page
is shared as read-only.  For a SHSTK page, that is (R/O + DIRTY_SW),
which means the SHSTK page has not been COW'ed.  Is that right?

Thanks,
Yu-cheng
Dave Hansen July 13, 2018, 6:26 p.m. UTC | #4
On 07/11/2018 10:05 AM, Yu-cheng Yu wrote:
> My understanding is that we don't want to follow write pte if the page
> is shared as read-only.  For a SHSTK page, that is (R/O + DIRTY_SW),
> which means the SHSTK page has not been COW'ed.  Is that right?

Let's look at the code again:

> -static inline bool can_follow_write_pte(pte_t pte, unsigned int flags)
> +static inline bool can_follow_write_pte(pte_t pte, unsigned int flags,
> +					bool shstk)
>  {
> +	bool pte_cowed = shstk ? is_shstk_pte(pte):pte_dirty(pte);
> +
>  	return pte_write(pte) ||
> -		((flags & FOLL_FORCE) && (flags & FOLL_COW) && pte_dirty(pte));
> +		((flags & FOLL_FORCE) && (flags & FOLL_COW) && pte_cowed);
>  }

This is another case where the naming of pte_*() is biting us vs. the
perversion of the PTE bits.  The lack of comments and explanation inthe
patch is compounding the confusion.

We need to find a way to differentiate "someone can write to this PTE"
from "the write bit is set in this PTE".

In this particular hunk, we need to make it clear that pte_write() is
*never* true for shadowstack PTEs.  In other words, shadow stack VMAs
will (should?) never even *see* a pte_write() PTE.

I think this is a case where you just need to bite the bullet and
bifurcate can_follow_write_pte().  Just separate the shadowstack and
non-shadowstack parts.
Yu, Yu-cheng July 17, 2018, 11 p.m. UTC | #5
On Wed, 2018-07-11 at 11:29 +0200, Peter Zijlstra wrote:
> On Tue, Jul 10, 2018 at 03:26:28PM -0700, Yu-cheng Yu wrote:
> > 
> > There are three possible shadow stack PTE settings:
> > 
> >   Normal SHSTK PTE: (R/O + DIRTY_HW)
> >   SHSTK PTE COW'ed: (R/O + DIRTY_HW)
> >   SHSTK PTE shared as R/O data: (R/O + DIRTY_SW)
> I count _2_ distinct states there.
> 
> > 
> > Update can_follow_write_pte/pmd for the shadow stack.
> So the below disallows can_follow_write when shstk && _PAGE_DIRTY_SW,
> but this here Changelog doesn't explain why. Doesn't even get close.

Can we add the following to the log:

When a SHSTK PTE is shared, it is (R/O + DIRTY_SW); otherwise it is
(R/O + DIRTY_HW).

When we (FOLL_WRITE | FOLL_FORCE) on a SHSTK PTE, the following
must be true:

  - It has been COW'ed at least once (FOLL_COW is set);
  - It still is not shared, i.e. PTE is (R/O + DIRTY_HW);

> 
> Also, the code is a right mess :/ Can't we try harder to not let this
> shadow stack stuff escape arch code.

We either check here if the VMA is SHSTK mapping or move the logic
to pte_dirty().  The latter would be less obvious.  Or can we
create a can_follow_write_shstk_pte()?

Yu-cheng
diff mbox

Patch

diff --git a/mm/gup.c b/mm/gup.c
index b70d7ba7cc13..00171ee847af 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -64,10 +64,13 @@  static int follow_pfn_pte(struct vm_area_struct *vma, unsigned long address,
  * FOLL_FORCE can write to even unwritable pte's, but only
  * after we've gone through a COW cycle and they are dirty.
  */
-static inline bool can_follow_write_pte(pte_t pte, unsigned int flags)
+static inline bool can_follow_write_pte(pte_t pte, unsigned int flags,
+					bool shstk)
 {
+	bool pte_cowed = shstk ? is_shstk_pte(pte):pte_dirty(pte);
+
 	return pte_write(pte) ||
-		((flags & FOLL_FORCE) && (flags & FOLL_COW) && pte_dirty(pte));
+		((flags & FOLL_FORCE) && (flags & FOLL_COW) && pte_cowed);
 }
 
 static struct page *follow_page_pte(struct vm_area_struct *vma,
@@ -78,7 +81,9 @@  static struct page *follow_page_pte(struct vm_area_struct *vma,
 	struct page *page;
 	spinlock_t *ptl;
 	pte_t *ptep, pte;
+	bool shstk;
 
+	shstk = is_shstk_mapping(vma->vm_flags);
 retry:
 	if (unlikely(pmd_bad(*pmd)))
 		return no_page_table(vma, flags);
@@ -105,7 +110,7 @@  static struct page *follow_page_pte(struct vm_area_struct *vma,
 	}
 	if ((flags & FOLL_NUMA) && pte_protnone(pte))
 		goto no_page;
-	if ((flags & FOLL_WRITE) && !can_follow_write_pte(pte, flags)) {
+	if ((flags & FOLL_WRITE) && !can_follow_write_pte(pte, flags, shstk)) {
 		pte_unmap_unlock(ptep, ptl);
 		return NULL;
 	}
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 7f3e11d3b64a..db4c689a960a 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1389,10 +1389,13 @@  int do_huge_pmd_wp_page(struct vm_fault *vmf, pmd_t orig_pmd)
  * FOLL_FORCE can write to even unwritable pmd's, but only
  * after we've gone through a COW cycle and they are dirty.
  */
-static inline bool can_follow_write_pmd(pmd_t pmd, unsigned int flags)
+static inline bool can_follow_write_pmd(pmd_t pmd, unsigned int flags,
+					bool shstk)
 {
+	bool pmd_cowed = shstk ? is_shstk_pmd(pmd):pmd_dirty(pmd);
+
 	return pmd_write(pmd) ||
-	       ((flags & FOLL_FORCE) && (flags & FOLL_COW) && pmd_dirty(pmd));
+	       ((flags & FOLL_FORCE) && (flags & FOLL_COW) && pmd_cowed);
 }
 
 struct page *follow_trans_huge_pmd(struct vm_area_struct *vma,
@@ -1402,10 +1405,11 @@  struct page *follow_trans_huge_pmd(struct vm_area_struct *vma,
 {
 	struct mm_struct *mm = vma->vm_mm;
 	struct page *page = NULL;
+	bool shstk = is_shstk_mapping(vma->vm_flags);
 
 	assert_spin_locked(pmd_lockptr(mm, pmd));
 
-	if (flags & FOLL_WRITE && !can_follow_write_pmd(*pmd, flags))
+	if (flags & FOLL_WRITE && !can_follow_write_pmd(*pmd, flags, shstk))
 		goto out;
 
 	/* Avoid dumping huge zero page */