Message ID | 20220929222936.14584-13-rick.p.edgecombe@intel.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | Shadowstacks for userspace | expand |
On Thu, Sep 29, 2022 at 03:29:09PM -0700, Rick Edgecombe wrote: > diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h > index 2f2963429f48..58c7bf9d7392 100644 > --- a/arch/x86/include/asm/pgtable.h > +++ b/arch/x86/include/asm/pgtable.h > @@ -1287,6 +1287,23 @@ static inline pte_t ptep_get_and_clear_full(struct mm_struct *mm, > static inline void ptep_set_wrprotect(struct mm_struct *mm, > unsigned long addr, pte_t *ptep) > { > +#ifdef CONFIG_X86_SHADOW_STACK > + /* > + * Avoid accidentally creating shadow stack PTEs > + * (Write=0,Dirty=1). Use cmpxchg() to prevent races with > + * the hardware setting Dirty=1. > + */ > + if (cpu_feature_enabled(X86_FEATURE_SHSTK)) { > + pte_t old_pte, new_pte; > + > + old_pte = READ_ONCE(*ptep); > + do { > + new_pte = pte_wrprotect(old_pte); > + } while (!try_cmpxchg(&ptep->pte, &old_pte.pte, new_pte.pte)); > + > + return; > + } > +#endif > clear_bit(_PAGE_BIT_RW, (unsigned long *)&ptep->pte); > } Okay, this addresses my previous question. The need in cmpxchg is unfortunate, but well.
On Sep 29, 2022, at 3:29 PM, Rick Edgecombe <rick.p.edgecombe@intel.com> wrote: > From: Yu-cheng Yu <yu-cheng.yu@intel.com> > > When Shadow Stack is in use, Write=0,Dirty=1 PTE are reserved for shadow > stack. Copy-on-write PTes then have Write=0,Cow=1. > > When a PTE goes from Write=1,Dirty=1 to Write=0,Cow=1, it could > become a transient shadow stack PTE in two cases: > > The first case is that some processors can start a write but end up seeing > a Write=0 PTE by the time they get to the Dirty bit, creating a transient > shadow stack PTE. However, this will not occur on processors supporting > Shadow Stack, and a TLB flush is not necessary. > > The second case is that when _PAGE_DIRTY is replaced with _PAGE_COW non- > atomically, a transient shadow stack PTE can be created as a result. > Thus, prevent that with cmpxchg. > > Dave Hansen, Jann Horn, Andy Lutomirski, and Peter Zijlstra provided many > insights to the issue. Jann Horn provided the cmpxchg solution. > > Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com> > Co-developed-by: Rick Edgecombe <rick.p.edgecombe@intel.com> > Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com> > > --- > > v2: > - Compile out some code due to clang build error > - Clarify commit log (dhansen) > - Normalize PTE bit descriptions between patches (dhansen) > - Update comment with text from (dhansen) > > Yu-cheng v30: > - Replace (pmdval_t) cast with CONFIG_PGTABLE_LEVELES > 2 (Borislav Petkov). > > arch/x86/include/asm/pgtable.h | 36 ++++++++++++++++++++++++++++++++++ > 1 file changed, 36 insertions(+) > > diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h > index 2f2963429f48..58c7bf9d7392 100644 > --- a/arch/x86/include/asm/pgtable.h > +++ b/arch/x86/include/asm/pgtable.h > @@ -1287,6 +1287,23 @@ static inline pte_t ptep_get_and_clear_full(struct mm_struct *mm, > static inline void ptep_set_wrprotect(struct mm_struct *mm, > unsigned long addr, pte_t *ptep) > { > +#ifdef CONFIG_X86_SHADOW_STACK > + /* > + * Avoid accidentally creating shadow stack PTEs > + * (Write=0,Dirty=1). Use cmpxchg() to prevent races with > + * the hardware setting Dirty=1. > + */ > + if (cpu_feature_enabled(X86_FEATURE_SHSTK)) { > + pte_t old_pte, new_pte; > + > + old_pte = READ_ONCE(*ptep); > + do { > + new_pte = pte_wrprotect(old_pte); > + } while (!try_cmpxchg(&ptep->pte, &old_pte.pte, new_pte.pte)); > + > + return; > + } > +#endif There is no way of using IS_ENABLED() here instead of these ifdefs? Did you have a look at ptep_set_access_flags() and friends and checked they do not need to be changed too? Perhaps you should at least add some assertion just to ensure nothing breaks.
On 10/3/22 11:11, Nadav Amit wrote: >> +#ifdef CONFIG_X86_SHADOW_STACK >> + /* >> + * Avoid accidentally creating shadow stack PTEs >> + * (Write=0,Dirty=1). Use cmpxchg() to prevent races with >> + * the hardware setting Dirty=1. >> + */ >> + if (cpu_feature_enabled(X86_FEATURE_SHSTK)) { >> + pte_t old_pte, new_pte; >> + >> + old_pte = READ_ONCE(*ptep); >> + do { >> + new_pte = pte_wrprotect(old_pte); >> + } while (!try_cmpxchg(&ptep->pte, &old_pte.pte, new_pte.pte)); >> + >> + return; >> + } >> +#endif > There is no way of using IS_ENABLED() here instead of these ifdefs? Actually, both the existing #ifdef and an IS_ENABLED() check would be is superfluous as-is. Adding X86_FEATURE_SHSTK disabled-features.h gives cpu_feature_enabled() compile-time optimizations for free. No need for *any* additional CONFIG_* checks. The only issue would be if the #ifdef'd code won't even compile with X86_FEATURE_SHSTK disabled.
On Mon, 2022-10-03 at 11:11 -0700, Nadav Amit wrote: > Did you have a look at ptep_set_access_flags() and friends and > checked they > do not need to be changed too? ptep_set_access_flags() doesn't actually set any additional dirty bits on x86, so I think it's ok. > Perhaps you should at least add some > assertion just to ensure nothing breaks. You mean in ptep_set_access_flags()? I think some assertions would be really great, I'm just not sure where.
On Oct 3, 2022, at 3:28 PM, Edgecombe, Rick P <rick.p.edgecombe@intel.com> wrote: > On Mon, 2022-10-03 at 11:11 -0700, Nadav Amit wrote: >> Did you have a look at ptep_set_access_flags() and friends and >> checked they >> do not need to be changed too? > > ptep_set_access_flags() doesn't actually set any additional dirty bits > on x86, so I think it's ok. Are you sure about that? (lost my confidence today so I am hesitant). Looking on insert_pfn(), I see: entry = maybe_mkwrite(pte_mkdirty(entry), vma); if (ptep_set_access_flags(vma, addr, pte, entry, 1)) ... This appears to set the dirty bit while potentially leaving the write-bit clear. This is the scenario you want to avoid, no? >> Perhaps you should at least add some >> assertion just to ensure nothing breaks. > > You mean in ptep_set_access_flags()? I think some assertions would be > really great, I'm just not sure where. Yes, on x86’s version of the function.
On Oct 3, 2022, at 4:17 PM, Nadav Amit <nadav.amit@gmail.com> wrote: > On Oct 3, 2022, at 3:28 PM, Edgecombe, Rick P <rick.p.edgecombe@intel.com> wrote: > >> On Mon, 2022-10-03 at 11:11 -0700, Nadav Amit wrote: >>> Did you have a look at ptep_set_access_flags() and friends and >>> checked they >>> do not need to be changed too? >> >> ptep_set_access_flags() doesn't actually set any additional dirty bits >> on x86, so I think it's ok. > > Are you sure about that? (lost my confidence today so I am hesitant). > > Looking on insert_pfn(), I see: > > entry = maybe_mkwrite(pte_mkdirty(entry), vma); > if (ptep_set_access_flags(vma, addr, pte, entry, 1)) ... > > This appears to set the dirty bit while potentially leaving the write-bit > clear. This is the scenario you want to avoid, no? No. I am not paying attention. Ignore.
On Oct 3, 2022, at 4:20 PM, Nadav Amit <nadav.amit@gmail.com> wrote: > On Oct 3, 2022, at 4:17 PM, Nadav Amit <nadav.amit@gmail.com> wrote: > >> On Oct 3, 2022, at 3:28 PM, Edgecombe, Rick P <rick.p.edgecombe@intel.com> wrote: >> >>> On Mon, 2022-10-03 at 11:11 -0700, Nadav Amit wrote: >>>> Did you have a look at ptep_set_access_flags() and friends and >>>> checked they >>>> do not need to be changed too? >>> >>> ptep_set_access_flags() doesn't actually set any additional dirty bits >>> on x86, so I think it's ok. >> >> Are you sure about that? (lost my confidence today so I am hesitant). >> >> Looking on insert_pfn(), I see: >> >> entry = maybe_mkwrite(pte_mkdirty(entry), vma); >> if (ptep_set_access_flags(vma, addr, pte, entry, 1)) ... >> >> This appears to set the dirty bit while potentially leaving the write-bit >> clear. This is the scenario you want to avoid, no? > > No. I am not paying attention. Ignore. Sorry for the spam. Just this “dirty” argument is confusing. This indeed seems like a flow that can set the dirty bit. I think.
On Mon, 2022-10-03 at 16:25 -0700, Nadav Amit wrote: > On Oct 3, 2022, at 4:20 PM, Nadav Amit <nadav.amit@gmail.com> wrote: > > > On Oct 3, 2022, at 4:17 PM, Nadav Amit <nadav.amit@gmail.com> > > wrote: > > > > > On Oct 3, 2022, at 3:28 PM, Edgecombe, Rick P < > > > rick.p.edgecombe@intel.com> wrote: > > > > > > > On Mon, 2022-10-03 at 11:11 -0700, Nadav Amit wrote: > > > > > Did you have a look at ptep_set_access_flags() and friends > > > > > and > > > > > checked they > > > > > do not need to be changed too? > > > > > > > > ptep_set_access_flags() doesn't actually set any additional > > > > dirty bits > > > > on x86, so I think it's ok. > > > > > > Are you sure about that? (lost my confidence today so I am > > > hesitant). > > > > > > Looking on insert_pfn(), I see: > > > > > > entry = maybe_mkwrite(pte_mkdirty(entry), > > > vma); > > > if (ptep_set_access_flags(vma, addr, pte, > > > entry, 1)) ... > > > > > > This appears to set the dirty bit while potentially leaving the > > > write-bit > > > clear. This is the scenario you want to avoid, no? > > > > No. I am not paying attention. Ignore. > > Sorry for the spam. Just this “dirty” argument is confusing. This > indeed > seems like a flow that can set the dirty bit. I think. I think the HW dirty bit will not be set here. How it works is, pte_mkdirty() will not actually set the HW dirty bit, but instead the software COW bit. Here is the relevant snippet: static inline pte_t pte_mkdirty(pte_t pte) { pteval_t dirty = _PAGE_DIRTY; /* Avoid creating Dirty=1,Write=0 PTEs */ if (cpu_feature_enabled(X86_FEATURE_SHSTK) && !pte_write(pte)) dirty = _PAGE_COW; return pte_set_flags(pte, dirty | _PAGE_SOFT_DIRTY); } So for a !VM_WRITE vma, you end up with Write=0,Cow=1 PTE passed into ptep_set_access_flags(). Does it make sense?
On Oct 3, 2022, at 4:38 PM, Edgecombe, Rick P <rick.p.edgecombe@intel.com> wrote: > I think the HW dirty bit will not be set here. How it works is, > pte_mkdirty() will not actually set the HW dirty bit, but instead the > software COW bit. Here is the relevant snippet: > > static inline pte_t pte_mkdirty(pte_t pte) > { > pteval_t dirty = _PAGE_DIRTY; > > /* Avoid creating Dirty=1,Write=0 PTEs */ > if (cpu_feature_enabled(X86_FEATURE_SHSTK) && !pte_write(pte)) > dirty = _PAGE_COW; > > return pte_set_flags(pte, dirty | _PAGE_SOFT_DIRTY); > } > > So for a !VM_WRITE vma, you end up with Write=0,Cow=1 PTE passed > into ptep_set_access_flags(). Does it make sense? Thanks for your patience with me. I should have read the series in order.
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h index 2f2963429f48..58c7bf9d7392 100644 --- a/arch/x86/include/asm/pgtable.h +++ b/arch/x86/include/asm/pgtable.h @@ -1287,6 +1287,23 @@ static inline pte_t ptep_get_and_clear_full(struct mm_struct *mm, static inline void ptep_set_wrprotect(struct mm_struct *mm, unsigned long addr, pte_t *ptep) { +#ifdef CONFIG_X86_SHADOW_STACK + /* + * Avoid accidentally creating shadow stack PTEs + * (Write=0,Dirty=1). Use cmpxchg() to prevent races with + * the hardware setting Dirty=1. + */ + if (cpu_feature_enabled(X86_FEATURE_SHSTK)) { + pte_t old_pte, new_pte; + + old_pte = READ_ONCE(*ptep); + do { + new_pte = pte_wrprotect(old_pte); + } while (!try_cmpxchg(&ptep->pte, &old_pte.pte, new_pte.pte)); + + return; + } +#endif clear_bit(_PAGE_BIT_RW, (unsigned long *)&ptep->pte); } @@ -1339,6 +1356,25 @@ static inline pud_t pudp_huge_get_and_clear(struct mm_struct *mm, static inline void pmdp_set_wrprotect(struct mm_struct *mm, unsigned long addr, pmd_t *pmdp) { +#ifdef CONFIG_X86_SHADOW_STACK + /* + * If Shadow Stack is enabled, pmd_wrprotect() moves _PAGE_DIRTY + * to _PAGE_COW (see comments at pmd_wrprotect()). + * When a thread reads a RW=1, Dirty=0 PMD and before changing it + * to RW=0, Dirty=0, another thread could have written to the page + * and the PMD is RW=1, Dirty=1 now. + */ + if (cpu_feature_enabled(X86_FEATURE_SHSTK)) { + pmd_t old_pmd, new_pmd; + + old_pmd = READ_ONCE(*pmdp); + do { + new_pmd = pmd_wrprotect(old_pmd); + } while (!try_cmpxchg(&pmdp->pmd, &old_pmd.pmd, new_pmd.pmd)); + + return; + } +#endif clear_bit(_PAGE_BIT_RW, (unsigned long *)pmdp); }