Message ID: 20180830143904.3168-13-yu-cheng.yu@intel.com (mailing list archive)
State: New, archived
Series: Control Flow Enforcement: Shadow Stack
On Thu, Aug 30, 2018 at 4:43 PM Yu-cheng Yu <yu-cheng.yu@intel.com> wrote: > > When Shadow Stack is enabled, the read-only and PAGE_DIRTY_HW PTE > setting is reserved only for the Shadow Stack. To track dirty of > non-Shadow Stack read-only PTEs, we use PAGE_DIRTY_SW. > > Update ptep_set_wrprotect() and pmdp_set_wrprotect(). > > Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com> > --- > arch/x86/include/asm/pgtable.h | 42 ++++++++++++++++++++++++++++++++++ > 1 file changed, 42 insertions(+) > > diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h > index 4d50de77ea96..556ef258eeff 100644 > --- a/arch/x86/include/asm/pgtable.h > +++ b/arch/x86/include/asm/pgtable.h > @@ -1203,7 +1203,28 @@ static inline pte_t ptep_get_and_clear_full(struct mm_struct *mm, > static inline void ptep_set_wrprotect(struct mm_struct *mm, > unsigned long addr, pte_t *ptep) > { > + pte_t pte; > + > clear_bit(_PAGE_BIT_RW, (unsigned long *)&ptep->pte); > + pte = *ptep; > + > + /* > + * Some processors can start a write, but ending up seeing > + * a read-only PTE by the time they get to the Dirty bit. > + * In this case, they will set the Dirty bit, leaving a > + * read-only, Dirty PTE which looks like a Shadow Stack PTE. > + * > + * However, this behavior has been improved and will not occur > + * on processors supporting Shadow Stacks. Without this > + * guarantee, a transition to a non-present PTE and flush the > + * TLB would be needed. > + * > + * When change a writable PTE to read-only and if the PTE has > + * _PAGE_DIRTY_HW set, we move that bit to _PAGE_DIRTY_SW so > + * that the PTE is not a valid Shadow Stack PTE. > + */ > + pte = pte_move_flags(pte, _PAGE_DIRTY_HW, _PAGE_DIRTY_SW); > + set_pte_at(mm, addr, ptep, pte); > } I don't understand why it's okay that you first atomically clear the RW bit, then atomically switch from DIRTY_HW to DIRTY_SW. Doesn't that mean that between the two atomic writes, another core can incorrectly see a shadow stack?
On Thu, 2018-08-30 at 17:49 +0200, Jann Horn wrote: > On Thu, Aug 30, 2018 at 4:43 PM Yu-cheng Yu <yu-cheng.yu@intel.com> > wrote: > > > > > > When Shadow Stack is enabled, the read-only and PAGE_DIRTY_HW PTE > > setting is reserved only for the Shadow Stack. To track dirty of > > non-Shadow Stack read-only PTEs, we use PAGE_DIRTY_SW. > > > > Update ptep_set_wrprotect() and pmdp_set_wrprotect(). > > > > Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com> > > --- > > arch/x86/include/asm/pgtable.h | 42 > > ++++++++++++++++++++++++++++++++++ > > 1 file changed, 42 insertions(+) > > > > diff --git a/arch/x86/include/asm/pgtable.h > > b/arch/x86/include/asm/pgtable.h > > index 4d50de77ea96..556ef258eeff 100644 > > --- a/arch/x86/include/asm/pgtable.h > > +++ b/arch/x86/include/asm/pgtable.h > > @@ -1203,7 +1203,28 @@ static inline pte_t > > ptep_get_and_clear_full(struct mm_struct *mm, > > static inline void ptep_set_wrprotect(struct mm_struct *mm, > > unsigned long addr, pte_t > > *ptep) > > { > > + pte_t pte; > > + > > clear_bit(_PAGE_BIT_RW, (unsigned long *)&ptep->pte); > > + pte = *ptep; > > + > > + /* > > + * Some processors can start a write, but ending up seeing > > + * a read-only PTE by the time they get to the Dirty bit. > > + * In this case, they will set the Dirty bit, leaving a > > + * read-only, Dirty PTE which looks like a Shadow Stack > > PTE. > > + * > > + * However, this behavior has been improved and will not > > occur > > + * on processors supporting Shadow Stacks. Without this > > + * guarantee, a transition to a non-present PTE and flush > > the > > + * TLB would be needed. > > + * > > + * When change a writable PTE to read-only and if the PTE > > has > > + * _PAGE_DIRTY_HW set, we move that bit to _PAGE_DIRTY_SW > > so > > + * that the PTE is not a valid Shadow Stack PTE. 
> > + */ > > + pte = pte_move_flags(pte, _PAGE_DIRTY_HW, _PAGE_DIRTY_SW); > > + set_pte_at(mm, addr, ptep, pte); > > } > I don't understand why it's okay that you first atomically clear the > RW bit, then atomically switch from DIRTY_HW to DIRTY_SW. Doesn't > that > mean that between the two atomic writes, another core can > incorrectly > see a shadow stack? Yes, we had that concern earlier and checked. On processors supporting Shadow Stacks, that will not happen. Yu-cheng
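The pte_move_flags() step under discussion amounts to a plain bit move: if the hardware dirty bit is set in a PTE being made read-only, relocate it to a software bit so the PTE no longer matches the read-only+dirty shadow stack encoding. As a user-space sketch (the bit positions here are illustrative assumptions for the demo, not the kernel's actual layout):

```c
#include <stdint.h>

/* Illustrative bit positions -- assumptions for this sketch only. */
#define PAGE_BIT_RW        1
#define PAGE_BIT_DIRTY_HW  6
#define PAGE_BIT_DIRTY_SW  11   /* some software-available bit */

#define PAGE_RW        (1ULL << PAGE_BIT_RW)
#define PAGE_DIRTY_HW  (1ULL << PAGE_BIT_DIRTY_HW)
#define PAGE_DIRTY_SW  (1ULL << PAGE_BIT_DIRTY_SW)

/* If 'from' is set in pte, clear it and set 'to' instead,
 * mirroring what the patch's pte_move_flags() does. */
static uint64_t pte_move_flags(uint64_t pte, uint64_t from, uint64_t to)
{
    if (pte & from) {
        pte &= ~from;
        pte |= to;
    }
    return pte;
}

/* A read-only, hardware-dirty PTE looks like a shadow stack PTE;
 * after the move it is read-only + software-dirty instead. */
static int demo(void)
{
    uint64_t pte = PAGE_DIRTY_HW;   /* RW already cleared */

    pte = pte_move_flags(pte, PAGE_DIRTY_HW, PAGE_DIRTY_SW);
    return (pte & PAGE_DIRTY_HW) == 0 && (pte & PAGE_DIRTY_SW) != 0;
}
```

Jann's objection is precisely that the clear_bit() and this move are two separate writes, so the read-only+DIRTY_HW combination is visible in between.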
On 08/30/2018 08:49 AM, Jann Horn wrote: >> @@ -1203,7 +1203,28 @@ static inline pte_t ptep_get_and_clear_full(struct mm_struct *mm, >> static inline void ptep_set_wrprotect(struct mm_struct *mm, >> unsigned long addr, pte_t *ptep) >> { >> + pte_t pte; >> + >> clear_bit(_PAGE_BIT_RW, (unsigned long *)&ptep->pte); >> + pte = *ptep; >> + >> + /* >> + * Some processors can start a write, but ending up seeing >> + * a read-only PTE by the time they get to the Dirty bit. >> + * In this case, they will set the Dirty bit, leaving a >> + * read-only, Dirty PTE which looks like a Shadow Stack PTE. >> + * >> + * However, this behavior has been improved and will not occur >> + * on processors supporting Shadow Stacks. Without this >> + * guarantee, a transition to a non-present PTE and flush the >> + * TLB would be needed. >> + * >> + * When change a writable PTE to read-only and if the PTE has >> + * _PAGE_DIRTY_HW set, we move that bit to _PAGE_DIRTY_SW so >> + * that the PTE is not a valid Shadow Stack PTE. >> + */ >> + pte = pte_move_flags(pte, _PAGE_DIRTY_HW, _PAGE_DIRTY_SW); >> + set_pte_at(mm, addr, ptep, pte); >> } > I don't understand why it's okay that you first atomically clear the > RW bit, then atomically switch from DIRTY_HW to DIRTY_SW. Doesn't that > mean that between the two atomic writes, another core can incorrectly > see a shadow stack? Good point. This could result in a spurious shadow-stack fault, or allow a shadow-stack write to the page in the transient state. But, the shadow-stack permissions are more restrictive than what could be in the TLB at this point, so I don't think there's a real security implication here. The only trouble is handling the spurious shadow-stack fault. The alternative is to go !Present for a bit, which we would probably just handle fine in the existing page fault code.
On Thu, Aug 30, 2018 at 6:09 PM Dave Hansen <dave.hansen@linux.intel.com> wrote: > > On 08/30/2018 08:49 AM, Jann Horn wrote: > >> @@ -1203,7 +1203,28 @@ static inline pte_t ptep_get_and_clear_full(struct mm_struct *mm, > >> static inline void ptep_set_wrprotect(struct mm_struct *mm, > >> unsigned long addr, pte_t *ptep) > >> { > >> + pte_t pte; > >> + > >> clear_bit(_PAGE_BIT_RW, (unsigned long *)&ptep->pte); > >> + pte = *ptep; > >> + > >> + /* > >> + * Some processors can start a write, but ending up seeing > >> + * a read-only PTE by the time they get to the Dirty bit. > >> + * In this case, they will set the Dirty bit, leaving a > >> + * read-only, Dirty PTE which looks like a Shadow Stack PTE. > >> + * > >> + * However, this behavior has been improved and will not occur > >> + * on processors supporting Shadow Stacks. Without this > >> + * guarantee, a transition to a non-present PTE and flush the > >> + * TLB would be needed. > >> + * > >> + * When change a writable PTE to read-only and if the PTE has > >> + * _PAGE_DIRTY_HW set, we move that bit to _PAGE_DIRTY_SW so > >> + * that the PTE is not a valid Shadow Stack PTE. > >> + */ > >> + pte = pte_move_flags(pte, _PAGE_DIRTY_HW, _PAGE_DIRTY_SW); > >> + set_pte_at(mm, addr, ptep, pte); > >> } > > I don't understand why it's okay that you first atomically clear the > > RW bit, then atomically switch from DIRTY_HW to DIRTY_SW. Doesn't that > > mean that between the two atomic writes, another core can incorrectly > > see a shadow stack? > > Good point. > > This could result in a spurious shadow-stack fault, or allow a > shadow-stack write to the page in the transient state. > > But, the shadow-stack permissions are more restrictive than what could > be in the TLB at this point, so I don't think there's a real security > implication here. How about this: Three threads (A, B, C) run with the same CR3. 1. a dirty+writable PTE is placed directly in front of B's shadow stack. (this can happen, right? 
or is there a guard page?) 2. C's TLB caches the dirty+writable PTE. 3. A performs some syscall that triggers ptep_set_wrprotect(). 4. A's syscall calls clear_bit(). 5. B's TLB caches the transient shadow stack. [now C has write access to B's transiently-extended shadow stack] 6. B recurses into the transiently-extended shadow stack 7. C overwrites the transiently-extended shadow stack area. 8. B returns through the transiently-extended shadow stack, giving the attacker instruction pointer control in B. 9. A's syscall broadcasts a TLB flush. Sure, it's not exactly an easy race and probably requires at least some black timing magic to exploit, if it's exploitable at all - but still. This seems suboptimal. > The only trouble is handling the spurious shadow-stack fault. The > alternative is to go !Present for a bit, which we would probably just > handle fine in the existing page fault code.
On 08/30/2018 09:23 AM, Jann Horn wrote: > Three threads (A, B, C) run with the same CR3. > > 1. a dirty+writable PTE is placed directly in front of B's shadow stack. > (this can happen, right? or is there a guard page?) > 2. C's TLB caches the dirty+writable PTE. > 3. A performs some syscall that triggers ptep_set_wrprotect(). > 4. A's syscall calls clear_bit(). > 5. B's TLB caches the transient shadow stack. > [now C has write access to B's transiently-extended shadow stack] > 6. B recurses into the transiently-extended shadow stack > 7. C overwrites the transiently-extended shadow stack area. > 8. B returns through the transiently-extended shadow stack, giving > the attacker instruction pointer control in B. > 9. A's syscall broadcasts a TLB flush. Heh, that's a good point. The shadow stack permissions are *not* strictly reduced because a page getting marked as shadow-stack has *increased* permissions when being used as a shadow stack. Fun. For general hardening, it seems like we want to ensure that there's a guard page at the bottom of the shadow stack. Yu-cheng, do we have a guard page? But, to keep B's TLB from picking up the entry, I think we can just make it !Present for a moment. No TLB can cache it, and I believe the same "don't set Dirty on a !Writable entry" logic also holds for !Present (modulo a weird erratum or two). If we do that, we just need to make sure that the fault handler knows it can get spurious faults from it, and might even run into the !Present PTE for a moment. It might be a bit confusing because it won't be a PROT_NONE, migration, or swap PTE, but will be !Present. We'll also have to make sure that we're doing this in a way which is friendly to the L1TF PTE handling.
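Dave's suggested ordering (clear Present first, so no TLB can cache the transient entry and the CPU cannot set Dirty in it, then install the final read-only PTE) might be sketched as follows. This is a hedged user-space illustration; the names, bit positions, and the placement of the flush are assumptions, not the kernel's implementation:

```c
#include <stdint.h>

#define PAGE_PRESENT   (1ULL << 0)
#define PAGE_RW        (1ULL << 1)
#define PAGE_DIRTY_HW  (1ULL << 6)
#define PAGE_DIRTY_SW  (1ULL << 11)

/* Sketch: write-protect a PTE via a transient not-present state. */
static uint64_t wrprotect_via_not_present(uint64_t *ptep)
{
    uint64_t pte = *ptep;

    /* 1. Make the entry non-present. No TLB can cache this state,
     *    and the CPU cannot set Dirty in a !Present entry. */
    *ptep = pte & ~PAGE_PRESENT;

    /* (a real implementation would flush the TLB here, and the
     *  fault handler would have to tolerate this transient state) */

    /* 2. Build the final read-only PTE, moving DIRTY_HW to DIRTY_SW
     *    so the result is not a valid shadow stack PTE. */
    pte &= ~PAGE_RW;
    if (pte & PAGE_DIRTY_HW) {
        pte &= ~PAGE_DIRTY_HW;
        pte |= PAGE_DIRTY_SW;
    }

    /* 3. Install it in one write. */
    *ptep = pte;
    return pte;
}
```

The cost, as noted below in the thread, is spurious faults during the window and care around the L1TF !Present handling.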
On Thu, 2018-08-30 at 10:19 -0700, Dave Hansen wrote: > On 08/30/2018 09:23 AM, Jann Horn wrote: > > > > Three threads (A, B, C) run with the same CR3. > > > > 1. a dirty+writable PTE is placed directly in front of B's shadow > > stack. > > (this can happen, right? or is there a guard page?) > > 2. C's TLB caches the dirty+writable PTE. > > 3. A performs some syscall that triggers ptep_set_wrprotect(). > > 4. A's syscall calls clear_bit(). > > 5. B's TLB caches the transient shadow stack. > > [now C has write access to B's transiently-extended shadow stack] > > 6. B recurses into the transiently-extended shadow stack > > 7. C overwrites the transiently-extended shadow stack area. > > 8. B returns through the transiently-extended shadow stack, giving > > the attacker instruction pointer control in B. > > 9. A's syscall broadcasts a TLB flush. > Heh, that's a good point. The shadow stack permissions are *not* > strictly reduced because a page getting marked as shadow-stack has > *increased* permissions when being used as a shadow stack. Fun. > > For general hardening, it seems like we want to ensure that there's > a > guard page at the bottom of the shadow stack. Yu-cheng, do we have > a > guard page? We don't have the guard page now, but there is a shadow stack token there, which cannot be used as a return address. Yu-cheng
On 08/30/2018 10:26 AM, Yu-cheng Yu wrote: > We don't have the guard page now, but there is a shadow stack token > there, which cannot be used as a return address. The overall concern is that we could overflow into a page that we did not intend. Either another actual shadow stack, or a page that the attacker constructed, like the transient scenario Jann described.
> On Aug 30, 2018, at 10:19 AM, Dave Hansen <dave.hansen@linux.intel.com> wrote: > >> On 08/30/2018 09:23 AM, Jann Horn wrote: >> Three threads (A, B, C) run with the same CR3. >> >> 1. a dirty+writable PTE is placed directly in front of B's shadow stack. >> (this can happen, right? or is there a guard page?) >> 2. C's TLB caches the dirty+writable PTE. >> 3. A performs some syscall that triggers ptep_set_wrprotect(). >> 4. A's syscall calls clear_bit(). >> 5. B's TLB caches the transient shadow stack. >> [now C has write access to B's transiently-extended shadow stack] >> 6. B recurses into the transiently-extended shadow stack >> 7. C overwrites the transiently-extended shadow stack area. >> 8. B returns through the transiently-extended shadow stack, giving >> the attacker instruction pointer control in B. >> 9. A's syscall broadcasts a TLB flush. > > Heh, that's a good point. The shadow stack permissions are *not* > strictly reduced because a page getting marked as shadow-stack has > *increased* permissions when being used as a shadow stack. Fun. > > For general hardening, it seems like we want to ensure that there's a > guard page at the bottom of the shadow stack. Yu-cheng, do we have a > guard page? > > But, to keep B's TLB from picking up the entry, I think we can just make > it !Present for a moment. No TLB can cache it, and I believe the same > "don't set Dirty on a !Writable entry" logic also holds for !Present > (modulo a weird erratum or two). Can we get documentation? Pretty please?
On Thu, 2018-08-30 at 10:33 -0700, Dave Hansen wrote: > On 08/30/2018 10:26 AM, Yu-cheng Yu wrote: > > > > We don't have the guard page now, but there is a shadow stack > > token > > there, which cannot be used as a return address. > The overall concern is that we could overflow into a page that we > did > not intend. Either another actual shadow stack or something that a > page > that the attacker constructed, like the transient scenario Jann > described. > A task could go beyond the bottom of its shadow stack by doing either 'ret' or 'incssp'. If it is the 'ret' case, the token prevents it. If it is the 'incssp' case, a guard page cannot prevent it entirely, right? Yu-cheng
On Thu, Aug 30, 2018 at 7:58 PM Yu-cheng Yu <yu-cheng.yu@intel.com> wrote: > > On Thu, 2018-08-30 at 10:33 -0700, Dave Hansen wrote: > > On 08/30/2018 10:26 AM, Yu-cheng Yu wrote: > > > > > > We don't have the guard page now, but there is a shadow stack > > > token > > > there, which cannot be used as a return address. > > The overall concern is that we could overflow into a page that we > > did > > not intend. Either another actual shadow stack or something that a > > page > > that the attacker constructed, like the transient scenario Jann > > described. > > > > A task could go beyond the bottom of its shadow stack by doing either > 'ret' or 'incssp'. If it is the 'ret' case, the token prevents it. > If it is the 'incssp' case, a guard page cannot prevent it entirely, > right? I mean the other direction, on "call".
On 08/30/2018 10:34 AM, Andy Lutomirski wrote: >> But, to keep B's TLB from picking up the entry, I think we can just make >> it !Present for a moment. No TLB can cache it, and I believe the same >> "don't set Dirty on a !Writable entry" logic also holds for !Present >> (modulo a weird erratum or two). > Can we get documentation? Pretty please? The accessed bit description in the SDM looks pretty good to me today: > Whenever the processor uses a paging-structure entry as part of > linear-address translation, it sets the accessed flag in that entry > (if it is not already set). If it's !Present, it can't be used as part of a translation, so the flag can't be set. I think that covers the thing I was unsure about. But, Dirty is a bit, er, muddier, mostly because it only gets set on leaf entries: > Whenever there is a write to a linear address, the processor sets the > dirty flag (if it is not already set) in the paging-structure entry > that identifies the final physical address for the linear address > (either a PTE or a paging-structure entry in which the PS flag is > 1). That little hunk will definitely need to get updated with something like: On processors enumerating support for CET, the processor will only set the dirty flag on paging-structure entries in which the W flag is 1.
On 08/30/2018 07:38 AM, Yu-cheng Yu wrote: > When Shadow Stack is enabled, the read-only and PAGE_DIRTY_HW PTE > setting is reserved only for the Shadow Stack. To track dirty of > non-Shadow Stack read-only PTEs, we use PAGE_DIRTY_SW. > > Update ptep_set_wrprotect() and pmdp_set_wrprotect(). > > Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com> > --- > arch/x86/include/asm/pgtable.h | 42 ++++++++++++++++++++++++++++++++++ > 1 file changed, 42 insertions(+) > > diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h > index 4d50de77ea96..556ef258eeff 100644 > --- a/arch/x86/include/asm/pgtable.h > +++ b/arch/x86/include/asm/pgtable.h > @@ -1203,7 +1203,28 @@ static inline pte_t ptep_get_and_clear_full(struct mm_struct *mm, > static inline void ptep_set_wrprotect(struct mm_struct *mm, > unsigned long addr, pte_t *ptep) > { > + pte_t pte; > + > clear_bit(_PAGE_BIT_RW, (unsigned long *)&ptep->pte); > + pte = *ptep; > + > + /* > + * Some processors can start a write, but ending up seeing but end up seeing > + * a read-only PTE by the time they get to the Dirty bit. > + * In this case, they will set the Dirty bit, leaving a > + * read-only, Dirty PTE which looks like a Shadow Stack PTE. > + * > + * However, this behavior has been improved and will not occur > + * on processors supporting Shadow Stacks. Without this > + * guarantee, a transition to a non-present PTE and flush the > + * TLB would be needed. > + * > + * When change a writable PTE to read-only and if the PTE has changing > + * _PAGE_DIRTY_HW set, we move that bit to _PAGE_DIRTY_SW so > + * that the PTE is not a valid Shadow Stack PTE. 
> + */ > + pte = pte_move_flags(pte, _PAGE_DIRTY_HW, _PAGE_DIRTY_SW); > + set_pte_at(mm, addr, ptep, pte); > } > > #define flush_tlb_fix_spurious_fault(vma, address) do { } while (0) > @@ -1266,7 +1287,28 @@ static inline pud_t pudp_huge_get_and_clear(struct mm_struct *mm, > static inline void pmdp_set_wrprotect(struct mm_struct *mm, > unsigned long addr, pmd_t *pmdp) > { > + pmd_t pmd; > + > clear_bit(_PAGE_BIT_RW, (unsigned long *)pmdp); > + pmd = *pmdp; > + > + /* > + * Some processors can start a write, but ending up seeing but end up seeing > + * a read-only PTE by the time they get to the Dirty bit. > + * In this case, they will set the Dirty bit, leaving a > + * read-only, Dirty PTE which looks like a Shadow Stack PTE. > + * > + * However, this behavior has been improved and will not occur > + * on processors supporting Shadow Stacks. Without this > + * guarantee, a transition to a non-present PTE and flush the > + * TLB would be needed. > + * > + * When change a writable PTE to read-only and if the PTE has changing > + * _PAGE_DIRTY_HW set, we move that bit to _PAGE_DIRTY_SW so > + * that the PTE is not a valid Shadow Stack PTE. > + */ > + pmd = pmd_move_flags(pmd, _PAGE_DIRTY_HW, _PAGE_DIRTY_SW); > + set_pmd_at(mm, addr, pmdp, pmd); > } > > #define pud_write pud_write >
On Thu, 2018-08-30 at 19:59 +0200, Jann Horn wrote: > On Thu, Aug 30, 2018 at 7:58 PM Yu-cheng Yu <yu-cheng.yu@intel.com> > wrote: > > > > > > On Thu, 2018-08-30 at 10:33 -0700, Dave Hansen wrote: > > > > > > On 08/30/2018 10:26 AM, Yu-cheng Yu wrote: > > > > > > > > > > > > We don't have the guard page now, but there is a shadow stack > > > > token > > > > there, which cannot be used as a return address. > > > The overall concern is that we could overflow into a page that > > > we > > > did > > > not intend. Either another actual shadow stack or something > > > that a > > > page > > > that the attacker constructed, like the transient scenario Jann > > > described. > > > > > A task could go beyond the bottom of its shadow stack by doing > > either > > 'ret' or 'incssp'. If it is the 'ret' case, the token prevents > > it. > > If it is the 'incssp' case, a guard page cannot prevent it > > entirely, > > right? > I mean the other direction, on "call". In the flow you described, if C writes to the overflow page before B gets in with a 'call', the return address is still correct for B. To make an attack, C needs to write again before the TLB flush. I agree that is possible. Assume we have a guard page, can someone in the short window do recursive calls in B, move ssp to the end of the guard page, and trigger the same again? He can simply take the incssp route.
On Thu, 2018-08-30 at 12:59 -0700, Randy Dunlap wrote: > On 08/30/2018 07:38 AM, Yu-cheng Yu wrote: > > > > When Shadow Stack is enabled, the read-only and PAGE_DIRTY_HW PTE > > setting is reserved only for the Shadow Stack. To track dirty of > > non-Shadow Stack read-only PTEs, we use PAGE_DIRTY_SW. > > > > Update ptep_set_wrprotect() and pmdp_set_wrprotect(). > > > > Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com> > > --- > > arch/x86/include/asm/pgtable.h | 42 > > ++++++++++++++++++++++++++++++++++ > > 1 file changed, 42 insertions(+) > > > > diff --git a/arch/x86/include/asm/pgtable.h > > b/arch/x86/include/asm/pgtable.h > > index 4d50de77ea96..556ef258eeff 100644 > > --- a/arch/x86/include/asm/pgtable.h > > +++ b/arch/x86/include/asm/pgtable.h > > @@ -1203,7 +1203,28 @@ static inline pte_t > > ptep_get_and_clear_full(struct mm_struct *mm, > > static inline void ptep_set_wrprotect(struct mm_struct *mm, > > unsigned long addr, pte_t > > *ptep) > > { > > + pte_t pte; > > + > > clear_bit(_PAGE_BIT_RW, (unsigned long *)&ptep->pte); > > + pte = *ptep; > > + > > + /* > > + * Some processors can start a write, but ending up > > seeing > but end up seeing > > > > > + * a read-only PTE by the time they get to the Dirty bit. > > + * In this case, they will set the Dirty bit, leaving a > > + * read-only, Dirty PTE which looks like a Shadow Stack > > PTE. > > + * > > + * However, this behavior has been improved and will not > > occur > > + * on processors supporting Shadow Stacks. Without this > > + * guarantee, a transition to a non-present PTE and flush > > the > > + * TLB would be needed. > > + * > > + * When change a writable PTE to read-only and if the PTE > > has > changing > > > > > + * _PAGE_DIRTY_HW set, we move that bit to _PAGE_DIRTY_SW > > so > > + * that the PTE is not a valid Shadow Stack PTE. 
> > + */ > > + pte = pte_move_flags(pte, _PAGE_DIRTY_HW, > > _PAGE_DIRTY_SW); > > + set_pte_at(mm, addr, ptep, pte); > > } > > > > #define flush_tlb_fix_spurious_fault(vma, address) do { } while > > (0) > > @@ -1266,7 +1287,28 @@ static inline pud_t > > pudp_huge_get_and_clear(struct mm_struct *mm, > > static inline void pmdp_set_wrprotect(struct mm_struct *mm, > > unsigned long addr, pmd_t > > *pmdp) > > { > > + pmd_t pmd; > > + > > clear_bit(_PAGE_BIT_RW, (unsigned long *)pmdp); > > + pmd = *pmdp; > > + > > + /* > > + * Some processors can start a write, but ending up > > seeing > but end up seeing > > > > > + * a read-only PTE by the time they get to the Dirty bit. > > + * In this case, they will set the Dirty bit, leaving a > > + * read-only, Dirty PTE which looks like a Shadow Stack > > PTE. > > + * > > + * However, this behavior has been improved and will not > > occur > > + * on processors supporting Shadow Stacks. Without this > > + * guarantee, a transition to a non-present PTE and flush > > the > > + * TLB would be needed. > > + * > > + * When change a writable PTE to read-only and if the PTE > > has > changing > > > > > + * _PAGE_DIRTY_HW set, we move that bit to _PAGE_DIRTY_SW > > so > > + * that the PTE is not a valid Shadow Stack PTE. > > + */ > > + pmd = pmd_move_flags(pmd, _PAGE_DIRTY_HW, > > _PAGE_DIRTY_SW); > > + set_pmd_at(mm, addr, pmdp, pmd); > > } > > > > #define pud_write pud_write > > > Thanks, I will fix it!
On Thu, Aug 30, 2018 at 10:25 PM Yu-cheng Yu <yu-cheng.yu@intel.com> wrote: > > On Thu, 2018-08-30 at 19:59 +0200, Jann Horn wrote: > > On Thu, Aug 30, 2018 at 7:58 PM Yu-cheng Yu <yu-cheng.yu@intel.com> > > wrote: > > > > > > > > > On Thu, 2018-08-30 at 10:33 -0700, Dave Hansen wrote: > > > > > > > > On 08/30/2018 10:26 AM, Yu-cheng Yu wrote: > > > > > > > > > > > > > > > We don't have the guard page now, but there is a shadow stack > > > > > token > > > > > there, which cannot be used as a return address. > > > > The overall concern is that we could overflow into a page that > > > > we > > > > did > > > > not intend. Either another actual shadow stack or something > > > > that a > > > > page > > > > that the attacker constructed, like the transient scenario Jann > > > > described. > > > > > > > A task could go beyond the bottom of its shadow stack by doing > > > either > > > 'ret' or 'incssp'. If it is the 'ret' case, the token prevents > > > it. > > > If it is the 'incssp' case, a guard page cannot prevent it > > > entirely, > > > right? > > I mean the other direction, on "call". > > In the flow you described, if C writes to the overflow page before B > gets in with a 'call', the return address is still correct for B. To > make an attack, C needs to write again before the TLB flush. I agree > that is possible. > > Assume we have a guard page, can someone in the short window do > recursive calls in B, move ssp to the end of the guard page, and > trigger the same again? He can simply take the incssp route. I don't understand what you're saying. If the shadow stack is between guard pages, you should never be able to move SSP past that area's guard pages without an appropriate shadow stack token (not even with INCSSP, since that has a maximum range of PAGE_SIZE/2), and therefore, it shouldn't matter whether memory outside that range is incorrectly marked as shadow stack. Am I missing something?
On Thu, 2018-08-30 at 22:44 +0200, Jann Horn wrote: > On Thu, Aug 30, 2018 at 10:25 PM Yu-cheng Yu <yu-cheng.yu@intel.com> > wrote: ... > > In the flow you described, if C writes to the overflow page before > > B > > gets in with a 'call', the return address is still correct for > > B. To > > make an attack, C needs to write again before the TLB flush. I > > agree > > that is possible. > > > > Assume we have a guard page, can someone in the short window do > > recursive calls in B, move ssp to the end of the guard page, and > > trigger the same again? He can simply take the incssp route. > I don't understand what you're saying. If the shadow stack is > between > guard pages, you should never be able to move SSP past that area's > guard pages without an appropriate shadow stack token (not even with > INCSSP, since that has a maximum range of PAGE_SIZE/2), and > therefore, > it shouldn't matter whether memory outside that range is incorrectly > marked as shadow stack. Am I missing something? INCSSP has a range of 256, but we can do multiple of that. But I realize the key is not to have the transient SHSTK page at all. The guard page is !pte_write(), and even if we have flaws in ptep_set_wrprotect(), there will not be any transient SHSTK pages. I will add guard pages to both ends. Still thinking how to fix ptep_set_wrprotect(). Yu-cheng
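The arithmetic behind this exchange: as I read the CET spec, INCSSP uses only the low 8 bits of its operand and, on 64-bit, moves SSP in 8-byte slots, so one instruction advances SSP by at most 255 * 8 = 2040 bytes — just under Jann's PAGE_SIZE/2 figure. A quick check (treat the operand width and slot size as assumptions):

```c
/* Assumed INCSSPQ parameters: 8-bit count, 8-byte shadow stack slots. */
#define INCSSP_MAX_COUNT 255
#define SSP_SLOT_SIZE    8
#define PAGE_SIZE        4096

/* Largest SSP move a single INCSSP can make: 2040 bytes. */
static int incssp_max_advance(void)
{
    return INCSSP_MAX_COUNT * SSP_SLOT_SIZE;
}

/* 2040 < 4096: one INCSSP cannot step entirely over a 4 KiB guard
 * page, so SSP must land inside it, where the next shadow stack
 * access faults on the non-shadow-stack guard mapping. Repeating
 * INCSSP doesn't help, because each repeat starts from inside the
 * guard page. */
static int guard_page_cannot_be_skipped(void)
{
    return incssp_max_advance() < PAGE_SIZE;
}
```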
On Thu, Aug 30, 2018 at 10:57 PM Yu-cheng Yu <yu-cheng.yu@intel.com> wrote: > > On Thu, 2018-08-30 at 22:44 +0200, Jann Horn wrote: > > On Thu, Aug 30, 2018 at 10:25 PM Yu-cheng Yu <yu-cheng.yu@intel.com> > > wrote: > ... > > > In the flow you described, if C writes to the overflow page before > > > B > > > gets in with a 'call', the return address is still correct for > > > B. To > > > make an attack, C needs to write again before the TLB flush. I > > > agree > > > that is possible. > > > > > > Assume we have a guard page, can someone in the short window do > > > recursive calls in B, move ssp to the end of the guard page, and > > > trigger the same again? He can simply take the incssp route. > > I don't understand what you're saying. If the shadow stack is > > between > > guard pages, you should never be able to move SSP past that area's > > guard pages without an appropriate shadow stack token (not even with > > INCSSP, since that has a maximum range of PAGE_SIZE/2), and > > therefore, > > it shouldn't matter whether memory outside that range is incorrectly > > marked as shadow stack. Am I missing something? > > INCSSP has a range of 256, but we can do multiple of that. > But I realize the key is not to have the transient SHSTK page at all. > The guard page is !pte_write() and even we have flaws in > ptep_set_wrprotect(), there will not be any transient SHSTK pages. I > will add guard pages to both ends. > > Still thinking how to fix ptep_set_wrprotect(). cmpxchg loop? Or is that slow?
On Thu, Aug 30, 2018 at 11:01 PM Jann Horn <jannh@google.com> wrote: > > On Thu, Aug 30, 2018 at 10:57 PM Yu-cheng Yu <yu-cheng.yu@intel.com> wrote: > > > > On Thu, 2018-08-30 at 22:44 +0200, Jann Horn wrote: > > > On Thu, Aug 30, 2018 at 10:25 PM Yu-cheng Yu <yu-cheng.yu@intel.com> > > > wrote: > > ... > > > > In the flow you described, if C writes to the overflow page before > > > > B > > > > gets in with a 'call', the return address is still correct for > > > > B. To > > > > make an attack, C needs to write again before the TLB flush. I > > > > agree > > > > that is possible. > > > > > > > > Assume we have a guard page, can someone in the short window do > > > > recursive calls in B, move ssp to the end of the guard page, and > > > > trigger the same again? He can simply take the incssp route. > > > I don't understand what you're saying. If the shadow stack is > > > between > > > guard pages, you should never be able to move SSP past that area's > > > guard pages without an appropriate shadow stack token (not even with > > > INCSSP, since that has a maximum range of PAGE_SIZE/2), and > > > therefore, > > > it shouldn't matter whether memory outside that range is incorrectly > > > marked as shadow stack. Am I missing something? > > > > INCSSP has a range of 256, but we can do multiple of that. > > But I realize the key is not to have the transient SHSTK page at all. > > The guard page is !pte_write() and even we have flaws in > > ptep_set_wrprotect(), there will not be any transient SHSTK pages. I > > will add guard pages to both ends. > > > > Still thinking how to fix ptep_set_wrprotect(). > > cmpxchg loop? Or is that slow? Something like this: static inline void ptep_set_wrprotect(struct mm_struct *mm, unsigned long addr, pte_t *ptep) { pte_t pte = READ_ONCE(*ptep), new_pte; /* ... your comment about not needing a TLB shootdown here ... 
*/ do { pte = pte_wrprotect(pte); /* note: relies on _PAGE_DIRTY_HW < _PAGE_DIRTY_SW */ /* dirty direct bit-twiddling; you can probably write this in a nicer way */ pte.pte |= (pte.pte & _PAGE_DIRTY_HW) >> _PAGE_BIT_DIRTY_HW << _PAGE_BIT_DIRTY_SW; pte.pte &= ~_PAGE_DIRTY_HW; pte = cmpxchg(ptep, pte, new_pte); } while (pte != new_pte); } I think this has the advantage of not generating weird spurious pagefaults. It's not compatible with Xen PV, but I'm guessing that this whole feature isn't going to support Xen PV anyway? So you could switch between two implementations of ptep_set_wrprotect using the pvop mechanism or so - one for environments that support shadow stacks, one for all other environments. Or is there some arcane reason why cmpxchg doesn't work here the way I think it should?
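Jann's sketch as written never assigns new_pte before the cmpxchg. A corrected, compilable user-space version of the same idea, using C11 atomics to stand in for the kernel's cmpxchg (bit positions are illustrative assumptions), might look like:

```c
#include <stdatomic.h>
#include <stdint.h>

#define PAGE_RW        (1ULL << 1)
#define PAGE_DIRTY_HW  (1ULL << 6)
#define PAGE_DIRTY_SW  (1ULL << 11)

/* Clear RW and move DIRTY_HW to DIRTY_SW in a single atomic update,
 * so no other CPU can ever observe the read-only + hardware-dirty
 * (i.e. shadow stack) combination. */
static uint64_t wrprotect_atomic(_Atomic uint64_t *ptep)
{
    uint64_t old = atomic_load(ptep);
    uint64_t new_pte;

    do {
        new_pte = old & ~(PAGE_RW | PAGE_DIRTY_HW);
        if (old & PAGE_DIRTY_HW)
            new_pte |= PAGE_DIRTY_SW;
        /* On failure, 'old' is reloaded with the current value --
         * the same semantics as the kernel's try_cmpxchg(). */
    } while (!atomic_compare_exchange_weak(ptep, &old, new_pte));

    return new_pte;
}
```

The compare-exchange retry also handles a racing hardware Dirty-bit set: if the CPU dirties the PTE between the load and the exchange, the exchange fails and the loop recomputes from the fresh value.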
> On Aug 30, 2018, at 10:59 AM, Jann Horn <jannh@google.com> wrote: > >> On Thu, Aug 30, 2018 at 7:58 PM Yu-cheng Yu <yu-cheng.yu@intel.com> wrote: >> >>> On Thu, 2018-08-30 at 10:33 -0700, Dave Hansen wrote: >>>> On 08/30/2018 10:26 AM, Yu-cheng Yu wrote: >>>> >>>> We don't have the guard page now, but there is a shadow stack >>>> token >>>> there, which cannot be used as a return address. >>> The overall concern is that we could overflow into a page that we >>> did >>> not intend. Either another actual shadow stack or something that a >>> page >>> that the attacker constructed, like the transient scenario Jann >>> described. >>> >> >> A task could go beyond the bottom of its shadow stack by doing either >> 'ret' or 'incssp'. If it is the 'ret' case, the token prevents it. >> If it is the 'incssp' case, a guard page cannot prevent it entirely, >> right? > > I mean the other direction, on "call". I still think that shadow stacks should work just like mmap and that mmap should learn to add guard pages for all non-MAP_FIXED allocations.
On Thu, Aug 30, 2018 at 11:47:16PM +0200, Jann Horn wrote:
> 	do {
> 		pte = pte_wrprotect(pte);
> 		/* note: relies on _PAGE_DIRTY_HW < _PAGE_DIRTY_SW */
> 		/* dirty direct bit-twiddling; you can probably write
> 		 * this in a nicer way */
> 		pte.pte |= (pte.pte & _PAGE_DIRTY_HW) >>
> 			   _PAGE_BIT_DIRTY_HW << _PAGE_BIT_DIRTY_SW;
> 		pte.pte &= ~_PAGE_DIRTY_HW;
> 		pte = cmpxchg(ptep, pte, new_pte);
> 	} while (pte != new_pte);

Please use the form:

	pte_t new_pte, pte = READ_ONCE(*ptep);

	do {
		new_pte = /* ... */;
	} while (!try_cmpxchg(ptep, &pte, new_pte));

Also, this will fail to build on i386-PAE, but I suspect this code will
be under some CONFIG option specific to x86_64 anyway.
On Fri, 2018-08-31 at 11:53 +0200, Peter Zijlstra wrote:
> On Thu, Aug 30, 2018 at 11:47:16PM +0200, Jann Horn wrote:
> > 	do {
> > 		pte = pte_wrprotect(pte);
> > 		/* note: relies on _PAGE_DIRTY_HW < _PAGE_DIRTY_SW */
> > 		/* dirty direct bit-twiddling; you can probably write
> > 		 * this in a nicer way */
> > 		pte.pte |= (pte.pte & _PAGE_DIRTY_HW) >>
> > 			   _PAGE_BIT_DIRTY_HW << _PAGE_BIT_DIRTY_SW;
> > 		pte.pte &= ~_PAGE_DIRTY_HW;
> > 		pte = cmpxchg(ptep, pte, new_pte);
> > 	} while (pte != new_pte);
>
> Please use the form:
>
> 	pte_t new_pte, pte = READ_ONCE(*ptep);
>
> 	do {
> 		new_pte = /* ... */;
> 	} while (!try_cmpxchg(ptep, &pte, new_pte));
>
> Also, this will fail to build on i386-PAE, but I suspect this code
> will be under some CONFIG option specific to x86_64 anyway.

Thanks! I will work on it.

Yu-cheng
On 08/31/2018 07:33 AM, Yu-cheng Yu wrote:
> Please use the form:
>
> 	pte_t new_pte, pte = READ_ONCE(*ptep);
>
> 	do {
> 		new_pte = /* ... */;
> 	} while (!try_cmpxchg(ptep, &pte, new_pte));

It's probably also worth doing some testing to see if you can detect the
cost of the cmpxchg. It's definitely more than the old code.

A loop that does mprotect(PROT_READ) followed by
mprotect(PROT_READ|PROT_WRITE) should do it.
On Fri, 2018-08-31 at 07:47 -0700, Dave Hansen wrote:
> On 08/31/2018 07:33 AM, Yu-cheng Yu wrote:
> > Please use the form:
> >
> > 	pte_t new_pte, pte = READ_ONCE(*ptep);
> >
> > 	do {
> > 		new_pte = /* ... */;
> > 	} while (!try_cmpxchg(ptep, &pte, new_pte));
>
> It's probably also worth doing some testing to see if you can detect
> the cost of the cmpxchg. It's definitely more than the old code.
>
> A loop that does mprotect(PROT_READ) followed by
> mprotect(PROT_READ|PROT_WRITE) should do it.

I created the test,

https://github.com/yyu168/cet-smoke-test/blob/quick/quick/mprotect_bench.c

then realized this won't work.

To trigger a race in ptep_set_wrprotect(), we need to fork from one of
three pthread siblings.

Or do we measure only how much this affects fork?
If there is no racing, the effect should be minimal.

Yu-cheng
On 08/31/2018 08:48 AM, Yu-cheng Yu wrote: > To trigger a race in ptep_set_wrprotect(), we need to fork from one of > three pthread siblings. > > Or do we measure only how much this affects fork? > If there is no racing, the effect should be minimal. We don't need a race. I think the cmpxchg will be slower, even without a race, than the code that was there before. The cmpxchg is a simple, straightforward solution, but we're putting it in place of a plain memory write, which is suboptimal. But, before I nitpick the performance, I wanted to see if we could even detect a delta.
On Fri, Aug 31, 2018 at 08:58:39AM -0700, Dave Hansen wrote:
> On 08/31/2018 08:48 AM, Yu-cheng Yu wrote:
> > To trigger a race in ptep_set_wrprotect(), we need to fork from one of
> > three pthread siblings.
> >
> > Or do we measure only how much this affects fork?
> > If there is no racing, the effect should be minimal.
>
> We don't need a race.
>
> I think the cmpxchg will be slower, even without a race, than the code
> that was there before. The cmpxchg is a simple, straightforward
> solution, but we're putting it in place of a plain memory write, which
> is suboptimal.

Not quite: the clear_bit() is LOCK prefixed.
On 08/30/2018 07:38 AM, Yu-cheng Yu wrote: > + * Some processors can start a write, but ending up seeing > + * a read-only PTE by the time they get to the Dirty bit. > + * In this case, they will set the Dirty bit, leaving a > + * read-only, Dirty PTE which looks like a Shadow Stack PTE. > + * > + * However, this behavior has been improved and will not occur > + * on processors supporting Shadow Stacks. Without this > + * guarantee, a transition to a non-present PTE and flush the > + * TLB would be needed. Did we publicly document this behavior anywhere? I can't seem to find it.
On Thu, Aug 30, 2018 at 11:55 AM, Dave Hansen
<dave.hansen@linux.intel.com> wrote:
> On 08/30/2018 10:34 AM, Andy Lutomirski wrote:
>>> But, to keep B's TLB from picking up the entry, I think we can just make
>>> it !Present for a moment. No TLB can cache it, and I believe the same
>>> "don't set Dirty on a !Writable entry" logic also holds for !Present
>>> (modulo a weird erratum or two).
>> Can we get documentation? Pretty please?
>
> The accessed bit description in the SDM looks pretty good to me today:
>
>> Whenever the processor uses a paging-structure entry as part of
>> linear-address translation, it sets the accessed flag in that entry
>> (if it is not already set).
>
> If it's !Present, it can't be used as part of a translation, so the bit
> can't be set. I think that covers the thing I was unsure about.
>
> But, Dirty is a bit, er, muddier, but mostly because it only gets set on
> leaf entries:
>
>> Whenever there is a write to a linear address, the processor sets the
>> dirty flag (if it is not already set) in the paging-structure entry
>> that identifies the final physical address for the linear address
>> (either a PTE or a paging-structure entry in which the PS flag is 1).
>
> That little hunk will definitely need to get updated with something like:
>
> 	On processors enumerating support for CET, the processor will only
> 	set the dirty flag on paging-structure entries in which the W
> 	flag is 1.

Can we get something much stronger, perhaps? Like this:

	On processors enumerating support for CET, the processor will write to
	the accessed and/or dirty flags atomically, as if using the LOCK
	CMPXCHG instruction. The memory access, any cached entries in any
	paging-structure caches, and the values in the paging-structure entry
	before and after writing the A and/or D bits will all be consistent.

I'm sure this could be worded better.
The point is that the CPU should, atomically, load the PTE, check if it allows the access, set A and/or D appropriately, write the new value to the TLB, and use that value for the access. This is clearly a little bit slower than what old CPUs could do when writing to an already-in-TLB writable non-dirty entry, but new CPUs are going to have to atomically check the W bit. (I assume that even old CPUs will *atomically* set the D bit as if by LOCK BTS, but this is all very vague in the SDM IIRC.)
On 08/31/2018 10:46 AM, Andy Lutomirski wrote:
> On Thu, Aug 30, 2018 at 11:55 AM, Dave Hansen
>> That little hunk will definitely need to get updated with something like:
>>
>> 	On processors enumerating support for CET, the processor will only
>> 	set the dirty flag on paging-structure entries in which the W
>> 	flag is 1.
>
> Can we get something much stronger, perhaps? Like this:
>
> 	On processors enumerating support for CET, the processor will write to
> 	the accessed and/or dirty flags atomically, as if using the LOCK
> 	CMPXCHG instruction. The memory access, any cached entries in any
> 	paging-structure caches, and the values in the paging-structure entry
> 	before and after writing the A and/or D bits will all be consistent.

There's some talk of this already in 8.1.2.1 "Automatic Locking":

> When updating page-directory and page-table entries — When updating
> page-directory and page-table entries, the processor uses locked
> cycles to set the accessed and dirty flag in the page-directory and
> page-table entries.

As for the A/D consistency, I'll see if I can share that before it hits
the SDM for real and see if it's sufficient for everybody.
On Fri, 2018-08-31 at 18:29 +0200, Peter Zijlstra wrote:
> On Fri, Aug 31, 2018 at 08:58:39AM -0700, Dave Hansen wrote:
> > On 08/31/2018 08:48 AM, Yu-cheng Yu wrote:
> > > To trigger a race in ptep_set_wrprotect(), we need to fork from one of
> > > three pthread siblings.
> > >
> > > Or do we measure only how much this affects fork?
> > > If there is no racing, the effect should be minimal.
> >
> > We don't need a race.
> >
> > I think the cmpxchg will be slower, even without a race, than the code
> > that was there before. The cmpxchg is a simple, straightforward
> > solution, but we're putting it in place of a plain memory write, which
> > is suboptimal.
>
> Not quite: the clear_bit() is LOCK prefixed.

With the updated ptep_set_wrprotect() below, I did MADV_WILLNEED to a
shadow stack of 8 MB, then 10,000 fork()'s, but could not prove it is
more or less efficient than the other. So can we say this is probably
fine in terms of efficiency?

Yu-cheng

--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -1203,7 +1203,36 @@ static inline pte_t ptep_get_and_clear_full(struct mm_struct *mm,
 static inline void ptep_set_wrprotect(struct mm_struct *mm,
 				      unsigned long addr, pte_t *ptep)
 {
+#ifdef CONFIG_X86_INTEL_SHADOW_STACK_USER
+	pte_t new_pte, pte = READ_ONCE(*ptep);
+
+	/*
+	 * Some processors can start a write, but end up
+	 * seeing a read-only PTE by the time they get
+	 * to the Dirty bit. In this case, they will
+	 * set the Dirty bit, leaving a read-only, Dirty
+	 * PTE which looks like a Shadow Stack PTE.
+	 *
+	 * However, this behavior has been improved and
+	 * will not occur on processors supporting
+	 * Shadow Stacks. Without this guarantee, a
+	 * transition to a non-present PTE and a TLB
+	 * flush would be needed.
+	 *
+	 * When changing a writable PTE to read-only and
+	 * if the PTE has _PAGE_DIRTY_HW set, we move
+	 * that bit to _PAGE_DIRTY_SW so that the PTE is
+	 * not a valid Shadow Stack PTE.
+	 */
+	do {
+		new_pte = pte_wrprotect(pte);
+		new_pte.pte |= (new_pte.pte & _PAGE_DIRTY_HW) >>
+			       _PAGE_BIT_DIRTY_HW << _PAGE_BIT_DIRTY_SW;
+		new_pte.pte &= ~_PAGE_DIRTY_HW;
+	} while (!try_cmpxchg(ptep, &pte, new_pte));
+#else
 	clear_bit(_PAGE_BIT_RW, (unsigned long *)&ptep->pte);
+#endif
 }
On 09/14/2018 01:39 PM, Yu-cheng Yu wrote: > With the updated ptep_set_wrprotect() below, I did MADV_WILLNEED to a shadow > stack of 8 MB, then 10,000 fork()'s, but could not prove it is more or less > efficient than the other. So can we say this is probably fine in terms of > efficiency? Well, the first fork() will do all the hard work. I don't think subsequent fork()s will be affected. Did you do something to ensure this code was being run? I would guess that a loop like this: for (i = 0; i < 10000; i++) { mprotect(addr, len, PROT_READ); mprotect(addr, len, PROT_READ|PROT_WRITE); } might show it better.
On Fri, 2018-09-14 at 13:46 -0700, Dave Hansen wrote: > On 09/14/2018 01:39 PM, Yu-cheng Yu wrote: > > > > With the updated ptep_set_wrprotect() below, I did MADV_WILLNEED to a shadow > > stack of 8 MB, then 10,000 fork()'s, but could not prove it is more or less > > efficient than the other. So can we say this is probably fine in terms of > > efficiency? > Well, the first fork() will do all the hard work. I don't think > subsequent fork()s will be affected. Are you talking about a recent commit: 1b2de5d0 mm/cow: don't bother write protecting already write-protected pages With that, subsequent fork()s will not do all the hard work. However, I have not done that for shadow stack PTEs (do we want to do that?). I think the additional benefit for shadow stack is small? > > Did you do something to ensure this code was being run? > > I would guess that a loop like this: > > for (i = 0; i < 10000; i++) { > mprotect(addr, len, PROT_READ); > mprotect(addr, len, PROT_READ|PROT_WRITE); > } > > might show it better. Would mprotect() do copy_one_pte()? Otherwise it will not go through ptep_set_wrprotect()?
On 09/14/2018 02:08 PM, Yu-cheng Yu wrote: > On Fri, 2018-09-14 at 13:46 -0700, Dave Hansen wrote: >> On 09/14/2018 01:39 PM, Yu-cheng Yu wrote: >>> >>> With the updated ptep_set_wrprotect() below, I did MADV_WILLNEED to a shadow >>> stack of 8 MB, then 10,000 fork()'s, but could not prove it is more or less >>> efficient than the other. So can we say this is probably fine in terms of >>> efficiency? BTW, I wasn't particularly concerned about shadow stacks. Plain old memory is affected by this change too. Right? >> Well, the first fork() will do all the hard work. I don't think >> subsequent fork()s will be affected. > > Are you talking about a recent commit: > > 1b2de5d0 mm/cow: don't bother write protecting already write-protected pages > > With that, subsequent fork()s will not do all the hard work. > However, I have not done that for shadow stack PTEs (do we want to do that?). > I think the additional benefit for shadow stack is small? You're right. mprotect() doesn't use this path. But, that reminds me, can you take a quick look at change_pte_range() and double-check that it's not affected by this issue?
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 4d50de77ea96..556ef258eeff 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -1203,7 +1203,28 @@ static inline pte_t ptep_get_and_clear_full(struct mm_struct *mm,
 static inline void ptep_set_wrprotect(struct mm_struct *mm,
 				      unsigned long addr, pte_t *ptep)
 {
+	pte_t pte;
+
 	clear_bit(_PAGE_BIT_RW, (unsigned long *)&ptep->pte);
+	pte = *ptep;
+
+	/*
+	 * Some processors can start a write, but end up seeing
+	 * a read-only PTE by the time they get to the Dirty bit.
+	 * In this case, they will set the Dirty bit, leaving a
+	 * read-only, Dirty PTE which looks like a Shadow Stack PTE.
+	 *
+	 * However, this behavior has been improved and will not occur
+	 * on processors supporting Shadow Stacks. Without this
+	 * guarantee, a transition to a non-present PTE and a TLB
+	 * flush would be needed.
+	 *
+	 * When changing a writable PTE to read-only and if the PTE has
+	 * _PAGE_DIRTY_HW set, we move that bit to _PAGE_DIRTY_SW so
+	 * that the PTE is not a valid Shadow Stack PTE.
+	 */
+	pte = pte_move_flags(pte, _PAGE_DIRTY_HW, _PAGE_DIRTY_SW);
+	set_pte_at(mm, addr, ptep, pte);
 }
 
 #define flush_tlb_fix_spurious_fault(vma, address) do { } while (0)
@@ -1266,7 +1287,28 @@ static inline pud_t pudp_huge_get_and_clear(struct mm_struct *mm,
 static inline void pmdp_set_wrprotect(struct mm_struct *mm,
 				      unsigned long addr, pmd_t *pmdp)
 {
+	pmd_t pmd;
+
 	clear_bit(_PAGE_BIT_RW, (unsigned long *)pmdp);
+	pmd = *pmdp;
+
+	/*
+	 * Some processors can start a write, but end up seeing
+	 * a read-only PTE by the time they get to the Dirty bit.
+	 * In this case, they will set the Dirty bit, leaving a
+	 * read-only, Dirty PTE which looks like a Shadow Stack PTE.
+	 *
+	 * However, this behavior has been improved and will not occur
+	 * on processors supporting Shadow Stacks. Without this
+	 * guarantee, a transition to a non-present PTE and a TLB
+	 * flush would be needed.
+	 *
+	 * When changing a writable PTE to read-only and if the PTE has
+	 * _PAGE_DIRTY_HW set, we move that bit to _PAGE_DIRTY_SW so
+	 * that the PTE is not a valid Shadow Stack PTE.
+	 */
+	pmd = pmd_move_flags(pmd, _PAGE_DIRTY_HW, _PAGE_DIRTY_SW);
+	set_pmd_at(mm, addr, pmdp, pmd);
 }
 
 #define pud_write pud_write
When Shadow Stack is enabled, the read-only and PAGE_DIRTY_HW PTE
setting is reserved for Shadow Stack PTEs. To track the dirty status
of other read-only PTEs, we use PAGE_DIRTY_SW.

Update ptep_set_wrprotect() and pmdp_set_wrprotect().

Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
---
 arch/x86/include/asm/pgtable.h | 42 ++++++++++++++++++++++++++++++++++
 1 file changed, 42 insertions(+)