Message ID | 20210210175703.12492-9-yu-cheng.yu@intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Control-flow Enforcement: Shadow Stack | expand |
On Wed, Feb 10, 2021 at 09:56:46AM -0800, Yu-cheng Yu wrote: > There is essentially no room left in the x86 hardware PTEs on some OSes > (not Linux). That left the hardware architects looking for a way to > represent a new memory type (shadow stack) within the existing bits. > They chose to repurpose a lightly-used state: Write=0, Dirty=1. > > The reason it's lightly used is that Dirty=1 is normally set by hardware > and cannot normally be set by hardware on a Write=0 PTE. Software must > normally be involved to create one of these PTEs, so software can simply > opt to not create them. > > In places where Linux normally creates Write=0, Dirty=1, it can use the > software-defined _PAGE_COW in place of the hardware _PAGE_DIRTY. In other > words, whenever Linux needs to create Write=0, Dirty=1, it instead creates > Write=0, Cow=1, except for shadow stack, which is Write=0, Dirty=1. This > clearly separates shadow stack from other data, and results in the > following: > > (a) A modified, copy-on-write (COW) page: (Write=0, Cow=1) > (b) A R/O page that has been COW'ed: (Write=0, Cow=1) > The user page is in a R/O VMA, and get_user_pages() needs a writable > copy. The page fault handler creates a copy of the page and sets > the new copy's PTE as Write=0 and Cow=1. > (c) A shadow stack PTE: (Write=0, Dirty=1) > (d) A shared shadow stack PTE: (Write=0, Cow=1) > When a shadow stack page is being shared among processes (this happens > at fork()), its PTE is made Dirty=0, so the next shadow stack access > causes a fault, and the page is duplicated and Dirty=1 is set again. > This is the COW equivalent for shadow stack pages, even though it's > copy-on-access rather than copy-on-write. > (e) A page where the processor observed a Write=1 PTE, started a write, set > Dirty=1, but then observed a Write=0 PTE. That's possible today, but > will not happen on processors that support shadow stack. > > Define _PAGE_COW and update pte_*() helpers and apply the same changes to > pmd and pud. I still find this commit confusing mostly due to _PAGE_COW being 0 without CET enabled. Shouldn't this just get changed universally? Why should this change depend on CET? -Kees > > After this, there are six free bits left in the 64-bit PTE, and no more > free bits in the 32-bit PTE (except for PAE) and Shadow Stack is not > implemented for the 32-bit kernel. > > Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com> > --- > arch/x86/include/asm/pgtable.h | 136 ++++++++++++++++++++++++--- > arch/x86/include/asm/pgtable_types.h | 42 ++++++++- > 2 files changed, 165 insertions(+), 13 deletions(-) > > diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h > index a02c67291cfc..2309fa9d412e 100644 > --- a/arch/x86/include/asm/pgtable.h > +++ b/arch/x86/include/asm/pgtable.h > @@ -121,9 +121,12 @@ extern pmdval_t early_pmd_flags; > * The following only work if pte_present() is true. > * Undefined behaviour if not.. > */ > -static inline int pte_dirty(pte_t pte) > +static inline bool pte_dirty(pte_t pte) > { > - return pte_flags(pte) & _PAGE_DIRTY; > + /* > + * A dirty PTE has Dirty=1 or Cow=1. > + */ > + return pte_flags(pte) & _PAGE_DIRTY_BITS; > } > > > @@ -160,9 +163,12 @@ static inline int pte_young(pte_t pte) > return pte_flags(pte) & _PAGE_ACCESSED; > } > > -static inline int pmd_dirty(pmd_t pmd) > +static inline bool pmd_dirty(pmd_t pmd) > { > - return pmd_flags(pmd) & _PAGE_DIRTY; > + /* > + * A dirty PMD has Dirty=1 or Cow=1. > + */ > + return pmd_flags(pmd) & _PAGE_DIRTY_BITS; > } > > static inline int pmd_young(pmd_t pmd) > @@ -170,9 +176,12 @@ static inline int pmd_young(pmd_t pmd) > return pmd_flags(pmd) & _PAGE_ACCESSED; > } > > -static inline int pud_dirty(pud_t pud) > +static inline bool pud_dirty(pud_t pud) > { > - return pud_flags(pud) & _PAGE_DIRTY; > + /* > + * A dirty PUD has Dirty=1 or Cow=1. > + */ > + return pud_flags(pud) & _PAGE_DIRTY_BITS; > } > > static inline int pud_young(pud_t pud) > @@ -182,7 +191,15 @@ static inline int pud_young(pud_t pud) > > static inline int pte_write(pte_t pte) > { > - return pte_flags(pte) & _PAGE_RW; > + /* > + * When shadow stack is enabled, a directly writable PTE has RW=1. > + * A shadow stack PTE is logically writable and has RW=0, Dirty=1. > + * See comments for _PAGE_COW. > + */ > + if (cpu_feature_enabled(X86_FEATURE_SHSTK)) > + return pte_flags(pte) & (_PAGE_RW | _PAGE_DIRTY); > + else > + return pte_flags(pte) & _PAGE_RW; > } > > static inline int pte_huge(pte_t pte) > @@ -333,7 +350,7 @@ static inline pte_t pte_clear_uffd_wp(pte_t pte) > > static inline pte_t pte_mkclean(pte_t pte) > { > - return pte_clear_flags(pte, _PAGE_DIRTY); > + return pte_clear_flags(pte, _PAGE_DIRTY_BITS); > } > > static inline pte_t pte_mkold(pte_t pte) > @@ -343,6 +360,18 @@ static inline pte_t pte_mkold(pte_t pte) > > static inline pte_t pte_wrprotect(pte_t pte) > { > + /* > + * Blindly clearing _PAGE_RW might accidentally create > + * a shadow stack PTE (RW=0, Dirty=1). Move the hardware > + * dirty value to the software bit. > + */ > + if (cpu_feature_enabled(X86_FEATURE_SHSTK)) { > + if (pte_flags(pte) & _PAGE_DIRTY) { > + pte = pte_clear_flags(pte, _PAGE_DIRTY); > + pte = pte_set_flags(pte, _PAGE_COW); > + } > + } > + > return pte_clear_flags(pte, _PAGE_RW); > } > > @@ -353,6 +382,18 @@ static inline pte_t pte_mkexec(pte_t pte) > > static inline pte_t pte_mkdirty(pte_t pte) > { > + pteval_t dirty = _PAGE_DIRTY; > + > + /* Avoid creating (HW)Dirty=1, Write=0 PTEs */ > + if (cpu_feature_enabled(X86_FEATURE_SHSTK) && !pte_write(pte)) > + dirty = _PAGE_COW; > + > + return pte_set_flags(pte, dirty | _PAGE_SOFT_DIRTY); > +} > + > +static inline pte_t pte_mkwrite_shstk(pte_t pte) > +{ > + pte = pte_clear_flags(pte, _PAGE_COW); > return pte_set_flags(pte, _PAGE_DIRTY | _PAGE_SOFT_DIRTY); > } > > @@ -363,6 +404,13 @@ static inline pte_t pte_mkyoung(pte_t pte) > > static inline pte_t pte_mkwrite(pte_t pte) > { > + if (cpu_feature_enabled(X86_FEATURE_SHSTK)) { > + if (pte_flags(pte) & _PAGE_COW) { > + pte = pte_clear_flags(pte, _PAGE_COW); > + pte = pte_set_flags(pte, _PAGE_DIRTY); > + } > + } > + > return pte_set_flags(pte, _PAGE_RW); > } > > @@ -434,16 +482,40 @@ static inline pmd_t pmd_mkold(pmd_t pmd) > > static inline pmd_t pmd_mkclean(pmd_t pmd) > { > - return pmd_clear_flags(pmd, _PAGE_DIRTY); > + return pmd_clear_flags(pmd, _PAGE_DIRTY_BITS); > } > > static inline pmd_t pmd_wrprotect(pmd_t pmd) > { > + /* > + * Blindly clearing _PAGE_RW might accidentally create > + * a shadow stack PMD (RW=0, Dirty=1). Move the hardware > + * dirty value to the software bit. > + */ > + if (cpu_feature_enabled(X86_FEATURE_SHSTK)) { > + if (pmd_flags(pmd) & _PAGE_DIRTY) { > + pmd = pmd_clear_flags(pmd, _PAGE_DIRTY); > + pmd = pmd_set_flags(pmd, _PAGE_COW); > + } > + } > + > return pmd_clear_flags(pmd, _PAGE_RW); > } > > static inline pmd_t pmd_mkdirty(pmd_t pmd) > { > + pmdval_t dirty = _PAGE_DIRTY; > + > + /* Avoid creating (HW)Dirty=1, Write=0 PMDs */ > + if (cpu_feature_enabled(X86_FEATURE_SHSTK) && !(pmd_flags(pmd) & _PAGE_RW)) > + dirty = _PAGE_COW; > + > + return pmd_set_flags(pmd, dirty | _PAGE_SOFT_DIRTY); > +} > + > +static inline pmd_t pmd_mkwrite_shstk(pmd_t pmd) > +{ > + pmd = pmd_clear_flags(pmd, _PAGE_COW); > return pmd_set_flags(pmd, _PAGE_DIRTY | _PAGE_SOFT_DIRTY); > } > > @@ -464,6 +536,13 @@ static inline pmd_t pmd_mkyoung(pmd_t pmd) > > static inline pmd_t pmd_mkwrite(pmd_t pmd) > { > + if (cpu_feature_enabled(X86_FEATURE_SHSTK)) { > + if (pmd_flags(pmd) & _PAGE_COW) { > + pmd = pmd_clear_flags(pmd, _PAGE_COW); > + pmd = pmd_set_flags(pmd, _PAGE_DIRTY); > + } > + } > + > return pmd_set_flags(pmd, _PAGE_RW); > } > > @@ -488,17 +567,35 @@ static inline pud_t pud_mkold(pud_t pud) > > static inline pud_t pud_mkclean(pud_t pud) > { > - return pud_clear_flags(pud, _PAGE_DIRTY); > + return pud_clear_flags(pud, _PAGE_DIRTY_BITS); > } > > static inline pud_t pud_wrprotect(pud_t pud) > { > + /* > + * Blindly clearing _PAGE_RW might accidentally create > + * a shadow stack PUD (RW=0, Dirty=1). Move the hardware > + * dirty value to the software bit. > + */ > + if (cpu_feature_enabled(X86_FEATURE_SHSTK)) { > + if (pud_flags(pud) & _PAGE_DIRTY) { > + pud = pud_clear_flags(pud, _PAGE_DIRTY); > + pud = pud_set_flags(pud, _PAGE_COW); > + } > + } > + > return pud_clear_flags(pud, _PAGE_RW); > } > > static inline pud_t pud_mkdirty(pud_t pud) > { > - return pud_set_flags(pud, _PAGE_DIRTY | _PAGE_SOFT_DIRTY); > + pudval_t dirty = _PAGE_DIRTY; > + > + /* Avoid creating (HW)Dirty=1, Write=0 PUDs */ > + if (cpu_feature_enabled(X86_FEATURE_SHSTK) && !(pud_flags(pud) & _PAGE_RW)) > + dirty = _PAGE_COW; > + > + return pud_set_flags(pud, dirty | _PAGE_SOFT_DIRTY); > } > > static inline pud_t pud_mkdevmap(pud_t pud) > @@ -518,6 +615,13 @@ static inline pud_t pud_mkyoung(pud_t pud) > > static inline pud_t pud_mkwrite(pud_t pud) > { > + if (cpu_feature_enabled(X86_FEATURE_SHSTK)) { > + if (pud_flags(pud) & _PAGE_COW) { > + pud = pud_clear_flags(pud, _PAGE_COW); > + pud = pud_set_flags(pud, _PAGE_DIRTY); > + } > + } > + > return pud_set_flags(pud, _PAGE_RW); > } > > @@ -1131,7 +1235,15 @@ extern int pmdp_clear_flush_young(struct vm_area_struct *vma, > #define pmd_write pmd_write > static inline int pmd_write(pmd_t pmd) > { > - return pmd_flags(pmd) & _PAGE_RW; > + /* > + * When shadow stack is enabled, a directly writable PMD has RW=1. > + * A shadow stack PMD is logically writable and has RW=0, Dirty=1. > + * See comments for _PAGE_COW. > + */ > + if (cpu_feature_enabled(X86_FEATURE_SHSTK)) > + return pmd_flags(pmd) & (_PAGE_RW | _PAGE_DIRTY); > + else > + return pmd_flags(pmd) & _PAGE_RW; > } > > #define __HAVE_ARCH_PMDP_HUGE_GET_AND_CLEAR > diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h > index b8b79d618bbc..437d7ff0ae80 100644 > --- a/arch/x86/include/asm/pgtable_types.h > +++ b/arch/x86/include/asm/pgtable_types.h > @@ -23,7 +23,8 @@ > #define _PAGE_BIT_SOFTW2 10 /* " */ > #define _PAGE_BIT_SOFTW3 11 /* " */ > #define _PAGE_BIT_PAT_LARGE 12 /* On 2MB or 1GB pages */ > -#define _PAGE_BIT_SOFTW4 58 /* available for programmer */ > +#define _PAGE_BIT_SOFTW4 57 /* available for programmer */ > +#define _PAGE_BIT_SOFTW5 58 /* available for programmer */ > #define _PAGE_BIT_PKEY_BIT0 59 /* Protection Keys, bit 1/4 */ > #define _PAGE_BIT_PKEY_BIT1 60 /* Protection Keys, bit 2/4 */ > #define _PAGE_BIT_PKEY_BIT2 61 /* Protection Keys, bit 3/4 */ > @@ -36,6 +37,15 @@ > #define _PAGE_BIT_SOFT_DIRTY _PAGE_BIT_SOFTW3 /* software dirty tracking */ > #define _PAGE_BIT_DEVMAP _PAGE_BIT_SOFTW4 > > +/* > + * Indicates a copy-on-write page. > + */ > +#ifdef CONFIG_X86_CET > +#define _PAGE_BIT_COW _PAGE_BIT_SOFTW5 /* copy-on-write */ > +#else > +#define _PAGE_BIT_COW 0 > +#endif > + > /* If _PAGE_BIT_PRESENT is clear, we use these: */ > /* - if the user mapped it with PROT_NONE; pte_present gives true */ > #define _PAGE_BIT_PROTNONE _PAGE_BIT_GLOBAL > @@ -117,6 +127,36 @@ > #define _PAGE_DEVMAP (_AT(pteval_t, 0)) > #endif > > +/* > + * The hardware requires shadow stack to be read-only and Dirty. > + * _PAGE_COW is a software-only bit used to separate copy-on-write PTEs > + * from shadow stack PTEs: > + * (a) A modified, copy-on-write (COW) page: (Write=0, Cow=1) > + * (b) A R/O page that has been COW'ed: (Write=0, Cow=1) > + * The user page is in a R/O VMA, and get_user_pages() needs a > + * writable copy. The page fault handler creates a copy of the page > + * and sets the new copy's PTE as Write=0, Cow=1. > + * (c) A shadow stack PTE: (Write=0, Dirty=1) > + * (d) A shared (copy-on-access) shadow stack PTE: (Write=0, Cow=1) > + * When a shadow stack page is being shared among processes (this > + * happens at fork()), its PTE is cleared of _PAGE_DIRTY, so the next > + * shadow stack access causes a fault, and the page is duplicated and > + * _PAGE_DIRTY is set again. This is the COW equivalent for shadow > + * stack pages, even though it's copy-on-access rather than > + * copy-on-write. > + * (e) A page where the processor observed a Write=1 PTE, started a write, > + * set Dirty=1, but then observed a Write=0 PTE (changed by another > + * thread). That's possible today, but will not happen on processors > + * that support shadow stack. > + */ > +#ifdef CONFIG_X86_CET > +#define _PAGE_COW (_AT(pteval_t, 1) << _PAGE_BIT_COW) > +#else > +#define _PAGE_COW (_AT(pteval_t, 0)) > +#endif > + > +#define _PAGE_DIRTY_BITS (_PAGE_DIRTY | _PAGE_COW) > + > #define _PAGE_PROTNONE (_AT(pteval_t, 1) << _PAGE_BIT_PROTNONE) > > /* > -- > 2.21.0 >
On 2/10/2021 11:42 AM, Kees Cook wrote: > On Wed, Feb 10, 2021 at 09:56:46AM -0800, Yu-cheng Yu wrote: >> There is essentially no room left in the x86 hardware PTEs on some OSes >> (not Linux). That left the hardware architects looking for a way to >> represent a new memory type (shadow stack) within the existing bits. >> They chose to repurpose a lightly-used state: Write=0, Dirty=1. >> >> The reason it's lightly used is that Dirty=1 is normally set by hardware >> and cannot normally be set by hardware on a Write=0 PTE. Software must >> normally be involved to create one of these PTEs, so software can simply >> opt to not create them. >> >> In places where Linux normally creates Write=0, Dirty=1, it can use the >> software-defined _PAGE_COW in place of the hardware _PAGE_DIRTY. In other >> words, whenever Linux needs to create Write=0, Dirty=1, it instead creates >> Write=0, Cow=1, except for shadow stack, which is Write=0, Dirty=1. This >> clearly separates shadow stack from other data, and results in the >> following: >> >> (a) A modified, copy-on-write (COW) page: (Write=0, Cow=1) >> (b) A R/O page that has been COW'ed: (Write=0, Cow=1) >> The user page is in a R/O VMA, and get_user_pages() needs a writable >> copy. The page fault handler creates a copy of the page and sets >> the new copy's PTE as Write=0 and Cow=1. >> (c) A shadow stack PTE: (Write=0, Dirty=1) >> (d) A shared shadow stack PTE: (Write=0, Cow=1) >> When a shadow stack page is being shared among processes (this happens >> at fork()), its PTE is made Dirty=0, so the next shadow stack access >> causes a fault, and the page is duplicated and Dirty=1 is set again. >> This is the COW equivalent for shadow stack pages, even though it's >> copy-on-access rather than copy-on-write. >> (e) A page where the processor observed a Write=1 PTE, started a write, set >> Dirty=1, but then observed a Write=0 PTE. That's possible today, but >> will not happen on processors that support shadow stack. >> >> Define _PAGE_COW and update pte_*() helpers and apply the same changes to >> pmd and pud. > > I still find this commit confusing mostly due to _PAGE_COW being 0 > without CET enabled. Shouldn't this just get changed universally? Why > should this change depend on CET? > For example, in... static inline int pte_write(pte_t pte) { if (cpu_feature_enabled(X86_FEATURE_SHSTK)) return pte_flags(pte) & (_PAGE_RW | _PAGE_DIRTY); else return pte_flags(pte) & _PAGE_RW; } There are four cases: (a) RW=1, Dirty=1 -> writable (b) RW=1, Dirty=0 -> writable (c) RW=0, Dirty=0 -> not writable (d) RW=0, Dirty=1 -> shadow stack, or not-writable if !X86_FEATURE_SHSTK Case (d) is ture only when shadow stack is enabled, otherwise it is not writable. With shadow stack feature, the usual dirty, copy-on-write PTE becomes RW=0, Cow=1. We can get this changed universally, but all usual dirty, copy-on-write PTEs need the Dirty/Cow swapping, always. Is that desirable? -- Yu-cheng [...]
On 2/10/2021 12:28 PM, Yu, Yu-cheng wrote: > On 2/10/2021 11:42 AM, Kees Cook wrote: >> On Wed, Feb 10, 2021 at 09:56:46AM -0800, Yu-cheng Yu wrote: >>> There is essentially no room left in the x86 hardware PTEs on some OSes >>> (not Linux). That left the hardware architects looking for a way to >>> represent a new memory type (shadow stack) within the existing bits. >>> They chose to repurpose a lightly-used state: Write=0, Dirty=1. >>> >>> The reason it's lightly used is that Dirty=1 is normally set by hardware >>> and cannot normally be set by hardware on a Write=0 PTE. Software must >>> normally be involved to create one of these PTEs, so software can simply >>> opt to not create them. >>> >>> In places where Linux normally creates Write=0, Dirty=1, it can use the >>> software-defined _PAGE_COW in place of the hardware _PAGE_DIRTY. In >>> other >>> words, whenever Linux needs to create Write=0, Dirty=1, it instead >>> creates >>> Write=0, Cow=1, except for shadow stack, which is Write=0, Dirty=1. >>> This >>> clearly separates shadow stack from other data, and results in the >>> following: >>> >>> (a) A modified, copy-on-write (COW) page: (Write=0, Cow=1) >>> (b) A R/O page that has been COW'ed: (Write=0, Cow=1) >>> The user page is in a R/O VMA, and get_user_pages() needs a >>> writable >>> copy. The page fault handler creates a copy of the page and sets >>> the new copy's PTE as Write=0 and Cow=1. >>> (c) A shadow stack PTE: (Write=0, Dirty=1) >>> (d) A shared shadow stack PTE: (Write=0, Cow=1) >>> When a shadow stack page is being shared among processes (this >>> happens >>> at fork()), its PTE is made Dirty=0, so the next shadow stack >>> access >>> causes a fault, and the page is duplicated and Dirty=1 is set >>> again. >>> This is the COW equivalent for shadow stack pages, even though it's >>> copy-on-access rather than copy-on-write. >>> (e) A page where the processor observed a Write=1 PTE, started a >>> write, set >>> Dirty=1, but then observed a Write=0 PTE. That's possible >>> today, but >>> will not happen on processors that support shadow stack. >>> >>> Define _PAGE_COW and update pte_*() helpers and apply the same >>> changes to >>> pmd and pud. >> >> I still find this commit confusing mostly due to _PAGE_COW being 0 >> without CET enabled. Shouldn't this just get changed universally? Why >> should this change depend on CET? >> > > For example, in... > > static inline int pte_write(pte_t pte) > { > if (cpu_feature_enabled(X86_FEATURE_SHSTK)) > return pte_flags(pte) & (_PAGE_RW | _PAGE_DIRTY); > else > return pte_flags(pte) & _PAGE_RW; > } > > There are four cases: > > (a) RW=1, Dirty=1 -> writable > (b) RW=1, Dirty=0 -> writable > (c) RW=0, Dirty=0 -> not writable > (d) RW=0, Dirty=1 -> shadow stack, or not-writable if !X86_FEATURE_SHSTK > > Case (d) is ture only when shadow stack is enabled, otherwise it is not > writable. With shadow stack feature, the usual dirty, copy-on-write PTE > becomes RW=0, Cow=1. > > We can get this changed universally, but all usual dirty, copy-on-write > PTEs need the Dirty/Cow swapping, always. Is that desirable? > Instead of working on _PAGE_COW directly, create three helpers: pte_make_cow(), pte_clear_cow(), pte_shstk(). The decision of swapping _PAGE_DIRTY/_PAGE_COW is now in the helpers. For example, pte_wrprotect() is now: static inline pte_t pte_wrprotect(pte_t pte) { pte = pte_clear_flags(pte, _PAGE_RW); if (pte_dirty(pte)) pte = pte_make_cow(pte); return pte; } How is that? The revise patch is the following. Yu-cheng diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h index a02c67291cfc..1a6c0784af0a 100644 --- a/arch/x86/include/asm/pgtable.h +++ b/arch/x86/include/asm/pgtable.h @@ -121,11 +121,21 @@ extern pmdval_t early_pmd_flags; * The following only work if pte_present() is true. * Undefined behaviour if not.. */ -static inline int pte_dirty(pte_t pte) +static inline bool pte_dirty(pte_t pte) { - return pte_flags(pte) & _PAGE_DIRTY; + /* + * A dirty PTE has Dirty=1 or Cow=1. + */ + return pte_flags(pte) & _PAGE_DIRTY_BITS; } +static inline bool pte_shstk(pte_t pte) +{ + if (!cpu_feature_enabled(X86_FEATURE_SHSTK)) + return false; + + return (pte_flags(pte) & (_PAGE_RW | _PAGE_DIRTY)) == _PAGE_DIRTY; +} static inline u32 read_pkru(void) { @@ -160,9 +170,20 @@ static inline int pte_young(pte_t pte) return pte_flags(pte) & _PAGE_ACCESSED; } -static inline int pmd_dirty(pmd_t pmd) +static inline bool pmd_dirty(pmd_t pmd) { - return pmd_flags(pmd) & _PAGE_DIRTY; + /* + * A dirty PMD has Dirty=1 or Cow=1. + */ + return pmd_flags(pmd) & _PAGE_DIRTY_BITS; +} + +static inline bool pmd_shstk(pmd_t pmd) +{ + if (!cpu_feature_enabled(X86_FEATURE_SHSTK)) + return false; + + return (pmd_flags(pmd) & (_PAGE_RW | _PAGE_DIRTY)) == _PAGE_DIRTY; } static inline int pmd_young(pmd_t pmd) @@ -170,9 +191,12 @@ static inline int pmd_young(pmd_t pmd) return pmd_flags(pmd) & _PAGE_ACCESSED; } -static inline int pud_dirty(pud_t pud) +static inline bool pud_dirty(pud_t pud) { - return pud_flags(pud) & _PAGE_DIRTY; + /* + * A dirty PUD has Dirty=1 or Cow=1. + */ + return pud_flags(pud) & _PAGE_DIRTY_BITS; } static inline int pud_young(pud_t pud) @@ -182,7 +206,7 @@ static inline int pud_young(pud_t pud) static inline int pte_write(pte_t pte) { - return pte_flags(pte) & _PAGE_RW; + return (pte_flags(pte) & _PAGE_RW) || pte_shstk(pte); } static inline int pte_huge(pte_t pte) @@ -314,6 +338,24 @@ static inline pte_t pte_clear_flags(pte_t pte, pteval_t clear) return native_make_pte(v & ~clear); } +static inline pte_t pte_make_cow(pte_t pte) +{ + if (!cpu_feature_enabled(X86_FEATURE_SHSTK)) + return pte; + + pte = pte_clear_flags(pte, _PAGE_DIRTY); + return pte_set_flags(pte, _PAGE_COW); +} + +static inline pte_t pte_clear_cow(pte_t pte) +{ + if (!cpu_feature_enabled(X86_FEATURE_SHSTK)) + return pte; + + pte = pte_set_flags(pte, _PAGE_DIRTY); + return pte_clear_flags(pte, _PAGE_COW); +} + #ifdef CONFIG_HAVE_ARCH_USERFAULTFD_WP static inline int pte_uffd_wp(pte_t pte) { @@ -333,7 +375,7 @@ static inline pte_t pte_clear_uffd_wp(pte_t pte) static inline pte_t pte_mkclean(pte_t pte) { - return pte_clear_flags(pte, _PAGE_DIRTY); + return pte_clear_flags(pte, _PAGE_DIRTY_BITS); } static inline pte_t pte_mkold(pte_t pte) @@ -343,7 +385,16 @@ static inline pte_t pte_mkold(pte_t pte) static inline pte_t pte_wrprotect(pte_t pte) { - return pte_clear_flags(pte, _PAGE_RW); + pte = pte_clear_flags(pte, _PAGE_RW); + + /* + * Blindly clearing _PAGE_RW might accidentally create + * a shadow stack PTE (RW=0, Dirty=1). Move the hardware + * dirty value to the software bit. + */ + if (pte_dirty(pte)) + pte = pte_make_cow(pte); + return pte; } static inline pte_t pte_mkexec(pte_t pte) @@ -353,7 +404,18 @@ static inline pte_t pte_mkexec(pte_t pte) static inline pte_t pte_mkdirty(pte_t pte) { - return pte_set_flags(pte, _PAGE_DIRTY | _PAGE_SOFT_DIRTY); + pteval_t dirty = _PAGE_DIRTY; + + /* Avoid creating (HW)Dirty=1, Write=0 PTEs */ + if (cpu_feature_enabled(X86_FEATURE_SHSTK) && !pte_write(pte)) + dirty = _PAGE_COW; + + return pte_set_flags(pte, dirty | _PAGE_SOFT_DIRTY); +} + +static inline pte_t pte_mkwrite_shstk(pte_t pte) +{ + return pte_clear_cow(pte); } static inline pte_t pte_mkyoung(pte_t pte) @@ -363,7 +425,12 @@ static inline pte_t pte_mkyoung(pte_t pte) static inline pte_t pte_mkwrite(pte_t pte) { - return pte_set_flags(pte, _PAGE_RW); + pte = pte_set_flags(pte, _PAGE_RW); + + if (pte_dirty(pte)) + pte = pte_clear_cow(pte); + + return pte; } static inline pte_t pte_mkhuge(pte_t pte) @@ -410,6 +477,24 @@ static inline pmd_t pmd_clear_flags(pmd_t pmd, pmdval_t clear) return native_make_pmd(v & ~clear); } +static inline pmd_t pmd_make_cow(pmd_t pmd) +{ + if (!cpu_feature_enabled(X86_FEATURE_SHSTK)) + return pmd; + + pmd = pmd_clear_flags(pmd, _PAGE_DIRTY); + return pmd_set_flags(pmd, _PAGE_COW); +} + +static inline pmd_t pmd_clear_cow(pmd_t pmd) +{ + if (!cpu_feature_enabled(X86_FEATURE_SHSTK)) + return pmd; + + pmd = pmd_set_flags(pmd, _PAGE_DIRTY); + return pmd_clear_flags(pmd, _PAGE_COW); +} + #ifdef CONFIG_HAVE_ARCH_USERFAULTFD_WP static inline int pmd_uffd_wp(pmd_t pmd) { @@ -434,17 +519,36 @@ static inline pmd_t pmd_mkold(pmd_t pmd) static inline pmd_t pmd_mkclean(pmd_t pmd) { - return pmd_clear_flags(pmd, _PAGE_DIRTY); + return pmd_clear_flags(pmd, _PAGE_DIRTY_BITS); } static inline pmd_t pmd_wrprotect(pmd_t pmd) { - return pmd_clear_flags(pmd, _PAGE_RW); + pmd = pmd_clear_flags(pmd, _PAGE_RW); + /* + * Blindly clearing _PAGE_RW might accidentally create + * a shadow stack PMD (RW=0, Dirty=1). Move the hardware + * dirty value to the software bit. + */ + if (pmd_dirty(pmd)) + pmd = pmd_make_cow(pmd); + return pmd; } static inline pmd_t pmd_mkdirty(pmd_t pmd) { - return pmd_set_flags(pmd, _PAGE_DIRTY | _PAGE_SOFT_DIRTY); + pmdval_t dirty = _PAGE_DIRTY; + + /* Avoid creating (HW)Dirty=1, Write=0 PMDs */ + if (cpu_feature_enabled(X86_FEATURE_SHSTK) && !(pmd_flags(pmd) & _PAGE_RW)) + dirty = _PAGE_COW; + + return pmd_set_flags(pmd, dirty | _PAGE_SOFT_DIRTY); +} + +static inline pmd_t pmd_mkwrite_shstk(pmd_t pmd) +{ + return pmd_clear_cow(pmd); } static inline pmd_t pmd_mkdevmap(pmd_t pmd) @@ -464,7 +568,11 @@ static inline pmd_t pmd_mkyoung(pmd_t pmd) static inline pmd_t pmd_mkwrite(pmd_t pmd) { - return pmd_set_flags(pmd, _PAGE_RW); + pmd = pmd_set_flags(pmd, _PAGE_RW); + + if (pmd_dirty(pmd)) + pmd = pmd_clear_cow(pmd); + return pmd; } static inline pud_t pud_set_flags(pud_t pud, pudval_t set) @@ -481,6 +589,24 @@ static inline pud_t pud_clear_flags(pud_t pud, pudval_t clear) return native_make_pud(v & ~clear); } +static inline pud_t pud_make_cow(pud_t pud) +{ + if (!cpu_feature_enabled(X86_FEATURE_SHSTK)) + return pud; + + pud = pud_clear_flags(pud, _PAGE_DIRTY); + return pud_set_flags(pud, _PAGE_COW); +} + +static inline pud_t pud_clear_cow(pud_t pud) +{ + if (!cpu_feature_enabled(X86_FEATURE_SHSTK)) + return pud; + + pud = pud_set_flags(pud, _PAGE_DIRTY); + return pud_clear_flags(pud, _PAGE_COW); +} + static inline pud_t pud_mkold(pud_t pud) { return pud_clear_flags(pud, _PAGE_ACCESSED); @@ -488,17 +614,32 @@ static inline pud_t pud_mkold(pud_t pud) static inline pud_t pud_mkclean(pud_t pud) { - return pud_clear_flags(pud, _PAGE_DIRTY); + return pud_clear_flags(pud, _PAGE_DIRTY_BITS); } static inline pud_t pud_wrprotect(pud_t pud) { - return pud_clear_flags(pud, _PAGE_RW); + pud = pud_clear_flags(pud, _PAGE_RW); + + /* + * Blindly clearing _PAGE_RW might accidentally create + * a shadow stack PUD (RW=0, Dirty=1). Move the hardware + * dirty value to the software bit. + */ + if (pud_dirty(pud)) + pud = pud_make_cow(pud); + return pud; } static inline pud_t pud_mkdirty(pud_t pud) { - return pud_set_flags(pud, _PAGE_DIRTY | _PAGE_SOFT_DIRTY); + pudval_t dirty = _PAGE_DIRTY; + + /* Avoid creating (HW)Dirty=1, Write=0 PUDs */ + if (cpu_feature_enabled(X86_FEATURE_SHSTK) && !(pud_flags(pud) & _PAGE_RW)) + dirty = _PAGE_COW; + + return pud_set_flags(pud, dirty | _PAGE_SOFT_DIRTY); } static inline pud_t pud_mkdevmap(pud_t pud) @@ -518,7 +659,11 @@ static inline pud_t pud_mkyoung(pud_t pud) static inline pud_t pud_mkwrite(pud_t pud) { - return pud_set_flags(pud, _PAGE_RW); + pud = pud_set_flags(pud, _PAGE_RW); + + if (pud_dirty(pud)) + pud = pud_clear_cow(pud); + return pud; } #ifdef CONFIG_HAVE_ARCH_SOFT_DIRTY @@ -1131,7 +1276,7 @@ extern int pmdp_clear_flush_young(struct vm_area_struct *vma, #define pmd_write pmd_write static inline int pmd_write(pmd_t pmd) { - return pmd_flags(pmd) & _PAGE_RW; + return (pmd_flags(pmd) & _PAGE_RW) || pmd_shstk(pmd); } #define __HAVE_ARCH_PMDP_HUGE_GET_AND_CLEAR
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h index a02c67291cfc..2309fa9d412e 100644 --- a/arch/x86/include/asm/pgtable.h +++ b/arch/x86/include/asm/pgtable.h @@ -121,9 +121,12 @@ extern pmdval_t early_pmd_flags; * The following only work if pte_present() is true. * Undefined behaviour if not.. */ -static inline int pte_dirty(pte_t pte) +static inline bool pte_dirty(pte_t pte) { - return pte_flags(pte) & _PAGE_DIRTY; + /* + * A dirty PTE has Dirty=1 or Cow=1. + */ + return pte_flags(pte) & _PAGE_DIRTY_BITS; } @@ -160,9 +163,12 @@ static inline int pte_young(pte_t pte) return pte_flags(pte) & _PAGE_ACCESSED; } -static inline int pmd_dirty(pmd_t pmd) +static inline bool pmd_dirty(pmd_t pmd) { - return pmd_flags(pmd) & _PAGE_DIRTY; + /* + * A dirty PMD has Dirty=1 or Cow=1. + */ + return pmd_flags(pmd) & _PAGE_DIRTY_BITS; } static inline int pmd_young(pmd_t pmd) @@ -170,9 +176,12 @@ static inline int pmd_young(pmd_t pmd) return pmd_flags(pmd) & _PAGE_ACCESSED; } -static inline int pud_dirty(pud_t pud) +static inline bool pud_dirty(pud_t pud) { - return pud_flags(pud) & _PAGE_DIRTY; + /* + * A dirty PUD has Dirty=1 or Cow=1. + */ + return pud_flags(pud) & _PAGE_DIRTY_BITS; } static inline int pud_young(pud_t pud) @@ -182,7 +191,15 @@ static inline int pud_young(pud_t pud) static inline int pte_write(pte_t pte) { - return pte_flags(pte) & _PAGE_RW; + /* + * When shadow stack is enabled, a directly writable PTE has RW=1. + * A shadow stack PTE is logically writable and has RW=0, Dirty=1. + * See comments for _PAGE_COW. + */ + if (cpu_feature_enabled(X86_FEATURE_SHSTK)) + return pte_flags(pte) & (_PAGE_RW | _PAGE_DIRTY); + else + return pte_flags(pte) & _PAGE_RW; } static inline int pte_huge(pte_t pte) @@ -333,7 +350,7 @@ static inline pte_t pte_clear_uffd_wp(pte_t pte) static inline pte_t pte_mkclean(pte_t pte) { - return pte_clear_flags(pte, _PAGE_DIRTY); + return pte_clear_flags(pte, _PAGE_DIRTY_BITS); } static inline pte_t pte_mkold(pte_t pte) @@ -343,6 +360,18 @@ static inline pte_t pte_mkold(pte_t pte) static inline pte_t pte_wrprotect(pte_t pte) { + /* + * Blindly clearing _PAGE_RW might accidentally create + * a shadow stack PTE (RW=0, Dirty=1). Move the hardware + * dirty value to the software bit. + */ + if (cpu_feature_enabled(X86_FEATURE_SHSTK)) { + if (pte_flags(pte) & _PAGE_DIRTY) { + pte = pte_clear_flags(pte, _PAGE_DIRTY); + pte = pte_set_flags(pte, _PAGE_COW); + } + } + return pte_clear_flags(pte, _PAGE_RW); } @@ -353,6 +382,18 @@ static inline pte_t pte_mkexec(pte_t pte) static inline pte_t pte_mkdirty(pte_t pte) { + pteval_t dirty = _PAGE_DIRTY; + + /* Avoid creating (HW)Dirty=1, Write=0 PTEs */ + if (cpu_feature_enabled(X86_FEATURE_SHSTK) && !pte_write(pte)) + dirty = _PAGE_COW; + + return pte_set_flags(pte, dirty | _PAGE_SOFT_DIRTY); +} + +static inline pte_t pte_mkwrite_shstk(pte_t pte) +{ + pte = pte_clear_flags(pte, _PAGE_COW); return pte_set_flags(pte, _PAGE_DIRTY | _PAGE_SOFT_DIRTY); } @@ -363,6 +404,13 @@ static inline pte_t pte_mkyoung(pte_t pte) static inline pte_t pte_mkwrite(pte_t pte) { + if (cpu_feature_enabled(X86_FEATURE_SHSTK)) { + if (pte_flags(pte) & _PAGE_COW) { + pte = pte_clear_flags(pte, _PAGE_COW); + pte = pte_set_flags(pte, _PAGE_DIRTY); + } + } + return pte_set_flags(pte, _PAGE_RW); } @@ -434,16 +482,40 @@ static inline pmd_t pmd_mkold(pmd_t pmd) static inline pmd_t pmd_mkclean(pmd_t pmd) { - return pmd_clear_flags(pmd, _PAGE_DIRTY); + return pmd_clear_flags(pmd, _PAGE_DIRTY_BITS); } static inline pmd_t pmd_wrprotect(pmd_t pmd) { + /* + * Blindly clearing _PAGE_RW might accidentally create + * a shadow stack PMD (RW=0, Dirty=1). Move the hardware + * dirty value to the software bit. + */ + if (cpu_feature_enabled(X86_FEATURE_SHSTK)) { + if (pmd_flags(pmd) & _PAGE_DIRTY) { + pmd = pmd_clear_flags(pmd, _PAGE_DIRTY); + pmd = pmd_set_flags(pmd, _PAGE_COW); + } + } + return pmd_clear_flags(pmd, _PAGE_RW); } static inline pmd_t pmd_mkdirty(pmd_t pmd) { + pmdval_t dirty = _PAGE_DIRTY; + + /* Avoid creating (HW)Dirty=1, Write=0 PMDs */ + if (cpu_feature_enabled(X86_FEATURE_SHSTK) && !(pmd_flags(pmd) & _PAGE_RW)) + dirty = _PAGE_COW; + + return pmd_set_flags(pmd, dirty | _PAGE_SOFT_DIRTY); +} + +static inline pmd_t pmd_mkwrite_shstk(pmd_t pmd) +{ + pmd = pmd_clear_flags(pmd, _PAGE_COW); return pmd_set_flags(pmd, _PAGE_DIRTY | _PAGE_SOFT_DIRTY); } @@ -464,6 +536,13 @@ static inline pmd_t pmd_mkyoung(pmd_t pmd) static inline pmd_t pmd_mkwrite(pmd_t pmd) { + if (cpu_feature_enabled(X86_FEATURE_SHSTK)) { + if (pmd_flags(pmd) & _PAGE_COW) { + pmd = pmd_clear_flags(pmd, _PAGE_COW); + pmd = pmd_set_flags(pmd, _PAGE_DIRTY); + } + } + return pmd_set_flags(pmd, _PAGE_RW); } @@ -488,17 +567,35 @@ static inline pud_t pud_mkold(pud_t pud) static inline pud_t pud_mkclean(pud_t pud) { - return pud_clear_flags(pud, _PAGE_DIRTY); + return pud_clear_flags(pud, _PAGE_DIRTY_BITS); } static inline pud_t pud_wrprotect(pud_t pud) { + /* + * Blindly clearing _PAGE_RW might accidentally create + * a shadow stack PUD (RW=0, Dirty=1). Move the hardware + * dirty value to the software bit. + */ + if (cpu_feature_enabled(X86_FEATURE_SHSTK)) { + if (pud_flags(pud) & _PAGE_DIRTY) { + pud = pud_clear_flags(pud, _PAGE_DIRTY); + pud = pud_set_flags(pud, _PAGE_COW); + } + } + return pud_clear_flags(pud, _PAGE_RW); } static inline pud_t pud_mkdirty(pud_t pud) { - return pud_set_flags(pud, _PAGE_DIRTY | _PAGE_SOFT_DIRTY); + pudval_t dirty = _PAGE_DIRTY; + + /* Avoid creating (HW)Dirty=1, Write=0 PUDs */ + if (cpu_feature_enabled(X86_FEATURE_SHSTK) && !(pud_flags(pud) & _PAGE_RW)) + dirty = _PAGE_COW; + + return pud_set_flags(pud, dirty | _PAGE_SOFT_DIRTY); } static inline pud_t pud_mkdevmap(pud_t pud) @@ -518,6 +615,13 @@ static inline pud_t pud_mkyoung(pud_t pud) static inline pud_t pud_mkwrite(pud_t pud) { + if (cpu_feature_enabled(X86_FEATURE_SHSTK)) { + if (pud_flags(pud) & _PAGE_COW) { + pud = pud_clear_flags(pud, _PAGE_COW); + pud = pud_set_flags(pud, _PAGE_DIRTY); + } + } + return pud_set_flags(pud, _PAGE_RW); } @@ -1131,7 +1235,15 @@ extern int pmdp_clear_flush_young(struct vm_area_struct *vma, #define pmd_write pmd_write static inline int pmd_write(pmd_t pmd) { - return pmd_flags(pmd) & _PAGE_RW; + /* + * When shadow stack is enabled, a directly writable PMD has RW=1. + * A shadow stack PMD is logically writable and has RW=0, Dirty=1. + * See comments for _PAGE_COW. + */ + if (cpu_feature_enabled(X86_FEATURE_SHSTK)) + return pmd_flags(pmd) & (_PAGE_RW | _PAGE_DIRTY); + else + return pmd_flags(pmd) & _PAGE_RW; } #define __HAVE_ARCH_PMDP_HUGE_GET_AND_CLEAR diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h index b8b79d618bbc..437d7ff0ae80 100644 --- a/arch/x86/include/asm/pgtable_types.h +++ b/arch/x86/include/asm/pgtable_types.h @@ -23,7 +23,8 @@ #define _PAGE_BIT_SOFTW2 10 /* " */ #define _PAGE_BIT_SOFTW3 11 /* " */ #define _PAGE_BIT_PAT_LARGE 12 /* On 2MB or 1GB pages */ -#define _PAGE_BIT_SOFTW4 58 /* available for programmer */ +#define _PAGE_BIT_SOFTW4 57 /* available for programmer */ +#define _PAGE_BIT_SOFTW5 58 /* available for programmer */ #define _PAGE_BIT_PKEY_BIT0 59 /* Protection Keys, bit 1/4 */ #define _PAGE_BIT_PKEY_BIT1 60 /* Protection Keys, bit 2/4 */ #define _PAGE_BIT_PKEY_BIT2 61 /* Protection Keys, bit 3/4 */ @@ -36,6 +37,15 @@ #define _PAGE_BIT_SOFT_DIRTY _PAGE_BIT_SOFTW3 /* software dirty tracking */ #define _PAGE_BIT_DEVMAP _PAGE_BIT_SOFTW4 +/* + * Indicates a copy-on-write page. + */ +#ifdef CONFIG_X86_CET +#define _PAGE_BIT_COW _PAGE_BIT_SOFTW5 /* copy-on-write */ +#else +#define _PAGE_BIT_COW 0 +#endif + /* If _PAGE_BIT_PRESENT is clear, we use these: */ /* - if the user mapped it with PROT_NONE; pte_present gives true */ #define _PAGE_BIT_PROTNONE _PAGE_BIT_GLOBAL @@ -117,6 +127,36 @@ #define _PAGE_DEVMAP (_AT(pteval_t, 0)) #endif +/* + * The hardware requires shadow stack to be read-only and Dirty. + * _PAGE_COW is a software-only bit used to separate copy-on-write PTEs + * from shadow stack PTEs: + * (a) A modified, copy-on-write (COW) page: (Write=0, Cow=1) + * (b) A R/O page that has been COW'ed: (Write=0, Cow=1) + * The user page is in a R/O VMA, and get_user_pages() needs a + * writable copy. The page fault handler creates a copy of the page + * and sets the new copy's PTE as Write=0, Cow=1. + * (c) A shadow stack PTE: (Write=0, Dirty=1) + * (d) A shared (copy-on-access) shadow stack PTE: (Write=0, Cow=1) + * When a shadow stack page is being shared among processes (this + * happens at fork()), its PTE is cleared of _PAGE_DIRTY, so the next + * shadow stack access causes a fault, and the page is duplicated and + * _PAGE_DIRTY is set again. This is the COW equivalent for shadow + * stack pages, even though it's copy-on-access rather than + * copy-on-write. + * (e) A page where the processor observed a Write=1 PTE, started a write, + * set Dirty=1, but then observed a Write=0 PTE (changed by another + * thread). That's possible today, but will not happen on processors + * that support shadow stack. + */ +#ifdef CONFIG_X86_CET +#define _PAGE_COW (_AT(pteval_t, 1) << _PAGE_BIT_COW) +#else +#define _PAGE_COW (_AT(pteval_t, 0)) +#endif + +#define _PAGE_DIRTY_BITS (_PAGE_DIRTY | _PAGE_COW) + #define _PAGE_PROTNONE (_AT(pteval_t, 1) << _PAGE_BIT_PROTNONE) /*
There is essentially no room left in the x86 hardware PTEs on some OSes (not Linux). That left the hardware architects looking for a way to represent a new memory type (shadow stack) within the existing bits. They chose to repurpose a lightly-used state: Write=0, Dirty=1. The reason it's lightly used is that Dirty=1 is normally set by hardware and cannot normally be set by hardware on a Write=0 PTE. Software must normally be involved to create one of these PTEs, so software can simply opt to not create them. In places where Linux normally creates Write=0, Dirty=1, it can use the software-defined _PAGE_COW in place of the hardware _PAGE_DIRTY. In other words, whenever Linux needs to create Write=0, Dirty=1, it instead creates Write=0, Cow=1, except for shadow stack, which is Write=0, Dirty=1. This clearly separates shadow stack from other data, and results in the following: (a) A modified, copy-on-write (COW) page: (Write=0, Cow=1) (b) A R/O page that has been COW'ed: (Write=0, Cow=1) The user page is in a R/O VMA, and get_user_pages() needs a writable copy. The page fault handler creates a copy of the page and sets the new copy's PTE as Write=0 and Cow=1. (c) A shadow stack PTE: (Write=0, Dirty=1) (d) A shared shadow stack PTE: (Write=0, Cow=1) When a shadow stack page is being shared among processes (this happens at fork()), its PTE is made Dirty=0, so the next shadow stack access causes a fault, and the page is duplicated and Dirty=1 is set again. This is the COW equivalent for shadow stack pages, even though it's copy-on-access rather than copy-on-write. (e) A page where the processor observed a Write=1 PTE, started a write, set Dirty=1, but then observed a Write=0 PTE. That's possible today, but will not happen on processors that support shadow stack. Define _PAGE_COW and update pte_*() helpers and apply the same changes to pmd and pud. After this, there are six free bits left in the 64-bit PTE, and no more free bits in the 32-bit PTE (except for PAE) and Shadow Stack is not implemented for the 32-bit kernel. Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com> --- arch/x86/include/asm/pgtable.h | 136 ++++++++++++++++++++++++--- arch/x86/include/asm/pgtable_types.h | 42 ++++++++- 2 files changed, 165 insertions(+), 13 deletions(-)