Message ID | 20220929222936.14584-9-rick.p.edgecombe@intel.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | Shadowstacks for userspace | expand |
On Thu, Sep 29, 2022 at 03:29:05PM -0700, Rick Edgecombe wrote: > From: Yu-cheng Yu <yu-cheng.yu@intel.com> > > Processors sometimes directly create Write=0,Dirty=1 PTEs. These PTEs are > created by software. One such case is that kernel read-only pages are > historically set up as Dirty. > > New processors that support Shadow Stack regard Write=0,Dirty=1 PTEs as > shadow stack pages. When CR4.CET=1 and IA32_S_CET.SH_STK_EN=1, some > instructions can write to such supervisor memory. The kernel does not set > IA32_S_CET.SH_STK_EN, but to reduce ambiguity between shadow stack and > regular Write=0 pages, removed Dirty=1 from any kernel Write=0 PTEs. > > Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com> > Co-developed-by: Rick Edgecombe <rick.p.edgecombe@intel.com> > Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com> > Cc: "H. Peter Anvin" <hpa@zytor.com> > Cc: Kees Cook <keescook@chromium.org> > Cc: Thomas Gleixner <tglx@linutronix.de> > Cc: Dave Hansen <dave.hansen@linux.intel.com> > Cc: Christoph Hellwig <hch@lst.de> > Cc: Andy Lutomirski <luto@kernel.org> > Cc: Ingo Molnar <mingo@redhat.com> > Cc: Borislav Petkov <bp@alien8.de> > Cc: Peter Zijlstra <peterz@infradead.org> > > --- > > v2: > - Normalize PTE bit descriptions between patches > > arch/x86/include/asm/pgtable_types.h | 6 +++--- > arch/x86/mm/pat/set_memory.c | 2 +- > 2 files changed, 4 insertions(+), 4 deletions(-) > > diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h > index aa174fed3a71..ff82237e7b6b 100644 > --- a/arch/x86/include/asm/pgtable_types.h > +++ b/arch/x86/include/asm/pgtable_types.h > @@ -192,10 +192,10 @@ enum page_cache_mode { > #define _KERNPG_TABLE (__PP|__RW| 0|___A| 0|___D| 0| 0| _ENC) > #define _PAGE_TABLE_NOENC (__PP|__RW|_USR|___A| 0|___D| 0| 0) > #define _PAGE_TABLE (__PP|__RW|_USR|___A| 0|___D| 0| 0| _ENC) > -#define __PAGE_KERNEL_RO (__PP| 0| 0|___A|__NX|___D| 0|___G) > -#define __PAGE_KERNEL_ROX (__PP| 0| 0|___A| 0|___D| 0|___G) > +#define __PAGE_KERNEL_RO (__PP| 0| 0|___A|__NX| 0| 0|___G) > +#define __PAGE_KERNEL_ROX (__PP| 0| 0|___A| 0| 0| 0|___G) > #define __PAGE_KERNEL_NOCACHE (__PP|__RW| 0|___A|__NX|___D| 0|___G| __NC) > -#define __PAGE_KERNEL_VVAR (__PP| 0|_USR|___A|__NX|___D| 0|___G) > +#define __PAGE_KERNEL_VVAR (__PP| 0|_USR|___A|__NX| 0| 0|___G) > #define __PAGE_KERNEL_LARGE (__PP|__RW| 0|___A|__NX|___D|_PSE|___G) > #define __PAGE_KERNEL_LARGE_EXEC (__PP|__RW| 0|___A| 0|___D|_PSE|___G) > #define __PAGE_KERNEL_WP (__PP|__RW| 0|___A|__NX|___D| 0|___G| __WP) > diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c > index 1abd5438f126..ed9193b469ba 100644 > --- a/arch/x86/mm/pat/set_memory.c > +++ b/arch/x86/mm/pat/set_memory.c > @@ -1977,7 +1977,7 @@ int set_memory_nx(unsigned long addr, int numpages) > > int set_memory_ro(unsigned long addr, int numpages) > { > - return change_page_attr_clear(&addr, numpages, __pgprot(_PAGE_RW), 0); > + return change_page_attr_clear(&addr, numpages, __pgprot(_PAGE_RW | _PAGE_DIRTY), 0); > } Hm. Do we need to modify also *_wrprotect() helpers to clear dirty bit? I guess not (at least without a lot of audit), as we risk loosing dirty bit on page cache pages. But why is it safe? Do we only care about about kernel PTEs here? Userspace Write=0,Dirty=1 PTEs handled as before? > int set_memory_rw(unsigned long addr, int numpages) > -- > 2.17.1 >
On 29/09/2022 23:29, Rick Edgecombe wrote: > From: Yu-cheng Yu <yu-cheng.yu@intel.com> > > Processors sometimes directly create Write=0,Dirty=1 PTEs. Do they? (Rhetorical) Yes, this is a relevant anecdote for why CET isn't available on pre-TGL parts, but it one of the more wrong things to have as the first sentence of this commit message. The point you want to express is that under the CET-SS spec, R/O+Dirty has a new meaning as type=shstk, so stop using this bit combination for existing mappings. I'm not even sure it's relevant to note that CET capable processors can set D on a R/O mapping, because that depends on !CR0.WP which in turn prohibits CR4.CET being enabled. ~Andrew
On Wed, Oct 05, 2022 at 01:31:28AM +0000, Andrew Cooper wrote: > On 29/09/2022 23:29, Rick Edgecombe wrote: > > From: Yu-cheng Yu <yu-cheng.yu@intel.com> > > > > Processors sometimes directly create Write=0,Dirty=1 PTEs. > > Do they? (Rhetorical) > > Yes, this is a relevant anecdote for why CET isn't available on pre-TGL > parts, but it one of the more wrong things to have as the first sentence > of this commit message. > > The point you want to express is that under the CET-SS spec, R/O+Dirty > has a new meaning as type=shstk, so stop using this bit combination for > existing mappings. > > I'm not even sure it's relevant to note that CET capable processors can > set D on a R/O mapping, because that depends on !CR0.WP which in turn > prohibits CR4.CET being enabled. Whilst I agree that the Changelog is 'suboptimal' -- I do think it might be good to mention how we ended up at the current state where we explicitly set this non-sensical W=0,D=1 state. Looking at the git history this seems to be a bit of a hysterical accident, not something done on purpose to 'optimize' for these weird CPUs.
On 05/10/2022 12:16, Peter Zijlstra wrote: > On Wed, Oct 05, 2022 at 01:31:28AM +0000, Andrew Cooper wrote: >> On 29/09/2022 23:29, Rick Edgecombe wrote: >>> From: Yu-cheng Yu <yu-cheng.yu@intel.com> >>> >>> Processors sometimes directly create Write=0,Dirty=1 PTEs. >> Do they? (Rhetorical) >> >> Yes, this is a relevant anecdote for why CET isn't available on pre-TGL >> parts, but it one of the more wrong things to have as the first sentence >> of this commit message. >> >> The point you want to express is that under the CET-SS spec, R/O+Dirty >> has a new meaning as type=shstk, so stop using this bit combination for >> existing mappings. >> >> I'm not even sure it's relevant to note that CET capable processors can >> set D on a R/O mapping, because that depends on !CR0.WP which in turn >> prohibits CR4.CET being enabled. > Whilst I agree that the Changelog is 'suboptimal' -- I do think it might > be good to mention how we ended up at the current state where we > explicitly set this non-sensical W=0,D=1 state. Sure, but that's got nothing to do with hardware errata. Having hardware set A/D bits is expensive. Being a locked operation, it's roughly a smp_mb() behind the scenes. Therefore, when A/D tracking doesn't matter, traditional wisdom says set both of them when creating the PTE. It's only now that R/O+Dirty has a meaning (other than being a slightly weird but safe bit combination), and we've got to be more careful about using it. ~Andrew
diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h index aa174fed3a71..ff82237e7b6b 100644 --- a/arch/x86/include/asm/pgtable_types.h +++ b/arch/x86/include/asm/pgtable_types.h @@ -192,10 +192,10 @@ enum page_cache_mode { #define _KERNPG_TABLE (__PP|__RW| 0|___A| 0|___D| 0| 0| _ENC) #define _PAGE_TABLE_NOENC (__PP|__RW|_USR|___A| 0|___D| 0| 0) #define _PAGE_TABLE (__PP|__RW|_USR|___A| 0|___D| 0| 0| _ENC) -#define __PAGE_KERNEL_RO (__PP| 0| 0|___A|__NX|___D| 0|___G) -#define __PAGE_KERNEL_ROX (__PP| 0| 0|___A| 0|___D| 0|___G) +#define __PAGE_KERNEL_RO (__PP| 0| 0|___A|__NX| 0| 0|___G) +#define __PAGE_KERNEL_ROX (__PP| 0| 0|___A| 0| 0| 0|___G) #define __PAGE_KERNEL_NOCACHE (__PP|__RW| 0|___A|__NX|___D| 0|___G| __NC) -#define __PAGE_KERNEL_VVAR (__PP| 0|_USR|___A|__NX|___D| 0|___G) +#define __PAGE_KERNEL_VVAR (__PP| 0|_USR|___A|__NX| 0| 0|___G) #define __PAGE_KERNEL_LARGE (__PP|__RW| 0|___A|__NX|___D|_PSE|___G) #define __PAGE_KERNEL_LARGE_EXEC (__PP|__RW| 0|___A| 0|___D|_PSE|___G) #define __PAGE_KERNEL_WP (__PP|__RW| 0|___A|__NX|___D| 0|___G| __WP) diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c index 1abd5438f126..ed9193b469ba 100644 --- a/arch/x86/mm/pat/set_memory.c +++ b/arch/x86/mm/pat/set_memory.c @@ -1977,7 +1977,7 @@ int set_memory_nx(unsigned long addr, int numpages) int set_memory_ro(unsigned long addr, int numpages) { - return change_page_attr_clear(&addr, numpages, __pgprot(_PAGE_RW), 0); + return change_page_attr_clear(&addr, numpages, __pgprot(_PAGE_RW | _PAGE_DIRTY), 0); } int set_memory_rw(unsigned long addr, int numpages)