Message ID | 20200717072056.73134-5-ira.weiny@intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | PKS: Add Protection Keys Supervisor (PKS) support | expand |
On Fri, Jul 17, 2020 at 12:20:43AM -0700, ira.weiny@intel.com wrote: > diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c > index f362ce0d5ac0..d69250a7c1bf 100644 > --- a/arch/x86/kernel/process.c > +++ b/arch/x86/kernel/process.c > @@ -42,6 +42,7 @@ > #include <asm/spec-ctrl.h> > #include <asm/io_bitmap.h> > #include <asm/proto.h> > +#include <asm/pkeys_internal.h> > > #include "process.h" > > @@ -184,6 +185,36 @@ int copy_thread_tls(unsigned long clone_flags, unsigned long sp, > return ret; > } > > +/* > + * NOTE: We wrap pks_init_task() and pks_sched_in() with > + * CONFIG_ARCH_HAS_SUPERVISOR_PKEYS because using IS_ENABLED() fails > + * due to the lack of task_struct->saved_pkrs in this configuration. > + * Furthermore, we place them here because of the complexity introduced by > + * header conflicts introduced to get the task_struct definition in the pkeys > + * headers. > + */ I don't see anything much useful in that comment. > +#ifdef CONFIG_ARCH_HAS_SUPERVISOR_PKEYS > +DECLARE_PER_CPU(u32, pkrs_cache); > +static inline void pks_init_task(struct task_struct *tsk) > +{ > + /* New tasks get the most restrictive PKRS value */ > + tsk->thread.saved_pkrs = INIT_PKRS_VALUE; > +} > +static inline void pks_sched_in(void) > +{ > + u64 current_pkrs = current->thread.saved_pkrs; > + > + /* Only update the MSR when current's pkrs is different from the MSR. */ > + if (this_cpu_read(pkrs_cache) == current_pkrs) > + return; > + > + write_pkrs(current_pkrs); Should we write that like: /* * PKRS is only temporarily changed during specific code paths. * Only a preemption during these windows away from the default * value would require updating the MSR. */ if (unlikely(this_cpu_read(pkrs_cache) != current_pkrs)) write_pkrs(current_pkrs); ? > +} > +#else > +static inline void pks_init_task(struct task_struct *tsk) { } > +static inline void pks_sched_in(void) { } > +#endif
On Fri, Jul 17, 2020 at 12:20:43AM -0700, ira.weiny@intel.com wrote: > +/* > + * Write the PKey Register Supervisor. This must be run with preemption > + * disabled as it does not guarantee the atomicity of updating the pkrs_cache > + * and MSR on its own. > + */ > +void write_pkrs(u32 pkrs_val) > +{ > + this_cpu_write(pkrs_cache, pkrs_val); > + wrmsrl(MSR_IA32_PKRS, pkrs_val); > +} Should we write that like: void write_pkrs(u32 pkr) { u32 *pkrs = get_cpu_ptr(pkrs_cache); if (*pkrs != pkr) { *pkrs = pkr; wrmsrl(MSR_IA32_PKRS, pkr); } put_cpu_ptrpkrs_cache); } given that we fundamentally need to serialize againt schedule() here.
On Fri, Jul 17, 2020 at 10:31:40AM +0200, Peter Zijlstra wrote: > On Fri, Jul 17, 2020 at 12:20:43AM -0700, ira.weiny@intel.com wrote: > > > diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c > > index f362ce0d5ac0..d69250a7c1bf 100644 > > --- a/arch/x86/kernel/process.c > > +++ b/arch/x86/kernel/process.c > > @@ -42,6 +42,7 @@ > > #include <asm/spec-ctrl.h> > > #include <asm/io_bitmap.h> > > #include <asm/proto.h> > > +#include <asm/pkeys_internal.h> > > > > #include "process.h" > > > > @@ -184,6 +185,36 @@ int copy_thread_tls(unsigned long clone_flags, unsigned long sp, > > return ret; > > } > > > > +/* > > + * NOTE: We wrap pks_init_task() and pks_sched_in() with > > + * CONFIG_ARCH_HAS_SUPERVISOR_PKEYS because using IS_ENABLED() fails > > + * due to the lack of task_struct->saved_pkrs in this configuration. > > + * Furthermore, we place them here because of the complexity introduced by > > + * header conflicts introduced to get the task_struct definition in the pkeys > > + * headers. > > + */ > > I don't see anything much useful in that comment. I'm happy to delete. Internal reviews questioned the motive here so I added the comment to inform why this style was chosen rather than the preferred IS_ENABLED(). I've deleted it now. > > > +#ifdef CONFIG_ARCH_HAS_SUPERVISOR_PKEYS > > +DECLARE_PER_CPU(u32, pkrs_cache); > > +static inline void pks_init_task(struct task_struct *tsk) > > +{ > > + /* New tasks get the most restrictive PKRS value */ > > + tsk->thread.saved_pkrs = INIT_PKRS_VALUE; > > +} > > +static inline void pks_sched_in(void) > > +{ > > + u64 current_pkrs = current->thread.saved_pkrs; > > + > > + /* Only update the MSR when current's pkrs is different from the MSR. */ > > + if (this_cpu_read(pkrs_cache) == current_pkrs) > > + return; > > + > > + write_pkrs(current_pkrs); > > Should we write that like: > > /* > * PKRS is only temporarily changed during specific code paths. > * Only a preemption during these windows away from the default > * value would require updating the MSR. > */ > if (unlikely(this_cpu_read(pkrs_cache) != current_pkrs)) > write_pkrs(current_pkrs); > > ? Yes I think the unlikely is better. Thanks, Ira > > > +} > > +#else > > +static inline void pks_init_task(struct task_struct *tsk) { } > > +static inline void pks_sched_in(void) { } > > +#endif
On Fri, Jul 17, 2020 at 10:59:54AM +0200, Peter Zijlstra wrote: > On Fri, Jul 17, 2020 at 12:20:43AM -0700, ira.weiny@intel.com wrote: > > +/* > > + * Write the PKey Register Supervisor. This must be run with preemption > > + * disabled as it does not guarantee the atomicity of updating the pkrs_cache > > + * and MSR on its own. > > + */ > > +void write_pkrs(u32 pkrs_val) > > +{ > > + this_cpu_write(pkrs_cache, pkrs_val); > > + wrmsrl(MSR_IA32_PKRS, pkrs_val); > > +} > > Should we write that like: > > void write_pkrs(u32 pkr) > { > u32 *pkrs = get_cpu_ptr(pkrs_cache); > if (*pkrs != pkr) { > *pkrs = pkr; > wrmsrl(MSR_IA32_PKRS, pkr); > } > put_cpu_ptrpkrs_cache); > } > > given that we fundamentally need to serialize againt schedule() here. Yes. That seems better. That also means pks_sched_in() can be simplified to just static inline void pks_sched_in(void) { write_pkrs(current->thread.saved_pkrs); } Because of the built WRMSR avoidance. However, pkrs_cache is static so I think I need to use {get,put}_cpu_var() here don't I? Thanks! Ira
On Fri, Jul 17, 2020 at 03:34:07PM -0700, Ira Weiny wrote: > On Fri, Jul 17, 2020 at 10:59:54AM +0200, Peter Zijlstra wrote: > > On Fri, Jul 17, 2020 at 12:20:43AM -0700, ira.weiny@intel.com wrote: > > > +/* > > > + * Write the PKey Register Supervisor. This must be run with preemption > > > + * disabled as it does not guarantee the atomicity of updating the pkrs_cache > > > + * and MSR on its own. > > > + */ > > > +void write_pkrs(u32 pkrs_val) > > > +{ > > > + this_cpu_write(pkrs_cache, pkrs_val); > > > + wrmsrl(MSR_IA32_PKRS, pkrs_val); > > > +} > > > > Should we write that like: > > > > void write_pkrs(u32 pkr) > > { > > u32 *pkrs = get_cpu_ptr(pkrs_cache); > > if (*pkrs != pkr) { > > *pkrs = pkr; > > wrmsrl(MSR_IA32_PKRS, pkr); > > } > > put_cpu_ptrpkrs_cache); > > } > > > > given that we fundamentally need to serialize againt schedule() here. > > Yes. That seems better. > > That also means pks_sched_in() can be simplified to just > > static inline void pks_sched_in(void) > { > write_pkrs(current->thread.saved_pkrs); > } > > Because of the built WRMSR avoidance. > > However, pkrs_cache is static so I think I need to use {get,put}_cpu_var() here > don't I? Or get_cpu_ptr(&pkrs_cache), sorry for the typo :-)
On Mon, Jul 20, 2020 at 11:15:54AM +0200, Peter Zijlstra wrote: > On Fri, Jul 17, 2020 at 03:34:07PM -0700, Ira Weiny wrote: > > On Fri, Jul 17, 2020 at 10:59:54AM +0200, Peter Zijlstra wrote: > > > On Fri, Jul 17, 2020 at 12:20:43AM -0700, ira.weiny@intel.com wrote: > > > > +/* > > > > + * Write the PKey Register Supervisor. This must be run with preemption > > > > + * disabled as it does not guarantee the atomicity of updating the pkrs_cache > > > > + * and MSR on its own. > > > > + */ > > > > +void write_pkrs(u32 pkrs_val) > > > > +{ > > > > + this_cpu_write(pkrs_cache, pkrs_val); > > > > + wrmsrl(MSR_IA32_PKRS, pkrs_val); > > > > +} > > > > > > Should we write that like: > > > > > > void write_pkrs(u32 pkr) > > > { > > > u32 *pkrs = get_cpu_ptr(pkrs_cache); > > > if (*pkrs != pkr) { > > > *pkrs = pkr; > > > wrmsrl(MSR_IA32_PKRS, pkr); > > > } > > > put_cpu_ptrpkrs_cache); > > > } > > > > > > given that we fundamentally need to serialize againt schedule() here. > > > > Yes. That seems better. > > > > That also means pks_sched_in() can be simplified to just > > > > static inline void pks_sched_in(void) > > { > > write_pkrs(current->thread.saved_pkrs); > > } > > > > Because of the built WRMSR avoidance. > > > > However, pkrs_cache is static so I think I need to use {get,put}_cpu_var() here > > don't I? > > Or get_cpu_ptr(&pkrs_cache), sorry for the typo :-) Ah I see... sorry, yes that will work. Done, Ira
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h index e8370e64a155..b6ffdfc3f388 100644 --- a/arch/x86/include/asm/msr-index.h +++ b/arch/x86/include/asm/msr-index.h @@ -727,6 +727,7 @@ #define MSR_IA32_TSC_DEADLINE 0x000006E0 +#define MSR_IA32_PKRS 0x000006E1 #define MSR_TSX_FORCE_ABORT 0x0000010F diff --git a/arch/x86/include/asm/pkeys_internal.h b/arch/x86/include/asm/pkeys_internal.h index a9f086f1e4b4..05257cdc7200 100644 --- a/arch/x86/include/asm/pkeys_internal.h +++ b/arch/x86/include/asm/pkeys_internal.h @@ -8,4 +8,24 @@ #define PKR_AD_KEY(pkey) (PKR_AD_BIT << ((pkey) * PKR_BITS_PER_PKEY)) +/* + * Define a default PKRS value for each task. + * + * Key 0 has no restriction. All other keys are set to the most restrictive + * value which is access disabled (AD=1). + * + * NOTE: This needs to be a macro to be used as part of the INIT_THREAD macro. + */ +#define INIT_PKRS_VALUE (PKR_AD_KEY(1) | PKR_AD_KEY(2) | PKR_AD_KEY(3) | \ + PKR_AD_KEY(4) | PKR_AD_KEY(5) | PKR_AD_KEY(6) | \ + PKR_AD_KEY(7) | PKR_AD_KEY(8) | PKR_AD_KEY(9) | \ + PKR_AD_KEY(10) | PKR_AD_KEY(11) | PKR_AD_KEY(12) | \ + PKR_AD_KEY(13) | PKR_AD_KEY(14) | PKR_AD_KEY(15)) + +#ifdef CONFIG_ARCH_HAS_SUPERVISOR_PKEYS +void write_pkrs(u32 pkrs_val); +#else +static inline void write_pkrs(u32 pkrs_val) { } +#endif + #endif /*_ASM_X86_PKEYS_INTERNAL_H */ diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h index 7da9855b5068..704d9f28fd4e 100644 --- a/arch/x86/include/asm/processor.h +++ b/arch/x86/include/asm/processor.h @@ -545,6 +545,11 @@ struct thread_struct { unsigned int sig_on_uaccess_err:1; +#ifdef CONFIG_ARCH_HAS_SUPERVISOR_PKEYS + /* Saved Protection key register for supervisor mappings */ + u32 saved_pkrs; +#endif + /* Floating point and extended processor state */ struct fpu fpu; /* @@ -907,8 +912,15 @@ static inline void spin_lock_prefetch(const void *x) #define STACK_TOP TASK_SIZE_LOW #define STACK_TOP_MAX TASK_SIZE_MAX +#ifdef CONFIG_ARCH_HAS_SUPERVISOR_PKEYS +#define INIT_THREAD_PKRS .saved_pkrs = INIT_PKRS_VALUE, +#else +#define INIT_THREAD_PKRS +#endif + #define INIT_THREAD { \ .addr_limit = KERNEL_DS, \ + INIT_THREAD_PKRS \ } extern unsigned long KSTK_ESP(struct task_struct *task); diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c index f34bcefeda42..b8241936cbbf 100644 --- a/arch/x86/kernel/cpu/common.c +++ b/arch/x86/kernel/cpu/common.c @@ -56,6 +56,7 @@ #include <asm/intel-family.h> #include <asm/cpu_device_id.h> #include <asm/uv/uv.h> +#include <asm/pkeys_internal.h> #include "cpu.h" @@ -1442,6 +1443,7 @@ static void setup_pks(void) return; cr4_set_bits(X86_CR4_PKS); + write_pkrs(INIT_PKRS_VALUE); } /* diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c index f362ce0d5ac0..d69250a7c1bf 100644 --- a/arch/x86/kernel/process.c +++ b/arch/x86/kernel/process.c @@ -42,6 +42,7 @@ #include <asm/spec-ctrl.h> #include <asm/io_bitmap.h> #include <asm/proto.h> +#include <asm/pkeys_internal.h> #include "process.h" @@ -184,6 +185,36 @@ int copy_thread_tls(unsigned long clone_flags, unsigned long sp, return ret; } +/* + * NOTE: We wrap pks_init_task() and pks_sched_in() with + * CONFIG_ARCH_HAS_SUPERVISOR_PKEYS because using IS_ENABLED() fails + * due to the lack of task_struct->saved_pkrs in this configuration. + * Furthermore, we place them here because of the complexity introduced by + * header conflicts introduced to get the task_struct definition in the pkeys + * headers. + */ +#ifdef CONFIG_ARCH_HAS_SUPERVISOR_PKEYS +DECLARE_PER_CPU(u32, pkrs_cache); +static inline void pks_init_task(struct task_struct *tsk) +{ + /* New tasks get the most restrictive PKRS value */ + tsk->thread.saved_pkrs = INIT_PKRS_VALUE; +} +static inline void pks_sched_in(void) +{ + u64 current_pkrs = current->thread.saved_pkrs; + + /* Only update the MSR when current's pkrs is different from the MSR. */ + if (this_cpu_read(pkrs_cache) == current_pkrs) + return; + + write_pkrs(current_pkrs); +} +#else +static inline void pks_init_task(struct task_struct *tsk) { } +static inline void pks_sched_in(void) { } +#endif + void flush_thread(void) { struct task_struct *tsk = current; @@ -192,6 +223,8 @@ void flush_thread(void) memset(tsk->thread.tls_array, 0, sizeof(tsk->thread.tls_array)); fpu__clear_all(&tsk->thread.fpu); + + pks_init_task(tsk); } void disable_TSC(void) @@ -655,6 +688,8 @@ void __switch_to_xtra(struct task_struct *prev_p, struct task_struct *next_p) if ((tifp ^ tifn) & _TIF_SLD) switch_to_sld(tifn); + + pks_sched_in(); } /* diff --git a/arch/x86/mm/pkeys.c b/arch/x86/mm/pkeys.c index a5c680d32930..0f86f2374bd7 100644 --- a/arch/x86/mm/pkeys.c +++ b/arch/x86/mm/pkeys.c @@ -236,3 +236,16 @@ u32 get_new_pkr(u32 old_pkr, int pkey, unsigned long init_val) /* Return the old part along with the new part: */ return old_pkr | new_pkr_bits; } + +DEFINE_PER_CPU(u32, pkrs_cache); + +/* + * Write the PKey Register Supervisor. This must be run with preemption + * disabled as it does not guarantee the atomicity of updating the pkrs_cache + * and MSR on its own. + */ +void write_pkrs(u32 pkrs_val) +{ + this_cpu_write(pkrs_cache, pkrs_val); + wrmsrl(MSR_IA32_PKRS, pkrs_val); +}