Message ID | 20180914203501.qibhpmueosvkr74w@linutronix.de (mailing list archive)
---|---
State | New, archived
Series | None
>> I think you can go a step further and exclude PKRU state from
>> copy_kernel_to_fpregs altogether; you just use RDPKRU/WRPKRU. This also
>> means you don't need to call __fpregs_* functions in write_pkru.
>
> something like this then? It looks like kvm excludes PKRU from
> xsave/xrestore, too. This wouldn't be required then

Yes, that's why the subject caught my eye! In fact, the reason for KVM
to do so is either the opposite of your patch (it wants to call WRPKRU
_after_ XRSTOR, not before) or the same (it wants to keep the userspace
PKRU loaded for as long as possible), depending on how you look at it.

> . This is (again)
> untested since I have no box with this PKRU feature. This only available
> on Intel's Xeon Scalable, right?

It is available on QEMU too (the slower JIT thing without KVM, i.e. use
/usr/bin/qemu-system-x86_64 instead of /usr/bin/qemu-kvm or
/usr/bin/kvm).

> -	if (preload)
> -		__fpregs_load_activate(new_fpu, cpu);
> +	if (!preload)
> +		return;
> +
> +	__fpregs_load_activate(new_fpu, cpu);
> +
> +#ifdef CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS
> +	/* Protection keys need to be in place right at context switch time. */
> +	if (boot_cpu_has(X86_FEATURE_OSPKE)) {
> +		if (new_fpu->pkru != __read_pkru())
> +			__write_pkru(new_fpu->pkru);
> +	}
> +#endif

I think this would be before the "if (preload)"?

>
>  	if (boot_cpu_has(X86_FEATURE_OSPKE))
> -		copy_init_pkru_to_fpregs();
> +		pkru_set_init_value();
>  }
>

Likewise, move this to fpu__clear and outside "if
(static_cpu_has(X86_FEATURE_FPU))"?

Also, a __read_pkru() seems to be missing in switch_fpu_prepare.

Thanks,

Paolo
On 2018-09-17 10:37:20 [+0200], Paolo Bonzini wrote:
> > . This is (again)
> > untested since I have no box with this PKRU feature. This only available
> > on Intel's Xeon Scalable, right?
>
> It is available on QEMU too (the slower JIT thing without KVM, i.e. use
> /usr/bin/qemu-system-x86_64 instead of /usr/bin/qemu-kvm or
> /usr/bin/kvm).

okay. I managed to get the kernel to report this flag as available now.

> > -	if (preload)
> > -		__fpregs_load_activate(new_fpu, cpu);
> > +	if (!preload)
> > +		return;
> > +
> > +	__fpregs_load_activate(new_fpu, cpu);
> > +
> > +#ifdef CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS
> > +	/* Protection keys need to be in place right at context switch time. */
> > +	if (boot_cpu_has(X86_FEATURE_OSPKE)) {
> > +		if (new_fpu->pkru != __read_pkru())
> > +			__write_pkru(new_fpu->pkru);
> > +	}
> > +#endif
>
> I think this would be before the "if (preload)"?

I did not find an explicit loading of pkru except this "xrestore" which
happens on "preload". From what I saw, preload was always set except for
kernel threads. Based on that, it looks to me like it can be skipped if
there is no FPU/kernel thread.

> >
> >  	if (boot_cpu_has(X86_FEATURE_OSPKE))
> > -		copy_init_pkru_to_fpregs();
> > +		pkru_set_init_value();
> >  }
> >
>
> Likewise, move this to fpu__clear and outside "if
> (static_cpu_has(X86_FEATURE_FPU))"?

okay. But if there is no FPU we did not save/restore the pkru value. Is
this supposed to be an improvement?

> Also, a __read_pkru() seems to be missing in switch_fpu_prepare.

But the value is stored during write_pkru(). So the copy that is saved
should be identical to the value that would be read, correct?

> Thanks,
>
> Paolo

Sebastian
On 18/09/2018 16:27, Sebastian Andrzej Siewior wrote:
>> Likewise, move this to fpu__clear and outside "if
>> (static_cpu_has(X86_FEATURE_FPU))"?
>
> okay. But if there is no FPU we did not save/restore the pkru value. Is
> this supposed to be an improvement?

Honestly it just seemed "more correct", but now that I think about it,
kernel threads should run with PKRU=0. Maybe there's a preexisting bug
that your patch has the occasion to fix.

Paolo

>> Also, a __read_pkru() seems to be missing in switch_fpu_prepare.
>
> But the value is stored during write_pkru(). So the copy that is saved
> should be identical to the value that would be read, correct?
On Tue, 2018-09-18 at 17:07 +0200, Paolo Bonzini wrote:
> On 18/09/2018 16:27, Sebastian Andrzej Siewior wrote:
> > > Likewise, move this to fpu__clear and outside "if
> > > (static_cpu_has(X86_FEATURE_FPU))"?
> >
> > okay. But if there is no FPU we did not save/restore the pkru
> > value. Is this supposed to be an improvement?
>
> Honestly it just seemed "more correct", but now that I think about it,
> kernel threads should run with PKRU=0. Maybe there's a preexisting bug
> that your patch has the occasion to fix.

I don't think it matters what the PKRU state is for kernel threads,
since kernel PTEs should not be using protection keys anyway.
On 18/09/2018 17:11, Rik van Riel wrote:
> On Tue, 2018-09-18 at 17:07 +0200, Paolo Bonzini wrote:
>> On 18/09/2018 16:27, Sebastian Andrzej Siewior wrote:
>>>> Likewise, move this to fpu__clear and outside "if
>>>> (static_cpu_has(X86_FEATURE_FPU))"?
>>>
>>> okay. But if there is no FPU we did not save/restore the pkru
>>> value. Is this supposed to be an improvement?
>>
>> Honestly it just seemed "more correct", but now that I think about it,
>> kernel threads should run with PKRU=0. Maybe there's a preexisting bug
>> that your patch has the occasion to fix.
>
> I don't think it matters what the PKRU state is
> for kernel threads, since kernel PTEs should not
> be using protection keys anyway.

What about copy_from/to_user?

Paolo
On 2018-09-18 17:29:52 [+0200], Paolo Bonzini wrote:
> > I don't think it matters what the PKRU state is
> > for kernel threads, since kernel PTEs should not
> > be using protection keys anyway.
>
> What about copy_from/to_user?

This doesn't work for a kernel thread, does it? I mean they share the
init's MM and never do copy_{from|to}_user.

Sebastian
On Tue, 2018-09-18 at 18:04 +0200, Sebastian Andrzej Siewior wrote:
> On 2018-09-18 17:29:52 [+0200], Paolo Bonzini wrote:
> > > I don't think it matters what the PKRU state is
> > > for kernel threads, since kernel PTEs should not
> > > be using protection keys anyway.
> >
> > What about copy_from/to_user?
>
> This doesn't work for a kernel thread, does it? I mean they share the
> init's MM and never do copy_{from|to}_user.

Indeed, copy_from/to_user only works if current->mm points at an
mm_struct with userspace memory.
On 18/09/2018 19:29, Rik van Riel wrote:
> On Tue, 2018-09-18 at 18:04 +0200, Sebastian Andrzej Siewior wrote:
>> On 2018-09-18 17:29:52 [+0200], Paolo Bonzini wrote:
>>>> I don't think it matters what the PKRU state is
>>>> for kernel threads, since kernel PTEs should not
>>>> be using protection keys anyway.
>>>
>>> What about copy_from/to_user?
>>
>> This doesn't work for a kernel thread, does it? I mean they share the
>> init's MM and never do copy_{from|to}_user.
>
> Indeed, copy_from/to_user only works if current->mm
> points at an mm_struct with userspace memory.

A kthread can do use_mm/unuse_mm.

Paolo
On 2018-09-19 07:55:51 [+0200], Paolo Bonzini wrote:
> A kthread can do use_mm/unuse_mm.

indeed. The FPU struct for the kernel thread isn't valid / does not
contain the expected PKRU value. So loading the pkru value from the
struct FPU does not work as expected. We could set it to 0 for a kernel
thread so we don't end up with a random value.
If we want to get this usecase working then we would have to move pkru
value from FPU to mm_struct and consider it in use_mm(). Do we want
this?

> Paolo

Sebastian
On 19/09/2018 18:57, Sebastian Andrzej Siewior wrote:
> On 2018-09-19 07:55:51 [+0200], Paolo Bonzini wrote:
>> A kthread can do use_mm/unuse_mm.
>
> indeed. The FPU struct for the kernel thread isn't valid / does not
> contain the expected PKRU value. So loading the pkru value from the
> struct FPU does not work as expected. We could set it to 0 for a kernel
> thread so we don't end up with a random value.
> If we want to get this usecase working then we would have to move pkru
> value from FPU to mm_struct and consider it in use_mm(). Do we want
> this?

As a start, I think keeping it in the FPU struct but loading it
unconditionally will work. kthreads will not obey PKU but it will be
better already.

I honestly don't know if PKRU should be per-mm; I don't know mm very
well despite my brilliant observation above. :)

Paolo
On 2018-09-19 19:00:34 [+0200], Paolo Bonzini wrote:
> On 19/09/2018 18:57, Sebastian Andrzej Siewior wrote:
> > On 2018-09-19 07:55:51 [+0200], Paolo Bonzini wrote:
> >> A kthread can do use_mm/unuse_mm.
> >
> > indeed. The FPU struct for the kernel thread isn't valid / does not
> > contain the expected PKRU value. So loading the pkru value from the
> > struct FPU does not work as expected. We could set it to 0 for a kernel
> > thread so we don't end up with a random value.
> > If we want to get this usecase working then we would have to move pkru
> > value from FPU to mm_struct and consider it in use_mm(). Do we want
> > this?
>
> As a start, I think keeping it in the FPU struct but loading it
> unconditionally will work. kthreads will not obey PKU but it will be
> better already.

Are you saying I should load the uninitialized value for kthreads, or
load 0 to have it in a known state?

> I honestly don't know if PKRU should be per-mm, I don't know mm very
> well despite my brilliant observation above. :)

Now that I have a qemu machine with PKRU I would need to figure out how
this works. So unless I am mistaken, mm is per task and the FPU is per
thread. And since the FPU struct isn't initialized for kthreads, we
should end up with 0. But setting it to 0 if not used sounds good.

> Paolo

Sebastian
> On Sep 19, 2018, at 1:00 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On 19/09/2018 18:57, Sebastian Andrzej Siewior wrote:
>> On 2018-09-19 07:55:51 [+0200], Paolo Bonzini wrote:
>>> A kthread can do use_mm/unuse_mm.
>>
>> indeed. The FPU struct for the kernel thread isn't valid / does not
>> contain the expected PKRU value. So loading the pkru value from the
>> struct FPU does not work as expected. We could set it to 0 for a kernel
>> thread so we don't end up with a random value.
>> If we want to get this usecase working then we would have to move pkru
>> value from FPU to mm_struct and consider it in use_mm(). Do we want
>> this?
>
> As a start, I think keeping it in the FPU struct but loading it
> unconditionally will work. kthreads will not obey PKU but it will be
> better already.
>
> I honestly don't know if PKRU should be per-mm, I don't know mm very
> well despite my brilliant observation above. :)

One of the rumored use cases for PKRU is to allow different threads in
the same process to have different memory permissions, while still
sharing the same page tables. Making it per-mm would break that :)
> On Sep 19, 2018, at 10:00 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On 19/09/2018 18:57, Sebastian Andrzej Siewior wrote:
>> On 2018-09-19 07:55:51 [+0200], Paolo Bonzini wrote:
>>> A kthread can do use_mm/unuse_mm.
>>
>> indeed. The FPU struct for the kernel thread isn't valid / does not
>> contain the expected PKRU value. So loading the pkru value from the
>> struct FPU does not work as expected. We could set it to 0 for a kernel
>> thread so we don't end up with a random value.
>> If we want to get this usecase working then we would have to move pkru
>> value from FPU to mm_struct and consider it in use_mm(). Do we want
>> this?
>
> As a start, I think keeping it in the FPU struct but loading it
> unconditionally will work. kthreads will not obey PKU but it will be
> better already.
>
> I honestly don't know if PKRU should be per-mm, I don't know mm very
> well despite my brilliant observation above. :)

It must be per thread. I don’t think it’s possible to have sane
semantics per mm.

I also think that use_mm should set PKRU to the same value that signal
handlers use. If we use 0, it’s a recipe for accidental PK bypass.
diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h
index 16c4077ffc945..903ee77b6d5b0 100644
--- a/arch/x86/include/asm/fpu/internal.h
+++ b/arch/x86/include/asm/fpu/internal.h
@@ -573,8 +573,18 @@ static inline void switch_fpu_finish(struct fpu *new_fpu, int cpu)
 	bool preload = static_cpu_has(X86_FEATURE_FPU) &&
 		       new_fpu->initialized;
 
-	if (preload)
-		__fpregs_load_activate(new_fpu, cpu);
+	if (!preload)
+		return;
+
+	__fpregs_load_activate(new_fpu, cpu);
+
+#ifdef CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS
+	/* Protection keys need to be in place right at context switch time. */
+	if (boot_cpu_has(X86_FEATURE_OSPKE)) {
+		if (new_fpu->pkru != __read_pkru())
+			__write_pkru(new_fpu->pkru);
+	}
+#endif
 }
 
 /*
diff --git a/arch/x86/include/asm/fpu/types.h b/arch/x86/include/asm/fpu/types.h
index 202c53918ecfa..257b092bdaa4e 100644
--- a/arch/x86/include/asm/fpu/types.h
+++ b/arch/x86/include/asm/fpu/types.h
@@ -293,6 +293,13 @@ struct fpu {
 	 */
 	unsigned int			last_cpu;
 
+	/*
+	 * Protection key bits. The protection key needs to be switched out
+	 * immediately at context switch time, so it is in place for things
+	 * like copy_to_user.
+	 */
+	unsigned int			pkru;
+
 	/*
 	 * @initialized:
 	 *
diff --git a/arch/x86/include/asm/fpu/xstate.h b/arch/x86/include/asm/fpu/xstate.h
index 48581988d78c7..abe8793fa50f9 100644
--- a/arch/x86/include/asm/fpu/xstate.h
+++ b/arch/x86/include/asm/fpu/xstate.h
@@ -29,7 +29,6 @@
 				 XFEATURE_MASK_OPMASK | \
 				 XFEATURE_MASK_ZMM_Hi256 | \
 				 XFEATURE_MASK_Hi16_ZMM | \
-				 XFEATURE_MASK_PKRU | \
 				 XFEATURE_MASK_BNDREGS | \
 				 XFEATURE_MASK_BNDCSR)
 
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 690c0307afed0..d87bdfaf45e56 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -134,8 +134,11 @@ static inline u32 read_pkru(void)
 
 static inline void write_pkru(u32 pkru)
 {
-	if (boot_cpu_has(X86_FEATURE_OSPKE))
-		__write_pkru(pkru);
+	if (!boot_cpu_has(X86_FEATURE_OSPKE))
+		return;
+
+	current->thread.fpu.pkru = pkru;
+	__write_pkru(pkru);
 }
 
 static inline int pte_young(pte_t pte)
diff --git a/arch/x86/include/asm/pkeys.h b/arch/x86/include/asm/pkeys.h
index 19b137f1b3beb..b184f916319e5 100644
--- a/arch/x86/include/asm/pkeys.h
+++ b/arch/x86/include/asm/pkeys.h
@@ -119,7 +119,7 @@ extern int arch_set_user_pkey_access(struct task_struct *tsk, int pkey,
 		unsigned long init_val);
 extern int __arch_set_user_pkey_access(struct task_struct *tsk, int pkey,
 		unsigned long init_val);
-extern void copy_init_pkru_to_fpregs(void);
+extern void pkru_set_init_value(void);
 
 static inline int vma_pkey(struct vm_area_struct *vma)
 {
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index 2ea85b32421a0..72cd2e2a07194 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -373,7 +373,7 @@ static inline void copy_init_fpstate_to_fpregs(void)
 		copy_kernel_to_fregs(&init_fpstate.fsave);
 
 	if (boot_cpu_has(X86_FEATURE_OSPKE))
-		copy_init_pkru_to_fpregs();
+		pkru_set_init_value();
 }
 
 /*
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 87a57b7642d36..11014d841b9f7 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -920,10 +920,6 @@ int arch_set_user_pkey_access(struct task_struct *tsk, int pkey,
 	int pkey_shift = (pkey * PKRU_BITS_PER_PKEY);
 	u32 new_pkru_bits = 0;
 
-	/*
-	 * This check implies XSAVE support. OSPKE only gets
-	 * set if we enable XSAVE and we enable PKU in XCR0.
-	 */
 	if (!boot_cpu_has(X86_FEATURE_OSPKE))
 		return -EINVAL;
 
diff --git a/arch/x86/mm/pkeys.c b/arch/x86/mm/pkeys.c
index 6e98e0a7c9231..2f9d95206c741 100644
--- a/arch/x86/mm/pkeys.c
+++ b/arch/x86/mm/pkeys.c
@@ -137,26 +137,13 @@ u32 init_pkru_value = PKRU_AD_KEY( 1) | PKRU_AD_KEY( 2) | PKRU_AD_KEY( 3) |
 		      PKRU_AD_KEY(10) | PKRU_AD_KEY(11) | PKRU_AD_KEY(12) |
 		      PKRU_AD_KEY(13) | PKRU_AD_KEY(14) | PKRU_AD_KEY(15);
 
-/*
- * Called from the FPU code when creating a fresh set of FPU
- * registers. This is called from a very specific context where
- * we know the FPU regstiers are safe for use and we can use PKRU
- * directly.
- */
-void copy_init_pkru_to_fpregs(void)
+void pkru_set_init_value(void)
 {
 	u32 init_pkru_value_snapshot = READ_ONCE(init_pkru_value);
-	/*
-	 * Any write to PKRU takes it out of the XSAVE 'init
-	 * state' which increases context switch cost. Avoid
-	 * writing 0 when PKRU was already 0.
-	 */
-	if (!init_pkru_value_snapshot && !read_pkru())
+
+	if (init_pkru_value_snapshot == read_pkru())
 		return;
-	/*
-	 * Override the PKRU state that came from 'init_fpstate'
-	 * with the baseline from the process.
-	 */
+
 	write_pkru(init_pkru_value_snapshot);
 }
 
diff --git a/include/linux/pkeys.h b/include/linux/pkeys.h
index 2955ba9760489..9a9efecc1388f 100644
--- a/include/linux/pkeys.h
+++ b/include/linux/pkeys.h
@@ -44,7 +44,7 @@ static inline bool arch_pkeys_enabled(void)
 	return false;
 }
 
-static inline void copy_init_pkru_to_fpregs(void)
+static inline void pkru_set_init_value(void)
 {
 }