Message ID | 20181004140547.13014-6-bigeasy@linutronix.de (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | x86: load FPU registers on return to userland | expand |
On 10/04/2018 07:05 AM, Sebastian Andrzej Siewior wrote: > The PKRU value is not set for kernel threads because they do not have > the ->initialized value set. As a result the kernel thread has a random > PKRU value set which it inherits from the previous task. > It has been suggested by Paolo Bonzini to set it for kernel threads, too > because it might be a fix. > I *think* this is not required because the kernel threads don't copy > data to/from userland and don't have access to any userspace mm in > general. > However there is this use_mm(). If we gain a mm by use_mm() we don't > have a matching PKRU value because those are per thread. It has been > suggested to use 0 as the PKRU value but this would bypass PKRU. > > Set the initial (default) PKRU value for kernel threads. We might want to do this for cleanliness reasons... Maybe. But this *should* have no practical effects. Kernel threads have no real 'mm' and no user pages. They should not have do access to user mappings. Protection keys *only* apply to user mappings. Thus, logically, they should never be affected by PKRU values. So I'm kinda missing the point of the patch.
On Fri, Oct 12, 2018 at 10:54 AM Dave Hansen <dave.hansen@linux.intel.com> wrote: > > On 10/04/2018 07:05 AM, Sebastian Andrzej Siewior wrote: > > The PKRU value is not set for kernel threads because they do not have > > the ->initialized value set. As a result the kernel thread has a random > > PKRU value set which it inherits from the previous task. > > It has been suggested by Paolo Bonzini to set it for kernel threads, too > > because it might be a fix. > > I *think* this is not required because the kernel threads don't copy > > data to/from userland and don't have access to any userspace mm in > > general. > > However there is this use_mm(). If we gain a mm by use_mm() we don't > > have a matching PKRU value because those are per thread. It has been > > suggested to use 0 as the PKRU value but this would bypass PKRU. > > > > Set the initial (default) PKRU value for kernel threads. > > We might want to do this for cleanliness reasons... Maybe. > > But this *should* have no practical effects. Kernel threads have no > real 'mm' and no user pages. They should not have do access to user > mappings. Protection keys *only* apply to user mappings. Thus, > logically, they should never be affected by PKRU values. > > So I'm kinda missing the point of the patch. use_mm().
On 2018-10-12 11:02:18 [-0700], Andy Lutomirski wrote: > On Fri, Oct 12, 2018 at 10:54 AM Dave Hansen > <dave.hansen@linux.intel.com> wrote: > > > > On 10/04/2018 07:05 AM, Sebastian Andrzej Siewior wrote: > > > The PKRU value is not set for kernel threads because they do not have > > > the ->initialized value set. As a result the kernel thread has a random > > > PKRU value set which it inherits from the previous task. > > > It has been suggested by Paolo Bonzini to set it for kernel threads, too > > > because it might be a fix. > > > I *think* this is not required because the kernel threads don't copy > > > data to/from userland and don't have access to any userspace mm in > > > general. > > > However there is this use_mm(). If we gain a mm by use_mm() we don't > > > have a matching PKRU value because those are per thread. It has been > > > suggested to use 0 as the PKRU value but this would bypass PKRU. > > > > > > Set the initial (default) PKRU value for kernel threads. > > > > We might want to do this for cleanliness reasons... Maybe. > > > > But this *should* have no practical effects. Kernel threads have no > > real 'mm' and no user pages. They should not have do access to user > > mappings. Protection keys *only* apply to user mappings. Thus, > > logically, they should never be affected by PKRU values. > > > > So I'm kinda missing the point of the patch. > > use_mm(). So. I would drop that patch from queue. Anyone feels different about it? Sebastian
> On Oct 18, 2018, at 9:26 AM, Sebastian Andrzej Siewior <bigeasy@linutronix.de> wrote: > >> On 2018-10-12 11:02:18 [-0700], Andy Lutomirski wrote: >> On Fri, Oct 12, 2018 at 10:54 AM Dave Hansen >> <dave.hansen@linux.intel.com> wrote: >>> >>>> On 10/04/2018 07:05 AM, Sebastian Andrzej Siewior wrote: >>>> The PKRU value is not set for kernel threads because they do not have >>>> the ->initialized value set. As a result the kernel thread has a random >>>> PKRU value set which it inherits from the previous task. >>>> It has been suggested by Paolo Bonzini to set it for kernel threads, too >>>> because it might be a fix. >>>> I *think* this is not required because the kernel threads don't copy >>>> data to/from userland and don't have access to any userspace mm in >>>> general. >>>> However there is this use_mm(). If we gain a mm by use_mm() we don't >>>> have a matching PKRU value because those are per thread. It has been >>>> suggested to use 0 as the PKRU value but this would bypass PKRU. >>>> >>>> Set the initial (default) PKRU value for kernel threads. >>> >>> We might want to do this for cleanliness reasons... Maybe. >>> >>> But this *should* have no practical effects. Kernel threads have no >>> real 'mm' and no user pages. They should not have do access to user >>> mappings. Protection keys *only* apply to user mappings. Thus, >>> logically, they should never be affected by PKRU values. >>> >>> So I'm kinda missing the point of the patch. >> >> use_mm(). > > So. I would drop that patch from queue. Anyone feels different about it? > I think we *do* want the patch. It’s a bugfix for use_mm users, right? > Sebastian
On 10/18/2018 09:48 AM, Andy Lutomirski wrote: >>>> We might want to do this for cleanliness reasons... Maybe. >>>> >>>> But this *should* have no practical effects. Kernel threads have no >>>> real 'mm' and no user pages. They should not have do access to user >>>> mappings. Protection keys *only* apply to user mappings. Thus, >>>> logically, they should never be affected by PKRU values. >>>> >>>> So I'm kinda missing the point of the patch. >>> use_mm(). >> So. I would drop that patch from queue. Anyone feels different about it? >> > I think we *do* want the patch. It’s a bugfix for use_mm users, right? Yes, we need it. I was being dense and Andy kindly reminded me of the point of the patch.
On 2018-10-18 09:48:24 [-0700], Andy Lutomirski wrote: > > On Oct 18, 2018, at 9:26 AM, Sebastian Andrzej Siewior <bigeasy@linutronix.de> wrote: > >> On 2018-10-12 11:02:18 [-0700], Andy Lutomirski wrote: > >> On Fri, Oct 12, 2018 at 10:54 AM Dave Hansen > >>> So I'm kinda missing the point of the patch. > >> > >> use_mm(). > > > > So. I would drop that patch from queue. Anyone feels different about it? > > > > I think we *do* want the patch. It’s a bugfix for use_mm users, right? This is the loophole that has been pointed out. I am not convinced what the correct behaviour should be here (and we have five users of that interface). For instance f_fs[0]. It reads data from the USB EP and then writes it to userland task. Due to $circumstances it happens in a workqueue instead of the task's context. So it borrows the mm with use_mm(). The current behaviour random because the PKRU value can not be predicted. It may or may not work. Setting it to allow-all/none would let the operation always fail or succeed which might be an improvement in terms of debugging. However it is hard to judge what the correct behaviour should be. Should fail or succeed. But this is not the only loophole: There is ptrace interface which is used by gdb (just checked) and also bypasses PKRU. So… [0] drivers/usb/gadget/function/f_fs.c::ffs_user_copy_worker() Sebastian
On Thu, Oct 18, 2018 at 11:25 AM Sebastian Andrzej Siewior <bigeasy@linutronix.de> wrote: > > On 2018-10-18 09:48:24 [-0700], Andy Lutomirski wrote: > > > On Oct 18, 2018, at 9:26 AM, Sebastian Andrzej Siewior <bigeasy@linutronix.de> wrote: > > >> On 2018-10-12 11:02:18 [-0700], Andy Lutomirski wrote: > > >> On Fri, Oct 12, 2018 at 10:54 AM Dave Hansen > > >>> So I'm kinda missing the point of the patch. > > >> > > >> use_mm(). > > > > > > So. I would drop that patch from queue. Anyone feels different about it? > > > > > > > I think we *do* want the patch. It’s a bugfix for use_mm users, right? > > This is the loophole that has been pointed out. I am not convinced what > the correct behaviour should be here (and we have five users of that > interface). For instance f_fs[0]. It reads data from the USB EP and > then writes it to userland task. Due to $circumstances it happens in a > workqueue instead of the task's context. So it borrows the mm with > use_mm(). The current behaviour random because the PKRU value can not > be predicted. It may or may not work. > > Setting it to allow-all/none would let the operation always fail or > succeed which might be an improvement in terms of debugging. However it > is hard to judge what the correct behaviour should be. Should fail or > succeed. > But this is not the only loophole: There is ptrace interface which is > used by gdb (just checked) and also bypasses PKRU. So… > > [0] drivers/usb/gadget/function/f_fs.c::ffs_user_copy_worker() > > Sebastian I think we need an entirely new API: user_mm_ctx_t ctx = user_mm_ctx_get(); ... use_user_mm_ctx(ctx); unuse_user_mm_ctx(ctx); ... user_mm_ctx_put(ctx); and ctx will store a copy of mm and PKRU.
On 10/18/2018 01:46 PM, Andy Lutomirski wrote: > Setting it to allow-all/none would let the operation always fail or > succeed which might be an improvement in terms of debugging. However it > is hard to judge what the correct behaviour should be. Should fail or > succeed. Succeed. :) > But this is not the only loophole: There is ptrace interface which is > used by gdb (just checked) and also bypasses PKRU. So… Bypassing protection keys is not a big deal IMNHO. In places where a sane one is not readily available, I'm totally fine with just effectively disabling it (PKRU=0) for the length of time it isn't available.
On 2018-10-18 13:56:24 [-0700], Dave Hansen wrote: > > But this is not the only loophole: There is ptrace interface which is > > used by gdb (just checked) and also bypasses PKRU. So… > > Bypassing protection keys is not a big deal IMNHO. In places where a > sane one is not readily available, I'm totally fine with just > effectively disabling it (PKRU=0) for the length of time it isn't available. Okay, this makes things easier. Let document that for kernel threads we use PKRU=0. This should be happening in my try right now. I double check tomorrow just in case… Sebastian
> On Oct 18, 2018, at 2:24 PM, Sebastian Andrzej Siewior <bigeasy@linutronix.de> wrote: > > On 2018-10-18 13:56:24 [-0700], Dave Hansen wrote: >>> But this is not the only loophole: There is ptrace interface which is >>> used by gdb (just checked) and also bypasses PKRU. So… >> >> Bypassing protection keys is not a big deal IMNHO. In places where a >> sane one is not readily available, I'm totally fine with just >> effectively disabling it (PKRU=0) for the length of time it isn't available. > > Okay, this makes things easier. Let document that for kernel threads we > use PKRU=0. This should be happening in my try right now. I double check > tomorrow just in case… > > If you document that, please at least document that it’s a bug and not intended behavior.
On 18/10/2018 22:46, Andy Lutomirski wrote: >> [0] drivers/usb/gadget/function/f_fs.c::ffs_user_copy_worker() >> >> Sebastian > I think we need an entirely new API: > > user_mm_ctx_t ctx = user_mm_ctx_get(); > > ... > > use_user_mm_ctx(ctx); > unuse_user_mm_ctx(ctx); > > ... > > user_mm_ctx_put(ctx); > > and ctx will store a copy of mm and PKRU. > That looks like a good API in general. The ffs_user_copy_worker that Sebastian mentioned seems to be used by AIO, in which case of course it has to happen in a kernel thread. But while the API is good, deciding on the desired semantics is "interesting". The submitting thread might be changing PKRU between the time the I/O operation is submitted and the time it is completed, for example. You could look up the task's PKRU at use_mm time, except that the task might have disappeared... In the end just using PKRU=0 makes some sense and it only should be documented that some kernel services will ignore PKRU. Paolo
> On Oct 19, 2018, at 12:44 AM, Paolo Bonzini <pbonzini@redhat.com> wrote: > > On 18/10/2018 22:46, Andy Lutomirski wrote: >>> [0] drivers/usb/gadget/function/f_fs.c::ffs_user_copy_worker() >>> >>> Sebastian >> I think we need an entirely new API: >> >> user_mm_ctx_t ctx = user_mm_ctx_get(); >> >> ... >> >> use_user_mm_ctx(ctx); >> unuse_user_mm_ctx(ctx); >> >> ... >> >> user_mm_ctx_put(ctx); >> >> and ctx will store a copy of mm and PKRU. >> > > That looks like a good API in general. The ffs_user_copy_worker that > Sebastian mentioned seems to be used by AIO, in which case of course it > has to happen in a kernel thread. > > But while the API is good, deciding on the desired semantics is > "interesting". The submitting thread might be changing PKRU between the > time the I/O operation is submitted and the time it is completed, for > example. I think there’s only one sensible answer: capture PKRU at the time of submission.
On 10/19/2018 09:59 AM, Andy Lutomirski wrote: >> That looks like a good API in general. The ffs_user_copy_worker that >> Sebastian mentioned seems to be used by AIO, in which case of course it >> has to happen in a kernel thread. >> >> But while the API is good, deciding on the desired semantics is >> "interesting". The submitting thread might be changing PKRU between the >> time the I/O operation is submitted and the time it is completed, for >> example. > I think there’s only one sensible answer: capture PKRU at the time of submission. I think it's much more straightforward to just not enforce pkeys. Having this "phantom" value could cause a very odd, nearly undebuggable I/O failure.
> On Oct 19, 2018, at 10:01 AM, Dave Hansen <dave.hansen@linux.intel.com> wrote: > > On 10/19/2018 09:59 AM, Andy Lutomirski wrote: >>> That looks like a good API in general. The ffs_user_copy_worker that >>> Sebastian mentioned seems to be used by AIO, in which case of course it >>> has to happen in a kernel thread. >>> >>> But while the API is good, deciding on the desired semantics is >>> "interesting". The submitting thread might be changing PKRU between the >>> time the I/O operation is submitted and the time it is completed, for >>> example. >> I think there’s only one sensible answer: capture PKRU at the time of submission. > > I think it's much more straightforward to just not enforce pkeys. > Having this "phantom" value could cause a very odd, nearly undebuggable > I/O failure. But now we have the reverse. The IO can work if it’s truly async but, if the kernel decides to synchronously complete IO (with GUP or copy_to_user), it’ll fail, right. This isn’t exactly friendly either.
On 10/19/2018 10:37 AM, Andy Lutomirski wrote: >> I think it's much more straightforward to just not enforce pkeys. >> Having this "phantom" value could cause a very odd, nearly >> undebuggable I/O failure. > But now we have the reverse. The IO can work if it’s truly async but, > if the kernel decides to synchronously complete IO (with GUP or > copy_to_user), it’ll fail, right. This isn’t exactly friendly > either. Yeah, but a synchronous I/O failure is really straightforward to debug because you get an immediate error message about it. This is certainly not the weirdest behavior or asymmetry that we would see from synchronous vs. async I/O.
diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h index 956d967ca824a..4ecaf4d22954e 100644 --- a/arch/x86/include/asm/fpu/internal.h +++ b/arch/x86/include/asm/fpu/internal.h @@ -14,6 +14,7 @@ #include <linux/compat.h> #include <linux/sched.h> #include <linux/slab.h> +#include <linux/pkeys.h> #include <asm/user.h> #include <asm/fpu/api.h> @@ -573,20 +574,23 @@ static inline void switch_fpu_finish(struct fpu *new_fpu, int cpu) bool load_fpu; load_fpu = static_cpu_has(X86_FEATURE_FPU) && new_fpu->initialized; - if (!load_fpu) - return; - - __fpregs_load_activate(new_fpu, cpu); - #ifdef CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS if (static_cpu_has(X86_FEATURE_OSPKE)) { struct pkru_state *pk; - pk = __raw_xsave_addr(&new_fpu->state.xsave, XFEATURE_PKRU); - if (pk->pkru != __read_pkru()) - __write_pkru(pk->pkru); + if (!load_fpu) { + pkru_set_init_value(); + } else { + pk = __raw_xsave_addr(&new_fpu->state.xsave, + XFEATURE_PKRU); + if (pk->pkru != __read_pkru()) + __write_pkru(pk->pkru); + } } #endif + if (!load_fpu) + return; + __fpregs_load_activate(new_fpu, cpu); } /*
The PKRU value is not set for kernel threads because they do not have the ->initialized value set. As a result the kernel thread has a random PKRU value set which it inherits from the previous task. It has been suggested by Paolo Bonzini to set it for kernel threads, too because it might be a fix. I *think* this is not required because the kernel threads don't copy data to/from userland and don't have access to any userspace mm in general. However there is this use_mm(). If we gain a mm by use_mm() we don't have a matching PKRU value because those are per thread. It has been suggested to use 0 as the PKRU value but this would bypass PKRU. Set the initial (default) PKRU value for kernel threads. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> --- arch/x86/include/asm/fpu/internal.h | 20 ++++++++++++-------- 1 file changed, 12 insertions(+), 8 deletions(-)