diff mbox series

[05/11] x86/fpu: set PKRU state for kernel threads

Message ID 20181004140547.13014-6-bigeasy@linutronix.de (mailing list archive)
State New, archived
Headers show
Series x86: load FPU registers on return to userland | expand

Commit Message

Sebastian Andrzej Siewior Oct. 4, 2018, 2:05 p.m. UTC
The PKRU value is not set for kernel threads because they do not have
the ->initialized value set. As a result the kernel thread has a random
PKRU value set which it inherits from the previous task.
It has been suggested by Paolo Bonzini to set it for kernel threads, too
because it might be a fix.
I *think* this is not required because the kernel threads don't copy
data to/from userland and don't have access to any userspace mm in
general.
However there is this use_mm(). If we gain a mm by use_mm() we don't
have a matching PKRU value because those are per thread. It has been
suggested to use 0 as the PKRU value but this would bypass PKRU.

Set the initial (default) PKRU value for kernel threads.

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 arch/x86/include/asm/fpu/internal.h | 20 ++++++++++++--------
 1 file changed, 12 insertions(+), 8 deletions(-)

Comments

Dave Hansen Oct. 12, 2018, 5:54 p.m. UTC | #1
On 10/04/2018 07:05 AM, Sebastian Andrzej Siewior wrote:
> The PKRU value is not set for kernel threads because they do not have
> the ->initialized value set. As a result the kernel thread has a random
> PKRU value set which it inherits from the previous task.
> It has been suggested by Paolo Bonzini to set it for kernel threads, too
> because it might be a fix.
> I *think* this is not required because the kernel threads don't copy
> data to/from userland and don't have access to any userspace mm in
> general.
> However there is this use_mm(). If we gain a mm by use_mm() we don't
> have a matching PKRU value because those are per thread. It has been
> suggested to use 0 as the PKRU value but this would bypass PKRU.
> 
> Set the initial (default) PKRU value for kernel threads.

We might want to do this for cleanliness reasons...  Maybe.

But this *should* have no practical effects.  Kernel threads have no
real 'mm' and no user pages.  They should not have do access to user
mappings.  Protection keys *only* apply to user mappings.  Thus,
logically, they should never be affected by PKRU values.

So I'm kinda missing the point of the patch.
Andy Lutomirski Oct. 12, 2018, 6:02 p.m. UTC | #2
On Fri, Oct 12, 2018 at 10:54 AM Dave Hansen
<dave.hansen@linux.intel.com> wrote:
>
> On 10/04/2018 07:05 AM, Sebastian Andrzej Siewior wrote:
> > The PKRU value is not set for kernel threads because they do not have
> > the ->initialized value set. As a result the kernel thread has a random
> > PKRU value set which it inherits from the previous task.
> > It has been suggested by Paolo Bonzini to set it for kernel threads, too
> > because it might be a fix.
> > I *think* this is not required because the kernel threads don't copy
> > data to/from userland and don't have access to any userspace mm in
> > general.
> > However there is this use_mm(). If we gain a mm by use_mm() we don't
> > have a matching PKRU value because those are per thread. It has been
> > suggested to use 0 as the PKRU value but this would bypass PKRU.
> >
> > Set the initial (default) PKRU value for kernel threads.
>
> We might want to do this for cleanliness reasons...  Maybe.
>
> But this *should* have no practical effects.  Kernel threads have no
> real 'mm' and no user pages.  They should not have do access to user
> mappings.  Protection keys *only* apply to user mappings.  Thus,
> logically, they should never be affected by PKRU values.
>
> So I'm kinda missing the point of the patch.

use_mm().
Sebastian Andrzej Siewior Oct. 18, 2018, 4:26 p.m. UTC | #3
On 2018-10-12 11:02:18 [-0700], Andy Lutomirski wrote:
> On Fri, Oct 12, 2018 at 10:54 AM Dave Hansen
> <dave.hansen@linux.intel.com> wrote:
> >
> > On 10/04/2018 07:05 AM, Sebastian Andrzej Siewior wrote:
> > > The PKRU value is not set for kernel threads because they do not have
> > > the ->initialized value set. As a result the kernel thread has a random
> > > PKRU value set which it inherits from the previous task.
> > > It has been suggested by Paolo Bonzini to set it for kernel threads, too
> > > because it might be a fix.
> > > I *think* this is not required because the kernel threads don't copy
> > > data to/from userland and don't have access to any userspace mm in
> > > general.
> > > However there is this use_mm(). If we gain a mm by use_mm() we don't
> > > have a matching PKRU value because those are per thread. It has been
> > > suggested to use 0 as the PKRU value but this would bypass PKRU.
> > >
> > > Set the initial (default) PKRU value for kernel threads.
> >
> > We might want to do this for cleanliness reasons...  Maybe.
> >
> > But this *should* have no practical effects.  Kernel threads have no
> > real 'mm' and no user pages.  They should not have do access to user
> > mappings.  Protection keys *only* apply to user mappings.  Thus,
> > logically, they should never be affected by PKRU values.
> >
> > So I'm kinda missing the point of the patch.
> 
> use_mm().

So. I would drop that patch from queue. Anyone feels different about it?

Sebastian
Andy Lutomirski Oct. 18, 2018, 4:48 p.m. UTC | #4
> On Oct 18, 2018, at 9:26 AM, Sebastian Andrzej Siewior <bigeasy@linutronix.de> wrote:
> 
>> On 2018-10-12 11:02:18 [-0700], Andy Lutomirski wrote:
>> On Fri, Oct 12, 2018 at 10:54 AM Dave Hansen
>> <dave.hansen@linux.intel.com> wrote:
>>> 
>>>> On 10/04/2018 07:05 AM, Sebastian Andrzej Siewior wrote:
>>>> The PKRU value is not set for kernel threads because they do not have
>>>> the ->initialized value set. As a result the kernel thread has a random
>>>> PKRU value set which it inherits from the previous task.
>>>> It has been suggested by Paolo Bonzini to set it for kernel threads, too
>>>> because it might be a fix.
>>>> I *think* this is not required because the kernel threads don't copy
>>>> data to/from userland and don't have access to any userspace mm in
>>>> general.
>>>> However there is this use_mm(). If we gain a mm by use_mm() we don't
>>>> have a matching PKRU value because those are per thread. It has been
>>>> suggested to use 0 as the PKRU value but this would bypass PKRU.
>>>> 
>>>> Set the initial (default) PKRU value for kernel threads.
>>> 
>>> We might want to do this for cleanliness reasons...  Maybe.
>>> 
>>> But this *should* have no practical effects.  Kernel threads have no
>>> real 'mm' and no user pages.  They should not have do access to user
>>> mappings.  Protection keys *only* apply to user mappings.  Thus,
>>> logically, they should never be affected by PKRU values.
>>> 
>>> So I'm kinda missing the point of the patch.
>> 
>> use_mm().
> 
> So. I would drop that patch from queue. Anyone feels different about it?
> 

I think we *do* want the patch. It’s a bugfix for use_mm users, right?

> Sebastian
Dave Hansen Oct. 18, 2018, 5:47 p.m. UTC | #5
On 10/18/2018 09:48 AM, Andy Lutomirski wrote:
>>>> We might want to do this for cleanliness reasons...  Maybe.
>>>>
>>>> But this *should* have no practical effects.  Kernel threads have no
>>>> real 'mm' and no user pages.  They should not have do access to user
>>>> mappings.  Protection keys *only* apply to user mappings.  Thus,
>>>> logically, they should never be affected by PKRU values.
>>>>
>>>> So I'm kinda missing the point of the patch.
>>> use_mm().
>> So. I would drop that patch from queue. Anyone feels different about it?
>>
> I think we *do* want the patch. It’s a bugfix for use_mm users, right?

Yes, we need it.  I was being dense and Andy kindly reminded me of the
point of the patch.
Sebastian Andrzej Siewior Oct. 18, 2018, 6:25 p.m. UTC | #6
On 2018-10-18 09:48:24 [-0700], Andy Lutomirski wrote:
> > On Oct 18, 2018, at 9:26 AM, Sebastian Andrzej Siewior <bigeasy@linutronix.de> wrote:
> >> On 2018-10-12 11:02:18 [-0700], Andy Lutomirski wrote:
> >> On Fri, Oct 12, 2018 at 10:54 AM Dave Hansen
> >>> So I'm kinda missing the point of the patch.
> >> 
> >> use_mm().
> > 
> > So. I would drop that patch from queue. Anyone feels different about it?
> > 
> 
> I think we *do* want the patch. It’s a bugfix for use_mm users, right?

This is the loophole that has been pointed out. I am not convinced what
the correct behaviour should be here (and we have five users of that
interface). For instance f_fs[0].  It reads data from the USB EP and
then writes it to userland task. Due to $circumstances it happens in a
workqueue instead of the task's context.  So it borrows the mm with
use_mm().  The current behaviour random because the PKRU value can not
be predicted. It may or may not work.

Setting it to allow-all/none would let the operation always fail or
succeed which might be an improvement in terms of debugging. However it
is hard to judge what the correct behaviour should be. Should fail or
succeed.
But this is not the only loophole: There is ptrace interface which is
used by gdb (just checked) and also bypasses PKRU. So…

[0] drivers/usb/gadget/function/f_fs.c::ffs_user_copy_worker()

Sebastian
Andy Lutomirski Oct. 18, 2018, 8:46 p.m. UTC | #7
On Thu, Oct 18, 2018 at 11:25 AM Sebastian Andrzej Siewior
<bigeasy@linutronix.de> wrote:
>
> On 2018-10-18 09:48:24 [-0700], Andy Lutomirski wrote:
> > > On Oct 18, 2018, at 9:26 AM, Sebastian Andrzej Siewior <bigeasy@linutronix.de> wrote:
> > >> On 2018-10-12 11:02:18 [-0700], Andy Lutomirski wrote:
> > >> On Fri, Oct 12, 2018 at 10:54 AM Dave Hansen
> > >>> So I'm kinda missing the point of the patch.
> > >>
> > >> use_mm().
> > >
> > > So. I would drop that patch from queue. Anyone feels different about it?
> > >
> >
> > I think we *do* want the patch. It’s a bugfix for use_mm users, right?
>
> This is the loophole that has been pointed out. I am not convinced what
> the correct behaviour should be here (and we have five users of that
> interface). For instance f_fs[0].  It reads data from the USB EP and
> then writes it to userland task. Due to $circumstances it happens in a
> workqueue instead of the task's context.  So it borrows the mm with
> use_mm().  The current behaviour random because the PKRU value can not
> be predicted. It may or may not work.
>
> Setting it to allow-all/none would let the operation always fail or
> succeed which might be an improvement in terms of debugging. However it
> is hard to judge what the correct behaviour should be. Should fail or
> succeed.
> But this is not the only loophole: There is ptrace interface which is
> used by gdb (just checked) and also bypasses PKRU. So…
>
> [0] drivers/usb/gadget/function/f_fs.c::ffs_user_copy_worker()
>
> Sebastian

I think we need an entirely new API:

user_mm_ctx_t ctx = user_mm_ctx_get();

...

use_user_mm_ctx(ctx);
unuse_user_mm_ctx(ctx);

...

user_mm_ctx_put(ctx);

and ctx will store a copy of mm and PKRU.
Dave Hansen Oct. 18, 2018, 8:56 p.m. UTC | #8
On 10/18/2018 01:46 PM, Andy Lutomirski wrote:
> Setting it to allow-all/none would let the operation always fail or
> succeed which might be an improvement in terms of debugging. However it
> is hard to judge what the correct behaviour should be. Should fail or
> succeed.

Succeed. :)

> But this is not the only loophole: There is ptrace interface which is
> used by gdb (just checked) and also bypasses PKRU. So…

Bypassing protection keys is not a big deal IMNHO.  In places where a
sane one is not readily available, I'm totally fine with just
effectively disabling it (PKRU=0) for the length of time it isn't available.
Sebastian Andrzej Siewior Oct. 18, 2018, 9:24 p.m. UTC | #9
On 2018-10-18 13:56:24 [-0700], Dave Hansen wrote:
> > But this is not the only loophole: There is ptrace interface which is
> > used by gdb (just checked) and also bypasses PKRU. So…
> 
> Bypassing protection keys is not a big deal IMNHO.  In places where a
> sane one is not readily available, I'm totally fine with just
> effectively disabling it (PKRU=0) for the length of time it isn't available.

Okay, this makes things easier. Let document that for kernel threads we
use PKRU=0. This should be happening in my try right now. I double check
tomorrow just in case…

Sebastian
Andy Lutomirski Oct. 18, 2018, 9:58 p.m. UTC | #10
> On Oct 18, 2018, at 2:24 PM, Sebastian Andrzej Siewior <bigeasy@linutronix.de> wrote:
> 
> On 2018-10-18 13:56:24 [-0700], Dave Hansen wrote:
>>> But this is not the only loophole: There is ptrace interface which is
>>> used by gdb (just checked) and also bypasses PKRU. So…
>> 
>> Bypassing protection keys is not a big deal IMNHO.  In places where a
>> sane one is not readily available, I'm totally fine with just
>> effectively disabling it (PKRU=0) for the length of time it isn't available.
> 
> Okay, this makes things easier. Let document that for kernel threads we
> use PKRU=0. This should be happening in my try right now. I double check
> tomorrow just in case…
> 
> 

If you document that, please at least document that it’s a bug and not intended behavior.
Paolo Bonzini Oct. 19, 2018, 7:44 a.m. UTC | #11
On 18/10/2018 22:46, Andy Lutomirski wrote:
>> [0] drivers/usb/gadget/function/f_fs.c::ffs_user_copy_worker()
>>
>> Sebastian
> I think we need an entirely new API:
> 
> user_mm_ctx_t ctx = user_mm_ctx_get();
> 
> ...
> 
> use_user_mm_ctx(ctx);
> unuse_user_mm_ctx(ctx);
> 
> ...
> 
> user_mm_ctx_put(ctx);
> 
> and ctx will store a copy of mm and PKRU.
> 

That looks like a good API in general.  The ffs_user_copy_worker that
Sebastian mentioned seems to be used by AIO, in which case of course it
has to happen in a kernel thread.

But while the API is good, deciding on the desired semantics is
"interesting".  The submitting thread might be changing PKRU between the
time the I/O operation is submitted and the time it is completed, for
example.  You could look up the task's PKRU at use_mm time, except that
the task might have disappeared...  In the end just using PKRU=0 makes
some sense and it only should be documented that some kernel services
will ignore PKRU.

Paolo
Andy Lutomirski Oct. 19, 2018, 4:59 p.m. UTC | #12
> On Oct 19, 2018, at 12:44 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> 
> On 18/10/2018 22:46, Andy Lutomirski wrote:
>>> [0] drivers/usb/gadget/function/f_fs.c::ffs_user_copy_worker()
>>> 
>>> Sebastian
>> I think we need an entirely new API:
>> 
>> user_mm_ctx_t ctx = user_mm_ctx_get();
>> 
>> ...
>> 
>> use_user_mm_ctx(ctx);
>> unuse_user_mm_ctx(ctx);
>> 
>> ...
>> 
>> user_mm_ctx_put(ctx);
>> 
>> and ctx will store a copy of mm and PKRU.
>> 
> 
> That looks like a good API in general.  The ffs_user_copy_worker that
> Sebastian mentioned seems to be used by AIO, in which case of course it
> has to happen in a kernel thread.
> 
> But while the API is good, deciding on the desired semantics is
> "interesting".  The submitting thread might be changing PKRU between the
> time the I/O operation is submitted and the time it is completed, for
> example.

I think there’s only one sensible answer: capture PKRU at the time of submission.
Dave Hansen Oct. 19, 2018, 5:01 p.m. UTC | #13
On 10/19/2018 09:59 AM, Andy Lutomirski wrote:
>> That looks like a good API in general.  The ffs_user_copy_worker that
>> Sebastian mentioned seems to be used by AIO, in which case of course it
>> has to happen in a kernel thread.
>>
>> But while the API is good, deciding on the desired semantics is
>> "interesting".  The submitting thread might be changing PKRU between the
>> time the I/O operation is submitted and the time it is completed, for
>> example.
> I think there’s only one sensible answer: capture PKRU at the time of submission.

I think it's much more straightforward to just not enforce pkeys.
Having this "phantom" value could cause a very odd, nearly undebuggable
I/O failure.
Andy Lutomirski Oct. 19, 2018, 5:37 p.m. UTC | #14
> On Oct 19, 2018, at 10:01 AM, Dave Hansen <dave.hansen@linux.intel.com> wrote:
> 
> On 10/19/2018 09:59 AM, Andy Lutomirski wrote:
>>> That looks like a good API in general.  The ffs_user_copy_worker that
>>> Sebastian mentioned seems to be used by AIO, in which case of course it
>>> has to happen in a kernel thread.
>>> 
>>> But while the API is good, deciding on the desired semantics is
>>> "interesting".  The submitting thread might be changing PKRU between the
>>> time the I/O operation is submitted and the time it is completed, for
>>> example.
>> I think there’s only one sensible answer: capture PKRU at the time of submission.
> 
> I think it's much more straightforward to just not enforce pkeys.
> Having this "phantom" value could cause a very odd, nearly undebuggable
> I/O failure.

But now we have the reverse. The IO can work if it’s truly async but, if the kernel decides to synchronously complete IO (with GUP or copy_to_user), it’ll fail, right. This isn’t exactly friendly either.
Dave Hansen Oct. 19, 2018, 6:26 p.m. UTC | #15
On 10/19/2018 10:37 AM, Andy Lutomirski wrote:
>> I think it's much more straightforward to just not enforce pkeys. 
>> Having this "phantom" value could cause a very odd, nearly
>> undebuggable I/O failure.
> But now we have the reverse. The IO can work if it’s truly async but,
> if the kernel decides to synchronously complete IO (with GUP or
> copy_to_user), it’ll fail, right. This isn’t exactly friendly
> either.

Yeah, but a synchronous I/O failure is really straightforward to debug
because you get an immediate error message about it.  This is certainly
not the weirdest behavior or asymmetry that we would see from
synchronous vs. async I/O.
diff mbox series

Patch

diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h
index 956d967ca824a..4ecaf4d22954e 100644
--- a/arch/x86/include/asm/fpu/internal.h
+++ b/arch/x86/include/asm/fpu/internal.h
@@ -14,6 +14,7 @@ 
 #include <linux/compat.h>
 #include <linux/sched.h>
 #include <linux/slab.h>
+#include <linux/pkeys.h>
 
 #include <asm/user.h>
 #include <asm/fpu/api.h>
@@ -573,20 +574,23 @@  static inline void switch_fpu_finish(struct fpu *new_fpu, int cpu)
 	bool load_fpu;
 
 	load_fpu = static_cpu_has(X86_FEATURE_FPU) && new_fpu->initialized;
-	if (!load_fpu)
-		return;
-
-	__fpregs_load_activate(new_fpu, cpu);
-
 #ifdef CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS
 	if (static_cpu_has(X86_FEATURE_OSPKE)) {
 		struct pkru_state *pk;
 
-		pk = __raw_xsave_addr(&new_fpu->state.xsave, XFEATURE_PKRU);
-		if (pk->pkru != __read_pkru())
-			__write_pkru(pk->pkru);
+		if (!load_fpu) {
+			pkru_set_init_value();
+		} else {
+			pk = __raw_xsave_addr(&new_fpu->state.xsave,
+					      XFEATURE_PKRU);
+			if (pk->pkru != __read_pkru())
+				__write_pkru(pk->pkru);
+		}
 	}
 #endif
+	if (!load_fpu)
+		return;
+	__fpregs_load_activate(new_fpu, cpu);
 }
 
 /*