diff mbox

[v4.1,02/10] asm/nospec, array_ptr: sanitize speculative array de-references

Message ID 151648236968.34747.390507912161115784.stgit@dwillia2-desk3.amr.corp.intel.com (mailing list archive)
State New, archived
Headers show

Commit Message

Dan Williams Jan. 20, 2018, 9:06 p.m. UTC
'array_ptr' is proposed as a generic mechanism to mitigate against
Spectre-variant-1 attacks, i.e. an attack that bypasses boundary checks
via speculative execution). The 'array_ptr' implementation is expected
to be safe for current generation cpus across multiple architectures
(ARM, x86).

Based on an original implementation by Linus Torvalds, tweaked to remove
speculative flows by Alexei Starovoitov, and tweaked again by Linus to
introduce an x86 assembly implementation for the mask generation.

Co-developed-by: Linus Torvalds <torvalds@linux-foundation.org>
Co-developed-by: Alexei Starovoitov <ast@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 include/linux/nospec.h |   44 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 44 insertions(+)
 create mode 100644 include/linux/nospec.h

Comments

Russell King (Oracle) Jan. 21, 2018, 10:40 a.m. UTC | #1
On Sat, Jan 20, 2018 at 01:06:09PM -0800, Dan Williams wrote:
> +/*
> + * If idx is negative or if idx > size then bit 63 is set in the mask,
> + * and the value of ~(-1L) is zero. When the mask is zero, bounds check
> + * failed, array_ptr will return NULL.

The more times I see this the more times I'm unhappy with this comment:

1. does this really mean "idx > size" or "idx >= size" ?  The code
   implements the latter not the former.

2. is "bit 63" relevant here - what if longs are 32-bit?  "the top bit"
   or "the sign bit" would be better.

3. "and the value of ~(-1L) is zero."  So does this mean that when
   0 <= idx < size, somehow the rules of logic change and ~(-1L)
   magically becomes no longer zero!

I'd suggest changing the description to something like:

  * If 0 <= idx < size, return a mask of ~0UL, otherwise return zero.

or:

  * When idx is out of bounds (iow, is negative or idx >= sz), the sign
  * bit will be set. Extend the sign bit to all bits and invert, giving
  * a result of zero for an out of bounds idx, or ~0UL if within bounds.

depending on how deeply you want to describe what's going on here.
Jann Horn Jan. 21, 2018, 3:07 p.m. UTC | #2
On Sun, Jan 21, 2018, 11:40 Russell King - ARM Linux <linux@armlinux.org.uk>
wrote:

> On Sat, Jan 20, 2018 at 01:06:09PM -0800, Dan Williams wrote:
> > +/*
> > + * If idx is negative or if idx > size then bit 63 is set in the mask,
> > + * and the value of ~(-1L) is zero. When the mask is zero, bounds check
> > + * failed, array_ptr will return NULL.
>
> The more times I see this the more times I'm unhappy with this comment:
>
> 1. does this really mean "idx > size" or "idx >= size" ?  The code
>    implements the latter not the former.
>

Copying the code here for context:
return ~(long)(idx | (sz - 1 - idx)) >> (BITS_PER_LONG - 1);

That part of the condition (ignoring the overflow edgecases) is equivalent
to "!(idx > sz - 1)", which is equivalent to "idx <= sz - 1", which is
(ignoring overflow edgecases) equivalent to "idx < sz".

Handling of edgecases:
idx>=2^(BITS_PER_LONG-1) will cause a NULL return through the first part of
the condition.
Hmm... a problematic case might be "sz==0 && idx==2^(BITS_PER_LONG-1)-1".
The first part of the expression wouldn't trigger, the second part would be
"2^(BITS_PER_LONG)-1-(2^(BITS_PER_LONG-1)-1) ==
2^(BITS_PER_LONG)-2^(BITS_PER_LONG-1)
== 2^(BITS_PER_LONG-1)", which also wouldn't trigger, I think?

2. is "bit 63" relevant here - what if longs are 32-bit?  "the top bit"
>    or "the sign bit" would be better.
>
> 3. "and the value of ~(-1L) is zero."  So does this mean that when
>    0 <= idx < size, somehow the rules of logic change and ~(-1L)
>    magically becomes no longer zero!
>
> I'd suggest changing the description to something like:
>
>   * If 0 <= idx < size, return a mask of ~0UL, otherwise return zero.
>
> or:
>
>   * When idx is out of bounds (iow, is negative or idx >= sz), the sign
>   * bit will be set. Extend the sign bit to all bits and invert, giving
>   * a result of zero for an out of bounds idx, or ~0UL if within bounds.
>
> depending on how deeply you want to describe what's going on here.
>
> --
> RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
> FTTC broadband for 0.8mile line in suburbia: sync at 8.8Mbps down 630kbps
> up
> According to speedtest.net: 8.21Mbps down 510kbps up
>
Jann Horn Jan. 21, 2018, 8:15 p.m. UTC | #3
[whoops, resending as non-HTML mail]

On Sun, Jan 21, 2018 at 11:40 AM, Russell King - ARM Linux
<linux@armlinux.org.uk> wrote:
> On Sat, Jan 20, 2018 at 01:06:09PM -0800, Dan Williams wrote:
>> +/*
>> + * If idx is negative or if idx > size then bit 63 is set in the mask,
>> + * and the value of ~(-1L) is zero. When the mask is zero, bounds check
>> + * failed, array_ptr will return NULL.
>
> The more times I see this the more times I'm unhappy with this comment:
>
> 1. does this really mean "idx > size" or "idx >= size" ?  The code
>    implements the latter not the former.

Copying the code here for context:
return ~(long)(idx | (sz - 1 - idx)) >> (BITS_PER_LONG - 1);

That part of the condition (ignoring the overflow edgecases) is
equivalent to "!(idx > sz - 1)", which is equivalent to "idx <= sz -
1", which is (ignoring overflow edgecases) equivalent to "idx < sz".

Handling of edgecases:
idx>=2^(BITS_PER_LONG-1) will cause a NULL return through the first
part of the condition.
Hmm... a problematic case might be "sz==0 &&
idx==2^(BITS_PER_LONG-1)-1". The first part of the expression wouldn't
trigger, the second part would be
"2^(BITS_PER_LONG)-1-(2^(BITS_PER_LONG-1)-1) ==
2^(BITS_PER_LONG)-2^(BITS_PER_LONG-1) == 2^(BITS_PER_LONG-1)", which
also wouldn't trigger, I think?
Jann Horn Jan. 21, 2018, 8:24 p.m. UTC | #4
On Sun, Jan 21, 2018 at 9:15 PM, Jann Horn <jannh@google.com> wrote:
> [whoops, resending as non-HTML mail]
>
> On Sun, Jan 21, 2018 at 11:40 AM, Russell King - ARM Linux
> <linux@armlinux.org.uk> wrote:
>> On Sat, Jan 20, 2018 at 01:06:09PM -0800, Dan Williams wrote:
>>> +/*
>>> + * If idx is negative or if idx > size then bit 63 is set in the mask,
>>> + * and the value of ~(-1L) is zero. When the mask is zero, bounds check
>>> + * failed, array_ptr will return NULL.
>>
>> The more times I see this the more times I'm unhappy with this comment:
>>
>> 1. does this really mean "idx > size" or "idx >= size" ?  The code
>>    implements the latter not the former.
>
> Copying the code here for context:
> return ~(long)(idx | (sz - 1 - idx)) >> (BITS_PER_LONG - 1);
>
> That part of the condition (ignoring the overflow edgecases) is
> equivalent to "!(idx > sz - 1)", which is equivalent to "idx <= sz -
> 1", which is (ignoring overflow edgecases) equivalent to "idx < sz".
>
> Handling of edgecases:
> idx>=2^(BITS_PER_LONG-1) will cause a NULL return through the first
> part of the condition.
> Hmm... a problematic case might be "sz==0 &&
> idx==2^(BITS_PER_LONG-1)-1". The first part of the expression wouldn't
> trigger, the second part would be
> "2^(BITS_PER_LONG)-1-(2^(BITS_PER_LONG-1)-1) ==
> 2^(BITS_PER_LONG)-2^(BITS_PER_LONG-1) == 2^(BITS_PER_LONG-1)", which
> also wouldn't trigger, I think?

Er, of course 2^(BITS_PER_LONG-1) still has the high bit set.
Sorry for the noise.
Dan Williams Jan. 22, 2018, 6:07 p.m. UTC | #5
On Sun, Jan 21, 2018 at 2:40 AM, Russell King - ARM Linux
<linux@armlinux.org.uk> wrote:
> On Sat, Jan 20, 2018 at 01:06:09PM -0800, Dan Williams wrote:
>> +/*
>> + * If idx is negative or if idx > size then bit 63 is set in the mask,
>> + * and the value of ~(-1L) is zero. When the mask is zero, bounds check
>> + * failed, array_ptr will return NULL.
>
> The more times I see this the more times I'm unhappy with this comment:
>
> 1. does this really mean "idx > size" or "idx >= size" ?  The code
>    implements the latter not the former.
>
> 2. is "bit 63" relevant here - what if longs are 32-bit?  "the top bit"
>    or "the sign bit" would be better.
>
> 3. "and the value of ~(-1L) is zero."  So does this mean that when
>    0 <= idx < size, somehow the rules of logic change and ~(-1L)
>    magically becomes no longer zero!
>
> I'd suggest changing the description to something like:
>
>   * If 0 <= idx < size, return a mask of ~0UL, otherwise return zero.
>
> or:
>
>   * When idx is out of bounds (iow, is negative or idx >= sz), the sign
>   * bit will be set. Extend the sign bit to all bits and invert, giving
>   * a result of zero for an out of bounds idx, or ~0UL if within bounds.
>
> depending on how deeply you want to describe what's going on here.

Sounds good Russell, I'll fold the longer description in for the next
version to save the next person needing to decode this technique.
Linus Torvalds Jan. 22, 2018, 6:37 p.m. UTC | #6
On Mon, Jan 22, 2018 at 10:07 AM, Dan Williams <dan.j.williams@intel.com> wrote:
> On Sun, Jan 21, 2018 at 2:40 AM, Russell King - ARM Linux
> <linux@armlinux.org.uk> wrote:
>>
>> I'd suggest changing the description to something like:
>>
>>   * If 0 <= idx < size, return a mask of ~0UL, otherwise return zero.

Yes. HOWEVER.

We should make it clear that the signedness of the comparisons is
undefined, and that 'size' should always be positive for the above to
be well-specified.

Why?

Because one valid implementation of the above is:

   idx U< size ? ~0UL : 0

as a single unsigned comparison (I'm using "U<" as a "unsigned less than").

But an equally valid implementation of it might be

   idx S>= 0 && idx S< size ? ~0UL : 0

which does two signed comparisons instead.

And they are not exactly the same. They are the same _IFF_ 'size' is
positive in 'long'. But if size is an unsigned value so big as to be
negative in 'long', they give different results for 'idx'.

In fact, the ALU-only version actually did a third version of the
above that only uses "signed comparison against zero":

   idx S> 0 && (size - idx - 1) S>= 0 ? ~0UL : 0

which again is equivalent only when 'size' is positive in 'long'.

Note that 'idx' itself can have any range (that's kind of the _point_,
after all: checking that idx is in some range). But the logic only
works for a positive array size.

Which honestly is not really a limitation, but it's worth spelling out, I think.

In particular, if you have a user pointer, and a 3G:1G split in
user:kernel address space, you camn *not* do something like

    uptr = array_ptr(NULL, userptr, TASK_SIZE);

to get a sanitized user pointer, because TASK_SIZE is not positive in 'long'.

Hmm?

                     Linus
Dan Williams Jan. 22, 2018, 6:52 p.m. UTC | #7
On Mon, Jan 22, 2018 at 10:37 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Mon, Jan 22, 2018 at 10:07 AM, Dan Williams <dan.j.williams@intel.com> wrote:
>> On Sun, Jan 21, 2018 at 2:40 AM, Russell King - ARM Linux
>> <linux@armlinux.org.uk> wrote:
>>>
>>> I'd suggest changing the description to something like:
>>>
>>>   * If 0 <= idx < size, return a mask of ~0UL, otherwise return zero.
>
> Yes. HOWEVER.
>
> We should make it clear that the signedness of the comparisons is
> undefined, and that 'size' should always be positive for the above to
> be well-specified.
>
> Why?
>
> Because one valid implementation of the above is:
>
>    idx U< size ? ~0UL : 0
>
> as a single unsigned comparison (I'm using "U<" as a "unsigned less than").
>
> But an equally valid implementation of it might be
>
>    idx S>= 0 && idx S< size ? ~0UL : 0
>
> which does two signed comparisons instead.
>
> And they are not exactly the same. They are the same _IFF_ 'size' is
> positive in 'long'. But if size is an unsigned value so big as to be
> negative in 'long', they give different results for 'idx'.
>
> In fact, the ALU-only version actually did a third version of the
> above that only uses "signed comparison against zero":
>
>    idx S> 0 && (size - idx - 1) S>= 0 ? ~0UL : 0
>
> which again is equivalent only when 'size' is positive in 'long'.
>
> Note that 'idx' itself can have any range (that's kind of the _point_,
> after all: checking that idx is in some range). But the logic only
> works for a positive array size.
>
> Which honestly is not really a limitation, but it's worth spelling out, I think.
>
> In particular, if you have a user pointer, and a 3G:1G split in
> user:kernel address space, you camn *not* do something like
>
>     uptr = array_ptr(NULL, userptr, TASK_SIZE);
>
> to get a sanitized user pointer, because TASK_SIZE is not positive in 'long'.
>
> Hmm?

We could at least have a BUILD_BUG_ON when 'size' is a
builtin_contstant and greater than LONG_MAX. Alternatively we could
require archs to provide the equivalent of the x86 array_ptr_mask()
that does not have the LONG_MAX limitation?
Will Deacon Jan. 23, 2018, 10:17 a.m. UTC | #8
On Mon, Jan 22, 2018 at 10:52:54AM -0800, Dan Williams wrote:
> On Mon, Jan 22, 2018 at 10:37 AM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> > Note that 'idx' itself can have any range (that's kind of the _point_,
> > after all: checking that idx is in some range). But the logic only
> > works for a positive array size.
> >
> > Which honestly is not really a limitation, but it's worth spelling out, I think.

Agreed. We should be absolutely clear about when this thing works and when
it doesn't.

> > In particular, if you have a user pointer, and a 3G:1G split in
> > user:kernel address space, you camn *not* do something like
> >
> >     uptr = array_ptr(NULL, userptr, TASK_SIZE);
> >
> > to get a sanitized user pointer, because TASK_SIZE is not positive in 'long'.
> >
> > Hmm?
> 
> We could at least have a BUILD_BUG_ON when 'size' is a
> builtin_contstant and greater than LONG_MAX. Alternatively we could
> require archs to provide the equivalent of the x86 array_ptr_mask()
> that does not have the LONG_MAX limitation?

I'd rather avoid imposing that limitation and instead just treat uaccess
specially. It looks like we can satisfy the current definition of
array_ptr_mask with three instructions, but I'm not sure how we could get it
working on arm64 if we needed to deal with arbitrary sizes (particularly as
we're trying to avoid conditional instructions).

Will
diff mbox

Patch

diff --git a/include/linux/nospec.h b/include/linux/nospec.h
new file mode 100644
index 000000000000..dd3aa05fab87
--- /dev/null
+++ b/include/linux/nospec.h
@@ -0,0 +1,44 @@ 
+// SPDX-License-Identifier: GPL-2.0
+// Copyright(c) 2018 Intel Corporation. All rights reserved.
+
+#ifndef __NOSPEC_H__
+#define __NOSPEC_H__
+
+#include <linux/jump_label.h>
+#include <asm/barrier.h>
+
+/*
+ * If idx is negative or if idx > size then bit 63 is set in the mask,
+ * and the value of ~(-1L) is zero. When the mask is zero, bounds check
+ * failed, array_ptr will return NULL.
+ */
+#ifndef array_ptr_mask
+static inline unsigned long array_ptr_mask(unsigned long idx, unsigned long sz)
+{
+	return ~(long)(idx | (sz - 1 - idx)) >> (BITS_PER_LONG - 1);
+}
+#endif
+
+/**
+ * array_ptr - Generate a pointer to an array element, ensuring
+ * the pointer is bounded under speculation to NULL.
+ *
+ * @base: the base of the array
+ * @idx: the index of the element, must be less than LONG_MAX
+ * @sz: the number of elements in the array, must be less than LONG_MAX
+ *
+ * If @idx falls in the interval [0, @sz), returns the pointer to
+ * @arr[@idx], otherwise returns NULL.
+ */
+#define array_ptr(base, idx, sz)					\
+({									\
+	union { typeof(*(base)) *_ptr; unsigned long _bit; } __u;	\
+	typeof(*(base)) *_arr = (base);					\
+	unsigned long _i = (idx);					\
+	unsigned long _mask = array_ptr_mask(_i, (sz));			\
+									\
+	__u._ptr = _arr + _i;						\
+	__u._bit &= _mask;						\
+	__u._ptr;							\
+})
+#endif /* __NOSPEC_H__ */