diff mbox

[v5,02/12] array_idx: sanitize speculative array de-references

Message ID 151703972396.26578.7326612698912543866.stgit@dwillia2-desk3.amr.corp.intel.com (mailing list archive)
State New, archived
Headers show

Commit Message

Dan Williams Jan. 27, 2018, 7:55 a.m. UTC
'array_idx' is proposed as a generic mechanism to mitigate against
Spectre-variant-1 attacks, i.e. an attack that bypasses boundary checks
via speculative execution). The 'array_idx' implementation is expected
to be safe for current generation cpus across multiple architectures
(ARM, x86).

Based on an original implementation by Linus Torvalds, tweaked to remove
speculative flows by Alexei Starovoitov, and tweaked again by Linus to
introduce an x86 assembly implementation for the mask generation.

Co-developed-by: Linus Torvalds <torvalds@linux-foundation.org>
Co-developed-by: Alexei Starovoitov <ast@kernel.org>
Suggested-by: Cyril Novikov <cnovikov@lynx.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 include/linux/nospec.h |   64 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 64 insertions(+)
 create mode 100644 include/linux/nospec.h

Comments

Ingo Molnar Jan. 28, 2018, 8:55 a.m. UTC | #1
Firstly, I only got a few patches of this series so I couldn't review all of them 
- please Cc: me to all future Meltdown and Spectre related patches!

* Dan Williams <dan.j.williams@intel.com> wrote:

> 'array_idx' is proposed as a generic mechanism to mitigate against
> Spectre-variant-1 attacks, i.e. an attack that bypasses boundary checks
> via speculative execution). The 'array_idx' implementation is expected
> to be safe for current generation cpus across multiple architectures
> (ARM, x86).

nit: Stray closing parenthesis

s/cpus/CPUs

> Based on an original implementation by Linus Torvalds, tweaked to remove
> speculative flows by Alexei Starovoitov, and tweaked again by Linus to
> introduce an x86 assembly implementation for the mask generation.
> 
> Co-developed-by: Linus Torvalds <torvalds@linux-foundation.org>
> Co-developed-by: Alexei Starovoitov <ast@kernel.org>
> Suggested-by: Cyril Novikov <cnovikov@lynx.com>
> Cc: Russell King <linux@armlinux.org.uk>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Will Deacon <will.deacon@arm.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: x86@kernel.org
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---
>  include/linux/nospec.h |   64 ++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 64 insertions(+)
>  create mode 100644 include/linux/nospec.h
> 
> diff --git a/include/linux/nospec.h b/include/linux/nospec.h
> new file mode 100644
> index 000000000000..f59f81889ba3
> --- /dev/null
> +++ b/include/linux/nospec.h
> @@ -0,0 +1,64 @@
> +// SPDX-License-Identifier: GPL-2.0
> +// Copyright(c) 2018 Intel Corporation. All rights reserved.

Given the close similarity of Linus's array_access() prototype pseudocode there 
should probably also be:

    Copyright (C) 2018 Linus Torvalds

in that file?

> +
> +#ifndef __NOSPEC_H__
> +#define __NOSPEC_H__
> +
> +/*
> + * When idx is out of bounds (idx >= sz), the sign bit will be set.
> + * Extend the sign bit to all bits and invert, giving a result of zero
> + * for an out of bounds idx, or ~0UL if within bounds [0, sz).
> + */
> +#ifndef array_idx_mask
> +static inline unsigned long array_idx_mask(unsigned long idx, unsigned long sz)
> +{
> +	/*
> +	 * Warn developers about inappropriate array_idx usage.
> +	 *
> +	 * Even if the cpu speculates past the WARN_ONCE branch, the

s/cpu/CPU

> +	 * sign bit of idx is taken into account when generating the
> +	 * mask.
> +	 *
> +	 * This warning is compiled out when the compiler can infer that
> +	 * idx and sz are less than LONG_MAX.

Please use 'idx' and 'sz' in quotes, to make sure they stand out more in free 
flowing comment text. Also please use '()' to denote functions/methods.

I.e. something like:

	 * Warn developers about inappropriate array_idx() usage.
	 *
	 * Even if the CPU speculates past the WARN_ONCE() branch, the
	 * sign bit of 'idx' is taken into account when generating the
	 * mask.
	 *
	 * This warning is compiled out when the compiler can infer that
	 * 'idx' and 'sz' are less than LONG_MAX.

That's just one example - please apply it to all comments consistently.

> +	 */
> +	if (WARN_ONCE(idx > LONG_MAX || sz > LONG_MAX,
> +			"array_idx limited to range of [0, LONG_MAX]\n"))

Same in user facing messages:

			"array_idx() limited to range of [0, LONG_MAX]\n"))

> + * For a code sequence like:
> + *
> + *     if (idx < sz) {
> + *         idx = array_idx(idx, sz);
> + *         val = array[idx];
> + *     }
> + *
> + * ...if the cpu speculates past the bounds check then array_idx() will
> + * clamp the index within the range of [0, sz).

s/cpu/CPU

> + */
> +#define array_idx(idx, sz)						\
> +({									\
> +	typeof(idx) _i = (idx);						\
> +	typeof(sz) _s = (sz);						\
> +	unsigned long _mask = array_idx_mask(_i, _s);			\
> +									\
> +	BUILD_BUG_ON(sizeof(_i) > sizeof(long));			\
> +	BUILD_BUG_ON(sizeof(_s) > sizeof(long));			\
> +									\
> +	_i &= _mask;							\
> +	_i;								\
> +})
> +#endif /* __NOSPEC_H__ */

For heaven's sake, please name a size variable as 'size', not 'sz'. We don't have 
a shortage of characters and can deobfuscate common primitives, can we?

Also, beyond the nits, I also hate the namespace here. We have a new generic 
header providing two new methods:

	#include <linux/nospec.h>

	array_idx_mask()
	array_idx()

which is then optimized for x86 in asm/barrier.h. That's already a non-sequitor.

Then we introduce uaccess API variants with a _nospec() postfix.

Then we add ifence() to x86.

There's no naming coherency to this.

A better approach would be to signal the 'no speculation' aspect of the 
array_idx() methods already: naming it array_idx_nospec() would be a solution,
as it clearly avoids speculation beyond the array boundaries.

Also, without seeing the full series it's hard to tell, whether the introduction 
of linux/nospec.h is justified, but it feels somewhat suspect.

Thanks,

	Ingo
Thomas Gleixner Jan. 28, 2018, 11:36 a.m. UTC | #2
On Sun, 28 Jan 2018, Ingo Molnar wrote:
> * Dan Williams <dan.j.williams@intel.com> wrote:
> > +
> > +#ifndef __NOSPEC_H__
> > +#define __NOSPEC_H__

_LINUX_NOSPEC_H

We have an established practice for those header guards...

> > +/*
> > + * When idx is out of bounds (idx >= sz), the sign bit will be set.
> > + * Extend the sign bit to all bits and invert, giving a result of zero
> > + * for an out of bounds idx, or ~0UL if within bounds [0, sz).
> > + */
> > +#ifndef array_idx_mask
> > +static inline unsigned long array_idx_mask(unsigned long idx, unsigned long sz)
> > +{
> > +	/*
> > +	 * Warn developers about inappropriate array_idx usage.
> > +	 *
> > +	 * Even if the cpu speculates past the WARN_ONCE branch, the
> 
> s/cpu/CPU
> 
> > +	 * sign bit of idx is taken into account when generating the
> > +	 * mask.
> > +	 *
> > +	 * This warning is compiled out when the compiler can infer that
> > +	 * idx and sz are less than LONG_MAX.
> 
> Please use 'idx' and 'sz' in quotes, to make sure they stand out more in free 
> flowing comment text. Also please use '()' to denote functions/methods.
> 
> I.e. something like:
> 
> 	 * Warn developers about inappropriate array_idx() usage.
> 	 *
> 	 * Even if the CPU speculates past the WARN_ONCE() branch, the
> 	 * sign bit of 'idx' is taken into account when generating the
> 	 * mask.
> 	 *
> 	 * This warning is compiled out when the compiler can infer that
> 	 * 'idx' and 'sz' are less than LONG_MAX.

I rather prefer to have a proper kernel doc instead of the comment above
the function and then use @idx and @sz in the code comments.

Thanks,

	tglx
Dan Williams Jan. 28, 2018, 4:28 p.m. UTC | #3
On Sun, Jan 28, 2018 at 12:55 AM, Ingo Molnar <mingo@kernel.org> wrote:
>
> Firstly, I only got a few patches of this series so I couldn't review all of them
> - please Cc: me to all future Meltdown and Spectre related patches!
>
> * Dan Williams <dan.j.williams@intel.com> wrote:
>
>> 'array_idx' is proposed as a generic mechanism to mitigate against
>> Spectre-variant-1 attacks, i.e. an attack that bypasses boundary checks
>> via speculative execution). The 'array_idx' implementation is expected
>> to be safe for current generation cpus across multiple architectures
>> (ARM, x86).
>
> nit: Stray closing parenthesis
>
> s/cpus/CPUs
>
>> Based on an original implementation by Linus Torvalds, tweaked to remove
>> speculative flows by Alexei Starovoitov, and tweaked again by Linus to
>> introduce an x86 assembly implementation for the mask generation.
>>
>> Co-developed-by: Linus Torvalds <torvalds@linux-foundation.org>
>> Co-developed-by: Alexei Starovoitov <ast@kernel.org>
>> Suggested-by: Cyril Novikov <cnovikov@lynx.com>
>> Cc: Russell King <linux@armlinux.org.uk>
>> Cc: Peter Zijlstra <peterz@infradead.org>
>> Cc: Catalin Marinas <catalin.marinas@arm.com>
>> Cc: Will Deacon <will.deacon@arm.com>
>> Cc: Thomas Gleixner <tglx@linutronix.de>
>> Cc: Ingo Molnar <mingo@redhat.com>
>> Cc: "H. Peter Anvin" <hpa@zytor.com>
>> Cc: x86@kernel.org
>> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
>> ---
>>  include/linux/nospec.h |   64 ++++++++++++++++++++++++++++++++++++++++++++++++
>>  1 file changed, 64 insertions(+)
>>  create mode 100644 include/linux/nospec.h
>>
>> diff --git a/include/linux/nospec.h b/include/linux/nospec.h
>> new file mode 100644
>> index 000000000000..f59f81889ba3
>> --- /dev/null
>> +++ b/include/linux/nospec.h
>> @@ -0,0 +1,64 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +// Copyright(c) 2018 Intel Corporation. All rights reserved.
>
> Given the close similarity of Linus's array_access() prototype pseudocode there
> should probably also be:
>
>     Copyright (C) 2018 Linus Torvalds
>
> in that file?

Yes, and Alexei as well.

>
>> +
>> +#ifndef __NOSPEC_H__
>> +#define __NOSPEC_H__
>> +
>> +/*
>> + * When idx is out of bounds (idx >= sz), the sign bit will be set.
>> + * Extend the sign bit to all bits and invert, giving a result of zero
>> + * for an out of bounds idx, or ~0UL if within bounds [0, sz).
>> + */
>> +#ifndef array_idx_mask
>> +static inline unsigned long array_idx_mask(unsigned long idx, unsigned long sz)
>> +{
>> +     /*
>> +      * Warn developers about inappropriate array_idx usage.
>> +      *
>> +      * Even if the cpu speculates past the WARN_ONCE branch, the
>
> s/cpu/CPU
>
>> +      * sign bit of idx is taken into account when generating the
>> +      * mask.
>> +      *
>> +      * This warning is compiled out when the compiler can infer that
>> +      * idx and sz are less than LONG_MAX.
>
> Please use 'idx' and 'sz' in quotes, to make sure they stand out more in free
> flowing comment text. Also please use '()' to denote functions/methods.
>
> I.e. something like:
>
>          * Warn developers about inappropriate array_idx() usage.
>          *
>          * Even if the CPU speculates past the WARN_ONCE() branch, the
>          * sign bit of 'idx' is taken into account when generating the
>          * mask.
>          *
>          * This warning is compiled out when the compiler can infer that
>          * 'idx' and 'sz' are less than LONG_MAX.
>
> That's just one example - please apply it to all comments consistently.
>
>> +      */
>> +     if (WARN_ONCE(idx > LONG_MAX || sz > LONG_MAX,
>> +                     "array_idx limited to range of [0, LONG_MAX]\n"))
>
> Same in user facing messages:
>
>                         "array_idx() limited to range of [0, LONG_MAX]\n"))
>
>> + * For a code sequence like:
>> + *
>> + *     if (idx < sz) {
>> + *         idx = array_idx(idx, sz);
>> + *         val = array[idx];
>> + *     }
>> + *
>> + * ...if the cpu speculates past the bounds check then array_idx() will
>> + * clamp the index within the range of [0, sz).
>
> s/cpu/CPU
>
>> + */
>> +#define array_idx(idx, sz)                                           \
>> +({                                                                   \
>> +     typeof(idx) _i = (idx);                                         \
>> +     typeof(sz) _s = (sz);                                           \
>> +     unsigned long _mask = array_idx_mask(_i, _s);                   \
>> +                                                                     \
>> +     BUILD_BUG_ON(sizeof(_i) > sizeof(long));                        \
>> +     BUILD_BUG_ON(sizeof(_s) > sizeof(long));                        \
>> +                                                                     \
>> +     _i &= _mask;                                                    \
>> +     _i;                                                             \
>> +})
>> +#endif /* __NOSPEC_H__ */
>
> For heaven's sake, please name a size variable as 'size', not 'sz'. We don't have
> a shortage of characters and can deobfuscate common primitives, can we?
>
> Also, beyond the nits, I also hate the namespace here. We have a new generic
> header providing two new methods:
>
>         #include <linux/nospec.h>
>
>         array_idx_mask()
>         array_idx()
>
> which is then optimized for x86 in asm/barrier.h. That's already a non-sequitor.
>
> Then we introduce uaccess API variants with a _nospec() postfix.
>
> Then we add ifence() to x86.
>
> There's no naming coherency to this.

Ingo, I love you, but please take the incredulity down a bit,
especially when I had 'nospec' in all the names in v1. Thomas, Peter,
and Alexei wanted s/nospec_barrier/ifence/ and
s/array_idx_nospec/array_idx/. You can always follow on with a patch
to fix up the names and placements to your liking. While they'll pick
on my name choices, they won't pick on yours, because I simply can't
be bothered to care about a bikeshed color at this point after being
bounced around for 5 revisions of this patch set.

> A better approach would be to signal the 'no speculation' aspect of the
> array_idx() methods already: naming it array_idx_nospec() would be a solution,
> as it clearly avoids speculation beyond the array boundaries.
>
> Also, without seeing the full series it's hard to tell, whether the introduction
> of linux/nospec.h is justified, but it feels somewhat suspect.
>
Ingo Molnar Jan. 28, 2018, 6:33 p.m. UTC | #4
* Dan Williams <dan.j.williams@intel.com> wrote:

> Thomas, Peter, and Alexei wanted s/nospec_barrier/ifence/ and 

I just checked past discussions, and I cannot find that part, got any links or 
message-IDs?

PeterZ's feedback on Jan 8 was:

> On Sun, Jan 07, 2018 at 06:24:11PM -0800, Alexei Starovoitov wrote:
> > How about:
> > CONFIG_SPECTRE1_WORKAROUND_INDEX_MASK
> > CONFIG_SPECTRE1_WORKAROUND_LOAD_FENCE
>
> INSTRUCTION_FENCE if anything. LFENCE for Intel (and now also for AMD as
> per 0592b0bce169) is a misnomer, IFENCE would be a better name for it.

Which in that context clearly talked about the config space and how to name the 
instruction semantics in light of the confusion of LFENCE and MFENCE opcodes on 
Intel and AMD CPUs...

Also, those early reviews were fundamentally low level feedback related to the 
actual functionality of the barriers and their mapping to the hardware.

But the fact is, the current series introduces an inconsistent barrier namespace 
extension of:

	barrier()
	barrier_data()
	mb()
	rmb()
	wmb()
	store_mb()
	read_barrier_depends()
	...
+	ifence()
+	array_idx()
+	array_idx_mask()

This isn't bikeshed painting: _ALL_ existing barrier API names have 'barrier' or 
its abbreviation 'mb' (memory barrier) somewhere in their names, which makes it 
pretty easy to recognize them at a glance.

I'm giving you high level API naming feedback because we are now growing the size 
of the barrier API.

array_idx() on the other hand is pretty much close to a 'worst possible' name:

 - it's named in an overly generic, opaque fashion
 - doesn't indicate it at all that it's a barrier for something

... and since we want all kernel developers to use these facilities correctly, we 
want the names to be good and suggestive as well.

I'd accept pretty much anything else that adds 'barrier' or 'nospec' to the name: 
array_idx_barrier() or array_idx_nospec(). (I'm slightly leaning towards 'nospec' 
because it's more indicative of what is being done, and it also is what we do for 
get uaccess APIs.)

ifence() is a similar departure from existing barrier naming nomenclature, and I'd 
accept pretty much any other variant:

	barrier_nospec()
	ifence_nospec()

The kernel namespace cleanliness rules are clear and consistent, and there's 
nothing new about them:

 - the name of the API should unambiguously refer back to the API category. For
   barriers this common reference is 'barrier' or 'mb'.

 - use postfixes or prefixes consistently: pick one and don't mix them. If we go 
   with a _nospec() variant for the uaccess API names then we should use a similar
   naming for array_idx() and for the new barrier as well - no mixing.

> You can always follow on with a patch to fix up the names and placements to your 
> liking. While they'll pick on my name choices, they won't pick on yours, because 
> I simply can't be bothered to care about a bikeshed color at this point after 
> being bounced around for 5 revisions of this patch set.

Sorry, this kind of dismissive and condescending attitude won't cut it.

Thanks,

	Ingo
Thomas Gleixner Jan. 28, 2018, 6:36 p.m. UTC | #5
On Sun, 28 Jan 2018, Dan Williams wrote:
> On Sun, Jan 28, 2018 at 12:55 AM, Ingo Molnar <mingo@kernel.org> wrote:
> >> + */
> >> +#define array_idx(idx, sz)                                           \
> >> +({                                                                   \
> >> +     typeof(idx) _i = (idx);                                         \
> >> +     typeof(sz) _s = (sz);                                           \
> >> +     unsigned long _mask = array_idx_mask(_i, _s);                   \
> >> +                                                                     \
> >> +     BUILD_BUG_ON(sizeof(_i) > sizeof(long));                        \
> >> +     BUILD_BUG_ON(sizeof(_s) > sizeof(long));                        \
> >> +                                                                     \
> >> +     _i &= _mask;                                                    \
> >> +     _i;                                                             \
> >> +})
> >> +#endif /* __NOSPEC_H__ */
> >
> > For heaven's sake, please name a size variable as 'size', not 'sz'. We don't have
> > a shortage of characters and can deobfuscate common primitives, can we?
> >
> > Also, beyond the nits, I also hate the namespace here. We have a new generic
> > header providing two new methods:
> >
> >         #include <linux/nospec.h>
> >
> >         array_idx_mask()
> >         array_idx()
> >
> > which is then optimized for x86 in asm/barrier.h. That's already a non-sequitor.
> >
> > Then we introduce uaccess API variants with a _nospec() postfix.
> >
> > Then we add ifence() to x86.
> >
> > There's no naming coherency to this.
> 
> Ingo, I love you, but please take the incredulity down a bit,
> especially when I had 'nospec' in all the names in v1. Thomas, Peter,
> and Alexei wanted s/nospec_barrier/ifence/ and

Sorry, I never was involved in that discussion.

> s/array_idx_nospec/array_idx/. You can always follow on with a patch
> to fix up the names and placements to your liking. While they'll pick
> on my name choices, they won't pick on yours, because I simply can't
> be bothered to care about a bikeshed color at this point after being
> bounced around for 5 revisions of this patch set.

Oh well, we really need this kind of attitude right now. We are all fed up
with that mess, but Ingo and I care about the details, consistency and
general code quality beyond the current rush to get stuff solved. It's our
damned job as maintainers.

If you decide you don't care anymore, please let me know, so I can try to
free up some cycles to pick up the stuff from where you decided to dump it.

Thanks,

	tglx
Dan Williams Jan. 29, 2018, 4:45 p.m. UTC | #6
On Sun, Jan 28, 2018 at 10:33 AM, Ingo Molnar <mingo@kernel.org> wrote:
>
> * Dan Williams <dan.j.williams@intel.com> wrote:
>
>> Thomas, Peter, and Alexei wanted s/nospec_barrier/ifence/ and
>
> I just checked past discussions, and I cannot find that part, got any links or
> message-IDs?
>
> PeterZ's feedback on Jan 8 was:
>
>> On Sun, Jan 07, 2018 at 06:24:11PM -0800, Alexei Starovoitov wrote:
>> > How about:
>> > CONFIG_SPECTRE1_WORKAROUND_INDEX_MASK
>> > CONFIG_SPECTRE1_WORKAROUND_LOAD_FENCE
>>
>> INSTRUCTION_FENCE if anything. LFENCE for Intel (and now also for AMD as
>> per 0592b0bce169) is a misnomer, IFENCE would be a better name for it.
>
> Which in that context clearly talked about the config space and how to name the
> instruction semantics in light of the confusion of LFENCE and MFENCE opcodes on
> Intel and AMD CPUs...
>
> Also, those early reviews were fundamentally low level feedback related to the
> actual functionality of the barriers and their mapping to the hardware.
>
> But the fact is, the current series introduces an inconsistent barrier namespace
> extension of:
>
>         barrier()
>         barrier_data()
>         mb()
>         rmb()
>         wmb()
>         store_mb()
>         read_barrier_depends()
>         ...
> +       ifence()
> +       array_idx()
> +       array_idx_mask()
>
> This isn't bikeshed painting: _ALL_ existing barrier API names have 'barrier' or
> its abbreviation 'mb' (memory barrier) somewhere in their names, which makes it
> pretty easy to recognize them at a glance.
>
> I'm giving you high level API naming feedback because we are now growing the size
> of the barrier API.
>
> array_idx() on the other hand is pretty much close to a 'worst possible' name:
>
>  - it's named in an overly generic, opaque fashion
>  - doesn't indicate it at all that it's a barrier for something
>
> ... and since we want all kernel developers to use these facilities correctly, we
> want the names to be good and suggestive as well.
>
> I'd accept pretty much anything else that adds 'barrier' or 'nospec' to the name:
> array_idx_barrier() or array_idx_nospec(). (I'm slightly leaning towards 'nospec'
> because it's more indicative of what is being done, and it also is what we do for
> get uaccess APIs.)
>
> ifence() is a similar departure from existing barrier naming nomenclature, and I'd
> accept pretty much any other variant:
>
>         barrier_nospec()
>         ifence_nospec()
>
> The kernel namespace cleanliness rules are clear and consistent, and there's
> nothing new about them:
>
>  - the name of the API should unambiguously refer back to the API category. For
>    barriers this common reference is 'barrier' or 'mb'.
>
>  - use postfixes or prefixes consistently: pick one and don't mix them. If we go
>    with a _nospec() variant for the uaccess API names then we should use a similar
>    naming for array_idx() and for the new barrier as well - no mixing.

This is the feedback I can take action with, thank you.

>
>> You can always follow on with a patch to fix up the names and placements to your
>> liking. While they'll pick on my name choices, they won't pick on yours, because
>> I simply can't be bothered to care about a bikeshed color at this point after
>> being bounced around for 5 revisions of this patch set.
>
> Sorry, this kind of dismissive and condescending attitude won't cut it.

I reacted to your "for heaven's sake", I'll send a v6.
Dan Williams Jan. 30, 2018, 6:29 a.m. UTC | #7
On Sun, Jan 28, 2018 at 10:36 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
> On Sun, 28 Jan 2018, Dan Williams wrote:
>> On Sun, Jan 28, 2018 at 12:55 AM, Ingo Molnar <mingo@kernel.org> wrote:
>> >> + */
>> >> +#define array_idx(idx, sz)                                           \
>> >> +({                                                                   \
>> >> +     typeof(idx) _i = (idx);                                         \
>> >> +     typeof(sz) _s = (sz);                                           \
>> >> +     unsigned long _mask = array_idx_mask(_i, _s);                   \
>> >> +                                                                     \
>> >> +     BUILD_BUG_ON(sizeof(_i) > sizeof(long));                        \
>> >> +     BUILD_BUG_ON(sizeof(_s) > sizeof(long));                        \
>> >> +                                                                     \
>> >> +     _i &= _mask;                                                    \
>> >> +     _i;                                                             \
>> >> +})
>> >> +#endif /* __NOSPEC_H__ */
>> >
>> > For heaven's sake, please name a size variable as 'size', not 'sz'. We don't have
>> > a shortage of characters and can deobfuscate common primitives, can we?
>> >
>> > Also, beyond the nits, I also hate the namespace here. We have a new generic
>> > header providing two new methods:
>> >
>> >         #include <linux/nospec.h>
>> >
>> >         array_idx_mask()
>> >         array_idx()
>> >
>> > which is then optimized for x86 in asm/barrier.h. That's already a non-sequitor.
>> >
>> > Then we introduce uaccess API variants with a _nospec() postfix.
>> >
>> > Then we add ifence() to x86.
>> >
>> > There's no naming coherency to this.
>>
>> Ingo, I love you, but please take the incredulity down a bit,
>> especially when I had 'nospec' in all the names in v1. Thomas, Peter,
>> and Alexei wanted s/nospec_barrier/ifence/ and
>
> Sorry, I never was involved in that discussion.
>
>> s/array_idx_nospec/array_idx/. You can always follow on with a patch
>> to fix up the names and placements to your liking. While they'll pick
>> on my name choices, they won't pick on yours, because I simply can't
>> be bothered to care about a bikeshed color at this point after being
>> bounced around for 5 revisions of this patch set.
>
> Oh well, we really need this kind of attitude right now. We are all fed up
> with that mess, but Ingo and I care about the details, consistency and
> general code quality beyond the current rush to get stuff solved. It's our
> damned job as maintainers.

Of course, and everything about the technical feedback and suggestions
was completely valid, on point, and made the patches that much better.
What wasn't appreciated and what I am tired of grinning and bearing is
the idea that it's only the maintainer that can show emotion, that
it's only the maintainer that can be exasperated and burnt out.

For example, I would have spun v6 the same day (same hour?) had the
mail started, "hey I'm late to the party, why aren't all these helpers
in a new _nospec namespace?". I might have pointed to the mails from
Peter about ifence being his preferred name for a speculation barrier
and Alexei's request to drop nospec [1] and moved on because naming is
a maintainer's prerogative.

> If you decide you don't care anymore, please let me know, so I can try to
> free up some cycles to pick up the stuff from where you decided to dump it.

Care is a two way street. I respect yours and Ingo's workload, please
respect mine.

[1]: https://lkml.org/lkml/2018/1/9/1232
Linus Torvalds Jan. 30, 2018, 7:38 p.m. UTC | #8
On Mon, Jan 29, 2018 at 10:29 PM, Dan Williams <dan.j.williams@intel.com> wrote:
>
> Of course, and everything about the technical feedback and suggestions
> was completely valid, on point, and made the patches that much better.
> What wasn't appreciated and what I am tired of grinning and bearing is
> the idea that it's only the maintainer that can show emotion, that
> it's only the maintainer that can be exasperated and burnt out.

Yeah, I think everybody is a bit tired of - and burnt out by - these
patches, and they are subtler and somewhat more core than most are,
which makes the stakes a bit higher too, and the explanations can be a
bit more difficult.

I think everybody is entitled to being a bit snippy occasionally.
Definitely not just maintainers.

So by all means, push right back.

Anyway, I do think the patches I've seen so far are ok, and the real
reason I'm writing this email is actually more about future patches:
do we have a good handle on where these array index sanitations will
be needed?

Also, while array limit checking was obviously the official
"spectre-v1" issue, I do wonder if there are possible other issues
where mispredicted conditional branches can end up leaking
information?

IOW, is there some work on tooling/analysis/similar? Not asking for
near-term, but more of a "big picture" question..

            Linus
Dan Williams Jan. 30, 2018, 8:13 p.m. UTC | #9
[ adding Arjan ]

On Tue, Jan 30, 2018 at 11:38 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
[..]
> Anyway, I do think the patches I've seen so far are ok, and the real
> reason I'm writing this email is actually more about future patches:
> do we have a good handle on where these array index sanitations will
> be needed?
>
> Also, while array limit checking was obviously the official
> "spectre-v1" issue, I do wonder if there are possible other issues
> where mispredicted conditional branches can end up leaking
> information?
>
> IOW, is there some work on tooling/analysis/similar? Not asking for
> near-term, but more of a "big picture" question..
>
>             Linus
Van De Ven, Arjan Jan. 30, 2018, 8:27 p.m. UTC | #10
> > Anyway, I do think the patches I've seen so far are ok, and the real

> > reason I'm writing this email is actually more about future patches:

> > do we have a good handle on where these array index sanitations will

> > be needed?


the obvious cases are currently obviously being discussed.

but to be realistic, the places people will find will go on for the next two+ years, and the distros will also 
end up needing to backport those for the next two years.

(v1 is like "buffer overflow", it's a class of issues. buffer overflows were not fixed in one patch or overnight)

 
> > Also, while array limit checking was obviously the official

> > "spectre-v1" issue, I do wonder if there are possible other issues

> > where mispredicted conditional branches can end up leaking

> > information?


there's obviously many things one can do to leak things, for example the v1 paper talks about using the cache
as a mechanism to leak, but you can trivially construct something that uses the TLBs as the channel to leak
(thankfully KPTI cuts that side off) but other variations are possible. I suspect many maser thesis projects
got redirected in the last few months ;-)


> > IOW, is there some work on tooling/analysis/similar? Not asking for

> > near-term, but more of a "big picture" question..


short term there was some extremely rudimentary static analysis done.
clearly that has extreme limitations both in insane rate of false positives, and missing cases.

the real case is to get things like compiler plugins and the like to identify suspect cases.

but we also need to consider changing the kernel a bit. Part of what makes fixing
V1 hard is that security checks on user data are spread in many places across drivers and subsystems.
If we do the security checks more concentrated we cut off many attacks.

For example, we are considering doing a

copy_from_user_struct(*dst, *src, type, validation_function)

where the validation function gets auto-generated from the uapi headers (with some form of annotation)
and if anything fails or the copy faults partway through, the dst is zeroed.

such a beast can do the basic security checks on the user data inside that function(wrapper), and that greatly
limits the number of places we need to take care of.

bonus is that this also will improve the situation of drivers being sloppy with input validation and where 
the current rootholes are happening
Ingo Molnar Jan. 31, 2018, 8:03 a.m. UTC | #11
* Van De Ven, Arjan <arjan.van.de.ven@intel.com> wrote:

> > > IOW, is there some work on tooling/analysis/similar? Not asking for
> > > near-term, but more of a "big picture" question..
> 
> short term there was some extremely rudimentary static analysis done. clearly 
> that has extreme limitations both in insane rate of false positives, and missing 
> cases.

What was the output roughly, how many suspect places that need array_idx_nospec() 
handling: a few, a few dozen, a few hundred or a few thousand?

I'm guessing 'static tool emitted hundreds of sites with many false positives 
included, but the real sites are probably a few dozen' - but that's really just a 
very, very wild guess.

Our whole mindset and approach with these generic APIs obviously very much depends 
on the order of magnitude:

- For array users up to a few dozen per 20 MLOC code base accumulated over 20
  years, and with no more than ~1 new site per kernel release we can probably
  do what we do for buffer overflows: static analyze and fix via 
  array_idx_nospec().

- If it's more than a few dozen then intuitively I'd also be very much in favor of
  compiler help: for example trickle down __user annotations that Sparse uses some
  more and let the compiler sanitize indices or warn about them - without hurting 
  performance of in-kernel array handling.

- If it's "hundreds" then probably both the correctness and the maintenance
  aspects won't scale well to a 20+ MLOC kernel code base - in that case we _have_
  to catch the out of range values at a much earlier stage, at the system call and
  other system ABI level, and probably need to do it via a self-maintaining 
  approach such as annotations/types that also warns about 'unsanitized' uses. But 
  filtering earlier has its own problems as well: mostly the layering violations 
  (high level interfaces often don't know the safe array index range) can create 
  worse bugs and more fragility than what is being fixed ...

- If it's "thousands" then it _clearly_ won't scale and we probably need compiler 
  help: i.e. a compiler that tracks ranges and automatically sanitizes indices. 
  This will have some performance effect, but should result in almost immediate 
  correctness.

Also, IMHO any tooling should very much be open source: this isn't the kind of bug 
that can be identified via code review, so there's no fall-back detection method 
like we have for buffer overflows.

Anyway, the approach taken very much depends on the order of magnitude of such 
affected array users we are targeting ...

Thanks,

	Ingo
Van De Ven, Arjan Jan. 31, 2018, 2:13 p.m. UTC | #12
> > short term there was some extremely rudimentary static analysis done. clearly

> > that has extreme limitations both in insane rate of false positives, and missing

> > cases.

> 

> What was the output roughly, how many suspect places that need

> array_idx_nospec()

> handling: a few, a few dozen, a few hundred or a few thousand?

> 

> I'm guessing 'static tool emitted hundreds of sites with many false positives

> included, but the real sites are probably a few dozen' - but that's really just a

> very, very wild guess.


your guess is pretty accurate; we ended up with some 15 or so places (the first patch kit Dan mailed out)


> 

> - If it's more than a few dozen then intuitively I'd also be very much in favor of

>   compiler help: for example trickle down __user annotations that Sparse uses

> some

>   more and let the compiler sanitize indices or warn about them - without hurting

>   performance of in-kernel array handling.


we need this kind of help even if it's only for the static analysis tool


 
> Also, IMHO any tooling should very much be open source: this isn't the kind of

> bug

> that can be identified via code review, so there's no fall-back detection method

> like we have for buffer overflows.


we absolutely need some good open source tooling; my personal preference will be a compiler plugin; that way you can use all the fancy logic inside the compilers for analysis, and you can make a "I don't care just fix it" option in addition to flagging for human inspection as the kernel would.
Greg KH Jan. 31, 2018, 2:21 p.m. UTC | #13
On Wed, Jan 31, 2018 at 02:13:45PM +0000, Van De Ven, Arjan wrote:
> > > short term there was some extremely rudimentary static analysis done. clearly
> > > that has extreme limitations both in insane rate of false positives, and missing
> > > cases.
> > 
> > What was the output roughly, how many suspect places that need
> > array_idx_nospec()
> > handling: a few, a few dozen, a few hundred or a few thousand?
> > 
> > I'm guessing 'static tool emitted hundreds of sites with many false positives
> > included, but the real sites are probably a few dozen' - but that's really just a
> > very, very wild guess.
> 
> your guess is pretty accurate; we ended up with some 15 or so places
> (the first patch kit Dan mailed out)

But in looking at that first patch set, I don't see many, if any, that
could be solved with your proposal of copy_from_user_struct().  The two
networking patches couldn't, the scsi one was just bizarre (seriously,
you had to trust the input from the hardware, if not, there's worse
things happening), and the networking drivers were dealing with other
data streams I think.

So while I love the idea of copy_from_user_struct(), I don't see it as
any type of "this will solve the issues we are trying to address here"
solution :(

I've been emailing the Coverity people recently about this, and they
say they are close to having a ruleset/test that can identify this data
pattern better than the original tool that Intel and others came up
with.  But as I haven't seen the output of it yet, I have no idea if
that's true or not.  Hopefully they will release it in a few days so we
can get an idea of if this is even going to be possible to automatically
scan for at all with their tool.

Other than Coverity, I don't know of any other tool that is trying to
even identify this pattern :(

> > Also, IMHO any tooling should very much be open source: this isn't the kind of
> > bug
> > that can be identified via code review, so there's no fall-back detection method
> > like we have for buffer overflows.
> 
> we absolutely need some good open source tooling; my personal
> preference will be a compiler plugin; that way you can use all the
> fancy logic inside the compilers for analysis, and you can make a "I
> don't care just fix it" option in addition to flagging for human
> inspection as the kernel would.

I thought gcc plugins were going to enable this type of analysis, or
maybe clang plugins, but I have yet to hear of anyone working on this.

thanks,

greg k-h
diff mbox

Patch

diff --git a/include/linux/nospec.h b/include/linux/nospec.h
new file mode 100644
index 000000000000..f59f81889ba3
--- /dev/null
+++ b/include/linux/nospec.h
@@ -0,0 +1,64 @@ 
+// SPDX-License-Identifier: GPL-2.0
+// Copyright(c) 2018 Intel Corporation. All rights reserved.
+
+#ifndef __NOSPEC_H__
+#define __NOSPEC_H__
+
+/*
+ * When idx is out of bounds (idx >= sz), the sign bit will be set.
+ * Extend the sign bit to all bits and invert, giving a result of zero
+ * for an out of bounds idx, or ~0UL if within bounds [0, sz).
+ */
+#ifndef array_idx_mask
+static inline unsigned long array_idx_mask(unsigned long idx, unsigned long sz)
+{
+	/*
+	 * Warn developers about inappropriate array_idx usage.
+	 *
+	 * Even if the cpu speculates past the WARN_ONCE branch, the
+	 * sign bit of idx is taken into account when generating the
+	 * mask.
+	 *
+	 * This warning is compiled out when the compiler can infer that
+	 * idx and sz are less than LONG_MAX.
+	 */
+	if (WARN_ONCE(idx > LONG_MAX || sz > LONG_MAX,
+			"array_idx limited to range of [0, LONG_MAX]\n"))
+		return 0;
+
+	/*
+	 * Always calculate and emit the mask even if the compiler
+	 * thinks the mask is not needed. The compiler does not take
+	 * into account the value of idx under speculation.
+	 */
+	OPTIMIZER_HIDE_VAR(idx);
+	return ~(long)(idx | (sz - 1UL - idx)) >> (BITS_PER_LONG - 1);
+}
+#endif
+
+/*
+ * array_idx - sanitize an array index after a bounds check
+ *
+ * For a code sequence like:
+ *
+ *     if (idx < sz) {
+ *         idx = array_idx(idx, sz);
+ *         val = array[idx];
+ *     }
+ *
+ * ...if the cpu speculates past the bounds check then array_idx() will
+ * clamp the index within the range of [0, sz).
+ */
+#define array_idx(idx, sz)						\
+({									\
+	typeof(idx) _i = (idx);						\
+	typeof(sz) _s = (sz);						\
+	unsigned long _mask = array_idx_mask(_i, _s);			\
+									\
+	BUILD_BUG_ON(sizeof(_i) > sizeof(long));			\
+	BUILD_BUG_ON(sizeof(_s) > sizeof(long));			\
+									\
+	_i &= _mask;							\
+	_i;								\
+})
+#endif /* __NOSPEC_H__ */