[v3,7/7] x86/nospec: Optimise array_index_mask_nospec() for power-of-2 arrays
diff mbox series

Message ID 20191023135812.21348-8-andrew.cooper3@citrix.com
State New
Headers show
Series
  • Unbreak evaluate_nospec() and livepatching
Related show

Commit Message

Andrew Cooper Oct. 23, 2019, 1:58 p.m. UTC
When the compiler can determine that an array bound is a power of two, the
array index can be bounded even under speculation with a single and
instruction.

Respecify array_index_mask_nospec() to allow for masks other than ~0 and 0,
and introduce an IS_POWER_OF_2() helper.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Wei Liu <wl@xen.org>
CC: Roger Pau Monné <roger.pau@citrix.com>
CC: Juergen Gross <jgross@suse.com>

This optimisation is not safe on ARM, because some CPUs do data value
speculation, which is why the CSDB barrer was introduced.
---
 xen/include/asm-x86/nospec.h | 25 +++++++++++++++++++------
 xen/include/xen/config.h     |  1 +
 xen/include/xen/nospec.h     |  3 ++-
 3 files changed, 22 insertions(+), 7 deletions(-)

Comments

Jan Beulich Oct. 25, 2019, 12:24 p.m. UTC | #1
On 23.10.2019 15:58, Andrew Cooper wrote:
> This optimisation is not safe on ARM, because some CPUs do data value
> speculation, which is why the CSDB barrer was introduced.

So if an x86 CPU appeared which did so too, it would immediately be
unsafe as well aiui. I.e. we'd have to again go and fix the logic.
Not to be taken as an outright objection, but to perhaps prompt
further consideration.

> --- a/xen/include/asm-x86/nospec.h
> +++ b/xen/include/asm-x86/nospec.h
> @@ -7,13 +7,20 @@
>  #include <asm/alternative.h>
>  
>  /**
> - * array_index_mask_nospec() - generate a mask that is ~0UL when the
> - *      bounds check succeeds and 0 otherwise
> + * array_index_mask_nospec() - generate a mask to bound an array index
> + * which is safe even under adverse speculation.
>   * @index: array element index
>   * @size: number of elements in array
>   *
> - * Returns:
> + * In general, returns:
>   *     0 - (index < size)
> + *
> + * This yeild ~0UL in within-bounds case, and 0 in the out-of-bounds

Nit: "yields"?

> @@ -21,9 +28,15 @@ static inline unsigned long array_index_mask_nospec(unsigned long index,
>  {
>      unsigned long mask;
>  
> -    asm volatile ( "cmp %[size], %[index]; sbb %[mask], %[mask];"
> -                   : [mask] "=r" (mask)
> -                   : [size] "g" (size), [index] "r" (index) );
> +    if ( __builtin_constant_p(size) && IS_POWER_OF_2(size) )
> +    {
> +        mask = size - 1;
> +        OPTIMIZER_HIDE_VAR(mask);

I can't seem to be able to figure why you need this.

> --- a/xen/include/xen/config.h
> +++ b/xen/include/xen/config.h
> @@ -75,6 +75,7 @@
>  #define GB(_gb)     (_AC(_gb, ULL) << 30)
>  
>  #define IS_ALIGNED(val, align) (((val) & ((align) - 1)) == 0)
> +#define IS_POWER_OF_2(val) ((val) && IS_ALIGNED(val, val))

While the risk may seem low for someone to pass an expression with
side effect here, evaluating "val" up to three times here doesn't
look very desirable.

As a minor remark, without considering representation I'd expect
an expression IS_ALIGNED(val, val) to consistently produce "true"
for all non-zero values. E.g. 3 is certainly "aligned" on a
boundary of 3.

Finally this may want guarding against use on signed types - at
the very least it looks to produce the wrong answer for e.g.
INT_MIN or LONG_MIN. I.e. perhaps the expression to the left of
&& wants to be (val) > 0.

Jan
Andrew Cooper Oct. 25, 2019, 12:58 p.m. UTC | #2
On 25/10/2019 13:24, Jan Beulich wrote:
> On 23.10.2019 15:58, Andrew Cooper wrote:
>> This optimisation is not safe on ARM, because some CPUs do data value
>> speculation, which is why the CSDB barrer was introduced.
> So if an x86 CPU appeared which did so too, it would immediately be
> unsafe as well aiui. I.e. we'd have to again go and fix the logic.
> Not to be taken as an outright objection, but to perhaps prompt
> further consideration.

Actually any masking approach, even cmp/sbb, would be unsafe so I
suppose this note isn't accurate.

I'm aware of one x86 plan for data value speculation, which was delayed
indefinitely following the fallout from Spectre/Meltdown, especially as
L1TF at its core is a speculative address prediction bug.  Suffice it to
say that the vendors are aware that any plans along these lines will
need to be done with great care.

>
>> --- a/xen/include/asm-x86/nospec.h
>> +++ b/xen/include/asm-x86/nospec.h
>> @@ -7,13 +7,20 @@
>>  #include <asm/alternative.h>
>>  
>>  /**
>> - * array_index_mask_nospec() - generate a mask that is ~0UL when the
>> - *      bounds check succeeds and 0 otherwise
>> + * array_index_mask_nospec() - generate a mask to bound an array index
>> + * which is safe even under adverse speculation.
>>   * @index: array element index
>>   * @size: number of elements in array
>>   *
>> - * Returns:
>> + * In general, returns:
>>   *     0 - (index < size)
>> + *
>> + * This yeild ~0UL in within-bounds case, and 0 in the out-of-bounds
> Nit: "yields"?

Oops yes.

>
>> @@ -21,9 +28,15 @@ static inline unsigned long array_index_mask_nospec(unsigned long index,
>>  {
>>      unsigned long mask;
>>  
>> -    asm volatile ( "cmp %[size], %[index]; sbb %[mask], %[mask];"
>> -                   : [mask] "=r" (mask)
>> -                   : [size] "g" (size), [index] "r" (index) );
>> +    if ( __builtin_constant_p(size) && IS_POWER_OF_2(size) )
>> +    {
>> +        mask = size - 1;
>> +        OPTIMIZER_HIDE_VAR(mask);
> I can't seem to be able to figure why you need this.

Because I found cases where the AND was elided by the compiler entirely
without it.

>
>> --- a/xen/include/xen/config.h
>> +++ b/xen/include/xen/config.h
>> @@ -75,6 +75,7 @@
>>  #define GB(_gb)     (_AC(_gb, ULL) << 30)
>>  
>>  #define IS_ALIGNED(val, align) (((val) & ((align) - 1)) == 0)
>> +#define IS_POWER_OF_2(val) ((val) && IS_ALIGNED(val, val))
> While the risk may seem low for someone to pass an expression with
> side effect here, evaluating "val" up to three times here doesn't
> look very desirable.

That is easy to fix.

> As a minor remark, without considering representation I'd expect
> an expression IS_ALIGNED(val, val) to consistently produce "true"
> for all non-zero values. E.g. 3 is certainly "aligned" on a
> boundary of 3.

IS_ALIGNED() comes with an expectation of being against a power of 2,
because otherwise you'd need a DIV instruction for the general case.

Some users can't cope with a compile time check.

> Finally this may want guarding against use on signed types - at
> the very least it looks to produce the wrong answer for e.g.
> INT_MIN or LONG_MIN. I.e. perhaps the expression to the left of
> && wants to be (val) > 0.

How about the above expansion fix becoming:

({
    unsigned typeof(val) _val = val;
    _val && (_val & (_val - 1)) == 0;
})

This check makes no sense on negative numbers.

~Andrew
Jan Beulich Oct. 25, 2019, 1:25 p.m. UTC | #3
On 25.10.2019 14:58, Andrew Cooper wrote:
> On 25/10/2019 13:24, Jan Beulich wrote:
>> On 23.10.2019 15:58, Andrew Cooper wrote:
>>> @@ -21,9 +28,15 @@ static inline unsigned long array_index_mask_nospec(unsigned long index,
>>>  {
>>>      unsigned long mask;
>>>  
>>> -    asm volatile ( "cmp %[size], %[index]; sbb %[mask], %[mask];"
>>> -                   : [mask] "=r" (mask)
>>> -                   : [size] "g" (size), [index] "r" (index) );
>>> +    if ( __builtin_constant_p(size) && IS_POWER_OF_2(size) )
>>> +    {
>>> +        mask = size - 1;
>>> +        OPTIMIZER_HIDE_VAR(mask);
>> I can't seem to be able to figure why you need this.
> 
> Because I found cases where the AND was elided by the compiler entirely
> without it.

Would you mind mentioning this in the description, or in a comment?

>>> --- a/xen/include/xen/config.h
>>> +++ b/xen/include/xen/config.h
>>> @@ -75,6 +75,7 @@
>>>  #define GB(_gb)     (_AC(_gb, ULL) << 30)
>>>  
>>>  #define IS_ALIGNED(val, align) (((val) & ((align) - 1)) == 0)
>>> +#define IS_POWER_OF_2(val) ((val) && IS_ALIGNED(val, val))
>> While the risk may seem low for someone to pass an expression with
>> side effect here, evaluating "val" up to three times here doesn't
>> look very desirable.
> 
> That is easy to fix.
> 
>> As a minor remark, without considering representation I'd expect
>> an expression IS_ALIGNED(val, val) to consistently produce "true"
>> for all non-zero values. E.g. 3 is certainly "aligned" on a
>> boundary of 3.
> 
> IS_ALIGNED() comes with an expectation of being against a power of 2,
> because otherwise you'd need a DIV instruction for the general case.
> 
> Some users can't cope with a compile time check.
> 
>> Finally this may want guarding against use on signed types - at
>> the very least it looks to produce the wrong answer for e.g.
>> INT_MIN or LONG_MIN. I.e. perhaps the expression to the left of
>> && wants to be (val) > 0.
> 
> How about the above expansion fix becoming:
> 
> ({
>     unsigned typeof(val) _val = val;
>     _val && (_val & (_val - 1)) == 0;
> })

Well, if the "unsigned typeof()" construct works - why not (with
"val" properly parenthesized, and preferable the leading underscore
changed to a trailing one).

> This check makes no sense on negative numbers.

Of course not, but someone might use it on a signed type and get
back true when it was supposed to be false, just because the value
used ended up being a negative number.

Jan

Patch
diff mbox series

diff --git a/xen/include/asm-x86/nospec.h b/xen/include/asm-x86/nospec.h
index 0039cd2713..4f36069eac 100644
--- a/xen/include/asm-x86/nospec.h
+++ b/xen/include/asm-x86/nospec.h
@@ -7,13 +7,20 @@ 
 #include <asm/alternative.h>
 
 /**
- * array_index_mask_nospec() - generate a mask that is ~0UL when the
- *      bounds check succeeds and 0 otherwise
+ * array_index_mask_nospec() - generate a mask to bound an array index
+ * which is safe even under adverse speculation.
  * @index: array element index
  * @size: number of elements in array
  *
- * Returns:
+ * In general, returns:
  *     0 - (index < size)
+ *
+ * This yeild ~0UL in within-bounds case, and 0 in the out-of-bounds
+ * case.
+ *
+ * When the compiler can determine that the array is a power of two, a
+ * lower overhead option is to mask the index with a single and
+ * instruction.
  */
 #define array_index_mask_nospec array_index_mask_nospec
 static inline unsigned long array_index_mask_nospec(unsigned long index,
@@ -21,9 +28,15 @@  static inline unsigned long array_index_mask_nospec(unsigned long index,
 {
     unsigned long mask;
 
-    asm volatile ( "cmp %[size], %[index]; sbb %[mask], %[mask];"
-                   : [mask] "=r" (mask)
-                   : [size] "g" (size), [index] "r" (index) );
+    if ( __builtin_constant_p(size) && IS_POWER_OF_2(size) )
+    {
+        mask = size - 1;
+        OPTIMIZER_HIDE_VAR(mask);
+    }
+    else
+        asm volatile ( "cmp %[size], %[index]; sbb %[mask], %[mask];"
+                       : [mask] "=r" (mask)
+                       : [size] "g" (size), [index] "r" (index) );
 
     return mask;
 }
diff --git a/xen/include/xen/config.h b/xen/include/xen/config.h
index a106380a23..21c763617c 100644
--- a/xen/include/xen/config.h
+++ b/xen/include/xen/config.h
@@ -75,6 +75,7 @@ 
 #define GB(_gb)     (_AC(_gb, ULL) << 30)
 
 #define IS_ALIGNED(val, align) (((val) & ((align) - 1)) == 0)
+#define IS_POWER_OF_2(val) ((val) && IS_ALIGNED(val, val))
 
 #define __STR(...) #__VA_ARGS__
 #define STR(...) __STR(__VA_ARGS__)
diff --git a/xen/include/xen/nospec.h b/xen/include/xen/nospec.h
index 7578210f16..cfc31f11b7 100644
--- a/xen/include/xen/nospec.h
+++ b/xen/include/xen/nospec.h
@@ -12,7 +12,8 @@ 
 #include <asm/nospec.h>
 
 /**
- * array_index_mask_nospec() - generate a ~0 mask when index < size, 0 otherwise
+ * array_index_mask_nospec() - generate a mask to bound an array index
+ * which is safe even under adverse speculation.
  * @index: array element index
  * @size: number of elements in array
  *