
[for-4.13] x86/cpuid: Fix Lisbon/Magny-Cours Opterons WRT SSSE3/SSE4A

Message ID 20191119170019.18450-1-andrew.cooper3@citrix.com (mailing list archive)
State New, archived

Commit Message

Andrew Cooper Nov. 19, 2019, 5 p.m. UTC
c/s ff66ccefe5 "x86/CPUID: adjust SSEn dependencies" made SSE4A depend on
SSSE3, but these processors really do have SSE4A without SSSE3.

This manifests as an upgrade regression, as the SSE4A feature disappears from
view.

Adjust the SSE4A feature to depend on SSE3 rather than SSSE3.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Wei Liu <wl@xen.org>
CC: Roger Pau Monné <roger.pau@citrix.com>
CC: Juergen Gross <jgross@suse.com>

For 4.13.  Regression from 4.12
---
 xen/tools/gen-cpuid.py | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)
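
The regression described above can be sketched with a small, hypothetical
model of the dependency table. This is not the code in gen-cpuid.py or in
the hypervisor: plain strings stand in for the feature constants, only the
SSE chain from the hunk is modelled, and visible_features() is an invented
helper that hides any feature whose prerequisites are not all present.

```python
# Hypothetical, simplified model: each key lists the features that depend on it.
# Only the SSE chain touched by this patch is included (an assumption).
OLD_DEPS = {
    "SSE2": ["SSE3"],
    "SSE3": ["SSSE3"],
    "SSSE3": ["SSE4_1", "SSE4A"],   # before the patch: SSE4A hangs off SSSE3
    "SSE4_1": ["SSE4_2"],
}
NEW_DEPS = {
    "SSE2": ["SSE3"],
    "SSE3": ["SSSE3", "SSE4A"],     # after the patch: SSE4A hangs off SSE3
    "SSSE3": ["SSE4_1"],
    "SSE4_1": ["SSE4_2"],
}


def visible_features(hw_features, deps):
    """Return hw_features minus anything whose prerequisites are missing."""
    # Invert the table: feature -> set of direct prerequisites.
    prereqs = {}
    for parent, children in deps.items():
        for child in children:
            prereqs.setdefault(child, set()).add(parent)

    visible = set(hw_features)
    changed = True
    while changed:                  # repeat until nothing more is dropped
        changed = False
        for feat in sorted(visible):
            if not prereqs.get(feat, set()) <= visible:
                visible.discard(feat)
                changed = True
    return visible


# Illustrative feature set for a part with SSE4A but no SSSE3.
magny_cours = {"SSE2", "SSE3", "SSE4A"}

print(sorted(visible_features(magny_cours, OLD_DEPS)))
print(sorted(visible_features(magny_cours, NEW_DEPS)))
```

Under these assumptions the old table drops SSE4A because SSSE3 is absent,
while the new table keeps it because only SSE3 is required.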

Comments

Jürgen Groß Nov. 19, 2019, 5:05 p.m. UTC | #1
On 19.11.19 18:00, Andrew Cooper wrote:
> c/s ff66ccefe5 "x86/CPUID: adjust SSEn dependencies" made SSE4A depend on
> SSSE3, but these processors really do have SSE4A without SSSE3.
> 
> This manifests as an upgrade regression, as the SSE4A feature disappears from
> view.
> 
> Adjust the SSE4A feature to depend on SSE3 rather than SSSE3.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

Release-acked-by: Juergen Gross <jgross@suse.com>


Juergen

Jan Beulich Nov. 20, 2019, 7:27 a.m. UTC | #2
On 19.11.2019 18:00, Andrew Cooper wrote:
> c/s ff66ccefe5 "x86/CPUID: adjust SSEn dependencies" made SSE4A depend on
> SSSE3, but these processors really do have SSE4A without SSSE3.
> 
> This manifests as an upgrade regression, as the SSE4A feature disappears from
> view.
> 
> Adjust the SSE4A feature to depend on SSE3 rather than SSSE3.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>

Nevertheless a remark:

> --- a/xen/tools/gen-cpuid.py
> +++ b/xen/tools/gen-cpuid.py
> @@ -205,9 +205,10 @@ def crunch_numbers(state):
>          # than to SSE.
>          SSE2: [SSE3, LM, AESNI, PCLMULQDQ, SHA, GFNI],
>  
> -        # Other SSEn each depend on their predecessor versions.
> -        SSE3: [SSSE3],
> -        SSSE3: [SSE4_1, SSE4A],
> +        # Other SSEn each depend on their predecessor versions.  AMD
> +        # Lisbon/Magny-Cours processors implemented SSE4A without SSSE3.
> +        SSE3: [SSSE3, SSE4A],
> +        SSSE3: [SSE4_1],
>          SSE4_1: [SSE4_2],

If we took the SDM at its word, this would still be too restrictive.
I suppose, though, that it is more a copy-and-paste effect that has led to:

"Before an application attempts to use the SIMD subset of SSE3 extensions,
 the application should follow the steps illustrated in Section 11.6.2,
 “Checking for SSE/SSE2 Support.” Next, use the additional step provided
 below:
 • Check that the processor supports the SIMD and x87 SSE3 extensions (if
   CPUID.01H:ECX.SSE3[bit 0] = 1)."

"Before an application attempts to use the SSSE3 extensions, the application
 should follow the steps illustrated in Section 11.6.2, “Checking for
 SSE/SSE2 Support.” Next, use the additional step provided below:
 • Check that the processor supports SSSE3 (if CPUID.01H:ECX.SSSE3[bit 9] = 1)."

This would imply that SSSE3 takes only SSE2 (and implicitly SSE) as a prereq.
By contrast:

"Before an application attempts to use SSE4.1 instructions, the application
 should follow the steps illustrated in Section 11.6.2, “Checking for
 SSE/SSE2 Support.” Next, use the additional step provided below:
 Check that the processor supports SSE4.1 (if
 CPUID.01H:ECX.SSE4_1[bit 19] = 1), SSE3 (if CPUID.01H:ECX.SSE3[bit 0] = 1),
 and SSSE3 (if CPUID.01H:ECX.SSSE3[bit 9] = 1)."

Similar text is there for SSE4.2, which additionally takes SSE4.1 as a prereq.

I'll fire off an email to Intel requesting clarification.

Jan
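
The literal SDM reading discussed above can be compared against the patched
table with a short sketch. Everything here is illustrative: the
SDM_LITERAL_PREREQS sets are transcribed only from the passages quoted in
this thread, SSE itself is left out for brevity, and stricter_than_sdm() is
a hypothetical helper rather than anything in gen-cpuid.py.

```python
# Patched dependency table from the hunk (SSE chain only, strings as stand-ins).
PATCHED_DEPS = {
    "SSE2": ["SSE3"],
    "SSE3": ["SSSE3", "SSE4A"],
    "SSSE3": ["SSE4_1"],
    "SSE4_1": ["SSE4_2"],
}

# Prerequisites per the quoted SDM text, read literally (assumption: SSE is
# omitted, and only the features quoted in the thread are listed).
SDM_LITERAL_PREREQS = {
    "SSE3": {"SSE2"},
    "SSSE3": {"SSE2"},
    "SSE4_1": {"SSE2", "SSE3", "SSSE3"},
    "SSE4_2": {"SSE2", "SSE3", "SSSE3", "SSE4_1"},
}


def transitive_prereqs(feature, deps):
    """Collect every direct and indirect prerequisite of feature in deps."""
    parents = {p for p, children in deps.items() if feature in children}
    result = set(parents)
    for parent in parents:
        result |= transitive_prereqs(parent, deps)
    return result


def stricter_than_sdm(deps, sdm):
    """Map each feature to prerequisites the table imposes but the SDM does not."""
    extra = {}
    for feature, sdm_prereqs in sdm.items():
        surplus = transitive_prereqs(feature, deps) - sdm_prereqs
        if surplus:
            extra[feature] = surplus
    return extra


print(stricter_than_sdm(PATCHED_DEPS, SDM_LITERAL_PREREQS))
```

With these inputs the only surplus requirement reported is SSE3 for SSSE3,
which is the "still too restrictive" case raised in the remark.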

Patch

diff --git a/xen/tools/gen-cpuid.py b/xen/tools/gen-cpuid.py
index 434a6ebf04..2e76f9abc0 100755
--- a/xen/tools/gen-cpuid.py
+++ b/xen/tools/gen-cpuid.py
@@ -205,9 +205,10 @@  def crunch_numbers(state):
         # than to SSE.
         SSE2: [SSE3, LM, AESNI, PCLMULQDQ, SHA, GFNI],
 
-        # Other SSEn each depend on their predecessor versions.
-        SSE3: [SSSE3],
-        SSSE3: [SSE4_1, SSE4A],
+        # Other SSEn each depend on their predecessor versions.  AMD
+        # Lisbon/Magny-Cours processors implemented SSE4A without SSSE3.
+        SSE3: [SSSE3, SSE4A],
+        SSSE3: [SSE4_1],
         SSE4_1: [SSE4_2],
 
         # AMD specify no relationship between POPCNT and SSE4.2.  Intel