diff mbox series

x86/emul: Adjust handling of CR8_LEGACY

Message ID 20250326152558.350103-1-andrew.cooper3@citrix.com (mailing list archive)
State New
Headers show
Series x86/emul: Adjust handling of CR8_LEGACY | expand

Commit Message

Andrew Cooper March 26, 2025, 3:25 p.m. UTC
The CR8_LEGACY feature was introduced in the K8 Revision F.  It doesn't exist
in prior revisions of the K8.

Furthermore, from APM Vol2 3.1.5 CR8 (Task Priority Register, TPR):

  The AMD64 architecture introduces a new control register, CR8, defined as
  the task priority register (TPR).

Give CR8_LEGACY a dependency on LM, seeing as CR8 doesn't exist on pre-64bit
CPUs.

Additionally, from APM Vol3 4 System Instructions MOV CRn:

  CR8 can be read and written in 64-bit mode, using a REX prefix.  CR8 can be
  read and written in all modes using a LOCK prefix instead of a REX prefix to
  specify the additional opcode bit.

i.e. the LOCK prefix serves as an alternative encoding for REX.R.

Switch decode_twobyte() from += 8 to |= 8 to better match the description
given.  Other indications that the encoding isn't additive are that the CR
intercepts stop at 15, that LOCK MOV CR8 generates #UD rather than becoming a
CR0 access.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Roger Pau Monné <roger.pau@citrix.com>

Also, designers never put an ADD into silicon if they can possibly avoid it,
because it's slow and large compared to the single OR gate needed in this
case.
---
 xen/arch/x86/x86_emulate/decode.c |  4 ++--
 xen/tools/gen-cpuid.py            | 10 ++++++----
 2 files changed, 8 insertions(+), 6 deletions(-)


base-commit: 38adc2d7879c9a68b21a1dddb09d9c9f34d15ee4

Comments

Jan Beulich March 26, 2025, 3:33 p.m. UTC | #1
On 26.03.2025 16:25, Andrew Cooper wrote:
> The CR8_LEGACY feature was introduced in the K8 Revision F.  It doesn't exist
> in prior revisions of the K8.
> 
> Furthermore, from APM Vol2 3.1.5 CR8 (Task Priority Register, TPR):
> 
>   The AMD64 architecture introduces a new control register, CR8, defined as
>   the task priority register (TPR).
> 
> Give CR8_LEGACY a dependency on LM, seeing as CR8 doesn't exist on pre-64bit
> CPUs.

But that's not what LM stands for in the dependencies. If you want to run a
guest strictly as 32-bit one, you could suppress exposure of LM. That
shouldn't also suppress the ability to access CR8 then, though - the LOCK
way of accessing was - aiui - specifically added to allow access to it from
a 32-bit kernel.

> Additionally, from APM Vol3 4 System Instructions MOV CRn:
> 
>   CR8 can be read and written in 64-bit mode, using a REX prefix.  CR8 can be
>   read and written in all modes using a LOCK prefix instead of a REX prefix to
>   specify the additional opcode bit.
> 
> i.e. the LOCK prefix serves as an alternative encoding for REX.R.
> 
> Switch decode_twobyte() from += 8 to |= 8 to better match the description
> given.  Other indications that the encoding isn't additive are that the CR
> intercepts stop at 15, that LOCK MOV CR8 generates #UD rather than becoming a
> CR0 access.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> ---
> CC: Jan Beulich <JBeulich@suse.com>
> CC: Roger Pau Monné <roger.pau@citrix.com>
> 
> Also, designers never put an ADD into silicon if they can possibly avoid it,
> because it's slow and large compared to the single OR gate needed in this
> case.

My reading of the respective doc was never resulting in something unambiguous.
I find this argument far more convincing than anything that's in the doc.

I probably should have tried out REX + LOCK. Maybe I even did and concluded
from it faulting that the two things are cumulative.

Jan
diff mbox series

Patch

diff --git a/xen/arch/x86/x86_emulate/decode.c b/xen/arch/x86/x86_emulate/decode.c
index 7ce97c4726dc..e4a17ea2cd59 100644
--- a/xen/arch/x86/x86_emulate/decode.c
+++ b/xen/arch/x86/x86_emulate/decode.c
@@ -777,10 +777,10 @@  decode_twobyte(struct x86_emulate_state *s,
     case 0x20: case 0x22: /* mov to/from cr */
         if ( s->lock_prefix && vcpu_has_cr8_legacy() )
         {
-            s->modrm_reg += 8;
+            s->modrm_reg |= 8;
             s->lock_prefix = false;
         }
-        /* fall through */
+        fallthrough;
     case 0x21: case 0x23: /* mov to/from dr */
         ASSERT(s->ea.type == OP_REG); /* Early operand adjustment ensures this. */
         generate_exception_if(s->lock_prefix, X86_EXC_UD);
diff --git a/xen/tools/gen-cpuid.py b/xen/tools/gen-cpuid.py
index a77cb30bdb4f..bb7ac5291589 100755
--- a/xen/tools/gen-cpuid.py
+++ b/xen/tools/gen-cpuid.py
@@ -271,10 +271,12 @@  def crunch_numbers(state):
 
         # CX16 is only encodable in Long Mode.  LAHF_LM indicates that the
         # SAHF/LAHF instructions are reintroduced in Long Mode.  1GB
-        # superpages, PCID and PKU are only available in 4 level paging.
-        # NO_LMSL indicates the absense of Long Mode Segment Limits, which
-        # have been dropped in hardware.
-        LM: [CX16, PCID, LAHF_LM, PAGE1GB, PKU, NO_LMSL, AMX_TILE, CMPCCXADD],
+        # superpages, PCID and PKU are only available in 4 level paging.  The
+        # CR8 register only exists in the AMD64 architecture.  NO_LMSL
+        # indicates the absense of Long Mode Segment Limits, which have been
+        # dropped in hardware.
+        LM: [CX16, PCID, LAHF_LM, CR8_LEGACY, PAGE1GB, PKU, NO_LMSL,
+             AMX_TILE, CMPCCXADD],
 
         # AMD K6-2+ and K6-III processors shipped with 3DNow+, beyond the
         # standard 3DNow in the earlier K6 processors.