diff mbox series

[2/2,4.17] x86emul: pull permission check ahead for REP INS/OUTS

Message ID 4d01771c-fd02-b607-c480-04bcb11fa7b3@suse.com (mailing list archive)
State New, archived
Headers show
Series x86emul: string insn corrections | expand

Commit Message

Jan Beulich Oct. 6, 2022, 1:11 p.m. UTC
Based on observations on a fair range of hardware from both primary
vendors even zero-iteration-count instances of these insns perform the
port related permission checking first.

Fixes: fe300600464c ("x86: Fix emulation of REP prefix")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
Partly RFC for this not being documented anywhere; inquiry pending.

The referenced commit is still not really the one, but before it REP
handling was so broken that I didn't want to go hunt further.

Comments

Andrew Cooper Oct. 10, 2022, 6:08 p.m. UTC | #1
On 06/10/2022 14:11, Jan Beulich wrote:
> Based on observations on a fair range of hardware from both primary
> vendors even zero-iteration-count instances of these insns perform the
> port related permission checking first.
>
> Fixes: fe300600464c ("x86: Fix emulation of REP prefix")
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> ---
> Partly RFC for this not being documented anywhere; inquiry pending.

Intel do actually document this in two roundabout ways.

1) The order of checks in the pseudocode.  Multiple times in the past,
Intel have said that the order of checks in pseudocode is authoritative.

2) This paragraph I've just found at the end of the INS description.

"These instructions may read from the I/O port without writing to the
memory location if an exception or VM exit occurs due to the write (e.g.
#PF). If this would be problematic, for example because the I/O port
read has side-effects, software should ensure the write to the memory
location does not cause an exception or VM exit."

This makes it clear that the IO port is read before the memory operand
is interpreted.  (As a tangent, while the SDM statement is all true,
it's entirely useless advice for e.g. a migrating VM.)


Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>, preferably with
some of ^ discussed in the commit message.

>
> The referenced commit is still not really the one, but before it REP
> handling was so broken that I didn't want to go hunt further.
>
> --- a/xen/arch/x86/x86_emulate/x86_emulate.c
> +++ b/xen/arch/x86/x86_emulate/x86_emulate.c
> @@ -4248,14 +4248,15 @@ x86_emulate(
>          goto imul;
>  
>      case 0x6c ... 0x6d: /* ins %dx,%es:%edi */ {
> -        unsigned long nr_reps = get_rep_prefix(false, false);
> +        unsigned long nr_reps;
>          unsigned int port = _regs.dx;
>  
>          dst.bytes = !(b & 1) ? 1 : (op_bytes == 8) ? 4 : op_bytes;
> -        dst.mem.seg = x86_seg_es;
> -        dst.mem.off = truncate_ea_and_reps(_regs.r(di), nr_reps, dst.bytes);
>          if ( (rc = ioport_access_check(port, dst.bytes, ctxt, ops)) != 0 )
>              goto done;
> +        nr_reps = get_rep_prefix(false, false);
> +        dst.mem.off = truncate_ea_and_reps(_regs.r(di), nr_reps, dst.bytes);
> +        dst.mem.seg = x86_seg_es;

As a further observation, both the Intel and AMD manuals elude to the
use of unsegmented memory space for the 64bit forms of these.

However, as both %ds (outs) and %es (ins) ignore their bases in 64bit
mode, I can't think of any practical consequences of conditionally not
using x86_seg_none here.

~Andrew
Jan Beulich Oct. 11, 2022, 10:20 a.m. UTC | #2
On 10.10.2022 20:08, Andrew Cooper wrote:
> On 06/10/2022 14:11, Jan Beulich wrote:
>> Based on observations on a fair range of hardware from both primary
>> vendors even zero-iteration-count instances of these insns perform the
>> port related permission checking first.
>>
>> Fixes: fe300600464c ("x86: Fix emulation of REP prefix")
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>> ---
>> Partly RFC for this not being documented anywhere; inquiry pending.
> 
> Intel do actually document this in two roundabout ways.
> 
> 1) The order of checks in the pseudocode.  Multiple times in the past,
> Intel have said that the order of checks in pseudocode is authoritative.

Which pseudo code are you referring to here? There's none I could find
for REP, and the INS and OUTS pages doesn't describe this specific
behavior for REP INS / REP OUTS. Instead, if the description of REP
was authoritative, then

WHILE CountReg ≠ 0
DO
...
OD;

would mean the entire INS/OUTS operation is contained in the body of
that loop, leading to no possible exceptions when the count is zero.

> 2) This paragraph I've just found at the end of the INS description.
> 
> "These instructions may read from the I/O port without writing to the
> memory location if an exception or VM exit occurs due to the write (e.g.
> #PF). If this would be problematic, for example because the I/O port
> read has side-effects, software should ensure the write to the memory
> location does not cause an exception or VM exit."
> 
> This makes it clear that the IO port is read before the memory operand
> is interpreted.  (As a tangent, while the SDM statement is all true,
> it's entirely useless advice for e.g. a migrating VM.)

I, too, had noticed that paragraph. But as above it adds no clarity
whatsoever for the count == 0 case.

> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>, preferably with
> some of ^ discussed in the commit message.

Thanks, I'll apply this provisionally as I'll need to wait for an ack
from Henry anyway. In the meantime you might clarify whether my
responses above (which mean no further discussion in the description
for there being nothing to refer to) don't find your agreement.

>> --- a/xen/arch/x86/x86_emulate/x86_emulate.c
>> +++ b/xen/arch/x86/x86_emulate/x86_emulate.c
>> @@ -4248,14 +4248,15 @@ x86_emulate(
>>          goto imul;
>>  
>>      case 0x6c ... 0x6d: /* ins %dx,%es:%edi */ {
>> -        unsigned long nr_reps = get_rep_prefix(false, false);
>> +        unsigned long nr_reps;
>>          unsigned int port = _regs.dx;
>>  
>>          dst.bytes = !(b & 1) ? 1 : (op_bytes == 8) ? 4 : op_bytes;
>> -        dst.mem.seg = x86_seg_es;
>> -        dst.mem.off = truncate_ea_and_reps(_regs.r(di), nr_reps, dst.bytes);
>>          if ( (rc = ioport_access_check(port, dst.bytes, ctxt, ops)) != 0 )
>>              goto done;
>> +        nr_reps = get_rep_prefix(false, false);
>> +        dst.mem.off = truncate_ea_and_reps(_regs.r(di), nr_reps, dst.bytes);
>> +        dst.mem.seg = x86_seg_es;
> 
> As a further observation, both the Intel and AMD manuals elude to the
> use of unsegmented memory space for the 64bit forms of these.
> 
> However, as both %ds (outs) and %es (ins) ignore their bases in 64bit
> mode, I can't think of any practical consequences of conditionally not
> using x86_seg_none here.

I find "not using" irritating, but perhaps I'm simply not reading this
the way it was meant. I'm convinced the memory accesses by these insns
are normal ones, so using ES: means linear address unconditionally (for
INS) in 64-bit mode. For OUTS, however, I don't think an FS: or GS:
override would be ignored. The SDM text also doesn't read as if it
would, to me at least. What is it that you have derived your reply
from? "In 64-bit mode, ..., and 64-bit address is specified using RSI
by default" (for OUTS) doesn't say anything about the segment override
being ignored, and earlier text actually talks about the possibility of
an override, without restricting that to any subset of modes.

Jan
Henry Wang Oct. 11, 2022, 10:52 a.m. UTC | #3
Hi Jan,

> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Subject: Re: [PATCH 2/2][4.17] x86emul: pull permission check ahead for REP
> INS/OUTS
> 
> On 10.10.2022 20:08, Andrew Cooper wrote:
> > On 06/10/2022 14:11, Jan Beulich wrote:
> >> Based on observations on a fair range of hardware from both primary
> >> vendors even zero-iteration-count instances of these insns perform the
> >> port related permission checking first.
> >>
> >> Fixes: fe300600464c ("x86: Fix emulation of REP prefix")
> >> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> > Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>, preferably
> with
> > some of ^ discussed in the commit message.
> 
> Thanks, I'll apply this provisionally as I'll need to wait for an ack
> from Henry anyway.

Sorry I was actually waiting for the review/ack in the Patch#1 of this series
so that I can ack them together after they are properly reviewed...

> In the meantime you might clarify whether my
> responses above (which mean no further discussion in the description
> for there being nothing to refer to) don't find your agreement.

...Since IIUC this patch is also trying to harden the code, so as long as you
and Andrew reach the agreement of this patch, you can have my:

Release-acked-by: Henry Wang <Henry.Wang@arm.com>

Kind regards,
Henry
diff mbox series

Patch

--- a/xen/arch/x86/x86_emulate/x86_emulate.c
+++ b/xen/arch/x86/x86_emulate/x86_emulate.c
@@ -4248,14 +4248,15 @@  x86_emulate(
         goto imul;
 
     case 0x6c ... 0x6d: /* ins %dx,%es:%edi */ {
-        unsigned long nr_reps = get_rep_prefix(false, false);
+        unsigned long nr_reps;
         unsigned int port = _regs.dx;
 
         dst.bytes = !(b & 1) ? 1 : (op_bytes == 8) ? 4 : op_bytes;
-        dst.mem.seg = x86_seg_es;
-        dst.mem.off = truncate_ea_and_reps(_regs.r(di), nr_reps, dst.bytes);
         if ( (rc = ioport_access_check(port, dst.bytes, ctxt, ops)) != 0 )
             goto done;
+        nr_reps = get_rep_prefix(false, false);
+        dst.mem.off = truncate_ea_and_reps(_regs.r(di), nr_reps, dst.bytes);
+        dst.mem.seg = x86_seg_es;
         /* Try the presumably most efficient approach first. */
         if ( !ops->rep_ins )
             nr_reps = 1;
@@ -4289,13 +4290,14 @@  x86_emulate(
     }
 
     case 0x6e ... 0x6f: /* outs %esi,%dx */ {
-        unsigned long nr_reps = get_rep_prefix(false, false);
+        unsigned long nr_reps;
         unsigned int port = _regs.dx;
 
         dst.bytes = !(b & 1) ? 1 : (op_bytes == 8) ? 4 : op_bytes;
-        ea.mem.off = truncate_ea_and_reps(_regs.r(si), nr_reps, dst.bytes);
         if ( (rc = ioport_access_check(port, dst.bytes, ctxt, ops)) != 0 )
             goto done;
+        nr_reps = get_rep_prefix(false, false);
+        ea.mem.off = truncate_ea_and_reps(_regs.r(si), nr_reps, dst.bytes);
         /* Try the presumably most efficient approach first. */
         if ( !ops->rep_outs )
             nr_reps = 1;