[RESEND,v2,1/2] KVM: VMX: use kvm_fast_pio_in for handling IN I/O
diff mbox

Message ID 20180308165727.23265-2-sean.j.christopherson@intel.com
State New
Headers show

Commit Message

Sean Christopherson March 8, 2018, 4:57 p.m. UTC
Fast emulation of processor I/O for IN was disabled on x86 (both VMX
and SVM) some years ago due to a buggy implementation.  The addition
of kvm_fast_pio_in(), used by SVM, re-introduced (functional!) fast
emulation of IN.  Piggyback SVM's work and use kvm_fast_pio_in() on
VMX instead of performing full emulation of IN.

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/kvm/vmx.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

Comments

Jim Mattson March 11, 2019, 5:07 p.m. UTC | #1
On Thu, Mar 8, 2018 at 8:57 AM Sean Christopherson
<sean.j.christopherson@intel.com> wrote:
>
> Fast emulation of processor I/O for IN was disabled on x86 (both VMX
> and SVM) some years ago due to a buggy implementation.  The addition
> of kvm_fast_pio_in(), used by SVM, re-introduced (functional!) fast
> emulation of IN.  Piggyback SVM's work and use kvm_fast_pio_in() on
> VMX instead of performing full emulation of IN.
>
> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>

This commit changes the userspace API in a significant way. Before
this commit, when kvm exited to userspace with exit_reason KVM_EXIT_IO
and io.direction KVM_EXIT_IO_IN, the guest %rip pointed to the 'in'
instruction. Now, the guest %rip will be advanced to the next
instruction before the exit to userspace. This seems like a bug, but
I'm not entirely sure, because (1) the details of this API aren't
documented anywhere, and (2) the behavior for 'in' now matches the
behavior for 'out' (but 'ins' and 'outs' still retain the original
'in' behavior). Moreover, Paolo has checked in a self-test that
depends on the new behavior. So, maybe I'm wrong in thinking that the
guest %rip should point to the I/O instruction when kvm exits to
userspace with exit_reason KVM_EXIT_IO?
Sean Christopherson March 11, 2019, 9:07 p.m. UTC | #2
On Mon, Mar 11, 2019 at 10:07:41AM -0700, Jim Mattson wrote:
> On Thu, Mar 8, 2018 at 8:57 AM Sean Christopherson
> <sean.j.christopherson@intel.com> wrote:
> >
> > Fast emulation of processor I/O for IN was disabled on x86 (both VMX
> > and SVM) some years ago due to a buggy implementation.  The addition
> > of kvm_fast_pio_in(), used by SVM, re-introduced (functional!) fast
> > emulation of IN.  Piggyback SVM's work and use kvm_fast_pio_in() on
> > VMX instead of performing full emulation of IN.
> >
> > Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
> > Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> 
> This commit changes the userspace API in a significant way. Before
> this commit, when kvm exited to userspace with exit_reason KVM_EXIT_IO
> and io.direction KVM_EXIT_IO_IN, the guest %rip pointed to the 'in'
> instruction. Now, the guest %rip will be advanced to the next
> instruction before the exit to userspace. This seems like a bug, but
> I'm not entirely sure, because (1) the details of this API aren't
> documented anywhere, and (2) the behavior for 'in' now matches the
> behavior for 'out' (but 'ins' and 'outs' still retain the original
> 'in' behavior). Moreover, Paolo has checked in a self-test that
> depends on the new behavior. So, maybe I'm wrong in thinking that the
> guest %rip should point to the I/O instruction when kvm exits to
> userspace with exit_reason KVM_EXIT_IO?

Yuck.  Semantically I think it's more correct for %rip to point at the
instruction than exited, e.g. if userspace explodes then it should be
able to yell about the IN/OUT instruction, not some random instruction
following the I/O.

However, after a blaming my way down the rabbit hole, updating %rip at
decode was a deliberate change:

  commit 0967b7bf1c22b55777aba46ff616547feed0b141
  Author: Avi Kivity <avi@qumranet.com>
  Date:   Sat Sep 15 17:34:36 2007 +0300

      KVM: Skip pio instruction when it is emulated, not executed

      If we defer updating rip until pio instructions are executed, we have a
      problem with reset:  a pio reset updates rip, and when the instruction
      completes we skip the emulated instruction, pointing rip somewhere completely
      unrelated.

      Fix by updating rip when we see decode the instruction, not after emulation.

      Signed-off-by: Avi Kivity <avi@qumranet.com>

Which makes sense, e.g. on reset via OUT 92h or CF9h, Qemu doesn't announce
to KVM that it's doing a reset, it just stuffs register state and starts
running again.

So, fixing this for IN is trivial.  One idea for fixing OUT would be to
not skip the instruction in the complete_userspace_io() callback if the
register state implies a reset and the OUT port is a known reset port.

Fun side topic: since OUTSB still increments %rip post-emulate, you can
reproduce the bug fixed by Avi by coercing your guest kernel into issuing
an I/O reset via an OUTSB.
Jim Mattson March 11, 2019, 9:19 p.m. UTC | #3
How does early %rip advancement interact with KVM_GUESTDBG_SINGLESTEP?
It seems a bit odd to get the KVM_EXIT_DEBUG for the I/O instruction
*before* the KVM_EXIT_IO for the I/O instruction.


On Mon, Mar 11, 2019 at 2:07 PM Sean Christopherson
<sean.j.christopherson@intel.com> wrote:
>
> On Mon, Mar 11, 2019 at 10:07:41AM -0700, Jim Mattson wrote:
> > On Thu, Mar 8, 2018 at 8:57 AM Sean Christopherson
> > <sean.j.christopherson@intel.com> wrote:
> > >
> > > Fast emulation of processor I/O for IN was disabled on x86 (both VMX
> > > and SVM) some years ago due to a buggy implementation.  The addition
> > > of kvm_fast_pio_in(), used by SVM, re-introduced (functional!) fast
> > > emulation of IN.  Piggyback SVM's work and use kvm_fast_pio_in() on
> > > VMX instead of performing full emulation of IN.
> > >
> > > Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
> > > Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> >
> > This commit changes the userspace API in a significant way. Before
> > this commit, when kvm exited to userspace with exit_reason KVM_EXIT_IO
> > and io.direction KVM_EXIT_IO_IN, the guest %rip pointed to the 'in'
> > instruction. Now, the guest %rip will be advanced to the next
> > instruction before the exit to userspace. This seems like a bug, but
> > I'm not entirely sure, because (1) the details of this API aren't
> > documented anywhere, and (2) the behavior for 'in' now matches the
> > behavior for 'out' (but 'ins' and 'outs' still retain the original
> > 'in' behavior). Moreover, Paolo has checked in a self-test that
> > depends on the new behavior. So, maybe I'm wrong in thinking that the
> > guest %rip should point to the I/O instruction when kvm exits to
> > userspace with exit_reason KVM_EXIT_IO?
>
> Yuck.  Semantically I think it's more correct for %rip to point at the
> instruction than exited, e.g. if userspace explodes then it should be
> able to yell about the IN/OUT instruction, not some random instruction
> following the I/O.
>
> However, after a blaming my way down the rabbit hole, updating %rip at
> decode was a deliberate change:
>
>   commit 0967b7bf1c22b55777aba46ff616547feed0b141
>   Author: Avi Kivity <avi@qumranet.com>
>   Date:   Sat Sep 15 17:34:36 2007 +0300
>
>       KVM: Skip pio instruction when it is emulated, not executed
>
>       If we defer updating rip until pio instructions are executed, we have a
>       problem with reset:  a pio reset updates rip, and when the instruction
>       completes we skip the emulated instruction, pointing rip somewhere completely
>       unrelated.
>
>       Fix by updating rip when we see decode the instruction, not after emulation.
>
>       Signed-off-by: Avi Kivity <avi@qumranet.com>
>
> Which makes sense, e.g. on reset via OUT 92h or CF9h, Qemu doesn't announce
> to KVM that it's doing a reset, it just stuffs register state and starts
> running again.
>
> So, fixing this for IN is trivial.  One idea for fixing OUT would be to
> not skip the instruction in the complete_userspace_io() callback if the
> register state implies a reset and the OUT port is a known reset port.
>
> Fun side topic: since OUTSB still increments %rip post-emulate, you can
> reproduce the bug fixed by Avi by coercing your guest kernel into issuing
> an I/O reset via an OUTSB.
Sean Christopherson March 11, 2019, 9:23 p.m. UTC | #4
On Mon, Mar 11, 2019 at 02:19:03PM -0700, Jim Mattson wrote:
> How does early %rip advancement interact with KVM_GUESTDBG_SINGLESTEP?
> It seems a bit odd to get the KVM_EXIT_DEBUG for the I/O instruction
> *before* the KVM_EXIT_IO for the I/O instruction.

It effectively suppresses the single step because vcpu->run->exit_reason
gets overwritten with KVM_EXIT_IO.  kvm_fast_pio() cheeses things and
calls kvm_fast_pio_{in,out}() even if kvm_skip_emulated_instruction()
requests an exit.
Jim Mattson March 11, 2019, 9:30 p.m. UTC | #5
Maybe it's just me, but that seems like just the kind of non-intuitive
behavior that should be documented. :-)

On Mon, Mar 11, 2019 at 2:23 PM Sean Christopherson
<sean.j.christopherson@intel.com> wrote:
>
> On Mon, Mar 11, 2019 at 02:19:03PM -0700, Jim Mattson wrote:
> > How does early %rip advancement interact with KVM_GUESTDBG_SINGLESTEP?
> > It seems a bit odd to get the KVM_EXIT_DEBUG for the I/O instruction
> > *before* the KVM_EXIT_IO for the I/O instruction.
>
> It effectively suppresses the single step because vcpu->run->exit_reason
> gets overwritten with KVM_EXIT_IO.  kvm_fast_pio() cheeses things and
> calls kvm_fast_pio_{in,out}() even if kvm_skip_emulated_instruction()
> requests an exit.
Sean Christopherson March 11, 2019, 9:33 p.m. UTC | #6
On Mon, Mar 11, 2019 at 02:07:32PM -0700, Sean Christopherson wrote:
> On Mon, Mar 11, 2019 at 10:07:41AM -0700, Jim Mattson wrote:
> > On Thu, Mar 8, 2018 at 8:57 AM Sean Christopherson
> > <sean.j.christopherson@intel.com> wrote:
> > >
> > > Fast emulation of processor I/O for IN was disabled on x86 (both VMX
> > > and SVM) some years ago due to a buggy implementation.  The addition
> > > of kvm_fast_pio_in(), used by SVM, re-introduced (functional!) fast
> > > emulation of IN.  Piggyback SVM's work and use kvm_fast_pio_in() on
> > > VMX instead of performing full emulation of IN.
> > >
> > > Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
> > > Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> > 
> > This commit changes the userspace API in a significant way. Before
> > this commit, when kvm exited to userspace with exit_reason KVM_EXIT_IO
> > and io.direction KVM_EXIT_IO_IN, the guest %rip pointed to the 'in'
> > instruction. Now, the guest %rip will be advanced to the next
> > instruction before the exit to userspace. This seems like a bug, but
> > I'm not entirely sure, because (1) the details of this API aren't
> > documented anywhere, and (2) the behavior for 'in' now matches the
> > behavior for 'out' (but 'ins' and 'outs' still retain the original
> > 'in' behavior). Moreover, Paolo has checked in a self-test that
> > depends on the new behavior. So, maybe I'm wrong in thinking that the
> > guest %rip should point to the I/O instruction when kvm exits to
> > userspace with exit_reason KVM_EXIT_IO?
> 
> Yuck.  Semantically I think it's more correct for %rip to point at the
> instruction than exited, e.g. if userspace explodes then it should be
> able to yell about the IN/OUT instruction, not some random instruction
> following the I/O.
> 
> However, after a blaming my way down the rabbit hole, updating %rip at
> decode was a deliberate change:
> 
>   commit 0967b7bf1c22b55777aba46ff616547feed0b141
>   Author: Avi Kivity <avi@qumranet.com>
>   Date:   Sat Sep 15 17:34:36 2007 +0300
> 
>       KVM: Skip pio instruction when it is emulated, not executed
> 
>       If we defer updating rip until pio instructions are executed, we have a
>       problem with reset:  a pio reset updates rip, and when the instruction
>       completes we skip the emulated instruction, pointing rip somewhere completely
>       unrelated.
> 
>       Fix by updating rip when we see decode the instruction, not after emulation.
> 
>       Signed-off-by: Avi Kivity <avi@qumranet.com>
> 
> Which makes sense, e.g. on reset via OUT 92h or CF9h, Qemu doesn't announce
> to KVM that it's doing a reset, it just stuffs register state and starts
> running again.
> 
> So, fixing this for IN is trivial.  One idea for fixing OUT would be to
> not skip the instruction in the complete_userspace_io() callback if the
> register state implies a reset and the OUT port is a known reset port.

A better idea would be to skip the %rip update if it's already dirty.
I think that would work even for things like INIT-SIPI-SIPI reset as
kvm_apic_accept_events() will mark %rip as dirty before the callback
is invoked.

I'll cook up a patch and see what happens...
  
> Fun side topic: since OUTSB still increments %rip post-emulate, you can
> reproduce the bug fixed by Avi by coercing your guest kernel into issuing
> an I/O reset via an OUTSB.
Sean Christopherson March 11, 2019, 11:02 p.m. UTC | #7
On Mon, Mar 11, 2019 at 02:07:32PM -0700, Sean Christopherson wrote:
> On Mon, Mar 11, 2019 at 10:07:41AM -0700, Jim Mattson wrote:
> > On Thu, Mar 8, 2018 at 8:57 AM Sean Christopherson
> > <sean.j.christopherson@intel.com> wrote:
> > >
> > > Fast emulation of processor I/O for IN was disabled on x86 (both VMX
> > > and SVM) some years ago due to a buggy implementation.  The addition
> > > of kvm_fast_pio_in(), used by SVM, re-introduced (functional!) fast
> > > emulation of IN.  Piggyback SVM's work and use kvm_fast_pio_in() on
> > > VMX instead of performing full emulation of IN.
> > >
> > > Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
> > > Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> > 
> > This commit changes the userspace API in a significant way. Before
> > this commit, when kvm exited to userspace with exit_reason KVM_EXIT_IO
> > and io.direction KVM_EXIT_IO_IN, the guest %rip pointed to the 'in'
> > instruction. Now, the guest %rip will be advanced to the next
> > instruction before the exit to userspace. This seems like a bug, but
> > I'm not entirely sure, because (1) the details of this API aren't
> > documented anywhere, and (2) the behavior for 'in' now matches the
> > behavior for 'out' (but 'ins' and 'outs' still retain the original
> > 'in' behavior). Moreover, Paolo has checked in a self-test that
> > depends on the new behavior. So, maybe I'm wrong in thinking that the
> > guest %rip should point to the I/O instruction when kvm exits to
> > userspace with exit_reason KVM_EXIT_IO?
> 
> Yuck.  Semantically I think it's more correct for %rip to point at the
> instruction than exited, e.g. if userspace explodes then it should be
> able to yell about the IN/OUT instruction, not some random instruction
> following the I/O.
> 
> However, after a blaming my way down the rabbit hole, updating %rip at
> decode was a deliberate change:
> 
>   commit 0967b7bf1c22b55777aba46ff616547feed0b141
>   Author: Avi Kivity <avi@qumranet.com>
>   Date:   Sat Sep 15 17:34:36 2007 +0300
> 
>       KVM: Skip pio instruction when it is emulated, not executed
> 
>       If we defer updating rip until pio instructions are executed, we have a
>       problem with reset:  a pio reset updates rip, and when the instruction
>       completes we skip the emulated instruction, pointing rip somewhere completely
>       unrelated.
> 
>       Fix by updating rip when we see decode the instruction, not after emulation.
> 
>       Signed-off-by: Avi Kivity <avi@qumranet.com>
> 
> Which makes sense, e.g. on reset via OUT 92h or CF9h, Qemu doesn't announce
> to KVM that it's doing a reset, it just stuffs register state and starts
> running again.
> 
> So, fixing this for IN is trivial.  One idea for fixing OUT would be to
> not skip the instruction in the complete_userspace_io() callback if the
> register state implies a reset and the OUT port is a known reset port.
> 
> Fun side topic: since OUTSB still increments %rip post-emulate, you can
> reproduce the bug fixed by Avi by coercing your guest kernel into issuing
> an I/O reset via an OUTSB.

Heh, OUTSB doesn't have buggy behavior, I just botched the asm...

Patch
diff mbox

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 051dab74e4e9..c8a8391e95e5 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -6231,15 +6231,15 @@  static int handle_io(struct kvm_vcpu *vcpu)
 
 	exit_qualification = vmcs_readl(EXIT_QUALIFICATION);
 	string = (exit_qualification & 16) != 0;
-	in = (exit_qualification & 8) != 0;
 
 	++vcpu->stat.io_exits;
 
-	if (string || in)
+	if (string)
 		return emulate_instruction(vcpu, 0) == EMULATE_DONE;
 
 	port = exit_qualification >> 16;
 	size = (exit_qualification & 7) + 1;
+	in = (exit_qualification & 8) != 0;
 
 	ret = kvm_skip_emulated_instruction(vcpu);
 
@@ -6247,7 +6247,10 @@  static int handle_io(struct kvm_vcpu *vcpu)
 	 * TODO: we might be squashing a KVM_GUESTDBG_SINGLESTEP-triggered
 	 * KVM_EXIT_DEBUG here.
 	 */
-	return kvm_fast_pio_out(vcpu, size, port) && ret;
+	if (in)
+		return kvm_fast_pio_in(vcpu, size, port) && ret;
+	else
+		return kvm_fast_pio_out(vcpu, size, port) && ret;
 }
 
 static void