KVM: x86: Don't update RIP or do single-step on faulting emulation
diff mbox series

Message ID 20190823205544.24052-1-sean.j.christopherson@intel.com
State New
Headers show
Series
  • KVM: x86: Don't update RIP or do single-step on faulting emulation
Related show

Commit Message

Sean Christopherson Aug. 23, 2019, 8:55 p.m. UTC
Don't advance RIP or inject a single-step #DB if emulation signals a
fault.  This logic applies to all state updates that are conditional on
clean retirement of the emulation instruction, e.g. updating RFLAGS was
previously handled by commit 38827dbd3fb85 ("KVM: x86: Do not update
EFLAGS on faulting emulation").

Not advancing RIP is likely a nop, i.e. ctxt->eip isn't updated with
ctxt->_eip until emulation "retires" anyways.  Skipping #DB injection
fixes a bug reported by Andy Lutomirski where a #UD on SYSCALL due to
invalid state with RFLAGS.RF=1 would loop indefinitely due to emulation
overwriting the #UD with #DB and thus restarting the bad SYSCALL over
and over.

Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: stable@vger.kernel.org
Reported-by: Andy Lutomirski <luto@kernel.org>
Fixes: 663f4c61b803 ("KVM: x86: handle singlestep during emulation")
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---

Note, this has minor conflict with my recent series to cleanup the
emulator return flows[*].  The end result should look something like:

                if (!ctxt->have_exception ||
                    exception_type(ctxt->exception.vector) == EXCPT_TRAP) {
                        kvm_rip_write(vcpu, ctxt->eip);
                        if (r && ctxt->tf)
                                r = kvm_vcpu_do_singlestep(vcpu);
                        __kvm_set_rflags(vcpu, ctxt->eflags);
                }

[*] https://lkml.kernel.org/r/20190823010709.24879-1-sean.j.christopherson@intel.com

 arch/x86/kvm/x86.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

Comments

Nadav Amit Aug. 23, 2019, 9:42 p.m. UTC | #1
> On Aug 23, 2019, at 1:55 PM, Sean Christopherson <sean.j.christopherson@intel.com> wrote:
> 
> Don't advance RIP or inject a single-step #DB if emulation signals a
> fault.  This logic applies to all state updates that are conditional on
> clean retirement of the emulation instruction, e.g. updating RFLAGS was
> previously handled by commit 38827dbd3fb85 ("KVM: x86: Do not update
> EFLAGS on faulting emulation").
> 
> Not advancing RIP is likely a nop, i.e. ctxt->eip isn't updated with
> ctxt->_eip until emulation "retires" anyways.  Skipping #DB injection
> fixes a bug reported by Andy Lutomirski where a #UD on SYSCALL due to
> invalid state with RFLAGS.RF=1 would loop indefinitely due to emulation
> overwriting the #UD with #DB and thus restarting the bad SYSCALL over
> and over.
> 
> Cc: Nadav Amit <nadav.amit@gmail.com>
> Cc: stable@vger.kernel.org
> Reported-by: Andy Lutomirski <luto@kernel.org>
> Fixes: 663f4c61b803 ("KVM: x86: handle singlestep during emulation")
> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>

Seems fine. I guess I should’ve found it before…

Consider running the relevant self-tests (e.g., single_test_syscall) to
avoid regressions.
Andy Lutomirski Aug. 23, 2019, 10:46 p.m. UTC | #2
On Fri, Aug 23, 2019 at 1:55 PM Sean Christopherson
<sean.j.christopherson@intel.com> wrote:
>
> Don't advance RIP or inject a single-step #DB if emulation signals a
> fault.  This logic applies to all state updates that are conditional on
> clean retirement of the emulation instruction, e.g. updating RFLAGS was
> previously handled by commit 38827dbd3fb85 ("KVM: x86: Do not update
> EFLAGS on faulting emulation").
>
> Not advancing RIP is likely a nop, i.e. ctxt->eip isn't updated with
> ctxt->_eip until emulation "retires" anyways.  Skipping #DB injection
> fixes a bug reported by Andy Lutomirski where a #UD on SYSCALL due to
> invalid state with RFLAGS.RF=1 would loop indefinitely due to emulation

EFLAGS.TF=1
Sean Christopherson Aug. 26, 2019, 2:06 p.m. UTC | #3
On Fri, Aug 23, 2019 at 03:46:20PM -0700, Andy Lutomirski wrote:
> On Fri, Aug 23, 2019 at 1:55 PM Sean Christopherson
> <sean.j.christopherson@intel.com> wrote:
> >
> > Don't advance RIP or inject a single-step #DB if emulation signals a
> > fault.  This logic applies to all state updates that are conditional on
> > clean retirement of the emulation instruction, e.g. updating RFLAGS was
> > previously handled by commit 38827dbd3fb85 ("KVM: x86: Do not update
> > EFLAGS on faulting emulation").
> >
> > Not advancing RIP is likely a nop, i.e. ctxt->eip isn't updated with
> > ctxt->_eip until emulation "retires" anyways.  Skipping #DB injection
> > fixes a bug reported by Andy Lutomirski where a #UD on SYSCALL due to
> > invalid state with RFLAGS.RF=1 would loop indefinitely due to emulation
> 
> EFLAGS.TF=1

It's always some mundane detail...
Radim Krčmář Aug. 27, 2019, 6:35 p.m. UTC | #4
2019-08-23 13:55-0700, Sean Christopherson:
> Don't advance RIP or inject a single-step #DB if emulation signals a
> fault.  This logic applies to all state updates that are conditional on
> clean retirement of the emulation instruction, e.g. updating RFLAGS was
> previously handled by commit 38827dbd3fb85 ("KVM: x86: Do not update
> EFLAGS on faulting emulation").
> 
> Not advancing RIP is likely a nop, i.e. ctxt->eip isn't updated with
> ctxt->_eip until emulation "retires" anyways.  Skipping #DB injection
> fixes a bug reported by Andy Lutomirski where a #UD on SYSCALL due to
> invalid state with RFLAGS.RF=1 would loop indefinitely due to emulation
> overwriting the #UD with #DB and thus restarting the bad SYSCALL over
> and over.
> 
> Cc: Nadav Amit <nadav.amit@gmail.com>
> Cc: stable@vger.kernel.org
> Reported-by: Andy Lutomirski <luto@kernel.org>
> Fixes: 663f4c61b803 ("KVM: x86: handle singlestep during emulation")
> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> ---
> 
> Note, this has minor conflict with my recent series to cleanup the
> emulator return flows[*].  The end result should look something like:
> 
>                 if (!ctxt->have_exception ||
>                     exception_type(ctxt->exception.vector) == EXCPT_TRAP) {
>                         kvm_rip_write(vcpu, ctxt->eip);
>                         if (r && ctxt->tf)
>                                 r = kvm_vcpu_do_singlestep(vcpu);
>                         __kvm_set_rflags(vcpu, ctxt->eflags);
>                 }
> 
> [*] https://lkml.kernel.org/r/20190823010709.24879-1-sean.j.christopherson@intel.com
> 
>  arch/x86/kvm/x86.c | 9 +++++----
>  1 file changed, 5 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index b4cfd786d0b6..d2962671c3d3 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -6611,12 +6611,13 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu,
>  		unsigned long rflags = kvm_x86_ops->get_rflags(vcpu);
>  		toggle_interruptibility(vcpu, ctxt->interruptibility);
>  		vcpu->arch.emulate_regs_need_sync_to_vcpu = false;
> -		kvm_rip_write(vcpu, ctxt->eip);
> -		if (r == EMULATE_DONE && ctxt->tf)
> -			kvm_vcpu_do_singlestep(vcpu, &r);
>  		if (!ctxt->have_exception ||
> -		    exception_type(ctxt->exception.vector) == EXCPT_TRAP)
> +		    exception_type(ctxt->exception.vector) == EXCPT_TRAP) {

Hm, EXCPT_TRAP is either #OF, #BP, or another #DB, none of which we want
to override.  The first two disable TF and the last one is the same as
its fault variant must take other path, so it works out in the end...

I've fixed the RF in commit message when applying, thanks.

---
We still seem to have at least a minor problem with single stepping:

SDM, Interrupt 1—Debug Exception (#DB):

  The following items detail the treatment of debug exceptions on the
  instruction boundary following execution of the MOV or the POP
  instruction that loads the SS register:
    • If EFLAGS.TF is 1, no single-step trap is generated.

I think a check for KVM_X86_SHADOW_INT_MOV_SS in
kvm_vcpu_do_singlestep() is missing.
Jim Mattson Aug. 27, 2019, 7:12 p.m. UTC | #5
On Fri, Aug 23, 2019 at 1:55 PM Sean Christopherson
<sean.j.christopherson@intel.com> wrote:
>
> Don't advance RIP or inject a single-step #DB if emulation signals a
> fault.  This logic applies to all state updates that are conditional on
> clean retirement of the emulation instruction, e.g. updating RFLAGS was
> previously handled by commit 38827dbd3fb85 ("KVM: x86: Do not update
> EFLAGS on faulting emulation").
>
> Not advancing RIP is likely a nop, i.e. ctxt->eip isn't updated with
> ctxt->_eip until emulation "retires" anyways.  Skipping #DB injection
> fixes a bug reported by Andy Lutomirski where a #UD on SYSCALL due to
> invalid state with RFLAGS.RF=1 would loop indefinitely due to emulation
> overwriting the #UD with #DB and thus restarting the bad SYSCALL over
> and over.
>
> Cc: Nadav Amit <nadav.amit@gmail.com>
> Cc: stable@vger.kernel.org
> Reported-by: Andy Lutomirski <luto@kernel.org>
> Fixes: 663f4c61b803 ("KVM: x86: handle singlestep during emulation")
> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> ---
>
> Note, this has minor conflict with my recent series to cleanup the
> emulator return flows[*].  The end result should look something like:
>
>                 if (!ctxt->have_exception ||
>                     exception_type(ctxt->exception.vector) == EXCPT_TRAP) {
>                         kvm_rip_write(vcpu, ctxt->eip);
>                         if (r && ctxt->tf)
>                                 r = kvm_vcpu_do_singlestep(vcpu);
>                         __kvm_set_rflags(vcpu, ctxt->eflags);
>                 }
>
> [*] https://lkml.kernel.org/r/20190823010709.24879-1-sean.j.christopherson@intel.com
>
>  arch/x86/kvm/x86.c | 9 +++++----
>  1 file changed, 5 insertions(+), 4 deletions(-)
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index b4cfd786d0b6..d2962671c3d3 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -6611,12 +6611,13 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu,
>                 unsigned long rflags = kvm_x86_ops->get_rflags(vcpu);
>                 toggle_interruptibility(vcpu, ctxt->interruptibility);
>                 vcpu->arch.emulate_regs_need_sync_to_vcpu = false;
> -               kvm_rip_write(vcpu, ctxt->eip);
> -               if (r == EMULATE_DONE && ctxt->tf)
> -                       kvm_vcpu_do_singlestep(vcpu, &r);
>                 if (!ctxt->have_exception ||
> -                   exception_type(ctxt->exception.vector) == EXCPT_TRAP)
> +                   exception_type(ctxt->exception.vector) == EXCPT_TRAP) {

NYC, but...

I don't think this check for "exception_type" is quite right.  A
general detect fault (which can be synthesized by check_dr_read) is
mischaracterized by exception_type() as a trap. Or maybe I'm missing
something? (I often am.)

> +                       kvm_rip_write(vcpu, ctxt->eip);
> +                       if (r == EMULATE_DONE && ctxt->tf)
> +                               kvm_vcpu_do_singlestep(vcpu, &r);
>                         __kvm_set_rflags(vcpu, ctxt->eflags);
> +               }
>
>                 /*
>                  * For STI, interrupts are shadowed; so KVM_REQ_EVENT will
> --
> 2.22.0
>
Sean Christopherson Aug. 27, 2019, 7:49 p.m. UTC | #6
On Tue, Aug 27, 2019 at 12:12:51PM -0700, Jim Mattson wrote:
> On Fri, Aug 23, 2019 at 1:55 PM Sean Christopherson
> <sean.j.christopherson@intel.com> wrote:
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -6611,12 +6611,13 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu,
> >                 unsigned long rflags = kvm_x86_ops->get_rflags(vcpu);
> >                 toggle_interruptibility(vcpu, ctxt->interruptibility);
> >                 vcpu->arch.emulate_regs_need_sync_to_vcpu = false;
> > -               kvm_rip_write(vcpu, ctxt->eip);
> > -               if (r == EMULATE_DONE && ctxt->tf)
> > -                       kvm_vcpu_do_singlestep(vcpu, &r);
> >                 if (!ctxt->have_exception ||
> > -                   exception_type(ctxt->exception.vector) == EXCPT_TRAP)
> > +                   exception_type(ctxt->exception.vector) == EXCPT_TRAP) {
> 
> NYC, but...
> 
> I don't think this check for "exception_type" is quite right.  A
> general detect fault (which can be synthesized by check_dr_read) is
> mischaracterized by exception_type() as a trap. Or maybe I'm missing
> something? (I often am.)

Pretty sure you're not missing anything.

And while we're poking holes in #DB emulation, int1/icebp isn't emulated
correctly as it should be reinjected with INTR_TYPE_PRIV_SW_EXCEPTION, not
as a INTR_TYPE_HARD_EXCEPTION.  The CPU automically clears DR7.GD on #DB,
unless the #DB is due to int1...

Patch
diff mbox series

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index b4cfd786d0b6..d2962671c3d3 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -6611,12 +6611,13 @@  int x86_emulate_instruction(struct kvm_vcpu *vcpu,
 		unsigned long rflags = kvm_x86_ops->get_rflags(vcpu);
 		toggle_interruptibility(vcpu, ctxt->interruptibility);
 		vcpu->arch.emulate_regs_need_sync_to_vcpu = false;
-		kvm_rip_write(vcpu, ctxt->eip);
-		if (r == EMULATE_DONE && ctxt->tf)
-			kvm_vcpu_do_singlestep(vcpu, &r);
 		if (!ctxt->have_exception ||
-		    exception_type(ctxt->exception.vector) == EXCPT_TRAP)
+		    exception_type(ctxt->exception.vector) == EXCPT_TRAP) {
+			kvm_rip_write(vcpu, ctxt->eip);
+			if (r == EMULATE_DONE && ctxt->tf)
+				kvm_vcpu_do_singlestep(vcpu, &r);
 			__kvm_set_rflags(vcpu, ctxt->eflags);
+		}
 
 		/*
 		 * For STI, interrupts are shadowed; so KVM_REQ_EVENT will