
[3/3] KVM: x86: always stop emulation on page fault

Message ID 1566911210-30059-4-git-send-email-jan.dakinevich@virtuozzo.com (mailing list archive)
State New, archived
Series fix emulation error on Windows bootup

Commit Message

Jan Dakinevich Aug. 27, 2019, 1:07 p.m. UTC
inject_emulated_exception() returns true if and only if a nested page
fault happens. However, a page fault can come from a guest page table
walk, either nested or not nested. In both cases we should stop the
attempt to read under RIP and let the guest run its own page fault
handler.

Fixes: 6ea6e84 ("KVM: x86: inject exceptions produced by x86_decode_insn")
Cc: Denis Lunev <den@virtuozzo.com>
Cc: Roman Kagan <rkagan@virtuozzo.com>
Cc: Denis Plotnikov <dplotnikov@virtuozzo.com>
Signed-off-by: Jan Dakinevich <jan.dakinevich@virtuozzo.com>
---
 arch/x86/kvm/x86.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

Comments

Sean Christopherson Aug. 27, 2019, 2:50 p.m. UTC | #1
+Cc Peng Hao and Yi Wang

On Tue, Aug 27, 2019 at 01:07:09PM +0000, Jan Dakinevich wrote:
> inject_emulated_exception() returns true if and only if nested page
> fault happens. However, page fault can come from guest page tables
> walk, either nested or not nested. In both cases we should stop an
> attempt to read under RIP and give guest to step over its own page
> fault handler.
> 
> Fixes: 6ea6e84 ("KVM: x86: inject exceptions produced by x86_decode_insn")
> Cc: Denis Lunev <den@virtuozzo.com>
> Cc: Roman Kagan <rkagan@virtuozzo.com>
> Cc: Denis Plotnikov <dplotnikov@virtuozzo.com>
> Signed-off-by: Jan Dakinevich <jan.dakinevich@virtuozzo.com>
> ---
>  arch/x86/kvm/x86.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 93b0bd4..45caa69 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -6521,8 +6521,10 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu,
>  			if (reexecute_instruction(vcpu, cr2, write_fault_to_spt,
>  						emulation_type))
>  				return EMULATE_DONE;
> -			if (ctxt->have_exception && inject_emulated_exception(vcpu))
> +			if (ctxt->have_exception) {
> +				inject_emulated_exception(vcpu);
>  				return EMULATE_DONE;
> +			}


Yikes, this patch and the previous have quite the sordid history.


The non-void return from inject_emulated_exception() was added by commit

  ef54bcfeea6c ("KVM: x86: skip writeback on injection of nested exception")

for the purpose of skipping writeback.  At the time, the above blob in the
decode flow didn't exist.


Decode exception handling was added by commit

  6ea6e84309ca ("KVM: x86: inject exceptions produced by x86_decode_insn")

but it was dead code even then.  The patch discussion[1] even points out that
it was dead code, i.e. the change probably should have been reverted.


Peng Hao and Yi Wang later ran into what appears to be the same bug you're
hitting[2][3], and even had patches temporarily queued[4][5], but the
patches never made it to mainline as they broke kvm-unit-tests.  Fun side
note, Radim even pointed out[4] the bug fixed by patch 1/3.

So, the patches look correct, but there's the open question of why the
hypercall test was failing for Paolo.  I've tried to reproduce the #DF to
no avail.

[1] https://lore.kernel.org/patchwork/patch/850077/
[2] https://lkml.kernel.org/r/1537311828-4547-1-git-send-email-penghao122@sina.com.cn
[3] https://lkml.kernel.org/r/20190111133002.GA14852@flask
[4] https://lkml.kernel.org/r/20190111133002.GA14852@flask
[5] https://lkml.kernel.org/r/9835d255-dd9a-222b-f4a2-93611175b326@redhat.com

>  			if (emulation_type & EMULTYPE_SKIP)
>  				return EMULATE_FAIL;
>  			return handle_emulation_failure(vcpu, emulation_type);
> -- 
> 2.1.4
>
Sean Christopherson Aug. 27, 2019, 2:55 p.m. UTC | #2
Actually adding Peng Hao and Yi Wang...

On Tue, Aug 27, 2019 at 07:50:30AM -0700, Sean Christopherson wrote:
> +Cc Peng Hao and Yi Wang
> 
> On Tue, Aug 27, 2019 at 01:07:09PM +0000, Jan Dakinevich wrote:
> > inject_emulated_exception() returns true if and only if nested page
> > fault happens. However, page fault can come from guest page tables
> > walk, either nested or not nested. In both cases we should stop an
> > attempt to read under RIP and give guest to step over its own page
> > fault handler.
> > 
> > Fixes: 6ea6e84 ("KVM: x86: inject exceptions produced by x86_decode_insn")
> > Cc: Denis Lunev <den@virtuozzo.com>
> > Cc: Roman Kagan <rkagan@virtuozzo.com>
> > Cc: Denis Plotnikov <dplotnikov@virtuozzo.com>
> > Signed-off-by: Jan Dakinevich <jan.dakinevich@virtuozzo.com>
> > ---
> >  arch/x86/kvm/x86.c | 4 +++-
> >  1 file changed, 3 insertions(+), 1 deletion(-)
> > 
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index 93b0bd4..45caa69 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -6521,8 +6521,10 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu,
> >  			if (reexecute_instruction(vcpu, cr2, write_fault_to_spt,
> >  						emulation_type))
> >  				return EMULATE_DONE;
> > -			if (ctxt->have_exception && inject_emulated_exception(vcpu))
> > +			if (ctxt->have_exception) {
> > +				inject_emulated_exception(vcpu);
> >  				return EMULATE_DONE;
> > +			}
> 
> 
> Yikes, this patch and the previous have quite the sordid history.
> 
> 
> The non-void return from inject_emulated_exception() was added by commit
> 
>   ef54bcfeea6c ("KVM: x86: skip writeback on injection of nested exception")
> 
> for the purpose of skipping writeback.  At the time, the above blob in the
> decode flow didn't exist.
> 
> 
> Decode exception handling was added by commit
> 
>   6ea6e84309ca ("KVM: x86: inject exceptions produced by x86_decode_insn")
> 
> but it was dead code even then.  The patch discussion[1] even point out that
> it was dead code, i.e. the change probably should have been reverted.
> 
> 
> Peng Hao and Yi Wang later ran into what appears to be the same bug you're
> hitting[2][3], and even had patches temporarily queued[4][5], but the
> patches never made it to mainline as they broke kvm-unit-tests.  Fun side
> note, Radim even pointed out[4] the bug fixed by patch 1/3.
> 
> So, the patches look correct, but there's the open question of why the
> hypercall test was failing for Paolo.  I've tried to reproduce the #DF to
> no avail.
> 
> [1] https://lore.kernel.org/patchwork/patch/850077/
> [2] https://lkml.kernel.org/r/1537311828-4547-1-git-send-email-penghao122@sina.com.cn
> [3] https://lkml.kernel.org/r/20190111133002.GA14852@flask
> [4] https://lkml.kernel.org/r/20190111133002.GA14852@flask
> [5] https://lkml.kernel.org/r/9835d255-dd9a-222b-f4a2-93611175b326@redhat.com
> 
> >  			if (emulation_type & EMULTYPE_SKIP)
> >  				return EMULATE_FAIL;
> >  			return handle_emulation_failure(vcpu, emulation_type);
> > -- 
> > 2.1.4
> >
Jan Dakinevich Aug. 28, 2019, 10:19 a.m. UTC | #3
On Tue, 27 Aug 2019 07:50:30 -0700
Sean Christopherson <sean.j.christopherson@intel.com> wrote:

> +Cc Peng Hao and Yi Wang
> 
> On Tue, Aug 27, 2019 at 01:07:09PM +0000, Jan Dakinevich wrote:
> > inject_emulated_exception() returns true if and only if nested page
> > fault happens. However, page fault can come from guest page tables
> > walk, either nested or not nested. In both cases we should stop an
> > attempt to read under RIP and give guest to step over its own page
> > fault handler.
> > 
> > Fixes: 6ea6e84 ("KVM: x86: inject exceptions produced by x86_decode_insn")
> > Cc: Denis Lunev <den@virtuozzo.com>
> > Cc: Roman Kagan <rkagan@virtuozzo.com>
> > Cc: Denis Plotnikov <dplotnikov@virtuozzo.com>
> > Signed-off-by: Jan Dakinevich <jan.dakinevich@virtuozzo.com>
> > ---
> >  arch/x86/kvm/x86.c | 4 +++-
> >  1 file changed, 3 insertions(+), 1 deletion(-)
> > 
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index 93b0bd4..45caa69 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -6521,8 +6521,10 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu,
> >  			if (reexecute_instruction(vcpu, cr2, write_fault_to_spt,
> >  						emulation_type))
> >  				return EMULATE_DONE;
> > -			if (ctxt->have_exception && inject_emulated_exception(vcpu))
> > +			if (ctxt->have_exception) {
> > +				inject_emulated_exception(vcpu);
> >  				return EMULATE_DONE;
> > +			}
> 
> 
> Yikes, this patch and the previous have quite the sordid history.
> 
> 
> The non-void return from inject_emulated_exception() was added by commit
> 
>   ef54bcfeea6c ("KVM: x86: skip writeback on injection of nested exception")
> 
> for the purpose of skipping writeback.  At the time, the above blob in the
> decode flow didn't exist.
> 
> 
> Decode exception handling was added by commit
> 
>   6ea6e84309ca ("KVM: x86: inject exceptions produced by x86_decode_insn")
> 
> but it was dead code even then.  The patch discussion[1] even point out that
> it was dead code, i.e. the change probably should have been reverted.
> 
> 
> Peng Hao and Yi Wang later ran into what appears to be the same bug you're
> hitting[2][3], and even had patches temporarily queued[4][5], but the
> patches never made it to mainline as they broke kvm-unit-tests.  Fun side
> note, Radim even pointed out[4] the bug fixed by patch 1/3.
> 
> So, the patches look correct, but there's the open question of why the
> hypercall test was failing for Paolo.  

Sorry, I'm a little confused. Could you please point me to which test or
tests were broken? I've just run kvm-unit-tests and I see the same results
with and without my changes.

> I've tried to reproduce the #DF to
> no avail.
> 
> [1] https://lore.kernel.org/patchwork/patch/850077/
> [2] https://lkml.kernel.org/r/1537311828-4547-1-git-send-email-penghao122@sina.com.cn
> [3] https://lkml.kernel.org/r/20190111133002.GA14852@flask
> [4] https://lkml.kernel.org/r/20190111133002.GA14852@flask
> [5] https://lkml.kernel.org/r/9835d255-dd9a-222b-f4a2-93611175b326@redhat.com
> 
> >  			if (emulation_type & EMULTYPE_SKIP)
> >  				return EMULATE_FAIL;
> >  			return handle_emulation_failure(vcpu, emulation_type);
> > -- 
> > 2.1.4
> >
Sean Christopherson Aug. 28, 2019, 2:23 p.m. UTC | #4
On Wed, Aug 28, 2019 at 10:19:51AM +0000, Jan Dakinevich wrote:
> On Tue, 27 Aug 2019 07:50:30 -0700
> Sean Christopherson <sean.j.christopherson@intel.com> wrote:
> > Yikes, this patch and the previous have quite the sordid history.
> > 
> > 
> > The non-void return from inject_emulated_exception() was added by commit
> > 
> >   ef54bcfeea6c ("KVM: x86: skip writeback on injection of nested exception")
> > 
> > for the purpose of skipping writeback.  At the time, the above blob in the
> > decode flow didn't exist.
> > 
> > 
> > Decode exception handling was added by commit
> > 
> >   6ea6e84309ca ("KVM: x86: inject exceptions produced by x86_decode_insn")
> > 
> > but it was dead code even then.  The patch discussion[1] even point out that
> > it was dead code, i.e. the change probably should have been reverted.
> > 
> > 
> > Peng Hao and Yi Wang later ran into what appears to be the same bug you're
> > hitting[2][3], and even had patches temporarily queued[4][5], but the
> > patches never made it to mainline as they broke kvm-unit-tests.  Fun side
> > note, Radim even pointed out[4] the bug fixed by patch 1/3.
> > 
> > So, the patches look correct, but there's the open question of why the
> > hypercall test was failing for Paolo.  
> 
> Sorry, I'm little confused. Could you please, point me which test or tests 
> were broken? I've just run kvm-unit-test and I see same results with and 
> without my changes.
> 
> > I've tried to reproduce the #DF to
> > no avail.

Aha!  The #DF occurs if patch 2/3, but not patch 3/3, is applied, and the
VMware backdoor is enabled.  The backdoor is off by default, which is why
only Paolo was seeing the #DF.

To handle the VMware backdoor, KVM intercepts #GP faults, which includes
the non-canonical #GP from the hypercall unit test.  With only patch 2/3
applied, x86_emulate_instruction() injects a #GP for the non-canonical RIP
but returns EMULATE_FAIL instead of EMULATE_DONE.  EMULATE_FAIL causes
handle_exception_nmi() (or gp_interception() for SVM) to re-inject the
original #GP because it thinks emulation failed due to a non-VMware opcode.

Applying patch 3/3 resolves the issue as x86_emulate_instruction() returns
EMULATE_DONE after injecting the #GP.
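
For illustration only, the failure sequence described above can be modeled
with a toy state machine.  This is a sketch, not KVM code: toy_vcpu,
toy_queue_gp(), toy_emulate() and toy_gp_intercept() are hypothetical names,
and the rule "a second #GP on top of a pending #GP becomes a #DF" is a
simplification of how contributory exceptions are merged.

```c
#include <stdbool.h>

/* Toy model only: the names echo the thread, but none of this is KVM code. */
enum emul_result { EMULATE_DONE, EMULATE_FAIL };
enum pending_exc { EXC_NONE, EXC_GP, EXC_DF };

struct toy_vcpu {
	enum pending_exc pending;	/* exception queued for the guest */
};

/* Queue a #GP; a #GP on top of an already pending exception becomes #DF. */
static void toy_queue_gp(struct toy_vcpu *vcpu)
{
	vcpu->pending = (vcpu->pending == EXC_NONE) ? EXC_GP : EXC_DF;
}

/*
 * Decode-failure path for a non-canonical RIP.  The #GP is injected in
 * both cases; patch3_applied == false models "patch 2/3 only", where the
 * function still reports failure to its caller.
 */
static enum emul_result toy_emulate(struct toy_vcpu *vcpu, bool patch3_applied)
{
	toy_queue_gp(vcpu);
	return patch3_applied ? EMULATE_DONE : EMULATE_FAIL;
}

/* #GP intercept handler, as used when the VMware backdoor is enabled. */
static enum pending_exc toy_gp_intercept(bool patch3_applied)
{
	struct toy_vcpu vcpu = { .pending = EXC_NONE };

	/* On failure the handler assumes a non-VMware opcode and re-injects. */
	if (toy_emulate(&vcpu, patch3_applied) == EMULATE_FAIL)
		toy_queue_gp(&vcpu);

	return vcpu.pending;
}
```

In this model, toy_gp_intercept(false) leaves EXC_DF pending because the #GP
is queued twice, while toy_gp_intercept(true) leaves a single EXC_GP.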


TL;DR:

Swap the order of patches and everything should be hunky dory.  Please
rebase to the latest kvm/queue, which has an equivalent to patch 1/3.

Patch

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 93b0bd4..45caa69 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -6521,8 +6521,10 @@  int x86_emulate_instruction(struct kvm_vcpu *vcpu,
 			if (reexecute_instruction(vcpu, cr2, write_fault_to_spt,
 						emulation_type))
 				return EMULATE_DONE;
-			if (ctxt->have_exception && inject_emulated_exception(vcpu))
+			if (ctxt->have_exception) {
+				inject_emulated_exception(vcpu);
 				return EMULATE_DONE;
+			}
 			if (emulation_type & EMULTYPE_SKIP)
 				return EMULATE_FAIL;
 			return handle_emulation_failure(vcpu, emulation_type);
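
As a rough sketch of what the hunk above changes, the old and new conditions
on the decode-failure path can be compared as two small predicates.  The
names toy_old_path(), toy_new_path() and enum toy_outcome are hypothetical;
the second parameter stands in for the return value of
inject_emulated_exception(), which the commit message says is true only for
a nested page fault.

```c
#include <stdbool.h>

/* Hypothetical outcomes of the decode-failure branch in this toy model. */
enum toy_outcome {
	TOY_STOP_DONE,		/* return EMULATE_DONE, let the guest handle it */
	TOY_FALL_THROUGH	/* continue to the EMULTYPE_SKIP / failure checks */
};

/*
 * Before the patch: stop only when an exception is pending AND the
 * injection helper returned true.  When it returned false the exception
 * had still been injected, but the code fell through anyway.
 */
static enum toy_outcome toy_old_path(bool have_exception, bool inject_ret)
{
	if (have_exception && inject_ret)
		return TOY_STOP_DONE;
	return TOY_FALL_THROUGH;
}

/* After the patch: any pending exception stops emulation. */
static enum toy_outcome toy_new_path(bool have_exception, bool inject_ret)
{
	(void)inject_ret;	/* the helper's return value is ignored */
	if (have_exception)
		return TOY_STOP_DONE;
	return TOY_FALL_THROUGH;
}
```

The two predicates differ only for have_exception == true with
inject_ret == false, which is the non-nested page fault case the commit
message is about.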