
[v3,3/4] kvm: vmx: Add last_cpu to struct vcpu_vmx

Message ID 20200601222416.71303-4-jmattson@google.com (mailing list archive)
State New, archived
Series: Add logical CPU to KVM_EXIT_FAIL_ENTRY info

Commit Message

Jim Mattson June 1, 2020, 10:24 p.m. UTC
As we already do in svm, record the last logical processor on which a
vCPU has run, so that it can be communicated to userspace for
potential hardware errors.

Signed-off-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Oliver Upton <oupton@google.com>
Reviewed-by: Peter Shier <pshier@google.com>
---
 arch/x86/kvm/vmx/vmx.c | 1 +
 arch/x86/kvm/vmx/vmx.h | 3 +++
 2 files changed, 4 insertions(+)

Comments

Sean Christopherson June 2, 2020, 1:21 a.m. UTC | #1
On Mon, Jun 01, 2020 at 03:24:15PM -0700, Jim Mattson wrote:
> As we already do in svm, record the last logical processor on which a
> vCPU has run, so that it can be communicated to userspace for
> potential hardware errors.
> 
> Signed-off-by: Jim Mattson <jmattson@google.com>
> Reviewed-by: Oliver Upton <oupton@google.com>
> Reviewed-by: Peter Shier <pshier@google.com>
> ---
>  arch/x86/kvm/vmx/vmx.c | 1 +
>  arch/x86/kvm/vmx/vmx.h | 3 +++
>  2 files changed, 4 insertions(+)
> 
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 170cc76a581f..42856970d3b8 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -6730,6 +6730,7 @@ static fastpath_t vmx_vcpu_run(struct kvm_vcpu *vcpu)
>  	if (vcpu->arch.cr2 != read_cr2())
>  		write_cr2(vcpu->arch.cr2);
>  
> +	vmx->last_cpu = vcpu->cpu;

This is redundant in the EXIT_FASTPATH_REENTER_GUEST case.  Setting it
before reenter_guest is technically wrong if emulation_required is true, but
that doesn't seem like it'd be an issue in practice.
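To illustrate the redundancy Sean is pointing out, the fastpath loop can be modeled roughly as below. This is a standalone userspace sketch, not kernel code: the names `run_once`, `vcpu_run_model`, and the `assignments` counter are invented for the model, and preemption being disabled across the loop is represented simply by `vcpu->cpu` never changing.

```c
#include <assert.h>

/* Toy model of the vmx_vcpu_run() fastpath loop (NOT kernel code).
 * On EXIT_FASTPATH_REENTER_GUEST the guest is re-entered without returning
 * to the outer run loop, and preemption stays disabled throughout, so
 * vcpu->cpu cannot change between iterations -- re-assigning last_cpu on
 * every iteration is redundant after the first. */
enum fastpath { EXIT_FASTPATH_NONE, EXIT_FASTPATH_REENTER_GUEST };

struct vcpu { int cpu; };
struct vmx  { int last_cpu; };

static int assignments;  /* counts how often last_cpu is written */

static enum fastpath run_once(struct vmx *vmx, struct vcpu *vcpu,
                              int reenter_left)
{
	vmx->last_cpu = vcpu->cpu;  /* the line the patch adds */
	assignments++;
	return reenter_left ? EXIT_FASTPATH_REENTER_GUEST : EXIT_FASTPATH_NONE;
}

static int vcpu_run_model(struct vmx *vmx, struct vcpu *vcpu, int reenters)
{
	/* Preemption is disabled across the whole loop, so vcpu->cpu is
	 * stable; every write after the first stores the same value. */
	while (run_once(vmx, vcpu, reenters) == EXIT_FASTPATH_REENTER_GUEST)
		reenters--;
	return vmx->last_cpu;
}
```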

>  	vmx->fail = __vmx_vcpu_run(vmx, (unsigned long *)&vcpu->arch.regs,
>  				   vmx->loaded_vmcs->launched);
>  
> diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
> index 672c28f17e49..8a1e833cf4fb 100644
> --- a/arch/x86/kvm/vmx/vmx.h
> +++ b/arch/x86/kvm/vmx/vmx.h
> @@ -302,6 +302,9 @@ struct vcpu_vmx {
>  	u64 ept_pointer;
>  
>  	struct pt_desc pt_desc;
> +
> +	/* which host CPU was used for running this vcpu */
> +	unsigned int last_cpu;

Why not put this in struct kvm_vcpu_arch?  I'd also vote to name it
last_run_cpu, as last_cpu is super misleading.

And if it's in arch, what about setting it in vcpu_enter_guest?

>  };
>  
>  enum ept_pointers_status {
> -- 
> 2.27.0.rc2.251.g90737beb825-goog
>
Jim Mattson June 2, 2020, 5:33 p.m. UTC | #2
On Mon, Jun 1, 2020 at 6:21 PM Sean Christopherson
<sean.j.christopherson@intel.com> wrote:
>
> On Mon, Jun 01, 2020 at 03:24:15PM -0700, Jim Mattson wrote:
> > As we already do in svm, record the last logical processor on which a
> > vCPU has run, so that it can be communicated to userspace for
> > potential hardware errors.
> >
> > Signed-off-by: Jim Mattson <jmattson@google.com>
> > Reviewed-by: Oliver Upton <oupton@google.com>
> > Reviewed-by: Peter Shier <pshier@google.com>
> > ---
> >  arch/x86/kvm/vmx/vmx.c | 1 +
> >  arch/x86/kvm/vmx/vmx.h | 3 +++
> >  2 files changed, 4 insertions(+)
> >
> > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> > index 170cc76a581f..42856970d3b8 100644
> > --- a/arch/x86/kvm/vmx/vmx.c
> > +++ b/arch/x86/kvm/vmx/vmx.c
> > @@ -6730,6 +6730,7 @@ static fastpath_t vmx_vcpu_run(struct kvm_vcpu *vcpu)
> >       if (vcpu->arch.cr2 != read_cr2())
> >               write_cr2(vcpu->arch.cr2);
> >
> > +     vmx->last_cpu = vcpu->cpu;
>
> This is redundant in the EXIT_FASTPATH_REENTER_GUEST case.  Setting it
> before reenter_guest is technically wrong if emulation_required is true, but
> that doesn't seem like it'd be an issue in practice.

I really would like to capture the last logical processor to execute
VMLAUNCH/VMRESUME (or VMRUN on the AMD side) on behalf of this vCPU.

> >       vmx->fail = __vmx_vcpu_run(vmx, (unsigned long *)&vcpu->arch.regs,
> >                                  vmx->loaded_vmcs->launched);
> >
> > diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
> > index 672c28f17e49..8a1e833cf4fb 100644
> > --- a/arch/x86/kvm/vmx/vmx.h
> > +++ b/arch/x86/kvm/vmx/vmx.h
> > @@ -302,6 +302,9 @@ struct vcpu_vmx {
> >       u64 ept_pointer;
> >
> >       struct pt_desc pt_desc;
> > +
> > +     /* which host CPU was used for running this vcpu */
> > +     unsigned int last_cpu;
>
> Why not put this in struct kvm_vcpu_arch?  I'd also vote to name it
> last_run_cpu, as last_cpu is super misleading.

I think last_run_cpu may also be misleading, since in the cases of
interest, nothing actually 'ran.' Maybe last_attempted_vmentry_cpu?

> And if it's in arch, what about setting it in vcpu_enter_guest?

As you point out above, this isn't entirely accurate. (But that's the
way we roll in kvm, isn't it? :-)

> >  };
> >
> >  enum ept_pointers_status {
> > --
> > 2.27.0.rc2.251.g90737beb825-goog
> >
Sean Christopherson June 3, 2020, 2:24 a.m. UTC | #3
On Tue, Jun 02, 2020 at 10:33:51AM -0700, Jim Mattson wrote:
> On Mon, Jun 1, 2020 at 6:21 PM Sean Christopherson
> <sean.j.christopherson@intel.com> wrote:
> >
> > On Mon, Jun 01, 2020 at 03:24:15PM -0700, Jim Mattson wrote:
> > > As we already do in svm, record the last logical processor on which a
> > > vCPU has run, so that it can be communicated to userspace for
> > > potential hardware errors.
> > >
> > > Signed-off-by: Jim Mattson <jmattson@google.com>
> > > Reviewed-by: Oliver Upton <oupton@google.com>
> > > Reviewed-by: Peter Shier <pshier@google.com>
> > > ---
> > >  arch/x86/kvm/vmx/vmx.c | 1 +
> > >  arch/x86/kvm/vmx/vmx.h | 3 +++
> > >  2 files changed, 4 insertions(+)
> > >
> > > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> > > index 170cc76a581f..42856970d3b8 100644
> > > --- a/arch/x86/kvm/vmx/vmx.c
> > > +++ b/arch/x86/kvm/vmx/vmx.c
> > > @@ -6730,6 +6730,7 @@ static fastpath_t vmx_vcpu_run(struct kvm_vcpu *vcpu)
> > >       if (vcpu->arch.cr2 != read_cr2())
> > >               write_cr2(vcpu->arch.cr2);
> > >
> > > +     vmx->last_cpu = vcpu->cpu;
> >
> > This is redundant in the EXIT_FASTPATH_REENTER_GUEST case.  Setting it
> > before reenter_guest is technically wrong if emulation_required is true, but
> > that doesn't seem like it'd be an issue in practice.
> 
> I really would like to capture the last logical processor to execute
> VMLAUNCH/VMRESUME (or VMRUN on the AMD side) on behalf of this vCPU.

Does it matter though?  The flows that consume the variable are all directly
in the VM-Exit path.

> > >       vmx->fail = __vmx_vcpu_run(vmx, (unsigned long *)&vcpu->arch.regs,
> > >                                  vmx->loaded_vmcs->launched);
> > >
> > > diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
> > > index 672c28f17e49..8a1e833cf4fb 100644
> > > --- a/arch/x86/kvm/vmx/vmx.h
> > > +++ b/arch/x86/kvm/vmx/vmx.h
> > > @@ -302,6 +302,9 @@ struct vcpu_vmx {
> > >       u64 ept_pointer;
> > >
> > >       struct pt_desc pt_desc;
> > > +
> > > +     /* which host CPU was used for running this vcpu */
> > > +     unsigned int last_cpu;
> >
> > Why not put this in struct kvm_vcpu_arch?  I'd also vote to name it
> > last_run_cpu, as last_cpu is super misleading.
> 
> I think last_run_cpu may also be misleading, since in the cases of
> interest, nothing actually 'ran.' Maybe last_attempted_vmentry_cpu?

Ya, that thought crossed my mind as well.

> > And if it's in arch, what about setting it in vcpu_enter_guest?
> 
> As you point out above, this isn't entirely accurate. (But that's the
> way we roll in kvm, isn't it? :-)

As an alternative to storing the last run/attempted CPU, what about moving
the "bad VM-Exit" detection into handle_exit_irqoff, or maybe a new hook
that is called after IRQs are enabled but before preemption is enabled, e.g.
detect_bad_exit or something?  All of the paths in patch 4/4 can easily be
moved out of handle_exit.  VMX would require a little bit of refactoring for
its "no handler" check, but that should be minor.
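The ordering Sean proposes (after IRQs are enabled, before preemption is re-enabled) can be sketched as below. This is a hypothetical userspace model: `detect_bad_exit` is the name floated in the mail, and the other names (`vcpu_state`, `vcpu_exit_path`, `reported_cpu`) are invented for illustration; the stub enable functions stand in for the real `local_irq_enable()`/`preempt_enable()`.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the proposed exit-handling order (NOT kernel code).
 * Because detect_bad_exit() runs before preemption is re-enabled, the
 * current CPU is still the one that took the VM-Exit, so no stashed
 * last_cpu field is needed to report it. */
struct vcpu_state {
	int  exit_cpu;      /* CPU that executed VMLAUNCH/VMRESUME        */
	bool bad_exit;      /* e.g. VM-Entry failure reported by hardware */
	int  reported_cpu;  /* CPU communicated to userspace              */
};

static void local_irq_enable_stub(void) { /* IRQs back on */ }
static void preempt_enable_stub(void)   { /* task may migrate from here */ }

static void detect_bad_exit(struct vcpu_state *v, int current_cpu)
{
	/* Runs with preemption still disabled: current_cpu is the exit CPU. */
	if (v->bad_exit)
		v->reported_cpu = current_cpu;
}

static void vcpu_exit_path(struct vcpu_state *v)
{
	local_irq_enable_stub();
	detect_bad_exit(v, v->exit_cpu);  /* still pinned to the exit CPU */
	preempt_enable_stub();            /* only now can migration occur */
}
```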
Jim Mattson June 3, 2020, 8:18 p.m. UTC | #4
On Tue, Jun 2, 2020 at 7:24 PM Sean Christopherson
<sean.j.christopherson@intel.com> wrote:
>
> On Tue, Jun 02, 2020 at 10:33:51AM -0700, Jim Mattson wrote:
> > On Mon, Jun 1, 2020 at 6:21 PM Sean Christopherson
> > <sean.j.christopherson@intel.com> wrote:
> > >
> > > On Mon, Jun 01, 2020 at 03:24:15PM -0700, Jim Mattson wrote:
> > > > As we already do in svm, record the last logical processor on which a
> > > > vCPU has run, so that it can be communicated to userspace for
> > > > potential hardware errors.
> > > >
> > > > Signed-off-by: Jim Mattson <jmattson@google.com>
> > > > Reviewed-by: Oliver Upton <oupton@google.com>
> > > > Reviewed-by: Peter Shier <pshier@google.com>
> > > > ---
> > > >  arch/x86/kvm/vmx/vmx.c | 1 +
> > > >  arch/x86/kvm/vmx/vmx.h | 3 +++
> > > >  2 files changed, 4 insertions(+)
> > > >
> > > > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> > > > index 170cc76a581f..42856970d3b8 100644
> > > > --- a/arch/x86/kvm/vmx/vmx.c
> > > > +++ b/arch/x86/kvm/vmx/vmx.c
> > > > @@ -6730,6 +6730,7 @@ static fastpath_t vmx_vcpu_run(struct kvm_vcpu *vcpu)
> > > >       if (vcpu->arch.cr2 != read_cr2())
> > > >               write_cr2(vcpu->arch.cr2);
> > > >
> > > > +     vmx->last_cpu = vcpu->cpu;
> > >
> > > This is redundant in the EXIT_FASTPATH_REENTER_GUEST case.  Setting it
> > > before reenter_guest is technically wrong if emulation_required is true, but
> > > that doesn't seem like it'd be an issue in practice.
> >
> > I really would like to capture the last logical processor to execute
> > VMLAUNCH/VMRESUME (or VMRUN on the AMD side) on behalf of this vCPU.
>
> Does it matter though?  The flows that consume the variable are all directly
> in the VM-Exit path.
>
> > > >       vmx->fail = __vmx_vcpu_run(vmx, (unsigned long *)&vcpu->arch.regs,
> > > >                                  vmx->loaded_vmcs->launched);
> > > >
> > > > diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
> > > > index 672c28f17e49..8a1e833cf4fb 100644
> > > > --- a/arch/x86/kvm/vmx/vmx.h
> > > > +++ b/arch/x86/kvm/vmx/vmx.h
> > > > @@ -302,6 +302,9 @@ struct vcpu_vmx {
> > > >       u64 ept_pointer;
> > > >
> > > >       struct pt_desc pt_desc;
> > > > +
> > > > +     /* which host CPU was used for running this vcpu */
> > > > +     unsigned int last_cpu;
> > >
> > > Why not put this in struct kvm_vcpu_arch?  I'd also vote to name it
> > > last_run_cpu, as last_cpu is super misleading.
> >
> > I think last_run_cpu may also be misleading, since in the cases of
> > interest, nothing actually 'ran.' Maybe last_attempted_vmentry_cpu?
>
> Ya, that thought crossed my mind as well.
>
> > > And if it's in arch, what about setting it in vcpu_enter_guest?
> >
> > As you point out above, this isn't entirely accurate. (But that's the
> > way we roll in kvm, isn't it? :-)
>
> As an alternative to storing the last run/attempted CPU, what about moving
> the "bad VM-Exit" detection into handle_exit_irqoff, or maybe a new hook
> that is called after IRQs are enabled but before preemption is enabled, e.g.
> detect_bad_exit or something?  All of the paths in patch 4/4 can easily be
> moved out of handle_exit.  VMX would require a little bit of refactoring for
> its "no handler" check, but that should be minor.

Given the alternatives, I'm willing to compromise my principles wrt
emulation_required. :-) I'll send out v4 soon.
Sean Christopherson June 4, 2020, 6:46 p.m. UTC | #5
On Wed, Jun 03, 2020 at 01:18:31PM -0700, Jim Mattson wrote:
> On Tue, Jun 2, 2020 at 7:24 PM Sean Christopherson
> <sean.j.christopherson@intel.com> wrote:
> > As an alternative to storing the last run/attempted CPU, what about moving
> > the "bad VM-Exit" detection into handle_exit_irqoff, or maybe a new hook
> > that is called after IRQs are enabled but before preemption is enabled, e.g.
> > detect_bad_exit or something?  All of the paths in patch 4/4 can easily be
> > moved out of handle_exit.  VMX would require a little bit of refactoring for
> > its "no handler" check, but that should be minor.
> 
> Given the alternatives, I'm willing to compromise my principles wrt
> emulation_required. :-) I'll send out v4 soon.

What do you dislike about the alternative approach?
Jim Mattson June 4, 2020, 7 p.m. UTC | #6
On Thu, Jun 4, 2020 at 11:47 AM Sean Christopherson
<sean.j.christopherson@intel.com> wrote:
>
> On Wed, Jun 03, 2020 at 01:18:31PM -0700, Jim Mattson wrote:
> > On Tue, Jun 2, 2020 at 7:24 PM Sean Christopherson
> > <sean.j.christopherson@intel.com> wrote:
> > > As an alternative to storing the last run/attempted CPU, what about moving
> > > the "bad VM-Exit" detection into handle_exit_irqoff, or maybe a new hook
> > > that is called after IRQs are enabled but before preemption is enabled, e.g.
> > > detect_bad_exit or something?  All of the paths in patch 4/4 can easily be
> > > moved out of handle_exit.  VMX would require a little bit of refactoring for
> > > its "no handler" check, but that should be minor.
> >
> > Given the alternatives, I'm willing to compromise my principles wrt
> > emulation_required. :-) I'll send out v4 soon.
>
> What do you dislike about the alternative approach?

Mainly, I wanted to stash this in a common location so that I could
print it out in our local version of dump_vmcs(). Ideally, we'd like
to be able to identify the bad part(s) just from the kernel logs.
That, and I wouldn't have been as comfortable with the refactoring
without a lot more testing.
Sean Christopherson June 4, 2020, 7:26 p.m. UTC | #7
On Thu, Jun 04, 2020 at 12:00:33PM -0700, Jim Mattson wrote:
> On Thu, Jun 4, 2020 at 11:47 AM Sean Christopherson
> <sean.j.christopherson@intel.com> wrote:
> >
> > On Wed, Jun 03, 2020 at 01:18:31PM -0700, Jim Mattson wrote:
> > > On Tue, Jun 2, 2020 at 7:24 PM Sean Christopherson
> > > <sean.j.christopherson@intel.com> wrote:
> > > > As an alternative to storing the last run/attempted CPU, what about moving
> > > > the "bad VM-Exit" detection into handle_exit_irqoff, or maybe a new hook
> > > > that is called after IRQs are enabled but before preemption is enabled, e.g.
> > > > detect_bad_exit or something?  All of the paths in patch 4/4 can easily be
> > > > moved out of handle_exit.  VMX would require a little bit of refactoring for
> > > > its "no handler" check, but that should be minor.
> > >
> > > Given the alternatives, I'm willing to compromise my principles wrt
> > > emulation_required. :-) I'll send out v4 soon.
> >
> > What do you dislike about the alternative approach?
> 
> Mainly, I wanted to stash this in a common location so that I could
> print it out in our local version of dump_vmcs(). Ideally, we'd like
> to be able to identify the bad part(s) just from the kernel logs.

But this would also move dump_vmcs() to before preemption is enabled, i.e.
your version could read the CPU directly.

And actually, if we're talking about ferreting out hardware issues, you
really do want this happening before preemption is enabled so that the VMCS
dump comes from the failing CPU.  If the vCPU is migrated, the VMCS will be
dumped after a VMCLEAR->VMPTRLD, i.e. will be written to memory and pulled
back into the VMCS cache on a different CPU, and will also have been written
to by the new CPU to update host state.  Odds are that wouldn't affect the
dump in a meaningful way, but never say never.

Tangentially related, what about adding an option to do VMCLEAR at the end
of dump_vmcs(), followed by a dump of raw memory?  It'd be useless for
debugging software issues, but might be potentially useful/interesting for
triaging hardware problems.

> That, and I wouldn't have been as comfortable with the refactoring
> without a lot more testing.
Jim Mattson June 4, 2020, 8:54 p.m. UTC | #8
On Thu, Jun 4, 2020 at 12:26 PM Sean Christopherson
<sean.j.christopherson@intel.com> wrote:
>
> On Thu, Jun 04, 2020 at 12:00:33PM -0700, Jim Mattson wrote:
> > On Thu, Jun 4, 2020 at 11:47 AM Sean Christopherson
> > <sean.j.christopherson@intel.com> wrote:
> > >
> > > On Wed, Jun 03, 2020 at 01:18:31PM -0700, Jim Mattson wrote:
> > > > On Tue, Jun 2, 2020 at 7:24 PM Sean Christopherson
> > > > <sean.j.christopherson@intel.com> wrote:
> > > > > As an alternative to storing the last run/attempted CPU, what about moving
> > > > > the "bad VM-Exit" detection into handle_exit_irqoff, or maybe a new hook
> > > > > that is called after IRQs are enabled but before preemption is enabled, e.g.
> > > > > detect_bad_exit or something?  All of the paths in patch 4/4 can easily be
> > > > > moved out of handle_exit.  VMX would require a little bit of refactoring for
> > > > > its "no handler" check, but that should be minor.
> > > >
> > > > Given the alternatives, I'm willing to compromise my principles wrt
> > > > emulation_required. :-) I'll send out v4 soon.
> > >
> > > What do you dislike about the alternative approach?
> >
> > Mainly, I wanted to stash this in a common location so that I could
> > print it out in our local version of dump_vmcs(). Ideally, we'd like
> > to be able to identify the bad part(s) just from the kernel logs.
>
> But this would also move dump_vmcs() to before preemption is enabled, i.e.
> your version could read the CPU directly.

If it backports easily. The bigger the change, the less likely that is.

> And actually, if we're talking about ferreting out hardware issues, you
> really do want this happening before preemption is enabled so that the VMCS
> dump comes from the failing CPU.  If the vCPU is migrated, the VMCS will be
> dumped after a VMCLEAR->VMPTRLD, i.e. will be written to memory and pulled
> back into the VMCS cache on a different CPU, and will also have been written
> to by the new CPU to update host state.  Odds are that wouldn't affect the
> dump in a meaningful way, but never say never.

True.

> Tangentially related, what about adding an option to do VMCLEAR at the end
> of dump_vmcs(), followed by a dump of raw memory?  It'd be useless for
> debugging software issues, but might be potentially useful/interesting for
> triaging hardware problems.

Our dump_vmcs() dumps all vmreadable fields, which should be pretty
close to what we can get from a raw memory dump. We do have additional
instrumentation to aid in determining the layout of the VMCS in
memory, but it is too stupid to figure out how access rights are
stored. Maybe it could be beefed up a little, and we could at least
verify that VMCLEAR dumps the same thing to physical memory that we
get from the individual VMREADs.

> > That, and I wouldn't have been as comfortable with the refactoring
> > without a lot more testing.

Patch

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 170cc76a581f..42856970d3b8 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -6730,6 +6730,7 @@  static fastpath_t vmx_vcpu_run(struct kvm_vcpu *vcpu)
 	if (vcpu->arch.cr2 != read_cr2())
 		write_cr2(vcpu->arch.cr2);
 
+	vmx->last_cpu = vcpu->cpu;
 	vmx->fail = __vmx_vcpu_run(vmx, (unsigned long *)&vcpu->arch.regs,
 				   vmx->loaded_vmcs->launched);
 
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index 672c28f17e49..8a1e833cf4fb 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -302,6 +302,9 @@  struct vcpu_vmx {
 	u64 ept_pointer;
 
 	struct pt_desc pt_desc;
+
+	/* which host CPU was used for running this vcpu */
+	unsigned int last_cpu;
 };
 
 enum ept_pointers_status {