Message ID: 1517404231-22406-3-git-send-email-karahmed@amazon.de (mailing list archive)
State: New, archived
On Wed, Jan 31, 2018 at 5:10 AM, KarimAllah Ahmed <karahmed@amazon.de> wrote:

> +	vmx_disable_intercept_for_msr(vmx->vmcs01.msr_bitmap, MSR_IA32_PRED_CMD,
> +				      MSR_TYPE_W);

Why not disable this intercept eagerly, rather than lazily? Unlike
MSR_IA32_SPEC_CTRL, there is no guest value to save/restore, so there
is no cost to disabling the intercept if the guest cpuid info declares
support for it.

> +	if (to_vmx(vcpu)->save_spec_ctrl_on_exit) {
> +		nested_vmx_disable_intercept_for_msr(
> +					msr_bitmap_l1, msr_bitmap_l0,
> +					MSR_IA32_PRED_CMD,
> +					MSR_TYPE_R);
> +	}

I don't think this should be predicated on
"to_vmx(vcpu)->save_spec_ctrl_on_exit." Why not just
"guest_cpuid_has(vcpu, X86_FEATURE_IBPB)"? Also, the final argument to
nested_vmx_disable_intercept_for_msr should be MSR_TYPE_W rather than
MSR_TYPE_R.
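For context, vmx_disable_intercept_for_msr() works by clearing bits in the per-VMCS MSR bitmap, where a set bit means "exit on access". Below is a rough user-space model of that 4 KiB layout (read-low, read-high, write-low, write-high quarters); the helper names are made up for illustration and this is not KVM's actual code:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MSR_IA32_PRED_CMD 0x49
#define MSR_TYPE_R 1
#define MSR_TYPE_W 2

/* A 4 KiB VMX MSR bitmap: bytes 0-1023 cover reads of MSRs 0x0-0x1fff,
 * bytes 1024-2047 reads of 0xc0000000-0xc0001fff, and the second half
 * covers writes of the same two ranges.  A set bit means "intercept". */
typedef struct { uint8_t b[4096]; } msr_bitmap;

static void bitmap_init(msr_bitmap *m)
{
	memset(m->b, 0xff, sizeof(m->b));	/* intercept everything */
}

static void disable_intercept(msr_bitmap *m, uint32_t msr, int type)
{
	uint32_t base;

	if (msr <= 0x1fff) {
		base = 0;			/* low MSR range */
	} else if (msr >= 0xc0000000 && msr <= 0xc0001fff) {
		base = 1024;			/* high MSR range */
		msr &= 0x1fff;
	} else {
		return;				/* outside the bitmap: always intercepted */
	}

	if (type & MSR_TYPE_R)
		m->b[base + msr / 8] &= (uint8_t)~(1u << (msr % 8));
	if (type & MSR_TYPE_W)
		m->b[2048 + base + msr / 8] &= (uint8_t)~(1u << (msr % 8));
}

/* Query helpers (low MSR range only, which is where PRED_CMD lives). */
static int read_intercepted(const msr_bitmap *m, uint32_t msr)
{
	return (m->b[msr / 8] >> (msr % 8)) & 1;
}

static int write_intercepted(const msr_bitmap *m, uint32_t msr)
{
	return (m->b[2048 + msr / 8] >> (msr % 8)) & 1;
}
```

Passing MSR_TYPE_W alone matches the point above: PRED_CMD is a write-only command MSR, so only the write half of the bitmap matters.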
On 31/01/2018 11:50, Jim Mattson wrote:
>> +	if (to_vmx(vcpu)->save_spec_ctrl_on_exit) {
>> +		nested_vmx_disable_intercept_for_msr(
>> +					msr_bitmap_l1, msr_bitmap_l0,
>> +					MSR_IA32_PRED_CMD,
>> +					MSR_TYPE_R);
>> +	}
> I don't think this should be predicated on
> "to_vmx(vcpu)->save_spec_ctrl_on_exit." Why not just
> "guest_cpuid_has(vcpu, X86_FEATURE_IBPB)"? Also, the final argument to
> nested_vmx_disable_intercept_for_msr should be MSR_TYPE_W rather than
> MSR_TYPE_R.

In fact this MSR can even be passed down unconditionally, since it needs
no save/restore and has no ill performance effect on the sibling
hyperthread.

Only MSR_IA32_SPEC_CTRL needs to be conditional on
"to_vmx(vcpu)->save_spec_ctrl_on_exit".

Paolo
On 01/31/2018 05:50 PM, Jim Mattson wrote:
> On Wed, Jan 31, 2018 at 5:10 AM, KarimAllah Ahmed <karahmed@amazon.de>
> wrote:
>
>> +	vmx_disable_intercept_for_msr(vmx->vmcs01.msr_bitmap, MSR_IA32_PRED_CMD,
>> +				      MSR_TYPE_W);
>
> Why not disable this intercept eagerly, rather than lazily? Unlike
> MSR_IA32_SPEC_CTRL, there is no guest value to save/restore, so there
> is no cost to disabling the intercept if the guest cpuid info declares
> support for it.
>
>> +	if (to_vmx(vcpu)->save_spec_ctrl_on_exit) {
>> +		nested_vmx_disable_intercept_for_msr(
>> +					msr_bitmap_l1, msr_bitmap_l0,
>> +					MSR_IA32_PRED_CMD,
>> +					MSR_TYPE_R);
>> +	}
>
> I don't think this should be predicated on
> "to_vmx(vcpu)->save_spec_ctrl_on_exit." Why not just
> "guest_cpuid_has(vcpu, X86_FEATURE_IBPB)"?

Paolo suggested this on the previous revision because guest_cpuid_has()
would be slow.

> Also, the final argument to
> nested_vmx_disable_intercept_for_msr should be MSR_TYPE_W rather than
> MSR_TYPE_R.

Oops! will fix!

Amazon Development Center Germany GmbH
Berlin - Dresden - Aachen
main office: Krausenstr. 38, 10117 Berlin
Geschaeftsfuehrer: Dr. Ralf Herbrich, Christian Schlaeger
Ust-ID: DE289237879
Eingetragen am Amtsgericht Charlottenburg HRB 149173 B
On 31/01/2018 12:11, KarimAllah Ahmed wrote:
> On 01/31/2018 05:50 PM, Jim Mattson wrote:
>> On Wed, Jan 31, 2018 at 5:10 AM, KarimAllah Ahmed <karahmed@amazon.de>
>> wrote:
>>
>>> +	vmx_disable_intercept_for_msr(vmx->vmcs01.msr_bitmap,
>>> MSR_IA32_PRED_CMD,
>>> +				      MSR_TYPE_W);
>>
>> Why not disable this intercept eagerly, rather than lazily? Unlike
>> MSR_IA32_SPEC_CTRL, there is no guest value to save/restore, so there
>> is no cost to disabling the intercept if the guest cpuid info declares
>> support for it.
>>
>>> +	if (to_vmx(vcpu)->save_spec_ctrl_on_exit) {
>>> +		nested_vmx_disable_intercept_for_msr(
>>> +					msr_bitmap_l1, msr_bitmap_l0,
>>> +					MSR_IA32_PRED_CMD,
>>> +					MSR_TYPE_R);
>>> +	}
>>
>> I don't think this should be predicated on
>> "to_vmx(vcpu)->save_spec_ctrl_on_exit." Why not just
>> "guest_cpuid_has(vcpu, X86_FEATURE_IBPB)"?
>
> Paolo suggested this on the previous revision because guest_cpuid_has()
> would be slow.

Sorry, that was for spec_ctrl. Here there's no need to do any kind of
conditional check.

Paolo

>> Also, the final argument to
>> nested_vmx_disable_intercept_for_msr should be MSR_TYPE_W rather than
>> MSR_TYPE_R.
>>
> Oops! will fix!
On 01/31/2018 05:55 PM, Paolo Bonzini wrote:
> On 31/01/2018 11:50, Jim Mattson wrote:
>>> +	if (to_vmx(vcpu)->save_spec_ctrl_on_exit) {
>>> +		nested_vmx_disable_intercept_for_msr(
>>> +					msr_bitmap_l1, msr_bitmap_l0,
>>> +					MSR_IA32_PRED_CMD,
>>> +					MSR_TYPE_R);
>>> +	}
>> I don't think this should be predicated on
>> "to_vmx(vcpu)->save_spec_ctrl_on_exit." Why not just
>> "guest_cpuid_has(vcpu, X86_FEATURE_IBPB)"? Also, the final argument to
>> nested_vmx_disable_intercept_for_msr should be MSR_TYPE_W rather than
>> MSR_TYPE_R.
>
> In fact this MSR can even be passed down unconditionally, since it needs
> no save/restore and has no ill performance effect on the sibling
> hyperthread.
>
> Only MSR_IA32_SPEC_CTRL needs to be conditional on
> "to_vmx(vcpu)->save_spec_ctrl_on_exit".

That used to be the case in an earlier version. There seem to be two
opinions here:

1) Pass it only if CPUID for the guest has it.
2) Pass it unconditionally.

I do not really have a preference.
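The nested_vmx_disable_intercept_for_msr() call being debated only opens an MSR up for L2 when L1's own bitmap also leaves it open, so L1's policy is always honored. A minimal user-space model of that merge rule, using toy byte-per-MSR bitmaps rather than KVM's real bit-packed page:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy bitmaps, one byte per MSR: 1 = intercept, 0 = pass through.  The
 * vmcs02 (l0) bitmap starts out intercepting everything; an MSR becomes
 * pass-through for L2 only if L1's bitmap (l1) does not intercept it. */
enum { NMSRS = 64 };

static void merge_disable(const uint8_t *l1, uint8_t *l0, unsigned int msr)
{
	if (msr >= NMSRS)
		return;
	if (!l1[msr])		/* L1 does not want to intercept it either... */
		l0[msr] = 0;	/* ...so L2 may access it without an exit */
}
```

With this rule, L0 can never grant L2 more direct MSR access than L1 itself configured.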
Looking at SVM now...

On 31/01/2018 08:10, KarimAllah Ahmed wrote:
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index f40d0da..89495cf 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -529,6 +529,7 @@ struct svm_cpu_data {
>  	struct kvm_ldttss_desc *tss_desc;
>  
>  	struct page *save_area;
> +	struct vmcb *current_vmcb;
>  };
>  
>  static DEFINE_PER_CPU(struct svm_cpu_data *, svm_data);
> @@ -1703,11 +1704,17 @@ static void svm_free_vcpu(struct kvm_vcpu *vcpu)
>  	__free_pages(virt_to_page(svm->nested.msrpm), MSRPM_ALLOC_ORDER);
>  	kvm_vcpu_uninit(vcpu);
>  	kmem_cache_free(kvm_vcpu_cache, svm);
> +	/*
> +	 * The vmcb page can be recycled, causing a false negative in
> +	 * svm_vcpu_load(). So do a full IBPB now.
> +	 */
> +	indirect_branch_prediction_barrier();
>  }
>  
>  static void svm_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>  {
>  	struct vcpu_svm *svm = to_svm(vcpu);
> +	struct svm_cpu_data *sd = per_cpu(svm_data, cpu);
>  	int i;
>  
>  	if (unlikely(cpu != vcpu->cpu)) {
> @@ -1736,6 +1743,10 @@ static void svm_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>  	if (static_cpu_has(X86_FEATURE_RDTSCP))
>  		wrmsrl(MSR_TSC_AUX, svm->tsc_aux);
>  
> +	if (sd->current_vmcb != svm->vmcb) {
> +		sd->current_vmcb = svm->vmcb;
> +		indirect_branch_prediction_barrier();
> +	}
>  	avic_vcpu_load(vcpu, cpu);
>  }
>  
> @@ -3684,6 +3695,22 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
>  	case MSR_IA32_TSC:
>  		kvm_write_tsc(vcpu, msr);
>  		break;
> +	case MSR_IA32_PRED_CMD:
> +		if (!msr->host_initiated &&
> +		    !guest_cpuid_has(vcpu, X86_FEATURE_IBPB))
> +			return 1;
> +
> +		if (data & ~PRED_CMD_IBPB)
> +			return 1;
> +
> +		if (!data)
> +			break;
> +
> +		wrmsrl(MSR_IA32_PRED_CMD, PRED_CMD_IBPB);

This is missing an

	{ .index = MSR_IA32_PRED_CMD,			.always = false },

in the direct_access_msrs array. Once you do this, nested is taken care
of automatically. Likewise for MSR_IA32_SPEC_CTRL in patch 5.

Paolo

> +		if (is_guest_mode(vcpu))
> +			break;
> +		set_msr_interception(svm->msrpm, MSR_IA32_PRED_CMD, 0, 1);
> +		break;
>  	case MSR_STAR:
>  		svm->vmcb->save.star = data;
>  		break;
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index d46a61b..96e672e 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -2285,6 +2285,7 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>  	if (per_cpu(current_vmcs, cpu) != vmx->loaded_vmcs->vmcs) {
>  		per_cpu(current_vmcs, cpu) = vmx->loaded_vmcs->vmcs;
>  		vmcs_load(vmx->loaded_vmcs->vmcs);
> +		indirect_branch_prediction_barrier();
>  	}
>  
>  	if (!already_loaded) {
> @@ -3342,6 +3343,25 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>  	case MSR_IA32_TSC:
>  		kvm_write_tsc(vcpu, msr_info);
>  		break;
> +	case MSR_IA32_PRED_CMD:
> +		if (!msr_info->host_initiated &&
> +		    !guest_cpuid_has(vcpu, X86_FEATURE_IBPB))
> +			return 1;
> +
> +		if (data & ~PRED_CMD_IBPB)
> +			return 1;
> +
> +		if (!data)
> +			break;
> +
> +		wrmsrl(MSR_IA32_PRED_CMD, PRED_CMD_IBPB);
> +
> +		if (is_guest_mode(vcpu))
> +			break;
> +
> +		vmx_disable_intercept_for_msr(vmx->vmcs01.msr_bitmap, MSR_IA32_PRED_CMD,
> +					      MSR_TYPE_W);
> +		break;
>  	case MSR_IA32_CR_PAT:
>  		if (vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_IA32_PAT) {
>  			if (!kvm_mtrr_valid(vcpu, MSR_IA32_CR_PAT, data))
> @@ -10046,7 +10066,8 @@ static inline bool nested_vmx_merge_msr_bitmap(struct kvm_vcpu *vcpu,
>  	unsigned long *msr_bitmap_l0 = to_vmx(vcpu)->nested.vmcs02.msr_bitmap;
>  
>  	/* This shortcut is ok because we support only x2APIC MSRs so far. */
> -	if (!nested_cpu_has_virt_x2apic_mode(vmcs12))
> +	if (!nested_cpu_has_virt_x2apic_mode(vmcs12) &&
> +	    !to_vmx(vcpu)->save_spec_ctrl_on_exit)
>  		return false;
>  
>  	page = kvm_vcpu_gpa_to_page(vcpu, vmcs12->msr_bitmap);
> @@ -10079,6 +10100,14 @@ static inline bool nested_vmx_merge_msr_bitmap(struct kvm_vcpu *vcpu,
>  						MSR_TYPE_W);
>  		}
>  	}
> +
> +	if (to_vmx(vcpu)->save_spec_ctrl_on_exit) {
> +		nested_vmx_disable_intercept_for_msr(
> +					msr_bitmap_l1, msr_bitmap_l0,
> +					MSR_IA32_PRED_CMD,
> +					MSR_TYPE_R);
> +	}
> +
>  	kunmap(page);
>  	kvm_release_page_clean(page);
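The recycling hazard described in the quoted svm_free_vcpu() comment is easy to model in user space: a per-CPU pointer comparison cannot distinguish a new guest whose vmcb happens to be allocated at a freed guest's old address. In this sketch, ibpb() is a hypothetical stand-in for indirect_branch_prediction_barrier(), counting invocations so the logic can be checked:

```c
#include <assert.h>
#include <stddef.h>

/* Model of the per-CPU "last vmcb seen" check from the SVM hunk. */
static const void *current_vmcb;
static int ibpb_count;

static void ibpb(void)
{
	ibpb_count++;		/* would be a real barrier in the kernel */
}

static void vcpu_load(const void *vmcb)
{
	if (current_vmcb != vmcb) {	/* a different guest is scheduled in */
		current_vmcb = vmcb;
		ibpb();
	}
}

static void vcpu_free(const void *vmcb)
{
	(void)vmcb;
	/* The freed page may be reallocated for a brand-new vCPU at the
	 * same address; the pointer comparison in vcpu_load() would then
	 * be a false negative, so flush predictions eagerly here. */
	ibpb();
}
```

The barrier at free time is what makes it safe to skip the barrier on a load that sees the same (possibly recycled) pointer.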
On Wed, Jan 31, 2018 at 8:55 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:

> In fact this MSR can even be passed down unconditionally, since it needs
> no save/restore and has no ill performance effect on the sibling
> hyperthread.

I'm a bit surprised to hear that IBPB has no ill performance impact on
the sibling hyperthread. On current CPUs, this has to flush the BTB,
doesn't it? And since the BTB is shared between hyperthreads, doesn't
the sibling lose all of its branch predictions?
> On Wed, Jan 31, 2018 at 8:55 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> > In fact this MSR can even be passed down unconditionally, since it needs
> > no save/restore and has no ill performance effect on the sibling
> > hyperthread.
>
> I'm a bit surprised to hear that IBPB has no ill performance impact on
> the sibling hyperthread. On current CPUs, this has to flush the BTB,
> doesn't it? And since the BTB is shared between hyperthreads, doesn't
> the sibling lose all of its branch predictions?

IBPB most definitely (in current implementations) will stop both
hyperthreads for the flush. IBPB is not a cheap operation on a system
level.
On 31/01/2018 12:39, Jim Mattson wrote:
> On Wed, Jan 31, 2018 at 8:55 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>
>> In fact this MSR can even be passed down unconditionally, since it needs
>> no save/restore and has no ill performance effect on the sibling
>> hyperthread.
>
> I'm a bit surprised to hear that IBPB has no ill performance impact on
> the sibling hyperthread. On current CPUs, this has to flush the BTB,
> doesn't it? And since the BTB is shared between hyperthreads, doesn't
> the sibling lose all of its branch predictions?

Yeah, I knew about that, but I hadn't heard yet that it also blocks the
hyperthread while you do the write. It makes sense in retrospect.

In any case, there's no difference (unlike IBRS) in the vmexit cost if
the MSR is passed through.

Paolo
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index c0eb337..033004d 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -365,6 +365,10 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
 		F(3DNOWPREFETCH) | F(OSVW) | 0 /* IBS */ | F(XOP) |
 		0 /* SKINIT, WDT, LWP */ | F(FMA4) | F(TBM);
 
+	/* cpuid 0x80000008.ebx */
+	const u32 kvm_cpuid_8000_0008_ebx_x86_features =
+		F(IBPB);
+
 	/* cpuid 0xC0000001.edx */
 	const u32 kvm_cpuid_C000_0001_edx_x86_features =
 		F(XSTORE) | F(XSTORE_EN) | F(XCRYPT) | F(XCRYPT_EN) |
@@ -625,7 +629,12 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
 		if (!g_phys_as)
 			g_phys_as = phys_as;
 		entry->eax = g_phys_as | (virt_as << 8);
-		entry->ebx = entry->edx = 0;
+		entry->edx = 0;
+		/* IBPB isn't necessarily present in hardware cpuid */
+		if (boot_cpu_has(X86_FEATURE_IBPB))
+			entry->ebx |= F(IBPB);
+		entry->ebx &= kvm_cpuid_8000_0008_ebx_x86_features;
+		cpuid_mask(&entry->ebx, CPUID_8000_0008_EBX);
 		break;
 	}
 	case 0x80000019:
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index f40d0da..89495cf 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -529,6 +529,7 @@ struct svm_cpu_data {
 	struct kvm_ldttss_desc *tss_desc;
 
 	struct page *save_area;
+	struct vmcb *current_vmcb;
 };
 
 static DEFINE_PER_CPU(struct svm_cpu_data *, svm_data);
@@ -1703,11 +1704,17 @@ static void svm_free_vcpu(struct kvm_vcpu *vcpu)
 	__free_pages(virt_to_page(svm->nested.msrpm), MSRPM_ALLOC_ORDER);
 	kvm_vcpu_uninit(vcpu);
 	kmem_cache_free(kvm_vcpu_cache, svm);
+	/*
+	 * The vmcb page can be recycled, causing a false negative in
+	 * svm_vcpu_load(). So do a full IBPB now.
+	 */
+	indirect_branch_prediction_barrier();
 }
 
 static void svm_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
+	struct svm_cpu_data *sd = per_cpu(svm_data, cpu);
 	int i;
 
 	if (unlikely(cpu != vcpu->cpu)) {
@@ -1736,6 +1743,10 @@ static void svm_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 	if (static_cpu_has(X86_FEATURE_RDTSCP))
 		wrmsrl(MSR_TSC_AUX, svm->tsc_aux);
 
+	if (sd->current_vmcb != svm->vmcb) {
+		sd->current_vmcb = svm->vmcb;
+		indirect_branch_prediction_barrier();
+	}
 	avic_vcpu_load(vcpu, cpu);
 }
 
@@ -3684,6 +3695,22 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
 	case MSR_IA32_TSC:
 		kvm_write_tsc(vcpu, msr);
 		break;
+	case MSR_IA32_PRED_CMD:
+		if (!msr->host_initiated &&
+		    !guest_cpuid_has(vcpu, X86_FEATURE_IBPB))
+			return 1;
+
+		if (data & ~PRED_CMD_IBPB)
+			return 1;
+
+		if (!data)
+			break;
+
+		wrmsrl(MSR_IA32_PRED_CMD, PRED_CMD_IBPB);
+		if (is_guest_mode(vcpu))
+			break;
+		set_msr_interception(svm->msrpm, MSR_IA32_PRED_CMD, 0, 1);
+		break;
 	case MSR_STAR:
 		svm->vmcb->save.star = data;
 		break;
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index d46a61b..96e672e 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2285,6 +2285,7 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 	if (per_cpu(current_vmcs, cpu) != vmx->loaded_vmcs->vmcs) {
 		per_cpu(current_vmcs, cpu) = vmx->loaded_vmcs->vmcs;
 		vmcs_load(vmx->loaded_vmcs->vmcs);
+		indirect_branch_prediction_barrier();
 	}
 
 	if (!already_loaded) {
@@ -3342,6 +3343,25 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 	case MSR_IA32_TSC:
 		kvm_write_tsc(vcpu, msr_info);
 		break;
+	case MSR_IA32_PRED_CMD:
+		if (!msr_info->host_initiated &&
+		    !guest_cpuid_has(vcpu, X86_FEATURE_IBPB))
+			return 1;
+
+		if (data & ~PRED_CMD_IBPB)
+			return 1;
+
+		if (!data)
+			break;
+
+		wrmsrl(MSR_IA32_PRED_CMD, PRED_CMD_IBPB);
+
+		if (is_guest_mode(vcpu))
+			break;
+
+		vmx_disable_intercept_for_msr(vmx->vmcs01.msr_bitmap, MSR_IA32_PRED_CMD,
+					      MSR_TYPE_W);
+		break;
 	case MSR_IA32_CR_PAT:
 		if (vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_IA32_PAT) {
 			if (!kvm_mtrr_valid(vcpu, MSR_IA32_CR_PAT, data))
@@ -10046,7 +10066,8 @@ static inline bool nested_vmx_merge_msr_bitmap(struct kvm_vcpu *vcpu,
 	unsigned long *msr_bitmap_l0 = to_vmx(vcpu)->nested.vmcs02.msr_bitmap;
 
 	/* This shortcut is ok because we support only x2APIC MSRs so far. */
-	if (!nested_cpu_has_virt_x2apic_mode(vmcs12))
+	if (!nested_cpu_has_virt_x2apic_mode(vmcs12) &&
+	    !to_vmx(vcpu)->save_spec_ctrl_on_exit)
 		return false;
 
 	page = kvm_vcpu_gpa_to_page(vcpu, vmcs12->msr_bitmap);
@@ -10079,6 +10100,14 @@ static inline bool nested_vmx_merge_msr_bitmap(struct kvm_vcpu *vcpu,
 						MSR_TYPE_W);
 		}
 	}
+
+	if (to_vmx(vcpu)->save_spec_ctrl_on_exit) {
+		nested_vmx_disable_intercept_for_msr(
+					msr_bitmap_l1, msr_bitmap_l0,
+					MSR_IA32_PRED_CMD,
+					MSR_TYPE_R);
+	}
+
 	kunmap(page);
 	kvm_release_page_clean(page);
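The MSR_IA32_PRED_CMD cases added to vmx_set_msr() and svm_set_msr() in the patch share the same validation steps. A standalone sketch of that logic follows; write_pred_cmd() and wrmsr_count are illustrative names for this model, not kernel code:

```c
#include <assert.h>
#include <stdint.h>

#define PRED_CMD_IBPB (1ULL << 0)

static int wrmsr_count;			/* stand-in for the real wrmsrl() */

/* Returns 1 to signal #GP to the guest, 0 on success, mirroring the
 * MSR_IA32_PRED_CMD handling in the patch's set_msr paths. */
static int write_pred_cmd(uint64_t data, int guest_has_ibpb, int host_initiated)
{
	if (!host_initiated && !guest_has_ibpb)
		return 1;		/* feature not exposed in guest CPUID */

	if (data & ~PRED_CMD_IBPB)
		return 1;		/* reserved bits must be zero */

	if (!data)
		return 0;		/* writing 0 is a legal no-op */

	wrmsr_count++;			/* issue the barrier on the host */
	return 0;
}
```

Note the ordering: the CPUID and reserved-bit checks fault before anything reaches the hardware, and only a write of exactly PRED_CMD_IBPB triggers the host-side barrier.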