diff mbox

KVM: x86: disable PEBS before a guest entry

Message ID 1457031201-31723-1-git-send-email-rkrcmar@redhat.com (mailing list archive)
State New, archived
Headers show

Commit Message

Radim Krčmář March 3, 2016, 6:53 p.m. UTC
Linux guests on Haswell (and also SandyBridge and Broadwell, at least)
would crash if you decided to run a host command that uses PEBS, like
  perf record -e 'cpu/mem-stores/pp' -a

This happens because KVM is using VMX MSR switching to disable PEBS, but
SDM [2015-12] 18.4.4.4 Re-configuring PEBS Facilities explains why it
isn't safe:
  When software needs to reconfigure PEBS facilities, it should allow a
  quiescent period between stopping the prior event counting and setting
  up a new PEBS event. The quiescent period is to allow any latent
  residual PEBS records to complete its capture at their previously
  specified buffer address (provided by IA32_DS_AREA).

There might not be a quiescent period after the MSR switch, so a CPU
ends up using host's MSR_IA32_DS_AREA to access an area in guest's
memory.  (Or MSR switching is just buggy on some models.)

The guest can learn something about the host this way:
If the guest doesn't map address pointed by MSR_IA32_DS_AREA, it results
in #PF where we leak host's MSR_IA32_DS_AREA through CR2.

After that, a malicious guest can map and configure memory where
MSR_IA32_DS_AREA is pointing and can therefore get an output from
host's tracing.

This is not a critical leak as the host must initiate with PEBS tracing
and I have not been able to get a record from more than one instruction
before vmentry in vmx_vcpu_run() (that place has most registers already
overwritten with guest's).

We could disable PEBS just few instructions before vmentry, but
disabling it earlier shouldn't affect host tracing too much.
We also don't need to switch MSR_IA32_PEBS_ENABLE on VMENTRY, but that
optimization isn't worth its code, IMO.

(If you are implementing PEBS for guests, be sure to handle the case
 where both host and guest enable PEBS, because this patch doesn't.)

Fixes: 26a4f3c08de4 ("perf/x86: disable PEBS on a guest entry.")
Cc: <stable@vger.kernel.org>
Reported-by: Ji?í Olša <jolsa@redhat.com>
Signed-off-by: Radim Kr?má? <rkrcmar@redhat.com>
---
 arch/x86/kvm/vmx.c | 7 +++++++
 1 file changed, 7 insertions(+)

Comments

David Matlack March 3, 2016, 9:32 p.m. UTC | #1
On Thu, Mar 3, 2016 at 10:53 AM, Radim Kr?má? <rkrcmar@redhat.com> wrote:
> Linux guests on Haswell (and also SandyBridge and Broadwell, at least)
> would crash if you decided to run a host command that uses PEBS, like
>   perf record -e 'cpu/mem-stores/pp' -a
>
> This happens because KVM is using VMX MSR switching to disable PEBS, but
> SDM [2015-12] 18.4.4.4 Re-configuring PEBS Facilities explains why it
> isn't safe:
>   When software needs to reconfigure PEBS facilities, it should allow a
>   quiescent period between stopping the prior event counting and setting
>   up a new PEBS event. The quiescent period is to allow any latent
>   residual PEBS records to complete its capture at their previously
>   specified buffer address (provided by IA32_DS_AREA).
>
> There might not be a quiescent period after the MSR switch, so a CPU
> ends up using host's MSR_IA32_DS_AREA to access an area in guest's
> memory.  (Or MSR switching is just buggy on some models.)
>
> The guest can learn something about the host this way:
> If the guest doesn't map address pointed by MSR_IA32_DS_AREA, it results
> in #PF where we leak host's MSR_IA32_DS_AREA through CR2.
>
> After that, a malicious guest can map and configure memory where
> MSR_IA32_DS_AREA is pointing and can therefore get an output from
> host's tracing.
>
> This is not a critical leak as the host must initiate with PEBS tracing
> and I have not been able to get a record from more than one instruction
> before vmentry in vmx_vcpu_run() (that place has most registers already
> overwritten with guest's).
>
> We could disable PEBS just few instructions before vmentry, but
> disabling it earlier shouldn't affect host tracing too much.
> We also don't need to switch MSR_IA32_PEBS_ENABLE on VMENTRY, but that
> optimization isn't worth its code, IMO.
>
> (If you are implementing PEBS for guests, be sure to handle the case
>  where both host and guest enable PEBS, because this patch doesn't.)
>
> Fixes: 26a4f3c08de4 ("perf/x86: disable PEBS on a guest entry.")
> Cc: <stable@vger.kernel.org>
> Reported-by: Ji?í Olša <jolsa@redhat.com>
> Signed-off-by: Radim Kr?má? <rkrcmar@redhat.com>
> ---
>  arch/x86/kvm/vmx.c | 7 +++++++
>  1 file changed, 7 insertions(+)
>
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 46154dac71e6..946582f4f105 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -1767,6 +1767,13 @@ static void clear_atomic_switch_msr(struct vcpu_vmx *vmx, unsigned msr)
>                         return;
>                 }
>                 break;
> +       case MSR_IA32_PEBS_ENABLE:
> +               /* PEBS needs a quiescent period after being disabled (to write
> +                * a record).  Disabling PEBS through VMX MSR swapping doesn't
> +                * provide that period, so a CPU could write host's record into
> +                * guest's memory.
> +                */
> +               wrmsrl(MSR_IA32_PEBS_ENABLE, 0);

Should this go in add_atomic_switch_msr instead of clear_atomic_switch_msr?

>         }
>
>         for (i = 0; i < m->nr; ++i)
> --
> 2.7.2
>
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Radim Krčmář March 4, 2016, 1:27 p.m. UTC | #2
2016-03-03 13:32-0800, David Matlack:
> On Thu, Mar 3, 2016 at 10:53 AM, Radim Kr?má? <rkrcmar@redhat.com> wrote:
>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>> @@ -1767,6 +1767,13 @@ static void clear_atomic_switch_msr(struct vcpu_vmx *vmx, unsigned msr)
>>                         return;
>>                 }
>>                 break;
>> +       case MSR_IA32_PEBS_ENABLE:
>> +               /* PEBS needs a quiescent period after being disabled (to write
>> +                * a record).  Disabling PEBS through VMX MSR swapping doesn't
>> +                * provide that period, so a CPU could write host's record into
>> +                * guest's memory.
>> +                */
>> +               wrmsrl(MSR_IA32_PEBS_ENABLE, 0);
> 
> Should this go in add_atomic_switch_msr instead of clear_atomic_switch_msr?

Yes, it could be cleared in both (in case guest PEBS can be non-zero),
but I wanted to have it only in add_atomic_switch_msr().

Thank you!

v2 underway.

(In case it makes you wonder how it was tested:
 I have a bad habit of trying whether a patch can be improved before
 posting and this one went awry, because I already returned the machine
 with reproducer and the change seemed simple enough.)
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 46154dac71e6..946582f4f105 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -1767,6 +1767,13 @@  static void clear_atomic_switch_msr(struct vcpu_vmx *vmx, unsigned msr)
 			return;
 		}
 		break;
+	case MSR_IA32_PEBS_ENABLE:
+		/* PEBS needs a quiescent period after being disabled (to write
+		 * a record).  Disabling PEBS through VMX MSR swapping doesn't
+		 * provide that period, so a CPU could write host's record into
+		 * guest's memory.
+		 */
+		wrmsrl(MSR_IA32_PEBS_ENABLE, 0);
 	}
 
 	for (i = 0; i < m->nr; ++i)