[v3] x86/vpmu_intel: Fix hypervisor crash by masking PC bit in MSR_P6_EVNTSEL

Message ID 20170504213017.5433-1-mohit.gambhir@oracle.com (mailing list archive)
State New, archived

Commit Message

Mohit Gambhir May 4, 2017, 9:30 p.m. UTC
Setting the Pin Control (PC) bit (19) in MSR_P6_EVNTSEL results in a
General Protection Fault and thus in a hypervisor crash. This behavior
has been observed on two generations of Intel processors, namely Haswell
and Broadwell. Other Intel processor generations were not tested.
However, it does seem to be a possible erratum that hasn't yet been
confirmed by Intel.

To fix the problem, this patch masks the PC bit and returns an error if
any guest tries to write to it on any Intel processor. In addition to
the fact that setting this bit crashes the hypervisor on Haswell and
Broadwell, the PC flag bit toggles a hardware pin on the physical CPU
every time the programmed event occurs, and the hardware behavior in
response to the toggle is undefined in the SDM. This makes the bit
unsafe for guests to use, so it should be masked on all machines.

Signed-off-by: Mohit Gambhir <mohit.gambhir@oracle.com>
---
 xen/arch/x86/cpu/vpmu_intel.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Comments

Jan Beulich May 5, 2017, 10:16 a.m. UTC | #1
>>> On 04.05.17 at 23:30, <mohit.gambhir@oracle.com> wrote:
> Setting the Pin Control (PC) bit (19) in MSR_P6_EVNTSEL results in a
> General Protection Fault and thus in a hypervisor crash. This behavior
> has been observed on two generations of Intel processors, namely Haswell
> and Broadwell. Other Intel processor generations were not tested.
> However, it does seem to be a possible erratum that hasn't yet been
> confirmed by Intel.
>
> To fix the problem, this patch masks the PC bit and returns an error if
> any guest tries to write to it on any Intel processor. In addition to
> the fact that setting this bit crashes the hypervisor on Haswell and
> Broadwell, the PC flag bit toggles a hardware pin on the physical CPU
> every time the programmed event occurs, and the hardware behavior in
> response to the toggle is undefined in the SDM. This makes the bit
> unsafe for guests to use, so it should be masked on all machines.
> 
> Signed-off-by: Mohit Gambhir <mohit.gambhir@oracle.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>

IIRC the intention was to have this in 4.9, in which case you should
have Cc-ed Julien (now added).

Jan
Tian, Kevin May 7, 2017, 10:58 p.m. UTC | #2
> From: Jan Beulich
> Sent: Friday, May 5, 2017 6:16 PM
>
> >>> On 04.05.17 at 23:30, <mohit.gambhir@oracle.com> wrote:
> > Setting the Pin Control (PC) bit (19) in MSR_P6_EVNTSEL results in a
> > General Protection Fault and thus in a hypervisor crash. This behavior
> > has been observed on two generations of Intel processors, namely Haswell
> > and Broadwell. Other Intel processor generations were not tested.
> > However, it does seem to be a possible erratum that hasn't yet been
> > confirmed by Intel.
> >
> > To fix the problem, this patch masks the PC bit and returns an error if
> > any guest tries to write to it on any Intel processor. In addition to
> > the fact that setting this bit crashes the hypervisor on Haswell and
> > Broadwell, the PC flag bit toggles a hardware pin on the physical CPU
> > every time the programmed event occurs, and the hardware behavior in
> > response to the toggle is undefined in the SDM. This makes the bit
> > unsafe for guests to use, so it should be masked on all machines.
> >
> > Signed-off-by: Mohit Gambhir <mohit.gambhir@oracle.com>
>
> Reviewed-by: Jan Beulich <jbeulich@suse.com>
>
> IIRC the intention was to have this in 4.9, in which case you should
> have Cc-ed Julien (now added).
>

Acked-by: Kevin Tian <kevin.tian@intel.com>
Julien Grall May 8, 2017, 10:30 a.m. UTC | #3
Hi Jan,

On 05/05/17 11:16, Jan Beulich wrote:
>>>> On 04.05.17 at 23:30, <mohit.gambhir@oracle.com> wrote:
>> Setting the Pin Control (PC) bit (19) in MSR_P6_EVNTSEL results in a
>> General Protection Fault and thus in a hypervisor crash. This behavior
>> has been observed on two generations of Intel processors, namely Haswell
>> and Broadwell. Other Intel processor generations were not tested.
>> However, it does seem to be a possible erratum that hasn't yet been
>> confirmed by Intel.
>>
>> To fix the problem, this patch masks the PC bit and returns an error if
>> any guest tries to write to it on any Intel processor. In addition to
>> the fact that setting this bit crashes the hypervisor on Haswell and
>> Broadwell, the PC flag bit toggles a hardware pin on the physical CPU
>> every time the programmed event occurs, and the hardware behavior in
>> response to the toggle is undefined in the SDM. This makes the bit
>> unsafe for guests to use, so it should be masked on all machines.
>>
>> Signed-off-by: Mohit Gambhir <mohit.gambhir@oracle.com>
>
> Reviewed-by: Jan Beulich <jbeulich@suse.com>
>
> IIRC the intention was to have this in 4.9, in which case you should
> have Cc-ed Julien (now added).

Release-acked-by: Julien Grall <julien.grall@arm.com>

Cheers,

Patch

diff --git a/xen/arch/x86/cpu/vpmu_intel.c b/xen/arch/x86/cpu/vpmu_intel.c
index 3f0322c..6d768cb 100644
--- a/xen/arch/x86/cpu/vpmu_intel.c
+++ b/xen/arch/x86/cpu/vpmu_intel.c
@@ -76,12 +76,13 @@  static bool_t __read_mostly full_width_write;
 #define FIXED_CTR_CTRL_ANYTHREAD_MASK 0x4
 
 #define ARCH_CNTR_ENABLED   (1ULL << 22)
+#define ARCH_CNTR_PIN_CONTROL (1ULL << 19)
 
 /* Number of general-purpose and fixed performance counters */
 static unsigned int __read_mostly arch_pmc_cnt, fixed_pmc_cnt;
 
 /* Masks used for testing whether and MSR is valid */
-#define ARCH_CTRL_MASK  (~((1ull << 32) - 1) | (1ull << 21))
+#define ARCH_CTRL_MASK  (~((1ull << 32) - 1) | (1ull << 21) | ARCH_CNTR_PIN_CONTROL)
 static uint64_t __read_mostly fixed_ctrl_mask, fixed_counters_mask;
 static uint64_t __read_mostly global_ovf_ctrl_mask, global_ctrl_mask;
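
For readers following the thread, here is a minimal standalone sketch of what the widened mask does. It assumes the MSR write handler rejects any value that has a bit from ARCH_CTRL_MASK set, which is what the commit message describes ("masks the PC bit and returns an error"). The helper names check_evntsel_write() and evntsel_self_check() and the -EINVAL return value are hypothetical, chosen for illustration only. They are not taken from vpmu_intel.c.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Definitions as they read after the patch is applied. */
#define ARCH_CNTR_ENABLED     (1ULL << 22)
#define ARCH_CNTR_PIN_CONTROL (1ULL << 19)
#define ARCH_CTRL_MASK  (~((1ULL << 32) - 1) | (1ULL << 21) | ARCH_CNTR_PIN_CONTROL)

/*
 * Hypothetical stand-in for the check in the guest WRMSR path: any value
 * touching a masked bit is refused instead of being written to hardware.
 * The -EINVAL error code is an assumption made for this sketch.
 */
static int check_evntsel_write(uint64_t msr_content)
{
    if ( msr_content & ARCH_CTRL_MASK )
        return -EINVAL;

    return 0;
}

/* Hypothetical self-check exercising the three interesting cases. */
static void evntsel_self_check(void)
{
    /* Event 0xc0, OS bit (16), counter enabled (22): no masked bit set. */
    const uint64_t evt = 0xc0 | (1ULL << 16) | ARCH_CNTR_ENABLED;

    assert(check_evntsel_write(evt) == 0);

    /* Same event with Pin Control (bit 19): rejected after the patch. */
    assert(check_evntsel_write(evt | ARCH_CNTR_PIN_CONTROL) == -EINVAL);

    /* Upper 32 bits were already covered by the mask before the patch. */
    assert(check_evntsel_write(evt | (1ULL << 32)) == -EINVAL);
}
```

Before the patch, ARCH_CTRL_MASK covered only bits 63:32 and bit 21, so the second case would have passed the check and the value would have reached the physical MSR.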