
[v3,1/2] target-i386: KVM: add basic Intel LMCE support

Message ID 20160603060944.17373-2-haozhong.zhang@intel.com (mailing list archive)
State New, archived
Headers show

Commit Message

Haozhong Zhang June 3, 2016, 6:09 a.m. UTC
This patch adds support for injecting SRAR and SRAO errors as LMCE,
i.e. they are injected into only one VCPU rather than broadcast to all
VCPUs. As KVM reports LMCE support only on Intel platforms, this
feature is only available on Intel platforms.

Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Boris Petkov <bp@suse.de>
Cc: kvm@vger.kernel.org
Cc: Tony Luck <tony.luck@intel.com>
Cc: Andi Kleen <andi.kleen@intel.com>
---
 target-i386/cpu.c | 26 ++++++++++++++++++++++++++
 target-i386/cpu.h | 13 ++++++++++++-
 target-i386/kvm.c | 35 +++++++++++++++++++++++++++++++----
 3 files changed, 69 insertions(+), 5 deletions(-)

Comments

Radim Krčmář June 3, 2016, 3:57 p.m. UTC | #1
2016-06-03 14:09+0800, Haozhong Zhang:
> This patch adds the support to inject SRAR and SRAO as LMCE, i.e. they
> will be injected to only one VCPU rather than broadcast to all
> VCPUs. As KVM reports LMCE support on Intel platforms, this features is
> only available on Intel platforms.
> 
> Signed-off-by: Ashok Raj <ashok.raj@intel.com>
> Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
> ---
> diff --git a/target-i386/cpu.c b/target-i386/cpu.c
> @@ -2786,6 +2798,20 @@ static void mce_init(X86CPU *cpu)
>          && (cenv->features[FEAT_1_EDX] & (CPUID_MCE | CPUID_MCA)) ==
>              (CPUID_MCE | CPUID_MCA)) {
>          cenv->mcg_cap = MCE_CAP_DEF | MCE_BANKS_DEF;
> +
> +        if (cpu->enable_lmce) {
> +            if (lmce_supported()) {
> +                cenv->mcg_cap |= MCG_LMCE_P;
> +                cenv->msr_ia32_feature_control |=
> +                    MSR_IA32_FEATURE_CONTROL_LMCE |
> +                    MSR_IA32_FEATURE_CONTROL_LOCKED;

Locking right from the start breaks nested KVM, because nested relies on
setting VMXON feature from inside of the guest.

Do we keep it unlocked, or move everything into QEMU?

(The latter seems simpler.)

> +            } else {
> +                error_report("Warning: KVM unavailable or not support LMCE, "
> +                             "LMCE disabled");
> +                cpu->enable_lmce = false;
> +            }
> +        }
> +
>          cenv->mcg_ctl = ~(uint64_t)0;
>          for (bank = 0; bank < MCE_BANKS_DEF; bank++) {
>              cenv->mce_banks[bank * 4] = ~(uint64_t)0;
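Radim's objection can be illustrated with a toy model of the MSR's lock semantics. This is a sketch, not KVM code: the names are assumptions modeled on MSR_IA32_FEATURE_CONTROL (bit 0 is the lock bit, bit 2 permits VMXON outside SMX, bit 20 enables LMCE), and returning `false` stands in for the #GP a real WRMSR would raise.

```c
#include <stdint.h>
#include <stdbool.h>
#include <assert.h>

#define FC_LOCKED            (1ULL << 0)
#define FC_VMXON_OUTSIDE_SMX (1ULL << 2)
#define FC_LMCE              (1ULL << 20)

static uint64_t feature_control;

/* Toy model of WRMSR to MSR_IA32_FEATURE_CONTROL: once the lock bit
 * is set, any further write faults; false stands in for the #GP. */
static bool wrmsr_feature_control(uint64_t val)
{
    if (feature_control & FC_LOCKED) {
        return false;
    }
    feature_control = val;
    return true;
}
```

If QEMU writes LMCE plus the lock bit before the guest boots, a nested-virt-capable guest kernel can no longer turn on the VMXON bit itself, which is exactly the breakage described above.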
Borislav Petkov June 4, 2016, 10:15 a.m. UTC | #2
Haozhong Zhang <haozhong.zhang@intel.com> wrote:

>This patch adds the support to inject SRAR and SRAO as LMCE, i.e. they
>will be injected to only one VCPU rather than broadcast to all
>VCPUs. As KVM reports LMCE support on Intel platforms, this features is
>only available on Intel platforms.
>
>Signed-off-by: Ashok Raj <ashok.raj@intel.com>
>Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
>[...]

>@@ -2786,6 +2798,20 @@ static void mce_init(X86CPU *cpu)
>         && (cenv->features[FEAT_1_EDX] & (CPUID_MCE | CPUID_MCA)) ==
>             (CPUID_MCE | CPUID_MCA)) {
>         cenv->mcg_cap = MCE_CAP_DEF | MCE_BANKS_DEF;
>+
>+        if (cpu->enable_lmce) {
>+            if (lmce_supported()) {
>+                cenv->mcg_cap |= MCG_LMCE_P;
>+                cenv->msr_ia32_feature_control |=
>+                    MSR_IA32_FEATURE_CONTROL_LMCE |
>+                    MSR_IA32_FEATURE_CONTROL_LOCKED;
>+            } else {
>+                error_report("Warning: KVM unavailable or not support
>LMCE, "
>+                             "LMCE disabled");

"... or LMCE not supported..."

Also, do not split the string for easier grepping.
Borislav Petkov June 4, 2016, 10:34 a.m. UTC | #3
Haozhong Zhang <haozhong.zhang@intel.com> wrote:

>This patch adds the support to inject SRAR and SRAO as LMCE, i.e. they
>will be injected to only one VCPU rather than broadcast to all
>VCPUs. As KVM reports LMCE support on Intel platforms, this features is
>only available on Intel platforms.
>
>Signed-off-by: Ashok Raj <ashok.raj@intel.com>
>Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
>[...]

>@@ -1173,6 +1182,8 @@ struct X86CPU {
>      */
>     bool enable_pmu;
> 
>+    bool enable_lmce;

That struct would go fat pretty fast if it grows a bool per CPU feature. Perhaps a more clever, a-bit-per-featurebit scheme would be in order.
Eduardo Habkost June 4, 2016, 9:03 p.m. UTC | #4
On Sat, Jun 04, 2016 at 12:34:39PM +0200, Boris Petkov wrote:
> Haozhong Zhang <haozhong.zhang@intel.com> wrote:
> 
> >This patch adds the support to inject SRAR and SRAO as LMCE, i.e. they
> >will be injected to only one VCPU rather than broadcast to all
> >VCPUs. As KVM reports LMCE support on Intel platforms, this features is
> >only available on Intel platforms.
> >
> >Signed-off-by: Ashok Raj <ashok.raj@intel.com>
> >Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
> >[...]
> 
> >@@ -1173,6 +1182,8 @@ struct X86CPU {
> >      */
> >     bool enable_pmu;
> > 
> >+    bool enable_lmce;
> 
> That struct would go fat pretty fast if it grows a bool per CPU
> feature. Perhaps a more clever, a-bit-per-featurebit scheme
> would be in order.

We already have X86CPU.features, but it's specific for feature
flags appearing on CPUID. We can eventually extend
FeatureWord/FeatureWordInfo/x86_cpu_get_supported_feature_word()
to represent features that don't appear directly on CPUID.
Haozhong Zhang June 5, 2016, 3:32 p.m. UTC | #5
On 06/03/16 17:57, Radim Krčmář wrote:
> 2016-06-03 14:09+0800, Haozhong Zhang:
> > This patch adds the support to inject SRAR and SRAO as LMCE, i.e. they
> > will be injected to only one VCPU rather than broadcast to all
> > VCPUs. As KVM reports LMCE support on Intel platforms, this features is
> > only available on Intel platforms.
> > 
> > Signed-off-by: Ashok Raj <ashok.raj@intel.com>
> > Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
> > ---
> > diff --git a/target-i386/cpu.c b/target-i386/cpu.c
> > @@ -2786,6 +2798,20 @@ static void mce_init(X86CPU *cpu)
> >          && (cenv->features[FEAT_1_EDX] & (CPUID_MCE | CPUID_MCA)) ==
> >              (CPUID_MCE | CPUID_MCA)) {
> >          cenv->mcg_cap = MCE_CAP_DEF | MCE_BANKS_DEF;
> > +
> > +        if (cpu->enable_lmce) {
> > +            if (lmce_supported()) {
> > +                cenv->mcg_cap |= MCG_LMCE_P;
> > +                cenv->msr_ia32_feature_control |=
> > +                    MSR_IA32_FEATURE_CONTROL_LMCE |
> > +                    MSR_IA32_FEATURE_CONTROL_LOCKED;
> 
> Locking right from the start breaks nested KVM, because nested relies on
> setting VMXON feature from inside of the guest.
> 
> Do we keep it unlocked, or move everything into QEMU?
> 
> (The latter seems simpler.)
>

Setting guest MSR_IA32_FEATURE_CONTROL is not necessary here; it's
instead the guest BIOS/OS's duty to enable and lock the corresponding
features. I'll remove this in the next version.

Thanks,
Haozhong
Haozhong Zhang June 5, 2016, 3:35 p.m. UTC | #6
On 06/04/16 12:15, Boris Petkov wrote:
> Haozhong Zhang <haozhong.zhang@intel.com> wrote:
> 
> >This patch adds the support to inject SRAR and SRAO as LMCE, i.e. they
> >will be injected to only one VCPU rather than broadcast to all
> >VCPUs. As KVM reports LMCE support on Intel platforms, this features is
> >only available on Intel platforms.
> >
> >Signed-off-by: Ashok Raj <ashok.raj@intel.com>
> >Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
> >---
> >Cc: Paolo Bonzini <pbonzini@redhat.com>
> >Cc: Richard Henderson <rth@twiddle.net>
> >Cc: Eduardo Habkost <ehabkost@redhat.com>
> >Cc: Marcelo Tosatti <mtosatti@redhat.com>
> >Cc: Boris Petkov <bp@suse.de>
> >Cc: kvm@vger.kernel.org
> >Cc: Tony Luck <tony.luck@intel.com>
> >Cc: Andi Kleen <andi.kleen@intel.com>
> >---
> > target-i386/cpu.c | 26 ++++++++++++++++++++++++++
> > target-i386/cpu.h | 13 ++++++++++++-
> > target-i386/kvm.c | 35 +++++++++++++++++++++++++++++++----
> > 3 files changed, 69 insertions(+), 5 deletions(-)
> 
> ...
> 
> >@@ -2786,6 +2798,20 @@ static void mce_init(X86CPU *cpu)
> >         && (cenv->features[FEAT_1_EDX] & (CPUID_MCE | CPUID_MCA)) ==
> >             (CPUID_MCE | CPUID_MCA)) {
> >         cenv->mcg_cap = MCE_CAP_DEF | MCE_BANKS_DEF;
> >+
> >+        if (cpu->enable_lmce) {
> >+            if (lmce_supported()) {
> >+                cenv->mcg_cap |= MCG_LMCE_P;
> >+                cenv->msr_ia32_feature_control |=
> >+                    MSR_IA32_FEATURE_CONTROL_LMCE |
> >+                    MSR_IA32_FEATURE_CONTROL_LOCKED;
> >+            } else {
> >+                error_report("Warning: KVM unavailable or not support
> >LMCE, "
> >+                             "LMCE disabled");
> 
> "... or LMCE not supported..."
>

will change

> Also, do not split the string for easier grepping.
>

OK, I split it to avoid exceeding 80 characters per line.

Thanks,
Haozhong
Haozhong Zhang June 5, 2016, 3:41 p.m. UTC | #7
On 06/04/16 12:34, Boris Petkov wrote:
> Haozhong Zhang <haozhong.zhang@intel.com> wrote:
> 
> >This patch adds the support to inject SRAR and SRAO as LMCE, i.e. they
> >will be injected to only one VCPU rather than broadcast to all
> >VCPUs. As KVM reports LMCE support on Intel platforms, this features is
> >only available on Intel platforms.
> >
> >Signed-off-by: Ashok Raj <ashok.raj@intel.com>
> >Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
> >[...]
> 
> >@@ -1173,6 +1182,8 @@ struct X86CPU {
> >      */
> >     bool enable_pmu;
> > 
> >+    bool enable_lmce;
> 
> That struct would go fat pretty fast if it grows a bool per CPU feature. Perhaps a more clever, a-bit-per-featurebit scheme would be in order.

OK, I'll use a 64-bit integer for current and future features.

Thanks,
Haozhong
Haozhong Zhang June 7, 2016, 9:41 a.m. UTC | #8
On 06/04/16 18:03, Eduardo Habkost wrote:
> On Sat, Jun 04, 2016 at 12:34:39PM +0200, Boris Petkov wrote:
> > Haozhong Zhang <haozhong.zhang@intel.com> wrote:
> > 
> > >This patch adds the support to inject SRAR and SRAO as LMCE, i.e. they
> > >will be injected to only one VCPU rather than broadcast to all
> > >VCPUs. As KVM reports LMCE support on Intel platforms, this features is
> > >only available on Intel platforms.
> > >
> > >Signed-off-by: Ashok Raj <ashok.raj@intel.com>
> > >Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
> > >[...]
> > 
> > >@@ -1173,6 +1182,8 @@ struct X86CPU {
> > >      */
> > >     bool enable_pmu;
> > > 
> > >+    bool enable_lmce;
> > 
> > That struct would go fat pretty fast if it grows a bool per CPU
> > feature. Perhaps a more clever, a-bit-per-featurebit scheme
> > would be in order.
> 
> We already have X86CPU.features, but it's specific for feature
> flags appearing on CPUID. We can eventually extend
> FeatureWord/FeatureWordInfo/x86_cpu_get_supported_feature_word()
> to represent features that don't appear directly on CPUID.
>

You mean CPUX86State.features?

enable_lmce is also used in patch 2 to avoid migrating from
lmce-enabled qemu to lmce-disabled qemu, so I don't think
CPUX86State.features is the correct place for enable_lmce.

Or, we may append one or more bit words to X86CPU.filtered_features
for enable_pmu, enable_lmce and future features, e.g.

modified   target-i386/cpu.h
@@ -455,6 +455,14 @@ typedef enum FeatureWord {
 
 typedef uint32_t FeatureWordArray[FEATURE_WORDS];
 
+typedef enum ExtFeatureBit {
+    EXT_FEAT_PMU,
+    EXT_FEAT_LMCE,
+    EXT_FEATURE_BITS,
+} ExtFeatureBit;
+
+#define EXT_FEATURE_WORDS ((EXT_FEATURE_BITS + 31) / 32)
+
+#define EXT_FEATURE_FILTER_ADD(words, feat)             \
+    do {                                                \
+        uint32_t *__words = (words);                    \
+        int __idx = (feat) / 32;                        \
+        int __oft = (feat) % 32;                        \
+        __words[__idx + FEATURE_WORDS] |= (1 << __oft); \
+    } while (0)
+
+#define EXT_FEATURE_FILTER_REMOVE(words, feat)           \
+    do {                                                 \
+        uint32_t *__words = (words);                     \
+        int __idx = (feat) / 32;                         \
+        int __oft = (feat) % 32;                         \
+        __words[__idx + FEATURE_WORDS] &= ~(1 << __oft); \
+    } while (0)
+
+#define EXT_FEATURE_FILTERED(words, feat)           \
+    ({                                              \
+        uint32_t *__words = (words);                \
+        int __idx = (feat) / 32;                    \
+        int __oft = (feat) % 32;                    \
+        __words[__idx + FEATURE_WORDS] & (1 << __oft); \
+    })
+
 /* cpuid_features bits */
 #define CPUID_FP87 (1U << 0)
 #define CPUID_VME  (1U << 1)
@@ -1173,21 +1181,7 @@ struct X86CPU {
     /* Features that were filtered out because of missing host capabilities */
-    uint32_t filtered_features[FEATURE_WORDS];
+    /* Features that were filtered out because of missing host capabilities or
+       being disabled by default */
+    uint32_t filtered_features[FEATURE_WORDS + EXT_FEATURE_WORDS];
-
-    /* Enable PMU CPUID bits. This can't be enabled by default yet because
-     * it doesn't have ABI stability guarantees, as it passes all PMU CPUID
-     * bits returned by GET_SUPPORTED_CPUID (that depend on host CPU and kernel
-     * capabilities) directly to the guest.
-     */
-    bool enable_pmu;
-
-    /* Enable LMCE support which is set via cpu option 'lmce=on/off'. LMCE is
-     * disabled by default to avoid breaking the migration between QEMU with
-     * different LMCE support. Only migrating between QEMU with the same LMCE
-     * support is allowed.
-     */
-    bool enable_lmce;

Every place using X86CPU.enable_pmu and .enable_lmce is then replaced
by the above macros EXT_FEATURE_FILTER_ADD, EXT_FEATURE_FILTER_REMOVE and
EXT_FEATURE_FILTERED. Of course, those bits in X86CPU.filtered_features
have the opposite meaning to the original enable_lmce and enable_pmu.


Haozhong
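For illustration, the scheme sketched above can be made self-contained. Here the `FEATURE_WORDS` value is a placeholder (the real one comes from the FeatureWord enum), and the macros are rewritten as functions, which sidesteps the GNU statement-expression details while keeping the same bit layout:

```c
#include <stdint.h>
#include <assert.h>

#define FEATURE_WORDS 18  /* placeholder: really the FeatureWord enum size */

typedef enum ExtFeatureBit {
    EXT_FEAT_PMU,
    EXT_FEAT_LMCE,
    EXT_FEATURE_BITS,
} ExtFeatureBit;

#define EXT_FEATURE_WORDS ((EXT_FEATURE_BITS + 31) / 32)

/* CPUID-derived words first, extended-feature words appended after. */
static uint32_t filtered_features[FEATURE_WORDS + EXT_FEATURE_WORDS];

static void ext_feature_filter_add(uint32_t *words, ExtFeatureBit feat)
{
    words[feat / 32 + FEATURE_WORDS] |= 1U << (feat % 32);
}

static void ext_feature_filter_remove(uint32_t *words, ExtFeatureBit feat)
{
    words[feat / 32 + FEATURE_WORDS] &= ~(1U << (feat % 32));
}

static int ext_feature_filtered(const uint32_t *words, ExtFeatureBit feat)
{
    return !!(words[feat / 32 + FEATURE_WORDS] & (1U << (feat % 32)));
}
```

With this layout, adding a future feature costs one enum entry rather than one bool per X86CPU instance.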
Haozhong Zhang June 7, 2016, 11:47 a.m. UTC | #9
On 06/07/16 17:41, Haozhong Zhang wrote:
> On 06/04/16 18:03, Eduardo Habkost wrote:
> > On Sat, Jun 04, 2016 at 12:34:39PM +0200, Boris Petkov wrote:
> > > Haozhong Zhang <haozhong.zhang@intel.com> wrote:
> > > 
> > > >This patch adds the support to inject SRAR and SRAO as LMCE, i.e. they
> > > >will be injected to only one VCPU rather than broadcast to all
> > > >VCPUs. As KVM reports LMCE support on Intel platforms, this features is
> > > >only available on Intel platforms.
> > > >
> > > >Signed-off-by: Ashok Raj <ashok.raj@intel.com>
> > > >Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
> > > >[...]
> > > 
> > > >@@ -1173,6 +1182,8 @@ struct X86CPU {
> > > >      */
> > > >     bool enable_pmu;
> > > > 
> > > >+    bool enable_lmce;
> > > 
> > > That struct would go fat pretty fast if it grows a bool per CPU
> > > feature. Perhaps a more clever, a-bit-per-featurebit scheme
> > > would be in order.
> > 
> > We already have X86CPU.features, but it's specific for feature
> > flags appearing on CPUID. We can eventually extend
> > FeatureWord/FeatureWordInfo/x86_cpu_get_supported_feature_word()
> > to represent features that don't appear directly on CPUID.
> >
> 
> You mean CPUX86State.features?
> 
> enable_lmce is also used in patch 2 to avoid migrating from
> lmce-enabled qemu to lmce-disabled qemu, so I don't think
> CPUX86State.features is the correct place for enable_lmce.
>

Sorry, I got things wrong: I thought that CPUX86State.features on the
destination is overwritten by the source's during migration. As it in
fact isn't, we may extend CPUX86State.features in a way similar to what
I proposed for X86CPU.filtered_features, to include features that don't
directly appear in CPUID.

Haozhong

> Or, we may append one or more bit words to X86CPU.filtered_features
> for enable_pmu, enable_lmce and future features, e.g.
> 
> modified   target-i386/cpu.h
> @@ -455,6 +455,14 @@ typedef enum FeatureWord {
>  
>  typedef uint32_t FeatureWordArray[FEATURE_WORDS];
>  
> +typedef enum ExtFeatureBit {
> +    EXT_FEAT_PMU,
> +    EXT_FEAT_LMCE,
> +    EXT_FEATURE_BITS,
> +} ExtFeatureBit;
> +
> +#define EXT_FEATURE_WORDS ((EXT_FEATURE_BITS + 31) / 32)
> +
> +#define EXT_FEATURE_FILTER_ADD(words, feat)             \
> +    do {                                                \
> +        uint32_t *__words = (words);                    \
> +        int __idx = (feat) / 32;                        \
> +        int __oft = (feat) % 32;                        \
> +        __words[__idx + FEATURE_WORDS] |= (1 << __oft); \
> +    } while (0)
> +
> +#define EXT_FEATURE_FILTER_REMOVE(words, feat)           \
> +    do {                                                 \
> +        uint32_t *__words = (words);                     \
> +        int __idx = (feat) / 32;                         \
> +        int __oft = (feat) % 32;                         \
> +        __words[__idx + FEATURE_WORDS] &= ~(1 << __oft); \
> +    } while (0)
> +
> +#define EXT_FEATURE_FILTERED(words, feat)           \
> +    ({                                              \
> +        uint32_t *__words = (words);                \
> +        int __idx = (feat) / 32;                    \
> +        int __oft = (feat) % 32;                    \
> +        __words[__idx + FEATURE_WORDS] & (1 << __oft); \
> +    })
> +
>  /* cpuid_features bits */
>  #define CPUID_FP87 (1U << 0)
>  #define CPUID_VME  (1U << 1)
> @@ -1173,21 +1181,7 @@ struct X86CPU {
>      /* Features that were filtered out because of missing host capabilities */
> -    uint32_t filtered_features[FEATURE_WORDS];
> +    /* Features that were filtered out because of missing host capabilities or
> +       being disabled by default */
> +    uint32_t filtered_features[FEATURE_WORDS + EXT_FEATURE_WORDS];
> -
> -    /* Enable PMU CPUID bits. This can't be enabled by default yet because
> -     * it doesn't have ABI stability guarantees, as it passes all PMU CPUID
> -     * bits returned by GET_SUPPORTED_CPUID (that depend on host CPU and kernel
> -     * capabilities) directly to the guest.
> -     */
> -    bool enable_pmu;
> -
> -    /* Enable LMCE support which is set via cpu option 'lmce=on/off'. LMCE is
> -     * disabled by default to avoid breaking the migration between QEMU with
> -     * different LMCE support. Only migrating between QEMU with the same LMCE
> -     * support is allowed.
> -     */
> -    bool enable_lmce;
> 
> Every place using X86CPU.enable_pmu and .enable_lmce is then replaced
> by above macros EXT_FEATURE_FILTER_ADD, EXT_FEATURE_FILTER_REMOVE and
> EXT_FEATURE_FILTERED. Of course, those bits in X86CPU.filtered_features
> have the opposite meaning to original enable_lmce and enable_pmu.
> 
> 
> Haozhong
Eduardo Habkost June 7, 2016, 8:10 p.m. UTC | #10
On Fri, Jun 03, 2016 at 02:09:43PM +0800, Haozhong Zhang wrote:
[...]
> +
> +        if (cpu->enable_lmce) {
> +            if (lmce_supported()) {
> +                cenv->mcg_cap |= MCG_LMCE_P;
> +                cenv->msr_ia32_feature_control |=
> +                    MSR_IA32_FEATURE_CONTROL_LMCE |
> +                    MSR_IA32_FEATURE_CONTROL_LOCKED;
> +            } else {
> +                error_report("Warning: KVM unavailable or not support LMCE, "
> +                             "LMCE disabled");
> +                cpu->enable_lmce = false;

Please don't do that. If the user explicitly asked for LMCE, you
should refuse to start if the host doesn't have the required
capabilities.


> +            }
> +        }
> +
>          cenv->mcg_ctl = ~(uint64_t)0;
>          for (bank = 0; bank < MCE_BANKS_DEF; bank++) {
>              cenv->mce_banks[bank * 4] = ~(uint64_t)0;
[...]
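A minimal sketch of the fail-fast behavior requested here (the flag and probe below are stand-ins for `cpu->enable_lmce` and `lmce_supported()`; real QEMU code would report the error and refuse to start rather than return a code):

```c
#include <stdbool.h>
#include <stdio.h>
#include <assert.h>

static bool lmce_requested;  /* stand-in for cpu->enable_lmce ("lmce=on") */
static bool lmce_available;  /* stand-in for what lmce_supported() reports */

/* Returns 0 on success, -1 when an explicit user request cannot be met.
 * Silently flipping the flag off, as the v3 patch did, would instead
 * hide the problem from the user. */
static int check_lmce(void)
{
    if (lmce_requested && !lmce_available) {
        fprintf(stderr, "KVM unavailable or LMCE not supported by KVM\n");
        return -1;
    }
    return 0;
}
```

The point is that the error path is taken only for an explicit request; when LMCE was never asked for, missing host support is not an error.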
Haozhong Zhang June 8, 2016, 1:43 a.m. UTC | #11
On 06/07/16 17:10, Eduardo Habkost wrote:
> On Fri, Jun 03, 2016 at 02:09:43PM +0800, Haozhong Zhang wrote:
> [...]
> > +
> > +        if (cpu->enable_lmce) {
> > +            if (lmce_supported()) {
> > +                cenv->mcg_cap |= MCG_LMCE_P;
> > +                cenv->msr_ia32_feature_control |=
> > +                    MSR_IA32_FEATURE_CONTROL_LMCE |
> > +                    MSR_IA32_FEATURE_CONTROL_LOCKED;
> > +            } else {
> > +                error_report("Warning: KVM unavailable or not support LMCE, "
> > +                             "LMCE disabled");
> > +                cpu->enable_lmce = false;
> 
> Please don't do that. If the user explicitly asked for LMCE, you
> should refuse to start if the host doesn't have the required
> capabilities.
>

OK, I'll change in the next version.

Thanks,
Haozhong

> 
> > +            }
> > +        }
> > +
> >          cenv->mcg_ctl = ~(uint64_t)0;
> >          for (bank = 0; bank < MCE_BANKS_DEF; bank++) {
> >              cenv->mce_banks[bank * 4] = ~(uint64_t)0;
> [...]
> 
> -- 
> Eduardo
Paolo Bonzini June 8, 2016, 11:32 a.m. UTC | #12
On 03/06/2016 17:57, Radim Krčmář wrote:
>> > +                cenv->msr_ia32_feature_control |=
>> > +                    MSR_IA32_FEATURE_CONTROL_LMCE |
>> > +                    MSR_IA32_FEATURE_CONTROL_LOCKED;
> Locking right from the start breaks nested KVM, because nested relies on
> setting VMXON feature from inside of the guest.
> 
> Do we keep it unlocked, or move everything into QEMU?
> 
> (The latter seems simpler.)

I think it should be moved into the firmware, with QEMU publishing the
desired setting via fw_cfg.  The same as what is done in real hardware,
that's the KVM mantra. :)

For v4 it's okay to just remove this.

Paolo
Paolo Bonzini June 8, 2016, 11:34 a.m. UTC | #13
On 05/06/2016 17:41, Haozhong Zhang wrote:
> On 06/04/16 12:34, Boris Petkov wrote:
>> Haozhong Zhang <haozhong.zhang@intel.com> wrote:
>>
>>> This patch adds the support to inject SRAR and SRAO as LMCE, i.e. they
>>> will be injected to only one VCPU rather than broadcast to all
>>> VCPUs. As KVM reports LMCE support on Intel platforms, this features is
>>> only available on Intel platforms.
>>>
>>> Signed-off-by: Ashok Raj <ashok.raj@intel.com>
>>> Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
>>> [...]
>>
>>> @@ -1173,6 +1182,8 @@ struct X86CPU {
>>>      */
>>>     bool enable_pmu;
>>>
>>> +    bool enable_lmce;
>>
>> That struct would go fat pretty fast if it grows a bool per CPU feature. Perhaps a more clever, a-bit-per-featurebit scheme would be in order.
> 
> OK, I'll use a 64-bit integer for current and future features.

No, please keep this as is for now.  It can be refactored later.

Paolo
Haozhong Zhang June 9, 2016, 6:52 a.m. UTC | #14
On 06/08/16 13:34, Paolo Bonzini wrote:
> 
> 
> On 05/06/2016 17:41, Haozhong Zhang wrote:
> > On 06/04/16 12:34, Boris Petkov wrote:
> >> Haozhong Zhang <haozhong.zhang@intel.com> wrote:
> >>
> >>> This patch adds the support to inject SRAR and SRAO as LMCE, i.e. they
> >>> will be injected to only one VCPU rather than broadcast to all
> >>> VCPUs. As KVM reports LMCE support on Intel platforms, this features is
> >>> only available on Intel platforms.
> >>>
> >>> Signed-off-by: Ashok Raj <ashok.raj@intel.com>
> >>> Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
> >>> ---
> >>> Cc: Paolo Bonzini <pbonzini@redhat.com>
> >>> Cc: Richard Henderson <rth@twiddle.net>
> >>> Cc: Eduardo Habkost <ehabkost@redhat.com>
> >>> Cc: Marcelo Tosatti <mtosatti@redhat.com>
> >>> Cc: Boris Petkov <bp@suse.de>
> >>> Cc: kvm@vger.kernel.org
> >>> Cc: Tony Luck <tony.luck@intel.com>
> >>> Cc: Andi Kleen <andi.kleen@intel.com>
> >>> ---
> >>> target-i386/cpu.c | 26 ++++++++++++++++++++++++++
> >>> target-i386/cpu.h | 13 ++++++++++++-
> >>> target-i386/kvm.c | 35 +++++++++++++++++++++++++++++++----
> >>> 3 files changed, 69 insertions(+), 5 deletions(-)
> >>
> >> ...
> >>
> >>> @@ -1173,6 +1182,8 @@ struct X86CPU {
> >>>      */
> >>>     bool enable_pmu;
> >>>
> >>> +    bool enable_lmce;
> >>
> >> That struct would go fat pretty fast if it grows a bool per CPU feature. Perhaps a more clever, a-bit-per-featurebit scheme would be in order.
> > 
> > OK, I'll use a 64-bit integer for current and future features.
> 
> No, please keep this as is for now.  It can be refactored later.
>

OK

Thanks,
Haozhong
Haozhong Zhang June 13, 2016, 7:55 a.m. UTC | #15
On 06/08/16 13:32, Paolo Bonzini wrote:
> 
> 
> On 03/06/2016 17:57, Radim Krčmář wrote:
> >> > +                cenv->msr_ia32_feature_control |=
> >> > +                    MSR_IA32_FEATURE_CONTROL_LMCE |
> >> > +                    MSR_IA32_FEATURE_CONTROL_LOCKED;
> > Locking right from the start breaks nested KVM, because nested relies on
> > setting VMXON feature from inside of the guest.
> > 
> > Do we keep it unlocked, or move everything into QEMU?
> > 
> > (The latter seems simpler.)
> 
> I think it should be moved into the firmware, with QEMU publishing the
> desired setting via fw_cfg.  The same as what is done in real hardware,
> that's the KVM mantra. :)
> 
> For v4 it's okay to just remove this.
> 
> Paolo

Currently, only the VMX bits (bits 1 & 2), the LMCE bit (bit 20) and the
lock bit (bit 0) in MSR_IA32_FEATURE_CONTROL are used for the guest. The
availability of the features indicated by those bits (except the lock bit)
can be discovered from CPUID and other MSRs, so it does not look necessary
to publish them via fw_cfg. Or do you have other concerns?

Thanks,
Haozhong
Paolo Bonzini June 13, 2016, 8:33 a.m. UTC | #16
On 13/06/2016 09:55, Haozhong Zhang wrote:
> Currently, only VMX bits (bit 1 & 2), LMCE bit (bit 20) as well as
> lock bit (bit 0) in MSR_IA32_FEATURE_CONTROL are used for guest. The
> availability of features indicated by those bits (except the lock bit)
> can be discovered from cpuid and other MSR, so it looks not necessary
> to publish them via fw_cfg. Or do you have other concerns?

I would prefer to avoid having to change the firmware (SeaBIOS and OVMF)
every time a new bit is added.  Using fw_cfg makes it possible to
develop the feature in the firmware once and for all.

Paolo
Haozhong Zhang June 13, 2016, 10:01 a.m. UTC | #17
On 06/13/16 10:33, Paolo Bonzini wrote:
> 
> 
> On 13/06/2016 09:55, Haozhong Zhang wrote:
> > Currently, only VMX bits (bit 1 & 2), LMCE bit (bit 20) as well as
> > lock bit (bit 0) in MSR_IA32_FEATURE_CONTROL are used for guest. The
> > availability of features indicated by those bits (except the lock bit)
> > can be discovered from cpuid and other MSR, so it looks not necessary
> > to publish them via fw_cfg. Or do you have other concerns?
> 
> I would prefer to avoid having to change the firmware (SeaBIOS and OVMF)
> every time a new bit is added.  Using fw_cfg makes it possible to
> develop the feature in the firmware once and for all.
> 

Thanks for the explanation! Is it proper to add a key in fw_cfg for
this purpose, e.g. FW_CFG_MSR_FEATURE_CONTROL, to pass bits that are
supposed to be set by firmware?

Thanks,
Haozhong
Paolo Bonzini June 13, 2016, 10:07 a.m. UTC | #18
On 13/06/2016 12:01, Haozhong Zhang wrote:
> > I would prefer to avoid having to change the firmware (SeaBIOS and OVMF)
> > every time a new bit is added.  Using fw_cfg makes it possible to
> > develop the feature in the firmware once and for all.
> 
> Thanks for the explanation! Is it proper to add a key in fw_cfg for
> this purpose, e.g FW_CFG_MSR_FEATURE_CONTROL to pass bits that are
> supposed to be set by firmware?

We usually add new files now, so it would be a new file named
"etc/msr-feature-control".

Thanks,

Paolo
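The content of such an "etc/msr-feature-control" file can be sketched independently of QEMU. The little-endian 8-byte layout below is an assumption for illustration (it matches how fw_cfg file payloads are usually raw bytes); on the QEMU side the buffer would be handed to fw_cfg_add_file(), and the firmware would read it back and WRMSR the value:

```c
#include <stdint.h>
#include <stddef.h>
#include <assert.h>

/* QEMU side: pack the desired MSR_IA32_FEATURE_CONTROL value as a
 * little-endian 8-byte blob for an "etc/msr-feature-control" file. */
static void pack_le64(uint8_t out[8], uint64_t val)
{
    for (size_t i = 0; i < 8; i++) {
        out[i] = (uint8_t)(val >> (8 * i));
    }
}

/* Firmware side: recover the value the guest firmware should write
 * (and then lock) in MSR_IA32_FEATURE_CONTROL. */
static uint64_t unpack_le64(const uint8_t in[8])
{
    uint64_t val = 0;
    for (size_t i = 0; i < 8; i++) {
        val |= (uint64_t)in[i] << (8 * i);
    }
    return val;
}
```

With this split, adding a new FEATURE_CONTROL bit later changes only the value QEMU packs, not the firmware.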
Haozhong Zhang June 13, 2016, 10:09 a.m. UTC | #19
On 06/13/16 12:07, Paolo Bonzini wrote:
> 
> 
> On 13/06/2016 12:01, Haozhong Zhang wrote:
> > > I would prefer to avoid having to change the firmware (SeaBIOS and OVMF)
> > > every time a new bit is added.  Using fw_cfg makes it possible to
> > > develop the feature in the firmware once and for all.
> > 
> > Thanks for the explanation! Would it be appropriate to add a key in fw_cfg
> > for this purpose, e.g. FW_CFG_MSR_FEATURE_CONTROL, to pass the bits that
> > are supposed to be set by firmware?
> 
> We usually add new files now, so it would be a new file named
> "etc/msr-feature-control".
> 

Got it, thanks!

Haozhong
Patch

diff --git a/target-i386/cpu.c b/target-i386/cpu.c
index 895a386..9b4dbab 100644
--- a/target-i386/cpu.c
+++ b/target-i386/cpu.c
@@ -2777,6 +2777,18 @@  static void x86_cpu_machine_reset_cb(void *opaque)
 }
 #endif
 
+static bool lmce_supported(void)
+{
+    uint64_t mce_cap;
+
+    if (!kvm_enabled() ||
+        kvm_ioctl(kvm_state, KVM_X86_GET_MCE_CAP_SUPPORTED, &mce_cap) < 0) {
+        return false;
+    }
+
+    return !!(mce_cap & MCG_LMCE_P);
+}
+
 static void mce_init(X86CPU *cpu)
 {
     CPUX86State *cenv = &cpu->env;
@@ -2786,6 +2798,20 @@  static void mce_init(X86CPU *cpu)
         && (cenv->features[FEAT_1_EDX] & (CPUID_MCE | CPUID_MCA)) ==
             (CPUID_MCE | CPUID_MCA)) {
         cenv->mcg_cap = MCE_CAP_DEF | MCE_BANKS_DEF;
+
+        if (cpu->enable_lmce) {
+            if (lmce_supported()) {
+                cenv->mcg_cap |= MCG_LMCE_P;
+                cenv->msr_ia32_feature_control |=
+                    MSR_IA32_FEATURE_CONTROL_LMCE |
+                    MSR_IA32_FEATURE_CONTROL_LOCKED;
+            } else {
+                error_report("Warning: KVM unavailable or does not support "
+                             "LMCE, LMCE disabled");
+                cpu->enable_lmce = false;
+            }
+        }
+
         cenv->mcg_ctl = ~(uint64_t)0;
         for (bank = 0; bank < MCE_BANKS_DEF; bank++) {
             cenv->mce_banks[bank * 4] = ~(uint64_t)0;
diff --git a/target-i386/cpu.h b/target-i386/cpu.h
index 0426459..2d411ba 100644
--- a/target-i386/cpu.h
+++ b/target-i386/cpu.h
@@ -292,6 +292,7 @@ 
 
 #define MCG_CTL_P       (1ULL<<8)   /* MCG_CAP register available */
 #define MCG_SER_P       (1ULL<<24) /* MCA recovery/new status bits */
+#define MCG_LMCE_P      (1ULL<<27) /* Local Machine Check Supported */
 
 #define MCE_CAP_DEF     (MCG_CTL_P|MCG_SER_P)
 #define MCE_BANKS_DEF   10
@@ -301,6 +302,9 @@ 
 #define MCG_STATUS_RIPV (1ULL<<0)   /* restart ip valid */
 #define MCG_STATUS_EIPV (1ULL<<1)   /* ip points to correct instruction */
 #define MCG_STATUS_MCIP (1ULL<<2)   /* machine check in progress */
+#define MCG_STATUS_LMCE (1ULL<<3)   /* Local MCE signaled */
+
+#define MCG_EXT_CTL_LMCE_EN (1ULL<<0) /* Local MCE enabled */
 
 #define MCI_STATUS_VAL   (1ULL<<63)  /* valid error */
 #define MCI_STATUS_OVER  (1ULL<<62)  /* previous errors lost */
@@ -325,6 +329,8 @@ 
 #define MSR_IA32_APICBASE_ENABLE        (1<<11)
 #define MSR_IA32_APICBASE_BASE          (0xfffffU<<12)
 #define MSR_IA32_FEATURE_CONTROL        0x0000003a
+#define MSR_IA32_FEATURE_CONTROL_LOCKED (1<<0)
+#define MSR_IA32_FEATURE_CONTROL_LMCE   (1<<20)
 #define MSR_TSC_ADJUST                  0x0000003b
 #define MSR_IA32_TSCDEADLINE            0x6e0
 
@@ -343,6 +349,7 @@ 
 #define MSR_MCG_CAP                     0x179
 #define MSR_MCG_STATUS                  0x17a
 #define MSR_MCG_CTL                     0x17b
+#define MSR_MCG_EXT_CTL                 0x4d0
 
 #define MSR_P6_EVNTSEL0                 0x186
 
@@ -1011,7 +1018,6 @@  typedef struct CPUX86State {
 
     uint64_t mcg_status;
     uint64_t msr_ia32_misc_enable;
-    uint64_t msr_ia32_feature_control;
 
     uint64_t msr_fixed_ctr_ctrl;
     uint64_t msr_global_ctrl;
@@ -1104,8 +1110,11 @@  typedef struct CPUX86State {
     int64_t user_tsc_khz; /* for sanity check only */
     void *kvm_xsave_buf;
 
+    uint64_t msr_ia32_feature_control;
+
     uint64_t mcg_cap;
     uint64_t mcg_ctl;
+    uint64_t mcg_ext_ctl;
     uint64_t mce_banks[MCE_BANKS_DEF*4];
 
     uint64_t tsc_aux;
@@ -1173,6 +1182,8 @@  struct X86CPU {
      */
     bool enable_pmu;
 
+    bool enable_lmce;
+
     /* in order to simplify APIC support, we leave this pointer to the
        user */
     struct DeviceState *apic_state;
diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index abf50e6..ea442b3 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -107,6 +107,8 @@  static int has_xsave;
 static int has_xcrs;
 static int has_pit_state2;
 
+static bool has_msr_mcg_ext_ctl;
+
 int kvm_has_pit_state2(void)
 {
     return has_pit_state2;
@@ -378,10 +380,12 @@  static int kvm_get_mce_cap_supported(KVMState *s, uint64_t *mce_cap,
 
 static void kvm_mce_inject(X86CPU *cpu, hwaddr paddr, int code)
 {
+    CPUState *cs = CPU(cpu);
     CPUX86State *env = &cpu->env;
     uint64_t status = MCI_STATUS_VAL | MCI_STATUS_UC | MCI_STATUS_EN |
                       MCI_STATUS_MISCV | MCI_STATUS_ADDRV | MCI_STATUS_S;
     uint64_t mcg_status = MCG_STATUS_MCIP;
+    int flags = 0;
 
     if (code == BUS_MCEERR_AR) {
         status |= MCI_STATUS_AR | 0x134;
@@ -390,10 +394,19 @@  static void kvm_mce_inject(X86CPU *cpu, hwaddr paddr, int code)
         status |= 0xc0;
         mcg_status |= MCG_STATUS_RIPV;
     }
+
+    flags = cpu_x86_support_mca_broadcast(env) ? MCE_INJECT_BROADCAST : 0;
+    /* We need to read back the value of MSR_MCG_EXT_CTL that was set by
+     * the guest kernel into env->mcg_ext_ctl.
+     */
+    cpu_synchronize_state(cs);
+    if (env->mcg_ext_ctl & MCG_EXT_CTL_LMCE_EN) {
+        mcg_status |= MCG_STATUS_LMCE;
+        flags = 0;
+    }
+
     cpu_x86_inject_mce(NULL, cpu, 9, status, mcg_status, paddr,
-                       (MCM_ADDR_PHYS << 6) | 0xc,
-                       cpu_x86_support_mca_broadcast(env) ?
-                       MCE_INJECT_BROADCAST : 0);
+                       (MCM_ADDR_PHYS << 6) | 0xc, flags);
 }
 
 static void hardware_memory_error(void)
@@ -878,7 +891,12 @@  int kvm_arch_init_vcpu(CPUState *cs)
     c = cpuid_find_entry(&cpuid_data.cpuid, 1, 0);
     if (c) {
         has_msr_feature_control = !!(c->ecx & CPUID_EXT_VMX) ||
-                                  !!(c->ecx & CPUID_EXT_SMX);
+                                  !!(c->ecx & CPUID_EXT_SMX) ||
+                                  !!(env->mcg_cap & MCG_LMCE_P);
+    }
+
+    if (has_msr_feature_control && (env->mcg_cap & MCG_LMCE_P)) {
+        has_msr_mcg_ext_ctl = true;
     }
 
     c = cpuid_find_entry(&cpuid_data.cpuid, 0x80000007, 0);
@@ -1702,6 +1720,9 @@  static int kvm_put_msrs(X86CPU *cpu, int level)
 
         kvm_msr_entry_add(cpu, MSR_MCG_STATUS, env->mcg_status);
         kvm_msr_entry_add(cpu, MSR_MCG_CTL, env->mcg_ctl);
+        if (has_msr_mcg_ext_ctl) {
+            kvm_msr_entry_add(cpu, MSR_MCG_EXT_CTL, env->mcg_ext_ctl);
+        }
         for (i = 0; i < (env->mcg_cap & 0xff) * 4; i++) {
             kvm_msr_entry_add(cpu, MSR_MC0_CTL + i, env->mce_banks[i]);
         }
@@ -2005,6 +2026,9 @@  static int kvm_get_msrs(X86CPU *cpu)
     if (env->mcg_cap) {
         kvm_msr_entry_add(cpu, MSR_MCG_STATUS, 0);
         kvm_msr_entry_add(cpu, MSR_MCG_CTL, 0);
+        if (has_msr_mcg_ext_ctl) {
+            kvm_msr_entry_add(cpu, MSR_MCG_EXT_CTL, 0);
+        }
         for (i = 0; i < (env->mcg_cap & 0xff) * 4; i++) {
             kvm_msr_entry_add(cpu, MSR_MC0_CTL + i, 0);
         }
@@ -2133,6 +2157,9 @@  static int kvm_get_msrs(X86CPU *cpu)
         case MSR_MCG_CTL:
             env->mcg_ctl = msrs[i].data;
             break;
+        case MSR_MCG_EXT_CTL:
+            env->mcg_ext_ctl = msrs[i].data;
+            break;
         case MSR_IA32_MISC_ENABLE:
             env->msr_ia32_misc_enable = msrs[i].data;
             break;