[2/3] KVM: x86: Virtualize HWCR.TscFreqSel[bit 24]

Message ID	20230922164239.2253604-2-jmattson@google.com (mailing list archive)
State	New, archived
Headers	Return-Path: <kvm-owner@vger.kernel.org>
	Date: Fri, 22 Sep 2023 09:42:38 -0700
	In-Reply-To: <20230922164239.2253604-1-jmattson@google.com>
	Mime-Version: 1.0
	References: <20230922164239.2253604-1-jmattson@google.com>
	Message-ID: <20230922164239.2253604-2-jmattson@google.com>
	Subject: [PATCH 2/3] KVM: x86: Virtualize HWCR.TscFreqSel[bit 24]
	From: Jim Mattson <jmattson@google.com>
	To: kvm@vger.kernel.org, "'Sean Christopherson '" <seanjc@google.com>, "'Paolo Bonzini '" <pbonzini@redhat.com>
	Cc: Jim Mattson <jmattson@google.com>
	Content-Type: text/plain; charset="UTF-8"
	Precedence: bulk
Series	[1/3] KVM: x86: Allow HWCR.McStatusWrEn to be cleared once set
	[2/3] KVM: x86: Virtualize HWCR.TscFreqSel[bit 24]
	[3/3] KVM: selftests: Test behavior of HWCR

Jim Mattson Sept. 22, 2023, 4:42 p.m. UTC

On certain CPUs, Linux guests expect HWCR.TscFreqSel[bit 24] to be
set. If it isn't set, they complain:
	[Firmware Bug]: TSC doesn't count with P0 frequency!

Eliminate this complaint by setting the bit on virtual processors for
which Linux guests expect it to be set.

Note that this bit is read-only on said processors.

Signed-off-by: Jim Mattson <jmattson@google.com>
---
 arch/x86/kvm/cpuid.c | 10 ++++++++++
 arch/x86/kvm/x86.c   |  7 +++++++
 2 files changed, 17 insertions(+)
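The guest-side condition that triggers the complaint can be modeled in isolation. This is a standalone userspace C sketch, not kernel code: `guest_expects_tscfreqsel()` and `guest_warns()` are hypothetical names, and the thresholds are copied from the cpuid.c hunk in this patch (AMD or Hygon vendor, CONSTANT_TSC enumerated, and family above 0x10 or family 0x10 with model 2 or later).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define HWCR_TSCFREQSEL (UINT64_C(1) << 24) /* HWCR.TscFreqSel, bit 24 */

/*
 * Hypothetical predicate: does a Linux guest with this CPUID expect
 * HWCR.TscFreqSel to be set? Thresholds mirror the proposed
 * kvm_vcpu_after_set_cpuid() hunk.
 */
static bool guest_expects_tscfreqsel(bool amd_or_hygon, bool constant_tsc,
				     unsigned int family, unsigned int model)
{
	if (!amd_or_hygon || !constant_tsc)
		return false;
	return family > 0x10 || (family == 0x10 && model >= 2);
}

/* True when such a guest would log "TSC doesn't count with P0 frequency!". */
static bool guest_warns(bool amd_or_hygon, bool constant_tsc,
			unsigned int family, unsigned int model, uint64_t hwcr)
{
	return guest_expects_tscfreqsel(amd_or_hygon, constant_tsc,
					family, model) &&
	       !(hwcr & HWCR_TSCFREQSEL);
}
```

In this model, a family 17h guest that reads HWCR with bit 24 clear is exactly the case that produces the "[Firmware Bug]" line.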

Sean Christopherson Sept. 22, 2023, 5:21 p.m. UTC | #1

On Fri, Sep 22, 2023, Jim Mattson wrote:
> On certain CPUs, Linux guests expect HWCR.TscFreqSel[bit 24] to be
> set. If it isn't set, they complain:
> 	[Firmware Bug]: TSC doesn't count with P0 frequency!
> 
> Eliminate this complaint by setting the bit on virtual processors for
> which Linux guests expect it to be set.
> 
> Note that this bit is read-only on said processors.
> 
> Signed-off-by: Jim Mattson <jmattson@google.com>
> ---
>  arch/x86/kvm/cpuid.c | 10 ++++++++++
>  arch/x86/kvm/x86.c   |  7 +++++++
>  2 files changed, 17 insertions(+)
> 
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index 0544e30b4946..2d7dcd13dcc3 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -373,6 +373,16 @@ static void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
>  	vcpu->arch.maxphyaddr = cpuid_query_maxphyaddr(vcpu);
>  	vcpu->arch.reserved_gpa_bits = kvm_vcpu_reserved_gpa_bits_raw(vcpu);
>  
> +	/*
> +	 * HWCR.TscFreqSel[bit 24] has a reset value of 1 on some processors.
> +	 */
> +	if (guest_cpuid_is_amd_or_hygon(vcpu) &&
> +	    guest_cpuid_has(vcpu, X86_FEATURE_CONSTANT_TSC) &&
> +	    (guest_cpuid_family(vcpu) > 0x10 ||
> +	     (guest_cpuid_family(vcpu) == 0x10 &&
> +	      guest_cpuid_model(vcpu) >= 2)))
> +		vcpu->arch.msr_hwcr |= BIT(24);

Oh hell no.  It's bad enough that KVM _allows_ setting uarch specific bits, but
actively setting bits is a step too far.

IMO, we should delete the offending kernel code.  I don't see how it provides any
value these days.

And *if* we want to change something in KVM so that we stop getting customer
complaints about a useless bit, just let userspace stuff the bit.
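For reference, "letting userspace stuff the bit" would look roughly like this on the VMM side. It is a minimal sketch that assumes KVM accepts bit 24 in a host-initiated write of HWCR (MSR 0xc0010015); the HWCR filter in kvm_set_msr_common() quoted below rejects any bit other than the ignored ones and McStatusWrEn, so this only works once KVM is changed. The helper names are hypothetical; `struct kvm_msrs`, `struct kvm_msr_entry` and `KVM_SET_MSRS` come from `<linux/kvm.h>`.

```c
#include <assert.h>
#include <linux/kvm.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/ioctl.h>

#define HWCR_MSR_INDEX  0xc0010015u          /* MSR_K7_HWCR */
#define HWCR_TSCFREQSEL (UINT64_C(1) << 24)  /* HWCR.TscFreqSel */

/* Allocate a one-entry kvm_msrs buffer carrying the given HWCR value. */
static struct kvm_msrs *hwcr_msrs_alloc(uint64_t hwcr)
{
	struct kvm_msrs *msrs =
		calloc(1, sizeof(*msrs) + sizeof(struct kvm_msr_entry));

	if (!msrs)
		return NULL;
	msrs->nmsrs = 1;
	msrs->entries[0].index = HWCR_MSR_INDEX;
	msrs->entries[0].data = hwcr;
	return msrs;
}

/*
 * Write HWCR with TscFreqSel set on a vCPU fd. KVM_SET_MSRS returns the
 * number of MSRs it wrote, so anything other than 1 means the write was
 * rejected. Returns 0 on success, -1 on failure.
 */
static int vcpu_stuff_tscfreqsel(int vcpu_fd, uint64_t hwcr)
{
	struct kvm_msrs *msrs = hwcr_msrs_alloc(hwcr | HWCR_TSCFREQSEL);
	int ret;

	if (!msrs)
		return -1;
	ret = ioctl(vcpu_fd, KVM_SET_MSRS, msrs);
	free(msrs);
	return ret == 1 ? 0 : -1;
}
```

Keeping the decision in the VMM means the bit follows whatever CPU model userspace chose, which is the ownership split Sean argues for.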

I think we should also raise the issue with AMD (Borislav maybe?) and ask/demand
that bits in HWCR that KVM allows to be set are architecturally defined.  It's
totally fine if the value of bit 24 is uarch specific, but the behavior needs to
be something that won't change from processor to processor.

>  	kvm_pmu_refresh(vcpu);
>  	vcpu->arch.cr4_guest_rsvd_bits =
>  	    __cr4_reserved_bits(guest_cpuid_has, vcpu);
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 3421ed7fcee0..cb02a7c2938b 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -3699,12 +3699,19 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>  		data &= ~(u64)0x40;	/* ignore flush filter disable */
>  		data &= ~(u64)0x100;	/* ignore ignne emulation enable */
>  		data &= ~(u64)0x8;	/* ignore TLB cache disable */
> +		data &= ~(u64)0x1000000;/* ignore TscFreqSel */
>  
>  		/* Handle McStatusWrEn */
>  		if (data & ~BIT_ULL(18)) {
>  			kvm_pr_unimpl_wrmsr(vcpu, msr, data);
>  			return 1;
>  		}
> +
> +		/*
> +		 * When set, TscFreqSel is read-only. Attempts to
> +		 * clear it are ignored.
> +		 */
> +		data |= vcpu->arch.msr_hwcr & BIT_ULL(24);


The bit is read-only from the guest, but KVM needs to let userspace clear the
bit.

>  		vcpu->arch.msr_hwcr = data;
>  		break;
>  	case MSR_FAM10H_MMIO_CONF_BASE:
> -- 
> 2.42.0.515.g380fc7ccd1-goog
>
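Sean's point can be captured by keying the sticky behavior on who performs the write. In KVM that distinction is carried by `msr_info->host_initiated`; the sketch below is a standalone model of the HWCR write path from the x86.c hunk with that one change, using hypothetical names and a plain `bool` in place of `struct msr_data`. Like kvm_set_msr_common(), it returns 1 to reject a write.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define HWCR_TLB_CACHE_DIS (UINT64_C(1) << 3)  /* 0x8: TLB cache disable */
#define HWCR_FFDIS         (UINT64_C(1) << 6)  /* 0x40: flush filter disable */
#define HWCR_IGNNE_EM      (UINT64_C(1) << 8)  /* 0x100: IGNNE emulation enable */
#define HWCR_MCSTATUSWREN  (UINT64_C(1) << 18) /* McStatusWrEn */
#define HWCR_TSCFREQSEL    (UINT64_C(1) << 24) /* TscFreqSel */

/*
 * Hypothetical model: a guest write can neither set nor clear
 * TscFreqSel, while a host-initiated (userspace) write may do both.
 */
static int hwcr_write(uint64_t *hwcr, uint64_t data, bool host_initiated)
{
	uint64_t tscfreqsel = data & HWCR_TSCFREQSEL;

	/* Bits KVM silently ignores; TscFreqSel is handled below. */
	data &= ~(HWCR_FFDIS | HWCR_IGNNE_EM | HWCR_TLB_CACHE_DIS);
	data &= ~HWCR_TSCFREQSEL;

	/* Apart from the above, only McStatusWrEn may be written. */
	if (data & ~HWCR_MCSTATUSWREN)
		return 1;

	if (host_initiated)
		data |= tscfreqsel;               /* userspace owns the value */
	else
		data |= *hwcr & HWCR_TSCFREQSEL;  /* read-only for the guest */

	*hwcr = data;
	return 0;
}
```

A guest write therefore preserves whatever value userspace last established, while a rejected write leaves the stored value untouched.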

Jim Mattson Sept. 22, 2023, 5:48 p.m. UTC | #2

On Fri, Sep 22, 2023 at 10:21 AM Sean Christopherson <seanjc@google.com> wrote:
>
> On Fri, Sep 22, 2023, Jim Mattson wrote:
> > On certain CPUs, Linux guests expect HWCR.TscFreqSel[bit 24] to be
> > set. If it isn't set, they complain:
> >       [Firmware Bug]: TSC doesn't count with P0 frequency!
> >
> > Eliminate this complaint by setting the bit on virtual processors for
> > which Linux guests expect it to be set.
> >
> > Note that this bit is read-only on said processors.
> >
> > Signed-off-by: Jim Mattson <jmattson@google.com>
> > ---
> >  arch/x86/kvm/cpuid.c | 10 ++++++++++
> >  arch/x86/kvm/x86.c   |  7 +++++++
> >  2 files changed, 17 insertions(+)
> >
> > diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> > index 0544e30b4946..2d7dcd13dcc3 100644
> > --- a/arch/x86/kvm/cpuid.c
> > +++ b/arch/x86/kvm/cpuid.c
> > @@ -373,6 +373,16 @@ static void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
> >       vcpu->arch.maxphyaddr = cpuid_query_maxphyaddr(vcpu);
> >       vcpu->arch.reserved_gpa_bits = kvm_vcpu_reserved_gpa_bits_raw(vcpu);
> >
> > +     /*
> > +      * HWCR.TscFreqSel[bit 24] has a reset value of 1 on some processors.
> > +      */
> > +     if (guest_cpuid_is_amd_or_hygon(vcpu) &&
> > +         guest_cpuid_has(vcpu, X86_FEATURE_CONSTANT_TSC) &&
> > +         (guest_cpuid_family(vcpu) > 0x10 ||
> > +          (guest_cpuid_family(vcpu) == 0x10 &&
> > +           guest_cpuid_model(vcpu) >= 2)))
> > +             vcpu->arch.msr_hwcr |= BIT(24);
>
> Oh hell no.  It's bad enough that KVM _allows_ setting uarch specific bits, but
> actively setting bits is a step too far.

The bit should be set on power on. From the PPR for AMD Family 17h
Model 01h, Revision B1 Processors,

> TscFreqSel: TSC frequency select. Read-only. Reset: 1.

Leaving it clear is a violation of the CPU vendor's hardware specification.

> IMO, we should delete the offending kernel code.  I don't see how it provides any
> value these days.

Sure, but that doesn't help legacy guests.

> And *if* we want to change something in KVM so that we stop getting customer
> complaints about a useless bit, just let userspace stuff the bit.

We want to make customers happy. That should not even be a question.

> I think we should also raise the issue with AMD (Borislav maybe?) and ask/demand
> that bits in HWCR that KVM allows to be set are architecturally defined.  It's
> totally fine if the value of bit 24 is uarch specific, but the behavior needs to
> be something that won't change from processor to processor.
>
> >       kvm_pmu_refresh(vcpu);
> >       vcpu->arch.cr4_guest_rsvd_bits =
> >           __cr4_reserved_bits(guest_cpuid_has, vcpu);
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index 3421ed7fcee0..cb02a7c2938b 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -3699,12 +3699,19 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> >               data &= ~(u64)0x40;     /* ignore flush filter disable */
> >               data &= ~(u64)0x100;    /* ignore ignne emulation enable */
> >               data &= ~(u64)0x8;      /* ignore TLB cache disable */
> > +             data &= ~(u64)0x1000000;/* ignore TscFreqSel */
> >
> >               /* Handle McStatusWrEn */
> >               if (data & ~BIT_ULL(18)) {
> >                       kvm_pr_unimpl_wrmsr(vcpu, msr, data);
> >                       return 1;
> >               }
> > +
> > +             /*
> > +              * When set, TscFreqSel is read-only. Attempts to
> > +              * clear it are ignored.
> > +              */
> > +             data |= vcpu->arch.msr_hwcr & BIT_ULL(24);
>
>
> The bit is read-only from the guest, but KVM needs to let userspace clear the
> bit.

Why? We don't let userspace clear bit 1 of EFLAGS, which is also a
"reads as one" bit.

Sean Christopherson Sept. 22, 2023, 6:15 p.m. UTC | #3

On Fri, Sep 22, 2023, Jim Mattson wrote:
> On Fri, Sep 22, 2023 at 10:21 AM Sean Christopherson <seanjc@google.com> wrote:
> > IMO, we should delete the offending kernel code.  I don't see how it provides any
> > value these days.
> 
> Sure, but that doesn't help legacy guests.

Heh, IMO they don't need help, their owners just need to be placated ;-)

> > And *if* we want to change something in KVM so that we stop getting customer
> > complaints about a useless bit, just let userspace stuff the bit.
> 
> We want to make customers happy. That should not even be a question.

Can we really not tell them "this is a benign guest bug, ignore it"?

> > I think we should also raise the issue with AMD (Borislav maybe?) and ask/demand
> > that bits in HWCR that KVM allows to be set are architecturally defined.  It's
> > totally fine if the value of bit 24 is uarch specific, but the behavior needs to
> > be something that won't change from processor to processor.
> >
> > >       kvm_pmu_refresh(vcpu);
> > >       vcpu->arch.cr4_guest_rsvd_bits =
> > >           __cr4_reserved_bits(guest_cpuid_has, vcpu);
> > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > > index 3421ed7fcee0..cb02a7c2938b 100644
> > > --- a/arch/x86/kvm/x86.c
> > > +++ b/arch/x86/kvm/x86.c
> > > @@ -3699,12 +3699,19 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> > >               data &= ~(u64)0x40;     /* ignore flush filter disable */
> > >               data &= ~(u64)0x100;    /* ignore ignne emulation enable */
> > >               data &= ~(u64)0x8;      /* ignore TLB cache disable */
> > > +             data &= ~(u64)0x1000000;/* ignore TscFreqSel */
> > >
> > >               /* Handle McStatusWrEn */
> > >               if (data & ~BIT_ULL(18)) {
> > >                       kvm_pr_unimpl_wrmsr(vcpu, msr, data);
> > >                       return 1;
> > >               }
> > > +
> > > +             /*
> > > +              * When set, TscFreqSel is read-only. Attempts to
> > > +              * clear it are ignored.
> > > +              */
> > > +             data |= vcpu->arch.msr_hwcr & BIT_ULL(24);
> >
> >
> > The bit is read-only from the guest, but KVM needs to let userspace clear the
> > bit.
> 
> Why? We don't let userspace clear bit 1 of EFLAGS, which is also a
> "reads as one" bit.

Because that's architectural behavior, not dependent on FMS, and KVM needs to
write EFLAGS to have any hope of being useful, i.e. giving ownership of EFLAGS
to userspace is not a realistic option.
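The contrast here is between a bit that is forced for every writer and every CPU model, and one whose value depends on FMS. A minimal model of the former, assuming enforcement by OR-ing the bit into every write (the function name is hypothetical):

```c
#include <assert.h>
#include <stdint.h>

#define EFLAGS_FIXED (UINT64_C(1) << 1) /* RFLAGS bit 1 always reads as 1 */

/*
 * Hypothetical model of a "reads as one" bit: whoever writes RFLAGS,
 * guest or userspace, and whatever the CPU model, bit 1 ends up set.
 */
static uint64_t set_rflags(uint64_t requested)
{
	return requested | EFLAGS_FIXED;
}
```

Nothing in this function consults CPUID, which is why forcing the bit is safe; HWCR.TscFreqSel has no such model-independent rule.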

As proposed, if userspace sets CPUID to a magic FMS, and then changes the FMS to
something else, kvm_vcpu_after_set_cpuid() will not clear the bit and KVM will
end up wrongly enumerating the bit.  I doubt userspace would ever do that, but
it's at least possible.

That could be fixed by actively clearing vcpu->arch.msr_hwcr for other FMS values,
but then KVM would have to be 100% precise on the FMS matching, which would be a
maintenance nightmare.
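The stale-bit scenario can be reproduced with a small model. In this sketch (hypothetical names; the vendor and CONSTANT_TSC checks are omitted for brevity), the "OR only" variant mirrors the patch as proposed, and the "exact" variant is the fix described above, which is only correct if `fms_matches()` is right for every processor.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define HWCR_TSCFREQSEL (UINT64_C(1) << 24)

/* Hypothetical FMS predicate, same thresholds as the proposed cpuid.c hunk. */
static bool fms_matches(unsigned int family, unsigned int model)
{
	return family > 0x10 || (family == 0x10 && model >= 2);
}

/* As proposed: the bit is only ever set, never cleared. */
static void after_set_cpuid_or_only(uint64_t *hwcr, unsigned int family,
				    unsigned int model)
{
	if (fms_matches(family, model))
		*hwcr |= HWCR_TSCFREQSEL;
}

/* Alternative: also clear the bit when the FMS no longer matches. */
static void after_set_cpuid_exact(uint64_t *hwcr, unsigned int family,
				  unsigned int model)
{
	if (fms_matches(family, model))
		*hwcr |= HWCR_TSCFREQSEL;
	else
		*hwcr &= ~HWCR_TSCFREQSEL;
}
```

Calling either function first with family 0x17 and then with family 0x0f shows the difference: the first leaves bit 24 set for a CPU model that should not have it, while the second clears it but now depends entirely on the FMS table being exhaustive.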

In other words, userspace owns the vCPU model, and for good reasons.  KVM needs
to allow userspace to define a sane model, but with *very* few exceptions, KVM
should not try to "help" userspace by stuffing bits.

Pretty much every time KVM tries to help, it causes problems.  E.g. initializing
perf_capabilities to kvm_caps.supported_perf_cap seems like a good thing, except
it presents a bogus model if userspace decides to not enumerate a vPMU to the
guest (Aaron was allegedly going to send a patch for this...).

Jim Mattson Sept. 22, 2023, 6:27 p.m. UTC | #4

On Fri, Sep 22, 2023 at 11:15 AM Sean Christopherson <seanjc@google.com> wrote:
>
> On Fri, Sep 22, 2023, Jim Mattson wrote:
> > On Fri, Sep 22, 2023 at 10:21 AM Sean Christopherson <seanjc@google.com> wrote:
> > > IMO, we should delete the offending kernel code.  I don't see how it provides any
> > > value these days.
> >
> > Sure, but that doesn't help legacy guests.
>
> Heh, IMO they don't need help, their owners just need to be placated ;-)
>
> > > And *if* we want to change something in KVM so that we stop getting customer
> > > complaints about a useless bit, just let userspace stuff the bit.
> >
> > We want to make customers happy. That should not even be a question.
>
> Can we really not tell them "this is a benign guest bug, ignore it"?

What is the mechanism for doing that?

> > > I think we should also raise the issue with AMD (Borislav maybe?) and ask/demand
> > > that bits in HWCR that KVM allows to be set are architecturally defined.  It's
> > > totally fine if the value of bit 24 is uarch specific, but the behavior needs to
> > > be something that won't change from processor to processor.
> > >
> > > >       kvm_pmu_refresh(vcpu);
> > > >       vcpu->arch.cr4_guest_rsvd_bits =
> > > >           __cr4_reserved_bits(guest_cpuid_has, vcpu);
> > > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > > > index 3421ed7fcee0..cb02a7c2938b 100644
> > > > --- a/arch/x86/kvm/x86.c
> > > > +++ b/arch/x86/kvm/x86.c
> > > > @@ -3699,12 +3699,19 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> > > >               data &= ~(u64)0x40;     /* ignore flush filter disable */
> > > >               data &= ~(u64)0x100;    /* ignore ignne emulation enable */
> > > >               data &= ~(u64)0x8;      /* ignore TLB cache disable */
> > > > +             data &= ~(u64)0x1000000;/* ignore TscFreqSel */
> > > >
> > > >               /* Handle McStatusWrEn */
> > > >               if (data & ~BIT_ULL(18)) {
> > > >                       kvm_pr_unimpl_wrmsr(vcpu, msr, data);
> > > >                       return 1;
> > > >               }
> > > > +
> > > > +             /*
> > > > +              * When set, TscFreqSel is read-only. Attempts to
> > > > +              * clear it are ignored.
> > > > +              */
> > > > +             data |= vcpu->arch.msr_hwcr & BIT_ULL(24);
> > >
> > >
> > > The bit is read-only from the guest, but KVM needs to let userspace clear the
> > > bit.
> >
> > Why? We don't let userspace clear bit 1 of EFLAGS, which is also a
> > "reads as one" bit.
>
> Because that's architectural behavior, not dependent on FMS, and KVM needs to
> write EFLAGS to have any hope of being useful, i.e. giving ownership of EFLAGS
> to userspace is not a realistic option.

Remind me what "MSR" stands for. :)

> As proposed, if userspace sets CPUID to a magic FMS, and then changes the FMS to
> something else, kvm_vcpu_after_set_cpuid() will not clear the bit and KVM will
> end up wrongly enumerating the bit.  I doubt userspace would ever do that, but
> it's at least possible.
>
> That could be fixed by actively clearing vcpu->arch.msr_hwcr for other FMS values,
> but then KVM would have to be 100% precise on the FMS matching, which would be a
> maintenance nightmare.

What if I did something crude like we do for MSR_IA32_MISC_ENABLE, and
just set the bit at reset regardless of FMS:

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index cb02a7c2938b..4d7d0de42a9d 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -12086,6 +12086,7 @@ void kvm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
 		vcpu->arch.msr_misc_features_enables = 0;
 		vcpu->arch.ia32_misc_enable_msr = MSR_IA32_MISC_ENABLE_PEBS_UNAVAIL |
 						  MSR_IA32_MISC_ENABLE_BTS_UNAVAIL;
+		vcpu->arch.msr_hwcr = BIT_ULL(24);
 
 		__kvm_set_xcr(vcpu, 0, XFEATURE_MASK_FP);
 		__kvm_set_msr(vcpu, MSR_IA32_XSS, 0, true);
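This variant makes the reset value independent of CPUID, so a later KVM_SET_CPUID2 cannot leave it stale. A standalone model of the power-on reset path, using a hypothetical struct and function name, modeling only the two MSRs touched here (BTS_UNAVAIL and PEBS_UNAVAIL are IA32_MISC_ENABLE bits 11 and 12):

```c
#include <assert.h>
#include <stdint.h>

#define MISC_ENABLE_BTS_UNAVAIL  (UINT64_C(1) << 11)
#define MISC_ENABLE_PEBS_UNAVAIL (UINT64_C(1) << 12)
#define HWCR_TSCFREQSEL          (UINT64_C(1) << 24)

/* Hypothetical, trimmed-down vCPU state. */
struct vcpu_model {
	uint64_t ia32_misc_enable;
	uint64_t msr_hwcr;
};

/*
 * Model of the power-on reset values: both MSRs get fixed values,
 * regardless of the family/model/stepping userspace enumerates.
 */
static void vcpu_power_on_reset(struct vcpu_model *v)
{
	v->ia32_misc_enable = MISC_ENABLE_PEBS_UNAVAIL |
			      MISC_ENABLE_BTS_UNAVAIL;
	v->msr_hwcr = HWCR_TSCFREQSEL;
}
```

Userspace that wants the bit clear would then overwrite HWCR after reset, which again assumes host-initiated writes are allowed to clear it.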

> In other words, userspace owns the vCPU model, and for good reasons.  KVM needs
> to allow userspace to define a sane model, but with *very* few exceptions, KVM
> should not try to "help" userspace by stuffing bits.

Okay. What about the IA32_MISC_ENABLE bits above?

> Pretty much every time KVM tries to help, it causes problems.  E.g. initializing
> perf_capabilities to kvm_caps.supported_perf_cap seems like a good thing, except
> it presents a bogus model if userspace decides to not enumerate a vPMU to the
> guest (Aaron was allegedly going to send a patch for this...).

KVM is nothing if not inconsistent.

Sean Christopherson Sept. 22, 2023, 7:40 p.m. UTC | #5

On Fri, Sep 22, 2023, Jim Mattson wrote:
> On Fri, Sep 22, 2023 at 11:15 AM Sean Christopherson <seanjc@google.com> wrote:
> >
> > On Fri, Sep 22, 2023, Jim Mattson wrote:
> > > On Fri, Sep 22, 2023 at 10:21 AM Sean Christopherson <seanjc@google.com> wrote:
> > > > IMO, we should delete the offending kernel code.  I don't see how it provides any
> > > > value these days.
> > >
> > > Sure, but that doesn't help legacy guests.
> >
> > Heh, IMO they don't need help, their owners just need to be placated ;-)
> >
> > > > And *if* we want to change something in KVM so that we stop getting customer
> > > > complaints about a useless bit, just let userspace stuff the bit.
> > >
> > > We want to make customers happy. That should not even be a question.
> >
> > Can we really not tell them "this is a benign guest bug, ignore it"?
> 
> What is the mechanism for doing that?

Not my problem?  :-D

> > Because that's architectural behavior, not dependent on FMS, and KVM needs to
> > write EFLAGS to have any hope of being useful, i.e. giving ownership of EFLAGS
> > to userspace is not a realistic option.
> 
> Remind me what "MSR" stands for. :)

Heh, that acronym has long, long since lost all meaning.

Joking aside, I think KVM needs to set a very, very high bar for emulating any
part of any MSR that is truly model specific.  IMO, it's far too likely that KVM
will be the one left holding the bag in such situations.

> > As proposed, if userspace sets CPUID to a magic FMS, and then changes the FMS to
> > something else, kvm_vcpu_after_set_cpuid() will not clear the bit and KVM will
> > end up wrongly enumerating the bit.  I doubt userspace would ever do that, but
> > it's at least possible.
> >
> > That could be fixed by actively clearing vcpu->arch.msr_hwcr for other FMS values,
> > but then KVM would have to be 100% precise on the FMS matching, which would be a
> > maintenance nightmare.
> 
> What if I did something crude like we do for MSR_IA32_MISC_ENABLE, and
> just set the bit at reset regardless of FMS:

I'd prefer that over playing games with FMS.  Though my first choice would still
be to punt the decision to userspace.

> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index cb02a7c2938b..4d7d0de42a9d 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -12086,6 +12086,7 @@ void kvm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
>  		vcpu->arch.msr_misc_features_enables = 0;
>  		vcpu->arch.ia32_misc_enable_msr = MSR_IA32_MISC_ENABLE_PEBS_UNAVAIL |
>  						  MSR_IA32_MISC_ENABLE_BTS_UNAVAIL;
> +		vcpu->arch.msr_hwcr = BIT_ULL(24);
>  
>  		__kvm_set_xcr(vcpu, 0, XFEATURE_MASK_FP);
>  		__kvm_set_msr(vcpu, MSR_IA32_XSS, 0, true);
> 
> > In other words, userspace owns the vCPU model, and for good reasons.  KVM needs
> > to allow userspace to define a sane model, but with *very* few exceptions, KVM
> > should not try to "help" userspace by stuffing bits.
> 
> Okay. What about the IA32_MISC_ENABLE bits above?

One of the exceptions where I don't see a better option, and hopefully something
that Intel won't repeat in the future.  Though I'm not exactly brimming with
confidence that Intel won't retroactively add more "gotcha! unsupported!" bits
in the future when they realize they forgot to add a useful CPUID feature bit.

> > Pretty much every time KVM tries to help, it causes problems.  E.g. initializing
> > perf_capabilities to kvm_caps.supported_perf_cap seems like a good thing, except
> > it presents a bogus model if userspace decides to not enumerate a vPMU to the
> > guest (Aaron was allegedly going to send a patch for this...).
> 
> KVM is nothing if not inconsistent.

Yeah, the existing inconsistencies are painful, but that's not a good reason to
continue what I see as bad behavior.

Jim Mattson Sept. 22, 2023, 8:16 p.m. UTC | #6

On Fri, Sep 22, 2023 at 12:40 PM Sean Christopherson <seanjc@google.com> wrote:
>
> On Fri, Sep 22, 2023, Jim Mattson wrote:
> > Okay. What about the IA32_MISC_ENABLE bits above?
>
> One of the exceptions where I don't see a better option, and hopefully something
> that Intel won't repeat in the future.  Though I'm not exactly brimming with
> confidence that Intel won't retroactively add more "gotcha! unsupported!" bits
> in the future when they realize they forgot to add a useful CPUID feature bit.

I don't understand the difference here. Why not make userspace
responsible for setting these bits as well?

Sean Christopherson Sept. 22, 2023, 8:51 p.m. UTC | #7

On Fri, Sep 22, 2023, Jim Mattson wrote:
> On Fri, Sep 22, 2023 at 12:40 PM Sean Christopherson <seanjc@google.com> wrote:
> >
> > On Fri, Sep 22, 2023, Jim Mattson wrote:
> > > Okay. What about the IA32_MISC_ENABLE bits above?
> >
> > One of the exceptions where I don't see a better option, and hopefully something
> > that Intel won't repeat in the future.  Though I'm not exactly brimming with
> > confidence that Intel won't retroactively add more "gotcha! unsupported!" bits
> > in the future when they realize they forgot to add a useful CPUID feature bit.
> 
> I don't understand the difference here. Why not make userspace
> responsible for setting these bits as well?

That probably would have been the ideal approach.  I'm not entirely sure it would
have actually been feasible though, as I suspect enumerating X86_FEATURE_DS without
any kind of guard would break userspace that reflects KVM_GET_SUPPORTED_CPUID
back into KVM_SET_CPUID(2).

Even better would have been to never merge PEBS support in KVM in its current
form.  The whole thing is a house of cards, e.g. if counters are "cross-mapped"
then the guest counters simply stop working.  And those warts aside, the entire
enabling was a chaotic mess.  See commit 9fc222967a39 ("KVM: x86: Give host
userspace full control of MSR_IA32_MISC_ENABLES").

In other words, setting the UNAVAILABLE bits was the least awful way to salvage
the mess.
