diff mbox series

[v3,2/3] KVM: x86: Virtualize HWCR.TscFreqSel[bit 24]

Message ID 20230928185128.824140-3-jmattson@google.com (mailing list archive)
State New, archived
Headers show
Series [v3,1/3] KVM: x86: Allow HWCR.McStatusWrEn to be cleared once set | expand

Commit Message

Jim Mattson Sept. 28, 2023, 6:51 p.m. UTC
On certain CPUs, Linux guests expect HWCR.TscFreqSel[bit 24] to be
set. If it isn't set, they complain:
	[Firmware Bug]: TSC doesn't count with P0 frequency!

Allow userspace to set this bit in the virtual HWCR to eliminate the
above complaint.

Attempts to clear this bit from within the guest are ignored, to match
the behavior of modern AMD processors.

Signed-off-by: Jim Mattson <jmattson@google.com>
---
 arch/x86/kvm/x86.c | 19 +++++++++++++++++--
 1 file changed, 17 insertions(+), 2 deletions(-)

Comments

Sean Christopherson Sept. 29, 2023, 12:56 a.m. UTC | #1
On Thu, Sep 28, 2023, Jim Mattson wrote:
> On certain CPUs, Linux guests expect HWCR.TscFreqSel[bit 24] to be
> set. If it isn't set, they complain:
> 	[Firmware Bug]: TSC doesn't count with P0 frequency!
> 
> Allow userspace to set this bit in the virtual HWCR to eliminate the
> above complaint.
> 
> Attempts to clear this bit from within the guest are ignored, to match
> the behavior of modern AMD processors.
> 
> Signed-off-by: Jim Mattson <jmattson@google.com>
> ---
>  arch/x86/kvm/x86.c | 19 +++++++++++++++++--
>  1 file changed, 17 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 1a323cae219c..9209fc0d1a51 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -3700,11 +3700,26 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>  		data &= ~(u64)0x100;	/* ignore ignne emulation enable */
>  		data &= ~(u64)0x8;	/* ignore TLB cache disable */
>  
> -		/* Handle McStatusWrEn */
> -		if (data & ~BIT_ULL(18)) {
> +		/*
> +		 * Ignore guest attempts to set TscFreqSel.
> +		 */

No need for a multi-line comment.
/
> +		if (!msr_info->host_initiated)
> +			data &= ~BIT_ULL(24);

There's no need to clear this before the check below.  The (arguably useless)
print will show the "supported" bit, but I can't imagine anyone will care.

> +
> +		/*
> +		 * Allow McStatusWrEn and (from the host) TscFreqSel.

This is unnecessarily confusing IMO, just state that writes to TscFreqSel are
architecturally ignored.  This would also be an opportune time to explain why
KVM allows this stupidity...

> +		 */
> +		if (data & ~(BIT_ULL(18) | BIT_ULL(24))) {
>  			kvm_pr_unimpl_wrmsr(vcpu, msr, data);
>  			return 1;
>  		}
> +
> +		/*
> +		 * TscFreqSel is read-only from within the
> +		 * guest. Attempts to clear it are ignored.

Overly aggressive wrapping.

How about this?

---
 arch/x86/kvm/x86.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 791a644dd481..4dd64d359142 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3700,11 +3700,19 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		data &= ~(u64)0x100;	/* ignore ignne emulation enable */
 		data &= ~(u64)0x8;	/* ignore TLB cache disable */
 
-		/* Handle McStatusWrEn */
-		if (data & ~BIT_ULL(18)) {
+		/*
+		 * Allow McStatusWrEn and TscFreqSel, some guests whine if they
+		 * aren't set.
+		 */
+		if (data & ~(BIT_ULL(18) | BIT_ULL(24))) {
 			kvm_pr_unimpl_wrmsr(vcpu, msr, data);
 			return 1;
 		}
+
+		/* TscFreqSel is architecturally read-only, writes are ignored */
+		if (!msr_info->host_initiated)
+			data = ~(data & BIT_ULL(24)) |
+			       (vcpu->arch.msr_hwcr & BIT_ULL(24));
 		vcpu->arch.msr_hwcr = data;
 		break;
 	case MSR_FAM10H_MMIO_CONF_BASE:

base-commit: 831ee29a0d4a2219d30268f9fc577217d222e339
--
Jim Mattson Sept. 29, 2023, 1:18 p.m. UTC | #2
On Thu, Sep 28, 2023 at 5:56 PM Sean Christopherson <seanjc@google.com> wrote:
>
> On Thu, Sep 28, 2023, Jim Mattson wrote:
> > On certain CPUs, Linux guests expect HWCR.TscFreqSel[bit 24] to be
> > set. If it isn't set, they complain:
> >       [Firmware Bug]: TSC doesn't count with P0 frequency!
> >
> > Allow userspace to set this bit in the virtual HWCR to eliminate the
> > above complaint.
> >
> > Attempts to clear this bit from within the guest are ignored, to match
> > the behavior of modern AMD processors.
> >
> > Signed-off-by: Jim Mattson <jmattson@google.com>
> > ---
> >  arch/x86/kvm/x86.c | 19 +++++++++++++++++--
> >  1 file changed, 17 insertions(+), 2 deletions(-)
> >
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index 1a323cae219c..9209fc0d1a51 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -3700,11 +3700,26 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> >               data &= ~(u64)0x100;    /* ignore ignne emulation enable */
> >               data &= ~(u64)0x8;      /* ignore TLB cache disable */
> >
> > -             /* Handle McStatusWrEn */
> > -             if (data & ~BIT_ULL(18)) {
> > +             /*
> > +              * Ignore guest attempts to set TscFreqSel.
> > +              */
>
> No need for a multi-line comment.
> /
> > +             if (!msr_info->host_initiated)
> > +                     data &= ~BIT_ULL(24);
>
> There's no need to clear this before the check below.  The (arguably useless)
> print will show the "supported" bit, but I can't imagine anyone will care.
>
> > +
> > +             /*
> > +              * Allow McStatusWrEn and (from the host) TscFreqSel.
>
> This is unnecessarily confusing IMO, just state that writes to TscFreqSel are
> architecturally ignored.  This would also be an opportune time to explain why
> KVM allows this stupidity...
>
> > +              */
> > +             if (data & ~(BIT_ULL(18) | BIT_ULL(24))) {
> >                       kvm_pr_unimpl_wrmsr(vcpu, msr, data);
> >                       return 1;
> >               }
> > +
> > +             /*
> > +              * TscFreqSel is read-only from within the
> > +              * guest. Attempts to clear it are ignored.
>
> Overly aggressive wrapping.
>
> How about this?
>
> ---
>  arch/x86/kvm/x86.c | 12 ++++++++++--
>  1 file changed, 10 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 791a644dd481..4dd64d359142 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -3700,11 +3700,19 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>                 data &= ~(u64)0x100;    /* ignore ignne emulation enable */
>                 data &= ~(u64)0x8;      /* ignore TLB cache disable */
>
> -               /* Handle McStatusWrEn */
> -               if (data & ~BIT_ULL(18)) {
> +               /*
> +                * Allow McStatusWrEn and TscFreqSel, some guests whine if they
> +                * aren't set.
> +                */

The whining is only about TscFreqSel. KVM actually supports the
functionality of McStatusWrEn (i.e. allows the guest to write to the
MCi_STATUS MSRs).

How about...

/*
* Allow McStatusWrEn and TscFreqSel. (Linux guests from v3.2 through
* at least v6.6 whine if TscFreqSel is clear, depending on F/M/S.
*/

> +               if (data & ~(BIT_ULL(18) | BIT_ULL(24))) {
>                         kvm_pr_unimpl_wrmsr(vcpu, msr, data);
>                         return 1;
>                 }
> +
> +               /* TscFreqSel is architecturally read-only, writes are ignored */

This isn't true. TscFreqSel is not architectural at all. On Family
10h, per https://www.amd.com/content/dam/amd/en/documents/archived-tech-docs/programmer-references/31116.pdf,
it was R/W and powered on as 0. In Family 15h, one of the "changes
relative to Family 10H Revision D processors," per
https://www.amd.com/content/dam/amd/en/documents/archived-tech-docs/programmer-references/42301_15h_Mod_00h-0Fh_BKDG.pdf,
was:

• MSRC001_0015 [Hardware Configuration (HWCR)]:
  • Dropped TscFreqSel; TSC can no longer be selected to run at NB P0-state.

Despite the "Dropped" above, that same document later describes
HWCR[bit 24] as follows:

TscFreqSel: TSC frequency select. Read-only. Reset: 1. 1=The TSC
increments at the P0 frequency

So, perhaps this block of code can just be dropped? Who really cares
if the guest changes the value (unless it goes on to kexec a new
kernel, which will whine about the bit being clear)?

> +               if (!msr_info->host_initiated)
> +                       data = ~(data & BIT_ULL(24)) |
> +                              (vcpu->arch.msr_hwcr & BIT_ULL(24));
>                 vcpu->arch.msr_hwcr = data;
>                 break;
>         case MSR_FAM10H_MMIO_CONF_BASE:
>
> base-commit: 831ee29a0d4a2219d30268f9fc577217d222e339
> --
>
Sean Christopherson Sept. 29, 2023, 4:28 p.m. UTC | #3
On Fri, Sep 29, 2023, Jim Mattson wrote:
> On Thu, Sep 28, 2023 at 5:56 PM Sean Christopherson <seanjc@google.com> wrote:
> > ---
> >  arch/x86/kvm/x86.c | 12 ++++++++++--
> >  1 file changed, 10 insertions(+), 2 deletions(-)
> >
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index 791a644dd481..4dd64d359142 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -3700,11 +3700,19 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> >                 data &= ~(u64)0x100;    /* ignore ignne emulation enable */
> >                 data &= ~(u64)0x8;      /* ignore TLB cache disable */
> >
> > -               /* Handle McStatusWrEn */
> > -               if (data & ~BIT_ULL(18)) {
> > +               /*
> > +                * Allow McStatusWrEn and TscFreqSel, some guests whine if they
> > +                * aren't set.
> > +                */
> 
> The whining is only about TscFreqSel. KVM actually supports the
> functionality of McStatusWrEn (i.e. allows the guest to write to the
> MCi_STATUS MSRs).

Ugh, I missed the usage can_set_mci_status().  Stupid open coded bit numbers.

> How about...
> 
> /*
> * Allow McStatusWrEn and TscFreqSel. (Linux guests from v3.2 through
> * at least v6.6 whine if TscFreqSel is clear, depending on F/M/S.
> */
> 
> > +               if (data & ~(BIT_ULL(18) | BIT_ULL(24))) {
> >                         kvm_pr_unimpl_wrmsr(vcpu, msr, data);
> >                         return 1;
> >                 }
> > +
> > +               /* TscFreqSel is architecturally read-only, writes are ignored */
> 
> This isn't true. TscFreqSel is not architectural at all.

Doh, right, the whole source of this mess is the uarch specific nature of the MSR.

> On Family
> 10h, per https://www.amd.com/content/dam/amd/en/documents/archived-tech-docs/programmer-references/31116.pdf,
> it was R/W and powered on as 0. In Family 15h, one of the "changes
> relative to Family 10H Revision D processors," per
> https://www.amd.com/content/dam/amd/en/documents/archived-tech-docs/programmer-references/42301_15h_Mod_00h-0Fh_BKDG.pdf,
> was:
> 
> • MSRC001_0015 [Hardware Configuration (HWCR)]:
>   • Dropped TscFreqSel; TSC can no longer be selected to run at NB P0-state.
> 
> Despite the "Dropped" above, that same document later describes
> HWCR[bit 24] as follows:
> 
> TscFreqSel: TSC frequency select. Read-only. Reset: 1. 1=The TSC
> increments at the P0 frequency
> 
> So, perhaps this block of code can just be dropped? Who really cares
> if the guest changes the value (unless it goes on to kexec a new
> kernel, which will whine about the bit being clear)?

Works for me.  If userspace wants to precisely emulate model specific behavior,
then userspace can darn well intercept writes to the MSR.
diff mbox series

Patch

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 1a323cae219c..9209fc0d1a51 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3700,11 +3700,26 @@  int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		data &= ~(u64)0x100;	/* ignore ignne emulation enable */
 		data &= ~(u64)0x8;	/* ignore TLB cache disable */
 
-		/* Handle McStatusWrEn */
-		if (data & ~BIT_ULL(18)) {
+		/*
+		 * Ignore guest attempts to set TscFreqSel.
+		 */
+		if (!msr_info->host_initiated)
+			data &= ~BIT_ULL(24);
+
+		/*
+		 * Allow McStatusWrEn and (from the host) TscFreqSel.
+		 */
+		if (data & ~(BIT_ULL(18) | BIT_ULL(24))) {
 			kvm_pr_unimpl_wrmsr(vcpu, msr, data);
 			return 1;
 		}
+
+		/*
+		 * TscFreqSel is read-only from within the
+		 * guest. Attempts to clear it are ignored.
+		 */
+		if (!msr_info->host_initiated)
+			data |= vcpu->arch.msr_hwcr & BIT_ULL(24);
 		vcpu->arch.msr_hwcr = data;
 		break;
 	case MSR_FAM10H_MMIO_CONF_BASE: