mbox series

[0/2] KVM: X86: Make bus lock frequency for vapic timer configurable

Message ID cover.1699383993.git.isaku.yamahata@intel.com (mailing list archive)
Headers show
Series KVM: X86: Make bus lock frequency for vapic timer configurable | expand

Message

Isaku Yamahata Nov. 7, 2023, 7:22 p.m. UTC
From: Isaku Yamahata <isaku.yamahata@intel.com>

Add KVM_CAP_X86_BUS_FREQUENCY_CONTROL capability to configure the core
crystal clock (or processor's bus clock) for APIC timer emulation.  Allow
KVM_ENABLE_CAPABILITY(KVM_CAP_X86_BUS_FREUQNCY_CONTROL) to set the
frequency.  When using this capability, the user space VMM should configure
CPUID[0x15] to advertise the frequency.

TDX virtualizes CPUID[0x15] for the core crystal clock to be 25MHz.  The
x86 KVM hardcodes its freuqncy for APIC timer to be 1GHz.  This mismatch
causes the vAPIC timer to fire earlier than the guest expects. [1] The KVM
APIC timer emulation uses hrtimer, whose unit is nanosecond.

There are options to reconcile the mismatch.  1) Make apic bus clock frequency
configurable (this patch).  2) TDX KVM code adjusts TMICT value.  This is hacky
and it results in losing MSB bits from 32 bit width to 30 bit width.  3). Make
the guest kernel use tsc deadline timer instead of acpi oneshot/periodic timer.
This is guest kernel choice.  It's out of control of VMM.

[1] https://lore.kernel.org/lkml/20231006011255.4163884-1-vannapurve@google.com/

Isaku Yamahata (2):
  KVM: x86: Make the hardcoded APIC bus frequency vm variable
  KVM: X86: Add a capability to configure bus frequency for APIC timer

 arch/x86/include/asm/kvm_host.h |  2 ++
 arch/x86/kvm/hyperv.c           |  2 +-
 arch/x86/kvm/lapic.c            |  6 ++++--
 arch/x86/kvm/lapic.h            |  4 ++--
 arch/x86/kvm/x86.c              | 16 ++++++++++++++++
 include/uapi/linux/kvm.h        |  1 +
 6 files changed, 26 insertions(+), 5 deletions(-)

Comments

Isaku Yamahata Nov. 7, 2023, 7:29 p.m. UTC | #1
I meant bus clock frequency, not bus lock. Sorry for typo.

On Tue, Nov 07, 2023 at 11:22:32AM -0800,
isaku.yamahata@intel.com wrote:

> From: Isaku Yamahata <isaku.yamahata@intel.com>
> 
> Add KVM_CAP_X86_BUS_FREQUENCY_CONTROL capability to configure the core
> crystal clock (or processor's bus clock) for APIC timer emulation.  Allow
> KVM_ENABLE_CAPABILITY(KVM_CAP_X86_BUS_FREUQNCY_CONTROL) to set the
> frequency.  When using this capability, the user space VMM should configure
> CPUID[0x15] to advertise the frequency.
> 
> TDX virtualizes CPUID[0x15] for the core crystal clock to be 25MHz.  The
> x86 KVM hardcodes its freuqncy for APIC timer to be 1GHz.  This mismatch
> causes the vAPIC timer to fire earlier than the guest expects. [1] The KVM
> APIC timer emulation uses hrtimer, whose unit is nanosecond.
> 
> There are options to reconcile the mismatch.  1) Make apic bus clock frequency
> configurable (this patch).  2) TDX KVM code adjusts TMICT value.  This is hacky
> and it results in losing MSB bits from 32 bit width to 30 bit width.  3). Make
> the guest kernel use tsc deadline timer instead of acpi oneshot/periodic timer.
> This is guest kernel choice.  It's out of control of VMM.
> 
> [1] https://lore.kernel.org/lkml/20231006011255.4163884-1-vannapurve@google.com/
> 
> Isaku Yamahata (2):
>   KVM: x86: Make the hardcoded APIC bus frequency vm variable
>   KVM: X86: Add a capability to configure bus frequency for APIC timer
> 
>  arch/x86/include/asm/kvm_host.h |  2 ++
>  arch/x86/kvm/hyperv.c           |  2 +-
>  arch/x86/kvm/lapic.c            |  6 ++++--
>  arch/x86/kvm/lapic.h            |  4 ++--
>  arch/x86/kvm/x86.c              | 16 ++++++++++++++++
>  include/uapi/linux/kvm.h        |  1 +
>  6 files changed, 26 insertions(+), 5 deletions(-)
> 
> -- 
> 2.25.1
>
Jim Mattson Nov. 7, 2023, 8:03 p.m. UTC | #2
On Tue, Nov 7, 2023 at 11:29 AM Isaku Yamahata
<isaku.yamahata@linux.intel.com> wrote:
>
> I meant bus clock frequency, not bus lock. Sorry for typo.
>
> On Tue, Nov 07, 2023 at 11:22:32AM -0800,
> isaku.yamahata@intel.com wrote:
>
> > From: Isaku Yamahata <isaku.yamahata@intel.com>
> >
> > Add KVM_CAP_X86_BUS_FREQUENCY_CONTROL capability to configure the core
> > crystal clock (or processor's bus clock) for APIC timer emulation.  Allow
> > KVM_ENABLE_CAPABILITY(KVM_CAP_X86_BUS_FREUQNCY_CONTROL) to set the
> > frequency.  When using this capability, the user space VMM should configure
> > CPUID[0x15] to advertise the frequency.
> >
> > TDX virtualizes CPUID[0x15] for the core crystal clock to be 25MHz.  The
> > x86 KVM hardcodes its freuqncy for APIC timer to be 1GHz.  This mismatch
> > causes the vAPIC timer to fire earlier than the guest expects. [1] The KVM
> > APIC timer emulation uses hrtimer, whose unit is nanosecond.
> >
> > There are options to reconcile the mismatch.  1) Make apic bus clock frequency
> > configurable (this patch).  2) TDX KVM code adjusts TMICT value.  This is hacky
> > and it results in losing MSB bits from 32 bit width to 30 bit width.  3). Make
> > the guest kernel use tsc deadline timer instead of acpi oneshot/periodic timer.
> > This is guest kernel choice.  It's out of control of VMM.
> >
> > [1] https://lore.kernel.org/lkml/20231006011255.4163884-1-vannapurve@google.com/
> >
> > Isaku Yamahata (2):
> >   KVM: x86: Make the hardcoded APIC bus frequency vm variable
> >   KVM: X86: Add a capability to configure bus frequency for APIC timer

I think I know the answer, but do you have any tests for this new feature?
Isaku Yamahata Nov. 8, 2023, 11:54 p.m. UTC | #3
On Tue, Nov 07, 2023 at 12:03:35PM -0800,
Jim Mattson <jmattson@google.com> wrote:

> On Tue, Nov 7, 2023 at 11:29 AM Isaku Yamahata
> <isaku.yamahata@linux.intel.com> wrote:
> >
> > I meant bus clock frequency, not bus lock. Sorry for typo.
> >
> > On Tue, Nov 07, 2023 at 11:22:32AM -0800,
> > isaku.yamahata@intel.com wrote:
> >
> > > From: Isaku Yamahata <isaku.yamahata@intel.com>
> > >
> > > Add KVM_CAP_X86_BUS_FREQUENCY_CONTROL capability to configure the core
> > > crystal clock (or processor's bus clock) for APIC timer emulation.  Allow
> > > KVM_ENABLE_CAPABILITY(KVM_CAP_X86_BUS_FREUQNCY_CONTROL) to set the
> > > frequency.  When using this capability, the user space VMM should configure
> > > CPUID[0x15] to advertise the frequency.
> > >
> > > TDX virtualizes CPUID[0x15] for the core crystal clock to be 25MHz.  The
> > > x86 KVM hardcodes its freuqncy for APIC timer to be 1GHz.  This mismatch
> > > causes the vAPIC timer to fire earlier than the guest expects. [1] The KVM
> > > APIC timer emulation uses hrtimer, whose unit is nanosecond.
> > >
> > > There are options to reconcile the mismatch.  1) Make apic bus clock frequency
> > > configurable (this patch).  2) TDX KVM code adjusts TMICT value.  This is hacky
> > > and it results in losing MSB bits from 32 bit width to 30 bit width.  3). Make
> > > the guest kernel use tsc deadline timer instead of acpi oneshot/periodic timer.
> > > This is guest kernel choice.  It's out of control of VMM.
> > >
> > > [1] https://lore.kernel.org/lkml/20231006011255.4163884-1-vannapurve@google.com/
> > >
> > > Isaku Yamahata (2):
> > >   KVM: x86: Make the hardcoded APIC bus frequency vm variable
> > >   KVM: X86: Add a capability to configure bus frequency for APIC timer
> 
> I think I know the answer, but do you have any tests for this new feature?

If you mean kvm kselftest, no.
I have
- TDX patched qemu
- kvm-unit-tests: test_apic_timer_one_shot() @ kvm-unit-tests/x86/apic.c
  TDX version is found at https://github.com/intel/kvm-unit-tests-tdx
  We're planning to upstream the changes for TDX

How far do we want to go?
- Run kvm-unit-tests with TDX. What I have right now.
- kvm-unit-tests: extend qemu for default VM case and update
  test_apic_timer_one_host()
- kselftest
  Right now kvm kselftest doesn't have test cases even for in-kernel IRQCHIP
  creation.
  - Add test cases to create/destory in-kernel IRQCHIP
  - Add KVM_ENABLE_CAPABILITY(KVM_CAP_X86_BUS_FREQUENCY_CONTROL) test cases
  - Test if apic timer frequency is as spcified.
Sean Christopherson Nov. 9, 2023, 3:55 p.m. UTC | #4
On Wed, Nov 08, 2023, Isaku Yamahata wrote:
> On Tue, Nov 07, 2023 at 12:03:35PM -0800, Jim Mattson <jmattson@google.com> wrote:
> > I think I know the answer, but do you have any tests for this new feature?
> 
> If you mean kvm kselftest, no.
> I have
> - TDX patched qemu
> - kvm-unit-tests: test_apic_timer_one_shot() @ kvm-unit-tests/x86/apic.c
>   TDX version is found at https://github.com/intel/kvm-unit-tests-tdx
>   We're planning to upstream the changes for TDX
> 
> How far do we want to go?
> - Run kvm-unit-tests with TDX. What I have right now.
> - kvm-unit-tests: extend qemu for default VM case and update
>   test_apic_timer_one_host()

Hrm, I'm not sure that we can do a whole lot for test_apic_timer_one_shot().  Or
rather, I'm not sure it's worth the effort to try and add coverage beyond what's
already there.

As for TDX, *if* we extend KUT, please don't make it depend on TDX.  Very few people
have access to TDX platforms and anything CoCo is pretty much guaranteed to be harder
to debug.

> - kselftest
>   Right now kvm kselftest doesn't have test cases even for in-kernel IRQCHIP
>   creation.

Selftests always create an in-kernel APIC.  And I think selftests are perfectly
suited to complement the coverage provided by KUT.  Specifically, the failure
scenario for this is that KVM emulates at 1Ghz whereas TDX advertises 25Mhz, i.e.
the test case we want is to verify that the APIC timer doesn't expire early.

There's no need for any APIC infrastructure, e.g. a selftest doesn't even need to
handle an interrupt.  Get the TSC frequency from KVM, program up an arbitrary APIC
bus clock frequency, set TMICT such that it expires waaaay in the future, and then
verify that the APIC timer counts reasonably close to the programmed frequency.
E.g. if the test sets the bus clock to 25Mhz, the "drift" due to KVM counting at
1Ghz should be super obvious.

LOL, side topic, KUT has this:

	/*
	 * For LVT Timer clock, SDM vol 3 10.5.4 says it should be
	 * derived from processor's bus clock (IIUC which is the same  <======
	 * as TSC), however QEMU seems to be using nanosecond. In all
	 * cases, the following should satisfy on all modern
	 * processors.
	 */
	report((lvtt_counter == 1) && (tsc2 - tsc1 >= interval),
	       "APIC LVT timer one shot");
Isaku Yamahata Nov. 10, 2023, 12:29 a.m. UTC | #5
On Thu, Nov 09, 2023 at 07:55:45AM -0800,
Sean Christopherson <seanjc@google.com> wrote:

> On Wed, Nov 08, 2023, Isaku Yamahata wrote:
> > On Tue, Nov 07, 2023 at 12:03:35PM -0800, Jim Mattson <jmattson@google.com> wrote:
> > > I think I know the answer, but do you have any tests for this new feature?
> > 
> > If you mean kvm kselftest, no.
> > I have
> > - TDX patched qemu
> > - kvm-unit-tests: test_apic_timer_one_shot() @ kvm-unit-tests/x86/apic.c
> >   TDX version is found at https://github.com/intel/kvm-unit-tests-tdx
> >   We're planning to upstream the changes for TDX
> > 
> > How far do we want to go?
> > - Run kvm-unit-tests with TDX. What I have right now.
> > - kvm-unit-tests: extend qemu for default VM case and update
> >   test_apic_timer_one_host()
> 
> Hrm, I'm not sure that we can do a whole lot for test_apic_timer_one_shot().  Or
> rather, I'm not sure it's worth the effort to try and add coverage beyond what's
> already there.
> 
> As for TDX, *if* we extend KUT, please don't make it depend on TDX.  Very few people
> have access to TDX platforms and anything CoCo is pretty much guaranteed to be harder
> to debug.

It made the test cases work with TDX + UEFI bios by adjusting command line to
invoke qemu.  And skip unsuitable tests.
Maybe we can generalize the way to twist qemu command line.


> > - kselftest
> >   Right now kvm kselftest doesn't have test cases even for in-kernel IRQCHIP
> >   creation.
> 
> Selftests always create an in-kernel APIC.  And I think selftests are perfectly
> suited to complement the coverage provided by KUT.  Specifically, the failure
> scenario for this is that KVM emulates at 1Ghz whereas TDX advertises 25Mhz, i.e.
> the test case we want is to verify that the APIC timer doesn't expire early.
> 
> There's no need for any APIC infrastructure, e.g. a selftest doesn't even need to
> handle an interrupt.  Get the TSC frequency from KVM, program up an arbitrary APIC
> bus clock frequency, set TMICT such that it expires waaaay in the future, and then
> verify that the APIC timer counts reasonably close to the programmed frequency.
> E.g. if the test sets the bus clock to 25Mhz, the "drift" due to KVM counting at
> 1Ghz should be super obvious.

Oh, only check the register value without interrupt. Good idea.  Let me give it
a try.