[2/2] KVM: X86: Add a capability to configure bus frequency for APIC timer

Message ID 70c2a2277f57b804c715c5b4b4aa0b3561ed6a4f.1699383993.git.isaku.yamahata@intel.com
State New, archived
Series KVM: X86: Make bus clock frequency for vapic timer configurable

Commit Message

Isaku Yamahata Nov. 7, 2023, 7:22 p.m. UTC
From: Isaku Yamahata <isaku.yamahata@intel.com>

Add KVM_CAP_X86_BUS_FREQUENCY_CONTROL capability to configure the core
crystal clock (or processor's bus clock) for APIC timer emulation.  Allow
KVM_ENABLE_CAPABILITY(KVM_CAP_X86_BUS_FREUQNCY_CONTROL) to set the
frequency.  When using this capability, the user space VMM should configure
CPUID[0x15] to advertise the frequency.

TDX virtualizes CPUID[0x15] for the core crystal clock to be 25MHz.  The
x86 KVM hardcodes its freuqncy for APIC timer to be 1GHz.  This mismatch
causes the vAPIC timer to fire earlier than the guest expects. [1] The KVM
APIC timer emulation uses hrtimer, whose unit is nanosecond.  Make the
parameter configurable for conversion from the TMICT value to nanosecond.

This patch doesn't affect the TSC deadline timer emulation.  The TSC
deadline emulation path records its expiring TSC value and calculates the
expiring time in nanoseconds.  The APIC timer emulation path calculates the
TSC value from the TMICT register value and uses the TSC deadline timer
path.  This patch touches the APIC timer-specific code but doesn't touch
common logic.

[1] https://lore.kernel.org/lkml/20231006011255.4163884-1-vannapurve@google.com/
Reported-by: Vishal Annapurve <vannapurve@google.com>
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
---
 arch/x86/kvm/x86.c       | 14 ++++++++++++++
 include/uapi/linux/kvm.h |  1 +
 2 files changed, 15 insertions(+)
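
To make the mismatch described in the commit message concrete, here is a
minimal user-space sketch of the TMICT-to-nanosecond conversion.  The helper
names bus_cycle_ns() and tmict_to_ns() and the divide parameter are
illustrative assumptions, not code from this patch; the sketch only assumes
that the expiry time is the initial count times the divide value times the
length of one bus cycle.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Length of one APIC bus cycle in ns, rounded down (hypothetical helper). */
static inline uint64_t bus_cycle_ns(uint64_t bus_hz)
{
	/* A frequency of 0 or above 1GHz would not yield a usable cycle. */
	assert(bus_hz != 0 && bus_hz <= NSEC_PER_SEC);
	return NSEC_PER_SEC / bus_hz;
}

/* Time until expiry for an initial count, a divide value and a cycle length. */
static inline uint64_t tmict_to_ns(uint32_t tmict, uint32_t divide,
				   uint64_t cycle_ns)
{
	return (uint64_t)tmict * divide * cycle_ns;
}
```

A guest that trusts the 25MHz crystal clock and programs TMICT = 25000 with a
divide value of 1 expects a 1 ms timeout.  With a 40 ns cycle the sketch gives
1000000 ns; with the hardcoded 1 ns cycle it gives 25000 ns, i.e. the timer
fires 40 times too early.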

Comments

Jim Mattson Nov. 7, 2023, 7:59 p.m. UTC | #1
On Tue, Nov 7, 2023 at 11:24 AM <isaku.yamahata@intel.com> wrote:
>
> From: Isaku Yamahata <isaku.yamahata@intel.com>
>
> Add KVM_CAP_X86_BUS_FREQUENCY_CONTROL capability to configure the core
> crystal clock (or processor's bus clock) for APIC timer emulation.  Allow
> KVM_ENABLE_CAPABILITY(KVM_CAP_X86_BUS_FREUQNCY_CONTROL) to set the
Nit: FREQUENCY
> frequency.  When using this capability, the user space VMM should configure
> CPUID[0x15] to advertise the frequency.

Is it necessary to advertise the frequency in CPUID.15H? No hardware
that I know of has a 1 GHz crystal clock, but the current
implementation works fine without CPUID.15H.

> TDX virtualizes CPUID[0x15] for the core crystal clock to be 25MHz.  The
> x86 KVM hardcodes its freuqncy for APIC timer to be 1GHz.  This mismatch
Nit: frequency
> causes the vAPIC timer to fire earlier than the guest expects. [1] The KVM
> APIC timer emulation uses hrtimer, whose unit is nanosecond.  Make the
> parameter configurable for conversion from the TMICT value to nanosecond.
>
> This patch doesn't affect the TSC deadline timer emulation.  The TSC
> deadline emulation path records its expiring TSC value and calculates the
> expiring time in nanoseconds.  The APIC timer emulation path calculates the
> TSC value from the TMICT register value and uses the TSC deadline timer
> path.  This patch touches the APIC timer-specific code but doesn't touch
> common logic.
>
> [1] https://lore.kernel.org/lkml/20231006011255.4163884-1-vannapurve@google.com/
> Reported-by: Vishal Annapurve <vannapurve@google.com>
> Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
> ---
>  arch/x86/kvm/x86.c       | 14 ++++++++++++++
>  include/uapi/linux/kvm.h |  1 +
>  2 files changed, 15 insertions(+)
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index a9f4991b3e2e..20849d2cd0e8 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -4625,6 +4625,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>         case KVM_CAP_ENABLE_CAP:
>         case KVM_CAP_VM_DISABLE_NX_HUGE_PAGES:
>         case KVM_CAP_IRQFD_RESAMPLE:
> +       case KVM_CAP_X86_BUS_FREQUENCY_CONTROL:

This capability should be documented in Documentation/virtual/kvm/api.txt.

>                 r = 1;
>                 break;
>         case KVM_CAP_EXIT_HYPERCALL:
> @@ -6616,6 +6617,19 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
>                 }
>                 mutex_unlock(&kvm->lock);
>                 break;
> +       case KVM_CAP_X86_BUS_FREQUENCY_CONTROL: {
> +               u64 bus_frequency = cap->args[0];
> +               u64 bus_cycle_ns;
> +

To avoid potentially bizarre behavior, perhaps we should disallow
changing the APIC bus frequency once a vCPU has been created?

> +               if (!bus_frequency)
> +                       return -EINVAL;
> +               bus_cycle_ns = 1000000000UL / bus_frequency;
> +               if (!bus_cycle_ns)
> +                       return -EINVAL;
> +               kvm->arch.apic_bus_cycle_ns = bus_cycle_ns;
> +               kvm->arch.apic_bus_frequency = bus_frequency;
> +               return 0;

Should this be disallowed if !lapic_in_kernel?

> +       }
>         default:
>                 r = -EINVAL;
>                 break;
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index 211b86de35ac..d74a057df173 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -1201,6 +1201,7 @@ struct kvm_ppc_resize_hpt {
>  #define KVM_CAP_ARM_EAGER_SPLIT_CHUNK_SIZE 228
>  #define KVM_CAP_ARM_SUPPORTED_BLOCK_SIZES 229
>  #define KVM_CAP_ARM_SUPPORTED_REG_MASK_RANGES 230
> +#define KVM_CAP_X86_BUS_FREQUENCY_CONTROL 231
>
>  #ifdef KVM_CAP_IRQ_ROUTING
>
> --
> 2.25.1
>
>
Isaku Yamahata Nov. 8, 2023, 11:41 p.m. UTC | #2
On Tue, Nov 07, 2023 at 11:59:35AM -0800,
Jim Mattson <jmattson@google.com> wrote:

> On Tue, Nov 7, 2023 at 11:24 AM <isaku.yamahata@intel.com> wrote:
> >
> > From: Isaku Yamahata <isaku.yamahata@intel.com>
> >
> > Add KVM_CAP_X86_BUS_FREQUENCY_CONTROL capability to configure the core
> > crystal clock (or processor's bus clock) for APIC timer emulation.  Allow
> > KVM_ENABLE_CAPABILITY(KVM_CAP_X86_BUS_FREUQNCY_CONTROL) to set the
> Nit: FREQUENCY
> > frequency.  When using this capability, the user space VMM should configure
> > CPUID[0x15] to advertise the frequency.
> 
> Is it necessary to advertise the frequency in CPUID.15H? No hardware
> that I know of has a 1 GHz crystal clock, but the current
> implementation works fine without CPUID.15H.

It's not necessary.  When the kernel can't determine the frequency from
CPUID (or the CPU model), it derives it from another known clock, e.g. the
TSC or CMOS.  I'll drop the sentence.
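
For reference, a rough sketch of how the CPUID.15H fields relate to each
other: EAX and EBX hold the denominator and numerator of the TSC to core
crystal clock ratio, and ECX holds the nominal crystal frequency in Hz (0 if
not enumerated).  The struct and helper names below are hypothetical, and the
register values are passed in as plain integers rather than read with the
CPUID instruction.

```c
#include <assert.h>
#include <stdint.h>

/* Raw register values returned by CPUID leaf 0x15. */
struct cpuid15 {
	uint32_t eax; /* denominator of the TSC / crystal ratio */
	uint32_t ebx; /* numerator of the TSC / crystal ratio */
	uint32_t ecx; /* nominal crystal frequency in Hz, 0 if not enumerated */
};

/* TSC frequency in Hz derived from the crystal, or 0 if not enumerated. */
static inline uint64_t tsc_hz(const struct cpuid15 *l)
{
	if (!l->eax || !l->ebx || !l->ecx)
		return 0;
	return (uint64_t)l->ecx * l->ebx / l->eax;
}

/* Crystal frequency inferred from a separately known TSC frequency. */
static inline uint64_t crystal_hz_from_tsc(const struct cpuid15 *l,
					   uint64_t known_tsc_hz)
{
	assert(l->eax != 0 && l->ebx != 0); /* the ratio must be enumerated */
	return known_tsc_hz * l->eax / l->ebx;
}
```

The second helper corresponds to the fallback case: when ECX is 0, the crystal
frequency can still be inferred from another known frequency such as the TSC.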


> > TDX virtualizes CPUID[0x15] for the core crystal clock to be 25MHz.  The
> > x86 KVM hardcodes its freuqncy for APIC timer to be 1GHz.  This mismatch
> Nit: frequency
> > causes the vAPIC timer to fire earlier than the guest expects. [1] The KVM
> > APIC timer emulation uses hrtimer, whose unit is nanosecond.  Make the
> > parameter configurable for conversion from the TMICT value to nanosecond.
> >
> > This patch doesn't affect the TSC deadline timer emulation.  The TSC
> > deadline emulation path records its expiring TSC value and calculates the
> > expiring time in nanoseconds.  The APIC timer emulation path calculates the
> > TSC value from the TMICT register value and uses the TSC deadline timer
> > path.  This patch touches the APIC timer-specific code but doesn't touch
> > common logic.
> >
> > [1] https://lore.kernel.org/lkml/20231006011255.4163884-1-vannapurve@google.com/
> > Reported-by: Vishal Annapurve <vannapurve@google.com>
> > Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
> > ---
> >  arch/x86/kvm/x86.c       | 14 ++++++++++++++
> >  include/uapi/linux/kvm.h |  1 +
> >  2 files changed, 15 insertions(+)
> >
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index a9f4991b3e2e..20849d2cd0e8 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -4625,6 +4625,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
> >         case KVM_CAP_ENABLE_CAP:
> >         case KVM_CAP_VM_DISABLE_NX_HUGE_PAGES:
> >         case KVM_CAP_IRQFD_RESAMPLE:
> > +       case KVM_CAP_X86_BUS_FREQUENCY_CONTROL:
> 
> This capability should be documented in Documentation/virtual/kvm/api.txt.
> 
> >                 r = 1;
> >                 break;
> >         case KVM_CAP_EXIT_HYPERCALL:
> > @@ -6616,6 +6617,19 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
> >                 }
> >                 mutex_unlock(&kvm->lock);
> >                 break;
> > +       case KVM_CAP_X86_BUS_FREQUENCY_CONTROL: {
> > +               u64 bus_frequency = cap->args[0];
> > +               u64 bus_cycle_ns;
> > +
> 
> To avoid potentially bizarre behavior, perhaps we should disallow
> changing the APIC bus frequency once a vCPU has been created?
> 
> > +               if (!bus_frequency)
> > +                       return -EINVAL;
> > +               bus_cycle_ns = 1000000000UL / bus_frequency;
> > +               if (!bus_cycle_ns)
> > +                       return -EINVAL;
> > +               kvm->arch.apic_bus_cycle_ns = bus_cycle_ns;
> > +               kvm->arch.apic_bus_frequency = bus_frequency;
> > +               return 0;
> 
> Should this be disallowed if !lapic_in_kernel?

That makes sense.  How about this?
It's difficult to check whether a vCPU has ever been created, because vCPUs
may be destroyed.  Instead, check whether the VM currently has any vCPUs.


diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 7025b3751027..cc976df2651e 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -7858,6 +7858,20 @@ This capability is aimed to mitigate the threat that malicious VMs can
 cause CPU stuck (due to event windows don't open up) and make the CPU
 unavailable to host or other VMs.
 
+7.34 KVM_CAP_X86_BUS_FREQUENCY_CONTROL
+--------------------------------------
+
+:Architectures: x86
+:Target: VM
+:Parameters: args[0] is the APIC bus clock frequency in Hz
+:Returns: 0 on success, -EINVAL if args[0] contains an invalid value for the
+          frequency, -ENXIO if the virtual local APIC isn't enabled by
+          KVM_CREATE_IRQCHIP, or -EBUSY if any vcpu has been created.
+
+This capability sets the APIC bus clock frequency (or core crystal clock
+frequency) for kvm to emulate APIC in the kernel.  The default value is
+1000000000 (1GHz).
+
 8. Other capabilities.
 ======================
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 20849d2cd0e8..388a9989ef7c 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -6626,9 +6626,25 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
 		bus_cycle_ns = 1000000000UL / bus_frequency;
 		if (!bus_cycle_ns)
 			return -EINVAL;
-		kvm->arch.apic_bus_cycle_ns = bus_cycle_ns;
-		kvm->arch.apic_bus_frequency = bus_frequency;
-		return 0;
+
+		r = 0;
+		mutex_lock(&kvm->lock);
+		/*
+		 * Don't allow changing the frequency once a vcpu has been
+		 * created, to avoid potentially bizarre behavior.
+		 */
+		if (kvm->created_vcpus)
+			r = -EBUSY;
+		/* This is for in-kernel vAPIC emulation. */
+		else if (!irqchip_in_kernel(kvm))
+			r = -ENXIO;
+
+		if (!r) {
+			kvm->arch.apic_bus_cycle_ns = bus_cycle_ns;
+			kvm->arch.apic_bus_frequency = bus_frequency;
+		}
+		mutex_unlock(&kvm->lock);
+		return r;
 	}
 	default:
 		r = -EINVAL;
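
The order of checks in the diff above can be modeled in user space as
follows.  This is only a sketch: struct vm_model and
set_apic_bus_frequency() are hypothetical stand-ins for struct kvm and the
KVM_CAP_X86_BUS_FREQUENCY_CONTROL case, locking is omitted, and the missing
irqchip case returns -ENXIO as documented in the api.rst hunk.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the few struct kvm fields the handler touches. */
struct vm_model {
	bool irqchip_in_kernel;
	unsigned int created_vcpus;
	uint64_t apic_bus_cycle_ns;
	uint64_t apic_bus_frequency;
};

/* Returns 0 on success or a negative errno, leaving the VM untouched on error. */
static inline int set_apic_bus_frequency(struct vm_model *vm, uint64_t hz)
{
	uint64_t cycle_ns;

	assert(vm);
	if (!hz)
		return -EINVAL;
	cycle_ns = 1000000000ULL / hz;
	if (!cycle_ns)		/* frequency above 1GHz */
		return -EINVAL;
	if (vm->created_vcpus)	/* too late, vCPUs already exist */
		return -EBUSY;
	if (!vm->irqchip_in_kernel)
		return -ENXIO;

	vm->apic_bus_cycle_ns = cycle_ns;
	vm->apic_bus_frequency = hz;
	return 0;
}
```

Under this model a VMM would have to enable the capability after
KVM_CREATE_IRQCHIP and before the first KVM_CREATE_VCPU.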
Yujie Liu Nov. 10, 2023, 5:37 a.m. UTC | #3
Hi,

kernel test robot noticed the following build errors:

[auto build test ERROR on kvm/queue]
[also build test ERROR on linus/master next-20231109]
[cannot apply to mst-vhost/linux-next kvm/linux-next v6.6]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/isaku-yamahata-intel-com/KVM-x86-Make-the-hardcoded-APIC-bus-frequency-vm-variable/20231108-032736
base:   https://git.kernel.org/pub/scm/virt/kvm/kvm.git queue
patch link:    https://lore.kernel.org/r/70c2a2277f57b804c715c5b4b4aa0b3561ed6a4f.1699383993.git.isaku.yamahata%40intel.com
patch subject: [PATCH 2/2] KVM: X86: Add a capability to configure bus frequency for APIC timer
config: i386-buildonly-randconfig-002-20231109 (https://download.01.org/0day-ci/archive/20231110/202311100209.zIaZqZhg-lkp@intel.com/config)
compiler: gcc-12 (Debian 12.2.0-14) 12.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20231110/202311100209.zIaZqZhg-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <yujie.liu@intel.com>
| Closes: https://lore.kernel.org/r/202311100209.zIaZqZhg-lkp@intel.com/

All errors (this is a 32-bit build, new ones prefixed by >>):

   ld: arch/x86/kvm/x86.o: in function `kvm_vm_ioctl_enable_cap':
>> x86.c:(.text+0x1265b): undefined reference to `__udivdi3'
Sean Christopherson Nov. 10, 2023, 2:42 p.m. UTC | #4
On Fri, Nov 10, 2023, kernel test robot wrote:
> Hi,
> 
> kernel test robot noticed the following build errors:
> 
> [auto build test ERROR on kvm/queue]
> [also build test ERROR on linus/master next-20231109]
> [cannot apply to mst-vhost/linux-next kvm/linux-next v6.6]
> [If your patch is applied to the wrong git tree, kindly drop us a note.
> And when submitting patch, we suggest to use '--base' as documented in
> https://git-scm.com/docs/git-format-patch#_base_tree_information]
> 
> url:    https://github.com/intel-lab-lkp/linux/commits/isaku-yamahata-intel-com/KVM-x86-Make-the-hardcoded-APIC-bus-frequency-vm-variable/20231108-032736
> base:   https://git.kernel.org/pub/scm/virt/kvm/kvm.git queue
> patch link:    https://lore.kernel.org/r/70c2a2277f57b804c715c5b4b4aa0b3561ed6a4f.1699383993.git.isaku.yamahata%40intel.com
> patch subject: [PATCH 2/2] KVM: X86: Add a capability to configure bus frequency for APIC timer
> config: i386-buildonly-randconfig-002-20231109 (https://download.01.org/0day-ci/archive/20231110/202311100209.zIaZqZhg-lkp@intel.com/config)
> compiler: gcc-12 (Debian 12.2.0-14) 12.2.0
> reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20231110/202311100209.zIaZqZhg-lkp@intel.com/reproduce)
> 
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <yujie.liu@intel.com>
> | Closes: https://lore.kernel.org/r/202311100209.zIaZqZhg-lkp@intel.com/
> 
> All errors (this is a 32-bit build, new ones prefixed by >>):
> 
>    ld: arch/x86/kvm/x86.o: in function `kvm_vm_ioctl_enable_cap':
> >> x86.c:(.text+0x1265b): undefined reference to `__udivdi3'

Heh, this inscrutable error is due to 64-bit division on 32-bit kernels.

	u64 bus_frequency = cap->args[0];
	u64 bus_cycle_ns;

	if (!bus_frequency)
		return -EINVAL;

	bus_cycle_ns = 1000000000UL / bus_frequency;  <========

I don't see any reason to allow 64-bit values, e.g. Intel's CPUID 0x15 only
supports a 32-bit frequency in Hz.  I.e. just truncate it to a u32.
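
A minimal user-space sketch of that suggestion, assuming the argument is
simply truncated to 32 bits before dividing.  The helper name
bus_hz_to_cycle_ns() is hypothetical.  Because both operands are 32-bit, the
compiler does not need to emit a call to __udivdi3 on a 32-bit build.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000U

/* Returns 0 and stores the cycle length, or -EINVAL for unusable input. */
static inline int bus_hz_to_cycle_ns(uint64_t arg, uint32_t *cycle_ns)
{
	uint32_t hz = (uint32_t)arg;	/* upper 32 bits are dropped */

	assert(cycle_ns);
	if (!hz || hz > NSEC_PER_SEC)	/* cycle would be undefined or 0 ns */
		return -EINVAL;
	*cycle_ns = NSEC_PER_SEC / hz;	/* 32-bit by 32-bit division */
	return 0;
}
```

Plain truncation silently accepts values whose low 32 bits happen to be
valid; rejecting anything above U32_MAX would be a stricter variant.  If
64-bit values had to be supported, the kernel's div64_u64() helper would be
the other usual way to avoid the link error.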

Patch

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index a9f4991b3e2e..20849d2cd0e8 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4625,6 +4625,7 @@  int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 	case KVM_CAP_ENABLE_CAP:
 	case KVM_CAP_VM_DISABLE_NX_HUGE_PAGES:
 	case KVM_CAP_IRQFD_RESAMPLE:
+	case KVM_CAP_X86_BUS_FREQUENCY_CONTROL:
 		r = 1;
 		break;
 	case KVM_CAP_EXIT_HYPERCALL:
@@ -6616,6 +6617,19 @@  int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
 		}
 		mutex_unlock(&kvm->lock);
 		break;
+	case KVM_CAP_X86_BUS_FREQUENCY_CONTROL: {
+		u64 bus_frequency = cap->args[0];
+		u64 bus_cycle_ns;
+
+		if (!bus_frequency)
+			return -EINVAL;
+		bus_cycle_ns = 1000000000UL / bus_frequency;
+		if (!bus_cycle_ns)
+			return -EINVAL;
+		kvm->arch.apic_bus_cycle_ns = bus_cycle_ns;
+		kvm->arch.apic_bus_frequency = bus_frequency;
+		return 0;
+	}
 	default:
 		r = -EINVAL;
 		break;
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 211b86de35ac..d74a057df173 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1201,6 +1201,7 @@  struct kvm_ppc_resize_hpt {
 #define KVM_CAP_ARM_EAGER_SPLIT_CHUNK_SIZE 228
 #define KVM_CAP_ARM_SUPPORTED_BLOCK_SIZES 229
 #define KVM_CAP_ARM_SUPPORTED_REG_MASK_RANGES 230
+#define KVM_CAP_X86_BUS_FREQUENCY_CONTROL 231
 
 #ifdef KVM_CAP_IRQ_ROUTING