Message ID | 20250320142022.766201-4-seanjc@google.com |
---|---|
State | New |
Series | KVM: x86: Add a module param for device posted IRQs |
On Thu, Mar 20, 2025 at 7:31 AM Sean Christopherson <seanjc@google.com> wrote:
>
> Add a module param to allow disabling device posted interrupts without
> having to sacrifice all of APICv/AVIC, and to also effectively enumerate
> to userspace whether or not KVM may be utilizing device posted IRQs.
> Disabling device posted interrupts is very desirable for testing, and can
> even be desirable for production environments, e.g. if the host kernel
> wants to interpose on device interrupts.

Are you referring to CONFIG_X86_POSTED_MSI, or something else that
doesn't exist yet?
On Thu, Mar 20, 2025, Jim Mattson wrote:
> Are you referring to CONFIG_X86_POSTED_MSI, or something else that
> doesn't exist yet?

Yeah, that, and/or out-of-tree hackery to do similar coalescing (or ratelimiting).
On Thu, Mar 20, 2025, Sean Christopherson wrote:
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index f76d655dc9a8..e7eb2198db26 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -227,6 +227,10 @@ EXPORT_SYMBOL_GPL(allow_smaller_maxphyaddr);
>  bool __read_mostly enable_apicv = true;
>  EXPORT_SYMBOL_GPL(enable_apicv);
>
> +bool __read_mostly enable_device_posted_irqs = true;
> +module_param(enable_device_posted_irqs, bool, 0444);
> +EXPORT_SYMBOL_GPL(enable_device_posted_irqs);
> +
>  const struct _kvm_stats_desc kvm_vm_stats_desc[] = {
>  	KVM_GENERIC_VM_STATS(),
>  	STATS_DESC_COUNTER(VM, mmu_shadow_zapped),
> @@ -9772,6 +9776,9 @@ int kvm_x86_vendor_init(struct kvm_x86_init_ops *ops)
>  	if (r != 0)
>  		goto out_mmu_exit;
>
> +	enable_device_posted_irqs &= enable_apicv &&
> +				     irq_remapping_cap(IRQ_POSTING_CAP);

Drat, this is flawed. Putting the module param in kvm.ko means that loading
kvm.ko with enable_device_posted_irqs=true, but a vendor module with APICv/AVIC
disabled, leaves enable_device_posted_irqs disabled for the lifetime of kvm.ko.
I.e. reloading the vendor module with APICv/AVIC enabled can't enable device
posted IRQs.

Option #1 is to do what we do for enable_mmio_caching, and snapshot userspace's
desire.
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index e7eb2198db26..c84ad9109108 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -228,6 +228,7 @@ bool __read_mostly enable_apicv = true;
 EXPORT_SYMBOL_GPL(enable_apicv);
 
 bool __read_mostly enable_device_posted_irqs = true;
+bool __ro_after_init allow_device_posted_irqs;
 module_param(enable_device_posted_irqs, bool, 0444);
 EXPORT_SYMBOL_GPL(enable_device_posted_irqs);
 
@@ -9776,8 +9777,8 @@ int kvm_x86_vendor_init(struct kvm_x86_init_ops *ops)
 	if (r != 0)
 		goto out_mmu_exit;
 
-	enable_device_posted_irqs &= enable_apicv &&
-				     irq_remapping_cap(IRQ_POSTING_CAP);
+	enable_device_posted_irqs = allow_device_posted_irqs && enable_apicv &&
+				    irq_remapping_cap(IRQ_POSTING_CAP);
 
 	kvm_ops_update(ops);
 
@@ -14033,6 +14034,8 @@ EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_rmp_fault);
 
 static int __init kvm_x86_init(void)
 {
+	allow_device_posted_irqs = enable_device_posted_irqs;
+
 	kvm_init_xstate_sizes();
 
 	kvm_mmu_x86_module_init();


Option #2 is to shove the module param into vendor code, but leave the variable
in kvm.ko, like we do for enable_apicv.

I'm leaning toward option #2, as it's more flexible, arguably more intuitive, and
doesn't prevent putting the logic in kvm_x86_vendor_init().
On Thu, Mar 20, 2025 at 10:59:19AM -0700, Sean Christopherson wrote:
> Option #2 is to shove the module param into vendor code, but leave the variable
> in kvm.ko, like we do for enable_apicv.
>
> I'm leaning toward option #2, as it's more flexible, arguably more intuitive, and
> doesn't prevent putting the logic in kvm_x86_vendor_init().

+1, option #1 seems a bit confusing to me.
On Thu, Mar 20, 2025 at 10:59:19AM -0700, Sean Christopherson wrote:
>On Thu, Mar 20, 2025, Sean Christopherson wrote:
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index f76d655dc9a8..e7eb2198db26 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -227,6 +227,10 @@ EXPORT_SYMBOL_GPL(allow_smaller_maxphyaddr);
>>  bool __read_mostly enable_apicv = true;
>>  EXPORT_SYMBOL_GPL(enable_apicv);
>>
>> +bool __read_mostly enable_device_posted_irqs = true;
>> +module_param(enable_device_posted_irqs, bool, 0444);
>> +EXPORT_SYMBOL_GPL(enable_device_posted_irqs);

Can this variable be declared as static?

>> +
>>  const struct _kvm_stats_desc kvm_vm_stats_desc[] = {
>>  	KVM_GENERIC_VM_STATS(),
>>  	STATS_DESC_COUNTER(VM, mmu_shadow_zapped),
>> @@ -9772,6 +9776,9 @@ int kvm_x86_vendor_init(struct kvm_x86_init_ops *ops)
>>  	if (r != 0)
>>  		goto out_mmu_exit;
>>
>> +	enable_device_posted_irqs &= enable_apicv &&
>> +				     irq_remapping_cap(IRQ_POSTING_CAP);
>
>Drat, this is flawed. Putting the module param in kvm.ko means that loading
>kvm.ko with enable_device_posted_irqs=true, but a vendor module with APICv/AVIC
>disabled, leaves enable_device_posted_irqs disabled for the lifetime of kvm.ko.
>I.e. reloading the vendor module with APICv/AVIC enabled can't enable device
>posted IRQs.
>
>Option #1 is to do what we do for enable_mmio_caching, and snapshot userspace's
>desire.
>
>diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>index e7eb2198db26..c84ad9109108 100644
>--- a/arch/x86/kvm/x86.c
>+++ b/arch/x86/kvm/x86.c
>@@ -228,6 +228,7 @@ bool __read_mostly enable_apicv = true;
> EXPORT_SYMBOL_GPL(enable_apicv);
>
> bool __read_mostly enable_device_posted_irqs = true;
>+bool __ro_after_init allow_device_posted_irqs;
> module_param(enable_device_posted_irqs, bool, 0444);
> EXPORT_SYMBOL_GPL(enable_device_posted_irqs);
>
>@@ -9776,8 +9777,8 @@ int kvm_x86_vendor_init(struct kvm_x86_init_ops *ops)
> 	if (r != 0)
> 		goto out_mmu_exit;
>
>-	enable_device_posted_irqs &= enable_apicv &&
>-				     irq_remapping_cap(IRQ_POSTING_CAP);
>+	enable_device_posted_irqs = allow_device_posted_irqs && enable_apicv &&
>+				    irq_remapping_cap(IRQ_POSTING_CAP);

Can we simply drop this ...

>
> 	kvm_ops_update(ops);
>
>@@ -14033,6 +14034,8 @@ EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_rmp_fault);
>
> static int __init kvm_x86_init(void)
> {
>+	allow_device_posted_irqs = enable_device_posted_irqs;
>+
> 	kvm_init_xstate_sizes();
>
> 	kvm_mmu_x86_module_init();
>
>
>Option #2 is to shove the module param into vendor code, but leave the variable
>in kvm.ko, like we do for enable_apicv.
>
>I'm leaning toward option #2, as it's more flexible, arguably more intuitive, and
>doesn't prevent putting the logic in kvm_x86_vendor_init().
>

and do

bool kvm_arch_has_irq_bypass(void)
{
	return enable_device_posted_irqs && enable_apicv &&
	       irq_remapping_cap(IRQ_POSTING_CAP);
}
On Fri, Mar 21, 2025, Chao Gao wrote:
> On Thu, Mar 20, 2025 at 10:59:19AM -0700, Sean Christopherson wrote:
> >@@ -9776,8 +9777,8 @@ int kvm_x86_vendor_init(struct kvm_x86_init_ops *ops)
> > 	if (r != 0)
> > 		goto out_mmu_exit;
> >
> >-	enable_device_posted_irqs &= enable_apicv &&
> >-				     irq_remapping_cap(IRQ_POSTING_CAP);
> >+	enable_device_posted_irqs = allow_device_posted_irqs && enable_apicv &&
> >+				    irq_remapping_cap(IRQ_POSTING_CAP);
>
> Can we simply drop this ...
>
> >
> > 	kvm_ops_update(ops);
> >
> >@@ -14033,6 +14034,8 @@ EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_rmp_fault);
> >
> > static int __init kvm_x86_init(void)
> > {
> >+	allow_device_posted_irqs = enable_device_posted_irqs;
> >+
> > 	kvm_init_xstate_sizes();
> >
> > 	kvm_mmu_x86_module_init();
> >
> >
> >Option #2 is to shove the module param into vendor code, but leave the variable
> >in kvm.ko, like we do for enable_apicv.
> >
> >I'm leaning toward option #2, as it's more flexible, arguably more intuitive, and
> >doesn't prevent putting the logic in kvm_x86_vendor_init().
> >
>
> and do
>
> bool kvm_arch_has_irq_bypass(void)
> {
> 	return enable_device_posted_irqs && enable_apicv &&
> 	       irq_remapping_cap(IRQ_POSTING_CAP);
> }

That would avoid the vendor module issues, but it would result in
allow_device_posted_irqs not reflecting the state of KVM. We could partially
address that by having the variable incorporate irq_remapping_cap(IRQ_POSTING_CAP)
but not enable_apicv, but that's still a bit funky.

Given that enable_apicv already has the "variable in kvm.ko, module param in
kvm-{amd,intel}.ko" behavior, and that I am planning on giving enable_ipiv the
same treatment (long story), my strong vote is to go with option #2 as it's the
most flexible, most accurate, and consistent with existing knobs.
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index d881e7d276b1..bf11c5ee50cb 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1922,6 +1922,7 @@ struct kvm_arch_async_pf {
 extern u32 __read_mostly kvm_nr_uret_msrs;
 extern bool __read_mostly allow_smaller_maxphyaddr;
 extern bool __read_mostly enable_apicv;
+extern bool __read_mostly enable_device_posted_irqs;
 extern struct kvm_x86_ops kvm_x86_ops;
 
 #define kvm_x86_call(func) static_call(kvm_x86_##func)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index f76d655dc9a8..e7eb2198db26 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -227,6 +227,10 @@ EXPORT_SYMBOL_GPL(allow_smaller_maxphyaddr);
 bool __read_mostly enable_apicv = true;
 EXPORT_SYMBOL_GPL(enable_apicv);
 
+bool __read_mostly enable_device_posted_irqs = true;
+module_param(enable_device_posted_irqs, bool, 0444);
+EXPORT_SYMBOL_GPL(enable_device_posted_irqs);
+
 const struct _kvm_stats_desc kvm_vm_stats_desc[] = {
 	KVM_GENERIC_VM_STATS(),
 	STATS_DESC_COUNTER(VM, mmu_shadow_zapped),
@@ -9772,6 +9776,9 @@ int kvm_x86_vendor_init(struct kvm_x86_init_ops *ops)
 	if (r != 0)
 		goto out_mmu_exit;
 
+	enable_device_posted_irqs &= enable_apicv &&
+				     irq_remapping_cap(IRQ_POSTING_CAP);
+
 	kvm_ops_update(ops);
 
 	for_each_online_cpu(cpu) {
@@ -13552,7 +13559,7 @@ EXPORT_SYMBOL_GPL(kvm_arch_has_noncoherent_dma);
 
 bool kvm_arch_has_irq_bypass(void)
 {
-	return enable_apicv && irq_remapping_cap(IRQ_POSTING_CAP);
+	return enable_device_posted_irqs;
 }
 EXPORT_SYMBOL_GPL(kvm_arch_has_irq_bypass);
 
Add a module param to allow disabling device posted interrupts without
having to sacrifice all of APICv/AVIC, and to also effectively enumerate
to userspace whether or not KVM may be utilizing device posted IRQs.
Disabling device posted interrupts is very desirable for testing, and can
even be desirable for production environments, e.g. if the host kernel
wants to interpose on device interrupts.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/include/asm/kvm_host.h | 1 +
 arch/x86/kvm/x86.c              | 9 ++++++++-
 2 files changed, 9 insertions(+), 1 deletion(-)