[GIT,PULL] KVM/x86 changes for Linux 6.12

Message ID 20240928153302.92406-1-pbonzini@redhat.com (mailing list archive)
State New, archived

Pull-request

https://git.kernel.org/pub/scm/virt/kvm/kvm.git tags/for-linus

Message

Paolo Bonzini Sept. 28, 2024, 3:33 p.m. UTC
Linus,

The following changes since commit da3ea35007d0af457a0afc87e84fddaebc4e0b63:

  Linux 6.11-rc7 (2024-09-08 14:50:28 -0700)

are available in the Git repository at:

  https://git.kernel.org/pub/scm/virt/kvm/kvm.git tags/for-linus

for you to fetch changes up to efbc6bd090f48ccf64f7a8dd5daea775821d57ec:

  Documentation: KVM: fix warning in "make htmldocs" (2024-09-27 11:45:50 -0400)

Apologies for the late pull request; all the traveling made things a
bit messy.  Also, we have a known regression here on ancient processors
and will fix it next week.

Paolo
----------------------------------------------------------------
x86:

* KVM currently invalidates the entirety of the page tables, not just
  those for the memslot being touched, when a memslot is moved or deleted.
  Invalidating everything does not have particularly noticeable overhead,
  but Intel's TDX will require the guest to re-accept private pages if
  they are dropped from the secure EPT, which is a non-starter.  In fact,
  the only reason KVM does not already zap just the affected memslot is a
  bug which was never fully investigated and caused VM instability with
  assigned GeForce GPUs, so allow userspace to opt into the new behavior.

* Advertise AVX10.1 to userspace (effectively prep work for the "real" AVX10
  functionality that is on the horizon).

* Rework common MSR handling code to suppress errors on userspace accesses to
  unsupported-but-advertised MSRs.  This will allow removing (almost?) all of
  KVM's exemptions for userspace access to MSRs that shouldn't exist based on
  the vCPU model (the actual cleanup is non-trivial future work).

* Rework KVM's handling of x2APIC ICR, again, because AMD (x2AVIC) splits the
  64-bit value into the legacy ICR and ICR2 storage, whereas Intel (APICv)
  stores the entire 64-bit value at the ICR offset.

* Fix a bug where KVM would fail to exit to userspace if such an exit was
  triggered by a fastpath exit handler.

* Add fastpath handling of HLT VM-Exit to expedite re-entering the guest when
  there's already a pending wake event at the time of the exit.

* Fix a WARN caused by RSM entering a nested guest from SMM with invalid guest
  state, by forcing the vCPU out of guest mode prior to signalling SHUTDOWN
  (the SHUTDOWN hits the VM altogether, not the nested guest).

* Overhaul the "unprotect and retry" logic to more precisely identify cases
  where retrying is actually helpful, and to harden all retry paths against
  putting the guest into an infinite retry loop.

* Add support for yielding, e.g. to honor NEED_RESCHED, when zapping rmaps in
  the shadow MMU.

* Refactor pieces of the shadow MMU related to aging SPTEs in preparation for
  adding multi-generation LRU support in KVM.

* Don't stuff the RSB after VM-Exit when RETPOLINE=y and AutoIBRS is enabled,
  i.e. when the CPU has already flushed the RSB.

* Trace the per-CPU host save area as a VMCB pointer to improve readability
  and clean up the retrieval of the SEV-ES host save area.

* Remove unnecessary accounting of temporary nested VMCB related allocations.

* Set FINAL/PAGE in the page fault error code for EPT violations if and only
  if the GVA is valid.  If the GVA is NOT valid, there is no guest-side page
  table walk and so stuffing paging related metadata is nonsensical.

* Fix a bug where KVM would incorrectly synthesize a nested VM-Exit instead of
  emulating posted interrupt delivery to L2.

* Add a lockdep assertion to detect unsafe accesses of vmcs12 structures.

* Harden eVMCS loading against an impossible NULL pointer deref (really truly
  should be impossible).

* Minor SGX fix and a cleanup.

* Misc cleanups
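
To illustrate the x2APIC ICR item in the list above: a minimal sketch,
assuming a byte-addressed register page and the standard xAPIC offsets
0x300 (ICR) and 0x310 (ICR2).  The helper names are hypothetical and are
not KVM's functions; the sketch only contrasts the two storage layouts
that the bullet describes.

```c
#include <stdint.h>
#include <string.h>

#define APIC_ICR   0x300  /* legacy ICR slot (low 32 bits) */
#define APIC_ICR2  0x310  /* legacy ICR2 slot (high 32 bits) */

/* Split layout (described above for AMD x2AVIC): two 32-bit halves. */
void icr_store_split(uint8_t *regs, uint64_t icr)
{
	uint32_t lo = (uint32_t)icr;
	uint32_t hi = (uint32_t)(icr >> 32);

	memcpy(regs + APIC_ICR, &lo, sizeof(lo));
	memcpy(regs + APIC_ICR2, &hi, sizeof(hi));
}

uint64_t icr_load_split(const uint8_t *regs)
{
	uint32_t lo, hi;

	memcpy(&lo, regs + APIC_ICR, sizeof(lo));
	memcpy(&hi, regs + APIC_ICR2, sizeof(hi));
	return ((uint64_t)hi << 32) | lo;
}

/* Whole layout (described above for Intel APICv): 64 bits at the ICR offset. */
void icr_store_whole(uint8_t *regs, uint64_t icr)
{
	memcpy(regs + APIC_ICR, &icr, sizeof(icr));
}

uint64_t icr_load_whole(const uint8_t *regs)
{
	uint64_t icr;

	memcpy(&icr, regs + APIC_ICR, sizeof(icr));
	return icr;
}
```

A reader of the register page has to use the loader that matches the
layout the hardware wrote; after a whole store, the ICR2 slot is simply
left untouched.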

Generic:

* Register KVM's cpuhp and syscore callbacks when enabling virtualization in
  hardware, as the sole purpose of said callbacks is to disable and re-enable
  virtualization as needed.

* Enable virtualization when KVM is loaded, not right before the first VM
  is created.  Together with the previous change, this greatly simplifies
  the logic of the callbacks, because their very existence implies
  virtualization is enabled.

* Fix a bug that results in KVM prematurely exiting to userspace for coalesced
  MMIO/PIO in many cases, clean up the related code, and add a testcase.

* Fix a bug in kvm_clear_guest() where it would trigger a buffer overflow _if_
  the gpa+len crosses a page boundary, which thankfully is guaranteed to not
  happen in the current code base.  Add WARNs in more helpers that read/write
  guest memory to detect similar bugs.
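
The kvm_clear_guest() item above concerns a range that crosses a page
boundary.  A rough userspace sketch of the per-page "segment" idea,
assuming a flat array stands in for guest memory; clear_range(),
write_page() and the 4096-byte PAGE_SIZE are illustrative names and
values, not the kernel's code:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE 4096u

/* Bytes to handle in the current page: the rest of the page or the
 * rest of the request, whichever is smaller. */
size_t next_segment(size_t len, size_t offset)
{
	size_t room = PAGE_SIZE - offset;

	return len < room ? len : room;
}

/* Hypothetical per-page write: refuses anything that leaves the page. */
int write_page(uint8_t *mem, size_t npages, size_t pfn,
	       const uint8_t *src, size_t offset, size_t len)
{
	if (pfn >= npages || offset > PAGE_SIZE || len > PAGE_SIZE - offset)
		return -1;
	memcpy(mem + pfn * PAGE_SIZE + offset, src, len);
	return 0;
}

/* Zero [addr, addr + len) one page at a time. */
int clear_range(uint8_t *mem, size_t npages, size_t addr, size_t len)
{
	static const uint8_t zero_page[PAGE_SIZE];
	size_t pfn = addr / PAGE_SIZE;
	size_t offset = addr % PAGE_SIZE;
	size_t seg;

	while ((seg = next_segment(len, offset)) != 0) {
		/* Pass the per-page segment, not the full remaining length. */
		if (write_page(mem, npages, pfn, zero_page, offset, seg))
			return -1;
		offset = 0;
		len -= seg;
		pfn++;
	}
	return 0;
}
```

Passing the full remaining length instead of the segment would ask the
per-page helper to write past the end of the page, which is the class of
bug the added WARNs are meant to catch.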

Selftests:

* Fix a goof that caused some Hyper-V tests to be skipped when run on bare
  metal, i.e. NOT in a VM.

* Add a regression test for KVM's handling of SHUTDOWN for an SEV-ES guest.

* Explicitly include one-off assets in .gitignore.  Past Sean was completely
  wrong about not being able to detect missing .gitignore entries.

* Verify userspace single-stepping works when KVM happens to handle a VM-Exit
  in its fastpath.

* Misc cleanups

----------------------------------------------------------------
Amit Shah (1):
      KVM: SVM: let alternatives handle the cases when RSB filling is required

Christoph Schlameuss (7):
      selftests: kvm: s390: Define page sizes in shared header
      selftests: kvm: s390: Add kvm_s390_sie_block definition for userspace tests
      selftests: kvm: s390: Add s390x ucontrol test suite with hpage test
      selftests: kvm: s390: Add test fixture and simple VM setup tests
      selftests: kvm: s390: Add debug print functions
      selftests: kvm: s390: Add VM run test case
      s390: Enable KVM_S390_UCONTROL config in debug_defconfig

Hariharan Mari (1):
      KVM: s390: Fix SORTL and DFLTCC instruction format error in __insn32_query

Ilias Stamatis (1):
      KVM: Fix coalesced_mmio_has_room() to avoid premature userspace exit

Kai Huang (2):
      KVM: VMX: Do not account for temporary memory allocation in ECREATE emulation
      KVM: VMX: Also clear SGX EDECCSSA in KVM CPU caps when SGX is disabled

Li Chen (1):
      KVM: x86: Use this_cpu_ptr() in kvm_user_return_msr_cpu_online

Maxim Levitsky (1):
      KVM: nVMX: Use vmx_segment_cache_clear() instead of open coded equivalent

Paolo Bonzini (12):
      Merge tag 'kvm-s390-next-6.12-1' of https://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD
      Merge branch 'kvm-memslot-zap-quirk' into HEAD
      Merge branch 'kvm-redo-enable-virt' into HEAD
      Merge tag 'kvm-x86-generic-6.12' of https://github.com/kvm-x86/linux into HEAD
      Merge tag 'kvm-x86-misc-6.12' of https://github.com/kvm-x86/linux into HEAD
      Merge tag 'kvm-x86-selftests-6.12' of https://github.com/kvm-x86/linux into HEAD
      Merge tag 'kvm-x86-mmu-6.12' of https://github.com/kvm-x86/linux into HEAD
      Merge tag 'kvm-x86-pat_vmx_msrs-6.12' of https://github.com/kvm-x86/linux into HEAD
      Merge tag 'kvm-x86-svm-6.12' of https://github.com/kvm-x86/linux into HEAD
      Merge tag 'kvm-x86-vmx-6.12' of https://github.com/kvm-x86/linux into HEAD
      Documentation: KVM: fix warning in "make htmldocs"
      Merge remote-tracking branch 'origin/master' into HEAD

Peter Gonda (1):
      KVM: selftests: Add SEV-ES shutdown test

Qiang Liu (1):
      KVM: VMX: Modify the BUILD_BUG_ON_MSG of the 32-bit field in the vmcs_check16 function

Sean Christopherson (94):
      x86/cpu: KVM: Add common defines for architectural memory types (PAT, MTRRs, etc.)
      x86/cpu: KVM: Move macro to encode PAT value to common header
      KVM: x86: Stuff vCPU's PAT with default value at RESET, not creation
      KVM: nVMX: Add a helper to encode VMCS info in MSR_IA32_VMX_BASIC
      KVM VMX: Move MSR_IA32_VMX_MISC bit defines to asm/vmx.h
      KVM: nVMX: Honor userspace MSR filter lists for nested VM-Enter/VM-Exit
      KVM: x86/mmu: Clean up function comments for dirty logging APIs
      KVM: SVM: Disallow guest from changing userspace's MSR_AMD64_DE_CFG value
      KVM: x86: Move MSR_TYPE_{R,W,RW} values from VMX to x86, as enums
      KVM: x86: Rename KVM_MSR_RET_INVALID to KVM_MSR_RET_UNSUPPORTED
      KVM: x86: Refactor kvm_x86_ops.get_msr_feature() to avoid kvm_msr_entry
      KVM: x86: Rename get_msr_feature() APIs to get_feature_msr()
      KVM: x86: Refactor kvm_get_feature_msr() to avoid struct kvm_msr_entry
      KVM: x86: Funnel all fancy MSR return value handling into a common helper
      KVM: x86: Hoist x86.c's global msr_* variables up above kvm_do_msr_access()
      KVM: x86: Suppress failures on userspace access to advertised, unsupported MSRs
      KVM: x86: Suppress userspace access failures on unsupported, "emulated" MSRs
      KVM: x86: Enforce x2APIC's must-be-zero reserved ICR bits
      KVM: x86: Move x2APIC ICR helper above kvm_apic_write_nodecode()
      KVM: x86: Re-split x2APIC ICR into ICR+ICR2 for AMD (x2AVIC)
      KVM: selftests: Open code vcpu_run() equivalent in guest_printf test
      KVM: selftests: Report unhandled exceptions on x86 as regular guest asserts
      KVM: selftests: Add x86 helpers to play nice with x2APIC MSR #GPs
      KVM: selftests: Skip ICR.BUSY test in xapic_state_test if x2APIC is enabled
      KVM: selftests: Test x2APIC ICR reserved bits
      KVM: selftests: Verify the guest can read back the x2APIC ICR it wrote
      KVM: selftests: Play nice with AMD's AVIC errata
      KVM: selftests: Remove unused kvm_memcmp_hva_gva()
      KVM: selftests: Always unlink memory regions when deleting (VM free)
      KVM: x86/mmu: Decrease indentation in logic to sync new indirect shadow page
      KVM: x86/mmu: Drop pointless "return" wrapper label in FNAME(fetch)
      KVM: x86/mmu: Reword a misleading comment about checking gpte_changed()
      KVM: SVM: Add a helper to convert a SME-aware PA back to a struct page
      KVM: SVM: Add host SEV-ES save area structure into VMCB via a union
      KVM: SVM: Track the per-CPU host save area as a VMCB pointer
      KVM: selftests: Add a test for coalesced MMIO (and PIO on x86)
      KVM: Clean up coalesced MMIO ring full check
      KVM: selftests: Explicitly include committed one-off assets in .gitignore
      KVM: x86: Re-enter guest if WRMSR(X2APIC_ICR) fastpath is successful
      KVM: x86: Dedup fastpath MSR post-handling logic
      KVM: x86: Exit to userspace if fastpath triggers one on instruction skip
      KVM: x86: Reorganize code in x86.c to co-locate vCPU blocking/running helpers
      KVM: x86: Add fastpath handling of HLT VM-Exits
      KVM: Use dedicated mutex to protect kvm_usage_count to avoid deadlock
      KVM: Register cpuhp and syscore callbacks when enabling hardware
      KVM: Rename symbols related to enabling virtualization hardware
      KVM: Rename arch hooks related to per-CPU virtualization enabling
      KVM: MIPS: Rename virtualization {en,dis}abling APIs to match common KVM
      KVM: x86: Rename virtualization {en,dis}abling APIs to match common KVM
      KVM: Add a module param to allow enabling virtualization when KVM is loaded
      KVM: Add arch hooks for enabling/disabling virtualization
      x86/reboot: Unconditionally define cpu_emergency_virt_cb typedef
      KVM: x86: Register "emergency disable" callbacks when virt is enabled
      KVM: x86: Forcibly leave nested if RSM to L2 hits shutdown
      KVM: selftests: Verify single-stepping a fastpath VM-Exit exits to userspace
      KVM: x86: Move "ack" phase of local APIC IRQ delivery to separate API
      KVM: nVMX: Get to-be-acknowledge IRQ for nested VM-Exit at injection site
      KVM: nVMX: Suppress external interrupt VM-Exit injection if there's no IRQ
      KVM: nVMX: Detect nested posted interrupt NV at nested VM-Exit injection
      KVM: x86: Fold kvm_get_apic_interrupt() into kvm_cpu_get_interrupt()
      KVM: nVMX: Explicitly invalidate posted_intr_nv if PI is disabled at VM-Enter
      KVM: nVMX: Assert that vcpu->mutex is held when accessing secondary VMCSes
      KVM: Write the per-page "segment" when clearing (part of) a guest page
      KVM: Harden guest memory APIs against out-of-bounds accesses
      KVM: x86/mmu: Replace PFERR_NESTED_GUEST_PAGE with a more descriptive helper
      KVM: x86/mmu: Trigger unprotect logic only on write-protection page faults
      KVM: x86/mmu: Skip emulation on page fault iff 1+ SPs were unprotected
      KVM: x86: Retry to-be-emulated insn in "slow" unprotect path iff sp is zapped
      KVM: x86: Get RIP from vCPU state when storing it to last_retry_eip
      KVM: x86: Store gpa as gpa_t, not unsigned long, when unprotecting for retry
      KVM: x86/mmu: Apply retry protection to "fast nTDP unprotect" path
      KVM: x86/mmu: Try "unprotect for retry" iff there are indirect SPs
      KVM: x86: Move EMULTYPE_ALLOW_RETRY_PF to x86_emulate_instruction()
      KVM: x86: Fold retry_instruction() into x86_emulate_instruction()
      KVM: x86/mmu: Don't try to unprotect an INVALID_GPA
      KVM: x86/mmu: Always walk guest PTEs with WRITE access when unprotecting
      KVM: x86/mmu: Move event re-injection unprotect+retry into common path
      KVM: x86: Remove manual pfn lookup when retrying #PF after failed emulation
      KVM: x86: Check EMULTYPE_WRITE_PF_TO_SP before unprotecting gfn
      KVM: x86: Apply retry protection to "unprotect on failure" path
      KVM: x86: Update retry protection fields when forcing retry on emulation failure
      KVM: x86: Rename reexecute_instruction()=>kvm_unprotect_and_retry_on_failure()
      KVM: x86/mmu: Subsume kvm_mmu_unprotect_page() into the and_retry() version
      KVM: x86/mmu: Detect if unprotect will do anything based on invalid_list
      KVM: x86/mmu: WARN on MMIO cache hit when emulating write-protected gfn
      KVM: x86/mmu: Move walk_slot_rmaps() up near for_each_slot_rmap_range()
      KVM: x86/mmu: Plumb a @can_yield parameter into __walk_slot_rmaps()
      KVM: x86/mmu: Add a helper to walk and zap rmaps for a memslot
      KVM: x86/mmu: Honor NEED_RESCHED when zapping rmaps and blocking is allowed
      KVM: x86/mmu: Morph kvm_handle_gfn_range() into an aging specific helper
      KVM: x86/mmu: Fold mmu_spte_age() into kvm_rmap_age_gfn_range()
      KVM: x86/mmu: Add KVM_RMAP_MANY to replace open coded '1' and '1ul' literals
      KVM: x86/mmu: Use KVM_PAGES_PER_HPAGE() instead of an open coded equivalent
      KVM: VMX: Set PFERR_GUEST_{FINAL,PAGE}_MASK if and only if the GVA is valid

Tao Su (1):
      KVM: x86: Advertise AVX10.1 CPUID to userspace

Thorsten Blum (1):
      KVM: x86: Optimize local variable in start_sw_tscdeadline()

Vitaly Kuznetsov (3):
      KVM: VMX: hyper-v: Prevent impossible NULL pointer dereference in evmcs_load()
      KVM: selftests: Move Hyper-V specific functions out of processor.c
      KVM: selftests: Re-enable hyperv_evmcs/hyperv_svm_test on bare metal

Xin Li (5):
      KVM: VMX: Move MSR_IA32_VMX_BASIC bit defines to asm/vmx.h
      KVM: VMX: Track CPU's MSR_IA32_VMX_BASIC as a single 64-bit value
      KVM: nVMX: Use macros and #defines in vmx_restore_vmx_basic()
      KVM: VMX: Open code VMX preemption timer rate mask in its accessor
      KVM: nVMX: Use macros and #defines in vmx_restore_vmx_misc()

Yan Zhao (4):
      KVM: x86/mmu: Introduce a quirk to control memslot zap behavior
      KVM: selftests: Test slot move/delete with slot zap quirk enabled/disabled
      KVM: selftests: Allow slot modification stress test with quirk disabled
      KVM: selftests: Test memslot move in memslot_perf_test with quirk disabled

Yongqiang Liu (1):
      KVM: SVM: Remove unnecessary GFP_KERNEL_ACCOUNT in svm_set_nested_state()

Yue Haibing (1):
      KVM: x86: Remove some unused declarations

 Documentation/admin-guide/kernel-parameters.txt    |   17 +
 Documentation/virt/kvm/api.rst                     |   31 +-
 Documentation/virt/kvm/locking.rst                 |   32 +-
 arch/arm64/kvm/arm.c                               |    6 +-
 arch/loongarch/kvm/main.c                          |    4 +-
 arch/mips/include/asm/kvm_host.h                   |    4 +-
 arch/mips/kvm/mips.c                               |    8 +-
 arch/mips/kvm/vz.c                                 |    8 +-
 arch/riscv/kvm/main.c                              |    4 +-
 arch/s390/configs/debug_defconfig                  |    1 +
 arch/s390/kvm/kvm-s390.c                           |   27 +-
 arch/x86/include/asm/cpuid.h                       |    1 +
 arch/x86/include/asm/kvm-x86-ops.h                 |    6 +-
 arch/x86/include/asm/kvm_host.h                    |   32 +-
 arch/x86/include/asm/msr-index.h                   |   34 +-
 arch/x86/include/asm/reboot.h                      |    2 +-
 arch/x86/include/asm/svm.h                         |   20 +-
 arch/x86/include/asm/vmx.h                         |   40 +-
 arch/x86/include/uapi/asm/kvm.h                    |    1 +
 arch/x86/kernel/cpu/mtrr/mtrr.c                    |    6 +
 arch/x86/kvm/cpuid.c                               |   30 +-
 arch/x86/kvm/irq.c                                 |   10 +-
 arch/x86/kvm/lapic.c                               |   84 +-
 arch/x86/kvm/lapic.h                               |    3 +-
 arch/x86/kvm/mmu.h                                 |    2 -
 arch/x86/kvm/mmu/mmu.c                             |  558 ++++++-----
 arch/x86/kvm/mmu/mmu_internal.h                    |    5 +-
 arch/x86/kvm/mmu/mmutrace.h                        |    1 +
 arch/x86/kvm/mmu/paging_tmpl.h                     |   63 +-
 arch/x86/kvm/mmu/tdp_mmu.c                         |    6 +-
 arch/x86/kvm/reverse_cpuid.h                       |    8 +
 arch/x86/kvm/smm.c                                 |   24 +-
 arch/x86/kvm/svm/nested.c                          |    4 +-
 arch/x86/kvm/svm/svm.c                             |   87 +-
 arch/x86/kvm/svm/svm.h                             |   18 +-
 arch/x86/kvm/svm/vmenter.S                         |    8 +-
 arch/x86/kvm/vmx/capabilities.h                    |   10 +-
 arch/x86/kvm/vmx/main.c                            |   10 +-
 arch/x86/kvm/vmx/nested.c                          |  134 ++-
 arch/x86/kvm/vmx/nested.h                          |    8 +-
 arch/x86/kvm/vmx/sgx.c                             |    2 +-
 arch/x86/kvm/vmx/vmx.c                             |   67 +-
 arch/x86/kvm/vmx/vmx.h                             |    9 +-
 arch/x86/kvm/vmx/vmx_onhyperv.h                    |    8 +
 arch/x86/kvm/vmx/vmx_ops.h                         |    2 +-
 arch/x86/kvm/vmx/x86_ops.h                         |    7 +-
 arch/x86/kvm/x86.c                                 | 1006 ++++++++++----------
 arch/x86/kvm/x86.h                                 |   31 +-
 arch/x86/mm/pat/memtype.c                          |   36 +-
 include/linux/kvm_host.h                           |   18 +-
 tools/testing/selftests/kvm/.gitignore             |    4 +
 tools/testing/selftests/kvm/Makefile               |    4 +
 tools/testing/selftests/kvm/coalesced_io_test.c    |  236 +++++
 tools/testing/selftests/kvm/guest_print_test.c     |   19 +-
 tools/testing/selftests/kvm/include/kvm_util.h     |   28 +-
 .../selftests/kvm/include/s390x/debug_print.h      |   69 ++
 .../selftests/kvm/include/s390x/processor.h        |    5 +
 tools/testing/selftests/kvm/include/s390x/sie.h    |  240 +++++
 tools/testing/selftests/kvm/include/x86_64/apic.h  |   23 +-
 .../testing/selftests/kvm/include/x86_64/hyperv.h  |   18 +
 .../selftests/kvm/include/x86_64/processor.h       |    7 +-
 tools/testing/selftests/kvm/lib/kvm_util.c         |   85 +-
 tools/testing/selftests/kvm/lib/s390x/processor.c  |   10 +-
 tools/testing/selftests/kvm/lib/x86_64/hyperv.c    |   67 ++
 tools/testing/selftests/kvm/lib/x86_64/processor.c |   69 +-
 .../kvm/memslot_modification_stress_test.c         |   19 +-
 tools/testing/selftests/kvm/memslot_perf_test.c    |   12 +-
 tools/testing/selftests/kvm/s390x/cmma_test.c      |    7 +-
 tools/testing/selftests/kvm/s390x/config           |    2 +
 tools/testing/selftests/kvm/s390x/debug_test.c     |    4 +-
 tools/testing/selftests/kvm/s390x/memop.c          |    4 +-
 tools/testing/selftests/kvm/s390x/tprot.c          |    5 +-
 tools/testing/selftests/kvm/s390x/ucontrol_test.c  |  332 +++++++
 .../testing/selftests/kvm/set_memory_region_test.c |   29 +-
 tools/testing/selftests/kvm/x86_64/debug_regs.c    |   11 +-
 tools/testing/selftests/kvm/x86_64/hyperv_evmcs.c  |    2 +-
 .../testing/selftests/kvm/x86_64/hyperv_svm_test.c |    2 +-
 .../testing/selftests/kvm/x86_64/sev_smoke_test.c  |   32 +
 .../selftests/kvm/x86_64/xapic_state_test.c        |   54 +-
 .../testing/selftests/kvm/x86_64/xen_vmcall_test.c |    1 +
 virt/kvm/coalesced_mmio.c                          |   31 +-
 virt/kvm/kvm_main.c                                |  281 +++---
 82 files changed, 2803 insertions(+), 1452 deletions(-)

Comments

Linus Torvalds Sept. 28, 2024, 4:30 p.m. UTC | #1
On Sat, 28 Sept 2024 at 08:33, Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> Apologies for the late pull request; all the traveling made things a
> bit messy.  Also, we have a known regression here on ancient processors
> and will fix it next week.

Gaah. Don't leave it hanging like that. When somebody reports a
problem, I need to know if it's this known one.

I've pulled it, but you really need to add a pointer to "look, this is
the known one, we have a fix in the works"

             Linus
pr-tracker-bot@kernel.org Sept. 28, 2024, 4:45 p.m. UTC | #2
The pull request you sent on Sat, 28 Sep 2024 11:33:02 -0400:

> https://git.kernel.org/pub/scm/virt/kvm/kvm.git tags/for-linus

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/3efc57369a0ce8f76bf0804f7e673982384e4ac9

Thank you!
Paolo Bonzini Sept. 28, 2024, 5:19 p.m. UTC | #3
On Sat, Sep 28, 2024 at 6:30 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> On Sat, 28 Sept 2024 at 08:33, Paolo Bonzini <pbonzini@redhat.com> wrote:
> >
> > Apologies for the late pull request; all the traveling made things a
> > bit messy.  Also, we have a known regression here on ancient processors
> > and will fix it next week.
>
> Gaah. Don't leave it hanging like that. When somebody reports a
> problem, I need to know if it's this known one.
> I've pulled it, but you really need to add a pointer to "look, this is
> the known one, we have a fix in the works"

If that's what you mean, it was not reported by users (and it's very
unlikely that it will be, unless they run selftests on pre-2008
processors or with non-standard module parameters). It's a NULL
pointer dereference on VM shutdown, caused by the selftests added by
commit b4ed2c67d275 ("KVM: selftests: Test slot move/delete with slot
zap quirk enabled/disabled").

It's also not reproducible yet outside selftests since the bug is in a
new API, which is also why we didn't revert with prejudice and didn't
go into too much detail above.

Paolo
Linus Torvalds Sept. 29, 2024, 5:35 p.m. UTC | #4
On Sat, 28 Sept 2024 at 08:33, Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> Apologies for the late pull request; all the traveling made things a
> bit messy.  Also, we have a known regression here on ancient processors
> and will fix it next week.

.. actually, much worse than that, you have a build error.

  arch/x86/kvm/x86.c: In function ‘kvm_arch_enable_virtualization’:
  arch/x86/kvm/x86.c:12517:9: error: implicit declaration of function ‘cpu_emergency_register_virt_callback’ [-Wimplicit-function-declaration]
  12517 |         cpu_emergency_register_virt_callback(kvm_x86_ops.emergency_disable_virtualization_cpu);
        |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  arch/x86/kvm/x86.c: In function ‘kvm_arch_disable_virtualization’:
  arch/x86/kvm/x86.c:12522:9: error: implicit declaration of function ‘cpu_emergency_unregister_virt_callback’ [-Wimplicit-function-declaration]
  12522 |         cpu_emergency_unregister_virt_callback(kvm_x86_ops.emergency_disable_virtualization_cpu);
        |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

which I hadn't noticed before, because I did just allmodconfig builds.

But with a smaller config, the above error happens.

The culprit is commit 590b09b1d88e ("KVM: x86: Register "emergency
disable" callbacks when virt is enabled"), and the reason seems to be
this:

  #if IS_ENABLED(CONFIG_KVM_INTEL) || IS_ENABLED(CONFIG_KVM_AMD)
  void cpu_emergency_register_virt_callback(cpu_emergency_virt_cb *callback);
  ...

ie if you have a config with KVM enabled, but neither KVM_INTEL nor
KVM_AMD set, you don't get that callback thing.

The fix may be something like the attached.
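
(The attachment is not reproduced in this archive, and the following is
not that patch.  It is only a generic sketch of one common way to avoid
this class of error: provide a no-op stub when the guarded declaration
is compiled out.  HAVE_VENDOR_MODULE and all function names below are
made up for illustration.)

```c
#include <stddef.h>

/* Stand-in for "IS_ENABLED(CONFIG_KVM_INTEL) || IS_ENABLED(CONFIG_KVM_AMD)". */
#ifndef HAVE_VENDOR_MODULE
#define HAVE_VENDOR_MODULE 0
#endif

typedef void (virt_cb)(void);

#if HAVE_VENDOR_MODULE
/* Real implementation: remember the callback. */
static virt_cb *registered_cb;

void register_virt_callback(virt_cb *cb)
{
	registered_cb = cb;
}

virt_cb *current_virt_callback(void)
{
	return registered_cb;
}
#else
/* Stubs: callers still compile, but nothing is registered. */
static inline void register_virt_callback(virt_cb *cb)
{
	(void)cb;
}

static inline virt_cb *current_virt_callback(void)
{
	return NULL;
}
#endif

void disable_virt(void)
{
}

/* Common code that calls the helper regardless of configuration. */
void common_enable(void)
{
	register_virt_callback(disable_virt);
}
```

With the stub branch, common_enable() builds in every configuration;
without it, the call is an implicit declaration, as in the error above.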

                   Linus
Paolo Bonzini Sept. 30, 2024, 9:59 a.m. UTC | #5
On Sun, Sep 29, 2024 at 7:36 PM Linus Torvalds <torvalds@linux-foundation.org> wrote:
> The culprit is commit 590b09b1d88e ("KVM: x86: Register "emergency
> disable" callbacks when virt is enabled"), and the reason seems to be
> this:
>
>   #if IS_ENABLED(CONFIG_KVM_INTEL) || IS_ENABLED(CONFIG_KVM_AMD)
>   void cpu_emergency_register_virt_callback(cpu_emergency_virt_cb *callback);
>   ...
>
> ie if you have a config with KVM enabled, but neither KVM_INTEL nor
> KVM_AMD set, you don't get that callback thing.
>
> The fix may be something like the attached.

Yeah, there was an attempt in commit 6d55a94222db ("x86/reboot:
Unconditionally define cpu_emergency_virt_cb typedef") but that only
covers the headers and the !CONFIG_KVM case; not the !CONFIG_KVM_INTEL
&& !CONFIG_KVM_AMD one that you stumbled upon.

Your fix is not wrong, but there's no point in compiling kvm.ko if
nobody is using it.

This is what I'll test more and submit:

------------------ 8< ------------------
From: Paolo Bonzini <pbonzini@redhat.com>
Subject: [PATCH] KVM: x86: leave kvm.ko out of the build if no vendor module is requested
     
kvm.ko is nothing but library code shared by kvm-intel.ko and kvm-amd.ko.
It provides no functionality on its own and it is unnecessary unless one
of the vendor-specific modules is compiled.  In particular, /dev/kvm is
not created until one of kvm-intel.ko or kvm-amd.ko is loaded.
     
Use CONFIG_KVM to decide if it is built-in or a module, but use the
vendor-specific modules for the actual decision on whether to build it.
     
This also fixes a build failure when CONFIG_KVM_INTEL and CONFIG_KVM_AMD
are both disabled.  The cpu_emergency_register_virt_callback() function
is called from kvm.ko, but it is only defined if at least one of
CONFIG_KVM_INTEL and CONFIG_KVM_AMD is provided.

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index 4287a8071a3a..aee054a91031 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -17,8 +17,8 @@ menuconfig VIRTUALIZATION
  
  if VIRTUALIZATION
  
-config KVM
-	tristate "Kernel-based Virtual Machine (KVM) support"
+config KVM_X86_COMMON
+	def_tristate KVM if KVM_INTEL || KVM_AMD
  	depends on HIGH_RES_TIMERS
  	depends on X86_LOCAL_APIC
  	select KVM_COMMON
@@ -46,6 +47,9 @@ config KVM
  	select KVM_GENERIC_HARDWARE_ENABLING
  	select KVM_GENERIC_PRE_FAULT_MEMORY
  	select KVM_WERROR if WERROR
+
+config KVM
+	tristate "Kernel-based Virtual Machine (KVM) support"
  	help
  	  Support hosting fully virtualized guest machines using hardware
  	  virtualization extensions.  You will need a fairly recent
diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile
index 5494669a055a..4304c89d6b64 100644
--- a/arch/x86/kvm/Makefile
+++ b/arch/x86/kvm/Makefile
@@ -32,7 +32,7 @@ kvm-intel-y		+= vmx/vmx_onhyperv.o vmx/hyperv_evmcs.o
  kvm-amd-y		+= svm/svm_onhyperv.o
  endif
  
-obj-$(CONFIG_KVM)	+= kvm.o
+obj-$(CONFIG_KVM_X86_COMMON) += kvm.o
  obj-$(CONFIG_KVM_INTEL)	+= kvm-intel.o
  obj-$(CONFIG_KVM_AMD)	+= kvm-amd.o
  
------------------ 8< ------------------

On top of this, the CONFIG_KVM #ifdefs could be changed to either
CONFIG_KVM_X86_COMMON or (most of them) to CONFIG_KVM_INTEL; I started
cleaning up the Kconfigs a few months ago and it's time to finish it
off for 6.13.
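
The effect of "def_tristate KVM if KVM_INTEL || KVM_AMD" in the patch
above can be modeled roughly as follows.  This is a sketch, assuming
Kconfig's documented tristate rules (n < m < y, "&&" is min, "||" is
max), assuming a conditional default is capped by its condition, and
ignoring the "depends on" lines; the C names are invented for the
example.

```c
enum tristate { TRI_N = 0, TRI_M = 1, TRI_Y = 2 };

/* Kconfig "||": the larger of the two values. */
enum tristate tri_or(enum tristate a, enum tristate b)
{
	return a > b ? a : b;
}

/* Kconfig "&&": the smaller of the two values. */
enum tristate tri_and(enum tristate a, enum tristate b)
{
	return a < b ? a : b;
}

/* Model of: def_tristate KVM if KVM_INTEL || KVM_AMD */
enum tristate kvm_x86_common(enum tristate kvm, enum tristate intel,
			     enum tristate amd)
{
	return tri_and(kvm, tri_or(intel, amd));
}
```

Under these assumptions, KVM=y with both vendor modules disabled gives
KVM_X86_COMMON=n, so kvm.o is not built and the failing call site is
never compiled.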

Paolo
Sean Christopherson Sept. 30, 2024, 4:53 p.m. UTC | #6
On Mon, Sep 30, 2024, Paolo Bonzini wrote:
> On Sun, Sep 29, 2024 at 7:36 PM Linus Torvalds <torvalds@linux-foundation.org> wrote:
> > The culprit is commit 590b09b1d88e ("KVM: x86: Register "emergency
> > disable" callbacks when virt is enabled"), and the reason seems to be
> > this:
> > 
> >   #if IS_ENABLED(CONFIG_KVM_INTEL) || IS_ENABLED(CONFIG_KVM_AMD)
> >   void cpu_emergency_register_virt_callback(cpu_emergency_virt_cb *callback);
> >   ...
> > 
> > ie if you have a config with KVM enabled, but neither KVM_INTEL nor
> > KVM_AMD set, you don't get that callback thing.
> > 
> > The fix may be something like the attached.
> 
> Yeah, there was an attempt in commit 6d55a94222db ("x86/reboot:
> Unconditionally define cpu_emergency_virt_cb typedef") but that only
> covers the headers and the !CONFIG_KVM case; not the !CONFIG_KVM_INTEL
> && !CONFIG_KVM_AMD one that you stumbled upon.
> 
> Your fix is not wrong, but there's no point in compiling kvm.ko if
> nobody is using it.
> 
> This is what I'll test more and submit:
> 
> ------------------ 8< ------------------
> From: Paolo Bonzini <pbonzini@redhat.com>
> Subject: [PATCH] KVM: x86: leave kvm.ko out of the build if no vendor module is requested
> kvm.ko is nothing but library code shared by kvm-intel.ko and kvm-amd.ko.
> It provides no functionality on its own and it is unnecessary unless one
> of the vendor-specific modules is compiled.  In particular, /dev/kvm is
> not created until one of kvm-intel.ko or kvm-amd.ko is loaded.
> Use CONFIG_KVM to decide if it is built-in or a module, but use the
> vendor-specific modules for the actual decision on whether to build it.
> This also fixes a build failure when CONFIG_KVM_INTEL and CONFIG_KVM_AMD
> are both disabled.  The cpu_emergency_register_virt_callback() function
> is called from kvm.ko, but it is only defined if at least one of
> CONFIG_KVM_INTEL and CONFIG_KVM_AMD is provided.
> 
> Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> 
> diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
> index 4287a8071a3a..aee054a91031 100644
> --- a/arch/x86/kvm/Kconfig
> +++ b/arch/x86/kvm/Kconfig
> @@ -17,8 +17,8 @@ menuconfig VIRTUALIZATION
>  if VIRTUALIZATION
> -config KVM
> -	tristate "Kernel-based Virtual Machine (KVM) support"
> +config KVM_X86_COMMON
> +	def_tristate KVM if KVM_INTEL || KVM_AMD
>  	depends on HIGH_RES_TIMERS
>  	depends on X86_LOCAL_APIC
>  	select KVM_COMMON
> @@ -46,6 +47,9 @@ config KVM
>  	select KVM_GENERIC_HARDWARE_ENABLING
>  	select KVM_GENERIC_PRE_FAULT_MEMORY
>  	select KVM_WERROR if WERROR
> +
> +config KVM
> +	tristate "Kernel-based Virtual Machine (KVM) support"

I like the idea, but allowing users to select KVM=m|y but not building any parts
of KVM seems like it will lead to confusion.  What if we hide KVM entirely, and
autoselect m/y/n based on the vendor modules?  AFAICT, this behaves as expected.

Not having documentation for kvm.ko is unfortunate, but explaining how kvm.ko
interacts with kvm-{amd,intel}.ko probably belongs in Documentation/virt/kvm/?
anyways.

diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index 730c2f34d347..4350b83b63d8 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -18,7 +18,7 @@ menuconfig VIRTUALIZATION
 if VIRTUALIZATION
 
 config KVM
-       tristate "Kernel-based Virtual Machine (KVM) support"
+       def_tristate m if KVM_INTEL=m || KVM_AMD=m
        depends on X86_LOCAL_APIC
        select KVM_COMMON
        select KVM_GENERIC_MMU_NOTIFIER
@@ -45,19 +45,6 @@ config KVM
        select KVM_GENERIC_HARDWARE_ENABLING
        select KVM_GENERIC_PRE_FAULT_MEMORY
        select KVM_WERROR if WERROR
-       help
-         Support hosting fully virtualized guest machines using hardware
-         virtualization extensions.  You will need a fairly recent
-         processor equipped with virtualization extensions. You will also
-         need to select one or more of the processor modules below.
-
-         This module provides access to the hardware capabilities through
-         a character device node named /dev/kvm.
-
-         To compile this as a module, choose M here: the module
-         will be called kvm.
-
-         If unsure, say N.
 
 config KVM_WERROR
        bool "Compile KVM with -Werror"
@@ -88,7 +75,8 @@ config KVM_SW_PROTECTED_VM
 
 config KVM_INTEL
        tristate "KVM for Intel (and compatible) processors support"
-       depends on KVM && IA32_FEAT_CTL
+       depends on IA32_FEAT_CTL
+       select KVM if KVM_INTEL=y
        help
          Provides support for KVM on processors equipped with Intel's VT
          extensions, a.k.a. Virtual Machine Extensions (VMX).
@@ -125,7 +113,8 @@ config X86_SGX_KVM
 
 config KVM_AMD
        tristate "KVM for AMD processors support"
-       depends on KVM && (CPU_SUP_AMD || CPU_SUP_HYGON)
+       depends on CPU_SUP_AMD || CPU_SUP_HYGON
+       select KVM if KVM_AMD=y
        help
          Provides support for KVM on AMD processors equipped with the AMD-V
          (SVM) extensions.


>  	help
>  	  Support hosting fully virtualized guest machines using hardware
>  	  virtualization extensions.  You will need a fairly recent
> diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile
> index 5494669a055a..4304c89d6b64 100644
> --- a/arch/x86/kvm/Makefile
> +++ b/arch/x86/kvm/Makefile
> @@ -32,7 +32,7 @@ kvm-intel-y		+= vmx/vmx_onhyperv.o vmx/hyperv_evmcs.o
>  kvm-amd-y		+= svm/svm_onhyperv.o
>  endif
> -obj-$(CONFIG_KVM)	+= kvm.o
> +obj-$(CONFIG_KVM_X86_COMMON) += kvm.o
>  obj-$(CONFIG_KVM_INTEL)	+= kvm-intel.o
>  obj-$(CONFIG_KVM_AMD)	+= kvm-amd.o
> ------------------ 8< ------------------
> 
> On top of this, the CONFIG_KVM #ifdefs could be changed to either
> CONFIG_KVM_X86_COMMON or (most of them) to CONFIG_KVM_INTEL; I started
> cleaning up the Kconfigs a few months ago and it's time to finish it
> off for 6.13.

If you haven't already, can you also kill off KVM_COMMON?  The only usage is in
scripts/gdb/linux/constants.py.in, to print Intel's posted interrupt IRQs in
scripts/gdb/linux/interrupts.py, which is quite ridiculous.
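
The intended outcome of the hidden-KVM variant above can be modelled with
Kconfig's tristate arithmetic (n < m < y, "||" as max, "&&" as min).  The
Python below is only a minimal sketch under that assumption: the function
names are hypothetical, and every other dependency (X86_LOCAL_APIC,
IA32_FEAT_CTL, CPU_SUP_AMD/CPU_SUP_HYGON) is assumed to be satisfied.

```python
# Kconfig tristate values, ordered n < m < y.
N, M, Y = 0, 1, 2


def equals(value, target):
    """Model a comparison such as KVM_INTEL=y, which evaluates to y or n."""
    return Y if value == target else N


def kvm_autoselect(kvm_intel, kvm_amd):
    """Simplified model of a promptless KVM symbol driven by the vendor modules."""
    # "select KVM if KVM_<vendor>=y": each select acts as a lower bound of
    # min(selecting symbol, condition); several selects combine with max.
    selected = max(min(kvm_intel, equals(kvm_intel, Y)),
                   min(kvm_amd, equals(kvm_amd, Y)))
    # "def_tristate m if KVM_INTEL=m || KVM_AMD=m": a default of m,
    # gated by its condition.
    default = min(M, max(equals(kvm_intel, M), equals(kvm_amd, M)))
    # With no prompt, the value is the larger of the default and the selects.
    return max(selected, default)
```

Under this model KVM ends up as the larger of the two vendor settings: n when
neither is enabled, m when only modules are requested, and y as soon as one
vendor is built in.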
Paolo Bonzini Sept. 30, 2024, 5:17 p.m. UTC | #7
On 9/30/24 18:53, Sean Christopherson wrote:
> On Mon, Sep 30, 2024, Paolo Bonzini wrote:
>> On Sun, Sep 29, 2024 at 7:36 PM Linus Torvalds <torvalds@linux-foundation.org> wrote:
>>> The culprit is commit 590b09b1d88e ("KVM: x86: Register "emergency
>>> disable" callbacks when virt is enabled"), and the reason seems to be
>>> this:
>>>
>>>    #if IS_ENABLED(CONFIG_KVM_INTEL) || IS_ENABLED(CONFIG_KVM_AMD)
>>>    void cpu_emergency_register_virt_callback(cpu_emergency_virt_cb *callback);
>>>    ...
>>>
>>> ie if you have a config with KVM enabled, but neither KVM_INTEL nor
>>> KVM_AMD set, you don't get that callback thing.
>>>
>>> The fix may be something like the attached.
>>
>> Yeah, there was an attempt in commit 6d55a94222db ("x86/reboot:
>> Unconditionally define cpu_emergency_virt_cb typedef") but that only
>> covers the headers and the !CONFIG_KVM case; not the !CONFIG_KVM_INTEL
>> && !CONFIG_KVM_AMD one that you stumbled upon.
>>
>> Your fix is not wrong, but there's no point in compiling kvm.ko if
>> nobody is using it.
>>
>> This is what I'll test more and submit:
>>
>> ------------------ 8< ------------------
>> From: Paolo Bonzini <pbonzini@redhat.com>
>> Subject: [PATCH] KVM: x86: leave kvm.ko out of the build if no vendor module is requested
>>
>> kvm.ko is nothing but library code shared by kvm-intel.ko and kvm-amd.ko.
>> It provides no functionality on its own and it is unnecessary unless one
>> of the vendor-specific modules is compiled.  In particular, /dev/kvm is
>> not created until one of kvm-intel.ko or kvm-amd.ko is loaded.
>>
>> Use CONFIG_KVM to decide if it is built-in or a module, but use the
>> vendor-specific modules for the actual decision on whether to build it.
>>
>> This also fixes a build failure when CONFIG_KVM_INTEL and CONFIG_KVM_AMD
>> are both disabled.  The cpu_emergency_register_virt_callback() function
>> is called from kvm.ko, but it is only defined if at least one of
>> CONFIG_KVM_INTEL and CONFIG_KVM_AMD is provided.
>>
>> Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
>> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>>
>> diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
>> index 4287a8071a3a..aee054a91031 100644
>> --- a/arch/x86/kvm/Kconfig
>> +++ b/arch/x86/kvm/Kconfig
>> @@ -17,8 +17,8 @@ menuconfig VIRTUALIZATION
>>   if VIRTUALIZATION
>> -config KVM
>> -	tristate "Kernel-based Virtual Machine (KVM) support"
>> +config KVM_X86_COMMON
>> +	def_tristate KVM if KVM_INTEL || KVM_AMD
>>   	depends on HIGH_RES_TIMERS
>>   	depends on X86_LOCAL_APIC
>>   	select KVM_COMMON
>> @@ -46,6 +47,9 @@ config KVM
>>   	select KVM_GENERIC_HARDWARE_ENABLING
>>   	select KVM_GENERIC_PRE_FAULT_MEMORY
>>   	select KVM_WERROR if WERROR
>> +
>> +config KVM
>> +	tristate "Kernel-based Virtual Machine (KVM) support"
> 
> I like the idea, but allowing users to select KVM=m|y but not building any parts
> of KVM seems like it will lead to confusion.  What if we hide KVM entirely, and
> autoselect m/y/n based on the vendor modules?  AFAICT, this behaves as expected.

Showing the KVM option is useful anyway as a grouping for other options 
(e.g. SW-protected VMs, Xen, etc.).  I can play with reordering 
everything and using "select" to group these options, but I doubt it
will be better/more user-friendly than the above minimal change.
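
For reference, the effect of "def_tristate KVM if KVM_INTEL || KVM_AMD" in
the minimal change can be modelled with Kconfig's tristate arithmetic
(n < m < y, "||" as max, "&&" as min).  The Python below is a minimal sketch
under that assumption, with a hypothetical function name; HIGH_RES_TIMERS and
X86_LOCAL_APIC are assumed to be enabled.

```python
# Kconfig tristate values, ordered n < m < y.
N, M, Y = 0, 1, 2


def kvm_x86_common(kvm, kvm_intel, kvm_amd):
    """Simplified model of the promptless KVM_X86_COMMON symbol."""
    # The default value is KVM, gated by the condition KVM_INTEL || KVM_AMD;
    # a default and its condition combine like "&&", i.e. min.
    return min(kvm, max(kvm_intel, kvm_amd))
```

In this model the configuration from the report (KVM set, neither vendor
module set) evaluates to n, so kvm.o would be left out of the build.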

And also...

> Not having documentation for kvm.ko is unfortunate, but explaining how kvm.ko
> interacts with kvm-{amd,intel}.ko probably belongs in Documentation/virt/kvm/?
> anyways.

... documentation changes can wait for 6.13 anyway, unlike fixing
the build (even if in a rare config that would mostly be hit by
randconfig).

> If you haven't already, can you also kill off KVM_COMMON?  The only usage is in
> scripts/gdb/linux/constants.py.in, to print Intel's posted interrupt IRQs in
> scripts/gdb/linux/interrupts.py, which is quite ridiculous.

Sure.

Paolo