[v5,0/8] KVM: x86: Add CMCI and UCNA emulation

Message ID 20220610171134.772566-1-juew@google.com (mailing list archive)

Jue Wang June 10, 2022, 5:11 p.m. UTC
This patch series implements emulation for Corrected Machine Check
Interrupt (CMCI) signaling and UnCorrectable No Action required (UCNA)
error injection.

UCNA errors signaled via CMCI allow a guest to be notified as soon as
uncorrectable memory errors get detected by some background threads,
e.g., threads that migrate guest memory across hosts or threads that
constantly scan system memory for errors [1].

Upon receiving UCNAs, the guest kernel isolates the poisoned pages, which
may still be free, preventing future accesses that may cause fatal MCEs.

1. https://lore.kernel.org/linux-mm/8eceffc0-01e8-2a55-6eb9-b26faa9e3caf@intel.com/t/


Patches 1-3 clean up the KVM APIC LVT logic in preparation for adding
APIC_LVTCMCI.

Patch 4 adds APIC_LVTCMCI emulation.
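
As a rough illustration of why an APIC_LVTx()-style helper needs a
special case for CMCI: the six classic LVT registers sit at consecutive
0x10 strides starting at the timer LVT (0x320), while the CMCI LVT lives
at 0x2F0. The sketch below is not the code from the patches; the enum
ordering and the names lvt_entry and lvt_offset() are assumptions made
for this example.

```c
#include <assert.h>
#include <stdint.h>

/* LVT register offsets within the local APIC page (Intel SDM values). */
#define APIC_LVTCMCI 0x2f0u
#define APIC_LVTT    0x320u

/* Illustrative LVT indices; this ordering is an assumption of the sketch. */
enum lvt_entry {
	LVT_TIMER,
	LVT_THERMAL_MONITOR,
	LVT_PERFORMANCE_COUNTER,
	LVT_LINT0,
	LVT_LINT1,
	LVT_ERROR,
	LVT_CMCI,
	LVT_NUM
};

/*
 * Map an LVT index to its register offset. CMCI is the only entry that
 * does not follow the 0x10 stride starting at the timer LVT.
 */
static inline uint32_t lvt_offset(enum lvt_entry e)
{
	assert(e < LVT_NUM);
	return e == LVT_CMCI ? APIC_LVTCMCI
			     : APIC_LVTT + 0x10u * (uint32_t)e;
}
```

With this layout, lvt_offset(LVT_ERROR) evaluates to 0x370 and
lvt_offset(LVT_CMCI) to 0x2f0.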

Patch 5 updates mce_banks to use the array allocation API.

Patch 6 adds emulation for the MSR_IA32_MCx_CTL2 registers, which provide
per-bank control of CMCI signaling.
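
For readers unfamiliar with the register layout, here is a minimal
sketch of the architectural pieces involved: IA32_MCi_CTL2 for bank i
is MSR 0x280 + i, bit 30 is CMCI_EN, and bits 14:0 hold the corrected
error count threshold. The helper names and the 32-bank limit are
assumptions of this example, and the write check shows one plausible
policy (accept only the architecturally defined bits) rather than
the exact check in the series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MSR_IA32_MC0_CTL2            0x280u
#define MCI_CTL2_CMCI_EN             (1ULL << 30)
#define MCI_CTL2_CMCI_THRESHOLD_MASK 0x7fffULL

/* Bank limit used by this sketch only (MSRs 0x280..0x29f). */
#define SKETCH_MAX_BANKS 32u

/* MSR index of IA32_MCi_CTL2 for the given bank. */
static inline uint32_t mci_ctl2_msr(unsigned int bank)
{
	assert(bank < SKETCH_MAX_BANKS);
	return MSR_IA32_MC0_CTL2 + bank;
}

/* Hypothetical write filter: reject any bit outside CMCI_EN | threshold. */
static inline bool mci_ctl2_write_ok(uint64_t data)
{
	return !(data & ~(MCI_CTL2_CMCI_EN | MCI_CTL2_CMCI_THRESHOLD_MASK));
}

/* True if the guest has enabled CMCI signaling for this bank. */
static inline bool mci_ctl2_cmci_enabled(uint64_t ctl2)
{
	return (ctl2 & MCI_CTL2_CMCI_EN) != 0;
}
```

For example, mci_ctl2_msr(3) is 0x283, and a write of
MCI_CTL2_CMCI_EN | 1 passes the filter while a write with bit 31 set
does not.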

Patch 7 enables MCG_CMCI_P and handles injected UCNA errors.
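
To make the UCNA case concrete, the sketch below classifies an
MCi_STATUS value using the architectural bit positions: a UCNA error is
valid (VAL) and uncorrected (UC) but has none of PCC, S or AR set. It
then combines that with MCG_CMCI_P (bit 10 of IA32_MCG_CAP) and the
per-bank CMCI_EN bit to decide whether a CMCI would be signaled. The
function names and the exact decision logic are assumptions of this
example, not a copy of the code in the series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MCG_CMCI_P       (1ULL << 10)	/* IA32_MCG_CAP: CMCI present */
#define MCI_CTL2_CMCI_EN (1ULL << 30)	/* IA32_MCi_CTL2: CMCI enable */

#define MCI_STATUS_VAL   (1ULL << 63)	/* status register valid */
#define MCI_STATUS_UC    (1ULL << 61)	/* uncorrected error */
#define MCI_STATUS_PCC   (1ULL << 57)	/* processor context corrupt */
#define MCI_STATUS_S     (1ULL << 56)	/* signaled as machine check */
#define MCI_STATUS_AR    (1ULL << 55)	/* action required */

static_assert(MCG_CMCI_P == 0x400ULL, "MCG_CMCI_P is bit 10");

/* UCNA: valid and uncorrected, with no PCC, S or AR. */
static inline bool status_is_ucna(uint64_t status)
{
	return (status & MCI_STATUS_VAL) && (status & MCI_STATUS_UC) &&
	       !(status & (MCI_STATUS_PCC | MCI_STATUS_S | MCI_STATUS_AR));
}

/*
 * Hypothetical policy: signal CMCI for a UCNA only if the vCPU advertises
 * CMCI and the guest enabled it for the affected bank.
 */
static inline bool should_signal_cmci(uint64_t mcg_cap, uint64_t ctl2,
				      uint64_t status)
{
	return status_is_ucna(status) && (mcg_cap & MCG_CMCI_P) &&
	       (ctl2 & MCI_CTL2_CMCI_EN);
}
```

Under these assumptions, a status of MCI_STATUS_VAL | MCI_STATUS_UC is
treated as UCNA, whereas adding MCI_STATUS_AR makes it an
action-required error that this path would not handle.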

Patch 8 adds a KVM self test that validates UCNA injection and CMCI
emulation.

v5 changes
- Incorporate feedback from David Matlack <dmatlack@google.com>
- Rewrite the change log to be more concise and accurate.
- Remove several duplicated checks in UCNA injection code.
- Add test cases that validate CMCI emulation to self test.

v4 changes
- Incorporate feedback from David Matlack <dmatlack@google.com>
- Rewrite the change logs to be more descriptive.
- Add a KVM self test.

v3 changes
- Incorporate feedback from Sean Christopherson <seanjc@google.com>
- Split the cleanup of the KVM APIC LVT logic into 3 patches.
- Put the cleanup of mce_array allocation in a separate patch.
- Base the MCi_CTL2 register emulation on Sean's cleanup and fix
series [2]
- Fix bugs around MCi_CTL2 register offset validation and the freeing of
the mci_ctl2_banks array.
- Rewrite the change log with more architectural detail about CMCI, UCNA
and MCG_CMCI_P.
- Fix various comments and wrapping style.

2. https://lore.kernel.org/lkml/20220512222716.4112548-1-seanjc@google.com/T/

v2 changes
- Incorporate feedback from Sean Christopherson <seanjc@google.com>
- Split the single patch into 4:
  1). clean up KVM APIC LVT logic
  2). add CMCI emulation to lapic
  3). add emulation of MSR_IA32_MCx_CTL2
  4). enable MCG_CMCI_P and handle injected UCNAs
- Fix various style issues.

Jue Wang (8):
  KVM: x86: Make APIC_VERSION capture only the magic 0x14UL.
  KVM: x86: Fill apic_lvt_mask with enums / explicit entries.
  KVM: x86: Add APIC_LVTx() macro.
  KVM: x86: Add Corrected Machine Check Interrupt (CMCI) emulation to
    lapic.
  KVM: x86: Use kcalloc to allocate the mce_banks array.
  KVM: x86: Add emulation for MSR_IA32_MCx_CTL2 MSRs.
  KVM: x86: Enable CMCI capability by default and handle injected UCNA
    errors
  KVM: selftests: Add a self test for CMCI and UCNA emulations.

 arch/x86/include/asm/kvm_host.h               |   1 +
 arch/x86/kvm/lapic.c                          |  66 ++--
 arch/x86/kvm/lapic.h                          |  16 +-
 arch/x86/kvm/vmx/vmx.c                        |   1 +
 arch/x86/kvm/x86.c                            | 178 ++++++---
 tools/testing/selftests/kvm/.gitignore        |   1 +
 tools/testing/selftests/kvm/Makefile          |   1 +
 .../selftests/kvm/include/x86_64/apic.h       |   1 +
 .../selftests/kvm/include/x86_64/mce.h        |  25 ++
 .../selftests/kvm/include/x86_64/processor.h  |   1 +
 .../selftests/kvm/lib/x86_64/processor.c      |   2 +-
 .../kvm/x86_64/ucna_injection_test.c          | 347 ++++++++++++++++++
 12 files changed, 573 insertions(+), 67 deletions(-)
 create mode 100644 tools/testing/selftests/kvm/include/x86_64/mce.h
 create mode 100644 tools/testing/selftests/kvm/x86_64/ucna_injection_test.c

Comments

Paolo Bonzini June 23, 2022, 5:25 p.m. UTC | #1
On 6/10/22 19:11, Jue Wang wrote:
> This patch series implements emulation for Corrected Machine Check
> Interrupt (CMCI) signaling and UnCorrectable No Action required (UCNA)
> error injection.

Queued, thanks.  The test of course required some changes to adapt to 
the new API.

Paolo

Xiaoyao Li June 30, 2022, 1:48 p.m. UTC | #2
On 6/11/2022 1:11 AM, Jue Wang wrote:
> This patch series implements emulation for Corrected Machine Check
> Interrupt (CMCI) signaling and UnCorrectable No Action required (UCNA)
> error injection.
> 

It seems the main purpose of this series is to allow UCNA error
injection and to notify the guest of it via CMCI.

But it doesn't fully emulate CMCI. E.g., the guest's error threshold in
MCi_CTL2 has no effect; it is still controlled by the host's value.

Jue Wang June 30, 2022, 2:05 p.m. UTC | #3
On Thu, Jun 30, 2022 at 6:48 AM Xiaoyao Li <xiaoyao.li@intel.com> wrote:
>
> On 6/11/2022 1:11 AM, Jue Wang wrote:
> > This patch series implements emulation for Corrected Machine Check
> > Interrupt (CMCI) signaling and UnCorrectable No Action required (UCNA)
> > error injection.
> >
>
> It seems the main purpose of this series is to allow UCNA error
> injection and to notify the guest of it via CMCI.
>
> But it doesn't fully emulate CMCI. E.g., the guest's error threshold in
> MCi_CTL2 has no effect; it is still controlled by the host's value.
>
Both of the above points are correct.

In fact, this series does not enable injecting corrected errors into a
KVM guest at all, for the following reasons:

1. It is not necessary, since corrected errors are transparent to
software execution (host / guest).
2. Corrected errors can be fully managed on the host side (monitor, react).
3. It prevents guest-initiated row-hammer attacks that rely on a feedback
loop of injected corrected errors.

Thanks,
-Jue