Message ID | 20240227232100.478238-1-pbonzini@redhat.com (mailing list archive)
---|---
Series | TDX/SNP part 1 of n, for 6.9
On Tue, Feb 27, 2024, Paolo Bonzini wrote:
> This is a first set of, hopefully non-controversial patches from the

Heh, you jinxed yourself. :-)

> SNP and TDX series. They cover mostly changes to generic code and new
> gmem APIs, and in general have already been reviewed when posted by
> Isaku and Michael.
>
> One important change is that the gmem hook for initializing memory
> is designed to return -EEXIST if the page already exists in the
> guestmemfd filemap. The idea is that the special case of
> KVM_SEV_SNP_LAUNCH_UPDATE, where __kvm_gmem_get_pfn() is used to
> return an uninitialized page and make it guest-owned, can be done at
> most once per page unless the ioctl fails.
>
> Of course these patches add a bunch of dead code. This is intentional
> because it's the only way to trim the large TDX (and to some extent SNP)
> series to the point that it's possible to discuss them. The next step is
> probably going to be the private<->shared page logic from the TDX series.
>
> Paolo
>
> Isaku Yamahata (5):
>   KVM: x86/mmu: Add Suppress VE bit to EPT
>     shadow_mmio_mask/shadow_present_mask
>   KVM: VMX: Introduce test mode related to EPT violation VE
>   KVM: x86/tdp_mmu: Init role member of struct kvm_mmu_page at
>     allocation
>   KVM: x86/tdp_mmu: Sprinkle __must_check
>   KVM: x86/mmu: Pass around full 64-bit error code for KVM page faults

I have a slight tweak to this patch (drop truncation), and a rewritten changelog.

> Michael Roth (2):
>   KVM: x86: Add gmem hook for invalidating memory
>   KVM: x86: Add gmem hook for determining max NPT mapping level
>
> Paolo Bonzini (6):
>   KVM: x86/mmu: pass error code back to MMU when async pf is ready
>   KVM: x86/mmu: Use PFERR_GUEST_ENC_MASK to indicate fault is private

This doesn't work. The ENC flag gets set on any SNP *capable* CPU, which results
in false positives for SEV and SEV-ES guests[*].

I have a medium-sized series to add a KVM-defined synthetic flag, and clean up
the related code (it also has my slight variation on the 64-bit error code patch).

I'll post my series exactly as I have it, mostly so that I don't need to redo
testing, but also because it's pretty much a drop-in replacement. This series
applies cleanly on top, except for the two obvious conflicts.

[*] https://lore.kernel.org/all/Zdar_PrV4rzHpcGc@google.com

>   KVM: guest_memfd: pass error up from filemap_grab_folio
>   filemap: add FGP_CREAT_ONLY
>   KVM: x86: Add gmem hook for initializing memory
>   KVM: guest_memfd: add API to undo kvm_gmem_get_uninit_pfn
>
> Sean Christopherson (7):
>   KVM: x86: Split core of hypercall emulation to helper function
>   KVM: Allow page-sized MMU caches to be initialized with custom 64-bit
>     values
>   KVM: x86/mmu: Replace hardcoded value 0 for the initial value for SPTE
>   KVM: x86/mmu: Track shadow MMIO value on a per-VM basis
>   KVM: x86/mmu: Allow non-zero value for non-present SPTE and removed
>     SPTE
>   KVM: VMX: Move out vmx_x86_ops to 'main.c' to wrap VMX and TDX
>   KVM: VMX: Modify NMI and INTR handlers to take intr_info as function
>     argument
>
> Tom Lendacky (1):
>   KVM: SEV: Use a VMSA physical address variable for populating VMCB
>
>  arch/x86/include/asm/kvm-x86-ops.h |   3 +
>  arch/x86/include/asm/kvm_host.h    |  12 +
>  arch/x86/include/asm/vmx.h         |  13 +
>  arch/x86/kvm/Makefile              |   2 +-
>  arch/x86/kvm/mmu.h                 |   1 +
>  arch/x86/kvm/mmu/mmu.c             |  55 ++--
>  arch/x86/kvm/mmu/mmu_internal.h    |   6 +-
>  arch/x86/kvm/mmu/mmutrace.h        |   2 +-
>  arch/x86/kvm/mmu/paging_tmpl.h     |   4 +-
>  arch/x86/kvm/mmu/spte.c            |  16 +-
>  arch/x86/kvm/mmu/spte.h            |  21 +-
>  arch/x86/kvm/mmu/tdp_iter.h        |  12 +
>  arch/x86/kvm/mmu/tdp_mmu.c         |  74 +++--
>  arch/x86/kvm/svm/sev.c             |   3 +-
>  arch/x86/kvm/svm/svm.c             |   9 +-
>  arch/x86/kvm/svm/svm.h             |   1 +
>  arch/x86/kvm/vmx/main.c            | 168 +++++++++++
>  arch/x86/kvm/vmx/vmcs.h            |   5 +
>  arch/x86/kvm/vmx/vmx.c             | 460 +++++++++++------------------
>  arch/x86/kvm/vmx/vmx.h             |   6 +-
>  arch/x86/kvm/vmx/x86_ops.h         | 124 ++++++++
>  arch/x86/kvm/x86.c                 |  69 +++--
>  include/linux/kvm_host.h           |  25 ++
>  include/linux/kvm_types.h          |   1 +
>  include/linux/pagemap.h            |   2 +
>  mm/filemap.c                       |   4 +
>  virt/kvm/Kconfig                   |   8 +
>  virt/kvm/guest_memfd.c             | 120 +++++++-
>  virt/kvm/kvm_main.c                |  16 +-
>  29 files changed, 855 insertions(+), 387 deletions(-)
>  create mode 100644 arch/x86/kvm/vmx/main.c
>  create mode 100644 arch/x86/kvm/vmx/x86_ops.h
>
> --
> 2.39.0
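
As an aside for readers following the guest_memfd discussion: the cover letter's -EEXIST behavior is what the patch list suggests is implemented via the new FGP_CREAT_ONLY flag ("filemap: add FGP_CREAT_ONLY") plus the error propagation from filemap_grab_folio. Below is a minimal sketch of how such a flag could be used; the helper name, flag combination, and exact error handling are assumptions for illustration, not the series' actual code.

#include <linux/fs.h>
#include <linux/pagemap.h>

/*
 * Illustrative sketch only (not the series' code): grab a brand-new folio
 * for a guest_memfd index, failing if one is already present.  The idea is
 * that FGP_CREAT_ONLY makes the filemap lookup return an error (presumably
 * -EEXIST) instead of handing back an existing folio, which is what limits
 * KVM_SEV_SNP_LAUNCH_UPDATE-style initialization to at most once per page.
 */
static struct folio *kvm_gmem_grab_new_folio(struct inode *inode, pgoff_t index)
{
	return __filemap_get_folio(inode->i_mapping, index,
				   FGP_LOCK | FGP_ACCESSED | FGP_CREAT |
				   FGP_CREAT_ONLY,
				   mapping_gfp_mask(inode->i_mapping));
}

The "unless the ioctl fails" escape hatch then presumably maps onto the "KVM: guest_memfd: add API to undo kvm_gmem_get_uninit_pfn" patch, which would let a failed KVM_SEV_SNP_LAUNCH_UPDATE drop the page so a retry can initialize it again.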
I would strongly prefer we target 6.10, not 6.9. The TDX and SNP folks don't
need any of this code to be in Linus' tree, they just need it in _a_ KVM tree
so that they can develop on top.

And I will have limited availability the rest of this week (potentially very
limited), and I obviously have strong opinions about some of this code. But
even if I had cycles to review this properly, I just don't see a reason to
rush it in.

For the guest_memfd changes in particular, they're impossible to review in
this series. Rather than prematurely shove them into mainline, we should
create a volatile topic branch and use that to enable TDX/SNP development.
That way we can fixup patches if things need to change.
On Wed, Feb 28, 2024 at 2:25 AM Sean Christopherson <seanjc@google.com> wrote:
> On Tue, Feb 27, 2024, Paolo Bonzini wrote:
> > This is a first set of, hopefully non-controversial patches from the
>
> Heh, you jinxed yourself. :-)

Well I

> >   KVM: x86/mmu: Use PFERR_GUEST_ENC_MASK to indicate fault is private
>
> This doesn't work. The ENC flag gets set on any SNP *capable* CPU, which results
> in false positives for SEV and SEV-ES guests[*].

You didn't look at the patch did you? :) It does check for has_private_mem
(alternatively I could have dropped the bit in SVM code for SEV and SEV-ES
guests).

> I have a medium-sized series to add a KVM-defined synthetic flag, and clean up
> the related code (it also has my slight variation on the 64-bit error code patch).
>
> I'll post my series exactly as I have it, mostly so that I don't need to redo
> testing, but also because it's pretty much a drop-in replacement. This series
> applies cleanly on top, except for the two obvious conflicts.

Ok, I will check it out. This is exactly why I posted these.
Paolo

> [*] https://lore.kernel.org/all/Zdar_PrV4rzHpcGc@google.com
On Wed, Feb 28, 2024, Paolo Bonzini wrote:
> On Wed, Feb 28, 2024 at 2:25 AM Sean Christopherson <seanjc@google.com> wrote:
> > > Michael Roth (2):
> > >   KVM: x86: Add gmem hook for invalidating memory
> > >   KVM: x86: Add gmem hook for determining max NPT mapping level
> > >
> > > Paolo Bonzini (6):
> > >   KVM: x86/mmu: pass error code back to MMU when async pf is ready
> > >   KVM: x86/mmu: Use PFERR_GUEST_ENC_MASK to indicate fault is private
> >
> > This doesn't work. The ENC flag gets set on any SNP *capable* CPU, which results
> > in false positives for SEV and SEV-ES guests[*].
>
> You didn't look at the patch did you? :)

Guilty, sort of. I looked at (and tested) the patch from the TDX series, but I
didn't look at what you posted. But it's a moot point, because now I did look
at what you posted, and it's still broken :-)

> It does check for has_private_mem (alternatively I could have dropped the bit
> in SVM code for SEV and SEV-ES guests).

The problem isn't with *KVM* setting the bit, it's with *hardware* setting the
bit for SEV and SEV-ES guests. That results in this:

  .is_private = vcpu->kvm->arch.has_private_mem && (err & PFERR_GUEST_ENC_MASK),

marking the fault as private. Which, in a vacuum, isn't technically wrong, since
from hardware's perspective the vCPU access was "private". But from KVM's
perspective, SEV and SEV-ES guests don't have private memory, they have memory
that can be *encrypted*, and marking the access as "private" results in
violations of KVM's rules for private memory. Specifically, it results in KVM
triggering emulated MMIO for faults that are marked private, which we want to
disallow for SNP and TDX.

And because the flag only gets set on SNP capable hardware (in my limited testing
of a whole two systems), running the same VM on different hardware would result
in faults being marked private on one system, but not the other. Which means that
KVM can't rely on the flag being set for SEV or SEV-ES guests, i.e. we can't
retroactively enforce anything (not to mention that that might break existing VMs).
On Wed, Feb 28, 2024 at 5:39 PM Sean Christopherson <seanjc@google.com> wrote:
> > > This doesn't work. The ENC flag gets set on any SNP *capable* CPU, which results
> > > in false positives for SEV and SEV-ES guests[*].
> >
> > You didn't look at the patch did you? :)
>
> Guilty, sort of. I looked at (and tested) the patch from the TDX series, but I didn't
> look at what you posted. But it's a moot point, because now I did look at what you
> posted, and it's still broken :-)
>
> > It does check for has_private_mem (alternatively I could have dropped the bit
> > in SVM code for SEV and SEV-ES guests).
>
> The problem isn't with *KVM* setting the bit, it's with *hardware* setting the
> bit for SEV and SEV-ES guests. That results in this:
>
>   .is_private = vcpu->kvm->arch.has_private_mem && (err & PFERR_GUEST_ENC_MASK),
>
> marking the fault as private. Which, in a vacuum, isn't technically wrong, since
> from hardware's perspective the vCPU access was "private". But from KVM's
> perspective, SEV and SEV-ES guests don't have private memory

vcpu->kvm->arch.has_private_mem is the flag from the SEV VM types
series. It's false on SEV and SEV-ES VMs, therefore fault->is_private
is going to be false as well. Is it ENOCOFFEE for you or ENODINNER for
me? :)

Paolo
On Wed, Feb 28, 2024, Paolo Bonzini wrote:
> vcpu->kvm->arch.has_private_mem is the flag from the SEV VM types
> series. It's false on SEV and SEV-ES VMs, therefore fault->is_private
> is going to be false as well. Is it ENOCOFFEE for you or ENODINNER for
> me? :)

*sigh*, ENOCOFFEE.
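
For reference, the classification the thread ends up agreeing on can be summarized with a small sketch. It simply mirrors the .is_private snippet quoted above; the helper name is made up for illustration, and has_private_mem is the field added by the separate SEV VM types series rather than anything in this posting.

#include <linux/kvm_host.h>

/*
 * Illustrative helper mirroring the snippet quoted in the thread (not
 * actual KVM code).  On SNP-capable hardware, PFERR_GUEST_ENC_MASK is set
 * for encrypted guest accesses, including those of SEV and SEV-ES guests.
 * Gating on kvm->arch.has_private_mem, which is false for SEV and SEV-ES
 * VMs, is what keeps such faults from being classified as private, and
 * private faults are the ones for which KVM must not trigger emulated MMIO.
 */
static bool kvm_fault_is_private(struct kvm *kvm, u64 error_code)
{
	return kvm->arch.has_private_mem && (error_code & PFERR_GUEST_ENC_MASK);
}

Sean's remaining concern, addressed by the KVM-defined synthetic flag series mentioned upthread, is that the hardware bit only shows up on SNP-capable hosts, so the same VM would see it set on one host but not another, and KVM cannot rely on it for SEV and SEV-ES guests.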