
[RFC,0/6] KVM: SVM: Defer page pinning for SEV guests

Message ID 20220118110621.62462-1-nikunj@amd.com (mailing list archive)
Series KVM: SVM: Defer page pinning for SEV guests

Message

Nikunj A. Dadhania Jan. 18, 2022, 11:06 a.m. UTC
SEV guests require their pages to be pinned in host physical memory,
as migration of encrypted pages is not supported. The memory
encryption scheme uses the physical address of the memory being
encrypted. If guest pages are moved by the host, the content decrypted
in the guest would be incorrect, corrupting the guest's memory.

For SEV/SEV-ES guests, the hypervisor does not know which pages are
encrypted or when the guest is done using them. The hypervisor should
treat all guest pages as encrypted until the guest is destroyed.

The actual pinning is managed by vendor code via new kvm_x86_ops
hooks. The MMU calls into vendor code to pin a page on demand, and
the pinning metadata is stored in the architecture-specific memslot
area. During the memslot freeing path, the guest pages are unpinned.
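
To make this concrete, below is a minimal sketch of what the
per-memslot pinning metadata and the vendor hooks could look like.
The structure layout, field names and hook signatures are only
illustrative assumptions based on this description; the actual
patches in the series may differ.

/*
 * Illustrative sketch only: field names and hook signatures are
 * assumptions, not the definitions added by the patches.
 */
#include <linux/kvm_host.h>

/* Per-memslot pinning metadata, kept in the arch-specific memslot area. */
struct kvm_memslot_pin_data {
	unsigned long *pinned_bitmap;	/* one bit per gfn in the slot */
	kvm_pfn_t *pinned_pfns;		/* pfns to release when the slot is freed */
};

/* Vendor (SVM/SEV) hooks that the common MMU code would call. */
struct kvm_pin_ops {
	/* Fault path: pin the pfn backing @gfn before it is mapped. */
	int (*pin_pfn)(struct kvm *kvm, struct kvm_memory_slot *slot,
		       kvm_pfn_t pfn, gfn_t gfn);
	/* Memslot freeing path: unpin everything recorded for @slot. */
	void (*unpin_memslot)(struct kvm *kvm, struct kvm_memory_slot *slot);
};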

This work initially started with [1], where the idea was to track
pinned pages using a software bit in the SPTE. That turned out not to
be feasible for the following reason:

The pinned-SPTE information is stored in the shadow pages (SPs). With
the way the current MMU is designed, the full MMU context (aka the
roots) gets dropped multiple times, even when the CR0.WP bit is merely
flipped. Dropping the roots causes a huge amount of SP alloc/free
churn, and the pinning information stored in the SPs is lost along
with the root and the child-level SPs. Without this information it is
not possible to decide whether to re-pin a page, or to unpin pages
during guest shutdown.
[1] https://patchwork.kernel.org/project/kvm/cover/20200731212323.21746-1-sean.j.christopherson@intel.com/ 
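
For context, "a software bit in the SPTE" means one of the PTE bits
that the hardware ignores, which KVM can use for its own bookkeeping.
A purely hypothetical definition of such a pinned bit is shown below;
this is not an actual KVM definition, and the bit number is arbitrary.
As described above, any state stored this way is lost whenever the
shadow pages holding the SPTEs are torn down.

#include <linux/bits.h>
#include <linux/types.h>

/* Hypothetical software-available SPTE bit used to mark a pinned page. */
#define SPTE_SW_PINNED_MASK	BIT_ULL(60)

static inline bool spte_is_pinned(u64 spte)
{
	return !!(spte & SPTE_SW_PINNED_MASK);
}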

Nikunj A Dadhania (4):
  KVM: x86/mmu: Add hook to pin PFNs on demand in MMU
  KVM: SVM: Add pinning metadata in the arch memslot
  KVM: SVM: Implement demand page pinning
  KVM: SEV: Carve out routine for allocation of pages

Sean Christopherson (2):
  KVM: x86/mmu: Introduce kvm_mmu_map_tdp_page() for use by SEV/TDX
  KVM: SVM: Pin SEV pages in MMU during sev_launch_update_data()

 arch/x86/include/asm/kvm-x86-ops.h |   3 +
 arch/x86/include/asm/kvm_host.h    |   9 +
 arch/x86/kvm/mmu.h                 |   3 +
 arch/x86/kvm/mmu/mmu.c             |  41 +++
 arch/x86/kvm/mmu/tdp_mmu.c         |   7 +
 arch/x86/kvm/svm/sev.c             | 423 +++++++++++++++++++----------
 arch/x86/kvm/svm/svm.c             |   4 +
 arch/x86/kvm/svm/svm.h             |   9 +-
 arch/x86/kvm/x86.c                 |  11 +-
 9 files changed, 359 insertions(+), 151 deletions(-)

Comments

Mingwei Zhang March 6, 2022, 8:07 p.m. UTC | #1
On Tue, Jan 18, 2022, Nikunj A Dadhania wrote:
> SEV guests require their pages to be pinned in host physical memory,
> as migration of encrypted pages is not supported. The memory
> encryption scheme uses the physical address of the memory being
> encrypted. If guest pages are moved by the host, the content decrypted
> in the guest would be incorrect, corrupting the guest's memory.
> 
> For SEV/SEV-ES guests, the hypervisor does not know which pages are
> encrypted or when the guest is done using them. The hypervisor should
> treat all guest pages as encrypted until the guest is destroyed.
"Hypervisor should treat all the guest pages as encrypted until they are
deallocated or the guest is destroyed".

Note: in general, the guest VM could ask the user-level VMM to free a
page either by freeing the memslot or by freeing the pages (munmap(2)).
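
As an aside, both userspace paths mentioned here use standard
interfaces: a memslot is deleted by calling KVM_SET_USER_MEMORY_REGION
with memory_size set to 0, and the backing pages are released with
munmap(2). The helper names in the sketch below are made up for
illustration.

/*
 * Illustrative userspace VMM sketch; the helper names are invented,
 * but KVM_SET_USER_MEMORY_REGION (memory_size == 0 deletes the slot)
 * and munmap(2) are the real interfaces.
 */
#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <linux/kvm.h>

/* Delete a memslot: a memory_size of 0 removes the slot from the VM. */
static int vmm_delete_memslot(int vm_fd, __u32 slot)
{
	struct kvm_userspace_memory_region region;

	memset(&region, 0, sizeof(region));
	region.slot = slot;
	region.memory_size = 0;

	return ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION, &region);
}

/* Release the host virtual mapping that backed the guest memory. */
static int vmm_free_backing(void *hva, size_t len)
{
	return munmap(hva, len);
}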

> 
> The actual pinning is managed by vendor code via new kvm_x86_ops
> hooks. The MMU calls into vendor code to pin a page on demand, and
> the pinning metadata is stored in the architecture-specific memslot
> area. During the memslot freeing path, the guest pages are unpinned.

"During the memslot freeing path and deallocation path"

> 
> This work initially started with [1], where the idea was to track
> pinned pages using a software bit in the SPTE. That turned out not to
> be feasible for the following reason:
> 
> The pinned-SPTE information is stored in the shadow pages (SPs). With
> the way the current MMU is designed, the full MMU context (aka the
> roots) gets dropped multiple times, even when the CR0.WP bit is merely
> flipped. Dropping the roots causes a huge amount of SP alloc/free
> churn, and the pinning information stored in the SPs is lost along
> with the root and the child-level SPs. Without this information it is
> not possible to decide whether to re-pin a page, or to unpin pages
> during guest shutdown.
> 
> [1] https://patchwork.kernel.org/project/kvm/cover/20200731212323.21746-1-sean.j.christopherson@intel.com/ 
> 

Some general feedback: I really like this patch set, and I think doing
the memory pinning in the kernel at fault time and storing the metadata
in the memslot is the right thing to do.

This basically solves all the problems triggered by the KVM-based API
that trusts the user-level VMM to do the memory pinning.

Thanks.
> Nikunj A Dadhania (4):
>   KVM: x86/mmu: Add hook to pin PFNs on demand in MMU
>   KVM: SVM: Add pinning metadata in the arch memslot
>   KVM: SVM: Implement demand page pinning
>   KVM: SEV: Carve out routine for allocation of pages
> 
> Sean Christopherson (2):
>   KVM: x86/mmu: Introduce kvm_mmu_map_tdp_page() for use by SEV/TDX
>   KVM: SVM: Pin SEV pages in MMU during sev_launch_update_data()
> 
>  arch/x86/include/asm/kvm-x86-ops.h |   3 +
>  arch/x86/include/asm/kvm_host.h    |   9 +
>  arch/x86/kvm/mmu.h                 |   3 +
>  arch/x86/kvm/mmu/mmu.c             |  41 +++
>  arch/x86/kvm/mmu/tdp_mmu.c         |   7 +
>  arch/x86/kvm/svm/sev.c             | 423 +++++++++++++++++++----------
>  arch/x86/kvm/svm/svm.c             |   4 +
>  arch/x86/kvm/svm/svm.h             |   9 +-
>  arch/x86/kvm/x86.c                 |  11 +-
>  9 files changed, 359 insertions(+), 151 deletions(-)
> 
> -- 
> 2.32.0
>
Nikunj A. Dadhania March 7, 2022, 1:02 p.m. UTC | #2
On 3/7/2022 1:37 AM, Mingwei Zhang wrote:
> On Tue, Jan 18, 2022, Nikunj A Dadhania wrote:
>> SEV guests require their pages to be pinned in host physical memory,
>> as migration of encrypted pages is not supported. The memory
>> encryption scheme uses the physical address of the memory being
>> encrypted. If guest pages are moved by the host, the content decrypted
>> in the guest would be incorrect, corrupting the guest's memory.
>>
>> For SEV/SEV-ES guests, the hypervisor does not know which pages are
>> encrypted or when the guest is done using them. The hypervisor should
>> treat all guest pages as encrypted until the guest is destroyed.
> "Hypervisor should treat all the guest pages as encrypted until they are
> deallocated or the guest is destroyed".
> 
> Note: in general, the guest VM could ask the user-level VMM to free a
> page either by freeing the memslot or by freeing the pages (munmap(2)).
> 

Sure, will update

>>
>> The actual pinning is managed by vendor code via new kvm_x86_ops
>> hooks. The MMU calls into vendor code to pin a page on demand, and
>> the pinning metadata is stored in the architecture-specific memslot
>> area. During the memslot freeing path, the guest pages are unpinned.
> 
> "During the memslot freeing path and deallocation path"

Sure.

> 
>>
>> This work initially started with [1], where the idea was to track
>> pinned pages using a software bit in the SPTE. That turned out not to
>> be feasible for the following reason:
>>
>> The pinned-SPTE information is stored in the shadow pages (SPs). With
>> the way the current MMU is designed, the full MMU context (aka the
>> roots) gets dropped multiple times, even when the CR0.WP bit is merely
>> flipped. Dropping the roots causes a huge amount of SP alloc/free
>> churn, and the pinning information stored in the SPs is lost along
>> with the root and the child-level SPs. Without this information it is
>> not possible to decide whether to re-pin a page, or to unpin pages
>> during guest shutdown.
>>
>> [1] https://patchwork.kernel.org/project/kvm/cover/20200731212323.21746-1-sean.j.christopherson@intel.com/ 
>>
> 
> Some general feedback: I really like this patch set, and I think doing
> the memory pinning in the kernel at fault time and storing the metadata
> in the memslot is the right thing to do.
> 
> This basically solves all the problems triggered by the KVM-based API
> that trusts the user-level VMM to do the memory pinning.
> 
Thanks for the feedback.

Regards
Nikunj