diff mbox series

[Part2,RFC,v4,25/40] KVM: SVM: Reclaim the guest pages when SEV-SNP VM terminates

Message ID 20210707183616.5620-26-brijesh.singh@amd.com (mailing list archive)
State New
Headers show
Series Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support | expand

Commit Message

Brijesh Singh July 7, 2021, 6:36 p.m. UTC
The guest pages of the SEV-SNP VM maybe added as a private page in the
RMP entry (assigned bit is set). The guest private pages must be
transitioned to the hypervisor state before its freed.

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
---
 arch/x86/kvm/svm/sev.c | 39 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 39 insertions(+)

Comments

Sean Christopherson July 16, 2021, 8:09 p.m. UTC | #1
On Wed, Jul 07, 2021, Brijesh Singh wrote:
> The guest pages of the SEV-SNP VM maybe added as a private page in the
> RMP entry (assigned bit is set). The guest private pages must be
> transitioned to the hypervisor state before its freed.

Isn't this patch needed much earlier in the series, i.e. when the first RMPUPDATE
usage goes in?

> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> ---
>  arch/x86/kvm/svm/sev.c | 39 +++++++++++++++++++++++++++++++++++++++
>  1 file changed, 39 insertions(+)
> 
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index 1f0635ac9ff9..4468995dd209 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -1940,6 +1940,45 @@ find_enc_region(struct kvm *kvm, struct kvm_enc_region *range)
>  static void __unregister_enc_region_locked(struct kvm *kvm,
>  					   struct enc_region *region)
>  {
> +	struct rmpupdate val = {};
> +	unsigned long i, pfn;
> +	struct rmpentry *e;
> +	int level, rc;
> +
> +	/*
> +	 * The guest memory pages are assigned in the RMP table. Unassign it
> +	 * before releasing the memory.
> +	 */
> +	if (sev_snp_guest(kvm)) {
> +		for (i = 0; i < region->npages; i++) {
> +			pfn = page_to_pfn(region->pages[i]);
> +
> +			if (need_resched())
> +				schedule();

This can simply be "cond_resched();"

> +
> +			e = snp_lookup_page_in_rmptable(region->pages[i], &level);
> +			if (unlikely(!e))
> +				continue;
> +
> +			/* If its not a guest assigned page then skip it. */
> +			if (!rmpentry_assigned(e))
> +				continue;
> +
> +			/* Is the page part of a 2MB RMP entry? */
> +			if (level == PG_LEVEL_2M) {
> +				val.pagesize = RMP_PG_SIZE_2M;
> +				pfn &= ~(KVM_PAGES_PER_HPAGE(PG_LEVEL_2M) - 1);
> +			} else {
> +				val.pagesize = RMP_PG_SIZE_4K;

This raises yet more questions (for me) as to the interaction between Page-Size
and Hyperivsor-Owned flags in the RMP.  It also raises questions on the correctness
of zeroing the RMP entry if KVM_SEV_SNP_LAUNCH_START (in the previous patch).

> +			}
> +
> +			/* Transition the page to hypervisor owned. */
> +			rc = rmpupdate(pfn_to_page(pfn), &val);
> +			if (rc)
> +				pr_err("Failed to release pfn 0x%lx ret=%d\n", pfn, rc);

This is not robust, e.g. KVM will unpin the memory and release it back to the
kernel with a stale RMP entry.  Shouldn't this be a WARN+leak situation?

> +		}
> +	}
> +
>  	sev_unpin_memory(kvm, region->pages, region->npages);
>  	list_del(&region->list);
>  	kfree(region);
> -- 
> 2.17.1
>
Brijesh Singh July 16, 2021, 10:16 p.m. UTC | #2
On 7/16/21 3:09 PM, Sean Christopherson wrote:
> On Wed, Jul 07, 2021, Brijesh Singh wrote:
>> The guest pages of the SEV-SNP VM maybe added as a private page in the
>> RMP entry (assigned bit is set). The guest private pages must be
>> transitioned to the hypervisor state before its freed.
> Isn't this patch needed much earlier in the series, i.e. when the first RMPUPDATE
> usage goes in?

Yes, the first RMPUPDATE usage is in the LAUNCH_UPDATE patch and this
should be squashed in that patch.


>> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
>> ---
>>  arch/x86/kvm/svm/sev.c | 39 +++++++++++++++++++++++++++++++++++++++
>>  1 file changed, 39 insertions(+)
>>
>> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
>> index 1f0635ac9ff9..4468995dd209 100644
>> --- a/arch/x86/kvm/svm/sev.c
>> +++ b/arch/x86/kvm/svm/sev.c
>> @@ -1940,6 +1940,45 @@ find_enc_region(struct kvm *kvm, struct kvm_enc_region *range)
>>  static void __unregister_enc_region_locked(struct kvm *kvm,
>>  					   struct enc_region *region)
>>  {
>> +	struct rmpupdate val = {};
>> +	unsigned long i, pfn;
>> +	struct rmpentry *e;
>> +	int level, rc;
>> +
>> +	/*
>> +	 * The guest memory pages are assigned in the RMP table. Unassign it
>> +	 * before releasing the memory.
>> +	 */
>> +	if (sev_snp_guest(kvm)) {
>> +		for (i = 0; i < region->npages; i++) {
>> +			pfn = page_to_pfn(region->pages[i]);
>> +
>> +			if (need_resched())
>> +				schedule();
> This can simply be "cond_resched();"

Yes.


>
>> +
>> +			e = snp_lookup_page_in_rmptable(region->pages[i], &level);
>> +			if (unlikely(!e))
>> +				continue;
>> +
>> +			/* If its not a guest assigned page then skip it. */
>> +			if (!rmpentry_assigned(e))
>> +				continue;
>> +
>> +			/* Is the page part of a 2MB RMP entry? */
>> +			if (level == PG_LEVEL_2M) {
>> +				val.pagesize = RMP_PG_SIZE_2M;
>> +				pfn &= ~(KVM_PAGES_PER_HPAGE(PG_LEVEL_2M) - 1);
>> +			} else {
>> +				val.pagesize = RMP_PG_SIZE_4K;
> This raises yet more questions (for me) as to the interaction between Page-Size
> and Hyperivsor-Owned flags in the RMP.  It also raises questions on the correctness
> of zeroing the RMP entry if KVM_SEV_SNP_LAUNCH_START (in the previous patch).

I assume you mean the LAUNCH_UPDATE because that's when we need to
perform the RMPUPDATE. The hypervisor owned means all zero in the RMP entry.


>> +			}
>> +
>> +			/* Transition the page to hypervisor owned. */
>> +			rc = rmpupdate(pfn_to_page(pfn), &val);
>> +			if (rc)
>> +				pr_err("Failed to release pfn 0x%lx ret=%d\n", pfn, rc);
> This is not robust, e.g. KVM will unpin the memory and release it back to the
> kernel with a stale RMP entry.  Shouldn't this be a WARN+leak situation?

Yes. Maybe we should increase the page refcount to ensure that this page
is not reused after the process is terminated ?


>> +		}
>> +	}
>> +
>>  	sev_unpin_memory(kvm, region->pages, region->npages);
>>  	list_del(&region->list);
>>  	kfree(region);
>> -- 
>> 2.17.1
>>
Sean Christopherson July 17, 2021, 12:46 a.m. UTC | #3
On Fri, Jul 16, 2021, Brijesh Singh wrote:
> 
> On 7/16/21 3:09 PM, Sean Christopherson wrote:
> > On Wed, Jul 07, 2021, Brijesh Singh wrote:
> >> +			e = snp_lookup_page_in_rmptable(region->pages[i], &level);
> >> +			if (unlikely(!e))
> >> +				continue;
> >> +
> >> +			/* If its not a guest assigned page then skip it. */
> >> +			if (!rmpentry_assigned(e))
> >> +				continue;
> >> +
> >> +			/* Is the page part of a 2MB RMP entry? */
> >> +			if (level == PG_LEVEL_2M) {
> >> +				val.pagesize = RMP_PG_SIZE_2M;
> >> +				pfn &= ~(KVM_PAGES_PER_HPAGE(PG_LEVEL_2M) - 1);
> >> +			} else {
> >> +				val.pagesize = RMP_PG_SIZE_4K;
> > This raises yet more questions (for me) as to the interaction between Page-Size
> > and Hyperivsor-Owned flags in the RMP.  It also raises questions on the correctness
> > of zeroing the RMP entry if KVM_SEV_SNP_LAUNCH_START (in the previous patch).
> 
> I assume you mean the LAUNCH_UPDATE because that's when we need to
> perform the RMPUPDATE.

Doh, yes.

> The hypervisor owned means all zero in the RMP entry.

Figured out where I went wrong after reading the RMPUDPATE pseudocode.  RMPUPDATE
takes the page size as a parameter even though it unconditionally zeros the page
size flag in the RMP entry for unassigned pages.

A wrapper around rmpupdate() would definitely help, e.g. (though level might need
to be an "int" to avoid a bunch of casts).

  int rmp_make_shared(u64 pfn, enum pg_level level);

Wrappers for "private" and "firmware" would probably be helpful too.  And if you
do that, I think you can bury both "struct rmpupdate", rmpupdate(), and
X86_TO_RMP_PG_LEVEL() in arch/x86/kernel/sev.c.  snp_set_rmptable_state() might
need some refactoring to avoid three booleans, but I guess maybe that could be
an exception?  Not sure.  Anyways, was thinking something like:

  int rmp_make_private(u64 pfn, u64 gpa, enum pg_level level, int asid);
  int rmp_make_firmware(u64 pfn);

It would consolidate a bit of code, and more importantly it would give visual
cues to the reader, e.g. it's easy to overlook "val = {0}" meaning "make shared".

Side topic, what happens if a firmware entry is configured with page_size=1?

And one architectural question: what prevents a malicious VMM from punching a 4k
shared page into a 2mb private page?  E.g.

  rmpupdate(1 << 20, [private, 2mb]);
  rmpupdate(1 << 20 + 4096, [shared, 4kb]);

I don't see any checks in the pseudocode that will detect this, and presumably the
whole point of a 2mb private RMP entry is to not have to go walk the individual
4kb entries on a private access.

  NEW_RMP = READ_MEM.o [NEW_RMP_PTR]

  IF ((NEW_RMP.PAGE_SIZE == 2MB) && (SYSTEM_PA[20:12] != 0))  <-- not taken, 4kb entry
          EAX = FAIL_INPUT
          EXIT

  IF (!NEW_RMP.ASSIGNED && (NEW_RMP.IMMUTABLE || (NEW_RMP.ASID != 0))  <-- not taken, new entry valid
          EAX = FAIL_INPUT
          EXIT

  RMP_ENTRY_PA = RMP_BASE + 0x4000 + (SYSTEM_PA / 0x1000) * 16
  IF (RMP_ENTRY_PA > RMP_END)
          EAX = FAIL_INPUT
          EXIT

  // System address must have an RMP entry
  OLD_RMP = READ_MEM_PA.o [RMP_ENTRY_PA]
  IF (OLD_RMP.IMMUTABLE) <-- passes, private entry not immutable
          EAX = FAIL_PERMISSION
          EXIT

  IF (NEW_RMP.PAGE_SIZE == 4KB)
          IF ((SYSTEM_PA[20:12] == 0) && (OLD_RMP.PAGE_SIZE == 2MB)) <- not taken, PA[12] == 1
                  EAX = FAIL_OVERLAP
                  EXIT
  ELSE
          IF (Any 4KB RMP entry with (RMP.ASSIGNED == 1) exists in 2MB region)
                  EAX = FAIL_OVERLAP
                  EXIT
          ELSE
                  FOR (I = 1; I < 512, I++) {
                          temp_RMP = 0
                          temp_RMP.ASSIGNED = NEW_RMP.ASSIGNED
                          WRITE_MEM.o [RMP_ENTRY_PA + I * 16] = temp_RMP;
                  }
Brijesh Singh July 19, 2021, 12:55 p.m. UTC | #4
On 7/16/21 7:46 PM, Sean Christopherson wrote:

> takes the page size as a parameter even though it unconditionally zeros the page
> size flag in the RMP entry for unassigned pages.
>
> A wrapper around rmpupdate() would definitely help, e.g. (though level might need
> to be an "int" to avoid a bunch of casts).
>
>   int rmp_make_shared(u64 pfn, enum pg_level level);
>
> Wrappers for "private" and "firmware" would probably be helpful too.  And if you
> do that, I think you can bury both "struct rmpupdate", rmpupdate(), and
> X86_TO_RMP_PG_LEVEL() in arch/x86/kernel/sev.c.  snp_set_rmptable_state() might
> need some refactoring to avoid three booleans, but I guess maybe that could be
> an exception?  Not sure.  Anyways, was thinking something like:
>
>   int rmp_make_private(u64 pfn, u64 gpa, enum pg_level level, int asid);
>   int rmp_make_firmware(u64 pfn);
>
> It would consolidate a bit of code, and more importantly it would give visual
> cues to the reader, e.g. it's easy to overlook "val = {0}" meaning "make shared".

Okay, I will add helper to make things easier. One case where we will
need to directly call the rmpupdate() is during the LAUNCH_UPDATE
command. In that case the page is private and its immutable bit is also
set. This is because the firmware makes change to the page, and we are
required to set the immutable bit before the call.


>
> Side topic, what happens if a firmware entry is configured with page_size=1?

Its not any different from the guest requesting a page private with the
page_size=1. Some firmware commands require the page_size=0, and others
can work with page_size=1 or page_size=0.


>
> And one architectural question: what prevents a malicious VMM from punching a 4k
> shared page into a 2mb private page?  E.g.
>
>   rmpupdate(1 << 20, [private, 2mb]);
>   rmpupdate(1 << 20 + 4096, [shared, 4kb]);
>
> I don't see any checks in the pseudocode that will detect this, and presumably the
> whole point of a 2mb private RMP entry is to not have to go walk the individual
> 4kb entries on a private access.

I believe pseudo-code is not meant to be exactly accurate and
comprehensive, but it is intended to summarize the HW behavior and
explain what can cause the different fault cases. In the real design we
may have a separate checks to catch the above issue. I just tested on
the hardware to ensure that HW correctly detects the above error
condition. However, in this case we are missingĀ  a significant check (at
least the check that the 2M region is not already assigned). I have
raised the concern with the hardware team to look into updating the APM.
thank you so much for the bringing this up.

thanks
Sean Christopherson July 19, 2021, 5:18 p.m. UTC | #5
On Mon, Jul 19, 2021, Brijesh Singh wrote:
> 
> On 7/16/21 7:46 PM, Sean Christopherson wrote:
> 
> > takes the page size as a parameter even though it unconditionally zeros the page
> > size flag in the RMP entry for unassigned pages.
> >
> > A wrapper around rmpupdate() would definitely help, e.g. (though level might need
> > to be an "int" to avoid a bunch of casts).
> >
> >   int rmp_make_shared(u64 pfn, enum pg_level level);
> >
> > Wrappers for "private" and "firmware" would probably be helpful too.  And if you
> > do that, I think you can bury both "struct rmpupdate", rmpupdate(), and
> > X86_TO_RMP_PG_LEVEL() in arch/x86/kernel/sev.c.  snp_set_rmptable_state() might
> > need some refactoring to avoid three booleans, but I guess maybe that could be
> > an exception?  Not sure.  Anyways, was thinking something like:
> >
> >   int rmp_make_private(u64 pfn, u64 gpa, enum pg_level level, int asid);
> >   int rmp_make_firmware(u64 pfn);
> >
> > It would consolidate a bit of code, and more importantly it would give visual
> > cues to the reader, e.g. it's easy to overlook "val = {0}" meaning "make shared".
> 
> Okay, I will add helper to make things easier. One case where we will
> need to directly call the rmpupdate() is during the LAUNCH_UPDATE
> command. In that case the page is private and its immutable bit is also
> set. This is because the firmware makes change to the page, and we are
> required to set the immutable bit before the call.

Or do "int rmp_make_firmware(u64 pfn, bool immutable)"?

> > And one architectural question: what prevents a malicious VMM from punching a 4k
> > shared page into a 2mb private page?  E.g.
> >
> >   rmpupdate(1 << 20, [private, 2mb]);
> >   rmpupdate(1 << 20 + 4096, [shared, 4kb]);
> >
> > I don't see any checks in the pseudocode that will detect this, and presumably the
> > whole point of a 2mb private RMP entry is to not have to go walk the individual
> > 4kb entries on a private access.
> 
> I believe pseudo-code is not meant to be exactly accurate and
> comprehensive, but it is intended to summarize the HW behavior and
> explain what can cause the different fault cases. In the real design we
> may have a separate checks to catch the above issue. I just tested on
> the hardware to ensure that HW correctly detects the above error
> condition. However, in this case we are missing a significant check (at
> least the check that the 2M region is not already assigned). I have
> raised the concern with the hardware team to look into updating the APM.

Thanks!  While you have their ear, please emphasive the importance of the pseudocode
for us software folks.  It's perfectly ok to omit or gloss over microarchitectural
details, but ISA pseudocode is often the source of truth for behavior that is
architecturally visible.
Brijesh Singh July 19, 2021, 6:34 p.m. UTC | #6
On 7/19/21 12:18 PM, Sean Christopherson wrote:
>>
>> Okay, I will add helper to make things easier. One case where we will
>> need to directly call the rmpupdate() is during the LAUNCH_UPDATE
>> command. In that case the page is private and its immutable bit is also
>> set. This is because the firmware makes change to the page, and we are
>> required to set the immutable bit before the call.
> 
> Or do "int rmp_make_firmware(u64 pfn, bool immutable)"?
> 

That's not what we need.

We need 'rmp_make_private() + immutable' all in one RMPUPDATE.  Here is 
the snippet from SNP_LAUNCH_UPDATE.


+	/* Transition the page state to pre-guest */
+	memset(&e, 0, sizeof(e));
+	e.assigned = 1;
+	e.gpa = gpa;
+	e.asid = sev_get_asid(kvm);
+	e.immutable = true;
+	e.pagesize = X86_TO_RMP_PG_LEVEL(level);
+	ret = rmpupdate(inpages[i], &e);

thanks
Sean Christopherson July 19, 2021, 7:03 p.m. UTC | #7
On Mon, Jul 19, 2021, Brijesh Singh wrote:
> 
> On 7/19/21 12:18 PM, Sean Christopherson wrote:
> > > 
> > > Okay, I will add helper to make things easier. One case where we will
> > > need to directly call the rmpupdate() is during the LAUNCH_UPDATE
> > > command. In that case the page is private and its immutable bit is also
> > > set. This is because the firmware makes change to the page, and we are
> > > required to set the immutable bit before the call.
> > 
> > Or do "int rmp_make_firmware(u64 pfn, bool immutable)"?
> 
> That's not what we need.
> 
> We need 'rmp_make_private() + immutable' all in one RMPUPDATE.  Here is the
> snippet from SNP_LAUNCH_UPDATE.

Ah, not firmwrare, gotcha.  But we can still use a helper, e.g. an inner
double-underscore helper, __rmp_make_private().
Sean Christopherson July 19, 2021, 7:14 p.m. UTC | #8
On Mon, Jul 19, 2021, Sean Christopherson wrote:
> On Mon, Jul 19, 2021, Brijesh Singh wrote:
> > 
> > On 7/19/21 12:18 PM, Sean Christopherson wrote:
> > > > 
> > > > Okay, I will add helper to make things easier. One case where we will
> > > > need to directly call the rmpupdate() is during the LAUNCH_UPDATE
> > > > command. In that case the page is private and its immutable bit is also
> > > > set. This is because the firmware makes change to the page, and we are
> > > > required to set the immutable bit before the call.
> > > 
> > > Or do "int rmp_make_firmware(u64 pfn, bool immutable)"?
> > 
> > That's not what we need.
> > 
> > We need 'rmp_make_private() + immutable' all in one RMPUPDATE.  Here is the
> > snippet from SNP_LAUNCH_UPDATE.
> 
> Ah, not firmwrare, gotcha.  But we can still use a helper, e.g. an inner
> double-underscore helper, __rmp_make_private().

Hmm, looking at it again, I think I also got confused by the comment for the VMSA
page:

	/* Transition the VMSA page to a firmware state. */
 	e.assigned = 1;
	e.immutable = 1;
	e.asid = sev->asid;
	e.gpa = -1;
	e.pagesize = RMP_PG_SIZE_4K;

Unlike __snp_alloc_firmware_pages() in the CCP code, the VMSA is associated with
the guest's ASID, just not a GPA.  I.e. the VMSA is more of a specialized guest
private page, as opposed to a dedicated firmware page.  I.e. a __rmp_make_private()
and/or rmp_make_private_immutable() definitely seems like a good idea.
Brijesh Singh July 19, 2021, 7:37 p.m. UTC | #9
On 7/19/21 2:03 PM, Sean Christopherson wrote:
> On Mon, Jul 19, 2021, Brijesh Singh wrote:
>>
>> On 7/19/21 12:18 PM, Sean Christopherson wrote:
>>>>
>>>> Okay, I will add helper to make things easier. One case where we will
>>>> need to directly call the rmpupdate() is during the LAUNCH_UPDATE
>>>> command. In that case the page is private and its immutable bit is also
>>>> set. This is because the firmware makes change to the page, and we are
>>>> required to set the immutable bit before the call.
>>>
>>> Or do "int rmp_make_firmware(u64 pfn, bool immutable)"?
>>
>> That's not what we need.
>>
>> We need 'rmp_make_private() + immutable' all in one RMPUPDATE.  Here is the
>> snippet from SNP_LAUNCH_UPDATE.
> 
> Ah, not firmwrare, gotcha.  But we can still use a helper, e.g. an inner
> double-underscore helper, __rmp_make_private().
> 

In that case we are basically passing the all the fields defined in the 
'struct rmpupdate' as individual arguments. How about something like this:

* core kernel exports the rmpupdate()
* the include/linux/sev.h header file defines the helper functions

   int rmp_make_private(u64 pfn, u64 gpa, int psize, int asid)
   int rmp_make_firmware(u64 pfn, int psize);
   int rmp_make_shared(u64 pfn, int psize);

In most of the case above 3 helpers are good. If driver finds that the 
above helper does not fit its need (such as SNP_LAUNCH_UPDATE) then call 
the rmpupdate() without going through the helper.

-Brijesh
Sean Christopherson July 20, 2021, 4:40 p.m. UTC | #10
On Mon, Jul 19, 2021, Brijesh Singh wrote:
> 
> On 7/19/21 2:03 PM, Sean Christopherson wrote:
> > On Mon, Jul 19, 2021, Brijesh Singh wrote:
> > Ah, not firmwrare, gotcha.  But we can still use a helper, e.g. an inner
> > double-underscore helper, __rmp_make_private().
> 
> In that case we are basically passing the all the fields defined in the
> 'struct rmpupdate' as individual arguments.

Yes, but (a) not _all_ fields, (b) it would allow hiding "struct rmpupdate", and
(c) this is much friendlier to readers:

	__rmp_make_private(pfn, gpa, PG_LEVEL_4K, svm->asid, true);

than:

	rmpupdate(&rmpupdate);

For the former, I can see in a single line of code that KVM is creating a 4k
private, immutable guest page.  With the latter, I need to go hunt down all code
that modifies rmpupdate to understand what the code is doing.

> How about something like this:
> 
> * core kernel exports the rmpupdate()
> * the include/linux/sev.h header file defines the helper functions
> 
>   int rmp_make_private(u64 pfn, u64 gpa, int psize, int asid)

I think we'll want s/psize/level, i.e. make it more obvious clear that the input
is PG_LEVEL_*.
Brijesh Singh July 20, 2021, 6:23 p.m. UTC | #11
On 7/20/21 11:40 AM, Sean Christopherson wrote:
> On Mon, Jul 19, 2021, Brijesh Singh wrote:
>>
>> On 7/19/21 2:03 PM, Sean Christopherson wrote:
>>> On Mon, Jul 19, 2021, Brijesh Singh wrote:
>>> Ah, not firmwrare, gotcha.  But we can still use a helper, e.g. an inner
>>> double-underscore helper, __rmp_make_private().
>>
>> In that case we are basically passing the all the fields defined in the
>> 'struct rmpupdate' as individual arguments.
> 
> Yes, but (a) not _all_ fields, (b) it would allow hiding "struct rmpupdate", and
> (c) this is much friendlier to readers:
> 
> 	__rmp_make_private(pfn, gpa, PG_LEVEL_4K, svm->asid, true);
> 
> than:
> 
> 	rmpupdate(&rmpupdate);
> 

Ok.

> For the former, I can see in a single line of code that KVM is creating a 4k
> private, immutable guest page.  With the latter, I need to go hunt down all code
> that modifies rmpupdate to understand what the code is doing.
> 
>> How about something like this:
>>
>> * core kernel exports the rmpupdate()
>> * the include/linux/sev.h header file defines the helper functions
>>
>>    int rmp_make_private(u64 pfn, u64 gpa, int psize, int asid)
> 
> I think we'll want s/psize/level, i.e. make it more obvious clear that the input
> is PG_LEVEL_*.
> 

ok, I will stick to x86 PG_LEVEL_*

thanks
diff mbox series

Patch

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 1f0635ac9ff9..4468995dd209 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -1940,6 +1940,45 @@  find_enc_region(struct kvm *kvm, struct kvm_enc_region *range)
 static void __unregister_enc_region_locked(struct kvm *kvm,
 					   struct enc_region *region)
 {
+	struct rmpupdate val = {};
+	unsigned long i, pfn;
+	struct rmpentry *e;
+	int level, rc;
+
+	/*
+	 * The guest memory pages are assigned in the RMP table. Unassign it
+	 * before releasing the memory.
+	 */
+	if (sev_snp_guest(kvm)) {
+		for (i = 0; i < region->npages; i++) {
+			pfn = page_to_pfn(region->pages[i]);
+
+			if (need_resched())
+				schedule();
+
+			e = snp_lookup_page_in_rmptable(region->pages[i], &level);
+			if (unlikely(!e))
+				continue;
+
+			/* If its not a guest assigned page then skip it. */
+			if (!rmpentry_assigned(e))
+				continue;
+
+			/* Is the page part of a 2MB RMP entry? */
+			if (level == PG_LEVEL_2M) {
+				val.pagesize = RMP_PG_SIZE_2M;
+				pfn &= ~(KVM_PAGES_PER_HPAGE(PG_LEVEL_2M) - 1);
+			} else {
+				val.pagesize = RMP_PG_SIZE_4K;
+			}
+
+			/* Transition the page to hypervisor owned. */
+			rc = rmpupdate(pfn_to_page(pfn), &val);
+			if (rc)
+				pr_err("Failed to release pfn 0x%lx ret=%d\n", pfn, rc);
+		}
+	}
+
 	sev_unpin_memory(kvm, region->pages, region->npages);
 	list_del(&region->list);
 	kfree(region);