
KVM: SEV: Fix guest memory leak when handling guest requests

Message ID 20240518150457.1033295-1-michael.roth@amd.com (mailing list archive)
State New
Series KVM: SEV: Fix guest memory leak when handling guest requests

Commit Message

Michael Roth May 18, 2024, 3:04 p.m. UTC
Before forwarding guest requests to firmware, KVM takes a reference on
the 2 pages the guest uses for its request/response buffers. Make sure
to release these when cleaning up after the request is completed.

Signed-off-by: Michael Roth <michael.roth@amd.com>
---

Hi Paolo,

Sorry for another late fix, but I finally spotted this while looking over
the code again today. I've re-tested attestation guest requests with this
applied (after applying the other pending fix) and everything looks good.

-Mike

 arch/x86/kvm/svm/sev.c | 27 +++++++++++++++++----------
 1 file changed, 17 insertions(+), 10 deletions(-)

Comments

Sean Christopherson May 20, 2024, 2:17 p.m. UTC | #1
This needs a

  From: Michael Roth <michael.roth@amd.com>

otherwise Author will be assigned to your @utexas.edu email.

On Sat, May 18, 2024, Michael Roth wrote:
> Before forwarding guest requests to firmware, KVM takes a reference on
> the 2 pages the guest uses for its request/response buffers. Make sure
> to release these when cleaning up after the request is completed.
> 
> Signed-off-by: Michael Roth <michael.roth@amd.com>
> ---

...

> @@ -3970,14 +3980,11 @@ static int __snp_handle_guest_req(struct kvm *kvm, gpa_t req_gpa, gpa_t resp_gpa
>  		return ret;
>  
>  	ret = sev_issue_cmd(kvm, SEV_CMD_SNP_GUEST_REQUEST, &data, fw_err);
> -	if (ret)
> -		return ret;
>  
> -	ret = snp_cleanup_guest_buf(&data);
> -	if (ret)
> -		return ret;
> +	if (snp_cleanup_guest_buf(&data))
> +		return -EINVAL;

EINVAL feels wrong.  The input was completely valid.  Also, forwarding the error
to the guest doesn't seem like the right thing to do if KVM can't reclaim the
response PFN.  Shouldn't that be fatal to the VM?

> -	return 0;
> +	return ret;

I find the setup/cleanup split makes this code harder to read, not easier.  It
won't be pretty no matter what due to the potential RMP failures, but IMO this
is easier to follow:

	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
	struct sev_data_snp_guest_request data = {0};
	kvm_pfn_t req_pfn, resp_pfn;
	int ret;

	if (!sev_snp_guest(kvm))
		return -EINVAL;

	if (!PAGE_ALIGNED(req_gpa) || !PAGE_ALIGNED(resp_gpa))
		return -EINVAL;

	req_pfn = gfn_to_pfn(kvm, gpa_to_gfn(req_gpa));
	if (is_error_noslot_pfn(req_pfn))
		return -EINVAL;

	ret = -EINVAL;

	resp_pfn = gfn_to_pfn(kvm, gpa_to_gfn(resp_gpa));
	if (is_error_noslot_pfn(resp_pfn))
		goto release_req;

	if (rmp_make_private(resp_pfn, 0, PG_LEVEL_4K, 0, true)) {
		kvm_release_pfn_clean(resp_pfn);
		goto release_req;
	}

	data.gctx_paddr = __psp_pa(sev->snp_context);
	data.req_paddr = __sme_set(req_pfn << PAGE_SHIFT);
	data.res_paddr = __sme_set(resp_pfn << PAGE_SHIFT);
	ret = sev_issue_cmd(kvm, SEV_CMD_SNP_GUEST_REQUEST, &data, fw_err);

	if (snp_page_reclaim(resp_pfn) ||
	    rmp_make_shared(resp_pfn, PG_LEVEL_4K))
		ret = ret ?: -EIO;
	else
		kvm_release_pfn_dirty(resp_pfn);
release_req:
	kvm_release_pfn_clean(req_pfn);
	return ret;
Michael Roth May 20, 2024, 10:50 p.m. UTC | #2
On Mon, May 20, 2024 at 07:17:13AM -0700, Sean Christopherson wrote:
> This needs a
> 
>   From: Michael Roth <michael.roth@amd.com>
> 
> otherwise Author will be assigned to your @utexas.edu email.

Thanks, I hadn't considered that. My work email issue seems to be
resolved now, but will keep that in mind if I ever need to use a
fallback again.

> 
> On Sat, May 18, 2024, Michael Roth wrote:
> > Before forwarding guest requests to firmware, KVM takes a reference on
> > the 2 pages the guest uses for its request/response buffers. Make sure
> > to release these when cleaning up after the request is completed.
> > 
> > Signed-off-by: Michael Roth <michael.roth@amd.com>
> > ---
> 
> ...
> 
> > @@ -3970,14 +3980,11 @@ static int __snp_handle_guest_req(struct kvm *kvm, gpa_t req_gpa, gpa_t resp_gpa
> >  		return ret;
> >  
> >  	ret = sev_issue_cmd(kvm, SEV_CMD_SNP_GUEST_REQUEST, &data, fw_err);
> > -	if (ret)
> > -		return ret;
> >  
> > -	ret = snp_cleanup_guest_buf(&data);
> > -	if (ret)
> > -		return ret;
> > +	if (snp_cleanup_guest_buf(&data))
> > +		return -EINVAL;
> 
> EINVAL feels wrong.  The input was completely valid.  Also, forwarding the error

Yah, EIO seems more suitable here.

> to the guest doesn't seem like the right thing to do if KVM can't reclaim the
> response PFN.  Shouldn't that be fatal to the VM?

The thinking here is that pretty much all guest request failures will be
fatal to the guest being able to continue. At least, that's definitely
true for attestation. So reporting the error to the guest would allow that
failure to be propagated along by handling in the guest where it would
presumably be reported a little more clearly to the guest owner, at
which point the guest would most likely terminate itself anyway.

But there is a possibility that the guest will attempt to access the response
PFN before/during that reporting and spin on an #NPF instead though. So
maybe the safer more repeatable approach is to handle the error directly
from KVM and propagate it to userspace.

But the GHCB spec does require that the firmware response code for
SNP_GUEST_REQUEST be passed directly to the guest via lower 32-bits of
SW_EXITINFO2, so we'd still want handling to pass that error on to the
guest, so I made some changes to retain that behavior.
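For illustration, a minimal sketch of that completion step, assuming the
standard GHCB accessor is used; the call site and masking here are
illustrative, not necessarily what the v2 patch does verbatim:

	/*
	 * Per the GHCB spec, the SNP_GUEST_REQUEST firmware return code goes
	 * in the lower 32 bits of SW_EXITINFO2 when the Guest Request NAE
	 * event is completed.
	 */
	ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, fw_err & GENMASK_ULL(31, 0));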

> 
> > -	return 0;
> > +	return ret;
> 
> I find the setup/cleanup split makes this code harder to read, not easier.  It
> won't be pretty no matter what due to the potential RMP failures, but IMO this
> is easier to follow:

It *might* make more sense to split things out into helpers when extended
guest requests are implemented, but for the patch in question I agree that
what you have below is clearer. I also went a step further and moved
__snp_handle_guest_req() back into snp_handle_guest_req() as well to
simplify the logic for always passing firmware errors back to the guest.

I'll post a v2 of the fixup with these changes added. But I've also
pushed it here for reference:

  https://github.com/mdroth/linux/commit/8ceab17950dc5f1b94231037748104f7c31752f8
  (from https://github.com/mdroth/linux/commits/kvm-next-snp-fixes2/)

and here's the original PATCH 17/19 with all pending fixes squashed in:

  https://github.com/mdroth/linux/commit/b4f51e38da22a2b163c546cb2a3aefd04446b3c7
  (from https://github.com/mdroth/linux/commits/kvm-next-snp-fixes2-squashed/)
  (also retested attestation with simulated failures and double-checked
   for clang warnings with W=1)
  
Thanks!

-Mike

> 
> 	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> 	struct sev_data_snp_guest_request data = {0};
> 	kvm_pfn_t req_pfn, resp_pfn;
> 	int ret;
> 
> 	if (!sev_snp_guest(kvm))
> 		return -EINVAL;
> 
> 	if (!PAGE_ALIGNED(req_gpa) || !PAGE_ALIGNED(resp_gpa))
> 		return -EINVAL;
> 
> 	req_pfn = gfn_to_pfn(kvm, gpa_to_gfn(req_gpa));
> 	if (is_error_noslot_pfn(req_pfn))
> 		return -EINVAL;
> 
> 	ret = -EINVAL;
> 
> 	resp_pfn = gfn_to_pfn(kvm, gpa_to_gfn(resp_gpa));
> 	if (is_error_noslot_pfn(resp_pfn))
> 		goto release_req;
> 
> 	if (rmp_make_private(resp_pfn, 0, PG_LEVEL_4K, 0, true)) {
> 		kvm_release_pfn_clean(resp_pfn);
> 		goto release_req;
> 	}
> 
> 	data.gctx_paddr = __psp_pa(sev->snp_context);
> 	data.req_paddr = __sme_set(req_pfn << PAGE_SHIFT);
> 	data.res_paddr = __sme_set(resp_pfn << PAGE_SHIFT);
> 	ret = sev_issue_cmd(kvm, SEV_CMD_SNP_GUEST_REQUEST, &data, fw_err);
> 
> 	if (snp_page_reclaim(resp_pfn) ||
> 	    rmp_make_shared(resp_pfn, PG_LEVEL_4K))
> 		ret = ret ?: -EIO;
> 	else
> 		kvm_release_pfn_dirty(resp_pfn);
> release_req:
> 	kvm_release_pfn_clean(req_pfn);
> 	return ret;
> 
>
Sean Christopherson May 20, 2024, 11:32 p.m. UTC | #3
On Mon, May 20, 2024, Michael Roth wrote:
> On Mon, May 20, 2024 at 07:17:13AM -0700, Sean Christopherson wrote:
> > On Sat, May 18, 2024, Michael Roth wrote:
> > > Before forwarding guest requests to firmware, KVM takes a reference on
> > > the 2 pages the guest uses for its request/response buffers. Make sure
> > > to release these when cleaning up after the request is completed.
> > > 
> > > Signed-off-by: Michael Roth <michael.roth@amd.com>
> > > ---
> > 
> > ...
> > 
> > > @@ -3970,14 +3980,11 @@ static int __snp_handle_guest_req(struct kvm *kvm, gpa_t req_gpa, gpa_t resp_gpa
> > >  		return ret;
> > >  
> > >  	ret = sev_issue_cmd(kvm, SEV_CMD_SNP_GUEST_REQUEST, &data, fw_err);
> > > -	if (ret)
> > > -		return ret;
> > >  
> > > -	ret = snp_cleanup_guest_buf(&data);
> > > -	if (ret)
> > > -		return ret;
> > > +	if (snp_cleanup_guest_buf(&data))
> > > +		return -EINVAL;
> > 
> > EINVAL feels wrong.  The input was completely valid.  Also, forwarding the error
> 
> Yah, EIO seems more suitable here.
> 
> > to the guest doesn't seem like the right thing to do if KVM can't reclaim the
> > response PFN.  Shouldn't that be fatal to the VM?
> 
> The thinking here is that pretty much all guest request failures will be
> fatal to the guest being able to continue. At least, that's definitely
> true for attestation. So reporting the error to the guest would allow that
> failure to be propagated along by handling in the guest where it would
> presumably be reported a little more clearly to the guest owner, at
> which point the guest would most likely terminate itself anyway.

But failure to convert a pfn back to shared is a _host_ issue, not a guest issue.
E.g. it most likely indicates a bug in the host software stack, or perhaps a bad
CPU or firmware bug.



> But there is a possibility that the guest will attempt to access the response
> PFN before/during that reporting and spin on an #NPF instead though. So
> maybe the safer more repeatable approach is to handle the error directly
> from KVM and propagate it to userspace.

I was thinking more along the lines of KVM marking the VM as dead/bugged.  

> But the GHCB spec does require that the firmware response code for
> SNP_GUEST_REQUEST be passed directly to the guest via lower 32-bits of
> SW_EXITINFO2, so we'd still want handling to pass that error on to the
> guest, so I made some changes to retain that behavior.

If and only if the hypervisor completes the event.

  The hypervisor must save the SNP_GUEST_REQUEST return code in the lower 32-bits
  of the SW_EXITINFO2 field before completing the Guest Request NAE event.

If KVM terminates the VM, there's obviously no need to fill SW_EXITINFO2.

Side topic, is there a plan to ratelimit Guest Requests?

  To avoid the possibility of a guest creating a denial of service attack against
  the SNP firmware, it is recommended that some form of rate limiting be implemented
  should it be detected that a high number of Guest Request NAE events are being
  issued.
Michael Roth May 21, 2024, 2 a.m. UTC | #4
On Mon, May 20, 2024 at 04:32:04PM -0700, Sean Christopherson wrote:
> On Mon, May 20, 2024, Michael Roth wrote:
> > On Mon, May 20, 2024 at 07:17:13AM -0700, Sean Christopherson wrote:
> > > On Sat, May 18, 2024, Michael Roth wrote:
> > > > Before forwarding guest requests to firmware, KVM takes a reference on
> > > > the 2 pages the guest uses for its request/response buffers. Make sure
> > > > to release these when cleaning up after the request is completed.
> > > > 
> > > > Signed-off-by: Michael Roth <michael.roth@amd.com>
> > > > ---
> > > 
> > > ...
> > > 
> > > > @@ -3970,14 +3980,11 @@ static int __snp_handle_guest_req(struct kvm *kvm, gpa_t req_gpa, gpa_t resp_gpa
> > > >  		return ret;
> > > >  
> > > >  	ret = sev_issue_cmd(kvm, SEV_CMD_SNP_GUEST_REQUEST, &data, fw_err);
> > > > -	if (ret)
> > > > -		return ret;
> > > >  
> > > > -	ret = snp_cleanup_guest_buf(&data);
> > > > -	if (ret)
> > > > -		return ret;
> > > > +	if (snp_cleanup_guest_buf(&data))
> > > > +		return -EINVAL;
> > > 
> > > EINVAL feels wrong.  The input was completely valid.  Also, forwarding the error
> > 
> > Yah, EIO seems more suitable here.
> > 
> > > to the guest doesn't seem like the right thing to do if KVM can't reclaim the
> > > response PFN.  Shouldn't that be fatal to the VM?
> > 
> > The thinking here is that pretty much all guest request failures will be
> > fatal to the guest being able to continue. At least, that's definitely
> > true for attestation. So reporting the error to the guest would allow that
> > failure to be propagated along by handling in the guest where it would
> > presumably be reported a little more clearly to the guest owner, at
> > which point the guest would most likely terminate itself anyway.
> 
> But failure to convert a pfn back to shared is a _host_ issue, not a guest issue.
> E.g. it most likely indicates a bug in the host software stack, or perhaps a bad
> CPU or firmware bug.

No disagreement there, I think it's more correct to not propagate
any errors resulting from reclaim failure. I was just explaining why the
original code had a propensity for propagating errors to the guest, and why
that still needs to be done for firmware errors.

> 
> > But there is a possibility that the guest will attempt to access the response
> > PFN before/during that reporting and spin on an #NPF instead though. So
> > maybe the safer more repeatable approach is to handle the error directly
> > from KVM and propagate it to userspace.
> 
> I was thinking more along the lines of KVM marking the VM as dead/bugged.  

In practice userspace will get an unhandled exit and kill the vcpu/guest,
but we could additionally flag the guest as dead. Is there an existing
mechanism for this?

> 
> > But the GHCB spec does require that the firmware response code for
> > SNP_GUEST_REQUEST be passed directly to the guest via lower 32-bits of
> > SW_EXITINFO2, so we'd still want handling to pass that error on to the
> > guest, so I made some changes to retain that behavior.
> 
> If and only if the hypervisor completes the event.
> 
>   The hypervisor must save the SNP_GUEST_REQUEST return code in the lower 32-bits
>   of the SW_EXITINFO2 field before completing the Guest Request NAE event.
> 
> If KVM terminates the VM, there's obviously no need to fill SW_EXITINFO2.

Yah, the v2 patch will only propagate the firmware error if reclaim was
successful.

> 
> Side topic, is there a plan to ratelimit Guest Requests?
> 
>   To avoid the possibility of a guest creating a denial of service attack against
>   the SNP firmware, it is recommended that some form of rate limiting be implemented
>   should it be detected that a high number of Guest Request NAE events are being
>   issued.

The guest side is upstream, and Dionna submitted HV patches last year. I think
these are the latest ones:

  https://www.spinics.net/lists/kvm/msg301438.html

I think it probably makes sense to try to get the throttling support in
for 6.11.
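For reference, one possible shape for that throttling, using the kernel's
generic ratelimit helpers; the guest_req_rs field and the 10-per-second
budget below are illustrative assumptions, not necessarily how Dionna's
series implements it:

#include <linux/ratelimit.h>

/* Illustrative sketch: a per-VM rate limit on SNP guest requests. */
static void snp_guest_req_init_throttle(struct kvm_sev_info *sev)
{
	/* Allow bursts of up to 10 guest requests per second per VM. */
	ratelimit_state_init(&sev->guest_req_rs, HZ, 10);
}

static bool snp_guest_req_throttled(struct kvm_sev_info *sev)
{
	/* __ratelimit() returns nonzero when the caller may proceed. */
	return !__ratelimit(&sev->guest_req_rs);
}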

-Mike
Sean Christopherson May 21, 2024, 2:09 p.m. UTC | #5
On Mon, May 20, 2024, Michael Roth wrote:
> On Mon, May 20, 2024 at 04:32:04PM -0700, Sean Christopherson wrote:
> > On Mon, May 20, 2024, Michael Roth wrote:
> > > But there is a possibility that the guest will attempt to access the response
> > > PFN before/during that reporting and spin on an #NPF instead though. So
> > > maybe the safer more repeatable approach is to handle the error directly
> > > from KVM and propagate it to userspace.
> > 
> > I was thinking more along the lines of KVM marking the VM as dead/bugged.  
> 
> In practice userspace will get an unhandled exit and kill the vcpu/guest,
> but we could additionally flag the guest as dead.

Honest question, does it make sense from KVM to make the VM unusable?  E.g. is
it feasible for userspace to keep running the VM?  Does the page that's in a bad
state present any danger to the host?

> Is there an existing mechanism for this?

kvm_vm_dead()
Michael Roth May 21, 2024, 3:34 p.m. UTC | #6
On Tue, May 21, 2024 at 07:09:04AM -0700, Sean Christopherson wrote:
> On Mon, May 20, 2024, Michael Roth wrote:
> > On Mon, May 20, 2024 at 04:32:04PM -0700, Sean Christopherson wrote:
> > > On Mon, May 20, 2024, Michael Roth wrote:
> > > > But there is a possibility that the guest will attempt to access the response
> > > > PFN before/during that reporting and spin on an #NPF instead though. So
> > > > maybe the safer more repeatable approach is to handle the error directly
> > > > from KVM and propagate it to userspace.
> > > 
> > > I was thinking more along the lines of KVM marking the VM as dead/bugged.  
> > 
> > In practice userspace will get an unhandled exit and kill the vcpu/guest,
> > but we could additionally flag the guest as dead.
> 
> Honest question, does it make sense from KVM to make the VM unusable?  E.g. is
> it feasible for userspace to keep running the VM?  Does the page that's in a bad
> state present any danger to the host?

If the reclaim fails (which it shouldn't), then KVM has a unique situation
where a non-gmem guest page is left in a bad state. In theory, if the guest/userspace
could somehow induce a reclaim failure, then they could potentially trick the
host into trying to access that same page as a shared page and induce a
host RMP #PF.

So it does seem like a good idea to force the guest to stop executing. Then
once the guest is fully destroyed the bad page will stay leaked so it
won't affect subsequent activities.

> 
> > Is there an existing mechanism for this?
> 
> kvm_vm_dead()

Nice, that would do the trick. I'll modify the logic to also call that
after a reclaim failure.
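Roughly along these lines (sketch only, not the final v2; the -EIO and the
exact call site are assumptions):

	if (snp_cleanup_guest_buf(&data)) {
		/*
		 * Reclaim failed: stop the guest rather than letting it (or
		 * the host) keep touching a page in a bad RMP state.
		 */
		kvm_vm_dead(kvm);
		return -EIO;
	}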

Thanks,

Mike
Sean Christopherson May 21, 2024, 4:58 p.m. UTC | #7
On Tue, May 21, 2024, Michael Roth wrote:
> On Tue, May 21, 2024 at 07:09:04AM -0700, Sean Christopherson wrote:
> > On Mon, May 20, 2024, Michael Roth wrote:
> > > On Mon, May 20, 2024 at 04:32:04PM -0700, Sean Christopherson wrote:
> > > > On Mon, May 20, 2024, Michael Roth wrote:
> > > > > But there is a possibility that the guest will attempt to access the response
> > > > > PFN before/during that reporting and spin on an #NPF instead though. So
> > > > > maybe the safer more repeatable approach is to handle the error directly
> > > > > from KVM and propagate it to userspace.
> > > > 
> > > > I was thinking more along the lines of KVM marking the VM as dead/bugged.  
> > > 
> > > In practice userspace will get an unhandled exit and kill the vcpu/guest,
> > > but we could additionally flag the guest as dead.
> > 
> > Honest question, does it make sense from KVM to make the VM unusable?  E.g. is
> > it feasible for userspace to keep running the VM?  Does the page that's in a bad
> > state present any danger to the host?
> 
> If the reclaim fails (which it shouldn't), then KVM has a unique situation
> where a non-gmem guest page is left in a bad state. In theory, if the guest/userspace
> could somehow induce a reclaim failure, then they could potentially trick the
> host into trying to access that same page as a shared page and induce a
> host RMP #PF.
> 
> So it does seem like a good idea to force the guest to stop executing. Then
> once the guest is fully destroyed the bad page will stay leaked so it
> won't affect subsequent activities.
> 
> > 
> > > Is there an existing mechanism for this?
> > 
> > kvm_vm_dead()
> 
> Nice, that would do the trick. I'll modify the logic to also call that
> after a reclaim failure.

Hmm, assuming there's no scenario where snp_page_reclaim() is expected to fail, and
such a failure is always unrecoverable, e.g. has similar potential for inducing
host RMP #PFs, then KVM_BUG_ON() is more appropriate.

Ah, and there are already WARNs in the lower level helpers.  Those WARNs should
be KVM_BUG_ON(), because AFAICT there's no scenario where letting the VM live on
is safe/sensible.  And unless I'm missing something, snp_page_reclaim() should
do the private=>shared conversion, because the only reason to reclaim a page is
to move it back to shared state.

Lastly, I vote to rename host_rmp_make_shared() to kvm_rmp_make_shared() to make
it more obvious that it's a KVM helper, whereas rmp_make_shared() is a generic
kernel helper, i.e. _can't_ bug the VM because it doesn't (and shouldn't) have a
pointer to the VM.

E.g. end up with something like this:

/*
 * Transition a page to hypervisor-owned/shared state in the RMP table. This
 * should not fail under normal conditions, but leak the page should that
 * happen since it will no longer be usable by the host due to RMP protections.
 */
static int kvm_rmp_make_shared(struct kvm *kvm, u64 pfn, enum pg_level level)
{
	if (KVM_BUG_ON(rmp_make_shared(pfn, level), kvm)) {
		snp_leak_pages(pfn, page_level_size(level) >> PAGE_SHIFT);
		return -EIO;
	}

	return 0;
}

/*
 * Certain page-states, such as Pre-Guest and Firmware pages (as documented
 * in Chapter 5 of the SEV-SNP Firmware ABI under "Page States") cannot be
 * directly transitioned back to normal/hypervisor-owned state via RMPUPDATE
 * unless they are reclaimed first.
 *
 * Until they are reclaimed and subsequently transitioned via RMPUPDATE, they
 * might not be usable by the host due to being set as immutable or still
 * being associated with a guest ASID.
 *
 * Bug the VM and leak the page if reclaim fails, or if the RMP entry can't be
 * converted back to shared, as the page is no longer usable due to RMP
 * protections, and it's infeasible for the guest to continue on.
 */
static int snp_page_reclaim(struct kvm *kvm, u64 pfn)
{
	struct sev_data_snp_page_reclaim data = {0};
	int err;

	data.paddr = __sme_set(pfn << PAGE_SHIFT);
	
	if (KVM_BUG_ON(sev_do_cmd(SEV_CMD_SNP_PAGE_RECLAIM, &data, &err), kvm)) {
		snp_leak_pages(pfn, 1);
		return -EIO;
	}

	if (kvm_rmp_make_shared(kvm, pfn, PG_LEVEL_4K))
		return -EIO;

	return 0;
}
Michael Roth May 21, 2024, 9 p.m. UTC | #8
On Tue, May 21, 2024 at 09:58:36AM -0700, Sean Christopherson wrote:
> On Tue, May 21, 2024, Michael Roth wrote:
> > On Tue, May 21, 2024 at 07:09:04AM -0700, Sean Christopherson wrote:
> > > On Mon, May 20, 2024, Michael Roth wrote:
> > > > On Mon, May 20, 2024 at 04:32:04PM -0700, Sean Christopherson wrote:
> > > > > On Mon, May 20, 2024, Michael Roth wrote:
> > > > > > But there is a possibility that the guest will attempt to access the response
> > > > > > PFN before/during that reporting and spin on an #NPF instead though. So
> > > > > > maybe the safer more repeatable approach is to handle the error directly
> > > > > > from KVM and propagate it to userspace.
> > > > > 
> > > > > I was thinking more along the lines of KVM marking the VM as dead/bugged.  
> > > > 
> > > > In practice userspace will get an unhandled exit and kill the vcpu/guest,
> > > > but we could additionally flag the guest as dead.
> > > 
> > > Honest question, does it make sense from KVM to make the VM unusable?  E.g. is
> > > it feasible for userspace to keep running the VM?  Does the page that's in a bad
> > > state present any danger to the host?
> > 
> > If the reclaim fails (which it shouldn't), then KVM has a unique situation
> > where a non-gmem guest page is left in a bad state. In theory, if the guest/userspace
> > could somehow induce a reclaim failure, then they could potentially trick the
> > host into trying to access that same page as a shared page and induce a
> > host RMP #PF.
> > 
> > So it does seem like a good idea to force the guest to stop executing. Then
> > once the guest is fully destroyed the bad page will stay leaked so it
> > won't affect subsequent activities.
> > 
> > > 
> > > > > Is there an existing mechanism for this?
> > > 
> > > kvm_vm_dead()
> > 
> > Nice, that would do the trick. I'll modify the logic to also call that
> > after a reclaim failure.
> 
> Hmm, assuming there's no scenario where snp_page_reclaim() is expected to fail, and
> such a failure is always unrecoverable, e.g. has similar potential for inducing
> host RMP #PFs, then KVM_BUG_ON() is more appropriate.
> 
> Ah, and there are already WARNs in the lower level helpers.  Those WARNs should
> be KVM_BUG_ON(), because AFAICT there's no scenario where letting the VM live on
> is safe/sensible.  And unless I'm missing something, snp_page_reclaim() should
> do the private=>shared conversion, because the only reason to reclaim a page is
> to move it back to shared state.

Yes, and the code always follows up snp_page_reclaim() with
rmp_make_shared(), so it makes sense to combine the 2.

> 
> Lastly, I vote to rename host_rmp_make_shared() to kvm_rmp_make_shared() to make
> it more obvious that it's a KVM helper, whereas rmp_make_shared() is a generic
> kernel helper, i.e. _can't_ bug the VM because it doesn't (and shouldn't) have a
> pointer to the VM.

Makes sense.

> 
> E.g. end up with something like this:
> 
> /*
>  * Transition a page to hypervisor-owned/shared state in the RMP table. This
>  * should not fail under normal conditions, but leak the page should that
>  * happen since it will no longer be usable by the host due to RMP protections.
>  */
> static int kvm_rmp_make_shared(struct kvm *kvm, u64 pfn, enum pg_level level)
> {
> 	if (KVM_BUG_ON(rmp_make_shared(pfn, level), kvm)) {
> 		snp_leak_pages(pfn, page_level_size(level) >> PAGE_SHIFT);
> 		return -EIO;
> 	}
> 
> 	return 0;
> }
> 
> /*
>  * Certain page-states, such as Pre-Guest and Firmware pages (as documented
>  * in Chapter 5 of the SEV-SNP Firmware ABI under "Page States") cannot be
>  * directly transitioned back to normal/hypervisor-owned state via RMPUPDATE
>  * unless they are reclaimed first.
>  *
>  * Until they are reclaimed and subsequently transitioned via RMPUPDATE, they
>  * might not be usable by the host due to being set as immutable or still
>  * being associated with a guest ASID.
>  *
>  * Bug the VM and leak the page if reclaim fails, or if the RMP entry can't be
>  * converted back to shared, as the page is no longer usable due to RMP
>  * protections, and it's infeasible for the guest to continue on.
>  */
> static int snp_page_reclaim(struct kvm *kvm, u64 pfn)
> {
> 	struct sev_data_snp_page_reclaim data = {0};
> 	int err;
> 
> 	data.paddr = __sme_set(pfn << PAGE_SHIFT);
> 	
> 	if (KVM_BUG_ON(sev_do_cmd(SEV_CMD_SNP_PAGE_RECLAIM, &data, &err), kvm)) {

I would probably opt to use KVM_BUG() and print the PFN and firmware
error code to help with diagnosing the failure, but I think the overall
approach seems reasonable and is a safer/cleaner way to handle this
situation.
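
e.g. something along these lines (sketch only, reusing the reclaim example
above; KVM_BUG() takes printf-style args, so the PFN and firmware error can
be logged when the VM is bugged):

	if (KVM_BUG(sev_do_cmd(SEV_CMD_SNP_PAGE_RECLAIM, &data, &err), kvm,
		    "SNP: failed to reclaim PFN 0x%llx, fw_err %d\n", pfn, err)) {
		snp_leak_pages(pfn, 1);
		return -EIO;
	}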

-Mike

> 		snp_leak_pages(pfn, 1);
> 		return -EIO;
> 	}
> 
> 	if (kvm_rmp_make_shared(kvm, pfn, PG_LEVEL_4K))
> 		return -EIO;
> 
> 	return 0;
> }

Patch

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 41e383e30797..e57faf7d04d1 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -3933,11 +3933,16 @@  static int snp_setup_guest_buf(struct kvm *kvm, struct sev_data_snp_guest_reques
 		return -EINVAL;
 
 	resp_pfn = gfn_to_pfn(kvm, gpa_to_gfn(resp_gpa));
-	if (is_error_noslot_pfn(resp_pfn))
+	if (is_error_noslot_pfn(resp_pfn)) {
+		kvm_release_pfn_clean(req_pfn);
 		return -EINVAL;
+	}
 
-	if (rmp_make_private(resp_pfn, 0, PG_LEVEL_4K, 0, true))
+	if (rmp_make_private(resp_pfn, 0, PG_LEVEL_4K, 0, true)) {
+		kvm_release_pfn_clean(req_pfn);
+		kvm_release_pfn_clean(resp_pfn);
 		return -EINVAL;
+	}
 
 	data->gctx_paddr = __psp_pa(sev->snp_context);
 	data->req_paddr = __sme_set(req_pfn << PAGE_SHIFT);
@@ -3948,11 +3953,16 @@  static int snp_setup_guest_buf(struct kvm *kvm, struct sev_data_snp_guest_reques
 
 static int snp_cleanup_guest_buf(struct sev_data_snp_guest_request *data)
 {
-	u64 pfn = __sme_clr(data->res_paddr) >> PAGE_SHIFT;
+	u64 req_pfn = __sme_clr(data->req_paddr) >> PAGE_SHIFT;
+	u64 resp_pfn = __sme_clr(data->res_paddr) >> PAGE_SHIFT;
+
+	kvm_release_pfn_clean(req_pfn);
 
-	if (snp_page_reclaim(pfn) || rmp_make_shared(pfn, PG_LEVEL_4K))
+	if (snp_page_reclaim(resp_pfn) || rmp_make_shared(resp_pfn, PG_LEVEL_4K))
 		return -EINVAL;
 
+	kvm_release_pfn_dirty(resp_pfn);
+
 	return 0;
 }
 
@@ -3970,14 +3980,11 @@  static int __snp_handle_guest_req(struct kvm *kvm, gpa_t req_gpa, gpa_t resp_gpa
 		return ret;
 
 	ret = sev_issue_cmd(kvm, SEV_CMD_SNP_GUEST_REQUEST, &data, fw_err);
-	if (ret)
-		return ret;
 
-	ret = snp_cleanup_guest_buf(&data);
-	if (ret)
-		return ret;
+	if (snp_cleanup_guest_buf(&data))
+		return -EINVAL;
 
-	return 0;
+	return ret;
 }
 
 static void snp_handle_guest_req(struct vcpu_svm *svm, gpa_t req_gpa, gpa_t resp_gpa)