diff mbox series

[Part2,RFC,v4,26/40] KVM: SVM: Add KVM_SEV_SNP_LAUNCH_FINISH command

Message ID 20210707183616.5620-27-brijesh.singh@amd.com (mailing list archive)
State New, archived
Headers show
Series Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support | expand

Commit Message

Brijesh Singh July 7, 2021, 6:36 p.m. UTC
The KVM_SEV_SNP_LAUNCH_FINISH finalize the cryptographic digest and stores
it as the measurement of the guest at launch.

While finalizing the launch flow, it also issues the LAUNCH_UPDATE command
to encrypt the VMSA pages.

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
---
 .../virt/kvm/amd-memory-encryption.rst        |  22 +++
 arch/x86/kvm/svm/sev.c                        | 125 ++++++++++++++++++
 include/uapi/linux/kvm.h                      |  13 ++
 3 files changed, 160 insertions(+)

Comments

Sean Christopherson July 16, 2021, 8:18 p.m. UTC | #1
On Wed, Jul 07, 2021, Brijesh Singh wrote:
> +        struct kvm_sev_snp_launch_finish {
> +                __u64 id_block_uaddr;
> +                __u64 id_auth_uaddr;
> +                __u8 id_block_en;
> +                __u8 auth_key_en;
> +                __u8 host_data[32];

Pad this one too?

> +        };
> +
> +
> +See SEV-SNP specification for further details on launch finish input parameters.

...

> +	data->gctx_paddr = __psp_pa(sev->snp_context);
> +	ret = sev_issue_cmd(kvm, SEV_CMD_SNP_LAUNCH_FINISH, data, &argp->error);

Shouldn't KVM unwind everything it did if LAUNCH_FINISH fails?  And if that's
not possible, take steps to make the VM unusable?

> +
> +	kfree(id_auth);
> +
> +e_free_id_block:
> +	kfree(id_block);
> +
> +e_free:
> +	kfree(data);
> +
> +	return ret;
> +}
> +

...

> @@ -2346,8 +2454,25 @@ void sev_free_vcpu(struct kvm_vcpu *vcpu)
>  
>  	if (vcpu->arch.guest_state_protected)
>  		sev_flush_guest_memory(svm, svm->vmsa, PAGE_SIZE);
> +
> +	/*
> +	 * If its an SNP guest, then VMSA was added in the RMP entry as a guest owned page.
> +	 * Transition the page to hyperivosr state before releasing it back to the system.

"hyperivosr" typo.  And please wrap at 80 chars.

> +	 */
> +	if (sev_snp_guest(vcpu->kvm)) {
> +		struct rmpupdate e = {};
> +		int rc;
> +
> +		rc = rmpupdate(virt_to_page(svm->vmsa), &e);

So why does this not need to go through snp_page_reclaim()?

> +		if (rc) {
> +			pr_err("Failed to release SNP guest VMSA page (rc %d), leaking it\n", rc);

Seems like a WARN would be simpler.  But the more I see the rmpupdate(..., {0})
pattern, the more I believe that nuking an RMP entry needs a dedicated helper.

> +			goto skip_vmsa_free;
Brijesh Singh July 16, 2021, 10:48 p.m. UTC | #2
On 7/16/21 3:18 PM, Sean Christopherson wrote:
> On Wed, Jul 07, 2021, Brijesh Singh wrote:
>> +        struct kvm_sev_snp_launch_finish {
>> +                __u64 id_block_uaddr;
>> +                __u64 id_auth_uaddr;
>> +                __u8 id_block_en;
>> +                __u8 auth_key_en;
>> +                __u8 host_data[32];
> Pad this one too?

Noted.


>
>> +        };
>> +
>> +
>> +See SEV-SNP specification for further details on launch finish input parameters.
> ...
>
>> +	data->gctx_paddr = __psp_pa(sev->snp_context);
>> +	ret = sev_issue_cmd(kvm, SEV_CMD_SNP_LAUNCH_FINISH, data, &argp->error);
> Shouldn't KVM unwind everything it did if LAUNCH_FINISH fails?  And if that's
> not possible, take steps to make the VM unusable?

Well, I am not sure if VM need to unwind. If the command fail but VMM
decide to ignore the error then VMRUN will probably fail and user will
get the KVM shutdown event. The LAUNCH_FINISH command finalizes the VM
launch process, the firmware will probably not load the memory
encryption keys until it moves to the running state.


>> +
>> +	kfree(id_auth);
>> +
>> +e_free_id_block:
>> +	kfree(id_block);
>> +
>> +e_free:
>> +	kfree(data);
>> +
>> +	return ret;
>> +}
>> +
> ...
>
>> @@ -2346,8 +2454,25 @@ void sev_free_vcpu(struct kvm_vcpu *vcpu)
>>  
>>  	if (vcpu->arch.guest_state_protected)
>>  		sev_flush_guest_memory(svm, svm->vmsa, PAGE_SIZE);
>> +
>> +	/*
>> +	 * If its an SNP guest, then VMSA was added in the RMP entry as a guest owned page.
>> +	 * Transition the page to hyperivosr state before releasing it back to the system.
> "hyperivosr" typo.  And please wrap at 80 chars.

Noted.


>
>> +	 */
>> +	if (sev_snp_guest(vcpu->kvm)) {
>> +		struct rmpupdate e = {};
>> +		int rc;
>> +
>> +		rc = rmpupdate(virt_to_page(svm->vmsa), &e);
> So why does this not need to go through snp_page_reclaim()?

As I said in previous comments that by default all the memory is in the
hypervisor state. if the rmpupdate() failed that means nothing is
changed in the RMP and there is no need to reclaim. The reclaim is
required only if the pages are assigned in the RMP table.


>
>> +		if (rc) {
>> +			pr_err("Failed to release SNP guest VMSA page (rc %d), leaking it\n", rc);
> Seems like a WARN would be simpler.  But the more I see the rmpupdate(..., {0})
> pattern, the more I believe that nuking an RMP entry needs a dedicated helper.


Yes, let me try coming up with helper for it.


>
>> +			goto skip_vmsa_free;
Sean Christopherson July 19, 2021, 4:54 p.m. UTC | #3
On Fri, Jul 16, 2021, Brijesh Singh wrote:
> 
> On 7/16/21 3:18 PM, Sean Christopherson wrote:
> > On Wed, Jul 07, 2021, Brijesh Singh wrote:
> >> +	data->gctx_paddr = __psp_pa(sev->snp_context);
> >> +	ret = sev_issue_cmd(kvm, SEV_CMD_SNP_LAUNCH_FINISH, data, &argp->error);
> > Shouldn't KVM unwind everything it did if LAUNCH_FINISH fails?  And if that's
> > not possible, take steps to make the VM unusable?
> 
> Well, I am not sure if VM need to unwind. If the command fail but VMM decide
> to ignore the error then VMRUN will probably fail and user will get the KVM
> shutdown event. The LAUNCH_FINISH command finalizes the VM launch process,
> the firmware will probably not load the memory encryption keys until it moves
> to the running state.

Within reason, KVM needs to provide consistent, deterministic behavior.  Yes, more
than likely failure at this point will be fatal to the VM, but that doesn't justify
leaving the VM in a random/bogus state.  In addition to being a poor ABI, it also
makes it more difficult to reason about what is/isn't possible in KVM.

> >> +	 */
> >> +	if (sev_snp_guest(vcpu->kvm)) {
> >> +		struct rmpupdate e = {};
> >> +		int rc;
> >> +
> >> +		rc = rmpupdate(virt_to_page(svm->vmsa), &e);
> > So why does this not need to go through snp_page_reclaim()?
> 
> As I said in previous comments that by default all the memory is in the
> hypervisor state. if the rmpupdate() failed that means nothing is changed in
> the RMP and there is no need to reclaim. The reclaim is required only if the
> pages are assigned in the RMP table.

I wasn't referring to RMPUPDATE failing here (or anywhere).  This is the vCPU free
path, which I think means the svm->vmsa page was successfully updated in the RMP
during LAUNCH_UPDATE.  snp_launch_update_vmsa() goes through snp_page_reclaim()
on LAUNCH_UPDATE failure, whereas this happy path does not.  Is there some other
transition during teardown that obviastes the need for reclaim?  If so, a comment
to explain that would be very helpful.
Brijesh Singh July 19, 2021, 6:29 p.m. UTC | #4
On 7/19/21 11:54 AM, Sean Christopherson wrote:
>> As I said in previous comments that by default all the memory is in the
>> hypervisor state. if the rmpupdate() failed that means nothing is changed in
>> the RMP and there is no need to reclaim. The reclaim is required only if the
>> pages are assigned in the RMP table.
> 
> I wasn't referring to RMPUPDATE failing here (or anywhere).  This is the vCPU free
> path, which I think means the svm->vmsa page was successfully updated in the RMP
> during LAUNCH_UPDATE.  snp_launch_update_vmsa() goes through snp_page_reclaim()
> on LAUNCH_UPDATE failure, whereas this happy path does not.  Is there some other
> transition during teardown that obviastes the need for reclaim?  If so, a comment
> to explain that would be very helpful.
> 

In this patch, the sev_free_vcpu() hunk takes care of reclaiming the 
vmsa pages before releasing it. I think it will make it more obvious 
after I add a helper so that we don't depend on user reading the comment 
block to see what its doing.

-Brijesh
Sean Christopherson July 19, 2021, 7:14 p.m. UTC | #5
On Mon, Jul 19, 2021, Brijesh Singh wrote:
> 
> 
> On 7/19/21 11:54 AM, Sean Christopherson wrote:
> > > As I said in previous comments that by default all the memory is in the
> > > hypervisor state. if the rmpupdate() failed that means nothing is changed in
> > > the RMP and there is no need to reclaim. The reclaim is required only if the
> > > pages are assigned in the RMP table.
> > 
> > I wasn't referring to RMPUPDATE failing here (or anywhere).  This is the vCPU free
> > path, which I think means the svm->vmsa page was successfully updated in the RMP
> > during LAUNCH_UPDATE.  snp_launch_update_vmsa() goes through snp_page_reclaim()
> > on LAUNCH_UPDATE failure, whereas this happy path does not.  Is there some other
> > transition during teardown that obviastes the need for reclaim?  If so, a comment
> > to explain that would be very helpful.
> > 
> 
> In this patch, the sev_free_vcpu() hunk takes care of reclaiming the vmsa
> pages before releasing it. I think it will make it more obvious after I add
> a helper so that we don't depend on user reading the comment block to see
> what its doing.

Where?  I feel like I'm missing something.  The only change to sev_free_vcpu() I
see is that addition of the rmpupdate(), I don't see any reclaim path.

@@ -2346,8 +2454,25 @@ void sev_free_vcpu(struct kvm_vcpu *vcpu)

        if (vcpu->arch.guest_state_protected)
                sev_flush_guest_memory(svm, svm->vmsa, PAGE_SIZE);
+
+       /*
+        * If its an SNP guest, then VMSA was added in the RMP entry as a guest owned page.
+        * Transition the page to hyperivosr state before releasing it back to the system.
+        */
+       if (sev_snp_guest(vcpu->kvm)) {
+               struct rmpupdate e = {};
+               int rc;
+
+               rc = rmpupdate(virt_to_page(svm->vmsa), &e);
+               if (rc) {
+                       pr_err("Failed to release SNP guest VMSA page (rc %d), leaking it\n", rc);
+                       goto skip_vmsa_free;
+               }
+       }
+
        __free_page(virt_to_page(svm->vmsa));

+skip_vmsa_free:
        if (svm->ghcb_sa_free)
                kfree(svm->ghcb_sa);
 }
Brijesh Singh July 19, 2021, 7:49 p.m. UTC | #6
On 7/19/21 2:14 PM, Sean Christopherson wrote:

> 
> Where?  I feel like I'm missing something.  The only change to sev_free_vcpu() I
> see is that addition of the rmpupdate(), I don't see any reclaim path.

Clearing of the immutable bit (aka reclaim) is done by the firmware 
after the command was successful. See the section 8.14.2.1 of the 
SEV-SNP spec[1].

   The firmware encrypts the page with the VEK in place. The firmware
   sets the RMP.VMSA of the page to 1. The firmware sets the VMPL
   permissions for the page and transitions the page to Guest-Valid.

The Guest-Valid state means the immutable bit is cleared.  In this case,
the hypervisor just need to make the page shared and that's what the 
sev_free_vcpu() does to ensure that page is transitioned from the 
Guest-Valid to Hypervisor.

[1] https://www.amd.com/system/files/TechDocs/56860.pdf

thanks
Sean Christopherson July 19, 2021, 8:13 p.m. UTC | #7
On Mon, Jul 19, 2021, Brijesh Singh wrote:
> 
> On 7/19/21 2:14 PM, Sean Christopherson wrote:
> > 
> > Where?  I feel like I'm missing something.  The only change to sev_free_vcpu() I
> > see is that addition of the rmpupdate(), I don't see any reclaim path.
> 
> Clearing of the immutable bit (aka reclaim) is done by the firmware after
> the command was successful.

Ah, which is why the failure path has to do manual reclaim of the immutable page.
Thanks!
Marc Orr July 21, 2021, 5:53 p.m. UTC | #8
On Mon, Jul 19, 2021 at 9:54 AM Sean Christopherson <seanjc@google.com> wrote:
>
> On Fri, Jul 16, 2021, Brijesh Singh wrote:
> >
> > On 7/16/21 3:18 PM, Sean Christopherson wrote:
> > > On Wed, Jul 07, 2021, Brijesh Singh wrote:
> > >> +  data->gctx_paddr = __psp_pa(sev->snp_context);
> > >> +  ret = sev_issue_cmd(kvm, SEV_CMD_SNP_LAUNCH_FINISH, data, &argp->error);
> > > Shouldn't KVM unwind everything it did if LAUNCH_FINISH fails?  And if that's
> > > not possible, take steps to make the VM unusable?
> >
> > Well, I am not sure if VM need to unwind. If the command fail but VMM decide
> > to ignore the error then VMRUN will probably fail and user will get the KVM
> > shutdown event. The LAUNCH_FINISH command finalizes the VM launch process,
> > the firmware will probably not load the memory encryption keys until it moves
> > to the running state.
>
> Within reason, KVM needs to provide consistent, deterministic behavior.  Yes, more
> than likely failure at this point will be fatal to the VM, but that doesn't justify
> leaving the VM in a random/bogus state.  In addition to being a poor ABI, it also
> makes it more difficult to reason about what is/isn't possible in KVM.

+1 to Sean's feedback to unwind everything here properly here.
Comments of the nature of "XYZ should happen" -- without a test (e.g.,
selftest or kvm-unit-test) to ensure the XYZ _does_ happen -- are a
time bomb waiting to happen.

Also, I wonder if we leave pages, RMPUPDATE'd to immutable in previous
loop iterations, is it possible for them to remain as immutable and be
reused later on (after this guest is destroyed)? And if this happens,
will we get an RMP violation? Even if the answer is no -- go read this
code 2,000 lines away -- it handles this case. That's still not a very
satisfying answer. I'd rather see things cleaned up ASAP as soon as
the code starts to go off the rails.
diff mbox series

Patch

diff --git a/Documentation/virt/kvm/amd-memory-encryption.rst b/Documentation/virt/kvm/amd-memory-encryption.rst
index 60ace54438c3..a3d863e88869 100644
--- a/Documentation/virt/kvm/amd-memory-encryption.rst
+++ b/Documentation/virt/kvm/amd-memory-encryption.rst
@@ -495,6 +495,28 @@  Returns: 0 on success, -negative on error
 See the SEV-SNP spec for further details on how to build the VMPL permission
 mask and page type.
 
+21. KVM_SNP_LAUNCH_FINISH
+-------------------------
+
+After completion of the SNP guest launch flow, the KVM_SNP_LAUNCH_FINISH command can be
+issued to make the guest ready for the execution.
+
+Parameters (in): struct kvm_sev_snp_launch_finish
+
+Returns: 0 on success, -negative on error
+
+::
+
+        struct kvm_sev_snp_launch_finish {
+                __u64 id_block_uaddr;
+                __u64 id_auth_uaddr;
+                __u8 id_block_en;
+                __u8 auth_key_en;
+                __u8 host_data[32];
+        };
+
+
+See SEV-SNP specification for further details on launch finish input parameters.
 
 References
 ==========
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 4468995dd209..3f8824c9a5dc 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -1763,6 +1763,111 @@  static int snp_launch_update(struct kvm *kvm, struct kvm_sev_cmd *argp)
 	return ret;
 }
 
+static int snp_launch_update_vmsa(struct kvm *kvm, struct kvm_sev_cmd *argp)
+{
+	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+	struct sev_data_snp_launch_update data = {};
+	int i, ret;
+
+	data.gctx_paddr = __psp_pa(sev->snp_context);
+	data.page_type = SNP_PAGE_TYPE_VMSA;
+
+	for (i = 0; i < kvm->created_vcpus; i++) {
+		struct vcpu_svm *svm = to_svm(kvm->vcpus[i]);
+		struct rmpupdate e = {};
+
+		/* Perform some pre-encryption checks against the VMSA */
+		ret = sev_es_sync_vmsa(svm);
+		if (ret)
+			return ret;
+
+		/* Transition the VMSA page to a firmware state. */
+		e.assigned = 1;
+		e.immutable = 1;
+		e.asid = sev->asid;
+		e.gpa = -1;
+		e.pagesize = RMP_PG_SIZE_4K;
+		ret = rmpupdate(virt_to_page(svm->vmsa), &e);
+		if (ret)
+			return ret;
+
+		/* Issue the SNP command to encrypt the VMSA */
+		data.address = __sme_pa(svm->vmsa);
+		ret = __sev_issue_cmd(argp->sev_fd, SEV_CMD_SNP_LAUNCH_UPDATE,
+				      &data, &argp->error);
+		if (ret) {
+			snp_page_reclaim(virt_to_page(svm->vmsa), RMP_PG_SIZE_4K);
+			return ret;
+		}
+
+		svm->vcpu.arch.guest_state_protected = true;
+	}
+
+	return 0;
+}
+
+static int snp_launch_finish(struct kvm *kvm, struct kvm_sev_cmd *argp)
+{
+	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+	struct sev_data_snp_launch_finish *data;
+	void *id_block = NULL, *id_auth = NULL;
+	struct kvm_sev_snp_launch_finish params;
+	int ret;
+
+	if (!sev_snp_guest(kvm))
+		return -ENOTTY;
+
+	if (!sev->snp_context)
+		return -EINVAL;
+
+	if (copy_from_user(&params, (void __user *)(uintptr_t)argp->data, sizeof(params)))
+		return -EFAULT;
+
+	/* Measure all vCPUs using LAUNCH_UPDATE before we finalize the launch flow. */
+	ret = snp_launch_update_vmsa(kvm, argp);
+	if (ret)
+		return ret;
+
+	data = kzalloc(sizeof(*data), GFP_KERNEL_ACCOUNT);
+	if (!data)
+		return -ENOMEM;
+
+	if (params.id_block_en) {
+		id_block = psp_copy_user_blob(params.id_block_uaddr, KVM_SEV_SNP_ID_BLOCK_SIZE);
+		if (IS_ERR(id_block)) {
+			ret = PTR_ERR(id_block);
+			goto e_free;
+		}
+
+		data->id_block_en = 1;
+		data->id_block_paddr = __sme_pa(id_block);
+	}
+
+	if (params.auth_key_en) {
+		id_auth = psp_copy_user_blob(params.id_auth_uaddr, KVM_SEV_SNP_ID_AUTH_SIZE);
+		if (IS_ERR(id_auth)) {
+			ret = PTR_ERR(id_auth);
+			goto e_free_id_block;
+		}
+
+		data->auth_key_en = 1;
+		data->id_auth_paddr = __sme_pa(id_auth);
+	}
+
+	data->gctx_paddr = __psp_pa(sev->snp_context);
+	ret = sev_issue_cmd(kvm, SEV_CMD_SNP_LAUNCH_FINISH, data, &argp->error);
+
+	kfree(id_auth);
+
+e_free_id_block:
+	kfree(id_block);
+
+e_free:
+	kfree(data);
+
+	return ret;
+}
+
 int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
 {
 	struct kvm_sev_cmd sev_cmd;
@@ -1858,6 +1963,9 @@  int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
 	case KVM_SEV_SNP_LAUNCH_UPDATE:
 		r = snp_launch_update(kvm, &sev_cmd);
 		break;
+	case KVM_SEV_SNP_LAUNCH_FINISH:
+		r = snp_launch_finish(kvm, &sev_cmd);
+		break;
 	default:
 		r = -EINVAL;
 		goto out;
@@ -2346,8 +2454,25 @@  void sev_free_vcpu(struct kvm_vcpu *vcpu)
 
 	if (vcpu->arch.guest_state_protected)
 		sev_flush_guest_memory(svm, svm->vmsa, PAGE_SIZE);
+
+	/*
+	 * If its an SNP guest, then VMSA was added in the RMP entry as a guest owned page.
+	 * Transition the page to hyperivosr state before releasing it back to the system.
+	 */
+	if (sev_snp_guest(vcpu->kvm)) {
+		struct rmpupdate e = {};
+		int rc;
+
+		rc = rmpupdate(virt_to_page(svm->vmsa), &e);
+		if (rc) {
+			pr_err("Failed to release SNP guest VMSA page (rc %d), leaking it\n", rc);
+			goto skip_vmsa_free;
+		}
+	}
+
 	__free_page(virt_to_page(svm->vmsa));
 
+skip_vmsa_free:
 	if (svm->ghcb_sa_free)
 		kfree(svm->ghcb_sa);
 }
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index c9b453fb31d4..fb3f6e1defd9 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1682,6 +1682,7 @@  enum sev_cmd_id {
 	KVM_SEV_SNP_INIT = 256,
 	KVM_SEV_SNP_LAUNCH_START,
 	KVM_SEV_SNP_LAUNCH_UPDATE,
+	KVM_SEV_SNP_LAUNCH_FINISH,
 
 	KVM_SEV_NR_MAX,
 };
@@ -1808,6 +1809,18 @@  struct kvm_sev_snp_launch_update {
 	__u8 vmpl1_perms;
 };
 
+#define KVM_SEV_SNP_ID_BLOCK_SIZE	96
+#define KVM_SEV_SNP_ID_AUTH_SIZE	4096
+#define KVM_SEV_SNP_FINISH_DATA_SIZE	32
+
+struct kvm_sev_snp_launch_finish {
+	__u64 id_block_uaddr;
+	__u64 id_auth_uaddr;
+	__u8 id_block_en;
+	__u8 auth_key_en;
+	__u8 host_data[KVM_SEV_SNP_FINISH_DATA_SIZE];
+};
+
 #define KVM_DEV_ASSIGN_ENABLE_IOMMU	(1 << 0)
 #define KVM_DEV_ASSIGN_PCI_2_3		(1 << 1)
 #define KVM_DEV_ASSIGN_MASK_INTX	(1 << 2)