[Part2,v5,44/45] KVM: SVM: Support SEV-SNP AP Creation NAE event

Message ID 20210820155918.7518-45-brijesh.singh@amd.com (mailing list archive)
State Not Applicable
Delegated to: Herbert Xu
Series Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support

Commit Message

Brijesh Singh Aug. 20, 2021, 3:59 p.m. UTC
From: Tom Lendacky <thomas.lendacky@amd.com>

Add support for the SEV-SNP AP Creation NAE event. This allows SEV-SNP
guests to alter the register state of the APs on their own. This allows
the guest a way of simulating INIT-SIPI.

A new event, KVM_REQ_UPDATE_PROTECTED_GUEST_STATE, is created and used
so as to avoid updating the VMSA pointer while the vCPU is running.

For CREATE:
  The guest supplies the GPA of the VMSA to be used for the vCPU with the
  specified APIC ID. The GPA is saved in the svm struct of the target
  vCPU, the KVM_REQ_UPDATE_PROTECTED_GUEST_STATE event is added to the
  vCPU and then the vCPU is kicked.

For CREATE_ON_INIT:
  The guest supplies the GPA of the VMSA to be used for the vCPU with the
  specified APIC ID the next time an INIT is performed. The GPA is saved
  in the svm struct of the target vCPU.

For DESTROY:
  The guest indicates it wishes to stop the vCPU. The GPA is cleared from
  the svm struct, the KVM_REQ_UPDATE_PROTECTED_GUEST_STATE event is added
  to the vCPU and then the vCPU is kicked.


The KVM_REQ_UPDATE_PROTECTED_GUEST_STATE event handler will be invoked as
a result of the event or as a result of an INIT. The handler sets the vCPU
to the KVM_MP_STATE_UNINITIALIZED state, so that any errors will leave the
vCPU as not runnable. Any previous VMSA pages that were installed as
part of an SEV-SNP AP Creation NAE event are un-pinned. If a new VMSA is
to be installed, the VMSA guest page is pinned and set as the VMSA in the
vCPU VMCB and the vCPU state is set to KVM_MP_STATE_RUNNABLE. If a new
VMSA is not to be installed, the VMSA is cleared in the vCPU VMCB and the
vCPU state is left as KVM_MP_STATE_UNINITIALIZED to prevent it from being
run.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
---
 arch/x86/include/asm/kvm-x86-ops.h |   1 +
 arch/x86/include/asm/kvm_host.h    |   3 +
 arch/x86/include/asm/svm.h         |   7 +-
 arch/x86/kvm/svm/sev.c             | 211 +++++++++++++++++++++++++++++
 arch/x86/kvm/svm/svm.c             |   6 +-
 arch/x86/kvm/svm/svm.h             |   9 ++
 arch/x86/kvm/x86.c                 |  13 +-
 7 files changed, 247 insertions(+), 3 deletions(-)
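
As a rough illustration of the three request types described in the commit
message, the following is a minimal user-space sketch, not the kernel code.
It assumes the request encodings CREATE_ON_INIT = 0, CREATE = 1 and
DESTROY = 2 (their defining header is not part of this patch), and it uses a
hypothetical struct ap_model in place of the fields added to struct vcpu_svm.
The APIC ID lookup, the injection mode check, the VMSA address validation and
the locking are all left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed request encodings; the defining header is not shown in this patch. */
enum ap_request {
	AP_CREATE_ON_INIT = 0,
	AP_CREATE         = 1,
	AP_DESTROY        = 2,
};

#define MODEL_INVALID_GPA UINT64_MAX	/* stand-in for INVALID_PAGE */

/* Hypothetical, simplified stand-in for the per-vCPU fields in the patch. */
struct ap_model {
	uint64_t vmsa_gpa;	/* GPA of the VMSA to install, if any */
	bool update_on_init;	/* defer the update until the next INIT */
	bool kicked;		/* would the target vCPU be kicked? */
};

/*
 * exit_info_1 carries the request in its lower 32 bits and the target
 * APIC ID in its upper 32 bits; exit_info_2 carries the VMSA GPA.
 * Returns 0 on success, -1 for an unknown request.
 */
static int ap_creation_model(struct ap_model *ap, uint64_t exit_info_1,
			     uint64_t exit_info_2, uint32_t *apic_id)
{
	uint32_t request = (uint32_t)exit_info_1;
	int ret = 0;

	assert(ap != NULL && apic_id != NULL);
	*apic_id = (uint32_t)(exit_info_1 >> 32);

	/* Start from "no VMSA, kick the target", as the handler does. */
	ap->vmsa_gpa = MODEL_INVALID_GPA;
	ap->update_on_init = false;
	ap->kicked = true;

	switch (request) {
	case AP_CREATE_ON_INIT:
		ap->kicked = false;
		ap->update_on_init = true;
		/* fall through */
	case AP_CREATE:
		ap->vmsa_gpa = exit_info_2;
		break;
	case AP_DESTROY:
		break;
	default:
		ret = -1;	/* still kicked, so the target ends up not runnable */
		break;
	}

	return ret;
}
```

Only CREATE_ON_INIT suppresses the kick; every other path, including the
error path, kicks the target so that the update handler runs.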

Comments

Sean Christopherson Oct. 15, 2021, 7:50 p.m. UTC | #1
On Fri, Aug 20, 2021, Brijesh Singh wrote:
> From: Tom Lendacky <thomas.lendacky@amd.com>
> 
> Add support for the SEV-SNP AP Creation NAE event. This allows SEV-SNP
> guests to alter the register state of the APs on their own. This allows
> the guest a way of simulating INIT-SIPI.
> 
> A new event, KVM_REQ_UPDATE_PROTECTED_GUEST_STATE, is created and used
> so as to avoid updating the VMSA pointer while the vCPU is running.
> 
> For CREATE
>   The guest supplies the GPA of the VMSA to be used for the vCPU with the
>   specified APIC ID. The GPA is saved in the svm struct of the target
>   vCPU, the KVM_REQ_UPDATE_PROTECTED_GUEST_STATE event is added to the
>   vCPU and then the vCPU is kicked.
> 
> For CREATE_ON_INIT:
>   The guest supplies the GPA of the VMSA to be used for the vCPU with the
>   specified APIC ID the next time an INIT is performed. The GPA is saved
>   in the svm struct of the target vCPU.
> 
> For DESTROY:
>   The guest indicates it wishes to stop the vCPU. The GPA is cleared from
>   the svm struct, the KVM_REQ_UPDATE_PROTECTED_GUEST_STATE event is added
>   to vCPU and then the vCPU is kicked.
> 
> 
> The KVM_REQ_UPDATE_PROTECTED_GUEST_STATE event handler will be invoked as
> a result of the event or as a result of an INIT. The handler sets the vCPU
> to the KVM_MP_STATE_UNINITIALIZED state, so that any errors will leave the
> vCPU as not runnable. Any previous VMSA pages that were installed as
> part of an SEV-SNP AP Creation NAE event are un-pinned. If a new VMSA is
> to be installed, the VMSA guest page is pinned and set as the VMSA in the
> vCPU VMCB and the vCPU state is set to KVM_MP_STATE_RUNNABLE. If a new
> VMSA is not to be installed, the VMSA is cleared in the vCPU VMCB and the
> vCPU state is left as KVM_MP_STATE_UNINITIALIZED to prevent it from being
> run.

LOL, this part of the GHCB spec is debatable, though I guess it does say "may"...

  Using VMGEXIT SW_EXITCODE 0x8000_0013, an SEV-SNP guest can create or update the
  vCPU state of an AP, which may allow for a simpler and more secure method of
                                             ^^^^^^^
  booting an AP.

> +	if (VALID_PAGE(svm->snp_vmsa_pfn)) {

KVM's VMSA page should be freed on a successful "switch", because AFAICT it's
incorrect for KVM to ever go back to the original VMSA.

> +		/*
> +		 * The snp_vmsa_pfn field holds the hypervisor physical address
> +		 * of the about to be replaced VMSA which will no longer be used
> +		 * or referenced, so un-pin it.
> +		 */
> +		kvm_release_pfn_dirty(svm->snp_vmsa_pfn);
> +		svm->snp_vmsa_pfn = INVALID_PAGE;
> +	}
> +
> +	if (VALID_PAGE(svm->snp_vmsa_gpa)) {
> +		/*
> +		 * The VMSA is referenced by the hypervisor physical address,
> +		 * so retrieve the PFN and pin it.
> +		 */
> +		pfn = gfn_to_pfn(vcpu->kvm, gpa_to_gfn(svm->snp_vmsa_gpa));

Oh yay, a gfn.  That means that the page is subject to memslot movement.  I don't
think the code will break per se, but it's a wrinkle that's not handled.

I'm also pretty sure the page will effectively be leaked, I don't see a

	kvm_release_pfn_dirty(svm->snp_vmsa_pfn);

in vCPU teardown.

Furthermore, letting the guest specify the page would open KVM up to exploits of
the erratum where a spurious RMP violation is signaled if an in-use page, a.k.a.
VMSA page, is 2MB aligned.  That also means the _guest_ needs to somehow be aware
of the erratum.

And digging through the guest patches, this gives the guest _full_ control over
the VMSA contents.  That is bonkers.  At _best_ it gives the guest the ability to
fuzz VMRUN ucode by stuffing garbage into the VMSA.

Honestly, why should KVM even support guest-provided VMSAs?  It's far, far simpler
to handle this fully in the guest with a BIOS<=>kernel mailbox; see the MP wakeup
protocol being added for TDX.  That would allow improving the security for SEV-ES
as well, though I'm guessing no one actually cares about that in practice.

IIUC, the use case for VMPLs is that VMPL0 would be fully trusted by both the host
and guest, i.e. attacks via the VMSA are out-of-scope.  That is very much not the
case here.

Brijesh Singh Oct. 20, 2021, 9:48 p.m. UTC | #2
On 10/15/21 2:50 PM, Sean Christopherson wrote:
> On Fri, Aug 20, 2021, Brijesh Singh wrote:
>> From: Tom Lendacky <thomas.lendacky@amd.com>
>>
>> Add support for the SEV-SNP AP Creation NAE event. This allows SEV-SNP
>> guests to alter the register state of the APs on their own. This allows
>> the guest a way of simulating INIT-SIPI.
>>
>> A new event, KVM_REQ_UPDATE_PROTECTED_GUEST_STATE, is created and used
>> so as to avoid updating the VMSA pointer while the vCPU is running.
>>
>> For CREATE
>>   The guest supplies the GPA of the VMSA to be used for the vCPU with the
>>   specified APIC ID. The GPA is saved in the svm struct of the target
>>   vCPU, the KVM_REQ_UPDATE_PROTECTED_GUEST_STATE event is added to the
>>   vCPU and then the vCPU is kicked.
>>
>> For CREATE_ON_INIT:
>>   The guest supplies the GPA of the VMSA to be used for the vCPU with the
>>   specified APIC ID the next time an INIT is performed. The GPA is saved
>>   in the svm struct of the target vCPU.
>>
>> For DESTROY:
>>   The guest indicates it wishes to stop the vCPU. The GPA is cleared from
>>   the svm struct, the KVM_REQ_UPDATE_PROTECTED_GUEST_STATE event is added
>>   to vCPU and then the vCPU is kicked.
>>
>>
>> The KVM_REQ_UPDATE_PROTECTED_GUEST_STATE event handler will be invoked as
>> a result of the event or as a result of an INIT. The handler sets the vCPU
>> to the KVM_MP_STATE_UNINITIALIZED state, so that any errors will leave the
>> vCPU as not runnable. Any previous VMSA pages that were installed as
>> part of an SEV-SNP AP Creation NAE event are un-pinned. If a new VMSA is
>> to be installed, the VMSA guest page is pinned and set as the VMSA in the
>> vCPU VMCB and the vCPU state is set to KVM_MP_STATE_RUNNABLE. If a new
>> VMSA is not to be installed, the VMSA is cleared in the vCPU VMCB and the
>> vCPU state is left as KVM_MP_STATE_UNINITIALIZED to prevent it from being
>> run.
> LOL, this part of the GHCB is debatable, though I guess it does say "may"...
>
>   Using VMGEXIT SW_EXITCODE 0x8000_0013, an SEV-SNP guest can create or update the
>   vCPU state of an AP, which may allow for a simpler and more secure method of
>                                              ^^^^^^^
>   booting an AP.
>
>> +	if (VALID_PAGE(svm->snp_vmsa_pfn)) {
> KVM's VMSA page should be freed on a successful "switch", because AFAICT it's
> incorrect for KVM to ever go back to the original VMSA.
>
>> +		/*
>> +		 * The snp_vmsa_pfn field holds the hypervisor physical address
>> +		 * of the about to be replaced VMSA which will no longer be used
>> +		 * or referenced, so un-pin it.
>> +		 */
>> +		kvm_release_pfn_dirty(svm->snp_vmsa_pfn);
>> +		svm->snp_vmsa_pfn = INVALID_PAGE;
>> +	}
>> +
>> +	if (VALID_PAGE(svm->snp_vmsa_gpa)) {
>> +		/*
>> +		 * The VMSA is referenced by the hypervisor physical address,
>> +		 * so retrieve the PFN and pin it.
>> +		 */
>> +		pfn = gfn_to_pfn(vcpu->kvm, gpa_to_gfn(svm->snp_vmsa_gpa));
> Oh yay, a gfn.  That means that the page is subject to memslot movement.  I don't
> think the code will break per se, but it's a wrinkle that's not handled.
>
> I'm also pretty sure the page will effectively be leaked, I don't see a
>
> 	kvm_release_pfn_dirty(svm->snp_vmsa_pfn);
>
> in vCPU teardown.
>
> Furthermore, letting the guest specify the page would open KVM up to exploits of
> the erratum where a spurious RMP violation is signaled if an in-use page, a.k.a.
> VMSA page, is 2MB aligned.  That also means the _guest_ needs to somehow be aware
> of the erratum.

Good point Sean, a guest could exploit the IN_USE erratum in this case.
We need to somehow communicate this to the guest so that it does not
allocate the VMSA at a 2MB boundary. It would be nice if the GHCB spec
could add a requirement that the VMSA must not be 2MB aligned. I will see
what we can do to address this.
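
A minimal sketch of the kind of alignment check this would imply, assuming
the address is passed as a plain 64-bit value. The helper names are
hypothetical, and whether a check on the guest physical address alone is
enough to sidestep the erratum is not settled in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_SZ_4K (UINT64_C(1) << 12)
#define MODEL_SZ_2M (UINT64_C(1) << 21)

/* True if the address sits exactly on a 2MB boundary. */
static bool vmsa_addr_is_2m_aligned(uint64_t addr)
{
	return (addr & (MODEL_SZ_2M - 1)) == 0;
}

/*
 * Hypothetical allocator helper: given a 4K-aligned candidate, skip to the
 * next 4K page if the candidate is 2MB aligned. Assumes the caller also
 * owns the following page.
 */
static uint64_t vmsa_pick_addr(uint64_t candidate)
{
	assert((candidate & (MODEL_SZ_4K - 1)) == 0);

	return vmsa_addr_is_2m_aligned(candidate) ? candidate + MODEL_SZ_4K
						  : candidate;
}
```

The same predicate could be used on the host side to reject a guest-supplied
VMSA address instead of trusting the guest to avoid the boundary.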


> And digging through the guest patches, this gives the guest _full_ control over
> the VMSA contents.  That is bonkers.  At _best_ it gives the guest the ability to
> fuzz VMRUN ucode by stuffing garbage into the VMSA.

If the guest puts garbage in the VMSA then VMRUN will fail. I am sure ucode
is doing all kinds of sanity checks to ensure that the VMSA does not contain
invalid values before the run.


> Honestly, why should KVM even support guest-provided VMSAs?  It's far, far simpler
> to handle this fully in the guest with a BIOS<=>kernel mailbox; see the MP wakeup
> protocol being added for TDX.  That would allow improving the security for SEV-ES
> as well, though I'm guessing no one actually cares about that in practice.
>
> IIUC, the use case for VMPLs is that VMPL0 would be fully trusted by both the host
> and guest, i.e. attacks via the VMSA are out-of-scope.  That is very much not the
> case here.

Sean Christopherson Oct. 20, 2021, 11:01 p.m. UTC | #3
On Wed, Oct 20, 2021, Brijesh Singh wrote:
> 
> On 10/15/21 2:50 PM, Sean Christopherson wrote:
> > And digging through the guest patches, this gives the guest _full_ control over
> > the VMSA contents.  That is bonkers.  At _best_ it gives the guest the ability to
> > fuzz VMRUN ucode by stuffing garbage into the VMSA.
> 
> If the guest puts garbage in the VMSA then VMRUN will fail. I am sure ucode
> is doing all kinds of sanity checks to ensure that the VMSA does not contain
> invalid values before the run.

Oh, I'm well aware of the number of sanity checks that are in VM-Enter ucode, and
that's precisely why I'm of the opinion that letting the guest fuzz VMRUN is a
non-trivial security risk for the host.  I know of at least two VMX bugs
(one erratum that I could find, one that must have been fixed with a ucode patch?)
where ucode failed to detect invalid state.  Those were "benign" in that they
caused a missed VM-Fail but didn't corrupt CPU state, but it's not a stretch to
imagine a ucode bug that leads to corruption of CPU state and a system crash.

The sheer number of checks involved, combined with the fact that there likely
hasn't been much fuzzing of VM-Enter outside of the hardware vendor's own
validation, means I'm not exactly brimming with confidence that VMRUN's ucode
is perfect.

I fully acknowledge that the host kernel obviously "trusts" CPU ucode to a great
extent.  My point here is that the design exposes the host to unnecessary risk.

Patch

diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index c09bd40e0160..01f31957bd7d 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -126,6 +126,7 @@  KVM_X86_OP(alloc_apic_backing_page)
 KVM_X86_OP_NULL(rmp_page_level_adjust)
 KVM_X86_OP(post_map_gfn)
 KVM_X86_OP(post_unmap_gfn)
+KVM_X86_OP(update_protected_guest_state)
 
 #undef KVM_X86_OP
 #undef KVM_X86_OP_NULL
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 8773c1f9e45e..11ce66fe1656 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -91,6 +91,7 @@ 
 #define KVM_REQ_MSR_FILTER_CHANGED	KVM_ARCH_REQ(29)
 #define KVM_REQ_UPDATE_CPU_DIRTY_LOGGING \
 	KVM_ARCH_REQ_FLAGS(30, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
+#define KVM_REQ_UPDATE_PROTECTED_GUEST_STATE	KVM_ARCH_REQ(31)
 
 #define CR0_RESERVED_BITS                                               \
 	(~(unsigned long)(X86_CR0_PE | X86_CR0_MP | X86_CR0_EM | X86_CR0_TS \
@@ -1468,6 +1469,8 @@  struct kvm_x86_ops {
 
 	int (*post_map_gfn)(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn, int *token);
 	void (*post_unmap_gfn)(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn, int token);
+
+	int (*update_protected_guest_state)(struct kvm_vcpu *vcpu);
 };
 
 struct kvm_x86_nested_ops {
diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index a39e31845a33..cf7c88a0d60a 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -218,7 +218,12 @@  struct __attribute__ ((__packed__)) vmcb_control_area {
 #define SVM_NESTED_CTL_SEV_ENABLE	BIT(1)
 #define SVM_NESTED_CTL_SEV_ES_ENABLE	BIT(2)
 
-#define SVM_SEV_FEAT_SNP_ACTIVE		BIT(0)
+#define SVM_SEV_FEAT_SNP_ACTIVE			BIT(0)
+#define SVM_SEV_FEAT_RESTRICTED_INJECTION	BIT(3)
+#define SVM_SEV_FEAT_ALTERNATE_INJECTION	BIT(4)
+#define SVM_SEV_FEAT_INT_INJ_MODES		\
+	(SVM_SEV_FEAT_RESTRICTED_INJECTION |	\
+	 SVM_SEV_FEAT_ALTERNATE_INJECTION)
 
 struct vmcb_seg {
 	u16 selector;
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 05f795c30816..151747ec0809 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -649,6 +649,7 @@  static int sev_launch_update_data(struct kvm *kvm, struct kvm_sev_cmd *argp)
 
 static int sev_es_sync_vmsa(struct vcpu_svm *svm)
 {
+	struct kvm_sev_info *sev = &to_kvm_svm(svm->vcpu.kvm)->sev_info;
 	struct sev_es_save_area *save = svm->vmsa;
 
 	/* Check some debug related fields before encrypting the VMSA */
@@ -693,6 +694,12 @@  static int sev_es_sync_vmsa(struct vcpu_svm *svm)
 	if (sev_snp_guest(svm->vcpu.kvm))
 		save->sev_features |= SVM_SEV_FEAT_SNP_ACTIVE;
 
+	/*
+	 * Save the VMSA synced SEV features. For now, they are the same for
+	 * all vCPUs, so just save each time.
+	 */
+	sev->sev_features = save->sev_features;
+
 	return 0;
 }
 
@@ -2760,6 +2767,10 @@  static int sev_es_validate_vmgexit(struct vcpu_svm *svm, u64 *exit_code)
 		if (!ghcb_sw_scratch_is_valid(ghcb))
 			goto vmgexit_err;
 		break;
+	case SVM_VMGEXIT_AP_CREATION:
+		if (!ghcb_rax_is_valid(ghcb))
+			goto vmgexit_err;
+		break;
 	case SVM_VMGEXIT_NMI_COMPLETE:
 	case SVM_VMGEXIT_AP_HLT_LOOP:
 	case SVM_VMGEXIT_AP_JUMP_TABLE:
@@ -3332,6 +3343,191 @@  static void snp_handle_ext_guest_request(struct vcpu_svm *svm, gpa_t req_gpa, gp
 	svm_set_ghcb_sw_exit_info_2(vcpu, rc);
 }
 
+static int __sev_snp_update_protected_guest_state(struct kvm_vcpu *vcpu)
+{
+	struct vcpu_svm *svm = to_svm(vcpu);
+	kvm_pfn_t pfn;
+
+	WARN_ON(!mutex_is_locked(&svm->snp_vmsa_mutex));
+
+	/* Mark the vCPU as offline and not runnable */
+	vcpu->arch.pv.pv_unhalted = false;
+	vcpu->arch.mp_state = KVM_MP_STATE_STOPPED;
+
+	/* Clear use of the VMSA in the sev_es_init_vmcb() path */
+	svm->vmsa_pa = INVALID_PAGE;
+
+	/* Clear use of the VMSA from the VMCB */
+	svm->vmcb->control.vmsa_pa = INVALID_PAGE;
+
+	if (VALID_PAGE(svm->snp_vmsa_pfn)) {
+		/*
+		 * The snp_vmsa_pfn field holds the hypervisor physical address
+		 * of the about to be replaced VMSA which will no longer be used
+		 * or referenced, so un-pin it.
+		 */
+		kvm_release_pfn_dirty(svm->snp_vmsa_pfn);
+		svm->snp_vmsa_pfn = INVALID_PAGE;
+	}
+
+	if (VALID_PAGE(svm->snp_vmsa_gpa)) {
+		/*
+		 * The VMSA is referenced by the hypervisor physical address,
+		 * so retrieve the PFN and pin it.
+		 */
+		pfn = gfn_to_pfn(vcpu->kvm, gpa_to_gfn(svm->snp_vmsa_gpa));
+		if (is_error_pfn(pfn))
+			return -EINVAL;
+
+		svm->snp_vmsa_pfn = pfn;
+
+		/* Use the new VMSA in the sev_es_init_vmcb() path */
+		svm->vmsa_pa = pfn_to_hpa(pfn);
+		svm->vmcb->control.vmsa_pa = svm->vmsa_pa;
+
+		/* Mark the vCPU as runnable */
+		vcpu->arch.pv.pv_unhalted = false;
+		vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
+	}
+
+	return 0;
+}
+
+/*
+ * Invoked as part of vcpu_enter_guest() event processing.
+ * Expected return values are:
+ *   0 - exit to userspace
+ *   1 - continue vcpu_run() execution loop
+ */
+int sev_snp_update_protected_guest_state(struct kvm_vcpu *vcpu)
+{
+	struct vcpu_svm *svm = to_svm(vcpu);
+	int ret;
+
+	mutex_lock(&svm->snp_vmsa_mutex);
+
+	ret = __sev_snp_update_protected_guest_state(vcpu);
+	if (ret)
+		vcpu_unimpl(vcpu, "snp: AP state update failed\n");
+
+	mutex_unlock(&svm->snp_vmsa_mutex);
+
+	return ret ? 0 : 1;
+}
+
+/*
+ * Invoked as part of svm_vcpu_reset() processing of an init event.
+ */
+void sev_snp_init_protected_guest_state(struct kvm_vcpu *vcpu)
+{
+	struct vcpu_svm *svm = to_svm(vcpu);
+	int ret;
+
+	if (!sev_snp_guest(vcpu->kvm))
+		return;
+
+	mutex_lock(&svm->snp_vmsa_mutex);
+
+	if (!svm->snp_vmsa_update_on_init)
+		goto unlock;
+
+	svm->snp_vmsa_update_on_init = false;
+
+	ret = __sev_snp_update_protected_guest_state(vcpu);
+	if (ret)
+		vcpu_unimpl(vcpu, "snp: AP state update on init failed\n");
+
+unlock:
+	mutex_unlock(&svm->snp_vmsa_mutex);
+}
+
+static int sev_snp_ap_creation(struct vcpu_svm *svm)
+{
+	struct kvm_sev_info *sev = &to_kvm_svm(svm->vcpu.kvm)->sev_info;
+	struct kvm_vcpu *vcpu = &svm->vcpu;
+	struct kvm_vcpu *target_vcpu;
+	struct vcpu_svm *target_svm;
+	unsigned int request;
+	unsigned int apic_id;
+	bool kick;
+	int ret;
+
+	request = lower_32_bits(svm->vmcb->control.exit_info_1);
+	apic_id = upper_32_bits(svm->vmcb->control.exit_info_1);
+
+	/* Validate the APIC ID */
+	target_vcpu = kvm_get_vcpu_by_id(vcpu->kvm, apic_id);
+	if (!target_vcpu) {
+		vcpu_unimpl(vcpu, "vmgexit: invalid AP APIC ID [%#x] from guest\n",
+			    apic_id);
+		return -EINVAL;
+	}
+
+	ret = 0;
+
+	target_svm = to_svm(target_vcpu);
+
+	/*
+	 * We have a valid target vCPU, so the vCPU will be kicked unless the
+	 * request is for CREATE_ON_INIT. For any errors at this stage, the
+	 * kick will place the vCPU in a non-runnable state.
+	 */
+	kick = true;
+
+	mutex_lock(&target_svm->snp_vmsa_mutex);
+
+	target_svm->snp_vmsa_gpa = INVALID_PAGE;
+	target_svm->snp_vmsa_update_on_init = false;
+
+	/* Interrupt injection mode shouldn't change for AP creation */
+	if (request < SVM_VMGEXIT_AP_DESTROY) {
+		u64 sev_features;
+
+		sev_features = vcpu->arch.regs[VCPU_REGS_RAX];
+		sev_features ^= sev->sev_features;
+		if (sev_features & SVM_SEV_FEAT_INT_INJ_MODES) {
+			vcpu_unimpl(vcpu, "vmgexit: invalid AP injection mode [%#lx] from guest\n",
+				    vcpu->arch.regs[VCPU_REGS_RAX]);
+			ret = -EINVAL;
+			goto out;
+		}
+	}
+
+	switch (request) {
+	case SVM_VMGEXIT_AP_CREATE_ON_INIT:
+		kick = false;
+		target_svm->snp_vmsa_update_on_init = true;
+		fallthrough;
+	case SVM_VMGEXIT_AP_CREATE:
+		if (!page_address_valid(vcpu, svm->vmcb->control.exit_info_2)) {
+			vcpu_unimpl(vcpu, "vmgexit: invalid AP VMSA address [%#llx] from guest\n",
+				    svm->vmcb->control.exit_info_2);
+			ret = -EINVAL;
+			goto out;
+		}
+
+		target_svm->snp_vmsa_gpa = svm->vmcb->control.exit_info_2;
+		break;
+	case SVM_VMGEXIT_AP_DESTROY:
+		break;
+	default:
+		vcpu_unimpl(vcpu, "vmgexit: invalid AP creation request [%#x] from guest\n",
+			    request);
+		ret = -EINVAL;
+		break;
+	}
+
+out:
+	mutex_unlock(&target_svm->snp_vmsa_mutex);
+
+	if (kick) {
+		kvm_make_request(KVM_REQ_UPDATE_PROTECTED_GUEST_STATE, target_vcpu);
+		kvm_vcpu_kick(target_vcpu);
+	}
+
+	return ret;
+}
+
 static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
 {
 	struct vmcb_control_area *control = &svm->vmcb->control;
@@ -3589,6 +3785,18 @@  int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
 		ret = 1;
 		break;
 	}
+	case SVM_VMGEXIT_AP_CREATION:
+		ret = sev_snp_ap_creation(svm);
+		if (ret) {
+			svm_set_ghcb_sw_exit_info_1(vcpu, 1);
+			svm_set_ghcb_sw_exit_info_2(vcpu,
+						    X86_TRAP_GP |
+						    SVM_EVTINJ_TYPE_EXEPT |
+						    SVM_EVTINJ_VALID);
+		}
+
+		ret = 1;
+		break;
 	case SVM_VMGEXIT_UNSUPPORTED_EVENT:
 		vcpu_unimpl(vcpu,
 			    "vmgexit: unsupported event - exit_info_1=%#llx, exit_info_2=%#llx\n",
@@ -3663,6 +3871,9 @@  void sev_es_create_vcpu(struct vcpu_svm *svm)
 	set_ghcb_msr(svm, GHCB_MSR_SEV_INFO(GHCB_VERSION_MAX,
 					    GHCB_VERSION_MIN,
 					    sev_enc_bit));
+
+	mutex_init(&svm->snp_vmsa_mutex);
+	svm->snp_vmsa_pfn = INVALID_PAGE;
 }
 
 void sev_es_prepare_guest_switch(struct vcpu_svm *svm, unsigned int cpu)
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index be820eb999fb..29e7666a710b 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1336,7 +1336,9 @@  static void svm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
 	svm->spec_ctrl = 0;
 	svm->virt_spec_ctrl = 0;
 
-	if (!init_event) {
+	if (init_event) {
+		sev_snp_init_protected_guest_state(vcpu);
+	} else {
 		vcpu->arch.apic_base = APIC_DEFAULT_PHYS_BASE |
 				       MSR_IA32_APICBASE_ENABLE;
 		if (kvm_vcpu_is_reset_bsp(vcpu))
@@ -4697,6 +4699,8 @@  static struct kvm_x86_ops svm_x86_ops __initdata = {
 
 	.post_map_gfn = sev_post_map_gfn,
 	.post_unmap_gfn = sev_post_unmap_gfn,
+
+	.update_protected_guest_state = sev_snp_update_protected_guest_state,
 };
 
 static struct kvm_x86_init_ops svm_init_ops __initdata = {
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 9bf6404142dd..59044b3a7c7a 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -94,6 +94,8 @@  struct kvm_sev_info {
 	struct srcu_struct psc_srcu;
 	void *snp_certs_data;
 	struct mutex guest_req_lock;
+
+	u64 sev_features;	/* Features set at VMSA creation */
 };
 
 struct kvm_svm {
@@ -221,6 +223,11 @@  struct vcpu_svm {
 	u64 ghcb_sw_exit_info_2;
 
 	u64 ghcb_registered_gpa;
+
+	struct mutex snp_vmsa_mutex;
+	gpa_t snp_vmsa_gpa;
+	kvm_pfn_t snp_vmsa_pfn;
+	bool snp_vmsa_update_on_init;	/* SEV-SNP AP Creation on INIT-SIPI */
 };
 
 struct svm_cpu_data {
@@ -630,6 +637,8 @@  void sev_rmp_page_level_adjust(struct kvm *kvm, kvm_pfn_t pfn, int *level);
 int sev_post_map_gfn(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn, int *token);
 void sev_post_unmap_gfn(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn, int token);
 void handle_rmp_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code);
+void sev_snp_init_protected_guest_state(struct kvm_vcpu *vcpu);
+int sev_snp_update_protected_guest_state(struct kvm_vcpu *vcpu);
 
 /* vmenter.S */
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index bf4389ffc88f..dbb8362cc576 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9576,6 +9576,16 @@  static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 
 		if (kvm_check_request(KVM_REQ_UPDATE_CPU_DIRTY_LOGGING, vcpu))
 			static_call(kvm_x86_update_cpu_dirty_logging)(vcpu);
+
+		if (kvm_check_request(KVM_REQ_UPDATE_PROTECTED_GUEST_STATE, vcpu)) {
+			r = static_call(kvm_x86_update_protected_guest_state)(vcpu);
+			if (!r) {
+				vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
+				goto out;
+			} else if (vcpu->arch.mp_state != KVM_MP_STATE_RUNNABLE) {
+				goto out;
+			}
+		}
 	}
 
 	if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win ||
@@ -11656,7 +11666,8 @@  static inline bool kvm_vcpu_has_events(struct kvm_vcpu *vcpu)
 	if (!list_empty_careful(&vcpu->async_pf.done))
 		return true;
 
-	if (kvm_apic_has_events(vcpu))
+	if (kvm_apic_has_events(vcpu) ||
+	    kvm_test_request(KVM_REQ_UPDATE_PROTECTED_GUEST_STATE, vcpu))
 		return true;
 
 	if (vcpu->arch.pv.pv_unhalted)
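
For reference, the interrupt injection mode check in sev_snp_ap_creation()
can be modelled in isolation. This is a stand-alone sketch, not the kernel
code: the bit positions are taken from the svm.h hunk above, while the macro
and function names are hypothetical. XOR-ing the saved and requested feature
words leaves a bit set wherever they differ, so masking with the two
injection mode bits reports a change in either mode and ignores every other
feature bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit positions as defined in the svm.h hunk of this patch. */
#define MODEL_FEAT_SNP_ACTIVE		(UINT64_C(1) << 0)
#define MODEL_FEAT_RESTRICTED_INJECTION	(UINT64_C(1) << 3)
#define MODEL_FEAT_ALTERNATE_INJECTION	(UINT64_C(1) << 4)
#define MODEL_FEAT_INT_INJ_MODES		\
	(MODEL_FEAT_RESTRICTED_INJECTION |	\
	 MODEL_FEAT_ALTERNATE_INJECTION)

/*
 * Returns true if the requested SEV features keep the same interrupt
 * injection modes as the features saved at VMSA creation.
 */
static bool inj_modes_unchanged(uint64_t saved, uint64_t requested)
{
	return ((saved ^ requested) & MODEL_FEAT_INT_INJ_MODES) == 0;
}
```

A request for which this returns false corresponds to the -EINVAL path in the
patch, which is only taken for the two CREATE variants.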