
[1/3,V7] KVM, SEV: Add support for SEV intra host migration

Message ID 20210902181751.252227-2-pgonda@google.com (mailing list archive)
State New, archived
Series Add AMD SEV and SEV-ES intra host migration support

Commit Message

Peter Gonda Sept. 2, 2021, 6:17 p.m. UTC
For SEV to work with intra host migration, contents of the SEV info struct
such as the ASID (used to index the encryption key in the AMD SP) and
the list of memory regions need to be transferred to the target VM.
This change adds a command for a target VMM to get a source SEV VM's SEV
info.

The target is expected to be initialized (sev_guest_init), but not in a
launched state (sev_launch_start), when performing the receive. Once the
target has received, it will be in a launched state and will not
need to perform the typical SEV launch commands.

Signed-off-by: Peter Gonda <pgonda@google.com>
Suggested-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Marc Orr <marcorr@google.com>
Cc: Marc Orr <marcorr@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Wanpeng Li <wanpengli@tencent.com>
Cc: Jim Mattson <jmattson@google.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
---
 Documentation/virt/kvm/api.rst  |  15 +++++
 arch/x86/include/asm/kvm_host.h |   1 +
 arch/x86/kvm/svm/sev.c          | 101 ++++++++++++++++++++++++++++++++
 arch/x86/kvm/svm/svm.c          |   1 +
 arch/x86/kvm/svm/svm.h          |   2 +
 arch/x86/kvm/x86.c              |   5 ++
 include/uapi/linux/kvm.h        |   1 +
 7 files changed, 126 insertions(+)

Comments

Sean Christopherson Sept. 10, 2021, 12:11 a.m. UTC | #1
Nit, preferred shortlog scope is "KVM: SEV:"

On Thu, Sep 02, 2021, Peter Gonda wrote:
> For SEV to work with intra host migration, contents of the SEV info struct
> such as the ASID (used to index the encryption key in the AMD SP) and
> the list of memory regions need to be transferred to the target VM.
> This change adds a commands for a target VMM to get a source SEV VM's sev
> info.
> 
> The target is expected to be initialized (sev_guest_init), but not
> launched state (sev_launch_start) when performing receive. Once the
> target has received, it will be in a launched state and will not
> need to perform the typical SEV launch commands.
> 
> Signed-off-by: Peter Gonda <pgonda@google.com>
> Suggested-by: Sean Christopherson <seanjc@google.com>
> Reviewed-by: Marc Orr <marcorr@google.com>
> Cc: Marc Orr <marcorr@google.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Sean Christopherson <seanjc@google.com>
> Cc: David Rientjes <rientjes@google.com>
> Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
> Cc: Brijesh Singh <brijesh.singh@amd.com>
> Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
> Cc: Wanpeng Li <wanpengli@tencent.com>
> Cc: Jim Mattson <jmattson@google.com>
> Cc: Joerg Roedel <joro@8bytes.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: kvm@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> ---
>  Documentation/virt/kvm/api.rst  |  15 +++++
>  arch/x86/include/asm/kvm_host.h |   1 +
>  arch/x86/kvm/svm/sev.c          | 101 ++++++++++++++++++++++++++++++++
>  arch/x86/kvm/svm/svm.c          |   1 +
>  arch/x86/kvm/svm/svm.h          |   2 +
>  arch/x86/kvm/x86.c              |   5 ++
>  include/uapi/linux/kvm.h        |   1 +
>  7 files changed, 126 insertions(+)
> 
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index 4ea1bb28297b..e8cecc024649 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -6702,6 +6702,21 @@ MAP_SHARED mmap will result in an -EINVAL return.
>  When enabled the VMM may make use of the ``KVM_ARM_MTE_COPY_TAGS`` ioctl to
>  perform a bulk copy of tags to/from the guest.
>  
> +7.29 KVM_CAP_VM_MIGRATE_ENC_CONTEXT_FROM
> +-------------------------------------

Do we really want to bury this under KVM_CAP?  Even KVM_CAP_VM_COPY_ENC_CONTEXT_FROM
is a bit of a stretch, but at least that's a one-way "enabling", whereas this
migration routine should be able to handle multiple migrations, e.g. migrate A->B
and B->A.  Peeking at your selftest, it should be fairly easy to add in this edge
case.

This is probably a Paolo question, I've no idea if there's a desire to expand
KVM_CAP versus adding a new ioctl().

> +Architectures: x86 SEV enabled
> +Type: vm
> +Parameters: args[0] is the fd of the source vm
> +Returns: 0 on success

It'd be helpful to provide a brief description of the error cases.  Looks like
-EINVAL is the only possible error?

> +This capability enables userspace to migrate the encryption context

I would prefer to scope this beyond "encryption context".  Even for SEV, it
copies more than just the "context", which was an abstraction of SEV's ASID,
e.g. this also hands off the set of encrypted memory regions.  Looking toward
the future, if TDX wants to support this it's going to need to hand over a ton
of stuff, e.g. S-EPT tables.

Not sure on a name, maybe MIGRATE_PROTECTED_VM_FROM?

> from the vm

Capitalize VM in the description, if only to be consistent within these two
paragraphs.  If it helps, pretend all the terrible examples in this file don't
exist ;-)

> +indicated by the fd to the vm this is called on.
> +
> +This is intended to support intra-host migration of VMs between userspace VMMs.
> +in-guest workloads scheduled by the host. This allows for upgrading the VMMg

This snippet (and the lowercase "vm") looks like it was left behind after a
copy-paste from KVM_CAP_VM_COPY_ENC_CONTEXT_FROM.

> +process without interrupting the guest.
> +
>  8. Other capabilities.
>  ======================
>  
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 09b256db394a..f06d87a85654 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1456,6 +1456,7 @@ struct kvm_x86_ops {
>  	int (*mem_enc_reg_region)(struct kvm *kvm, struct kvm_enc_region *argp);
>  	int (*mem_enc_unreg_region)(struct kvm *kvm, struct kvm_enc_region *argp);
>  	int (*vm_copy_enc_context_from)(struct kvm *kvm, unsigned int source_fd);
> +	int (*vm_migrate_enc_context_from)(struct kvm *kvm, unsigned int source_fd);
>  
>  	int (*get_msr_feature)(struct kvm_msr_entry *entry);
>  
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index 46eb1ba62d3d..8db666a362d4 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -1501,6 +1501,107 @@ static int sev_receive_finish(struct kvm *kvm, struct kvm_sev_cmd *argp)
>  	return sev_issue_cmd(kvm, SEV_CMD_RECEIVE_FINISH, &data, &argp->error);
>  }
>  
> +static int svm_sev_lock_for_migration(struct kvm *kvm)
> +{
> +	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> +
> +	/*
> +	 * Bail if this VM is already involved in a migration to avoid deadlock
> +	 * between two VMs trying to migrate to/from each other.
> +	 */
> +	if (atomic_cmpxchg_acquire(&sev->migration_in_progress, 0, 1))
> +		return -EBUSY;
> +
> +	mutex_lock(&kvm->lock);
> +
> +	return 0;
> +}
> +
> +static void svm_unlock_after_migration(struct kvm *kvm)
> +{
> +	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> +
> +	mutex_unlock(&kvm->lock);
> +	atomic_set_release(&sev->migration_in_progress, 0);
> +}
> +
> +static void migrate_info_from(struct kvm_sev_info *dst,
> +			      struct kvm_sev_info *src)
> +{
> +	sev_asid_free(dst);

Ooh, this brings up a potential shortcoming of requiring @dst to be SEV-enabled.
If every SEV{-ES} ASID is allocated, then there won't be an available ASID to
(temporarily) allocate for the intra-host migration.  But that temporary ASID
isn't actually necessary, i.e. there's no reason intra-host migration should fail
if all ASIDs are in-use.

I don't see any harm in requiring the @dst to _not_ be SEV-enabled.  sev_info
is not dynamically allocated, i.e. migration_in_progress is accessible either
way.  That would also simplify some of the checks, e.g. the regions_list check
goes away because svm_register_enc_region() fails on non-SEV guests.
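
The ASID exhaustion concern can be illustrated with a small userspace model.
This is only a sketch under stated assumptions: the pool size, struct
model_asid_pool, struct model_sev_info, model_asid_new() and
model_migrate_asid() are all hypothetical names invented for illustration and
are not KVM code.  It shows that a hand-off to a destination that was never
SEV-enabled needs no allocation, so it can succeed even when every ASID is
in use.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_NR_ASIDS 4	/* hypothetical pool size, illustration only */

/* Hypothetical stand-in for the host-wide ASID bitmap. */
struct model_asid_pool {
	bool used[MODEL_NR_ASIDS + 1];	/* index 0 is never handed out */
};

/* Hypothetical, heavily reduced stand-in for struct kvm_sev_info. */
struct model_sev_info {
	bool active;
	unsigned int asid;
};

/* Returns a free ASID in 1..MODEL_NR_ASIDS, or 0 if the pool is exhausted. */
unsigned int model_asid_new(struct model_asid_pool *pool)
{
	assert(pool != NULL);

	for (unsigned int i = 1; i <= MODEL_NR_ASIDS; i++) {
		if (!pool->used[i]) {
			pool->used[i] = true;
			return i;
		}
	}
	return 0;
}

/*
 * Hand the source's ASID to a destination that was never SEV-enabled.
 * Nothing is allocated, so this works even when the pool is full.
 * Returns 0 on success, -1 if the preconditions are not met.
 */
int model_migrate_asid(struct model_sev_info *dst, struct model_sev_info *src)
{
	assert(dst != NULL && src != NULL);

	if (dst->active || !src->active)
		return -1;

	dst->asid = src->asid;
	dst->active = true;
	src->asid = 0;
	src->active = false;
	return 0;
}
```

With the pool fully allocated, model_asid_new() returns 0 while
model_migrate_asid() still succeeds, which is the asymmetry described above.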

I believe this will also fix multiple bugs in the next patch (SEV-ES support).

Bug #1, SEV-ES support changes the checks to:

	if (!sev_guest(kvm)) {
		ret = -EINVAL;
		pr_warn_ratelimited("VM must be SEV enabled to migrate to.\n");
		goto out_unlock;
	}

	...

	if (!sev_guest(source_kvm)) {
		ret = -EINVAL;
		pr_warn_ratelimited(
			"Source VM must be SEV enabled to migrate from.\n");
		goto out_source;
	}

	if (sev_es_guest(kvm)) {
		ret = migrate_vmsa_from(kvm, source_kvm);
		if (ret)
			goto out_source;
	}

and fails to handle the scenario where dst.SEV_ES != src.SEV_ES.  If @dst is
SEV_ES-enabled and @src has created vCPUs, migrate_vmsa_from() will still fail
due to guest_state_protected being false, but the reverse won't hold true and
KVM will "successfully" migrate an SEV-ES guest to an SEV guest.  I'm guessing
fireworks will ensue, e.g. due to running with the wrong ASID.
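
If the destination were kept SEV-enabled, one way to close the
dst.SEV_ES != src.SEV_ES gap would be an explicit comparison.  The following
is a userspace sketch only; struct model_es_vm and model_check_modes() are
hypothetical names, and the real checks would use sev_guest() and
sev_es_guest() on struct kvm.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the SEV mode bits of a VM. */
struct model_es_vm {
	bool sev;	/* SEV enabled */
	bool sev_es;	/* SEV-ES enabled (implies sev) */
};

/* Returns 0 if migration may proceed, -EINVAL otherwise. */
int model_check_modes(const struct model_es_vm *dst,
		      const struct model_es_vm *src)
{
	assert(dst != NULL && src != NULL);

	/* In this variant both sides must already be SEV guests. */
	if (!dst->sev || !src->sev)
		return -EINVAL;

	/* Reject SEV <-> SEV-ES mixes in either direction. */
	if (dst->sev_es != src->sev_es)
		return -EINVAL;

	return 0;
}
```

Requiring @dst to be !SEV, as suggested above, would make the second check
unnecessary because the destination would simply inherit the source's mode.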

Bug #2, migrate_vmsa_from() leaks dst->vmsa, as this

		dst_svm->vmsa = src_svm->vmsa;
		src_svm->vmsa = NULL;

overwrites dst_svm->vmsa that was allocated by svm_create_vcpu().
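
The leak pattern and one possible fix can be sketched in userspace C.  This is
a model, not the kernel code: struct model_vcpu and model_move_vmsa() are
hypothetical, and malloc()/free() stand in for the page allocation that the
kernel would use for the VMSA.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the per-vCPU SVM state that owns a VMSA. */
struct model_vcpu {
	void *vmsa;
};

/*
 * Transfer ownership of the VMSA from src to dst.  The destination's
 * existing buffer is released first so that it is not leaked when the
 * pointer is overwritten.
 */
void model_move_vmsa(struct model_vcpu *dst, struct model_vcpu *src)
{
	assert(dst != NULL && src != NULL && dst != src);

	free(dst->vmsa);	/* free(NULL) is a no-op */
	dst->vmsa = src->vmsa;
	src->vmsa = NULL;
}
```

The alternative discussed in this thread is to avoid allocating the
destination's VMSA in the first place, in which case no free is needed.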

AFAICT, there isn't anything that will break by forcing @dst to be !SEV (except
stuff that's already broken, see below).  For SEV{-ES} specific stuff, anything
that is allocated/set at vCPU creation likely needs to be migrated, e.g. VMSA and
the GHCB MSR value.  The only missing action is kvm_free_guest_fpu().

Side topic, the VMSA really should be allocated in sev_es_create_vcpu(), and
guest_fpu should never be allocated for SEV-ES guests (though that doesn't change
the need for kvm_free_guest_fpu() in this case).  I'll send patches for that.

> +	dst->asid = src->asid;
> +	dst->misc_cg = src->misc_cg;
> +	dst->handle = src->handle;
> +	dst->pages_locked = src->pages_locked;
> +
> +	src->asid = 0;
> +	src->active = false;
> +	src->handle = 0;
> +	src->pages_locked = 0;
> +	src->misc_cg = NULL;
> +
> +	INIT_LIST_HEAD(&dst->regions_list);
> +	list_replace_init(&src->regions_list, &dst->regions_list);
> +}
> +
> +int svm_vm_migrate_from(struct kvm *kvm, unsigned int source_fd)
> +{
> +	struct kvm_sev_info *dst_sev = &to_kvm_svm(kvm)->sev_info;
> +	struct file *source_kvm_file;
> +	struct kvm *source_kvm;
> +	int ret;
> +
> +	ret = svm_sev_lock_for_migration(kvm);
> +	if (ret)
> +		return ret;
> +
> +	if (!sev_guest(kvm) || sev_es_guest(kvm)) {
> +		ret = -EINVAL;
> +		pr_warn_ratelimited("VM must be SEV enabled to migrate to.\n");

Linux generally doesn't log user errors to dmesg.  They can be helpful during
development, but aren't actionable and thus are of limited use in production.

> +		goto out_unlock;
> +	}

Hmm, I was going to say that migration should be rejected if @dst has created
vCPUs, but the SEV-ES support migrates VMSA state and so must run after vCPUs
are created.  Holding kvm->lock does not prevent invoking per-vCPU ioctls(),
including KVM_RUN.  Modifying vCPU SEV{-ES} state while a vCPU is actively running
is bound to cause explosions.

One option for this patch would be to check kvm->created_vcpus and then add
different logic for SEV-ES, but that's probably not desirable for userspace as
it will mean triggering intra-host migration at different points for SEV vs. SEV-ES.

So I think the only option is to take vcpu->mutex for all vCPUs in both @src and
@dst.  Adding that after acquiring kvm->lock in svm_sev_lock_for_migration()
should Just Work.  Unless userspace is misbehaving, the lock won't be contended
since all vCPUs need to be quiesced, though it's probably worth using the
mutex_lock_killable() variant just to be safe.
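
The "lock every vCPU, unwind on failure" pattern might look roughly like the
sketch below.  It is a userspace model under loud assumptions:
struct model_vcpu_lock, model_lock_killable(), model_unlock() and
model_lock_all_vcpus() are hypothetical, and the `interrupted` flag only
simulates the -EINTR that mutex_lock_killable() returns when a fatal signal
arrives.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for vcpu->mutex. */
struct model_vcpu_lock {
	bool locked;
	bool interrupted;	/* simulates a fatal signal while waiting */
};

/* Models mutex_lock_killable(): 0 on success, -EINTR if interrupted. */
int model_lock_killable(struct model_vcpu_lock *l)
{
	assert(l != NULL);

	if (l->interrupted)
		return -EINTR;
	assert(!l->locked);
	l->locked = true;
	return 0;
}

void model_unlock(struct model_vcpu_lock *l)
{
	assert(l != NULL);
	l->locked = false;
}

/* Lock all n vCPUs in order; on failure, drop the locks already taken. */
int model_lock_all_vcpus(struct model_vcpu_lock *vcpus, size_t n)
{
	size_t i;
	int ret = 0;

	for (i = 0; i < n; i++) {
		ret = model_lock_killable(&vcpus[i]);
		if (ret)
			goto unwind;
	}
	return 0;

unwind:
	/* vcpus[0..i-1] are held at this point; release them in reverse. */
	while (i-- > 0)
		model_unlock(&vcpus[i]);
	return ret;
}
```

In the real code this would run for both @src and @dst after kvm->lock is
taken, with a matching unlock-all helper on the exit path.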

> +	if (!list_empty(&dst_sev->regions_list)) {
> +		ret = -EINVAL;
> +		pr_warn_ratelimited(
> +			"VM must not have encrypted regions to migrate to.\n");
> +		goto out_unlock;
> +	}
> +
> +	source_kvm_file = fget(source_fd);
> +	if (!file_is_kvm(source_kvm_file)) {
> +		ret = -EBADF;
> +		pr_warn_ratelimited(
> +				"Source VM must be SEV enabled to migrate from.\n");

Case in point for not logging errors, this is arguably inaccurate as the source
"VM" isn't a VM.

> +		goto out_fput;
> +	}
> +
> +	source_kvm = source_kvm_file->private_data;
> +	ret = svm_sev_lock_for_migration(source_kvm);
> +	if (ret)
> +		goto out_fput;
> +
> +	if (!sev_guest(source_kvm) || sev_es_guest(source_kvm)) {
> +		ret = -EINVAL;
> +		pr_warn_ratelimited(
> +			"Source VM must be SEV enabled to migrate from.\n");
> +		goto out_source;
> +	}
> +
> +	migrate_info_from(dst_sev, &to_kvm_svm(source_kvm)->sev_info);
> +	ret = 0;
> +
> +out_source:
> +	svm_unlock_after_migration(source_kvm);
> +out_fput:
> +	if (source_kvm_file)
> +		fput(source_kvm_file);
> +out_unlock:
> +	svm_unlock_after_migration(kvm);
> +	return ret;
> +}
Sean Christopherson Sept. 10, 2021, 1:12 a.m. UTC | #2
On Fri, Sep 10, 2021, Sean Christopherson wrote:
> Ooh, this brings up a potential shortcoming of requiring @dst to be SEV-enabled.
> If every SEV{-ES} ASID is allocated, then there won't be an available ASID to
> (temporarily) allocate for the intra-host migration.  But that temporary ASID
> isn't actually necessary, i.e. there's no reason intra-host migration should fail
> if all ASIDs are in-use.

...

> So I think the only option is to take vcpu->mutex for all vCPUs in both @src and
> @dst.  Adding that after acquiring kvm->lock in svm_sev_lock_for_migration()
> should Just Work.  Unless userspace is misbehaving, the lock won't be contended
> since all vCPUs need to be quiesced, though it's probably worth using the
> mutex_lock_killable() variant just to be safe.

Circling back to this after looking at the SEV-ES support, I think the vCPUs in
the source VM need to be reset via kvm_vcpu_reset(vcpu, false).  I doubt there's
a use case for actually doing anything with the vCPU, but leaving it runnable
without purging state makes me nervous.

Alternative #1 would be to mark vCPUs as dead in some way so as to prevent doing
anything useful with the vCPU.

Alternative #2 would be to "kill" the source VM by setting kvm->vm_bugged to
prevent all ioctls().

The downside to preventing future ioctls() is that this would need to be the
very last step of migration.  Not sure if that's problematic?
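
Alternative #2 can be modeled in a few lines of userspace C.  This is a sketch
only: struct model_src_vm, model_vm_mark_dead() and model_vm_ioctl() are
hypothetical names, and returning -EIO for a dead VM is an assumption made for
the illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the source VM after its state is handed off. */
struct model_src_vm {
	bool dead;		/* set once migration has completed */
	unsigned int ioctls;	/* counts ioctls that were actually serviced */
};

/* Must be the last migration step, since it blocks all later ioctls. */
void model_vm_mark_dead(struct model_src_vm *vm)
{
	assert(vm != NULL);
	vm->dead = true;
}

/* Every ioctl entry point checks the flag before doing any work. */
int model_vm_ioctl(struct model_src_vm *vm)
{
	assert(vm != NULL);

	if (vm->dead)
		return -EIO;
	vm->ioctls++;
	return 0;
}
```

The model makes the ordering constraint visible: anything the source still
needs to do via ioctl has to happen before model_vm_mark_dead().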
Marc Orr Sept. 10, 2021, 1:15 a.m. UTC | #3
> > +     dst->asid = src->asid;
> > +     dst->misc_cg = src->misc_cg;
> > +     dst->handle = src->handle;
> > +     dst->pages_locked = src->pages_locked;
> > +
> > +     src->asid = 0;
> > +     src->active = false;
> > +     src->handle = 0;
> > +     src->pages_locked = 0;
> > +     src->misc_cg = NULL;
> > +
> > +     INIT_LIST_HEAD(&dst->regions_list);
> > +     list_replace_init(&src->regions_list, &dst->regions_list);
> > +}
> > +
> > +int svm_vm_migrate_from(struct kvm *kvm, unsigned int source_fd)
> > +{
> > +     struct kvm_sev_info *dst_sev = &to_kvm_svm(kvm)->sev_info;
> > +     struct file *source_kvm_file;
> > +     struct kvm *source_kvm;
> > +     int ret;
> > +
> > +     ret = svm_sev_lock_for_migration(kvm);
> > +     if (ret)
> > +             return ret;
> > +
> > +     if (!sev_guest(kvm) || sev_es_guest(kvm)) {
> > +             ret = -EINVAL;
> > +             pr_warn_ratelimited("VM must be SEV enabled to migrate to.\n");
>
> Linux generally doesn't log user errors to dmesg.  They can be helpful during
> development, but aren't actionable and thus are of limited use in production.

Ha. I had suggested adding the logs when I reviewed these patches
(maybe before Peter posted them publicly). My rationale is that if I'm
looking at a crash in production, and all I have is a stack trace and
the error code, then I can narrow the failure down to this function,
but once the function starts returning the same error code in multiple
places now it's non-trivial for me to deduce exactly which condition
caused the crash. Having these logs makes it trivial. However, if this
is not the preferred Linux style then so be it.
Sean Christopherson Sept. 10, 2021, 1:40 a.m. UTC | #4
On Thu, Sep 09, 2021, Marc Orr wrote:
> > > +int svm_vm_migrate_from(struct kvm *kvm, unsigned int source_fd)
> > > +{
> > > +     struct kvm_sev_info *dst_sev = &to_kvm_svm(kvm)->sev_info;
> > > +     struct file *source_kvm_file;
> > > +     struct kvm *source_kvm;
> > > +     int ret;
> > > +
> > > +     ret = svm_sev_lock_for_migration(kvm);
> > > +     if (ret)
> > > +             return ret;
> > > +
> > > +     if (!sev_guest(kvm) || sev_es_guest(kvm)) {
> > > +             ret = -EINVAL;
> > > +             pr_warn_ratelimited("VM must be SEV enabled to migrate to.\n");
> >
> > Linux generally doesn't log user errors to dmesg.  They can be helpful during
> > development, but aren't actionable and thus are of limited use in production.
> 
> Ha. I had suggested adding the logs when I reviewed these patches
> (maybe before Peter posted them publicly). My rationale is that if I'm
> looking at a crash in production, and all I have is a stack trace and
> the error code, then I can narrow the failure down to this function,
> but once the function starts returning the same error code in multiple
> places now it's non-trivial for me to deduce exactly which condition
> caused the crash. Having these logs makes it trivial. However, if this
> is not the preferred Linux style then so be it.

I don't necessarily disagree, but none of these error conditions should so much
as sniff production.  E.g. if userspace invokes this on a !KVM fd or on a non-SEV
source, or before guest_state_protected=true, then userspace has bigger problems.
Ditto if the dest isn't an actual KVM VM or doesn't meet whatever SEV-enabled/disabled
criteria we end up with.

The mismatch in online_vcpus is the only one where I could reasonably see a bug
escaping to production, e.g. due to an orchestration layer mixup.

For all of these conditions, userspace _must_ be aware of the conditions for success,
and except for guest_state_protected=true, userspace has access to what state it
sent into KVM, e.g. it shouldn't be difficult for userspace to dump the relevant bits
from the src and dst without any help from the kernel.

If userspace really needs kernel help to differentiate what's up, I'd rather use
more unique errors for online_vcpus and guest_state_protected, e.g. -E2BIG isn't
too big of a stretch for the online_vcpus mismatch.
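
As a rough illustration of distinguishing failures by error code rather than
by dmesg output, consider the userspace sketch below.  The struct and function
names are hypothetical, and the exact mapping of conditions to codes is an
assumption; only -EBADF, -EBUSY and -EINVAL appear in the patch, and -E2BIG is
the suggestion made above.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of the state the checks would look at. */
struct model_migrate_args {
	bool src_is_kvm_fd;		/* source fd refers to a KVM VM */
	bool busy;			/* a migration is already in progress */
	bool src_sev;			/* source is SEV-enabled */
	unsigned int src_online_vcpus;
	unsigned int dst_online_vcpus;
};

/* Each failure condition gets its own errno so userspace can tell them apart. */
int model_migrate_check(const struct model_migrate_args *a)
{
	assert(a != NULL);

	if (!a->src_is_kvm_fd)
		return -EBADF;
	if (a->busy)
		return -EBUSY;
	if (!a->src_sev)
		return -EINVAL;
	if (a->src_online_vcpus != a->dst_online_vcpus)
		return -E2BIG;
	return 0;
}
```

A VMM could then log the errno on its side and know which precondition failed
without any kernel message.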
Marc Orr Sept. 10, 2021, 3:41 a.m. UTC | #5
On Thu, Sep 9, 2021 at 6:40 PM Sean Christopherson <seanjc@google.com> wrote:
>
> On Thu, Sep 09, 2021, Marc Orr wrote:
> > > > +int svm_vm_migrate_from(struct kvm *kvm, unsigned int source_fd)
> > > > +{
> > > > +     struct kvm_sev_info *dst_sev = &to_kvm_svm(kvm)->sev_info;
> > > > +     struct file *source_kvm_file;
> > > > +     struct kvm *source_kvm;
> > > > +     int ret;
> > > > +
> > > > +     ret = svm_sev_lock_for_migration(kvm);
> > > > +     if (ret)
> > > > +             return ret;
> > > > +
> > > > +     if (!sev_guest(kvm) || sev_es_guest(kvm)) {
> > > > +             ret = -EINVAL;
> > > > +             pr_warn_ratelimited("VM must be SEV enabled to migrate to.\n");
> > >
> > > Linux generally doesn't log user errors to dmesg.  They can be helpful during
> > > development, but aren't actionable and thus are of limited use in production.
> >
> > Ha. I had suggested adding the logs when I reviewed these patches
> > (maybe before Peter posted them publicly). My rationale is that if I'm
> > looking at a crash in production, and all I have is a stack trace and
> > the error code, then I can narrow the failure down to this function,
> > but once the function starts returning the same error code in multiple
> > places now it's non-trivial for me to deduce exactly which condition
> > caused the crash. Having these logs makes it trivial. However, if this
> > is not the preferred Linux style then so be it.
>
> I don't necessarily disagree, but none of these error conditions should so much
> as sniff production.  E.g. if userspace invokes this on a !KVM fd or on a non-SEV
> source, or before guest_state_protected=true, then userspace has bigger problems.
> Ditto if the dest isn't an actual KVM VM or doesn't meet whatever SEV-enabled/disabled
> criteria we end up with.
>
> The mismatch in online_vcpus is the only one where I could reasonably see a bug
> escaping to production, e.g. due to an orchestration layer mixup.
>
> For all of these conditions, userspace _must_ be aware of the conditions for success,
> and except for guest_state_protected=true, userspace has access to what state it
> sent into KVM, e.g. it shouldn't be difficult for userspace to dump the relevant bits
> from the src and dst without any help from the kernel.
>
> If userspace really needs kernel help to differentiate what's up, I'd rather use
> more unique errors for online_vcpus and guest_state_protected, e.g. -E2BIG isn't
> too big of a stretch for the online_vcpus mismatch.

SGTM, thanks.
Peter Gonda Sept. 10, 2021, 9:54 p.m. UTC | #6
On Thu, Sep 9, 2021 at 6:11 PM Sean Christopherson <seanjc@google.com> wrote:
>
> Nit, preferred shortlog scope is "KVM: SEV:"
>
> On Thu, Sep 02, 2021, Peter Gonda wrote:
> > For SEV to work with intra host migration, contents of the SEV info struct
> > such as the ASID (used to index the encryption key in the AMD SP) and
> > the list of memory regions need to be transferred to the target VM.
> > This change adds a commands for a target VMM to get a source SEV VM's sev
> > info.
> >
> > The target is expected to be initialized (sev_guest_init), but not
> > launched state (sev_launch_start) when performing receive. Once the
> > target has received, it will be in a launched state and will not
> > need to perform the typical SEV launch commands.
> >
> > Signed-off-by: Peter Gonda <pgonda@google.com>
> > Suggested-by: Sean Christopherson <seanjc@google.com>
> > Reviewed-by: Marc Orr <marcorr@google.com>
> > Cc: Marc Orr <marcorr@google.com>
> > Cc: Paolo Bonzini <pbonzini@redhat.com>
> > Cc: Sean Christopherson <seanjc@google.com>
> > Cc: David Rientjes <rientjes@google.com>
> > Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > Cc: Brijesh Singh <brijesh.singh@amd.com>
> > Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
> > Cc: Wanpeng Li <wanpengli@tencent.com>
> > Cc: Jim Mattson <jmattson@google.com>
> > Cc: Joerg Roedel <joro@8bytes.org>
> > Cc: Thomas Gleixner <tglx@linutronix.de>
> > Cc: Ingo Molnar <mingo@redhat.com>
> > Cc: Borislav Petkov <bp@alien8.de>
> > Cc: "H. Peter Anvin" <hpa@zytor.com>
> > Cc: kvm@vger.kernel.org
> > Cc: linux-kernel@vger.kernel.org
> > ---
> >  Documentation/virt/kvm/api.rst  |  15 +++++
> >  arch/x86/include/asm/kvm_host.h |   1 +
> >  arch/x86/kvm/svm/sev.c          | 101 ++++++++++++++++++++++++++++++++
> >  arch/x86/kvm/svm/svm.c          |   1 +
> >  arch/x86/kvm/svm/svm.h          |   2 +
> >  arch/x86/kvm/x86.c              |   5 ++
> >  include/uapi/linux/kvm.h        |   1 +
> >  7 files changed, 126 insertions(+)
> >
> > diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> > index 4ea1bb28297b..e8cecc024649 100644
> > --- a/Documentation/virt/kvm/api.rst
> > +++ b/Documentation/virt/kvm/api.rst
> > @@ -6702,6 +6702,21 @@ MAP_SHARED mmap will result in an -EINVAL return.
> >  When enabled the VMM may make use of the ``KVM_ARM_MTE_COPY_TAGS`` ioctl to
> >  perform a bulk copy of tags to/from the guest.
> >
> > +7.29 KVM_CAP_VM_MIGRATE_ENC_CONTEXT_FROM
> > +-------------------------------------
>
> Do we really want to bury this under KVM_CAP?  Even KVM_CAP_VM_COPY_ENC_CONTEXT_FROM
> is a bit of a stretch, but at least that's a one-way "enabling", whereas this
> migration routine should be able to handle multiple migrations, e.g. migrate A->B
> and B->A.  Peeking at your selftest, it should be fairly easy to add in this edge
> case.
>
> This is probably a Paolo question, I've no idea if there's a desire to expand
> KVM_CAP versus adding a new ioctl().

Thanks for the review Sean. I put this under KVM_CAP as you suggested
following the idea of svm_vm_copy_asid_from. Paolo or anyone else have
thoughts here? It doesn't really matter to me.

>
> > +Architectures: x86 SEV enabled
> > +Type: vm
> > +Parameters: args[0] is the fd of the source vm
> > +Returns: 0 on success
>
> It'd be helpful to provide a brief description of the error cases.  Looks like
> -EINVAL is the only possible error?
>
> > +This capability enables userspace to migrate the encryption context
>
> I would prefer to scope this beyond "encryption context".  Even for SEV, it
> copies more than just the "context", which was an abstraction of SEV's ASID,
> e.g. this also hands off the set of encrypted memory regions.  Looking toward
> the future, if TDX wants to support this it's going to need to hand over a ton
> of stuff, e.g. S-EPT tables.
>
> Not sure on a name, maybe MIGRATE_PROTECTED_VM_FROM?

Protected VM sounds reasonable. I was using 'context' here to mean all
metadata related to a CoCo VM as with the
KVM_CAP_VM_COPY_ENC_CONTEXT_FROM. Is it worth diverging naming here?

>
> > from the vm
>
> Capitalize VM in the description, if only to be consistent within these two
> paragraphs.  If it helps, pretend all the terrible examples in this file don't
> exist ;-)
>
> > +indicated by the fd to the vm this is called on.
> > +
> > +This is intended to support intra-host migration of VMs between userspace VMMs.
> > +in-guest workloads scheduled by the host. This allows for upgrading the VMMg
>
> This snippet (and the lowercase "vm") looks like it was left behind after a
> copy-paste from KVM_CAP_VM_COPY_ENC_CONTEXT_FROM.
>
> > +process without interrupting the guest.
> > +
> >  8. Other capabilities.
> >  ======================
> >
> > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > index 09b256db394a..f06d87a85654 100644
> > --- a/arch/x86/include/asm/kvm_host.h
> > +++ b/arch/x86/include/asm/kvm_host.h
> > @@ -1456,6 +1456,7 @@ struct kvm_x86_ops {
> >       int (*mem_enc_reg_region)(struct kvm *kvm, struct kvm_enc_region *argp);
> >       int (*mem_enc_unreg_region)(struct kvm *kvm, struct kvm_enc_region *argp);
> >       int (*vm_copy_enc_context_from)(struct kvm *kvm, unsigned int source_fd);
> > +     int (*vm_migrate_enc_context_from)(struct kvm *kvm, unsigned int source_fd);
> >
> >       int (*get_msr_feature)(struct kvm_msr_entry *entry);
> >
> > diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> > index 46eb1ba62d3d..8db666a362d4 100644
> > --- a/arch/x86/kvm/svm/sev.c
> > +++ b/arch/x86/kvm/svm/sev.c
> > @@ -1501,6 +1501,107 @@ static int sev_receive_finish(struct kvm *kvm, struct kvm_sev_cmd *argp)
> >       return sev_issue_cmd(kvm, SEV_CMD_RECEIVE_FINISH, &data, &argp->error);
> >  }
> >
> > +static int svm_sev_lock_for_migration(struct kvm *kvm)
> > +{
> > +     struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> > +
> > +     /*
> > +      * Bail if this VM is already involved in a migration to avoid deadlock
> > +      * between two VMs trying to migrate to/from each other.
> > +      */
> > +     if (atomic_cmpxchg_acquire(&sev->migration_in_progress, 0, 1))
> > +             return -EBUSY;
> > +
> > +     mutex_lock(&kvm->lock);
> > +
> > +     return 0;
> > +}
> > +
> > +static void svm_unlock_after_migration(struct kvm *kvm)
> > +{
> > +     struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> > +
> > +     mutex_unlock(&kvm->lock);
> > +     atomic_set_release(&sev->migration_in_progress, 0);
> > +}
> > +
> > +static void migrate_info_from(struct kvm_sev_info *dst,
> > +                           struct kvm_sev_info *src)
> > +{
> > +     sev_asid_free(dst);
>
> Ooh, this brings up a potential shortcoming of requiring @dst to be SEV-enabled.
> If every SEV{-ES} ASID is allocated, then there won't be an available ASID to
> (temporarily) allocate for the intra-host migration.  But that temporary ASID
> isn't actually necessary, i.e. there's no reason intra-host migration should fail
> if all ASIDs are in-use.
>
> I don't see any harm in requiring the @dst to _not_ be SEV-enabled.  sev_info
> is not dynamically allocated, i.e. migration_in_progress is accessible either
> way.  That would also simplify some of the checks, e.g. the regions_list check
> goes away because svm_register_enc_region() fails on non-SEV guests.
>
> I believe this will also fix multiple bugs in the next patch (SEV-ES support).
>
> Bug #1, SEV-ES support changes the checks to:
>
>         if (!sev_guest(kvm)) {
>                 ret = -EINVAL;
>                 pr_warn_ratelimited("VM must be SEV enabled to migrate to.\n");
>                 goto out_unlock;
>         }
>
>         ...
>
>         if (!sev_guest(source_kvm)) {
>                 ret = -EINVAL;
>                 pr_warn_ratelimited(
>                         "Source VM must be SEV enabled to migrate from.\n");
>                 goto out_source;
>         }
>
>         if (sev_es_guest(kvm)) {
>                 ret = migrate_vmsa_from(kvm, source_kvm);
>                 if (ret)
>                         goto out_source;
>         }
>
> and fails to handle the scenario where dst.SEV_ES != src.SEV_ES.  If @dst is
> SEV_ES-enabled and @src has created vCPUs, migrate_vmsa_from() will still fail
> due to guest_state_protected being false, but the reverse won't hold true and
> KVM will "successfully" migrate an SEV-ES guest to an SEV guest.  I'm guessing
> fireworks will ensue, e.g. due to running with the wrong ASID.
>
> Bug #2, migrate_vmsa_from() leaks dst->vmsa, as this
>
>                 dst_svm->vmsa = src_svm->vmsa;
>                 src_svm->vmsa = NULL;
>
> overwrites dst_svm->vmsa that was allocated by svm_create_vcpu().
>
> AFAICT, there isn't anything that will break by forcing @dst to be !SEV (except
> stuff that's already broken, see below).  For SEV{-ES} specific stuff, anything
> that is allocated/set at vCPU creation likely needs to be migrated, e.g. VMSA and
> the GHCB MSR value.  The only missing action is kvm_free_guest_fpu().
>
> Side topic, the VMSA really should be allocated in sev_es_create_vcpu(), and
> guest_fpu should never be allocated for SEV-ES guests (though that doesn't change
> the need for kvm_free_guest_fpu() in this case).  I'll send patches for that.

I believe there is no need to require dst to be SEV or SEV-ES enabled.
This logic was just carried over from our internal implementation
which is more similar to the 2 ioctl version in V1-3.

>
> > +     dst->asid = src->asid;
> > +     dst->misc_cg = src->misc_cg;
> > +     dst->handle = src->handle;
> > +     dst->pages_locked = src->pages_locked;
> > +
> > +     src->asid = 0;
> > +     src->active = false;
> > +     src->handle = 0;
> > +     src->pages_locked = 0;
> > +     src->misc_cg = NULL;
> > +
> > +     INIT_LIST_HEAD(&dst->regions_list);
> > +     list_replace_init(&src->regions_list, &dst->regions_list);
> > +}
> > +
> > +int svm_vm_migrate_from(struct kvm *kvm, unsigned int source_fd)
> > +{
> > +     struct kvm_sev_info *dst_sev = &to_kvm_svm(kvm)->sev_info;
> > +     struct file *source_kvm_file;
> > +     struct kvm *source_kvm;
> > +     int ret;
> > +
> > +     ret = svm_sev_lock_for_migration(kvm);
> > +     if (ret)
> > +             return ret;
> > +
> > +     if (!sev_guest(kvm) || sev_es_guest(kvm)) {
> > +             ret = -EINVAL;
> > +             pr_warn_ratelimited("VM must be SEV enabled to migrate to.\n");
>
> Linux generally doesn't log user errors to dmesg.  They can be helpful during
> development, but aren't actionable and thus are of limited use in production.
>

As noted, I added these logs based on marcorr's feedback. I'll remove all of them.

> > +             goto out_unlock;
> > +     }
>
> Hmm, I was going to say that migration should be rejected if @dst has created
> vCPUs, but the SEV-ES support migrates VMSA state and so must run after vCPUs
> are created.  Holding kvm->lock does not prevent invoking per-vCPU ioctls(),
> including KVM_RUN.  Modifying vCPU SEV{-ES} state while a vCPU is actively running
> is bound to cause explosions.
>
> One option for this patch would be to check kvm->created_vcpus and then add
> different logic for SEV-ES, but that's probably not desirable for userspace as
> it will mean triggering intra-host migration at different points for SEV vs. SEV-ES.
>
> So I think the only option is to take vcpu->mutex for all vCPUs in both @src and
> @dst.  Adding that after acquiring kvm->lock in svm_sev_lock_for_migration()
> should Just Work.  Unless userspace is misbehaving, the lock won't be contended
> since all vCPUs need to be quiesced, though it's probably worth using the
> mutex_lock_killable() variant just to be safe.

Ack, will do.

>
> > +     if (!list_empty(&dst_sev->regions_list)) {
> > +             ret = -EINVAL;
> > +             pr_warn_ratelimited(
> > +                     "VM must not have encrypted regions to migrate to.\n");
> > +             goto out_unlock;
> > +     }
> > +
> > +     source_kvm_file = fget(source_fd);
> > +     if (!file_is_kvm(source_kvm_file)) {
> > +             ret = -EBADF;
> > +             pr_warn_ratelimited(
> > +                             "Source VM must be SEV enabled to migrate from.\n");
>
> Case in point for not logging errors, this is arguably inaccurate as the source
> "VM" isn't a VM.
>
> > +             goto out_fput;
> > +     }
> > +
> > +     source_kvm = source_kvm_file->private_data;
> > +     ret = svm_sev_lock_for_migration(source_kvm);
> > +     if (ret)
> > +             goto out_fput;
> > +
> > +     if (!sev_guest(source_kvm) || sev_es_guest(source_kvm)) {
> > +             ret = -EINVAL;
> > +             pr_warn_ratelimited(
> > +                     "Source VM must be SEV enabled to migrate from.\n");
> > +             goto out_source;
> > +     }
> > +
> > +     migrate_info_from(dst_sev, &to_kvm_svm(source_kvm)->sev_info);
> > +     ret = 0;
> > +
> > +out_source:
> > +     svm_unlock_after_migration(source_kvm);
> > +out_fput:
> > +     if (source_kvm_file)
> > +             fput(source_kvm_file);
> > +out_unlock:
> > +     svm_unlock_after_migration(kvm);
> > +     return ret;
> > +}
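
For reference, the guard behind svm_sev_lock_for_migration() in the function
quoted above can be modeled in userspace. This is a minimal sketch, not the
patch's code: C11 atomics stand in for the kernel's atomic_cmpxchg_acquire()
and atomic_set_release(), the kvm->lock mutex is left out, and every model_*
name is hypothetical. It illustrates why an overlapping migration attempt
gets -EBUSY instead of blocking.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>

/* Hypothetical stand-in for the one field of struct kvm_sev_info used here. */
struct model_sev_info {
	atomic_int migration_in_progress;
};

/* Claim a VM for migration; fail instead of waiting if it is already claimed. */
static int model_claim(struct model_sev_info *sev)
{
	int expected = 0;

	if (!atomic_compare_exchange_strong_explicit(&sev->migration_in_progress,
						     &expected, 1,
						     memory_order_acquire,
						     memory_order_relaxed))
		return -EBUSY;
	return 0;
}

/* Drop the claim; only valid while the claim is held. */
static void model_release(struct model_sev_info *sev)
{
	assert(atomic_load(&sev->migration_in_progress) == 1);
	atomic_store_explicit(&sev->migration_in_progress, 0,
			      memory_order_release);
}

/* Claim @dst, then @src, unwinding in reverse order like the goto labels. */
static int model_migrate(struct model_sev_info *dst, struct model_sev_info *src)
{
	int ret;

	ret = model_claim(dst);
	if (ret)
		return ret;

	ret = model_claim(src);
	if (ret)
		goto out_dst;

	/* The state transfer (migrate_info_from() in the patch) would go here. */

	model_release(src);
out_dst:
	model_release(dst);
	return ret;
}
```

If A->B and B->A race, one side finds the other VM already claimed and backs
out with -EBUSY, so neither side ever sleeps while holding its own claim.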
Sean Christopherson Sept. 10, 2021, 10:03 p.m. UTC | #7
On Fri, Sep 10, 2021, Peter Gonda wrote:
> > Do we really want to bury this under KVM_CAP?  Even KVM_CAP_VM_COPY_ENC_CONTEXT_FROM
> > is a bit of a stretch, but at least that's a one-way "enabling", whereas this
> > migration routine should be able to handle multiple migrations, e.g. migrate A->B
> > and B->A.  Peeking at your selftest, it should be fairly easy to add in this edge
> > case.
> >
> > This is probably a Paolo question, I've no idea if there's a desire to expand
> > KVM_CAP versus adding a new ioctl().
> 
> Thanks for the review Sean. I put this under KVM_CAP as you suggested
> following the idea of svm_vm_copy_asid_from. Paolo or anyone else have
> thoughts here? It doesn't really matter to me.

Ah, sorry :-/  I obviously don't have a strong preference either.

> > > +Architectures: x86 SEV enabled
> > > +Type: vm
> > > +Parameters: args[0] is the fd of the source vm
> > > +Returns: 0 on success
> >
> > It'd be helpful to provide a brief description of the error cases.  Looks like
> > -EINVAL is the only possible error?
> >
> > > +This capability enables userspace to migrate the encryption context
> >
> > I would prefer to scope this beyond "encryption context".  Even for SEV, it
> > copies more than just the "context", which was an abstraction of SEV's ASID,
> > e.g. this also hands off the set of encrypted memory regions.  Looking toward
> > the future, if TDX wants to support this it's going to need to hand over a ton
> > of stuff, e.g. S-EPT tables.
> >
> > Not sure on a name, maybe MIGRATE_PROTECTED_VM_FROM?
> 
> Protected VM sounds reasonable. I was using 'context' here to mean all
> metadata related to a CoCo VM as with the
> KVM_CAP_VM_COPY_ENC_CONTEXT_FROM. Is it worth diverging naming here?

Yes, as they are two similar but slightly different things, IMO we want to diverge
so that it's obvious they operate on different data.
Peter Gonda Sept. 10, 2021, 10:07 p.m. UTC | #8
On Fri, Sep 10, 2021 at 4:03 PM Sean Christopherson <seanjc@google.com> wrote:
>
> On Fri, Sep 10, 2021, Peter Gonda wrote:
> > > Do we really want to bury this under KVM_CAP?  Even KVM_CAP_VM_COPY_ENC_CONTEXT_FROM
> > > is a bit of a stretch, but at least that's a one-way "enabling", whereas this
> > > migration routine should be able to handle multiple migrations, e.g. migrate A->B
> > > and B->A.  Peeking at your selftest, it should be fairly easy to add in this edge
> > > case.
> > >
> > > This is probably a Paolo question, I've no idea if there's a desire to expand
> > > KVM_CAP versus adding a new ioctl().
> >
> > Thanks for the review Sean. I put this under KVM_CAP as you suggested
> > following the idea of svm_vm_copy_asid_from. Paolo or anyone else have
> > thoughts here? It doesn't really matter to me.
>
> Ah, sorry :-/  I obviously don't have a strong preference either.

I am going to suggest leaving it under KVM_CAP for this reason: I
don't see a great use case for A->B then B->A migrations. And if we
move to requiring that dst is not SEV or SEV-ES enabled, which I
think makes sense, then a VM can only ever have migrated from one
other VM, since once it has, it will be SEV/SEV-ES enabled. Does that
seem reasonable?

>
> > > > +Architectures: x86 SEV enabled
> > > > +Type: vm
> > > > +Parameters: args[0] is the fd of the source vm
> > > > +Returns: 0 on success
> > >
> > > It'd be helpful to provide a brief description of the error cases.  Looks like
> > > -EINVAL is the only possible error?
> > >
> > > > +This capability enables userspace to migrate the encryption context
> > >
> > > I would prefer to scope this beyond "encryption context".  Even for SEV, it
> > > copies more than just the "context", which was an abstraction of SEV's ASID,
> > > e.g. this also hands off the set of encrypted memory regions.  Looking toward
> > > the future, if TDX wants to support this it's going to need to hand over a ton
> > > of stuff, e.g. S-EPT tables.
> > >
> > > Not sure on a name, maybe MIGRATE_PROTECTED_VM_FROM?
> >
> > Protected VM sounds reasonable. I was using 'context' here to mean all
> > metadata related to a CoCo VM as with the
> > KVM_CAP_VM_COPY_ENC_CONTEXT_FROM. Is it worth diverging naming here?
>
> Yes, as they are two similar but slightly different things, IMO we want to diverge
> so that it's obvious they operate on different data.

Sounds good, I'll rename.
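
For context, a minimal sketch of how a target VMM might invoke the capability
as documented in this revision (args[0] is the fd of the source vm). The
KVM_ENABLE_CAP ioctl and struct kvm_enable_cap come from <linux/kvm.h>; the
capability name and the number 206 are taken from this posting and are
assumptions, since the name is about to change and the number may differ in
whatever is eventually merged. The helper name is hypothetical.

```c
#include <errno.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Assumption: value proposed by this patch, not guaranteed by any header. */
#ifndef KVM_CAP_VM_MIGRATE_ENC_CONTEXT_FROM
#define KVM_CAP_VM_MIGRATE_ENC_CONTEXT_FROM 206
#endif

/*
 * Hypothetical helper: ask KVM to move the SEV state of the VM behind
 * @src_vm_fd into the VM behind @dst_vm_fd.  Returns 0 or a negative errno.
 */
static int migrate_enc_context_from(int dst_vm_fd, int src_vm_fd)
{
	struct kvm_enable_cap cap;

	memset(&cap, 0, sizeof(cap));
	cap.cap = KVM_CAP_VM_MIGRATE_ENC_CONTEXT_FROM;
	cap.args[0] = (__u64)src_vm_fd;	/* args[0] is the fd of the source vm */

	if (ioctl(dst_vm_fd, KVM_ENABLE_CAP, &cap) < 0)
		return -errno;
	return 0;
}
```

The call is made on the target VM's fd, which matches the "Type: vm" line in
the proposed documentation.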
Peter Gonda Sept. 13, 2021, 4:21 p.m. UTC | #9
On Thu, Sep 9, 2021 at 7:12 PM Sean Christopherson <seanjc@google.com> wrote:
>
> On Fri, Sep 10, 2021, Sean Christopherson wrote:
> > Ooh, this brings up a potential shortcoming of requiring @dst to be SEV-enabled.
> > If every SEV{-ES} ASID is allocated, then there won't be an available ASID to
> > (temporarily) allocate for the intra-host migration.  But that temporary ASID
> > isn't actually necessary, i.e. there's no reason intra-host migration should fail
> > if all ASIDs are in-use.

Ack, forcing dst to be SEV disabled will mitigate this problem.

>
> ...
>
> > So I think the only option is to take vcpu->mutex for all vCPUs in both @src and
> > @dst.  Adding that after acquiring kvm->lock in svm_sev_lock_for_migration()
> > should Just Work.  Unless userspace is misbehaving, the lock won't be contended
> > since all vCPUs need to be quiesced, though it's probably worth using the
> > mutex_lock_killable() variant just to be safe.
>
> Circling back to this after looking at the SEV-ES support, I think the vCPUs in
> the source VM need to be reset via kvm_vcpu_reset(vcpu, false).  I doubt there's
> a use case for actually doing anything with the vCPU, but leaving it runnable
> without purging state makes me nervous.
>
> Alternative #1 would be to mark vCPUs as dead in some way so as to prevent doing
> anything useful with the vCPU.
>
> Alternative #2 would be to "kill" the source VM by setting kvm->vm_bugged to
> prevent all ioctls().
>
> The downside to preventing future ioctls() is that this would need to be the
> very last step of migration.  Not sure if that's problematic?

I'll add calls to kvm_vcpu_reset(). Alternative #2 using vm_bugged won't
work for us because we need to keep using the source VM even after the
state is transferred.
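
The "take vcpu->mutex for all vCPUs in both @src and @dst" suggestion quoted
above comes down to a lock-all-or-unwind loop. The following is a userspace
sketch of that pattern only, under stated assumptions: an atomic flag models
vcpu->mutex, a try-lock stands in for mutex_lock_killable() (the model returns
-EBUSY where the kernel call would return -EINTR), the kvm_vcpu_reset() step
is not modeled, and all model_* names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct kvm_vcpu; "locked" models vcpu->mutex. */
struct model_vcpu {
	atomic_bool locked;
};

/* Hypothetical stand-in for struct kvm. */
struct model_vm {
	struct model_vcpu *vcpus;
	size_t nr_vcpus;
};

/* Try-lock in place of mutex_lock_killable(): either one can fail. */
static bool model_vcpu_trylock(struct model_vcpu *vcpu)
{
	return !atomic_exchange_explicit(&vcpu->locked, true,
					 memory_order_acquire);
}

static void model_vcpu_unlock(struct model_vcpu *vcpu)
{
	assert(atomic_load(&vcpu->locked));
	atomic_store_explicit(&vcpu->locked, false, memory_order_release);
}

/* Release the first @n vCPU locks of @vm, newest first. */
static void model_unlock_first_vcpus(struct model_vm *vm, size_t n)
{
	while (n--)
		model_vcpu_unlock(&vm->vcpus[n]);
}

/* Lock every vCPU of @vm, or none of them if any single lock fails. */
static int model_lock_all_vcpus(struct model_vm *vm)
{
	size_t i;

	for (i = 0; i < vm->nr_vcpus; i++) {
		if (!model_vcpu_trylock(&vm->vcpus[i])) {
			model_unlock_first_vcpus(vm, i);
			return -EBUSY;
		}
	}
	return 0;
}

static void model_unlock_all_vcpus(struct model_vm *vm)
{
	model_unlock_first_vcpus(vm, vm->nr_vcpus);
}

/* Lock the vCPUs of @dst and then @src; on failure no lock stays held. */
static int model_lock_vcpus_for_migration(struct model_vm *dst,
					  struct model_vm *src)
{
	int ret;

	ret = model_lock_all_vcpus(dst);
	if (ret)
		return ret;

	ret = model_lock_all_vcpus(src);
	if (ret)
		model_unlock_all_vcpus(dst);
	return ret;
}
```

With well-behaved userspace all vCPUs are quiesced, so the failure path should
be rare; it matters because a failed acquisition must not leave a subset of
vCPUs locked.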

Patch

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 4ea1bb28297b..e8cecc024649 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -6702,6 +6702,21 @@  MAP_SHARED mmap will result in an -EINVAL return.
 When enabled the VMM may make use of the ``KVM_ARM_MTE_COPY_TAGS`` ioctl to
 perform a bulk copy of tags to/from the guest.
 
+7.29 KVM_CAP_VM_MIGRATE_ENC_CONTEXT_FROM
+----------------------------------------
+
+Architectures: x86 SEV enabled
+Type: vm
+Parameters: args[0] is the fd of the source vm
+Returns: 0 on success
+
+This capability enables userspace to migrate the encryption context from the vm
+indicated by the fd to the vm this is called on.
+
+This is intended to support intra-host migration of VMs between userspace VMMs.
+The target VMM takes over the source VM's encryption context, which allows for
+upgrading the VMM process without interrupting the guest.
+
 8. Other capabilities.
 ======================
 
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 09b256db394a..f06d87a85654 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1456,6 +1456,7 @@  struct kvm_x86_ops {
 	int (*mem_enc_reg_region)(struct kvm *kvm, struct kvm_enc_region *argp);
 	int (*mem_enc_unreg_region)(struct kvm *kvm, struct kvm_enc_region *argp);
 	int (*vm_copy_enc_context_from)(struct kvm *kvm, unsigned int source_fd);
+	int (*vm_migrate_enc_context_from)(struct kvm *kvm, unsigned int source_fd);
 
 	int (*get_msr_feature)(struct kvm_msr_entry *entry);
 
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 46eb1ba62d3d..8db666a362d4 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -1501,6 +1501,107 @@  static int sev_receive_finish(struct kvm *kvm, struct kvm_sev_cmd *argp)
 	return sev_issue_cmd(kvm, SEV_CMD_RECEIVE_FINISH, &data, &argp->error);
 }
 
+static int svm_sev_lock_for_migration(struct kvm *kvm)
+{
+	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+
+	/*
+	 * Bail if this VM is already involved in a migration to avoid deadlock
+	 * between two VMs trying to migrate to/from each other.
+	 */
+	if (atomic_cmpxchg_acquire(&sev->migration_in_progress, 0, 1))
+		return -EBUSY;
+
+	mutex_lock(&kvm->lock);
+
+	return 0;
+}
+
+static void svm_unlock_after_migration(struct kvm *kvm)
+{
+	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+
+	mutex_unlock(&kvm->lock);
+	atomic_set_release(&sev->migration_in_progress, 0);
+}
+
+static void migrate_info_from(struct kvm_sev_info *dst,
+			      struct kvm_sev_info *src)
+{
+	sev_asid_free(dst);
+
+	dst->asid = src->asid;
+	dst->misc_cg = src->misc_cg;
+	dst->handle = src->handle;
+	dst->pages_locked = src->pages_locked;
+
+	src->asid = 0;
+	src->active = false;
+	src->handle = 0;
+	src->pages_locked = 0;
+	src->misc_cg = NULL;
+
+	INIT_LIST_HEAD(&dst->regions_list);
+	list_replace_init(&src->regions_list, &dst->regions_list);
+}
+
+int svm_vm_migrate_from(struct kvm *kvm, unsigned int source_fd)
+{
+	struct kvm_sev_info *dst_sev = &to_kvm_svm(kvm)->sev_info;
+	struct file *source_kvm_file;
+	struct kvm *source_kvm;
+	int ret;
+
+	ret = svm_sev_lock_for_migration(kvm);
+	if (ret)
+		return ret;
+
+	if (!sev_guest(kvm) || sev_es_guest(kvm)) {
+		ret = -EINVAL;
+		pr_warn_ratelimited("VM must be SEV enabled to migrate to.\n");
+		goto out_unlock;
+	}
+
+	if (!list_empty(&dst_sev->regions_list)) {
+		ret = -EINVAL;
+		pr_warn_ratelimited(
+			"VM must not have encrypted regions to migrate to.\n");
+		goto out_unlock;
+	}
+
+	source_kvm_file = fget(source_fd);
+	if (!file_is_kvm(source_kvm_file)) {
+		ret = -EBADF;
+		pr_warn_ratelimited(
+				"Source VM must be SEV enabled to migrate from.\n");
+		goto out_fput;
+	}
+
+	source_kvm = source_kvm_file->private_data;
+	ret = svm_sev_lock_for_migration(source_kvm);
+	if (ret)
+		goto out_fput;
+
+	if (!sev_guest(source_kvm) || sev_es_guest(source_kvm)) {
+		ret = -EINVAL;
+		pr_warn_ratelimited(
+			"Source VM must be SEV enabled to migrate from.\n");
+		goto out_source;
+	}
+
+	migrate_info_from(dst_sev, &to_kvm_svm(source_kvm)->sev_info);
+	ret = 0;
+
+out_source:
+	svm_unlock_after_migration(source_kvm);
+out_fput:
+	if (source_kvm_file)
+		fput(source_kvm_file);
+out_unlock:
+	svm_unlock_after_migration(kvm);
+	return ret;
+}
+
 int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
 {
 	struct kvm_sev_cmd sev_cmd;
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 1a70e11f0487..88dd76dd966f 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4625,6 +4625,7 @@  static struct kvm_x86_ops svm_x86_ops __initdata = {
 	.mem_enc_unreg_region = svm_unregister_enc_region,
 
 	.vm_copy_enc_context_from = svm_vm_copy_asid_from,
+	.vm_migrate_enc_context_from = svm_vm_migrate_from,
 
 	.can_emulate_instruction = svm_can_emulate_instruction,
 
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 524d943f3efc..67bfb43301e1 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -80,6 +80,7 @@  struct kvm_sev_info {
 	u64 ap_jump_table;	/* SEV-ES AP Jump Table address */
 	struct kvm *enc_context_owner; /* Owner of copied encryption context */
 	struct misc_cg *misc_cg; /* For misc cgroup accounting */
+	atomic_t migration_in_progress;
 };
 
 struct kvm_svm {
@@ -552,6 +553,7 @@  int svm_register_enc_region(struct kvm *kvm,
 int svm_unregister_enc_region(struct kvm *kvm,
 			      struct kvm_enc_region *range);
 int svm_vm_copy_asid_from(struct kvm *kvm, unsigned int source_fd);
+int svm_vm_migrate_from(struct kvm *kvm, unsigned int source_fd);
 void pre_sev_run(struct vcpu_svm *svm, int cpu);
 void __init sev_set_cpu_caps(void);
 void __init sev_hardware_setup(void);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 86539c1686fa..c461867d37aa 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5654,6 +5654,11 @@  int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
 		if (kvm_x86_ops.vm_copy_enc_context_from)
 			r = kvm_x86_ops.vm_copy_enc_context_from(kvm, cap->args[0]);
 		return r;
+	case KVM_CAP_VM_MIGRATE_ENC_CONTEXT_FROM:
+		r = -EINVAL;
+		if (kvm_x86_ops.vm_migrate_enc_context_from)
+			r = kvm_x86_ops.vm_migrate_enc_context_from(kvm, cap->args[0]);
+		return r;
 	case KVM_CAP_EXIT_HYPERCALL:
 		if (cap->args[0] & ~KVM_EXIT_HYPERCALL_VALID_MASK) {
 			r = -EINVAL;
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index a067410ebea5..49660204cdb9 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1112,6 +1112,7 @@  struct kvm_ppc_resize_hpt {
 #define KVM_CAP_BINARY_STATS_FD 203
 #define KVM_CAP_EXIT_ON_EMULATION_FAILURE 204
 #define KVM_CAP_ARM_MTE 205
+#define KVM_CAP_VM_MIGRATE_ENC_CONTEXT_FROM 206
 
 #ifdef KVM_CAP_IRQ_ROUTING