diff mbox series

[V11,2/5] KVM: SEV: Add support for SEV intra host migration

Message ID 20211021174303.385706-3-pgonda@google.com (mailing list archive)
State New, archived
Headers show
Series Add AMD SEV and SEV-ES intra host migration support | expand

Commit Message

Peter Gonda Oct. 21, 2021, 5:43 p.m. UTC
For SEV to work with intra host migration, contents of the SEV info struct
such as the ASID (used to index the encryption key in the AMD SP) and
the list of memory regions need to be transferred to the target VM.
This change adds a commands for a target VMM to get a source SEV VM's sev
info.

Signed-off-by: Peter Gonda <pgonda@google.com>
Suggested-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Marc Orr <marcorr@google.com>
Cc: Marc Orr <marcorr@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Wanpeng Li <wanpengli@tencent.com>
Cc: Jim Mattson <jmattson@google.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
---
 Documentation/virt/kvm/api.rst  |  15 ++++
 arch/x86/include/asm/kvm_host.h |   1 +
 arch/x86/kvm/svm/sev.c          | 137 ++++++++++++++++++++++++++++++++
 arch/x86/kvm/svm/svm.c          |   1 +
 arch/x86/kvm/svm/svm.h          |   2 +
 arch/x86/kvm/x86.c              |   6 ++
 include/uapi/linux/kvm.h        |   1 +
 7 files changed, 163 insertions(+)

Comments

Sean Christopherson Nov. 4, 2021, 10:07 p.m. UTC | #1
Paolo and anyone else, any thoughts before I lead Peter on an even longer wild
goose chase?

On Thu, Oct 21, 2021, Peter Gonda wrote:
> @@ -6706,6 +6706,21 @@ MAP_SHARED mmap will result in an -EINVAL return.
>  When enabled the VMM may make use of the ``KVM_ARM_MTE_COPY_TAGS`` ioctl to
>  perform a bulk copy of tags to/from the guest.
>  
> +7.29 KVM_CAP_VM_MIGRATE_PROTECTED_VM_FROM
> +-------------------------------------
> +
> +Architectures: x86 SEV enabled

I'd drop the "SEV enabled" part.  In a way, it's technically a lie for this one
patch since an SEV-ES VM is also an SEV VM, but doesn't support this capability.
And AFAICT no other ioctl()/capability provides this level of granularity.

> +Type: vm
> +Parameters: args[0] is the fd of the source vm
> +Returns: 0 on success
> +
> +This capability enables userspace to migrate the encryption context from the VM
> +indicated by the fd to the VM this is called on.
> +
> +This is intended to support intra-host migration of VMs between userspace VMMs.
> +in-guest workloads scheduled by the host. This allows for upgrading the VMM
> +process without interrupting the guest.
> +

...

> +static void sev_unlock_vcpus_for_migration(struct kvm *kvm)
> +{
> +	struct kvm_vcpu *vcpu;
> +	int i;
> +
> +	kvm_for_each_vcpu(i, vcpu, kvm) {

Braces not needed.

> +		mutex_unlock(&vcpu->mutex);
> +	}
> +}
> +
> +static void sev_migrate_from(struct kvm_sev_info *dst,
> +			      struct kvm_sev_info *src)
> +{
> +	dst->active = true;
> +	dst->asid = src->asid;
> +	dst->misc_cg = src->misc_cg;

Ah, this is not correct.  If @dst is in a different cgroup, then @dst needs to
be charged and @src needs to be uncharged.

That would also provide a good opportunity to more tightly couple ->asid and
->misc_cg in the form of a helper.  Looking at the code, there's an invariant
that misc_cg is NULL if an ASID is not assigned.  I.e. these three lines belong
in a helper, irrespective of this code.

	misc_cg_uncharge(type, sev->misc_cg, 1);
	put_misc_cg(sev->misc_cg);
	sev->misc_cg = NULL;

> +	dst->handle = src->handle;
> +	dst->pages_locked = src->pages_locked;
> +
> +	src->asid = 0;
> +	src->active = false;
> +	src->handle = 0;
> +	src->pages_locked = 0;
> +	src->misc_cg = NULL;
> +	INIT_LIST_HEAD(&dst->regions_list);
> +	list_replace_init(&src->regions_list, &dst->regions_list);
> +}
> +
> +int svm_vm_migrate_from(struct kvm *kvm, unsigned int source_fd)
> +{
> +	struct kvm_sev_info *dst_sev = &to_kvm_svm(kvm)->sev_info;
> +	struct file *source_kvm_file;
> +	struct kvm *source_kvm;
> +	struct kvm_vcpu *vcpu;
> +	int i, ret;
> +
> +	ret = sev_lock_for_migration(kvm);
> +	if (ret)
> +		return ret;
> +
> +	if (sev_guest(kvm)) {
> +		ret = -EINVAL;
> +		goto out_unlock;
> +	}
> +
> +	source_kvm_file = fget(source_fd);
> +	if (!file_is_kvm(source_kvm_file)) {
> +		ret = -EBADF;
> +		goto out_fput;
> +	}
> +
> +	source_kvm = source_kvm_file->private_data;
> +	ret = sev_lock_for_migration(source_kvm);
> +	if (ret)
> +		goto out_fput;
> +
> +	if (!sev_guest(source_kvm) || sev_es_guest(source_kvm)) {
> +		ret = -EINVAL;
> +		goto out_source;
> +	}
> +	ret = sev_lock_vcpus_for_migration(kvm);
> +	if (ret)
> +		goto out_dst_vcpu;
> +	ret = sev_lock_vcpus_for_migration(source_kvm);
> +	if (ret)
> +		goto out_source_vcpu;
> +
> +	sev_migrate_from(dst_sev, &to_kvm_svm(source_kvm)->sev_info);
> +	kvm_for_each_vcpu(i, vcpu, source_kvm) {

Braces not needed.

> +		kvm_vcpu_reset(vcpu, /* init_event= */ false);

Phooey.  I made this suggestion, but in hindsight, it's a bad suggestion as KVM
doesn't currently have a true RESET path; there are quite a few blobs of code
that assume the vCPU has never been run if init_event=false.

And to go through kvm_vcpu_reset(), the vcpu needs to be loaded, not just locked.
It won't fail as hard as VMX, where KVM would write the wrong VMCS, but odds are
good something will eventually go sideways.

Aha!  An idea.  Marking the VM bugged doesn't work because "we need to keep using
the  source VM even after the state is transfered"[*], but the core idea is sound,
it just needs to add a different flag to more precisely prevent kvm_vcpu_ioctl().

If we rename KVM_REQ_VM_BUGGED=>KVM_REQ_VM_DEAD in a prep patch (see below), then
this patch can add something here (can't think of a good name)

	source_kvm->??? = true;
	kvm_make_all_cpus_request(kvm, KVM_REQ_VM_DEAD);

and then check it in kvm_vcpu_ioctl()

	struct kvm *kvm = vcpu->kvm;

	if (kvm->mm != current->mm || kvm->vm_bugged || kvm->???)
		return -EIO;

That way the source vCPUs don't need to be locked and all vCPU ioctls() are
blocked, which I think is ideal since the vCPUs are in a frankenstate and really
should just die.

Maybe we can call the flag "zombie", or "mostly_dead" :-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index c80fa1d378c9..e3f49ca01f95 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9423,7 +9423,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
        }

        if (kvm_request_pending(vcpu)) {
-               if (kvm_check_request(KVM_REQ_VM_BUGGED, vcpu)) {
+               if (kvm_check_request(KVM_REQ_VM_DEAD, vcpu)) {
                        r = -EIO;
                        goto out;
                }
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 0f18df7fe874..de8d25cef183 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -150,7 +150,7 @@ static inline bool is_error_page(struct page *page)
 #define KVM_REQ_MMU_RELOAD        (1 | KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
 #define KVM_REQ_UNBLOCK           2
 #define KVM_REQ_UNHALT            3
-#define KVM_REQ_VM_BUGGED         (4 | KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
+#define KVM_REQ_VM_DEAD                  (4 | KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
 #define KVM_REQUEST_ARCH_BASE     8

 #define KVM_ARCH_REQ_FLAGS(nr, flags) ({ \
@@ -654,7 +654,7 @@ struct kvm {
 static inline void kvm_vm_bugged(struct kvm *kvm)
 {
        kvm->vm_bugged = true;
-       kvm_make_all_cpus_request(kvm, KVM_REQ_VM_BUGGED);
+       kvm_make_all_cpus_request(kvm, KVM_REQ_VM_DEAD);
 }

 #define KVM_BUG(cond, kvm, fmt...)


Back when I made this bad suggestion in v7, you said "we need to keep using the
source VM even after the state is transfered"[*].  What all do you need to do
after the migration?  I assume it's mostly memory related per-VM ioctls?


[*] https://lkml.kernel.org/r/CAMkAt6q3as414YMZco6UyCycY+jKbaYS5BUdC+U+8iWmBft3+A@mail.gmail.com

> +	}
> +	ret = 0;
> +
> +out_source_vcpu:
> +	sev_unlock_vcpus_for_migration(source_kvm);
> +
> +out_dst_vcpu:
> +	sev_unlock_vcpus_for_migration(kvm);
> +
> +out_source:
> +	sev_unlock_after_migration(source_kvm);
> +out_fput:
> +	if (source_kvm_file)
> +		fput(source_kvm_file);
> +out_unlock:
> +	sev_unlock_after_migration(kvm);
> +	return ret;
> +}
> +
>  int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
>  {
>  	struct kvm_sev_cmd sev_cmd;
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 68294491c23d..c2e25ae4757f 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -4637,6 +4637,7 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
>  	.mem_enc_unreg_region = svm_unregister_enc_region,
>  
>  	.vm_copy_enc_context_from = svm_vm_copy_asid_from,
> +	.vm_migrate_protected_vm_from = svm_vm_migrate_from,
>  
>  	.can_emulate_instruction = svm_can_emulate_instruction,
>  
> diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> index 6d8d762d208f..d7b44b37dfcf 100644
> --- a/arch/x86/kvm/svm/svm.h
> +++ b/arch/x86/kvm/svm/svm.h
> @@ -80,6 +80,7 @@ struct kvm_sev_info {
>  	u64 ap_jump_table;	/* SEV-ES AP Jump Table address */
>  	struct kvm *enc_context_owner; /* Owner of copied encryption context */
>  	struct misc_cg *misc_cg; /* For misc cgroup accounting */
> +	atomic_t migration_in_progress;
>  };
>  
>  struct kvm_svm {
> @@ -557,6 +558,7 @@ int svm_register_enc_region(struct kvm *kvm,
>  int svm_unregister_enc_region(struct kvm *kvm,
>  			      struct kvm_enc_region *range);
>  int svm_vm_copy_asid_from(struct kvm *kvm, unsigned int source_fd);
> +int svm_vm_migrate_from(struct kvm *kvm, unsigned int source_fd);
>  void pre_sev_run(struct vcpu_svm *svm, int cpu);
>  void __init sev_set_cpu_caps(void);
>  void __init sev_hardware_setup(void);
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 0c8b5129effd..c80fa1d378c9 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -5665,6 +5665,12 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
>  		if (kvm_x86_ops.vm_copy_enc_context_from)
>  			r = kvm_x86_ops.vm_copy_enc_context_from(kvm, cap->args[0]);
>  		return r;
> +	case KVM_CAP_VM_MIGRATE_PROTECTED_VM_FROM:

I wonder... would it make sense to hedge and just call this KVM_CAP_VM_MIGRATE_VM_FROM?
I can't think of a use case where KVM would "need" to do this for a non-protected
VM, but I also don't see a huge naming problem if the "PROTECTED" is omitted.

> +		r = -EINVAL;
> +		if (kvm_x86_ops.vm_migrate_protected_vm_from)
> +			r = kvm_x86_ops.vm_migrate_protected_vm_from(
> +				kvm, cap->args[0]);

Either let that poke out and/or refactor to avoid the indentation.  E.g.

		r = -EINVAL;
		if (!kvm_x86_ops.vm_migrate_protected_vm_from)
			break;

		return kvm_x86_ops.vm_migrate_protected_vm_from(kvm, cap->args[0]);

		
> +		return r;
>  	case KVM_CAP_EXIT_HYPERCALL:
>  		if (cap->args[0] & ~KVM_EXIT_HYPERCALL_VALID_MASK) {
>  			r = -EINVAL;
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index a067410ebea5..77b292ed01c1 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -1112,6 +1112,7 @@ struct kvm_ppc_resize_hpt {
>  #define KVM_CAP_BINARY_STATS_FD 203
>  #define KVM_CAP_EXIT_ON_EMULATION_FAILURE 204
>  #define KVM_CAP_ARM_MTE 205
> +#define KVM_CAP_VM_MIGRATE_PROTECTED_VM_FROM 206
>  
>  #ifdef KVM_CAP_IRQ_ROUTING
>  
> -- 
> 2.33.0.1079.g6e70778dc9-goog
>
Sean Christopherson Nov. 4, 2021, 11:04 p.m. UTC | #2
On Thu, Nov 04, 2021, Sean Christopherson wrote:
> On Thu, Oct 21, 2021, Peter Gonda wrote:
> > +	if (!sev_guest(source_kvm) || sev_es_guest(source_kvm)) {
> > +		ret = -EINVAL;
> > +		goto out_source;
> > +	}
> > +	ret = sev_lock_vcpus_for_migration(kvm);
> > +	if (ret)
> > +		goto out_dst_vcpu;
> > +	ret = sev_lock_vcpus_for_migration(source_kvm);
> > +	if (ret)
> > +		goto out_source_vcpu;
> > +
> > +	sev_migrate_from(dst_sev, &to_kvm_svm(source_kvm)->sev_info);
> > +	kvm_for_each_vcpu(i, vcpu, source_kvm) {
> 
> Braces not needed.
> 
> > +		kvm_vcpu_reset(vcpu, /* init_event= */ false);

...

> That way the source vCPUs don't need to be locked

Scratch that particular idea, I keep forgetting the SEV-ES support needs to lock
the source vCPUs to transfer state.
Peter Gonda Nov. 9, 2021, 3:19 p.m. UTC | #3
On Thu, Nov 4, 2021 at 4:07 PM Sean Christopherson <seanjc@google.com> wrote:
>
> Paolo and anyone else, any thoughts before I lead Peter on an even longer wild
> goose chase?

Input would be appreciated.

Commented on some questions, otherwise taken feedback for a new revision.

>
> On Thu, Oct 21, 2021, Peter Gonda wrote:
> > @@ -6706,6 +6706,21 @@ MAP_SHARED mmap will result in an -EINVAL return.
> >  When enabled the VMM may make use of the ``KVM_ARM_MTE_COPY_TAGS`` ioctl to
> >  perform a bulk copy of tags to/from the guest.
> >
> > +7.29 KVM_CAP_VM_MIGRATE_PROTECTED_VM_FROM
> > +-------------------------------------
> > +
> > +Architectures: x86 SEV enabled
>
> I'd drop the "SEV enabled" part.  In a way, it's technically a lie for this one
> patch since an SEV-ES VM is also an SEV VM, but doesn't support this capability.
> And AFAICT no other ioctl()/capability provides this level of granularity.
>
> > +Type: vm
> > +Parameters: args[0] is the fd of the source vm
> > +Returns: 0 on success
> > +
> > +This capability enables userspace to migrate the encryption context from the VM
> > +indicated by the fd to the VM this is called on.
> > +
> > +This is intended to support intra-host migration of VMs between userspace VMMs.
> > +in-guest workloads scheduled by the host. This allows for upgrading the VMM
> > +process without interrupting the guest.
> > +
>
> ...
>
> > +static void sev_unlock_vcpus_for_migration(struct kvm *kvm)
> > +{
> > +     struct kvm_vcpu *vcpu;
> > +     int i;
> > +
> > +     kvm_for_each_vcpu(i, vcpu, kvm) {
>
> Braces not needed.
>
> > +             mutex_unlock(&vcpu->mutex);
> > +     }
> > +}
> > +
> > +static void sev_migrate_from(struct kvm_sev_info *dst,
> > +                           struct kvm_sev_info *src)
> > +{
> > +     dst->active = true;
> > +     dst->asid = src->asid;
> > +     dst->misc_cg = src->misc_cg;
>
> Ah, this is not correct.  If @dst is in a different cgroup, then @dst needs to
> be charged and @src needs to be uncharged.
>
> That would also provide a good opportunity to more tightly couple ->asid and
> ->misc_cg in the form of a helper.  Looking at the code, there's an invariant
> that misc_cg is NULL if an ASID is not assigned.  I.e. these three lines belong
> in a helper, irrespective of this code.
>
>         misc_cg_uncharge(type, sev->misc_cg, 1);
>         put_misc_cg(sev->misc_cg);
>         sev->misc_cg = NULL;
>

OK I can pull this out into a helper in a separate patch. Then do the
uncharge/charge here.

> > +     dst->handle = src->handle;
> > +     dst->pages_locked = src->pages_locked;
> > +
> > +     src->asid = 0;
> > +     src->active = false;
> > +     src->handle = 0;
> > +     src->pages_locked = 0;
> > +     src->misc_cg = NULL;
> > +     INIT_LIST_HEAD(&dst->regions_list);
> > +     list_replace_init(&src->regions_list, &dst->regions_list);
> > +}
> > +
> > +int svm_vm_migrate_from(struct kvm *kvm, unsigned int source_fd)
> > +{
> > +     struct kvm_sev_info *dst_sev = &to_kvm_svm(kvm)->sev_info;
> > +     struct file *source_kvm_file;
> > +     struct kvm *source_kvm;
> > +     struct kvm_vcpu *vcpu;
> > +     int i, ret;
> > +
> > +     ret = sev_lock_for_migration(kvm);
> > +     if (ret)
> > +             return ret;
> > +
> > +     if (sev_guest(kvm)) {
> > +             ret = -EINVAL;
> > +             goto out_unlock;
> > +     }
> > +
> > +     source_kvm_file = fget(source_fd);
> > +     if (!file_is_kvm(source_kvm_file)) {
> > +             ret = -EBADF;
> > +             goto out_fput;
> > +     }
> > +
> > +     source_kvm = source_kvm_file->private_data;
> > +     ret = sev_lock_for_migration(source_kvm);
> > +     if (ret)
> > +             goto out_fput;
> > +
> > +     if (!sev_guest(source_kvm) || sev_es_guest(source_kvm)) {
> > +             ret = -EINVAL;
> > +             goto out_source;
> > +     }
> > +     ret = sev_lock_vcpus_for_migration(kvm);
> > +     if (ret)
> > +             goto out_dst_vcpu;
> > +     ret = sev_lock_vcpus_for_migration(source_kvm);
> > +     if (ret)
> > +             goto out_source_vcpu;
> > +
> > +     sev_migrate_from(dst_sev, &to_kvm_svm(source_kvm)->sev_info);
> > +     kvm_for_each_vcpu(i, vcpu, source_kvm) {
>
> Braces not needed.
>
> > +             kvm_vcpu_reset(vcpu, /* init_event= */ false);
>
> Phooey.  I made this suggestion, but in hindsight, it's a bad suggestion as KVM
> doesn't currently have a true RESET path; there are quite a few blobs of code
> that assume the vCPU has never been run if init_event=false.
>
> And to go through kvm_vcpu_reset(), the vcpu needs to be loaded, not just locked.
> It won't fail as hard as VMX, where KVM would write the wrong VMCS, but odds are
> good something will eventually go sideways.
>
> Aha!  An idea.  Marking the VM bugged doesn't work because "we need to keep using
> the  source VM even after the state is transfered"[*], but the core idea is sound,
> it just needs to add a different flag to more precisely prevent kvm_vcpu_ioctl().
>
> If we rename KVM_REQ_VM_BUGGED=>KVM_REQ_VM_DEAD in a prep patch (see below), then
> this patch can add something here (can't think of a good name)
>
>         source_kvm->??? = true;
>         kvm_make_all_cpus_request(kvm, KVM_REQ_VM_DEAD);
>
> and then check it in kvm_vcpu_ioctl()
>
>         struct kvm *kvm = vcpu->kvm;
>
>         if (kvm->mm != current->mm || kvm->vm_bugged || kvm->???)
>                 return -EIO;
>
> That way the source vCPUs don't need to be locked and all vCPU ioctls() are
> blocked, which I think is ideal since the vCPUs are in a frankenstate and really
> should just die.
>
> Maybe we can call the flag "zombie", or "mostly_dead" :-)

Do we actually need this functionality? We currently do intra-host
migration leaving the old vCPUs in a potentially dangling state until
we clean them up, so I am wondering why SEV VMs intra-host migration
should be special cased? Why not just zero out all the SEV-ES state
here so they cannot corrupt the GHCB, VMSA, etc. That is already safer
than what's done currently since the source vCPUs are still available
until the source VMM starts to tear down, those vCPUs could still be
run affecting guest state unexpectedly.

>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index c80fa1d378c9..e3f49ca01f95 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -9423,7 +9423,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>         }
>
>         if (kvm_request_pending(vcpu)) {
> -               if (kvm_check_request(KVM_REQ_VM_BUGGED, vcpu)) {
> +               if (kvm_check_request(KVM_REQ_VM_DEAD, vcpu)) {
>                         r = -EIO;
>                         goto out;
>                 }
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 0f18df7fe874..de8d25cef183 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -150,7 +150,7 @@ static inline bool is_error_page(struct page *page)
>  #define KVM_REQ_MMU_RELOAD        (1 | KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
>  #define KVM_REQ_UNBLOCK           2
>  #define KVM_REQ_UNHALT            3
> -#define KVM_REQ_VM_BUGGED         (4 | KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
> +#define KVM_REQ_VM_DEAD                  (4 | KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
>  #define KVM_REQUEST_ARCH_BASE     8
>
>  #define KVM_ARCH_REQ_FLAGS(nr, flags) ({ \
> @@ -654,7 +654,7 @@ struct kvm {
>  static inline void kvm_vm_bugged(struct kvm *kvm)
>  {
>         kvm->vm_bugged = true;
> -       kvm_make_all_cpus_request(kvm, KVM_REQ_VM_BUGGED);
> +       kvm_make_all_cpus_request(kvm, KVM_REQ_VM_DEAD);
>  }
>
>  #define KVM_BUG(cond, kvm, fmt...)
>
>
> Back when I made this bad suggestion in v7, you said "we need to keep using the
> source VM even after the state is transfered"[*].  What all do you need to do
> after the migration?  I assume it's mostly memory related per-VM ioctls?
>
>
> [*] https://lkml.kernel.org/r/CAMkAt6q3as414YMZco6UyCycY+jKbaYS5BUdC+U+8iWmBft3+A@mail.gmail.com
>
> > +     }
> > +     ret = 0;
> > +
> > +out_source_vcpu:
> > +     sev_unlock_vcpus_for_migration(source_kvm);
> > +
> > +out_dst_vcpu:
> > +     sev_unlock_vcpus_for_migration(kvm);
> > +
> > +out_source:
> > +     sev_unlock_after_migration(source_kvm);
> > +out_fput:
> > +     if (source_kvm_file)
> > +             fput(source_kvm_file);
> > +out_unlock:
> > +     sev_unlock_after_migration(kvm);
> > +     return ret;
> > +}
> > +
> >  int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
> >  {
> >       struct kvm_sev_cmd sev_cmd;
> > diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> > index 68294491c23d..c2e25ae4757f 100644
> > --- a/arch/x86/kvm/svm/svm.c
> > +++ b/arch/x86/kvm/svm/svm.c
> > @@ -4637,6 +4637,7 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
> >       .mem_enc_unreg_region = svm_unregister_enc_region,
> >
> >       .vm_copy_enc_context_from = svm_vm_copy_asid_from,
> > +     .vm_migrate_protected_vm_from = svm_vm_migrate_from,
> >
> >       .can_emulate_instruction = svm_can_emulate_instruction,
> >
> > diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> > index 6d8d762d208f..d7b44b37dfcf 100644
> > --- a/arch/x86/kvm/svm/svm.h
> > +++ b/arch/x86/kvm/svm/svm.h
> > @@ -80,6 +80,7 @@ struct kvm_sev_info {
> >       u64 ap_jump_table;      /* SEV-ES AP Jump Table address */
> >       struct kvm *enc_context_owner; /* Owner of copied encryption context */
> >       struct misc_cg *misc_cg; /* For misc cgroup accounting */
> > +     atomic_t migration_in_progress;
> >  };
> >
> >  struct kvm_svm {
> > @@ -557,6 +558,7 @@ int svm_register_enc_region(struct kvm *kvm,
> >  int svm_unregister_enc_region(struct kvm *kvm,
> >                             struct kvm_enc_region *range);
> >  int svm_vm_copy_asid_from(struct kvm *kvm, unsigned int source_fd);
> > +int svm_vm_migrate_from(struct kvm *kvm, unsigned int source_fd);
> >  void pre_sev_run(struct vcpu_svm *svm, int cpu);
> >  void __init sev_set_cpu_caps(void);
> >  void __init sev_hardware_setup(void);
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index 0c8b5129effd..c80fa1d378c9 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -5665,6 +5665,12 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
> >               if (kvm_x86_ops.vm_copy_enc_context_from)
> >                       r = kvm_x86_ops.vm_copy_enc_context_from(kvm, cap->args[0]);
> >               return r;
> > +     case KVM_CAP_VM_MIGRATE_PROTECTED_VM_FROM:
>
> I wonder... would it make sense to hedge and just call this KVM_CAP_VM_MIGRATE_VM_FROM?
> I can't think of a use case where KVM would "need" to do this for a non-protected
> VM, but I also don't see a huge naming problem if the "PROTECTED" is omitted.

Currently this CAP only deals with "protected" VM metadata. This call
isn't needed at all for non-protected VMs so wouldn't this change
imply that this call is needed for intra-host migration of all VMs?

>
> > +             r = -EINVAL;
> > +             if (kvm_x86_ops.vm_migrate_protected_vm_from)
> > +                     r = kvm_x86_ops.vm_migrate_protected_vm_from(
> > +                             kvm, cap->args[0]);
>
> Either let that poke out and/or refactor to avoid the indentation.  E.g.
>
>                 r = -EINVAL;
>                 if (!kvm_x86_ops.vm_migrate_protected_vm_from)
>                         break;
>
>                 return kvm_x86_ops.vm_migrate_protected_vm_from(kvm, cap->args[0]);
>
>
> > +             return r;
> >       case KVM_CAP_EXIT_HYPERCALL:
> >               if (cap->args[0] & ~KVM_EXIT_HYPERCALL_VALID_MASK) {
> >                       r = -EINVAL;
> > diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> > index a067410ebea5..77b292ed01c1 100644
> > --- a/include/uapi/linux/kvm.h
> > +++ b/include/uapi/linux/kvm.h
> > @@ -1112,6 +1112,7 @@ struct kvm_ppc_resize_hpt {
> >  #define KVM_CAP_BINARY_STATS_FD 203
> >  #define KVM_CAP_EXIT_ON_EMULATION_FAILURE 204
> >  #define KVM_CAP_ARM_MTE 205
> > +#define KVM_CAP_VM_MIGRATE_PROTECTED_VM_FROM 206
> >
> >  #ifdef KVM_CAP_IRQ_ROUTING
> >
> > --
> > 2.33.0.1079.g6e70778dc9-goog
> >
Paolo Bonzini Nov. 11, 2021, 3:17 p.m. UTC | #4
On 11/4/21 23:07, Sean Christopherson wrote:
> 
> That would also provide a good opportunity to more tightly couple ->asid and
> ->misc_cg in the form of a helper.  Looking at the code, there's an invariant
> that misc_cg is NULL if an ASID is not assigned.  I.e. these three lines belong
> in a helper, irrespective of this code.
> 
> 	misc_cg_uncharge(type, sev->misc_cg, 1);
> 	put_misc_cg(sev->misc_cg);
> 	sev->misc_cg = NULL;

Agreed.  Though it's a bit more complicated because if dst->misc_cg ==
src->misc_cg you should *not* charge and uncharge, because charging
could fail.  So I'm going for just simple charge/uncharge helpers for
now:

 From 0ac1004d9e15e9460b51651bd095e6c5ee9cf4f9 Mon Sep 17 00:00:00 2001
From: Paolo Bonzini <pbonzini@redhat.com>
Date: Thu, 11 Nov 2021 10:02:26 -0500
Subject: [PATCH 1/2] KVM: SEV: provide helpers to charge/uncharge misc_cg

Avoid code duplication across all callers of misc_cg_try_charge and
misc_cg_uncharge.  The resource type for KVM is always derived from
sev->es_active, and the quantity is always 1.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 7c94fe307b39..5bafa4bf7c49 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -120,16 +120,26 @@ static bool __sev_recycle_asids(int min_asid, int max_asid)
  	return true;
  }
  
+static int sev_misc_cg_try_charge(struct kvm_sev_info *sev)
+{
+	enum misc_res_type type = sev->es_active ? MISC_CG_RES_SEV_ES : MISC_CG_RES_SEV;
+	return misc_cg_try_charge(type, sev->misc_cg, 1);
+}
+
+static void sev_misc_cg_uncharge(struct kvm_sev_info *sev)
+{
+	enum misc_res_type type = sev->es_active ? MISC_CG_RES_SEV_ES : MISC_CG_RES_SEV;
+	misc_cg_uncharge(type, sev->misc_cg, 1);
+}
+
  static int sev_asid_new(struct kvm_sev_info *sev)
  {
  	int asid, min_asid, max_asid, ret;
  	bool retry = true;
-	enum misc_res_type type;
  
-	type = sev->es_active ? MISC_CG_RES_SEV_ES : MISC_CG_RES_SEV;
  	WARN_ON(sev->misc_cg);
  	sev->misc_cg = get_current_misc_cg();
-	ret = misc_cg_try_charge(type, sev->misc_cg, 1);
+	ret = sev_misc_cg_try_charge(sev);
  	if (ret) {
  		put_misc_cg(sev->misc_cg);
  		sev->misc_cg = NULL;
@@ -162,7 +172,7 @@ static int sev_asid_new(struct kvm_sev_info *sev)
  
  	return asid;
  e_uncharge:
-	misc_cg_uncharge(type, sev->misc_cg, 1);
+	sev_misc_cg_uncharge(sev);
  	put_misc_cg(sev->misc_cg);
  	sev->misc_cg = NULL;
  	return ret;
@@ -179,7 +189,6 @@ static void sev_asid_free(struct kvm_sev_info *sev)
  {
  	struct svm_cpu_data *sd;
  	int cpu;
-	enum misc_res_type type;
  
  	mutex_lock(&sev_bitmap_lock);
  
@@ -192,8 +201,7 @@ static void sev_asid_free(struct kvm_sev_info *sev)
  
  	mutex_unlock(&sev_bitmap_lock);
  
-	type = sev->es_active ? MISC_CG_RES_SEV_ES : MISC_CG_RES_SEV;
-	misc_cg_uncharge(type, sev->misc_cg, 1);
+	sev_misc_cg_uncharge(sev);
  	put_misc_cg(sev->misc_cg);
  	sev->misc_cg = NULL;
  }



 From 5214132ae7e8310de26d5791f7fe913085a8e53c Mon Sep 17 00:00:00 2001
From: Paolo Bonzini <pbonzini@redhat.com>
Date: Thu, 11 Nov 2021 10:08:32 -0500
Subject: [PATCH] fix cgroup charging


diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 5bafa4bf7c49..97048ff7c2ad 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -1594,7 +1594,6 @@ static void sev_migrate_from(struct kvm_sev_info *dst,
  {
  	dst->active = true;
  	dst->asid = src->asid;
-	dst->misc_cg = src->misc_cg;
  	dst->handle = src->handle;
  	dst->pages_locked = src->pages_locked;
  
@@ -1602,6 +1601,11 @@ static void sev_migrate_from(struct kvm_sev_info *dst,
  	src->active = false;
  	src->handle = 0;
  	src->pages_locked = 0;
+
+	if (dst->misc_cg != src->misc_cg)
+		sev_misc_cg_uncharge(src);
+
+	put_misc_cg(src->misc_cg);
  	src->misc_cg = NULL;
  
  	INIT_LIST_HEAD(&dst->regions_list);
@@ -1611,6 +1615,7 @@ static void sev_migrate_from(struct kvm_sev_info *dst,
  int svm_vm_migrate_from(struct kvm *kvm, unsigned int source_fd)
  {
  	struct kvm_sev_info *dst_sev = &to_kvm_svm(kvm)->sev_info;
+	struct kvm_sev_info *src_sev;
  	struct file *source_kvm_file;
  	struct kvm *source_kvm;
  	struct kvm_vcpu *vcpu;
@@ -1640,25 +1645,39 @@ int svm_vm_migrate_from(struct kvm *kvm, unsigned int source_fd)
  		ret = -EINVAL;
  		goto out_source;
  	}
+
+	src_sev = &to_kvm_svm(source_kvm)->sev_info;
+	dst_sev->misc_cg = get_current_misc_cg();
+	if (dst_sev->misc_cg != src_sev->misc_cg) {
+		ret = sev_misc_cg_try_charge(dst_sev);
+		if (ret)
+			goto out_dst_put_cgroup;
+	}
+
  	ret = sev_lock_vcpus_for_migration(kvm);
  	if (ret)
-		goto out_dst_vcpu;
+		goto out_dst_cgroup;
  	ret = sev_lock_vcpus_for_migration(source_kvm);
  	if (ret)
-		goto out_source_vcpu;
+		goto out_dst_vcpu;
  
-	sev_migrate_from(dst_sev, &to_kvm_svm(source_kvm)->sev_info);
+	sev_migrate_from(dst_sev, src_sev);
  	kvm_for_each_vcpu(i, vcpu, source_kvm) {
  		kvm_vcpu_reset(vcpu, /* init_event= */ false);
  	}
  	ret = 0;
  
-out_source_vcpu:
  	sev_unlock_vcpus_for_migration(source_kvm);
  
  out_dst_vcpu:
  	sev_unlock_vcpus_for_migration(kvm);
-
+out_dst_cgroup:
+	if (ret < 0) {
+		sev_misc_cg_uncharge(dst_sev);
+out_dst_put_cgroup:
+		put_misc_cg(dst_sev->misc_cg);
+		dst_sev->misc_cg = NULL;
+	}
  out_source:
  	sev_unlock_after_migration(source_kvm);
  out_fput:



 From 943ba93c57ee25f85538decd68dca6e4ebdaf2c1 Mon Sep 17 00:00:00 2001
From: Paolo Bonzini <pbonzini@redhat.com>
Date: Thu, 11 Nov 2021 10:13:38 -0500
Subject: [PATCH] KVM: generalize "bugged" VM to "dead" VM

Generalize KVM_REQ_VM_BUGGED so that it can be called even in cases
where it is by design that the VM cannot be operated upon.  In this
case any KVM_BUG_ON should still warn, so introduce a new flag
kvm->vm_dead that is separate from kvm->vm_bugged.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index ca9693d3436b..185094eb86b6 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9660,7 +9660,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
  	}
  
  	if (kvm_request_pending(vcpu)) {
-		if (kvm_check_request(KVM_REQ_VM_BUGGED, vcpu)) {
+		if (kvm_check_request(KVM_REQ_VM_DEAD, vcpu)) {
  			r = -EIO;
  			goto out;
  		}
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 60a35d9fe259..9e0667e3723e 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -150,7 +150,7 @@ static inline bool is_error_page(struct page *page)
  #define KVM_REQ_MMU_RELOAD        (1 | KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
  #define KVM_REQ_UNBLOCK           2
  #define KVM_REQ_UNHALT            3
-#define KVM_REQ_VM_BUGGED         (4 | KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
+#define KVM_REQ_VM_DEAD           (4 | KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
  #define KVM_REQUEST_ARCH_BASE     8
  
  #define KVM_ARCH_REQ_FLAGS(nr, flags) ({ \
@@ -617,6 +617,7 @@ struct kvm {
  	unsigned int max_halt_poll_ns;
  	u32 dirty_ring_size;
  	bool vm_bugged;
+	bool vm_dead;
  
  #ifdef CONFIG_HAVE_KVM_PM_NOTIFIER
  	struct notifier_block pm_notifier;
@@ -650,12 +651,19 @@ struct kvm {
  #define vcpu_err(vcpu, fmt, ...)					\
  	kvm_err("vcpu%i " fmt, (vcpu)->vcpu_id, ## __VA_ARGS__)
  
+static inline void kvm_vm_dead(struct kvm *kvm)
+{
+	kvm->vm_dead = true;
+	kvm_make_all_cpus_request(kvm, KVM_REQ_VM_DEAD);
+}
+
  static inline void kvm_vm_bugged(struct kvm *kvm)
  {
  	kvm->vm_bugged = true;
-	kvm_make_all_cpus_request(kvm, KVM_REQ_VM_BUGGED);
+	kvm_vm_dead(kvm);
  }
  
+
  #define KVM_BUG(cond, kvm, fmt...)				\
  ({								\
  	int __ret = (cond);					\
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 3f6d450355f0..d31724500501 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -3747,7 +3747,7 @@ static long kvm_vcpu_ioctl(struct file *filp,
  	struct kvm_fpu *fpu = NULL;
  	struct kvm_sregs *kvm_sregs = NULL;
  
-	if (vcpu->kvm->mm != current->mm || vcpu->kvm->vm_bugged)
+	if (vcpu->kvm->mm != current->mm || vcpu->kvm->vm_dead)
  		return -EIO;
  
  	if (unlikely(_IOC_TYPE(ioctl) != KVMIO))
@@ -3957,7 +3957,7 @@ static long kvm_vcpu_compat_ioctl(struct file *filp,
  	void __user *argp = compat_ptr(arg);
  	int r;
  
-	if (vcpu->kvm->mm != current->mm || vcpu->kvm->vm_bugged)
+	if (vcpu->kvm->mm != current->mm || vcpu->kvm->vm_dead)
  		return -EIO;
  
  	switch (ioctl) {
@@ -4023,7 +4023,7 @@ static long kvm_device_ioctl(struct file *filp, unsigned int ioctl,
  {
  	struct kvm_device *dev = filp->private_data;
  
-	if (dev->kvm->mm != current->mm || dev->kvm->vm_bugged)
+	if (dev->kvm->mm != current->mm || dev->kvm->vm_dead)
  		return -EIO;
  
  	switch (ioctl) {
@@ -4345,7 +4345,7 @@ static long kvm_vm_ioctl(struct file *filp,
  	void __user *argp = (void __user *)arg;
  	int r;
  
-	if (kvm->mm != current->mm || kvm->vm_bugged)
+	if (kvm->mm != current->mm || kvm->vm_dead)
  		return -EIO;
  	switch (ioctl) {
  	case KVM_CREATE_VCPU:
@@ -4556,7 +4556,7 @@ static long kvm_vm_compat_ioctl(struct file *filp,
  	struct kvm *kvm = filp->private_data;
  	int r;
  
-	if (kvm->mm != current->mm || kvm->vm_bugged)
+	if (kvm->mm != current->mm || kvm->vm_dead)
  		return -EIO;
  	switch (ioctl) {
  #ifdef CONFIG_KVM_GENERIC_DIRTYLOG_READ_PROTECT



 From fb168352e16a4dbd95a7c0d1e6add18f0496ac97 Mon Sep 17 00:00:00 2001
From: Paolo Bonzini <pbonzini@redhat.com>
Date: Thu, 11 Nov 2021 10:14:49 -0500
Subject: [PATCH] mark src vm as dead


diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 97048ff7c2ad..2403aea3dbd3 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -1662,12 +1662,9 @@ int svm_vm_migrate_from(struct kvm *kvm, unsigned int source_fd)
  		goto out_dst_vcpu;
  
  	sev_migrate_from(dst_sev, src_sev);
-	kvm_for_each_vcpu(i, vcpu, source_kvm) {
-		kvm_vcpu_reset(vcpu, /* init_event= */ false);
-	}
-	ret = 0;
-
+	kvm_vm_dead(source_kvm);
  	sev_unlock_vcpus_for_migration(source_kvm);
+	ret = 0;
  
  out_dst_vcpu:
  	sev_unlock_vcpus_for_migration(kvm);


I'll send it out properly when I finish reviewing.

Paolo
Paolo Bonzini Nov. 11, 2021, 3:18 p.m. UTC | #5
On 11/4/21 23:07, Sean Christopherson wrote:
> 
> That would also provide a good opportunity to more tightly couple ->asid and
> ->misc_cg in the form of a helper.  Looking at the code, there's an invariant
> that misc_cg is NULL if an ASID is not assigned.  I.e. these three lines belong
> in a helper, irrespective of this code.
> 
> 	misc_cg_uncharge(type, sev->misc_cg, 1);
> 	put_misc_cg(sev->misc_cg);
> 	sev->misc_cg = NULL;

Agreed.  Though it's a bit more complicated because if dst->misc_cg ==
src->misc_cg you should *not* charge and uncharge, because charging
could fail.  So I'm going for just simple charge/uncharge helpers for
now:

 From 0ac1004d9e15e9460b51651bd095e6c5ee9cf4f9 Mon Sep 17 00:00:00 2001
From: Paolo Bonzini <pbonzini@redhat.com>
Date: Thu, 11 Nov 2021 10:02:26 -0500
Subject: [PATCH 1/4] KVM: SEV: provide helpers to charge/uncharge misc_cg

Avoid code duplication across all callers of misc_cg_try_charge and
misc_cg_uncharge.  The resource type for KVM is always derived from
sev->es_active, and the quantity is always 1.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 7c94fe307b39..5bafa4bf7c49 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -120,16 +120,26 @@ static bool __sev_recycle_asids(int min_asid, int max_asid)
  	return true;
  }
  
+static int sev_misc_cg_try_charge(struct kvm_sev_info *sev)
+{
+	enum misc_res_type type = sev->es_active ? MISC_CG_RES_SEV_ES : MISC_CG_RES_SEV;
+	return misc_cg_try_charge(type, sev->misc_cg, 1);
+}
+
+static void sev_misc_cg_uncharge(struct kvm_sev_info *sev)
+{
+	enum misc_res_type type = sev->es_active ? MISC_CG_RES_SEV_ES : MISC_CG_RES_SEV;
+	misc_cg_uncharge(type, sev->misc_cg, 1);
+}
+
  static int sev_asid_new(struct kvm_sev_info *sev)
  {
  	int asid, min_asid, max_asid, ret;
  	bool retry = true;
-	enum misc_res_type type;
  
-	type = sev->es_active ? MISC_CG_RES_SEV_ES : MISC_CG_RES_SEV;
  	WARN_ON(sev->misc_cg);
  	sev->misc_cg = get_current_misc_cg();
-	ret = misc_cg_try_charge(type, sev->misc_cg, 1);
+	ret = sev_misc_cg_try_charge(sev);
  	if (ret) {
  		put_misc_cg(sev->misc_cg);
  		sev->misc_cg = NULL;
@@ -162,7 +172,7 @@ static int sev_asid_new(struct kvm_sev_info *sev)
  
  	return asid;
  e_uncharge:
-	misc_cg_uncharge(type, sev->misc_cg, 1);
+	sev_misc_cg_uncharge(sev);
  	put_misc_cg(sev->misc_cg);
  	sev->misc_cg = NULL;
  	return ret;
@@ -179,7 +189,6 @@ static void sev_asid_free(struct kvm_sev_info *sev)
  {
  	struct svm_cpu_data *sd;
  	int cpu;
-	enum misc_res_type type;
  
  	mutex_lock(&sev_bitmap_lock);
  
@@ -192,8 +201,7 @@ static void sev_asid_free(struct kvm_sev_info *sev)
  
  	mutex_unlock(&sev_bitmap_lock);
  
-	type = sev->es_active ? MISC_CG_RES_SEV_ES : MISC_CG_RES_SEV;
-	misc_cg_uncharge(type, sev->misc_cg, 1);
+	sev_misc_cg_uncharge(sev);
  	put_misc_cg(sev->misc_cg);
  	sev->misc_cg = NULL;
  }



 From 5214132ae7e8310de26d5791f7fe913085a8e53c Mon Sep 17 00:00:00 2001
From: Paolo Bonzini <pbonzini@redhat.com>
Date: Thu, 11 Nov 2021 10:08:32 -0500
Subject: [PATCH 2/4] fix cgroup charging


diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 5bafa4bf7c49..97048ff7c2ad 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -1594,7 +1594,6 @@ static void sev_migrate_from(struct kvm_sev_info *dst,
  {
  	dst->active = true;
  	dst->asid = src->asid;
-	dst->misc_cg = src->misc_cg;
  	dst->handle = src->handle;
  	dst->pages_locked = src->pages_locked;
  
@@ -1602,6 +1601,11 @@ static void sev_migrate_from(struct kvm_sev_info *dst,
  	src->active = false;
  	src->handle = 0;
  	src->pages_locked = 0;
+
+	if (dst->misc_cg != src->misc_cg)
+		sev_misc_cg_uncharge(src);
+
+	put_misc_cg(src->misc_cg);
  	src->misc_cg = NULL;
  
  	INIT_LIST_HEAD(&dst->regions_list);
@@ -1611,6 +1615,7 @@ static void sev_migrate_from(struct kvm_sev_info *dst,
  int svm_vm_migrate_from(struct kvm *kvm, unsigned int source_fd)
  {
  	struct kvm_sev_info *dst_sev = &to_kvm_svm(kvm)->sev_info;
+	struct kvm_sev_info *src_sev;
  	struct file *source_kvm_file;
  	struct kvm *source_kvm;
  	struct kvm_vcpu *vcpu;
@@ -1640,25 +1645,39 @@ int svm_vm_migrate_from(struct kvm *kvm, unsigned int source_fd)
  		ret = -EINVAL;
  		goto out_source;
  	}
+
+	src_sev = &to_kvm_svm(source_kvm)->sev_info;
+	dst_sev->misc_cg = get_current_misc_cg();
+	if (dst_sev->misc_cg != src_sev->misc_cg) {
+		ret = sev_misc_cg_try_charge(dst_sev);
+		if (ret)
+			goto out_dst_put_cgroup;
+	}
+
  	ret = sev_lock_vcpus_for_migration(kvm);
  	if (ret)
-		goto out_dst_vcpu;
+		goto out_dst_cgroup;
  	ret = sev_lock_vcpus_for_migration(source_kvm);
  	if (ret)
-		goto out_source_vcpu;
+		goto out_dst_vcpu;
  
-	sev_migrate_from(dst_sev, &to_kvm_svm(source_kvm)->sev_info);
+	sev_migrate_from(dst_sev, src_sev);
  	kvm_for_each_vcpu(i, vcpu, source_kvm) {
  		kvm_vcpu_reset(vcpu, /* init_event= */ false);
  	}
  	ret = 0;
  
-out_source_vcpu:
  	sev_unlock_vcpus_for_migration(source_kvm);
  
  out_dst_vcpu:
  	sev_unlock_vcpus_for_migration(kvm);
-
+out_dst_cgroup:
+	if (ret < 0) {
+		sev_misc_cg_uncharge(dst_sev);
+out_dst_put_cgroup:
+		put_misc_cg(dst_sev->misc_cg);
+		dst_sev->misc_cg = NULL;
+	}
  out_source:
  	sev_unlock_after_migration(source_kvm);
  out_fput:



 From 943ba93c57ee25f85538decd68dca6e4ebdaf2c1 Mon Sep 17 00:00:00 2001
From: Paolo Bonzini <pbonzini@redhat.com>
Date: Thu, 11 Nov 2021 10:13:38 -0500
Subject: [PATCH 3/4] KVM: generalize "bugged" VM to "dead" VM

Generalize KVM_REQ_VM_BUGGED so that it can be called even in cases
where it is by design that the VM cannot be operated upon.  In this
case any KVM_BUG_ON should still warn, so introduce a new flag
kvm->vm_dead that is separate from kvm->vm_bugged.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index ca9693d3436b..185094eb86b6 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9660,7 +9660,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
  	}
  
  	if (kvm_request_pending(vcpu)) {
-		if (kvm_check_request(KVM_REQ_VM_BUGGED, vcpu)) {
+		if (kvm_check_request(KVM_REQ_VM_DEAD, vcpu)) {
  			r = -EIO;
  			goto out;
  		}
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 60a35d9fe259..9e0667e3723e 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -150,7 +150,7 @@ static inline bool is_error_page(struct page *page)
  #define KVM_REQ_MMU_RELOAD        (1 | KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
  #define KVM_REQ_UNBLOCK           2
  #define KVM_REQ_UNHALT            3
-#define KVM_REQ_VM_BUGGED         (4 | KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
+#define KVM_REQ_VM_DEAD           (4 | KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
  #define KVM_REQUEST_ARCH_BASE     8
  
  #define KVM_ARCH_REQ_FLAGS(nr, flags) ({ \
@@ -617,6 +617,7 @@ struct kvm {
  	unsigned int max_halt_poll_ns;
  	u32 dirty_ring_size;
  	bool vm_bugged;
+	bool vm_dead;
  
  #ifdef CONFIG_HAVE_KVM_PM_NOTIFIER
  	struct notifier_block pm_notifier;
@@ -650,12 +651,19 @@ struct kvm {
  #define vcpu_err(vcpu, fmt, ...)					\
  	kvm_err("vcpu%i " fmt, (vcpu)->vcpu_id, ## __VA_ARGS__)
  
+static inline void kvm_vm_dead(struct kvm *kvm)
+{
+	kvm->vm_dead = true;
+	kvm_make_all_cpus_request(kvm, KVM_REQ_VM_DEAD);
+}
+
  static inline void kvm_vm_bugged(struct kvm *kvm)
  {
  	kvm->vm_bugged = true;
-	kvm_make_all_cpus_request(kvm, KVM_REQ_VM_BUGGED);
+	kvm_vm_dead(kvm);
  }
  
+
  #define KVM_BUG(cond, kvm, fmt...)				\
  ({								\
  	int __ret = (cond);					\
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 3f6d450355f0..d31724500501 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -3747,7 +3747,7 @@ static long kvm_vcpu_ioctl(struct file *filp,
  	struct kvm_fpu *fpu = NULL;
  	struct kvm_sregs *kvm_sregs = NULL;
  
-	if (vcpu->kvm->mm != current->mm || vcpu->kvm->vm_bugged)
+	if (vcpu->kvm->mm != current->mm || vcpu->kvm->vm_dead)
  		return -EIO;
  
  	if (unlikely(_IOC_TYPE(ioctl) != KVMIO))
@@ -3957,7 +3957,7 @@ static long kvm_vcpu_compat_ioctl(struct file *filp,
  	void __user *argp = compat_ptr(arg);
  	int r;
  
-	if (vcpu->kvm->mm != current->mm || vcpu->kvm->vm_bugged)
+	if (vcpu->kvm->mm != current->mm || vcpu->kvm->vm_dead)
  		return -EIO;
  
  	switch (ioctl) {
@@ -4023,7 +4023,7 @@ static long kvm_device_ioctl(struct file *filp, unsigned int ioctl,
  {
  	struct kvm_device *dev = filp->private_data;
  
-	if (dev->kvm->mm != current->mm || dev->kvm->vm_bugged)
+	if (dev->kvm->mm != current->mm || dev->kvm->vm_dead)
  		return -EIO;
  
  	switch (ioctl) {
@@ -4345,7 +4345,7 @@ static long kvm_vm_ioctl(struct file *filp,
  	void __user *argp = (void __user *)arg;
  	int r;
  
-	if (kvm->mm != current->mm || kvm->vm_bugged)
+	if (kvm->mm != current->mm || kvm->vm_dead)
  		return -EIO;
  	switch (ioctl) {
  	case KVM_CREATE_VCPU:
@@ -4556,7 +4556,7 @@ static long kvm_vm_compat_ioctl(struct file *filp,
  	struct kvm *kvm = filp->private_data;
  	int r;
  
-	if (kvm->mm != current->mm || kvm->vm_bugged)
+	if (kvm->mm != current->mm || kvm->vm_dead)
  		return -EIO;
  	switch (ioctl) {
  #ifdef CONFIG_KVM_GENERIC_DIRTYLOG_READ_PROTECT



 From fb168352e16a4dbd95a7c0d1e6add18f0496ac97 Mon Sep 17 00:00:00 2001
From: Paolo Bonzini <pbonzini@redhat.com>
Date: Thu, 11 Nov 2021 10:14:49 -0500
Subject: [PATCH 4/4] mark src vm as dead


diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 97048ff7c2ad..2403aea3dbd3 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -1662,12 +1662,9 @@ int svm_vm_migrate_from(struct kvm *kvm, unsigned int source_fd)
  		goto out_dst_vcpu;
  
  	sev_migrate_from(dst_sev, src_sev);
-	kvm_for_each_vcpu(i, vcpu, source_kvm) {
-		kvm_vcpu_reset(vcpu, /* init_event= */ false);
-	}
-	ret = 0;
-
+	kvm_vm_dead(source_kvm);
  	sev_unlock_vcpus_for_migration(source_kvm);
+	ret = 0;
  
  out_dst_vcpu:
  	sev_unlock_vcpus_for_migration(kvm);


I'll send it out properly when I finish reviewing.

Paolo
diff mbox series

Patch

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index a6729c8cf063..df6f56d689e2 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -6706,6 +6706,21 @@  MAP_SHARED mmap will result in an -EINVAL return.
 When enabled the VMM may make use of the ``KVM_ARM_MTE_COPY_TAGS`` ioctl to
 perform a bulk copy of tags to/from the guest.
 
+7.29 KVM_CAP_VM_MIGRATE_PROTECTED_VM_FROM
+-------------------------------------
+
+Architectures: x86 SEV enabled
+Type: vm
+Parameters: args[0] is the fd of the source vm
+Returns: 0 on success
+
+This capability enables userspace to migrate the encryption context from the VM
+indicated by the fd to the VM this is called on.
+
+This is intended to support intra-host migration of VMs between userspace VMMs.
+in-guest workloads scheduled by the host. This allows for upgrading the VMM
+process without interrupting the guest.
+
 8. Other capabilities.
 ======================
 
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index f8f48a7ec577..422869707466 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1468,6 +1468,7 @@  struct kvm_x86_ops {
 	int (*mem_enc_reg_region)(struct kvm *kvm, struct kvm_enc_region *argp);
 	int (*mem_enc_unreg_region)(struct kvm *kvm, struct kvm_enc_region *argp);
 	int (*vm_copy_enc_context_from)(struct kvm *kvm, unsigned int source_fd);
+	int (*vm_migrate_protected_vm_from)(struct kvm *kvm, unsigned int source_fd);
 
 	int (*get_msr_feature)(struct kvm_msr_entry *entry);
 
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 109b223c0b58..2c2f724c9096 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -1529,6 +1529,143 @@  static bool cmd_allowed_from_miror(u32 cmd_id)
 	return false;
 }
 
+static int sev_lock_for_migration(struct kvm *kvm)
+{
+	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+
+	/*
+	 * Bail if this VM is already involved in a migration to avoid deadlock
+	 * between two VMs trying to migrate to/from each other.
+	 */
+	if (atomic_cmpxchg_acquire(&sev->migration_in_progress, 0, 1))
+		return -EBUSY;
+
+	mutex_lock(&kvm->lock);
+
+	return 0;
+}
+
+static void sev_unlock_after_migration(struct kvm *kvm)
+{
+	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+
+	mutex_unlock(&kvm->lock);
+	atomic_set_release(&sev->migration_in_progress, 0);
+}
+
+
+static int sev_lock_vcpus_for_migration(struct kvm *kvm)
+{
+	struct kvm_vcpu *vcpu;
+	int i, j;
+
+	kvm_for_each_vcpu(i, vcpu, kvm) {
+		if (mutex_lock_killable(&vcpu->mutex))
+			goto out_unlock;
+	}
+
+	return 0;
+
+out_unlock:
+	kvm_for_each_vcpu(j, vcpu, kvm) {
+		if (i == j)
+			break;
+
+		mutex_unlock(&vcpu->mutex);
+	}
+	return -EINTR;
+}
+
+static void sev_unlock_vcpus_for_migration(struct kvm *kvm)
+{
+	struct kvm_vcpu *vcpu;
+	int i;
+
+	kvm_for_each_vcpu(i, vcpu, kvm) {
+		mutex_unlock(&vcpu->mutex);
+	}
+}
+
+static void sev_migrate_from(struct kvm_sev_info *dst,
+			      struct kvm_sev_info *src)
+{
+	dst->active = true;
+	dst->asid = src->asid;
+	dst->misc_cg = src->misc_cg;
+	dst->handle = src->handle;
+	dst->pages_locked = src->pages_locked;
+
+	src->asid = 0;
+	src->active = false;
+	src->handle = 0;
+	src->pages_locked = 0;
+	src->misc_cg = NULL;
+
+	INIT_LIST_HEAD(&dst->regions_list);
+	list_replace_init(&src->regions_list, &dst->regions_list);
+}
+
+int svm_vm_migrate_from(struct kvm *kvm, unsigned int source_fd)
+{
+	struct kvm_sev_info *dst_sev = &to_kvm_svm(kvm)->sev_info;
+	struct file *source_kvm_file;
+	struct kvm *source_kvm;
+	struct kvm_vcpu *vcpu;
+	int i, ret;
+
+	ret = sev_lock_for_migration(kvm);
+	if (ret)
+		return ret;
+
+	if (sev_guest(kvm)) {
+		ret = -EINVAL;
+		goto out_unlock;
+	}
+
+	source_kvm_file = fget(source_fd);
+	if (!file_is_kvm(source_kvm_file)) {
+		ret = -EBADF;
+		goto out_fput;
+	}
+
+	source_kvm = source_kvm_file->private_data;
+	ret = sev_lock_for_migration(source_kvm);
+	if (ret)
+		goto out_fput;
+
+	if (!sev_guest(source_kvm) || sev_es_guest(source_kvm)) {
+		ret = -EINVAL;
+		goto out_source;
+	}
+	ret = sev_lock_vcpus_for_migration(kvm);
+	if (ret)
+		goto out_dst_vcpu;
+	ret = sev_lock_vcpus_for_migration(source_kvm);
+	if (ret)
+		goto out_source_vcpu;
+
+	sev_migrate_from(dst_sev, &to_kvm_svm(source_kvm)->sev_info);
+	kvm_for_each_vcpu(i, vcpu, source_kvm) {
+		kvm_vcpu_reset(vcpu, /* init_event= */ false);
+	}
+	ret = 0;
+
+out_source_vcpu:
+	sev_unlock_vcpus_for_migration(source_kvm);
+
+out_dst_vcpu:
+	sev_unlock_vcpus_for_migration(kvm);
+
+out_source:
+	sev_unlock_after_migration(source_kvm);
+out_fput:
+	if (source_kvm_file)
+		fput(source_kvm_file);
+out_unlock:
+	sev_unlock_after_migration(kvm);
+	return ret;
+}
+
 int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
 {
 	struct kvm_sev_cmd sev_cmd;
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 68294491c23d..c2e25ae4757f 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4637,6 +4637,7 @@  static struct kvm_x86_ops svm_x86_ops __initdata = {
 	.mem_enc_unreg_region = svm_unregister_enc_region,
 
 	.vm_copy_enc_context_from = svm_vm_copy_asid_from,
+	.vm_migrate_protected_vm_from = svm_vm_migrate_from,
 
 	.can_emulate_instruction = svm_can_emulate_instruction,
 
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 6d8d762d208f..d7b44b37dfcf 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -80,6 +80,7 @@  struct kvm_sev_info {
 	u64 ap_jump_table;	/* SEV-ES AP Jump Table address */
 	struct kvm *enc_context_owner; /* Owner of copied encryption context */
 	struct misc_cg *misc_cg; /* For misc cgroup accounting */
+	atomic_t migration_in_progress;
 };
 
 struct kvm_svm {
@@ -557,6 +558,7 @@  int svm_register_enc_region(struct kvm *kvm,
 int svm_unregister_enc_region(struct kvm *kvm,
 			      struct kvm_enc_region *range);
 int svm_vm_copy_asid_from(struct kvm *kvm, unsigned int source_fd);
+int svm_vm_migrate_from(struct kvm *kvm, unsigned int source_fd);
 void pre_sev_run(struct vcpu_svm *svm, int cpu);
 void __init sev_set_cpu_caps(void);
 void __init sev_hardware_setup(void);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 0c8b5129effd..c80fa1d378c9 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5665,6 +5665,12 @@  int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
 		if (kvm_x86_ops.vm_copy_enc_context_from)
 			r = kvm_x86_ops.vm_copy_enc_context_from(kvm, cap->args[0]);
 		return r;
+	case KVM_CAP_VM_MIGRATE_PROTECTED_VM_FROM:
+		r = -EINVAL;
+		if (kvm_x86_ops.vm_migrate_protected_vm_from)
+			r = kvm_x86_ops.vm_migrate_protected_vm_from(
+				kvm, cap->args[0]);
+		return r;
 	case KVM_CAP_EXIT_HYPERCALL:
 		if (cap->args[0] & ~KVM_EXIT_HYPERCALL_VALID_MASK) {
 			r = -EINVAL;
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index a067410ebea5..77b292ed01c1 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1112,6 +1112,7 @@  struct kvm_ppc_resize_hpt {
 #define KVM_CAP_BINARY_STATS_FD 203
 #define KVM_CAP_EXIT_ON_EMULATION_FAILURE 204
 #define KVM_CAP_ARM_MTE 205
+#define KVM_CAP_VM_MIGRATE_PROTECTED_VM_FROM 206
 
 #ifdef KVM_CAP_IRQ_ROUTING