diff mbox series

[v3,2/2] KVM: X86: Introduce KVM_HC_PAGE_ENC_STATUS hypercall

Message ID 20210429104707.203055-3-pbonzini@redhat.com (mailing list archive)
State New, archived
Headers show
Series KVM: x86: guest interface for SEV live migration | expand

Commit Message

Paolo Bonzini April 29, 2021, 10:47 a.m. UTC
From: Ashish Kalra <ashish.kalra@amd.com>

This hypercall is used by the SEV guest to notify a change in the page
encryption status to the hypervisor. The hypercall should be invoked
only when the encryption attribute is changed from encrypted -> decrypted
and vice versa. By default all guest pages are considered encrypted.

The hypercall exits to userspace to manage the guest shared regions and
integrate with the userspace VMM's migration code.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: x86@kernel.org
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Reviewed-by: Steve Rutherford <srutherford@google.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <93d7f2c2888315adc48905722574d89699edde33.1618498113.git.ashish.kalra@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 Documentation/virt/kvm/api.rst        | 14 +++++++++
 Documentation/virt/kvm/cpuid.rst      |  6 ++++
 Documentation/virt/kvm/hypercalls.rst | 21 ++++++++++++++
 Documentation/virt/kvm/msr.rst        |  7 ++++-
 arch/x86/include/asm/kvm_host.h       |  2 ++
 arch/x86/include/uapi/asm/kvm_para.h  |  1 +
 arch/x86/kvm/x86.c                    | 42 +++++++++++++++++++++++++++
 include/uapi/linux/kvm.h              |  1 +
 include/uapi/linux/kvm_para.h         |  1 +
 9 files changed, 94 insertions(+), 1 deletion(-)

Comments

Sean Christopherson April 30, 2021, 8:10 p.m. UTC | #1
On Thu, Apr 29, 2021, Paolo Bonzini wrote:
> diff --git a/Documentation/virt/kvm/msr.rst b/Documentation/virt/kvm/msr.rst
> index 57fc4090031a..cf1b0b2099b0 100644
> --- a/Documentation/virt/kvm/msr.rst
> +++ b/Documentation/virt/kvm/msr.rst
> @@ -383,5 +383,10 @@ MSR_KVM_MIGRATION_CONTROL:
>  data:
>          This MSR is available if KVM_FEATURE_MIGRATION_CONTROL is present in
>          CPUID.  Bit 0 represents whether live migration of the guest is allowed.
> +
>          When a guest is started, bit 0 will be 1 if the guest has encrypted
> -        memory and 0 if the guest does not have encrypted memory.
> +        memory and 0 if the guest does not have encrypted memory.  If the
> +        guest is communicating page encryption status to the host using the
> +        ``KVM_HC_PAGE_ENC_STATUS`` hypercall, it can set bit 0 in this MSR to
> +        allow live migration of the guest.  The MSR is read-only if
> +        ``KVM_FEATURE_HC_PAGE_STATUS`` is not advertised to the guest.

I still don't get the desire to tie MSR_KVM_MIGRATION_CONTROL to PAGE_ENC_STATUS
in any way shape or form.  I can understand making it read-only or dropping
writes if it's not intercepted by userspace, but making it read-only for
non-encrypted guests makes it useful only for encrypted guests, which defeats
the purpose of genericizing the MSR.

> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index e9c40be9235c..0c2524bbaa84 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -3279,6 +3279,12 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>  		if (!guest_pv_has(vcpu, KVM_FEATURE_MIGRATION_CONTROL))
>  			return 1;
>  
> +		/*
> +		 * This implementation is only good if userspace has *not*
> +		 * enabled KVM_FEATURE_HC_PAGE_ENC_STATUS.  If userspace
> +		 * enables KVM_FEATURE_HC_PAGE_ENC_STATUS it must set up an
> +		 * MSR filter in order to accept writes that change bit 0.
> +		 */
>  		if (data != !static_call(kvm_x86_has_encrypted_memory)(vcpu->kvm))
>  			return 1;

This behavior doesn't match the documentation.

  a. The MSR is not read-only for legacy guests since they can write '0'.
  b. The MSR is not read-only if KVM_FEATURE_HC_PAGE_STATUS isn't advertised,
     a guest with encrypted memory can write '1' regardless of whether userspace
     has enabled KVM_FEATURE_HC_PAGE_STATUS.
  c. The MSR is never fully writable, e.g. a guest with encrypted memory can set
     bit 0, but not clear it.  This doesn't seem intentional?

Why not simply drop writes?  E.g.

		if (data & ~KVM_MIGRATION_READY)
			return 1;
		break;

And then do "msr->data = 0;" in the read path.  That's just as effective as
making the MSR read-only to force userspace to intercept the MSR if it wants to
do anything useful with the information, and it's easy to document.

>  		break;
Paolo Bonzini May 1, 2021, 9:01 a.m. UTC | #2
On 30/04/21 22:10, Sean Christopherson wrote:
> On Thu, Apr 29, 2021, Paolo Bonzini wrote:
>> diff --git a/Documentation/virt/kvm/msr.rst b/Documentation/virt/kvm/msr.rst
>> index 57fc4090031a..cf1b0b2099b0 100644
>> --- a/Documentation/virt/kvm/msr.rst
>> +++ b/Documentation/virt/kvm/msr.rst
>> @@ -383,5 +383,10 @@ MSR_KVM_MIGRATION_CONTROL:
>>   data:
>>           This MSR is available if KVM_FEATURE_MIGRATION_CONTROL is present in
>>           CPUID.  Bit 0 represents whether live migration of the guest is allowed.
>> +
>>           When a guest is started, bit 0 will be 1 if the guest has encrypted
>> -        memory and 0 if the guest does not have encrypted memory.
>> +        memory and 0 if the guest does not have encrypted memory.  If the
>> +        guest is communicating page encryption status to the host using the
>> +        ``KVM_HC_PAGE_ENC_STATUS`` hypercall, it can set bit 0 in this MSR to
>> +        allow live migration of the guest.  The MSR is read-only if
>> +        ``KVM_FEATURE_HC_PAGE_STATUS`` is not advertised to the guest.
> 
> I still don't get the desire to tie MSR_KVM_MIGRATION_CONTROL to PAGE_ENC_STATUS
> in any way shape or form.  I can understand making it read-only or dropping
> writes if it's not intercepted by userspace, but making it read-only for
> non-encrypted guests makes it useful only for encrypted guests, which defeats
> the purpose of genericizing the MSR.

Yeah, I see your point.  On the other hand by making it unconditionally 
writable we must implement the writability in KVM, because a read-only 
implementation would not comply with the spec.

>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index e9c40be9235c..0c2524bbaa84 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -3279,6 +3279,12 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>>   		if (!guest_pv_has(vcpu, KVM_FEATURE_MIGRATION_CONTROL))
>>   			return 1;
>>   
>> +		/*
>> +		 * This implementation is only good if userspace has *not*
>> +		 * enabled KVM_FEATURE_HC_PAGE_ENC_STATUS.  If userspace
>> +		 * enables KVM_FEATURE_HC_PAGE_ENC_STATUS it must set up an
>> +		 * MSR filter in order to accept writes that change bit 0.
>> +		 */
>>   		if (data != !static_call(kvm_x86_has_encrypted_memory)(vcpu->kvm))
>>   			return 1;
> 
> This behavior doesn't match the documentation.
> 
>    a. The MSR is not read-only for legacy guests since they can write '0'.
>    b. The MSR is not read-only if KVM_FEATURE_HC_PAGE_STATUS isn't advertised,
>       a guest with encrypted memory can write '1' regardless of whether userspace
>       has enabled KVM_FEATURE_HC_PAGE_STATUS.

Right, I should have said "not changeable" rather than "read-only".

>    c. The MSR is never fully writable, e.g. a guest with encrypted memory can set
>       bit 0, but not clear it.  This doesn't seem intentional?

It is intentional, clearing it would mean preserving the value in the 
kernel so that userspace can read it.

So... I don't know, all in all having both the separate CPUID and the 
userspace implementation reeks of overengineering.  It should be either 
of these:

- separate CPUID bit, MSR unconditionally writable and implemented in 
KVM.  Userspace is expected to ignore the MSR value for encrypted guests 
unless KVM_FEATURE_HC_PAGE_STATUS is exposed.  Userspace should respect 
it even for unencrypted guests (not a migration-DoS vector, because 
userspace can just not expose the feature).

- make it completely independent from migration, i.e. it's just a facet 
of MSR_KVM_PAGE_ENC_STATUS saying whether the bitmap is up-to-date.  It 
would use CPUID bit as the encryption status bitmap and have no code at 
all in KVM (userspace needs to set up the filter and implement everything).

At this point I very much prefer the latter, which is basically Ashish's 
earlier patch.

Paolo

> Why not simply drop writes?  E.g.
> 
> 		if (data & ~KVM_MIGRATION_READY)
> 			return 1;
> 		break;
> 
> And then do "msr->data = 0;" in the read path.  That's just as effective as
> making the MSR read-only to force userspace to intercept the MSR if it wants to
> do anything useful with the information, and it's easy to document.
Steve Rutherford May 3, 2021, 11:22 p.m. UTC | #3
On Sat, May 1, 2021 at 2:01 AM Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On 30/04/21 22:10, Sean Christopherson wrote:
> > On Thu, Apr 29, 2021, Paolo Bonzini wrote:
> >> diff --git a/Documentation/virt/kvm/msr.rst b/Documentation/virt/kvm/msr.rst
> >> index 57fc4090031a..cf1b0b2099b0 100644
> >> --- a/Documentation/virt/kvm/msr.rst
> >> +++ b/Documentation/virt/kvm/msr.rst
> >> @@ -383,5 +383,10 @@ MSR_KVM_MIGRATION_CONTROL:
> >>   data:
> >>           This MSR is available if KVM_FEATURE_MIGRATION_CONTROL is present in
> >>           CPUID.  Bit 0 represents whether live migration of the guest is allowed.
> >> +
> >>           When a guest is started, bit 0 will be 1 if the guest has encrypted
> >> -        memory and 0 if the guest does not have encrypted memory.
> >> +        memory and 0 if the guest does not have encrypted memory.  If the
> >> +        guest is communicating page encryption status to the host using the
> >> +        ``KVM_HC_PAGE_ENC_STATUS`` hypercall, it can set bit 0 in this MSR to
> >> +        allow live migration of the guest.  The MSR is read-only if
> >> +        ``KVM_FEATURE_HC_PAGE_STATUS`` is not advertised to the guest.
> >
> > I still don't get the desire to tie MSR_KVM_MIGRATION_CONTROL to PAGE_ENC_STATUS
> > in any way shape or form.  I can understand making it read-only or dropping
> > writes if it's not intercepted by userspace, but making it read-only for
> > non-encrypted guests makes it useful only for encrypted guests, which defeats
> > the purpose of genericizing the MSR.
>
> Yeah, I see your point.  On the other hand by making it unconditionally
> writable we must implement the writability in KVM, because a read-only
> implementation would not comply with the spec.
>
> >> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> >> index e9c40be9235c..0c2524bbaa84 100644
> >> --- a/arch/x86/kvm/x86.c
> >> +++ b/arch/x86/kvm/x86.c
> >> @@ -3279,6 +3279,12 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> >>              if (!guest_pv_has(vcpu, KVM_FEATURE_MIGRATION_CONTROL))
> >>                      return 1;
> >>
> >> +            /*
> >> +             * This implementation is only good if userspace has *not*
> >> +             * enabled KVM_FEATURE_HC_PAGE_ENC_STATUS.  If userspace
> >> +             * enables KVM_FEATURE_HC_PAGE_ENC_STATUS it must set up an
> >> +             * MSR filter in order to accept writes that change bit 0.
> >> +             */
> >>              if (data != !static_call(kvm_x86_has_encrypted_memory)(vcpu->kvm))
> >>                      return 1;
> >
> > This behavior doesn't match the documentation.
> >
> >    a. The MSR is not read-only for legacy guests since they can write '0'.
> >    b. The MSR is not read-only if KVM_FEATURE_HC_PAGE_STATUS isn't advertised,
> >       a guest with encrypted memory can write '1' regardless of whether userspace
> >       has enabled KVM_FEATURE_HC_PAGE_STATUS.
>
> Right, I should have said "not changeable" rather than "read-only".
>
> >    c. The MSR is never fully writable, e.g. a guest with encrypted memory can set
> >       bit 0, but not clear it.  This doesn't seem intentional?
>
> It is intentional, clearing it would mean preserving the value in the
> kernel so that userspace can read it.
>
> So... I don't know, all in all having both the separate CPUID and the
> userspace implementation reeks of overengineering.  It should be either
> of these:
>
> - separate CPUID bit, MSR unconditionally writable and implemented in
> KVM.  Userspace is expected to ignore the MSR value for encrypted guests
> unless KVM_FEATURE_HC_PAGE_STATUS is exposed.  Userspace should respect
> it even for unencrypted guests (not a migration-DoS vector, because
> userspace can just not expose the feature).
>
> - make it completely independent from migration, i.e. it's just a facet
> of MSR_KVM_PAGE_ENC_STATUS saying whether the bitmap is up-to-date.  It
> would use CPUID bit as the encryption status bitmap and have no code at
> all in KVM (userspace needs to set up the filter and implement everything).
As far as I know, because of MSR filtering, the only "code" that needs
to be in KVM for MSR handling is a #define reserving the PV feature
number and a #define for the MSR number.

Arguably, you don't even need to add the new PV bits to the supported
cpuid, since MSR filtering is really what determines if kernel support
is present.

>
> At this point I very much prefer the latter, which is basically Ashish's
> earlier patch.
The minor distinction would be that if you expose the cpuid bit to the
guest you plan on intercepting the MSR with filters, and would not
need any handler code in the kernel.

Steve
>
> Paolo
Paolo Bonzini May 4, 2021, 8:06 a.m. UTC | #4
On 04/05/21 01:22, Steve Rutherford wrote:
> As far as I know, because of MSR filtering, the only "code" that needs
> to be in KVM for MSR handling is a #define reserving the PV feature
> number and a #define for the MSR number.
> 
> Arguably, you don't even need to add the new PV bits to the supported
> cpuid, since MSR filtering is really what determines if kernel support
> is present.

Not only I don't need to do that, I must not. :)

>> At this point I very much prefer the latter, which is basically Ashish's
>> earlier patch.
>
> The minor distinction would be that if you expose the cpuid bit to the
> guest you plan on intercepting the MSR with filters, and would not
> need any handler code in the kernel.

Yep, and it's not a minor distinction after all (especially from the PoV 
of the guy who actually ends up maintaining the code, i.e. me), so 
that's what I'm going for.

Paolo
Sean Christopherson May 4, 2021, 5:09 p.m. UTC | #5
On Sat, May 01, 2021, Paolo Bonzini wrote:
> - make it completely independent from migration, i.e. it's just a facet of
> MSR_KVM_PAGE_ENC_STATUS saying whether the bitmap is up-to-date.  It would
> use CPUID bit as the encryption status bitmap and have no code at all in KVM
> (userspace needs to set up the filter and implement everything).

If the bit is purely a "page encryption status is up-to-date", what about
overloading KVM_HC_PAGE_ENC_STATUS to handle that status update as well?   That
would eliminate my biggest complaint about having what is effectively a single
paravirt feature split into two separate, but intertwined chunks of ABI.

#define KVM_HC_PAGE_ENC_UPDATE		12

#define KVM_HC_PAGE_ENC_REGION_UPDATE	0 /* encrypted vs. plain text */
#define KVM_HC_PAGE_ENC_STATUS_UPDATE	1 /* up-to-date vs. stale */

		ret = -KVM_ENOSYS;
		if (!vcpu->kvm->arch.hypercall_exit_enabled)
		        break;

		ret = -EINVAL;
		if (a0 == KVM_HC_PAGE_ENC_REGION_UPDATE) {
			u64 gpa = a1, npages = a2;

			if (!PAGE_ALIGNED(gpa) || !npages ||
			    gpa_to_gfn(gpa) + npages <= gpa_to_gfn(gpa))
				break;
		} else if (a0 != KVM_HC_PAGE_ENC_STATUS_UPDATE) {
			break;
		}

		vcpu->run->exit_reason        = KVM_EXIT_HYPERCALL;
		vcpu->run->hypercall.nr       = KVM_HC_PAGE_ENC_STATUS;
		vcpu->run->hypercall.args[0]  = a0;
		vcpu->run->hypercall.args[1]  = a1;
		vcpu->run->hypercall.args[2]  = a2;
		vcpu->run->hypercall.args[3]  = a3;
		vcpu->run->hypercall.longmode = op_64_bit;
		vcpu->arch.complete_userspace_io = complete_hypercall_exit;
		return 0;
Paolo Bonzini May 4, 2021, 8:27 p.m. UTC | #6
On 04/05/21 19:09, Sean Christopherson wrote:
> On Sat, May 01, 2021, Paolo Bonzini wrote:
>> - make it completely independent from migration, i.e. it's just a facet of
>> MSR_KVM_PAGE_ENC_STATUS saying whether the bitmap is up-to-date.  It would
>> use CPUID bit as the encryption status bitmap and have no code at all in KVM
>> (userspace needs to set up the filter and implement everything).
> 
> If the bit is purely a "page encryption status is up-to-date", what about
> overloading KVM_HC_PAGE_ENC_STATUS to handle that status update as well?   That
> would eliminate my biggest complaint about having what is effectively a single
> paravirt feature split into two separate, but intertwined chunks of ABI.

It's true that they are intertwined, but I dislike not having a way to 
read the current state.

Paolo

> 
> #define KVM_HC_PAGE_ENC_UPDATE		12
> 
> #define KVM_HC_PAGE_ENC_REGION_UPDATE	0 /* encrypted vs. plain text */
> #define KVM_HC_PAGE_ENC_STATUS_UPDATE	1 /* up-to-date vs. stale */
> 
> 		ret = -KVM_ENOSYS;
> 		if (!vcpu->kvm->arch.hypercall_exit_enabled)
> 		        break;
> 
> 		ret = -EINVAL;
> 		if (a0 == KVM_HC_PAGE_ENC_REGION_UPDATE) {
> 			u64 gpa = a1, npages = a2;
> 
> 			if (!PAGE_ALIGNED(gpa) || !npages ||
> 			    gpa_to_gfn(gpa) + npages <= gpa_to_gfn(gpa))
> 				break;
> 		} else if (a0 != KVM_HC_PAGE_ENC_STATUS_UPDATE) {
> 			break;
> 		}
> 
> 		vcpu->run->exit_reason        = KVM_EXIT_HYPERCALL;
> 		vcpu->run->hypercall.nr       = KVM_HC_PAGE_ENC_STATUS;
> 		vcpu->run->hypercall.args[0]  = a0;
> 		vcpu->run->hypercall.args[1]  = a1;
> 		vcpu->run->hypercall.args[2]  = a2;
> 		vcpu->run->hypercall.args[3]  = a3;
> 		vcpu->run->hypercall.longmode = op_64_bit;
> 		vcpu->arch.complete_userspace_io = complete_hypercall_exit;
> 		return 0;
>
Sean Christopherson May 4, 2021, 8:33 p.m. UTC | #7
On Tue, May 04, 2021, Paolo Bonzini wrote:
> On 04/05/21 19:09, Sean Christopherson wrote:
> > On Sat, May 01, 2021, Paolo Bonzini wrote:
> > > - make it completely independent from migration, i.e. it's just a facet of
> > > MSR_KVM_PAGE_ENC_STATUS saying whether the bitmap is up-to-date.  It would
> > > use CPUID bit as the encryption status bitmap and have no code at all in KVM
> > > (userspace needs to set up the filter and implement everything).
> > 
> > If the bit is purely a "page encryption status is up-to-date", what about
> > overloading KVM_HC_PAGE_ENC_STATUS to handle that status update as well?   That
> > would eliminate my biggest complaint about having what is effectively a single
> > paravirt feature split into two separate, but intertwined chunks of ABI.
> 
> It's true that they are intertwined, but I dislike not having a way to read
> the current state.

From the guest?
Paolo Bonzini May 4, 2021, 8:56 p.m. UTC | #8
On 04/05/21 22:33, Sean Christopherson wrote:
> On Tue, May 04, 2021, Paolo Bonzini wrote:
>> On 04/05/21 19:09, Sean Christopherson wrote:
>>> On Sat, May 01, 2021, Paolo Bonzini wrote:
>>>> - make it completely independent from migration, i.e. it's just a facet of
>>>> MSR_KVM_PAGE_ENC_STATUS saying whether the bitmap is up-to-date.  It would
>>>> use CPUID bit as the encryption status bitmap and have no code at all in KVM
>>>> (userspace needs to set up the filter and implement everything).
>>>
>>> If the bit is purely a "page encryption status is up-to-date", what about
>>> overloading KVM_HC_PAGE_ENC_STATUS to handle that status update as well?   That
>>> would eliminate my biggest complaint about having what is effectively a single
>>> paravirt feature split into two separate, but intertwined chunks of ABI.
>>
>> It's true that they are intertwined, but I dislike not having a way to read
>> the current state.
> 
>  From the guest?

Yes, host userspace obviously doesn't need one since it's implemented 
through an MSR filter.  It may not be really necessary to read it, but 
it's a bit jarring compared to how the rest of the PV APIs uses MSRs.

Also from a debugging/crashdump point of view the VMM may have an 
established way to read an MSR from a vCPU, but it won't work if you 
come up with a new way to set the state.

Paolo
Steve Rutherford May 4, 2021, 9:16 p.m. UTC | #9
On Tue, May 4, 2021 at 1:56 PM Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On 04/05/21 22:33, Sean Christopherson wrote:
> > On Tue, May 04, 2021, Paolo Bonzini wrote:
> >> On 04/05/21 19:09, Sean Christopherson wrote:
> >>> On Sat, May 01, 2021, Paolo Bonzini wrote:
> >>>> - make it completely independent from migration, i.e. it's just a facet of
> >>>> MSR_KVM_PAGE_ENC_STATUS saying whether the bitmap is up-to-date.  It would
> >>>> use CPUID bit as the encryption status bitmap and have no code at all in KVM
> >>>> (userspace needs to set up the filter and implement everything).
> >>>
> >>> If the bit is purely a "page encryption status is up-to-date", what about
> >>> overloading KVM_HC_PAGE_ENC_STATUS to handle that status update as well?   That
> >>> would eliminate my biggest complaint about having what is effectively a single
> >>> paravirt feature split into two separate, but intertwined chunks of ABI.
> >>
> >> It's true that they are intertwined, but I dislike not having a way to read
> >> the current state.
> >
> >  From the guest?
>
> Yes, host userspace obviously doesn't need one since it's implemented
> through an MSR filter.  It may not be really necessary to read it, but
> it's a bit jarring compared to how the rest of the PV APIs uses MSRs.
>
> Also from a debugging/crashdump point of view the VMM may have an
> established way to read an MSR from a vCPU, but it won't work if you
> come up with a new way to set the state.

Agreed on the preference for an MSR. I particularly appreciate that it
reduces the kernel footprint for these changes.


Steve
diff mbox series

Patch

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 37c520ddb7e8..f8794fed23a4 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -6891,3 +6891,17 @@  This capability is always enabled.
 This capability indicates that the KVM virtual PTP service is
 supported in the host. A VMM can check whether the service is
 available to the guest on migration.
+
+8.33 KVM_CAP_EXIT_HYPERCALL
+---------------------------
+
+:Capability: KVM_CAP_EXIT_HYPERCALL
+:Architectures: x86
+:Type: vm
+
+This capability, if enabled, will cause KVM to exit to userspace
+with KVM_EXIT_HYPERCALL exit reason to process some hypercalls.
+Right now, the only such hypercall is KVM_HC_PAGE_ENC_STATUS.
+
+Calling KVM_CHECK_EXTENSION for this capability will return a bitmask
+of hypercalls that will exit to userspace.
diff --git a/Documentation/virt/kvm/cpuid.rst b/Documentation/virt/kvm/cpuid.rst
index c53d7e2b8ff4..014ceffc46f2 100644
--- a/Documentation/virt/kvm/cpuid.rst
+++ b/Documentation/virt/kvm/cpuid.rst
@@ -99,6 +99,12 @@  KVM_FEATURE_MSI_EXT_DEST_ID        15          guest checks this feature bit
 KVM_FEATURE_MIGRATION_CONTROL      16          guest checks this feature bit before
                                                using MSR_KVM_MIGRATION_CONTROL
 
+KVM_FEATURE_HC_PAGE_ENC_STATUS     17          guest checks this feature bit before
+                                               using the page encryption state
+                                               hypercall to notify the page state
+                                               change, and before modifying bit 0 of
+                                               MSR_KVM_MIGRATION_CONTROL
+
 KVM_FEATURE_CLOCKSOURCE_STABLE_BIT 24          host will warn if no guest-side
                                                per-cpu warps are expected in
                                                kvmclock
diff --git a/Documentation/virt/kvm/hypercalls.rst b/Documentation/virt/kvm/hypercalls.rst
index ed4fddd364ea..117ff3b27d3c 100644
--- a/Documentation/virt/kvm/hypercalls.rst
+++ b/Documentation/virt/kvm/hypercalls.rst
@@ -169,3 +169,24 @@  a0: destination APIC ID
 
 :Usage example: When sending a call-function IPI-many to vCPUs, yield if
 	        any of the IPI target vCPUs was preempted.
+
+
+8. KVM_HC_PAGE_ENC_STATUS
+-------------------------
+:Architecture: x86
+:Status: active
+:Purpose: Notify the encryption status changes in guest page table (SEV guest)
+
+a0: the guest physical address of the start page
+a1: the number of pages
+a2: page encryption status
+
+   Where:
+	* 1: Page is encrypted
+	* 0: Page is decrypted
+
+**Implementation note**: this hypercall is implemented in userspace via
+the KVM_CAP_EXIT_HYPERCALL capability.  Userspace must enable that capability
+before advertising KVM_FEATURE_HC_PAGE_ENC_STATUS in the guest CPUID.  In
+addition, if the guest supports KVM_FEATURE_MIGRATION_CONTROL, userspace
+must also set up an MSR filter to process writes to MSR_KVM_MIGRATION_CONTROL.
diff --git a/Documentation/virt/kvm/msr.rst b/Documentation/virt/kvm/msr.rst
index 57fc4090031a..cf1b0b2099b0 100644
--- a/Documentation/virt/kvm/msr.rst
+++ b/Documentation/virt/kvm/msr.rst
@@ -383,5 +383,10 @@  MSR_KVM_MIGRATION_CONTROL:
 data:
         This MSR is available if KVM_FEATURE_MIGRATION_CONTROL is present in
         CPUID.  Bit 0 represents whether live migration of the guest is allowed.
+
         When a guest is started, bit 0 will be 1 if the guest has encrypted
-        memory and 0 if the guest does not have encrypted memory.
+        memory and 0 if the guest does not have encrypted memory.  If the
+        guest is communicating page encryption status to the host using the
+        ``KVM_HC_PAGE_ENC_STATUS`` hypercall, it can set bit 0 in this MSR to
+        allow live migration of the guest.  The MSR is read-only if
+        ``KVM_FEATURE_HC_PAGE_STATUS`` is not advertised to the guest.
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 5491fc617451..9b90a0faeab4 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1065,6 +1065,8 @@  struct kvm_arch {
 	u32 user_space_msr_mask;
 	struct kvm_x86_msr_filter __rcu *msr_filter;
 
+	bool hypercall_exit_enabled;
+
 	/* Guest can access the SGX PROVISIONKEY. */
 	bool sgx_provisioning_allowed;
 
diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
index 21390fccfb90..8fadae64bc66 100644
--- a/arch/x86/include/uapi/asm/kvm_para.h
+++ b/arch/x86/include/uapi/asm/kvm_para.h
@@ -34,6 +34,7 @@ 
 #define KVM_FEATURE_ASYNC_PF_INT	14
 #define KVM_FEATURE_MSI_EXT_DEST_ID	15
 #define KVM_FEATURE_MIGRATION_CONTROL	16
+#define KVM_FEATURE_HC_PAGE_ENC_STATUS	17
 
 #define KVM_HINTS_REALTIME      0
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index e9c40be9235c..0c2524bbaa84 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3279,6 +3279,12 @@  int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		if (!guest_pv_has(vcpu, KVM_FEATURE_MIGRATION_CONTROL))
 			return 1;
 
+		/*
+		 * This implementation is only good if userspace has *not*
+		 * enabled KVM_FEATURE_HC_PAGE_ENC_STATUS.  If userspace
+		 * enables KVM_FEATURE_HC_PAGE_ENC_STATUS it must set up an
+		 * MSR filter in order to accept writes that change bit 0.
+		 */
 		if (data != !static_call(kvm_x86_has_encrypted_memory)(vcpu->kvm))
 			return 1;
 		break;
@@ -3842,6 +3848,9 @@  int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 	case KVM_CAP_VM_COPY_ENC_CONTEXT_FROM:
 		r = 1;
 		break;
+	case KVM_CAP_EXIT_HYPERCALL:
+		r = (1 << KVM_HC_PAGE_ENC_STATUS);
+		break;
 	case KVM_CAP_SET_GUEST_DEBUG2:
 		return KVM_GUESTDBG_VALID_MASK;
 #ifdef CONFIG_KVM_XEN
@@ -5444,6 +5453,10 @@  int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
 		break;
 	}
 #endif
+	case KVM_CAP_EXIT_HYPERCALL:
+		kvm->arch.hypercall_exit_enabled = cap->args[0];
+		r = 0;
+		break;
 	case KVM_CAP_VM_COPY_ENC_CONTEXT_FROM:
 		r = -EINVAL;
 		if (kvm_x86_ops.vm_copy_enc_context_from)
@@ -8329,6 +8342,13 @@  static void kvm_sched_yield(struct kvm_vcpu *vcpu, unsigned long dest_id)
 	return;
 }
 
+static int complete_hypercall_exit(struct kvm_vcpu *vcpu)
+{
+	kvm_rax_write(vcpu, vcpu->run->hypercall.ret);
+	++vcpu->stat.hypercalls;
+	return kvm_skip_emulated_instruction(vcpu);
+}
+
 int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
 {
 	unsigned long nr, a0, a1, a2, a3, ret;
@@ -8394,6 +8414,28 @@  int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
 		kvm_sched_yield(vcpu, a0);
 		ret = 0;
 		break;
+	case KVM_HC_PAGE_ENC_STATUS: {
+		u64 gpa = a0, npages = a1, enc = a2;
+
+		ret = -KVM_ENOSYS;
+		if (!vcpu->kvm->arch.hypercall_exit_enabled)
+			break;
+
+		if (!PAGE_ALIGNED(gpa) || !npages ||
+		    gpa_to_gfn(gpa) + npages <= gpa_to_gfn(gpa)) {
+			ret = -EINVAL;
+			break;
+		}
+
+		vcpu->run->exit_reason        = KVM_EXIT_HYPERCALL;
+		vcpu->run->hypercall.nr       = KVM_HC_PAGE_ENC_STATUS;
+		vcpu->run->hypercall.args[0]  = gpa;
+		vcpu->run->hypercall.args[1]  = npages;
+		vcpu->run->hypercall.args[2]  = enc;
+		vcpu->run->hypercall.longmode = op_64_bit;
+		vcpu->arch.complete_userspace_io = complete_hypercall_exit;
+		return 0;
+	}
 	default:
 		ret = -KVM_ENOSYS;
 		break;
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 3fd9a7e9d90c..1fb4fd863324 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1082,6 +1082,7 @@  struct kvm_ppc_resize_hpt {
 #define KVM_CAP_SGX_ATTRIBUTE 196
 #define KVM_CAP_VM_COPY_ENC_CONTEXT_FROM 197
 #define KVM_CAP_PTP_KVM 198
+#define KVM_CAP_EXIT_HYPERCALL 199
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
diff --git a/include/uapi/linux/kvm_para.h b/include/uapi/linux/kvm_para.h
index 8b86609849b9..847b83b75dc8 100644
--- a/include/uapi/linux/kvm_para.h
+++ b/include/uapi/linux/kvm_para.h
@@ -29,6 +29,7 @@ 
 #define KVM_HC_CLOCK_PAIRING		9
 #define KVM_HC_SEND_IPI		10
 #define KVM_HC_SCHED_YIELD		11
+#define KVM_HC_PAGE_ENC_STATUS		12
 
 /*
  * hypercalls use architecture specific