diff mbox series

[v6,1/5] x86/kvm: Add AMD SEV specific Hypercall3

Message ID 6fd25c749205dd0b1eb492c60d41b124760cc6ae.1629726117.git.ashish.kalra@amd.com (mailing list archive)
State New, archived
Headers show
Series Add Guest API & Guest Kernel support for SEV live migration. | expand

Commit Message

Kalra, Ashish Aug. 24, 2021, 11:04 a.m. UTC
From: Brijesh Singh <brijesh.singh@amd.com>

KVM hypercall framework relies on alternative framework to patch the
VMCALL -> VMMCALL on AMD platform. If a hypercall is made before
apply_alternative() is called then it defaults to VMCALL. The approach
works fine on non SEV guest. A VMCALL would causes #UD, and hypervisor
will be able to decode the instruction and do the right things. But
when SEV is active, guest memory is encrypted with guest key and
hypervisor will not be able to decode the instruction bytes.

To highlight the need to provide this interface, capturing the
flow of apply_alternatives() :
setup_arch() call init_hypervisor_platform() which detects
the hypervisor platform the kernel is running under and then the
hypervisor specific initialization code can make early hypercalls.
For example, KVM specific initialization in case of SEV will try
to mark the "__bss_decrypted" section's encryption state via early
page encryption status hypercalls.

Now, apply_alternatives() is called much later when setup_arch()
calls check_bugs(), so we do need some kind of an early,
pre-alternatives hypercall interface. Other cases of pre-alternatives
hypercalls include marking per-cpu GHCB pages as decrypted on SEV-ES
and per-cpu apf_reason, steal_time and kvm_apic_eoi as decrypted for
SEV generally.

Add SEV specific hypercall3, it unconditionally uses VMMCALL. The hypercall
will be used by the SEV guest to notify encrypted pages to the hypervisor.

This kvm_sev_hypercall3() function is abstracted and used as follows :
All these early hypercalls are made through early_set_memory_XX() interfaces,
which in turn invoke pv_ops (paravirt_ops).

This early_set_memory_XX() -> pv_ops.mmu.notify_page_enc_status_changed()
is a generic interface and can easily have SEV, TDX and any other
future platform specific abstractions added to it.

Currently, pv_ops.mmu.notify_page_enc_status_changed() callback is setup to
invoke kvm_sev_hypercall3() in case of SEV.

Similarly, in case of TDX, pv_ops.mmu.notify_page_enc_status_changed()
can be setup to a TDX specific callback.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: x86@kernel.org
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Reviewed-by: Steve Rutherford <srutherford@google.com>
Reviewed-by: Venu Busireddy <venu.busireddy@oracle.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
---
 arch/x86/include/asm/kvm_para.h | 12 ++++++++++++
 1 file changed, 12 insertions(+)

Comments

Steve Rutherford Sept. 16, 2021, 1:15 a.m. UTC | #1
On Tue, Aug 24, 2021 at 4:04 AM Ashish Kalra <Ashish.Kalra@amd.com> wrote:
>
> From: Brijesh Singh <brijesh.singh@amd.com>
>
> KVM hypercall framework relies on alternative framework to patch the
> VMCALL -> VMMCALL on AMD platform. If a hypercall is made before
> apply_alternative() is called then it defaults to VMCALL. The approach
> works fine on non SEV guest. A VMCALL would causes #UD, and hypervisor
> will be able to decode the instruction and do the right things. But
> when SEV is active, guest memory is encrypted with guest key and
> hypervisor will not be able to decode the instruction bytes.
>
> To highlight the need to provide this interface, capturing the
> flow of apply_alternatives() :
> setup_arch() call init_hypervisor_platform() which detects
> the hypervisor platform the kernel is running under and then the
> hypervisor specific initialization code can make early hypercalls.
> For example, KVM specific initialization in case of SEV will try
> to mark the "__bss_decrypted" section's encryption state via early
> page encryption status hypercalls.
>
> Now, apply_alternatives() is called much later when setup_arch()
> calls check_bugs(), so we do need some kind of an early,
> pre-alternatives hypercall interface. Other cases of pre-alternatives
> hypercalls include marking per-cpu GHCB pages as decrypted on SEV-ES
> and per-cpu apf_reason, steal_time and kvm_apic_eoi as decrypted for
> SEV generally.
>
> Add SEV specific hypercall3, it unconditionally uses VMMCALL. The hypercall
> will be used by the SEV guest to notify encrypted pages to the hypervisor.
>
> This kvm_sev_hypercall3() function is abstracted and used as follows :
> All these early hypercalls are made through early_set_memory_XX() interfaces,
> which in turn invoke pv_ops (paravirt_ops).
>
> This early_set_memory_XX() -> pv_ops.mmu.notify_page_enc_status_changed()
> is a generic interface and can easily have SEV, TDX and any other
> future platform specific abstractions added to it.
>
> Currently, pv_ops.mmu.notify_page_enc_status_changed() callback is setup to
> invoke kvm_sev_hypercall3() in case of SEV.
>
> Similarly, in case of TDX, pv_ops.mmu.notify_page_enc_status_changed()
> can be setup to a TDX specific callback.
>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Joerg Roedel <joro@8bytes.org>
> Cc: Borislav Petkov <bp@suse.de>
> Cc: Tom Lendacky <thomas.lendacky@amd.com>
> Cc: x86@kernel.org
> Cc: kvm@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Reviewed-by: Steve Rutherford <srutherford@google.com>
> Reviewed-by: Venu Busireddy <venu.busireddy@oracle.com>
> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
> ---
>  arch/x86/include/asm/kvm_para.h | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
>
> diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
> index 69299878b200..56935ebb1dfe 100644
> --- a/arch/x86/include/asm/kvm_para.h
> +++ b/arch/x86/include/asm/kvm_para.h
> @@ -83,6 +83,18 @@ static inline long kvm_hypercall4(unsigned int nr, unsigned long p1,
>         return ret;
>  }
>
> +static inline long kvm_sev_hypercall3(unsigned int nr, unsigned long p1,
> +                                     unsigned long p2, unsigned long p3)
> +{
> +       long ret;
> +
> +       asm volatile("vmmcall"
> +                    : "=a"(ret)
> +                    : "a"(nr), "b"(p1), "c"(p2), "d"(p3)
> +                    : "memory");
> +       return ret;
> +}
> +
>  #ifdef CONFIG_KVM_GUEST
>  void kvmclock_init(void);
>  void kvmclock_disable(void);
> --
> 2.17.1
>

Looking at these threads, this patch either:
1) Needs review/approval from a maintainer that is interested or
2) Should flip back to using alternative (as suggested by Sean). In
particular: `ALTERNATIVE("vmmcall", "vmcall",
ALT_NOT(X86_FEATURE_VMMCALL))`. My understanding is that the advantage
of this is that (after calling apply alternatives) you get exactly the
same behavior as before. But before apply alternatives, you get the
desired flipped behavior. The previous patch changed the behavior
after apply alternatives in a very slight manner (if feature flags
were not set, you'd get a different instruction).

I personally don't have strong feelings on this decision, but this
decision does need to be made for this patch series to move forward.

I'd also be curious to hear Sean's opinion on this since he was vocal
about this previously.

Thanks,
Steve
Sean Christopherson Sept. 20, 2021, 4:07 p.m. UTC | #2
On Wed, Sep 15, 2021, Steve Rutherford wrote:
> Looking at these threads, this patch either:
> 1) Needs review/approval from a maintainer that is interested or
> 2) Should flip back to using alternative (as suggested by Sean). In
> particular: `ALTERNATIVE("vmmcall", "vmcall",
> ALT_NOT(X86_FEATURE_VMMCALL))`. My understanding is that the advantage
> of this is that (after calling apply alternatives) you get exactly the
> same behavior as before. But before apply alternatives, you get the
> desired flipped behavior. The previous patch changed the behavior
> after apply alternatives in a very slight manner (if feature flags
> were not set, you'd get a different instruction).
> 
> I personally don't have strong feelings on this decision, but this
> decision does need to be made for this patch series to move forward.
> 
> I'd also be curious to hear Sean's opinion on this since he was vocal
> about this previously.

Pulling in Ashish's last email from the previous thread, which I failed to respond
to.

https://lore.kernel.org/all/20210820133223.GA28059@ashkalra_ubuntu_server/T/#u

On Fri, Aug 20, 2021, Ashish Kalra wrote:
> On Thu, Aug 19, 2021 at 11:15:26PM +0000, Sean Christopherson wrote:
> > On Thu, Aug 19, 2021, Kalra, Ashish wrote:
> > >
> > > > On Aug 20, 2021, at 3:38 AM, Kalra, Ashish <Ashish.Kalra@amd.com> wrote:
> > > > I think it makes more sense to stick to the original approach/patch, i.e.,
> > > > introducing a new private hypercall interface like kvm_sev_hypercall3() and
> > > > let early paravirtualized kernel code invoke this private hypercall
> > > > interface wherever required.
> >
> > I don't like the idea of duplicating code just because the problem is tricky to
> > solve.  Right now it's just one function, but it could balloon to multiple in
> > the future.  Plus there's always the possibility of a new, pre-alternatives
> > kvm_hypercall() being added in generic code, at which point using an SEV-specific
> > variant gets even uglier.

...

> Now, apply_alternatives() is called much later when setup_arch() calls
> check_bugs(), so we do need some kind of an early, pre-alternatives
> hypercall interface.
>
> Other cases of pre-alternatives hypercalls include marking per-cpu GHCB
> pages as decrypted on SEV-ES and per-cpu apf_reason, steal_time and
> kvm_apic_eoi as decrypted for SEV generally.
>
> Actually using this kvm_sev_hypercall3() function may be abstracted
> quite nicely. All these early hypercalls are made through
> early_set_memory_XX() interfaces, which in turn invoke pv_ops.
>
> Now, pv_ops can have this SEV/TDX specific abstractions.
>
> Currently, pv_ops.mmu.notify_page_enc_status_changed() callback is setup
> to kvm_sev_hypercall3() in case of SEV.
>
> Similarly, in case of TDX, pv_ops.mmu.notify_page_enc_status_changed() can
> be setup to a TDX specific callback.
>
> Therefore, this early_set_memory_XX() -> pv_ops.mmu.notify_page_enc_status_changed()
> is a generic interface and can easily have SEV, TDX and any other future platform
> specific abstractions added to it.

Unless there's some fundamental technical hurdle I'm overlooking, if pv_ops can
be configured early enough to handle this, then so can alternatives.  Adding
notify_page_enc_status_changed() may be necessary in the future, e.g. for TDX
or SNP, but IMO that is orthogonal to adding a generic, 100% redundant helper.

I appreciate that simply swapping the default from VMCALL->VMMCALL is a bit dirty
since it gives special meaning to the default value, but if that's the argument
against reusing kvm_hypercall3() then we should solve the early alternatives
problem, not fudge around it.
Kalra, Ashish Sept. 21, 2021, 9:58 a.m. UTC | #3
Hello Sean, Steve,

On Mon, Sep 20, 2021 at 04:07:04PM +0000, Sean Christopherson wrote:
> On Wed, Sep 15, 2021, Steve Rutherford wrote:
> > Looking at these threads, this patch either:
> > 1) Needs review/approval from a maintainer that is interested or
> > 2) Should flip back to using alternative (as suggested by Sean). In
> > particular: `ALTERNATIVE("vmmcall", "vmcall",
> > ALT_NOT(X86_FEATURE_VMMCALL))`. My understanding is that the advantage
> > of this is that (after calling apply alternatives) you get exactly the
> > same behavior as before. But before apply alternatives, you get the
> > desired flipped behavior. The previous patch changed the behavior
> > after apply alternatives in a very slight manner (if feature flags
> > were not set, you'd get a different instruction).
> > 

This is simply a Hack, i don't think this is a good approach to take forward.

> > I personally don't have strong feelings on this decision, but this
> > decision does need to be made for this patch series to move forward.
> > 
> > I'd also be curious to hear Sean's opinion on this since he was vocal
> > about this previously.
> 
> Pulling in Ashish's last email from the previous thread, which I failed to respond
> to.
> 
> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flore.kernel.org%2Fall%2F20210820133223.GA28059%40ashkalra_ubuntu_server%2FT%2F%23u&amp;data=04%7C01%7CAshish.Kalra%40amd.com%7C14e66eb4c505448175ae08d97c50b3c1%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637677508322702274%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=STJ6ze6iE7Uu7U3XPwWhMxwB%2BoYYcbZ7JcnIdlZ41rY%3D&amp;reserved=0
> 
> On Fri, Aug 20, 2021, Ashish Kalra wrote:
> > On Thu, Aug 19, 2021 at 11:15:26PM +0000, Sean Christopherson wrote:
> > > On Thu, Aug 19, 2021, Kalra, Ashish wrote:
> > > >
> > > > > On Aug 20, 2021, at 3:38 AM, Kalra, Ashish <Ashish.Kalra@amd.com> wrote:
> > > > > I think it makes more sense to stick to the original approach/patch, i.e.,
> > > > > introducing a new private hypercall interface like kvm_sev_hypercall3() and
> > > > > let early paravirtualized kernel code invoke this private hypercall
> > > > > interface wherever required.
> > >
> > > I don't like the idea of duplicating code just because the problem is tricky to
> > > solve.  Right now it's just one function, but it could balloon to multiple in
> > > the future.  Plus there's always the possibility of a new, pre-alternatives
> > > kvm_hypercall() being added in generic code, at which point using an SEV-specific
> > > variant gets even uglier.
> 
> ...
> 
> > Now, apply_alternatives() is called much later when setup_arch() calls
> > check_bugs(), so we do need some kind of an early, pre-alternatives
> > hypercall interface.
> >
> > Other cases of pre-alternatives hypercalls include marking per-cpu GHCB
> > pages as decrypted on SEV-ES and per-cpu apf_reason, steal_time and
> > kvm_apic_eoi as decrypted for SEV generally.
> >
> > Actually using this kvm_sev_hypercall3() function may be abstracted
> > quite nicely. All these early hypercalls are made through
> > early_set_memory_XX() interfaces, which in turn invoke pv_ops.
> >
> > Now, pv_ops can have this SEV/TDX specific abstractions.
> >
> > Currently, pv_ops.mmu.notify_page_enc_status_changed() callback is setup
> > to kvm_sev_hypercall3() in case of SEV.
> >
> > Similarly, in case of TDX, pv_ops.mmu.notify_page_enc_status_changed() can
> > be setup to a TDX specific callback.
> >
> > Therefore, this early_set_memory_XX() -> pv_ops.mmu.notify_page_enc_status_changed()
> > is a generic interface and can easily have SEV, TDX and any other future platform
> > specific abstractions added to it.
> 
> Unless there's some fundamental technical hurdle I'm overlooking, if pv_ops can
> be configured early enough to handle this, then so can alternatives.  
> 

Now, as i mentioned earlier, apply_alternatives() is only called boot
CPU identification has been done which is a lot of support code which
may be dependent on earlier setup_arch() code and then it does CPU
mitigtion selections before patching alternatives, again which may have
dependencies on previous code paths in setup_arch(), so i am not sure if
we can call apply_alternatives() earlier. 

Maybe for a guest kernel and virtualized boot enviroment, CPU
identification may not be as complicated as for a physical host, but
still it may have dependencies on earlier architecture specific boot
code.

> Adding notify_page_enc_status_changed() may be necessary in the future, e.g. for TDX
> or SNP, but IMO that is orthogonal to adding a generic, 100% redundant helper.

If we have to do this in the future and as Sean mentioned ealier that
vmcall needs to be fixed for TDX (as it will cause a #VE), then why not
add this abstraction right now ?

Thanks,
Ashish

> I appreciate that simply swapping the default from VMCALL->VMMCALL is a bit dirty
> since it gives special meaning to the default value, but if that's the argument
> against reusing kvm_hypercall3() then we should solve the early alternatives
> problem, not fudge around it.
Sean Christopherson Sept. 21, 2021, 1:50 p.m. UTC | #4
On Tue, Sep 21, 2021, Ashish Kalra wrote:
> This is simply a Hack, i don't think this is a good approach to take forward.

But a clever hack ;-)

> > Unless there's some fundamental technical hurdle I'm overlooking, if pv_ops can
> > be configured early enough to handle this, then so can alternatives.  
> 
> Now, as i mentioned earlier, apply_alternatives() is only called boot
> CPU identification has been done which is a lot of support code which
> may be dependent on earlier setup_arch() code and then it does CPU
> mitigtion selections before patching alternatives, again which may have
> dependencies on previous code paths in setup_arch(), so i am not sure if
> we can call apply_alternatives() earlier. 

apply_alternatives() is a generic helper that can work on any struct alt_instr
array, e.g. KVM_HYPERCALL can put its alternative into a different section that's
patched as soon as the VMM is identified.

> Maybe for a guest kernel and virtualized boot enviroment, CPU
> identification may not be as complicated as for a physical host, but
> still it may have dependencies on earlier architecture specific boot
> code.
> 
> > Adding notify_page_enc_status_changed() may be necessary in the future, e.g. for TDX
> > or SNP, but IMO that is orthogonal to adding a generic, 100% redundant helper.
> 
> If we have to do this in the future and as Sean mentioned ealier that
> vmcall needs to be fixed for TDX (as it will cause a #VE), then why not
> add this abstraction right now ?

I'm not objecting to adding a PV op, I'm objecting to kvm_sev_hypercall3().  If
others disagree and feel it's the way forward, I certainly won't stand in the way,
but IMO it's unnecessary code duplication.
Borislav Petkov Sept. 21, 2021, 2:51 p.m. UTC | #5
On Tue, Sep 21, 2021 at 01:50:09PM +0000, Sean Christopherson wrote:
> apply_alternatives() is a generic helper that can work on any struct alt_instr
> array, e.g. KVM_HYPERCALL can put its alternative into a different section that's
> patched as soon as the VMM is identified.

Where exactly in the boot process you wanna move it?

As Ashish says, you need the boot_cpu_data bits properly set before it
runs.
Sean Christopherson Sept. 21, 2021, 4:07 p.m. UTC | #6
On Tue, Sep 21, 2021, Borislav Petkov wrote:
> On Tue, Sep 21, 2021 at 01:50:09PM +0000, Sean Christopherson wrote:
> > apply_alternatives() is a generic helper that can work on any struct alt_instr
> > array, e.g. KVM_HYPERCALL can put its alternative into a different section that's
> > patched as soon as the VMM is identified.
> 
> Where exactly in the boot process you wanna move it?

init_hypervisor_platform(), after x86_init.hyper.init_platform() so that the
PV support can set the desired feature flags.  Since kvm_hypercall*() is only
used by KVM guests, set_cpu_cap(c, X86_FEATURE_VMMCALL) can be moved out of
early_init_amd/hygon() and into kvm_init_platform().

> As Ashish says, you need the boot_cpu_data bits properly set before it
> runs.

Another option would be to refactor apply_alternatives() to allow the caller to
provide a different feature check mechanism than boot_cpu_has(), which I think
would let us drop X86_FEATURE_VMMCALL, X86_FEATURE_VMCALL, and X86_FEATURE_VMW_VMMCALL
from cpufeatures.  That might get more than a bit gross though.

But like I said, if others think I'm over-engineering this...
Borislav Petkov Sept. 22, 2021, 9:38 a.m. UTC | #7
On Tue, Sep 21, 2021 at 04:07:03PM +0000, Sean Christopherson wrote:
> init_hypervisor_platform(), after x86_init.hyper.init_platform() so that the
> PV support can set the desired feature flags.  Since kvm_hypercall*() is only
> used by KVM guests, set_cpu_cap(c, X86_FEATURE_VMMCALL) can be moved out of
> early_init_amd/hygon() and into kvm_init_platform().

See below.

> Another option would be to refactor apply_alternatives() to allow
> the caller to provide a different feature check mechanism than
> boot_cpu_has(), which I think would let us drop X86_FEATURE_VMMCALL,
> X86_FEATURE_VMCALL, and X86_FEATURE_VMW_VMMCALL from cpufeatures. That
> might get more than a bit gross though.

Uuuf.

So here's what I'm seeing (line numbers given to show when stuff
happens):

start_kernel
|-> 953: setup_arch
    |-> 794: early_cpu_init
    |-> 936: init_hypervisor_platform
|
|-> 1134: check_bugs
	  |-> alternative_instructions

at line 794 setup_arch() calls early_cpu_init() which would end up
setting X86_FEATURE_VMMCALL on an AMD guest, based on CPUID information.

init_hypervisor_platform() happens after that.

The alternatives patching happens in check_bugs() at line 1134. Which
means, if one would consider moving the patching up, one would have
to audit all the code between line 953 and 1134, whether it does
set_cpu_cap() or some of the other helpers to set or clear bits in
boot_cpu_data which controls the patching.

So for that I have only one thing to say: can'o'worms. We tried to move
the memblock allocations placement in the boot process and generated at
least 4 regressions. I'm still testing the fix for the fix for the 4th
regression.

So moving stuff in the fragile boot process makes my hair stand up.

Refactoring apply_alternatives() to patch only for X86_FEATURE_VMMCALL
and then patch again, I dunno, this stuff is fragile and it might cause
some other similarly nasty fallout. And those are hard to debug because
one does not see immediately when boot_cpu_data features are missing and
functionality is behaving differently because of that.

So what's wrong with:

kvm_hypercall3:

	if (cpu_feature_enabled(X86_FEATURE_VMMCALL))
		return kvm_sev_hypercall3(...);

	/* rest of code */

?

Dunno we probably had that already in those old versions and maybe that
was shot down for another reason but it should get you what you want
without having to test the world and more for regressions possibly
happening from disturbing the house of cards called x86 boot order.

IMHO, I'd say.
Kalra, Ashish Sept. 22, 2021, 12:10 p.m. UTC | #8
Hello Boris,

On Wed, Sep 22, 2021 at 11:38:08AM +0200, Borislav Petkov wrote:
> On Tue, Sep 21, 2021 at 04:07:03PM +0000, Sean Christopherson wrote:
> > init_hypervisor_platform(), after x86_init.hyper.init_platform() so that the
> > PV support can set the desired feature flags.  Since kvm_hypercall*() is only
> > used by KVM guests, set_cpu_cap(c, X86_FEATURE_VMMCALL) can be moved out of
> > early_init_amd/hygon() and into kvm_init_platform().
> 
> See below.
> 
> > Another option would be to refactor apply_alternatives() to allow
> > the caller to provide a different feature check mechanism than
> > boot_cpu_has(), which I think would let us drop X86_FEATURE_VMMCALL,
> > X86_FEATURE_VMCALL, and X86_FEATURE_VMW_VMMCALL from cpufeatures. That
> > might get more than a bit gross though.
> 
> Uuuf.
> 
> So here's what I'm seeing (line numbers given to show when stuff
> happens):
> 
> start_kernel
> |-> 953: setup_arch
>     |-> 794: early_cpu_init
>     |-> 936: init_hypervisor_platform
> |
> |-> 1134: check_bugs
> 	  |-> alternative_instructions
> 
> at line 794 setup_arch() calls early_cpu_init() which would end up
> setting X86_FEATURE_VMMCALL on an AMD guest, based on CPUID information.
> 
> init_hypervisor_platform() happens after that.
> 
> The alternatives patching happens in check_bugs() at line 1134. Which
> means, if one would consider moving the patching up, one would have
> to audit all the code between line 953 and 1134, whether it does
> set_cpu_cap() or some of the other helpers to set or clear bits in
> boot_cpu_data which controls the patching.
> 
> So for that I have only one thing to say: can'o'worms. We tried to move
> the memblock allocations placement in the boot process and generated at
> least 4 regressions. I'm still testing the fix for the fix for the 4th
> regression.
> 
> So moving stuff in the fragile boot process makes my hair stand up.
> 
> Refactoring apply_alternatives() to patch only for X86_FEATURE_VMMCALL
> and then patch again, I dunno, this stuff is fragile and it might cause
> some other similarly nasty fallout. And those are hard to debug because
> one does not see immediately when boot_cpu_data features are missing and
> functionality is behaving differently because of that.
> 
> So what's wrong with:
> 
> kvm_hypercall3:
> 
> 	if (cpu_feature_enabled(X86_FEATURE_VMMCALL))
> 		return kvm_sev_hypercall3(...);
> 
> 	/* rest of code */
> 
> ?
> 
> Dunno we probably had that already in those old versions and maybe that
> was shot down for another reason but it should get you what you want
> without having to test the world and more for regressions possibly
> happening from disturbing the house of cards called x86 boot order.
> 
> IMHO, I'd say.
> 

Thanks for the above explanation.

If we have to do this:
 	if (cpu_feature_enabled(X86_FEATURE_VMMCALL))
 		return kvm_sev_hypercall3(...);

Then isn't it cleaner to simply do it via the paravirt_ops interface,
i.e, pv_ops.mmu.notify_page_enc_status_changed() where the callback
is only set when SEV and live migration feature are supported and
invoked through early_set_memory_decrypted()/encrypted().

Another memory encryption platform can set it's callback accordingly.

Thanks,
Ashish

> -- 
> Regards/Gruss,
>     Boris.
> 
> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fpeople.kernel.org%2Ftglx%2Fnotes-about-netiquette&amp;data=04%7C01%7Cashish.kalra%40amd.com%7C02217ac26c444833d50208d97dacb5f0%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637679003031781718%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=1oXxchRABGifVoLwnXwQxQ7%2F%2FZpwGLqpGdma4Yz5sjw%3D&amp;reserved=0
Borislav Petkov Sept. 22, 2021, 1:54 p.m. UTC | #9
On Wed, Sep 22, 2021 at 12:10:08PM +0000, Ashish Kalra wrote:
> Then isn't it cleaner to simply do it via the paravirt_ops interface,
> i.e, pv_ops.mmu.notify_page_enc_status_changed() where the callback
> is only set when SEV and live migration feature are supported and
> invoked through early_set_memory_decrypted()/encrypted().
> 
> Another memory encryption platform can set it's callback accordingly.

Yeah, that sounds even cleaner to me.
Steve Rutherford Sept. 28, 2021, 7:05 p.m. UTC | #10
On Wed, Sep 22, 2021 at 6:54 AM Borislav Petkov <bp@alien8.de> wrote:
>
> On Wed, Sep 22, 2021 at 12:10:08PM +0000, Ashish Kalra wrote:
> > Then isn't it cleaner to simply do it via the paravirt_ops interface,
> > i.e, pv_ops.mmu.notify_page_enc_status_changed() where the callback
> > is only set when SEV and live migration feature are supported and
> > invoked through early_set_memory_decrypted()/encrypted().
> >
> > Another memory encryption platform can set it's callback accordingly.
>
> Yeah, that sounds even cleaner to me.
If I'm not mistaken, this is what the patch set does now?
Kalra, Ashish Sept. 28, 2021, 7:26 p.m. UTC | #11
> On Sep 28, 2021, at 2:05 PM, Steve Rutherford <srutherford@google.com> wrote:
> 
> On Wed, Sep 22, 2021 at 6:54 AM Borislav Petkov <bp@alien8.de> wrote:
>> 
>>> On Wed, Sep 22, 2021 at 12:10:08PM +0000, Ashish Kalra wrote:
>>> Then isn't it cleaner to simply do it via the paravirt_ops interface,
>>> i.e, pv_ops.mmu.notify_page_enc_status_changed() where the callback
>>> is only set when SEV and live migration feature are supported and
>>> invoked through early_set_memory_decrypted()/encrypted().
>>> 
>>> Another memory encryption platform can set it's callback accordingly.
>> 
>> Yeah, that sounds even cleaner to me.
> If I'm not mistaken, this is what the patch set does now?

Yes that’s what I mentioned to Boris.

Thanks,
Ashish
Borislav Petkov Sept. 29, 2021, 11:44 a.m. UTC | #12
On Tue, Sep 28, 2021 at 07:26:32PM +0000, Kalra, Ashish wrote:
> Yes that’s what I mentioned to Boris.

Right, and as far as I'm concerned, the x86 bits look ok to me and I'm
fine with this going through the kvm tree.

There will be a conflict with this:

https://lkml.kernel.org/r/20210928191009.32551-1-bp@alien8.de

resulting in:

arch/x86/kernel/kvm.c: In function ‘setup_efi_kvm_sev_migration’:
arch/x86/kernel/kvm.c:563:7: error: implicit declaration of function ‘sev_active’; did you mean ‘cpu_active’? [-Werror=implicit-function-declaration]
  563 |  if (!sev_active() ||
      |       ^~~~~~~~~~
      |       cpu_active
cc1: some warnings being treated as errors
make[2]: *** [scripts/Makefile.build:277: arch/x86/kernel/kvm.o] Error 1
make[2]: *** Waiting for unfinished jobs....
make[1]: *** [scripts/Makefile.build:540: arch/x86/kernel] Error 2
make: *** [Makefile:1868: arch/x86] Error 2
make: *** Waiting for unfinished jobs....

but Paolo and I will figure out what to do - I'll likely have a separate
branch out which he can merge and that sev_active() will need to be
converted to

	if (!cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT))

which is trivial.

Thx.
Ashish Kalra Oct. 26, 2021, 8:48 p.m. UTC | #13
Hello Paolo,

With reference to Boris's ack below, are you going to go ahead and queue 
this patch-set to kvm tree ?

Or do you want me to work on improving on or fixing anything on this 
patch-set?

Thanks,

Ashish

On 9/29/21 11:44 AM, Borislav Petkov wrote:
> On Tue, Sep 28, 2021 at 07:26:32PM +0000, Kalra, Ashish wrote:
>> Yes that’s what I mentioned to Boris.
> Right, and as far as I'm concerned, the x86 bits look ok to me and I'm
> fine with this going through the kvm tree.
>
> There will be a conflict with this:
>
> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flkml.kernel.org%2Fr%2F20210928191009.32551-1-bp%40alien8.de&amp;data=04%7C01%7CAshish.Kalra%40amd.com%7Cbfa692635e9d4247a8f708d9833e8bcd%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637685126945432007%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=vZU8ZijdqiSiVpCquIMqu2yz3Z7sWgD3vvNiiQyszzo%3D&amp;reserved=0
>
> resulting in:
>
> arch/x86/kernel/kvm.c: In function ‘setup_efi_kvm_sev_migration’:
> arch/x86/kernel/kvm.c:563:7: error: implicit declaration of function ‘sev_active’; did you mean ‘cpu_active’? [-Werror=implicit-function-declaration]
>    563 |  if (!sev_active() ||
>        |       ^~~~~~~~~~
>        |       cpu_active
> cc1: some warnings being treated as errors
> make[2]: *** [scripts/Makefile.build:277: arch/x86/kernel/kvm.o] Error 1
> make[2]: *** Waiting for unfinished jobs....
> make[1]: *** [scripts/Makefile.build:540: arch/x86/kernel] Error 2
> make: *** [Makefile:1868: arch/x86] Error 2
> make: *** Waiting for unfinished jobs....
>
> but Paolo and I will figure out what to do - I'll likely have a separate
> branch out which he can merge and that sev_active() will need to be
> converted to
>
> 	if (!cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT))
>
> which is trivial.
>
> Thx.
>
Steve Rutherford Nov. 10, 2021, 7:38 p.m. UTC | #14
On Wed, Sep 29, 2021 at 4:44 AM Borislav Petkov <bp@alien8.de> wrote:
>
> On Tue, Sep 28, 2021 at 07:26:32PM +0000, Kalra, Ashish wrote:
> > Yes that’s what I mentioned to Boris.
>
> Right, and as far as I'm concerned, the x86 bits look ok to me and I'm
> fine with this going through the kvm tree.
>
> There will be a conflict with this:
>
> https://lkml.kernel.org/r/20210928191009.32551-1-bp@alien8.de
>
> resulting in:
>
> arch/x86/kernel/kvm.c: In function ‘setup_efi_kvm_sev_migration’:
> arch/x86/kernel/kvm.c:563:7: error: implicit declaration of function ‘sev_active’; did you mean ‘cpu_active’? [-Werror=implicit-function-declaration]
>   563 |  if (!sev_active() ||
>       |       ^~~~~~~~~~
>       |       cpu_active
> cc1: some warnings being treated as errors
> make[2]: *** [scripts/Makefile.build:277: arch/x86/kernel/kvm.o] Error 1
> make[2]: *** Waiting for unfinished jobs....
> make[1]: *** [scripts/Makefile.build:540: arch/x86/kernel] Error 2
> make: *** [Makefile:1868: arch/x86] Error 2
> make: *** Waiting for unfinished jobs....
>
> but Paolo and I will figure out what to do - I'll likely have a separate
> branch out which he can merge and that sev_active() will need to be
> converted to
>
>         if (!cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT))
>
> which is trivial.
>
> Thx.
>
> --
> Regards/Gruss,
>     Boris.
>
> https://people.kernel.org/tglx/notes-about-netiquette

Hey All,

Bumping this thread again, since I believe these patches are good to go.

Let me know if there is anything I can do to help here,
Thanks,
Steve
Paolo Bonzini Nov. 10, 2021, 10:11 p.m. UTC | #15
On 11/10/21 20:38, Steve Rutherford wrote:
> arch/x86/kernel/kvm.c: In function ‘setup_efi_kvm_sev_migration’:
> arch/x86/kernel/kvm.c:563:7: error: implicit declaration of function ‘sev_active’; did you mean ‘cpu_active’? [-Werror=implicit-function-declaration]
>    563 |  if (!sev_active() ||
>        |       ^~~~~~~~~~
>        |       cpu_active
> cc1: some warnings being treated as errors
> make[2]: *** [scripts/Makefile.build:277: arch/x86/kernel/kvm.o] Error 1
> make[2]: *** Waiting for unfinished jobs....
> make[1]: *** [scripts/Makefile.build:540: arch/x86/kernel] Error 2
> make: *** [Makefile:1868: arch/x86] Error 2
> make: *** Waiting for unfinished jobs....
> 
> but Paolo and I will figure out what to do - I'll likely have a separate
> branch out which he can merge and that sev_active() will need to be
> converted to
> 
>          if (!cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT))

Hi Boris,

can I just merge these patches via the KVM tree, since we're close to 
the end of the merge window and the cc_platform_has series has been merged?

Thanks,

Paolo
Borislav Petkov Nov. 10, 2021, 10:42 p.m. UTC | #16
On Wed, Nov 10, 2021 at 11:11:41PM +0100, Paolo Bonzini wrote:
> can I just merge these patches via the KVM tree, since we're close to the
> end of the merge window and the cc_platform_has series has been merged?

Yes, please. All the pending tip stuff from the last round is already
upstream.

Thx.
diff mbox series

Patch

diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
index 69299878b200..56935ebb1dfe 100644
--- a/arch/x86/include/asm/kvm_para.h
+++ b/arch/x86/include/asm/kvm_para.h
@@ -83,6 +83,18 @@  static inline long kvm_hypercall4(unsigned int nr, unsigned long p1,
 	return ret;
 }
 
+static inline long kvm_sev_hypercall3(unsigned int nr, unsigned long p1,
+				      unsigned long p2, unsigned long p3)
+{
+	long ret;
+
+	asm volatile("vmmcall"
+		     : "=a"(ret)
+		     : "a"(nr), "b"(p1), "c"(p2), "d"(p3)
+		     : "memory");
+	return ret;
+}
+
 #ifdef CONFIG_KVM_GUEST
 void kvmclock_init(void);
 void kvmclock_disable(void);