[v4,00/75] x86: SEV-ES Guest Support

Message ID 20200714120917.11253-1-joro@8bytes.org (mailing list archive)

Message

Joerg Roedel July 14, 2020, 12:08 p.m. UTC
From: Joerg Roedel <jroedel@suse.de>

Hi,

here is the fourth version of the SEV-ES Guest Support patches. I
addressed the review comments sent to me for the previous version and
rebased the code onto v5.8-rc5.

The biggest change in this version is the IST handling code for the
#VC handler. I adapted the entry code for the #VC handler to the big
pile of entry code changes merged into v5.8-rc1, which means it no
longer uses IST shifting (with one exception in the NMI handler, but
that is not ist-shifting as implemented previously).

The #VC entry code now tries to pretend that the #VC handler does not
use an IST stack by switching to the task stack if entered from
user-mode or the SYSCALL entry path. When it is entered from
kernel-mode it is doing its best to switch back to the interrupted
stack. This is only possible if it is entered from a known and safe
kernel stack (e.g. not the entry stack). If the previous stack is not
safe to use, the #VC handler switches to a fall-back stack and calls a
special handler function which, as of now, just panics the system. For
now this is safe as #VC exceptions only happen at known places which
use a safe stack.

The use of the fall-back stack is necessary so that the special
handler function can safely raise nested #VC exceptions, for
example to print a panic message.
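
The stack selection described above can be modelled in plain user-space
C. This is only a sketch of the decision, not the entry code from the
patches: all enum and function names are hypothetical, and the
assumption that every known stack except the entry stack is safe is
mine.

```c
#include <stdbool.h>

/* Hypothetical stack classes; the names are illustrative only. */
enum stack_kind {
	STACK_TASK,      /* per-task kernel stack */
	STACK_IRQ,       /* hardirq stack */
	STACK_EXCEPTION, /* another exception (IST) stack */
	STACK_ENTRY,     /* entry stack, not safe to switch back to */
	STACK_UNKNOWN,   /* stack pointer could not be classified */
};

enum vc_target {
	VC_USE_TASK_STACK,        /* from user mode or the SYSCALL entry path */
	VC_USE_INTERRUPTED_STACK, /* from kernel mode on a known, safe stack */
	VC_USE_FALLBACK_STACK,    /* unsafe stack: special handler, panics for now */
};

/* Assumption: every known stack other than the entry stack is safe. */
static bool stack_is_safe(enum stack_kind kind)
{
	return kind != STACK_ENTRY && kind != STACK_UNKNOWN;
}

/* Decide which stack the #VC handler should run on after leaving IST. */
enum vc_target vc_pick_stack(bool from_user, bool in_syscall_entry,
			     enum stack_kind interrupted)
{
	if (from_user || in_syscall_entry)
		return VC_USE_TASK_STACK;

	if (stack_is_safe(interrupted))
		return VC_USE_INTERRUPTED_STACK;

	return VC_USE_FALLBACK_STACK;
}
```

The fall-back case is kept as a distinct result because the special
handler must still be able to take nested #VC exceptions while it
prints the panic message.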

This implementation has survived overnight stress testing (>14h) with
'perf top' running for NMI-load and three instances of the x86-selftests
in a loop.

A git-tree with these patches applied can be found at:

	https://git.kernel.org/pub/scm/linux/kernel/git/joro/linux.git/log/?h=sev-es-client-v5.8-rc5

Changes to the previous version:

	* Addressed review comments

	* Rebased to v5.8-rc5

	* Addressed kbuild-bot reports

	* Removed CPUID caching for now

	* Re-implemented IST handling so that the #VC handler supports
	  nesting

The previous posts of the patch-set can be found here:

	v3: https://lore.kernel.org/lkml/20200428151725.31091-1-joro@8bytes.org/

	v2: https://lore.kernel.org/lkml/20200319091407.1481-1-joro@8bytes.org/

	v1: https://lore.kernel.org/lkml/20200211135256.24617-1-joro@8bytes.org/

Please review.

Thanks,

	Joerg

Borislav Petkov (1):
  KVM: SVM: Use __packed shorthand

Doug Covelli (1):
  x86/vmware: Add VMware specific handling for VMMCALL under SEV-ES

Joerg Roedel (53):
  KVM: SVM: Add GHCB Accessor functions
  x86/traps: Move pf error codes to <asm/trap_pf.h>
  x86/insn: Make inat-tables.c suitable for pre-decompression code
  x86/umip: Factor out instruction fetch
  x86/umip: Factor out instruction decoding
  x86/insn: Add insn_get_modrm_reg_off()
  x86/insn: Add insn_has_rep_prefix() helper
  x86/boot/compressed/64: Disable red-zone usage
  x86/boot/compressed/64: Add IDT Infrastructure
  x86/boot/compressed/64: Rename kaslr_64.c to ident_map_64.c
  x86/boot/compressed/64: Add page-fault handler
  x86/boot/compressed/64: Always switch to own page-table
  x86/boot/compressed/64: Don't pre-map memory in KASLR code
  x86/boot/compressed/64: Change add_identity_map() to take start and
    end
  x86/boot/compressed/64: Add stage1 #VC handler
  x86/boot/compressed/64: Call set_sev_encryption_mask earlier
  x86/boot/compressed/64: Check return value of
    kernel_ident_mapping_init()
  x86/boot/compressed/64: Add set_page_en/decrypted() helpers
  x86/boot/compressed/64: Setup GHCB Based VC Exception handler
  x86/boot/compressed/64: Unmap GHCB page before booting the kernel
  x86/fpu: Move xgetbv()/xsetbv() into separate header
  x86/idt: Move IDT to data segment
  x86/idt: Split idt_data setup out of set_intr_gate()
  x86/idt: Move two function from k/idt.c to i/a/desc.h
  x86/head/64: Install boot GDT
  x86/head/64: Reload GDT after switch to virtual addresses
  x86/head/64: Load segment registers earlier
  x86/head/64: Switch to initial stack earlier
  x86/head/64: Build k/head64.c with -fno-stack-protector
  x86/head/64: Load IDT earlier
  x86/head/64: Move early exception dispatch to C code
  x86/sev-es: Add SEV-ES Feature Detection
  x86/sev-es: Print SEV-ES info into kernel log
  x86/sev-es: Compile early handler code into kernel image
  x86/sev-es: Setup early #VC handler
  x86/sev-es: Setup GHCB based boot #VC handler
  x86/sev-es: Allocate and Map stacks for #VC handler
  x86/sev-es: Allocate and setup IST entry for #VC
  x86/sev-es: Adjust #VC IST Stack on entering NMI handler
  x86/dumpstack/64: Add noinstr version of get_stack_info()
  x86/entry/64: Add entry code for #VC handler
  x86/sev-es: Wire up existing #VC exit-code handlers
  x86/sev-es: Handle instruction fetches from user-space
  x86/sev-es: Handle MMIO String Instructions
  x86/sev-es: Handle #AC Events
  x86/sev-es: Handle #DB Events
  x86/paravirt: Allow hypervisor specific VMMCALL handling under SEV-ES
  x86/realmode: Add SEV-ES specific trampoline entry point
  x86/head/64: Setup TSS early for secondary CPUs
  x86/head/64: Don't call verify_cpu() on starting APs
  x86/head/64: Rename start_cpu0
  x86/sev-es: Support CPU offline/online
  x86/sev-es: Handle NMI State

Martin Radev (1):
  x86/sev-es: Check required CPU features for SEV-ES

Tom Lendacky (19):
  KVM: SVM: Add GHCB definitions
  x86/cpufeatures: Add SEV-ES CPU feature
  x86/sev-es: Add support for handling IOIO exceptions
  x86/sev-es: Add CPUID handling to #VC handler
  x86/sev-es: Setup per-cpu GHCBs for the runtime handler
  x86/sev-es: Add Runtime #VC Exception Handler
  x86/sev-es: Handle MMIO events
  x86/sev-es: Handle MSR events
  x86/sev-es: Handle DR7 read/write events
  x86/sev-es: Handle WBINVD Events
  x86/sev-es: Handle RDTSC(P) Events
  x86/sev-es: Handle RDPMC Events
  x86/sev-es: Handle INVD Events
  x86/sev-es: Handle MONITOR/MONITORX Events
  x86/sev-es: Handle MWAIT/MWAITX Events
  x86/sev-es: Handle VMMCALL Events
  x86/kvm: Add KVM specific VMMCALL handling under SEV-ES
  x86/realmode: Setup AP jump table
  x86/efi: Add GHCB mappings when SEV-ES is active

 arch/x86/Kconfig                           |    1 +
 arch/x86/boot/Makefile                     |    2 +-
 arch/x86/boot/compressed/Makefile          |    9 +-
 arch/x86/boot/compressed/head_64.S         |   32 +-
 arch/x86/boot/compressed/ident_map_64.c    |  349 +++++
 arch/x86/boot/compressed/idt_64.c          |   54 +
 arch/x86/boot/compressed/idt_handlers_64.S |   77 ++
 arch/x86/boot/compressed/kaslr.c           |   36 +-
 arch/x86/boot/compressed/kaslr_64.c        |  153 ---
 arch/x86/boot/compressed/misc.c            |    7 +
 arch/x86/boot/compressed/misc.h            |   45 +-
 arch/x86/boot/compressed/sev-es.c          |  214 +++
 arch/x86/entry/entry_64.S                  |   78 ++
 arch/x86/include/asm/cpu.h                 |    2 +-
 arch/x86/include/asm/cpu_entry_area.h      |   33 +-
 arch/x86/include/asm/cpufeatures.h         |    1 +
 arch/x86/include/asm/desc.h                |   27 +
 arch/x86/include/asm/desc_defs.h           |   10 +
 arch/x86/include/asm/fpu/internal.h        |   33 +-
 arch/x86/include/asm/fpu/xcr.h             |   37 +
 arch/x86/include/asm/idtentry.h            |   49 +
 arch/x86/include/asm/insn-eval.h           |    6 +
 arch/x86/include/asm/mem_encrypt.h         |    5 +
 arch/x86/include/asm/msr-index.h           |    3 +
 arch/x86/include/asm/page_64_types.h       |    1 +
 arch/x86/include/asm/pgtable.h             |    2 +-
 arch/x86/include/asm/processor.h           |    1 +
 arch/x86/include/asm/proto.h               |    1 +
 arch/x86/include/asm/realmode.h            |    4 +
 arch/x86/include/asm/segment.h             |    2 +-
 arch/x86/include/asm/setup.h               |    3 +-
 arch/x86/include/asm/sev-es.h              |   97 ++
 arch/x86/include/asm/stacktrace.h          |    2 +
 arch/x86/include/asm/svm.h                 |  118 +-
 arch/x86/include/asm/trap_pf.h             |   24 +
 arch/x86/include/asm/trapnr.h              |    1 +
 arch/x86/include/asm/traps.h               |   20 +-
 arch/x86/include/asm/x86_init.h            |   16 +-
 arch/x86/include/uapi/asm/svm.h            |   11 +
 arch/x86/kernel/Makefile                   |    5 +
 arch/x86/kernel/cpu/amd.c                  |    3 +-
 arch/x86/kernel/cpu/scattered.c            |    1 +
 arch/x86/kernel/cpu/vmware.c               |   50 +-
 arch/x86/kernel/dumpstack.c                |    7 +-
 arch/x86/kernel/dumpstack_64.c             |   47 +-
 arch/x86/kernel/head64.c                   |  106 +-
 arch/x86/kernel/head_32.S                  |    4 +-
 arch/x86/kernel/head_64.S                  |  176 ++-
 arch/x86/kernel/idt.c                      |   43 +-
 arch/x86/kernel/kvm.c                      |   35 +-
 arch/x86/kernel/nmi.c                      |   12 +
 arch/x86/kernel/sev-es-shared.c            |  507 +++++++
 arch/x86/kernel/sev-es.c                   | 1403 ++++++++++++++++++++
 arch/x86/kernel/smpboot.c                  |    4 +-
 arch/x86/kernel/traps.c                    |   56 +
 arch/x86/kernel/umip.c                     |   49 +-
 arch/x86/kvm/svm/svm.c                     |    2 +
 arch/x86/lib/insn-eval.c                   |  130 ++
 arch/x86/mm/cpu_entry_area.c               |    3 +-
 arch/x86/mm/extable.c                      |    1 +
 arch/x86/mm/mem_encrypt.c                  |   38 +-
 arch/x86/mm/mem_encrypt_identity.c         |    3 +
 arch/x86/platform/efi/efi_64.c             |   10 +
 arch/x86/realmode/init.c                   |   24 +-
 arch/x86/realmode/rm/header.S              |    3 +
 arch/x86/realmode/rm/trampoline_64.S       |   20 +
 arch/x86/tools/gen-insn-attr-x86.awk       |   50 +-
 tools/arch/x86/tools/gen-insn-attr-x86.awk |   50 +-
 68 files changed, 3964 insertions(+), 444 deletions(-)
 create mode 100644 arch/x86/boot/compressed/ident_map_64.c
 create mode 100644 arch/x86/boot/compressed/idt_64.c
 create mode 100644 arch/x86/boot/compressed/idt_handlers_64.S
 delete mode 100644 arch/x86/boot/compressed/kaslr_64.c
 create mode 100644 arch/x86/boot/compressed/sev-es.c
 create mode 100644 arch/x86/include/asm/fpu/xcr.h
 create mode 100644 arch/x86/include/asm/sev-es.h
 create mode 100644 arch/x86/include/asm/trap_pf.h
 create mode 100644 arch/x86/kernel/sev-es-shared.c
 create mode 100644 arch/x86/kernel/sev-es.c

Comments

Peter Zijlstra July 15, 2020, 9:24 a.m. UTC | #1
On Tue, Jul 14, 2020 at 02:08:02PM +0200, Joerg Roedel wrote:
> The #VC entry code now tries to pretend that the #VC handler does not
> use an IST stack by switching to the task stack if entered from
> user-mode or the SYSCALL entry path. When it is entered from
> kernel-mode it is doing its best to switch back to the interrupted
> stack. This is only possible if it is entered from a known and safe
> kernel stack (e.g. not the entry stack). If the previous stack is not
> safe to use, the #VC handler switches to a fall-back stack and calls a
> special handler function which, as of now, just panics the system. For
> now this is safe as #VC exceptions only happen at known places which
> use a safe stack.
> 
> The use of the fall-back stack is necessary so that the special
> handler function can safely raise nested #VC exceptions, for
> example to print a panic message.

Can we get some more words -- preferably in actual code comments, on
when exactly #VC happens?

Because the only thing I remember is that #VC could happen on any memop,
but I also have vague memories of that being a later extension.
Joerg Roedel July 15, 2020, 9:34 a.m. UTC | #2
On Wed, Jul 15, 2020 at 11:24:56AM +0200, Peter Zijlstra wrote:
> Can we get some more words -- preferably in actual code comments, on
> when exactly #VC happens?

Sure, will add this as a comment before the actual runtime VC handler.

> Because the only thing I remember is that #VC could happen on any memop,
> but I also have vague memories of that being a later extension.

Currently it is only raised when something happens that the hypervisor
intercepts, for example on a couple of instructions like CPUID,
RD/WRMSR, ..., or on MMIO/IOIO accesses.

With Secure Nested Paging (SNP), which needs additional enablement, a #VC can
happen on any memory access. I wrote the IST handling entry code for #VC
with that in mind, but do not actually enable it. This is the reason why
the #VC handler just panics the system when it ends up on the fall-back
(VC2) stack. With SNP enabled, it needs to handle the SNP exit-codes in
that path.
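
To make the intercept-driven cases concrete, here is a small sketch
that maps an exit code to the kind of emulation needed. It is not the
handler from the patches: the enum and the function name are
hypothetical, and the exit-code values are the SVM_EXIT_* constants as
I recall them from arch/x86/include/uapi/asm/svm.h, so please verify
them against the header.

```c
#include <stdint.h>

/* Exit codes as recalled from the uapi svm.h header (please verify). */
#define SVM_EXIT_CPUID	0x072
#define SVM_EXIT_IOIO	0x07b
#define SVM_EXIT_MSR	0x07c
#define SVM_EXIT_NPF	0x400	/* nested page fault, reported for MMIO */

/* Hypothetical classification of what the #VC handler has to emulate. */
enum vc_action {
	VC_EMULATE_CPUID,
	VC_EMULATE_IOIO,
	VC_EMULATE_MSR,
	VC_EMULATE_MMIO,
	VC_UNSUPPORTED,
};

/* Map the exit code delivered with the #VC exception to an action. */
enum vc_action vc_classify(uint64_t exit_code)
{
	switch (exit_code) {
	case SVM_EXIT_CPUID:
		return VC_EMULATE_CPUID;
	case SVM_EXIT_IOIO:
		return VC_EMULATE_IOIO;
	case SVM_EXIT_MSR:
		return VC_EMULATE_MSR;
	case SVM_EXIT_NPF:
		return VC_EMULATE_MMIO;
	default:
		/* e.g. SNP exit-codes, which are not handled yet */
		return VC_UNSUPPORTED;
	}
}
```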

Regards,

	Joerg
Peter Zijlstra July 15, 2020, 9:55 a.m. UTC | #3
On Wed, Jul 15, 2020 at 11:34:26AM +0200, Joerg Roedel wrote:
> On Wed, Jul 15, 2020 at 11:24:56AM +0200, Peter Zijlstra wrote:
> > Can we get some more words -- preferably in actual code comments, on
> > when exactly #VC happens?
> 
> Sure, will add this as a comment before the actual runtime VC handler.

Thanks!

> > Because the only thing I remember is that #VC could happen on any memop,
> > but I also have vague memories of that being a later extension.
> 
> Currently it is only raised when something happens that the hypervisor
> intercepts, for example on a couple of instructions like CPUID,
> RD/WRMSR, ..., or on MMIO/IOIO accesses.
> 
> With Secure Nested Paging (SNP), which needs additional enablement, a #VC can
> happen on any memory access. I wrote the IST handling entry code for #VC
> with that in mind, but do not actually enable it. This is the reason why
> the #VC handler just panics the system when it ends up on the fall-back
> (VC2) stack. With SNP enabled, it needs to handle the SNP exit-codes in
> that path.

And recursive #VC was instant death, right? Because there's no way to
avoid IST stack corruption in that case.
Joerg Roedel July 15, 2020, 10:10 a.m. UTC | #4
On Wed, Jul 15, 2020 at 11:55:56AM +0200, Peter Zijlstra wrote:
> And recursive #VC was instant death, right? Because there's no way to
> avoid IST stack corruption in that case.

Right, a #VC exception while still on the IST stack must instantly kill
the VM. That needs an additional check which is not implemented yet, as
it only becomes necessary with SNP.
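
A rough idea of what such a check could look like, as a user-space
sketch only: the struct, the function name and the half-open
[begin, end) range convention are assumptions of mine and not code from
the series.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of a stack as the half-open range [begin, end). */
struct stack_range {
	uintptr_t begin;
	uintptr_t end;
};

/*
 * Return true if the interrupted stack pointer lies within the #VC IST
 * stack, i.e. the #VC exception was raised while the previous #VC
 * handler invocation was still running on that stack.
 */
bool vc_nested_on_ist(const struct stack_range *vc_ist, uintptr_t interrupted_sp)
{
	return interrupted_sp >= vc_ist->begin && interrupted_sp < vc_ist->end;
}
```

A real implementation would have to terminate the VM when this returns
true, because the hardware has already reset the stack pointer to the
top of the IST stack and overwritten the previous exception frame.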

Regards,

	Joerg
Erdem Aktas July 21, 2020, 1:09 a.m. UTC | #5
Hi,

It looks like there is an expectation that the bootloader will start
from the 64bit entry point in head_64.S. With the current patch
series, it will not boot up if the bootloader jumps to the startup_32
entry, which might break some default distro images.
What are supported bootloaders and configurations?
I am using grub (2.02-2ubuntu8.15) and it fails to boot because of
this reason. I am not a grub expert, so I would appreciate any
pointers on this.
Also, it would be nice to put some error code in the GHCB MSR if the
guest dies for some reason in real mode. Currently, it just dies with
no information provided.
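
As an illustration of what such an error report could look like, here
is a sketch that builds a termination request value for the GHCB MSR.
The bit layout (request code 0x100 in bits 11:0, reason code set in
bits 15:12, reason code in bits 23:16) is my understanding of the GHCB
specification and should be checked against it; the macro and function
names are hypothetical.

```c
#include <stdint.h>

/* Termination request code of the GHCB MSR protocol (please verify). */
#define GHCB_MSR_TERM_REQ	0x100ULL

/*
 * Build the value a guest would write to the GHCB MSR before VMGEXIT
 * to tell the hypervisor why it is giving up.
 */
uint64_t ghcb_term_req(unsigned int reason_set, unsigned int reason_code)
{
	return GHCB_MSR_TERM_REQ |
	       ((uint64_t)(reason_set & 0xf) << 12) |
	       ((uint64_t)(reason_code & 0xff) << 16);
}
```

Writing the MSR and executing VMGEXIT is left out, as that only works
inside an SEV-ES guest.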

PS: sorry for sending twice due to the wrong email body type.

Regards
-Erdem


Joerg Roedel July 21, 2020, 12:49 p.m. UTC | #6
Hi,

On Mon, Jul 20, 2020 at 06:09:19PM -0700, Erdem Aktas wrote:
> It looks like there is an expectation that the bootloader will start
> from the 64bit entry point in head_64.S. With the current patch
> series, it will not boot up if the bootloader jumps to the startup_32
> entry, which might break some default distro images.
> What are supported bootloaders and configurations?
> I am using grub (2.02-2ubuntu8.15) and it fails to boot because of
> this reason. I am not a grub expert, so I would appreciate any
> pointers on this.

This is right, the only supported boot path is via the 64bit EFI entry
point. The reason is that SEV-ES requires support in the firmware too,
and currently only OVMF is supported in that regard. The firmware needs
to setup the AP jump-table, for example.

Other boot-paths have not been implemented. Booting via startup_32 would
require exception handling in the 32bit-part of the boot-strap code,
because verify_cpu is called there. Also an AMD specific MSR can't be
accessed there because this would #GP on non-AMD/SEV-ES machines and,
as I said, there is no way yet to handle them.

How did you get into the startup_32 entry-point, do you have an SEV-ES
BIOS supporting this? If it is really needed it could be implemented at
a later point.

Regards,

	Joerg
Erdem Aktas July 21, 2020, 4:48 p.m. UTC | #7
Yes, I am using OVMF with SEV-ES (sev-es-v12 patches applied). I am
running Ubuntu 18.04 distro. My grub target is x86_64-efi. I also
tried installing the grub-efi-amd64 package. In all cases, grub is
running in 64bit but enters startup_32 in 32 bit mode. I think
there should be a 32bit #VC handler, very similar to the one in the
OVMF patches, to handle CPUID while the CPU is still in 32bit mode.
As it is now, it will be a huge problem to support different distro images.
I wonder if I am the only one having this problem.

-Erdem

Joerg Roedel July 22, 2020, 9:04 a.m. UTC | #8
Hi Erdem,

On Tue, Jul 21, 2020 at 09:48:51AM -0700, Erdem Aktas wrote:
> Yes, I am using OVMF with SEV-ES (sev-es-v12 patches applied). I am
> running Ubuntu 18.04 distro. My grub target is x86_64-efi. I also
> tried installing the grub-efi-amd64 package. In all cases, grub is
> running in 64bit but enters startup_32 in 32 bit mode. I think
> there should be a 32bit #VC handler, very similar to the one in the
> OVMF patches, to handle CPUID while the CPU is still in 32bit mode.
> As it is now, it will be a huge problem to support different distro images.
> I wonder if I am the only one having this problem.

I haven't heard from anyone else that the startup_32 boot-path is being
used for SEV-ES. What OVMF binary do you use for your guest?

In general it is not that difficult to support that boot-path too, but
I'd like to keep that as a future addition, as the patch-set is already
quite large. In the startup_32 path there is already a GDT set up, so
what's needed is an IDT and a 32-bit #VC handler using the MSR-based
protocol (and hoping that there will only be CPUID intercepts until it
reaches long-mode).
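
For reference, a sketch of how a CPUID request is encoded in the
MSR-based protocol and how the response is decoded. The layout (request
code 0x004, register selector in bits 31:30, CPUID function in bits
63:32, response code 0x005 with the value in bits 63:32) is my reading
of the GHCB specification; the macro, enum and function names are
hypothetical. The WRMSR/VMGEXIT/RDMSR sequence itself is left out, as
it only works inside an SEV-ES guest.

```c
#include <stdint.h>

/* GHCB MSR protocol codes as I understand the specification. */
#define GHCB_MSR_CPUID_REQ	0x004ULL
#define GHCB_MSR_CPUID_RESP	0x005ULL
#define GHCB_MSR_INFO_MASK	0xfffULL

/* Register selector for a CPUID request; one register per request. */
enum ghcb_cpuid_reg {
	GHCB_CPUID_EAX = 0,
	GHCB_CPUID_EBX = 1,
	GHCB_CPUID_ECX = 2,
	GHCB_CPUID_EDX = 3,
};

/* Value to write to the GHCB MSR before VMGEXIT. */
uint64_t ghcb_cpuid_req(uint32_t fn, enum ghcb_cpuid_reg reg)
{
	return GHCB_MSR_CPUID_REQ |
	       ((uint64_t)(reg & 0x3) << 30) |
	       ((uint64_t)fn << 32);
}

/*
 * Decode the value read back from the GHCB MSR after VMGEXIT.
 * Returns 0 and stores the register value on success, -1 if the
 * hypervisor did not answer with a CPUID response.
 */
int ghcb_cpuid_resp(uint64_t msr_val, uint32_t *value)
{
	if ((msr_val & GHCB_MSR_INFO_MASK) != GHCB_MSR_CPUID_RESP)
		return -1;

	*value = (uint32_t)(msr_val >> 32);
	return 0;
}
```

Since each request returns a single register, a full CPUID emulation
needs four round-trips, one per register.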

Regards,

	Joerg
Erdem Aktas July 22, 2020, 4:54 p.m. UTC | #9
I am using a custom, optimized and stripped-down OVMF build.
Do you think it is because of the OVMF or grub?

In my case, there are 2 places where CPUID is called: the first
one is to decide if long mode is supported, along with a few other
features like SSE support, and the second one is to retrieve the
encryption bit location.

-Erdem

Joerg Roedel July 22, 2020, 5:45 p.m. UTC | #10
On Wed, Jul 22, 2020 at 09:54:40AM -0700, Erdem Aktas wrote:
> I am using a custom, optimized and stripped-down OVMF build.
> Do you think it is because of the OVMF or grub?

Not sure, I haven't looked into how grub decides which entry point to
use.

> In my case, there are 2 places where CPUID is called: the first
> one is to decide if long mode is supported, along with a few other
> features like SSE support, and the second one is to retrieve the
> encryption bit location.

Yes, it is basically the verify_cpu() function that is causing the
trouble. If you want to work around it you can comment it out in that
path for testing purposes.

Regards,

	Joerg