
[v4,0/9] x86/sev: KEXEC/KDUMP support for SEV-ES guests

Message ID 20240311161727.14916-1-vsntk18@gmail.com
Series x86/sev: KEXEC/KDUMP support for SEV-ES guests

Message

Vasant Karasulli March 11, 2024, 4:17 p.m. UTC
From: Vasant Karasulli <vkarasulli@suse.de>

Hi,

Here are the changes to enable kexec/kdump in SEV-ES guests. The biggest
problem in supporting kexec/kdump under SEV-ES is finding a way to hand
the non-boot CPUs (APs) over from one kernel to the next.

Without SEV-ES, the first kernel parks the CPUs in a HLT loop until
the kexec'ed kernel resets them with an INIT-SIPI-SIPI sequence. In
virtual machines, the CPU reset is emulated by the hypervisor, which
sets the vCPU registers back to their reset state.
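As background for the SIPI part of INIT-SIPI-SIPI, here is a minimal sketch of the standard x86 address arithmetic: a Startup IPI with vector VV starts the AP in real mode at CS:IP = VV00:0000, i.e. at physical address VV000h. This is generic x86 behaviour rather than code from this series, and the helper names (sipi_start_cs(), sipi_start_address(), sipi_vector_for()) are hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: CS loaded by a SIPI with this vector (VV -> VV00h). */
static uint16_t sipi_start_cs(uint8_t vector)
{
	return (uint16_t)(vector << 8);
}

/* Hypothetical helper: physical start address, CS * 16 + IP with IP = 0. */
static uint32_t sipi_start_address(uint8_t vector)
{
	return (uint32_t)sipi_start_cs(vector) << 4;
}

/*
 * Hypothetical helper: find the SIPI vector for a trampoline page. The
 * page must be 4KB aligned and lie below 1MB, because the vector only
 * has 8 bits. Returns 1 and stores the vector on success, 0 otherwise.
 */
static int sipi_vector_for(uint32_t trampoline_pa, uint8_t *vector)
{
	if (trampoline_pa & 0xfff)
		return 0;
	if (trampoline_pa >= 0x100000)
		return 0;
	*vector = (uint8_t)(trampoline_pa >> 12);
	return 1;
}
```

For example, a trampoline page at 0x9a000 is reached with vector 0x9a, and the AP starts at CS = 0x9a00, IP = 0.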

This does not work under SEV-ES, because the hypervisor has no access
to the vCPU registers and cannot modify them. An SEV-ES guest therefore
has to reset the vCPU itself and park it using the AP Reset Hold
protocol. Upon wakeup, the guest has to switch to real mode and jump to
the reset vector configured in the AP Jump Table.
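The AP Reset Hold step can be pictured as a loop around the MSR-based VMGEXIT. The sketch below simulates it in user space: a global variable stands in for the GHCB MSR and fake_vmgexit() stands in for the hypervisor. The request/response codes 0x006/0x007 and the rule that a non-zero GHCBData field (bits 63:12) in the response signals a real wakeup are assumptions based on the GHCB specification, version 2; everything named fake_* or spurious_* is hypothetical simulation scaffolding.

```c
#include <stdint.h>

/* GHCB MSR protocol codes (GHCBInfo, bits 11:0); assumed from the spec. */
#define GHCB_INFO_MASK          0xfffULL
#define GHCB_AP_RESET_HOLD_REQ  0x006ULL
#define GHCB_AP_RESET_HOLD_RESP 0x007ULL
#define GHCB_DATA_SHIFT         12

/* Simulation only: stands in for the GHCB MSR (0xc0010130). */
static uint64_t fake_ghcb_msr;
/* Simulation only: times the "hypervisor" returns without a wakeup. */
static int spurious_returns;

/* Simulation only: stands in for VMGEXIT and the hypervisor's handling. */
static void fake_vmgexit(void)
{
	uint64_t woken;

	if ((fake_ghcb_msr & GHCB_INFO_MASK) != GHCB_AP_RESET_HOLD_REQ)
		return;

	woken = spurious_returns > 0 ? 0 : 1;
	if (spurious_returns > 0)
		spurious_returns--;

	/* Non-zero GHCBData: the AP has been released from the hold. */
	fake_ghcb_msr = (woken << GHCB_DATA_SHIFT) | GHCB_AP_RESET_HOLD_RESP;
}

/*
 * Park the AP until the "hypervisor" reports a wakeup. Returns the number
 * of VMGEXITs issued, which is only tracked here for demonstration.
 */
static int ap_reset_hold(void)
{
	uint64_t val;
	int exits = 0;

	do {
		fake_ghcb_msr = GHCB_AP_RESET_HOLD_REQ;	/* WRMSR to the GHCB MSR */
		fake_vmgexit();				/* VMGEXIT */
		exits++;
		val = fake_ghcb_msr;			/* RDMSR the response */
	} while ((val & GHCB_INFO_MASK) != GHCB_AP_RESET_HOLD_RESP ||
		 (val >> GHCB_DATA_SHIFT) == 0);

	return exits;
}
```

The loop retries until it sees both the expected response code and a non-zero result, so a spurious return from the hypervisor simply parks the AP again.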

The code to do this is the main part of this patch set. It works by
placing code on the AP Jump Table page itself, both to park the vCPU
and to jump to the reset vector upon wakeup. The code on the AP Jump
Table runs in 16-bit protected mode with the segment base set to the
beginning of the page. The AP Jump Table is usually not within the
first 1MB of memory, so the code cannot run in real mode.
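A quick worked check of why real mode is ruled out: real mode forms addresses as segment * 16 + offset, which tops out at FFFF:FFFF = 0x10FFEF, while a 16-bit protected-mode code segment takes its base from a descriptor whose base field is 32 bits wide. The sketch shows both calculations plus the standard x86 segment-descriptor encoding. The helper names and the example page address 0x7ffde000 are made up for illustration and are not taken from this series.

```c
#include <stdint.h>

#define JT_PAGE_SIZE	4096u
#define REAL_MODE_TOP	0xfffffu	/* last byte of the first 1MB */

/* Real mode: linear address = segment * 16 + offset. */
static uint32_t real_mode_linear(uint16_t seg, uint16_t off)
{
	return ((uint32_t)seg << 4) + off;
}

/* Hypothetical helper: could the page run in real mode with CS = page >> 4? */
static int page_reachable_in_real_mode(uint64_t page_pa)
{
	return page_pa + JT_PAGE_SIZE - 1 <= REAL_MODE_TOP;
}

/*
 * Build a legacy 8-byte segment descriptor. The base is 32 bits wide, so
 * a 16-bit protected-mode segment can start at any page below 4GB.
 */
static uint64_t gdt_entry(uint32_t base, uint32_t limit, uint8_t access,
			  uint8_t flags)
{
	return ((uint64_t)(limit & 0xffff)) |
	       ((uint64_t)(base & 0xffffff) << 16) |
	       ((uint64_t)access << 40) |
	       ((uint64_t)((limit >> 16) & 0xf) << 48) |
	       ((uint64_t)(flags & 0xf) << 52) |
	       ((uint64_t)((base >> 24) & 0xff) << 56);
}
```

With an access byte of 0x9b (present, ring 0, execute/read code) and flags of 0 (byte granularity, 16-bit default operand size), gdt_entry(0x7ffde000, 0xffff, 0x9b, 0) describes a 64KB 16-bit code segment that starts at that page, something no real-mode CS value can express.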

The AP Jump Table is the best place to put the parking code, because
that memory is owned by the firmware, which only reads it, while the OS
can write to it. Only the first 4 bytes are used for the reset vector,
leaving the rest of the page for the code, data and stack needed to
park a vCPU. The code cannot live in kernel memory, because by the time
the vCPU wakes up, that memory is owned by the new kernel, which might
already have overwritten it.
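The page layout described above can be sketched as a C struct: a 4-byte reset vector, assumed here to be stored as a 16-bit IP followed by a 16-bit CS (which is how the existing SEV-ES jump table setup writes it, as far as we know), and 4092 bytes for the parking code, its data and its stack. The struct and set_reset_vector() are hypothetical; they are not the series' struct sev_ap_jump_table_header, whose fields the cover letter does not show.

```c
#include <stddef.h>
#include <stdint.h>

#define AP_JT_PAGE_SIZE 4096

/* Hypothetical view of the AP Jump Table page. */
struct ap_jump_table_page {
	uint16_t reset_ip;			/* bytes 0-1: reset vector IP */
	uint16_t reset_cs;			/* bytes 2-3: reset vector CS */
	uint8_t  blob[AP_JT_PAGE_SIZE - 4];	/* parking code, data, stack */
} __attribute__((packed));

_Static_assert(offsetof(struct ap_jump_table_page, reset_cs) == 2,
	       "CS follows IP");
_Static_assert(sizeof(struct ap_jump_table_page) == AP_JT_PAGE_SIZE,
	       "layout covers exactly one page");

/*
 * Point the reset vector at a real-mode entry point. tramp_base is the
 * start of the trampoline (below 1MB, 16-byte aligned) and entry must lie
 * within 64KB of it. Returns 0 on success, -1 if it cannot be encoded.
 */
static int set_reset_vector(struct ap_jump_table_page *jt,
			    uint32_t tramp_base, uint32_t entry)
{
	if ((tramp_base & 0xf) || tramp_base > 0xffff0)
		return -1;
	if (entry < tramp_base || entry - tramp_base > 0xffff)
		return -1;

	jt->reset_cs = (uint16_t)(tramp_base >> 4);
	jt->reset_ip = (uint16_t)(entry - tramp_base);
	return 0;
}
```

For a trampoline at 0x9a000 whose entry point is 0xc0 bytes in, the reset vector becomes CS = 0x9a00, IP = 0x00c0, and everything after byte 3 stays free for the parking code.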

The other patches add initial support for GHCB protocol version 2,
because kexec/kdump needs the MSR-based (without a GHCB) AP Reset Hold
VMGEXIT, which is a GHCB protocol version 2 feature.
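To make the version dependency concrete, here is a sketch of GHCB protocol negotiation over the MSR protocol. The hypervisor's SEV Information response (GHCBInfo 0x001) is assumed to carry its maximum supported version in bits 63:48, its minimum in bits 47:32 and the C-bit position in bits 31:24; the guest picks the highest version both sides support. The macro names only mimic those in the kernel's sev-common.h, and GUEST_GHCB_PROTO_MIN/MAX and both functions are hypothetical.

```c
#include <stdint.h>

#define GHCB_MSR_INFO_MASK	0xfffULL
#define GHCB_MSR_SEV_INFO_RESP	0x001ULL
#define GHCB_MSR_PROTO_MAX(v)	(((v) >> 48) & 0xffff)
#define GHCB_MSR_PROTO_MIN(v)	(((v) >> 32) & 0xffff)

/* Assumed range of protocol versions this example guest implements. */
#define GUEST_GHCB_PROTO_MIN	1u
#define GUEST_GHCB_PROTO_MAX	2u

/*
 * Negotiate the protocol version from the hypervisor's SEV Information
 * response. Returns the highest common version, or 0 on failure.
 */
static unsigned int ghcb_negotiate(uint64_t resp)
{
	uint64_t hv_max = GHCB_MSR_PROTO_MAX(resp);
	uint64_t hv_min = GHCB_MSR_PROTO_MIN(resp);

	if ((resp & GHCB_MSR_INFO_MASK) != GHCB_MSR_SEV_INFO_RESP)
		return 0;
	if (hv_max < GUEST_GHCB_PROTO_MIN || hv_min > GUEST_GHCB_PROTO_MAX)
		return 0;

	return hv_max < GUEST_GHCB_PROTO_MAX ? (unsigned int)hv_max
					     : GUEST_GHCB_PROTO_MAX;
}

/* The MSR-based AP Reset Hold VMGEXIT is a version 2 feature. */
static int ghcb_has_msr_ap_reset_hold(unsigned int version)
{
	return version >= 2;
}
```

A hypervisor that only offers version 1 still yields a working GHCB setup, but ghcb_has_msr_ap_reset_hold() then reports that the parking mechanism is unavailable.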

The kexec'ed kernel is also entered through the decompressor and needs
MMIO support there, so this patch set also adds MMIO #VC support to the
decompressor, along with support for handling CLFLUSH instructions.
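The cover letter does not say how a CLFLUSH that targets MMIO is handled, only that it is. As an illustration of what the #VC handler has to recognise, the standalone classifier below picks out CLFLUSH from its encoding (NP 0F AE /7 with a memory operand). The same opcode bytes also encode SFENCE (register form) and, with a 0x66 prefix, CLFLUSHOPT. This is a deliberately simplified, hypothetical sketch, not the kernel's instruction decoder: it skips at most one REX prefix and does not handle other legacy prefixes.

```c
#include <stddef.h>
#include <stdint.h>

enum mmio_insn_kind { MMIO_INSN_OTHER, MMIO_INSN_CLFLUSH };

/*
 * Hypothetical classifier for the instruction that raised an MMIO #VC.
 * CLFLUSH m8 is NP 0F AE /7 with ModRM.mod != 3; mod == 3 is SFENCE.
 */
static enum mmio_insn_kind classify_mmio_insn(const uint8_t *insn, size_t len)
{
	size_t i = 0;
	uint8_t modrm;

	/* A 0x66 prefix makes this CLFLUSHOPT at best, never CLFLUSH. */
	if (len > 0 && insn[0] == 0x66)
		return MMIO_INSN_OTHER;
	/* Skip one REX prefix (0x40-0x4f, 64-bit mode). */
	if (i < len && (insn[i] & 0xf0) == 0x40)
		i++;
	if (len < i + 3 || insn[i] != 0x0f || insn[i + 1] != 0xae)
		return MMIO_INSN_OTHER;

	modrm = insn[i + 2];
	if (((modrm >> 3) & 7) == 7 && (modrm >> 6) != 3)
		return MMIO_INSN_CLFLUSH;
	return MMIO_INSN_OTHER;
}
```

For example, `clflush (%rax)` is 0f ae 38 and `clflush (%r8)` is 41 0f ae 38, while `sfence` (0f ae f8) shares the opcode bytes but has a register-form ModRM and is not a memory access at all.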

Finally, the series adds code to disable kexec/kdump support at runtime
when the environment does not support it (e.g. no GHCB protocol version
2 support, or an AP Jump Table above 4GB).
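The runtime check amounts to a simple predicate over what the guest learned at boot. In the sketch below, the struct and function names are hypothetical and the two conditions are the ones named above. Explaining the 4GB limit by the 32-bit segment base of the 16-bit protected-mode parking code is our assumption, not something the cover letter states.

```c
#include <stdbool.h>
#include <stdint.h>

#define JT_PAGE_SIZE	4096ULL
#define SZ_4G		(1ULL << 32)

/* Hypothetical summary of what the guest learned during boot. */
struct sev_es_kexec_env {
	unsigned int ghcb_version;	/* negotiated GHCB protocol version */
	uint64_t jump_table_pa;		/* physical address of the AP Jump Table */
};

/* Hypothetical helper: may kexec/kdump stay enabled in this environment? */
static bool sev_es_kexec_supported(const struct sev_es_kexec_env *env)
{
	/* Parking APs needs the MSR-based AP Reset Hold (GHCB version 2). */
	if (env->ghcb_version < 2)
		return false;

	/*
	 * Assumption: the whole page must sit below 4GB because the 16-bit
	 * protected-mode segment base that points at it is 32 bits wide.
	 */
	if (env->jump_table_pa + JT_PAGE_SIZE > SZ_4G)
		return false;

	return true;
}
```

A page at 0xfffff000 still passes, since it ends exactly at the 4GB boundary, while one at 0x100000000 does not.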

The diffstat looks big, but most of it comes from moving the MMIO #VC
support code so that the decompressor can use it.

The previous version of this patch set can be found here:

	https://lore.kernel.org/lkml/20220127101044.13803-1-joro@8bytes.org/

Please review.

Thanks,
   Vasant

Changes v3->v4:
	- Rebased to v6.8 kernel
	- Applied review comments by Sean Christopherson
	- Combined sev_es_setup_ap_jump_table() and sev_setup_ap_jump_table()
	  into a single function, which makes caching the jump table address
	  unnecessary
	- Annotated struct sev_ap_jump_table_header with the __packed attribute
	- Added code to set up the real mode data segment at boot time instead
	  of hardcoding the value

Changes v2->v3:

	- Rebased to v5.17-rc1
	- Applied most review comments by Boris
	- Use the name 'AP jump table' consistently
	- Make kexec-disabling for unsupported guests x86-specific
	- Cleanup and consolidate patches to detect GHCB v2 protocol
	  support

Joerg Roedel (9):
  x86/kexec/64: Disable kexec when SEV-ES is active
  x86/sev: Save and print negotiated GHCB protocol version
  x86/sev: Set GHCB data structure version
  x86/sev: Setup code to park APs in the AP Jump Table
  x86/sev: Park APs on AP Jump Table with GHCB protocol version 2
  x86/sev: Use AP Jump Table blob to stop CPU
  x86/sev: Add MMIO handling support to boot/compressed/ code
  x86/sev: Handle CLFLUSH MMIO events
  x86/kexec/64: Support kexec under SEV-ES with AP Jump Table Blob

 arch/x86/boot/compressed/sev.c          |  45 +-
 arch/x86/include/asm/insn-eval.h        |   1 +
 arch/x86/include/asm/realmode.h         |   5 +
 arch/x86/include/asm/sev-ap-jumptable.h |  30 +
 arch/x86/include/asm/sev.h              |   7 +
 arch/x86/kernel/machine_kexec_64.c      |  12 +
 arch/x86/kernel/process.c               |   8 +
 arch/x86/kernel/sev-shared.c            | 234 +++++-
 arch/x86/kernel/sev.c                   | 372 +++++-----
 arch/x86/lib/insn-eval-shared.c         | 912 ++++++++++++++++++++++++
 arch/x86/lib/insn-eval.c                | 911 +----------------------
 arch/x86/realmode/Makefile              |   9 +-
 arch/x86/realmode/rm/Makefile           |  11 +-
 arch/x86/realmode/rm/header.S           |   3 +
 arch/x86/realmode/rm/sev.S              |  85 +++
 arch/x86/realmode/rmpiggy.S             |   6 +
 arch/x86/realmode/sev/Makefile          |  33 +
 arch/x86/realmode/sev/ap_jump_table.S   | 131 ++++
 arch/x86/realmode/sev/ap_jump_table.lds |  24 +
 19 files changed, 1695 insertions(+), 1144 deletions(-)
 create mode 100644 arch/x86/include/asm/sev-ap-jumptable.h
 create mode 100644 arch/x86/lib/insn-eval-shared.c
 create mode 100644 arch/x86/realmode/rm/sev.S
 create mode 100644 arch/x86/realmode/sev/Makefile
 create mode 100644 arch/x86/realmode/sev/ap_jump_table.S
 create mode 100644 arch/x86/realmode/sev/ap_jump_table.lds


base-commit: e8f897f4afef0031fe618a8e94127a0934896aba
--
2.34.1

Comments

Tom Lendacky March 11, 2024, 7:19 p.m. UTC | #1
On 3/11/24 11:17, Vasant Karasulli wrote:
> From: Vasant Karasulli <vkarasulli@suse.de>
> 
> Hi,

Hi Vasant,

SNP guest support has been incorporated into the kernel since this
patchset was originally presented. An SNP guest is also considered a guest
with encrypted state (CC_ATTR_GUEST_STATE_ENCRYPT will return true), but
it does not use the AP jump table. So this series needs to be adjusted so
that the AP jump table is only used for SEV-ES guests.
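A sketch of the distinction drawn here, written against the raw SEV_STATUS MSR (0xc0010131) bits: bit 0 = SEV, bit 1 = SEV-ES, bit 2 = SEV-SNP. As we understand it, an SNP guest also has the SEV-ES bit set, which is why an "encrypted guest state" test alone is true for both. The helper names are hypothetical; kernel code would use its own confidential-computing helpers rather than these functions.

```c
#include <stdbool.h>
#include <stdint.h>

/* SEV_STATUS MSR (0xc0010131) bits. */
#define SEV_STATUS_SEV_ENABLED		(1ULL << 0)
#define SEV_STATUS_SEV_ES_ENABLED	(1ULL << 1)
#define SEV_STATUS_SEV_SNP_ENABLED	(1ULL << 2)

/* True for both SEV-ES and SNP guests: vCPU register state is encrypted. */
static bool guest_state_encrypted(uint64_t sev_status)
{
	return (sev_status & SEV_STATUS_SEV_ES_ENABLED) != 0;
}

/* Hypothetical helper: only non-SNP SEV-ES guests park APs on the AP Jump Table. */
static bool use_ap_jump_table(uint64_t sev_status)
{
	return guest_state_encrypted(sev_status) &&
	       !(sev_status & SEV_STATUS_SEV_SNP_ENABLED);
}
```

Checking the SNP bit explicitly, rather than relying on the encrypted-state test alone, is what keeps SNP guests off the AP Jump Table path.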

Thanks,
Tom

Tom Lendacky March 12, 2024, 2:04 p.m. UTC | #2
On 3/11/24 15:32, Vasant k wrote:
> Hi Tom,
> 
> Right, it just escaped my mind that SNP uses the secrets page
> to hand over APs to the next stage.  I will correct that in the next

Not quite... The MADT table lists the APs and the GHCB AP Create NAE event 
is used to start the APs.

Thanks,
Tom

> version.  Please let me know if you have any corrections or improvement
> suggestions on the rest of the patchset.
> 
> Thanks,
> Vasant
>
Vasant Karasulli March 12, 2024, 3:16 p.m. UTC | #3
On Tue 12-03-24 09:04:13, Tom Lendacky wrote:
> On 3/11/24 15:32, Vasant k wrote:
> > Hi Tom,
> >
> > Right, it just escaped my mind that SNP uses the secrets page
> > to hand over APs to the next stage.  I will correct that in the next
>
> Not quite... The MADT table lists the APs and the GHCB AP Create NAE event
> is used to start the APs.

Alright. So the AP Jump Table is not used for SNP the way it is for SEV-ES. Thanks,
I will restrict the changes in the patch set to SEV-ES then.

- Vasant
Tom Lendacky March 12, 2024, 4:13 p.m. UTC | #4
On 3/12/24 10:16, Vasant Karasulli wrote:
> On Tue 12-03-24 09:04:13, Tom Lendacky wrote:
>> On 3/11/24 15:32, Vasant k wrote:
>>> Hi Tom,
>>>
>>> Right, it just escaped my mind that SNP uses the secrets page
>>> to hand over APs to the next stage.  I will correct that in the next
>>
>> Not quite... The MADT table lists the APs and the GHCB AP Create NAE event
>> is used to start the APs.
> 
> Alright. So the AP Jump Table is not used for SNP the way it is for SEV-ES. Thanks,

Right. It can be, but we don't use that method in Linux.

Thanks,
Tom

> I will restrict the changes in the patch set to SEV-ES then.
> 
> - Vasant