Message ID | 20240913113705.419146-1-Neeraj.Upadhyay@amd.com (mailing list archive) |
---|---|
Series | AMD: Add Secure AVIC Guest Support |
On Fri, Sep 13, 2024 at 05:06:51PM +0530, Neeraj Upadhyay wrote:
> Introduction
> ------------
>
> Secure AVIC is a new hardware feature in the AMD64 architecture to
> allow SEV-SNP guests to prevent the hypervisor from generating unexpected
> interrupts to a vCPU or otherwise violating architectural assumptions
> around APIC behavior.
>
> One of the significant differences from AVIC or emulated x2APIC is that
> Secure AVIC uses a guest-owned and managed APIC backing page. It also
> introduces additional fields in both the VMCB and the Secure AVIC backing
> page to aid the guest in limiting which interrupt vectors can be injected
> into the guest.
>
> Guest APIC Backing Page
> -----------------------
> Each vCPU has a guest-allocated APIC backing page of size 4K, which
> maintains APIC state for that vCPU. The x2APIC MSRs are mapped at
> their corresponding x2APIC MMIO offset within the guest APIC backing
> page. All x2APIC accesses by guest or Secure AVIC hardware operate
> on this backing page. The backing page should be pinned and the NPT entry
> for it should be always mapped while the corresponding vCPU is running.
>
>
> MSR Accesses
> ------------
> Secure AVIC only supports x2APIC MSR accesses. xAPIC MMIO offset based
> accesses are not supported.
>
> Some of the MSR accesses such as ICR writes (with shorthand equal to
> self), SELF_IPI, EOI, TPR writes are accelerated by Secure AVIC
> hardware. Other MSR accesses generate a #VC exception. The #VC
> exception handler reads/writes to the guest APIC backing page.
> As guest APIC backing page is accessible to the guest, the Secure
> AVIC driver code optimizes APIC register access by directly
> reading/writing to the guest APIC backing page (instead of taking
> the #VC exception route).
>
> In addition to the architected MSRs, the following new fields are added to
> the guest APIC backing page which can be modified directly by the
> guest:
>
> a. ALLOWED_IRR
>
> ALLOWED_IRR vector indicates the interrupt vectors which the guest
> allows the hypervisor to send. The combination of host-controlled
> REQUESTED_IRR vectors (part of VMCB) and ALLOWED_IRR is used by
> hardware to update the IRR vectors of the Guest APIC backing page.
>
> #Offset    #bits    Description
> 204h       31:0     Guest allowed vectors 0-31
> 214h       31:0     Guest allowed vectors 32-63
> ...
> 274h       31:0     Guest allowed vectors 224-255
>
> ALLOWED_IRR is meant to be used specifically for vectors that the
> hypervisor is allowed to inject, such as device interrupts. Interrupt
> vectors used exclusively by the guest itself (like IPI vectors) should
> not be allowed to be injected into the guest for security reasons.
>
> b. NMI Request
>
> #Offset    #bits    Description
> 278h       0        Set by Guest to request Virtual NMI
>
>
> LAPIC Timer Support
> -------------------
> LAPIC timer is emulated by hypervisor. So, APIC_LVTT, APIC_TMICT and
> APIC_TDCR, APIC_TMCCT APIC registers are not read/written to the guest
> APIC backing page and are communicated to the hypervisor using SVM_EXIT_MSR
> VMGEXIT.
>
> IPI Support
> -----------
> Only SELF_IPI is accelerated by Secure AVIC hardware. Other IPIs require
> writing (from the Secure AVIC driver) to the IRR vector of the target CPU
> backing page and then issuing VMGEXIT for the hypervisor to notify the
> target vCPU.
>
> Driver Implementation Open Points
> ---------------------------------
>
> The Secure AVIC driver only supports physical destination mode. If
> logical destination mode needs to be supported, then a separate x2apic
> driver would be required for supporting logical destination mode.
>
> Setting of ALLOWED_IRR vectors is done from vector.c for IOAPIC and MSI
> interrupts. ALLOWED_IRR vector is not cleared when an interrupt vector
> migrates to different CPU. Using a cleaner approach to manage and
> configure allowed vectors needs more work.
>
>
> Testing
> -------
>
> This series is based on top of commit 196145c606d0 "Merge
> tag 'clk-fixes-for-linus' of
> git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux."
>
> Host Secure AVIC support patch series is at [1].
>
> Following tests are done:
>
> 1) Boot to Prompt using initramfs and ubuntu fs.
> 2) Verified timer and IPI as part of the guest bootup.
> 3) Verified long run SCF TORTURE IPI test.
> 4) Verified FIO test with NVME passthrough.

One case that is missing is kexec.

If the first kernel set ALLOWED_IRR, but the target kernel doesn't know
anything about Secure AVIC, there is going to be a problem, I assume.

I think we need a ->setup() counterpart (->teardown() ?) to get the
configuration back to the boot state. And get it called from the kexec path.

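The ALLOWED_IRR layout quoted above (one 32-bit register every 10h bytes, starting at 204h) can be sketched in plain user-space C. This is only an illustration under stated assumptions: the macro and function names are hypothetical, the backing page is simulated with an ordinary byte buffer, and nothing here is taken from the actual driver.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SAVIC_PAGE_SIZE   4096u  /* backing page is 4K per the cover letter */
#define SAVIC_ALLOWED_IRR 0x204u /* register covering vectors 0-31 */
#define SAVIC_REG_STRIDE  0x10u  /* 204h, 214h, ..., 274h */

/* Byte offset of the ALLOWED_IRR register that covers @vector. */
static uint32_t allowed_irr_offset(unsigned int vector)
{
	assert(vector < 256);
	return SAVIC_ALLOWED_IRR + (vector / 32) * SAVIC_REG_STRIDE;
}

/* Mark @vector as injectable by the hypervisor in a simulated page. */
static void allow_vector(uint8_t *page, unsigned int vector)
{
	uint32_t off = allowed_irr_offset(vector);
	uint32_t reg;

	memcpy(&reg, page + off, sizeof(reg));
	reg |= UINT32_C(1) << (vector % 32);
	memcpy(page + off, &reg, sizeof(reg));
}

/* Return 1 if @vector is currently marked as allowed, 0 otherwise. */
static int vector_allowed(const uint8_t *page, unsigned int vector)
{
	uint32_t reg;

	memcpy(&reg, page + allowed_irr_offset(vector), sizeof(reg));
	return (reg >> (vector % 32)) & 1u;
}
```

With a zeroed `uint8_t page[SAVIC_PAGE_SIZE]`, calling `allow_vector(page, 33)` sets bit 1 of the register at 214h and leaves every other vector disallowed.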
Hi Kirill,

On 10/17/2024 1:53 PM, Kirill A. Shutemov wrote:
> On Fri, Sep 13, 2024 at 05:06:51PM +0530, Neeraj Upadhyay wrote:
>> [...]
>
> One case that is missing is kexec.
>
> If the first kernel set ALLOWED_IRR, but the target kernel doesn't know
> anything about Secure AVIC, there are going to be a problem I assume.
>
> I think we need ->setup() counterpart (->teardown() ?) to get
> configuration back to the boot state. And get it called from kexec path.
>

Agree, I haven't fully investigated the changes required to support kexec.
Yes, a teardown step might be required to disable Secure AVIC in the control
MSR and possibly reset other Secure AVIC configuration.

Thanks for pointing it out! I will update the details to note that kexec
support is missing in this series.

- Neeraj

On Fri, Oct 18, 2024 at 08:03:20AM +0530, Neeraj Upadhyay wrote:
> Hi Kirill,
>
> On 10/17/2024 1:53 PM, Kirill A. Shutemov wrote:
> > On Fri, Sep 13, 2024 at 05:06:51PM +0530, Neeraj Upadhyay wrote:
> >> [...]
> >
> > One case that is missing is kexec.
> >
> > If the first kernel set ALLOWED_IRR, but the target kernel doesn't know
> > anything about Secure AVIC, there are going to be a problem I assume.
> >
> > I think we need ->setup() counterpart (->teardown() ?) to get
> > configuration back to the boot state. And get it called from kexec path.
> >
>
> Agree, I haven't fully investigated the changes required to support kexec.
> Yes, teardown step might be required to disable Secure AVIC in control msr
> and possibly resetting other Secure AVIC configuration.
>
> Thanks for pointing it out! I will update the details with kexec support
> being missing in this series.

I think it has to be addressed before it gets merged. Otherwise we will get
a regression.

On Fri, Oct 18, 2024 at 10:54:21AM +0300, Kirill A. Shutemov wrote:
> I think it has to be addressed before it got merged. Or we will get a
> regression.

... or temporarily disable kexec when SAVIC is present.

On 10/29/2024 3:17 PM, Borislav Petkov wrote:
> On Fri, Oct 18, 2024 at 10:54:21AM +0300, Kirill A. Shutemov wrote:
>> I think it has to be addressed before it got merged. Or we will get a
>> regression.
>
> ... or temporarily disable kexec when SAVIC is present.
>

Thanks! I plan to do something like the patch below for the next version.
Verified Secure AVIC guest kexec with this.

- Neeraj

-----------------------------------------------------------------------
From 80a4901644fa8a9ed2c6f690fbba4b8a6176b215 Mon Sep 17 00:00:00 2001
From: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com>
Date: Tue, 29 Oct 2024 15:38:21 +0530
Subject: [RFC 15/14] x86/apic: Add kexec support for Secure AVIC

Add a ->teardown callback to disable Secure AVIC before rebooting into
the new kernel. This ensures that the new kernel does not access the
old APIC backing page which was allocated by the previous kernel.
This can happen if there are any APIC accesses done during guest
boot before Secure AVIC driver probe is done by the new kernel
(as Secure AVIC remained enabled in the control MSR).

Signed-off-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com>
---

This is dependent on SNP guest support patches [1]

[1] https://lore.kernel.org/lkml/cover.1722520012.git.ashish.kalra@amd.com/

 arch/x86/include/asm/apic.h         | 1 +
 arch/x86/kernel/apic/apic.c         | 3 +++
 arch/x86/kernel/apic/x2apic_savic.c | 7 +++++++
 3 files changed, 11 insertions(+)

diff --git a/arch/x86/include/asm/apic.h b/arch/x86/include/asm/apic.h
index 2d5400372470..ec332afd0277 100644
--- a/arch/x86/include/asm/apic.h
+++ b/arch/x86/include/asm/apic.h
@@ -303,6 +303,7 @@ struct apic {
 	/* Probe, setup and smpboot functions */
 	int	(*probe)(void);
 	void	(*setup)(void);
+	void	(*teardown)(void);
 	int	(*acpi_madt_oem_check)(char *oem_id, char *oem_table_id);

 	void	(*init_apic_ldr)(void);
diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index aeda74bf15e6..08156ac4ec6c 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -1163,6 +1163,9 @@ void disable_local_APIC(void)
 	if (!apic_accessible())
 		return;

+	if (apic->teardown)
+		apic->teardown();
+
 	apic_soft_disable();

 #ifdef CONFIG_X86_32
diff --git a/arch/x86/kernel/apic/x2apic_savic.c b/arch/x86/kernel/apic/x2apic_savic.c
index a3f0ddc6b5b6..bb7a28f9646a 100644
--- a/arch/x86/kernel/apic/x2apic_savic.c
+++ b/arch/x86/kernel/apic/x2apic_savic.c
@@ -391,6 +391,12 @@ static void init_backing_page(void *backing_page)
 	set_reg(backing_page, APIC_ID, apic_id);
 }

+static void x2apic_savic_teardown(void)
+{
+	/* Disable Secure AVIC */
+	native_wrmsr(MSR_AMD64_SECURE_AVIC_CONTROL, 0, 0);
+}
+
 static void x2apic_savic_setup(void)
 {
 	void *backing_page;
@@ -447,6 +453,7 @@ static struct apic apic_x2apic_savic __ro_after_init = {
 	.probe				= x2apic_savic_probe,
 	.acpi_madt_oem_check		= x2apic_savic_acpi_madt_oem_check,
 	.setup				= x2apic_savic_setup,
+	.teardown			= x2apic_savic_teardown,

 	.dest_mode_logical		= false,
--

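The optional-callback pattern the patch relies on (call ->teardown only when a driver provides one) can be modelled outside the kernel as below. This is a minimal sketch, not kernel code: `struct apic_model`, `fake_savic_control` and the `*_model` names are hypothetical stand-ins, and the MSR write is replaced by a plain variable store.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the Secure AVIC control MSR; non-zero = enabled. */
static uint64_t fake_savic_control = 1;

/* Reduced model of a driver ops table with an optional teardown hook. */
struct apic_model {
	void (*setup)(void);
	void (*teardown)(void); /* may be NULL for drivers with nothing to undo */
};

static void savic_setup_model(void)    { fake_savic_control = 1; }
static void savic_teardown_model(void) { fake_savic_control = 0; }

static const struct apic_model savic_driver = {
	.setup    = savic_setup_model,
	.teardown = savic_teardown_model,
};

/* A driver that never installs a teardown hook. */
static const struct apic_model legacy_driver = {
	.setup    = NULL,
	.teardown = NULL,
};

/* Mirrors the shape of the hunk in disable_local_APIC(): guard, then call. */
static void disable_local_apic_model(const struct apic_model *apic)
{
	assert(apic != NULL);
	if (apic->teardown)
		apic->teardown();
	/* The real function continues with the soft disable here. */
}
```

Calling `disable_local_apic_model(&legacy_driver)` leaves `fake_savic_control` untouched, while `disable_local_apic_model(&savic_driver)` clears it, which is why drivers other than Secure AVIC are unaffected by the new hook.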
On Tue, Oct 29, 2024 at 03:54:24PM +0530, Neeraj Upadhyay wrote:
> Thanks! I plan to do something like below patch for the next version.
> Verified Secure AVIC guest kexec with this.

Sure, if you're adding a ->setup anyway, then it better have a counterpart.
:-)

On Tue, Oct 29, 2024 at 03:54:24PM +0530, Neeraj Upadhyay wrote:
> diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
> index aeda74bf15e6..08156ac4ec6c 100644
> --- a/arch/x86/kernel/apic/apic.c
> +++ b/arch/x86/kernel/apic/apic.c
> @@ -1163,6 +1163,9 @@ void disable_local_APIC(void)
>  	if (!apic_accessible())
>  		return;
>
> +	if (apic->teardown)
> +		apic->teardown();
> +
>  	apic_soft_disable();
>
>  #ifdef CONFIG_X86_32

Hm. I think it will call apic->teardown() for all but the one CPU that
does kexec. I believe we need to disable SAVIC for all CPUs.

Have you tested the case when the target kernel doesn't support SAVIC and
tries to use a new interrupt vector on the boot CPU? I think it will
break.

On 10/29/2024 5:21 PM, Kirill A. Shutemov wrote:
> On Tue, Oct 29, 2024 at 03:54:24PM +0530, Neeraj Upadhyay wrote:
>> diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
>> index aeda74bf15e6..08156ac4ec6c 100644
>> --- a/arch/x86/kernel/apic/apic.c
>> +++ b/arch/x86/kernel/apic/apic.c
>> @@ -1163,6 +1163,9 @@ void disable_local_APIC(void)
>>  	if (!apic_accessible())
>>  		return;
>>
>> +	if (apic->teardown)
>> +		apic->teardown();
>> +
>>  	apic_soft_disable();
>>
>>  #ifdef CONFIG_X86_32
>
> Hm. I think it will call apic->teardown() for all but the one CPU that
> does kexec. I believe we need to disable SAVIC for all CPUs.
>

I see it being called for all CPUs.

For the CPU doing kexec, I see below backtrace, which lands into disable_local_APIC()

disable_local_APIC
native_stop_other_cpus
native_machine_shutdown
machine_shutdown
kernel_kexec

For the other CPUs, it is below:

disable_local_APIC
stop_this_cpu
__sysvec_reboot
sysvec_reboot

> Have you tested the case when the target kernel doesn't support SAVIC and
> tries to use a new interrupt vector on the boot CPU? I think it will
> break.
>

For a VM launched with VMSA feature containing Secure AVIC, the target
kernel also is required to support Secure AVIC. Otherwise, guest bootup
would fail. I will capture this information in the documentation.
So, as far as I understand, SAVIC kernel kexecing into a non-SAVIC kernel
is not a valid use case.

- Neeraj

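The two backtraces can be read as the toy model below: the CPU doing kexec stops every other CPU, each of which runs its own local-APIC disable, and then disables its own. This is a single-threaded user-space sketch under loud assumptions: all `model_*` names are hypothetical, the loop stands in for the reboot IPI, and only the call ordering visible in the backtraces is represented.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NR_CPUS 4

/* Per-CPU "Secure AVIC enabled" flag; every CPU starts enabled. */
static bool savic_on[MODEL_NR_CPUS] = { true, true, true, true };

/* Stands in for disable_local_APIC() invoking ->teardown on this CPU. */
static void model_disable_local_apic(int cpu)
{
	savic_on[cpu] = false;
}

/* Path taken by the CPUs that receive the reboot request. */
static void model_stop_this_cpu(int cpu)
{
	model_disable_local_apic(cpu);
}

/* Path taken by the CPU performing kexec. */
static void model_stop_other_cpus(int self)
{
	assert(self >= 0 && self < MODEL_NR_CPUS);

	/* Stand-in for sending the reboot IPI to every other CPU. */
	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
		if (cpu != self)
			model_stop_this_cpu(cpu);
	}

	/* The initiating CPU tears down its own APIC state last. */
	model_disable_local_apic(self);
}

/* Count CPUs that still have the feature enabled. */
static int model_cpus_with_savic(void)
{
	int n = 0;

	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		n += savic_on[cpu] ? 1 : 0;
	return n;
}
```

After `model_stop_other_cpus(2)`, `model_cpus_with_savic()` returns 0, matching the observation that teardown runs on every CPU and not only on the remote ones.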
On Tue, Oct 29, 2024 at 05:45:23PM +0530, Neeraj Upadhyay wrote:
>
>
> On 10/29/2024 5:21 PM, Kirill A. Shutemov wrote:
> > On Tue, Oct 29, 2024 at 03:54:24PM +0530, Neeraj Upadhyay wrote:
> >> diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
> >> index aeda74bf15e6..08156ac4ec6c 100644
> >> --- a/arch/x86/kernel/apic/apic.c
> >> +++ b/arch/x86/kernel/apic/apic.c
> >> @@ -1163,6 +1163,9 @@ void disable_local_APIC(void)
> >>  	if (!apic_accessible())
> >>  		return;
> >>
> >> +	if (apic->teardown)
> >> +		apic->teardown();
> >> +
> >>  	apic_soft_disable();
> >>
> >>  #ifdef CONFIG_X86_32
> >
> > Hm. I think it will call apic->teardown() for all but the one CPU that
> > does kexec. I believe we need to disable SAVIC for all CPUs.
> >
>
> I see it being called for all CPUs.
>
> For the CPU doing kexec, I see below backtrace, which lands into disable_local_APIC()
>
> disable_local_APIC
> native_stop_other_cpus
> native_machine_shutdown
> machine_shutdown
> kernel_kexec
>
> For the other CPUs, it is below:
>
> disable_local_APIC
> stop_this_cpu
> __sysvec_reboot
> sysvec_reboot

Backtraces are backwards, but, yeah, I missed the reboot path.

> > Have you tested the case when the target kernel doesn't support SAVIC and
> > tries to use a new interrupt vector on the boot CPU? I think it will
> > break.
> >
>
> For a VM launched with VMSA feature containing Secure AVIC, the target
> kernel also is required to support Secure AVIC. Otherwise, guest bootup
> would fail. I will capture this information in the documentation.
> So, as far as I understand, SAVIC kernel kexecing into a non-SAVIC kernel
> is not a valid use case.

Hm. I thought if SAVIC is not enabled by the guest, the guest would boot
without the secure feature, no?

>>> Have you tested the case when the target kernel doesn't support SAVIC and
>>> tries to use a new interrupt vector on the boot CPU? I think it will
>>> break.
>>>
>>
>> For a VM launched with VMSA feature containing Secure AVIC, the target
>> kernel also is required to support Secure AVIC. Otherwise, guest bootup
>> would fail. I will capture this information in the documentation.
>> So, as far as I understand, SAVIC kernel kexecing into a non-SAVIC kernel
>> is not a valid use case.
>
> Hm. I thought if SAVIC is not enabled by the guest the guest would boot
> without the secure feature, no?
>

Actually no. A guest VM which is launched by the VMM with Secure AVIC enabled
would have Secure AVIC reported in the sev_status MSR. Secure AVIC is part
of SNP_FEATURES_IMPL_REQ, and guest boot would terminate due to the
snp_get_unsupported_features() check in arch/x86/boot/compressed/sev.c
if Secure AVIC is not enabled (having said that, I need to update the config
rules to select CONFIG_AMD_SECURE_AVIC if CONFIG_AMD_MEM_ENCRYPT is enabled).

- Neeraj
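The boot-time check described in this reply amounts to a mask test: any feature that the hypervisor enabled, that requires guest-side implementation, and that the kernel does not implement makes the boot terminate. The sketch below shows only that logic. The `DEMO_*` names and bit positions are illustrative placeholders and are not the kernel's definitions, and `demo_unsupported_features()` is a hypothetical stand-in for the check in arch/x86/boot/compressed/sev.c.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative placeholder bits; not the real sev_status layout. */
#define DEMO_SNP_SECURE_AVIC   (UINT64_C(1) << 18)
#define DEMO_SNP_OTHER_FEATURE (UINT64_C(1) << 5)

/* Features that need explicit support in the guest kernel. */
#define DEMO_FEATURES_IMPL_REQ (DEMO_SNP_SECURE_AVIC | DEMO_SNP_OTHER_FEATURE)

static_assert((DEMO_SNP_SECURE_AVIC & DEMO_SNP_OTHER_FEATURE) == 0,
	      "placeholder feature bits must not overlap");

/*
 * Return the set of features that are enabled in @status, require guest
 * support, and are missing from @present (what this kernel implements).
 * A non-zero result means the guest must refuse to boot.
 */
static uint64_t demo_unsupported_features(uint64_t status, uint64_t present)
{
	return status & DEMO_FEATURES_IMPL_REQ & ~present;
}
```

For a guest launched with the Secure AVIC bit set in `status`, a kernel built without Secure AVIC support passes `present == 0` and gets a non-zero result, which corresponds to the terminated boot described above. The same call with the bit included in `present` returns 0.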