mbox series

[RFC,00/14] AMD: Add Secure AVIC Guest Support

Message ID 20240913113705.419146-1-Neeraj.Upadhyay@amd.com (mailing list archive)
Headers show
Series AMD: Add Secure AVIC Guest Support | expand

Message

Neeraj Upadhyay Sept. 13, 2024, 11:36 a.m. UTC
Introduction
------------

Secure AVIC is a new hardware feature in the AMD64 architecture to
allow SEV-SNP guests to prevent hypervisor from generating unexpected
interrupts to a vCPU or otherwise violate architectural assumptions
around APIC behavior.

One of the significant differences from AVIC or emulated x2APIC is that
Secure AVIC uses a guest-owned and managed APIC backing page. It also
introduces additional fields in both the VMCB and the Secure AVIC backing
page to aid the guest in limiting which interrupt vectors can be injected
into the guest.

Guest APIC Backing Page
-----------------------
Each vCPU has a guest-allocated APIC backing page of size 4K, which
maintains APIC state for that vCPU. The x2APIC MSRs are mapped at
their corresposing x2APIC MMIO offset within the guest APIC backing
page. All x2APIC accesses by guest or Secure AVIC hardware operate
on this backing page. The backing page should be pinned and NPT entry
for it should be always mapped while the corresponding vCPU is running.


MSR Accesses
------------
Secure AVIC only supports x2APIC MSR accesses. xAPIC MMIO offset based
accesses are not supported.

Some of the MSR accesses such as ICR writes (with shorthand equal to
self), SELF_IPI, EOI, TPR writes are accelerated by Secure AVIC
hardware. Other MSR accesses generate a #VC exception. The #VC
exception handler reads/writes to the guest APIC backing page.
As guest APIC backing page is accessible to the guest, the Secure
AVIC driver code optimizes APIC register access by directly
reading/writing to the guest APIC backing page (instead of taking
the #VC exception route).

In addition to the architected MSRs, following new fields are added to
the guest APIC backing page which can be modified directly by the
guest:

a. ALLOWED_IRR

ALLOWED_IRR vector indicates the interrupt vectors which the guest
allows the hypervisor to send. The combination of host-controlled
REQUESTED_IRR vectors (part of VMCB) and ALLOWED_IRR is used by
hardware to update the IRR vectors of the Guest APIC backing page.

#Offset        #bits        Description
204h           31:0         Guest allowed vectors 0-31
214h           31:0         Guest allowed vectors 32-63
...
274h           31:0         Guest allowed vectors 224-255

ALLOWED_IRR is meant to be used specifically for vectors that the
hypervisor is allowed to inject, such as device interrupts.  Interrupt
vectors used exclusively by the guest itself (like IPI vectors) should
not be allowed to be injected into the guest for security reasons.

b. NMI Request
 
#Offset        #bits        Description
278h           0            Set by Guest to request Virtual NMI


LAPIC Timer Support
-------------------
LAPIC timer is emulated by hypervisor. So, APIC_LVTT, APIC_TMICT and
APIC_TDCR, APIC_TMCCT APIC registers are not read/written to the guest
APIC backing page and are communicated to the hypervisor using SVM_EXIT_MSR
VMGEXIT. 

IPI Support
-----------
Only SELF_IPI is accelerated by Secure AVIC hardware. Other IPIs require
writing (from the Secure AVIC driver) to the IRR vector of the target CPU
backing page and then issuing VMGEXIT for the hypervisor to notify the
target vCPU.

Driver Implementation Open Points
---------------------------------

The Secure AVIC driver only supports physical destination mode. If
logical destination mode need to be supported, then a separate x2apic
driver would be required for supporting logical destination mode.

Setting of ALLOWED_IRR vectors is done from vector.c for IOAPIC and MSI
interrupts. ALLOWED_IRR vector is not cleared when an interrupt vector
migrates to different CPU. Using a cleaner approach to manage and
configure allowed vectors needs more work.


Testing
-------

This series is based on top of commit 196145c606d0 "Merge
tag 'clk-fixes-for-linus' of
git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux."

Host Secure AVIC support patch series is at [1].

Following tests are done:

1) Boot to Prompt using initramfs and ubuntu fs.
2) Verified timer and IPI as part of the guest bootup.
3) Verified long run SCF TORTURE IPI test.
4) Verified FIO test with NVME passthrough.

[1] https://github.com/AMDESE/linux-kvm/tree/savic-host

Kishon Vijay Abraham I (11):
  x86/apic: Add new driver for Secure AVIC
  x86/apic: Initialize Secure AVIC APIC backing page
  x86/apic: Initialize APIC backing page for Secure AVIC
  x86/apic: Add update_vector callback for Secure AVIC
  x86/apic: Add support to send IPI for Secure AVIC
  x86/apic: Support LAPIC timer for Secure AVIC
  x86/sev: Initialize VGIF for secondary VCPUs for Secure AVIC
  x86/apic: Add support to send NMI IPI for Secure AVIC
  x86/apic: Allow NMI to be injected from hypervisor for Secure AVIC
  x86/sev: Enable NMI support for Secure AVIC
  x86/sev: Indicate SEV-SNP guest supports Secure AVIC

Neeraj Upadhyay (3):
  x86/apic: Populate .read()/.write() callbacks of Secure AVIC driver
  x86/apic: Initialize APIC ID for Secure AVIC
  x86/apic: Enable Secure AVIC in Control MSR

 arch/x86/Kconfig                    |  12 +
 arch/x86/boot/compressed/sev.c      |   3 +-
 arch/x86/coco/core.c                |   3 +
 arch/x86/coco/sev/core.c            |  91 +++++-
 arch/x86/include/asm/apic.h         |   3 +
 arch/x86/include/asm/apicdef.h      |   2 +
 arch/x86/include/asm/msr-index.h    |   9 +-
 arch/x86/include/asm/sev.h          |   6 +
 arch/x86/include/uapi/asm/svm.h     |   1 +
 arch/x86/kernel/apic/Makefile       |   1 +
 arch/x86/kernel/apic/apic.c         |   4 +
 arch/x86/kernel/apic/vector.c       |   8 +
 arch/x86/kernel/apic/x2apic_savic.c | 480 ++++++++++++++++++++++++++++
 include/linux/cc_platform.h         |   8 +
 14 files changed, 621 insertions(+), 10 deletions(-)
 create mode 100644 arch/x86/kernel/apic/x2apic_savic.c

Comments

Kirill A. Shutemov Oct. 17, 2024, 8:23 a.m. UTC | #1
On Fri, Sep 13, 2024 at 05:06:51PM +0530, Neeraj Upadhyay wrote:
> Introduction
> ------------
> 
> Secure AVIC is a new hardware feature in the AMD64 architecture to
> allow SEV-SNP guests to prevent hypervisor from generating unexpected
> interrupts to a vCPU or otherwise violate architectural assumptions
> around APIC behavior.
> 
> One of the significant differences from AVIC or emulated x2APIC is that
> Secure AVIC uses a guest-owned and managed APIC backing page. It also
> introduces additional fields in both the VMCB and the Secure AVIC backing
> page to aid the guest in limiting which interrupt vectors can be injected
> into the guest.
> 
> Guest APIC Backing Page
> -----------------------
> Each vCPU has a guest-allocated APIC backing page of size 4K, which
> maintains APIC state for that vCPU. The x2APIC MSRs are mapped at
> their corresposing x2APIC MMIO offset within the guest APIC backing
> page. All x2APIC accesses by guest or Secure AVIC hardware operate
> on this backing page. The backing page should be pinned and NPT entry
> for it should be always mapped while the corresponding vCPU is running.
> 
> 
> MSR Accesses
> ------------
> Secure AVIC only supports x2APIC MSR accesses. xAPIC MMIO offset based
> accesses are not supported.
> 
> Some of the MSR accesses such as ICR writes (with shorthand equal to
> self), SELF_IPI, EOI, TPR writes are accelerated by Secure AVIC
> hardware. Other MSR accesses generate a #VC exception. The #VC
> exception handler reads/writes to the guest APIC backing page.
> As guest APIC backing page is accessible to the guest, the Secure
> AVIC driver code optimizes APIC register access by directly
> reading/writing to the guest APIC backing page (instead of taking
> the #VC exception route).
> 
> In addition to the architected MSRs, following new fields are added to
> the guest APIC backing page which can be modified directly by the
> guest:
> 
> a. ALLOWED_IRR
> 
> ALLOWED_IRR vector indicates the interrupt vectors which the guest
> allows the hypervisor to send. The combination of host-controlled
> REQUESTED_IRR vectors (part of VMCB) and ALLOWED_IRR is used by
> hardware to update the IRR vectors of the Guest APIC backing page.
> 
> #Offset        #bits        Description
> 204h           31:0         Guest allowed vectors 0-31
> 214h           31:0         Guest allowed vectors 32-63
> ...
> 274h           31:0         Guest allowed vectors 224-255
> 
> ALLOWED_IRR is meant to be used specifically for vectors that the
> hypervisor is allowed to inject, such as device interrupts.  Interrupt
> vectors used exclusively by the guest itself (like IPI vectors) should
> not be allowed to be injected into the guest for security reasons.
> 
> b. NMI Request
>  
> #Offset        #bits        Description
> 278h           0            Set by Guest to request Virtual NMI
> 
> 
> LAPIC Timer Support
> -------------------
> LAPIC timer is emulated by hypervisor. So, APIC_LVTT, APIC_TMICT and
> APIC_TDCR, APIC_TMCCT APIC registers are not read/written to the guest
> APIC backing page and are communicated to the hypervisor using SVM_EXIT_MSR
> VMGEXIT. 
> 
> IPI Support
> -----------
> Only SELF_IPI is accelerated by Secure AVIC hardware. Other IPIs require
> writing (from the Secure AVIC driver) to the IRR vector of the target CPU
> backing page and then issuing VMGEXIT for the hypervisor to notify the
> target vCPU.
> 
> Driver Implementation Open Points
> ---------------------------------
> 
> The Secure AVIC driver only supports physical destination mode. If
> logical destination mode need to be supported, then a separate x2apic
> driver would be required for supporting logical destination mode.
> 
> Setting of ALLOWED_IRR vectors is done from vector.c for IOAPIC and MSI
> interrupts. ALLOWED_IRR vector is not cleared when an interrupt vector
> migrates to different CPU. Using a cleaner approach to manage and
> configure allowed vectors needs more work.
> 
> 
> Testing
> -------
> 
> This series is based on top of commit 196145c606d0 "Merge
> tag 'clk-fixes-for-linus' of
> git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux."
> 
> Host Secure AVIC support patch series is at [1].
> 
> Following tests are done:
> 
> 1) Boot to Prompt using initramfs and ubuntu fs.
> 2) Verified timer and IPI as part of the guest bootup.
> 3) Verified long run SCF TORTURE IPI test.
> 4) Verified FIO test with NVME passthrough.

One case that is missing is kexec.

If the first kernel set ALLOWED_IRR, but the target kernel doesn't know
anything about Secure AVIC, there are going to be a problem I assume.

I think we need ->setup() counterpart (->teardown() ?) to get
configuration back to the boot state. And get it called from kexec path.
Neeraj Upadhyay Oct. 18, 2024, 2:33 a.m. UTC | #2
Hi Kirill,

On 10/17/2024 1:53 PM, Kirill A. Shutemov wrote:
> On Fri, Sep 13, 2024 at 05:06:51PM +0530, Neeraj Upadhyay wrote:
>> Introduction
>> ------------
>>
>> Secure AVIC is a new hardware feature in the AMD64 architecture to
>> allow SEV-SNP guests to prevent hypervisor from generating unexpected
>> interrupts to a vCPU or otherwise violate architectural assumptions
>> around APIC behavior.
>>
>> One of the significant differences from AVIC or emulated x2APIC is that
>> Secure AVIC uses a guest-owned and managed APIC backing page. It also
>> introduces additional fields in both the VMCB and the Secure AVIC backing
>> page to aid the guest in limiting which interrupt vectors can be injected
>> into the guest.
>>
>> Guest APIC Backing Page
>> -----------------------
>> Each vCPU has a guest-allocated APIC backing page of size 4K, which
>> maintains APIC state for that vCPU. The x2APIC MSRs are mapped at
>> their corresposing x2APIC MMIO offset within the guest APIC backing
>> page. All x2APIC accesses by guest or Secure AVIC hardware operate
>> on this backing page. The backing page should be pinned and NPT entry
>> for it should be always mapped while the corresponding vCPU is running.
>>
>>
>> MSR Accesses
>> ------------
>> Secure AVIC only supports x2APIC MSR accesses. xAPIC MMIO offset based
>> accesses are not supported.
>>
>> Some of the MSR accesses such as ICR writes (with shorthand equal to
>> self), SELF_IPI, EOI, TPR writes are accelerated by Secure AVIC
>> hardware. Other MSR accesses generate a #VC exception. The #VC
>> exception handler reads/writes to the guest APIC backing page.
>> As guest APIC backing page is accessible to the guest, the Secure
>> AVIC driver code optimizes APIC register access by directly
>> reading/writing to the guest APIC backing page (instead of taking
>> the #VC exception route).
>>
>> In addition to the architected MSRs, following new fields are added to
>> the guest APIC backing page which can be modified directly by the
>> guest:
>>
>> a. ALLOWED_IRR
>>
>> ALLOWED_IRR vector indicates the interrupt vectors which the guest
>> allows the hypervisor to send. The combination of host-controlled
>> REQUESTED_IRR vectors (part of VMCB) and ALLOWED_IRR is used by
>> hardware to update the IRR vectors of the Guest APIC backing page.
>>
>> #Offset        #bits        Description
>> 204h           31:0         Guest allowed vectors 0-31
>> 214h           31:0         Guest allowed vectors 32-63
>> ...
>> 274h           31:0         Guest allowed vectors 224-255
>>
>> ALLOWED_IRR is meant to be used specifically for vectors that the
>> hypervisor is allowed to inject, such as device interrupts.  Interrupt
>> vectors used exclusively by the guest itself (like IPI vectors) should
>> not be allowed to be injected into the guest for security reasons.
>>
>> b. NMI Request
>>  
>> #Offset        #bits        Description
>> 278h           0            Set by Guest to request Virtual NMI
>>
>>
>> LAPIC Timer Support
>> -------------------
>> LAPIC timer is emulated by hypervisor. So, APIC_LVTT, APIC_TMICT and
>> APIC_TDCR, APIC_TMCCT APIC registers are not read/written to the guest
>> APIC backing page and are communicated to the hypervisor using SVM_EXIT_MSR
>> VMGEXIT. 
>>
>> IPI Support
>> -----------
>> Only SELF_IPI is accelerated by Secure AVIC hardware. Other IPIs require
>> writing (from the Secure AVIC driver) to the IRR vector of the target CPU
>> backing page and then issuing VMGEXIT for the hypervisor to notify the
>> target vCPU.
>>
>> Driver Implementation Open Points
>> ---------------------------------
>>
>> The Secure AVIC driver only supports physical destination mode. If
>> logical destination mode need to be supported, then a separate x2apic
>> driver would be required for supporting logical destination mode.
>>
>> Setting of ALLOWED_IRR vectors is done from vector.c for IOAPIC and MSI
>> interrupts. ALLOWED_IRR vector is not cleared when an interrupt vector
>> migrates to different CPU. Using a cleaner approach to manage and
>> configure allowed vectors needs more work.
>>
>>
>> Testing
>> -------
>>
>> This series is based on top of commit 196145c606d0 "Merge
>> tag 'clk-fixes-for-linus' of
>> git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux."
>>
>> Host Secure AVIC support patch series is at [1].
>>
>> Following tests are done:
>>
>> 1) Boot to Prompt using initramfs and ubuntu fs.
>> 2) Verified timer and IPI as part of the guest bootup.
>> 3) Verified long run SCF TORTURE IPI test.
>> 4) Verified FIO test with NVME passthrough.
> 
> One case that is missing is kexec.
> 
> If the first kernel set ALLOWED_IRR, but the target kernel doesn't know
> anything about Secure AVIC, there are going to be a problem I assume.
> 
> I think we need ->setup() counterpart (->teardown() ?) to get
> configuration back to the boot state. And get it called from kexec path.
> 

Agree, I haven't fully investigated the changes required to support kexec.
Yes, teardown step might be required to disable Secure AVIC in control msr
and possibly resetting other Secure AVIC configuration.

Thanks for pointing it out! I will update the details with kexec support
being missing in this series.


- Neeraj
Kirill A. Shutemov Oct. 18, 2024, 7:54 a.m. UTC | #3
On Fri, Oct 18, 2024 at 08:03:20AM +0530, Neeraj Upadhyay wrote:
> Hi Kirill,
> 
> On 10/17/2024 1:53 PM, Kirill A. Shutemov wrote:
> > On Fri, Sep 13, 2024 at 05:06:51PM +0530, Neeraj Upadhyay wrote:
> >> Introduction
> >> ------------
> >>
> >> Secure AVIC is a new hardware feature in the AMD64 architecture to
> >> allow SEV-SNP guests to prevent hypervisor from generating unexpected
> >> interrupts to a vCPU or otherwise violate architectural assumptions
> >> around APIC behavior.
> >>
> >> One of the significant differences from AVIC or emulated x2APIC is that
> >> Secure AVIC uses a guest-owned and managed APIC backing page. It also
> >> introduces additional fields in both the VMCB and the Secure AVIC backing
> >> page to aid the guest in limiting which interrupt vectors can be injected
> >> into the guest.
> >>
> >> Guest APIC Backing Page
> >> -----------------------
> >> Each vCPU has a guest-allocated APIC backing page of size 4K, which
> >> maintains APIC state for that vCPU. The x2APIC MSRs are mapped at
> >> their corresposing x2APIC MMIO offset within the guest APIC backing
> >> page. All x2APIC accesses by guest or Secure AVIC hardware operate
> >> on this backing page. The backing page should be pinned and NPT entry
> >> for it should be always mapped while the corresponding vCPU is running.
> >>
> >>
> >> MSR Accesses
> >> ------------
> >> Secure AVIC only supports x2APIC MSR accesses. xAPIC MMIO offset based
> >> accesses are not supported.
> >>
> >> Some of the MSR accesses such as ICR writes (with shorthand equal to
> >> self), SELF_IPI, EOI, TPR writes are accelerated by Secure AVIC
> >> hardware. Other MSR accesses generate a #VC exception. The #VC
> >> exception handler reads/writes to the guest APIC backing page.
> >> As guest APIC backing page is accessible to the guest, the Secure
> >> AVIC driver code optimizes APIC register access by directly
> >> reading/writing to the guest APIC backing page (instead of taking
> >> the #VC exception route).
> >>
> >> In addition to the architected MSRs, following new fields are added to
> >> the guest APIC backing page which can be modified directly by the
> >> guest:
> >>
> >> a. ALLOWED_IRR
> >>
> >> ALLOWED_IRR vector indicates the interrupt vectors which the guest
> >> allows the hypervisor to send. The combination of host-controlled
> >> REQUESTED_IRR vectors (part of VMCB) and ALLOWED_IRR is used by
> >> hardware to update the IRR vectors of the Guest APIC backing page.
> >>
> >> #Offset        #bits        Description
> >> 204h           31:0         Guest allowed vectors 0-31
> >> 214h           31:0         Guest allowed vectors 32-63
> >> ...
> >> 274h           31:0         Guest allowed vectors 224-255
> >>
> >> ALLOWED_IRR is meant to be used specifically for vectors that the
> >> hypervisor is allowed to inject, such as device interrupts.  Interrupt
> >> vectors used exclusively by the guest itself (like IPI vectors) should
> >> not be allowed to be injected into the guest for security reasons.
> >>
> >> b. NMI Request
> >>  
> >> #Offset        #bits        Description
> >> 278h           0            Set by Guest to request Virtual NMI
> >>
> >>
> >> LAPIC Timer Support
> >> -------------------
> >> LAPIC timer is emulated by hypervisor. So, APIC_LVTT, APIC_TMICT and
> >> APIC_TDCR, APIC_TMCCT APIC registers are not read/written to the guest
> >> APIC backing page and are communicated to the hypervisor using SVM_EXIT_MSR
> >> VMGEXIT. 
> >>
> >> IPI Support
> >> -----------
> >> Only SELF_IPI is accelerated by Secure AVIC hardware. Other IPIs require
> >> writing (from the Secure AVIC driver) to the IRR vector of the target CPU
> >> backing page and then issuing VMGEXIT for the hypervisor to notify the
> >> target vCPU.
> >>
> >> Driver Implementation Open Points
> >> ---------------------------------
> >>
> >> The Secure AVIC driver only supports physical destination mode. If
> >> logical destination mode need to be supported, then a separate x2apic
> >> driver would be required for supporting logical destination mode.
> >>
> >> Setting of ALLOWED_IRR vectors is done from vector.c for IOAPIC and MSI
> >> interrupts. ALLOWED_IRR vector is not cleared when an interrupt vector
> >> migrates to different CPU. Using a cleaner approach to manage and
> >> configure allowed vectors needs more work.
> >>
> >>
> >> Testing
> >> -------
> >>
> >> This series is based on top of commit 196145c606d0 "Merge
> >> tag 'clk-fixes-for-linus' of
> >> git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux."
> >>
> >> Host Secure AVIC support patch series is at [1].
> >>
> >> Following tests are done:
> >>
> >> 1) Boot to Prompt using initramfs and ubuntu fs.
> >> 2) Verified timer and IPI as part of the guest bootup.
> >> 3) Verified long run SCF TORTURE IPI test.
> >> 4) Verified FIO test with NVME passthrough.
> > 
> > One case that is missing is kexec.
> > 
> > If the first kernel set ALLOWED_IRR, but the target kernel doesn't know
> > anything about Secure AVIC, there are going to be a problem I assume.
> > 
> > I think we need ->setup() counterpart (->teardown() ?) to get
> > configuration back to the boot state. And get it called from kexec path.
> > 
> 
> Agree, I haven't fully investigated the changes required to support kexec.
> Yes, teardown step might be required to disable Secure AVIC in control msr
> and possibly resetting other Secure AVIC configuration.
> 
> Thanks for pointing it out! I will update the details with kexec support
> being missing in this series.

I think it has to be addressed before it got merged. Or we will get a
regression.
Borislav Petkov Oct. 29, 2024, 9:47 a.m. UTC | #4
On Fri, Oct 18, 2024 at 10:54:21AM +0300, Kirill A. Shutemov wrote:
> I think it has to be addressed before it got merged. Or we will get a
> regression.

... or temporarily disable kexec when SAVIC is present.
Neeraj Upadhyay Oct. 29, 2024, 10:24 a.m. UTC | #5
On 10/29/2024 3:17 PM, Borislav Petkov wrote:
> On Fri, Oct 18, 2024 at 10:54:21AM +0300, Kirill A. Shutemov wrote:
>> I think it has to be addressed before it got merged. Or we will get a
>> regression.
> 
> ... or temporarily disable kexec when SAVIC is present.
> 

Thanks! I plan to do something like below patch for the next version.
Verified Secure AVIC guest kexec with this.



- Neeraj

-----------------------------------------------------------------------

From 80a4901644fa8a9ed2c6f690fbba4b8a6176b215 Mon Sep 17 00:00:00 2001
From: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com>
Date: Tue, 29 Oct 2024 15:38:21 +0530
Subject: [RFC 15/14] x86/apic: Add kexec support for Secure AVIC

Add a ->teardown callback to disable Secure AVIC before
rebooting into the new kernel. This ensures that the new
kernel does not access the old APIC backing page which was
allocated by the previous kernel. This can happen if there
are any APIC accesses done during guest boot before Secure
AVIC driver probe is done by the new kernel (as Secure AVIC
remained enabled in control msr).

Signed-off-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com>
---

This is dependent on SNP guest supports patches [1]


[1] https://lore.kernel.org/lkml/cover.1722520012.git.ashish.kalra@amd.com/

 arch/x86/include/asm/apic.h         | 1 +
 arch/x86/kernel/apic/apic.c         | 3 +++
 arch/x86/kernel/apic/x2apic_savic.c | 7 +++++++
 3 files changed, 11 insertions(+)

diff --git a/arch/x86/include/asm/apic.h b/arch/x86/include/asm/apic.h
index 2d5400372470..ec332afd0277 100644
--- a/arch/x86/include/asm/apic.h
+++ b/arch/x86/include/asm/apic.h
@@ -303,6 +303,7 @@ struct apic {
        /* Probe, setup and smpboot functions */
        int     (*probe)(void);
        void    (*setup)(void);
+       void    (*teardown)(void);
        int     (*acpi_madt_oem_check)(char *oem_id, char *oem_table_id);

        void    (*init_apic_ldr)(void);
diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index aeda74bf15e6..08156ac4ec6c 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -1163,6 +1163,9 @@ void disable_local_APIC(void)
        if (!apic_accessible())
                return;

+       if (apic->teardown)
+               apic->teardown();
+
        apic_soft_disable();

 #ifdef CONFIG_X86_32
diff --git a/arch/x86/kernel/apic/x2apic_savic.c b/arch/x86/kernel/apic/x2apic_savic.c
index a3f0ddc6b5b6..bb7a28f9646a 100644
--- a/arch/x86/kernel/apic/x2apic_savic.c
+++ b/arch/x86/kernel/apic/x2apic_savic.c
@@ -391,6 +391,12 @@ static void init_backing_page(void *backing_page)
        set_reg(backing_page, APIC_ID, apic_id);
 }

+static void x2apic_savic_teardown(void)
+{
+       /* Disable Secure AVIC */
+       native_wrmsr(MSR_AMD64_SECURE_AVIC_CONTROL, 0, 0);
+}
+
 static void x2apic_savic_setup(void)
 {
        void *backing_page;
@@ -447,6 +453,7 @@ static struct apic apic_x2apic_savic __ro_after_init = {
        .probe                          = x2apic_savic_probe,
        .acpi_madt_oem_check            = x2apic_savic_acpi_madt_oem_check,
        .setup                          = x2apic_savic_setup,
+       .teardown                       = x2apic_savic_teardown,

        .dest_mode_logical              = false,

--
Borislav Petkov Oct. 29, 2024, 10:54 a.m. UTC | #6
On Tue, Oct 29, 2024 at 03:54:24PM +0530, Neeraj Upadhyay wrote:
> Thanks! I plan to do something like below patch for the next version.
> Verified Secure AVIC guest kexec with this.

Sure, if you're adding a ->setup anyway, then it better have a counterpart.

:-)
Kirill A. Shutemov Oct. 29, 2024, 11:51 a.m. UTC | #7
On Tue, Oct 29, 2024 at 03:54:24PM +0530, Neeraj Upadhyay wrote:
> diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
> index aeda74bf15e6..08156ac4ec6c 100644
> --- a/arch/x86/kernel/apic/apic.c
> +++ b/arch/x86/kernel/apic/apic.c
> @@ -1163,6 +1163,9 @@ void disable_local_APIC(void)
>         if (!apic_accessible())
>                 return;
> 
> +       if (apic->teardown)
> +               apic->teardown();
> +
>         apic_soft_disable();
> 
>  #ifdef CONFIG_X86_32

Hm. I think it will call apic->teardown() for all but the one CPU that
does kexec. I believe we need to disable SAVIC for all CPUs.

Have you tested the case when the target kernel doesn't support SAVIC and
tries to use a new interrupt vector on the boot CPU? I think it will
break.
Neeraj Upadhyay Oct. 29, 2024, 12:15 p.m. UTC | #8
On 10/29/2024 5:21 PM, Kirill A. Shutemov wrote:
> On Tue, Oct 29, 2024 at 03:54:24PM +0530, Neeraj Upadhyay wrote:
>> diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
>> index aeda74bf15e6..08156ac4ec6c 100644
>> --- a/arch/x86/kernel/apic/apic.c
>> +++ b/arch/x86/kernel/apic/apic.c
>> @@ -1163,6 +1163,9 @@ void disable_local_APIC(void)
>>         if (!apic_accessible())
>>                 return;
>>
>> +       if (apic->teardown)
>> +               apic->teardown();
>> +
>>         apic_soft_disable();
>>
>>  #ifdef CONFIG_X86_32
> 
> Hm. I think it will call apic->teardown() for all but the one CPU that
> does kexec. I believe we need to disable SAVIC for all CPUs.
> 

I see it being called for all CPUs.

For the CPU doing kexec, I see below backtrace, which lands into disable_local_APIC()

disable_local_APIC
native_stop_other_cpus
native_machine_shutdown
machine_shutdown
kernel_kexec

For the other CPUs, it is below:

disable_local_APIC
stop_this_cpu
__sysvec_reboot
sysvec_reboot


> Have you tested the case when the target kernel doesn't support SAVIC and
> tries to use a new interrupt vector on the boot CPU? I think it will
> break.
> 

For a VM launched with VMSA feature containing Secure AVIC, the target
kernel also is required to support Secure AVIC. Otherwise, guest bootup
would fail. I will capture this information in the documentation.
So, as far as I understand, SAVIC kernel kexecing into a non-SAVIC kernel
is not a valid use case.



- Neeraj
Kirill A. Shutemov Oct. 29, 2024, 2:36 p.m. UTC | #9
On Tue, Oct 29, 2024 at 05:45:23PM +0530, Neeraj Upadhyay wrote:
> 
> 
> On 10/29/2024 5:21 PM, Kirill A. Shutemov wrote:
> > On Tue, Oct 29, 2024 at 03:54:24PM +0530, Neeraj Upadhyay wrote:
> >> diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
> >> index aeda74bf15e6..08156ac4ec6c 100644
> >> --- a/arch/x86/kernel/apic/apic.c
> >> +++ b/arch/x86/kernel/apic/apic.c
> >> @@ -1163,6 +1163,9 @@ void disable_local_APIC(void)
> >>         if (!apic_accessible())
> >>                 return;
> >>
> >> +       if (apic->teardown)
> >> +               apic->teardown();
> >> +
> >>         apic_soft_disable();
> >>
> >>  #ifdef CONFIG_X86_32
> > 
> > Hm. I think it will call apic->teardown() for all but the one CPU that
> > does kexec. I believe we need to disable SAVIC for all CPUs.
> > 
> 
> I see it being called for all CPUs.
> 
> For the CPU doing kexec, I see below backtrace, which lands into disable_local_APIC()
> 
> disable_local_APIC
> native_stop_other_cpus
> native_machine_shutdown
> machine_shutdown
> kernel_kexec
> 
> For the other CPUs, it is below:
> 
> disable_local_APIC
> stop_this_cpu
> __sysvec_reboot
> sysvec_reboot

Backtraces are backwards, but, yeah, I missed reboot path.

> > Have you tested the case when the target kernel doesn't support SAVIC and
> > tries to use a new interrupt vector on the boot CPU? I think it will
> > break.
> > 
> 
> For a VM launched with VMSA feature containing Secure AVIC, the target
> kernel also is required to support Secure AVIC. Otherwise, guest bootup
> would fail. I will capture this information in the documentation.
> So, as far as I understand, SAVIC kernel kexecing into a non-SAVIC kernel
> is not a valid use case.

Hm. I thought if SAVIC is not enabled by the guest the guest would boot
without the secure feature, no?
Neeraj Upadhyay Oct. 29, 2024, 3:28 p.m. UTC | #10
>>> Have you tested the case when the target kernel doesn't support SAVIC and
>>> tries to use a new interrupt vector on the boot CPU? I think it will
>>> break.
>>>
>>
>> For a VM launched with VMSA feature containing Secure AVIC, the target
>> kernel also is required to support Secure AVIC. Otherwise, guest bootup
>> would fail. I will capture this information in the documentation.
>> So, as far as I understand, SAVIC kernel kexecing into a non-SAVIC kernel
>> is not a valid use case.
> 
> Hm. I thought if SAVIC is not enabled by the guest the guest would boot
> without the secure feature, no?
> 

Actually no. The guest VM which is launched by VMM with Secure AVIC enabled
would have SecureAVIC reported in sev_status MSR. Secure AVIC is part of
SNP_FEATURES_IMPL_REQ  and guest boot would terminate due to snp_get_unsupported_features()
check in arch/x86/boot/compressed/sev.c if secure avic is not enabled (having said that,
I need to update config rules to select CONFIG_AMD_SECURE_AVIC if CONFIG_AMD_MEM_ENCRYPT
is enabled).

- Neeraj