diff mbox series

[v3] x86/acpi: fix panic while AP online later with kernel parameter maxcpus=1

Message ID 20240702005800.622910-1-zhiquan1.li@intel.com (mailing list archive)
State Handled Elsewhere, archived
Headers show
Series [v3] x86/acpi: fix panic while AP online later with kernel parameter maxcpus=1 | expand

Commit Message

Zhiquan Li July 2, 2024, 12:58 a.m. UTC
The issue was found on the platform that using "Multiprocessor Wakeup
Structure"[1] to startup secondary CPU, which is usually used by
encrypted guest.  When restrict boot time CPU to 1 with the kernel
parameter "maxcpus=1" and bring other CPUs online later, there will be
a kernel panic.

The variable acpi_mp_wake_mailbox, which holds the virtual address of
the MP Wakeup Structure mailbox, will be set as read-only after init.
If the first AP gets online later, after init, the attempt to update
the variable results in panic.

The memremap() call that initializes the variable cannot be moved into
acpi_parse_mp_wake() because memremap() is not functional at that point
in the boot process.

[1] Details about the MP Wakeup structure can be found in ACPI v6.4, in
    the "Multiprocessor Wakeup Structure" section.

Signed-off-by: Zhiquan Li <zhiquan1.li@intel.com>
Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

---

V2: https://lore.kernel.org/all/20240628082119.357735-1-zhiquan1.li@intel.com/

Changes since V2:
- Modify the commit log as suggested by Kirill.
- Add Kirill's Reviewed-by tag.

V1: https://lore.kernel.org/all/20240626073920.176471-1-zhiquan1.li@intel.com/

Changes since V1:
- Amend the commit message as per Kirill's comments.
---
 arch/x86/kernel/acpi/madt_wakeup.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)


base-commit: ec6574a634db84f25e4eee6698d76fed5649e3bd

Comments

Kai Huang July 2, 2024, 12:05 p.m. UTC | #1
On Tue, 2024-07-02 at 08:58 +0800, Zhiquan Li wrote:
> The issue was found on the platform that using "Multiprocessor Wakeup
> Structure"[1] to startup secondary CPU, which is usually used by
> encrypted guest.  When restrict boot time CPU to 1 with the kernel
> parameter "maxcpus=1" and bring other CPUs online later, there will be
> a kernel panic.
> 
> The variable acpi_mp_wake_mailbox, which holds the virtual address of
> the MP Wakeup Structure mailbox, will be set as read-only after init.
> If the first AP gets online later, after init, the attempt to update
> the variable results in panic.
> 
> The memremap() call that initializes the variable cannot be moved into
> acpi_parse_mp_wake() because memremap() is not functional at that point
> in the boot process.
> 
> [1] Details about the MP Wakeup structure can be found in ACPI v6.4, in
>     the "Multiprocessor Wakeup Structure" section.
> 
> Signed-off-by: Zhiquan Li <zhiquan1.li@intel.com>
> Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

Seems this changelog only mentions the problem, but doesn't say how to fix:

  Remove the __ro_after_init annotation of acpi_mp_wake_mailbox to fix.

[...]

> 
>  /* Virtual address of the Multiprocessor Wakeup Structure mailbox */
> -static struct acpi_madt_multiproc_wakeup_mailbox *acpi_mp_wake_mailbox __ro_after_init;
> +static struct acpi_madt_multiproc_wakeup_mailbox *acpi_mp_wake_mailbox;
>  
> 

But if memremap() isn't available in acpi_parse_mp_wake() is the concern, could
we change to do it before smp_init() but after memremap() is available?

E.g, we should be able to use early_initcall() (only compile tested):

diff --git a/arch/x86/kernel/acpi/madt_wakeup.c
b/arch/x86/kernel/acpi/madt_wakeup.c
index 6cfe762be28b..ca0aea0832ac 100644
--- a/arch/x86/kernel/acpi/madt_wakeup.c
+++ b/arch/x86/kernel/acpi/madt_wakeup.c
@@ -176,18 +176,6 @@ static int acpi_wakeup_cpu(u32 apicid, unsigned long
start_ip)
                return -EOPNOTSUPP;
        }
 
-       /*
-        * Remap mailbox memory only for the first call to acpi_wakeup_cpu().
-        *
-        * Wakeup of secondary CPUs is fully serialized in the core code.
-        * No need to protect acpi_mp_wake_mailbox from concurrent accesses.
-        */
-       if (!acpi_mp_wake_mailbox) {
-               acpi_mp_wake_mailbox = memremap(acpi_mp_wake_mailbox_paddr,
-                                               sizeof(*acpi_mp_wake_mailbox),
-                                               MEMREMAP_WB);
-       }
-
        /*
         * Mailbox memory is shared between the firmware and OS. Firmware will
         * listen on mailbox command address, and once it receives the wakeup
@@ -290,3 +278,29 @@ int __init acpi_parse_mp_wake(union acpi_subtable_headers
*header,
 
        return 0;
 }
+
+static int __init acpi_mp_wake_map_mailbox(void)
+{
+       /*
+        * Nothing to do if the "Multiprocessor Wakeup Structure" is
+        * not present.  acpi_mp_wake_mailbox_paddr could also be 0
+        * if the kernel is from kexec.
+        */
+       if (!acpi_mp_wake_mailbox_paddr)
+               return 0;
+
+       /* memremap() isn't available in acpi_parse_mp_wake() */
+       acpi_mp_wake_mailbox = memremap(acpi_mp_wake_mailbox_paddr,
+                                       sizeof(*acpi_mp_wake_mailbox),
+                                       MEMREMAP_WB);
+       /*
+        * When memremap() fails, clear acpi_mp_wake_mailbox_paddr
+        * to prevent NULL dereference of acpi_mp_wake_mailbox
+        * when bringing up secondary CPUs.
+        */
+       if (WARN_ON(!acpi_mp_wake_mailbox))
+               acpi_mp_wake_mailbox_paddr = 0;
+
+       return 0;
+}
+early_initcall(acpi_mp_wake_map_mailbox);
Borislav Petkov July 2, 2024, 12:45 p.m. UTC | #2
On Tue, Jul 02, 2024 at 12:05:38PM +0000, Huang, Kai wrote:
> On Tue, 2024-07-02 at 08:58 +0800, Zhiquan Li wrote:
> > The issue was found on the platform that using "Multiprocessor Wakeup
> > Structure"[1] to startup secondary CPU, which is usually used by
> > encrypted guest.  When restrict boot time CPU to 1 with the kernel
> > parameter "maxcpus=1" and bring other CPUs online later, there will be
> > a kernel panic.
> > 
> > The variable acpi_mp_wake_mailbox, which holds the virtual address of
> > the MP Wakeup Structure mailbox, will be set as read-only after init.
> > If the first AP gets online later, after init, the attempt to update
> > the variable results in panic.
> > 
> > The memremap() call that initializes the variable cannot be moved into
> > acpi_parse_mp_wake() because memremap() is not functional at that point
> > in the boot process.
> > 
> > [1] Details about the MP Wakeup structure can be found in ACPI v6.4, in
> >     the "Multiprocessor Wakeup Structure" section.
> > 
> > Signed-off-by: Zhiquan Li <zhiquan1.li@intel.com>
> > Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> 
> Seems this changelog only mentions the problem, but doesn't say how to fix:
> 
>   Remove the __ro_after_init annotation of acpi_mp_wake_mailbox to fix.

Do not talk about *what* the patch is doing in the commit message - that
should be obvious from the diff itself. Rather, concentrate on the *why*
it needs to be done.

Imagine one fine day you're doing git archeology, you find the place in
the code about which you want to find out why it was changed the way it 
is now.

You do git annotate <filename> ... find the line, see the commit id and
you do:

git show <commit id>

You read the commit message and there's just gibberish and nothing's
explaining *why* that change was done. And you start scratching your
head, trying to figure out why. Because the damn commit message is worth
sh*t.

This happens to us maintainers at least once a week. Well, I don't want
that to happen in my tree anymore.

So none of this text above still doesn't explain to me *why* this is
happening.

Why do APs need to update acpi_mp_wake_mailbox?

Which patch is this fixing?

See Documentation/process/submitting-patches.rst

Questions over questions...
Kai Huang July 2, 2024, 11:55 p.m. UTC | #3
On Tue, 2024-07-02 at 14:45 +0200, Borislav Petkov wrote:
> On Tue, Jul 02, 2024 at 12:05:38PM +0000, Huang, Kai wrote:
> > On Tue, 2024-07-02 at 08:58 +0800, Zhiquan Li wrote:
> > > The issue was found on the platform that using "Multiprocessor Wakeup
> > > Structure"[1] to startup secondary CPU, which is usually used by
> > > encrypted guest.  When restrict boot time CPU to 1 with the kernel
> > > parameter "maxcpus=1" and bring other CPUs online later, there will be
> > > a kernel panic.
> > > 
> > > The variable acpi_mp_wake_mailbox, which holds the virtual address of
> > > the MP Wakeup Structure mailbox, will be set as read-only after init.
> > > If the first AP gets online later, after init, the attempt to update
> > > the variable results in panic.
> > > 
> > > The memremap() call that initializes the variable cannot be moved into
> > > acpi_parse_mp_wake() because memremap() is not functional at that point
> > > in the boot process.
> > > 
> > > [1] Details about the MP Wakeup structure can be found in ACPI v6.4, in
> > >     the "Multiprocessor Wakeup Structure" section.
> > > 
> > > Signed-off-by: Zhiquan Li <zhiquan1.li@intel.com>
> > > Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> > 
> > Seems this changelog only mentions the problem, but doesn't say how to fix:
> > 
> >   Remove the __ro_after_init annotation of acpi_mp_wake_mailbox to fix.
> 
> Do not talk about *what* the patch is doing in the commit message - that
> should be obvious from the diff itself. Rather, concentrate on the *why*
> it needs to be done.
> 
> Imagine one fine day you're doing git archeology, you find the place in
> the code about which you want to find out why it was changed the way it 
> is now.
> 
> You do git annotate <filename> ... find the line, see the commit id and
> you do:
> 
> git show <commit id>
> 
> You read the commit message and there's just gibberish and nothing's
> explaining *why* that change was done. And you start scratching your
> head, trying to figure out why. Because the damn commit message is worth
> sh*t.

Yeah fully agree.  Thanks for saying this again.

> 
> This happens to us maintainers at least once a week. Well, I don't want
> that to happen in my tree anymore.
> 
> So none of this text above still doesn't explain to me *why* this is
> happening.
> 
> Why do APs need to update acpi_mp_wake_mailbox?

They don't need to if acpi_mp_wake_mailbox can be setup before smp_init()
once for all.

But currently the setup of acpi_mp_wake_mailbox is done when the first AP is
brought up because memremap() doesn't work in acpi_parse_mp_wake(), as
mentioned in the changelog of this patch.

I also feel it's not ideal to setup acpi_mp_wake_mailbox when bringing up
the first AP, so I provided my diff.  IIUC, if memremap() works for
acpi_mp_wake_mailbox when bringing up the first AP, then it should also work
in
the early_initcall().

> 
> Which patch is this fixing?

It fiexes below commit AFAICT:

  24dd05da8c79 ("x86/apic: Mark acpi_mp_wake_* variables as
__ro_after_init")

Which didn't consider 'maxvcpus=xx' case.


But I will leave to Kirill to confirm.
Zhiquan Li July 3, 2024, 2:39 a.m. UTC | #4
On 2024/7/3 07:55, Huang, Kai wrote:
>> This happens to us maintainers at least once a week. Well, I don't want
>> that to happen in my tree anymore.
>>
>> So none of this text above still doesn't explain to me *why* this is
>> happening.
>>
>> Why do APs need to update acpi_mp_wake_mailbox?

Not AP needs to update acpi_mp_wake_mailbox, but BSP might need to
update it after the init stage.  In the encrypted guest CPU hot-plug
scenario, BSP memremap() the acpi_mp_wake_mailbox_paddr, and writes APIC
ID of APs, wakeup vector and the ACPI_MP_WAKE_COMMAND_WAKEUP command
into mailbox.  Firmware will listen on mailbox command address, and once
it receives the wakeup command, the CPU associated with the given apicid
will be booted.

We cannot assume that all APs will be brought up in the init stage.

> They don't need to if acpi_mp_wake_mailbox can be setup before smp_init()
> once for all.
> 
> But currently the setup of acpi_mp_wake_mailbox is done when the first AP is
> brought up because memremap() doesn't work in acpi_parse_mp_wake(), as
> mentioned in the changelog of this patch.
> 
> I also feel it's not ideal to setup acpi_mp_wake_mailbox when bringing up
> the first AP, so I provided my diff.  IIUC, if memremap() works for
> acpi_mp_wake_mailbox when bringing up the first AP, then it should also work
> in
> the early_initcall().

Besides the factor that whether memremap() is functional at the point in
the boot process, another reason I can think of is, if the intention is
just to work with BSP, then the remapping is a redundant step.
Especially in the kexec & kdump case, the capture kernel only needs
single CPU to work usually with the "maxcpus=1" option.

IMHO, the solution that postpone the remapping while really needs to
bring up APs is reasonable, just don't make acpi_mp_wake_mailbox
read-only.  The APs might be brought up later, might be never.


> 
>> Which patch is this fixing?
> It fiexes below commit AFAICT:
> 
>   24dd05da8c79 ("x86/apic: Mark acpi_mp_wake_* variables as
> __ro_after_init")
> 
> Which didn't consider 'maxvcpus=xx' case.
> 

Thanks a lot for checking this, Kai.

> 
> But I will leave to Kirill to confirm.

Best Regards,
Zhiquan
Borislav Petkov July 4, 2024, 11:31 a.m. UTC | #5
On July 3, 2024 4:39:43 AM GMT+02:00, Zhiquan Li <zhiquan1.li@intel.com> wrote:
>
>On 2024/7/3 07:55, Huang, Kai wrote:
>>> This happens to us maintainers at least once a week. Well, I don't want
>>> that to happen in my tree anymore.
>>>
>>> So none of this text above still doesn't explain to me *why* this is
>>> happening.
>>>
>>> Why do APs need to update acpi_mp_wake_mailbox?
>
>Not AP needs to update acpi_mp_wake_mailbox, but BSP might need to
>update it after the init stage.  In the encrypted guest CPU hot-plug
>scenario, BSP memremap() the acpi_mp_wake_mailbox_paddr, and writes APIC
>ID of APs, wakeup vector and the ACPI_MP_WAKE_COMMAND_WAKEUP command
>into mailbox.  Firmware will listen on mailbox command address, and once
>it receives the wakeup command, the CPU associated with the given apicid
>will be booted.
>
>We cannot assume that all APs will be brought up in the init stage.
>
>> They don't need to if acpi_mp_wake_mailbox can be setup before smp_init()
>> once for all.
>> 
>> But currently the setup of acpi_mp_wake_mailbox is done when the first AP is
>> brought up because memremap() doesn't work in acpi_parse_mp_wake(), as
>> mentioned in the changelog of this patch.
>> 
>> I also feel it's not ideal to setup acpi_mp_wake_mailbox when bringing up
>> the first AP, so I provided my diff.  IIUC, if memremap() works for
>> acpi_mp_wake_mailbox when bringing up the first AP, then it should also work
>> in
>> the early_initcall().
>
>Besides the factor that whether memremap() is functional at the point in
>the boot process, another reason I can think of is, if the intention is
>just to work with BSP, then the remapping is a redundant step.
>Especially in the kexec & kdump case, the capture kernel only needs
>single CPU to work usually with the "maxcpus=1" option.
>
>IMHO, the solution that postpone the remapping while really needs to
>bring up APs is reasonable, just don't make acpi_mp_wake_mailbox
>read-only.  The APs might be brought up later, might be never.
>
>
>> 
>>> Which patch is this fixing?
>> It fiexes below commit AFAICT:
>> 
>>   24dd05da8c79 ("x86/apic: Mark acpi_mp_wake_* variables as
>> __ro_after_init")
>> 
>> Which didn't consider 'maxvcpus=xx' case.
>> 
>
>Thanks a lot for checking this, Kai.
>
>> 
>> But I will leave to Kirill to confirm.
>
>Best Regards,
>Zhiquan

Then please extend the commit message with the "why", add the Fixes tag and resend.

Thx.
diff mbox series

Patch

diff --git a/arch/x86/kernel/acpi/madt_wakeup.c b/arch/x86/kernel/acpi/madt_wakeup.c
index 6cfe762be28b..d5ef6215583b 100644
--- a/arch/x86/kernel/acpi/madt_wakeup.c
+++ b/arch/x86/kernel/acpi/madt_wakeup.c
@@ -19,7 +19,7 @@ 
 static u64 acpi_mp_wake_mailbox_paddr __ro_after_init;
 
 /* Virtual address of the Multiprocessor Wakeup Structure mailbox */
-static struct acpi_madt_multiproc_wakeup_mailbox *acpi_mp_wake_mailbox __ro_after_init;
+static struct acpi_madt_multiproc_wakeup_mailbox *acpi_mp_wake_mailbox;
 
 static u64 acpi_mp_pgd __ro_after_init;
 static u64 acpi_mp_reset_vector_paddr __ro_after_init;