
[Bug,217247] BUG: kernel NULL pointer dereference, address: 000000000000000c / speculation_ctrl_update

Message ID bug-217247-28872-KWS7YimwyM@https.bugzilla.kernel.org/
State New, archived
Series [Bug,217247] BUG: kernel NULL pointer dereference, address: 000000000000000c / speculation_ctrl_update

Commit Message

bugzilla-daemon@kernel.org March 27, 2023, 3:32 p.m. UTC
https://bugzilla.kernel.org/show_bug.cgi?id=217247

--- Comment #2 from Sean Christopherson (seanjc@google.com) ---
+tglx

On Sat, Mar 25, 2023, bugzilla-daemon@kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=217247
> 
>             Bug ID: 217247
>            Summary: BUG: kernel NULL pointer dereference, address:
>                     000000000000000c / speculation_ctrl_update
>            Product: Virtualization
>            Version: unspecified
>     Kernel Version: 6.1.20
>           Hardware: All
>                 OS: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: normal
>           Priority: P1
>          Component: kvm
>           Assignee: virtualization_kvm@kernel-bugs.osdl.org
>           Reporter: hvtaifwkbgefbaei@gmail.com
>         Regression: No
> 
> Created attachment 304023
>   --> https://bugzilla.kernel.org/attachment.cgi?id=304023&action=edit
> kernel config
> 
> This is 6.1.20 with only the ZFS 2.1.9 module added.
> I booted the kernel with acpi=off because this old Ryzen 1600X system is getting
> unreliable (with acpi=off only one CPU is online, and it had been reliable
> until this splat).
> 
> 2023-03-25T13:28:40,794781+02:00 BUG: kernel NULL pointer dereference, address: 000000000000000c
> 2023-03-25T13:28:40,794786+02:00 #PF: supervisor read access in kernel mode
> 2023-03-25T13:28:40,794788+02:00 #PF: error_code(0x0000) - not-present page
> 2023-03-25T13:28:40,794790+02:00 PGD 0 P4D 0 
> 2023-03-25T13:28:40,794793+02:00 Oops: 0000 [#1] PREEMPT SMP NOPTI
> 2023-03-25T13:28:40,794795+02:00 CPU: 0 PID: 917598 Comm: qemu-kvm Tainted: P        W  O       6.1.20+ #12
> 2023-03-25T13:28:40,794798+02:00 Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./X370 Taichi, BIOS P6.20 01/03/2020
> 2023-03-25T13:28:40,794800+02:00 RIP: 0010:do_raw_spin_lock+0x6/0xb0

This looks like amd_set_core_ssb_state() explodes when it tries to acquire
ssb_state.shared_state->lock.

Aha!  With acpi=off, I assume __apic_intr_mode_select() will return
APIC_VIRTUAL_WIRE_NO_CONFIG:

        /* Check MP table or ACPI MADT configuration */
        if (!smp_found_config) {
                disable_ioapic_support();
                if (!acpi_lapic) {
                        pr_info("APIC: ACPI MADT or MP tables are not detected\n");
                        return APIC_VIRTUAL_WIRE_NO_CONFIG;
                }
                return APIC_VIRTUAL_WIRE;
        }

That will cause native_smp_prepare_cpus() to bail early and never reach
speculative_store_bypass_ht_init(), leaving ssb_state.shared_state NULL:

        switch (apic_intr_mode) {
        case APIC_PIC:
        case APIC_VIRTUAL_WIRE_NO_CONFIG:
                disable_smp();
                return;
        case APIC_SYMMETRIC_IO_NO_ROUTING:
                disable_smp();
                /* Setup local timer */
                x86_init.timers.setup_percpu_clockev();
                return;
        case APIC_VIRTUAL_WIRE:
        case APIC_SYMMETRIC_IO:
                break;
        }

I believe the patch below will remedy your problem.  I don't see anything that
will obviously break in native_smp_prepare_cpus() by continuing on with a "bad"
APIC.  Hopefully Thomas can weigh in on whether or not it's a sane change.

---
 arch/x86/kernel/smpboot.c | 13 ++++---------
 1 file changed, 4 insertions(+), 9 deletions(-)



base-commit: b0d237087c674c43df76c1a0bc2737592f3038f4

Patch

diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 9013bb28255a..ff69f8e3c392 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -1409,22 +1409,17 @@ void __init native_smp_prepare_cpus(unsigned int max_cpus)
        case APIC_PIC:
        case APIC_VIRTUAL_WIRE_NO_CONFIG:
                disable_smp();
-               return;
+               break;
        case APIC_SYMMETRIC_IO_NO_ROUTING:
                disable_smp();
-               /* Setup local timer */
-               x86_init.timers.setup_percpu_clockev();
-               return;
+               fallthrough;
        case APIC_VIRTUAL_WIRE:
        case APIC_SYMMETRIC_IO:
+               x86_init.timers.setup_percpu_clockev();
+               smp_get_logical_apicid();
                break;
        }

-       /* Setup local timer */
-       x86_init.timers.setup_percpu_clockev();
-
-       smp_get_logical_apicid();
-
        pr_info("CPU0: ");
        print_cpu_info(&cpu_data(0));