[v2,00/16] AMD NB and SMN rework

Message ID 20241206161210.163701-1-yazen.ghannam@amd.com (mailing list archive)

Message

Yazen Ghannam Dec. 6, 2024, 4:11 p.m. UTC
Hi all,

The theme of this set is decoupling the "AMD node" concept from the
legacy northbridge support.

Additionally, the AMD System Management Network (SMN) access code is
decoupled and expanded.

Patches 1-3 begin reducing the scope of AMD_NB.

Patches 4-9 begin moving generic AMD node support out of AMD_NB.

Patches 10-13 move SMN support out of AMD_NB and do some refactoring.

Patch 14 has HSMP reuse SMN functionality.

Patches 15-16 address userspace access to SMN.

I say "begin" above because there is more to do here. Ultimately, AMD_NB
should only be needed for code used on legacy systems with northbridges.
Also, any and all SMN users in the kernel need to be updated to use the
central SMN code. Local solutions should be avoided.

Thanks,
Yazen

Link:
https://lore.kernel.org/r/20241023172150.659002-1-yazen.ghannam@amd.com

Major changes
v1->v2:
* Rebase HSMP changes on latest upstream rework.
* Keep Node and SMN code together.


Mario Limonciello (4):
  x86/amd_nb, hwmon: (k10temp): Simplify amd_pci_dev_to_node_id()
  x86/amd_nb: Move SMN access code to a new amd_node driver
  x86/amd_node: Add SMN offsets to exclusive region access
  x86/amd_node: Add support for debugfs access to SMN registers

Yazen Ghannam (12):
  x86/mce/amd: Remove shared threshold bank plumbing
  x86/amd_nb: Restrict init function to AMD-based systems
  x86/amd_nb: Clean up early_is_amd_nb()
  x86: Start moving AMD Node functionality out of AMD_NB
  x86/amd_nb: Simplify function 4 search
  x86/amd_nb: Simplify root device search
  x86/amd_nb: Use topology info to get AMD node count
  x86/amd_nb: Simplify function 3 search
  x86/amd_node: Update __amd_smn_rw() error paths
  x86/amd_node: Remove dependency on AMD_NB
  x86/amd_node: Use defines for SMN register offsets
  x86/amd_node, platform/x86/amd/hsmp: Have HSMP use SMN through
    AMD_NODE

 MAINTAINERS                           |   8 +
 arch/x86/Kconfig                      |   6 +-
 arch/x86/include/asm/amd_nb.h         |  53 +---
 arch/x86/include/asm/amd_node.h       |  39 +++
 arch/x86/kernel/Makefile              |   1 +
 arch/x86/kernel/amd_nb.c              | 294 +--------------------
 arch/x86/kernel/amd_node.c            | 364 ++++++++++++++++++++++++++
 arch/x86/kernel/cpu/mce/amd.c         | 127 ++-------
 arch/x86/pci/fixup.c                  |   4 +-
 drivers/edac/Kconfig                  |   1 +
 drivers/edac/amd64_edac.c             |   1 +
 drivers/hwmon/Kconfig                 |   2 +-
 drivers/hwmon/k10temp.c               |   7 +-
 drivers/platform/x86/amd/hsmp/Kconfig |   2 +-
 drivers/platform/x86/amd/hsmp/acpi.c  |   7 +-
 drivers/platform/x86/amd/hsmp/hsmp.c  |   1 -
 drivers/platform/x86/amd/hsmp/hsmp.h  |   3 -
 drivers/platform/x86/amd/hsmp/plat.c  |  30 +--
 drivers/platform/x86/amd/pmc/Kconfig  |   2 +-
 drivers/platform/x86/amd/pmc/pmc.c    |   3 +-
 drivers/platform/x86/amd/pmf/Kconfig  |   2 +-
 drivers/platform/x86/amd/pmf/core.c   |   2 +-
 drivers/ras/amd/atl/Kconfig           |   1 +
 drivers/ras/amd/atl/internal.h        |   1 +
 24 files changed, 485 insertions(+), 476 deletions(-)
 create mode 100644 arch/x86/include/asm/amd_node.h
 create mode 100644 arch/x86/kernel/amd_node.c


base-commit: ae61116b291c9358e8de38bd3505e83b85be2d0d
prerequisite-patch-id: 0000000000000000000000000000000000000000

Comments

Borislav Petkov Jan. 3, 2025, 9:49 p.m. UTC | #1
On Fri, Dec 06, 2024 at 04:11:53PM +0000, Yazen Ghannam wrote:
> Hi all,
> 
> The theme of this set is decoupling the "AMD node" concept from the
> legacy northbridge support.
> 
> Additionally, AMD System Management Network (SMN) access code is
> decoupled and expanded too.
> 
> Patches 1-3 begin reducing the scope of AMD_NB.
> 
> Patches 4-9 begin moving generic AMD node support out of AMD_NB.
> 
> Patches 10-13 move SMN support out of AMD_NB and do some refactoring.
> 
> Patch 14 has HSMP reuse SMN functionality.
> 
> Patches 15-16 address userspace access to SMN.

So I took the first patch, and then booting the first 13 (with the intention to
queue them while the remaining three are still being discussed) causes the
below in my guest.

.config is attached, I've pushed the branch here too, if you wanna test with
it:

https://git.kernel.org/pub/scm/linux/kernel/git/bp/bp.git/log/?h=tip-x86-misc

[    0.897060] cirrus 0000:00:01.0: [drm] fb0: cirrusdrmfb frame buffer device
[    0.900310] BUG: kernel NULL pointer dereference, address: 00000000000000c4
[    0.902551] #PF: supervisor read access in kernel mode
[    0.904096] #PF: error_code(0x0000) - not-present page
[    0.904268] PGD 0 P4D 0 
[    0.904268] Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI
[    0.904268] CPU: 0 UID: 0 PID: 20 Comm: cpuhp/0 Not tainted 6.13.0-rc1+ #1
[    0.904268] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 2023.11-8 02/21/2024
[    0.904268] RIP: 0010:pci_read_config_dword+0x9/0x40
[    0.904268] Code: 00 00 e9 8a f9 57 00 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 <8b> 87 c4 00 00 00 48 89 d1 83 f8 03 74 10 8b 47 38 48 8b 7f 10 89
[    0.904268] RSP: 0018:ffffc9000012fcd8 EFLAGS: 00010246
[    0.904268] RAX: 0000000000000000 RBX: ffff88800d296640 RCX: 000000000000003f
[    0.904268] RDX: ffffc9000012fce4 RSI: 00000000000001c4 RDI: 0000000000000000
[    0.904268] RBP: ffffc9000012fd60 R08: 0000000000000040 R09: 0000000000000010
[    0.904268] R10: ffff88800daa1eb0 R11: fffffffffff8dc6f R12: 0000000040000163
[    0.904268] R13: ffffc9000012fd60 R14: 0000000000000000 R15: ffff88807d62fc90
[    0.904268] FS:  0000000000000000(0000) GS:ffff88807d600000(0000) knlGS:0000000000000000
[    0.904268] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    0.904268] CR2: 00000000000000c4 CR3: 0000000002c1a000 CR4: 00000000003506f0
[    0.904268] Call Trace:
[    0.904268]  <TASK>
[    0.904268]  ? __die+0x31/0x80
[    0.904268]  ? page_fault_oops+0x15d/0x4f0
[    0.904268]  ? srso_return_thunk+0x5/0x5f
[    0.904268]  ? ttwu_queue_wakelist+0xf7/0x100
[    0.904268]  ? exc_page_fault+0x78/0x150
[    0.904268]  ? asm_exc_page_fault+0x26/0x30
[    0.904268]  ? pci_read_config_dword+0x9/0x40
[    0.904268]  ? srso_return_thunk+0x5/0x5f
[    0.904268]  amd_init_l3_cache.part.0+0x6a/0x110
[    0.904268]  cpuid4_cache_lookup_regs+0xcf/0x2a0
[    0.904268]  populate_cache_leaves+0x6f/0x530
[    0.904268]  ? srso_return_thunk+0x5/0x5f
[    0.904268]  ? dl_server_stop+0x2f/0x40
[    0.904268]  ? srso_return_thunk+0x5/0x5f
[    0.904268]  detect_cache_attributes+0x97/0x330
[    0.904268]  ? __pfx_cacheinfo_cpu_online+0x10/0x10
[    0.904268]  cacheinfo_cpu_online+0x22/0x250
[    0.904268]  ? srso_return_thunk+0x5/0x5f
[    0.904268]  ? __pfx_cacheinfo_cpu_online+0x10/0x10
[    0.904268]  cpuhp_invoke_callback+0x10f/0x480
[    0.904268]  ? try_to_wake_up+0x23b/0x540
[    0.904268]  cpuhp_thread_fun+0xd4/0x160
[    0.904268]  smpboot_thread_fn+0xdd/0x1f0
[    0.904268]  ? __pfx_smpboot_thread_fn+0x10/0x10
[    0.904268]  kthread+0xca/0xf0
[    0.904268]  ? __pfx_kthread+0x10/0x10
[    0.904268]  ret_from_fork+0x50/0x60
[    0.904268]  ? __pfx_kthread+0x10/0x10
[    0.904268]  ret_from_fork_asm+0x1a/0x30
[    0.904268]  </TASK>
[    0.904268] Modules linked in:
[    0.904268] CR2: 00000000000000c4
[    0.904268] ---[ end trace 0000000000000000 ]---
[    0.904268] RIP: 0010:pci_read_config_dword+0x9/0x40
[    0.904268] Code: 00 00 e9 8a f9 57 00 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 <8b> 87 c4 00 00 00 48 89 d1 83 f8 03 74 10 8b 47 38 48 8b 7f 10 89
[    0.988792] RSP: 0018:ffffc9000012fcd8 EFLAGS: 00010246
[    0.988792] RAX: 0000000000000000 RBX: ffff88800d296640 RCX: 000000000000003f
[    0.988792] RDX: ffffc9000012fce4 RSI: 00000000000001c4 RDI: 0000000000000000
[    0.988792] RBP: ffffc9000012fd60 R08: 0000000000000040 R09: 0000000000000010
[    0.992761] R10: ffff88800daa1eb0 R11: fffffffffff8dc6f R12: 0000000040000163
[    0.992761] R13: ffffc9000012fd60 R14: 0000000000000000 R15: ffff88807d62fc90
[    0.992761] FS:  0000000000000000(0000) GS:ffff88807d600000(0000) knlGS:0000000000000000
[    0.996772] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    0.996772] CR2: 00000000000000c4 CR3: 0000000002c1a000 CR4: 00000000003506f0
[    0.996772] note: cpuhp/0[20] exited with irqs disabled
[    1.680874] tsc: Refined TSC clocksource calibration: 3700.028 MHz
[    1.683128] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x6aaae08e541, max_idle_ns: 881590514464 ns
[    1.688137] clocksource: Switched to clocksource tsc
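
A reading aid for the oops above, not part of the original report: the bytes marked with <> in the "Code:" line are the start of the faulting instruction, and `8b 87 c4 00 00 00` is a 32-bit load from `0xc4(%rdi)`. With RDI (the first function argument on x86-64) equal to 0, the load touches address 0xc4, which matches CR2. The sketch below checks that arithmetic in userspace. The helper names are hypothetical, and the decoder handles only this one instruction form.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes at the faulting RIP, copied from the "Code:" line of the oops. */
const uint8_t faulting_insn[] = { 0x8b, 0x87, 0xc4, 0x00, 0x00, 0x00 };

/*
 * Decode "mov disp32(%base), %reg32": opcode 0x8b with ModRM.mod == 2.
 * Hypothetical helper for illustration; it rejects every other encoding.
 * Returns 0 on success, -1 otherwise.
 */
int decode_mov_disp32(const uint8_t *insn, size_t len,
		      unsigned int *base_reg, uint32_t *disp)
{
	assert(insn && base_reg && disp);

	if (len < 6 || insn[0] != 0x8b)
		return -1;

	/* ModRM: mod is bits 7:6, r/m is bits 2:0; r/m == 4 would mean SIB. */
	if ((insn[1] >> 6) != 2 || (insn[1] & 7) == 4)
		return -1;

	*base_reg = insn[1] & 7;	/* 7 is %rdi when no REX prefix is present */

	/* The displacement is stored little-endian after the ModRM byte. */
	*disp = (uint32_t)insn[2] | (uint32_t)insn[3] << 8 |
		(uint32_t)insn[4] << 16 | (uint32_t)insn[5] << 24;
	return 0;
}

/* Address the load touches: base register value plus signed displacement. */
uint64_t fault_address(uint64_t base_value, uint32_t disp)
{
	return base_value + (uint64_t)(int64_t)(int32_t)disp;
}
```

Feeding `faulting_insn` to `decode_mov_disp32()` and then calling `fault_address()` with the RDI value from the register dump reproduces the reported fault address, which is the usual sign of a NULL structure pointer being dereferenced at a small field offset.
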
Yazen Ghannam Jan. 6, 2025, 3:38 p.m. UTC | #2
Can you please share the guest parameters?

Thanks,
Yazen
Yazen Ghannam Jan. 6, 2025, 4:31 p.m. UTC | #3
I was able to reproduce it. The patch below seems to fix the issue.

There's a comment in the function that this code is not for virtualized
environments. Also, the "L3 in Northbridge" design doesn't apply to Zen
systems.

I'll keep looking at this to get a better understanding. My first
thought is that this was silently handled before because the AMD_NB
code operated on PCI IDs. These wouldn't be exposed to guests, so
the northbridge data structures wouldn't be initialized.

Specifically, I think we now have a non-zero number of northbridges,
since we use the topology info rather than counting PCI devices.

In any case, I think it's better to have explicit checks.

Thanks,
Yazen

diff --git a/arch/x86/kernel/cpu/cacheinfo.c b/arch/x86/kernel/cpu/cacheinfo.c
index 392d09c936d6..93d993a6a1df 100644
--- a/arch/x86/kernel/cpu/cacheinfo.c
+++ b/arch/x86/kernel/cpu/cacheinfo.c
@@ -595,6 +595,12 @@ static void amd_init_l3_cache(struct _cpuid4_info_regs *this_leaf, int index)
 	if (index < 3)
 		return;
 
+	if (cpu_feature_enabled(X86_FEATURE_HYPERVISOR))
+		return;
+
+	if (cpu_feature_enabled(X86_FEATURE_ZEN))
+		return;
+
 	node = topology_amd_node_id(smp_processor_id());
 	this_leaf->nb = node_to_amd_nb(node);
 	if (this_leaf->nb && !this_leaf->nb->l3_cache.indices)
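
The hypothesis in this message can be modeled in a few lines of userspace C. This is a sketch under stated assumptions, not the kernel code: every `model_*` name is hypothetical, the structures are stand-ins rather than the real layouts, and the extra `nb->misc` check is an illustration that is not part of the patch above. It shows why a node count that becomes non-zero without a backing PCI device turns a harmless NULL lookup into a usable-looking entry with a NULL device pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures (not the real layouts). */
struct model_pci_dev {
	unsigned int cfg[128];
};

struct model_nb {
	struct model_pci_dev *misc;	/* stays NULL when no device was found */
};

#define MODEL_MAX_NODES 4

struct model_nb model_nbs[MODEL_MAX_NODES];
unsigned int model_nb_count;	/* 0 models "counted PCI devices in a guest" */

/* Lookup that returns NULL for nodes beyond the recorded count. */
struct model_nb *model_node_to_nb(unsigned int node)
{
	assert(model_nb_count <= MODEL_MAX_NODES);
	return node < model_nb_count ? &model_nbs[node] : NULL;
}

/*
 * Explicit checks in the spirit of the patch: bail out early for guests
 * and for Zen. The final nb->misc test is an assumption added for this
 * model only.
 */
bool model_l3_init_safe(unsigned int node, bool is_guest, bool is_zen)
{
	struct model_nb *nb;

	if (is_guest || is_zen)
		return false;

	nb = model_node_to_nb(node);
	return nb && nb->misc;
}
```

With `model_nb_count` at 0 the lookup returns NULL and callers that test the pointer skip the access. With the count at 1 and no device attached, the lookup succeeds while `misc` is still NULL, which is the state where an unguarded config-space read would fault.
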