[regression] PCI early boot hang on certain AMD systems

Message ID 219224e6-71f5-3209-09d5-9863a0b6fd4a@amd.com (mailing list archive)
State New, archived

Commit Message

Christian König Dec. 6, 2017, 5:58 p.m. UTC
Hi Ingo,

this is a known issue with multi-socket systems and the patch in question.

The attached set of patches should fix the issue and has already been sent to
Bjorn for inclusion in the next rc.

Sorry for the noise,
Christian.

On 06.12.2017 at 17:16, Ingo Molnar wrote:
> Hi,
>
> * Bjorn Helgaas <helgaas@kernel.org> wrote:
>
>> PCI changes:
>> Christian König (4):
>>        x86/PCI: Enable a 64bit BAR on AMD Family 15h (Models 00-1f, 30-3f, 60-7f)
> In v4.15 one of my test systems broke, it hangs in early bootup, during early PCI
> setup:
>
> [    2.262005] pci 0000:00:18.1: adding root bus resource [mem 0x1027000000-0xfcffffffff 64bit pref window] <--- new resource
> [    2.270081] pci 0000:00:18.2: [1022:1602] type 00 class 0x060000
> [    2.271081] pci 0000:00:18.3: [1022:1603] type 00 class 0x060000
> [    2.272083] pci 0000:00:18.4: [1022:1604] type 00 class 0x060000
> [    2.273079] pci 0000:00:18.5: [1022:1605] type 00 class 0x060000
> [    2.274083] pci 0000:00:19.0: [1022:1600] type 00 class 0x060000
> [    2.275089] pci 0000:00:19.1: [1022:1601] type 00 class 0x060000
> [  hard hang ]
>
> I have bisected the hang to:
>
>    fa564ad96366: x86/PCI: Enable a 64bit BAR on AMD Family 15h (Models 00-1f, 30-3f, 60-7f)
>
> Reverting the commit makes the system boot again. The 'new resource' line above
> is, I believe, the new BAR added by the commit.
>
> I've attached the earlyprintk boot log of the hang, with a few printks added to
> pci_amd_enable_64bit_bar() of the relevant fields:
>
> +       printk("res->start: %016llx\n", res->start);
> +       printk("res->end:   %016llx\n", res->end);
> +       printk("base:       %08x\n", base);
> +       printk("high:       %08x\n", high);
> +       printk("limit:      %08x\n", limit);
> +       printk("slot:       %d\n", i);
>
> [    2.261090] pci 0000:00:18.1: [1022:1601] type 00 class 0x060000
> [    2.262005] pci 0000:00:18.1: adding root bus resource [mem 0x1027000000-0xfcffffffff 64bit pref window]
> [    2.264001] res->start: 0000001027000000
> [    2.265001] res->end:   000000fcffffffff
> [    2.266001] base:       10270003
> [    2.267001] high:       00000000
> [    2.268001] limit:      fd000000
> [    2.269001] slot:       1
> [    2.270081] pci 0000:00:18.2: [1022:1602] type 00 class 0x060000
> [    2.271081] pci 0000:00:18.3: [1022:1603] type 00 class 0x060000
> [    2.272083] pci 0000:00:18.4: [1022:1604] type 00 class 0x060000
> [    2.273079] pci 0000:00:18.5: [1022:1605] type 00 class 0x060000
> [    2.274083] pci 0000:00:19.0: [1022:1600] type 00 class 0x060000
> [    2.275089] pci 0000:00:19.1: [1022:1601] type 00 class 0x060000
>
> On a successful bootup the system would continue with:
>
> [    0.583060] pci 0000:00:19.2: [1022:1602] type 00 class 0x060000
> [    0.584079] pci 0000:00:19.3: [1022:1603] type 00 class 0x060000
> [    0.585084] pci 0000:00:19.4: [1022:1604] type 00 class 0x060000
> [    0.586079] pci 0000:00:19.5: [1022:1605] type 00 class 0x060000
> [    0.588039] pci 0000:00:1a.0: [1022:1600] type 00 class 0x060000
> [    0.589090] pci 0000:00:1a.1: [1022:1601] type 00 class 0x060000
> [    0.590079] pci 0000:00:1a.2: [1022:1602] type 00 class 0x060000
> [    0.591080] pci 0000:00:1a.3: [1022:1603] type 00 class 0x060000
> [    0.593006] pci 0000:00:1a.4: [1022:1604] type 00 class 0x060000
> [    0.594079] pci 0000:00:1a.5: [1022:1605] type 00 class 0x060000
> [    0.595082] pci 0000:00:1b.0: [1022:1600] type 00 class 0x060000
> [    0.596087] pci 0000:00:1b.1: [1022:1601] type 00 class 0x060000
> [    0.597083] pci 0000:00:1b.2: [1022:1602] type 00 class 0x060000
> [    0.598080] pci 0000:00:1b.3: [1022:1603] type 00 class 0x060000
> [    0.599085] pci 0000:00:1b.4: [1022:1604] type 00 class 0x060000
> [    0.600079] pci 0000:00:1b.5: [1022:1605] type 00 class 0x060000
> [    0.601124] pci 0000:03:00.0: [1000:0072] type 00 class 0x010700
> [    0.602037] pci 0000:03:00.0: reg 0x10: [io  0xe000-0xe0ff]
> [    0.603010] pci 0000:03:00.0: reg 0x14: [mem 0xdff3c000-0xdff3ffff 64bit]
> [    0.604009] pci 0000:03:00.0: reg 0x1c: [mem 0xdff40000-0xdff7ffff 64bit]
> [    0.605011] pci 0000:03:00.0: reg 0x30: [mem 0xdff80000-0xdfffffff pref]
> ...
>
> cpuinfo:
>
>   processor       : 31
>   vendor_id       : AuthenticAMD
>   cpu family      : 21
>   model           : 1
>   model name      : AMD Opteron(tm) Processor 6278
>   stepping        : 2
>   microcode       : 0x6000626
>   cpu MHz         : 1427.124
>   cache size      : 2048 KB
>   physical id     : 1
>   siblings        : 16
>   core id         : 7
>   cpu cores       : 8
>
> board:
>
>          Manufacturer: Supermicro
>          Product Name: H8DG6/H8DGi
>
> BIOS:
>
>          Vendor: American Megatrends Inc.
>          Version: 2.0b
>          Release Date: 03/01/2012
>
> I've attached the lspci -v output and a successful full bootlog as well, with
> various debugging options enabled. Let me know if you need any other info.
>
> Thanks,
>
> 	Ingo
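
The base and limit values printed in the log above are consistent with an encoding in which each register holds the window address shifted right by 8 bits, with the two low bits of the base register acting as enable flags. The sketch below reproduces those two values from res->start and res->end. The function and macro names are hypothetical, and the bit layout is an assumption inferred from the printed numbers, not something stated in this thread. The printed high value of 0 fits the same reading, since both bounds lie below 2^40.

```c
#include <stdint.h>

/* Hypothetical names; layout inferred from the logged values, not confirmed. */
#define MMIO_BASE_RE   0x1u         /* assumed: read enable bit  */
#define MMIO_BASE_WE   0x2u         /* assumed: write enable bit */
#define MMIO_ADDR_MASK 0xffffff00u  /* assumed: address field    */

/* Encode the window start into the assumed base register format. */
static uint32_t encode_base(uint64_t start)
{
	return ((uint32_t)(start >> 8) & MMIO_ADDR_MASK) |
	       MMIO_BASE_RE | MMIO_BASE_WE;
}

/* Encode the inclusive window end into the assumed limit register format. */
static uint32_t encode_limit(uint64_t end_inclusive)
{
	return (uint32_t)((end_inclusive + 1) >> 8) & MMIO_ADDR_MASK;
}
```

With res->start = 0x1027000000 and res->end = 0xfcffffffff from the log, encode_base() yields 0x10270003 and encode_limit() yields 0xfd000000, matching the printk output.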

Patch

From e5d5c9682aa02a6b9c0c6bd446d433b924441679 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Christian=20K=C3=B6nig?= <christian.koenig@amd.com>
Date: Tue, 28 Nov 2017 10:02:35 +0100
Subject: [PATCH 3/3] x86/PCI: limit the size of the 64bit BAR to 256GB
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

This avoids problems with Xen, which hides some memory resources from the
OS, and potentially also allows memory hotplug while this fixup is
enabled.

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 arch/x86/pci/fixup.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/pci/fixup.c b/arch/x86/pci/fixup.c
index c817ab85dc82..149adbc7f2a3 100644
--- a/arch/x86/pci/fixup.c
+++ b/arch/x86/pci/fixup.c
@@ -701,7 +701,7 @@  static void pci_amd_enable_64bit_bar(struct pci_dev *dev)
 	res->name = "PCI Bus 0000:00";
 	res->flags = IORESOURCE_PREFETCH | IORESOURCE_MEM |
 		IORESOURCE_MEM_64 | IORESOURCE_WINDOW;
-	res->start = 0x100000000ull;
+	res->start = 0xbd00000000ull;
 	res->end = 0xfd00000000ull - 1;
 
 	/* Just grab the free area behind system memory for this */
-- 
2.11.0
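
As a quick check of the "256GB" figure in the subject line, the window size can be computed from the start and end constants in the hunk above. This is a minimal sketch; window_size_bytes() is a hypothetical helper and not part of the patch.

```c
#include <stdint.h>

/* Size in bytes of a window given its start and inclusive end address. */
static uint64_t window_size_bytes(uint64_t start, uint64_t end_inclusive)
{
	return end_inclusive - start + 1;
}
```

With the new start of 0xbd00000000 and the unchanged end of 0xfd00000000 - 1, the window spans 0x4000000000 bytes, which is 256 GiB. With the old start of 0x100000000 it spanned 0xfc00000000 bytes, or 1008 GiB.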