Message ID | 20230110180243.1590045-3-helgaas@kernel.org (mailing list archive) |
---|---|
State | Accepted |
Commit | 674279b8575ec24f0c39498029684480129bb3e9 |
Series | PCI: Fix extended config space regression |
On Tuesday, January 10, 2023 7:02:43 PM CET Bjorn Helgaas wrote: > From: Bjorn Helgaas <bhelgaas@google.com> > > Normally we reject ECAM space unless it is reported as reserved in the E820 > table or via a PNP0C02 _CRS method (PCI Firmware, r3.3, sec 4.1.2). This > means PCI extended config space (offsets 0x100-0xfff) may not be accessible. > > Some firmware doesn't report ECAM space via PNP0C02 _CRS methods, but does > mention it as an EfiMemoryMappedIO region via EFI GetMemoryMap(), which is > normally converted to an E820 entry by a bootloader or EFI stub. > > 07eab0901ede ("efi/x86: Remove EfiMemoryMappedIO from E820 map"), removes > E820 entries that correspond to EfiMemoryMappedIO regions because some > other firmware uses EfiMemoryMappedIO for PCI host bridge windows, and the > E820 entries prevent Linux from allocating BAR space for hot-added devices. > > Allow use of ECAM for extended config space when the region is covered by > an EfiMemoryMappedIO region, even if it's not included in E820 or PNP0C02 > _CRS. > > Reported by Kan Liang, Tony Luck, and Giovanni Cabiddu. > > Fixes: 07eab0901ede ("efi/x86: Remove EfiMemoryMappedIO from E820 map") > Link: https://lore.kernel.org/r/ac2693d8-8ba3-72e0-5b66-b3ae008d539d@linux.intel.com > Reported-by: Kan Liang <kan.liang@linux.intel.com> > Reported-by: Tony Luck <tony.luck@intel.com> > Reported-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com> > Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Rafael J. 
Wysocki <rafael@kernel.org> > --- > arch/x86/pci/mmconfig-shared.c | 31 +++++++++++++++++++++++++++++++ > 1 file changed, 31 insertions(+) > > diff --git a/arch/x86/pci/mmconfig-shared.c b/arch/x86/pci/mmconfig-shared.c > index cd16bef5f2d9..da4b6e8e9df0 100644 > --- a/arch/x86/pci/mmconfig-shared.c > +++ b/arch/x86/pci/mmconfig-shared.c > @@ -12,6 +12,7 @@ > */ > > #include <linux/acpi.h> > +#include <linux/efi.h> > #include <linux/pci.h> > #include <linux/init.h> > #include <linux/bitmap.h> > @@ -442,6 +443,32 @@ static bool is_acpi_reserved(u64 start, u64 end, enum e820_type not_used) > return mcfg_res.flags; > } > > +static bool is_efi_mmio(u64 start, u64 end, enum e820_type not_used) > +{ > +#ifdef CONFIG_EFI > + efi_memory_desc_t *md; > + u64 size, mmio_start, mmio_end; > + > + for_each_efi_memory_desc(md) { > + if (md->type == EFI_MEMORY_MAPPED_IO) { > + size = md->num_pages << EFI_PAGE_SHIFT; > + mmio_start = md->phys_addr; > + mmio_end = mmio_start + size; > + > + /* > + * N.B. Caller supplies (start, start + size), > + * so to match, mmio_end is the first address > + * *past* the EFI_MEMORY_MAPPED_IO area. > + */ > + if (mmio_start <= start && end <= mmio_end) > + return true; > + } > + } > +#endif > + > + return false; > +} > + > typedef bool (*check_reserved_t)(u64 start, u64 end, enum e820_type type); > > static bool __ref is_mmconf_reserved(check_reserved_t is_reserved, > @@ -513,6 +540,10 @@ pci_mmcfg_check_reserved(struct device *dev, struct pci_mmcfg_region *cfg, int e > "MMCONFIG at %pR not reserved in " > "ACPI motherboard resources\n", > &cfg->res); > + > + if (is_mmconf_reserved(is_efi_mmio, cfg, dev, > + "EfiMemoryMappedIO")) > + return true; > } > > /* >
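The containment test that is_efi_mmio() performs can be sketched in userspace as follows. This is only an illustration under stated assumptions, not the kernel code: struct mmio_desc, SKETCH_PAGE_SHIFT and covered_by_mmio() are hypothetical names standing in for efi_memory_desc_t, EFI_PAGE_SHIFT and the patch's loop, and the descriptor list is passed in as a plain array rather than walked with for_each_efi_memory_desc().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 4 KiB pages, standing in for EFI_PAGE_SHIFT. */
#define SKETCH_PAGE_SHIFT 12

/* Hypothetical stand-in for the two efi_memory_desc_t fields the patch reads. */
struct mmio_desc {
	uint64_t phys_addr;
	uint64_t num_pages;
};

/*
 * Return true if [start, end) lies entirely inside a single descriptor.
 * "end" is exclusive (start + size), which is why it is compared against
 * the first address past the descriptor, as the N.B. in the patch notes.
 */
static bool covered_by_mmio(const struct mmio_desc *map, size_t n,
			    uint64_t start, uint64_t end)
{
	assert(start <= end);

	for (size_t i = 0; i < n; i++) {
		uint64_t mmio_start = map[i].phys_addr;
		uint64_t mmio_end = mmio_start +
				    (map[i].num_pages << SKETCH_PAGE_SHIFT);

		if (mmio_start <= start && end <= mmio_end)
			return true;
	}

	return false;
}
```

With a single 256 MB descriptor at 0x80000000, a request for (0x80000000, 0x90000000) matches, while a request that extends one byte further, or that starts below the descriptor, does not.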
Bjorn Helgaas wrote: > From: Bjorn Helgaas <bhelgaas@google.com> > > Normally we reject ECAM space unless it is reported as reserved in the E820 > table or via a PNP0C02 _CRS method (PCI Firmware, r3.3, sec 4.1.2). This > means PCI extended config space (offsets 0x100-0xfff) may not be accessible. > > Some firmware doesn't report ECAM space via PNP0C02 _CRS methods, but does > mention it as an EfiMemoryMappedIO region via EFI GetMemoryMap(), which is > normally converted to an E820 entry by a bootloader or EFI stub. > > 07eab0901ede ("efi/x86: Remove EfiMemoryMappedIO from E820 map"), removes > E820 entries that correspond to EfiMemoryMappedIO regions because some > other firmware uses EfiMemoryMappedIO for PCI host bridge windows, and the > E820 entries prevent Linux from allocating BAR space for hot-added devices. > > Allow use of ECAM for extended config space when the region is covered by > an EfiMemoryMappedIO region, even if it's not included in E820 or PNP0C02 > _CRS. > > Reported by Kan Liang, Tony Luck, and Giovanni Cabiddu. 
> > Fixes: 07eab0901ede ("efi/x86: Remove EfiMemoryMappedIO from E820 map") > Link: https://lore.kernel.org/r/ac2693d8-8ba3-72e0-5b66-b3ae008d539d@linux.intel.com > Reported-by: Kan Liang <kan.liang@linux.intel.com> > Reported-by: Tony Luck <tony.luck@intel.com> > Reported-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com> > Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> > --- > arch/x86/pci/mmconfig-shared.c | 31 +++++++++++++++++++++++++++++++ > 1 file changed, 31 insertions(+) > > diff --git a/arch/x86/pci/mmconfig-shared.c b/arch/x86/pci/mmconfig-shared.c > index cd16bef5f2d9..da4b6e8e9df0 100644 > --- a/arch/x86/pci/mmconfig-shared.c > +++ b/arch/x86/pci/mmconfig-shared.c > @@ -12,6 +12,7 @@ > */ > > #include <linux/acpi.h> > +#include <linux/efi.h> > #include <linux/pci.h> > #include <linux/init.h> > #include <linux/bitmap.h> > @@ -442,6 +443,32 @@ static bool is_acpi_reserved(u64 start, u64 end, enum e820_type not_used) > return mcfg_res.flags; > } > > +static bool is_efi_mmio(u64 start, u64 end, enum e820_type not_used) > +{ > +#ifdef CONFIG_EFI > + efi_memory_desc_t *md; > + u64 size, mmio_start, mmio_end; > + > + for_each_efi_memory_desc(md) { > + if (md->type == EFI_MEMORY_MAPPED_IO) { > + size = md->num_pages << EFI_PAGE_SHIFT; > + mmio_start = md->phys_addr; > + mmio_end = mmio_start + size; > + > + /* > + * N.B. Caller supplies (start, start + size), > + * so to match, mmio_end is the first address > + * *past* the EFI_MEMORY_MAPPED_IO area. 
> +			 */
> +			if (mmio_start <= start && end <= mmio_end)
> +				return true;
> +		}
> +	}
> +#endif

Perhaps the following trick (compile tested), but either way:

Reviewed-by: Dan Williams <dan.j.williams@intel.com>

diff --git a/arch/x86/pci/mmconfig-shared.c b/arch/x86/pci/mmconfig-shared.c
index da4b6e8e9df0..ae95d1b073c6 100644
--- a/arch/x86/pci/mmconfig-shared.c
+++ b/arch/x86/pci/mmconfig-shared.c
@@ -445,7 +445,6 @@ static bool is_acpi_reserved(u64 start, u64 end, enum e820_type not_used)
 
 static bool is_efi_mmio(u64 start, u64 end, enum e820_type not_used)
 {
-#ifdef CONFIG_EFI
 	efi_memory_desc_t *md;
 	u64 size, mmio_start, mmio_end;
 
@@ -464,7 +463,6 @@ static bool is_efi_mmio(u64 start, u64 end, enum e820_type not_used)
 				return true;
 		}
 	}
-#endif
 
 	return false;
 }
diff --git a/include/linux/efi.h b/include/linux/efi.h
index 4b27519143f5..3ab0c255b791 100644
--- a/include/linux/efi.h
+++ b/include/linux/efi.h
@@ -790,8 +790,12 @@ extern int efi_memattr_apply_permissions(struct mm_struct *mm,
  *
  * Once the loop finishes @md must not be accessed.
  */
+#ifdef CONFIG_EFI
 #define for_each_efi_memory_desc(md) \
 	for_each_efi_memory_desc_in_map(&efi.memmap, md)
+#else
+#define for_each_efi_memory_desc(md) for (; 0;)
+#endif
 
 /*
  * Format an EFI memory descriptor's type and attributes to a user-provided
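The effect of the "for (; 0;)" stub can be tried outside the kernel with a small sketch. Everything here is hypothetical and for illustration only: struct desc, sketch_map, SKETCH_HAVE_MAP and for_each_desc() merely mimic the shape of for_each_efi_memory_desc() and CONFIG_EFI. The point is that the caller needs no #ifdef of its own: when the map is compiled out, the loop body still has to parse and type-check, but it never executes.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical descriptor; 11 is used here as the "MMIO" type value. */
#define SKETCH_TYPE_MMIO 11

struct desc {
	int type;
};

#ifdef SKETCH_HAVE_MAP
/* Real iteration, available only when the map is compiled in. */
static struct desc sketch_map[2] = { { 1 }, { SKETCH_TYPE_MMIO } };
#define for_each_desc(md) \
	for ((md) = sketch_map; (md) < sketch_map + 2; (md)++)
#else
/* Stub: the condition is constant false, so the body is dead code. */
#define for_each_desc(md) for (; 0;)
#endif

/* The caller is identical in both configurations. */
static bool has_type(int type)
{
	struct desc *md;

	for_each_desc(md) {
		if (md->type == type)
			return true;
	}

	return false;
}

/* With the stub in effect nothing can match, so the lookup reports false. */
static void check_stub_behaviour(void)
{
	assert(!has_type(SKETCH_TYPE_MMIO));
}
```

Built without SKETCH_HAVE_MAP, has_type() always returns false, which matches what the original #ifdef CONFIG_EFI block in is_efi_mmio() achieved.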
On Tue, Jan 10, 2023 at 10:29:06AM -0800, Dan Williams wrote: > Bjorn Helgaas wrote: > > From: Bjorn Helgaas <bhelgaas@google.com> > > > > Normally we reject ECAM space unless it is reported as reserved in the E820 > > table or via a PNP0C02 _CRS method (PCI Firmware, r3.3, sec 4.1.2). This > > means PCI extended config space (offsets 0x100-0xfff) may not be accessible. > > > > Some firmware doesn't report ECAM space via PNP0C02 _CRS methods, but does > > mention it as an EfiMemoryMappedIO region via EFI GetMemoryMap(), which is > > normally converted to an E820 entry by a bootloader or EFI stub. > > > > 07eab0901ede ("efi/x86: Remove EfiMemoryMappedIO from E820 map"), removes > > E820 entries that correspond to EfiMemoryMappedIO regions because some > > other firmware uses EfiMemoryMappedIO for PCI host bridge windows, and the > > E820 entries prevent Linux from allocating BAR space for hot-added devices. > > > > Allow use of ECAM for extended config space when the region is covered by > > an EfiMemoryMappedIO region, even if it's not included in E820 or PNP0C02 > > _CRS. > > > > Reported by Kan Liang, Tony Luck, and Giovanni Cabiddu. 
> > > > Fixes: 07eab0901ede ("efi/x86: Remove EfiMemoryMappedIO from E820 map") > > Link: https://lore.kernel.org/r/ac2693d8-8ba3-72e0-5b66-b3ae008d539d@linux.intel.com > > Reported-by: Kan Liang <kan.liang@linux.intel.com> > > Reported-by: Tony Luck <tony.luck@intel.com> > > Reported-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com> > > Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> > > --- > > arch/x86/pci/mmconfig-shared.c | 31 +++++++++++++++++++++++++++++++ > > 1 file changed, 31 insertions(+) > > > > diff --git a/arch/x86/pci/mmconfig-shared.c b/arch/x86/pci/mmconfig-shared.c > > index cd16bef5f2d9..da4b6e8e9df0 100644 > > --- a/arch/x86/pci/mmconfig-shared.c > > +++ b/arch/x86/pci/mmconfig-shared.c > > @@ -12,6 +12,7 @@ > > */ > > > > #include <linux/acpi.h> > > +#include <linux/efi.h> > > #include <linux/pci.h> > > #include <linux/init.h> > > #include <linux/bitmap.h> > > @@ -442,6 +443,32 @@ static bool is_acpi_reserved(u64 start, u64 end, enum e820_type not_used) > > return mcfg_res.flags; > > } > > > > +static bool is_efi_mmio(u64 start, u64 end, enum e820_type not_used) > > +{ > > +#ifdef CONFIG_EFI > > + efi_memory_desc_t *md; > > + u64 size, mmio_start, mmio_end; > > + > > + for_each_efi_memory_desc(md) { > > + if (md->type == EFI_MEMORY_MAPPED_IO) { > > + size = md->num_pages << EFI_PAGE_SHIFT; > > + mmio_start = md->phys_addr; > > + mmio_end = mmio_start + size; > > + > > + /* > > + * N.B. Caller supplies (start, start + size), > > + * so to match, mmio_end is the first address > > + * *past* the EFI_MEMORY_MAPPED_IO area. > > + */ > > + if (mmio_start <= start && end <= mmio_end) > > + return true; > > + } > > + } > > +#endif > > Perhaps the following trick (compile tested), but either way: > > Reviewed-by: Dan Williams <dan.j.williams@intel.com> That's a great trick, and I wish I'd thought of it. 
I have some follow-on patches I'm considering for v6.3, so in the interest of streamlining the path of this one to v6.2-rc4, I think I'll wait on this until v6.3. > diff --git a/arch/x86/pci/mmconfig-shared.c b/arch/x86/pci/mmconfig-shared.c > index da4b6e8e9df0..ae95d1b073c6 100644 > --- a/arch/x86/pci/mmconfig-shared.c > +++ b/arch/x86/pci/mmconfig-shared.c > @@ -445,7 +445,6 @@ static bool is_acpi_reserved(u64 start, u64 end, enum e820_type not_used) > > static bool is_efi_mmio(u64 start, u64 end, enum e820_type not_used) > { > -#ifdef CONFIG_EFI > efi_memory_desc_t *md; > u64 size, mmio_start, mmio_end; > > @@ -464,7 +463,6 @@ static bool is_efi_mmio(u64 start, u64 end, enum e820_type not_used) > return true; > } > } > -#endif > > return false; > } > diff --git a/include/linux/efi.h b/include/linux/efi.h > index 4b27519143f5..3ab0c255b791 100644 > --- a/include/linux/efi.h > +++ b/include/linux/efi.h > @@ -790,8 +790,12 @@ extern int efi_memattr_apply_permissions(struct mm_struct *mm, > * > * Once the loop finishes @md must not be accessed. > */ > +#ifdef CONFIG_EFI > #define for_each_efi_memory_desc(md) \ > for_each_efi_memory_desc_in_map(&efi.memmap, md) > +#else > +#define for_each_efi_memory_desc(md) for (; 0;) > +#endif > > /* > * Format an EFI memory descriptor's type and attributes to a user-provided
Hello,

On Tue, Jan 10, 2023 at 12:02:43 -0600, Bjorn Helgaas wrote:

> Normally we reject ECAM space unless it is reported as reserved in the E820
> table or via a PNP0C02 _CRS method (PCI Firmware, r3.3, sec 4.1.2). This
> means PCI extended config space (offsets 0x100-0xfff) may not be accessible.
>
> Some firmware doesn't report ECAM space via PNP0C02 _CRS methods, but does
> mention it as an EfiMemoryMappedIO region via EFI GetMemoryMap(), which is
> normally converted to an E820 entry by a bootloader or EFI stub.
>
> 07eab0901ede ("efi/x86: Remove EfiMemoryMappedIO from E820 map"), removes
> E820 entries that correspond to EfiMemoryMappedIO regions because some
> other firmware uses EfiMemoryMappedIO for PCI host bridge windows, and the
> E820 entries prevent Linux from allocating BAR space for hot-added devices.
>
> Allow use of ECAM for extended config space when the region is covered by
> an EfiMemoryMappedIO region, even if it's not included in E820 or PNP0C02
> _CRS.

I'm still having a problem initializing ixgbe NICs with pristine 6.5.7 kernel.

efi: Remove mem63: MMIO range=[0x80000000-0x8fffffff] (256MB) from e820 map
[...]
[mem 0x7f800000-0xfed1bfff] available for PCI devices
[...]
PCI: MMCONFIG for domain 0000 [bus 00-ff] at [mem 0x80000000-0x8fffffff] (base 0x80000000)
[Firmware Info]: PCI: MMCONFIG at [mem 0x80000000-0x8fffffff] not reserved in ACPI motherboard resources
PCI: MMCONFIG at [mem 0x80000000-0x8fffffff] reserved as EfiMemoryMappedIO
[...]
ixgbe 0000:02:00.0: enabling device (0140 -> 0142)
ixgbe 0000:02:00.0: BAR 0: can't reserve [mem 0x80000000-0x8007ffff 64bit]
ixgbe 0000:02:00.0: pci_request_selected_regions failed 0xfffffff0
ixgbe: probe of 0000:02:00.0 failed with error -16

After disabling the code causing this (using always-false condition:
	if (size >= 256*1024 && 0) {
) in the chunk:

https://lore.kernel.org/lkml/20221208190341.1560157-2-helgaas@kernel.org/

the BAR starts at 0x90000000 (not 0x80000000):

efi: Not removing mem63: MMIO range=[0x80000000-0x8fffffff] (262144KB) from e820 map
[...]
[mem 0x90000000-0xfed1bfff] available for PCI devices
[...]
PCI: MMCONFIG for domain 0000 [bus 00-ff] at [mem 0x80000000-0x8fffffff] (base 0x80000000)
PCI: MMCONFIG at [mem 0x80000000-0x8fffffff] reserved as E820 entry

and everything seems to work again.

I've got full system bootup logs from the upstream and worked around,
but I'm not sure if this is OK to attach them (the CC list is long).
Also, this is my test machine so I can run some experiments.

best regards,
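For readers following the addresses in these logs, the standard ECAM layout gives every function 4 KB of config space and every bus 1 MB, which is why "bus 00-ff" occupies exactly 256 MB. A minimal sketch of that arithmetic follows; ecam_addr() and ecam_size() are hypothetical helper names, and the base address is simply the one printed in the log above.

```c
#include <assert.h>
#include <stdint.h>

/* Standard ECAM layout: 4 KB per function, 8 functions per device,
 * 32 devices per bus, so 1 MB per bus. */
#define ECAM_BUS_SHIFT 20
#define ECAM_DEV_SHIFT 15
#define ECAM_FN_SHIFT  12

/* Address of a config register; "base" is the ECAM address of bus 0. */
static uint64_t ecam_addr(uint64_t base, unsigned int bus, unsigned int dev,
			  unsigned int fn, unsigned int offset)
{
	assert(bus < 256 && dev < 32 && fn < 8 && offset < 4096);

	return base + ((uint64_t)bus << ECAM_BUS_SHIFT |
		       dev << ECAM_DEV_SHIFT |
		       fn << ECAM_FN_SHIFT |
		       offset);
}

/* Size in bytes of the ECAM region for an inclusive bus range. */
static uint64_t ecam_size(unsigned int start_bus, unsigned int end_bus)
{
	assert(start_bus <= end_bus && end_bus < 256);

	return (uint64_t)(end_bus - start_bus + 1) << ECAM_BUS_SHIFT;
}
```

With base 0x80000000, the config space of 0000:02:00.0 starts at 0x80200000 and its extended config space at 0x80200100. Under the same layout, a 512 KB BAR at [mem 0x80000000-0x8007ffff] would alias the config space of bus 00, devices 00 through 0f.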
On Thu, Oct 12, 2023 at 17:33:47 +0200, Tomasz Pala wrote:

> I'm still having a problem initializing ixgbe NICs with pristine 6.5.7 kernel.
>
> efi: Remove mem63: MMIO range=[0x80000000-0x8fffffff] (256MB) from e820 map
> [...]
> [mem 0x7f800000-0xfed1bfff] available for PCI devices
> [...]
> PCI: MMCONFIG for domain 0000 [bus 00-ff] at [mem 0x80000000-0x8fffffff] (base 0x80000000)
> [Firmware Info]: PCI: MMCONFIG at [mem 0x80000000-0x8fffffff] not reserved in ACPI motherboard resources
> PCI: MMCONFIG at [mem 0x80000000-0x8fffffff] reserved as EfiMemoryMappedIO
> [...]
> ixgbe 0000:02:00.0: enabling device (0140 -> 0142)
> ixgbe 0000:02:00.0: BAR 0: can't reserve [mem 0x80000000-0x8007ffff 64bit]
> ixgbe 0000:02:00.0: pci_request_selected_regions failed 0xfffffff0
> ixgbe: probe of 0000:02:00.0 failed with error -16

FWIW, as I got no response - there were other people facing the issue as well:

https://forum.proxmox.com/threads/proxmox-8-kernel-6-2-16-4-pve-ixgbe-driver-fails-to-load-due-to-pci-device-probing-failure.131203/

Apparently this might be some hardware quirk, therefore I'm not sure if
the internal EfiMemoryMappedIO reservation logic should be reviewed, or
some quirk handling to be added, or maybe some CONFIG_option introduced.

Anyone please?
On Thu, Oct 12, 2023 at 05:33:47PM +0200, Tomasz Pala wrote: > On Tue, Jan 10, 2023 at 12:02:43 -0600, Bjorn Helgaas wrote: > > Normally we reject ECAM space unless it is reported as reserved in the E820 > > table or via a PNP0C02 _CRS method (PCI Firmware, r3.3, sec 4.1.2). This > > means PCI extended config space (offsets 0x100-0xfff) may not be accessible. > > > > Some firmware doesn't report ECAM space via PNP0C02 _CRS methods, but does > > mention it as an EfiMemoryMappedIO region via EFI GetMemoryMap(), which is > > normally converted to an E820 entry by a bootloader or EFI stub. > > > > 07eab0901ede ("efi/x86: Remove EfiMemoryMappedIO from E820 map"), removes > > E820 entries that correspond to EfiMemoryMappedIO regions because some > > other firmware uses EfiMemoryMappedIO for PCI host bridge windows, and the > > E820 entries prevent Linux from allocating BAR space for hot-added devices. > > > > Allow use of ECAM for extended config space when the region is covered by > > an EfiMemoryMappedIO region, even if it's not included in E820 or PNP0C02 > > _CRS. > > I'm still having a problem initializing ixgbe NICs with pristine 6.5.7 kernel. Thanks very much for the report, and sorry for the inconvenience and my delay in looking at it. > efi: Remove mem63: MMIO range=[0x80000000-0x8fffffff] (256MB) from e820 map > [mem 0x7f800000-0xfed1bfff] available for PCI devices > PCI: MMCONFIG for domain 0000 [bus 00-ff] at [mem 0x80000000-0x8fffffff] (base 0x80000000) > [Firmware Info]: PCI: MMCONFIG at [mem 0x80000000-0x8fffffff] not reserved in ACPI motherboard resources > PCI: MMCONFIG at [mem 0x80000000-0x8fffffff] reserved as EfiMemoryMappedIO > ixgbe 0000:02:00.0: enabling device (0140 -> 0142) > ixgbe 0000:02:00.0: BAR 0: can't reserve [mem 0x80000000-0x8007ffff 64bit] > ixgbe 0000:02:00.0: pci_request_selected_regions failed 0xfffffff0 > ixgbe: probe of 0000:02:00.0 failed with error -16 Something is wrong with our allocation scheme. 
Both the MMCONFIG region and the ixgbe BAR 0 are at 0x80000000, which obviously cannot work. Maybe the full dmesg log will have a clue about why we didn't move ixgbe out of the way. > After disabling the code causing this (using always-false condition: > if (size >= 256*1024 && 0) { > ) in the chunk: > > https://lore.kernel.org/lkml/20221208190341.1560157-2-helgaas@kernel.org/ > > the BAR starts at 0x90000000 (not 0x80000000): > > efi: Not removing mem63: MMIO range=[0x80000000-0x8fffffff] (262144KB) from e820 map > [...] > [mem 0x90000000-0xfed1bfff] available for PCI devices > [...] > PCI: MMCONFIG for domain 0000 [bus 00-ff] at [mem 0x80000000-0x8fffffff] (base 0x80000000) > PCI: MMCONFIG at [mem 0x80000000-0x8fffffff] reserved as E820 entry > > and everything seems to work again. > > > I've got full system bootup logs from the upstream and worked around, > but I'm not sure if this is OK to attach them (the CC list is long). Would you mind opening a new report at https://bugzilla.kernel.org, attaching those logs, and responding here with the URL? I looked at the proxmox thread you mentioned, but sometimes people strip out parts of the log they think are irrelevant, and in this case, the stripped-out parts *are* relevant. Bjorn
On Thu, Oct 26, 2023 at 03:53:19PM -0500, Bjorn Helgaas wrote: > On Thu, Oct 12, 2023 at 05:33:47PM +0200, Tomasz Pala wrote: > > On Tue, Jan 10, 2023 at 12:02:43 -0600, Bjorn Helgaas wrote: > > > Normally we reject ECAM space unless it is reported as reserved in the E820 > > > table or via a PNP0C02 _CRS method (PCI Firmware, r3.3, sec 4.1.2). This > > > means PCI extended config space (offsets 0x100-0xfff) may not be accessible. > > > > > > Some firmware doesn't report ECAM space via PNP0C02 _CRS methods, but does > > > mention it as an EfiMemoryMappedIO region via EFI GetMemoryMap(), which is > > > normally converted to an E820 entry by a bootloader or EFI stub. > > > > > > 07eab0901ede ("efi/x86: Remove EfiMemoryMappedIO from E820 map"), removes > > > E820 entries that correspond to EfiMemoryMappedIO regions because some > > > other firmware uses EfiMemoryMappedIO for PCI host bridge windows, and the > > > E820 entries prevent Linux from allocating BAR space for hot-added devices. > > > > > > Allow use of ECAM for extended config space when the region is covered by > > > an EfiMemoryMappedIO region, even if it's not included in E820 or PNP0C02 > > > _CRS. > > > > I'm still having a problem initializing ixgbe NICs with pristine 6.5.7 kernel. > > Thanks very much for the report, and sorry for the inconvenience and > my delay in looking at it. 
> > > efi: Remove mem63: MMIO range=[0x80000000-0x8fffffff] (256MB) from e820 map > > [mem 0x7f800000-0xfed1bfff] available for PCI devices > > PCI: MMCONFIG for domain 0000 [bus 00-ff] at [mem 0x80000000-0x8fffffff] (base 0x80000000) > > [Firmware Info]: PCI: MMCONFIG at [mem 0x80000000-0x8fffffff] not reserved in ACPI motherboard resources > > PCI: MMCONFIG at [mem 0x80000000-0x8fffffff] reserved as EfiMemoryMappedIO > > ixgbe 0000:02:00.0: enabling device (0140 -> 0142) > > ixgbe 0000:02:00.0: BAR 0: can't reserve [mem 0x80000000-0x8007ffff 64bit] > > ixgbe 0000:02:00.0: pci_request_selected_regions failed 0xfffffff0 > > ixgbe: probe of 0000:02:00.0 failed with error -16 > > Something is wrong with our allocation scheme. Both the MMCONFIG > region and the ixgbe BAR 0 are at 0x80000000, which obviously cannot > work. Maybe the full dmesg log will have a clue about why we didn't > move ixgbe out of the way. > > > After disabling the code causing this (using always-false condition: > > if (size >= 256*1024 && 0) { > > ) in the chunk: > > > > https://lore.kernel.org/lkml/20221208190341.1560157-2-helgaas@kernel.org/ > > > > the BAR starts at 0x90000000 (not 0x80000000): > > > > efi: Not removing mem63: MMIO range=[0x80000000-0x8fffffff] (262144KB) from e820 map > > [...] > > [mem 0x90000000-0xfed1bfff] available for PCI devices > > [...] > > PCI: MMCONFIG for domain 0000 [bus 00-ff] at [mem 0x80000000-0x8fffffff] (base 0x80000000) > > PCI: MMCONFIG at [mem 0x80000000-0x8fffffff] reserved as E820 entry > > > > and everything seems to work again. > > > > > > I've got full system bootup logs from the upstream and worked around, > > but I'm not sure if this is OK to attach them (the CC list is long). > > Would you mind opening a new report at https://bugzilla.kernel.org, > attaching those logs, and responding here with the URL? 
Thanks for the report and the logs, which are attached at
https://bugzilla.kernel.org/show_bug.cgi?id=218050

I think the problem is that the MMCONFIG region is at
[mem 0x80000000-0x8fffffff], and that is *also* included in one of the
host bridge windows reported via _CRS:

  PCI: MMCONFIG for domain 0000 [bus 00-ff] at [mem 0x80000000-0x8fffffff] (base 0x80000000)
  pci_bus 0000:00: root bus resource [mem 0x80000000-0xfbffffff window]

I'll try to figure out how to deal with that. In the meantime, would
you mind attaching the contents of /proc/iomem to the bugzilla? I
think you have to cat it as root to get the actual values included.

Bjorn
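The conflict described here can be checked by hand with inclusive-range arithmetic. The sketch below is illustrative only: struct res, res_overlaps() and res_contains() are hypothetical userspace stand-ins (the kernel has its own struct resource helpers), and the ranges are copied from the log lines quoted in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical range type; "end" is inclusive, as in the [mem a-b] log format. */
struct res {
	uint64_t start;
	uint64_t end;
};

/* True if the two inclusive ranges share at least one address. */
static bool res_overlaps(const struct res *a, const struct res *b)
{
	assert(a->start <= a->end && b->start <= b->end);

	return a->start <= b->end && b->start <= a->end;
}

/* True if "inner" lies entirely within "outer". */
static bool res_contains(const struct res *outer, const struct res *inner)
{
	assert(outer->start <= outer->end && inner->start <= inner->end);

	return outer->start <= inner->start && inner->end <= outer->end;
}
```

Plugging in the logged values, the host bridge window [mem 0x80000000-0xfbffffff] contains the MMCONFIG region [mem 0x80000000-0x8fffffff], and ixgbe BAR 0 at [mem 0x80000000-0x8007ffff] overlaps it. A BAR of the same size placed at 0x90000000, as in the worked-around boot, would not.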
On Thu, Oct 12, 2023 at 05:33:47PM +0200, Tomasz Pala wrote: > On Tue, Jan 10, 2023 at 12:02:43 -0600, Bjorn Helgaas wrote: > > > Normally we reject ECAM space unless it is reported as reserved in the E820 > > table or via a PNP0C02 _CRS method (PCI Firmware, r3.3, sec 4.1.2). This > > means PCI extended config space (offsets 0x100-0xfff) may not be accessible. > > > > Some firmware doesn't report ECAM space via PNP0C02 _CRS methods, but does > > mention it as an EfiMemoryMappedIO region via EFI GetMemoryMap(), which is > > normally converted to an E820 entry by a bootloader or EFI stub. > > > > 07eab0901ede ("efi/x86: Remove EfiMemoryMappedIO from E820 map"), removes > > E820 entries that correspond to EfiMemoryMappedIO regions because some > > other firmware uses EfiMemoryMappedIO for PCI host bridge windows, and the > > E820 entries prevent Linux from allocating BAR space for hot-added devices. > > > > Allow use of ECAM for extended config space when the region is covered by > > an EfiMemoryMappedIO region, even if it's not included in E820 or PNP0C02 > > _CRS. > > I'm still having a problem initializing ixgbe NICs with pristine 6.5.7 kernel. > > efi: Remove mem63: MMIO range=[0x80000000-0x8fffffff] (256MB) from e820 map > [...] > [mem 0x7f800000-0xfed1bfff] available for PCI devices > [...] > PCI: MMCONFIG for domain 0000 [bus 00-ff] at [mem 0x80000000-0x8fffffff] (base 0x80000000) > [Firmware Info]: PCI: MMCONFIG at [mem 0x80000000-0x8fffffff] not reserved in ACPI motherboard resources > PCI: MMCONFIG at [mem 0x80000000-0x8fffffff] reserved as EfiMemoryMappedIO > [...] 
> ixgbe 0000:02:00.0: enabling device (0140 -> 0142) > ixgbe 0000:02:00.0: BAR 0: can't reserve [mem 0x80000000-0x8007ffff 64bit] > ixgbe 0000:02:00.0: pci_request_selected_regions failed 0xfffffff0 > ixgbe: probe of 0000:02:00.0 failed with error -16 > > > After disabling the code causing this (using always-false condition: > if (size >= 256*1024 && 0) { > ) in the chunk: > > https://lore.kernel.org/lkml/20221208190341.1560157-2-helgaas@kernel.org/ > > the BAR starts at 0x90000000 (not 0x80000000): > > efi: Not removing mem63: MMIO range=[0x80000000-0x8fffffff] (262144KB) from e820 map > [...] > [mem 0x90000000-0xfed1bfff] available for PCI devices > [...] > PCI: MMCONFIG for domain 0000 [bus 00-ff] at [mem 0x80000000-0x8fffffff] (base 0x80000000) > PCI: MMCONFIG at [mem 0x80000000-0x8fffffff] reserved as E820 entry > > and everything seems to work again. Adding to regression tracking: #regzbot ^introduced: 07eab0901ede ("efi/x86: Remove EfiMemoryMappedIO from E820 map") #regzbot link: https://bugzilla.kernel.org/show_bug.cgi?id=218107
On Wed, Nov 08, 2023 at 11:47:21AM -0600, Bjorn Helgaas wrote:
> ...
> Adding to regression tracking:
>
> #regzbot ^introduced: 07eab0901ede ("efi/x86: Remove EfiMemoryMappedIO from E820 map")
> #regzbot link: https://bugzilla.kernel.org/show_bug.cgi?id=218107

#regzbot title: PCI BAR vs MCFG/ECAM resource conflict
#regzbot link: https://bugzilla.kernel.org/show_bug.cgi?id=218050
#regzbot link: https://forum.proxmox.com/threads/proxmox-8-kernel-6-2-16-4-pve-ixgbe-driver-fails-to-load-due-to-pci-device-probing-failure.131203/
[+cc Sebastian] On Fri, Nov 03, 2023 at 02:18:58PM -0500, Bjorn Helgaas wrote: > On Thu, Oct 26, 2023 at 03:53:19PM -0500, Bjorn Helgaas wrote: > > On Thu, Oct 12, 2023 at 05:33:47PM +0200, Tomasz Pala wrote: > > > On Tue, Jan 10, 2023 at 12:02:43 -0600, Bjorn Helgaas wrote: > > > > Normally we reject ECAM space unless it is reported as > > > > reserved in the E820 table or via a PNP0C02 _CRS method (PCI > > > > Firmware, r3.3, sec 4.1.2). This means PCI extended config > > > > space (offsets 0x100-0xfff) may not be accessible. > > > > > > > > Some firmware doesn't report ECAM space via PNP0C02 _CRS > > > > methods, but does mention it as an EfiMemoryMappedIO region > > > > via EFI GetMemoryMap(), which is normally converted to an E820 > > > > entry by a bootloader or EFI stub. > > > > > > > > 07eab0901ede ("efi/x86: Remove EfiMemoryMappedIO from E820 > > > > map"), removes E820 entries that correspond to > > > > EfiMemoryMappedIO regions because some other firmware uses > > > > EfiMemoryMappedIO for PCI host bridge windows, and the E820 > > > > entries prevent Linux from allocating BAR space for hot-added > > > > devices. > > > > > > > > Allow use of ECAM for extended config space when the region is > > > > covered by an EfiMemoryMappedIO region, even if it's not > > > > included in E820 or PNP0C02 _CRS. > > > > > > I'm still having a problem initializing ixgbe NICs with pristine > > > 6.5.7 kernel. > > > > Thanks very much for the report, and sorry for the inconvenience and > > my delay in looking at it. 
> > > > > efi: Remove mem63: MMIO range=[0x80000000-0x8fffffff] (256MB) from e820 map > > > [mem 0x7f800000-0xfed1bfff] available for PCI devices > > > PCI: MMCONFIG for domain 0000 [bus 00-ff] at [mem 0x80000000-0x8fffffff] (base 0x80000000) > > > [Firmware Info]: PCI: MMCONFIG at [mem 0x80000000-0x8fffffff] not reserved in ACPI motherboard resources > > > PCI: MMCONFIG at [mem 0x80000000-0x8fffffff] reserved as EfiMemoryMappedIO > > > ixgbe 0000:02:00.0: enabling device (0140 -> 0142) > > > ixgbe 0000:02:00.0: BAR 0: can't reserve [mem 0x80000000-0x8007ffff 64bit] > > > ixgbe 0000:02:00.0: pci_request_selected_regions failed 0xfffffff0 > > > ixgbe: probe of 0000:02:00.0 failed with error -16 > > > > Something is wrong with our allocation scheme. Both the MMCONFIG > > region and the ixgbe BAR 0 are at 0x80000000, which obviously cannot > > work. Maybe the full dmesg log will have a clue about why we didn't > > move ixgbe out of the way. > > > > > After disabling the code causing this (using always-false condition: > > > if (size >= 256*1024 && 0) { > > > ) in the chunk: > > > > > > https://lore.kernel.org/lkml/20221208190341.1560157-2-helgaas@kernel.org/ > > > > > > the BAR starts at 0x90000000 (not 0x80000000): > > > > > > efi: Not removing mem63: MMIO range=[0x80000000-0x8fffffff] (262144KB) from e820 map > > > [...] > > > [mem 0x90000000-0xfed1bfff] available for PCI devices > > > [...] > > > PCI: MMCONFIG for domain 0000 [bus 00-ff] at [mem 0x80000000-0x8fffffff] (base 0x80000000) > > > PCI: MMCONFIG at [mem 0x80000000-0x8fffffff] reserved as E820 entry > > > > > > and everything seems to work again. > > > > > > > > > I've got full system bootup logs from the upstream and worked around, > > > but I'm not sure if this is OK to attach them (the CC list is long). > > > > Would you mind opening a new report at https://bugzilla.kernel.org, > > attaching those logs, and responding here with the URL? 
> > Thanks for the report and the logs, which are attached at > https://bugzilla.kernel.org/show_bug.cgi?id=218050 > > I think the problem is that the MMCONFIG region is at > [mem 0x80000000-0x8fffffff], and that is *also* included in one of the > host bridge windows reported via _CRS: > > PCI: MMCONFIG for domain 0000 [bus 00-ff] at [mem 0x80000000-0x8fffffff] (base 0x80000000) > pci_bus 0000:00: root bus resource [mem 0x80000000-0xfbffffff window] > > I'll try to figure out how to deal with that. In the meantime, would > you mind attaching the contents of /proc/iomem to the bugzilla? I > think you have to cat it as root to get the actual values included. I attached a debug patch to both bugzilla entries. If you could attach the "acpidump" output and (if practical) boot a kernel with the debug patch and attach the dmesg logs, that would be great. Bjorn
On Thu, Nov 09, 2023 at 12:44:05 -0600, Bjorn Helgaas wrote:

>> https://bugzilla.kernel.org/show_bug.cgi?id=218050
>>
>> I think the problem is that the MMCONFIG region is at
>> [mem 0x80000000-0x8fffffff], and that is *also* included in one of the
>> host bridge windows reported via _CRS:
>>
>>   PCI: MMCONFIG for domain 0000 [bus 00-ff] at [mem 0x80000000-0x8fffffff] (base 0x80000000)
>>   pci_bus 0000:00: root bus resource [mem 0x80000000-0xfbffffff window]
>>
>> I'll try to figure out how to deal with that. In the meantime, would
>> you mind attaching the contents of /proc/iomem to the bugzilla? I
>
> I attached a debug patch to both bugzilla entries. If you could
> attach the "acpidump" output and (if practical) boot a kernel with the
> debug patch and attach the dmesg logs, that would be great.

I've posted the files. There are signs of buggy BIOS, but I don't expect
any firmware update to be released for this hw anymore.

DMI: Supermicro X9DRi-LN4+/X9DR3-LN4+/X9DRi-LN4+/X9DR3-LN4+, BIOS 3.4 11/20/2019

.text .data .bss are not marked as E820_TYPE_RAM!
tboot: non-0 tboot_addr but it is not of type E820_TYPE_RESERVED
DMAR: [Firmware Bug]: No firmware reserved region can cover this RMRR [0x00000000df243000-0x00000000df251fff], contact BIOS vendor for fixes
DMAR: [Firmware Bug]: Your BIOS is broken; bad RMRR [0x00000000df243000-0x00000000df251fff]

BTW is there a reason for this logging discrepancy?

efi: Remove mem173: MMIO range=[0xe0000000-0xefffffff] (256MB) from e820 map
efi: Not removing mem71: MMIO range=[0xe0000000-0xefffffff] (262144KB) from e820 map

efi: Not removing mem74: MMIO range=[0xff000000-0xffffffff] (16384KB) from e820 map
efi: Remove mem176: MMIO range=[0xff000000-0xffffffff] (16MB) from e820 map

This is arch/x86/platform/efi/efi.c:
static void __init efi_remove_e820_mmio(void)

Remove mem%02u: MMIO range=[0x%08llx-0x%08llx] (%lluMB) ... size >> 20
Not removing mem%02u: MMIO range=[0x%08llx-0x%08llx] (%lluKB) ... size >> 10
On Sat, Nov 18, 2023 at 03:21:43PM +0100, Tomasz Pala wrote:
> On Thu, Nov 09, 2023 at 12:44:05 -0600, Bjorn Helgaas wrote:
>
> >> https://bugzilla.kernel.org/show_bug.cgi?id=218050
> >>
> >> I think the problem is that the MMCONFIG region is at
> >> [mem 0x80000000-0x8fffffff], and that is *also* included in one of the
> >> host bridge windows reported via _CRS:
> >>
> >>   PCI: MMCONFIG for domain 0000 [bus 00-ff] at [mem 0x80000000-0x8fffffff] (base 0x80000000)
> >>   pci_bus 0000:00: root bus resource [mem 0x80000000-0xfbffffff window]
> >>
> >> I'll try to figure out how to deal with that. In the meantime, would
> >> you mind attaching the contents of /proc/iomem to the bugzilla? I
> >
> > I attached a debug patch to both bugzilla entries. If you could
> > attach the "acpidump" output and (if practical) boot a kernel with the
> > debug patch and attach the dmesg logs, that would be great.
>
> I've posted the files. There are signs of buggy BIOS, but I don't expect
> any firmware update to be released for this hw anymore.

Thank you! A BIOS update is almost never the answer because even if
an update exists, we have to assume that most users in the field will
never install the update.

I want to look at the BIOS info in case we can learn about something
*Linux* is doing wrong. This most likely works fine with Windows, so
I assume Linux is doing something wrong or at least differently than
Windows.

> DMI: Supermicro X9DRi-LN4+/X9DR3-LN4+/X9DRi-LN4+/X9DR3-LN4+, BIOS 3.4 11/20/2019
>
> .text .data .bss are not marked as E820_TYPE_RAM!

Added by 4eea6aa581ab ("x86, mm: if kernel .text .data .bss are not
marked as E820_RAM, complain and fix"). No idea. A shame we didn't
include the .text/.data values in the message.

> tboot: non-0 tboot_addr but it is not of type E820_TYPE_RESERVED

Added by 316253406959 ("x86, intel_txt: Intel TXT boot support"). No
idea about this either.

> DMAR: [Firmware Bug]: No firmware reserved region can cover this RMRR [0x00000000df243000-0x00000000df251fff], contact BIOS vendor for fixes
> DMAR: [Firmware Bug]: Your BIOS is broken; bad RMRR [0x00000000df243000-0x00000000df251fff]

Both related to arch_rmrr_sanity_check(), added by f036c7fa0ab6
("iommu/vt-d: Check VT-d RMRR region in BIOS is reported as reserved")
and f5a68bb0752e ("iommu/vt-d: Mark firmware tainted if RMRR fails
sanity check"). No idea about this one either.

The VT-d spec (r1.3, sec 8.4) says "BIOS must report the RMRR reported
memory addresses as reserved in the system memory map returned through
methods such as INT15, EFI GetMemoryMap etc."

arch_rmrr_sanity_check() only looks at your e820 map, which only has
this:

  BIOS-e820: [mem 0x0000000000000000-0x000000000009ffff] usable
  BIOS-e820: [mem 0x0000000000100000-0x00000000d1f36fff] usable

I think Linux basically converts the info from EFI GetMemoryMap
to an e820 format; I think booting with "efi=debug" would show more
details of this.

Anyway, this is all a tangent.

> BTW is there a reason for this logging discrepancy?
>
> efi: Remove mem173: MMIO range=[0xe0000000-0xefffffff] (256MB) from e820 map
> efi: Not removing mem71: MMIO range=[0xe0000000-0xefffffff] (262144KB) from e820 map
>
> efi: Not removing mem74: MMIO range=[0xff000000-0xffffffff] (16384KB) from e820 map
> efi: Remove mem176: MMIO range=[0xff000000-0xffffffff] (16MB) from e820 map
>
> This is arch/x86/platform/efi/efi.c:
> static void __init efi_remove_e820_mmio(void)
>
> Remove mem%02u: MMIO range=[0x%08llx-0x%08llx] (%lluMB) ... size >> 20
> Not removing mem%02u: MMIO range=[0x%08llx-0x%08llx] (%lluKB) ... size >> 10

You mean the MB vs KB difference? That's my fault. I guess I used KB
for the "Not removing" message because those are smaller (< 256KB) so
the size in MB wouldn't be useful there.

We could use KB for both, but I guess I used MB for the "Remove" case
because it's a little easier to read and I expected "Not removing" to
be a relatively unusual case.

Bjorn
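For readers following the MB vs KB point, the arithmetic behind the two message formats can be sketched in userspace C. This is only an illustration under stated assumptions: range_size(), size_in_mb() and size_in_kb() are hypothetical helper names, not kernel functions; the only things taken from the thread are the "size >> 20" and "size >> 10" shifts and the inclusive [start-end] ranges shown in the log lines.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: size in bytes of an inclusive [start, end]
 * range, which is how the ranges appear in the quoted log lines.
 */
static inline uint64_t range_size(uint64_t start, uint64_t end)
{
	assert(start <= end);
	return end - start + 1;
}

/* Same shift as the "Remove" message: whole megabytes, rounded down. */
static inline uint64_t size_in_mb(uint64_t size)
{
	return size >> 20;
}

/* Same shift as the "Not removing" message: whole kilobytes, rounded down. */
static inline uint64_t size_in_kb(uint64_t size)
{
	return size >> 10;
}
```

For [0xe0000000-0xefffffff] this gives 256 (MB) and 262144 (KB), and for [0xff000000-0xffffffff] it gives 16 (MB) and 16384 (KB), matching the four log lines above. A range smaller than 1MB, such as [0x0-0x7fff], shifts down to 0 in the MB form, which is why KB is the more useful unit for small ranges.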
On Mon, Nov 20, 2023 at 10:29:33 -0600, Bjorn Helgaas wrote:

> Thank you! A BIOS update is almost never the answer because even if
> an update exists, we have to assume that most users in the field will
> never install the update.

Not to mention enabling 64-bit BARs, which is even more cumbersome
ixgbe-specific magic that requires entirely dedicated tools...

>> .text .data .bss are not marked as E820_TYPE_RAM!
and
>> DMAR: [Firmware Bug]: No firmware reserved region can cover this RMRR [0x00000000df243000-0x00000000df251fff], contact BIOS vendor for fixes
>> DMAR: [Firmware Bug]: Your BIOS is broken; bad RMRR [0x00000000df243000-0x00000000df251fff]
[...]
> I think Linux basically converts the info from EFI GetMemoryMap
> to an e820 format; I think booting with "efi=debug" would show more
> details of this.

The dmesg I've attached today is with efi=debug, but the weird thing is
- both of the above warnings manifested themselves only once, with the
first (verbose debugging: "MCFG debug") patch applied... Anyway.

The "memremap attempted on mixed range 0x0000000000000000 size: 0x8000
WARNING: CPU: 0 PID: 1 at kernel/iomem.c:78 memremap+0x154/0x170" also
seems to be triggered by "efi=debug", so my guess is that it's unrelated.
On Tue, Nov 21, 2023 at 04:24:07PM +0100, Tomasz Pala wrote:
> On Mon, Nov 20, 2023 at 10:29:33 -0600, Bjorn Helgaas wrote:
>
> > Thank you! A BIOS update is almost never the answer because even if
> > an update exists, we have to assume that most users in the field will
> > never install the update.
>
> Not to mention enabling 64-bit BARs, which is even more cumbersome
> ixgbe-specific magic that requires entirely dedicated tools...
>
> >> .text .data .bss are not marked as E820_TYPE_RAM!
> and
> >> DMAR: [Firmware Bug]: No firmware reserved region can cover this RMRR [0x00000000df243000-0x00000000df251fff], contact BIOS vendor for fixes
> >> DMAR: [Firmware Bug]: Your BIOS is broken; bad RMRR [0x00000000df243000-0x00000000df251fff]
> [...]
> > I think Linux basically converts the info from EFI GetMemoryMap
> > to an e820 format; I think booting with "efi=debug" would show more
> > details of this.
>
> The dmesg I've attached today is with efi=debug, but the weird thing is
> - both of the above warnings manifested themselves only once, with the
> first (verbose debugging: "MCFG debug") patch applied... Anyway.

OK. I don't know what (if anything) to do about the above.

> The "memremap attempted on mixed range 0x0000000000000000 size: 0x8000
> WARNING: CPU: 0 PID: 1 at kernel/iomem.c:78 memremap+0x154/0x170" also
> seems to be triggered by "efi=debug", so my guess is that it's unrelated.

Yes, I think so. This is from efi_debugfs_init(), which we only run
when "efi=debug", and I think it comes from memremapping this area:

  efi: mem00: [Boot Code | | | | | | | | | | |WB|WT|WC|UC] range=[0x0000000000000000-0x0000000000007fff] (0MB)

Bjorn
[TLDR: This mail is primarily relevant for Linux kernel regression
tracking. See link in footer if these mails annoy you.]

On 12.10.23 17:33, Tomasz Pala wrote:
> Hello,
>
> On Tue, Jan 10, 2023 at 12:02:43 -0600, Bjorn Helgaas wrote:
>
>> Normally we reject ECAM space unless it is reported as reserved in the E820
>> table or via a PNP0C02 _CRS method (PCI Firmware, r3.3, sec 4.1.2). This
>> means PCI extended config space (offsets 0x100-0xfff) may not be accessible.
>
> I'm still having a problem initializing ixgbe NICs with pristine 6.5.7 kernel.

#regzbot fix: x86/pci: Reserve ECAM if BIOS didn't include it in PNP0C02 _CRS
#regzbot ignore-activity

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
That page also explains what to do if mails like this annoy you.
diff --git a/arch/x86/pci/mmconfig-shared.c b/arch/x86/pci/mmconfig-shared.c
index cd16bef5f2d9..da4b6e8e9df0 100644
--- a/arch/x86/pci/mmconfig-shared.c
+++ b/arch/x86/pci/mmconfig-shared.c
@@ -12,6 +12,7 @@
  */
 
 #include <linux/acpi.h>
+#include <linux/efi.h>
 #include <linux/pci.h>
 #include <linux/init.h>
 #include <linux/bitmap.h>
@@ -442,6 +443,32 @@ static bool is_acpi_reserved(u64 start, u64 end, enum e820_type not_used)
 	return mcfg_res.flags;
 }
 
+static bool is_efi_mmio(u64 start, u64 end, enum e820_type not_used)
+{
+#ifdef CONFIG_EFI
+	efi_memory_desc_t *md;
+	u64 size, mmio_start, mmio_end;
+
+	for_each_efi_memory_desc(md) {
+		if (md->type == EFI_MEMORY_MAPPED_IO) {
+			size = md->num_pages << EFI_PAGE_SHIFT;
+			mmio_start = md->phys_addr;
+			mmio_end = mmio_start + size;
+
+			/*
+			 * N.B. Caller supplies (start, start + size),
+			 * so to match, mmio_end is the first address
+			 * *past* the EFI_MEMORY_MAPPED_IO area.
+			 */
+			if (mmio_start <= start && end <= mmio_end)
+				return true;
+		}
+	}
+#endif
+
+	return false;
+}
+
 typedef bool (*check_reserved_t)(u64 start, u64 end, enum e820_type type);
 
 static bool __ref is_mmconf_reserved(check_reserved_t is_reserved,
@@ -513,6 +540,10 @@ pci_mmcfg_check_reserved(struct device *dev, struct pci_mmcfg_region *cfg, int e
 				       "MMCONFIG at %pR not reserved in "
 				       "ACPI motherboard resources\n",
 				       &cfg->res);
+
+		if (is_mmconf_reserved(is_efi_mmio, cfg, dev,
+				       "EfiMemoryMappedIO"))
+			return true;
 	}
 
 	/*
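
The containment test in is_efi_mmio() can be tried outside the kernel with a small userspace sketch. This is not the kernel code: struct mmio_desc, range_in_mmio() and PAGE_SHIFT_4K are hypothetical stand-ins for efi_memory_desc_t, the for_each_efi_memory_desc() loop and EFI_PAGE_SHIFT, and the sketch assumes every descriptor passed in is already of type EfiMemoryMappedIO. As in the patch, the caller passes a half-open range (start, start + size), so each descriptor's end is the first address past it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 4 KiB EFI pages, i.e. the same shift as EFI_PAGE_SHIFT. */
#define PAGE_SHIFT_4K 12

/* Hypothetical stand-in for the two efi_memory_desc_t fields the check reads. */
struct mmio_desc {
	uint64_t phys_addr;	/* first byte of the region */
	uint64_t num_pages;	/* region length in 4 KiB pages */
};

/*
 * Return true if the half-open range [start, end) lies entirely inside
 * a single descriptor. A range that straddles two adjacent descriptors
 * does not match, mirroring the per-descriptor test in the patch.
 */
static inline bool range_in_mmio(const struct mmio_desc *descs, size_t n,
				 uint64_t start, uint64_t end)
{
	assert(start <= end);

	for (size_t i = 0; i < n; i++) {
		uint64_t mmio_start = descs[i].phys_addr;
		uint64_t mmio_end = mmio_start +
				    (descs[i].num_pages << PAGE_SHIFT_4K);

		if (mmio_start <= start && end <= mmio_end)
			return true;
	}

	return false;
}
```

As test data only, take two descriptors shaped like the MMIO ranges in the log lines earlier in the thread, { 0xe0000000, 0x10000 } and { 0xff000000, 0x1000 }. Then range_in_mmio(descs, 2, 0xe0000000, 0xf0000000) is true, while a range starting at 0x80000000 of the same length is false because neither descriptor covers it.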