Info: mapping multiple BARs. Your kernel is fine.

Message ID 20140417182637.GA2098@google.com (mailing list archive)
State Superseded, archived

Commit Message

Bjorn Helgaas April 17, 2014, 6:26 p.m. UTC
Thanks a lot for testing this out and debugging my issues.

Here's a new version that looks for both device IDs I know about.

I'm still nervous about the modeset problem Dave is seeing.  Since the
original patch wouldn't find an 8086:0c00 device on Dave's system, it
should have done nothing.  But since it caused a modesetting problem,
there's something else going on that I don't understand.

Bjorn



PNP: Work around BIOS defects in Intel MCH area reporting

From: Bjorn Helgaas <bhelgaas@google.com>

Work around BIOSes that don't report the entire Intel MCH area.

MCHBAR is not an architected PCI BAR, so MCH space is usually reported as a
PNP0C02 resource.  The MCH space was once 16KB, but is 32KB in newer parts.
Some BIOSes still report a PNP0C02 resource that is only 16KB, which means
the rest of the MCH space is consumed but unreported.

This can cause resource map sanity check warnings or (theoretically) a
device conflict if we assigned the unreported space to another device.

The Intel perf event uncore driver tripped over this when it claimed the
MCH region:

  resource map sanity check conflict: 0xfed10000 0xfed15fff 0xfed10000 0xfed13fff pnp 00:01
  Info: mapping multiple BARs. Your kernel is fine.

To prevent this, if we find a PNP0C02 resource that covers part of the MCH
space, extend it to cover the entire space.

Link: http://lkml.kernel.org/r/20140224162400.GE16457@pd.tnic
Reported-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
---
 drivers/pnp/quirks.c |   74 ++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 74 insertions(+)

--
To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
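
The sizes behind the warning quoted above can be checked with a small user-space sketch. It is not part of the patch: struct span, span_size() and span_contains() are names made up for illustration, and reading the first address pair in the warning as the range being mapped and the second as the PNP resource is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Inclusive address range, like the start/end pairs in the warning. */
struct span {
	uint64_t start;
	uint64_t end;
};

/* Size in bytes of an inclusive range. */
static uint64_t span_size(struct span s)
{
	assert(s.end >= s.start);
	return s.end - s.start + 1;
}

/* Nonzero if inner lies entirely within outer. */
static int span_contains(struct span outer, struct span inner)
{
	return outer.start <= inner.start && inner.end <= outer.end;
}
```

With the addresses from the warning, the PNP resource 0xfed10000-0xfed13fff is 16KB and the mapped range 0xfed10000-0xfed15fff is 24KB, so the mapping runs past the end of the reported resource. A 32KB window starting at 0xfed10000 would contain it.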

Comments

Borislav Petkov April 17, 2014, 7:48 p.m. UTC | #1
On Thu, Apr 17, 2014 at 12:26:37PM -0600, Bjorn Helgaas wrote:
> Thanks a lot for testing this out and debugging my issues.
> 
> Here's a new version that looks for both device IDs I know about.
> 
> I'm still nervous about the modeset problem Dave is seeing.  Since the
> original patch wouldn't find an 8086:0c00 device on Dave's system, it
> should have done nothing.  But since it caused a modesetting problem,
> there's something else going on that I don't understand.

Yeah, this is strange, to put it mildly. This quirk wouldn't have done
anything besides the iteration over the pci devices with pci_get_device.
Which wouldn't do anything (refcount increment or so) if it didn't find
the device, right?

Bah, today is the day of the strange bugs. :-\

> PNP: Work around BIOS defects in Intel MCH area reporting
> 
> From: Bjorn Helgaas <bhelgaas@google.com>
> 
> Work around BIOSes that don't report the entire Intel MCH area.
> 
> MCHBAR is not an architected PCI BAR, so MCH space is usually reported as a
> PNP0C02 resource.  The MCH space was once 16KB, but is 32KB in newer parts.
> Some BIOSes still report a PNP0C02 resource that is only 16KB, which means
> the rest of the MCH space is consumed but unreported.
> 
> This can cause resource map sanity check warnings or (theoretically) a
> device conflict if we assigned the unreported space to another device.
> 
> The Intel perf event uncore driver tripped over this when it claimed the
> MCH region:
> 
>   resource map sanity check conflict: 0xfed10000 0xfed15fff 0xfed10000 0xfed13fff pnp 00:01
>   Info: mapping multiple BARs. Your kernel is fine.
> 
> To prevent this, if we find a PNP0C02 resource that covers part of the MCH
> space, extend it to cover the entire space.
> 
> Link: http://lkml.kernel.org/r/20140224162400.GE16457@pd.tnic
> Reported-by: Borislav Petkov <bp@alien8.de>

Yep, this one works fine:

[    0.403855] pnp 00:01: [Firmware Bug]: PNP resource [mem 0xfed10000-0xfed13fff] covers only part of 0000:00:00.0 Intel MCH; extending to [mem 0xfed10000-0xfed17fff]

Acked-by: Borislav Petkov <bp@suse.de>
Tested-by: Borislav Petkov <bp@suse.de>

Just a minor nitpick below.

> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
> ---
>  drivers/pnp/quirks.c |   74 ++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 74 insertions(+)
> 
> diff --git a/drivers/pnp/quirks.c b/drivers/pnp/quirks.c
> index 258fef272ea7..403bd5c42ed1 100644
> --- a/drivers/pnp/quirks.c
> +++ b/drivers/pnp/quirks.c
> @@ -334,6 +334,79 @@ static void quirk_amd_mmconfig_area(struct pnp_dev *dev)
>  }
>  #endif
>  
> +/* Device IDs of parts that have 32KB MCH space */
> +static const unsigned int mch_quirk_devices[] = {
> +	0x0154,	/* Ivy Bridge */
> +	0x0c00,	/* Haswell */
> +};
> +
> +static struct pci_dev *get_intel_host(void)
> +{
> +	int i;
> +	struct pci_dev *host;
> +
> +	for (i = 0; i < ARRAY_SIZE(mch_quirk_devices); i++) {
> +		host = pci_get_device(PCI_VENDOR_ID_INTEL, mch_quirk_devices[i],
> +				      NULL);
> +		if (host)
> +			return host;
> +	}
> +	return NULL;
> +}
> +
> +static void quirk_intel_mch(struct pnp_dev *dev)
> +{
> +	struct pci_dev *host;
> +	u32 addr_lo, addr_hi;
> +	struct pci_bus_region region;
> +	struct resource mch;
> +	struct pnp_resource *pnp_res;
> +	struct resource *res;
> +
> +	host = get_intel_host();
> +	if (!host)
> +		return;
> +
> +	/*
> +	 * MCHBAR is not an architected PCI BAR, so MCH space is usually
> +	 * reported as a PNP0C02 resource.  The MCH space was originally
> +	 * 16KB, but is 32KB in newer parts.  Some BIOSes still report a
> +	 * PNP0C02 resource that is only 16KB, which means the rest of the
> +	 * MCH space is consumed but unreported.
> +	 */
> +
> +	/*
> +	 * Read MCHBAR for Host Memory Mapped Register Range Base
> +	 * https://www-ssl.intel.com/content/www/us/en/processors/core/4th-gen-core-family-desktop-vol-2-datasheet
> +	 * Sec 3.1.12.
> +	 */
> +	pci_read_config_dword(host, 0x48, &addr_lo);
> +	region.start = addr_lo & ~0x7fff;
> +	pci_read_config_dword(host, 0x4c, &addr_hi);
> +	region.start |= (dma_addr_t) addr_hi << 32;
> +	region.end = region.start + 32*1024 - 1 ;

checkpatch complains about a trailing space before the semicolon.

> +
> +	memset(&mch, 0, sizeof(mch));
> +	mch.flags = IORESOURCE_MEM;
> +	pcibios_bus_to_resource(host->bus, &mch, &region);
> +
> +	list_for_each_entry(pnp_res, &dev->resources, list) {
> +		res = &pnp_res->res;
> +		if (res->end < mch.start || res->start > mch.end)
> +			continue;	/* no overlap */
> +		if (res->start == mch.start && res->end == mch.end)
> +			continue;	/* exact match */
> +
> +		dev_info(&dev->dev, FW_BUG "PNP resource %pR covers only part of %s Intel MCH; extending to %pR\n",
> +			 res, pci_name(host), &mch);
> +		res->start = mch.start;
> +		res->end = mch.end;
> +		break;
> +	}
> +
> +	pci_dev_put(host);
> +}
> +
>  /*
>   *  PnP Quirks
>   *  Cards or devices that need some tweaking due to incomplete resource info
> @@ -364,6 +437,7 @@ static struct pnp_fixup pnp_fixups[] = {
>  #ifdef CONFIG_AMD_NB
>  	{"PNP0c01", quirk_amd_mmconfig_area},
>  #endif
> +	{"PNP0c02", quirk_intel_mch},
>  	{""}
>  };
Dave Jones April 17, 2014, 7:52 p.m. UTC | #2
On Thu, Apr 17, 2014 at 12:26:37PM -0600, Bjorn Helgaas wrote:
 > Thanks a lot for testing this out and debugging my issues.
 > 
 > Here's a new version that looks for both device IDs I know about.

I can confirm this patch does fix the backtrace.
I disabled lockdep, and now I can get to X each boot, but I still see
a black screen rather than a console between modesetting becoming active, and X starting.

(The lockdep thing turned out to be a known XFS false positive, but for
 some reason it actually caused X to lock up)

 > I'm still nervous about the modeset problem Dave is seeing.  Since the
 > original patch wouldn't find an 8086:0c00 device on Dave's system, it
 > should have done nothing.  But since it caused a modesetting problem,
 > there's something else going on that I don't understand.

I don't know if it's relevant, but this laptop (and I suspect many other
thinkpads which seem affected) has dual gfx; both show up on the bus,
even though the nvidia isn't in use.

00:02.0 VGA compatible controller: Intel Corporation 3rd Gen Core processor Graphics Controller (rev 09) (prog-if 00 [VGA controller])
        Subsystem: Lenovo Device 2200
        Flags: bus master, fast devsel, latency 0, IRQ 44
        Memory at f1000000 (64-bit, non-prefetchable) [size=4M]
        Memory at e0000000 (64-bit, prefetchable) [size=256M]
        I/O ports at 6000 [size=64]
        Expansion ROM at <unassigned> [disabled]
        Capabilities: [90] MSI: Enable+ Count=1/1 Maskable- 64bit-
        Capabilities: [d0] Power Management version 2
        Capabilities: [a4] PCI Advanced Features
        Kernel driver in use: i915

01:00.0 3D controller: NVIDIA Corporation GF117M [GeForce 610M/710M/820M / GT 620M/625M/630M/720M] (rev a1)
        Subsystem: Lenovo NVS 5200M
        Flags: bus master, fast devsel, latency 0, IRQ 11
        Memory at f0000000 (32-bit, non-prefetchable) [size=16M]
        Memory at c0000000 (64-bit, prefetchable) [size=256M]
        Memory at d0000000 (64-bit, prefetchable) [size=32M]
        I/O ports at 5000 [size=128]
        Expansion ROM at <ignored> [disabled]
        Capabilities: [60] Power Management version 3
        Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
        Capabilities: [78] Express Endpoint, MSI 00
        Capabilities: [b4] Vendor Specific Information: Len=14 <?>
        Capabilities: [100] Virtual Channel
        Capabilities: [128] Power Budgeting <?>
        Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>

Just as X starts up, I see this in dmesg..

[   42.879049] [drm:cpt_serr_int_handler] *ERROR* PCH transcoder A FIFO underrun

	Dave

Borislav Petkov April 17, 2014, 8:01 p.m. UTC | #3
On Thu, Apr 17, 2014 at 03:52:40PM -0400, Dave Jones wrote:
> Just as X starts up, I see this in dmesg..
> 
> [   42.879049] [drm:cpt_serr_int_handler] *ERROR* PCH transcoder A FIFO underrun

FWIW, I have that too. It should be something i915-related:

[    0.617673] [drm] Memory usable by graphics device = 2048M
[    0.694445] i915 0000:00:02.0: irq 42 for MSI/MSI-X
[    0.694549] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[    0.694631] [drm] Driver supports precise vblank timestamp query.
[    0.695313] vgaarb: device changed decodes: PCI:0000:00:02.0,olddecodes=io+mem,decodes=io+mem:owns=io+mem
[    0.788300] [drm] GMBUS [i915 gmbus dpb] timed out, falling back to bit banging on pin 5
[    0.799829] fbcon: inteldrmfb (fb0) is primary device
[    1.176845] [drm:cpt_serr_int_handler] *ERROR* PCH transcoder A FIFO underrun
Dave Jones April 17, 2014, 8:03 p.m. UTC | #4
On Thu, Apr 17, 2014 at 10:01:27PM +0200, Borislav Petkov wrote:
 > On Thu, Apr 17, 2014 at 03:52:40PM -0400, Dave Jones wrote:
 > > Just as X starts up, I see this in dmesg..
 > > 
 > > [   42.879049] [drm:cpt_serr_int_handler] *ERROR* PCH transcoder A FIFO underrun
 > 
 > FWIW, I have that too. It should be something i915-related:
 > 
 > [    0.617673] [drm] Memory usable by graphics device = 2048M
 > [    0.694445] i915 0000:00:02.0: irq 42 for MSI/MSI-X
 > [    0.694549] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
 > [    0.694631] [drm] Driver supports precise vblank timestamp query.
 > [    0.695313] vgaarb: device changed decodes: PCI:0000:00:02.0,olddecodes=io+mem,decodes=io+mem:owns=io+mem
 > [    0.788300] [drm] GMBUS [i915 gmbus dpb] timed out, falling back to bit banging on pin 5
 > [    0.799829] fbcon: inteldrmfb (fb0) is primary device
 > [    1.176845] [drm:cpt_serr_int_handler] *ERROR* PCH transcoder A FIFO underrun

Can you send me your .config off-list ?
I wonder if this is something config specific that's causing me to see
this, and you not, given we've apparently got similar machines.

	Dave

Bjorn Helgaas April 17, 2014, 8:10 p.m. UTC | #5
On Thu, Apr 17, 2014 at 1:48 PM, Borislav Petkov <bp@alien8.de> wrote:
> On Thu, Apr 17, 2014 at 12:26:37PM -0600, Bjorn Helgaas wrote:
>> Thanks a lot for testing this out and debugging my issues.
>>
>> Here's a new version that looks for both device IDs I know about.
>>
>> I'm still nervous about the modeset problem Dave is seeing.  Since the
>> original patch wouldn't find an 8086:0c00 device on Dave's system, it
>> should have done nothing.  But since it caused a modesetting problem,
>> there's something else going on that I don't understand.
>
> Yeah, this is strange, to put it mildly. This quirk wouldn't have done
> anything besides the iteration over the pci devices with pci_get_device.
> Which wouldn't do anything (refcount increment or so) if it didn't find
> the device, right?

Right.
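
The get/put pairing being discussed can be modelled in user space. This is only a sketch: struct mock_dev, mock_get_device() and mock_dev_put() are hypothetical stand-ins, not the kernel API. It assumes the behaviour described above, that a lookup which finds nothing returns NULL and touches no reference count, while a successful lookup takes a reference that the caller later drops.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a PCI device with a reference count. */
struct mock_dev {
	unsigned int vendor;
	unsigned int device;
	int refcount;
};

/*
 * Return the first device matching vendor:device with its refcount
 * incremented, or NULL (and no side effects) if nothing matches.
 */
static struct mock_dev *mock_get_device(struct mock_dev *devs, size_t n,
					unsigned int vendor,
					unsigned int device)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (devs[i].vendor == vendor && devs[i].device == device) {
			devs[i].refcount++;
			return &devs[i];
		}
	}
	return NULL;
}

/* Drop a reference; a NULL argument is a no-op. */
static void mock_dev_put(struct mock_dev *dev)
{
	if (!dev)
		return;
	assert(dev->refcount > 0);
	dev->refcount--;
}
```

On a model system that only has an 8086:0154 host bridge, looking up 8086:0c00 returns NULL and leaves every count unchanged, which is why the original patch should have been a no-op there.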

> Bah, today is the day of the strange bugs. :-\
>
>> PNP: Work around BIOS defects in Intel MCH area reporting
>>
>> From: Bjorn Helgaas <bhelgaas@google.com>
>>
>> Work around BIOSes that don't report the entire Intel MCH area.
>>
>> MCHBAR is not an architected PCI BAR, so MCH space is usually reported as a
>> PNP0C02 resource.  The MCH space was once 16KB, but is 32KB in newer parts.
>> Some BIOSes still report a PNP0C02 resource that is only 16KB, which means
>> the rest of the MCH space is consumed but unreported.
>>
>> This can cause resource map sanity check warnings or (theoretically) a
>> device conflict if we assigned the unreported space to another device.
>>
>> The Intel perf event uncore driver tripped over this when it claimed the
>> MCH region:
>>
>>   resource map sanity check conflict: 0xfed10000 0xfed15fff 0xfed10000 0xfed13fff pnp 00:01
>>   Info: mapping multiple BARs. Your kernel is fine.
>>
>> To prevent this, if we find a PNP0C02 resource that covers part of the MCH
>> space, extend it to cover the entire space.
>>
>> Link: http://lkml.kernel.org/r/20140224162400.GE16457@pd.tnic
>> Reported-by: Borislav Petkov <bp@alien8.de>
>
> Yep, this one works fine:
>
> [    0.403855] pnp 00:01: [Firmware Bug]: PNP resource [mem 0xfed10000-0xfed13fff] covers only part of 0000:00:00.0 Intel MCH; extending to [mem 0xfed10000-0xfed17fff]
>
> Acked-by: Borislav Petkov <bp@suse.de>
> Tested-by: Borislav Petkov <bp@suse.de>

>> +     region.end = region.start + 32*1024 - 1 ;

> checkpatch complains about a trailing space before the semicolon.

Thanks!  I hate typos like that.

I'll fix this, add your tested-by and ack, and send to Rafael.

Bjorn
Dave Jones April 17, 2014, 8:53 p.m. UTC | #6
On Thu, Apr 17, 2014 at 04:03:52PM -0400, Dave Jones wrote:
 > On Thu, Apr 17, 2014 at 10:01:27PM +0200, Borislav Petkov wrote:
 >  > On Thu, Apr 17, 2014 at 03:52:40PM -0400, Dave Jones wrote:
 >  > > Just as X starts up, I see this in dmesg..
 >  > > 
 >  > > [   42.879049] [drm:cpt_serr_int_handler] *ERROR* PCH transcoder A FIFO underrun
 >  > 
 >  > FWIW, I have that too. It should be something i915-related:
 >  > 
 >  > [    0.617673] [drm] Memory usable by graphics device = 2048M
 >  > [    0.694445] i915 0000:00:02.0: irq 42 for MSI/MSI-X
 >  > [    0.694549] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
 >  > [    0.694631] [drm] Driver supports precise vblank timestamp query.
 >  > [    0.695313] vgaarb: device changed decodes: PCI:0000:00:02.0,olddecodes=io+mem,decodes=io+mem:owns=io+mem
 >  > [    0.788300] [drm] GMBUS [i915 gmbus dpb] timed out, falling back to bit banging on pin 5
 >  > [    0.799829] fbcon: inteldrmfb (fb0) is primary device
 >  > [    1.176845] [drm:cpt_serr_int_handler] *ERROR* PCH transcoder A FIFO underrun
 > 
 > Can you send me your .config off-list ?
 > I wonder if this is something config specific that's causing me to see
 > this, and you not, given we've apparently got similar machines.

ok, with your config I get back to a console after the modesetting
switch, but then it hangs in USB init.

Hrmm.

	Dave

Borislav Petkov April 17, 2014, 9:01 p.m. UTC | #7
On Thu, Apr 17, 2014 at 04:53:55PM -0400, Dave Jones wrote:
> ok, with your config I get back to a console after the modesetting
> switch, but then it hangs in USB init.

Maybe because our machines are not that similar there? Can you take
my config but paste the usb part of yours and see whether it boots fine
then? It could be yours and mine have different USB hw...
Borislav Petkov April 18, 2014, 10:38 a.m. UTC | #8
On Thu, Apr 17, 2014 at 05:30:27PM -0400, Dave Jones wrote:
> I think it's just implicated because that's the next thing that seems
> to init after the modeswitch. The config differences are small, just
> things like =m instead of =y or vice-versa.
>
> I'm about to head into a long weekend, so I'll get back to this on
> Monday, but for now I'm out of ideas.

This is for when you get back: :-)

Can you debug that hang a bit more, like enable some sensible options
under "Kernel Hacking" or somesuch, boot with initcall_debug, add
more printks at key places? If the machine would tell us why exactly
it hangs, we might have an idea, like corruption, transaction stall,
whatever...

Patch

diff --git a/drivers/pnp/quirks.c b/drivers/pnp/quirks.c
index 258fef272ea7..403bd5c42ed1 100644
--- a/drivers/pnp/quirks.c
+++ b/drivers/pnp/quirks.c
@@ -334,6 +334,79 @@  static void quirk_amd_mmconfig_area(struct pnp_dev *dev)
 }
 #endif
 
+/* Device IDs of parts that have 32KB MCH space */
+static const unsigned int mch_quirk_devices[] = {
+	0x0154,	/* Ivy Bridge */
+	0x0c00,	/* Haswell */
+};
+
+static struct pci_dev *get_intel_host(void)
+{
+	int i;
+	struct pci_dev *host;
+
+	for (i = 0; i < ARRAY_SIZE(mch_quirk_devices); i++) {
+		host = pci_get_device(PCI_VENDOR_ID_INTEL, mch_quirk_devices[i],
+				      NULL);
+		if (host)
+			return host;
+	}
+	return NULL;
+}
+
+static void quirk_intel_mch(struct pnp_dev *dev)
+{
+	struct pci_dev *host;
+	u32 addr_lo, addr_hi;
+	struct pci_bus_region region;
+	struct resource mch;
+	struct pnp_resource *pnp_res;
+	struct resource *res;
+
+	host = get_intel_host();
+	if (!host)
+		return;
+
+	/*
+	 * MCHBAR is not an architected PCI BAR, so MCH space is usually
+	 * reported as a PNP0C02 resource.  The MCH space was originally
+	 * 16KB, but is 32KB in newer parts.  Some BIOSes still report a
+	 * PNP0C02 resource that is only 16KB, which means the rest of the
+	 * MCH space is consumed but unreported.
+	 */
+
+	/*
+	 * Read MCHBAR for Host Memory Mapped Register Range Base
+	 * https://www-ssl.intel.com/content/www/us/en/processors/core/4th-gen-core-family-desktop-vol-2-datasheet
+	 * Sec 3.1.12.
+	 */
+	pci_read_config_dword(host, 0x48, &addr_lo);
+	region.start = addr_lo & ~0x7fff;
+	pci_read_config_dword(host, 0x4c, &addr_hi);
+	region.start |= (dma_addr_t) addr_hi << 32;
+	region.end = region.start + 32*1024 - 1 ;
+
+	memset(&mch, 0, sizeof(mch));
+	mch.flags = IORESOURCE_MEM;
+	pcibios_bus_to_resource(host->bus, &mch, &region);
+
+	list_for_each_entry(pnp_res, &dev->resources, list) {
+		res = &pnp_res->res;
+		if (res->end < mch.start || res->start > mch.end)
+			continue;	/* no overlap */
+		if (res->start == mch.start && res->end == mch.end)
+			continue;	/* exact match */
+
+		dev_info(&dev->dev, FW_BUG "PNP resource %pR covers only part of %s Intel MCH; extending to %pR\n",
+			 res, pci_name(host), &mch);
+		res->start = mch.start;
+		res->end = mch.end;
+		break;
+	}
+
+	pci_dev_put(host);
+}
+
 /*
  *  PnP Quirks
  *  Cards or devices that need some tweaking due to incomplete resource info
@@ -364,6 +437,7 @@  static struct pnp_fixup pnp_fixups[] = {
 #ifdef CONFIG_AMD_NB
 	{"PNP0c01", quirk_amd_mmconfig_area},
 #endif
+	{"PNP0c02", quirk_intel_mch},
 	{""}
 };
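
The address arithmetic and the extend-on-partial-overlap rule in quirk_intel_mch() can be tried outside the kernel with the sketch below. It is an illustration under stated assumptions, not the patch itself: struct mem_range, mch_window() and extend_to_cover() are made-up names, the two dwords are passed in rather than read from config space at 0x48/0x4c, and no bus-to-resource translation is done. The sketch uses uint64_t for the shift so it is well defined on any host; the patch casts to dma_addr_t, whose width depends on the kernel configuration.

```c
#include <assert.h>
#include <stdint.h>

#define MCH_SIZE (32 * 1024)	/* 32KB MCH space on the newer parts */

/* Inclusive memory range. */
struct mem_range {
	uint64_t start;
	uint64_t end;
};

/*
 * Build the MCH window from the low and high MCHBAR dwords.  The low
 * 15 bits are masked off, as the patch does with ~0x7fff.
 */
static struct mem_range mch_window(uint32_t addr_lo, uint32_t addr_hi)
{
	struct mem_range r;

	assert((MCH_SIZE & (MCH_SIZE - 1)) == 0);	/* power of two */
	r.start = ((uint64_t)addr_hi << 32) |
		  (addr_lo & ~(uint32_t)(MCH_SIZE - 1));
	r.end = r.start + MCH_SIZE - 1;
	return r;
}

/*
 * If res overlaps mch without matching it exactly, replace res with
 * mch and return 1.  Otherwise leave res alone and return 0.
 */
static int extend_to_cover(struct mem_range *res, struct mem_range mch)
{
	if (res->end < mch.start || res->start > mch.end)
		return 0;	/* no overlap */
	if (res->start == mch.start && res->end == mch.end)
		return 0;	/* exact match */
	*res = mch;
	return 1;
}
```

Feeding in a low dword of 0xfed10001 (a hypothetical value with the low bit set) and a high dword of 0 gives the window 0xfed10000-0xfed17fff, and a 16KB resource 0xfed10000-0xfed13fff is then extended to that window, matching the ranges in Borislav's Firmware Bug log line.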