WARNING: CPU: 4 PID: 863 at include/drm/drm_crtc.h:1577 drm_helper_choose_encoder_dpms+0x88/0x90() - evildoer found and neutralized
diff mbox

Message ID 560B9323.6000309@linux.intel.com
State New
Headers show

Commit Message

Jiang Liu Sept. 30, 2015, 7:45 a.m. UTC
n 2015/9/29 18:51, Borislav Petkov wrote:
> On Tue, Sep 29, 2015 at 04:50:36PM +0800, Jiang Liu wrote:
>> So could you please help to apply the attached debug patch to gather
>> more information about the regression?
> 
> Sure, just did.
> 
> I'm sending you a full s/r cycle attempt caught over serial in a private
> message.

Hi Boris,
From the log file, we got to know that the NULL pointer dereference
was caused by AMD IOMMU device. For normal MSI-enabled PCI devices, we get
valid irq numbers such as:
[ 74.661170] ahci 0000:04:00.0: irqdomain: freeze msi 1 irq28
[ 74.661297] radeon 0000:01:00.0: irqdomain: freeze msi 1 irq47
But for AMD IOMMU device, we got an invalid irq number(0) after
enabling MSI as:
[ 74.662488] pci 0000:00:00.2: irqdomain: freeze msi 1 irq0
which then caused NULL pointer deference when __pci_restore_msi_state()
gets called by system resume code.
So we need to figure out why we got irq number 0 after enabling
MSI for AMD IOMMU device. The only hint I got is that iommu driver just
grabbing the PCI device without providing a PCI device driver for IOMMU
PCI device, we have solved a similar case for eata driver. So could you
please help to apply this debug patch to gather more info and send me
/proc/interrupts?
Thanks!
Gerry

O>
> Thanks.
>

Comments

Joerg Roedel Sept. 30, 2015, 12:44 p.m. UTC | #1
On Wed, Sep 30, 2015 at 03:45:39PM +0800, Jiang Liu wrote:
> So we need to figure out why we got irq number 0 after enabling
> MSI for AMD IOMMU device. The only hint I got is that iommu driver just
> grabbing the PCI device without providing a PCI device driver for IOMMU
> PCI device, we have solved a similar case for eata driver. So could you
> please help to apply this debug patch to gather more info and send me
> /proc/interrupts?

I think I have an idea on how dev->irq got 0 after pci_enable_msi(). The
PCI probe code calls pcibios_alloc_irq() and after a failed probe it calls
pcibios_free_irq(), which sets dev->irq to 0.
The AMD IOMMU driver does not register a pci_driver for itself, it just
doesn't make sense for it. But the PCI device containing the IOMMU gets
probed later, which fails because there is no driver for it. So the
following call to pcibios_free_irq() clears dev->irq, so that it is 0 on
the next resume. Does that make sense?


	Joerg
Jiang Liu Sept. 30, 2015, 5 p.m. UTC | #2
On 2015/9/30 20:44, Joerg Roedel wrote:
> On Wed, Sep 30, 2015 at 03:45:39PM +0800, Jiang Liu wrote:
>> So we need to figure out why we got irq number 0 after enabling
>> MSI for AMD IOMMU device. The only hint I got is that iommu driver just
>> grabbing the PCI device without providing a PCI device driver for IOMMU
>> PCI device, we have solved a similar case for eata driver. So could you
>> please help to apply this debug patch to gather more info and send me
>> /proc/interrupts?
> 
> I think I have an idea on how dev->irq got 0 after pci_enable_msi(). The
> PCI probe code calls pcibios_alloc_irq() and after a failed probe it calls
> pcibios_free_irq(), which sets dev->irq to 0.
> The AMD IOMMU driver does not register a pci_driver for itself, it just
> doesn't make sense for it. But the PCI device containing the IOMMU gets
> probed later, which fails because there is no driver for it. So the
> following call to pcibios_free_irq() clears dev->irq, so that it is 0 on
> the next resume. Does that make sense?

Thanks Joerg, that makes sense. If some driver tries to binding to the
IOMMU device, it will trigger the scenario as you described. For
example,  Xen backend driver will try to probe all PCI devices
if enabled. I will do more investigation tomorrow.
Thanks!
Gerry
Joerg Roedel Sept. 30, 2015, 6:06 p.m. UTC | #3
On Thu, Oct 01, 2015 at 01:00:44AM +0800, Jiang Liu wrote:
> Thanks Joerg, that makes sense. If some driver tries to binding to the
> IOMMU device, it will trigger the scenario as you described. For
> example,  Xen backend driver will try to probe all PCI devices
> if enabled. I will do more investigation tomorrow.

Not only that, the probe code looks like this in __pci_device_probe:

                error = -ENODEV;

                id = pci_match_device(drv, pci_dev);
                if (id)
                        error = pci_call_probe(drv, pci_dev, id);
                if (error >= 0)
                        error = 0;

The pci_match_device() function will always return NULL for the iommu
pci_dev, because no driver matches the ids of it. So the function
returns -ENODEV, which will be handled in the caller (pci_device_probe):


        error = pcibios_alloc_irq(pci_dev);
        if (error < 0)
                return error;

        pci_dev_get(pci_dev);
        error = __pci_device_probe(drv, pci_dev);
        if (error) {
                pcibios_free_irq(pci_dev);
                pci_dev_put(pci_dev);
        }

For the IOMMU pci_dev a pcibios-irq will be allocated (if there is one,
like on Boris' system) and because __pci_device_probe returns -ENODEV it
will be freed again with pcibios_free_irq().

The pcibios_free_irq() function will set dev->irq = 0, which overwrites
the value that pci_enable_msi() wrote there. So later in suspend/resume
code the msi-handling part tries to fetch the irq-descriptor for the
wrong irq (which is NULL) and causes the crash.

The issue got introduced because with your changes pci_enable_msi() is
only allowed after a pci-device was successfully probed by the driver.
But this assumption is not true, as the AMD IOMMU driver does not
register as a pci-driver.

Registering a pci-driver would actually be harmful, because a device can
be forcibly unbound from its driver, which would be pretty bad for an
IOMMU in the running system.

So the right fix is to allow pci_enable_msi() for pci-devices not
registered against a driver. The fix I sent Boris has issues (I think it
leaks pcibios irqs when MSI is in use), but was thinking about fixing it
in pci_device_probe by not allocating a pcibios-irq when MSI is already
active. What do you think?

Regards,

	Joerg

Patch
diff mbox

From 57d3013c1c64d9407e432598c645c8de256e0b42 Mon Sep 17 00:00:00 2001
From: Jiang Liu <jiang.liu@linux.intel.com>
Date: Wed, 30 Sep 2015 14:49:29 +0800
Subject: [PATCH] Debug: Gather more information about AMD iommu device

Hi Boris,
	From the log file, we got to know that the NULL pointer dereference
was caused by AMD IOMMU device. For normal MSI-enabled PCI devices, we get
valid irq numbers such as:
[   74.661170] ahci 0000:04:00.0: irqdomain: freeze msi 1 irq28
[   74.661297] radeon 0000:01:00.0: irqdomain: freeze msi 1 irq47
	But for AMD IOMMU device, we got an invalid irq number(0) after
enabling MSI as:
[   74.662488] pci 0000:00:00.2: irqdomain: freeze msi 1 irq0
which then caused NULL pointer deference when __pci_restore_msi_state()
gets called by system resume code.
	So we need to figure out why we got irq number 0 after enabling
MSI for AMD IOMMU device. The only hint I got is that iommu driver just
grabbing the PCI device without providing a PCI device driver for IOMMU
PCI device, we have solved a similar case for eata driver. So could you
please help to apply this debug patch to gather more info and send me
/proc/interrupts?
Thanks!
Gerry

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
---
 arch/x86/kernel/apic/msi.c     |    6 +++++-
 drivers/iommu/amd_iommu_init.c |    8 ++++++++
 drivers/pci/msi.c              |    4 ++++
 kernel/irq/msi.c               |    1 +
 4 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/apic/msi.c b/arch/x86/kernel/apic/msi.c
index 5f1feb6854af..050dcf25577c 100644
--- a/arch/x86/kernel/apic/msi.c
+++ b/arch/x86/kernel/apic/msi.c
@@ -71,6 +71,7 @@  int native_setup_msi_irqs(struct pci_dev *dev, int nvec, int type)
 {
 	struct irq_domain *domain;
 	struct irq_alloc_info info;
+	int ret;
 
 	init_irq_alloc_info(&info, NULL);
 	info.type = X86_IRQ_ALLOC_TYPE_MSI;
@@ -82,7 +83,10 @@  int native_setup_msi_irqs(struct pci_dev *dev, int nvec, int type)
 	if (domain == NULL)
 		return -ENOSYS;
 
-	return pci_msi_domain_alloc_irqs(domain, dev, nvec, type);
+	ret = pci_msi_domain_alloc_irqs(domain, dev, nvec, type);
+	dev_warn(&dev->dev, "irqdomain: domain %p, def_domain %p ret%d\n",
+		 domain, msi_default_domain, ret);
+	return ret;
 }
 
 void native_teardown_msi_irq(unsigned int irq)
diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
index 5ef347a13cb5..23cd4d861dba 100644
--- a/drivers/iommu/amd_iommu_init.c
+++ b/drivers/iommu/amd_iommu_init.c
@@ -1416,7 +1416,11 @@  static int iommu_setup_msi(struct amd_iommu *iommu)
 {
 	int r;
 
+	dev_warn(&iommu->dev->dev, "irqdomain: before enabling MSI for msi%d, irq%d\n",
+		 iommu->dev->msi_enabled, iommu->dev->irq);
 	r = pci_enable_msi(iommu->dev);
+	dev_warn(&iommu->dev->dev, "irqdomain: after enabling MSI for msi%d, irq%d\n",
+		 iommu->dev->msi_enabled, iommu->dev->irq);
 	if (r)
 		return r;
 
@@ -1428,6 +1432,8 @@  static int iommu_setup_msi(struct amd_iommu *iommu)
 
 	if (r) {
 		pci_disable_msi(iommu->dev);
+		dev_warn(&iommu->dev->dev, "irqdomain: failed to enable MSI for msi%d, irq%d\n",
+			 iommu->dev->msi_enabled, iommu->dev->irq);
 		return r;
 	}
 
@@ -1440,6 +1446,8 @@  static int iommu_init_msi(struct amd_iommu *iommu)
 {
 	int ret;
 
+	dev_warn(&iommu->dev->dev, "irqdomain: init msi for iommu %p int_enabled%d\n",
+		 iommu, iommu->int_enabled);
 	if (iommu->int_enabled)
 		goto enable_faults;
 
diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
index d4497141d083..0301a18663b0 100644
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -602,6 +602,8 @@  static int msi_capability_init(struct pci_dev *dev, int nvec)
 	int ret;
 	unsigned mask;
 
+	dev_warn(&dev->dev, "irqdomain: enable msi cap msi_enabled%d irq%d\n",
+		 dev->msi_enabled, dev->irq);
 	pci_msi_set_enable(dev, 0);	/* Disable MSI during set up */
 
 	entry = msi_setup_entry(dev, nvec);
@@ -643,6 +645,8 @@  static int msi_capability_init(struct pci_dev *dev, int nvec)
 
 	pcibios_free_irq(dev);
 	dev->irq = entry->irq;
+	dev_warn(&dev->dev, "irqdomain: succeed to enable msi cap msi_enabled%d irq%d\n",
+		 dev->msi_enabled, dev->irq);
 	return 0;
 }
 
diff --git a/kernel/irq/msi.c b/kernel/irq/msi.c
index 7e6512b9dc1f..535cf59bc5a7 100644
--- a/kernel/irq/msi.c
+++ b/kernel/irq/msi.c
@@ -298,6 +298,7 @@  int msi_domain_alloc_irqs(struct irq_domain *domain, struct device *dev,
 			return ret;
 		}
 
+		dev_warn(dev, "irqdomain: allocated virq%d\n", virq);
 		for (i = 0; i < desc->nvec_used; i++)
 			irq_set_msi_desc_off(virq, i, desc);
 	}
-- 
1.7.10.4