Message ID | 1373303174-21977-1-git-send-email-nhorman@tuxdriver.com (mailing list archive) |
---|---|
State | New, archived |
Delegated to: | Bjorn Helgaas |
On Mon, Jul 08, 2013 at 01:06:14PM -0400, Neil Horman wrote:
> A few years back Intel published a spec update for the 5520 and 5500 chipsets:
> http://www.intel.com/content/dam/doc/specification-update/5520-and-5500-chipset-ioh-specification-update.pdf
>
> It contains an erratum (specifically erratum 53) noting that these chipsets
> can't properly do interrupt remapping, and as a result Intel recommends that
> interrupt remapping be disabled in the BIOS. While many vendors have a BIOS
> update to do exactly that, not all do, and of course not all users update
> their BIOS to a level that corrects the problem. As a result, occasionally
> interrupts can arrive at a CPU even after affinity for that interrupt has
> been moved, leading to lost or spurious interrupts, usually characterized by
> the message:
> kernel: do_IRQ: 7.71 No irq handler for vector (irq -1)
>
> There have been several incidents recently of people seeing this error, and
> investigation has shown that they have systems whose BIOS level is such
> that this feature was not properly turned off. As such, it would be good to
> give them a reminder that their systems are vulnerable to this problem. For
> details of those that reported the problem, please see:
> https://bugzilla.redhat.com/show_bug.cgi?id=887006
>
> [ Joerg: Removed CONFIG_IRQ_REMAP ifdef from early-quirks.c ]
>
> Notes: Modified early-quirks.c to include linux/irq.h, to fix warnings. This
> isn't needed in the upstream tree.
>
> Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
> CC: Prarit Bhargava <prarit@redhat.com>
> CC: Don Zickus <dzickus@redhat.com>
> CC: Don Dutile <ddutile@redhat.com>
> CC: Bjorn Helgaas <bhelgaas@google.com>
> CC: Asit Mallick <asit.k.mallick@intel.com>
> CC: David Woodhouse <dwmw2@infradead.org>
> CC: linux-pci@vger.kernel.org
> CC: Joerg Roedel <joro@8bytes.org>
> CC: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> CC: Arkadiusz Miśkiewicz <arekm@maven.pl>
> CC: Jean Delvare <jdelvare@suse.de>
> CC: Greg KH <gregkh@linuxfoundation.org>
> Signed-off-by: Joerg Roedel <joro@8bytes.org>
> (cherry picked from commit 03bbcb2e7e292838bb0244f5a7816d194c911d62)

As 3.9-stable is now dead, this one missed that window, sorry about
that.

greg k-h
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

On Tue, Jul 23, 2013 at 10:06:02AM -0700, Greg KH wrote:
> On Mon, Jul 08, 2013 at 01:06:14PM -0400, Neil Horman wrote:
> > [...]
>
> As 3.9-stable is now dead, this one missed that window, sorry about
> that.
>
Thanks for the heads up, we tried our best :).

Neil

> greg k-h

Hi Greg, hi Neil,

On Tuesday 23 July 2013 at 10:06 -0700, Greg KH wrote:
> On Mon, Jul 08, 2013 at 01:06:14PM -0400, Neil Horman wrote:
> > [...]
>
> As 3.9-stable is now dead, this one missed that window, sorry about
> that.

Sorry for jumping in a little late, but I am looking into this again
because one customer of ours requested this fix. I'm a bit confused now,
by two different things:

1* Why did Greg claim that the fix missed the 3.9-stable window while I
can see Neil's backport of the fix made it into 3.9.9?

2* Why was the backport only added to 3.9-stable and not also
3.4-longterm and 3.0 longterm?

I need the fix for the SLE11 SP3 kernel which is based on 3.0. Neil, do
you have a backport of the fix for this kernel? The 3.9 doesn't apply :(

Thanks,

On Tue, Oct 08, 2013 at 10:45:32PM +0200, Jean Delvare wrote:
> Hi Greg, hi Neil,
>
> On Tuesday 23 July 2013 at 10:06 -0700, Greg KH wrote:
> > On Mon, Jul 08, 2013 at 01:06:14PM -0400, Neil Horman wrote:
> > > [...]
> >
> > As 3.9-stable is now dead, this one missed that window, sorry about
> > that.
>
> Sorry for jumping in a little late, but I am looking into this again
> because one customer of ours requested this fix. I'm a bit confused now,
> by two different things:
>
> 1* Why did Greg claim that the fix missed the 3.9-stable window while I
> can see Neil's backport of the fix made it into 3.9.9?
>
> 2* Why was the backport only added to 3.9-stable and not also
> 3.4-longterm and 3.0 longterm?
>
These are questions that you should ask Greg I think.

> I need the fix for the SLE11 SP3 kernel which is based on 3.0. Neil, do
> you have a backport of the fix for this kernel? The 3.9 doesn't apply :(
>
I don't. I've backported it to RHEL6.5 (which is 2.6.32 based), so it shouldn't
be that hard to massage into place.

Neil

> Thanks,
> --
> Jean Delvare
> Suse L3 Support

On Tue, Oct 08, 2013 at 10:45:32PM +0200, Jean Delvare wrote:
> Hi Greg, hi Neil,
>
> On Tuesday 23 July 2013 at 10:06 -0700, Greg KH wrote:
> > On Mon, Jul 08, 2013 at 01:06:14PM -0400, Neil Horman wrote:
> > > [...]
> >
> > As 3.9-stable is now dead, this one missed that window, sorry about
> > that.
>
> Sorry for jumping in a little late, but I am looking into this again
> because one customer of ours requested this fix. I'm a bit confused now,
> by two different things:
>
> 1* Why did Greg claim that the fix missed the 3.9-stable window while I
> can see Neil's backport of the fix made it into 3.9.9?

Greg doesn't remember what patches he applied to what stable tree, nor
if he had applied them to a previous stable release at all. His
short-term memory is almost nonexistent given the huge rate of patches
flowing through him, and he is feeling strange referring to himself in
the third person...

> 2* Why was the backport only added to 3.9-stable and not also
> 3.4-longterm and 3.0 longterm?

Probably because the patch sent to me didn't apply there?

> I need the fix for the SLE11 SP3 kernel which is based on 3.0. Neil, do
> you have a backport of the fix for this kernel? The 3.9 doesn't apply :(

Ah, yes, that's why I didn't apply it. If someone sends me a version
that does, I will add it to those trees.

thanks,

greg k-h

diff --git a/arch/x86/include/asm/irq_remapping.h b/arch/x86/include/asm/irq_remapping.h
index 95fd352..aca6aa2 100644
--- a/arch/x86/include/asm/irq_remapping.h
+++ b/arch/x86/include/asm/irq_remapping.h
@@ -28,6 +28,7 @@
 
 extern void setup_irq_remapping_ops(void);
 extern int irq_remapping_supported(void);
+extern void set_irq_remapping_broken(void);
 extern int irq_remapping_prepare(void);
 extern int irq_remapping_enable(void);
 extern void irq_remapping_disable(void);
@@ -54,6 +55,7 @@ void irq_remap_modify_chip_defaults(struct irq_chip *chip);
 
 static inline void setup_irq_remapping_ops(void) { }
 static inline int irq_remapping_supported(void) { return 0; }
+static inline void set_irq_remapping_broken(void) { }
 static inline int irq_remapping_prepare(void) { return -ENODEV; }
 static inline int irq_remapping_enable(void) { return -ENODEV; }
 static inline void irq_remapping_disable(void) { }
diff --git a/arch/x86/kernel/early-quirks.c b/arch/x86/kernel/early-quirks.c
index 3755ef4..28bd4a1 100644
--- a/arch/x86/kernel/early-quirks.c
+++ b/arch/x86/kernel/early-quirks.c
@@ -12,12 +12,14 @@
 #include <linux/pci.h>
 #include <linux/acpi.h>
 #include <linux/pci_ids.h>
+#include <linux/irq.h>
 #include <asm/pci-direct.h>
 #include <asm/dma.h>
 #include <asm/io_apic.h>
 #include <asm/apic.h>
 #include <asm/iommu.h>
 #include <asm/gart.h>
+#include <asm/irq_remapping.h>
 
 static void __init fix_hypertransport_config(int num, int slot, int func)
 {
@@ -192,6 +194,21 @@ static void __init ati_bugs_contd(int num, int slot, int func)
 }
 #endif
 
+static void __init intel_remapping_check(int num, int slot, int func)
+{
+	u8 revision;
+
+	revision = read_pci_config_byte(num, slot, func, PCI_REVISION_ID);
+
+	/*
+	 * Revision 0x13 of this chipset supports irq remapping
+	 * but has an erratum that breaks its behavior, flag it as such
+	 */
+	if (revision == 0x13)
+		set_irq_remapping_broken();
+
+}
+
 #define QFLAG_APPLY_ONCE 	0x1
 #define QFLAG_APPLIED		0x2
 #define QFLAG_DONE		(QFLAG_APPLY_ONCE|QFLAG_APPLIED)
@@ -221,6 +238,10 @@ static struct chipset early_qrk[] __initdata = {
 	  PCI_CLASS_SERIAL_SMBUS, PCI_ANY_ID, 0, ati_bugs },
 	{ PCI_VENDOR_ID_ATI, PCI_DEVICE_ID_ATI_SBX00_SMBUS,
 	  PCI_CLASS_SERIAL_SMBUS, PCI_ANY_ID, 0, ati_bugs_contd },
+	{ PCI_VENDOR_ID_INTEL, 0x3403, PCI_CLASS_BRIDGE_HOST,
+	  PCI_BASE_CLASS_BRIDGE, 0, intel_remapping_check },
+	{ PCI_VENDOR_ID_INTEL, 0x3406, PCI_CLASS_BRIDGE_HOST,
+	  PCI_BASE_CLASS_BRIDGE, 0, intel_remapping_check },
 	{}
 };
 
diff --git a/drivers/iommu/intel_irq_remapping.c b/drivers/iommu/intel_irq_remapping.c
index f3b8f23..5b19b2d 100644
--- a/drivers/iommu/intel_irq_remapping.c
+++ b/drivers/iommu/intel_irq_remapping.c
@@ -524,6 +524,16 @@ static int __init intel_irq_remapping_supported(void)
 
 	if (disable_irq_remap)
 		return 0;
+	if (irq_remap_broken) {
+		WARN_TAINT(1, TAINT_FIRMWARE_WORKAROUND,
+			   "This system BIOS has enabled interrupt remapping\n"
+			   "on a chipset that contains an erratum making that\n"
+			   "feature unstable.  To maintain system stability\n"
+			   "interrupt remapping is being disabled.  Please\n"
+			   "contact your BIOS vendor for an update\n");
+		disable_irq_remap = 1;
+		return 0;
+	}
 	if (!dmar_ir_support())
 		return 0;
 
diff --git a/drivers/iommu/irq_remapping.c b/drivers/iommu/irq_remapping.c
index 7c11ff3..dcfea4e 100644
--- a/drivers/iommu/irq_remapping.c
+++ b/drivers/iommu/irq_remapping.c
@@ -18,6 +18,7 @@
 int irq_remapping_enabled;
 
 int disable_irq_remap;
+int irq_remap_broken;
 int disable_sourceid_checking;
 int no_x2apic_optout;
 
@@ -210,6 +211,11 @@ void __init setup_irq_remapping_ops(void)
 #endif
 }
 
+void set_irq_remapping_broken(void)
+{
+	irq_remap_broken = 1;
+}
+
 int irq_remapping_supported(void)
 {
 	if (disable_irq_remap)
diff --git a/drivers/iommu/irq_remapping.h b/drivers/iommu/irq_remapping.h
index ecb6376..90c4dae 100644
--- a/drivers/iommu/irq_remapping.h
+++ b/drivers/iommu/irq_remapping.h
@@ -32,6 +32,7 @@ struct pci_dev;
 struct msi_msg;
 
 extern int disable_irq_remap;
+extern int irq_remap_broken;
 extern int disable_sourceid_checking;
 extern int no_x2apic_optout;
 extern int irq_remapping_enabled;
@@ -89,6 +90,7 @@ extern struct irq_remap_ops amd_iommu_irq_ops;
 
 #define irq_remapping_enabled 0
 #define disable_irq_remap     1
+#define irq_remap_broken      0
 
 #endif /* CONFIG_IRQ_REMAP */
 
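For readers following the patch, the control flow it adds can be modeled roughly in user space. This is only a sketch under stated assumptions, not kernel code: struct fake_pci_dev, struct quirk, the quirks[] table and run_early_quirks() are hypothetical stand-ins for the kernel's early PCI scan, 0x8086 is the numeric value of PCI_VENDOR_ID_INTEL, and the class-code match and the other checks done by the real intel_irq_remapping_supported() are left out.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for a device seen by the early PCI scan. */
struct fake_pci_dev {
	uint16_t vendor;
	uint16_t device;
	uint8_t revision;
};

/* Models the irq_remap_broken flag introduced by the patch. */
static int irq_remap_broken;

static void set_irq_remapping_broken(void)
{
	irq_remap_broken = 1;
}

/* Same revision test as intel_remapping_check() in the patch. */
static void intel_remapping_check(const struct fake_pci_dev *dev)
{
	if (dev->revision == 0x13)
		set_irq_remapping_broken();
}

/* Simplified quirk entry: vendor and device ID only (assumption). */
struct quirk {
	uint16_t vendor;
	uint16_t device;
	void (*fn)(const struct fake_pci_dev *dev);
};

static const struct quirk quirks[] = {
	{ 0x8086, 0x3403, intel_remapping_check },
	{ 0x8086, 0x3406, intel_remapping_check },
};

/* Run every quirk whose vendor/device pair matches the device. */
static void run_early_quirks(const struct fake_pci_dev *dev)
{
	size_t i;

	for (i = 0; i < sizeof(quirks) / sizeof(quirks[0]); i++) {
		if (quirks[i].vendor == dev->vendor &&
		    quirks[i].device == dev->device)
			quirks[i].fn(dev);
	}
}

/* Reports remapping as unsupported once the broken flag is set. */
static int irq_remapping_supported(void)
{
	if (irq_remap_broken) {
		fprintf(stderr, "interrupt remapping disabled: chipset erratum\n");
		return 0;
	}
	return 1;
}
```

In this model, passing a device with vendor 0x8086, device 0x3403 or 0x3406 and revision 0x13 to run_early_quirks() makes a later irq_remapping_supported() call return 0. Any other vendor, device or revision leaves the flag untouched.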