diff mbox

PCI: Fix pcibios_update_irq misuse of irq number

Message ID 1422456683-797-1-git-send-email-marc.zyngier@arm.com (mailing list archive)
State New, archived
Delegated to: Bjorn Helgaas
Headers show

Commit Message

Marc Zyngier Jan. 28, 2015, 2:51 p.m. UTC
pcibios_update_irq writes an irq number into the config space
of a given PCI device, but ignores the fact that this number
is a virtual interrupt number, which might be a very different
value from what the underlying hardware is using.

The obvious fix is to fetch the HW interrupt number from the
corresponding irq_data structure. This is slightly complicated
by the fact that this interrupt might be services by a stacked
domain.

This has been tested on KVM with kvmtool.

Reported-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Tested-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 drivers/pci/setup-irq.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

Comments

Jiang Liu Jan. 28, 2015, 3:21 p.m. UTC | #1
On 2015/1/28 22:51, Marc Zyngier wrote:
> pcibios_update_irq writes an irq number into the config space
> of a given PCI device, but ignores the fact that this number
> is a virtual interrupt number, which might be a very different
> value from what the underlying hardware is using.
> 
> The obvious fix is to fetch the HW interrupt number from the
> corresponding irq_data structure. This is slightly complicated
> by the fact that this interrupt might be services by a stacked
> domain.
> 
> This has been tested on KVM with kvmtool.
> 
> Reported-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
> Tested-by: Andre Przywara <andre.przywara@arm.com>
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> ---
>  drivers/pci/setup-irq.c | 12 ++++++++++--
>  1 file changed, 10 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/pci/setup-irq.c b/drivers/pci/setup-irq.c
> index 4e2d595..828cbc9 100644
> --- a/drivers/pci/setup-irq.c
> +++ b/drivers/pci/setup-irq.c
> @@ -15,11 +15,19 @@
>  #include <linux/errno.h>
>  #include <linux/ioport.h>
>  #include <linux/cache.h>
> +#include <linux/irq.h>
>  
>  void __weak pcibios_update_irq(struct pci_dev *dev, int irq)
>  {
> -	dev_dbg(&dev->dev, "assigning IRQ %02d\n", irq);
> -	pci_write_config_byte(dev, PCI_INTERRUPT_LINE, irq);
> +	struct irq_data *d;
> +
> +	d = irq_get_irq_data(irq);
> +#ifdef CONFIG_IRQ_DOMAIN_HIERARCHY
> +	while (d->parent_data)
> +		d = d->parent_data;
> +#endif
> +	dev_dbg(&dev->dev, "assigning IRQ %02ld\n", d->hwirq);
> +	pci_write_config_byte(dev, PCI_INTERRUPT_LINE, d->hwirq);
>  }
Hi Mark,
	Instead of modifying the common version, how about
implementing an arch specific version? Arch may have different
way to determine the irq number. Above implementation doesn't
work with x86, for example.
Regards!
Gerry

>  
>  static void pdev_fixup_irq(struct pci_dev *dev,
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bjorn Helgaas Jan. 28, 2015, 3:43 p.m. UTC | #2
On Wed, Jan 28, 2015 at 9:27 AM, Marc Zyngier <marc.zyngier@arm.com> wrote:
> Hi Gerry,
>
> On 28/01/15 15:21, Jiang Liu wrote:
>>
>>
>> On 2015/1/28 22:51, Marc Zyngier wrote:
>>> pcibios_update_irq writes an irq number into the config space
>>> of a given PCI device, but ignores the fact that this number
>>> is a virtual interrupt number, which might be a very different
>>> value from what the underlying hardware is using.
>>>
>>> The obvious fix is to fetch the HW interrupt number from the
>>> corresponding irq_data structure. This is slightly complicated
>>> by the fact that this interrupt might be services by a stacked
>>> domain.
>>>
>>> This has been tested on KVM with kvmtool.
>>>
>>> Reported-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
>>> Tested-by: Andre Przywara <andre.przywara@arm.com>
>>> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
>>> ---
>>>  drivers/pci/setup-irq.c | 12 ++++++++++--
>>>  1 file changed, 10 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/pci/setup-irq.c b/drivers/pci/setup-irq.c
>>> index 4e2d595..828cbc9 100644
>>> --- a/drivers/pci/setup-irq.c
>>> +++ b/drivers/pci/setup-irq.c
>>> @@ -15,11 +15,19 @@
>>>  #include <linux/errno.h>
>>>  #include <linux/ioport.h>
>>>  #include <linux/cache.h>
>>> +#include <linux/irq.h>
>>>
>>>  void __weak pcibios_update_irq(struct pci_dev *dev, int irq)
>>>  {
>>> -    dev_dbg(&dev->dev, "assigning IRQ %02d\n", irq);
>>> -    pci_write_config_byte(dev, PCI_INTERRUPT_LINE, irq);
>>> +    struct irq_data *d;
>>> +
>>> +    d = irq_get_irq_data(irq);
>>> +#ifdef CONFIG_IRQ_DOMAIN_HIERARCHY
>>> +    while (d->parent_data)
>>> +            d = d->parent_data;
>>> +#endif
>>> +    dev_dbg(&dev->dev, "assigning IRQ %02ld\n", d->hwirq);
>>> +    pci_write_config_byte(dev, PCI_INTERRUPT_LINE, d->hwirq);
>>>  }
>> Hi Mark,
>>       Instead of modifying the common version, how about
>> implementing an arch specific version? Arch may have different
>> way to determine the irq number. Above implementation doesn't
>> work with x86, for example.
>
> If you look at the Makefile, this file is used on:
>
> obj-$(CONFIG_ALPHA) += setup-irq.o
> obj-$(CONFIG_ARM) += setup-irq.o
> obj-$(CONFIG_UNICORE32) += setup-irq.o
> obj-$(CONFIG_SUPERH) += setup-irq.o
> obj-$(CONFIG_MIPS) += setup-irq.o
> obj-$(CONFIG_TILE) += setup-irq.o
> obj-$(CONFIG_SPARC_LEON) += setup-irq.o
> obj-$(CONFIG_M68K) += setup-irq.o
>
> x86 doesn't use that at all.

Since you're looking at this, Marc, do you see a nice way to get rid
of these arch dependencies in the Makefile and unify this a bit?  We
still have this pci_fixup_irqs() ugliness -- it's not really
arch-specific at all, but it's called from arch code, and it uses
for_each_pci_dev(), which obviously only works for things present at
boot and not for things hot-added later.

Bjorn
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bjorn Helgaas Feb. 2, 2015, 3:57 p.m. UTC | #3
On Wed, Jan 28, 2015 at 8:51 AM, Marc Zyngier <marc.zyngier@arm.com> wrote:
> pcibios_update_irq writes an irq number into the config space
> of a given PCI device, but ignores the fact that this number
> is a virtual interrupt number, which might be a very different
> value from what the underlying hardware is using.
>
> The obvious fix is to fetch the HW interrupt number from the
> corresponding irq_data structure. This is slightly complicated
> by the fact that this interrupt might be services by a stacked
> domain.
>
> This has been tested on KVM with kvmtool.
>
> Reported-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
> Tested-by: Andre Przywara <andre.przywara@arm.com>
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>

Jiang, are you OK with this patch as-is now, since it isn't used on x86?

Marc, Lorenzo, I assume this actually fixes a bug.  Can we get any
more details about what happens when you hit the bug, and how you
reproduced it (what platform, driver, etc.)?

Bjorn

> ---
>  drivers/pci/setup-irq.c | 12 ++++++++++--
>  1 file changed, 10 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/pci/setup-irq.c b/drivers/pci/setup-irq.c
> index 4e2d595..828cbc9 100644
> --- a/drivers/pci/setup-irq.c
> +++ b/drivers/pci/setup-irq.c
> @@ -15,11 +15,19 @@
>  #include <linux/errno.h>
>  #include <linux/ioport.h>
>  #include <linux/cache.h>
> +#include <linux/irq.h>
>
>  void __weak pcibios_update_irq(struct pci_dev *dev, int irq)
>  {
> -       dev_dbg(&dev->dev, "assigning IRQ %02d\n", irq);
> -       pci_write_config_byte(dev, PCI_INTERRUPT_LINE, irq);
> +       struct irq_data *d;
> +
> +       d = irq_get_irq_data(irq);
> +#ifdef CONFIG_IRQ_DOMAIN_HIERARCHY
> +       while (d->parent_data)
> +               d = d->parent_data;
> +#endif
> +       dev_dbg(&dev->dev, "assigning IRQ %02ld\n", d->hwirq);
> +       pci_write_config_byte(dev, PCI_INTERRUPT_LINE, d->hwirq);
>  }
>
>  static void pdev_fixup_irq(struct pci_dev *dev,
> --
> 2.1.4
>
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jiang Liu Feb. 2, 2015, 4:06 p.m. UTC | #4
On 2015/2/2 23:57, Bjorn Helgaas wrote:
> On Wed, Jan 28, 2015 at 8:51 AM, Marc Zyngier <marc.zyngier@arm.com> wrote:
>> pcibios_update_irq writes an irq number into the config space
>> of a given PCI device, but ignores the fact that this number
>> is a virtual interrupt number, which might be a very different
>> value from what the underlying hardware is using.
>>
>> The obvious fix is to fetch the HW interrupt number from the
>> corresponding irq_data structure. This is slightly complicated
>> by the fact that this interrupt might be services by a stacked
>> domain.
>>
>> This has been tested on KVM with kvmtool.
>>
>> Reported-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
>> Tested-by: Andre Przywara <andre.przywara@arm.com>
>> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> 
> Jiang, are you OK with this patch as-is now, since it isn't used on x86?
Sure, I'm OK. I missed the point that it's not used on x86 at all
in previous review.

> 
> Marc, Lorenzo, I assume this actually fixes a bug.  Can we get any
> more details about what happens when you hit the bug, and how you
> reproduced it (what platform, driver, etc.)?
> 
> Bjorn
> 
>> ---
>>  drivers/pci/setup-irq.c | 12 ++++++++++--
>>  1 file changed, 10 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/pci/setup-irq.c b/drivers/pci/setup-irq.c
>> index 4e2d595..828cbc9 100644
>> --- a/drivers/pci/setup-irq.c
>> +++ b/drivers/pci/setup-irq.c
>> @@ -15,11 +15,19 @@
>>  #include <linux/errno.h>
>>  #include <linux/ioport.h>
>>  #include <linux/cache.h>
>> +#include <linux/irq.h>
>>
>>  void __weak pcibios_update_irq(struct pci_dev *dev, int irq)
>>  {
>> -       dev_dbg(&dev->dev, "assigning IRQ %02d\n", irq);
>> -       pci_write_config_byte(dev, PCI_INTERRUPT_LINE, irq);
>> +       struct irq_data *d;
>> +
>> +       d = irq_get_irq_data(irq);
>> +#ifdef CONFIG_IRQ_DOMAIN_HIERARCHY
>> +       while (d->parent_data)
>> +               d = d->parent_data;
>> +#endif
>> +       dev_dbg(&dev->dev, "assigning IRQ %02ld\n", d->hwirq);
>> +       pci_write_config_byte(dev, PCI_INTERRUPT_LINE, d->hwirq);
>>  }
>>
>>  static void pdev_fixup_irq(struct pci_dev *dev,
>> --
>> 2.1.4
>>
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Marc Zyngier Feb. 2, 2015, 4:15 p.m. UTC | #5
On 28/01/15 15:43, Bjorn Helgaas wrote:
> On Wed, Jan 28, 2015 at 9:27 AM, Marc Zyngier <marc.zyngier@arm.com> wrote:
>> Hi Gerry,
>>
>> On 28/01/15 15:21, Jiang Liu wrote:
>>>
>>>
>>> On 2015/1/28 22:51, Marc Zyngier wrote:
>>>> pcibios_update_irq writes an irq number into the config space
>>>> of a given PCI device, but ignores the fact that this number
>>>> is a virtual interrupt number, which might be a very different
>>>> value from what the underlying hardware is using.
>>>>
>>>> The obvious fix is to fetch the HW interrupt number from the
>>>> corresponding irq_data structure. This is slightly complicated
>>>> by the fact that this interrupt might be services by a stacked
>>>> domain.
>>>>
>>>> This has been tested on KVM with kvmtool.
>>>>
>>>> Reported-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
>>>> Tested-by: Andre Przywara <andre.przywara@arm.com>
>>>> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
>>>> ---
>>>>  drivers/pci/setup-irq.c | 12 ++++++++++--
>>>>  1 file changed, 10 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/drivers/pci/setup-irq.c b/drivers/pci/setup-irq.c
>>>> index 4e2d595..828cbc9 100644
>>>> --- a/drivers/pci/setup-irq.c
>>>> +++ b/drivers/pci/setup-irq.c
>>>> @@ -15,11 +15,19 @@
>>>>  #include <linux/errno.h>
>>>>  #include <linux/ioport.h>
>>>>  #include <linux/cache.h>
>>>> +#include <linux/irq.h>
>>>>
>>>>  void __weak pcibios_update_irq(struct pci_dev *dev, int irq)
>>>>  {
>>>> -    dev_dbg(&dev->dev, "assigning IRQ %02d\n", irq);
>>>> -    pci_write_config_byte(dev, PCI_INTERRUPT_LINE, irq);
>>>> +    struct irq_data *d;
>>>> +
>>>> +    d = irq_get_irq_data(irq);
>>>> +#ifdef CONFIG_IRQ_DOMAIN_HIERARCHY
>>>> +    while (d->parent_data)
>>>> +            d = d->parent_data;
>>>> +#endif
>>>> +    dev_dbg(&dev->dev, "assigning IRQ %02ld\n", d->hwirq);
>>>> +    pci_write_config_byte(dev, PCI_INTERRUPT_LINE, d->hwirq);
>>>>  }
>>> Hi Mark,
>>>       Instead of modifying the common version, how about
>>> implementing an arch specific version? Arch may have different
>>> way to determine the irq number. Above implementation doesn't
>>> work with x86, for example.
>>
>> If you look at the Makefile, this file is used on:
>>
>> obj-$(CONFIG_ALPHA) += setup-irq.o
>> obj-$(CONFIG_ARM) += setup-irq.o
>> obj-$(CONFIG_UNICORE32) += setup-irq.o
>> obj-$(CONFIG_SUPERH) += setup-irq.o
>> obj-$(CONFIG_MIPS) += setup-irq.o
>> obj-$(CONFIG_TILE) += setup-irq.o
>> obj-$(CONFIG_SPARC_LEON) += setup-irq.o
>> obj-$(CONFIG_M68K) += setup-irq.o
>>
>> x86 doesn't use that at all.
> 
> Since you're looking at this, Marc, do you see a nice way to get rid
> of these arch dependencies in the Makefile and unify this a bit?  We
> still have this pci_fixup_irqs() ugliness -- it's not really
> arch-specific at all, but it's called from arch code, and it uses
> for_each_pci_dev(), which obviously only works for things present at
> boot and not for things hot-added later.

I can have a look at this in the next cycle - I'm a bit strapped for
time just now.

As for for_each_pci_dev(), I'm not completely clear about what it should
be replaced for. Do we have some form of notifier for this?

Thanks,

	M.
Bjorn Helgaas Feb. 2, 2015, 4:22 p.m. UTC | #6
On Mon, Feb 2, 2015 at 10:15 AM, Marc Zyngier <marc.zyngier@arm.com> wrote:
> On 28/01/15 15:43, Bjorn Helgaas wrote:
>> On Wed, Jan 28, 2015 at 9:27 AM, Marc Zyngier <marc.zyngier@arm.com> wrote:
>>> Hi Gerry,
>>>
>>> On 28/01/15 15:21, Jiang Liu wrote:
>>>>
>>>>
>>>> On 2015/1/28 22:51, Marc Zyngier wrote:
>>>>> pcibios_update_irq writes an irq number into the config space
>>>>> of a given PCI device, but ignores the fact that this number
>>>>> is a virtual interrupt number, which might be a very different
>>>>> value from what the underlying hardware is using.
>>>>>
>>>>> The obvious fix is to fetch the HW interrupt number from the
>>>>> corresponding irq_data structure. This is slightly complicated
>>>>> by the fact that this interrupt might be services by a stacked
>>>>> domain.
>>>>>
>>>>> This has been tested on KVM with kvmtool.
>>>>>
>>>>> Reported-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
>>>>> Tested-by: Andre Przywara <andre.przywara@arm.com>
>>>>> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
>>>>> ---
>>>>>  drivers/pci/setup-irq.c | 12 ++++++++++--
>>>>>  1 file changed, 10 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/drivers/pci/setup-irq.c b/drivers/pci/setup-irq.c
>>>>> index 4e2d595..828cbc9 100644
>>>>> --- a/drivers/pci/setup-irq.c
>>>>> +++ b/drivers/pci/setup-irq.c
>>>>> @@ -15,11 +15,19 @@
>>>>>  #include <linux/errno.h>
>>>>>  #include <linux/ioport.h>
>>>>>  #include <linux/cache.h>
>>>>> +#include <linux/irq.h>
>>>>>
>>>>>  void __weak pcibios_update_irq(struct pci_dev *dev, int irq)
>>>>>  {
>>>>> -    dev_dbg(&dev->dev, "assigning IRQ %02d\n", irq);
>>>>> -    pci_write_config_byte(dev, PCI_INTERRUPT_LINE, irq);
>>>>> +    struct irq_data *d;
>>>>> +
>>>>> +    d = irq_get_irq_data(irq);
>>>>> +#ifdef CONFIG_IRQ_DOMAIN_HIERARCHY
>>>>> +    while (d->parent_data)
>>>>> +            d = d->parent_data;
>>>>> +#endif
>>>>> +    dev_dbg(&dev->dev, "assigning IRQ %02ld\n", d->hwirq);
>>>>> +    pci_write_config_byte(dev, PCI_INTERRUPT_LINE, d->hwirq);
>>>>>  }
>>>> Hi Mark,
>>>>       Instead of modifying the common version, how about
>>>> implementing an arch specific version? Arch may have different
>>>> way to determine the irq number. Above implementation doesn't
>>>> work with x86, for example.
>>>
>>> If you look at the Makefile, this file is used on:
>>>
>>> obj-$(CONFIG_ALPHA) += setup-irq.o
>>> obj-$(CONFIG_ARM) += setup-irq.o
>>> obj-$(CONFIG_UNICORE32) += setup-irq.o
>>> obj-$(CONFIG_SUPERH) += setup-irq.o
>>> obj-$(CONFIG_MIPS) += setup-irq.o
>>> obj-$(CONFIG_TILE) += setup-irq.o
>>> obj-$(CONFIG_SPARC_LEON) += setup-irq.o
>>> obj-$(CONFIG_M68K) += setup-irq.o
>>>
>>> x86 doesn't use that at all.
>>
>> Since you're looking at this, Marc, do you see a nice way to get rid
>> of these arch dependencies in the Makefile and unify this a bit?  We
>> still have this pci_fixup_irqs() ugliness -- it's not really
>> arch-specific at all, but it's called from arch code, and it uses
>> for_each_pci_dev(), which obviously only works for things present at
>> boot and not for things hot-added later.
>
> I can have a look at this in the next cycle - I'm a bit strapped for
> time just now.
>
> As for for_each_pci_dev(), I'm not completely clear about what it should
> be replaced for. Do we have some form of notifier for this?

I think this should be done somewhere in the enumeration path, e.g.,
maybe in pci_device_add().  Then we shouldn't need a
for_each_pci_dev() loop or a notifier at all.

Bjorn
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Marc Zyngier Feb. 2, 2015, 4:23 p.m. UTC | #7
On 02/02/15 15:57, Bjorn Helgaas wrote:
> On Wed, Jan 28, 2015 at 8:51 AM, Marc Zyngier <marc.zyngier@arm.com> wrote:
>> pcibios_update_irq writes an irq number into the config space
>> of a given PCI device, but ignores the fact that this number
>> is a virtual interrupt number, which might be a very different
>> value from what the underlying hardware is using.
>>
>> The obvious fix is to fetch the HW interrupt number from the
>> corresponding irq_data structure. This is slightly complicated
>> by the fact that this interrupt might be services by a stacked
>> domain.
>>
>> This has been tested on KVM with kvmtool.
>>
>> Reported-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
>> Tested-by: Andre Przywara <andre.przywara@arm.com>
>> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> 
> Jiang, are you OK with this patch as-is now, since it isn't used on x86?
> 
> Marc, Lorenzo, I assume this actually fixes a bug.  Can we get any
> more details about what happens when you hit the bug, and how you
> reproduced it (what platform, driver, etc.)?

It definitely fixes a bug. This has been found by running a KVM guest
using kvmtool PCI emulation, where the following things happen:

- Guest programs a virtual (bogus) interrupt number in the PCI device
config space (virtio disk in this case)
- kvmtool uses that interrupt number as is, without any other form of
validation
- Either the injection fails (because the interrupt is out of the range
of the virtual interrupt controller) -> virtio PCI device goes dead
- or the injection succeeds because this is a valid interrupt number,
but signals an unrelated peripheral -> virtio PCI device goes dead.

This can be trivially reproduced on any ARM PCI system that requires
legacy interrupts (i.e. no MSI support), and that uses a GIC interrupt
controller. Doing it in a VM is just much more convenient.

Hope this helps,

	M.
Russell King - ARM Linux Feb. 2, 2015, 4:33 p.m. UTC | #8
On Wed, Jan 28, 2015 at 02:51:23PM +0000, Marc Zyngier wrote:
>  void __weak pcibios_update_irq(struct pci_dev *dev, int irq)
>  {
> -	dev_dbg(&dev->dev, "assigning IRQ %02d\n", irq);
> -	pci_write_config_byte(dev, PCI_INTERRUPT_LINE, irq);
> +	struct irq_data *d;
> +
> +	d = irq_get_irq_data(irq);
> +#ifdef CONFIG_IRQ_DOMAIN_HIERARCHY
> +	while (d->parent_data)
> +		d = d->parent_data;
> +#endif
> +	dev_dbg(&dev->dev, "assigning IRQ %02ld\n", d->hwirq);
> +	pci_write_config_byte(dev, PCI_INTERRUPT_LINE, d->hwirq);

I'm really not convinced about this being the correct thing to do.

Let's take an older ARM system, such as a Footbridge based system with a
PCI southbridge.

Such a system has IRQs 0-15 as the PCI southbridge ISA interrupts.  Then
there are four PCI interrupts provided by the on-board Footbridge.

Right now, PCI devices are programmed with the OS specific interrupt
number - eg:

00:06.1 IDE interface: Contaq Microsystems 82c693 (prog-if 80 [Master])
        Flags: medium devsel, IRQ 14
30: 00 00 00 00 00 00 00 00 00 00 00 00 0e 01 00 00

00:06.2 IDE interface: Contaq Microsystems 82c693 (prog-if 00 [])
        Flags: medium devsel, IRQ 15
30: 00 00 00 00 00 00 00 00 00 00 00 00 0f 02 00 00

00:06.3 USB Controller: Contaq Microsystems 82c693 (prog-if 10 [OHCI])
        Flags: medium devsel, IRQ 12
30: 00 00 00 00 00 00 00 00 00 00 00 00 0c 01 00 00

00:07.0 Mass storage controller: Integrated Technology Express, Inc. IT/ITE8212
Dual channel ATA RAID controller (rev 13)
        Flags: bus master, 66MHz, medium devsel, latency 0, IRQ 24
30: 00 00 02 04 80 00 00 00 00 00 00 00 18 01 08 08

00:08.0 Ethernet controller: 3Com Corporation 3c905C-TX/TX-M [Tornado] (rev 74)
        Flags: bus master, medium devsel, latency 32, IRQ 22
30: 00 00 06 04 dc 00 00 00 00 00 00 00 16 01 0a 0a

00:09.0 VGA compatible controller: S3 Inc. 86c775/86c785 [Trio 64V2/DX or /GX] (rev 16) (prog-if 00 [VGA controller])
        Flags: medium devsel, IRQ 21
30: 00 00 00 0c 00 00 00 00 00 00 00 00 15 01 00 00

What your change would mean is that the IRQs currently being programmed
>= 16 would be programmed into with numbers with 16 removed from them.
This means that legacy stuff (eg on the Southbridge which really do signal
via the ISA IRQ controller) end up using the same number range as those
which take PCI specific IRQs.

This surely is not sane.
Arnd Bergmann Feb. 2, 2015, 5:02 p.m. UTC | #9
On Wednesday 28 January 2015 14:51:23 Marc Zyngier wrote:
>  void __weak pcibios_update_irq(struct pci_dev *dev, int irq)
>  {
> -       dev_dbg(&dev->dev, "assigning IRQ %02d\n", irq);
> -       pci_write_config_byte(dev, PCI_INTERRUPT_LINE, irq);
> +       struct irq_data *d;
> +
> +       d = irq_get_irq_data(irq);
> +#ifdef CONFIG_IRQ_DOMAIN_HIERARCHY
> +       while (d->parent_data)
> +               d = d->parent_data;
> +#endif
> +       dev_dbg(&dev->dev, "assigning IRQ %02ld\n", d->hwirq);
> +       pci_write_config_byte(dev, PCI_INTERRUPT_LINE, d->hwirq);
>  }

I'm puzzled by this. Why is it even important what we write into
the config space? Isn't this just an interface between BIOS and
OS for systems that rely on the interrupt numbers to be statically
assigned before boot?

	Arnd
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Marc Zyngier Feb. 2, 2015, 6:08 p.m. UTC | #10
Hi Russell,

On 02/02/15 16:33, Russell King - ARM Linux wrote:
> On Wed, Jan 28, 2015 at 02:51:23PM +0000, Marc Zyngier wrote:
>>  void __weak pcibios_update_irq(struct pci_dev *dev, int irq)
>>  {
>> -	dev_dbg(&dev->dev, "assigning IRQ %02d\n", irq);
>> -	pci_write_config_byte(dev, PCI_INTERRUPT_LINE, irq);
>> +	struct irq_data *d;
>> +
>> +	d = irq_get_irq_data(irq);
>> +#ifdef CONFIG_IRQ_DOMAIN_HIERARCHY
>> +	while (d->parent_data)
>> +		d = d->parent_data;
>> +#endif
>> +	dev_dbg(&dev->dev, "assigning IRQ %02ld\n", d->hwirq);
>> +	pci_write_config_byte(dev, PCI_INTERRUPT_LINE, d->hwirq);
> 
> I'm really not convinced about this being the correct thing to do.
> 
> Let's take an older ARM system, such as a Footbridge based system with a
> PCI southbridge.
> 
> Such a system has IRQs 0-15 as the PCI southbridge ISA interrupts.  Then
> there are four PCI interrupts provided by the on-board Footbridge.
> 
> Right now, PCI devices are programmed with the OS specific interrupt
> number - eg:
> 
> 00:06.1 IDE interface: Contaq Microsystems 82c693 (prog-if 80 [Master])
>         Flags: medium devsel, IRQ 14
> 30: 00 00 00 00 00 00 00 00 00 00 00 00 0e 01 00 00
> 
> 00:06.2 IDE interface: Contaq Microsystems 82c693 (prog-if 00 [])
>         Flags: medium devsel, IRQ 15
> 30: 00 00 00 00 00 00 00 00 00 00 00 00 0f 02 00 00
> 
> 00:06.3 USB Controller: Contaq Microsystems 82c693 (prog-if 10 [OHCI])
>         Flags: medium devsel, IRQ 12
> 30: 00 00 00 00 00 00 00 00 00 00 00 00 0c 01 00 00
> 
> 00:07.0 Mass storage controller: Integrated Technology Express, Inc. IT/ITE8212
> Dual channel ATA RAID controller (rev 13)
>         Flags: bus master, 66MHz, medium devsel, latency 0, IRQ 24
> 30: 00 00 02 04 80 00 00 00 00 00 00 00 18 01 08 08
> 
> 00:08.0 Ethernet controller: 3Com Corporation 3c905C-TX/TX-M [Tornado] (rev 74)
>         Flags: bus master, medium devsel, latency 32, IRQ 22
> 30: 00 00 06 04 dc 00 00 00 00 00 00 00 16 01 0a 0a
> 
> 00:09.0 VGA compatible controller: S3 Inc. 86c775/86c785 [Trio 64V2/DX or /GX] (rev 16) (prog-if 00 [VGA controller])
>         Flags: medium devsel, IRQ 21
> 30: 00 00 00 0c 00 00 00 00 00 00 00 00 15 01 00 00
> 
> What your change would mean is that the IRQs currently being programmed
>> = 16 would be programmed into with numbers with 16 removed from them.
> This means that legacy stuff (eg on the Southbridge which really do signal
> via the ISA IRQ controller) end up using the same number range as those
> which take PCI specific IRQs.
> 
> This surely is not sane.

I suppose this is ebsa285? I must confess I don't see how to distinguish
the two cases (the GIC case uses a purely virtual number, and the
footbridge case uses something that seems to be physical).

A very easy fix would be to entirely contain this change within #ifdef
CONFIG_IRQ_DOMAIN_HIERARCHY/#endif, but that doesn't fill me with
confidence.

What I don't get is how the hwirq field is set in this case. It probably
isn't very useful (as there is no domain lookup), so I would have hoped
to see irq == hwirq. Obviously, this is not the case. What am I missing?

Thanks,

	M.
Russell King - ARM Linux Feb. 2, 2015, 6:20 p.m. UTC | #11
On Mon, Feb 02, 2015 at 06:08:17PM +0000, Marc Zyngier wrote:
> Hi Russell,
> 
> On 02/02/15 16:33, Russell King - ARM Linux wrote:
> > What your change would mean is that the IRQs currently being programmed
> >> = 16 would be programmed into with numbers with 16 removed from them.
> > This means that legacy stuff (eg on the Southbridge which really do signal
> > via the ISA IRQ controller) end up using the same number range as those
> > which take PCI specific IRQs.
> > 
> > This surely is not sane.
> 
> I suppose this is ebsa285? I must confess I don't see how to distinguish
> the two cases (the GIC case uses a purely virtual number, and the
> footbridge case uses something that seems to be physical).

Yep.

> A very easy fix would be to entirely contain this change within #ifdef
> CONFIG_IRQ_DOMAIN_HIERARCHY/#endif, but that doesn't fill me with
> confidence.
> 
> What I don't get is how the hwirq field is set in this case. It probably
> isn't very useful (as there is no domain lookup), so I would have hoped
> to see irq == hwirq. Obviously, this is not the case. What am I missing?

Well, it depends how this register is to be interpreted.  The PCI specs
say that it's used to communicate the interrupt line routing information
from POST to device drivers and operating systems.  "The value in this
register tells which input of the system interrupt controller(s) the
device's interrupt pin is connected to."  Note the plural there - which
imples that any per-interrupt controller numbering scheme is quite
certainly wrong.

It also implies that there is a known, shared numberspace between POST and
OS implementations on a platform which is used by PCI devices to describe
how the PCI interrupts are wired up.
Marc Zyngier Feb. 3, 2015, 10:38 a.m. UTC | #12
On 02/02/15 17:02, Arnd Bergmann wrote:
> On Wednesday 28 January 2015 14:51:23 Marc Zyngier wrote:
>>  void __weak pcibios_update_irq(struct pci_dev *dev, int irq)
>>  {
>> -       dev_dbg(&dev->dev, "assigning IRQ %02d\n", irq);
>> -       pci_write_config_byte(dev, PCI_INTERRUPT_LINE, irq);
>> +       struct irq_data *d;
>> +
>> +       d = irq_get_irq_data(irq);
>> +#ifdef CONFIG_IRQ_DOMAIN_HIERARCHY
>> +       while (d->parent_data)
>> +               d = d->parent_data;
>> +#endif
>> +       dev_dbg(&dev->dev, "assigning IRQ %02ld\n", d->hwirq);
>> +       pci_write_config_byte(dev, PCI_INTERRUPT_LINE, d->hwirq);
>>  }
> 
> I'm puzzled by this. Why is it even important what we write into
> the config space? Isn't this just an interface between BIOS and
> OS for systems that rely on the interrupt numbers to be statically
> assigned before boot?

That's exactly what I thought until Lorenzo reported kvmtool falling
over because of this write. Obviously, some platforms must actually
require this (possibly for bridges that are not known by the firmware).

Entirely removing that code solves my problem too, but that'd cannot be
the right solution...

Thanks,

	M.
Arnd Bergmann Feb. 3, 2015, 11:31 a.m. UTC | #13
On Tuesday 03 February 2015 10:38:25 Marc Zyngier wrote:
> 
> That's exactly what I thought until Lorenzo reported kvmtool falling
> over because of this write. Obviously, some platforms must actually
> require this (possibly for bridges that are not known by the firmware).

This sounds much like a bug in kvmtool.
 
> Entirely removing that code solves my problem too, but that'd cannot be
> the right solution...

The comment in pdev_fixup_irq() says 

        /* Always tell the device, so the driver knows what is
           the real IRQ to use; the device does not use it. */

which I read to mean that there are drivers that incorrectly use
'pci_read_config_byte(dev, PCI_INTERRUPT_LINE)' as the number
they pass into request_irq, rather than using dev->irq.
However, this means that your patch is actually wrong, because
what the driver cares about is the virtual irq number (which
request_irq expects), not the number relative to some interrupt
controller.

There is another comment in include/linux/pci.h that states
       /*
        * Instead of touching interrupt line and base address registers
        * directly, use the values stored here. They might be different!
        */
       unsigned int    irq;

so apparently this has been a cause for problems in the past,
and drivers that rely on the number are already broken.

I also checked ancient kernel versions from the 2.1 days when the
code was first added. And indeed at the time drivers used to read
the word, but now none of them use the number for anything real,
they were all fixed during linux-2.2 at the latest.

	Arnd
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Marc Zyngier Feb. 3, 2015, 11:37 a.m. UTC | #14
On 03/02/15 11:31, Arnd Bergmann wrote:
> On Tuesday 03 February 2015 10:38:25 Marc Zyngier wrote:
>>
>> That's exactly what I thought until Lorenzo reported kvmtool falling
>> over because of this write. Obviously, some platforms must actually
>> require this (possibly for bridges that are not known by the firmware).
> 
> This sounds much like a bug in kvmtool.

Lorenzo and I just came to a similar conclusion, given that the HW
should never use that information.

>> Entirely removing that code solves my problem too, but that'd cannot be
>> the right solution...
> 
> The comment in pdev_fixup_irq() says 
> 
>         /* Always tell the device, so the driver knows what is
>            the real IRQ to use; the device does not use it. */
> 
> which I read to mean that there are drivers that incorrectly use
> 'pci_read_config_byte(dev, PCI_INTERRUPT_LINE)' as the number
> they pass into request_irq, rather than using dev->irq.
> However, this means that your patch is actually wrong, because
> what the driver cares about is the virtual irq number (which
> request_irq expects), not the number relative to some interrupt
> controller.

Yes, I now realise that. That makes a lot more sense actually, because I
was getting very confused about how the HW should interpret that number.

Side question: In the probe-only case, should we still allow this write
to happen?

Thanks,

	M.
Arnd Bergmann Feb. 3, 2015, 12:57 p.m. UTC | #15
On Tuesday 03 February 2015 11:37:30 Marc Zyngier wrote:
> Side question: In the probe-only case, should we still allow this write
> to happen?

No, my understanding is that PCI_PROBE_ONLY precisely means that we
do not modify the config space and instead trust what is there to
be sensible.

	Arnd
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/pci/setup-irq.c b/drivers/pci/setup-irq.c
index 4e2d595..828cbc9 100644
--- a/drivers/pci/setup-irq.c
+++ b/drivers/pci/setup-irq.c
@@ -15,11 +15,19 @@ 
 #include <linux/errno.h>
 #include <linux/ioport.h>
 #include <linux/cache.h>
+#include <linux/irq.h>
 
 void __weak pcibios_update_irq(struct pci_dev *dev, int irq)
 {
-	dev_dbg(&dev->dev, "assigning IRQ %02d\n", irq);
-	pci_write_config_byte(dev, PCI_INTERRUPT_LINE, irq);
+	struct irq_data *d;
+
+	d = irq_get_irq_data(irq);
+#ifdef CONFIG_IRQ_DOMAIN_HIERARCHY
+	while (d->parent_data)
+		d = d->parent_data;
+#endif
+	dev_dbg(&dev->dev, "assigning IRQ %02ld\n", d->hwirq);
+	pci_write_config_byte(dev, PCI_INTERRUPT_LINE, d->hwirq);
 }
 
 static void pdev_fixup_irq(struct pci_dev *dev,