irqdomain: Fix NULL pointer dereference in irq_domain_free_irqs_parent

Message ID 1416531745-24661-1-git-send-email-suravee.suthikulpanit@amd.com (mailing list archive)
State New, archived

Commit Message

Suravee Suthikulpanit Nov. 21, 2014, 1:02 a.m. UTC
From: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>

This patch checks if the parent domain is NULL before recursively freeing
irqs in the parent domains.

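In this case, GICv2m frees irqs in its parent (GIC), whose free callback
is irq_domain_free_irqs_top(). GIC is the topmost domain, so its
domain->parent is NULL when the free path recurses once more. This fixes
the crash below: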

Unable to handle kernel NULL pointer dereference at virtual address 00000018
pgd = fffffe03c78c0000
[00000018] *pgd=00000083c8700003, *pud=00000083c8700003, *pmd=00000083c8700003, *pte=0000000000000000
Internal error: Oops: 96000007 [#1] SMP
Modules linked in: mlx4_core(-) rtc_efi efivarfs [last unloaded: mlx4_en]
CPU: 5 PID: 985 Comm: modprobe Not tainted 3.18.0-rc4-marc-v2m+ #223
task: fffffe03c20c0000 ti: fffffe03c1fb8000 task.ti: fffffe03c1fb8000
PC is at irq_domain_free_irqs_recursive+0x10/0x84
LR is at irq_domain_free_irqs_common+0x8c/0xa0
pc : [<fffffe00000efb2c>] lr : [<fffffe00000f028c>] pstate: 60000145
sp : fffffe03c1fbb9a0
x29: fffffe03c1fbb9a0 x28: fffffe03c1fb8000
x27: fffffe000092f000 x26: fffffe03c10eba00
...
Call trace:
[<fffffe00000efb2c>] irq_domain_free_irqs_recursive+0x10/0x84
[<fffffe00000f0288>] irq_domain_free_irqs_common+0x88/0xa0
[<fffffe00000f030c>] irq_domain_free_irqs_top+0x6c/0x84
[<fffffe00000efb40>] irq_domain_free_irqs_recursive+0x24/0x84
[<fffffe00000f0954>] irq_domain_free_irqs_parent+0x14/0x20
[<fffffe000042c4fc>] gicv2m_irq_domain_free+0x48/0x88
[<fffffe00000efb40>] irq_domain_free_irqs_recursive+0x24/0x84
[<fffffe00000f0288>] irq_domain_free_irqs_common+0x88/0xa0
[<fffffe00000f030c>] irq_domain_free_irqs_top+0x6c/0x84
[<fffffe00000f1a38>] msi_domain_free+0x74/0x8c
[<fffffe00000efb40>] irq_domain_free_irqs_recursive+0x24/0x84
[<fffffe00000f0898>] irq_domain_free_irqs+0x110/0x184
[<fffffe00000f2124>] msi_domain_free_irqs+0x28/0x4c
[<fffffe0000448194>] free_msi_irqs+0x90/0x1d8
[<fffffe0000449278>] pci_disable_msix+0x40/0x50

Signed-off-by: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
---
 kernel/irq/irqdomain.c | 3 +++
 1 file changed, 3 insertions(+)

Comments

Thomas Gleixner Nov. 21, 2014, 1:32 a.m. UTC | #1
On Thu, 20 Nov 2014, suravee.suthikulpanit@amd.com wrote:
> This patch checks if the parent domain is NULL before recursively freeing
> irqs in the parent domains.

Which is nonsense, because if the thing has not been allocated in the
first place, then it cannot explode in the free path magically, except
there is a missing check in the allocation path error handling.
 
And that's obviously not the case simply because this originates from:
> [<fffffe0000449278>] pci_disable_msix+0x40/0x50
 
Suravee, this is the last warning. I'm tired of your half-baked
patches which lack any explanation. Read back on my previous replies
to your mails for further explanation.

This is not a 'trial and error and hack enough nonsensical checks into
the code' commercial project.

This is core kernel code and requires proper explanation.

Thanks,

	tglx
Suravee Suthikulpanit Nov. 21, 2014, 2:08 a.m. UTC | #2
On 11/20/2014 07:32 PM, Thomas Gleixner wrote:
> On Thu, 20 Nov 2014, suravee.suthikulpanit@amd.com wrote:
>> This patch checks if the parent domain is NULL before recursively freeing
>> irqs in the parent domains.
>
> Which is nonsense, because if the thing has not been allocated in the
> first place, then it cannot explode in the free path magically, except
> there is a missing check in the allocation path error handling.
>
> And that's obviously not the case simply because this originates from:
>> [<fffffe0000449278>] pci_disable_msix+0x40/0x50
>

Thomas,

In this case, I have the following irq domain hierarchy:

[GIC] -- [GICv2m] -- [MSI]

which recursively calls the freeing functions:

The GIC domain currently sets struct irq_domain_ops.free() to:
   --> irq_domain_free_irqs_top()
     |--> irq_domain_free_irqs_common()
       |--> irq_domain_free_irqs_parent()
         |--> irq_domain_free_irqs_recursive()

and there is no check before the NULL domain->parent is passed to
irq_domain_free_irqs_recursive(), which causes the error.

Since the GIC is the topmost domain, it does not have a parent domain.
So, I'm not sure what is missing from the allocation path error
handling, as you mentioned.

Thanks,

Suravee
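
The call chain described above can be modelled in user space. The following is only a minimal sketch, not kernel code: struct model_domain and every model_* function are hypothetical stand-ins named after the functions in the trace, and the NULL parent is counted where the real irq_domain_free_irqs_recursive() would dereference it.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct irq_domain. */
struct model_domain {
	const char *name;
	struct model_domain *parent;		/* NULL for the topmost domain */
	void (*free)(struct model_domain *d);	/* models irq_domain_ops.free() */
};

static int null_parent_hits;	/* times a NULL domain reached the recursion */
static int domains_freed;	/* domains whose free callback was invoked */

/* Models irq_domain_free_irqs_recursive(). */
static void model_free_recursive(struct model_domain *d)
{
	if (!d) {
		/* The kernel code would dereference d here and oops. */
		null_parent_hits++;
		return;
	}
	domains_freed++;
	d->free(d);
}

/* Models irq_domain_free_irqs_parent() without a parent check. */
static void model_free_parent(struct model_domain *d)
{
	model_free_recursive(d->parent);
}

/* Models irq_domain_free_irqs_common() and irq_domain_free_irqs_top(). */
static void model_free_common(struct model_domain *d)
{
	model_free_parent(d);
}

static void model_free_top(struct model_domain *d)
{
	model_free_common(d);
}

/* Per the trace: GICv2m frees in its parent, MSI goes through _top(). */
static void model_gicv2m_free(struct model_domain *d)
{
	model_free_parent(d);
}

static void model_msi_free(struct model_domain *d)
{
	model_free_top(d);
}

/* Build [GIC] -- [GICv2m] -- [MSI] and free starting from MSI. */
static int model_run(void)
{
	struct model_domain gic = { "GIC", NULL, model_free_top };
	struct model_domain v2m = { "GICv2m", &gic, model_gicv2m_free };
	struct model_domain msi = { "MSI", &v2m, model_msi_free };

	null_parent_hits = 0;
	domains_freed = 0;
	model_free_recursive(&msi);
	return null_parent_hits;
}
```

In this model, model_run() visits all three domains and then reaches the recursion once more with the GIC's NULL parent, which corresponds to the last irq_domain_free_irqs_recursive() frame in the oops.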
Jiang Liu Nov. 21, 2014, 2:49 a.m. UTC | #3
On 2014/11/21 10:08, Suravee Suthikulpanit wrote:
> On 11/20/2014 07:32 PM, Thomas Gleixner wrote:
>> On Thu, 20 Nov 2014, suravee.suthikulpanit@amd.com wrote:
>>> This patch checks if the parent domain is NULL before recursively
>>> freeing
>>> irqs in the parent domains.
>>
>> Which is nonsense, because if the thing has not been allocated in the
>> first place, then it cannot explode in the free path magically, except
>> there is a missing check in the allocation path error handling.
>>
>> And that's obviously not the case simply because this originates from:
>>> [<fffffe0000449278>] pci_disable_msix+0x40/0x50
>>
> 
> Thomas,
> 
> In this case, I have the following irq domain hierarchy:
> 
> [GIC] -- [GICv2m] -- [MSI]
> 
> which recursively calls the freeing functions:
> 
> The GIC domain currently sets struct irq_domain_ops.free() to:
>   --> irq_domain_free_irqs_top()
>     |--> irq_domain_free_irqs_common()
>       |--> irq_domain_free_irqs_parent()
>         |--> irq_domain_free_irqs_recursive()
> 
> and there is no check before the NULL domain->parent is passed to
> irq_domain_free_irqs_recursive(), which causes the error.
> 
> Since the GIC is the topmost domain, it does not have a parent domain.
> So, I'm not sure what is missing from the allocation path error
> handling, as you mentioned.
Hi Thomas,
	We have discussed this issue in another thread.
Originally, irq_domain_free_irqs_common() was designed to be used by
irqdomains with a parent, but there is interest in reusing it for
irqdomains without a parent as well, to reduce code.
So I suggest changing irq_domain_free_irqs_common() instead of
irq_domain_free_irqs_parent(), because the caller of
irq_domain_free_irqs_parent() should guarantee that the current domain
has a parent.
I'm preparing a patch for this :)
Regards!
Gerry
> 
> Thanks,
> 
> Suravee
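
The suggestion above, checking for a parent in the common helper while leaving the parent helper with a caller-side guarantee, might look roughly like this in a user-space model. This is a sketch under stated assumptions, not the patch Gerry mentions: struct sketch_domain and the sketch_* functions are hypothetical, and assert() stands in for the contract that the caller only passes a domain that has a parent.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct irq_domain. */
struct sketch_domain {
	struct sketch_domain *parent;	/* NULL for the topmost domain */
	int freed;			/* set once this level was released */
};

static int sketch_free_common(struct sketch_domain *d);

/*
 * Models irq_domain_free_irqs_parent(): no NULL check here, the caller
 * must guarantee that the domain has a parent.
 */
static int sketch_free_parent(struct sketch_domain *d)
{
	assert(d->parent != NULL);
	return sketch_free_common(d->parent);
}

/*
 * Models irq_domain_free_irqs_common(): usable by domains with or
 * without a parent, so the check lives here. Returns the number of
 * domains released.
 */
static int sketch_free_common(struct sketch_domain *d)
{
	d->freed = 1;
	if (!d->parent)
		return 1;	/* topmost domain: nothing further to free */
	return 1 + sketch_free_parent(d);
}

/* Build a three-level chain and free it starting from the leaf. */
static int sketch_demo(void)
{
	struct sketch_domain gic = { NULL, 0 };
	struct sketch_domain v2m = { &gic, 0 };
	struct sketch_domain msi = { &v2m, 0 };

	return sketch_free_common(&msi);
}
```

Placing the check in the common helper keeps the contract of the parent helper strict, so a caller that wrongly assumes a parent exists still fails loudly instead of being silently ignored.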
Suravee Suthikulpanit Nov. 21, 2014, 3:06 a.m. UTC | #4
On 11/20/2014 08:49 PM, Jiang Liu wrote:
>
>
> On 2014/11/21 10:08, Suravee Suthikulpanit wrote:
>> On 11/20/2014 07:32 PM, Thomas Gleixner wrote:
>>> On Thu, 20 Nov 2014, suravee.suthikulpanit@amd.com wrote:
>>>> This patch checks if the parent domain is NULL before recursively
>>>> freeing
>>>> irqs in the parent domains.
>>>
>>> Which is nonsense, because if the thing has not been allocated in the
>>> first place, then it cannot explode in the free path magically, except
>>> there is a missing check in the allocation path error handling.
>>>
>>> And that's obviously not the case simply because this originates from:
>>>> [<fffffe0000449278>] pci_disable_msix+0x40/0x50
>>>
>>
>> Thomas,
>>
>> In this case, I have the following irq domain hierarchy:
>>
>> [GIC] -- [GICv2m] -- [MSI]
>>
>> which recursively calls the freeing functions:
>>
>> The GIC domain currently sets struct irq_domain_ops.free() to:
>>    --> irq_domain_free_irqs_top()
>>      |--> irq_domain_free_irqs_common()
>>        |--> irq_domain_free_irqs_parent()
>>          |--> irq_domain_free_irqs_recursive()
>>
>> and there is no check before the NULL domain->parent is passed to
>> irq_domain_free_irqs_recursive(), which causes the error.
>>
>> Since the GIC is the topmost domain, it does not have a parent domain.
>> So, I'm not sure what is missing from the allocation path error
>> handling, as you mentioned.
> Hi Thomas,
> 	We have discussed this issue in another thread.
> Originally, irq_domain_free_irqs_common() was designed to be used by
> irqdomains with a parent, but there is interest in reusing it for
> irqdomains without a parent as well, to reduce code.
> So I suggest changing irq_domain_free_irqs_common() instead of
> irq_domain_free_irqs_parent(), because the caller of
> irq_domain_free_irqs_parent() should guarantee that the current domain
> has a parent.
> I'm preparing a patch for this :)
> Regards!
> Gerry

Thanks Gerry and Thomas.

Suravee

>>
>> Thanks,
>>
>> Suravee
Patch

diff --git a/kernel/irq/irqdomain.c b/kernel/irq/irqdomain.c
index 029acf1..4390eb8 100644
--- a/kernel/irq/irqdomain.c
+++ b/kernel/irq/irqdomain.c
@@ -1166,6 +1166,9 @@  int irq_domain_alloc_irqs_parent(struct irq_domain *domain,
 void irq_domain_free_irqs_parent(struct irq_domain *domain,
 				 unsigned int irq_base, unsigned int nr_irqs)
 {
+	if (!domain->parent)
+		return;
+
 	/* irq_domain_free_irqs_recursive() will call parent's free */
 	if (!irq_domain_is_auto_recursive(domain))
 		irq_domain_free_irqs_recursive(domain->parent, irq_base,