xhci-mtk: Fix NULL pointer dereference with xhci_irq() for shared_hcd
diff mbox series

Message ID 1579246910-22736-1-git-send-email-macpaul.lin@mediatek.com
State New
Headers show
Series
  • xhci-mtk: Fix NULL pointer dereference with xhci_irq() for shared_hcd
Related show

Commit Message

Macpaul Lin Jan. 17, 2020, 7:41 a.m. UTC
According to NULL pointer fix: https://tinyurl.com/uqft5ra
xhci: Fix NULL pointer dereference with xhci_irq() for shared_hcd
The similar issue has also been found in QC activities in Mediatek.

Here quote the description from the referenced patch as follows.
"Commit ("f068090426ea xhci: Fix leaking USB3 shared_hcd
at xhci removal") sets xhci_shared_hcd to NULL without
stopping xhci host. This results into a race condition
where shared_hcd (super speed roothub) related interrupts
are being handled with xhci_irq happens when the
xhci_plat_remove is called and shared_hcd is set to NULL.
Fix this by setting the shared_hcd to NULL only after the
controller is halted and no interrupts are generated."

Signed-off-by: Sriharsha Allenki <sallenki@codeaurora.org>
Signed-off-by: Macpaul Lin <macpaul.lin@mediatek.com>
---
 drivers/usb/host/xhci-mtk.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Mathias Nyman Jan. 31, 2020, 2:50 p.m. UTC | #1
On 17.1.2020 9.41, Macpaul Lin wrote:
> According to NULL pointer fix: https://tinyurl.com/uqft5ra
> xhci: Fix NULL pointer dereference with xhci_irq() for shared_hcd
> The similar issue has also been found in QC activities in Mediatek.
> 
> Here quote the description from the referenced patch as follows.
> "Commit ("f068090426ea xhci: Fix leaking USB3 shared_hcd
> at xhci removal") sets xhci_shared_hcd to NULL without
> stopping xhci host. This results into a race condition
> where shared_hcd (super speed roothub) related interrupts
> are being handled with xhci_irq happens when the
> xhci_plat_remove is called and shared_hcd is set to NULL.
> Fix this by setting the shared_hcd to NULL only after the
> controller is halted and no interrupts are generated."
> 
> Signed-off-by: Sriharsha Allenki <sallenki@codeaurora.org>
> Signed-off-by: Macpaul Lin <macpaul.lin@mediatek.com>
> ---
>   drivers/usb/host/xhci-mtk.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/usb/host/xhci-mtk.c b/drivers/usb/host/xhci-mtk.c
> index b18a6baef204..c227c67f5dc5 100644
> --- a/drivers/usb/host/xhci-mtk.c
> +++ b/drivers/usb/host/xhci-mtk.c
> @@ -593,11 +593,11 @@ static int xhci_mtk_remove(struct platform_device *dev)
>   	struct usb_hcd  *shared_hcd = xhci->shared_hcd;
>   
>   	usb_remove_hcd(shared_hcd);
> -	xhci->shared_hcd = NULL;
>   	device_init_wakeup(&dev->dev, false);
>   
>   	usb_remove_hcd(hcd);
>   	usb_put_hcd(shared_hcd);
> +	xhci->shared_hcd = NULL;
>   	usb_put_hcd(hcd);
>   	xhci_mtk_sch_exit(mtk);
>   	xhci_mtk_clks_disable(mtk);
> 

Could you share details of the NULL pointer dereference, (backtrace).

The USB3 hcd is already removed when xhci->shared_hcd is set to NULL.
We might want to add some checks to make sure we are not using the removed
hcd anymore in that codepath anymore.

-Mathias
Macpaul Lin Feb. 1, 2020, 11:20 a.m. UTC | #2
On Fri, 2020-01-31 at 16:50 +0200, Mathias Nyman wrote:
> On 17.1.2020 9.41, Macpaul Lin wrote:
> > According to NULL pointer fix: https://tinyurl.com/uqft5ra
> > xhci: Fix NULL pointer dereference with xhci_irq() for shared_hcd
> > The similar issue has also been found in QC activities in Mediatek.
> > 
> > Here quote the description from the referenced patch as follows.
> > "Commit ("f068090426ea xhci: Fix leaking USB3 shared_hcd
> > at xhci removal") sets xhci_shared_hcd to NULL without
> > stopping xhci host. This results into a race condition
> > where shared_hcd (super speed roothub) related interrupts
> > are being handled with xhci_irq happens when the
> > xhci_plat_remove is called and shared_hcd is set to NULL.
> > Fix this by setting the shared_hcd to NULL only after the
> > controller is halted and no interrupts are generated."
> > 
> > Signed-off-by: Sriharsha Allenki <sallenki@codeaurora.org>
> > Signed-off-by: Macpaul Lin <macpaul.lin@mediatek.com>
> > ---
> >   drivers/usb/host/xhci-mtk.c | 2 +-
> >   1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/drivers/usb/host/xhci-mtk.c b/drivers/usb/host/xhci-mtk.c
> > index b18a6baef204..c227c67f5dc5 100644
> > --- a/drivers/usb/host/xhci-mtk.c
> > +++ b/drivers/usb/host/xhci-mtk.c
> > @@ -593,11 +593,11 @@ static int xhci_mtk_remove(struct platform_device *dev)
> >   	struct usb_hcd  *shared_hcd = xhci->shared_hcd;
> >   
> >   	usb_remove_hcd(shared_hcd);
> > -	xhci->shared_hcd = NULL;
> >   	device_init_wakeup(&dev->dev, false);
> >   
> >   	usb_remove_hcd(hcd);
> >   	usb_put_hcd(shared_hcd);
> > +	xhci->shared_hcd = NULL;
> >   	usb_put_hcd(hcd);
> >   	xhci_mtk_sch_exit(mtk);
> >   	xhci_mtk_clks_disable(mtk);
> > 
> 
> Could you share details of the NULL pointer dereference, (backtrace).

This bug was found by our QA staff while doing 500 times plug-in and
plug-out devices. The backtrace I have was recorded by QA and I didn't
reproduce this issue on my own environment. However, after applied this
patch the issue seems resolve. Here is the backtrace:

Exception Class: Kernel (KE)
PC is at [<ffffff8008cccbc0>] xhci_irq+0x728/0x2364
LR is at [<ffffff8008ccc788>] xhci_irq+0x2f0/0x2364

Current Executing Process:
[iptables, 859][netdagent, 770]

Backtrace:
[<ffffff80080ead58>] __atomic_notifier_call_chain+0xa8/0x130   
[<ffffff80080eb6d4>] notify_die+0x84/0xac      
[<ffffff800808e874>] die+0x1d8/0x3b8   
[<ffffff80080a89b0>] __do_kernel_fault+0x178/0x188     
[<ffffff80080a81b4>] do_page_fault+0x44/0x3b0  
[<ffffff80080a811c>] do_translation_fault+0x44/0x98    
[<ffffff8008080e08>] do_mem_abort+0x4c/0x128   
[<ffffff80080832d0>] el1_da+0x24/0x3c  
[<ffffff8008cccbc0>] xhci_irq+0x728/0x2364     
[<ffffff8008c98804>] usb_hcd_irq+0x2c/0x44     
[<ffffff8008179bb0>] __handle_irq_event_percpu+0x26c/0x4a4     
[<ffffff8008179ec8>] handle_irq_event+0x5c/0xd0
[<ffffff800817e3c0>] handle_fasteoi_irq+0x10c/0x1e0    
[<ffffff80081787b0>] __handle_domain_irq+0x32c/0x738   
[<ffffff800808159c>] gic_handle_irq+0x174/0x1c4
[<ffffff8008083cf8>] el0_irq_naked+0x50/0x5c   
[<ffffffffffffffff>] 0xffffffffffffffff

> The USB3 hcd is already removed when xhci->shared_hcd is set to NULL.
> We might want to add some checks to make sure we are not using the removed
> hcd anymore in that codepath anymore.
> 
> -Mathias

Thanks,
Macpaul Lin
Guenter Roeck Feb. 2, 2020, 7:29 p.m. UTC | #3
On Fri, Jan 17, 2020 at 03:41:50PM +0800, Macpaul Lin wrote:
> According to NULL pointer fix: https://tinyurl.com/uqft5ra
> xhci: Fix NULL pointer dereference with xhci_irq() for shared_hcd
> The similar issue has also been found in QC activities in Mediatek.
> 
> Here quote the description from the referenced patch as follows.
> "Commit ("f068090426ea xhci: Fix leaking USB3 shared_hcd
> at xhci removal") sets xhci_shared_hcd to NULL without
> stopping xhci host. This results into a race condition
> where shared_hcd (super speed roothub) related interrupts
> are being handled with xhci_irq happens when the
> xhci_plat_remove is called and shared_hcd is set to NULL.
> Fix this by setting the shared_hcd to NULL only after the
> controller is halted and no interrupts are generated."
> 
> Signed-off-by: Sriharsha Allenki <sallenki@codeaurora.org>
> Signed-off-by: Macpaul Lin <macpaul.lin@mediatek.com>
> ---
>  drivers/usb/host/xhci-mtk.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/usb/host/xhci-mtk.c b/drivers/usb/host/xhci-mtk.c
> index b18a6baef204..c227c67f5dc5 100644
> --- a/drivers/usb/host/xhci-mtk.c
> +++ b/drivers/usb/host/xhci-mtk.c
> @@ -593,11 +593,11 @@ static int xhci_mtk_remove(struct platform_device *dev)
>  	struct usb_hcd  *shared_hcd = xhci->shared_hcd;
>  
>  	usb_remove_hcd(shared_hcd);
> -	xhci->shared_hcd = NULL;
>  	device_init_wakeup(&dev->dev, false);
>  
>  	usb_remove_hcd(hcd);
>  	usb_put_hcd(shared_hcd);
> +	xhci->shared_hcd = NULL;

I may be wrong, but I have some concerns that this replaces
the NULL pointer access with a UAF.

Guenter

>  	usb_put_hcd(hcd);
>  	xhci_mtk_sch_exit(mtk);
>  	xhci_mtk_clks_disable(mtk);
> -- 
> 2.18.0
Mathias Nyman Feb. 4, 2020, 9:44 a.m. UTC | #4
On 1.2.2020 13.20, Macpaul Lin wrote:
> On Fri, 2020-01-31 at 16:50 +0200, Mathias Nyman wrote:
>> On 17.1.2020 9.41, Macpaul Lin wrote:
>>> According to NULL pointer fix: https://tinyurl.com/uqft5ra
>>> xhci: Fix NULL pointer dereference with xhci_irq() for shared_hcd
>>> The similar issue has also been found in QC activities in Mediatek.
>>>
>>> Here quote the description from the referenced patch as follows.
>>> "Commit ("f068090426ea xhci: Fix leaking USB3 shared_hcd
>>> at xhci removal") sets xhci_shared_hcd to NULL without
>>> stopping xhci host. This results into a race condition
>>> where shared_hcd (super speed roothub) related interrupts
>>> are being handled with xhci_irq happens when the
>>> xhci_plat_remove is called and shared_hcd is set to NULL.
>>> Fix this by setting the shared_hcd to NULL only after the
>>> controller is halted and no interrupts are generated."
>>>
>>> Signed-off-by: Sriharsha Allenki <sallenki@codeaurora.org>
>>> Signed-off-by: Macpaul Lin <macpaul.lin@mediatek.com>
>>> ---
>>>    drivers/usb/host/xhci-mtk.c | 2 +-
>>>    1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/usb/host/xhci-mtk.c b/drivers/usb/host/xhci-mtk.c
>>> index b18a6baef204..c227c67f5dc5 100644
>>> --- a/drivers/usb/host/xhci-mtk.c
>>> +++ b/drivers/usb/host/xhci-mtk.c
>>> @@ -593,11 +593,11 @@ static int xhci_mtk_remove(struct platform_device *dev)
>>>    	struct usb_hcd  *shared_hcd = xhci->shared_hcd;
>>>    
>>>    	usb_remove_hcd(shared_hcd);
>>> -	xhci->shared_hcd = NULL;
>>>    	device_init_wakeup(&dev->dev, false);
>>>    
>>>    	usb_remove_hcd(hcd);
>>>    	usb_put_hcd(shared_hcd);
>>> +	xhci->shared_hcd = NULL;
>>>    	usb_put_hcd(hcd);
>>>    	xhci_mtk_sch_exit(mtk);
>>>    	xhci_mtk_clks_disable(mtk);
>>>
>>
>> Could you share details of the NULL pointer dereference, (backtrace).
> 
> This bug was found by our QA staff while doing 500 times plug-in and
> plug-out devices. The backtrace I have was recorded by QA and I didn't
> reproduce this issue on my own environment. However, after applied this
> patch the issue seems resolve. Here is the backtrace:
> 
> Exception Class: Kernel (KE)
> PC is at [<ffffff8008cccbc0>] xhci_irq+0x728/0x2364
> LR is at [<ffffff8008ccc788>] xhci_irq+0x2f0/0x2364
> 
> Current Executing Process:
> [iptables, 859][netdagent, 770]
> 
> Backtrace:
> [<ffffff80080ead58>] __atomic_notifier_call_chain+0xa8/0x130
> [<ffffff80080eb6d4>] notify_die+0x84/0xac
> [<ffffff800808e874>] die+0x1d8/0x3b8
> [<ffffff80080a89b0>] __do_kernel_fault+0x178/0x188
> [<ffffff80080a81b4>] do_page_fault+0x44/0x3b0
> [<ffffff80080a811c>] do_translation_fault+0x44/0x98
> [<ffffff8008080e08>] do_mem_abort+0x4c/0x128
> [<ffffff80080832d0>] el1_da+0x24/0x3c
> [<ffffff8008cccbc0>] xhci_irq+0x728/0x2364
> [<ffffff8008c98804>] usb_hcd_irq+0x2c/0x44
> [<ffffff8008179bb0>] __handle_irq_event_percpu+0x26c/0x4a4
> [<ffffff8008179ec8>] handle_irq_event+0x5c/0xd0
> [<ffffff800817e3c0>] handle_fasteoi_irq+0x10c/0x1e0
> [<ffffff80081787b0>] __handle_domain_irq+0x32c/0x738
> [<ffffff800808159c>] gic_handle_irq+0x174/0x1c4
> [<ffffff8008083cf8>] el0_irq_naked+0x50/0x5c
> [<ffffffffffffffff>] 0xffffffffffffffff
> 

Thanks,
Could you help me find out which line of code xhci_irq+0x728 is in your case.

As Guenter pointed out there is a risk of turning the NULL pointer dereference
into a use after free if we just solve this by setting xhci->shared_hcd = NULL
later.

If you still have that kernel around, and xhci is compiled in:
gdb vmlinux
gdb li *(xhci_irq+0x728)

-Mathias

Patch
diff mbox series

diff --git a/drivers/usb/host/xhci-mtk.c b/drivers/usb/host/xhci-mtk.c
index b18a6baef204..c227c67f5dc5 100644
--- a/drivers/usb/host/xhci-mtk.c
+++ b/drivers/usb/host/xhci-mtk.c
@@ -593,11 +593,11 @@  static int xhci_mtk_remove(struct platform_device *dev)
 	struct usb_hcd  *shared_hcd = xhci->shared_hcd;
 
 	usb_remove_hcd(shared_hcd);
-	xhci->shared_hcd = NULL;
 	device_init_wakeup(&dev->dev, false);
 
 	usb_remove_hcd(hcd);
 	usb_put_hcd(shared_hcd);
+	xhci->shared_hcd = NULL;
 	usb_put_hcd(hcd);
 	xhci_mtk_sch_exit(mtk);
 	xhci_mtk_clks_disable(mtk);