xhci-mtk: Fix NULL pointer dereference with xhci_irq() for shared_hcd
diff mbox series

Message ID 1579246910-22736-1-git-send-email-macpaul.lin@mediatek.com
State New
Headers show
Series
  • xhci-mtk: Fix NULL pointer dereference with xhci_irq() for shared_hcd
Related show

Commit Message

Macpaul Lin Jan. 17, 2020, 7:41 a.m. UTC
According to NULL pointer fix: https://tinyurl.com/uqft5ra
xhci: Fix NULL pointer dereference with xhci_irq() for shared_hcd
The similar issue has also been found in QC activities in Mediatek.

Here quote the description from the referenced patch as follows.
"Commit ("f068090426ea xhci: Fix leaking USB3 shared_hcd
at xhci removal") sets xhci_shared_hcd to NULL without
stopping xhci host. This results into a race condition
where shared_hcd (super speed roothub) related interrupts
are being handled with xhci_irq happens when the
xhci_plat_remove is called and shared_hcd is set to NULL.
Fix this by setting the shared_hcd to NULL only after the
controller is halted and no interrupts are generated."

Signed-off-by: Sriharsha Allenki <sallenki@codeaurora.org>
Signed-off-by: Macpaul Lin <macpaul.lin@mediatek.com>
---
 drivers/usb/host/xhci-mtk.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Mathias Nyman Jan. 31, 2020, 2:50 p.m. UTC | #1
On 17.1.2020 9.41, Macpaul Lin wrote:
> According to NULL pointer fix: https://tinyurl.com/uqft5ra
> xhci: Fix NULL pointer dereference with xhci_irq() for shared_hcd
> The similar issue has also been found in QC activities in Mediatek.
> 
> Here quote the description from the referenced patch as follows.
> "Commit ("f068090426ea xhci: Fix leaking USB3 shared_hcd
> at xhci removal") sets xhci_shared_hcd to NULL without
> stopping xhci host. This results into a race condition
> where shared_hcd (super speed roothub) related interrupts
> are being handled with xhci_irq happens when the
> xhci_plat_remove is called and shared_hcd is set to NULL.
> Fix this by setting the shared_hcd to NULL only after the
> controller is halted and no interrupts are generated."
> 
> Signed-off-by: Sriharsha Allenki <sallenki@codeaurora.org>
> Signed-off-by: Macpaul Lin <macpaul.lin@mediatek.com>
> ---
>   drivers/usb/host/xhci-mtk.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/usb/host/xhci-mtk.c b/drivers/usb/host/xhci-mtk.c
> index b18a6baef204..c227c67f5dc5 100644
> --- a/drivers/usb/host/xhci-mtk.c
> +++ b/drivers/usb/host/xhci-mtk.c
> @@ -593,11 +593,11 @@ static int xhci_mtk_remove(struct platform_device *dev)
>   	struct usb_hcd  *shared_hcd = xhci->shared_hcd;
>   
>   	usb_remove_hcd(shared_hcd);
> -	xhci->shared_hcd = NULL;
>   	device_init_wakeup(&dev->dev, false);
>   
>   	usb_remove_hcd(hcd);
>   	usb_put_hcd(shared_hcd);
> +	xhci->shared_hcd = NULL;
>   	usb_put_hcd(hcd);
>   	xhci_mtk_sch_exit(mtk);
>   	xhci_mtk_clks_disable(mtk);
> 

Could you share details of the NULL pointer dereference, (backtrace).

The USB3 hcd is already removed when xhci->shared_hcd is set to NULL.
We might want to add some checks to make sure we are not using the removed
hcd anymore in that codepath anymore.

-Mathias
Macpaul Lin Feb. 1, 2020, 11:20 a.m. UTC | #2
On Fri, 2020-01-31 at 16:50 +0200, Mathias Nyman wrote:
> On 17.1.2020 9.41, Macpaul Lin wrote:
> > According to NULL pointer fix: https://tinyurl.com/uqft5ra
> > xhci: Fix NULL pointer dereference with xhci_irq() for shared_hcd
> > The similar issue has also been found in QC activities in Mediatek.
> > 
> > Here quote the description from the referenced patch as follows.
> > "Commit ("f068090426ea xhci: Fix leaking USB3 shared_hcd
> > at xhci removal") sets xhci_shared_hcd to NULL without
> > stopping xhci host. This results into a race condition
> > where shared_hcd (super speed roothub) related interrupts
> > are being handled with xhci_irq happens when the
> > xhci_plat_remove is called and shared_hcd is set to NULL.
> > Fix this by setting the shared_hcd to NULL only after the
> > controller is halted and no interrupts are generated."
> > 
> > Signed-off-by: Sriharsha Allenki <sallenki@codeaurora.org>
> > Signed-off-by: Macpaul Lin <macpaul.lin@mediatek.com>
> > ---
> >   drivers/usb/host/xhci-mtk.c | 2 +-
> >   1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/drivers/usb/host/xhci-mtk.c b/drivers/usb/host/xhci-mtk.c
> > index b18a6baef204..c227c67f5dc5 100644
> > --- a/drivers/usb/host/xhci-mtk.c
> > +++ b/drivers/usb/host/xhci-mtk.c
> > @@ -593,11 +593,11 @@ static int xhci_mtk_remove(struct platform_device *dev)
> >   	struct usb_hcd  *shared_hcd = xhci->shared_hcd;
> >   
> >   	usb_remove_hcd(shared_hcd);
> > -	xhci->shared_hcd = NULL;
> >   	device_init_wakeup(&dev->dev, false);
> >   
> >   	usb_remove_hcd(hcd);
> >   	usb_put_hcd(shared_hcd);
> > +	xhci->shared_hcd = NULL;
> >   	usb_put_hcd(hcd);
> >   	xhci_mtk_sch_exit(mtk);
> >   	xhci_mtk_clks_disable(mtk);
> > 
> 
> Could you share details of the NULL pointer dereference, (backtrace).

This bug was found by our QA staff while doing 500 times plug-in and
plug-out devices. The backtrace I have was recorded by QA and I didn't
reproduce this issue on my own environment. However, after applied this
patch the issue seems resolve. Here is the backtrace:

Exception Class: Kernel (KE)
PC is at [<ffffff8008cccbc0>] xhci_irq+0x728/0x2364
LR is at [<ffffff8008ccc788>] xhci_irq+0x2f0/0x2364

Current Executing Process:
[iptables, 859][netdagent, 770]

Backtrace:
[<ffffff80080ead58>] __atomic_notifier_call_chain+0xa8/0x130   
[<ffffff80080eb6d4>] notify_die+0x84/0xac      
[<ffffff800808e874>] die+0x1d8/0x3b8   
[<ffffff80080a89b0>] __do_kernel_fault+0x178/0x188     
[<ffffff80080a81b4>] do_page_fault+0x44/0x3b0  
[<ffffff80080a811c>] do_translation_fault+0x44/0x98    
[<ffffff8008080e08>] do_mem_abort+0x4c/0x128   
[<ffffff80080832d0>] el1_da+0x24/0x3c  
[<ffffff8008cccbc0>] xhci_irq+0x728/0x2364     
[<ffffff8008c98804>] usb_hcd_irq+0x2c/0x44     
[<ffffff8008179bb0>] __handle_irq_event_percpu+0x26c/0x4a4     
[<ffffff8008179ec8>] handle_irq_event+0x5c/0xd0
[<ffffff800817e3c0>] handle_fasteoi_irq+0x10c/0x1e0    
[<ffffff80081787b0>] __handle_domain_irq+0x32c/0x738   
[<ffffff800808159c>] gic_handle_irq+0x174/0x1c4
[<ffffff8008083cf8>] el0_irq_naked+0x50/0x5c   
[<ffffffffffffffff>] 0xffffffffffffffff

> The USB3 hcd is already removed when xhci->shared_hcd is set to NULL.
> We might want to add some checks to make sure we are not using the removed
> hcd anymore in that codepath anymore.
> 
> -Mathias

Thanks,
Macpaul Lin
Guenter Roeck Feb. 2, 2020, 7:29 p.m. UTC | #3
On Fri, Jan 17, 2020 at 03:41:50PM +0800, Macpaul Lin wrote:
> According to NULL pointer fix: https://tinyurl.com/uqft5ra
> xhci: Fix NULL pointer dereference with xhci_irq() for shared_hcd
> The similar issue has also been found in QC activities in Mediatek.
> 
> Here quote the description from the referenced patch as follows.
> "Commit ("f068090426ea xhci: Fix leaking USB3 shared_hcd
> at xhci removal") sets xhci_shared_hcd to NULL without
> stopping xhci host. This results into a race condition
> where shared_hcd (super speed roothub) related interrupts
> are being handled with xhci_irq happens when the
> xhci_plat_remove is called and shared_hcd is set to NULL.
> Fix this by setting the shared_hcd to NULL only after the
> controller is halted and no interrupts are generated."
> 
> Signed-off-by: Sriharsha Allenki <sallenki@codeaurora.org>
> Signed-off-by: Macpaul Lin <macpaul.lin@mediatek.com>
> ---
>  drivers/usb/host/xhci-mtk.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/usb/host/xhci-mtk.c b/drivers/usb/host/xhci-mtk.c
> index b18a6baef204..c227c67f5dc5 100644
> --- a/drivers/usb/host/xhci-mtk.c
> +++ b/drivers/usb/host/xhci-mtk.c
> @@ -593,11 +593,11 @@ static int xhci_mtk_remove(struct platform_device *dev)
>  	struct usb_hcd  *shared_hcd = xhci->shared_hcd;
>  
>  	usb_remove_hcd(shared_hcd);
> -	xhci->shared_hcd = NULL;
>  	device_init_wakeup(&dev->dev, false);
>  
>  	usb_remove_hcd(hcd);
>  	usb_put_hcd(shared_hcd);
> +	xhci->shared_hcd = NULL;

I may be wrong, but I have some concerns that this replaces
the NULL pointer access with a UAF.

Guenter

>  	usb_put_hcd(hcd);
>  	xhci_mtk_sch_exit(mtk);
>  	xhci_mtk_clks_disable(mtk);
> -- 
> 2.18.0
Mathias Nyman Feb. 4, 2020, 9:44 a.m. UTC | #4
On 1.2.2020 13.20, Macpaul Lin wrote:
> On Fri, 2020-01-31 at 16:50 +0200, Mathias Nyman wrote:
>> On 17.1.2020 9.41, Macpaul Lin wrote:
>>> According to NULL pointer fix: https://tinyurl.com/uqft5ra
>>> xhci: Fix NULL pointer dereference with xhci_irq() for shared_hcd
>>> The similar issue has also been found in QC activities in Mediatek.
>>>
>>> Here quote the description from the referenced patch as follows.
>>> "Commit ("f068090426ea xhci: Fix leaking USB3 shared_hcd
>>> at xhci removal") sets xhci_shared_hcd to NULL without
>>> stopping xhci host. This results into a race condition
>>> where shared_hcd (super speed roothub) related interrupts
>>> are being handled with xhci_irq happens when the
>>> xhci_plat_remove is called and shared_hcd is set to NULL.
>>> Fix this by setting the shared_hcd to NULL only after the
>>> controller is halted and no interrupts are generated."
>>>
>>> Signed-off-by: Sriharsha Allenki <sallenki@codeaurora.org>
>>> Signed-off-by: Macpaul Lin <macpaul.lin@mediatek.com>
>>> ---
>>>    drivers/usb/host/xhci-mtk.c | 2 +-
>>>    1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/usb/host/xhci-mtk.c b/drivers/usb/host/xhci-mtk.c
>>> index b18a6baef204..c227c67f5dc5 100644
>>> --- a/drivers/usb/host/xhci-mtk.c
>>> +++ b/drivers/usb/host/xhci-mtk.c
>>> @@ -593,11 +593,11 @@ static int xhci_mtk_remove(struct platform_device *dev)
>>>    	struct usb_hcd  *shared_hcd = xhci->shared_hcd;
>>>    
>>>    	usb_remove_hcd(shared_hcd);
>>> -	xhci->shared_hcd = NULL;
>>>    	device_init_wakeup(&dev->dev, false);
>>>    
>>>    	usb_remove_hcd(hcd);
>>>    	usb_put_hcd(shared_hcd);
>>> +	xhci->shared_hcd = NULL;
>>>    	usb_put_hcd(hcd);
>>>    	xhci_mtk_sch_exit(mtk);
>>>    	xhci_mtk_clks_disable(mtk);
>>>
>>
>> Could you share details of the NULL pointer dereference, (backtrace).
> 
> This bug was found by our QA staff while doing 500 times plug-in and
> plug-out devices. The backtrace I have was recorded by QA and I didn't
> reproduce this issue on my own environment. However, after applied this
> patch the issue seems resolve. Here is the backtrace:
> 
> Exception Class: Kernel (KE)
> PC is at [<ffffff8008cccbc0>] xhci_irq+0x728/0x2364
> LR is at [<ffffff8008ccc788>] xhci_irq+0x2f0/0x2364
> 
> Current Executing Process:
> [iptables, 859][netdagent, 770]
> 
> Backtrace:
> [<ffffff80080ead58>] __atomic_notifier_call_chain+0xa8/0x130
> [<ffffff80080eb6d4>] notify_die+0x84/0xac
> [<ffffff800808e874>] die+0x1d8/0x3b8
> [<ffffff80080a89b0>] __do_kernel_fault+0x178/0x188
> [<ffffff80080a81b4>] do_page_fault+0x44/0x3b0
> [<ffffff80080a811c>] do_translation_fault+0x44/0x98
> [<ffffff8008080e08>] do_mem_abort+0x4c/0x128
> [<ffffff80080832d0>] el1_da+0x24/0x3c
> [<ffffff8008cccbc0>] xhci_irq+0x728/0x2364
> [<ffffff8008c98804>] usb_hcd_irq+0x2c/0x44
> [<ffffff8008179bb0>] __handle_irq_event_percpu+0x26c/0x4a4
> [<ffffff8008179ec8>] handle_irq_event+0x5c/0xd0
> [<ffffff800817e3c0>] handle_fasteoi_irq+0x10c/0x1e0
> [<ffffff80081787b0>] __handle_domain_irq+0x32c/0x738
> [<ffffff800808159c>] gic_handle_irq+0x174/0x1c4
> [<ffffff8008083cf8>] el0_irq_naked+0x50/0x5c
> [<ffffffffffffffff>] 0xffffffffffffffff
> 

Thanks,
Could you help me find out which line of code xhci_irq+0x728 is in your case.

As Guenter pointed out there is a risk of turning the NULL pointer dereference
into a use after free if we just solve this by setting xhci->shared_hcd = NULL
later.

If you still have that kernel around, and xhci is compiled in:
gdb vmlinux
gdb li *(xhci_irq+0x728)

-Mathias
Macpaul Lin March 4, 2020, 3:16 a.m. UTC | #5
On Tue, 2020-02-04 at 17:44 +0800, Mathias Nyman wrote:
> On 1.2.2020 13.20, Macpaul Lin wrote:
> > On Fri, 2020-01-31 at 16:50 +0200, Mathias Nyman wrote:
> >> On 17.1.2020 9.41, Macpaul Lin wrote:
> >>> According to NULL pointer fix: https://tinyurl.com/uqft5ra
> >>> xhci: Fix NULL pointer dereference with xhci_irq() for shared_hcd
> >>> The similar issue has also been found in QC activities in Mediatek.
> >>>
> >>> Here quote the description from the referenced patch as follows.
> >>> "Commit ("f068090426ea xhci: Fix leaking USB3 shared_hcd
> >>> at xhci removal") sets xhci_shared_hcd to NULL without
> >>> stopping xhci host. This results into a race condition
> >>> where shared_hcd (super speed roothub) related interrupts
> >>> are being handled with xhci_irq happens when the
> >>> xhci_plat_remove is called and shared_hcd is set to NULL.
> >>> Fix this by setting the shared_hcd to NULL only after the
> >>> controller is halted and no interrupts are generated."
> >>>
> >>> Signed-off-by: Sriharsha Allenki <sallenki@codeaurora.org>
> >>> Signed-off-by: Macpaul Lin <macpaul.lin@mediatek.com>
> >>> ---
> >>>    drivers/usb/host/xhci-mtk.c | 2 +-
> >>>    1 file changed, 1 insertion(+), 1 deletion(-)
> >>>
> >>> diff --git a/drivers/usb/host/xhci-mtk.c b/drivers/usb/host/xhci-mtk.c
> >>> index b18a6baef204..c227c67f5dc5 100644
> >>> --- a/drivers/usb/host/xhci-mtk.c
> >>> +++ b/drivers/usb/host/xhci-mtk.c
> >>> @@ -593,11 +593,11 @@ static int xhci_mtk_remove(struct platform_device *dev)
> >>>    	struct usb_hcd  *shared_hcd = xhci->shared_hcd;
> >>>    
> >>>    	usb_remove_hcd(shared_hcd);
> >>> -	xhci->shared_hcd = NULL;
> >>>    	device_init_wakeup(&dev->dev, false);
> >>>    
> >>>    	usb_remove_hcd(hcd);
> >>>    	usb_put_hcd(shared_hcd);
> >>> +	xhci->shared_hcd = NULL;
> >>>    	usb_put_hcd(hcd);
> >>>    	xhci_mtk_sch_exit(mtk);
> >>>    	xhci_mtk_clks_disable(mtk);
> >>>
> >>
> >> Could you share details of the NULL pointer dereference, (backtrace).
> > 
> > This bug was found by our QA staff while doing 500 times plug-in and
> > plug-out devices. The backtrace I have was recorded by QA and I didn't
> > reproduce this issue on my own environment. However, after applied this
> > patch the issue seems resolve. Here is the backtrace:
> > 
> > Exception Class: Kernel (KE)
> > PC is at [<ffffff8008cccbc0>] xhci_irq+0x728/0x2364
> > LR is at [<ffffff8008ccc788>] xhci_irq+0x2f0/0x2364
> > 
> > Current Executing Process:
> > [iptables, 859][netdagent, 770]
> > 
> > Backtrace:
> > [<ffffff80080ead58>] __atomic_notifier_call_chain+0xa8/0x130
> > [<ffffff80080eb6d4>] notify_die+0x84/0xac
> > [<ffffff800808e874>] die+0x1d8/0x3b8
> > [<ffffff80080a89b0>] __do_kernel_fault+0x178/0x188
> > [<ffffff80080a81b4>] do_page_fault+0x44/0x3b0
> > [<ffffff80080a811c>] do_translation_fault+0x44/0x98
> > [<ffffff8008080e08>] do_mem_abort+0x4c/0x128
> > [<ffffff80080832d0>] el1_da+0x24/0x3c
> > [<ffffff8008cccbc0>] xhci_irq+0x728/0x2364
> > [<ffffff8008c98804>] usb_hcd_irq+0x2c/0x44
> > [<ffffff8008179bb0>] __handle_irq_event_percpu+0x26c/0x4a4
> > [<ffffff8008179ec8>] handle_irq_event+0x5c/0xd0
> > [<ffffff800817e3c0>] handle_fasteoi_irq+0x10c/0x1e0
> > [<ffffff80081787b0>] __handle_domain_irq+0x32c/0x738
> > [<ffffff800808159c>] gic_handle_irq+0x174/0x1c4
> > [<ffffff8008083cf8>] el0_irq_naked+0x50/0x5c
> > [<ffffffffffffffff>] 0xffffffffffffffff
> > 
> 
> Thanks,
> Could you help me find out which line of code xhci_irq+0x728 is in your case.
> 
> As Guenter pointed out there is a risk of turning the NULL pointer dereference
> into a use after free if we just solve this by setting xhci->shared_hcd = NULL
> later.
> 
> If you still have that kernel around, and xhci is compiled in:
> gdb vmlinux
> gdb li *(xhci_irq+0x728)
> 

Sorry that I couldn't get back to you soon. The internal code version
for this issue was really old and a little bit difficult to rewind to
that version.
However, I think the following dump might be correct for the code base.

(gdb) li *(xhci_irq+0x728)
0xffffff8008cc8634 is in xhci_irq (*stripped*
kernel-4.14/drivers/usb/host/xhci.h:1694).
1689     */
1690    #define XHCI_MAX_REXIT_TIMEOUT_MS       20
1691
1692    static inline unsigned int hcd_index(struct usb_hcd *hcd)
1693    {
1694            if (hcd->speed >= HCD_USB3)
1695                    return 0;
1696            else
1697                    return 1;
1698    }
(gdb)

Thanks
Macpaul Lin
Mathias Nyman March 4, 2020, 2:39 p.m. UTC | #6
On 4.3.2020 5.16, Macpaul Lin wrote:
> On Tue, 2020-02-04 at 17:44 +0800, Mathias Nyman wrote:
>> On 1.2.2020 13.20, Macpaul Lin wrote:
>>> On Fri, 2020-01-31 at 16:50 +0200, Mathias Nyman wrote:
>>>> On 17.1.2020 9.41, Macpaul Lin wrote:
>>>>> According to NULL pointer fix: https://tinyurl.com/uqft5ra
>>>>> xhci: Fix NULL pointer dereference with xhci_irq() for shared_hcd
>>>>> The similar issue has also been found in QC activities in Mediatek.
>>>>>
>>>>> Here quote the description from the referenced patch as follows.
>>>>> "Commit ("f068090426ea xhci: Fix leaking USB3 shared_hcd
>>>>> at xhci removal") sets xhci_shared_hcd to NULL without
>>>>> stopping xhci host. This results into a race condition
>>>>> where shared_hcd (super speed roothub) related interrupts
>>>>> are being handled with xhci_irq happens when the
>>>>> xhci_plat_remove is called and shared_hcd is set to NULL.
>>>>> Fix this by setting the shared_hcd to NULL only after the
>>>>> controller is halted and no interrupts are generated."
>>>>>
>>>>> Signed-off-by: Sriharsha Allenki <sallenki@codeaurora.org>
>>>>> Signed-off-by: Macpaul Lin <macpaul.lin@mediatek.com>
>>>>> ---
>>>>>    drivers/usb/host/xhci-mtk.c | 2 +-
>>>>>    1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/drivers/usb/host/xhci-mtk.c b/drivers/usb/host/xhci-mtk.c
>>>>> index b18a6baef204..c227c67f5dc5 100644
>>>>> --- a/drivers/usb/host/xhci-mtk.c
>>>>> +++ b/drivers/usb/host/xhci-mtk.c
>>>>> @@ -593,11 +593,11 @@ static int xhci_mtk_remove(struct platform_device *dev)
>>>>>    	struct usb_hcd  *shared_hcd = xhci->shared_hcd;
>>>>>    
>>>>>    	usb_remove_hcd(shared_hcd);
>>>>> -	xhci->shared_hcd = NULL;
>>>>>    	device_init_wakeup(&dev->dev, false);
>>>>>    
>>>>>    	usb_remove_hcd(hcd);
>>>>>    	usb_put_hcd(shared_hcd);
>>>>> +	xhci->shared_hcd = NULL;
>>>>>    	usb_put_hcd(hcd);
>>>>>    	xhci_mtk_sch_exit(mtk);
>>>>>    	xhci_mtk_clks_disable(mtk);
>>>>>
>>>>
>>>> Could you share details of the NULL pointer dereference, (backtrace).
>>>
>>> This bug was found by our QA staff while doing 500 times plug-in and
>>> plug-out devices. The backtrace I have was recorded by QA and I didn't
>>> reproduce this issue on my own environment. However, after applied this
>>> patch the issue seems resolve. Here is the backtrace:
>>>
>>> Exception Class: Kernel (KE)
>>> PC is at [<ffffff8008cccbc0>] xhci_irq+0x728/0x2364
>>> LR is at [<ffffff8008ccc788>] xhci_irq+0x2f0/0x2364
>>>
>>> Current Executing Process:
>>> [iptables, 859][netdagent, 770]
>>>
>>> Backtrace:
>>> [<ffffff80080ead58>] __atomic_notifier_call_chain+0xa8/0x130
>>> [<ffffff80080eb6d4>] notify_die+0x84/0xac
>>> [<ffffff800808e874>] die+0x1d8/0x3b8
>>> [<ffffff80080a89b0>] __do_kernel_fault+0x178/0x188
>>> [<ffffff80080a81b4>] do_page_fault+0x44/0x3b0
>>> [<ffffff80080a811c>] do_translation_fault+0x44/0x98
>>> [<ffffff8008080e08>] do_mem_abort+0x4c/0x128
>>> [<ffffff80080832d0>] el1_da+0x24/0x3c
>>> [<ffffff8008cccbc0>] xhci_irq+0x728/0x2364
>>> [<ffffff8008c98804>] usb_hcd_irq+0x2c/0x44
>>> [<ffffff8008179bb0>] __handle_irq_event_percpu+0x26c/0x4a4
>>> [<ffffff8008179ec8>] handle_irq_event+0x5c/0xd0
>>> [<ffffff800817e3c0>] handle_fasteoi_irq+0x10c/0x1e0
>>> [<ffffff80081787b0>] __handle_domain_irq+0x32c/0x738
>>> [<ffffff800808159c>] gic_handle_irq+0x174/0x1c4
>>> [<ffffff8008083cf8>] el0_irq_naked+0x50/0x5c
>>> [<ffffffffffffffff>] 0xffffffffffffffff
>>>
>>
>> Thanks,
>> Could you help me find out which line of code xhci_irq+0x728 is in your case.
>>
>> As Guenter pointed out there is a risk of turning the NULL pointer dereference
>> into a use after free if we just solve this by setting xhci->shared_hcd = NULL
>> later.
>>
>> If you still have that kernel around, and xhci is compiled in:
>> gdb vmlinux
>> gdb li *(xhci_irq+0x728)
>>
> 
> Sorry that I couldn't get back to you soon. The internal code version
> for this issue was really old and a little bit difficult to rewind to
> that version.
> However, I think the following dump might be correct for the code base.
> 
> (gdb) li *(xhci_irq+0x728)
> 0xffffff8008cc8634 is in xhci_irq (*stripped*
> kernel-4.14/drivers/usb/host/xhci.h:1694).
> 1689     */
> 1690    #define XHCI_MAX_REXIT_TIMEOUT_MS       20
> 1691
> 1692    static inline unsigned int hcd_index(struct usb_hcd *hcd)
> 1693    {
> 1694            if (hcd->speed >= HCD_USB3)
> 1695                    return 0;
> 1696            else
> 1697                    return 1;
> 1698    }
> (gdb)
> 
> Thanks
> Macpaul Lin
> 

Ah, it was a 4.14 kernel.
This should be fixed in 4.20 with patch:
1245374e9b83 xhci: handle port status events for removed USB3 hcd

Port arrays/structures were changed completely in 4.18

Something like the below should work for 4.14:

diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index 61fa3007a74a..e7367b9f19c5 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -1640,6 +1640,12 @@ static void handle_port_status(struct xhci_hcd *xhci,
 	if ((major_revision == 0x03) != (hcd->speed >= HCD_USB3))
 		hcd = xhci->shared_hcd;
 
+	if (!hcd) {
+		xhci_dbg(xhci, "No hcd found for port %u event\n", port_id);
+		bogus_port_status = true;
+		goto cleanup;
+	}
+
 	if (major_revision == 0) {
 		xhci_warn(xhci, "Event for port %u not in "
 				"Extended Capabilities, ignoring.\n",


-Mathias
Macpaul Lin March 5, 2020, 2:58 a.m. UTC | #7
On Wed, 2020-03-04 at 16:39 +0200, Mathias Nyman wrote:
> On 4.3.2020 5.16, Macpaul Lin wrote:
> > On Tue, 2020-02-04 at 17:44 +0800, Mathias Nyman wrote:
> >> On 1.2.2020 13.20, Macpaul Lin wrote:
> >>> On Fri, 2020-01-31 at 16:50 +0200, Mathias Nyman wrote:
> >>>> On 17.1.2020 9.41, Macpaul Lin wrote:
> >>>>> According to NULL pointer fix: https://tinyurl.com/uqft5ra
> >>>>> xhci: Fix NULL pointer dereference with xhci_irq() for shared_hcd
> >>>>> The similar issue has also been found in QC activities in Mediatek.
> >>>>>
> >>>>> Here quote the description from the referenced patch as follows.
> >>>>> "Commit ("f068090426ea xhci: Fix leaking USB3 shared_hcd
> >>>>> at xhci removal") sets xhci_shared_hcd to NULL without
> >>>>> stopping xhci host. This results into a race condition
> >>>>> where shared_hcd (super speed roothub) related interrupts
> >>>>> are being handled with xhci_irq happens when the
> >>>>> xhci_plat_remove is called and shared_hcd is set to NULL.
> >>>>> Fix this by setting the shared_hcd to NULL only after the
> >>>>> controller is halted and no interrupts are generated."
> >>>>>
> >>>>> Signed-off-by: Sriharsha Allenki <sallenki@codeaurora.org>
> >>>>> Signed-off-by: Macpaul Lin <macpaul.lin@mediatek.com>
> >>>>> ---
> >>>>>    drivers/usb/host/xhci-mtk.c | 2 +-
> >>>>>    1 file changed, 1 insertion(+), 1 deletion(-)
> >>>>>
> >>>>> diff --git a/drivers/usb/host/xhci-mtk.c b/drivers/usb/host/xhci-mtk.c
> >>>>> index b18a6baef204..c227c67f5dc5 100644
> >>>>> --- a/drivers/usb/host/xhci-mtk.c
> >>>>> +++ b/drivers/usb/host/xhci-mtk.c
> >>>>> @@ -593,11 +593,11 @@ static int xhci_mtk_remove(struct platform_device *dev)
> >>>>>    	struct usb_hcd  *shared_hcd = xhci->shared_hcd;
> >>>>>    
> >>>>>    	usb_remove_hcd(shared_hcd);
> >>>>> -	xhci->shared_hcd = NULL;
> >>>>>    	device_init_wakeup(&dev->dev, false);
> >>>>>    
> >>>>>    	usb_remove_hcd(hcd);
> >>>>>    	usb_put_hcd(shared_hcd);
> >>>>> +	xhci->shared_hcd = NULL;
> >>>>>    	usb_put_hcd(hcd);
> >>>>>    	xhci_mtk_sch_exit(mtk);
> >>>>>    	xhci_mtk_clks_disable(mtk);
> >>>>>
> >>>>
> >>>> Could you share details of the NULL pointer dereference, (backtrace).
> >>>
> >>> This bug was found by our QA staff while doing 500 times plug-in and
> >>> plug-out devices. The backtrace I have was recorded by QA and I didn't
> >>> reproduce this issue on my own environment. However, after applied this
> >>> patch the issue seems resolve. Here is the backtrace:
> >>>
> >>> Exception Class: Kernel (KE)
> >>> PC is at [<ffffff8008cccbc0>] xhci_irq+0x728/0x2364
> >>> LR is at [<ffffff8008ccc788>] xhci_irq+0x2f0/0x2364
> >>>
> >>> Current Executing Process:
> >>> [iptables, 859][netdagent, 770]
> >>>
> >>> Backtrace:
> >>> [<ffffff80080ead58>] __atomic_notifier_call_chain+0xa8/0x130
> >>> [<ffffff80080eb6d4>] notify_die+0x84/0xac
> >>> [<ffffff800808e874>] die+0x1d8/0x3b8
> >>> [<ffffff80080a89b0>] __do_kernel_fault+0x178/0x188
> >>> [<ffffff80080a81b4>] do_page_fault+0x44/0x3b0
> >>> [<ffffff80080a811c>] do_translation_fault+0x44/0x98
> >>> [<ffffff8008080e08>] do_mem_abort+0x4c/0x128
> >>> [<ffffff80080832d0>] el1_da+0x24/0x3c
> >>> [<ffffff8008cccbc0>] xhci_irq+0x728/0x2364
> >>> [<ffffff8008c98804>] usb_hcd_irq+0x2c/0x44
> >>> [<ffffff8008179bb0>] __handle_irq_event_percpu+0x26c/0x4a4
> >>> [<ffffff8008179ec8>] handle_irq_event+0x5c/0xd0
> >>> [<ffffff800817e3c0>] handle_fasteoi_irq+0x10c/0x1e0
> >>> [<ffffff80081787b0>] __handle_domain_irq+0x32c/0x738
> >>> [<ffffff800808159c>] gic_handle_irq+0x174/0x1c4
> >>> [<ffffff8008083cf8>] el0_irq_naked+0x50/0x5c
> >>> [<ffffffffffffffff>] 0xffffffffffffffff
> >>>
> >>
> >> Thanks,
> >> Could you help me find out which line of code xhci_irq+0x728 is in your case.
> >>
> >> As Guenter pointed out there is a risk of turning the NULL pointer dereference
> >> into a use after free if we just solve this by setting xhci->shared_hcd = NULL
> >> later.
> >>
> >> If you still have that kernel around, and xhci is compiled in:
> >> gdb vmlinux
> >> gdb li *(xhci_irq+0x728)
> >>
> > 
> > Sorry that I couldn't get back to you soon. The internal code version
> > for this issue was really old and a little bit difficult to rewind to
> > that version.
> > However, I think the following dump might be correct for the code base.
> > 
> > (gdb) li *(xhci_irq+0x728)
> > 0xffffff8008cc8634 is in xhci_irq (*stripped*
> > kernel-4.14/drivers/usb/host/xhci.h:1694).
> > 1689     */
> > 1690    #define XHCI_MAX_REXIT_TIMEOUT_MS       20
> > 1691
> > 1692    static inline unsigned int hcd_index(struct usb_hcd *hcd)
> > 1693    {
> > 1694            if (hcd->speed >= HCD_USB3)
> > 1695                    return 0;
> > 1696            else
> > 1697                    return 1;
> > 1698    }
> > (gdb)
> > 
> > Thanks
> > Macpaul Lin
> > 
> 
> Ah, it was a 4.14 kernel.
> This should be fixed in 4.20 with patch:
> 1245374e9b83 xhci: handle port status events for removed USB3 hcd
> 
> Port arrays/structures were changed completely in 4.18
> 
> Something like the below should work for 4.14:
> 
> diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
> index 61fa3007a74a..e7367b9f19c5 100644
> --- a/drivers/usb/host/xhci-ring.c
> +++ b/drivers/usb/host/xhci-ring.c
> @@ -1640,6 +1640,12 @@ static void handle_port_status(struct xhci_hcd *xhci,
>  	if ((major_revision == 0x03) != (hcd->speed >= HCD_USB3))
>  		hcd = xhci->shared_hcd;
>  
> +	if (!hcd) {
> +		xhci_dbg(xhci, "No hcd found for port %u event\n", port_id);
> +		bogus_port_status = true;
> +		goto cleanup;
> +	}
> +
>  	if (major_revision == 0) {
>  		xhci_warn(xhci, "Event for port %u not in "
>  				"Extended Capabilities, ignoring.\n",

Thanks for this suggestion, this is much better! I am sorry that we're
using android kernel that some reported issue might be out of date. I
will update the suggestion into our code base. Thanks!

Regards,
Macpaul Lin
Greg KH March 5, 2020, 6:32 p.m. UTC | #8
On Thu, Mar 05, 2020 at 10:58:46AM +0800, Macpaul Lin wrote:
> On Wed, 2020-03-04 at 16:39 +0200, Mathias Nyman wrote:
> > On 4.3.2020 5.16, Macpaul Lin wrote:
> > > On Tue, 2020-02-04 at 17:44 +0800, Mathias Nyman wrote:
> > >> On 1.2.2020 13.20, Macpaul Lin wrote:
> > >>> On Fri, 2020-01-31 at 16:50 +0200, Mathias Nyman wrote:
> > >>>> On 17.1.2020 9.41, Macpaul Lin wrote:
> > >>>>> According to NULL pointer fix: https://tinyurl.com/uqft5ra
> > >>>>> xhci: Fix NULL pointer dereference with xhci_irq() for shared_hcd
> > >>>>> The similar issue has also been found in QC activities in Mediatek.
> > >>>>>
> > >>>>> Here quote the description from the referenced patch as follows.
> > >>>>> "Commit ("f068090426ea xhci: Fix leaking USB3 shared_hcd
> > >>>>> at xhci removal") sets xhci_shared_hcd to NULL without
> > >>>>> stopping xhci host. This results into a race condition
> > >>>>> where shared_hcd (super speed roothub) related interrupts
> > >>>>> are being handled with xhci_irq happens when the
> > >>>>> xhci_plat_remove is called and shared_hcd is set to NULL.
> > >>>>> Fix this by setting the shared_hcd to NULL only after the
> > >>>>> controller is halted and no interrupts are generated."
> > >>>>>
> > >>>>> Signed-off-by: Sriharsha Allenki <sallenki@codeaurora.org>
> > >>>>> Signed-off-by: Macpaul Lin <macpaul.lin@mediatek.com>
> > >>>>> ---
> > >>>>>    drivers/usb/host/xhci-mtk.c | 2 +-
> > >>>>>    1 file changed, 1 insertion(+), 1 deletion(-)
> > >>>>>
> > >>>>> diff --git a/drivers/usb/host/xhci-mtk.c b/drivers/usb/host/xhci-mtk.c
> > >>>>> index b18a6baef204..c227c67f5dc5 100644
> > >>>>> --- a/drivers/usb/host/xhci-mtk.c
> > >>>>> +++ b/drivers/usb/host/xhci-mtk.c
> > >>>>> @@ -593,11 +593,11 @@ static int xhci_mtk_remove(struct platform_device *dev)
> > >>>>>    	struct usb_hcd  *shared_hcd = xhci->shared_hcd;
> > >>>>>    
> > >>>>>    	usb_remove_hcd(shared_hcd);
> > >>>>> -	xhci->shared_hcd = NULL;
> > >>>>>    	device_init_wakeup(&dev->dev, false);
> > >>>>>    
> > >>>>>    	usb_remove_hcd(hcd);
> > >>>>>    	usb_put_hcd(shared_hcd);
> > >>>>> +	xhci->shared_hcd = NULL;
> > >>>>>    	usb_put_hcd(hcd);
> > >>>>>    	xhci_mtk_sch_exit(mtk);
> > >>>>>    	xhci_mtk_clks_disable(mtk);
> > >>>>>
> > >>>>
> > >>>> Could you share details of the NULL pointer dereference, (backtrace).
> > >>>
> > >>> This bug was found by our QA staff while doing 500 times plug-in and
> > >>> plug-out devices. The backtrace I have was recorded by QA and I didn't
> > >>> reproduce this issue on my own environment. However, after applied this
> > >>> patch the issue seems resolve. Here is the backtrace:
> > >>>
> > >>> Exception Class: Kernel (KE)
> > >>> PC is at [<ffffff8008cccbc0>] xhci_irq+0x728/0x2364
> > >>> LR is at [<ffffff8008ccc788>] xhci_irq+0x2f0/0x2364
> > >>>
> > >>> Current Executing Process:
> > >>> [iptables, 859][netdagent, 770]
> > >>>
> > >>> Backtrace:
> > >>> [<ffffff80080ead58>] __atomic_notifier_call_chain+0xa8/0x130
> > >>> [<ffffff80080eb6d4>] notify_die+0x84/0xac
> > >>> [<ffffff800808e874>] die+0x1d8/0x3b8
> > >>> [<ffffff80080a89b0>] __do_kernel_fault+0x178/0x188
> > >>> [<ffffff80080a81b4>] do_page_fault+0x44/0x3b0
> > >>> [<ffffff80080a811c>] do_translation_fault+0x44/0x98
> > >>> [<ffffff8008080e08>] do_mem_abort+0x4c/0x128
> > >>> [<ffffff80080832d0>] el1_da+0x24/0x3c
> > >>> [<ffffff8008cccbc0>] xhci_irq+0x728/0x2364
> > >>> [<ffffff8008c98804>] usb_hcd_irq+0x2c/0x44
> > >>> [<ffffff8008179bb0>] __handle_irq_event_percpu+0x26c/0x4a4
> > >>> [<ffffff8008179ec8>] handle_irq_event+0x5c/0xd0
> > >>> [<ffffff800817e3c0>] handle_fasteoi_irq+0x10c/0x1e0
> > >>> [<ffffff80081787b0>] __handle_domain_irq+0x32c/0x738
> > >>> [<ffffff800808159c>] gic_handle_irq+0x174/0x1c4
> > >>> [<ffffff8008083cf8>] el0_irq_naked+0x50/0x5c
> > >>> [<ffffffffffffffff>] 0xffffffffffffffff
> > >>>
> > >>
> > >> Thanks,
> > >> Could you help me find out which line of code xhci_irq+0x728 is in your case.
> > >>
> > >> As Guenter pointed out there is a risk of turning the NULL pointer dereference
> > >> into a use after free if we just solve this by setting xhci->shared_hcd = NULL
> > >> later.
> > >>
> > >> If you still have that kernel around, and xhci is compiled in:
> > >> gdb vmlinux
> > >> gdb li *(xhci_irq+0x728)
> > >>
> > > 
> > > Sorry that I couldn't get back to you soon. The internal code version
> > > for this issue was really old and a little bit difficult to rewind to
> > > that version.
> > > However, I think the following dump might be correct for the code base.
> > > 
> > > (gdb) li *(xhci_irq+0x728)
> > > 0xffffff8008cc8634 is in xhci_irq (*stripped*
> > > kernel-4.14/drivers/usb/host/xhci.h:1694).
> > > 1689     */
> > > 1690    #define XHCI_MAX_REXIT_TIMEOUT_MS       20
> > > 1691
> > > 1692    static inline unsigned int hcd_index(struct usb_hcd *hcd)
> > > 1693    {
> > > 1694            if (hcd->speed >= HCD_USB3)
> > > 1695                    return 0;
> > > 1696            else
> > > 1697                    return 1;
> > > 1698    }
> > > (gdb)
> > > 
> > > Thanks
> > > Macpaul Lin
> > > 
> > 
> > Ah, it was a 4.14 kernel.
> > This should be fixed in 4.20 with patch:
> > 1245374e9b83 xhci: handle port status events for removed USB3 hcd
> > 
> > Port arrays/structures were changed completely in 4.18
> > 
> > Something like the below should work for 4.14:
> > 
> > diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
> > index 61fa3007a74a..e7367b9f19c5 100644
> > --- a/drivers/usb/host/xhci-ring.c
> > +++ b/drivers/usb/host/xhci-ring.c
> > @@ -1640,6 +1640,12 @@ static void handle_port_status(struct xhci_hcd *xhci,
> >  	if ((major_revision == 0x03) != (hcd->speed >= HCD_USB3))
> >  		hcd = xhci->shared_hcd;
> >  
> > +	if (!hcd) {
> > +		xhci_dbg(xhci, "No hcd found for port %u event\n", port_id);
> > +		bogus_port_status = true;
> > +		goto cleanup;
> > +	}
> > +
> >  	if (major_revision == 0) {
> >  		xhci_warn(xhci, "Event for port %u not in "
> >  				"Extended Capabilities, ignoring.\n",
> 
> Thanks for this suggestion, this is much better! I am sorry that we're
> using android kernel that some reported issue might be out of date. I
> will update the suggestion into our code base. Thanks!

Should I backport this to 4.14 and older kernels to prevent this issue
from showing up in newer Android devices that are using these older
kernels?

thanks,

greg k-h
Macpaul Lin March 6, 2020, 2:36 a.m. UTC | #9
On Thu, 2020-03-05 at 19:32 +0100, Greg Kroah-Hartman wrote:
> On Thu, Mar 05, 2020 at 10:58:46AM +0800, Macpaul Lin wrote:
> > On Wed, 2020-03-04 at 16:39 +0200, Mathias Nyman wrote:
> > > On 4.3.2020 5.16, Macpaul Lin wrote:
> > > > On Tue, 2020-02-04 at 17:44 +0800, Mathias Nyman wrote:
> > > >> On 1.2.2020 13.20, Macpaul Lin wrote:
> > > >>> On Fri, 2020-01-31 at 16:50 +0200, Mathias Nyman wrote:
> > > >>>> On 17.1.2020 9.41, Macpaul Lin wrote:
> > > >>>>> According to NULL pointer fix: https://tinyurl.com/uqft5ra
> > > >>>>> xhci: Fix NULL pointer dereference with xhci_irq() for shared_hcd
> > > >>>>> The similar issue has also been found in QC activities in Mediatek.
> > > >>>>>
> > > >>>>> Here quote the description from the referenced patch as follows.
> > > >>>>> "Commit ("f068090426ea xhci: Fix leaking USB3 shared_hcd
> > > >>>>> at xhci removal") sets xhci_shared_hcd to NULL without
> > > >>>>> stopping xhci host. This results into a race condition
> > > >>>>> where shared_hcd (super speed roothub) related interrupts
> > > >>>>> are being handled with xhci_irq happens when the
> > > >>>>> xhci_plat_remove is called and shared_hcd is set to NULL.
> > > >>>>> Fix this by setting the shared_hcd to NULL only after the
> > > >>>>> controller is halted and no interrupts are generated."
> > > >>>>>
> > > >>>>> Signed-off-by: Sriharsha Allenki <sallenki@codeaurora.org>
> > > >>>>> Signed-off-by: Macpaul Lin <macpaul.lin@mediatek.com>
> > > >>>>> ---
> > > >>>>>    drivers/usb/host/xhci-mtk.c | 2 +-
> > > >>>>>    1 file changed, 1 insertion(+), 1 deletion(-)
> > > >>>>>
> > > >>>>> diff --git a/drivers/usb/host/xhci-mtk.c b/drivers/usb/host/xhci-mtk.c
> > > >>>>> index b18a6baef204..c227c67f5dc5 100644
> > > >>>>> --- a/drivers/usb/host/xhci-mtk.c
> > > >>>>> +++ b/drivers/usb/host/xhci-mtk.c
> > > >>>>> @@ -593,11 +593,11 @@ static int xhci_mtk_remove(struct platform_device *dev)
> > > >>>>>    	struct usb_hcd  *shared_hcd = xhci->shared_hcd;
> > > >>>>>    
> > > >>>>>    	usb_remove_hcd(shared_hcd);
> > > >>>>> -	xhci->shared_hcd = NULL;
> > > >>>>>    	device_init_wakeup(&dev->dev, false);
> > > >>>>>    
> > > >>>>>    	usb_remove_hcd(hcd);
> > > >>>>>    	usb_put_hcd(shared_hcd);
> > > >>>>> +	xhci->shared_hcd = NULL;
> > > >>>>>    	usb_put_hcd(hcd);
> > > >>>>>    	xhci_mtk_sch_exit(mtk);
> > > >>>>>    	xhci_mtk_clks_disable(mtk);
> > > >>>>>
> > > >>>>
> > > >>>> Could you share details of the NULL pointer dereference, (backtrace).
> > > >>>
> > > >>> This bug was found by our QA staff while doing 500 times plug-in and
> > > >>> plug-out devices. The backtrace I have was recorded by QA and I didn't
> > > >>> reproduce this issue on my own environment. However, after applied this
> > > >>> patch the issue seems resolve. Here is the backtrace:
> > > >>>
> > > >>> Exception Class: Kernel (KE)
> > > >>> PC is at [<ffffff8008cccbc0>] xhci_irq+0x728/0x2364
> > > >>> LR is at [<ffffff8008ccc788>] xhci_irq+0x2f0/0x2364
> > > >>>
> > > >>> Current Executing Process:
> > > >>> [iptables, 859][netdagent, 770]
> > > >>>
> > > >>> Backtrace:
> > > >>> [<ffffff80080ead58>] __atomic_notifier_call_chain+0xa8/0x130
> > > >>> [<ffffff80080eb6d4>] notify_die+0x84/0xac
> > > >>> [<ffffff800808e874>] die+0x1d8/0x3b8
> > > >>> [<ffffff80080a89b0>] __do_kernel_fault+0x178/0x188
> > > >>> [<ffffff80080a81b4>] do_page_fault+0x44/0x3b0
> > > >>> [<ffffff80080a811c>] do_translation_fault+0x44/0x98
> > > >>> [<ffffff8008080e08>] do_mem_abort+0x4c/0x128
> > > >>> [<ffffff80080832d0>] el1_da+0x24/0x3c
> > > >>> [<ffffff8008cccbc0>] xhci_irq+0x728/0x2364
> > > >>> [<ffffff8008c98804>] usb_hcd_irq+0x2c/0x44
> > > >>> [<ffffff8008179bb0>] __handle_irq_event_percpu+0x26c/0x4a4
> > > >>> [<ffffff8008179ec8>] handle_irq_event+0x5c/0xd0
> > > >>> [<ffffff800817e3c0>] handle_fasteoi_irq+0x10c/0x1e0
> > > >>> [<ffffff80081787b0>] __handle_domain_irq+0x32c/0x738
> > > >>> [<ffffff800808159c>] gic_handle_irq+0x174/0x1c4
> > > >>> [<ffffff8008083cf8>] el0_irq_naked+0x50/0x5c
> > > >>> [<ffffffffffffffff>] 0xffffffffffffffff
> > > >>>
> > > >>
> > > >> Thanks,
> > > >> Could you help me find out which line of code xhci_irq+0x728 is in your case.
> > > >>
> > > >> As Guenter pointed out there is a risk of turning the NULL pointer dereference
> > > >> into a use after free if we just solve this by setting xhci->shared_hcd = NULL
> > > >> later.
> > > >>
> > > >> If you still have that kernel around, and xhci is compiled in:
> > > >> gdb vmlinux
> > > >> gdb li *(xhci_irq+0x728)
> > > >>
> > > > 
> > > > Sorry that I couldn't get back to you soon. The internal code version
> > > > for this issue was really old and a little bit difficult to rewind to
> > > > that version.
> > > > However, I think the following dump might be correct for the code base.
> > > > 
> > > > (gdb) li *(xhci_irq+0x728)
> > > > 0xffffff8008cc8634 is in xhci_irq (*stripped*
> > > > kernel-4.14/drivers/usb/host/xhci.h:1694).
> > > > 1689     */
> > > > 1690    #define XHCI_MAX_REXIT_TIMEOUT_MS       20
> > > > 1691
> > > > 1692    static inline unsigned int hcd_index(struct usb_hcd *hcd)
> > > > 1693    {
> > > > 1694            if (hcd->speed >= HCD_USB3)
> > > > 1695                    return 0;
> > > > 1696            else
> > > > 1697                    return 1;
> > > > 1698    }
> > > > (gdb)
> > > > 
> > > > Thanks
> > > > Macpaul Lin
> > > > 
> > > 
> > > Ah, it was a 4.14 kernel.
> > > This should be fixed in 4.20 with patch:
> > > 1245374e9b83 xhci: handle port status events for removed USB3 hcd
> > > 
> > > Port arrays/structures were changed completely in 4.18
> > > 
> > > Something like the below should work for 4.14:
> > > 
> > > diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
> > > index 61fa3007a74a..e7367b9f19c5 100644
> > > --- a/drivers/usb/host/xhci-ring.c
> > > +++ b/drivers/usb/host/xhci-ring.c
> > > @@ -1640,6 +1640,12 @@ static void handle_port_status(struct xhci_hcd *xhci,
> > >  	if ((major_revision == 0x03) != (hcd->speed >= HCD_USB3))
> > >  		hcd = xhci->shared_hcd;
> > >  
> > > +	if (!hcd) {
> > > +		xhci_dbg(xhci, "No hcd found for port %u event\n", port_id);
> > > +		bogus_port_status = true;
> > > +		goto cleanup;
> > > +	}
> > > +
> > >  	if (major_revision == 0) {
> > >  		xhci_warn(xhci, "Event for port %u not in "
> > >  				"Extended Capabilities, ignoring.\n",
> > 
> > Thanks for this suggestion, this is much better! I am sorry that we're
> > using android kernel that some reported issue might be out of date. I
> > will update the suggestion into our code base. Thanks!
> 
> Should I backport this to 4.14 and older kernels to prevent this issue
> from showing up in newer Android devices that are using these older
> kernels?
> 
> thanks,
> 
> greg k-h

If this could be backported to older kernel that will be great for newer
Android devices. Some of the shipping devices will have requirement of
kernel upgrade. Hence if you could backport this patch will be great.

Thanks!

Regards,
Macpaul Lin

Patch
diff mbox series

diff --git a/drivers/usb/host/xhci-mtk.c b/drivers/usb/host/xhci-mtk.c
index b18a6baef204..c227c67f5dc5 100644
--- a/drivers/usb/host/xhci-mtk.c
+++ b/drivers/usb/host/xhci-mtk.c
@@ -593,11 +593,11 @@  static int xhci_mtk_remove(struct platform_device *dev)
 	struct usb_hcd  *shared_hcd = xhci->shared_hcd;
 
 	usb_remove_hcd(shared_hcd);
-	xhci->shared_hcd = NULL;
 	device_init_wakeup(&dev->dev, false);
 
 	usb_remove_hcd(hcd);
 	usb_put_hcd(shared_hcd);
+	xhci->shared_hcd = NULL;
 	usb_put_hcd(hcd);
 	xhci_mtk_sch_exit(mtk);
 	xhci_mtk_clks_disable(mtk);