diff mbox series

[RFT] xhci: Fix memory leak when caching protocol extended capability PSI tables

Message ID 20200108151730.21022-1-mathias.nyman@linux.intel.com (mailing list archive)
State Superseded
Headers show
Series [RFT] xhci: Fix memory leak when caching protocol extended capability PSI tables | expand

Commit Message

Mathias Nyman Jan. 8, 2020, 3:17 p.m. UTC
xhci driver assumed that xHC controllers have at most one custom
supported speed table (PSI) for all usb 3.x ports.
Memory was allocated for one PSI table under the xhci hub structure.

Turns out this is not the case, some controllers have a separate
"supported protocol capability" entry with a PSI table for each port.
This means each usb3 port can in theory support different custom speeds.

To solve this cache all supported protocol capabilities with their PSI
tables in an array, and add pointers to the xhci port structure so that
every port points to its capability entry in the array.

When creating the SuperSpeedPlus USB Device Capability BOS descriptor
for the xhci USB 3.1 roothub we for now will use only data from the
first USB 3.1 capable protocol capability entry in the array.
This could be improved later, this patch focuses resolving
the memory leak.

Reported-by: Paul Menzel <pmenzel@molgen.mpg.de>
Reported-by: Sajja Venkateswara Rao <VenkateswaraRao.Sajja@amd.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
---
 drivers/usb/host/xhci-hub.c | 25 +++++++++++-----
 drivers/usb/host/xhci-mem.c | 60 +++++++++++++++++++++++--------------
 drivers/usb/host/xhci.h     | 14 +++++++--
 3 files changed, 66 insertions(+), 33 deletions(-)

Comments

Greg Kroah-Hartman Jan. 8, 2020, 3:40 p.m. UTC | #1
On Wed, Jan 08, 2020 at 05:17:30PM +0200, Mathias Nyman wrote:
> xhci driver assumed that xHC controllers have at most one custom
> supported speed table (PSI) for all usb 3.x ports.
> Memory was allocated for one PSI table under the xhci hub structure.
> 
> Turns out this is not the case, some controllers have a separate
> "supported protocol capability" entry with a PSI table for each port.
> This means each usb3 port can in theory support different custom speeds.

Is there a "max" number of port capabilities that can happen?  Or this
this truely dynamic?

> +	for (i = 0; i < xhci->num_port_caps; i++) {
> +		kfree(xhci->port_caps[i].psi);
> +		xhci->port_caps[i].psi = NULL;
> +	}

Nit, no need to set to NULL here :)

thanks,

greg k-h
Mathias Nyman Jan. 8, 2020, 3:56 p.m. UTC | #2
On 8.1.2020 17.40, Greg KH wrote:
> On Wed, Jan 08, 2020 at 05:17:30PM +0200, Mathias Nyman wrote:
>> xhci driver assumed that xHC controllers have at most one custom
>> supported speed table (PSI) for all usb 3.x ports.
>> Memory was allocated for one PSI table under the xhci hub structure.
>>
>> Turns out this is not the case, some controllers have a separate
>> "supported protocol capability" entry with a PSI table for each port.
>> This means each usb3 port can in theory support different custom speeds.
> 
> Is there a "max" number of port capabilities that can happen?  Or this
> this truely dynamic?

Almost truly dynamic, each capability points to the next, last points to 0

But we can't have more "supported protocol capabilities" than xHC ports.
(MaxPorts value in xHC HCSPARAMS1 register)

> 
>> +	for (i = 0; i < xhci->num_port_caps; i++) {
>> +		kfree(xhci->port_caps[i].psi);
>> +		xhci->port_caps[i].psi = NULL;
>> +	}
> 
> Nit, no need to set to NULL here :)

Thanks, will remove that

-Mathias
Marek Szyprowski Feb. 11, 2020, 10:56 a.m. UTC | #3
Hi

On 08.01.2020 16:17, Mathias Nyman wrote:
> xhci driver assumed that xHC controllers have at most one custom
> supported speed table (PSI) for all usb 3.x ports.
> Memory was allocated for one PSI table under the xhci hub structure.
>
> Turns out this is not the case, some controllers have a separate
> "supported protocol capability" entry with a PSI table for each port.
> This means each usb3 port can in theory support different custom speeds.
>
> To solve this cache all supported protocol capabilities with their PSI
> tables in an array, and add pointers to the xhci port structure so that
> every port points to its capability entry in the array.
>
> When creating the SuperSpeedPlus USB Device Capability BOS descriptor
> for the xhci USB 3.1 roothub we for now will use only data from the
> first USB 3.1 capable protocol capability entry in the array.
> This could be improved later, this patch focuses resolving
> the memory leak.
>
> Reported-by: Paul Menzel <pmenzel@molgen.mpg.de>
> Reported-by: Sajja Venkateswara Rao <VenkateswaraRao.Sajja@amd.com>
> Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>

This patch landed in today's linux-next (20200211) and causes NULL 
pointer dereference during second suspend/resume cycle on Samsung 
Exynos5422-based (arm 32bit) Odroid XU3lite board:

# time rtcwake -s10 -mmem
rtcwake: wakeup from "mem" using /dev/rtc0 at Tue Feb 11 10:51:43 2020
PM: suspend entry (deep)
Filesystems sync: 0.012 seconds
Freezing user space processes ... (elapsed 0.010 seconds) done.
OOM killer disabled.
Freezing remaining freezable tasks ... (elapsed 0.002 seconds) done.
smsc95xx 1-1.1:1.0 eth0: entering SUSPEND2 mode
wake enabled for irq 153
wake enabled for irq 158
samsung-pinctrl 13400000.pinctrl: Setting external wakeup interrupt 
mask: 0xffffffe7
Disabling non-boot CPUs ...
IRQ 51: no longer affine to CPU1
IRQ 52: no longer affine to CPU2
s3c2410-wdt 101d0000.watchdog: watchdog disabled
wake disabled for irq 158
usb usb1: root hub lost power or was reset
usb usb2: root hub lost power or was reset
wake disabled for irq 153
exynos-tmu 10060000.tmu: More trip points than supported by this TMU.
exynos-tmu 10060000.tmu: 2 trip points should be configured in polling mode.
exynos-tmu 10064000.tmu: More trip points than supported by this TMU.
exynos-tmu 10064000.tmu: 2 trip points should be configured in polling mode.
exynos-tmu 10068000.tmu: More trip points than supported by this TMU.
exynos-tmu 10068000.tmu: 2 trip points should be configured in polling mode.
exynos-tmu 1006c000.tmu: More trip points than supported by this TMU.
exynos-tmu 1006c000.tmu: 2 trip points should be configured in polling mode.
exynos-tmu 100a0000.tmu: More trip points than supported by this TMU.
exynos-tmu 100a0000.tmu: 6 trip points should be configured in polling mode.
usb usb3: root hub lost power or was reset
s3c-rtc 101e0000.rtc: rtc disabled, re-enabling
usb usb4: root hub lost power or was reset
xhci-hcd xhci-hcd.8.auto: No ports on the roothubs?
PM: dpm_run_callback(): platform_pm_resume+0x0/0x44 returns -12
PM: Device xhci-hcd.8.auto failed to resume async: error -12
hub 3-0:1.0: hub_ext_port_status failed (err = -32)
hub 4-0:1.0: hub_ext_port_status failed (err = -32)
usb 1-1: reset high-speed USB device number 2 using exynos-ehci
usb 1-1.1: reset high-speed USB device number 3 using exynos-ehci
OOM killer enabled.
Restarting tasks ... done.

real    0m11.890s
user    0m0.001s
sys     0m0.679s
root@target:~# PM: suspend exit
mmc_host mmc0: Bus speed (slot 0) = 50000000Hz (slot req 400000Hz, 
actual 396825HZ div = 63)
mmc_host mmc0: Bus speed (slot 0) = 200000000Hz (slot req 200000000Hz, 
actual 200000000HZ div = 0)
mmc_host mmc0: Bus speed (slot 0) = 50000000Hz (slot req 52000000Hz, 
actual 50000000HZ div = 0)
mmc_host mmc0: Bus speed (slot 0) = 400000000Hz (slot req 200000000Hz, 
actual 200000000HZ div = 1)
smsc95xx 1-1.1:1.0 eth0: link up, 100Mbps, full-duplex, lpa 0xC1E1

root@target:~#
root@target:~# time rtcwake -s10 -mmem[   35.451572] vdd_ldo12: disabling

rtcwake: wakeup from "mem" using /dev/rtc0 at Tue Feb 11 10:52:02 2020
PM: suspend entry (deep)
Filesystems sync: 0.004 seconds
Freezing user space processes ... (elapsed 0.006 seconds) done.
OOM killer disabled.
Freezing remaining freezable tasks ... (elapsed 0.070 seconds) done.
hub 4-0:1.0: hub_ext_port_status failed (err = -32)
hub 3-0:1.0: hub_ext_port_status failed (err = -32)
8<--- cut here ---
Unable to handle kernel NULL pointer dereference at virtual address 00000014
pgd = 4c26b54b
[00000014] *pgd=00000000
Internal error: Oops: 17 [#1] PREEMPT SMP ARM
Modules linked in:
CPU: 3 PID: 1468 Comm: kworker/u16:23 Not tainted 
5.6.0-rc1-next-20200211 #268
Hardware name: Samsung Exynos (Flattened Device Tree)
Workqueue: events_unbound async_run_entry_fn
PC is at xhci_suspend+0x12c/0x520
LR is at 0xa6aa9898
pc : [<c0724c90>]    lr : [<a6aa9898>]    psr: 60000093
sp : ec401df8  ip : 0000001a  fp : c12e7864
r10: 00000000  r9 : ecfb87b0  r8 : ecfb8220
r7 : 00000000  r6 : 00000000  r5 : 00000004  r4 : ecfb81f0
r3 : 00007d00  r2 : 00000001  r1 : 00000001  r0 : 00000000
Flags: nZCv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment none
Control: 10c5387d  Table: 6bd4006a  DAC: 00000051
Process kworker/u16:23 (pid: 1468, stack limit = 0x6e4b6fba)
Stack: (0xec401df8 to 0xec402000)
...
[<c0724c90>] (xhci_suspend) from [<c061b4f4>] (dpm_run_callback+0xb4/0x3fc)
[<c061b4f4>] (dpm_run_callback) from [<c061bd5c>] 
(__device_suspend+0x134/0x7e8)
[<c061bd5c>] (__device_suspend) from [<c061c42c>] (async_suspend+0x1c/0x94)
[<c061c42c>] (async_suspend) from [<c0154bd0>] 
(async_run_entry_fn+0x48/0x1b8)
[<c0154bd0>] (async_run_entry_fn) from [<c0149b38>] 
(process_one_work+0x230/0x7bc)
[<c0149b38>] (process_one_work) from [<c014a108>] (worker_thread+0x44/0x524)
[<c014a108>] (worker_thread) from [<c01511fc>] (kthread+0x130/0x164)
[<c01511fc>] (kthread) from [<c01010b4>] (ret_from_fork+0x14/0x20)
Exception stack(0xec401fb0 to 0xec401ff8)
...
---[ end trace c72caf6487666442 ]---
note: kworker/u16:23[1468] exited with preempt_count 1

Reverting it fixes the NULL pointer issue. I can provide more 
information or do some other tests. Just let me know what will help to 
fix it.

 > ...

Best regards
Greg Kroah-Hartman Feb. 11, 2020, 12:23 p.m. UTC | #4
On Tue, Feb 11, 2020 at 11:56:12AM +0100, Marek Szyprowski wrote:
> Hi
> 
> On 08.01.2020 16:17, Mathias Nyman wrote:
> > xhci driver assumed that xHC controllers have at most one custom
> > supported speed table (PSI) for all usb 3.x ports.
> > Memory was allocated for one PSI table under the xhci hub structure.
> >
> > Turns out this is not the case, some controllers have a separate
> > "supported protocol capability" entry with a PSI table for each port.
> > This means each usb3 port can in theory support different custom speeds.
> >
> > To solve this cache all supported protocol capabilities with their PSI
> > tables in an array, and add pointers to the xhci port structure so that
> > every port points to its capability entry in the array.
> >
> > When creating the SuperSpeedPlus USB Device Capability BOS descriptor
> > for the xhci USB 3.1 roothub we for now will use only data from the
> > first USB 3.1 capable protocol capability entry in the array.
> > This could be improved later, this patch focuses resolving
> > the memory leak.
> >
> > Reported-by: Paul Menzel <pmenzel@molgen.mpg.de>
> > Reported-by: Sajja Venkateswara Rao <VenkateswaraRao.Sajja@amd.com>
> > Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
> 
> This patch landed in today's linux-next (20200211) and causes NULL 
> pointer dereference during second suspend/resume cycle on Samsung 
> Exynos5422-based (arm 32bit) Odroid XU3lite board:
> 
> # time rtcwake -s10 -mmem
> rtcwake: wakeup from "mem" using /dev/rtc0 at Tue Feb 11 10:51:43 2020
> PM: suspend entry (deep)
> Filesystems sync: 0.012 seconds
> Freezing user space processes ... (elapsed 0.010 seconds) done.
> OOM killer disabled.
> Freezing remaining freezable tasks ... (elapsed 0.002 seconds) done.
> smsc95xx 1-1.1:1.0 eth0: entering SUSPEND2 mode
> wake enabled for irq 153
> wake enabled for irq 158
> samsung-pinctrl 13400000.pinctrl: Setting external wakeup interrupt 
> mask: 0xffffffe7
> Disabling non-boot CPUs ...
> IRQ 51: no longer affine to CPU1
> IRQ 52: no longer affine to CPU2
> s3c2410-wdt 101d0000.watchdog: watchdog disabled
> wake disabled for irq 158
> usb usb1: root hub lost power or was reset
> usb usb2: root hub lost power or was reset
> wake disabled for irq 153
> exynos-tmu 10060000.tmu: More trip points than supported by this TMU.
> exynos-tmu 10060000.tmu: 2 trip points should be configured in polling mode.
> exynos-tmu 10064000.tmu: More trip points than supported by this TMU.
> exynos-tmu 10064000.tmu: 2 trip points should be configured in polling mode.
> exynos-tmu 10068000.tmu: More trip points than supported by this TMU.
> exynos-tmu 10068000.tmu: 2 trip points should be configured in polling mode.
> exynos-tmu 1006c000.tmu: More trip points than supported by this TMU.
> exynos-tmu 1006c000.tmu: 2 trip points should be configured in polling mode.
> exynos-tmu 100a0000.tmu: More trip points than supported by this TMU.
> exynos-tmu 100a0000.tmu: 6 trip points should be configured in polling mode.
> usb usb3: root hub lost power or was reset
> s3c-rtc 101e0000.rtc: rtc disabled, re-enabling
> usb usb4: root hub lost power or was reset
> xhci-hcd xhci-hcd.8.auto: No ports on the roothubs?
> PM: dpm_run_callback(): platform_pm_resume+0x0/0x44 returns -12
> PM: Device xhci-hcd.8.auto failed to resume async: error -12
> hub 3-0:1.0: hub_ext_port_status failed (err = -32)
> hub 4-0:1.0: hub_ext_port_status failed (err = -32)
> usb 1-1: reset high-speed USB device number 2 using exynos-ehci
> usb 1-1.1: reset high-speed USB device number 3 using exynos-ehci
> OOM killer enabled.
> Restarting tasks ... done.
> 
> real    0m11.890s
> user    0m0.001s
> sys     0m0.679s
> root@target:~# PM: suspend exit
> mmc_host mmc0: Bus speed (slot 0) = 50000000Hz (slot req 400000Hz, 
> actual 396825HZ div = 63)
> mmc_host mmc0: Bus speed (slot 0) = 200000000Hz (slot req 200000000Hz, 
> actual 200000000HZ div = 0)
> mmc_host mmc0: Bus speed (slot 0) = 50000000Hz (slot req 52000000Hz, 
> actual 50000000HZ div = 0)
> mmc_host mmc0: Bus speed (slot 0) = 400000000Hz (slot req 200000000Hz, 
> actual 200000000HZ div = 1)
> smsc95xx 1-1.1:1.0 eth0: link up, 100Mbps, full-duplex, lpa 0xC1E1
> 
> root@target:~#
> root@target:~# time rtcwake -s10 -mmem[   35.451572] vdd_ldo12: disabling
> 
> rtcwake: wakeup from "mem" using /dev/rtc0 at Tue Feb 11 10:52:02 2020
> PM: suspend entry (deep)
> Filesystems sync: 0.004 seconds
> Freezing user space processes ... (elapsed 0.006 seconds) done.
> OOM killer disabled.
> Freezing remaining freezable tasks ... (elapsed 0.070 seconds) done.
> hub 4-0:1.0: hub_ext_port_status failed (err = -32)
> hub 3-0:1.0: hub_ext_port_status failed (err = -32)
> 8<--- cut here ---
> Unable to handle kernel NULL pointer dereference at virtual address 00000014
> pgd = 4c26b54b
> [00000014] *pgd=00000000
> Internal error: Oops: 17 [#1] PREEMPT SMP ARM
> Modules linked in:
> CPU: 3 PID: 1468 Comm: kworker/u16:23 Not tainted 
> 5.6.0-rc1-next-20200211 #268
> Hardware name: Samsung Exynos (Flattened Device Tree)
> Workqueue: events_unbound async_run_entry_fn
> PC is at xhci_suspend+0x12c/0x520
> LR is at 0xa6aa9898
> pc : [<c0724c90>]    lr : [<a6aa9898>]    psr: 60000093
> sp : ec401df8  ip : 0000001a  fp : c12e7864
> r10: 00000000  r9 : ecfb87b0  r8 : ecfb8220
> r7 : 00000000  r6 : 00000000  r5 : 00000004  r4 : ecfb81f0
> r3 : 00007d00  r2 : 00000001  r1 : 00000001  r0 : 00000000
> Flags: nZCv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment none
> Control: 10c5387d  Table: 6bd4006a  DAC: 00000051
> Process kworker/u16:23 (pid: 1468, stack limit = 0x6e4b6fba)
> Stack: (0xec401df8 to 0xec402000)
> ...
> [<c0724c90>] (xhci_suspend) from [<c061b4f4>] (dpm_run_callback+0xb4/0x3fc)
> [<c061b4f4>] (dpm_run_callback) from [<c061bd5c>] 
> (__device_suspend+0x134/0x7e8)
> [<c061bd5c>] (__device_suspend) from [<c061c42c>] (async_suspend+0x1c/0x94)
> [<c061c42c>] (async_suspend) from [<c0154bd0>] 
> (async_run_entry_fn+0x48/0x1b8)
> [<c0154bd0>] (async_run_entry_fn) from [<c0149b38>] 
> (process_one_work+0x230/0x7bc)
> [<c0149b38>] (process_one_work) from [<c014a108>] (worker_thread+0x44/0x524)
> [<c014a108>] (worker_thread) from [<c01511fc>] (kthread+0x130/0x164)
> [<c01511fc>] (kthread) from [<c01010b4>] (ret_from_fork+0x14/0x20)
> Exception stack(0xec401fb0 to 0xec401ff8)
> ...
> ---[ end trace c72caf6487666442 ]---
> note: kworker/u16:23[1468] exited with preempt_count 1
> 
> Reverting it fixes the NULL pointer issue. I can provide more 
> information or do some other tests. Just let me know what will help to 
> fix it.
> 
>  > ...

Ugh.  Mathias, should I just revert this for now?

thanks,

greg k-h
Mathias Nyman Feb. 11, 2020, 12:29 p.m. UTC | #5
On 11.2.2020 14.23, Greg KH wrote:
> On Tue, Feb 11, 2020 at 11:56:12AM +0100, Marek Szyprowski wrote:
>> Hi
>>
>> On 08.01.2020 16:17, Mathias Nyman wrote:
>>> xhci driver assumed that xHC controllers have at most one custom
>>> supported speed table (PSI) for all usb 3.x ports.
>>> Memory was allocated for one PSI table under the xhci hub structure.
>>>
>>> Turns out this is not the case, some controllers have a separate
>>> "supported protocol capability" entry with a PSI table for each port.
>>> This means each usb3 port can in theory support different custom speeds.
>>>
>>> To solve this cache all supported protocol capabilities with their PSI
>>> tables in an array, and add pointers to the xhci port structure so that
>>> every port points to its capability entry in the array.
>>>
>>> When creating the SuperSpeedPlus USB Device Capability BOS descriptor
>>> for the xhci USB 3.1 roothub we for now will use only data from the
>>> first USB 3.1 capable protocol capability entry in the array.
>>> This could be improved later, this patch focuses resolving
>>> the memory leak.
>>>
>>> Reported-by: Paul Menzel <pmenzel@molgen.mpg.de>
>>> Reported-by: Sajja Venkateswara Rao <VenkateswaraRao.Sajja@amd.com>
>>> Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
>>
>> This patch landed in today's linux-next (20200211) and causes NULL 
>> pointer dereference during second suspend/resume cycle on Samsung 
>> Exynos5422-based (arm 32bit) Odroid XU3lite board:
>>
>> # time rtcwake -s10 -mmem
>> rtcwake: wakeup from "mem" using /dev/rtc0 at Tue Feb 11 10:51:43 2020
>> PM: suspend entry (deep)
>> Filesystems sync: 0.012 seconds
>> Freezing user space processes ... (elapsed 0.010 seconds) done.
>> OOM killer disabled.
>> Freezing remaining freezable tasks ... (elapsed 0.002 seconds) done.
>> smsc95xx 1-1.1:1.0 eth0: entering SUSPEND2 mode
>> wake enabled for irq 153
>> wake enabled for irq 158
>> samsung-pinctrl 13400000.pinctrl: Setting external wakeup interrupt 
>> mask: 0xffffffe7
>> Disabling non-boot CPUs ...
>> IRQ 51: no longer affine to CPU1
>> IRQ 52: no longer affine to CPU2
>> s3c2410-wdt 101d0000.watchdog: watchdog disabled
>> wake disabled for irq 158
>> usb usb1: root hub lost power or was reset
>> usb usb2: root hub lost power or was reset
>> wake disabled for irq 153
>> exynos-tmu 10060000.tmu: More trip points than supported by this TMU.
>> exynos-tmu 10060000.tmu: 2 trip points should be configured in polling mode.
>> exynos-tmu 10064000.tmu: More trip points than supported by this TMU.
>> exynos-tmu 10064000.tmu: 2 trip points should be configured in polling mode.
>> exynos-tmu 10068000.tmu: More trip points than supported by this TMU.
>> exynos-tmu 10068000.tmu: 2 trip points should be configured in polling mode.
>> exynos-tmu 1006c000.tmu: More trip points than supported by this TMU.
>> exynos-tmu 1006c000.tmu: 2 trip points should be configured in polling mode.
>> exynos-tmu 100a0000.tmu: More trip points than supported by this TMU.
>> exynos-tmu 100a0000.tmu: 6 trip points should be configured in polling mode.
>> usb usb3: root hub lost power or was reset
>> s3c-rtc 101e0000.rtc: rtc disabled, re-enabling
>> usb usb4: root hub lost power or was reset
>> xhci-hcd xhci-hcd.8.auto: No ports on the roothubs?
>> PM: dpm_run_callback(): platform_pm_resume+0x0/0x44 returns -12
>> PM: Device xhci-hcd.8.auto failed to resume async: error -12
>> hub 3-0:1.0: hub_ext_port_status failed (err = -32)
>> hub 4-0:1.0: hub_ext_port_status failed (err = -32)
>> usb 1-1: reset high-speed USB device number 2 using exynos-ehci
>> usb 1-1.1: reset high-speed USB device number 3 using exynos-ehci
>> OOM killer enabled.
>> Restarting tasks ... done.
>>
>> real    0m11.890s
>> user    0m0.001s
>> sys     0m0.679s
>> root@target:~# PM: suspend exit
>> mmc_host mmc0: Bus speed (slot 0) = 50000000Hz (slot req 400000Hz, 
>> actual 396825HZ div = 63)
>> mmc_host mmc0: Bus speed (slot 0) = 200000000Hz (slot req 200000000Hz, 
>> actual 200000000HZ div = 0)
>> mmc_host mmc0: Bus speed (slot 0) = 50000000Hz (slot req 52000000Hz, 
>> actual 50000000HZ div = 0)
>> mmc_host mmc0: Bus speed (slot 0) = 400000000Hz (slot req 200000000Hz, 
>> actual 200000000HZ div = 1)
>> smsc95xx 1-1.1:1.0 eth0: link up, 100Mbps, full-duplex, lpa 0xC1E1
>>
>> root@target:~#
>> root@target:~# time rtcwake -s10 -mmem[   35.451572] vdd_ldo12: disabling
>>
>> rtcwake: wakeup from "mem" using /dev/rtc0 at Tue Feb 11 10:52:02 2020
>> PM: suspend entry (deep)
>> Filesystems sync: 0.004 seconds
>> Freezing user space processes ... (elapsed 0.006 seconds) done.
>> OOM killer disabled.
>> Freezing remaining freezable tasks ... (elapsed 0.070 seconds) done.
>> hub 4-0:1.0: hub_ext_port_status failed (err = -32)
>> hub 3-0:1.0: hub_ext_port_status failed (err = -32)
>> 8<--- cut here ---
>> Unable to handle kernel NULL pointer dereference at virtual address 00000014
>> pgd = 4c26b54b
>> [00000014] *pgd=00000000
>> Internal error: Oops: 17 [#1] PREEMPT SMP ARM
>> Modules linked in:
>> CPU: 3 PID: 1468 Comm: kworker/u16:23 Not tainted 
>> 5.6.0-rc1-next-20200211 #268
>> Hardware name: Samsung Exynos (Flattened Device Tree)
>> Workqueue: events_unbound async_run_entry_fn
>> PC is at xhci_suspend+0x12c/0x520
>> LR is at 0xa6aa9898
>> pc : [<c0724c90>]    lr : [<a6aa9898>]    psr: 60000093
>> sp : ec401df8  ip : 0000001a  fp : c12e7864
>> r10: 00000000  r9 : ecfb87b0  r8 : ecfb8220
>> r7 : 00000000  r6 : 00000000  r5 : 00000004  r4 : ecfb81f0
>> r3 : 00007d00  r2 : 00000001  r1 : 00000001  r0 : 00000000
>> Flags: nZCv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment none
>> Control: 10c5387d  Table: 6bd4006a  DAC: 00000051
>> Process kworker/u16:23 (pid: 1468, stack limit = 0x6e4b6fba)
>> Stack: (0xec401df8 to 0xec402000)
>> ...
>> [<c0724c90>] (xhci_suspend) from [<c061b4f4>] (dpm_run_callback+0xb4/0x3fc)
>> [<c061b4f4>] (dpm_run_callback) from [<c061bd5c>] 
>> (__device_suspend+0x134/0x7e8)
>> [<c061bd5c>] (__device_suspend) from [<c061c42c>] (async_suspend+0x1c/0x94)
>> [<c061c42c>] (async_suspend) from [<c0154bd0>] 
>> (async_run_entry_fn+0x48/0x1b8)
>> [<c0154bd0>] (async_run_entry_fn) from [<c0149b38>] 
>> (process_one_work+0x230/0x7bc)
>> [<c0149b38>] (process_one_work) from [<c014a108>] (worker_thread+0x44/0x524)
>> [<c014a108>] (worker_thread) from [<c01511fc>] (kthread+0x130/0x164)
>> [<c01511fc>] (kthread) from [<c01010b4>] (ret_from_fork+0x14/0x20)
>> Exception stack(0xec401fb0 to 0xec401ff8)
>> ...
>> ---[ end trace c72caf6487666442 ]---
>> note: kworker/u16:23[1468] exited with preempt_count 1
>>
>> Reverting it fixes the NULL pointer issue. I can provide more 
>> information or do some other tests. Just let me know what will help to 
>> fix it.
>>
>>  > ...
> 
> Ugh.  Mathias, should I just revert this for now?
> 

Yes, revert it.

This looks very odd, after second resume, and losing power driver
can't find any port at all.

Marek, do you still get the "xhci-hcd xhci-hcd.8.auto: No ports on the roothubs?"
message on second resume after reverting the patch?

-Mathias
Mathias Nyman Feb. 11, 2020, 2:08 p.m. UTC | #6
On 11.2.2020 14.29, Mathias Nyman wrote:
> On 11.2.2020 14.23, Greg KH wrote:
>> On Tue, Feb 11, 2020 at 11:56:12AM +0100, Marek Szyprowski wrote:
>>> Hi
>>>
>>> On 08.01.2020 16:17, Mathias Nyman wrote:
>>>> xhci driver assumed that xHC controllers have at most one custom
>>>> supported speed table (PSI) for all usb 3.x ports.
>>>> Memory was allocated for one PSI table under the xhci hub structure.
>>>>
>>>> Turns out this is not the case, some controllers have a separate
>>>> "supported protocol capability" entry with a PSI table for each port.
>>>> This means each usb3 port can in theory support different custom speeds.
>>>>
>>>> To solve this cache all supported protocol capabilities with their PSI
>>>> tables in an array, and add pointers to the xhci port structure so that
>>>> every port points to its capability entry in the array.
>>>>
>>>> When creating the SuperSpeedPlus USB Device Capability BOS descriptor
>>>> for the xhci USB 3.1 roothub we for now will use only data from the
>>>> first USB 3.1 capable protocol capability entry in the array.
>>>> This could be improved later, this patch focuses resolving
>>>> the memory leak.
>>>>
>>>> Reported-by: Paul Menzel <pmenzel@molgen.mpg.de>
>>>> Reported-by: Sajja Venkateswara Rao <VenkateswaraRao.Sajja@amd.com>
>>>> Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
>>>
>>> This patch landed in today's linux-next (20200211) and causes NULL 
>>> pointer dereference during second suspend/resume cycle on Samsung 
>>> Exynos5422-based (arm 32bit) Odroid XU3lite board:
>>>
>>> # time rtcwake -s10 -mmem
>>> rtcwake: wakeup from "mem" using /dev/rtc0 at Tue Feb 11 10:51:43 2020
>>> PM: suspend entry (deep)
>>> Filesystems sync: 0.012 seconds
>>> Freezing user space processes ... (elapsed 0.010 seconds) done.
>>> OOM killer disabled.
>>> Freezing remaining freezable tasks ... (elapsed 0.002 seconds) done.
>>> smsc95xx 1-1.1:1.0 eth0: entering SUSPEND2 mode
>>> wake enabled for irq 153
>>> wake enabled for irq 158
>>> samsung-pinctrl 13400000.pinctrl: Setting external wakeup interrupt 
>>> mask: 0xffffffe7
>>> Disabling non-boot CPUs ...
>>> IRQ 51: no longer affine to CPU1
>>> IRQ 52: no longer affine to CPU2
>>> s3c2410-wdt 101d0000.watchdog: watchdog disabled
>>> wake disabled for irq 158
>>> usb usb1: root hub lost power or was reset
>>> usb usb2: root hub lost power or was reset
>>> wake disabled for irq 153
>>> exynos-tmu 10060000.tmu: More trip points than supported by this TMU.
>>> exynos-tmu 10060000.tmu: 2 trip points should be configured in polling mode.
>>> exynos-tmu 10064000.tmu: More trip points than supported by this TMU.
>>> exynos-tmu 10064000.tmu: 2 trip points should be configured in polling mode.
>>> exynos-tmu 10068000.tmu: More trip points than supported by this TMU.
>>> exynos-tmu 10068000.tmu: 2 trip points should be configured in polling mode.
>>> exynos-tmu 1006c000.tmu: More trip points than supported by this TMU.
>>> exynos-tmu 1006c000.tmu: 2 trip points should be configured in polling mode.
>>> exynos-tmu 100a0000.tmu: More trip points than supported by this TMU.
>>> exynos-tmu 100a0000.tmu: 6 trip points should be configured in polling mode.
>>> usb usb3: root hub lost power or was reset
>>> s3c-rtc 101e0000.rtc: rtc disabled, re-enabling
>>> usb usb4: root hub lost power or was reset
>>> xhci-hcd xhci-hcd.8.auto: No ports on the roothubs?
>>> PM: dpm_run_callback(): platform_pm_resume+0x0/0x44 returns -12
>>> PM: Device xhci-hcd.8.auto failed to resume async: error -12
>>> hub 3-0:1.0: hub_ext_port_status failed (err = -32)
>>> hub 4-0:1.0: hub_ext_port_status failed (err = -32)
>>> usb 1-1: reset high-speed USB device number 2 using exynos-ehci
>>> usb 1-1.1: reset high-speed USB device number 3 using exynos-ehci
>>> OOM killer enabled.
>>> Restarting tasks ... done.
>>>
>>> real    0m11.890s
>>> user    0m0.001s
>>> sys     0m0.679s
>>> root@target:~# PM: suspend exit
>>> mmc_host mmc0: Bus speed (slot 0) = 50000000Hz (slot req 400000Hz, 
>>> actual 396825HZ div = 63)
>>> mmc_host mmc0: Bus speed (slot 0) = 200000000Hz (slot req 200000000Hz, 
>>> actual 200000000HZ div = 0)
>>> mmc_host mmc0: Bus speed (slot 0) = 50000000Hz (slot req 52000000Hz, 
>>> actual 50000000HZ div = 0)
>>> mmc_host mmc0: Bus speed (slot 0) = 400000000Hz (slot req 200000000Hz, 
>>> actual 200000000HZ div = 1)
>>> smsc95xx 1-1.1:1.0 eth0: link up, 100Mbps, full-duplex, lpa 0xC1E1
>>>
>>> root@target:~#
>>> root@target:~# time rtcwake -s10 -mmem[   35.451572] vdd_ldo12: disabling
>>>
>>> rtcwake: wakeup from "mem" using /dev/rtc0 at Tue Feb 11 10:52:02 2020
>>> PM: suspend entry (deep)
>>> Filesystems sync: 0.004 seconds
>>> Freezing user space processes ... (elapsed 0.006 seconds) done.
>>> OOM killer disabled.
>>> Freezing remaining freezable tasks ... (elapsed 0.070 seconds) done.
>>> hub 4-0:1.0: hub_ext_port_status failed (err = -32)
>>> hub 3-0:1.0: hub_ext_port_status failed (err = -32)
>>> 8<--- cut here ---
>>> Unable to handle kernel NULL pointer dereference at virtual address 00000014
>>> pgd = 4c26b54b
>>> [00000014] *pgd=00000000
>>> Internal error: Oops: 17 [#1] PREEMPT SMP ARM
>>> Modules linked in:
>>> CPU: 3 PID: 1468 Comm: kworker/u16:23 Not tainted 
>>> 5.6.0-rc1-next-20200211 #268
>>> Hardware name: Samsung Exynos (Flattened Device Tree)
>>> Workqueue: events_unbound async_run_entry_fn
>>> PC is at xhci_suspend+0x12c/0x520
>>> LR is at 0xa6aa9898
>>> pc : [<c0724c90>]    lr : [<a6aa9898>]    psr: 60000093
>>> sp : ec401df8  ip : 0000001a  fp : c12e7864
>>> r10: 00000000  r9 : ecfb87b0  r8 : ecfb8220
>>> r7 : 00000000  r6 : 00000000  r5 : 00000004  r4 : ecfb81f0
>>> r3 : 00007d00  r2 : 00000001  r1 : 00000001  r0 : 00000000
>>> Flags: nZCv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment none
>>> Control: 10c5387d  Table: 6bd4006a  DAC: 00000051
>>> Process kworker/u16:23 (pid: 1468, stack limit = 0x6e4b6fba)
>>> Stack: (0xec401df8 to 0xec402000)
>>> ...
>>> [<c0724c90>] (xhci_suspend) from [<c061b4f4>] (dpm_run_callback+0xb4/0x3fc)
>>> [<c061b4f4>] (dpm_run_callback) from [<c061bd5c>] 
>>> (__device_suspend+0x134/0x7e8)
>>> [<c061bd5c>] (__device_suspend) from [<c061c42c>] (async_suspend+0x1c/0x94)
>>> [<c061c42c>] (async_suspend) from [<c0154bd0>] 
>>> (async_run_entry_fn+0x48/0x1b8)
>>> [<c0154bd0>] (async_run_entry_fn) from [<c0149b38>] 
>>> (process_one_work+0x230/0x7bc)
>>> [<c0149b38>] (process_one_work) from [<c014a108>] (worker_thread+0x44/0x524)
>>> [<c014a108>] (worker_thread) from [<c01511fc>] (kthread+0x130/0x164)
>>> [<c01511fc>] (kthread) from [<c01010b4>] (ret_from_fork+0x14/0x20)
>>> Exception stack(0xec401fb0 to 0xec401ff8)
>>> ...
>>> ---[ end trace c72caf6487666442 ]---
>>> note: kworker/u16:23[1468] exited with preempt_count 1
>>>
>>> Reverting it fixes the NULL pointer issue. I can provide more 
>>> information or do some other tests. Just let me know what will help to 
>>> fix it.
>>>
>>>  > ...
>>
>> Ugh.  Mathias, should I just revert this for now?
>>
> 
> Yes, revert it.
> 
> This looks very odd, after second resume, and losing power driver
> can't find any port at all.
> 
> Marek, do you still get the "xhci-hcd xhci-hcd.8.auto: No ports on the roothubs?"
> message on second resume after reverting the patch?
> 

Ok, I think I got it. 
Patch doesn't set xhci->num_port_caps to 0 in xhci_mem_cleanup().

Adding new ports will fail when we reinitialize xhci manually, like in this
exynos case where xhci loses power in suspend/resume cycle.  

I'll post a new version soon

-Mathias
diff mbox series

Patch

diff --git a/drivers/usb/host/xhci-hub.c b/drivers/usb/host/xhci-hub.c
index 7a3a29e5e9d2..0974eebd28e7 100644
--- a/drivers/usb/host/xhci-hub.c
+++ b/drivers/usb/host/xhci-hub.c
@@ -55,6 +55,7 @@  static u8 usb_bos_descriptor [] = {
 static int xhci_create_usb3_bos_desc(struct xhci_hcd *xhci, char *buf,
 				     u16 wLength)
 {
+	struct xhci_port_cap *port_cap;
 	int i, ssa_count;
 	u32 temp;
 	u16 desc_size, ssp_cap_size, ssa_size = 0;
@@ -64,16 +65,24 @@  static int xhci_create_usb3_bos_desc(struct xhci_hcd *xhci, char *buf,
 	ssp_cap_size = sizeof(usb_bos_descriptor) - desc_size;
 
 	/* does xhci support USB 3.1 Enhanced SuperSpeed */
-	if (xhci->usb3_rhub.min_rev >= 0x01) {
+	for (i = 0; i < xhci->num_port_caps; i++) {
+		if (xhci->port_caps[i].maj_rev == 0x03 &&
+		    xhci->port_caps[i].min_rev >= 0x01) {
+			usb3_1 = true;
+			port_cap = &xhci->port_caps[i];
+			break;
+		}
+	}
+
+	if (usb3_1) {
 		/* does xhci provide a PSI table for SSA speed attributes? */
-		if (xhci->usb3_rhub.psi_count) {
+		if (port_cap->psi_count) {
 			/* two SSA entries for each unique PSI ID, RX and TX */
-			ssa_count = xhci->usb3_rhub.psi_uid_count * 2;
+			ssa_count = port_cap->psi_uid_count * 2;
 			ssa_size = ssa_count * sizeof(u32);
 			ssp_cap_size -= 16; /* skip copying the default SSA */
 		}
 		desc_size += ssp_cap_size;
-		usb3_1 = true;
 	}
 	memcpy(buf, &usb_bos_descriptor, min(desc_size, wLength));
 
@@ -99,7 +108,7 @@  static int xhci_create_usb3_bos_desc(struct xhci_hcd *xhci, char *buf,
 	}
 
 	/* If PSI table exists, add the custom speed attributes from it */
-	if (usb3_1 && xhci->usb3_rhub.psi_count) {
+	if (usb3_1 && port_cap->psi_count) {
 		u32 ssp_cap_base, bm_attrib, psi, psi_mant, psi_exp;
 		int offset;
 
@@ -111,7 +120,7 @@  static int xhci_create_usb3_bos_desc(struct xhci_hcd *xhci, char *buf,
 
 		/* attribute count SSAC bits 4:0 and ID count SSIC bits 8:5 */
 		bm_attrib = (ssa_count - 1) & 0x1f;
-		bm_attrib |= (xhci->usb3_rhub.psi_uid_count - 1) << 5;
+		bm_attrib |= (port_cap->psi_uid_count - 1) << 5;
 		put_unaligned_le32(bm_attrib, &buf[ssp_cap_base + 4]);
 
 		if (wLength < desc_size + ssa_size)
@@ -124,8 +133,8 @@  static int xhci_create_usb3_bos_desc(struct xhci_hcd *xhci, char *buf,
 		 * USB 3.1 requires two SSA entries (RX and TX) for every link
 		 */
 		offset = desc_size;
-		for (i = 0; i < xhci->usb3_rhub.psi_count; i++) {
-			psi = xhci->usb3_rhub.psi[i];
+		for (i = 0; i < port_cap->psi_count; i++) {
+			psi = port_cap->psi[i];
 			psi &= ~USB_SSP_SUBLINK_SPEED_RSVD;
 			psi_exp = XHCI_EXT_PORT_PSIE(psi);
 			psi_mant = XHCI_EXT_PORT_PSIM(psi);
diff --git a/drivers/usb/host/xhci-mem.c b/drivers/usb/host/xhci-mem.c
index 3b1388fa2f36..cf4d27774a7d 100644
--- a/drivers/usb/host/xhci-mem.c
+++ b/drivers/usb/host/xhci-mem.c
@@ -1909,17 +1909,18 @@  void xhci_mem_cleanup(struct xhci_hcd *xhci)
 	xhci->usb3_rhub.num_ports = 0;
 	xhci->num_active_eps = 0;
 	kfree(xhci->usb2_rhub.ports);
-	kfree(xhci->usb2_rhub.psi);
 	kfree(xhci->usb3_rhub.ports);
-	kfree(xhci->usb3_rhub.psi);
 	kfree(xhci->hw_ports);
 	kfree(xhci->rh_bw);
 	kfree(xhci->ext_caps);
+	for (i = 0; i < xhci->num_port_caps; i++) {
+		kfree(xhci->port_caps[i].psi);
+		xhci->port_caps[i].psi = NULL;
+	}
+	kfree(xhci->port_caps);
 
 	xhci->usb2_rhub.ports = NULL;
-	xhci->usb2_rhub.psi = NULL;
 	xhci->usb3_rhub.ports = NULL;
-	xhci->usb3_rhub.psi = NULL;
 	xhci->hw_ports = NULL;
 	xhci->rh_bw = NULL;
 	xhci->ext_caps = NULL;
@@ -2120,6 +2121,7 @@  static void xhci_add_in_port(struct xhci_hcd *xhci, unsigned int num_ports,
 	u8 major_revision, minor_revision;
 	struct xhci_hub *rhub;
 	struct device *dev = xhci_to_hcd(xhci)->self.sysdev;
+	struct xhci_port_cap *port_cap;
 
 	temp = readl(addr);
 	major_revision = XHCI_EXT_PORT_MAJOR(temp);
@@ -2154,31 +2156,39 @@  static void xhci_add_in_port(struct xhci_hcd *xhci, unsigned int num_ports,
 		/* WTF? "Valid values are ‘1’ to MaxPorts" */
 		return;
 
-	rhub->psi_count = XHCI_EXT_PORT_PSIC(temp);
-	if (rhub->psi_count) {
-		rhub->psi = kcalloc_node(rhub->psi_count, sizeof(*rhub->psi),
-				    GFP_KERNEL, dev_to_node(dev));
-		if (!rhub->psi)
-			rhub->psi_count = 0;
+	port_cap = &xhci->port_caps[xhci->num_port_caps++];
+	if (xhci->num_port_caps > max_caps)
+		return;
+
+	port_cap->maj_rev = major_revision;
+	port_cap->min_rev = minor_revision;
+	port_cap->psi_count = XHCI_EXT_PORT_PSIC(temp);
+
+	if (port_cap->psi_count) {
+		port_cap->psi = kcalloc_node(port_cap->psi_count,
+					     sizeof(*port_cap->psi),
+					     GFP_KERNEL, dev_to_node(dev));
+		if (!port_cap->psi)
+			port_cap->psi_count = 0;
 
-		rhub->psi_uid_count++;
-		for (i = 0; i < rhub->psi_count; i++) {
-			rhub->psi[i] = readl(addr + 4 + i);
+		port_cap->psi_uid_count++;
+		for (i = 0; i < port_cap->psi_count; i++) {
+			port_cap->psi[i] = readl(addr + 4 + i);
 
 			/* count unique ID values, two consecutive entries can
 			 * have the same ID if link is assymetric
 			 */
-			if (i && (XHCI_EXT_PORT_PSIV(rhub->psi[i]) !=
-				  XHCI_EXT_PORT_PSIV(rhub->psi[i - 1])))
-				rhub->psi_uid_count++;
+			if (i && (XHCI_EXT_PORT_PSIV(port_cap->psi[i]) !=
+				  XHCI_EXT_PORT_PSIV(port_cap->psi[i - 1])))
+				port_cap->psi_uid_count++;
 
 			xhci_dbg(xhci, "PSIV:%d PSIE:%d PLT:%d PFD:%d LP:%d PSIM:%d\n",
-				  XHCI_EXT_PORT_PSIV(rhub->psi[i]),
-				  XHCI_EXT_PORT_PSIE(rhub->psi[i]),
-				  XHCI_EXT_PORT_PLT(rhub->psi[i]),
-				  XHCI_EXT_PORT_PFD(rhub->psi[i]),
-				  XHCI_EXT_PORT_LP(rhub->psi[i]),
-				  XHCI_EXT_PORT_PSIM(rhub->psi[i]));
+				  XHCI_EXT_PORT_PSIV(port_cap->psi[i]),
+				  XHCI_EXT_PORT_PSIE(port_cap->psi[i]),
+				  XHCI_EXT_PORT_PLT(port_cap->psi[i]),
+				  XHCI_EXT_PORT_PFD(port_cap->psi[i]),
+				  XHCI_EXT_PORT_LP(port_cap->psi[i]),
+				  XHCI_EXT_PORT_PSIM(port_cap->psi[i]));
 		}
 	}
 	/* cache usb2 port capabilities */
@@ -2213,6 +2223,7 @@  static void xhci_add_in_port(struct xhci_hcd *xhci, unsigned int num_ports,
 			continue;
 		}
 		hw_port->rhub = rhub;
+		hw_port->port_cap = port_cap;
 		rhub->num_ports++;
 	}
 	/* FIXME: Should we disable ports not in the Extended Capabilities? */
@@ -2303,6 +2314,11 @@  static int xhci_setup_port_arrays(struct xhci_hcd *xhci, gfp_t flags)
 	if (!xhci->ext_caps)
 		return -ENOMEM;
 
+	xhci->port_caps = kcalloc_node(cap_count, sizeof(*xhci->port_caps),
+				flags, dev_to_node(dev));
+	if (!xhci->port_caps)
+		return -ENOMEM;
+
 	offset = cap_start;
 
 	while (offset) {
diff --git a/drivers/usb/host/xhci.h b/drivers/usb/host/xhci.h
index 13d8838cd552..3ecee10fdcdc 100644
--- a/drivers/usb/host/xhci.h
+++ b/drivers/usb/host/xhci.h
@@ -1702,12 +1702,20 @@  struct xhci_bus_state {
  * Intel Lynx Point LP xHCI host.
  */
 #define	XHCI_MAX_REXIT_TIMEOUT_MS	20
+struct xhci_port_cap {
+	u32			*psi;	/* array of protocol speed ID entries */
+	u8			psi_count;
+	u8			psi_uid_count;
+	u8			maj_rev;
+	u8			min_rev;
+};
 
 struct xhci_port {
 	__le32 __iomem		*addr;
 	int			hw_portnum;
 	int			hcd_portnum;
 	struct xhci_hub		*rhub;
+	struct xhci_port_cap	*port_cap;
 };
 
 struct xhci_hub {
@@ -1719,9 +1727,6 @@  struct xhci_hub {
 	/* supported prococol extended capabiliy values */
 	u8			maj_rev;
 	u8			min_rev;
-	u32			*psi;	/* array of protocol speed ID entries */
-	u8			psi_count;
-	u8			psi_uid_count;
 };
 
 /* There is one xhci_hcd structure per controller */
@@ -1880,6 +1885,9 @@  struct xhci_hcd {
 	/* cached usb2 extened protocol capabilites */
 	u32                     *ext_caps;
 	unsigned int            num_ext_caps;
+	/* cached extended protocol port capabilities */
+	struct xhci_port_cap	*port_caps;
+	unsigned int		num_port_caps;
 	/* Compliance Mode Recovery Data */
 	struct timer_list	comp_mode_recovery_timer;
 	u32			port_status_u0;