Message ID | 20200108151730.21022-1-mathias.nyman@linux.intel.com (mailing list archive) |
---|---|
State | Superseded |
Series | [RFT] xhci: Fix memory leak when caching protocol extended capability PSI tables |
On Wed, Jan 08, 2020 at 05:17:30PM +0200, Mathias Nyman wrote:
> xhci driver assumed that xHC controllers have at most one custom
> supported speed table (PSI) for all usb 3.x ports.
> Memory was allocated for one PSI table under the xhci hub structure.
>
> Turns out this is not the case, some controllers have a separate
> "supported protocol capability" entry with a PSI table for each port.
> This means each usb3 port can in theory support different custom speeds.

Is there a "max" number of port capabilities that can happen?  Or is
this truly dynamic?

> +	for (i = 0; i < xhci->num_port_caps; i++) {
> +		kfree(xhci->port_caps[i].psi);
> +		xhci->port_caps[i].psi = NULL;
> +	}

Nit, no need to set to NULL here :)

thanks,

greg k-h

On 8.1.2020 17.40, Greg KH wrote:
> On Wed, Jan 08, 2020 at 05:17:30PM +0200, Mathias Nyman wrote:
>> xhci driver assumed that xHC controllers have at most one custom
>> supported speed table (PSI) for all usb 3.x ports.
>> Memory was allocated for one PSI table under the xhci hub structure.
>>
>> Turns out this is not the case, some controllers have a separate
>> "supported protocol capability" entry with a PSI table for each port.
>> This means each usb3 port can in theory support different custom speeds.
>
> Is there a "max" number of port capabilities that can happen?  Or is
> this truly dynamic?

Almost truly dynamic: each capability points to the next, and the last
points to 0.

But we can't have more "supported protocol capabilities" than xHC ports
(the MaxPorts value in the xHC HCSPARAMS1 register).

>
>> +	for (i = 0; i < xhci->num_port_caps; i++) {
>> +		kfree(xhci->port_caps[i].psi);
>> +		xhci->port_caps[i].psi = NULL;
>> +	}
>
> Nit, no need to set to NULL here :)

Thanks, will remove that

-Mathias

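For readers following along, the "each capability points to the next, last
points to 0" layout can be sketched in plain userspace C. This is only an
illustration under stated assumptions, not the driver's code: the register
space is simulated as an array of dwords, count_protocol_caps() is a
hypothetical helper, and the macros are local stand-ins that assume the
capability ID sits in bits 7:0 and the next pointer (in dwords, relative to
the current capability) in bits 15:8, with ID 2 for "supported protocol".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins (assumed layout), not the kernel's own macros */
#define EXT_CAP_ID(dw)		((dw) & 0xff)		/* bits 7:0 */
#define EXT_CAP_NEXT(dw)	(((dw) >> 8) & 0xff)	/* bits 15:8, in dwords */
#define EXT_CAP_PROTOCOL	2	/* "supported protocol" capability */

/*
 * Hypothetical helper: walk a simulated extended capability list and
 * count the supported protocol entries, never reporting more than
 * max_ports of them (the bound mentioned in the reply above).
 */
static unsigned int count_protocol_caps(const uint32_t *regs, size_t len,
					size_t start, unsigned int max_ports)
{
	size_t offset = start;
	unsigned int count = 0;

	assert(regs != NULL);

	while (offset < len) {
		uint32_t dw = regs[offset];

		if (EXT_CAP_ID(dw) == EXT_CAP_PROTOCOL) {
			if (count == max_ports)
				break;	/* more caps than ports: stop */
			count++;
		}
		if (EXT_CAP_NEXT(dw) == 0)
			break;		/* last capability in the list */
		offset += EXT_CAP_NEXT(dw);
	}
	return count;
}
```

A caller would size its cache with the returned count, which is why the
array length can be bounded by the port count even though the list itself
is dynamic.
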
Hi

On 08.01.2020 16:17, Mathias Nyman wrote:
> xhci driver assumed that xHC controllers have at most one custom
> supported speed table (PSI) for all usb 3.x ports.
> Memory was allocated for one PSI table under the xhci hub structure.
>
> Turns out this is not the case, some controllers have a separate
> "supported protocol capability" entry with a PSI table for each port.
> This means each usb3 port can in theory support different custom speeds.
>
> To solve this cache all supported protocol capabilities with their PSI
> tables in an array, and add pointers to the xhci port structure so that
> every port points to its capability entry in the array.
>
> When creating the SuperSpeedPlus USB Device Capability BOS descriptor
> for the xhci USB 3.1 roothub we for now will use only data from the
> first USB 3.1 capable protocol capability entry in the array.
> This could be improved later, this patch focuses resolving
> the memory leak.
>
> Reported-by: Paul Menzel <pmenzel@molgen.mpg.de>
> Reported-by: Sajja Venkateswara Rao <VenkateswaraRao.Sajja@amd.com>
> Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>

This patch landed in today's linux-next (20200211) and causes NULL
pointer dereference during second suspend/resume cycle on Samsung
Exynos5422-based (arm 32bit) Odroid XU3lite board:

# time rtcwake -s10 -mmem
rtcwake: wakeup from "mem" using /dev/rtc0 at Tue Feb 11 10:51:43 2020
PM: suspend entry (deep)
Filesystems sync: 0.012 seconds
Freezing user space processes ... (elapsed 0.010 seconds) done.
OOM killer disabled.
Freezing remaining freezable tasks ... (elapsed 0.002 seconds) done.
smsc95xx 1-1.1:1.0 eth0: entering SUSPEND2 mode
wake enabled for irq 153
wake enabled for irq 158
samsung-pinctrl 13400000.pinctrl: Setting external wakeup interrupt mask: 0xffffffe7
Disabling non-boot CPUs ...
IRQ 51: no longer affine to CPU1
IRQ 52: no longer affine to CPU2
s3c2410-wdt 101d0000.watchdog: watchdog disabled
wake disabled for irq 158
usb usb1: root hub lost power or was reset
usb usb2: root hub lost power or was reset
wake disabled for irq 153
exynos-tmu 10060000.tmu: More trip points than supported by this TMU.
exynos-tmu 10060000.tmu: 2 trip points should be configured in polling mode.
exynos-tmu 10064000.tmu: More trip points than supported by this TMU.
exynos-tmu 10064000.tmu: 2 trip points should be configured in polling mode.
exynos-tmu 10068000.tmu: More trip points than supported by this TMU.
exynos-tmu 10068000.tmu: 2 trip points should be configured in polling mode.
exynos-tmu 1006c000.tmu: More trip points than supported by this TMU.
exynos-tmu 1006c000.tmu: 2 trip points should be configured in polling mode.
exynos-tmu 100a0000.tmu: More trip points than supported by this TMU.
exynos-tmu 100a0000.tmu: 6 trip points should be configured in polling mode.
usb usb3: root hub lost power or was reset
s3c-rtc 101e0000.rtc: rtc disabled, re-enabling
usb usb4: root hub lost power or was reset
xhci-hcd xhci-hcd.8.auto: No ports on the roothubs?
PM: dpm_run_callback(): platform_pm_resume+0x0/0x44 returns -12
PM: Device xhci-hcd.8.auto failed to resume async: error -12
hub 3-0:1.0: hub_ext_port_status failed (err = -32)
hub 4-0:1.0: hub_ext_port_status failed (err = -32)
usb 1-1: reset high-speed USB device number 2 using exynos-ehci
usb 1-1.1: reset high-speed USB device number 3 using exynos-ehci
OOM killer enabled.
Restarting tasks ... done.

real    0m11.890s
user    0m0.001s
sys     0m0.679s
root@target:~# PM: suspend exit
mmc_host mmc0: Bus speed (slot 0) = 50000000Hz (slot req 400000Hz, actual 396825HZ div = 63)
mmc_host mmc0: Bus speed (slot 0) = 200000000Hz (slot req 200000000Hz, actual 200000000HZ div = 0)
mmc_host mmc0: Bus speed (slot 0) = 50000000Hz (slot req 52000000Hz, actual 50000000HZ div = 0)
mmc_host mmc0: Bus speed (slot 0) = 400000000Hz (slot req 200000000Hz, actual 200000000HZ div = 1)
smsc95xx 1-1.1:1.0 eth0: link up, 100Mbps, full-duplex, lpa 0xC1E1

root@target:~#
root@target:~# time rtcwake -s10 -mmem[ 35.451572] vdd_ldo12: disabling

rtcwake: wakeup from "mem" using /dev/rtc0 at Tue Feb 11 10:52:02 2020
PM: suspend entry (deep)
Filesystems sync: 0.004 seconds
Freezing user space processes ... (elapsed 0.006 seconds) done.
OOM killer disabled.
Freezing remaining freezable tasks ... (elapsed 0.070 seconds) done.
hub 4-0:1.0: hub_ext_port_status failed (err = -32)
hub 3-0:1.0: hub_ext_port_status failed (err = -32)
8<--- cut here ---
Unable to handle kernel NULL pointer dereference at virtual address 00000014
pgd = 4c26b54b
[00000014] *pgd=00000000
Internal error: Oops: 17 [#1] PREEMPT SMP ARM
Modules linked in:
CPU: 3 PID: 1468 Comm: kworker/u16:23 Not tainted 5.6.0-rc1-next-20200211 #268
Hardware name: Samsung Exynos (Flattened Device Tree)
Workqueue: events_unbound async_run_entry_fn
PC is at xhci_suspend+0x12c/0x520
LR is at 0xa6aa9898
pc : [<c0724c90>]    lr : [<a6aa9898>]    psr: 60000093
sp : ec401df8  ip : 0000001a  fp : c12e7864
r10: 00000000  r9 : ecfb87b0  r8 : ecfb8220
r7 : 00000000  r6 : 00000000  r5 : 00000004  r4 : ecfb81f0
r3 : 00007d00  r2 : 00000001  r1 : 00000001  r0 : 00000000
Flags: nZCv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment none
Control: 10c5387d  Table: 6bd4006a  DAC: 00000051
Process kworker/u16:23 (pid: 1468, stack limit = 0x6e4b6fba)
Stack: (0xec401df8 to 0xec402000)
...
[<c0724c90>] (xhci_suspend) from [<c061b4f4>] (dpm_run_callback+0xb4/0x3fc)
[<c061b4f4>] (dpm_run_callback) from [<c061bd5c>] (__device_suspend+0x134/0x7e8)
[<c061bd5c>] (__device_suspend) from [<c061c42c>] (async_suspend+0x1c/0x94)
[<c061c42c>] (async_suspend) from [<c0154bd0>] (async_run_entry_fn+0x48/0x1b8)
[<c0154bd0>] (async_run_entry_fn) from [<c0149b38>] (process_one_work+0x230/0x7bc)
[<c0149b38>] (process_one_work) from [<c014a108>] (worker_thread+0x44/0x524)
[<c014a108>] (worker_thread) from [<c01511fc>] (kthread+0x130/0x164)
[<c01511fc>] (kthread) from [<c01010b4>] (ret_from_fork+0x14/0x20)
Exception stack(0xec401fb0 to 0xec401ff8)
...
---[ end trace c72caf6487666442 ]---
note: kworker/u16:23[1468] exited with preempt_count 1

Reverting it fixes the NULL pointer issue. I can provide more
information or do some other tests. Just let me know what will help to
fix it.

> ...

Best regards

On Tue, Feb 11, 2020 at 11:56:12AM +0100, Marek Szyprowski wrote:
> This patch landed in today's linux-next (20200211) and causes NULL
> pointer dereference during second suspend/resume cycle on Samsung
> Exynos5422-based (arm 32bit) Odroid XU3lite board:
>
> [...]
>
> Reverting it fixes the NULL pointer issue. I can provide more
> information or do some other tests. Just let me know what will help to
> fix it.

Ugh.  Mathias, should I just revert this for now?

thanks,

greg k-h

On 11.2.2020 14.23, Greg KH wrote:
> On Tue, Feb 11, 2020 at 11:56:12AM +0100, Marek Szyprowski wrote:
>> This patch landed in today's linux-next (20200211) and causes NULL
>> pointer dereference during second suspend/resume cycle on Samsung
>> Exynos5422-based (arm 32bit) Odroid XU3lite board:
>>
>> [...]
>>
>> Reverting it fixes the NULL pointer issue. I can provide more
>> information or do some other tests. Just let me know what will help to
>> fix it.
>
> Ugh.  Mathias, should I just revert this for now?
>

Yes, revert it.

This looks very odd, after second resume, and losing power driver
can't find any port at all.

Marek, do you still get the "xhci-hcd xhci-hcd.8.auto: No ports on the roothubs?"
message on second resume after reverting the patch?

-Mathias

On 11.2.2020 14.29, Mathias Nyman wrote:
> On 11.2.2020 14.23, Greg KH wrote:
>> Ugh.  Mathias, should I just revert this for now?
>>
>
> Yes, revert it.
>
> This looks very odd, after second resume, and losing power driver
> can't find any port at all.
>
> Marek, do you still get the "xhci-hcd xhci-hcd.8.auto: No ports on the roothubs?"
> message on second resume after reverting the patch?
>

Ok, I think I got it.

Patch doesn't set xhci->num_port_caps to 0 in xhci_mem_cleanup().
Adding new ports will fail when we reinitialize xhci manually, like in this
exynos case where xhci loses power in suspend/resume cycle.

I'll post a new version soon

-Mathias

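The failure mode described in this reply (a stale counter surviving cleanup,
so the bounds check rejects every entry after reinitialization) can be
reproduced in a small userspace sketch. Everything here is hypothetical and
heavily simplified: struct host, struct port_cap and the host_*() helpers are
stand-ins invented for illustration, and the reset_count flag exists only to
contrast the two behaviours. It is one reading of the thread, not the actual
follow-up patch.

```c
#include <assert.h>
#include <stdlib.h>

/* Simplified stand-ins for the driver structures (illustration only) */
struct port_cap {
	unsigned int psi_count;
};

struct host {
	struct port_cap *port_caps;
	unsigned int num_port_caps;
};

/* Allocate a fresh capability array; note it does not touch the counter */
static int host_setup(struct host *h, unsigned int max_caps)
{
	assert(h != NULL);
	h->port_caps = calloc(max_caps, sizeof(*h->port_caps));
	return h->port_caps ? 0 : -1;
}

/* Claim the next slot, refusing once the counter reaches max_caps */
static int host_add_cap(struct host *h, unsigned int max_caps)
{
	if (h->num_port_caps >= max_caps)
		return -1;	/* no room: the port never gets added */
	h->port_caps[h->num_port_caps++].psi_count = 0;
	return 0;
}

/* Free the array; only reset the counter when reset_count is non-zero */
static void host_cleanup(struct host *h, int reset_count)
{
	free(h->port_caps);
	h->port_caps = NULL;
	if (reset_count)
		h->num_port_caps = 0;
}
```

With reset_count == 0 the second host_setup() succeeds but every
host_add_cap() fails, which would match a controller that reports no ports
after being reinitialized on resume.
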
diff --git a/drivers/usb/host/xhci-hub.c b/drivers/usb/host/xhci-hub.c
index 7a3a29e5e9d2..0974eebd28e7 100644
--- a/drivers/usb/host/xhci-hub.c
+++ b/drivers/usb/host/xhci-hub.c
@@ -55,6 +55,7 @@ static u8 usb_bos_descriptor [] = {
 static int xhci_create_usb3_bos_desc(struct xhci_hcd *xhci, char *buf,
 				     u16 wLength)
 {
+	struct xhci_port_cap *port_cap;
 	int i, ssa_count;
 	u32 temp;
 	u16 desc_size, ssp_cap_size, ssa_size = 0;
@@ -64,16 +65,24 @@ static int xhci_create_usb3_bos_desc(struct xhci_hcd *xhci, char *buf,
 	ssp_cap_size = sizeof(usb_bos_descriptor) - desc_size;
 
 	/* does xhci support USB 3.1 Enhanced SuperSpeed */
-	if (xhci->usb3_rhub.min_rev >= 0x01) {
+	for (i = 0; i < xhci->num_port_caps; i++) {
+		if (xhci->port_caps[i].maj_rev == 0x03 &&
+		    xhci->port_caps[i].min_rev >= 0x01) {
+			usb3_1 = true;
+			port_cap = &xhci->port_caps[i];
+			break;
+		}
+	}
+
+	if (usb3_1) {
 		/* does xhci provide a PSI table for SSA speed attributes? */
-		if (xhci->usb3_rhub.psi_count) {
+		if (port_cap->psi_count) {
 			/* two SSA entries for each unique PSI ID, RX and TX */
-			ssa_count = xhci->usb3_rhub.psi_uid_count * 2;
+			ssa_count = port_cap->psi_uid_count * 2;
 			ssa_size = ssa_count * sizeof(u32);
 			ssp_cap_size -= 16; /* skip copying the default SSA */
 		}
 		desc_size += ssp_cap_size;
-		usb3_1 = true;
 	}
 	memcpy(buf, &usb_bos_descriptor, min(desc_size, wLength));
 
@@ -99,7 +108,7 @@ static int xhci_create_usb3_bos_desc(struct xhci_hcd *xhci, char *buf,
 	}
 
 	/* If PSI table exists, add the custom speed attributes from it */
-	if (usb3_1 && xhci->usb3_rhub.psi_count) {
+	if (usb3_1 && port_cap->psi_count) {
 		u32 ssp_cap_base, bm_attrib, psi, psi_mant, psi_exp;
 		int offset;
 
@@ -111,7 +120,7 @@ static int xhci_create_usb3_bos_desc(struct xhci_hcd *xhci, char *buf,
 
 		/* attribute count SSAC bits 4:0 and ID count SSIC bits 8:5 */
 		bm_attrib = (ssa_count - 1) & 0x1f;
-		bm_attrib |= (xhci->usb3_rhub.psi_uid_count - 1) << 5;
+		bm_attrib |= (port_cap->psi_uid_count - 1) << 5;
 		put_unaligned_le32(bm_attrib, &buf[ssp_cap_base + 4]);
 
 		if (wLength < desc_size + ssa_size)
@@ -124,8 +133,8 @@ static int xhci_create_usb3_bos_desc(struct xhci_hcd *xhci, char *buf,
 		 * USB 3.1 requires two SSA entries (RX and TX) for every link
 		 */
 		offset = desc_size;
-		for (i = 0; i < xhci->usb3_rhub.psi_count; i++) {
-			psi = xhci->usb3_rhub.psi[i];
+		for (i = 0; i < port_cap->psi_count; i++) {
+			psi = port_cap->psi[i];
 			psi &= ~USB_SSP_SUBLINK_SPEED_RSVD;
 			psi_exp = XHCI_EXT_PORT_PSIE(psi);
 			psi_mant = XHCI_EXT_PORT_PSIM(psi);
diff --git a/drivers/usb/host/xhci-mem.c b/drivers/usb/host/xhci-mem.c
index 3b1388fa2f36..cf4d27774a7d 100644
--- a/drivers/usb/host/xhci-mem.c
+++ b/drivers/usb/host/xhci-mem.c
@@ -1909,17 +1909,18 @@ void xhci_mem_cleanup(struct xhci_hcd *xhci)
 	xhci->usb3_rhub.num_ports = 0;
 	xhci->num_active_eps = 0;
 	kfree(xhci->usb2_rhub.ports);
-	kfree(xhci->usb2_rhub.psi);
 	kfree(xhci->usb3_rhub.ports);
-	kfree(xhci->usb3_rhub.psi);
 	kfree(xhci->hw_ports);
 	kfree(xhci->rh_bw);
 	kfree(xhci->ext_caps);
+	for (i = 0; i < xhci->num_port_caps; i++) {
+		kfree(xhci->port_caps[i].psi);
+		xhci->port_caps[i].psi = NULL;
+	}
+	kfree(xhci->port_caps);
 
 	xhci->usb2_rhub.ports = NULL;
-	xhci->usb2_rhub.psi = NULL;
 	xhci->usb3_rhub.ports = NULL;
-	xhci->usb3_rhub.psi = NULL;
 	xhci->hw_ports = NULL;
 	xhci->rh_bw = NULL;
 	xhci->ext_caps = NULL;
@@ -2120,6 +2121,7 @@ static void xhci_add_in_port(struct xhci_hcd *xhci, unsigned int num_ports,
 	u8 major_revision, minor_revision;
 	struct xhci_hub *rhub;
 	struct device *dev = xhci_to_hcd(xhci)->self.sysdev;
+	struct xhci_port_cap *port_cap;
 
 	temp = readl(addr);
 	major_revision = XHCI_EXT_PORT_MAJOR(temp);
@@ -2154,31 +2156,39 @@ static void xhci_add_in_port(struct xhci_hcd *xhci, unsigned int num_ports,
 		/* WTF? "Valid values are ‘1’ to MaxPorts" */
 		return;
 
-	rhub->psi_count = XHCI_EXT_PORT_PSIC(temp);
-	if (rhub->psi_count) {
-		rhub->psi = kcalloc_node(rhub->psi_count, sizeof(*rhub->psi),
-				    GFP_KERNEL, dev_to_node(dev));
-		if (!rhub->psi)
-			rhub->psi_count = 0;
+	port_cap = &xhci->port_caps[xhci->num_port_caps++];
+	if (xhci->num_port_caps > max_caps)
+		return;
+
+	port_cap->maj_rev = major_revision;
+	port_cap->min_rev = minor_revision;
+	port_cap->psi_count = XHCI_EXT_PORT_PSIC(temp);
+
+	if (port_cap->psi_count) {
+		port_cap->psi = kcalloc_node(port_cap->psi_count,
+					     sizeof(*port_cap->psi),
+					     GFP_KERNEL, dev_to_node(dev));
+		if (!port_cap->psi)
+			port_cap->psi_count = 0;
 
-		rhub->psi_uid_count++;
-		for (i = 0; i < rhub->psi_count; i++) {
-			rhub->psi[i] = readl(addr + 4 + i);
+		port_cap->psi_uid_count++;
+		for (i = 0; i < port_cap->psi_count; i++) {
+			port_cap->psi[i] = readl(addr + 4 + i);
 
 			/* count unique ID values, two consecutive entries can
 			 * have the same ID if link is assymetric
 			 */
-			if (i && (XHCI_EXT_PORT_PSIV(rhub->psi[i]) !=
-				  XHCI_EXT_PORT_PSIV(rhub->psi[i - 1])))
-				rhub->psi_uid_count++;
+			if (i && (XHCI_EXT_PORT_PSIV(port_cap->psi[i]) !=
+				  XHCI_EXT_PORT_PSIV(port_cap->psi[i - 1])))
+				port_cap->psi_uid_count++;
 
 			xhci_dbg(xhci, "PSIV:%d PSIE:%d PLT:%d PFD:%d LP:%d PSIM:%d\n",
-				  XHCI_EXT_PORT_PSIV(rhub->psi[i]),
-				  XHCI_EXT_PORT_PSIE(rhub->psi[i]),
-				  XHCI_EXT_PORT_PLT(rhub->psi[i]),
-				  XHCI_EXT_PORT_PFD(rhub->psi[i]),
-				  XHCI_EXT_PORT_LP(rhub->psi[i]),
-				  XHCI_EXT_PORT_PSIM(rhub->psi[i]));
+				  XHCI_EXT_PORT_PSIV(port_cap->psi[i]),
+				  XHCI_EXT_PORT_PSIE(port_cap->psi[i]),
+				  XHCI_EXT_PORT_PLT(port_cap->psi[i]),
+				  XHCI_EXT_PORT_PFD(port_cap->psi[i]),
+				  XHCI_EXT_PORT_LP(port_cap->psi[i]),
+				  XHCI_EXT_PORT_PSIM(port_cap->psi[i]));
 		}
 	}
 	/* cache usb2 port capabilities */
@@ -2213,6 +2223,7 @@ static void xhci_add_in_port(struct xhci_hcd *xhci, unsigned int num_ports,
 			continue;
 		}
 		hw_port->rhub = rhub;
+		hw_port->port_cap = port_cap;
 		rhub->num_ports++;
 	}
 	/* FIXME: Should we disable ports not in the Extended Capabilities? */
@@ -2303,6 +2314,11 @@ static int xhci_setup_port_arrays(struct xhci_hcd *xhci, gfp_t flags)
 	if (!xhci->ext_caps)
 		return -ENOMEM;
 
+	xhci->port_caps = kcalloc_node(cap_count, sizeof(*xhci->port_caps),
+				flags, dev_to_node(dev));
+	if (!xhci->port_caps)
+		return -ENOMEM;
+
 	offset = cap_start;
 
 	while (offset) {
diff --git a/drivers/usb/host/xhci.h b/drivers/usb/host/xhci.h
index 13d8838cd552..3ecee10fdcdc 100644
--- a/drivers/usb/host/xhci.h
+++ b/drivers/usb/host/xhci.h
@@ -1702,12 +1702,20 @@ struct xhci_bus_state {
  * Intel Lynx Point LP xHCI host.
  */
 #define	XHCI_MAX_REXIT_TIMEOUT_MS	20
+struct xhci_port_cap {
+	u32			*psi;	/* array of protocol speed ID entries */
+	u8			psi_count;
+	u8			psi_uid_count;
+	u8			maj_rev;
+	u8			min_rev;
+};
 
 struct xhci_port {
 	__le32 __iomem		*addr;
 	int			hw_portnum;
 	int			hcd_portnum;
 	struct xhci_hub		*rhub;
+	struct xhci_port_cap	*port_cap;
 };
 
 struct xhci_hub {
@@ -1719,9 +1727,6 @@ struct xhci_hub {
 	/* supported prococol extended capabiliy values */
 	u8			maj_rev;
 	u8			min_rev;
-	u32			*psi;	/* array of protocol speed ID entries */
-	u8			psi_count;
-	u8			psi_uid_count;
 };
 
 /* There is one xhci_hcd structure per controller */
@@ -1880,6 +1885,9 @@ struct xhci_hcd {
 	/* cached usb2 extened protocol capabilites */
 	u32			*ext_caps;
 	unsigned int		num_ext_caps;
+	/* cached extended protocol port capabilities */
+	struct xhci_port_cap	*port_caps;
+	unsigned int		num_port_caps;
 	/* Compliance Mode Recovery Data */
 	struct timer_list	comp_mode_recovery_timer;
 	u32			port_status_u0;
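
The unique-ID counting in xhci_add_in_port() and the bm_attrib packing in xhci_create_usb3_bos_desc() can be sketched in isolation as plain userspace C. This is only an illustration, not the driver code: the PSIV() macro (assumed here to take bits 3:0 of a PSI dword) and the helpers count_unique_psiv() and pack_bm_attrib() are hypothetical names, and unlike the patch the counter starts at zero for an empty table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: the Protocol Speed ID Value sits in bits 3:0 of a PSI dword. */
#define PSIV(psi)	((psi) & 0x0fu)

/* Count distinct IDs; equal IDs are only expected on adjacent entries. */
static unsigned int count_unique_psiv(const uint32_t *psi, size_t count)
{
	unsigned int uid_count = 0;

	for (size_t i = 0; i < count; i++)
		if (i == 0 || PSIV(psi[i]) != PSIV(psi[i - 1]))
			uid_count++;
	return uid_count;
}

/* Pack SSAC (bits 4:0) and SSIC (bits 8:5); both fields store count - 1. */
static uint32_t pack_bm_attrib(unsigned int uid_count)
{
	unsigned int ssa_count = uid_count * 2;	/* one RX and one TX entry per ID */

	assert(uid_count > 0);
	return ((ssa_count - 1) & 0x1f) | ((uint32_t)(uid_count - 1) << 5);
}
```

For a table whose entries carry the IDs 4, 4, 5 (the first two forming an asymmetric link), the sketch counts two unique IDs, which gives four SSA entries and a packed value of 0x23.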
The xhci driver assumed that xHC controllers have at most one custom
supported speed table (PSI) for all usb 3.x ports.
Memory was allocated for one PSI table under the xhci hub structure.

Turns out this is not the case: some controllers have a separate
"supported protocol capability" entry with a PSI table for each port.
This means each usb3 port can in theory support different custom speeds.

To solve this, cache all supported protocol capabilities with their PSI
tables in an array, and add pointers to the xhci port structure so that
every port points to its capability entry in the array.

When creating the SuperSpeedPlus USB Device Capability BOS descriptor
for the xhci USB 3.1 roothub we will for now use only data from the
first USB 3.1 capable protocol capability entry in the array.
This could be improved later; this patch focuses on resolving
the memory leak.

Reported-by: Paul Menzel <pmenzel@molgen.mpg.de>
Reported-by: Sajja Venkateswara Rao <VenkateswaraRao.Sajja@amd.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
---
 drivers/usb/host/xhci-hub.c | 25 +++++++++++-----
 drivers/usb/host/xhci-mem.c | 60 +++++++++++++++++++++++--------------
 drivers/usb/host/xhci.h     | 14 +++++++--
 3 files changed, 66 insertions(+), 33 deletions(-)
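
The caching scheme described above (one array of capability entries, each owning its PSI table, with every port holding a pointer to its entry) might look roughly like the following userspace sketch. All names here (struct port_cap, struct port, struct cap_cache and the cap_cache_*() helpers) are hypothetical stand-ins rather than the driver's API, calloc()/free() stand in for kcalloc_node()/kfree(), and the revision test simply mirrors the comparison used in the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical userspace stand-ins for the structures in the patch. */
struct port_cap {
	uint32_t *psi;		/* per-capability PSI table, may be NULL */
	uint8_t psi_count;
	uint8_t maj_rev;
	uint8_t min_rev;
};

struct port {
	struct port_cap *cap;	/* points into cap_cache.caps[] */
};

struct cap_cache {
	struct port_cap *caps;
	unsigned int num_caps;
	unsigned int max_caps;
};

/* Allocate zeroed room for at most max_caps capability entries. */
static int cap_cache_init(struct cap_cache *c, unsigned int max_caps)
{
	assert(max_caps > 0);
	c->caps = calloc(max_caps, sizeof(*c->caps));
	if (!c->caps)
		return -1;
	c->num_caps = 0;
	c->max_caps = max_caps;
	return 0;
}

/* Claim the next entry; the bound is checked before the index is used. */
static struct port_cap *cap_cache_add(struct cap_cache *c, uint8_t maj,
				      uint8_t min, uint8_t psi_count)
{
	struct port_cap *cap;

	if (c->num_caps >= c->max_caps)
		return NULL;
	cap = &c->caps[c->num_caps];
	cap->maj_rev = maj;
	cap->min_rev = min;
	if (psi_count) {
		cap->psi = calloc(psi_count, sizeof(*cap->psi));
		if (!cap->psi)
			return NULL;
		cap->psi_count = psi_count;
	}
	c->num_caps++;
	return cap;
}

/* Return the first entry matching the patch's USB 3.1 test, or NULL. */
static struct port_cap *cap_cache_first_usb31(struct cap_cache *c)
{
	for (unsigned int i = 0; i < c->num_caps; i++)
		if (c->caps[i].maj_rev == 0x03 && c->caps[i].min_rev >= 0x01)
			return &c->caps[i];
	return NULL;
}

/* Free every PSI table, then the array itself. */
static void cap_cache_cleanup(struct cap_cache *c)
{
	for (unsigned int i = 0; i < c->num_caps; i++)
		free(c->caps[i].psi);
	free(c->caps);
	c->caps = NULL;
	c->num_caps = 0;
}
```

Because each entry owns its own table, cleanup walks the array once and frees every table before freeing the array, so no PSI table is left behind when a controller exposes more than one capability.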