[4.19.y,v2,0/9] Fix scheduling while atomic in dwc3_gadget_ep_dequeue

Message ID 20190628182413.33225-1-john.stultz@linaro.org

Message

John Stultz June 28, 2019, 6:24 p.m. UTC
With recent changes in AOSP, adb uses asynchronous I/O, which
causes the following crash, usually on reboot:

[  184.278302] BUG: scheduling while atomic: ksoftirqd/0/9/0x00000104
[  184.284617] Modules linked in: wl18xx wlcore snd_soc_hdmi_codec wlcore_sdio tcpci_rt1711h tcpci tcpm typec adv7511 cec dwc3 phy_hi3660_usb3 snd_soc_simple_card snd_soc_a
[  184.316034] Preemption disabled at:
[  184.316072] [<ffffff8008081de4>] __do_softirq+0x64/0x398
[  184.324953] CPU: 0 PID: 9 Comm: ksoftirqd/0 Tainted: G S                4.19.43-00669-g8e4970572c43-dirty #356
[  184.334963] Hardware name: HiKey960 (DT)
[  184.338892] Call trace:
[  184.341352]  dump_backtrace+0x0/0x158
[  184.345025]  show_stack+0x14/0x20
[  184.348355]  dump_stack+0x80/0xa4
[  184.351685]  __schedule_bug+0x6c/0xc0
[  184.355363]  __schedule+0x64c/0x978
[  184.358863]  schedule+0x2c/0x90
[  184.362053]  dwc3_gadget_ep_dequeue+0x274/0x388 [dwc3]
[  184.367210]  usb_ep_dequeue+0x24/0xf8
[  184.370884]  ffs_aio_cancel+0x3c/0x80
[  184.374561]  free_ioctx_users+0x40/0x148
[  184.378500]  percpu_ref_switch_to_atomic_rcu+0x180/0x1c0
[  184.383830]  rcu_process_callbacks+0x24c/0x5d8
[  184.388283]  __do_softirq+0x13c/0x398
[  184.391959]  run_ksoftirqd+0x3c/0x48
[  184.395549]  smpboot_thread_fn+0x220/0x288
[  184.399660]  kthread+0x12c/0x130
[  184.402901]  ret_from_fork+0x10/0x1c


This happens because usb_ep_dequeue() can be called in atomic
context (here, from an RCU callback running in softirq context),
and dwc3_gadget_ep_dequeue() then calls wait_event_lock_irq(),
which can sleep.

Upstream kernels are not affected because of commit
fec9095bdef4 ("usb: dwc3: gadget: remove wait_end_transfer"), which
removes the wait_event_lock_irq() call. Unfortunately, that change
has a number of dependencies, which I'm submitting here.

Also, to match upstream, this series reverts one change that was
backported to -stable and replaces it with the cherry-picked
upstream commit (now that its dependencies are present).

This issue also affects 4.14, 4.9 and, I believe, 4.4 kernels;
however, I don't know how best to backport this functionality
that far back. Help from the maintainers would be very much
appreciated!


New in v2:
* Reordered the patchset to put the revert patch first, which
  avoids any bisection build issues. (Thanks to Jack Pham for
  the suggestion!)


Feedback and comments would be welcome!

thanks
-john

Cc: Fei Yang <fei.yang@intel.com>
Cc: Sam Protsenko <semen.protsenko@linaro.org>
Cc: Felipe Balbi <balbi@kernel.org>
Cc: Jack Pham <jackp@codeaurora.org>
Cc: linux-usb@vger.kernel.org
Cc: stable@vger.kernel.org # 4.19.y


Felipe Balbi (7):
  usb: dwc3: gadget: combine unaligned and zero flags
  usb: dwc3: gadget: track number of TRBs per request
  usb: dwc3: gadget: use num_trbs when skipping TRBs on ->dequeue()
  usb: dwc3: gadget: extract dwc3_gadget_ep_skip_trbs()
  usb: dwc3: gadget: introduce cancelled_list
  usb: dwc3: gadget: move requests to cancelled_list
  usb: dwc3: gadget: remove wait_end_transfer

Jack Pham (1):
  usb: dwc3: gadget: Clear req->needs_extra_trb flag on cleanup

John Stultz (1):
  Revert "usb: dwc3: gadget: Clear req->needs_extra_trb flag on cleanup"

 drivers/usb/dwc3/core.h   |  15 ++--
 drivers/usb/dwc3/gadget.c | 158 +++++++++++++-------------------------
 drivers/usb/dwc3/gadget.h |  15 ++++
 3 files changed, 75 insertions(+), 113 deletions(-)

Comments

Sasha Levin June 28, 2019, 10:58 p.m. UTC
On Fri, Jun 28, 2019 at 06:24:04PM +0000, John Stultz wrote:
>[...]

I've queued it up for 4.19.

Is it the case that for older kernels the dependency list is too long?

--
Thanks,
Sasha

John Stultz June 28, 2019, 11:03 p.m. UTC
On Fri, Jun 28, 2019 at 3:58 PM Sasha Levin <sashal@kernel.org> wrote:
>
> On Fri, Jun 28, 2019 at 06:24:04PM +0000, John Stultz wrote:
> >[...]
>
> I've queued it up for 4.19.
>
> Is it the case that for older kernels the dependency list is too long?

Yeah. It gets ugly, and I'm not enough of an expert on the driver to
feel comfortable knowing if I'm doing the right thing reworking this
stack onto an even older tree.

But I do see crashes on reboot with 4.14 and 4.9 (and I suspect 4.4 as
well), so I'll need to figure out something eventually.

thanks
-john

Thinh Nguyen July 1, 2019, 11:36 p.m. UTC
Hi,

John Stultz wrote:
> On Fri, Jun 28, 2019 at 3:58 PM Sasha Levin <sashal@kernel.org> wrote:
>> On Fri, Jun 28, 2019 at 06:24:04PM +0000, John Stultz wrote:
>>> [...]
>> I've queued it up for 4.19.
>>
>> Is it the case that for older kernels the dependency list is too long?
> Yeah. It gets ugly, and I'm not enough of an expert on the driver to
> feel comfortable knowing if I'm doing the right thing reworking this
> stack onto an even older tree.
>
> But I do see crashes on reboot with 4.14 and 4.9 (and I suspect 4.4 as
> well), so I'll need to figure out something eventually.
>

If you're backporting this series, you also need to apply these
follow-up fixes to it:

This fixes a race condition:
c5353b225df9 ("usb: dwc3: gadget: don't enable interrupt when disabling endpoint")

This fixes an incorrect TRB skip:
c7152763f02e ("usb: dwc3: Reset num_trbs after skipping")

BR,
Thinh