Message ID | 1553059940-127038-1-git-send-email-fei.yang@intel.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | [V2] usb: gadget: f_fs: don't free buffer prematurely | expand |
On Tue, Mar 19, 2019 at 10:32 PM <fei.yang@intel.com> wrote: > > From: Fei Yang <fei.yang@intel.com> > > The following kernel panic happens due to the io_data buffer gets deallocated > before the async io is completed. Add a check for the case where io_data buffer > should be deallocated by ffs_user_copy_worker. It looks like this happened because data got renamed to io_data, which made the `data = NULL` marked with "Do not kfree the buffer in this function" not do what it was hoping. This should probably either delete the assignment above or fix the assignment to refer to io_data? (EIOCBQUEUED presumably can't come from elsewhere?)
On Tue, Mar 19, 2019 at 10:56 PM Josh Gao <jmgao@google.com> wrote: > > On Tue, Mar 19, 2019 at 10:32 PM <fei.yang@intel.com> wrote: > > > > From: Fei Yang <fei.yang@intel.com> > > > > The following kernel panic happens due to the io_data buffer gets deallocated > > before the async io is completed. Add a check for the case where io_data buffer > > should be deallocated by ffs_user_copy_worker. > > It looks like this happened because data got renamed to io_data, which made the > `data = NULL` marked with "Do not kfree the buffer in this function" not do > what it was hoping. This should probably either delete the assignment above or > fix the assignment to refer to io_data? (EIOCBQUEUED presumably can't come from > elsewhere?) (except ffs_free_buffer doesn't check for null, so probably the former)
On Tue, Mar 19, 2019 at 10:32 PM <fei.yang@intel.com> wrote: > > From: Fei Yang <fei.yang@intel.com> > > The following kernel panic happens due to the io_data buffer gets deallocated > before the async io is completed. Add a check for the case where io_data buffer > should be deallocated by ffs_user_copy_worker. > > [ 41.663334] BUG: unable to handle kernel NULL pointer dereference at 0000000000000048 > [ 41.672099] #PF error: [normal kernel read fault] > [ 41.677356] PGD 20c974067 P4D 20c974067 PUD 20c973067 PMD 0 > [ 41.683687] Oops: 0000 [#1] PREEMPT SMP > [ 41.687976] CPU: 1 PID: 7 Comm: kworker/u8:0 Tainted: G U 5.0.0-quilt-2e5dc0ac-00790-gd8c79f2-dirty #2 > [ 41.705309] Workqueue: adb ffs_user_copy_worker > [ 41.705316] RIP: 0010:__vunmap+0x2a/0xc0 > [ 41.705318] Code: 0f 1f 44 00 00 48 85 ff 0f 84 87 00 00 00 55 f7 c7 ff 0f 00 00 48 89 e5 41 55 41 89 f5 41 54 53 48 89 fb 75 71 e8 56 d7 ff ff <4c> 8b 60 48 4d 85 e4 74 76 48 89 df e8 25 ff ff ff 45 85 ed 74 46 > [ 41.705320] RSP: 0018:ffffbc3a40053df0 EFLAGS: 00010286 > [ 41.705322] RAX: 0000000000000000 RBX: ffffbc3a406f1000 RCX: 0000000000000000 > [ 41.705323] RDX: 0000000000000001 RSI: 0000000000000001 RDI: 00000000ffffffff > [ 41.705324] RBP: ffffbc3a40053e08 R08: 000000000001fb79 R09: 0000000000000037 > [ 41.705325] R10: ffffbc3a40053b68 R11: ffffbc3a40053cad R12: fffffffffffffff2 > [ 41.705326] R13: 0000000000000001 R14: 0000000000000000 R15: ffffffffffffffff > [ 41.705328] FS: 0000000000000000(0000) GS:ffff9e2977a80000(0000) knlGS:0000000000000000 > [ 41.705329] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [ 41.705330] CR2: 0000000000000048 CR3: 000000020c994000 CR4: 00000000003406e0 > [ 41.705331] Call Trace: > [ 41.705338] vfree+0x50/0xb0 > [ 41.705341] ffs_user_copy_worker+0xe9/0x1c0 > [ 41.705344] process_one_work+0x19f/0x3e0 > [ 41.705348] worker_thread+0x3f/0x3b0 > [ 41.829766] kthread+0x12b/0x150 > [ 41.833371] ? process_one_work+0x3e0/0x3e0 > [ 41.838045] ? kthread_create_worker_on_cpu+0x70/0x70 > [ 41.843695] ret_from_fork+0x3a/0x50 > [ 41.847689] Modules linked in: hci_uart bluetooth ecdh_generic rfkill_gpio dwc3_pci dwc3 snd_usb_audio mei_me tpm_crb snd_usbmidi_lib xhci_pci xhci_hcd mei tpm snd_hwdep cfg80211 snd_soc_skl snd_soc_skl_ipc snd_soc_sst_ipc snd_soc_sst_dsp snd_hda_ext_core snd_hda_core videobuf2_dma_sg crlmodule > [ 41.876880] CR2: 0000000000000048 > [ 41.880584] ---[ end trace 2bc4addff0f2e673 ]--- > [ 41.891346] RIP: 0010:__vunmap+0x2a/0xc0 > [ 41.895734] Code: 0f 1f 44 00 00 48 85 ff 0f 84 87 00 00 00 55 f7 c7 ff 0f 00 00 48 89 e5 41 55 41 89 f5 41 54 53 48 89 fb 75 71 e8 56 d7 ff ff <4c> 8b 60 48 4d 85 e4 74 76 48 89 df e8 25 ff ff ff 45 85 ed 74 46 > [ 41.916740] RSP: 0018:ffffbc3a40053df0 EFLAGS: 00010286 > [ 41.922583] RAX: 0000000000000000 RBX: ffffbc3a406f1000 RCX: 0000000000000000 > [ 41.930563] RDX: 0000000000000001 RSI: 0000000000000001 RDI: 00000000ffffffff > [ 41.938540] RBP: ffffbc3a40053e08 R08: 000000000001fb79 R09: 0000000000000037 > [ 41.946520] R10: ffffbc3a40053b68 R11: ffffbc3a40053cad R12: fffffffffffffff2 > [ 41.954502] R13: 0000000000000001 R14: 0000000000000000 R15: ffffffffffffffff > [ 41.962482] FS: 0000000000000000(0000) GS:ffff9e2977a80000(0000) knlGS:0000000000000000 > [ 41.971536] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [ 41.977960] CR2: 0000000000000048 CR3: 000000020c994000 CR4: 00000000003406e0 > [ 41.985930] Kernel panic - not syncing: Fatal exception > [ 41.991817] Kernel Offset: 0x16000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) > [ 42.009525] Rebooting in 10 seconds.. > [ 52.014376] ACPI MEMORY or I/O RESET_REG. > > Fixes: 772a7a724f6 ("usb: gadget: f_fs: Allow scatter-gather buffers") > Signed-off-by: Fei Yang <fei.yang@intel.com> > Reviewed-by: Manu Gautam <mgautam@codeaurora.org> > Tested-by: John Stultz <john.stultz@linaro.org> > --- > V2: add tag: "Fixes: 772a7a724f6 ......", Reviewed-by and Tested-by. Hey Fei, So while this patch does resolve the issues I was seeing with mainline kernels and recent changes to adbd, Josh pointed out that it wouldn't resolve the issues I was seeing with older kernels which is slightly different (but still related to aio usage). On the older kernels I'm hitting scheduling while atomic on reboot, which seems to be due to ffs_aio_cancel() taking a spinlock then calling usb_ep_dequeue() which might sleep. It seems a fix for this was tried earlier with d52e4d0c0c428 ("usb: gadget: ffs: Fix BUG when userland exits with submitted AIO transfers") which was then reverted by a9c859033f6e. Elsewhere it seems the ffs driver takes effort to drop any locks before calling usb_ep_dequeue(), so this seems like that should be addressed, but it also seems like recent change to the dwc3 driver has been made to avoid sleeping in that path (see fec9095bdef4 ("usb: dwc3: gadget: remove wait_end_transfer")), which may be why I'm not seeing the problem with mainline (and your patch here, of coarse). But that also doesn't clarify if its still a potential issue w/ non-dwc3 platforms. So for older kernels, do you have a suggestion of which approach is advised? Does usb_ep_dequeue need to avoid sleeping or do we need to rework the ffs_aio_cancel logic? thanks -john
On Wed, 20 Mar 2019, John Stultz wrote: > Hey Fei, > So while this patch does resolve the issues I was seeing with > mainline kernels and recent changes to adbd, Josh pointed out that it > wouldn't resolve the issues I was seeing with older kernels which is > slightly different (but still related to aio usage). > > On the older kernels I'm hitting scheduling while atomic on reboot, > which seems to be due to ffs_aio_cancel() taking a spinlock then > calling usb_ep_dequeue() which might sleep. > > It seems a fix for this was tried earlier with d52e4d0c0c428 ("usb: > gadget: ffs: Fix BUG when userland exits with submitted AIO > transfers") which was then reverted by a9c859033f6e. > > Elsewhere it seems the ffs driver takes effort to drop any locks > before calling usb_ep_dequeue(), so this seems like that should be > addressed, but it also seems like recent change to the dwc3 driver has > been made to avoid sleeping in that path (see fec9095bdef4 ("usb: > dwc3: gadget: remove wait_end_transfer")), which may be why I'm not > seeing the problem with mainline (and your patch here, of coarse). > But that also doesn't clarify if its still a potential issue w/ > non-dwc3 platforms. > > So for older kernels, do you have a suggestion of which approach is > advised? Does usb_ep_dequeue need to avoid sleeping or do we need to > rework the ffs_aio_cancel logic? usb_ep_dequeue can be called in interrupt context, meaning it is never allowed to sleep. This is mentioned in the kerneldoc: /** * usb_ep_dequeue - dequeues (cancels, unlinks) an I/O request from an endpoint * @ep:the endpoint associated with the request * @req:the request being canceled * * If the request is still active on the endpoint, it is dequeued and * eventually its completion routine is called (with status -ECONNRESET); * else a negative error code is returned. This routine is asynchronous, * that is, it may return before the completion routine runs. * * Note that some hardware can't clear out write fifos (to unlink the request * at the head of the queue) except as part of disconnecting from usb. Such * restrictions prevent drivers from supporting configuration changes, * even to configuration zero (a "chapter 9" requirement). * * This routine may be called in interrupt context. */ Alan Stern
On Wed, Mar 20, 2019 at 9:52 AM Alan Stern <stern@rowland.harvard.edu> wrote: > > On Wed, 20 Mar 2019, John Stultz wrote: > > > Hey Fei, > > So while this patch does resolve the issues I was seeing with > > mainline kernels and recent changes to adbd, Josh pointed out that it > > wouldn't resolve the issues I was seeing with older kernels which is > > slightly different (but still related to aio usage). > > > > On the older kernels I'm hitting scheduling while atomic on reboot, > > which seems to be due to ffs_aio_cancel() taking a spinlock then > > calling usb_ep_dequeue() which might sleep. > > > > It seems a fix for this was tried earlier with d52e4d0c0c428 ("usb: > > gadget: ffs: Fix BUG when userland exits with submitted AIO > > transfers") which was then reverted by a9c859033f6e. > > > > Elsewhere it seems the ffs driver takes effort to drop any locks > > before calling usb_ep_dequeue(), so this seems like that should be > > addressed, but it also seems like recent change to the dwc3 driver has > > been made to avoid sleeping in that path (see fec9095bdef4 ("usb: > > dwc3: gadget: remove wait_end_transfer")), which may be why I'm not > > seeing the problem with mainline (and your patch here, of coarse). > > But that also doesn't clarify if its still a potential issue w/ > > non-dwc3 platforms. > > > > So for older kernels, do you have a suggestion of which approach is > > advised? Does usb_ep_dequeue need to avoid sleeping or do we need to > > rework the ffs_aio_cancel logic? > > usb_ep_dequeue can be called in interrupt context, meaning it is never > allowed to sleep. This is mentioned in the kerneldoc: Thanks for the clarification! Sorry I didn't see the kernel doc! -john
> On Tue, Mar 19, 2019 at 10:56 PM Josh Gao <jmgao@google.com> wrote: > > > > On Tue, Mar 19, 2019 at 10:32 PM <fei.yang@intel.com> wrote: > > > > > > From: Fei Yang <fei.yang@intel.com> > > > > > > The following kernel panic happens due to the io_data buffer gets > > > deallocated before the async io is completed. Add a check for the > > > case where io_data buffer should be deallocated by ffs_user_copy_worker. > > > > It looks like this happened because data got renamed to io_data, which > > made the `data = NULL` marked with "Do not kfree the buffer in this > > function" not do what it was hoping. This should probably either > > delete the assignment above or fix the assignment to refer to io_data? > > (EIOCBQUEUED presumably can't come from > > elsewhere?) > > (except ffs_free_buffer doesn't check for null, so probably the former) I don’t see EIOCBQUEUED coming from anywhere else in the context of epfile_io, but you are right it's hard to be 100% sure about that. Maybe I should simply check for (data == NULL) so that the original logic remains intact, If (data) ffs_free_buffer(io_data);
On Wed, Mar 20, 2019 at 9:40 AM John Stultz <john.stultz@linaro.org> wrote: > > On Tue, Mar 19, 2019 at 10:32 PM <fei.yang@intel.com> wrote: > So while this patch does resolve the issues I was seeing with > mainline kernels and recent changes to adbd, Josh pointed out that it > wouldn't resolve the issues I was seeing with older kernels which is > slightly different (but still related to aio usage). > > On the older kernels I'm hitting scheduling while atomic on reboot, > which seems to be due to ffs_aio_cancel() taking a spinlock then > calling usb_ep_dequeue() which might sleep. > > It seems a fix for this was tried earlier with d52e4d0c0c428 ("usb: > gadget: ffs: Fix BUG when userland exits with submitted AIO > transfers") which was then reverted by a9c859033f6e. > > Elsewhere it seems the ffs driver takes effort to drop any locks > before calling usb_ep_dequeue(), so this seems like that should be > addressed, but it also seems like recent change to the dwc3 driver has > been made to avoid sleeping in that path (see fec9095bdef4 ("usb: > dwc3: gadget: remove wait_end_transfer")), which may be why I'm not > seeing the problem with mainline (and your patch here, of coarse). > But that also doesn't clarify if its still a potential issue w/ > non-dwc3 platforms. Felipe: Given Alan's point, does it make sense to mark the commits that remove the possible sleep from wait_event_lock_irq() in dwc3_gadget_ep_dequeue() for -stable? Against 4.19.30, the following set manages to cherry-pick cleanly: git cherry-pick 1a22ec643580626f439c8583edafdcc73798f2fb git cherry-pick 09fe1f8d7e2f461275b1cdd832f2cfa5e9be346d git cherry-pick c3acd59014148470dc58519870fbc779785b4bf7 git cherry-pick 7746a8dfb3f9c91b3a0b63a1d5c2664410e6498d git cherry-pick d5443bbf5fc8f8389cce146b1fc2987cdd229d12 git cherry-pick d4f1afe5e896c18ae01099a85dab5e1a198bd2a8 git cherry-pick fec9095bdef4e7c988adb603d0d4f92ee735d4a1 # To get things building, revert modified -stable fix git revert 25ad17d #pick actual upstream fix replacing the previous git cherry-pick bd6742249b9ca918565e4e3abaa06665e587f4b5 (Though I'm always a bit hesitant with -stable backports on subsystems I don't know well. So I'm not sure if this set is fully correct.) This set seems to avoid the crash on reboot I was seeing. And of course, I'm sure getting that set backported to 4.14 and 4.9 (and maybe even 4.4, I need to check) will be less clean. thanks -john
> Hey Fei, > So while this patch does resolve the issues I was seeing with mainline kernels and recent changes to adbd, > Josh pointed out that it wouldn't resolve the issues I was seeing with older kernels which is slightly different (but still related to aio usage). > > On the older kernels I'm hitting scheduling while atomic on reboot, which seems to be due to ffs_aio_cancel() taking a spinlock then calling usb_ep_dequeue() which might sleep. > > It seems a fix for this was tried earlier with d52e4d0c0c428 ("usb: > gadget: ffs: Fix BUG when userland exits with submitted AIO > transfers") which was then reverted by a9c859033f6e. > > Elsewhere it seems the ffs driver takes effort to drop any locks before calling usb_ep_dequeue(), so this seems like > that should be addressed, but it also seems like recent change to the dwc3 driver has been made to avoid sleeping > in that path (see fec9095bdef4 ("usb: dwc3: gadget: remove wait_end_transfer")), which may be why I'm not seeing > the problem with mainline (and your patch here, of coarse). But that also doesn't clarify if its still a potential issue > w/ non-dwc3 platforms. > > So for older kernels, do you have a suggestion of which approach is advised? Does usb_ep_dequeue need to avoid > sleeping or do we need to rework the ffs_aio_cancel logic? Are you seeing this issue with Android? When running adb reboot? I have tried 4.19 and 4.9 kernel with Android P-dessert on one of the Intel platforms, but no luck on reproducing the issue. I will get back to you if I could reproduce the issue. I'm afraid I won’t be able to do much by just looking at the code.
On Wed, Mar 20, 2019 at 4:28 PM Yang, Fei <fei.yang@intel.com> wrote: > > > Hey Fei, > > So while this patch does resolve the issues I was seeing with mainline kernels and recent changes to adbd, > > Josh pointed out that it wouldn't resolve the issues I was seeing with older kernels which is slightly different (but still related to aio usage). > > > > On the older kernels I'm hitting scheduling while atomic on reboot, which seems to be due to ffs_aio_cancel() taking a spinlock then calling usb_ep_dequeue() which might sleep. > > > > It seems a fix for this was tried earlier with d52e4d0c0c428 ("usb: > > gadget: ffs: Fix BUG when userland exits with submitted AIO > > transfers") which was then reverted by a9c859033f6e. > > > > Elsewhere it seems the ffs driver takes effort to drop any locks before calling usb_ep_dequeue(), so this seems like > > that should be addressed, but it also seems like recent change to the dwc3 driver has been made to avoid sleeping > > in that path (see fec9095bdef4 ("usb: dwc3: gadget: remove wait_end_transfer")), which may be why I'm not seeing > > the problem with mainline (and your patch here, of coarse). But that also doesn't clarify if its still a potential issue > > w/ non-dwc3 platforms. > > > > So for older kernels, do you have a suggestion of which approach is advised? Does usb_ep_dequeue need to avoid > > sleeping or do we need to rework the ffs_aio_cancel logic? > > Are you seeing this issue with Android? When running adb reboot? > I have tried 4.19 and 4.9 kernel with Android P-dessert on one of the Intel platforms, but no luck on reproducing the issue. You probably need to be running AOSP/master to trigger this. The changes which uncovered this landed just last week. > I will get back to you if I could reproduce the issue. I'm afraid I won’t be able to do much by just looking at the code. So as discussed in further mails, the main issue seems to be the dwc3 code was sleeping in its ep_dequeue logic, which isn't safe as ffs_aio_cancel calls it while holding a spinlock. Upstream the dw3c driver has been fixed, but -stable kernels still have the issue. thanks -john
On Wed, Mar 20, 2019 at 12:32 PM <fei.yang@intel.com> wrote: > > From: Fei Yang <fei.yang@intel.com> > > The following kernel panic happens due to the io_data buffer gets deallocated > before the async io is completed. Add a check for the case where io_data buffer > should be deallocated by ffs_user_copy_worker. > > [ 41.663334] BUG: unable to handle kernel NULL pointer dereference at 0000000000000048 > [ 41.672099] #PF error: [normal kernel read fault] > [ 41.677356] PGD 20c974067 P4D 20c974067 PUD 20c973067 PMD 0 > [ 41.683687] Oops: 0000 [#1] PREEMPT SMP > [ 41.687976] CPU: 1 PID: 7 Comm: kworker/u8:0 Tainted: G U 5.0.0-quilt-2e5dc0ac-00790-gd8c79f2-dirty #2 > [ 41.705309] Workqueue: adb ffs_user_copy_worker > [ 41.705316] RIP: 0010:__vunmap+0x2a/0xc0 > [ 41.705318] Code: 0f 1f 44 00 00 48 85 ff 0f 84 87 00 00 00 55 f7 c7 ff 0f 00 00 48 89 e5 41 55 41 89 f5 41 54 53 48 89 fb 75 71 e8 56 d7 ff ff <4c> 8b 60 48 4d 85 e4 74 76 48 89 df e8 25 ff ff ff 45 85 ed 74 46 > [ 41.705320] RSP: 0018:ffffbc3a40053df0 EFLAGS: 00010286 > [ 41.705322] RAX: 0000000000000000 RBX: ffffbc3a406f1000 RCX: 0000000000000000 > [ 41.705323] RDX: 0000000000000001 RSI: 0000000000000001 RDI: 00000000ffffffff > [ 41.705324] RBP: ffffbc3a40053e08 R08: 000000000001fb79 R09: 0000000000000037 > [ 41.705325] R10: ffffbc3a40053b68 R11: ffffbc3a40053cad R12: fffffffffffffff2 > [ 41.705326] R13: 0000000000000001 R14: 0000000000000000 R15: ffffffffffffffff > [ 41.705328] FS: 0000000000000000(0000) GS:ffff9e2977a80000(0000) knlGS:0000000000000000 > [ 41.705329] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [ 41.705330] CR2: 0000000000000048 CR3: 000000020c994000 CR4: 00000000003406e0 > [ 41.705331] Call Trace: > [ 41.705338] vfree+0x50/0xb0 > [ 41.705341] ffs_user_copy_worker+0xe9/0x1c0 > [ 41.705344] process_one_work+0x19f/0x3e0 > [ 41.705348] worker_thread+0x3f/0x3b0 > [ 41.829766] kthread+0x12b/0x150 > [ 41.833371] ? process_one_work+0x3e0/0x3e0 > [ 41.838045] ? kthread_create_worker_on_cpu+0x70/0x70 > [ 41.843695] ret_from_fork+0x3a/0x50 > [ 41.847689] Modules linked in: hci_uart bluetooth ecdh_generic rfkill_gpio dwc3_pci dwc3 snd_usb_audio mei_me tpm_crb snd_usbmidi_lib xhci_pci xhci_hcd mei tpm snd_hwdep cfg80211 snd_soc_skl snd_soc_skl_ipc snd_soc_sst_ipc snd_soc_sst_dsp snd_hda_ext_core snd_hda_core videobuf2_dma_sg crlmodule > [ 41.876880] CR2: 0000000000000048 > [ 41.880584] ---[ end trace 2bc4addff0f2e673 ]--- > [ 41.891346] RIP: 0010:__vunmap+0x2a/0xc0 > [ 41.895734] Code: 0f 1f 44 00 00 48 85 ff 0f 84 87 00 00 00 55 f7 c7 ff 0f 00 00 48 89 e5 41 55 41 89 f5 41 54 53 48 89 fb 75 71 e8 56 d7 ff ff <4c> 8b 60 48 4d 85 e4 74 76 48 89 df e8 25 ff ff ff 45 85 ed 74 46 > [ 41.916740] RSP: 0018:ffffbc3a40053df0 EFLAGS: 00010286 > [ 41.922583] RAX: 0000000000000000 RBX: ffffbc3a406f1000 RCX: 0000000000000000 > [ 41.930563] RDX: 0000000000000001 RSI: 0000000000000001 RDI: 00000000ffffffff > [ 41.938540] RBP: ffffbc3a40053e08 R08: 000000000001fb79 R09: 0000000000000037 > [ 41.946520] R10: ffffbc3a40053b68 R11: ffffbc3a40053cad R12: fffffffffffffff2 > [ 41.954502] R13: 0000000000000001 R14: 0000000000000000 R15: ffffffffffffffff > [ 41.962482] FS: 0000000000000000(0000) GS:ffff9e2977a80000(0000) knlGS:0000000000000000 > [ 41.971536] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [ 41.977960] CR2: 0000000000000048 CR3: 000000020c994000 CR4: 00000000003406e0 > [ 41.985930] Kernel panic - not syncing: Fatal exception > [ 41.991817] Kernel Offset: 0x16000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) > [ 42.009525] Rebooting in 10 seconds.. > [ 52.014376] ACPI MEMORY or I/O RESET_REG. > > Fixes: 772a7a724f6 ("usb: gadget: f_fs: Allow scatter-gather buffers") > Signed-off-by: Fei Yang <fei.yang@intel.com> > Reviewed-by: Manu Gautam <mgautam@codeaurora.org> > Tested-by: John Stultz <john.stultz@linaro.org> > --- > V2: add tag: "Fixes: 772a7a724f6 ......", Reviewed-by and Tested-by. > Felipe: Just wanted to follow up on this, as I've still not seen this land in your tree. Is there any objection with this fix? thanks -john
On Wed, Mar 20, 2019 at 11:21 AM John Stultz <john.stultz@linaro.org> wrote: > On Wed, Mar 20, 2019 at 9:40 AM John Stultz <john.stultz@linaro.org> wrote: > > On Tue, Mar 19, 2019 at 10:32 PM <fei.yang@intel.com> wrote: > > Elsewhere it seems the ffs driver takes effort to drop any locks > > before calling usb_ep_dequeue(), so this seems like that should be > > addressed, but it also seems like recent change to the dwc3 driver has > > been made to avoid sleeping in that path (see fec9095bdef4 ("usb: > > dwc3: gadget: remove wait_end_transfer")), which may be why I'm not > > seeing the problem with mainline (and your patch here, of coarse). > > But that also doesn't clarify if its still a potential issue w/ > > non-dwc3 platforms. > > Felipe: Given Alan's point, does it make sense to mark the commits > that remove the possible sleep from wait_event_lock_irq() in > dwc3_gadget_ep_dequeue() for -stable? > > Against 4.19.30, the following set manages to cherry-pick cleanly: > git cherry-pick 1a22ec643580626f439c8583edafdcc73798f2fb > git cherry-pick 09fe1f8d7e2f461275b1cdd832f2cfa5e9be346d > git cherry-pick c3acd59014148470dc58519870fbc779785b4bf7 > git cherry-pick 7746a8dfb3f9c91b3a0b63a1d5c2664410e6498d > git cherry-pick d5443bbf5fc8f8389cce146b1fc2987cdd229d12 > git cherry-pick d4f1afe5e896c18ae01099a85dab5e1a198bd2a8 > git cherry-pick fec9095bdef4e7c988adb603d0d4f92ee735d4a1 > # To get things building, revert modified -stable fix > git revert 25ad17d > #pick actual upstream fix replacing the previous > git cherry-pick bd6742249b9ca918565e4e3abaa06665e587f4b5 > > (Though I'm always a bit hesitant with -stable backports on subsystems > I don't know well. So I'm not sure if this set is fully correct.) > > This set seems to avoid the crash on reboot I was seeing. > > And of course, I'm sure getting that set backported to 4.14 and 4.9 > (and maybe even 4.4, I need to check) will be less clean. Also, I just wanted to follow up on this as well. Does the above set of cherry-picks look ok to others for 4.19-stable? Does anyone have suggestions on how they'd like to see backports to 4.14, 4.9 and 4.4? thanks -john
On Tue, Apr 02, 2019 at 10:26:51AM +0700, John Stultz wrote: > On Wed, Mar 20, 2019 at 11:21 AM John Stultz <john.stultz@linaro.org> wrote: > > On Wed, Mar 20, 2019 at 9:40 AM John Stultz <john.stultz@linaro.org> wrote: > > > On Tue, Mar 19, 2019 at 10:32 PM <fei.yang@intel.com> wrote: > > > Elsewhere it seems the ffs driver takes effort to drop any locks > > > before calling usb_ep_dequeue(), so this seems like that should be > > > addressed, but it also seems like recent change to the dwc3 driver has > > > been made to avoid sleeping in that path (see fec9095bdef4 ("usb: > > > dwc3: gadget: remove wait_end_transfer")), which may be why I'm not > > > seeing the problem with mainline (and your patch here, of coarse). > > > But that also doesn't clarify if its still a potential issue w/ > > > non-dwc3 platforms. > > > > Felipe: Given Alan's point, does it make sense to mark the commits > > that remove the possible sleep from wait_event_lock_irq() in > > dwc3_gadget_ep_dequeue() for -stable? > > > > Against 4.19.30, the following set manages to cherry-pick cleanly: > > git cherry-pick 1a22ec643580626f439c8583edafdcc73798f2fb > > git cherry-pick 09fe1f8d7e2f461275b1cdd832f2cfa5e9be346d > > git cherry-pick c3acd59014148470dc58519870fbc779785b4bf7 > > git cherry-pick 7746a8dfb3f9c91b3a0b63a1d5c2664410e6498d > > git cherry-pick d5443bbf5fc8f8389cce146b1fc2987cdd229d12 > > git cherry-pick d4f1afe5e896c18ae01099a85dab5e1a198bd2a8 > > git cherry-pick fec9095bdef4e7c988adb603d0d4f92ee735d4a1 > > # To get things building, revert modified -stable fix > > git revert 25ad17d > > #pick actual upstream fix replacing the previous > > git cherry-pick bd6742249b9ca918565e4e3abaa06665e587f4b5 > > > > (Though I'm always a bit hesitant with -stable backports on subsystems > > I don't know well. So I'm not sure if this set is fully correct.) > > > > This set seems to avoid the crash on reboot I was seeing. > > > > And of course, I'm sure getting that set backported to 4.14 and 4.9 > > (and maybe even 4.4, I need to check) will be less clean. > > Also, I just wanted to follow up on this as well. Does the above set > of cherry-picks look ok to others for 4.19-stable? Does anyone have > suggestions on how they'd like to see backports to 4.14, 4.9 and 4.4? If they are ok, can someone send me the commits as a series of patches, as doing the above really doesn't help much :) thanks, greg k-h
On Wed, Apr 24, 2019 at 9:50 AM Greg KH <gregkh@linuxfoundation.org> wrote: > > On Tue, Apr 02, 2019 at 10:26:51AM +0700, John Stultz wrote: > > On Wed, Mar 20, 2019 at 11:21 AM John Stultz <john.stultz@linaro.org> wrote: > > > On Wed, Mar 20, 2019 at 9:40 AM John Stultz <john.stultz@linaro.org> wrote: > > > > On Tue, Mar 19, 2019 at 10:32 PM <fei.yang@intel.com> wrote: > > > > Elsewhere it seems the ffs driver takes effort to drop any locks > > > > before calling usb_ep_dequeue(), so this seems like that should be > > > > addressed, but it also seems like recent change to the dwc3 driver has > > > > been made to avoid sleeping in that path (see fec9095bdef4 ("usb: > > > > dwc3: gadget: remove wait_end_transfer")), which may be why I'm not > > > > seeing the problem with mainline (and your patch here, of coarse). > > > > But that also doesn't clarify if its still a potential issue w/ > > > > non-dwc3 platforms. > > > > > > Felipe: Given Alan's point, does it make sense to mark the commits > > > that remove the possible sleep from wait_event_lock_irq() in > > > dwc3_gadget_ep_dequeue() for -stable? > > > > > > Against 4.19.30, the following set manages to cherry-pick cleanly: > > > git cherry-pick 1a22ec643580626f439c8583edafdcc73798f2fb > > > git cherry-pick 09fe1f8d7e2f461275b1cdd832f2cfa5e9be346d > > > git cherry-pick c3acd59014148470dc58519870fbc779785b4bf7 > > > git cherry-pick 7746a8dfb3f9c91b3a0b63a1d5c2664410e6498d > > > git cherry-pick d5443bbf5fc8f8389cce146b1fc2987cdd229d12 > > > git cherry-pick d4f1afe5e896c18ae01099a85dab5e1a198bd2a8 > > > git cherry-pick fec9095bdef4e7c988adb603d0d4f92ee735d4a1 > > > # To get things building, revert modified -stable fix > > > git revert 25ad17d > > > #pick actual upstream fix replacing the previous > > > git cherry-pick bd6742249b9ca918565e4e3abaa06665e587f4b5 > > > > > > (Though I'm always a bit hesitant with -stable backports on subsystems > > > I don't know well. So I'm not sure if this set is fully correct.) > > > > > > This set seems to avoid the crash on reboot I was seeing. > > > > > > And of course, I'm sure getting that set backported to 4.14 and 4.9 > > > (and maybe even 4.4, I need to check) will be less clean. > > > > Also, I just wanted to follow up on this as well. Does the above set > > of cherry-picks look ok to others for 4.19-stable? Does anyone have > > suggestions on how they'd like to see backports to 4.14, 4.9 and 4.4? > > If they are ok, can someone send me the commits as a series of patches, > as doing the above really doesn't help much :) Yea, so I'm happy to send that set to you for 4.19, assuming Felipe or someone else OKs it. I'm still at a bit of a loss for what to do for older (4.14/4.9/4.4) -stable kernels, as its going to be difficult to backport those. thanks -john
> On Wed, Apr 24, 2019 at 9:50 AM Greg KH <gregkh@linuxfoundation.org> wrote: > > > > On Tue, Apr 02, 2019 at 10:26:51AM +0700, John Stultz wrote: > > > On Wed, Mar 20, 2019 at 11:21 AM John Stultz <john.stultz@linaro.org> wrote: > > > > On Wed, Mar 20, 2019 at 9:40 AM John Stultz <john.stultz@linaro.org> wrote: > > > > > On Tue, Mar 19, 2019 at 10:32 PM <fei.yang@intel.com> wrote: > > > > > Elsewhere it seems the ffs driver takes effort to drop any locks > > > > > before calling usb_ep_dequeue(), so this seems like that should > > > > > be addressed, but it also seems like recent change to the dwc3 > > > > > driver has been made to avoid sleeping in that path (see fec9095bdef4 ("usb: > > > > > dwc3: gadget: remove wait_end_transfer")), which may be why I'm > > > > > not seeing the problem with mainline (and your patch here, of coarse). > > > > > But that also doesn't clarify if its still a potential issue w/ > > > > > non-dwc3 platforms. > > > > > > > > Felipe: Given Alan's point, does it make sense to mark the commits > > > > that remove the possible sleep from wait_event_lock_irq() in > > > > dwc3_gadget_ep_dequeue() for -stable? > > > > > > > > Against 4.19.30, the following set manages to cherry-pick cleanly: > > > > git cherry-pick 1a22ec643580626f439c8583edafdcc73798f2fb > > > > git cherry-pick 09fe1f8d7e2f461275b1cdd832f2cfa5e9be346d > > > > git cherry-pick c3acd59014148470dc58519870fbc779785b4bf7 > > > > git cherry-pick 7746a8dfb3f9c91b3a0b63a1d5c2664410e6498d > > > > git cherry-pick d5443bbf5fc8f8389cce146b1fc2987cdd229d12 > > > > git cherry-pick d4f1afe5e896c18ae01099a85dab5e1a198bd2a8 > > > > git cherry-pick fec9095bdef4e7c988adb603d0d4f92ee735d4a1 > > > > # To get things building, revert modified -stable fix git revert > > > > 25ad17d #pick actual upstream fix replacing the previous git > > > > cherry-pick bd6742249b9ca918565e4e3abaa06665e587f4b5 > > > > > > > > (Though I'm always a bit hesitant with -stable backports on > > > > subsystems I don't know well. So I'm not sure if this set is fully > > > > correct.) > > > > > > > > This set seems to avoid the crash on reboot I was seeing. > > > > > > > > And of course, I'm sure getting that set backported to 4.14 and > > > > 4.9 (and maybe even 4.4, I need to check) will be less clean. > > > > > > Also, I just wanted to follow up on this as well. Does the above > > > set of cherry-picks look ok to others for 4.19-stable? Does anyone > > > have suggestions on how they'd like to see backports to 4.14, 4.9 and 4.4? > > > > If they are ok, can someone send me the commits as a series of > > patches, as doing the above really doesn't help much :) > > Yea, so I'm happy to send that set to you for 4.19, assuming Felipe or someone else OKs it. > I'm still at a bit of a loss for what to do for older (4.14/4.9/4.4) -stable kernels, as its going to be difficult to backport those. The patch mentioned in the title of this thread doesn't apply to older kernels. The kernel panic was introduced in 5.0-rcX.
On Thu, Apr 25, 2019 at 9:01 AM Yang, Fei <fei.yang@intel.com> wrote: > > > On Wed, Apr 24, 2019 at 9:50 AM Greg KH <gregkh@linuxfoundation.org> wrote: > > > > > > On Tue, Apr 02, 2019 at 10:26:51AM +0700, John Stultz wrote: > > > > On Wed, Mar 20, 2019 at 11:21 AM John Stultz <john.stultz@linaro.org> wrote: > > > > > On Wed, Mar 20, 2019 at 9:40 AM John Stultz <john.stultz@linaro.org> wrote: > > > > > > On Tue, Mar 19, 2019 at 10:32 PM <fei.yang@intel.com> wrote: > > > > > > Elsewhere it seems the ffs driver takes effort to drop any locks > > > > > > before calling usb_ep_dequeue(), so this seems like that should > > > > > > be addressed, but it also seems like recent change to the dwc3 > > > > > > driver has been made to avoid sleeping in that path (see fec9095bdef4 ("usb: > > > > > > dwc3: gadget: remove wait_end_transfer")), which may be why I'm > > > > > > not seeing the problem with mainline (and your patch here, of coarse). > > > > > > But that also doesn't clarify if its still a potential issue w/ > > > > > > non-dwc3 platforms. > > > > > > > > > > Felipe: Given Alan's point, does it make sense to mark the commits > > > > > that remove the possible sleep from wait_event_lock_irq() in > > > > > dwc3_gadget_ep_dequeue() for -stable? > > > > > > > > > > Against 4.19.30, the following set manages to cherry-pick cleanly: > > > > > git cherry-pick 1a22ec643580626f439c8583edafdcc73798f2fb > > > > > git cherry-pick 09fe1f8d7e2f461275b1cdd832f2cfa5e9be346d > > > > > git cherry-pick c3acd59014148470dc58519870fbc779785b4bf7 > > > > > git cherry-pick 7746a8dfb3f9c91b3a0b63a1d5c2664410e6498d > > > > > git cherry-pick d5443bbf5fc8f8389cce146b1fc2987cdd229d12 > > > > > git cherry-pick d4f1afe5e896c18ae01099a85dab5e1a198bd2a8 > > > > > git cherry-pick fec9095bdef4e7c988adb603d0d4f92ee735d4a1 > > > > > # To get things building, revert modified -stable fix git revert > > > > > 25ad17d #pick actual upstream fix replacing the previous git > > > > > cherry-pick bd6742249b9ca918565e4e3abaa06665e587f4b5 > > > > > > > > > > (Though I'm always a bit hesitant with -stable backports on > > > > > subsystems I don't know well. So I'm not sure if this set is fully > > > > > correct.) > > > > > > > > > > This set seems to avoid the crash on reboot I was seeing. > > > > > > > > > > And of course, I'm sure getting that set backported to 4.14 and > > > > > 4.9 (and maybe even 4.4, I need to check) will be less clean. > > > > > > > > Also, I just wanted to follow up on this as well. Does the above > > > > set of cherry-picks look ok to others for 4.19-stable? Does anyone > > > > have suggestions on how they'd like to see backports to 4.14, 4.9 and 4.4? > > > > > > If they are ok, can someone send me the commits as a series of > > > patches, as doing the above really doesn't help much :) > > > > Yea, so I'm happy to send that set to you for 4.19, assuming Felipe or someone else OKs it. > > I'm still at a bit of a loss for what to do for older (4.14/4.9/4.4) -stable kernels, as its going to be difficult to backport those. > > The patch mentioned in the title of this thread doesn't apply to older kernels. The kernel panic was introduced in 5.0-rcX. Right. On older kernels there was a separate issue that seems to be dwc3 specific that was biting me in a similar way. That's what the sha list above references. thanks -john
diff --git a/drivers/usb/gadget/function/f_fs.c b/drivers/usb/gadget/function/f_fs.c index 20413c2..47be961 100644 --- a/drivers/usb/gadget/function/f_fs.c +++ b/drivers/usb/gadget/function/f_fs.c @@ -1133,7 +1133,8 @@ static ssize_t ffs_epfile_io(struct file *file, struct ffs_io_data *io_data) error_mutex: mutex_unlock(&epfile->mutex); error: - ffs_free_buffer(io_data); + if (ret != -EIOCBQUEUED) /* don't free if there is iocb queued */ + ffs_free_buffer(io_data); return ret; }