Message ID | 20200520233129.3704-1-penguin-kernel@I-love.SAKURA.ne.jp (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | USB: cdc-wdm: Call wake_up_all() when clearing WDM_IN_USE bit. | expand |
On Thu, May 21, 2020 at 08:31:29AM +0900, Tetsuo Handa wrote: > Since e.g. wdm_flush() calls wait_event() on WDM_IN_USE bit, > make sure to call wake_up_all() when clearing WDM_IN_USE bit. I'm sorry, but I don't understand what this changelog text is trying to say. Can you reword this differently? > > Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> And is this a bugfix? For what? Does it need to go to stable kernels? > --- > drivers/usb/class/cdc-wdm.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/drivers/usb/class/cdc-wdm.c b/drivers/usb/class/cdc-wdm.c > index e3db6fbeadef..bf2a6deb7b08 100644 > --- a/drivers/usb/class/cdc-wdm.c > +++ b/drivers/usb/class/cdc-wdm.c > @@ -151,7 +151,7 @@ static void wdm_out_callback(struct urb *urb) > kfree(desc->outbuf); > desc->outbuf = NULL; > clear_bit(WDM_IN_USE, &desc->flags); > - wake_up(&desc->wait); > + wake_up_all(&desc->wait); > } > > static void wdm_in_callback(struct urb *urb) > @@ -426,6 +426,7 @@ static ssize_t wdm_write > clear_bit(WDM_IN_USE, &desc->flags); > dev_err(&desc->intf->dev, "Tx URB error: %d\n", rv); > rv = usb_translate_errors(rv); > + wake_up_all(&desc->wait); Is this a fix for two different issues? confused, greg k-h
On 2020/05/21 16:33, Greg KH wrote: > On Thu, May 21, 2020 at 08:31:29AM +0900, Tetsuo Handa wrote: >> Since e.g. wdm_flush() calls wait_event() on WDM_IN_USE bit, >> make sure to call wake_up_all() when clearing WDM_IN_USE bit. > > I'm sorry, but I don't understand what this changelog text is trying to > say. Can you reword this differently? wait_event("condition") wants corresponding wake_up() (or its variant) after making "condition" == true. For example, wdm_flush() calls wait_event(desc->wait, /* * needs both flags. We cannot do with one * because resetting it would cause a race * with write() yet we need to signal * a disconnect */ !test_bit(WDM_IN_USE, &desc->flags) || test_bit(WDM_DISCONNECTING, &desc->flags)); but wdm_write() is not calling wake_up(&desc->wait) after clear_bit(WDM_IN_USE, &desc->flags) when usb_submit_urb() failed. > And is this a bugfix? For what? Does it need to go to stable kernels? Potential bugfix. syzbot is reporting two bugs that hung at wdm_flush(), but I think that this patch won't fix these bugs. Therefore, I don't think this patch needs to go to stable kernels.
Am Donnerstag, den 21.05.2020, 19:01 +0900 schrieb Tetsuo Handa: > Hi, interesting. Do you have a test case for these patches working? > wait_event(desc->wait, > /* > * needs both flags. We cannot do with one > * because resetting it would cause a race > * with write() yet we need to signal > * a disconnect > */ > !test_bit(WDM_IN_USE, &desc->flags) || > test_bit(WDM_DISCONNECTING, &desc->flags)); > > but wdm_write() is not calling wake_up(&desc->wait) after > clear_bit(WDM_IN_USE, &desc->flags) when usb_submit_urb() failed. Yes, because desc->wlock is held. There can be nobody sleeping here. > > And is this a bugfix? For what? Does it need to go to stable kernels? > > Potential bugfix. syzbot is reporting two bugs that hung at wdm_flush(), > but I think that this patch won't fix these bugs. Therefore, I don't think > this patch needs to go to stable kernels. Do you have links for them? Regards Oliver
On 2020/05/22 4:50, Oliver Neukum wrote: > interesting. Do you have a test case for these patches working? No. > >> wait_event(desc->wait, >> /* >> * needs both flags. We cannot do with one >> * because resetting it would cause a race >> * with write() yet we need to signal >> * a disconnect >> */ >> !test_bit(WDM_IN_USE, &desc->flags) || >> test_bit(WDM_DISCONNECTING, &desc->flags)); >> >> but wdm_write() is not calling wake_up(&desc->wait) after >> clear_bit(WDM_IN_USE, &desc->flags) when usb_submit_urb() failed. > > Yes, because desc->wlock is held. There can be nobody sleeping here. Then, this patch is not needed. (But adding some comment is welcomed.) > >>> And is this a bugfix? For what? Does it need to go to stable kernels? >> >> Potential bugfix. syzbot is reporting two bugs that hung at wdm_flush(), >> but I think that this patch won't fix these bugs. Therefore, I don't think >> this patch needs to go to stable kernels. > > Do you have links for them? It turned out that the second bug report is a duplicate of the first bug report. The dashboard link for the first bug report is https://syzkaller.appspot.com/bug?id=e7b761593b23eb50855b9ea31e3be5472b711186 .
Am Freitag, den 22.05.2020, 07:48 +0900 schrieb Tetsuo Handa: Hi, I looked at your patch again and I am impressed and I need to apologize. I looked at only the intended use, but overlooked the unintended use. I think we need to work on the description of the patch though. > On 2020/05/22 4:50, Oliver Neukum wrote: > > interesting. Do you have a test case for these patches working? > > No. Yes, going through the logs it looks that you need to trigger an error case for this race to strike. > > > wait_event(desc->wait, > > > /* > > > * needs both flags. We cannot do with one > > > * because resetting it would cause a race > > > * with write() yet we need to signal > > > * a disconnect > > > */ > > > !test_bit(WDM_IN_USE, &desc->flags) || > > > test_bit(WDM_DISCONNECTING, &desc->flags)); > > > > > > but wdm_write() is not calling wake_up(&desc->wait) after > > > clear_bit(WDM_IN_USE, &desc->flags) when usb_submit_urb() failed. > > > > Yes, because desc->wlock is held. There can be nobody sleeping here. > > Then, this patch is not needed. (But adding some comment is welcomed.) OK and here I screwed up. wlock is held in wdm_write(), but not in wdm_flush() So may I suggest the following log: -- WDM_IN_USE is used in wdm_write() to protect against concurrent writes and in wdm_flush() to wait for all messages to flush, so that errors are not lost. The former use is guarded by a mutex, the later, as it does no IO, is not. In the error case of wdm_write(), however, there is a race, which can make wdm_flush() wait for IO that was supposed to be started but was never started due to an error condition. Hence if an error is detected in wdm_write() after WDM_IN_USE was toggeled, all potential waiters must be woken. As multiple tasks can be in wdm_flush, wake_up_all() must be used in all cases WDM_IN_USE is reset. -- May I ask you to redo the patch with comments added stating that the wake up is done for the sake of wdm_flush(), change the description and add the link to syzkaller? > > > > And is this a bugfix? For what? Does it need to go to stable kernels? Yes, it is. The bug is ancient. It goes back to afba937e540c9, which introduced the driver. Again, thank you for this impressive piece of debugging. Regards Oliver
On 2020/05/22 17:04, Oliver Neukum wrote: > May I ask you to redo the patch with comments added stating > that the wake up is done for the sake of wdm_flush(), change > the description and add the link to syzkaller? You can take over this patch. syzbot tried this patch on 2020/02/11 01:23 at https://syzkaller.appspot.com/bug?id=e7b761593b23eb50855b9ea31e3be5472b711186 , but this patch did not solve the problem syzbot has found. Thus, I don't add a link to syzkaller...
Am Freitag, den 22.05.2020, 17:26 +0900 schrieb Tetsuo Handa: > On 2020/05/22 17:04, Oliver Neukum wrote: > > May I ask you to redo the patch with comments added stating > > that the wake up is done for the sake of wdm_flush(), change > > the description and add the link to syzkaller? > > You can take over this patch. syzbot tried this patch on 2020/02/11 01:23 at > https://syzkaller.appspot.com/bug?id=e7b761593b23eb50855b9ea31e3be5472b711186 , > but this patch did not solve the problem syzbot has found. Thus, I don't add > a link to syzkaller... Hi, this is odd. I looked at the driver and it looks to me like pre/post_reset() is the only other place that touches WDM_IN_USE. And it does so correctly. Any idea what could be wrong? Regards Oliver
On 2020/05/25 21:06, Oliver Neukum wrote: > Am Freitag, den 22.05.2020, 17:26 +0900 schrieb Tetsuo Handa: >> On 2020/05/22 17:04, Oliver Neukum wrote: >>> May I ask you to redo the patch with comments added stating >>> that the wake up is done for the sake of wdm_flush(), change >>> the description and add the link to syzkaller? >> >> You can take over this patch. syzbot tried this patch on 2020/02/11 01:23 at >> https://syzkaller.appspot.com/bug?id=e7b761593b23eb50855b9ea31e3be5472b711186 , >> but this patch did not solve the problem syzbot has found. Thus, I don't add >> a link to syzkaller... > > Hi, > > this is odd. I looked at the driver and it looks to me like > pre/post_reset() is the only other place that touches WDM_IN_USE. > And it does so correctly. Any idea what could be wrong? I haven't understood the root cause of this problem. You were in the thread at https://groups.google.com/forum/#!msg/syzkaller-bugs/q3QMjt_t83c/GqP58GaTAQAJ . My patch was made from that thread (which simply makes sure to call wake_up_all() when clearing WDM_IN_USE bit) apart from the root cause of this problem.
On 2020/05/25 22:32, Tetsuo Handa wrote: > On 2020/05/25 21:06, Oliver Neukum wrote: >> Am Freitag, den 22.05.2020, 17:26 +0900 schrieb Tetsuo Handa: >>> On 2020/05/22 17:04, Oliver Neukum wrote: >>>> May I ask you to redo the patch with comments added stating >>>> that the wake up is done for the sake of wdm_flush(), change >>>> the description and add the link to syzkaller? >>> >>> You can take over this patch. syzbot tried this patch on 2020/02/11 01:23 at >>> https://syzkaller.appspot.com/bug?id=e7b761593b23eb50855b9ea31e3be5472b711186 , >>> but this patch did not solve the problem syzbot has found. Thus, I don't add >>> a link to syzkaller... >> >> Hi, >> >> this is odd. I looked at the driver and it looks to me like >> pre/post_reset() is the only other place that touches WDM_IN_USE. >> And it does so correctly. Any idea what could be wrong? > > I haven't understood the root cause of this problem. You were in the thread at > https://groups.google.com/forum/#!msg/syzkaller-bugs/q3QMjt_t83c/GqP58GaTAQAJ . > My patch was made from that thread (which simply makes sure to call wake_up_all() > when clearing WDM_IN_USE bit) apart from the root cause of this problem. > I might have found what is wrong. My understanding is that a process using /dev/raw-gadget is responsible for reacting to every USB request. I don't know whether /dev/raw-gadget already provides callback for aborting the in-flight USB requests (in order to resume wdm_flush()) when /dev/raw-gadget is closed (due to explicit close() syscall or implicit exit_files() from do_exit() upon SIGKILL). I assume /dev/raw-gadget already provides such callback in the following paragraphs. Since the reproducer is opening both /dev/raw-gadget (which is char-10-62) and /dev/cdc-wdm0 (which is char-180-0), it seems that the kernel is falling into deadlock condition due to the need to close both files when the reproducer is killed. My guess is that since that process is stuck at wdm_flush() (due to explicit close() syscall or implicit exit_files() from do_exit() upon SIGKILL), that process cannot react to USB requests which are needed for resuming wdm_flush(). Unexpectedly blocking a process which is responsible for reacting to USB requests will look as if it is a broken hardware. A debug printk() patch (shown below) says that the desc->command urb issued by wdm_write() never gets wdm_out_callback(). I guess that it is an evidence of such dependency exists, and I think that a process had better avoid opening both /dev/raw-gadget and /dev/cdc-wdm0 at the same time. ---------------------------------------- diff --git a/arch/x86/kernel/unwind_orc.c b/arch/x86/kernel/unwind_orc.c index 5b0bd8581fe6..e352b54dbbf2 100644 --- a/arch/x86/kernel/unwind_orc.c +++ b/arch/x86/kernel/unwind_orc.c @@ -320,12 +320,19 @@ EXPORT_SYMBOL_GPL(unwind_get_return_address); unsigned long *unwind_get_return_address_ptr(struct unwind_state *state) { + struct task_struct *task = state->task; + if (unwind_done(state)) return NULL; if (state->regs) return &state->regs->ip; + if (task != current && state->sp == task->thread.sp) { + struct inactive_task_frame *frame = (void *)task->thread.sp; + return &frame->ret_addr; + } + if (state->sp) return (unsigned long *)state->sp - 1; diff --git a/drivers/usb/class/cdc-wdm.c b/drivers/usb/class/cdc-wdm.c index e3db6fbeadef..5d28a1a2521d 100644 --- a/drivers/usb/class/cdc-wdm.c +++ b/drivers/usb/class/cdc-wdm.c @@ -150,8 +150,9 @@ static void wdm_out_callback(struct urb *urb) spin_unlock_irqrestore(&desc->iuspin, flags); kfree(desc->outbuf); desc->outbuf = NULL; + printk("%s : clear WDM_IN_USE %px\n", __func__, desc); clear_bit(WDM_IN_USE, &desc->flags); - wake_up(&desc->wait); + wake_up_all(&desc->wait); } static void wdm_in_callback(struct urb *urb) @@ -417,17 +418,21 @@ static ssize_t wdm_write req->wValue = 0; req->wIndex = desc->inum; /* already converted */ req->wLength = cpu_to_le16(count); + printk("%s : set WDM_IN_USE %px %px\n", __func__, desc, desc->command); set_bit(WDM_IN_USE, &desc->flags); desc->outbuf = buf; rv = usb_submit_urb(desc->command, GFP_KERNEL); if (rv < 0) { desc->outbuf = NULL; + printk("%s : clear WDM_IN_USE %px\n", __func__, desc); clear_bit(WDM_IN_USE, &desc->flags); dev_err(&desc->intf->dev, "Tx URB error: %d\n", rv); rv = usb_translate_errors(rv); + wake_up_all(&desc->wait); goto out_free_mem_pm; } else { + printk("%s : waiting for wdm_out_callback %px\n", __func__, desc); dev_dbg(&desc->intf->dev, "Tx URB has been submitted index=%d\n", le16_to_cpu(req->wIndex)); } @@ -587,6 +592,7 @@ static int wdm_flush(struct file *file, fl_owner_t id) { struct wdm_device *desc = file->private_data; + printk("%s : start wait_event %px\n", __func__, desc); wait_event(desc->wait, /* * needs both flags. We cannot do with one @@ -596,6 +602,7 @@ static int wdm_flush(struct file *file, fl_owner_t id) */ !test_bit(WDM_IN_USE, &desc->flags) || test_bit(WDM_DISCONNECTING, &desc->flags)); + printk("%s : end wait_event %px\n", __func__, desc); /* cannot dereference desc->intf if WDM_DISCONNECTING */ if (test_bit(WDM_DISCONNECTING, &desc->flags)) @@ -642,6 +649,7 @@ static int wdm_open(struct inode *inode, struct file *file) mutex_lock(&wdm_mutex); desc = wdm_find_device_by_minor(minor); + printk("%s : start %px\n", __func__, desc); if (!desc) goto out; @@ -676,6 +684,7 @@ static int wdm_open(struct inode *inode, struct file *file) desc->manage_power(intf, 1); usb_autopm_put_interface(desc->intf); out: + printk("%s : done %px\n", __func__, desc); mutex_unlock(&wdm_mutex); return rv; } @@ -685,6 +694,7 @@ static int wdm_release(struct inode *inode, struct file *file) struct wdm_device *desc = file->private_data; mutex_lock(&wdm_mutex); + printk("%s : start %px\n", __func__, desc); /* using write lock to protect desc->count */ mutex_lock(&desc->wlock); @@ -694,6 +704,7 @@ static int wdm_release(struct inode *inode, struct file *file) if (!desc->count) { if (!test_bit(WDM_DISCONNECTING, &desc->flags)) { dev_dbg(&desc->intf->dev, "wdm_release: cleanup\n"); + printk("%s : kill_urbs() %px\n", __func__, desc); kill_urbs(desc); spin_lock_irq(&desc->iuspin); desc->resp_count = 0; @@ -705,6 +716,7 @@ static int wdm_release(struct inode *inode, struct file *file) cleanup(desc); } } + printk("%s : done %px\n", __func__, desc); mutex_unlock(&wdm_mutex); return 0; } @@ -832,6 +844,7 @@ static int wdm_create(struct usb_interface *intf, struct usb_endpoint_descriptor desc->command = usb_alloc_urb(0, GFP_KERNEL); if (!desc->command) goto err; + desc->command->trace_me = true; desc->ubuf = kmalloc(desc->wMaxCommand, GFP_KERNEL); if (!desc->ubuf) @@ -979,15 +992,18 @@ static void wdm_disconnect(struct usb_interface *intf) usb_deregister_dev(intf, &wdm_class); desc = wdm_find_device(intf); mutex_lock(&wdm_mutex); + printk("%s : start %px\n", __func__, desc); /* the spinlock makes sure no new urbs are generated in the callbacks */ spin_lock_irqsave(&desc->iuspin, flags); + printk("%s : set WDM_DISCONNECTING %px\n", __func__, desc); set_bit(WDM_DISCONNECTING, &desc->flags); set_bit(WDM_READ, &desc->flags); spin_unlock_irqrestore(&desc->iuspin, flags); wake_up_all(&desc->wait); mutex_lock(&desc->rlock); mutex_lock(&desc->wlock); + printk("%s : kill_urbs() %px\n", __func__, desc); kill_urbs(desc); cancel_work_sync(&desc->rxwork); cancel_work_sync(&desc->service_outs_intr); @@ -1003,6 +1019,7 @@ static void wdm_disconnect(struct usb_interface *intf) cleanup(desc); else dev_dbg(&intf->dev, "%d open files - postponing cleanup\n", desc->count); + printk("%s : done %px\n", __func__, desc); mutex_unlock(&wdm_mutex); } @@ -1031,6 +1048,7 @@ static int wdm_suspend(struct usb_interface *intf, pm_message_t message) set_bit(WDM_SUSPENDING, &desc->flags); spin_unlock_irq(&desc->iuspin); /* callback submits work - order is essential */ + printk("%s : kill_urbs() %px\n", __func__, desc); kill_urbs(desc); cancel_work_sync(&desc->rxwork); cancel_work_sync(&desc->service_outs_intr); @@ -1082,15 +1100,19 @@ static int wdm_pre_reset(struct usb_interface *intf) * must be done before recovery lest a spontaneous * message from the device is lost */ + printk("%s : start\n", __func__); spin_lock_irq(&desc->iuspin); + printk("%s : set WDM_RESETTING %px\n", __func__, desc); set_bit(WDM_RESETTING, &desc->flags); /* inform read/write */ set_bit(WDM_READ, &desc->flags); /* unblock read */ + printk("%s : clear WDM_IN_USE %px\n", __func__, desc); clear_bit(WDM_IN_USE, &desc->flags); /* unblock write */ desc->rerr = -EINTR; spin_unlock_irq(&desc->iuspin); wake_up_all(&desc->wait); mutex_lock(&desc->rlock); mutex_lock(&desc->wlock); + printk("%s : kill_urbs() %px\n", __func__, desc); kill_urbs(desc); cancel_work_sync(&desc->rxwork); cancel_work_sync(&desc->service_outs_intr); @@ -1103,10 +1125,12 @@ static int wdm_post_reset(struct usb_interface *intf) int rv; clear_bit(WDM_OVERFLOW, &desc->flags); + printk("%s : clear WDM_RESETTING %px\n", __func__, desc); clear_bit(WDM_RESETTING, &desc->flags); rv = recover_from_urb_loss(desc); mutex_unlock(&desc->wlock); mutex_unlock(&desc->rlock); + printk("%s : done\n", __func__); return rv; } diff --git a/drivers/usb/core/hcd.c b/drivers/usb/core/hcd.c index aa45840d8273..ab3d3b65a6cf 100644 --- a/drivers/usb/core/hcd.c +++ b/drivers/usb/core/hcd.c @@ -1521,6 +1521,8 @@ int usb_hcd_submit_urb (struct urb *urb, gfp_t mem_flags) int status; struct usb_hcd *hcd = bus_to_hcd(urb->dev->bus); + if (urb->trace_me) + printk("%s : start %px\n", __func__, urb); /* increment urb's reference count as part of giving it to the HCD * (which will control it). HCD guarantees that it either returns * an error or calls giveback(), but not both. @@ -1550,6 +1552,8 @@ int usb_hcd_submit_urb (struct urb *urb, gfp_t mem_flags) } if (unlikely(status)) { + if (urb->trace_me) + printk("%s : error %px %d\n", __func__, urb, status); usbmon_urb_submit_error(&hcd->self, urb, status); urb->hcpriv = NULL; INIT_LIST_HEAD(&urb->urb_list); @@ -1559,6 +1563,8 @@ int usb_hcd_submit_urb (struct urb *urb, gfp_t mem_flags) wake_up(&usb_kill_urb_queue); usb_put_urb(urb); } + if (urb->trace_me) + printk("%s : end %px %d\n", __func__, urb, status); return status; } @@ -1582,6 +1588,8 @@ static int unlink1(struct usb_hcd *hcd, struct urb *urb, int status) * Such failures should be harmless. */ value = hcd->driver->urb_dequeue(hcd, urb, status); } + if (urb->trace_me) + printk("%s : %px %d\n", __func__, urb, value); return value; } @@ -1645,6 +1653,8 @@ static void __usb_hcd_giveback_urb(struct urb *urb) /* pass ownership to the completion handler */ urb->status = status; + if (urb->trace_me) + printk("%s : complete %px %d\n", __func__, urb, status); urb->complete(urb); usb_anchor_resume_wakeups(anchor); diff --git a/drivers/usb/gadget/legacy/raw_gadget.c b/drivers/usb/gadget/legacy/raw_gadget.c index e01e366d89cd..9adcc978804e 100644 --- a/drivers/usb/gadget/legacy/raw_gadget.c +++ b/drivers/usb/gadget/legacy/raw_gadget.c @@ -33,6 +33,10 @@ MODULE_DESCRIPTION(DRIVER_DESC); MODULE_AUTHOR("Andrey Konovalov"); MODULE_LICENSE("GPL"); +#undef dev_dbg +#define dev_dbg(dev, fmt, ...) \ + dev_printk(KERN_INFO, dev, dev_fmt(fmt), ##__VA_ARGS__) + /*----------------------------------------------------------------------*/ #define RAW_EVENT_QUEUE_SIZE 16 diff --git a/include/linux/usb.h b/include/linux/usb.h index 9f3c721c70dc..0843e07f6cc2 100644 --- a/include/linux/usb.h +++ b/include/linux/usb.h @@ -1589,6 +1589,7 @@ struct urb { int error_count; /* (return) number of ISO errors */ void *context; /* (in) context for completion */ usb_complete_t complete; /* (in) completion routine */ + bool trace_me; struct usb_iso_packet_descriptor iso_frame_desc[]; /* (in) ISO ONLY */ }; ---------------------------------------- ---------------------------------------- [ 25.056066][ T12] usb 5-1: New USB device found, idVendor=2400, idProduct=4200, bcdDevice=42.ef [ 25.056724][ T1175] udc-core: couldn't find an available UDC or it's busy [ 25.056730][ T1175] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 [ 25.057711][ T1176] udc-core: couldn't find an available UDC or it's busy [ 25.057719][ T1176] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 [ 25.058090][ T1171] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 [ 25.058104][ T1178] udc-core: couldn't find an available UDC or it's busy [ 25.058233][ T1177] udc-core: couldn't find an available UDC or it's busy [ 25.058238][ T1177] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 [ 25.058685][ T1179] udc-core: couldn't find an available UDC or it's busy [ 25.058692][ T1179] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 [ 25.060237][ T12] usb 5-1: New USB device strings: Mfr=0, Product=0, SerialNumber=2 [ 25.060250][ T1180] udc-core: couldn't find an available UDC or it's busy [ 25.060255][ T1180] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 [ 25.060737][ T1181] udc-core: couldn't find an available UDC or it's busy [ 25.060741][ T1181] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 [ 25.060836][ T1182] udc-core: couldn't find an available UDC or it's busy [ 25.060840][ T1182] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 [ 25.061235][ T1183] udc-core: couldn't find an available UDC or it's busy [ 25.061241][ T1183] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 [ 25.062374][ T1178] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 [ 25.064298][ T12] usb 5-1: SerialNumber: syz [ 25.065151][ T1185] udc-core: couldn't find an available UDC or it's busy [ 25.065161][ T1184] udc-core: couldn't find an available UDC or it's busy [ 25.065168][ T1184] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 [ 25.065191][ T1186] udc-core: couldn't find an available UDC or it's busy [ 25.065195][ T1186] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 [ 25.065451][ T1187] udc-core: couldn't find an available UDC or it's busy [ 25.065457][ T1187] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 [ 25.068979][ T1189] udc-core: couldn't find an available UDC or it's busy [ 25.070166][ T1185] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 [ 25.072145][ T1191] udc-core: couldn't find an available UDC or it's busy [ 25.072151][ T1191] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 [ 25.072166][ T1190] udc-core: couldn't find an available UDC or it's busy [ 25.072174][ T1190] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 [ 25.072254][ T1189] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 [ 25.073991][ T1193] udc-core: couldn't find an available UDC or it's busy [ 25.073997][ T1193] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 [ 25.074002][ T1192] udc-core: couldn't find an available UDC or it's busy [ 25.074006][ T1192] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 [ 25.087896][ T1196] udc-core: couldn't find an available UDC or it's busy [ 25.087933][ T1195] udc-core: couldn't find an available UDC or it's busy [ 25.087941][ T1195] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 [ 25.088229][ T1197] udc-core: couldn't find an available UDC or it's busy [ 25.088235][ T1197] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 [ 25.088239][ T1194] udc-core: couldn't find an available UDC or it's busy [ 25.088244][ T1194] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 [ 25.090788][ T1198] udc-core: couldn't find an available UDC or it's busy [ 25.092032][ T1196] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 [ 25.093018][ T1199] udc-core: couldn't find an available UDC or it's busy [ 25.093023][ T1200] udc-core: couldn't find an available UDC or it's busy [ 25.093027][ T1200] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 [ 25.093886][ T1198] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 [ 25.094586][ T1201] udc-core: couldn't find an available UDC or it's busy [ 25.095028][ T1202] udc-core: couldn't find an available UDC or it's busy [ 25.095032][ T1202] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 [ 25.095403][ T1204] udc-core: couldn't find an available UDC or it's busy [ 25.095409][ T1204] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 [ 25.095411][ T1203] udc-core: couldn't find an available UDC or it's busy [ 25.095416][ T1203] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 [ 25.095896][ T1199] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 [ 25.099620][ T1205] udc-core: couldn't find an available UDC or it's busy [ 25.100765][ T1201] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 [ 25.102512][ T1205] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 [ 25.104434][ T1207] udc-core: couldn't find an available UDC or it's busy [ 25.104439][ T1207] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 [ 25.104483][ T1206] udc-core: couldn't find an available UDC or it's busy [ 25.104489][ T1206] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 [ 25.104516][ T1208] udc-core: couldn't find an available UDC or it's busy [ 25.104521][ T1208] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 [ 25.122543][ T12] hub 5-1:118.0: bad descriptor, ignoring hub [ 25.122597][ T1210] udc-core: couldn't find an available UDC or it's busy [ 25.122604][ T1210] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 [ 25.122976][ T1211] udc-core: couldn't find an available UDC or it's busy [ 25.122983][ T1211] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 [ 25.123591][ T1212] udc-core: couldn't find an available UDC or it's busy [ 25.123596][ T1212] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 [ 25.123654][ T1209] udc-core: couldn't find an available UDC or it's busy [ 25.123659][ T1209] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 [ 25.124649][ T1213] udc-core: couldn't find an available UDC or it's busy [ 25.124809][ T1214] udc-core: couldn't find an available UDC or it's busy [ 25.124814][ T1214] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 [ 25.125168][ T1215] udc-core: couldn't find an available UDC or it's busy [ 25.125174][ T1215] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 [ 25.126547][ T12] hub: probe of 5-1:118.0 failed with error -5 [ 25.128767][ T1213] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 [ 25.129656][ T1216] udc-core: couldn't find an available UDC or it's busy [ 25.129662][ T1216] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 [ 25.129678][ T1217] udc-core: couldn't find an available UDC or it's busy [ 25.129684][ T1217] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 [ 25.129968][ T1218] udc-core: couldn't find an available UDC or it's busy [ 25.129973][ T1218] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 [ 25.132093][ T12] cdc_wdm 5-1:118.0: cdc-wdm0: USB WDM device [ 25.132549][ T1219] udc-core: couldn't find an available UDC or it's busy [ 25.132554][ T1219] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 [ 25.133379][ T1222] udc-core: couldn't find an available UDC or it's busy [ 25.133385][ T1222] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 [ 25.133552][ T1223] udc-core: couldn't find an available UDC or it's busy [ 25.133708][ T1221] udc-core: couldn't find an available UDC or it's busy [ 25.133713][ T1221] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 [ 25.135592][ T1224] udc-core: couldn't find an available UDC or it's busy [ 25.135598][ T1224] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 [ 25.136158][ T12] wdm_suspend : kill_urbs() ffff88822dffd000 [ 25.136578][ T1225] udc-core: couldn't find an available UDC or it's busy [ 25.136583][ T1225] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 [ 25.136638][ T1226] udc-core: couldn't find an available UDC or it's busy [ 25.136642][ T1226] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 [ 25.136928][ T1227] udc-core: couldn't find an available UDC or it's busy [ 25.136933][ T1227] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 [ 25.138275][ T1228] udc-core: couldn't find an available UDC or it's busy [ 25.138281][ T1228] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 [ 25.138293][ T1223] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 [ 25.138821][ T1229] udc-core: couldn't find an available UDC or it's busy [ 25.138948][ T1230] udc-core: couldn't find an available UDC or it's busy [ 25.138952][ T1230] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 [ 25.139091][ T1231] udc-core: couldn't find an available UDC or it's busy [ 25.139096][ T1231] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 [ 25.141700][ T1232] udc-core: couldn't find an available UDC or it's busy [ 25.142014][ T1233] udc-core: couldn't find an available UDC or it's busy [ 25.142020][ T1233] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 [ 25.142097][ T1233] wdm_open : start ffff88822dffd000 [ 25.142251][ T1234] udc-core: couldn't find an available UDC or it's busy [ 25.142258][ T1234] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 [ 25.142285][ T1235] udc-core: couldn't find an available UDC or it's busy [ 25.142289][ T1235] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 [ 25.142372][ T1229] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 [ 25.144150][ T1232] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 [ 25.319563][ T1233] wdm_open : done ffff88822dffd000 [ 25.732316][ T1235] wdm_open : start ffff88822dffd000 [ 25.732389][ T1233] wdm_write : set WDM_IN_USE ffff88822dffd000 ffff888234392f00 [ 25.736342][ T1233] usb_hcd_submit_urb : start ffff888234392f00 [ 25.738112][ T1233] usb_hcd_submit_urb : end ffff888234392f00 0 [ 25.739989][ T1233] wdm_write : waiting for wdm_out_callback ffff88822dffd000 [ 25.742837][ T1233] wdm_flush : start wait_event ffff88822dffd000 [ 25.742948][ T1235] wdm_open : done ffff88822dffd000 [ 25.747707][ T1234] wdm_open : start ffff88822dffd000 [ 30.141494][ T1234] wdm_open : done ffff88822dffd000 [ 30.141711][ T1235] wdm_flush : start wait_event ffff88822dffd000 [ 30.143693][ T1232] wdm_open : start ffff88822dffd000 [ 30.143916][ T1234] wdm_flush : start wait_event ffff88822dffd000 [ 30.151409][ T1232] wdm_open : done ffff88822dffd000 [ 30.153610][ T841] wdm_open : start ffff88822dffd000 [ 30.153788][ T1232] wdm_flush : start wait_event ffff88822dffd000 [ 30.155816][ T841] wdm_open : done ffff88822dffd000 [ 30.159953][ T1229] wdm_open : start ffff88822dffd000 [ 30.160146][ T841] wdm_flush : start wait_event ffff88822dffd000 [ 30.162005][ T1229] wdm_open : done ffff88822dffd000 [ 30.165691][ T1229] wdm_flush : start wait_event ffff88822dffd000 [ 63.298753][ C3] random: crng init done [ 63.299926][ C3] random: 7 urandom warning(s) missed due to ratelimiting [ 168.509958][ T52] INFO: task a.out:1233 blocked for more than 141 seconds. [ 168.516579][ T52] Not tainted 5.7.0-rc5+ #13 [ 168.520995][ T52] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 168.528400][ T52] a.out D13024 1233 838 0x00000004 [ 168.532156][ T52] Call Trace: [ 168.533970][ T52] __schedule+0x363/0x840 [ 168.535991][ T52] ? wdm_suspend+0x160/0x160 [ 168.538199][ T52] schedule+0x5a/0xd0 [ 168.539782][ T52] wdm_flush+0x124/0x190 [ 168.541542][ T52] ? remove_wait_queue+0x60/0x60 [ 168.543503][ T52] filp_close+0x3d/0xa0 [ 168.544874][ T52] __close_fd+0x8f/0x110 [ 168.546267][ T52] __x64_sys_close+0x2a/0x60 [ 168.547843][ T52] do_syscall_64+0x58/0x270 [ 168.549214][ T52] entry_SYSCALL_64_after_hwframe+0x49/0xb3 [ 168.550889][ T52] RIP: 0033:0x7f54551f4f30 [ 168.552150][ T52] Code: Bad RIP value. [ 168.553323][ T52] RSP: 002b:00007ffe39c17878 EFLAGS: 00000246 ORIG_RAX: 0000000000000003 [ 168.555439][ T52] RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 00007f54551f4f30 [ 168.558429][ T52] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000004 [ 168.561016][ T52] RBP: 0000000000000000 R08: 000000000000000f R09: 00007ffe39c178ae [ 168.563155][ T52] R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff [ 168.565136][ T52] R13: 0000000000005fb3 R14: 0000000000000000 R15: 0000000000000000 [ 168.567108][ T52] [ 168.567108][ T52] Showing all locks held in the system: [ 168.569123][ T52] 1 lock held by khungtaskd/52: [ 168.570400][ T52] #0: ffffffff83a6baa0 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0xe/0x1b0 [ 168.573387][ T52] 1 lock held by in:imklog/465: [ 168.574597][ T52] #0: ffff88822d28fcf0 (&f->f_pos_lock){+.+.}-{3:3}, at: __fdget_pos+0x55/0x60 [ 168.576825][ T52] [ 168.577408][ T52] ============================================= [ 168.577408][ T52] [ 168.579559][ T52] NMI backtrace for cpu 1 [ 168.580819][ T52] CPU: 1 PID: 52 Comm: khungtaskd Not tainted 5.7.0-rc5+ #13 [ 168.582916][ T52] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/29/2019 [ 168.585746][ T52] Call Trace: [ 168.586565][ T52] dump_stack+0x97/0xde [ 168.588235][ T52] ? irq_force_complete_move+0x61/0x80 [ 168.590020][ T52] nmi_cpu_backtrace+0xe2/0xf0 [ 168.591382][ T52] ? lapic_can_unplug_cpu+0x90/0x90 [ 168.592679][ T52] nmi_trigger_cpumask_backtrace+0x124/0x170 [ 168.594338][ T52] watchdog+0x4e4/0x950 [ 168.595543][ T52] ? hungtask_pm_notify+0x70/0x70 [ 168.596790][ T52] kthread+0x128/0x170 [ 168.597801][ T52] ? kthread_create_worker_on_cpu+0x90/0x90 [ 168.599257][ T52] ret_from_fork+0x24/0x30 [ 168.600448][ T52] Sending NMI from CPU 1 to CPUs 0,2-7: [ 168.602261][ C2] NMI backtrace for cpu 2 skipped: idling at default_idle+0x25/0x180 [ 168.602268][ C4] NMI backtrace for cpu 4 skipped: idling at default_idle+0x25/0x180 [ 168.602272][ C0] NMI backtrace for cpu 0 skipped: idling at default_idle+0x25/0x180 [ 168.602277][ C3] NMI backtrace for cpu 3 skipped: idling at default_idle+0x25/0x180 [ 168.602294][ C7] NMI backtrace for cpu 7 skipped: idling at default_idle+0x25/0x180 [ 168.602299][ C6] NMI backtrace for cpu 6 skipped: idling at default_idle+0x25/0x180 [ 168.602308][ C5] NMI backtrace for cpu 5 skipped: idling at default_idle+0x25/0x180 [ 168.603243][ T52] Kernel panic - not syncing: hung_task: blocked tasks [ 168.623795][ T52] CPU: 1 PID: 52 Comm: khungtaskd Not tainted 5.7.0-rc5+ #13 [ 168.625777][ T52] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/29/2019 [ 168.629089][ T52] Call Trace: [ 168.630281][ T52] dump_stack+0x97/0xde [ 168.631495][ T52] panic+0x135/0x3d5 [ 168.632577][ T52] ? watchdog+0x4e4/0x950 [ 168.633730][ T52] watchdog+0x4f0/0x950 [ 168.634829][ T52] ? hungtask_pm_notify+0x70/0x70 [ 168.636160][ T52] kthread+0x128/0x170 [ 168.637238][ T52] ? kthread_create_worker_on_cpu+0x90/0x90 [ 168.638792][ T52] ret_from_fork+0x24/0x30 [ 168.640142][ T52] Kernel Offset: disabled [ 168.663256][ T52] Rebooting in 86400 seconds.. ----------------------------------------
On Wed, May 27, 2020 at 6:47 AM Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> wrote: > > On 2020/05/25 22:32, Tetsuo Handa wrote: > > On 2020/05/25 21:06, Oliver Neukum wrote: > >> Am Freitag, den 22.05.2020, 17:26 +0900 schrieb Tetsuo Handa: > >>> On 2020/05/22 17:04, Oliver Neukum wrote: > >>>> May I ask you to redo the patch with comments added stating > >>>> that the wake up is done for the sake of wdm_flush(), change > >>>> the description and add the link to syzkaller? > >>> > >>> You can take over this patch. syzbot tried this patch on 2020/02/11 01:23 at > >>> https://syzkaller.appspot.com/bug?id=e7b761593b23eb50855b9ea31e3be5472b711186 , > >>> but this patch did not solve the problem syzbot has found. Thus, I don't add > >>> a link to syzkaller... > >> > >> Hi, > >> > >> this is odd. I looked at the driver and it looks to me like > >> pre/post_reset() is the only other place that touches WDM_IN_USE. > >> And it does so correctly. Any idea what could be wrong? > > > > I haven't understood the root cause of this problem. You were in the thread at > > https://groups.google.com/forum/#!msg/syzkaller-bugs/q3QMjt_t83c/GqP58GaTAQAJ . > > My patch was made from that thread (which simply makes sure to call wake_up_all() > > when clearing WDM_IN_USE bit) apart from the root cause of this problem. > > > > I might have found what is wrong. > > My understanding is that a process using /dev/raw-gadget is responsible for > reacting to every USB request. I don't know whether /dev/raw-gadget already > provides callback for aborting the in-flight USB requests (in order to resume > wdm_flush()) when /dev/raw-gadget is closed (due to explicit close() syscall or > implicit exit_files() from do_exit() upon SIGKILL). I assume /dev/raw-gadget > already provides such callback in the following paragraphs. raw-gadget should kill all unfishished USB requests when the file is closed. > > Since the reproducer is opening both /dev/raw-gadget (which is char-10-62) and > /dev/cdc-wdm0 (which is char-180-0), it seems that the kernel is falling into > deadlock condition due to the need to close both files when the reproducer is > killed. My guess is that since that process is stuck at wdm_flush() (due to > explicit close() syscall or implicit exit_files() from do_exit() upon SIGKILL), > that process cannot react to USB requests which are needed for resuming wdm_flush(). > Unexpectedly blocking a process which is responsible for reacting to USB requests > will look as if it is a broken hardware. Hm, so wdm_flush() is unable to finish unless an expected USB request is received from the device? This is a bug in the wdm driver then. Should we use wait_event_interruptible() instead of wait_event() in wdm_flush()? > > A debug printk() patch (shown below) says that the desc->command urb issued by > wdm_write() never gets wdm_out_callback(). I guess that it is an evidence of > such dependency exists, and I think that a process had better avoid opening both > /dev/raw-gadget and /dev/cdc-wdm0 at the same time. > > ---------------------------------------- > diff --git a/arch/x86/kernel/unwind_orc.c b/arch/x86/kernel/unwind_orc.c > index 5b0bd8581fe6..e352b54dbbf2 100644 > --- a/arch/x86/kernel/unwind_orc.c > +++ b/arch/x86/kernel/unwind_orc.c > @@ -320,12 +320,19 @@ EXPORT_SYMBOL_GPL(unwind_get_return_address); > > unsigned long *unwind_get_return_address_ptr(struct unwind_state *state) > { > + struct task_struct *task = state->task; > + > if (unwind_done(state)) > return NULL; > > if (state->regs) > return &state->regs->ip; > > + if (task != current && state->sp == task->thread.sp) { > + struct inactive_task_frame *frame = (void *)task->thread.sp; > + return &frame->ret_addr; > + } > + > if (state->sp) > return (unsigned long *)state->sp - 1; > > diff --git a/drivers/usb/class/cdc-wdm.c b/drivers/usb/class/cdc-wdm.c > index e3db6fbeadef..5d28a1a2521d 100644 > --- a/drivers/usb/class/cdc-wdm.c > +++ b/drivers/usb/class/cdc-wdm.c > @@ -150,8 +150,9 @@ static void wdm_out_callback(struct urb *urb) > spin_unlock_irqrestore(&desc->iuspin, flags); > kfree(desc->outbuf); > desc->outbuf = NULL; > + printk("%s : clear WDM_IN_USE %px\n", __func__, desc); > clear_bit(WDM_IN_USE, &desc->flags); > - wake_up(&desc->wait); > + wake_up_all(&desc->wait); > } > > static void wdm_in_callback(struct urb *urb) > @@ -417,17 +418,21 @@ static ssize_t wdm_write > req->wValue = 0; > req->wIndex = desc->inum; /* already converted */ > req->wLength = cpu_to_le16(count); > + printk("%s : set WDM_IN_USE %px %px\n", __func__, desc, desc->command); > set_bit(WDM_IN_USE, &desc->flags); > desc->outbuf = buf; > > rv = usb_submit_urb(desc->command, GFP_KERNEL); > if (rv < 0) { > desc->outbuf = NULL; > + printk("%s : clear WDM_IN_USE %px\n", __func__, desc); > clear_bit(WDM_IN_USE, &desc->flags); > dev_err(&desc->intf->dev, "Tx URB error: %d\n", rv); > rv = usb_translate_errors(rv); > + wake_up_all(&desc->wait); > goto out_free_mem_pm; > } else { > + printk("%s : waiting for wdm_out_callback %px\n", __func__, desc); > dev_dbg(&desc->intf->dev, "Tx URB has been submitted index=%d\n", > le16_to_cpu(req->wIndex)); > } > @@ -587,6 +592,7 @@ static int wdm_flush(struct file *file, fl_owner_t id) > { > struct wdm_device *desc = file->private_data; > > + printk("%s : start wait_event %px\n", __func__, desc); > wait_event(desc->wait, > /* > * needs both flags. We cannot do with one > @@ -596,6 +602,7 @@ static int wdm_flush(struct file *file, fl_owner_t id) > */ > !test_bit(WDM_IN_USE, &desc->flags) || > test_bit(WDM_DISCONNECTING, &desc->flags)); > + printk("%s : end wait_event %px\n", __func__, desc); > > /* cannot dereference desc->intf if WDM_DISCONNECTING */ > if (test_bit(WDM_DISCONNECTING, &desc->flags)) > @@ -642,6 +649,7 @@ static int wdm_open(struct inode *inode, struct file *file) > > mutex_lock(&wdm_mutex); > desc = wdm_find_device_by_minor(minor); > + printk("%s : start %px\n", __func__, desc); > if (!desc) > goto out; > > @@ -676,6 +684,7 @@ static int wdm_open(struct inode *inode, struct file *file) > desc->manage_power(intf, 1); > usb_autopm_put_interface(desc->intf); > out: > + printk("%s : done %px\n", __func__, desc); > mutex_unlock(&wdm_mutex); > return rv; > } > @@ -685,6 +694,7 @@ static int wdm_release(struct inode *inode, struct file *file) > struct wdm_device *desc = file->private_data; > > mutex_lock(&wdm_mutex); > + printk("%s : start %px\n", __func__, desc); > > /* using write lock to protect desc->count */ > mutex_lock(&desc->wlock); > @@ -694,6 +704,7 @@ static int wdm_release(struct inode *inode, struct file *file) > if (!desc->count) { > if (!test_bit(WDM_DISCONNECTING, &desc->flags)) { > dev_dbg(&desc->intf->dev, "wdm_release: cleanup\n"); > + printk("%s : kill_urbs() %px\n", __func__, desc); > kill_urbs(desc); > spin_lock_irq(&desc->iuspin); > desc->resp_count = 0; > @@ -705,6 +716,7 @@ static int wdm_release(struct inode *inode, struct file *file) > cleanup(desc); > } > } > + printk("%s : done %px\n", __func__, desc); > mutex_unlock(&wdm_mutex); > return 0; > } > @@ -832,6 +844,7 @@ static int wdm_create(struct usb_interface *intf, struct usb_endpoint_descriptor > desc->command = usb_alloc_urb(0, GFP_KERNEL); > if (!desc->command) > goto err; > + desc->command->trace_me = true; > > desc->ubuf = kmalloc(desc->wMaxCommand, GFP_KERNEL); > if (!desc->ubuf) > @@ -979,15 +992,18 @@ static void wdm_disconnect(struct usb_interface *intf) > usb_deregister_dev(intf, &wdm_class); > desc = wdm_find_device(intf); > mutex_lock(&wdm_mutex); > + printk("%s : start %px\n", __func__, desc); > > /* the spinlock makes sure no new urbs are generated in the callbacks */ > spin_lock_irqsave(&desc->iuspin, flags); > + printk("%s : set WDM_DISCONNECTING %px\n", __func__, desc); > set_bit(WDM_DISCONNECTING, &desc->flags); > set_bit(WDM_READ, &desc->flags); > spin_unlock_irqrestore(&desc->iuspin, flags); > wake_up_all(&desc->wait); > mutex_lock(&desc->rlock); > mutex_lock(&desc->wlock); > + printk("%s : kill_urbs() %px\n", __func__, desc); > kill_urbs(desc); > cancel_work_sync(&desc->rxwork); > cancel_work_sync(&desc->service_outs_intr); > @@ -1003,6 +1019,7 @@ static void wdm_disconnect(struct usb_interface *intf) > cleanup(desc); > else > dev_dbg(&intf->dev, "%d open files - postponing cleanup\n", desc->count); > + printk("%s : done %px\n", __func__, desc); > mutex_unlock(&wdm_mutex); > } > > @@ -1031,6 +1048,7 @@ static int wdm_suspend(struct usb_interface *intf, pm_message_t message) > set_bit(WDM_SUSPENDING, &desc->flags); > spin_unlock_irq(&desc->iuspin); > /* callback submits work - order is essential */ > + printk("%s : kill_urbs() %px\n", __func__, desc); > kill_urbs(desc); > cancel_work_sync(&desc->rxwork); > cancel_work_sync(&desc->service_outs_intr); > @@ -1082,15 +1100,19 @@ static int wdm_pre_reset(struct usb_interface *intf) > * must be done before recovery lest a spontaneous > * message from the device is lost > */ > + printk("%s : start\n", __func__); > spin_lock_irq(&desc->iuspin); > + printk("%s : set WDM_RESETTING %px\n", __func__, desc); > set_bit(WDM_RESETTING, &desc->flags); /* inform read/write */ > set_bit(WDM_READ, &desc->flags); /* unblock read */ > + printk("%s : clear WDM_IN_USE %px\n", __func__, desc); > clear_bit(WDM_IN_USE, &desc->flags); /* unblock write */ > desc->rerr = -EINTR; > spin_unlock_irq(&desc->iuspin); > wake_up_all(&desc->wait); > mutex_lock(&desc->rlock); > mutex_lock(&desc->wlock); > + printk("%s : kill_urbs() %px\n", __func__, desc); > kill_urbs(desc); > cancel_work_sync(&desc->rxwork); > cancel_work_sync(&desc->service_outs_intr); > @@ -1103,10 +1125,12 @@ static int wdm_post_reset(struct usb_interface *intf) > int rv; > > clear_bit(WDM_OVERFLOW, &desc->flags); > + printk("%s : clear WDM_RESETTING %px\n", __func__, desc); > clear_bit(WDM_RESETTING, &desc->flags); > rv = recover_from_urb_loss(desc); > mutex_unlock(&desc->wlock); > mutex_unlock(&desc->rlock); > + printk("%s : done\n", __func__); > return rv; > } > > diff --git a/drivers/usb/core/hcd.c b/drivers/usb/core/hcd.c > index aa45840d8273..ab3d3b65a6cf 100644 > --- a/drivers/usb/core/hcd.c > +++ b/drivers/usb/core/hcd.c > @@ -1521,6 +1521,8 @@ int usb_hcd_submit_urb (struct urb *urb, gfp_t mem_flags) > int status; > struct usb_hcd *hcd = bus_to_hcd(urb->dev->bus); > > + if (urb->trace_me) > + printk("%s : start %px\n", __func__, urb); > /* increment urb's reference count as part of giving it to the HCD > * (which will control it). HCD guarantees that it either returns > * an error or calls giveback(), but not both. > @@ -1550,6 +1552,8 @@ int usb_hcd_submit_urb (struct urb *urb, gfp_t mem_flags) > } > > if (unlikely(status)) { > + if (urb->trace_me) > + printk("%s : error %px %d\n", __func__, urb, status); > usbmon_urb_submit_error(&hcd->self, urb, status); > urb->hcpriv = NULL; > INIT_LIST_HEAD(&urb->urb_list); > @@ -1559,6 +1563,8 @@ int usb_hcd_submit_urb (struct urb *urb, gfp_t mem_flags) > wake_up(&usb_kill_urb_queue); > usb_put_urb(urb); > } > + if (urb->trace_me) > + printk("%s : end %px %d\n", __func__, urb, status); > return status; > } > > @@ -1582,6 +1588,8 @@ static int unlink1(struct usb_hcd *hcd, struct urb *urb, int status) > * Such failures should be harmless. */ > value = hcd->driver->urb_dequeue(hcd, urb, status); > } > + if (urb->trace_me) > + printk("%s : %px %d\n", __func__, urb, value); > return value; > } > > @@ -1645,6 +1653,8 @@ static void __usb_hcd_giveback_urb(struct urb *urb) > > /* pass ownership to the completion handler */ > urb->status = status; > + if (urb->trace_me) > + printk("%s : complete %px %d\n", __func__, urb, status); > urb->complete(urb); > > usb_anchor_resume_wakeups(anchor); > diff --git a/drivers/usb/gadget/legacy/raw_gadget.c b/drivers/usb/gadget/legacy/raw_gadget.c > index e01e366d89cd..9adcc978804e 100644 > --- a/drivers/usb/gadget/legacy/raw_gadget.c > +++ b/drivers/usb/gadget/legacy/raw_gadget.c > @@ -33,6 +33,10 @@ MODULE_DESCRIPTION(DRIVER_DESC); > MODULE_AUTHOR("Andrey Konovalov"); > MODULE_LICENSE("GPL"); > > +#undef dev_dbg > +#define dev_dbg(dev, fmt, ...) \ > + dev_printk(KERN_INFO, dev, dev_fmt(fmt), ##__VA_ARGS__) > + > /*----------------------------------------------------------------------*/ > > #define RAW_EVENT_QUEUE_SIZE 16 > diff --git a/include/linux/usb.h b/include/linux/usb.h > index 9f3c721c70dc..0843e07f6cc2 100644 > --- a/include/linux/usb.h > +++ b/include/linux/usb.h > @@ -1589,6 +1589,7 @@ struct urb { > int error_count; /* (return) number of ISO errors */ > void *context; /* (in) context for completion */ > usb_complete_t complete; /* (in) completion routine */ > + bool trace_me; > struct usb_iso_packet_descriptor iso_frame_desc[]; > /* (in) ISO ONLY */ > }; > ---------------------------------------- > > ---------------------------------------- > [ 25.056066][ T12] usb 5-1: New USB device found, idVendor=2400, idProduct=4200, bcdDevice=42.ef > [ 25.056724][ T1175] udc-core: couldn't find an available UDC or it's busy > [ 25.056730][ T1175] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 > [ 25.057711][ T1176] udc-core: couldn't find an available UDC or it's busy > [ 25.057719][ T1176] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 > [ 25.058090][ T1171] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 > [ 25.058104][ T1178] udc-core: couldn't find an available UDC or it's busy > [ 25.058233][ T1177] udc-core: couldn't find an available UDC or it's busy > [ 25.058238][ T1177] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 > [ 25.058685][ T1179] udc-core: couldn't find an available UDC or it's busy > [ 25.058692][ T1179] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 > [ 25.060237][ T12] usb 5-1: New USB device strings: Mfr=0, Product=0, SerialNumber=2 > [ 25.060250][ T1180] udc-core: couldn't find an available UDC or it's busy > [ 25.060255][ T1180] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 > [ 25.060737][ T1181] udc-core: couldn't find an available UDC or it's busy > [ 25.060741][ T1181] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 > [ 25.060836][ T1182] udc-core: couldn't find an available UDC or it's busy > [ 25.060840][ T1182] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 > [ 25.061235][ T1183] udc-core: couldn't find an available UDC or it's busy > [ 25.061241][ T1183] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 > [ 25.062374][ T1178] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 > [ 25.064298][ T12] usb 5-1: SerialNumber: syz > [ 25.065151][ T1185] udc-core: couldn't find an available UDC or it's busy > [ 25.065161][ T1184] udc-core: couldn't find an available UDC or it's busy > [ 25.065168][ T1184] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 > [ 25.065191][ T1186] udc-core: couldn't find an available UDC or it's busy > [ 25.065195][ T1186] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 > [ 25.065451][ T1187] udc-core: couldn't find an available UDC or it's busy > [ 25.065457][ T1187] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 > [ 25.068979][ T1189] udc-core: couldn't find an available UDC or it's busy > [ 25.070166][ T1185] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 > [ 25.072145][ T1191] udc-core: couldn't find an available UDC or it's busy > [ 25.072151][ T1191] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 > [ 25.072166][ T1190] udc-core: couldn't find an available UDC or it's busy > [ 25.072174][ T1190] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 > [ 25.072254][ T1189] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 > [ 25.073991][ T1193] udc-core: couldn't find an available UDC or it's busy > [ 25.073997][ T1193] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 > [ 25.074002][ T1192] udc-core: couldn't find an available UDC or it's busy > [ 25.074006][ T1192] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 > [ 25.087896][ T1196] udc-core: couldn't find an available UDC or it's busy > [ 25.087933][ T1195] udc-core: couldn't find an available UDC or it's busy > [ 25.087941][ T1195] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 > [ 25.088229][ T1197] udc-core: couldn't find an available UDC or it's busy > [ 25.088235][ T1197] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 > [ 25.088239][ T1194] udc-core: couldn't find an available UDC or it's busy > [ 25.088244][ T1194] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 > [ 25.090788][ T1198] udc-core: couldn't find an available UDC or it's busy > [ 25.092032][ T1196] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 > [ 25.093018][ T1199] udc-core: couldn't find an available UDC or it's busy > [ 25.093023][ T1200] udc-core: couldn't find an available UDC or it's busy > [ 25.093027][ T1200] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 > [ 25.093886][ T1198] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 > [ 25.094586][ T1201] udc-core: couldn't find an available UDC or it's busy > [ 25.095028][ T1202] udc-core: couldn't find an available UDC or it's busy > [ 25.095032][ T1202] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 > [ 25.095403][ T1204] udc-core: couldn't find an available UDC or it's busy > [ 25.095409][ T1204] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 > [ 25.095411][ T1203] udc-core: couldn't find an available UDC or it's busy > [ 25.095416][ T1203] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 > [ 25.095896][ T1199] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 > [ 25.099620][ T1205] udc-core: couldn't find an available UDC or it's busy > [ 25.100765][ T1201] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 > [ 25.102512][ T1205] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 > [ 25.104434][ T1207] udc-core: couldn't find an available UDC or it's busy > [ 25.104439][ T1207] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 > [ 25.104483][ T1206] udc-core: couldn't find an available UDC or it's busy > [ 25.104489][ T1206] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 > [ 25.104516][ T1208] udc-core: couldn't find an available UDC or it's busy > [ 25.104521][ T1208] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 > [ 25.122543][ T12] hub 5-1:118.0: bad descriptor, ignoring hub > [ 25.122597][ T1210] udc-core: couldn't find an available UDC or it's busy > [ 25.122604][ T1210] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 > [ 25.122976][ T1211] udc-core: couldn't find an available UDC or it's busy > [ 25.122983][ T1211] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 > [ 25.123591][ T1212] udc-core: couldn't find an available UDC or it's busy > [ 25.123596][ T1212] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 > [ 25.123654][ T1209] udc-core: couldn't find an available UDC or it's busy > [ 25.123659][ T1209] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 > [ 25.124649][ T1213] udc-core: couldn't find an available UDC or it's busy > [ 25.124809][ T1214] udc-core: couldn't find an available UDC or it's busy > [ 25.124814][ T1214] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 > [ 25.125168][ T1215] udc-core: couldn't find an available UDC or it's busy > [ 25.125174][ T1215] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 > [ 25.126547][ T12] hub: probe of 5-1:118.0 failed with error -5 > [ 25.128767][ T1213] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 > [ 25.129656][ T1216] udc-core: couldn't find an available UDC or it's busy > [ 25.129662][ T1216] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 > [ 25.129678][ T1217] udc-core: couldn't find an available UDC or it's busy > [ 25.129684][ T1217] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 > [ 25.129968][ T1218] udc-core: couldn't find an available UDC or it's busy > [ 25.129973][ T1218] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 > [ 25.132093][ T12] cdc_wdm 5-1:118.0: cdc-wdm0: USB WDM device > [ 25.132549][ T1219] udc-core: couldn't find an available UDC or it's busy > [ 25.132554][ T1219] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 > [ 25.133379][ T1222] udc-core: couldn't find an available UDC or it's busy > [ 25.133385][ T1222] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 > [ 25.133552][ T1223] udc-core: couldn't find an available UDC or it's busy > [ 25.133708][ T1221] udc-core: couldn't find an available UDC or it's busy > [ 25.133713][ T1221] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 > [ 25.135592][ T1224] udc-core: couldn't find an available UDC or it's busy > [ 25.135598][ T1224] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 > [ 25.136158][ T12] wdm_suspend : kill_urbs() ffff88822dffd000 > [ 25.136578][ T1225] udc-core: couldn't find an available UDC or it's busy > [ 25.136583][ T1225] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 > [ 25.136638][ T1226] udc-core: couldn't find an available UDC or it's busy > [ 25.136642][ T1226] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 > [ 25.136928][ T1227] udc-core: couldn't find an available UDC or it's busy > [ 25.136933][ T1227] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 > [ 25.138275][ T1228] udc-core: couldn't find an available UDC or it's busy > [ 25.138281][ T1228] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 > [ 25.138293][ T1223] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 > [ 25.138821][ T1229] udc-core: couldn't find an available UDC or it's busy > [ 25.138948][ T1230] udc-core: couldn't find an available UDC or it's busy > [ 25.138952][ T1230] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 > [ 25.139091][ T1231] udc-core: couldn't find an available UDC or it's busy > [ 25.139096][ T1231] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 > [ 25.141700][ T1232] udc-core: couldn't find an available UDC or it's busy > [ 25.142014][ T1233] udc-core: couldn't find an available UDC or it's busy > [ 25.142020][ T1233] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 > [ 25.142097][ T1233] wdm_open : start ffff88822dffd000 > [ 25.142251][ T1234] udc-core: couldn't find an available UDC or it's busy > [ 25.142258][ T1234] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 > [ 25.142285][ T1235] udc-core: couldn't find an available UDC or it's busy > [ 25.142289][ T1235] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 > [ 25.142372][ T1229] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 > [ 25.144150][ T1232] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 > [ 25.319563][ T1233] wdm_open : done ffff88822dffd000 > [ 25.732316][ T1235] wdm_open : start ffff88822dffd000 > [ 25.732389][ T1233] wdm_write : set WDM_IN_USE ffff88822dffd000 ffff888234392f00 > [ 25.736342][ T1233] usb_hcd_submit_urb : start ffff888234392f00 > [ 25.738112][ T1233] usb_hcd_submit_urb : end ffff888234392f00 0 > [ 25.739989][ T1233] wdm_write : waiting for wdm_out_callback ffff88822dffd000 > [ 25.742837][ T1233] wdm_flush : start wait_event ffff88822dffd000 > [ 25.742948][ T1235] wdm_open : done ffff88822dffd000 > [ 25.747707][ T1234] wdm_open : start ffff88822dffd000 > [ 30.141494][ T1234] wdm_open : done ffff88822dffd000 > [ 30.141711][ T1235] wdm_flush : start wait_event ffff88822dffd000 > [ 30.143693][ T1232] wdm_open : start ffff88822dffd000 > [ 30.143916][ T1234] wdm_flush : start wait_event ffff88822dffd000 > [ 30.151409][ T1232] wdm_open : done ffff88822dffd000 > [ 30.153610][ T841] wdm_open : start ffff88822dffd000 > [ 30.153788][ T1232] wdm_flush : start wait_event ffff88822dffd000 > [ 30.155816][ T841] wdm_open : done ffff88822dffd000 > [ 30.159953][ T1229] wdm_open : start ffff88822dffd000 > [ 30.160146][ T841] wdm_flush : start wait_event ffff88822dffd000 > [ 30.162005][ T1229] wdm_open : done ffff88822dffd000 > [ 30.165691][ T1229] wdm_flush : start wait_event ffff88822dffd000 > [ 63.298753][ C3] random: crng init done > [ 63.299926][ C3] random: 7 urandom warning(s) missed due to ratelimiting > [ 168.509958][ T52] INFO: task a.out:1233 blocked for more than 141 seconds. > [ 168.516579][ T52] Not tainted 5.7.0-rc5+ #13 > [ 168.520995][ T52] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. > [ 168.528400][ T52] a.out D13024 1233 838 0x00000004 > [ 168.532156][ T52] Call Trace: > [ 168.533970][ T52] __schedule+0x363/0x840 > [ 168.535991][ T52] ? wdm_suspend+0x160/0x160 > [ 168.538199][ T52] schedule+0x5a/0xd0 > [ 168.539782][ T52] wdm_flush+0x124/0x190 > [ 168.541542][ T52] ? remove_wait_queue+0x60/0x60 > [ 168.543503][ T52] filp_close+0x3d/0xa0 > [ 168.544874][ T52] __close_fd+0x8f/0x110 > [ 168.546267][ T52] __x64_sys_close+0x2a/0x60 > [ 168.547843][ T52] do_syscall_64+0x58/0x270 > [ 168.549214][ T52] entry_SYSCALL_64_after_hwframe+0x49/0xb3 > [ 168.550889][ T52] RIP: 0033:0x7f54551f4f30 > [ 168.552150][ T52] Code: Bad RIP value. > [ 168.553323][ T52] RSP: 002b:00007ffe39c17878 EFLAGS: 00000246 ORIG_RAX: 0000000000000003 > [ 168.555439][ T52] RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 00007f54551f4f30 > [ 168.558429][ T52] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000004 > [ 168.561016][ T52] RBP: 0000000000000000 R08: 000000000000000f R09: 00007ffe39c178ae > [ 168.563155][ T52] R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff > [ 168.565136][ T52] R13: 0000000000005fb3 R14: 0000000000000000 R15: 0000000000000000 > [ 168.567108][ T52] > [ 168.567108][ T52] Showing all locks held in the system: > [ 168.569123][ T52] 1 lock held by khungtaskd/52: > [ 168.570400][ T52] #0: ffffffff83a6baa0 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0xe/0x1b0 > [ 168.573387][ T52] 1 lock held by in:imklog/465: > [ 168.574597][ T52] #0: ffff88822d28fcf0 (&f->f_pos_lock){+.+.}-{3:3}, at: __fdget_pos+0x55/0x60 > [ 168.576825][ T52] > [ 168.577408][ T52] ============================================= > [ 168.577408][ T52] > [ 168.579559][ T52] NMI backtrace for cpu 1 > [ 168.580819][ T52] CPU: 1 PID: 52 Comm: khungtaskd Not tainted 5.7.0-rc5+ #13 > [ 168.582916][ T52] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/29/2019 > [ 168.585746][ T52] Call Trace: > [ 168.586565][ T52] dump_stack+0x97/0xde > [ 168.588235][ T52] ? irq_force_complete_move+0x61/0x80 > [ 168.590020][ T52] nmi_cpu_backtrace+0xe2/0xf0 > [ 168.591382][ T52] ? lapic_can_unplug_cpu+0x90/0x90 > [ 168.592679][ T52] nmi_trigger_cpumask_backtrace+0x124/0x170 > [ 168.594338][ T52] watchdog+0x4e4/0x950 > [ 168.595543][ T52] ? hungtask_pm_notify+0x70/0x70 > [ 168.596790][ T52] kthread+0x128/0x170 > [ 168.597801][ T52] ? kthread_create_worker_on_cpu+0x90/0x90 > [ 168.599257][ T52] ret_from_fork+0x24/0x30 > [ 168.600448][ T52] Sending NMI from CPU 1 to CPUs 0,2-7: > [ 168.602261][ C2] NMI backtrace for cpu 2 skipped: idling at default_idle+0x25/0x180 > [ 168.602268][ C4] NMI backtrace for cpu 4 skipped: idling at default_idle+0x25/0x180 > [ 168.602272][ C0] NMI backtrace for cpu 0 skipped: idling at default_idle+0x25/0x180 > [ 168.602277][ C3] NMI backtrace for cpu 3 skipped: idling at default_idle+0x25/0x180 > [ 168.602294][ C7] NMI backtrace for cpu 7 skipped: idling at default_idle+0x25/0x180 > [ 168.602299][ C6] NMI backtrace for cpu 6 skipped: idling at default_idle+0x25/0x180 > [ 168.602308][ C5] NMI backtrace for cpu 5 skipped: idling at default_idle+0x25/0x180 > [ 168.603243][ T52] Kernel panic - not syncing: hung_task: blocked tasks > [ 168.623795][ T52] CPU: 1 PID: 52 Comm: khungtaskd Not tainted 5.7.0-rc5+ #13 > [ 168.625777][ T52] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/29/2019 > [ 168.629089][ T52] Call Trace: > [ 168.630281][ T52] dump_stack+0x97/0xde > [ 168.631495][ T52] panic+0x135/0x3d5 > [ 168.632577][ T52] ? watchdog+0x4e4/0x950 > [ 168.633730][ T52] watchdog+0x4f0/0x950 > [ 168.634829][ T52] ? hungtask_pm_notify+0x70/0x70 > [ 168.636160][ T52] kthread+0x128/0x170 > [ 168.637238][ T52] ? kthread_create_worker_on_cpu+0x90/0x90 > [ 168.638792][ T52] ret_from_fork+0x24/0x30 > [ 168.640142][ T52] Kernel Offset: disabled > [ 168.663256][ T52] Rebooting in 86400 seconds.. > ----------------------------------------
On 2020/05/29 0:18, Andrey Konovalov wrote: >> I might have found what is wrong. >> >> My understanding is that a process using /dev/raw-gadget is responsible for >> reacting to every USB request. I don't know whether /dev/raw-gadget already >> provides callback for aborting the in-flight USB requests (in order to resume >> wdm_flush()) when /dev/raw-gadget is closed (due to explicit close() syscall or >> implicit exit_files() from do_exit() upon SIGKILL). I assume /dev/raw-gadget >> already provides such callback in the following paragraphs. > > raw-gadget should kill all unfishished USB requests when the file is closed. I see. But > >> >> Since the reproducer is opening both /dev/raw-gadget (which is char-10-62) and >> /dev/cdc-wdm0 (which is char-180-0), it seems that the kernel is falling into >> deadlock condition due to the need to close both files when the reproducer is >> killed. My guess is that since that process is stuck at wdm_flush() (due to >> explicit close() syscall or implicit exit_files() from do_exit() upon SIGKILL), >> that process cannot react to USB requests which are needed for resuming wdm_flush(). >> Unexpectedly blocking a process which is responsible for reacting to USB requests >> will look as if it is a broken hardware. > > Hm, so wdm_flush() is unable to finish unless an expected USB request > is received from the device? This is a bug in the wdm driver then. this specific bug report is caused by being unable to close /dev/cdc-wdm0 due to /dev/raw-gadget API usage bug in the userspace process. In other words, this bug report should be closed with "#syz invalid" like a bug report at https://syzkaller.appspot.com/bug?id=287aa8708bc940d0ca1645223c53dd4c2d203be6 which unexpectedly did ioctl(FIFREEZE) without corresponding ioctl(FITHAW). > Should we use wait_event_interruptible() instead of wait_event() in > wdm_flush()? That only shadows this kind of bug reports, by not using TASK_UNINTERRUPTIBLE. The problem that the userspace process which is responsible for closing /dev/raw-gadget gets stuck at wdm_flush() unless interrupted by a signal when closing /dev/cdc-wdm0 is remaining. I think that a process should not open /dev/raw-gadget and /dev/cdc-wdm0 at the same time.
On Thu, May 28, 2020 at 6:03 PM Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> wrote: > > On 2020/05/29 0:18, Andrey Konovalov wrote: > >> I might have found what is wrong. > >> > >> My understanding is that a process using /dev/raw-gadget is responsible for > >> reacting to every USB request. I don't know whether /dev/raw-gadget already > >> provides callback for aborting the in-flight USB requests (in order to resume > >> wdm_flush()) when /dev/raw-gadget is closed (due to explicit close() syscall or > >> implicit exit_files() from do_exit() upon SIGKILL). I assume /dev/raw-gadget > >> already provides such callback in the following paragraphs. > > > > raw-gadget should kill all unfishished USB requests when the file is closed. > > I see. But > > > > >> > >> Since the reproducer is opening both /dev/raw-gadget (which is char-10-62) and > >> /dev/cdc-wdm0 (which is char-180-0), it seems that the kernel is falling into > >> deadlock condition due to the need to close both files when the reproducer is > >> killed. My guess is that since that process is stuck at wdm_flush() (due to > >> explicit close() syscall or implicit exit_files() from do_exit() upon SIGKILL), > >> that process cannot react to USB requests which are needed for resuming wdm_flush(). > >> Unexpectedly blocking a process which is responsible for reacting to USB requests > >> will look as if it is a broken hardware. > > > > Hm, so wdm_flush() is unable to finish unless an expected USB request > > is received from the device? This is a bug in the wdm driver then. > > this specific bug report is caused by being unable to close /dev/cdc-wdm0 > due to /dev/raw-gadget API usage bug in the userspace process. In other words, > this bug report should be closed with "#syz invalid" like a bug report at > https://syzkaller.appspot.com/bug?id=287aa8708bc940d0ca1645223c53dd4c2d203be6 > which unexpectedly did ioctl(FIFREEZE) without corresponding ioctl(FITHAW). > > > Should we use wait_event_interruptible() instead of wait_event() in > > wdm_flush()? > > That only shadows this kind of bug reports, by not using TASK_UNINTERRUPTIBLE. > > The problem that the userspace process which is responsible for closing > /dev/raw-gadget gets stuck at wdm_flush() unless interrupted by a signal > when closing /dev/cdc-wdm0 is remaining. I think that a process should not > open /dev/raw-gadget and /dev/cdc-wdm0 at the same time. Ah, so the problem is that when a process exits, it tries to close wdm fd first, which ends up calling wdm_flush(), which can't finish because the USB requests are not terminated before raw-gadget fd is closed, which is supposed to happen after wdm fd is closed. Is this correct? I wonder what will happen if a real device stays connected and ignores wdm requests. I don't understand though, how using wait_event_interruptible() will shadow anything here. Alan, Greg, is this acceptable behavior for a USB driver?
On Thu, May 28, 2020 at 09:03:43PM +0200, Andrey Konovalov wrote: > Ah, so the problem is that when a process exits, it tries to close wdm > fd first, which ends up calling wdm_flush(), which can't finish > because the USB requests are not terminated before raw-gadget fd is > closed, which is supposed to happen after wdm fd is closed. Is this > correct? I wonder what will happen if a real device stays connected > and ignores wdm requests. > > I don't understand though, how using wait_event_interruptible() will > shadow anything here. > > Alan, Greg, is this acceptable behavior for a USB driver? I don't understand what the problem is. Can you explain in more general terms -- nothing specific to wdm or anything like that -- what you are concerned about? Is this something that could happen to any gadget driver? Or any USB class device driver? Or does it only affect usespace components of raw-gadget drivers? Alan Stern
On Thu, May 28, 2020 at 9:40 PM Alan Stern <stern@rowland.harvard.edu> wrote: > > On Thu, May 28, 2020 at 09:03:43PM +0200, Andrey Konovalov wrote: > > > Ah, so the problem is that when a process exits, it tries to close wdm > > fd first, which ends up calling wdm_flush(), which can't finish > > because the USB requests are not terminated before raw-gadget fd is > > closed, which is supposed to happen after wdm fd is closed. Is this > > correct? I wonder what will happen if a real device stays connected > > and ignores wdm requests. > > > > I don't understand though, how using wait_event_interruptible() will > > shadow anything here. > > > > Alan, Greg, is this acceptable behavior for a USB driver? > > I don't understand what the problem is. Can you explain in more general > terms -- nothing specific to wdm or anything like that -- what you are > concerned about? Is this something that could happen to any gadget > driver? Or any USB class device driver? Or does it only affect > usespace components of raw-gadget drivers? So, AFAIU, we have a driver whose flush() callback blocks on wait_event(), which can only terminate when either 1) the driver receives a particular USB response from the device or 2) the device disconnects. For 1) the emulated device doesn't provide required responses. For 2) the problem is that the emulated via raw-gadget device disconnects when the process is killed (and raw-gadget fd is closed). But that process is the same process that is currently stuck on wait_event() in the flush callback(), and therefore unkillable. This can generally happen with any driver that goes into uninterruptible sleep within one of its code paths reachable from userspace that can only be unblocked by a particular behavior from the USB device. But I haven't seen any such drivers so far, wdm is the first.
On Thu, May 28, 2020 at 09:51:35PM +0200, Andrey Konovalov wrote: > On Thu, May 28, 2020 at 9:40 PM Alan Stern <stern@rowland.harvard.edu> wrote: > > > > On Thu, May 28, 2020 at 09:03:43PM +0200, Andrey Konovalov wrote: > > > > > Ah, so the problem is that when a process exits, it tries to close wdm > > > fd first, which ends up calling wdm_flush(), which can't finish > > > because the USB requests are not terminated before raw-gadget fd is > > > closed, which is supposed to happen after wdm fd is closed. Is this > > > correct? I wonder what will happen if a real device stays connected > > > and ignores wdm requests. > > > > > > I don't understand though, how using wait_event_interruptible() will > > > shadow anything here. > > > > > > Alan, Greg, is this acceptable behavior for a USB driver? > > > > I don't understand what the problem is. Can you explain in more general > > terms -- nothing specific to wdm or anything like that -- what you are > > concerned about? Is this something that could happen to any gadget > > driver? Or any USB class device driver? Or does it only affect > > usespace components of raw-gadget drivers? > > So, AFAIU, we have a driver whose flush() callback blocks on > wait_event(), which can only terminate when either 1) the driver > receives a particular USB response from the device or 2) the device > disconnects. This sounds like a bug in the driver. What would it do if someone had a genuine (not emulated) but buggy USB device which didn't send the desired response? The only way to unblock the driver would be to unplug the device! That isn't acceptable behavior. > For 1) the emulated device doesn't provide required > responses. For 2) the problem is that the emulated via raw-gadget > device disconnects when the process is killed (and raw-gadget fd is > closed). But that process is the same process that is currently stuck > on wait_event() in the flush callback(), and therefore unkillable. What would happen if you unload dummy-hcd at this point? Or even just do: echo 0 >/sys/bus/usb/devices/usbN/bConfigurationValue, where N is the bus number of the dummy-hcd bus? > This can generally happen with any driver that goes into > uninterruptible sleep within one of its code paths reachable from > userspace that can only be unblocked by a particular behavior from the > USB device. But I haven't seen any such drivers so far, wdm is the > first. Drivers should never go into uninterruptible sleep states unless they can guarantee that the duration will be bounded somehow (for example, by a reasonable timeout). Or that cutting the sleep state short would cause the system to crash -- but that's not an issue here. Alan Stern
On Thu, May 28, 2020 at 10:58 PM Alan Stern <stern@rowland.harvard.edu> wrote: > > On Thu, May 28, 2020 at 09:51:35PM +0200, Andrey Konovalov wrote: > > On Thu, May 28, 2020 at 9:40 PM Alan Stern <stern@rowland.harvard.edu> wrote: > > > > > > On Thu, May 28, 2020 at 09:03:43PM +0200, Andrey Konovalov wrote: > > > > > > > Ah, so the problem is that when a process exits, it tries to close wdm > > > > fd first, which ends up calling wdm_flush(), which can't finish > > > > because the USB requests are not terminated before raw-gadget fd is > > > > closed, which is supposed to happen after wdm fd is closed. Is this > > > > correct? I wonder what will happen if a real device stays connected > > > > and ignores wdm requests. > > > > > > > > I don't understand though, how using wait_event_interruptible() will > > > > shadow anything here. > > > > > > > > Alan, Greg, is this acceptable behavior for a USB driver? > > > > > > I don't understand what the problem is. Can you explain in more general > > > terms -- nothing specific to wdm or anything like that -- what you are > > > concerned about? Is this something that could happen to any gadget > > > driver? Or any USB class device driver? Or does it only affect > > > usespace components of raw-gadget drivers? > > > > So, AFAIU, we have a driver whose flush() callback blocks on > > wait_event(), which can only terminate when either 1) the driver > > receives a particular USB response from the device or 2) the device > > disconnects. > > This sounds like a bug in the driver. What would it do if someone had a > genuine (not emulated) but buggy USB device which didn't send the > desired response? The only way to unblock the driver would be to unplug > the device! That isn't acceptable behavior. OK, that's what I thought. > > > For 1) the emulated device doesn't provide required > > responses. For 2) the problem is that the emulated via raw-gadget > > device disconnects when the process is killed (and raw-gadget fd is > > closed). But that process is the same process that is currently stuck > > on wait_event() in the flush callback(), and therefore unkillable. > > What would happen if you unload dummy-hcd at this point? Or even just > do: echo 0 >/sys/bus/usb/devices/usbN/bConfigurationValue, where N is > the bus number of the dummy-hcd bus? The device disconnects and flush() unblocks. > > This can generally happen with any driver that goes into > > uninterruptible sleep within one of its code paths reachable from > > userspace that can only be unblocked by a particular behavior from the > > USB device. But I haven't seen any such drivers so far, wdm is the > > first. > > Drivers should never go into uninterruptible sleep states unless they > can guarantee that the duration will be bounded somehow (for example, by > a reasonable timeout). Or that cutting the sleep state short would > cause the system to crash -- but that's not an issue here. OK, thank you, Alan! Tetsuo, could you clarify why you think that using wait_event_interruptible() is a bad fix here?
On 2020/05/30 5:41, Andrey Konovalov wrote: > On Thu, May 28, 2020 at 10:58 PM Alan Stern <stern@rowland.harvard.edu> wrote: >> >> On Thu, May 28, 2020 at 09:51:35PM +0200, Andrey Konovalov wrote: >>> On Thu, May 28, 2020 at 9:40 PM Alan Stern <stern@rowland.harvard.edu> wrote: >>>> >>>> On Thu, May 28, 2020 at 09:03:43PM +0200, Andrey Konovalov wrote: >>>> >>>>> Ah, so the problem is that when a process exits, it tries to close wdm >>>>> fd first, which ends up calling wdm_flush(), which can't finish >>>>> because the USB requests are not terminated before raw-gadget fd is >>>>> closed, which is supposed to happen after wdm fd is closed. Is this >>>>> correct? I wonder what will happen if a real device stays connected >>>>> and ignores wdm requests. >>>>> >>>>> I don't understand though, how using wait_event_interruptible() will >>>>> shadow anything here. >>>>> >>>>> Alan, Greg, is this acceptable behavior for a USB driver? >>>> >>>> I don't understand what the problem is. Can you explain in more general >>>> terms -- nothing specific to wdm or anything like that -- what you are >>>> concerned about? Is this something that could happen to any gadget >>>> driver? Or any USB class device driver? Or does it only affect >>>> usespace components of raw-gadget drivers? >>> >>> So, AFAIU, we have a driver whose flush() callback blocks on >>> wait_event(), which can only terminate when either 1) the driver >>> receives a particular USB response from the device or 2) the device >>> disconnects. >> >> This sounds like a bug in the driver. What would it do if someone had a >> genuine (not emulated) but buggy USB device which didn't send the >> desired response? The only way to unblock the driver would be to unplug >> the device! That isn't acceptable behavior. > > OK, that's what I thought. I believe that this is not a bug in the driver but a problem of hardware failure. Unless this is high-availability code which is designed for safely failing over to other node, we don't need to care about hardware failure. > >> >>> For 1) the emulated device doesn't provide required >>> responses. For 2) the problem is that the emulated via raw-gadget >>> device disconnects when the process is killed (and raw-gadget fd is >>> closed). But that process is the same process that is currently stuck >>> on wait_event() in the flush callback(), and therefore unkillable. >> >> What would happen if you unload dummy-hcd at this point? Or even just >> do: echo 0 >/sys/bus/usb/devices/usbN/bConfigurationValue, where N is >> the bus number of the dummy-hcd bus? > > The device disconnects and flush() unblocks. > >>> This can generally happen with any driver that goes into >>> uninterruptible sleep within one of its code paths reachable from >>> userspace that can only be unblocked by a particular behavior from the >>> USB device. But I haven't seen any such drivers so far, wdm is the >>> first. >> >> Drivers should never go into uninterruptible sleep states unless they >> can guarantee that the duration will be bounded somehow (for example, by >> a reasonable timeout). Or that cutting the sleep state short would >> cause the system to crash -- but that's not an issue here. > > OK, thank you, Alan! > > Tetsuo, could you clarify why you think that using > wait_event_interruptible() is a bad fix here? > If wait_event() in wdm_flush() were using timeout or interruptible, can the wdm driver handle that case safely? Since cleanup(desc) from wdm_release() might release "desc", wouldn't "not-waiting-for-completion" has a risk of "use-after-free write"? Please add comment block why it is safe if it is actually safe.
On Sat, May 30, 2020 at 09:42:46AM +0900, Tetsuo Handa wrote: > On 2020/05/30 5:41, Andrey Konovalov wrote: > > On Thu, May 28, 2020 at 10:58 PM Alan Stern <stern@rowland.harvard.edu> wrote: > >> This sounds like a bug in the driver. What would it do if someone had a > >> genuine (not emulated) but buggy USB device which didn't send the > >> desired response? The only way to unblock the driver would be to unplug > >> the device! That isn't acceptable behavior. > > > > OK, that's what I thought. > > I believe that this is not a bug in the driver but a problem of hardware > failure. Unless this is high-availability code which is designed for safely > failing over to other node, we don't need to care about hardware failure. Oh my! I can't even imagine what Linus would say if he saw that... :-( Have you heard of Bad USB? The kernel most definitely does need to protect itself against misbehaving hardware. Let's just leave it at that. If you don't believe me, ask Greg KH. I admit, causing a driver to hang isn't the worst thing a buggy device can do. But the kernel is supposed to be able to cope with such things gracefully. Alan Stern
On 2020/05/30 10:10, Alan Stern wrote: > On Sat, May 30, 2020 at 09:42:46AM +0900, Tetsuo Handa wrote: >> On 2020/05/30 5:41, Andrey Konovalov wrote: >>> On Thu, May 28, 2020 at 10:58 PM Alan Stern <stern@rowland.harvard.edu> wrote: > >>>> This sounds like a bug in the driver. What would it do if someone had a >>>> genuine (not emulated) but buggy USB device which didn't send the >>>> desired response? The only way to unblock the driver would be to unplug >>>> the device! That isn't acceptable behavior. >>> >>> OK, that's what I thought. >> >> I believe that this is not a bug in the driver but a problem of hardware >> failure. Unless this is high-availability code which is designed for safely >> failing over to other node, we don't need to care about hardware failure. > > Oh my! I can't even imagine what Linus would say if he saw that... :-( > > Have you heard of Bad USB? Of course, I've heard of that. Please show me as a patch first. > > The kernel most definitely does need to protect itself against > misbehaving hardware. Let's just leave it at that. If you don't > believe me, ask Greg KH. I've made many locations killable (in order to reduce damage caused by OOM condition). But I can't make locations killable where handling SIGKILL case is too difficult to implement. "struct file_operations"->flush() is called from filp_close() when there is something which has to be done before "struct file_operations"->release() is called. As far as I read this thread, what you are trying to do sounds like allow "not waiting for completion of wdm_out_callback()" with only 's/wait_event/wait_event_intrruptible/' in wdm_flush(). Then, please do remove wdm_flush() call itself. I'm not familiar with USB. But at least we would need to do something similar to commit d0bd587a80960d7b ("usermodehelper: implement UMH_KILLABLE") in addition to 's/wait_event/wait_event_intrruptible/' in wdm_flush(). > > I admit, causing a driver to hang isn't the worst thing a buggy device > can do. But the kernel is supposed to be able to cope with such things > gracefully. My understanding is that the "misbehaving hardware" in this bug report is not "USB device itself" but "CPU used for receiving request from that USB device and sending response to that USB device". I don't know how wdm_flush() can recover when the CPU which is supposed to unblock wait_event() is blocked inside that wait_event() itself. Unless you can safely omit wdm_flush() by doing something similar to commit d0bd587a80960d7b, this looks to me like a circular dependency which is impossible to solve. Therefore, again, please show me as a patch first.
On Sat, May 30, 2020 at 09:42:46AM +0900, Tetsuo Handa wrote: > On 2020/05/30 5:41, Andrey Konovalov wrote: > > On Thu, May 28, 2020 at 10:58 PM Alan Stern <stern@rowland.harvard.edu> wrote: > >> > >> On Thu, May 28, 2020 at 09:51:35PM +0200, Andrey Konovalov wrote: > >>> On Thu, May 28, 2020 at 9:40 PM Alan Stern <stern@rowland.harvard.edu> wrote: > >>>> > >>>> On Thu, May 28, 2020 at 09:03:43PM +0200, Andrey Konovalov wrote: > >>>> > >>>>> Ah, so the problem is that when a process exits, it tries to close wdm > >>>>> fd first, which ends up calling wdm_flush(), which can't finish > >>>>> because the USB requests are not terminated before raw-gadget fd is > >>>>> closed, which is supposed to happen after wdm fd is closed. Is this > >>>>> correct? I wonder what will happen if a real device stays connected > >>>>> and ignores wdm requests. > >>>>> > >>>>> I don't understand though, how using wait_event_interruptible() will > >>>>> shadow anything here. > >>>>> > >>>>> Alan, Greg, is this acceptable behavior for a USB driver? > >>>> > >>>> I don't understand what the problem is. Can you explain in more general > >>>> terms -- nothing specific to wdm or anything like that -- what you are > >>>> concerned about? Is this something that could happen to any gadget > >>>> driver? Or any USB class device driver? Or does it only affect > >>>> usespace components of raw-gadget drivers? > >>> > >>> So, AFAIU, we have a driver whose flush() callback blocks on > >>> wait_event(), which can only terminate when either 1) the driver > >>> receives a particular USB response from the device or 2) the device > >>> disconnects. > >> > >> This sounds like a bug in the driver. What would it do if someone had a > >> genuine (not emulated) but buggy USB device which didn't send the > >> desired response? The only way to unblock the driver would be to unplug > >> the device! That isn't acceptable behavior. > > > > OK, that's what I thought. > > I believe that this is not a bug in the driver but a problem of hardware > failure. Unless this is high-availability code which is designed for safely > failing over to other node, we don't need to care about hardware failure. As Alan said, that's just not true. It's the job of an operating system kernel to handle all of the crazy ways hardware is broken, and make it work properly for people. We deal with hardware failure all the time. So don't do uninterruptable waits or loop for forever waiting for some hardware value to change that might not change. That's a sure way to lock up the system and make users mad at you. thanks, greg k-h
Am Donnerstag, den 28.05.2020, 16:58 -0400 schrieb Alan Stern: > On Thu, May 28, 2020 at 09:51:35PM +0200, Andrey Konovalov wrote: > > > I don't understand what the problem is. Can you explain in more general > > > terms -- nothing specific to wdm or anything like that -- what you are > > > concerned about? Is this something that could happen to any gadget > > > driver? Or any USB class device driver? Or does it only affect > > > usespace components of raw-gadget drivers? > > > > So, AFAIU, we have a driver whose flush() callback blocks on > > wait_event(), which can only terminate when either 1) the driver > > receives a particular USB response from the device or 2) the device > > disconnects. > > This sounds like a bug in the driver. What would it do if someone had a Arguably yes. I will introduce a timeout. Unfortunately flush() requires a non-interruptible sleep, as you cannot sanely return EAGAIN. Regards Oliver
On Sat, May 30, 2020 at 05:25:11PM +0200, Oliver Neukum wrote: > Am Donnerstag, den 28.05.2020, 16:58 -0400 schrieb Alan Stern: > > On Thu, May 28, 2020 at 09:51:35PM +0200, Andrey Konovalov wrote: > > > > > I don't understand what the problem is. Can you explain in more general > > > > terms -- nothing specific to wdm or anything like that -- what you are > > > > concerned about? Is this something that could happen to any gadget > > > > driver? Or any USB class device driver? Or does it only affect > > > > usespace components of raw-gadget drivers? > > > > > > So, AFAIU, we have a driver whose flush() callback blocks on > > > wait_event(), which can only terminate when either 1) the driver > > > receives a particular USB response from the device or 2) the device > > > disconnects. > > > > This sounds like a bug in the driver. What would it do if someone had a > > Arguably yes. I will introduce a timeout. Unfortunately flush() > requires a non-interruptible sleep, as you cannot sanely return EAGAIN. But maybe you can kill some URBs and drop some data. Alan Stern
On Sat, May 30, 2020 at 2:43 AM Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> wrote: > > On 2020/05/30 5:41, Andrey Konovalov wrote: > > On Thu, May 28, 2020 at 10:58 PM Alan Stern <stern@rowland.harvard.edu> wrote: > >> > >> On Thu, May 28, 2020 at 09:51:35PM +0200, Andrey Konovalov wrote: > >>> On Thu, May 28, 2020 at 9:40 PM Alan Stern <stern@rowland.harvard.edu> wrote: > >>>> > >>>> On Thu, May 28, 2020 at 09:03:43PM +0200, Andrey Konovalov wrote: > >>>> > >>>>> Ah, so the problem is that when a process exits, it tries to close wdm > >>>>> fd first, which ends up calling wdm_flush(), which can't finish > >>>>> because the USB requests are not terminated before raw-gadget fd is > >>>>> closed, which is supposed to happen after wdm fd is closed. Is this > >>>>> correct? I wonder what will happen if a real device stays connected > >>>>> and ignores wdm requests. > >>>>> > >>>>> I don't understand though, how using wait_event_interruptible() will > >>>>> shadow anything here. > >>>>> > >>>>> Alan, Greg, is this acceptable behavior for a USB driver? > >>>> > >>>> I don't understand what the problem is. Can you explain in more general > >>>> terms -- nothing specific to wdm or anything like that -- what you are > >>>> concerned about? Is this something that could happen to any gadget > >>>> driver? Or any USB class device driver? Or does it only affect > >>>> usespace components of raw-gadget drivers? > >>> > >>> So, AFAIU, we have a driver whose flush() callback blocks on > >>> wait_event(), which can only terminate when either 1) the driver > >>> receives a particular USB response from the device or 2) the device > >>> disconnects. > >> > >> This sounds like a bug in the driver. What would it do if someone had a > >> genuine (not emulated) but buggy USB device which didn't send the > >> desired response? The only way to unblock the driver would be to unplug > >> the device! That isn't acceptable behavior. > > > > OK, that's what I thought. > > I believe that this is not a bug in the driver but a problem of hardware > failure. Unless this is high-availability code which is designed for safely > failing over to other node, we don't need to care about hardware failure. > > > > >> > >>> For 1) the emulated device doesn't provide required > >>> responses. For 2) the problem is that the emulated via raw-gadget > >>> device disconnects when the process is killed (and raw-gadget fd is > >>> closed). But that process is the same process that is currently stuck > >>> on wait_event() in the flush callback(), and therefore unkillable. > >> > >> What would happen if you unload dummy-hcd at this point? Or even just > >> do: echo 0 >/sys/bus/usb/devices/usbN/bConfigurationValue, where N is > >> the bus number of the dummy-hcd bus? > > > > The device disconnects and flush() unblocks. > > > >>> This can generally happen with any driver that goes into > >>> uninterruptible sleep within one of its code paths reachable from > >>> userspace that can only be unblocked by a particular behavior from the > >>> USB device. But I haven't seen any such drivers so far, wdm is the > >>> first. > >> > >> Drivers should never go into uninterruptible sleep states unless they > >> can guarantee that the duration will be bounded somehow (for example, by > >> a reasonable timeout). Or that cutting the sleep state short would > >> cause the system to crash -- but that's not an issue here. > > > > OK, thank you, Alan! > > > > Tetsuo, could you clarify why you think that using > > wait_event_interruptible() is a bad fix here? > > > > If wait_event() in wdm_flush() were using timeout or interruptible, can the > wdm driver handle that case safely? Since cleanup(desc) from wdm_release() > might release "desc", wouldn't "not-waiting-for-completion" has a risk of > "use-after-free write"? Please add comment block why it is safe if it is > actually safe. Oh, it might be that just replacing wait_event() with wait_event_interruptible() can lead to other issues, and a more involved fix is needed. The suggestion was rather to avoid blocking flush() indefinitely.
On 2020/05/31 0:47, Alan Stern wrote: > On Sat, May 30, 2020 at 05:25:11PM +0200, Oliver Neukum wrote: >> Am Donnerstag, den 28.05.2020, 16:58 -0400 schrieb Alan Stern: >>> On Thu, May 28, 2020 at 09:51:35PM +0200, Andrey Konovalov wrote: >> >>>>> I don't understand what the problem is. Can you explain in more general >>>>> terms -- nothing specific to wdm or anything like that -- what you are >>>>> concerned about? Is this something that could happen to any gadget >>>>> driver? Or any USB class device driver? Or does it only affect >>>>> usespace components of raw-gadget drivers? >>>> >>>> So, AFAIU, we have a driver whose flush() callback blocks on >>>> wait_event(), which can only terminate when either 1) the driver >>>> receives a particular USB response from the device or 2) the device >>>> disconnects. >>> >>> This sounds like a bug in the driver. What would it do if someone had a >> >> Arguably yes. I will introduce a timeout. Unfortunately flush() >> requires a non-interruptible sleep, as you cannot sanely return EAGAIN. > > But maybe you can kill some URBs and drop some data. You mean call usb_kill_urb() via kill_urbs() ? As far as I tested, it seems that usb_kill_urb() sometimes fails to call wdm_out_callback() despite the comment for usb_kill_urb() says * This routine cancels an in-progress request. It is guaranteed that * upon return all completion handlers will have finished and the URB * will be totally idle and available for reuse. These features make * this an ideal way to stop I/O in a disconnect() callback or close() * function. If the request has not already finished or been unlinked * the completion handler will see urb->status == -ENOENT. . Is something still wrong? Or just replacing BUG_ON(test_bit(WDM_IN_USE, &desc->flags) && !test_bit(WDM_DISCONNECTING, &desc->flags)); with wait_event(desc->wait, !test_bit(WDM_IN_USE, &desc->flags) || test_bit(WDM_DISCONNECTING, &desc->flags)); in the patch shown below is sufficient? diff --git a/drivers/usb/class/cdc-wdm.c b/drivers/usb/class/cdc-wdm.c index e3db6fbeadef..3e92e79ce0a0 100644 --- a/drivers/usb/class/cdc-wdm.c +++ b/drivers/usb/class/cdc-wdm.c @@ -151,7 +151,7 @@ static void wdm_out_callback(struct urb *urb) kfree(desc->outbuf); desc->outbuf = NULL; clear_bit(WDM_IN_USE, &desc->flags); - wake_up(&desc->wait); + wake_up_all(&desc->wait); } static void wdm_in_callback(struct urb *urb) @@ -424,6 +424,7 @@ static ssize_t wdm_write if (rv < 0) { desc->outbuf = NULL; clear_bit(WDM_IN_USE, &desc->flags); + wake_up_all(&desc->wait); dev_err(&desc->intf->dev, "Tx URB error: %d\n", rv); rv = usb_translate_errors(rv); goto out_free_mem_pm; @@ -587,15 +588,16 @@ static int wdm_flush(struct file *file, fl_owner_t id) { struct wdm_device *desc = file->private_data; - wait_event(desc->wait, - /* - * needs both flags. We cannot do with one - * because resetting it would cause a race - * with write() yet we need to signal - * a disconnect - */ - !test_bit(WDM_IN_USE, &desc->flags) || - test_bit(WDM_DISCONNECTING, &desc->flags)); + /* + * needs both flags. We cannot do with one because resetting it would + * cause a race with write() yet we need to signal a disconnect + */ + if (!wait_event_timeout(desc->wait, !test_bit(WDM_IN_USE, &desc->flags) || + test_bit(WDM_DISCONNECTING, &desc->flags), 20 * HZ)) { + kill_urbs(desc); + BUG_ON(test_bit(WDM_IN_USE, &desc->flags) && + !test_bit(WDM_DISCONNECTING, &desc->flags)); + } /* cannot dereference desc->intf if WDM_DISCONNECTING */ if (test_bit(WDM_DISCONNECTING, &desc->flags)) [ 208.339533] [ T5240] cdc_wdm 5-1:118.0: Error in flush path: -2 [ 208.339614] [ T5224] cdc_wdm 5-1:118.0: Error in flush path: -2 [ 208.339753] [ T5184] cdc_wdm 5-1:118.0: Error in flush path: -2 [ 208.339811] [ T5223] cdc_wdm 5-1:118.0: Error in flush path: -2 [ 208.339820] [ T5242] cdc_wdm 5-1:118.0: Error in flush path: -2 [ 208.339902] [ T5222] cdc_wdm 5-1:118.0: Error in flush path: -2 [ 208.339988] [ T5201] cdc_wdm 5-1:118.0: Error in flush path: -2 [ 208.343226] [ T5270] udc-core: couldn't find an available UDC or it's busy [ 208.343237] [ T5270] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 [ 208.343693] [ T5271] udc-core: couldn't find an available UDC or it's busy [ 208.343700] [ T5271] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 [ 208.345655] [ T5273] udc-core: couldn't find an available UDC or it's busy [ 208.345863] [ T5275] udc-core: couldn't find an available UDC or it's busy [ 208.345870] [ T5275] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 [ 208.345933] [ T5272] udc-core: couldn't find an available UDC or it's busy [ 208.345943] [ T5272] misc raw-gadget: fail, usb_gadget_probe_driver returned -16 [ 208.345982] [ T5274] udc-core: couldn't find an available UDC or it's busy [ 208.346007] [ T5274] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 [ 208.347209] [ T5276] udc-core: couldn't find an available UDC or it's busy [ 208.348035] [ T5273] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 [ 208.356463] [ T5182] ------------[ cut here ]------------ [ 208.356752] [ T5276] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 [ 208.358439] [ T5182] kernel BUG at drivers/usb/class/cdc-wdm.c:599! [ 208.358481] [ T5182] invalid opcode: 0000 [#1] SMP KASAN [ 208.377041] [ T5271] cdc_wdm 5-1:118.0: Error in flush path: -32 [ 208.378521] [ T5182] CPU: 1 PID: 5182 Comm: a.out Not tainted 5.7.0-rc5+ #22 [ 208.406262] [ T5274] cdc_wdm 5-1:118.0: Error in flush path: -32 [ 208.406692] [ T5182] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 02/27/2020 [ 208.406703] [ T5182] RIP: 0010:wdm_flush+0x363/0x460 [ 208.409106] [ T5277] udc-core: couldn't find an available UDC or it's busy [ 208.409114] [ T5277] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 [ 208.411250] [ T5278] udc-core: couldn't find an available UDC or it's busy [ 208.411464] [ T5182] Code: 00 00 00 00 00 fc ff df 48 c1 ea 03 80 3c 02 00 0f 85 90 00 00 00 48 8b 83 90 00 00 00 a8 04 0f 85 85 fd ff ff e8 0d 28 bf fd <0f> 0b bd ed ff ff ff e9 00 fe ff ff 4c 89 e7 e8 79 73 e5 fd e9 d0 [ 208.411467] [ T5182] RSP: 0018:ffff8881e8ed7df0 EFLAGS: 00010293 [ 208.411470] [ T5182] RAX: 0000000000000000 RBX: ffff8881f44d5800 RCX: ffffffff837d9974 [ 208.411472] [ T5182] RDX: ffff8881e734b180 RSI: ffffffff837d99a3 RDI: ffff8881f44d5890 [ 208.411473] [ T5182] RBP: ffff8881f44d5890 R08: ffffed103e89ab13 R09: ffff8881f44d5897 [ 208.411479] [ T5182] R10: ffffed103e89ab12 R11: ffffed103e89ab13 R12: 0000000000000000 [ 208.414031] [ T5278] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 [ 208.415966] [ T5182] R13: ffff8881e8ed7e18 R14: ffff8881f44d59e0 R15: ffffed103e89ab12 [ 208.415973] [ T5182] FS: 00007f95f02a3440(0000) GS:ffff8881f6c40000(0000) knlGS:0000000000000000 [ 208.456966] [ T5278] cdc_wdm 5-1:118.0: Error in flush path: -32 [ 208.456971] [ T5277] cdc_wdm 5-1:118.0: Error in flush path: -32 [ 208.456974] [ T5275] cdc_wdm 5-1:118.0: Error in flush path: -32 [ 208.456988] [ T5276] cdc_wdm 5-1:118.0: Error in flush path: -32 [ 208.456991] [ T5272] cdc_wdm 5-1:118.0: Error in flush path: -32 [ 208.457405] [ T5273] cdc_wdm 5-1:118.0: Error in flush path: -32 [ 208.457527] [ T5182] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 208.458378] [ T5279] udc-core: couldn't find an available UDC or it's busy [ 208.458384] [ T5279] misc raw-gadget: fail, usb_gadget_probe_driver returned -16 [ 208.458606] [ T5280] udc-core: couldn't find an available UDC or it's busy [ 208.458611] [ T5280] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 [ 208.460681] [ T5281] udc-core: couldn't find an available UDC or it's busy [ 208.460925] [ T5282] udc-core: couldn't find an available UDC or it's busy [ 208.460936] [ T5282] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 [ 208.461031] [ T5283] udc-core: couldn't find an available UDC or it's busy [ 208.461037] [ T5283] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 [ 208.461268] [ T5284] udc-core: couldn't find an available UDC or it's busy [ 208.461273] [ T5284] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 [ 208.462427] [ T5182] CR2: 00007fb0ae7797a0 CR3: 00000001eb03d002 CR4: 00000000003606e0 [ 208.462435] [ T5182] Call Trace: [ 208.462444] [ T5182] ? wdm_poll+0x230/0x230 [ 208.463320] [ T5285] udc-core: couldn't find an available UDC or it's busy [ 208.463325] [ T5285] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 [ 208.464664] [ T5281] misc raw-gadget: fail, usb_gadget_probe_driver returned -16 [ 208.466731] [ T5182] ? remove_wait_queue+0x190/0x190 [ 208.466737] [ T5182] ? wdm_poll+0x230/0x230 [ 208.466743] [ T5182] filp_close+0xad/0x160 [ 208.466749] [ T5182] __close_fd+0x112/0x1d0 [ 208.466755] [ T5182] __x64_sys_close+0x67/0xc0 [ 208.466762] [ T5182] ? lockdep_hardirqs_on+0x3f1/0x5b0 [ 208.466767] [ T5182] do_syscall_64+0x9b/0x510 [ 208.466774] [ T5182] entry_SYSCALL_64_after_hwframe+0x49/0xb3 [ 208.466780] [ T5182] RIP: 0033:0x7f95efdc3f30 [ 208.466786] [ T5182] Code: 00 64 c7 00 0d 00 00 00 b8 ff ff ff ff eb 90 b8 ff ff ff ff eb 89 0f 1f 40 00 83 3d d9 27 2c 00 00 75 10 b8 03 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 be 95 01 00 48 89 04 24 [ 208.466791] [ T5182] RSP: 002b:00007ffc958cd6d8 EFLAGS: 00000246 ORIG_RAX: 0000000000000003 [ 208.466799] [ T5182] RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 00007f95efdc3f30 [ 208.476513] [ T5280] cdc_wdm 5-1:118.0: Error in flush path: -32 [ 208.478577] [ T5182] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000004 [ 208.478581] [ T5182] RBP: 0000000000000000 R08: 000000000000000f R09: 00007ffc958cd70e [ 208.478586] [ T5182] R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff [ 208.482884] [ T5286] udc-core: couldn't find an available UDC or it's busy [ 208.483534] [ T5182] R13: 000000000002daa7 R14: 0000000000000000 R15: 0000000000000000 [ 208.483536] [ T5182] Modules linked in: [ 208.483635] [ T5182] ---[ end trace 6a73cddfff8988f4 ]--- [ 208.485364] [ T5286] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 [ 208.487479] [ T5182] RIP: 0010:wdm_flush+0x363/0x460 [ 208.506707] [ T5282] cdc_wdm 5-1:118.0: Error in flush path: -32 [ 208.508658] [ T5182] Code: 00 00 00 00 00 fc ff df 48 c1 ea 03 80 3c 02 00 0f 85 90 00 00 00 48 8b 83 90 00 00 00 a8 04 0f 85 85 fd ff ff e8 0d 28 bf fd <0f> 0b bd ed ff ff ff e9 00 fe ff ff 4c 89 e7 e8 79 73 e5 fd e9 d0 [ 208.528782] [ T5287] udc-core: couldn't find an available UDC or it's busy [ 208.529695] [ T5182] RSP: 0018:ffff8881e8ed7df0 EFLAGS: 00010293 [ 208.529699] [ T5182] RAX: 0000000000000000 RBX: ffff8881f44d5800 RCX: ffffffff837d9974 [ 208.529701] [ T5182] RDX: ffff8881e734b180 RSI: ffffffff837d99a3 RDI: ffff8881f44d5890 [ 208.529703] [ T5182] RBP: ffff8881f44d5890 R08: ffffed103e89ab13 R09: ffff8881f44d5897 [ 208.529705] [ T5182] R10: ffffed103e89ab12 R11: ffffed103e89ab13 R12: 0000000000000000 [ 208.529706] [ T5182] R13: ffff8881e8ed7e18 R14: ffff8881f44d59e0 R15: ffffed103e89ab12 [ 208.529709] [ T5182] FS: 00007f95f02a3440(0000) GS:ffff8881f6c40000(0000) knlGS:0000000000000000 [ 208.529718] [ T5182] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 208.530960] [ T5287] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 [ 208.537715] [ T5182] CR2: 00007fb0ae7797a0 CR3: 00000001eb03d002 CR4: 00000000003606e0 [ 208.577096] [ T5284] cdc_wdm 5-1:118.0: Error in flush path: -32 [ 208.577105] [ T5281] cdc_wdm 5-1:118.0: Error in flush path: -32 [ 208.577108] [ T5287] cdc_wdm 5-1:118.0: Error in flush path: -32 [ 208.577111] [ T5283] cdc_wdm 5-1:118.0: Error in flush path: -32 [ 208.577525] [ T5286] cdc_wdm 5-1:118.0: Error in flush path: -32 [ 208.578527] [ T5288] udc-core: couldn't find an available UDC or it's busy [ 208.578533] [ T5288] misc raw-gadget: fail, usb_gadget_probe_driver returned -16 [ 208.580556] [ T5291] udc-core: couldn't find an available UDC or it's busy [ 208.580644] [ T5289] udc-core: couldn't find an available UDC or it's busy [ 208.580646] [ T5290] udc-core: couldn't find an available UDC or it's busy [ 208.580652] [ T5289] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 [ 208.580655] [ T5290] misc raw-gadget: fail, usb_gadget_probe_driver returned -16 [ 208.580844] [ T5292] udc-core: couldn't find an available UDC or it's busy [ 208.580851] [ T5292] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 [ 208.581053] [ T5293] udc-core: couldn't find an available UDC or it's busy [ 208.581058] [ T5293] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 [ 208.599340] [ T5294] udc-core: couldn't find an available UDC or it's busy [ 208.599866] [ T5291] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 [ 208.601707] [ T5294] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 [ 208.616268] [ T5289] cdc_wdm 5-1:118.0: Error in flush path: -32
On 2020/06/08 11:24, Tetsuo Handa wrote: > As far as I tested, it seems that usb_kill_urb() sometimes fails to call > wdm_out_callback() despite the comment for usb_kill_urb() says > > * This routine cancels an in-progress request. It is guaranteed that > * upon return all completion handlers will have finished and the URB > * will be totally idle and available for reuse. These features make > * this an ideal way to stop I/O in a disconnect() callback or close() > * function. If the request has not already finished or been unlinked > * the completion handler will see urb->status == -ENOENT. > > . Is something still wrong? Or just replacing > > BUG_ON(test_bit(WDM_IN_USE, &desc->flags) && > !test_bit(WDM_DISCONNECTING, &desc->flags)); > > with > > wait_event(desc->wait, !test_bit(WDM_IN_USE, &desc->flags) || > test_bit(WDM_DISCONNECTING, &desc->flags)); > > in the patch shown below is sufficient? How is the progress? Another debug printk() patch shown below says that while wdm_flush() waits for clearing of WDM_IN_USE using wait_event(), concurrently executed wdm_write() also waits for clearing of WDM_IN_USE using wait_event_interruptible(), and wdm_write() can immediately set WDM_IN_USE again as soon as returning from wait_event_interruptible() even if somebody was already waiting at wdm_flush() to clear WDM_IN_USE. That is, wait_event() in wdm_flush() does not know whether there is usb_submit_urb() request which is started after wait_event() found that WDM_IN_USE was cleared. Then, what does this wait_event() in wdm_flush() want to flush? There is no need to wait for completion of usb_submit_urb() ? diff --git a/drivers/usb/class/cdc-wdm.c b/drivers/usb/class/cdc-wdm.c index e3db6fbeadef..d2c4d8b9cc18 100644 --- a/drivers/usb/class/cdc-wdm.c +++ b/drivers/usb/class/cdc-wdm.c @@ -82,6 +82,7 @@ struct wdm_device { spinlock_t iuspin; unsigned long flags; + atomic_t flushers; u16 bufsize; u16 wMaxCommand; u16 wMaxPacketSize; @@ -151,7 +152,7 @@ static void wdm_out_callback(struct urb *urb) kfree(desc->outbuf); desc->outbuf = NULL; clear_bit(WDM_IN_USE, &desc->flags); - wake_up(&desc->wait); + wake_up_all(&desc->wait); } static void wdm_in_callback(struct urb *urb) @@ -417,6 +418,13 @@ static ssize_t wdm_write req->wValue = 0; req->wIndex = desc->inum; /* already converted */ req->wLength = cpu_to_le16(count); + { + const int flushers = atomic_read(&desc->flushers); + if (flushers) { + printk(KERN_ERR "There is %u flushers.\n", flushers); + BUG_ON(1); + } + } set_bit(WDM_IN_USE, &desc->flags); desc->outbuf = buf; @@ -424,6 +432,7 @@ static ssize_t wdm_write if (rv < 0) { desc->outbuf = NULL; clear_bit(WDM_IN_USE, &desc->flags); + wake_up_all(&desc->wait); dev_err(&desc->intf->dev, "Tx URB error: %d\n", rv); rv = usb_translate_errors(rv); goto out_free_mem_pm; @@ -587,15 +596,20 @@ static int wdm_flush(struct file *file, fl_owner_t id) { struct wdm_device *desc = file->private_data; - wait_event(desc->wait, - /* - * needs both flags. We cannot do with one - * because resetting it would cause a race - * with write() yet we need to signal - * a disconnect - */ - !test_bit(WDM_IN_USE, &desc->flags) || - test_bit(WDM_DISCONNECTING, &desc->flags)); + /* + * needs both flags. We cannot do with one because resetting it would + * cause a race with write() yet we need to signal a disconnect + */ + atomic_inc(&desc->flushers); + if (!wait_event_timeout(desc->wait, !test_bit(WDM_IN_USE, &desc->flags) || + test_bit(WDM_DISCONNECTING, &desc->flags), 20 * HZ)) { + atomic_dec(&desc->flushers); + kill_urbs(desc); + BUG_ON(test_bit(WDM_IN_USE, &desc->flags) && + !test_bit(WDM_DISCONNECTING, &desc->flags)); + } else { + atomic_dec(&desc->flushers); + } /* cannot dereference desc->intf if WDM_DISCONNECTING */ if (test_bit(WDM_DISCONNECTING, &desc->flags)) [ 103.934085][ T118] usb 5-1: New USB device found, idVendor=2400, idProduct=4200, bcdDevice=42.ef [ 103.934093][ T118] usb 5-1: New USB device strings: Mfr=0, Product=0, SerialNumber=2 [ 103.934100][ T118] usb 5-1: SerialNumber: syz [ 103.934145][ T4108] udc-core: couldn't find an available UDC or it's busy [ 103.934151][ T4108] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 [ 103.934810][ T4109] udc-core: couldn't find an available UDC or it's busy [ 103.934821][ T4109] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 [ 103.935385][ T4110] udc-core: couldn't find an available UDC or it's busy [ 103.935392][ T4110] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 [ 103.935411][ T4106] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 [ 103.938481][ T4112] udc-core: couldn't find an available UDC or it's busy [ 103.939183][ T4113] udc-core: couldn't find an available UDC or it's busy [ 103.939192][ T4113] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 [ 103.939415][ T4114] udc-core: couldn't find an available UDC or it's busy [ 103.939421][ T4114] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 [ 103.943069][ T4115] udc-core: couldn't find an available UDC or it's busy [ 103.943257][ T4116] udc-core: couldn't find an available UDC or it's busy [ 103.943266][ T4116] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 [ 103.944118][ T4117] udc-core: couldn't find an available UDC or it's busy [ 103.944129][ T4117] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 [ 103.944160][ T4118] udc-core: couldn't find an available UDC or it's busy [ 103.944169][ T4118] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 [ 103.944738][ T4112] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 [ 103.946715][ T4115] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 [ 103.947159][ T4119] udc-core: couldn't find an available UDC or it's busy [ 103.947165][ T4119] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 [ 103.947315][ T4120] udc-core: couldn't find an available UDC or it's busy [ 103.947324][ T4120] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 [ 103.948093][ T4121] udc-core: couldn't find an available UDC or it's busy [ 103.948103][ T4121] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 [ 103.949190][ T4122] udc-core: couldn't find an available UDC or it's busy [ 103.950070][ T4123] udc-core: couldn't find an available UDC or it's busy [ 103.950080][ T4123] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 [ 103.950580][ T4124] udc-core: couldn't find an available UDC or it's busy [ 103.950587][ T4124] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 [ 103.953081][ T4125] udc-core: couldn't find an available UDC or it's busy [ 103.953430][ T4126] udc-core: couldn't find an available UDC or it's busy [ 103.953437][ T4126] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 [ 103.953963][ T4122] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 [ 103.954780][ T4128] udc-core: couldn't find an available UDC or it's busy [ 103.954786][ T4128] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 [ 103.955864][ T4129] udc-core: couldn't find an available UDC or it's busy [ 103.955872][ T4129] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 [ 103.955962][ T4127] udc-core: couldn't find an available UDC or it's busy [ 103.955968][ T4127] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 [ 103.956010][ T4125] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 [ 103.958600][ T4130] udc-core: couldn't find an available UDC or it's busy [ 103.958962][ T4131] udc-core: couldn't find an available UDC or it's busy [ 103.958972][ T4131] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 [ 103.959737][ T4132] udc-core: couldn't find an available UDC or it's busy [ 103.959749][ T4132] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 [ 103.963405][ T4133] udc-core: couldn't find an available UDC or it's busy [ 103.963413][ T4133] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 [ 103.964329][ T4134] udc-core: couldn't find an available UDC or it's busy [ 103.965201][ T4135] udc-core: couldn't find an available UDC or it's busy [ 103.965211][ T4135] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 [ 103.965511][ T4136] udc-core: couldn't find an available UDC or it's busy [ 103.965520][ T4136] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 [ 103.966011][ T4130] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 [ 103.966939][ T4137] udc-core: couldn't find an available UDC or it's busy [ 103.966946][ T4137] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 [ 103.967467][ T4138] udc-core: couldn't find an available UDC or it's busy [ 103.967473][ T4138] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 [ 103.967909][ T4134] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 [ 103.969292][ T4139] udc-core: couldn't find an available UDC or it's busy [ 103.969301][ T4139] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 [ 103.969708][ T4140] udc-core: couldn't find an available UDC or it's busy [ 103.969717][ T4140] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 [ 103.970343][ T4141] udc-core: couldn't find an available UDC or it's busy [ 103.973658][ T118] hub 5-1:118.0: bad descriptor, ignoring hub [ 103.974100][ T4142] udc-core: couldn't find an available UDC or it's busy [ 103.974107][ T4142] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 [ 103.974518][ T4141] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 [ 103.974657][ T4143] udc-core: couldn't find an available UDC or it's busy [ 103.974663][ T4143] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 [ 103.975767][ T4144] udc-core: couldn't find an available UDC or it's busy [ 103.975774][ T4144] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 [ 103.976777][ T118] hub: probe of 5-1:118.0 failed with error -5 [ 103.977257][ T4145] udc-core: couldn't find an available UDC or it's busy [ 103.977263][ T4145] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 [ 103.978020][ T4146] udc-core: couldn't find an available UDC or it's busy [ 103.978029][ T4146] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 [ 103.978223][ T4148] udc-core: couldn't find an available UDC or it's busy [ 103.978229][ T4148] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 [ 103.978388][ T4147] udc-core: couldn't find an available UDC or it's busy [ 103.978396][ T4147] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 [ 103.980149][ T4149] udc-core: couldn't find an available UDC or it's busy [ 103.982395][ T4151] udc-core: couldn't find an available UDC or it's busy [ 103.982468][ T4152] udc-core: couldn't find an available UDC or it's busy [ 103.982478][ T4152] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 [ 103.982674][ T4153] udc-core: couldn't find an available UDC or it's busy [ 103.982682][ T4153] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 [ 103.982780][ T4150] udc-core: couldn't find an available UDC or it's busy [ 103.982788][ T4150] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 [ 103.983634][ T4149] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 [ 103.983843][ T118] cdc_wdm 5-1:118.0: cdc-wdm0: USB WDM device [ 103.984589][ T4155] udc-core: couldn't find an available UDC or it's busy [ 103.984595][ T4155] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 [ 103.985615][ T4156] udc-core: couldn't find an available UDC or it's busy [ 103.985624][ T4156] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 [ 103.985747][ T4151] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 [ 103.985906][ T4157] udc-core: couldn't find an available UDC or it's busy [ 103.985912][ T4157] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 [ 103.987838][ T4158] udc-core: couldn't find an available UDC or it's busy [ 103.987845][ T4158] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 [ 103.987955][ T4159] udc-core: couldn't find an available UDC or it's busy [ 103.987962][ T4159] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 [ 103.988233][ T4160] udc-core: couldn't find an available UDC or it's busy [ 103.988239][ T4160] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 [ 103.989778][ T4161] udc-core: couldn't find an available UDC or it's busy [ 103.990308][ T4162] udc-core: couldn't find an available UDC or it's busy [ 103.990319][ T4162] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 [ 103.991055][ T4163] udc-core: couldn't find an available UDC or it's busy [ 103.993669][ T4161] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 [ 103.994007][ T4164] udc-core: couldn't find an available UDC or it's busy [ 103.994142][ T4165] udc-core: couldn't find an available UDC or it's busy [ 103.994151][ T4165] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 [ 103.995212][ T4166] udc-core: couldn't find an available UDC or it's busy [ 103.995222][ T4166] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 [ 103.996400][ T4163] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 [ 106.505076][ T4164] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 [ 124.793950][ T3360] cdc_wdm 5-1:118.0: Error in flush path: -2 [ 124.793977][ T4165] cdc_wdm 5-1:118.0: Error in flush path: -2 [ 124.793987][ T4166] cdc_wdm 5-1:118.0: Error in flush path: -2 [ 124.793997][ T4164] cdc_wdm 5-1:118.0: Error in flush path: -2 [ 124.794461][ T4163] cdc_wdm 5-1:118.0: Error in flush path: -2 [ 124.798193][ T4168] udc-core: couldn't find an available UDC or it's busy [ 124.798218][ T4168] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 [ 124.798292][ T4169] udc-core: couldn't find an available UDC or it's busy [ 124.798315][ T4169] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 [ 124.798531][ T4170] udc-core: couldn't find an available UDC or it's busy [ 124.798537][ T4170] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 [ 124.812947][ T4170] There is 2 flushers. [ 124.812973][ T4161] cdc_wdm 5-1:118.0: Error in flush path: -32 [ 124.812982][ T4168] cdc_wdm 5-1:118.0: Error in flush path: -32 [ 124.812987][ T4169] cdc_wdm 5-1:118.0: Error in flush path: -32 [ 124.823862][ T4170] ------------[ cut here ]------------ [ 124.825440][ T4170] kernel BUG at drivers/usb/class/cdc-wdm.c:425! [ 124.827049][ T4170] invalid opcode: 0000 [#1] SMP KASAN [ 124.828439][ T4170] CPU: 3 PID: 4170 Comm: a.out Not tainted 5.7.0-rc5+ #27 [ 124.830192][ T4170] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 02/27/2020 [ 124.833192][ T4170] RIP: 0010:wdm_write+0xce7/0xf00 [ 124.834683][ T4170] Code: 8b 7c 24 20 e8 ca 16 e5 fd 48 8b 54 24 18 e9 49 fa ff ff 89 34 24 e8 a8 ca be fd 8b 34 24 48 c7 c7 40 0f 01 86 e8 b2 2e aa fd <0f> 0b 48 89 ef e8 1f 16 e5 fd e9 ba f5 ff ff 48 89 cf 48 89 54 24 [ 124.839523][ T4170] RSP: 0018:ffff8881e5ff7d30 EFLAGS: 00010286 [ 124.841496][ T4170] RAX: 0000000000000014 RBX: ffff8881ee647800 RCX: ffffffff81d9c9dc [ 124.843673][ T4170] RDX: 0000000000000000 RSI: ffffffff81283169 RDI: ffff8881f6ce8dec [ 124.846399][ T4170] RBP: ffff8881ee647890 R08: ffffed103ed9e292 R09: ffff8881f6cf148f [ 124.848563][ T4170] R10: ffffed103dcc8f13 R11: ffffed103ed9e292 R12: 1ffff1103cbfefac [ 124.850559][ T4170] R13: 0000000000000010 R14: ffff8881e7fb5fac R15: 0000000000000000 [ 124.852875][ T4170] FS: 00007fbf56199440(0000) GS:ffff8881f6cc0000(0000) knlGS:0000000000000000 [ 124.855270][ T4170] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 124.857106][ T4170] CR2: 00007fbf55bef225 CR3: 00000001e61cd002 CR4: 00000000003606e0 [ 124.859118][ T4170] Call Trace: [ 124.860123][ T4170] ? wdm_probe+0x2b0/0x2b0 [ 124.861567][ T4170] ? remove_wait_queue+0x190/0x190 [ 124.862896][ T4170] ? wdm_probe+0x2b0/0x2b0 [ 124.864214][ T4170] __vfs_write+0x76/0x100 [ 124.865314][ T4170] vfs_write+0x185/0x510 [ 124.866418][ T4170] ksys_write+0xff/0x200 [ 124.867653][ T4170] ? __ia32_sys_read+0xb0/0xb0 [ 124.868851][ T4170] ? trace_hardirqs_off_caller+0x22/0x1c0 [ 124.870283][ T4170] do_syscall_64+0x9b/0x510 [ 124.871575][ T4170] entry_SYSCALL_64_after_hwframe+0x49/0xb3 [ 124.873352][ T4170] RIP: 0033:0x7fbf55cc2469 [ 124.874481][ T4170] Code: 00 f3 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d ff 49 2b 00 f7 d8 64 89 01 48 [ 124.879954][ T4170] RSP: 002b:00007ffdc294dd58 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 [ 124.882090][ T4170] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007fbf55cc2469 [ 124.884573][ T4170] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000004 [ 124.887178][ T4170] RBP: 0000000000000000 R08: 000000000000000f R09: 00007ffdc294dd8e [ 124.889941][ T4170] R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff [ 124.892208][ T4170] R13: 00000000000193a4 R14: 0000000000000000 R15: 0000000000000000 [ 124.894103][ T4170] Modules linked in: [ 124.895465][ T4170] ---[ end trace 2aea2ce7d52c22e0 ]--- [ 124.897307][ T4170] RIP: 0010:wdm_write+0xce7/0xf00 [ 124.898975][ T4170] Code: 8b 7c 24 20 e8 ca 16 e5 fd 48 8b 54 24 18 e9 49 fa ff ff 89 34 24 e8 a8 ca be fd 8b 34 24 48 c7 c7 40 0f 01 86 e8 b2 2e aa fd <0f> 0b 48 89 ef e8 1f 16 e5 fd e9 ba f5 ff ff 48 89 cf 48 89 54 24 [ 124.905012][ T4170] RSP: 0018:ffff8881e5ff7d30 EFLAGS: 00010286 [ 124.906793][ T4170] RAX: 0000000000000014 RBX: ffff8881ee647800 RCX: ffffffff81d9c9dc [ 124.909151][ T4170] RDX: 0000000000000000 RSI: ffffffff81283169 RDI: ffff8881f6ce8dec [ 124.911276][ T4170] RBP: ffff8881ee647890 R08: ffffed103ed9e292 R09: ffff8881f6cf148f [ 124.913784][ T4170] R10: ffffed103dcc8f13 R11: ffffed103ed9e292 R12: 1ffff1103cbfefac [ 124.916882][ T4170] R13: 0000000000000010 R14: ffff8881e7fb5fac R15: 0000000000000000 [ 124.919078][ T4170] FS: 00007fbf56199440(0000) GS:ffff8881f6cc0000(0000) knlGS:0000000000000000 [ 124.921484][ T4170] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 124.923337][ T4170] CR2: 00007fbf55bef225 CR3: 00000001e61cd002 CR4: 00000000003606e0 [ 124.925606][ T4170] cdc_wdm 5-1:118.0: Error in flush path: -32
On Thu, Jun 18, 2020 at 2:49 AM Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> wrote: > > On 2020/06/08 11:24, Tetsuo Handa wrote: > > As far as I tested, it seems that usb_kill_urb() sometimes fails to call > > wdm_out_callback() despite the comment for usb_kill_urb() says > > > > * This routine cancels an in-progress request. It is guaranteed that > > * upon return all completion handlers will have finished and the URB > > * will be totally idle and available for reuse. These features make > > * this an ideal way to stop I/O in a disconnect() callback or close() > > * function. If the request has not already finished or been unlinked > > * the completion handler will see urb->status == -ENOENT. > > > > . Is something still wrong? Or just replacing > > > > BUG_ON(test_bit(WDM_IN_USE, &desc->flags) && > > !test_bit(WDM_DISCONNECTING, &desc->flags)); > > > > with > > > > wait_event(desc->wait, !test_bit(WDM_IN_USE, &desc->flags) || > > test_bit(WDM_DISCONNECTING, &desc->flags)); > > > > in the patch shown below is sufficient? > > How is the progress? > > Another debug printk() patch shown below says that while wdm_flush() waits for > clearing of WDM_IN_USE using wait_event(), concurrently executed wdm_write() also > waits for clearing of WDM_IN_USE using wait_event_interruptible(), and wdm_write() can > immediately set WDM_IN_USE again as soon as returning from wait_event_interruptible() > even if somebody was already waiting at wdm_flush() to clear WDM_IN_USE. > > That is, wait_event() in wdm_flush() does not know whether there is usb_submit_urb() > request which is started after wait_event() found that WDM_IN_USE was cleared. Then, > what does this wait_event() in wdm_flush() want to flush? There is no need to wait for > completion of usb_submit_urb() ? Oliver, any chance you could help us with fixing the hang in this driver? You seem to be its original author. This hang is one of the top crashers on syzbot, with over 32000 crashed kernels. > > diff --git a/drivers/usb/class/cdc-wdm.c b/drivers/usb/class/cdc-wdm.c > index e3db6fbeadef..d2c4d8b9cc18 100644 > --- a/drivers/usb/class/cdc-wdm.c > +++ b/drivers/usb/class/cdc-wdm.c > @@ -82,6 +82,7 @@ struct wdm_device { > spinlock_t iuspin; > > unsigned long flags; > + atomic_t flushers; > u16 bufsize; > u16 wMaxCommand; > u16 wMaxPacketSize; > @@ -151,7 +152,7 @@ static void wdm_out_callback(struct urb *urb) > kfree(desc->outbuf); > desc->outbuf = NULL; > clear_bit(WDM_IN_USE, &desc->flags); > - wake_up(&desc->wait); > + wake_up_all(&desc->wait); > } > > static void wdm_in_callback(struct urb *urb) > @@ -417,6 +418,13 @@ static ssize_t wdm_write > req->wValue = 0; > req->wIndex = desc->inum; /* already converted */ > req->wLength = cpu_to_le16(count); > + { > + const int flushers = atomic_read(&desc->flushers); > + if (flushers) { > + printk(KERN_ERR "There is %u flushers.\n", flushers); > + BUG_ON(1); > + } > + } > set_bit(WDM_IN_USE, &desc->flags); > desc->outbuf = buf; > > @@ -424,6 +432,7 @@ static ssize_t wdm_write > if (rv < 0) { > desc->outbuf = NULL; > clear_bit(WDM_IN_USE, &desc->flags); > + wake_up_all(&desc->wait); > dev_err(&desc->intf->dev, "Tx URB error: %d\n", rv); > rv = usb_translate_errors(rv); > goto out_free_mem_pm; > @@ -587,15 +596,20 @@ static int wdm_flush(struct file *file, fl_owner_t id) > { > struct wdm_device *desc = file->private_data; > > - wait_event(desc->wait, > - /* > - * needs both flags. We cannot do with one > - * because resetting it would cause a race > - * with write() yet we need to signal > - * a disconnect > - */ > - !test_bit(WDM_IN_USE, &desc->flags) || > - test_bit(WDM_DISCONNECTING, &desc->flags)); > + /* > + * needs both flags. We cannot do with one because resetting it would > + * cause a race with write() yet we need to signal a disconnect > + */ > + atomic_inc(&desc->flushers); > + if (!wait_event_timeout(desc->wait, !test_bit(WDM_IN_USE, &desc->flags) || > + test_bit(WDM_DISCONNECTING, &desc->flags), 20 * HZ)) { > + atomic_dec(&desc->flushers); > + kill_urbs(desc); > + BUG_ON(test_bit(WDM_IN_USE, &desc->flags) && > + !test_bit(WDM_DISCONNECTING, &desc->flags)); > + } else { > + atomic_dec(&desc->flushers); > + } > > /* cannot dereference desc->intf if WDM_DISCONNECTING */ > if (test_bit(WDM_DISCONNECTING, &desc->flags)) > > > > [ 103.934085][ T118] usb 5-1: New USB device found, idVendor=2400, idProduct=4200, bcdDevice=42.ef > [ 103.934093][ T118] usb 5-1: New USB device strings: Mfr=0, Product=0, SerialNumber=2 > [ 103.934100][ T118] usb 5-1: SerialNumber: syz > [ 103.934145][ T4108] udc-core: couldn't find an available UDC or it's busy > [ 103.934151][ T4108] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 > [ 103.934810][ T4109] udc-core: couldn't find an available UDC or it's busy > [ 103.934821][ T4109] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 > [ 103.935385][ T4110] udc-core: couldn't find an available UDC or it's busy > [ 103.935392][ T4110] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 > [ 103.935411][ T4106] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 > [ 103.938481][ T4112] udc-core: couldn't find an available UDC or it's busy > [ 103.939183][ T4113] udc-core: couldn't find an available UDC or it's busy > [ 103.939192][ T4113] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 > [ 103.939415][ T4114] udc-core: couldn't find an available UDC or it's busy > [ 103.939421][ T4114] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 > [ 103.943069][ T4115] udc-core: couldn't find an available UDC or it's busy > [ 103.943257][ T4116] udc-core: couldn't find an available UDC or it's busy > [ 103.943266][ T4116] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 > [ 103.944118][ T4117] udc-core: couldn't find an available UDC or it's busy > [ 103.944129][ T4117] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 > [ 103.944160][ T4118] udc-core: couldn't find an available UDC or it's busy > [ 103.944169][ T4118] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 > [ 103.944738][ T4112] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 > [ 103.946715][ T4115] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 > [ 103.947159][ T4119] udc-core: couldn't find an available UDC or it's busy > [ 103.947165][ T4119] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 > [ 103.947315][ T4120] udc-core: couldn't find an available UDC or it's busy > [ 103.947324][ T4120] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 > [ 103.948093][ T4121] udc-core: couldn't find an available UDC or it's busy > [ 103.948103][ T4121] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 > [ 103.949190][ T4122] udc-core: couldn't find an available UDC or it's busy > [ 103.950070][ T4123] udc-core: couldn't find an available UDC or it's busy > [ 103.950080][ T4123] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 > [ 103.950580][ T4124] udc-core: couldn't find an available UDC or it's busy > [ 103.950587][ T4124] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 > [ 103.953081][ T4125] udc-core: couldn't find an available UDC or it's busy > [ 103.953430][ T4126] udc-core: couldn't find an available UDC or it's busy > [ 103.953437][ T4126] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 > [ 103.953963][ T4122] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 > [ 103.954780][ T4128] udc-core: couldn't find an available UDC or it's busy > [ 103.954786][ T4128] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 > [ 103.955864][ T4129] udc-core: couldn't find an available UDC or it's busy > [ 103.955872][ T4129] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 > [ 103.955962][ T4127] udc-core: couldn't find an available UDC or it's busy > [ 103.955968][ T4127] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 > [ 103.956010][ T4125] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 > [ 103.958600][ T4130] udc-core: couldn't find an available UDC or it's busy > [ 103.958962][ T4131] udc-core: couldn't find an available UDC or it's busy > [ 103.958972][ T4131] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 > [ 103.959737][ T4132] udc-core: couldn't find an available UDC or it's busy > [ 103.959749][ T4132] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 > [ 103.963405][ T4133] udc-core: couldn't find an available UDC or it's busy > [ 103.963413][ T4133] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 > [ 103.964329][ T4134] udc-core: couldn't find an available UDC or it's busy > [ 103.965201][ T4135] udc-core: couldn't find an available UDC or it's busy > [ 103.965211][ T4135] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 > [ 103.965511][ T4136] udc-core: couldn't find an available UDC or it's busy > [ 103.965520][ T4136] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 > [ 103.966011][ T4130] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 > [ 103.966939][ T4137] udc-core: couldn't find an available UDC or it's busy > [ 103.966946][ T4137] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 > [ 103.967467][ T4138] udc-core: couldn't find an available UDC or it's busy > [ 103.967473][ T4138] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 > [ 103.967909][ T4134] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 > [ 103.969292][ T4139] udc-core: couldn't find an available UDC or it's busy > [ 103.969301][ T4139] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 > [ 103.969708][ T4140] udc-core: couldn't find an available UDC or it's busy > [ 103.969717][ T4140] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 > [ 103.970343][ T4141] udc-core: couldn't find an available UDC or it's busy > [ 103.973658][ T118] hub 5-1:118.0: bad descriptor, ignoring hub > [ 103.974100][ T4142] udc-core: couldn't find an available UDC or it's busy > [ 103.974107][ T4142] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 > [ 103.974518][ T4141] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 > [ 103.974657][ T4143] udc-core: couldn't find an available UDC or it's busy > [ 103.974663][ T4143] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 > [ 103.975767][ T4144] udc-core: couldn't find an available UDC or it's busy > [ 103.975774][ T4144] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 > [ 103.976777][ T118] hub: probe of 5-1:118.0 failed with error -5 > [ 103.977257][ T4145] udc-core: couldn't find an available UDC or it's busy > [ 103.977263][ T4145] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 > [ 103.978020][ T4146] udc-core: couldn't find an available UDC or it's busy > [ 103.978029][ T4146] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 > [ 103.978223][ T4148] udc-core: couldn't find an available UDC or it's busy > [ 103.978229][ T4148] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 > [ 103.978388][ T4147] udc-core: couldn't find an available UDC or it's busy > [ 103.978396][ T4147] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 > [ 103.980149][ T4149] udc-core: couldn't find an available UDC or it's busy > [ 103.982395][ T4151] udc-core: couldn't find an available UDC or it's busy > [ 103.982468][ T4152] udc-core: couldn't find an available UDC or it's busy > [ 103.982478][ T4152] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 > [ 103.982674][ T4153] udc-core: couldn't find an available UDC or it's busy > [ 103.982682][ T4153] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 > [ 103.982780][ T4150] udc-core: couldn't find an available UDC or it's busy > [ 103.982788][ T4150] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 > [ 103.983634][ T4149] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 > [ 103.983843][ T118] cdc_wdm 5-1:118.0: cdc-wdm0: USB WDM device > [ 103.984589][ T4155] udc-core: couldn't find an available UDC or it's busy > [ 103.984595][ T4155] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 > [ 103.985615][ T4156] udc-core: couldn't find an available UDC or it's busy > [ 103.985624][ T4156] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 > [ 103.985747][ T4151] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 > [ 103.985906][ T4157] udc-core: couldn't find an available UDC or it's busy > [ 103.985912][ T4157] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 > [ 103.987838][ T4158] udc-core: couldn't find an available UDC or it's busy > [ 103.987845][ T4158] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 > [ 103.987955][ T4159] udc-core: couldn't find an available UDC or it's busy > [ 103.987962][ T4159] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 > [ 103.988233][ T4160] udc-core: couldn't find an available UDC or it's busy > [ 103.988239][ T4160] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 > [ 103.989778][ T4161] udc-core: couldn't find an available UDC or it's busy > [ 103.990308][ T4162] udc-core: couldn't find an available UDC or it's busy > [ 103.990319][ T4162] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 > [ 103.991055][ T4163] udc-core: couldn't find an available UDC or it's busy > [ 103.993669][ T4161] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 > [ 103.994007][ T4164] udc-core: couldn't find an available UDC or it's busy > [ 103.994142][ T4165] udc-core: couldn't find an available UDC or it's busy > [ 103.994151][ T4165] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 > [ 103.995212][ T4166] udc-core: couldn't find an available UDC or it's busy > [ 103.995222][ T4166] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 > [ 103.996400][ T4163] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 > [ 106.505076][ T4164] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 > [ 124.793950][ T3360] cdc_wdm 5-1:118.0: Error in flush path: -2 > [ 124.793977][ T4165] cdc_wdm 5-1:118.0: Error in flush path: -2 > [ 124.793987][ T4166] cdc_wdm 5-1:118.0: Error in flush path: -2 > [ 124.793997][ T4164] cdc_wdm 5-1:118.0: Error in flush path: -2 > [ 124.794461][ T4163] cdc_wdm 5-1:118.0: Error in flush path: -2 > [ 124.798193][ T4168] udc-core: couldn't find an available UDC or it's busy > [ 124.798218][ T4168] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 > [ 124.798292][ T4169] udc-core: couldn't find an available UDC or it's busy > [ 124.798315][ T4169] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 > [ 124.798531][ T4170] udc-core: couldn't find an available UDC or it's busy > [ 124.798537][ T4170] misc raw-gadget: fail, usb_gadget_probe_driver returned -19 > [ 124.812947][ T4170] There is 2 flushers. > [ 124.812973][ T4161] cdc_wdm 5-1:118.0: Error in flush path: -32 > [ 124.812982][ T4168] cdc_wdm 5-1:118.0: Error in flush path: -32 > [ 124.812987][ T4169] cdc_wdm 5-1:118.0: Error in flush path: -32 > [ 124.823862][ T4170] ------------[ cut here ]------------ > [ 124.825440][ T4170] kernel BUG at drivers/usb/class/cdc-wdm.c:425! > [ 124.827049][ T4170] invalid opcode: 0000 [#1] SMP KASAN > [ 124.828439][ T4170] CPU: 3 PID: 4170 Comm: a.out Not tainted 5.7.0-rc5+ #27 > [ 124.830192][ T4170] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 02/27/2020 > [ 124.833192][ T4170] RIP: 0010:wdm_write+0xce7/0xf00 > [ 124.834683][ T4170] Code: 8b 7c 24 20 e8 ca 16 e5 fd 48 8b 54 24 18 e9 49 fa ff ff 89 34 24 e8 a8 ca be fd 8b 34 24 48 c7 c7 40 0f 01 86 e8 b2 2e aa fd <0f> 0b 48 89 ef e8 1f 16 e5 fd e9 ba f5 ff ff 48 89 cf 48 89 54 24 > [ 124.839523][ T4170] RSP: 0018:ffff8881e5ff7d30 EFLAGS: 00010286 > [ 124.841496][ T4170] RAX: 0000000000000014 RBX: ffff8881ee647800 RCX: ffffffff81d9c9dc > [ 124.843673][ T4170] RDX: 0000000000000000 RSI: ffffffff81283169 RDI: ffff8881f6ce8dec > [ 124.846399][ T4170] RBP: ffff8881ee647890 R08: ffffed103ed9e292 R09: ffff8881f6cf148f > [ 124.848563][ T4170] R10: ffffed103dcc8f13 R11: ffffed103ed9e292 R12: 1ffff1103cbfefac > [ 124.850559][ T4170] R13: 0000000000000010 R14: ffff8881e7fb5fac R15: 0000000000000000 > [ 124.852875][ T4170] FS: 00007fbf56199440(0000) GS:ffff8881f6cc0000(0000) knlGS:0000000000000000 > [ 124.855270][ T4170] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [ 124.857106][ T4170] CR2: 00007fbf55bef225 CR3: 00000001e61cd002 CR4: 00000000003606e0 > [ 124.859118][ T4170] Call Trace: > [ 124.860123][ T4170] ? wdm_probe+0x2b0/0x2b0 > [ 124.861567][ T4170] ? remove_wait_queue+0x190/0x190 > [ 124.862896][ T4170] ? wdm_probe+0x2b0/0x2b0 > [ 124.864214][ T4170] __vfs_write+0x76/0x100 > [ 124.865314][ T4170] vfs_write+0x185/0x510 > [ 124.866418][ T4170] ksys_write+0xff/0x200 > [ 124.867653][ T4170] ? __ia32_sys_read+0xb0/0xb0 > [ 124.868851][ T4170] ? trace_hardirqs_off_caller+0x22/0x1c0 > [ 124.870283][ T4170] do_syscall_64+0x9b/0x510 > [ 124.871575][ T4170] entry_SYSCALL_64_after_hwframe+0x49/0xb3 > [ 124.873352][ T4170] RIP: 0033:0x7fbf55cc2469 > [ 124.874481][ T4170] Code: 00 f3 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d ff 49 2b 00 f7 d8 64 89 01 48 > [ 124.879954][ T4170] RSP: 002b:00007ffdc294dd58 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 > [ 124.882090][ T4170] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007fbf55cc2469 > [ 124.884573][ T4170] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000004 > [ 124.887178][ T4170] RBP: 0000000000000000 R08: 000000000000000f R09: 00007ffdc294dd8e > [ 124.889941][ T4170] R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff > [ 124.892208][ T4170] R13: 00000000000193a4 R14: 0000000000000000 R15: 0000000000000000 > [ 124.894103][ T4170] Modules linked in: > [ 124.895465][ T4170] ---[ end trace 2aea2ce7d52c22e0 ]--- > [ 124.897307][ T4170] RIP: 0010:wdm_write+0xce7/0xf00 > [ 124.898975][ T4170] Code: 8b 7c 24 20 e8 ca 16 e5 fd 48 8b 54 24 18 e9 49 fa ff ff 89 34 24 e8 a8 ca be fd 8b 34 24 48 c7 c7 40 0f 01 86 e8 b2 2e aa fd <0f> 0b 48 89 ef e8 1f 16 e5 fd e9 ba f5 ff ff 48 89 cf 48 89 54 24 > [ 124.905012][ T4170] RSP: 0018:ffff8881e5ff7d30 EFLAGS: 00010286 > [ 124.906793][ T4170] RAX: 0000000000000014 RBX: ffff8881ee647800 RCX: ffffffff81d9c9dc > [ 124.909151][ T4170] RDX: 0000000000000000 RSI: ffffffff81283169 RDI: ffff8881f6ce8dec > [ 124.911276][ T4170] RBP: ffff8881ee647890 R08: ffffed103ed9e292 R09: ffff8881f6cf148f > [ 124.913784][ T4170] R10: ffffed103dcc8f13 R11: ffffed103ed9e292 R12: 1ffff1103cbfefac > [ 124.916882][ T4170] R13: 0000000000000010 R14: ffff8881e7fb5fac R15: 0000000000000000 > [ 124.919078][ T4170] FS: 00007fbf56199440(0000) GS:ffff8881f6cc0000(0000) knlGS:0000000000000000 > [ 124.921484][ T4170] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [ 124.923337][ T4170] CR2: 00007fbf55bef225 CR3: 00000001e61cd002 CR4: 00000000003606e0 > [ 124.925606][ T4170] cdc_wdm 5-1:118.0: Error in flush path: -32 >
On 2020/06/19 22:56, Andrey Konovalov wrote: > Oliver, any chance you could help us with fixing the hang in this > driver? You seem to be its original author. This hang is one of the > top crashers on syzbot, with over 32000 crashed kernels. > Yes, I think that wdm_flush() has another bug and wdm_write() has yet another bug. I need the authors' comments. wdm_flush() says /* cannot dereference desc->intf if WDM_DISCONNECTING */ if (test_bit(WDM_DISCONNECTING, &desc->flags)) return -ENODEV; if (desc->werr < 0) dev_err(&desc->intf->dev, "Error in flush path: %d\n", desc->werr); but it seems to me that nothing guarantees that test_bit(WDM_DISCONNECTING) == false indicates dereferencing desc->intf->dev is safe, for wdm_flush() tests WDM_DISCONNECTING without any lock whereas wdm_disconnect() sets WDM_DISCONNECTING under wdm_mutex and desc->iuspin held. It might be safe to dereference from wdm_release() which holds wdm_mutex. Also, if wait_event() in wdm_flush() might fail to wake up (due to close() dependency problem this crash report is focusing on), wait_event_interruptible() in wdm_write() might also fail to wake up (unless interrupted) due to the same dependency. Then, why can't we wait for completion of wdm_out_callback() (with reasonable timeout) inside wdm_write() ? I feel that wdm_flush() is so bogus (which could/should be removed).
Am Samstag, den 30.05.2020, 13:58 +0900 schrieb Tetsuo Handa: Hi, sorry for taking this long. There has been a family emergency. > > The kernel most definitely does need to protect itself against > > misbehaving hardware. Let's just leave it at that. If you don't > > believe me, ask Greg KH. > > I've made many locations killable (in order to reduce damage caused by OOM > condition). But I can't make locations killable where handling SIGKILL case is > too difficult to implement. We can make flush interruptible. But that will not do the job. We would get a file that cannot be closed. > "struct file_operations"->flush() is called from filp_close() when there is > something which has to be done before "struct file_operations"->release() is > called. Yes, in particular error reporting. Without flush() there is no way to know whether the last write() has actually worked. > As far as I read this thread, what you are trying to do sounds like allow > "not waiting for completion of wdm_out_callback()" with only > 's/wait_event/wait_event_intrruptible/' in wdm_flush(). Then, please do remove > wdm_flush() call itself. That would break error reporting. That flush() waits for IO to complete is basically a side effect. You can know whether IO has worked after it is finished. > Therefore, again, please show me as a patch first. Sure, attached. The difficulty here is that I see three possible interacting errors, two of which are races. Regards Oliver
On 2020/06/24 20:57, Oliver Neukum wrote: > Am Samstag, den 30.05.2020, 13:58 +0900 schrieb Tetsuo Handa: > > Hi, > > sorry for taking this long. There has been a family emergency. No problem. Thank you for responding. >> Therefore, again, please show me as a patch first. > > Sure, attached. The difficulty here is that I see three possible > interacting errors, two of which are races. This thread grew long, and you have many posts to read up to https://lkml.kernel.org/r/c85331fc-874c-6e46-a77f-0ef1dc075308@i-love.sakura.ne.jp .
Am Montag, den 08.06.2020, 11:24 +0900 schrieb Tetsuo Handa: Hi, sorry for being late in reply. I have had an emergency to take care of. > On 2020/05/31 0:47, Alan Stern wrote: > > On Sat, May 30, 2020 at 05:25:11PM +0200, Oliver Neukum wrote: > > > Am Donnerstag, den 28.05.2020, 16:58 -0400 schrieb Alan Stern: > > > > This sounds like a bug in the driver. What would it do if someone had a > > > > > > Arguably yes. I will introduce a timeout. Unfortunately flush() > > > requires a non-interruptible sleep, as you cannot sanely return EAGAIN. > > > > But maybe you can kill some URBs and drop some data. > > You mean call usb_kill_urb() via kill_urbs() ? I have to correct myself. We can return -EINTR. But that is no solution ultimately. We could not close the fd, though we would not hang. > As far as I tested, it seems that usb_kill_urb() sometimes fails to call > wdm_out_callback() despite the comment for usb_kill_urb() says > > * This routine cancels an in-progress request. It is guaranteed that > * upon return all completion handlers will have finished and the URB > * will be totally idle and available for reuse. These features make > * this an ideal way to stop I/O in a disconnect() callback or close() > * function. If the request has not already finished or been unlinked > * the completion handler will see urb->status == -ENOENT. It looks like it does exactly as the description says. Cancelling an URB is by necessity a race condition. It can always finish before you can kill it. > . Is something still wrong? Or just replacing > > BUG_ON(test_bit(WDM_IN_USE, &desc->flags) && > !test_bit(WDM_DISCONNECTING, &desc->flags)); > > with > > wait_event(desc->wait, !test_bit(WDM_IN_USE, &desc->flags) || > test_bit(WDM_DISCONNECTING, &desc->flags)); > > in the patch shown below is sufficient? > > diff --git a/drivers/usb/class/cdc-wdm.c b/drivers/usb/class/cdc-wdm.c > index e3db6fbeadef..3e92e79ce0a0 100644 > --- a/drivers/usb/class/cdc-wdm.c > +++ b/drivers/usb/class/cdc-wdm.c > @@ -151,7 +151,7 @@ static void wdm_out_callback(struct urb *urb) > kfree(desc->outbuf); > desc->outbuf = NULL; > clear_bit(WDM_IN_USE, &desc->flags); > - wake_up(&desc->wait); > + wake_up_all(&desc->wait); > } > > static void wdm_in_callback(struct urb *urb) > @@ -424,6 +424,7 @@ static ssize_t wdm_write > if (rv < 0) { > desc->outbuf = NULL; > clear_bit(WDM_IN_USE, &desc->flags); > + wake_up_all(&desc->wait); > dev_err(&desc->intf->dev, "Tx URB error: %d\n", rv); > rv = usb_translate_errors(rv); > goto out_free_mem_pm; > @@ -587,15 +588,16 @@ static int wdm_flush(struct file *file, fl_owner_t id) > { > struct wdm_device *desc = file->private_data; > > - wait_event(desc->wait, > - /* > - * needs both flags. We cannot do with one > - * because resetting it would cause a race > - * with write() yet we need to signal > - * a disconnect > - */ > - !test_bit(WDM_IN_USE, &desc->flags) || > - test_bit(WDM_DISCONNECTING, &desc->flags)); > + /* > + * needs both flags. We cannot do with one because resetting it would > + * cause a race with write() yet we need to signal a disconnect > + */ > + if (!wait_event_timeout(desc->wait, !test_bit(WDM_IN_USE, &desc->flags) || > + test_bit(WDM_DISCONNECTING, &desc->flags), 20 * HZ)) { > + kill_urbs(desc); No. We cannot just kill all URBs just because one fd's owner wants to flush. In fact we have multiple code paths that can reach the same hang. Could you test the attached patches? Regards Oliver
On 2020/06/25 18:56, Oliver Neukum wrote: > I have to correct myself. We can return -EINTR. > But that is no solution ultimately. We could not close the fd, > though we would not hang. Returning -EINTR upon close() is legal, but I think we should avoid it because it is not safe for userspace process to retry close() upon -EINTR. > In fact we have multiple code paths that can reach the same hang. > Could you test the attached patches? Have you already read https://lore.kernel.org/linux-usb/254939d4-f3a1-8c7e-94e5-9862c02774fa@i-love.sakura.ne.jp/ and https://lore.kernel.org/linux-usb/c85331fc-874c-6e46-a77f-0ef1dc075308@i-love.sakura.ne.jp/ ?
Am Montag, den 08.06.2020, 11:24 +0900 schrieb Tetsuo Handa: I am having mail troubles. This is a test, as mail seems not to be delivered in this thread specifically. Sorry Oliver
On 2020/06/23 20:20, Tetsuo Handa wrote: > Also, if wait_event() in wdm_flush() might fail to wake up (due to close() dependency > problem this crash report is focusing on), wait_event_interruptible() in wdm_write() might > also fail to wake up (unless interrupted) due to the same dependency. Then, why can't we > wait for completion of wdm_out_callback() (with reasonable timeout) inside wdm_write() ? > > I feel that wdm_flush() is so bogus (which could/should be removed). > I'd like to simplify like below. diff --git a/drivers/usb/class/cdc-wdm.c b/drivers/usb/class/cdc-wdm.c index e3db6fbeadef..9631d1054799 100644 --- a/drivers/usb/class/cdc-wdm.c +++ b/drivers/usb/class/cdc-wdm.c @@ -151,7 +151,7 @@ static void wdm_out_callback(struct urb *urb) kfree(desc->outbuf); desc->outbuf = NULL; clear_bit(WDM_IN_USE, &desc->flags); - wake_up(&desc->wait); + wake_up_all(&desc->wait); } static void wdm_in_callback(struct urb *urb) @@ -426,6 +426,7 @@ static ssize_t wdm_write clear_bit(WDM_IN_USE, &desc->flags); dev_err(&desc->intf->dev, "Tx URB error: %d\n", rv); rv = usb_translate_errors(rv); + wake_up_all(&desc->wait); goto out_free_mem_pm; } else { dev_dbg(&desc->intf->dev, "Tx URB has been submitted index=%d\n", @@ -434,6 +435,24 @@ static ssize_t wdm_write usb_autopm_put_interface(desc->intf); mutex_unlock(&desc->wlock); + if (rv >= 0 && + /* + * needs both flags. We cannot do with one + * because resetting it would cause a race + * with write() yet we need to signal + * a disconnect + */ + wait_event_killable_timeout(desc->wait, + !test_bit(WDM_IN_USE, &desc->flags) || + test_bit(WDM_DISCONNECTING, &desc->flags), 30 * HZ) == 0) { + if (mutex_lock_killable(&desc->wlock) == 0) { + if (!test_bit(WDM_DISCONNECTING, &desc->flags)) + dev_err(&desc->intf->dev, + "Tx URB not responding index=%d\n", + le16_to_cpu(req->wIndex)); + mutex_unlock(&desc->wlock); + } + } return count; out_free_mem_pm: @@ -583,30 +602,6 @@ static ssize_t wdm_read return rv; } -static int wdm_flush(struct file *file, fl_owner_t id) -{ - struct wdm_device *desc = file->private_data; - - wait_event(desc->wait, - /* - * needs both flags. We cannot do with one - * because resetting it would cause a race - * with write() yet we need to signal - * a disconnect - */ - !test_bit(WDM_IN_USE, &desc->flags) || - test_bit(WDM_DISCONNECTING, &desc->flags)); - - /* cannot dereference desc->intf if WDM_DISCONNECTING */ - if (test_bit(WDM_DISCONNECTING, &desc->flags)) - return -ENODEV; - if (desc->werr < 0) - dev_err(&desc->intf->dev, "Error in flush path: %d\n", - desc->werr); - - return usb_translate_errors(desc->werr); -} - static __poll_t wdm_poll(struct file *file, struct poll_table_struct *wait) { struct wdm_device *desc = file->private_data; @@ -730,7 +725,6 @@ static const struct file_operations wdm_fops = { .read = wdm_read, .write = wdm_write, .open = wdm_open, - .flush = wdm_flush, .release = wdm_release, .poll = wdm_poll, .unlocked_ioctl = wdm_ioctl,
Am Donnerstag, den 02.07.2020, 14:44 +0900 schrieb Tetsuo Handa: > > usb_autopm_put_interface(desc->intf) > mutex_unlock(&desc->wlock); > + if (rv >= 0 && > + /* > + * needs both flags. We cannot do with one > + * because resetting it would cause a race > + * with write() yet we need to signal > + * a disconnect > + */ > + wait_event_killable_timeout(desc->wait, > + !test_bit(WDM_IN_USE, &desc->flags) || > + test_bit(WDM_DISCONNECTING, &desc->flags), 30 * HZ) == 0) { > + if (mutex_lock_killable(&desc->wlock) == 0) { > + if (!test_bit(WDM_DISCONNECTING, &desc->flags)) > + dev_err(&desc->intf->dev, > + "Tx URB not responding index=%d\n", > + le16_to_cpu(req->wIndex)); > + mutex_unlock(&desc->wlock); > + } > + } Hi, I am afraid this would 1. serialize the driver, harming performance 2. introduce a race with every timer a task is running Regards Oliver
On 2020/07/02 16:24, Oliver Neukum wrote: > Am Donnerstag, den 02.07.2020, 14:44 +0900 schrieb Tetsuo Handa: > >> >> usb_autopm_put_interface(desc->intf) >> mutex_unlock(&desc->wlock); >> + if (rv >= 0 && >> + /* >> + * needs both flags. We cannot do with one >> + * because resetting it would cause a race >> + * with write() yet we need to signal >> + * a disconnect >> + */ >> + wait_event_killable_timeout(desc->wait, >> + !test_bit(WDM_IN_USE, &desc->flags) || >> + test_bit(WDM_DISCONNECTING, &desc->flags), 30 * HZ) == 0) { >> + if (mutex_lock_killable(&desc->wlock) == 0) { >> + if (!test_bit(WDM_DISCONNECTING, &desc->flags)) >> + dev_err(&desc->intf->dev, >> + "Tx URB not responding index=%d\n", >> + le16_to_cpu(req->wIndex)); >> + mutex_unlock(&desc->wlock); >> + } >> + } > > Hi, > > I am afraid this would > > 1. serialize the driver, harming performance Is wdm_write() called so frequently (e.g. 1000 times per one second) ? > 2. introduce a race with every timer a task is running What is estimated response time from usb_submit_urb() to wdm_out_callback() ? Can it be many seconds? I didn't try your patches at https://lkml.kernel.org/r/1593078968.28236.15.camel@suse.com because it seems to me that your patch does not answer my 3 concerns: (1) wdm_flush() says /* cannot dereference desc->intf if WDM_DISCONNECTING */ if (test_bit(WDM_DISCONNECTING, &desc->flags)) return -ENODEV; if (desc->werr < 0) dev_err(&desc->intf->dev, "Error in flush path: %d\n", desc->werr); but it seems to me that nothing guarantees that test_bit(WDM_DISCONNECTING) == false indicates dereferencing desc->intf->dev is safe, for wdm_flush() tests WDM_DISCONNECTING without any lock whereas wdm_disconnect() sets WDM_DISCONNECTING under wdm_mutex and desc->iuspin held. It might be safe to dereference from wdm_release() which holds wdm_mutex. (2) If wait_event() in wdm_flush() might fail to wake up (due to close() dependency problem this crash report is focusing on), wait_event_interruptible() in wdm_write() might also fail to wake up (unless interrupted) due to the same dependency. Then, why can't we wait for completion of wdm_out_callback() (with reasonable timeout) inside wdm_write() ? (3) While wdm_flush() waits for clearing of WDM_IN_USE using wait_event(), concurrently executed wdm_write() also waits for clearing of WDM_IN_USE using wait_event_interruptible(), and wdm_write() can immediately set WDM_IN_USE again as soon as returning from wait_event_interruptible() even if somebody was already waiting at wdm_flush() to clear WDM_IN_USE. That is, wait_event() in wdm_flush() does not know whether there is usb_submit_urb() request which is started after wait_event() found that WDM_IN_USE was cleared. Then, why does this wait_event() in wdm_flush() want to flush which current thread might not have issued? Current thread synchronously waits for completion of wdm_out_callback() issued by current thread's usb_submit_urb() request might make sense. But how much value is there for current thread waits for completion of wdm_out_callback() issued by other thread's usb_submit_urb() request? Multiple threads can use the same "desc" pointer, and trying to flush upon close() by each thread using that "desc" pointer... If synchronous waiting harms performance so much, do we want to know the error at all? wdm_write() already returned success (the number of bytes passed to write()). And there is no guarantee that the error code which current thread will receive from wmd_flush() corresponds with a request current thread has issued? I'm skeptical about the value of trying to synchronously return an error code for wmd_write() request to the caller. I'm really inclined to remove wdm_flush() completely.
On 2020/07/15 15:15, Tetsuo Handa wrote: > I'm skeptical about the value of trying to synchronously return an error code for > wmd_write() request to the caller. I'm really inclined to remove wdm_flush() completely. > No response. It seems to me that this module is orphaned. Can we go with simply removing wdm_flush() completely?
diff --git a/drivers/usb/class/cdc-wdm.c b/drivers/usb/class/cdc-wdm.c index e3db6fbeadef..bf2a6deb7b08 100644 --- a/drivers/usb/class/cdc-wdm.c +++ b/drivers/usb/class/cdc-wdm.c @@ -151,7 +151,7 @@ static void wdm_out_callback(struct urb *urb) kfree(desc->outbuf); desc->outbuf = NULL; clear_bit(WDM_IN_USE, &desc->flags); - wake_up(&desc->wait); + wake_up_all(&desc->wait); } static void wdm_in_callback(struct urb *urb) @@ -426,6 +426,7 @@ static ssize_t wdm_write clear_bit(WDM_IN_USE, &desc->flags); dev_err(&desc->intf->dev, "Tx URB error: %d\n", rv); rv = usb_translate_errors(rv); + wake_up_all(&desc->wait); goto out_free_mem_pm; } else { dev_dbg(&desc->intf->dev, "Tx URB has been submitted index=%d\n",
Since e.g. wdm_flush() calls wait_event() on WDM_IN_USE bit, make sure to call wake_up_all() when clearing WDM_IN_USE bit. Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> --- drivers/usb/class/cdc-wdm.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)