diff mbox series

USB: cdc-wdm: Call wake_up_all() when clearing WDM_IN_USE bit.

Message ID 20200520233129.3704-1-penguin-kernel@I-love.SAKURA.ne.jp (mailing list archive)
State New, archived
Headers show
Series USB: cdc-wdm: Call wake_up_all() when clearing WDM_IN_USE bit. | expand

Commit Message

Tetsuo Handa May 20, 2020, 11:31 p.m. UTC
Since e.g. wdm_flush() calls wait_event() on WDM_IN_USE bit,
make sure to call wake_up_all() when clearing WDM_IN_USE bit.

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
---
 drivers/usb/class/cdc-wdm.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Comments

Greg KH May 21, 2020, 7:33 a.m. UTC | #1
On Thu, May 21, 2020 at 08:31:29AM +0900, Tetsuo Handa wrote:
> Since e.g. wdm_flush() calls wait_event() on WDM_IN_USE bit,
> make sure to call wake_up_all() when clearing WDM_IN_USE bit.

I'm sorry, but I don't understand what this changelog text is trying to
say.  Can you reword this differently?

> 
> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>

And is this a bugfix?  For what?  Does it need to go to stable kernels?

> ---
>  drivers/usb/class/cdc-wdm.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/usb/class/cdc-wdm.c b/drivers/usb/class/cdc-wdm.c
> index e3db6fbeadef..bf2a6deb7b08 100644
> --- a/drivers/usb/class/cdc-wdm.c
> +++ b/drivers/usb/class/cdc-wdm.c
> @@ -151,7 +151,7 @@ static void wdm_out_callback(struct urb *urb)
>  	kfree(desc->outbuf);
>  	desc->outbuf = NULL;
>  	clear_bit(WDM_IN_USE, &desc->flags);
> -	wake_up(&desc->wait);
> +	wake_up_all(&desc->wait);
>  }
>  
>  static void wdm_in_callback(struct urb *urb)
> @@ -426,6 +426,7 @@ static ssize_t wdm_write
>  		clear_bit(WDM_IN_USE, &desc->flags);
>  		dev_err(&desc->intf->dev, "Tx URB error: %d\n", rv);
>  		rv = usb_translate_errors(rv);
> +		wake_up_all(&desc->wait);

Is this a fix for two different issues?

confused,

greg k-h
Tetsuo Handa May 21, 2020, 10:01 a.m. UTC | #2
On 2020/05/21 16:33, Greg KH wrote:
> On Thu, May 21, 2020 at 08:31:29AM +0900, Tetsuo Handa wrote:
>> Since e.g. wdm_flush() calls wait_event() on WDM_IN_USE bit,
>> make sure to call wake_up_all() when clearing WDM_IN_USE bit.
> 
> I'm sorry, but I don't understand what this changelog text is trying to
> say.  Can you reword this differently?

wait_event("condition") wants corresponding wake_up() (or its variant)
after making "condition" == true. For example, wdm_flush() calls

	wait_event(desc->wait,
			/*
			 * needs both flags. We cannot do with one
			 * because resetting it would cause a race
			 * with write() yet we need to signal
			 * a disconnect
			 */
			!test_bit(WDM_IN_USE, &desc->flags) ||
			test_bit(WDM_DISCONNECTING, &desc->flags));

but wdm_write() is not calling wake_up(&desc->wait) after
clear_bit(WDM_IN_USE, &desc->flags) when usb_submit_urb() failed.

> And is this a bugfix?  For what?  Does it need to go to stable kernels?

Potential bugfix. syzbot is reporting two bugs that hung at wdm_flush(),
but I think that this patch won't fix these bugs. Therefore, I don't think
this patch needs to go to stable kernels.
Oliver Neukum May 21, 2020, 7:50 p.m. UTC | #3
Am Donnerstag, den 21.05.2020, 19:01 +0900 schrieb Tetsuo Handa:
> 


Hi,

interesting. Do you have a test case for these patches working?

> 	wait_event(desc->wait,
> 			/*
> 			 * needs both flags. We cannot do with one
> 			 * because resetting it would cause a race
> 			 * with write() yet we need to signal
> 			 * a disconnect
> 			 */
> 			!test_bit(WDM_IN_USE, &desc->flags) ||
> 			test_bit(WDM_DISCONNECTING, &desc->flags));
> 
> but wdm_write() is not calling wake_up(&desc->wait) after
> clear_bit(WDM_IN_USE, &desc->flags) when usb_submit_urb() failed.

Yes, because desc->wlock is held. There can be nobody sleeping here.

> > And is this a bugfix?  For what?  Does it need to go to stable kernels?
> 
> Potential bugfix. syzbot is reporting two bugs that hung at wdm_flush(),
> but I think that this patch won't fix these bugs. Therefore, I don't think
> this patch needs to go to stable kernels.

Do you have links for them?

	Regards
		Oliver
Tetsuo Handa May 21, 2020, 10:48 p.m. UTC | #4
On 2020/05/22 4:50, Oliver Neukum wrote:
> interesting. Do you have a test case for these patches working?

No.

> 
>> 	wait_event(desc->wait,
>> 			/*
>> 			 * needs both flags. We cannot do with one
>> 			 * because resetting it would cause a race
>> 			 * with write() yet we need to signal
>> 			 * a disconnect
>> 			 */
>> 			!test_bit(WDM_IN_USE, &desc->flags) ||
>> 			test_bit(WDM_DISCONNECTING, &desc->flags));
>>
>> but wdm_write() is not calling wake_up(&desc->wait) after
>> clear_bit(WDM_IN_USE, &desc->flags) when usb_submit_urb() failed.
> 
> Yes, because desc->wlock is held. There can be nobody sleeping here.

Then, this patch is not needed. (But adding some comment is welcomed.)

> 
>>> And is this a bugfix?  For what?  Does it need to go to stable kernels?
>>
>> Potential bugfix. syzbot is reporting two bugs that hung at wdm_flush(),
>> but I think that this patch won't fix these bugs. Therefore, I don't think
>> this patch needs to go to stable kernels.
> 
> Do you have links for them?

It turned out that the second bug report is a duplicate of the first bug report.
The dashboard link for the first bug report is
https://syzkaller.appspot.com/bug?id=e7b761593b23eb50855b9ea31e3be5472b711186 .
Oliver Neukum May 22, 2020, 8:04 a.m. UTC | #5
Am Freitag, den 22.05.2020, 07:48 +0900 schrieb Tetsuo Handa:


Hi,

I looked at your patch again and I am impressed and I need to
apologize. I looked at only the intended use, but overlooked
the unintended use.

I think we need to work on the description of the patch though.

> On 2020/05/22 4:50, Oliver Neukum wrote:
> > interesting. Do you have a test case for these patches working?
> 
> No.

Yes, going through the logs it looks that you need to trigger
an error case for this race to strike.

> > > 	wait_event(desc->wait,
> > > 			/*
> > > 			 * needs both flags. We cannot do with one
> > > 			 * because resetting it would cause a race
> > > 			 * with write() yet we need to signal
> > > 			 * a disconnect
> > > 			 */
> > > 			!test_bit(WDM_IN_USE, &desc->flags) ||
> > > 			test_bit(WDM_DISCONNECTING, &desc->flags));
> > > 
> > > but wdm_write() is not calling wake_up(&desc->wait) after
> > > clear_bit(WDM_IN_USE, &desc->flags) when usb_submit_urb() failed.
> > 
> > Yes, because desc->wlock is held. There can be nobody sleeping here.
> 
> Then, this patch is not needed. (But adding some comment is welcomed.)

OK and here I screwed up. wlock is held in wdm_write(), but not in
wdm_flush()

So may I suggest the following log:

--

WDM_IN_USE is used in wdm_write() to protect against concurrent writes
and in wdm_flush() to wait for all messages to flush, so that errors
are not lost. The former use is guarded by a mutex, the later, as
it does no IO, is not.
In the error case of wdm_write(), however, there is a race, which
can make wdm_flush() wait for IO that was supposed to be started
but was never started due to an error condition.
Hence if an error is detected in wdm_write() after WDM_IN_USE was
toggeled, all potential waiters must be woken. As multiple tasks
can be in wdm_flush, wake_up_all() must be used in all cases
WDM_IN_USE is reset.

--

May I ask you to redo the patch with comments added stating
that the wake up is done for the sake of wdm_flush(), change
the description and add the link to syzkaller?

> > > > And is this a bugfix?  For what?  Does it need to go to stable kernels?

Yes, it is. The bug is ancient. It goes back to afba937e540c9,
which introduced the driver.


Again, thank you for this impressive piece of debugging.

	Regards
		Oliver
Tetsuo Handa May 22, 2020, 8:26 a.m. UTC | #6
On 2020/05/22 17:04, Oliver Neukum wrote:
> May I ask you to redo the patch with comments added stating
> that the wake up is done for the sake of wdm_flush(), change
> the description and add the link to syzkaller?

You can take over this patch. syzbot tried this patch on 2020/02/11 01:23 at
https://syzkaller.appspot.com/bug?id=e7b761593b23eb50855b9ea31e3be5472b711186 ,
but this patch did not solve the problem syzbot has found. Thus, I don't add
a link to syzkaller...
Oliver Neukum May 25, 2020, 12:06 p.m. UTC | #7
Am Freitag, den 22.05.2020, 17:26 +0900 schrieb Tetsuo Handa:
> On 2020/05/22 17:04, Oliver Neukum wrote:
> > May I ask you to redo the patch with comments added stating
> > that the wake up is done for the sake of wdm_flush(), change
> > the description and add the link to syzkaller?
> 
> You can take over this patch. syzbot tried this patch on 2020/02/11 01:23 at
> https://syzkaller.appspot.com/bug?id=e7b761593b23eb50855b9ea31e3be5472b711186 ,
> but this patch did not solve the problem syzbot has found. Thus, I don't add
> a link to syzkaller...

Hi,

this is odd. I looked at the driver and it looks to me like
pre/post_reset() is the only other place that touches WDM_IN_USE.
And it does so correctly. Any idea what could be wrong?

	Regards
		Oliver
Tetsuo Handa May 25, 2020, 1:32 p.m. UTC | #8
On 2020/05/25 21:06, Oliver Neukum wrote:
> Am Freitag, den 22.05.2020, 17:26 +0900 schrieb Tetsuo Handa:
>> On 2020/05/22 17:04, Oliver Neukum wrote:
>>> May I ask you to redo the patch with comments added stating
>>> that the wake up is done for the sake of wdm_flush(), change
>>> the description and add the link to syzkaller?
>>
>> You can take over this patch. syzbot tried this patch on 2020/02/11 01:23 at
>> https://syzkaller.appspot.com/bug?id=e7b761593b23eb50855b9ea31e3be5472b711186 ,
>> but this patch did not solve the problem syzbot has found. Thus, I don't add
>> a link to syzkaller...
> 
> Hi,
> 
> this is odd. I looked at the driver and it looks to me like
> pre/post_reset() is the only other place that touches WDM_IN_USE.
> And it does so correctly. Any idea what could be wrong?

I haven't understood the root cause of this problem. You were in the thread at
https://groups.google.com/forum/#!msg/syzkaller-bugs/q3QMjt_t83c/GqP58GaTAQAJ .
My patch was made from that thread (which simply makes sure to call wake_up_all()
when clearing WDM_IN_USE bit) apart from the root cause of this problem.
Tetsuo Handa May 27, 2020, 4:47 a.m. UTC | #9
On 2020/05/25 22:32, Tetsuo Handa wrote:
> On 2020/05/25 21:06, Oliver Neukum wrote:
>> Am Freitag, den 22.05.2020, 17:26 +0900 schrieb Tetsuo Handa:
>>> On 2020/05/22 17:04, Oliver Neukum wrote:
>>>> May I ask you to redo the patch with comments added stating
>>>> that the wake up is done for the sake of wdm_flush(), change
>>>> the description and add the link to syzkaller?
>>>
>>> You can take over this patch. syzbot tried this patch on 2020/02/11 01:23 at
>>> https://syzkaller.appspot.com/bug?id=e7b761593b23eb50855b9ea31e3be5472b711186 ,
>>> but this patch did not solve the problem syzbot has found. Thus, I don't add
>>> a link to syzkaller...
>>
>> Hi,
>>
>> this is odd. I looked at the driver and it looks to me like
>> pre/post_reset() is the only other place that touches WDM_IN_USE.
>> And it does so correctly. Any idea what could be wrong?
> 
> I haven't understood the root cause of this problem. You were in the thread at
> https://groups.google.com/forum/#!msg/syzkaller-bugs/q3QMjt_t83c/GqP58GaTAQAJ .
> My patch was made from that thread (which simply makes sure to call wake_up_all()
> when clearing WDM_IN_USE bit) apart from the root cause of this problem.
> 

I might have found what is wrong.

My understanding is that a process using /dev/raw-gadget is responsible for
reacting to every USB request. I don't know whether /dev/raw-gadget already
provides callback for aborting the in-flight USB requests (in order to resume
wdm_flush()) when /dev/raw-gadget is closed (due to explicit close() syscall or
implicit exit_files() from do_exit() upon SIGKILL). I assume /dev/raw-gadget
already provides such callback in the following paragraphs.

Since the reproducer is opening both /dev/raw-gadget (which is char-10-62) and
/dev/cdc-wdm0 (which is char-180-0), it seems that the kernel is falling into
deadlock condition due to the need to close both files when the reproducer is
killed. My guess is that since that process is stuck at wdm_flush() (due to
explicit close() syscall or implicit exit_files() from do_exit() upon SIGKILL),
that process cannot react to USB requests which are needed for resuming wdm_flush().
Unexpectedly blocking a process which is responsible for reacting to USB requests
will look as if it is a broken hardware.

A debug printk() patch (shown below) says that the desc->command urb issued by
wdm_write() never gets wdm_out_callback(). I guess that it is an evidence of
such dependency exists, and I think that a process had better avoid opening both
/dev/raw-gadget and /dev/cdc-wdm0 at the same time.

----------------------------------------
diff --git a/arch/x86/kernel/unwind_orc.c b/arch/x86/kernel/unwind_orc.c
index 5b0bd8581fe6..e352b54dbbf2 100644
--- a/arch/x86/kernel/unwind_orc.c
+++ b/arch/x86/kernel/unwind_orc.c
@@ -320,12 +320,19 @@ EXPORT_SYMBOL_GPL(unwind_get_return_address);
 
 unsigned long *unwind_get_return_address_ptr(struct unwind_state *state)
 {
+	struct task_struct *task = state->task;
+
 	if (unwind_done(state))
 		return NULL;
 
 	if (state->regs)
 		return &state->regs->ip;
 
+	if (task != current && state->sp == task->thread.sp) {
+		struct inactive_task_frame *frame = (void *)task->thread.sp;
+		return &frame->ret_addr;
+	}
+
 	if (state->sp)
 		return (unsigned long *)state->sp - 1;
 
diff --git a/drivers/usb/class/cdc-wdm.c b/drivers/usb/class/cdc-wdm.c
index e3db6fbeadef..5d28a1a2521d 100644
--- a/drivers/usb/class/cdc-wdm.c
+++ b/drivers/usb/class/cdc-wdm.c
@@ -150,8 +150,9 @@ static void wdm_out_callback(struct urb *urb)
 	spin_unlock_irqrestore(&desc->iuspin, flags);
 	kfree(desc->outbuf);
 	desc->outbuf = NULL;
+	printk("%s : clear WDM_IN_USE %px\n", __func__, desc);
 	clear_bit(WDM_IN_USE, &desc->flags);
-	wake_up(&desc->wait);
+	wake_up_all(&desc->wait);
 }
 
 static void wdm_in_callback(struct urb *urb)
@@ -417,17 +418,21 @@ static ssize_t wdm_write
 	req->wValue = 0;
 	req->wIndex = desc->inum; /* already converted */
 	req->wLength = cpu_to_le16(count);
+	printk("%s : set WDM_IN_USE %px %px\n", __func__, desc, desc->command);
 	set_bit(WDM_IN_USE, &desc->flags);
 	desc->outbuf = buf;
 
 	rv = usb_submit_urb(desc->command, GFP_KERNEL);
 	if (rv < 0) {
 		desc->outbuf = NULL;
+		printk("%s : clear WDM_IN_USE %px\n", __func__, desc);
 		clear_bit(WDM_IN_USE, &desc->flags);
 		dev_err(&desc->intf->dev, "Tx URB error: %d\n", rv);
 		rv = usb_translate_errors(rv);
+		wake_up_all(&desc->wait);
 		goto out_free_mem_pm;
 	} else {
+		printk("%s : waiting for wdm_out_callback %px\n", __func__, desc);
 		dev_dbg(&desc->intf->dev, "Tx URB has been submitted index=%d\n",
 			le16_to_cpu(req->wIndex));
 	}
@@ -587,6 +592,7 @@ static int wdm_flush(struct file *file, fl_owner_t id)
 {
 	struct wdm_device *desc = file->private_data;
 
+	printk("%s : start wait_event %px\n", __func__, desc);
 	wait_event(desc->wait,
 			/*
 			 * needs both flags. We cannot do with one
@@ -596,6 +602,7 @@ static int wdm_flush(struct file *file, fl_owner_t id)
 			 */
 			!test_bit(WDM_IN_USE, &desc->flags) ||
 			test_bit(WDM_DISCONNECTING, &desc->flags));
+	printk("%s : end wait_event %px\n", __func__, desc);
 
 	/* cannot dereference desc->intf if WDM_DISCONNECTING */
 	if (test_bit(WDM_DISCONNECTING, &desc->flags))
@@ -642,6 +649,7 @@ static int wdm_open(struct inode *inode, struct file *file)
 
 	mutex_lock(&wdm_mutex);
 	desc = wdm_find_device_by_minor(minor);
+	printk("%s : start %px\n", __func__, desc);
 	if (!desc)
 		goto out;
 
@@ -676,6 +684,7 @@ static int wdm_open(struct inode *inode, struct file *file)
 		desc->manage_power(intf, 1);
 	usb_autopm_put_interface(desc->intf);
 out:
+	printk("%s : done %px\n", __func__, desc);
 	mutex_unlock(&wdm_mutex);
 	return rv;
 }
@@ -685,6 +694,7 @@ static int wdm_release(struct inode *inode, struct file *file)
 	struct wdm_device *desc = file->private_data;
 
 	mutex_lock(&wdm_mutex);
+	printk("%s : start %px\n", __func__, desc);
 
 	/* using write lock to protect desc->count */
 	mutex_lock(&desc->wlock);
@@ -694,6 +704,7 @@ static int wdm_release(struct inode *inode, struct file *file)
 	if (!desc->count) {
 		if (!test_bit(WDM_DISCONNECTING, &desc->flags)) {
 			dev_dbg(&desc->intf->dev, "wdm_release: cleanup\n");
+			printk("%s : kill_urbs() %px\n", __func__, desc);
 			kill_urbs(desc);
 			spin_lock_irq(&desc->iuspin);
 			desc->resp_count = 0;
@@ -705,6 +716,7 @@ static int wdm_release(struct inode *inode, struct file *file)
 			cleanup(desc);
 		}
 	}
+	printk("%s : done %px\n", __func__, desc);
 	mutex_unlock(&wdm_mutex);
 	return 0;
 }
@@ -832,6 +844,7 @@ static int wdm_create(struct usb_interface *intf, struct usb_endpoint_descriptor
 	desc->command = usb_alloc_urb(0, GFP_KERNEL);
 	if (!desc->command)
 		goto err;
+	desc->command->trace_me = true;
 
 	desc->ubuf = kmalloc(desc->wMaxCommand, GFP_KERNEL);
 	if (!desc->ubuf)
@@ -979,15 +992,18 @@ static void wdm_disconnect(struct usb_interface *intf)
 	usb_deregister_dev(intf, &wdm_class);
 	desc = wdm_find_device(intf);
 	mutex_lock(&wdm_mutex);
+	printk("%s : start %px\n", __func__, desc);
 
 	/* the spinlock makes sure no new urbs are generated in the callbacks */
 	spin_lock_irqsave(&desc->iuspin, flags);
+	printk("%s : set WDM_DISCONNECTING %px\n", __func__, desc);
 	set_bit(WDM_DISCONNECTING, &desc->flags);
 	set_bit(WDM_READ, &desc->flags);
 	spin_unlock_irqrestore(&desc->iuspin, flags);
 	wake_up_all(&desc->wait);
 	mutex_lock(&desc->rlock);
 	mutex_lock(&desc->wlock);
+	printk("%s : kill_urbs() %px\n", __func__, desc);
 	kill_urbs(desc);
 	cancel_work_sync(&desc->rxwork);
 	cancel_work_sync(&desc->service_outs_intr);
@@ -1003,6 +1019,7 @@ static void wdm_disconnect(struct usb_interface *intf)
 		cleanup(desc);
 	else
 		dev_dbg(&intf->dev, "%d open files - postponing cleanup\n", desc->count);
+	printk("%s : done %px\n", __func__, desc);
 	mutex_unlock(&wdm_mutex);
 }
 
@@ -1031,6 +1048,7 @@ static int wdm_suspend(struct usb_interface *intf, pm_message_t message)
 		set_bit(WDM_SUSPENDING, &desc->flags);
 		spin_unlock_irq(&desc->iuspin);
 		/* callback submits work - order is essential */
+		printk("%s : kill_urbs() %px\n", __func__, desc);
 		kill_urbs(desc);
 		cancel_work_sync(&desc->rxwork);
 		cancel_work_sync(&desc->service_outs_intr);
@@ -1082,15 +1100,19 @@ static int wdm_pre_reset(struct usb_interface *intf)
 	 * must be done before recovery lest a spontaneous
 	 * message from the device is lost
 	 */
+	printk("%s : start\n", __func__);
 	spin_lock_irq(&desc->iuspin);
+	printk("%s : set WDM_RESETTING %px\n", __func__, desc);
 	set_bit(WDM_RESETTING, &desc->flags);	/* inform read/write */
 	set_bit(WDM_READ, &desc->flags);	/* unblock read */
+	printk("%s : clear WDM_IN_USE %px\n", __func__, desc);
 	clear_bit(WDM_IN_USE, &desc->flags);	/* unblock write */
 	desc->rerr = -EINTR;
 	spin_unlock_irq(&desc->iuspin);
 	wake_up_all(&desc->wait);
 	mutex_lock(&desc->rlock);
 	mutex_lock(&desc->wlock);
+	printk("%s : kill_urbs() %px\n", __func__, desc);
 	kill_urbs(desc);
 	cancel_work_sync(&desc->rxwork);
 	cancel_work_sync(&desc->service_outs_intr);
@@ -1103,10 +1125,12 @@ static int wdm_post_reset(struct usb_interface *intf)
 	int rv;
 
 	clear_bit(WDM_OVERFLOW, &desc->flags);
+	printk("%s : clear WDM_RESETTING %px\n", __func__, desc);
 	clear_bit(WDM_RESETTING, &desc->flags);
 	rv = recover_from_urb_loss(desc);
 	mutex_unlock(&desc->wlock);
 	mutex_unlock(&desc->rlock);
+	printk("%s : done\n", __func__);
 	return rv;
 }
 
diff --git a/drivers/usb/core/hcd.c b/drivers/usb/core/hcd.c
index aa45840d8273..ab3d3b65a6cf 100644
--- a/drivers/usb/core/hcd.c
+++ b/drivers/usb/core/hcd.c
@@ -1521,6 +1521,8 @@ int usb_hcd_submit_urb (struct urb *urb, gfp_t mem_flags)
 	int			status;
 	struct usb_hcd		*hcd = bus_to_hcd(urb->dev->bus);
 
+	if (urb->trace_me)
+		printk("%s : start %px\n", __func__, urb);
 	/* increment urb's reference count as part of giving it to the HCD
 	 * (which will control it).  HCD guarantees that it either returns
 	 * an error or calls giveback(), but not both.
@@ -1550,6 +1552,8 @@ int usb_hcd_submit_urb (struct urb *urb, gfp_t mem_flags)
 	}
 
 	if (unlikely(status)) {
+		if (urb->trace_me)
+			printk("%s : error %px %d\n", __func__, urb, status);
 		usbmon_urb_submit_error(&hcd->self, urb, status);
 		urb->hcpriv = NULL;
 		INIT_LIST_HEAD(&urb->urb_list);
@@ -1559,6 +1563,8 @@ int usb_hcd_submit_urb (struct urb *urb, gfp_t mem_flags)
 			wake_up(&usb_kill_urb_queue);
 		usb_put_urb(urb);
 	}
+	if (urb->trace_me)
+		printk("%s : end %px %d\n", __func__, urb, status);
 	return status;
 }
 
@@ -1582,6 +1588,8 @@ static int unlink1(struct usb_hcd *hcd, struct urb *urb, int status)
 		 * Such failures should be harmless. */
 		value = hcd->driver->urb_dequeue(hcd, urb, status);
 	}
+	if (urb->trace_me)
+		printk("%s : %px %d\n", __func__, urb, value);
 	return value;
 }
 
@@ -1645,6 +1653,8 @@ static void __usb_hcd_giveback_urb(struct urb *urb)
 
 	/* pass ownership to the completion handler */
 	urb->status = status;
+	if (urb->trace_me)
+		printk("%s : complete %px %d\n", __func__, urb, status);
 	urb->complete(urb);
 
 	usb_anchor_resume_wakeups(anchor);
diff --git a/drivers/usb/gadget/legacy/raw_gadget.c b/drivers/usb/gadget/legacy/raw_gadget.c
index e01e366d89cd..9adcc978804e 100644
--- a/drivers/usb/gadget/legacy/raw_gadget.c
+++ b/drivers/usb/gadget/legacy/raw_gadget.c
@@ -33,6 +33,10 @@ MODULE_DESCRIPTION(DRIVER_DESC);
 MODULE_AUTHOR("Andrey Konovalov");
 MODULE_LICENSE("GPL");
 
+#undef dev_dbg
+#define dev_dbg(dev, fmt, ...)						\
+	dev_printk(KERN_INFO, dev, dev_fmt(fmt), ##__VA_ARGS__)
+
 /*----------------------------------------------------------------------*/
 
 #define RAW_EVENT_QUEUE_SIZE	16
diff --git a/include/linux/usb.h b/include/linux/usb.h
index 9f3c721c70dc..0843e07f6cc2 100644
--- a/include/linux/usb.h
+++ b/include/linux/usb.h
@@ -1589,6 +1589,7 @@ struct urb {
 	int error_count;		/* (return) number of ISO errors */
 	void *context;			/* (in) context for completion */
 	usb_complete_t complete;	/* (in) completion routine */
+	bool trace_me;
 	struct usb_iso_packet_descriptor iso_frame_desc[];
 					/* (in) ISO ONLY */
 };
----------------------------------------

----------------------------------------
[   25.056066][   T12] usb 5-1: New USB device found, idVendor=2400, idProduct=4200, bcdDevice=42.ef
[   25.056724][ T1175] udc-core: couldn't find an available UDC or it's busy
[   25.056730][ T1175] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
[   25.057711][ T1176] udc-core: couldn't find an available UDC or it's busy
[   25.057719][ T1176] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
[   25.058090][ T1171] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
[   25.058104][ T1178] udc-core: couldn't find an available UDC or it's busy
[   25.058233][ T1177] udc-core: couldn't find an available UDC or it's busy
[   25.058238][ T1177] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
[   25.058685][ T1179] udc-core: couldn't find an available UDC or it's busy
[   25.058692][ T1179] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
[   25.060237][   T12] usb 5-1: New USB device strings: Mfr=0, Product=0, SerialNumber=2
[   25.060250][ T1180] udc-core: couldn't find an available UDC or it's busy
[   25.060255][ T1180] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
[   25.060737][ T1181] udc-core: couldn't find an available UDC or it's busy
[   25.060741][ T1181] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
[   25.060836][ T1182] udc-core: couldn't find an available UDC or it's busy
[   25.060840][ T1182] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
[   25.061235][ T1183] udc-core: couldn't find an available UDC or it's busy
[   25.061241][ T1183] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
[   25.062374][ T1178] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
[   25.064298][   T12] usb 5-1: SerialNumber: syz
[   25.065151][ T1185] udc-core: couldn't find an available UDC or it's busy
[   25.065161][ T1184] udc-core: couldn't find an available UDC or it's busy
[   25.065168][ T1184] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
[   25.065191][ T1186] udc-core: couldn't find an available UDC or it's busy
[   25.065195][ T1186] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
[   25.065451][ T1187] udc-core: couldn't find an available UDC or it's busy
[   25.065457][ T1187] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
[   25.068979][ T1189] udc-core: couldn't find an available UDC or it's busy
[   25.070166][ T1185] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
[   25.072145][ T1191] udc-core: couldn't find an available UDC or it's busy
[   25.072151][ T1191] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
[   25.072166][ T1190] udc-core: couldn't find an available UDC or it's busy
[   25.072174][ T1190] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
[   25.072254][ T1189] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
[   25.073991][ T1193] udc-core: couldn't find an available UDC or it's busy
[   25.073997][ T1193] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
[   25.074002][ T1192] udc-core: couldn't find an available UDC or it's busy
[   25.074006][ T1192] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
[   25.087896][ T1196] udc-core: couldn't find an available UDC or it's busy
[   25.087933][ T1195] udc-core: couldn't find an available UDC or it's busy
[   25.087941][ T1195] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
[   25.088229][ T1197] udc-core: couldn't find an available UDC or it's busy
[   25.088235][ T1197] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
[   25.088239][ T1194] udc-core: couldn't find an available UDC or it's busy
[   25.088244][ T1194] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
[   25.090788][ T1198] udc-core: couldn't find an available UDC or it's busy
[   25.092032][ T1196] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
[   25.093018][ T1199] udc-core: couldn't find an available UDC or it's busy
[   25.093023][ T1200] udc-core: couldn't find an available UDC or it's busy
[   25.093027][ T1200] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
[   25.093886][ T1198] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
[   25.094586][ T1201] udc-core: couldn't find an available UDC or it's busy
[   25.095028][ T1202] udc-core: couldn't find an available UDC or it's busy
[   25.095032][ T1202] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
[   25.095403][ T1204] udc-core: couldn't find an available UDC or it's busy
[   25.095409][ T1204] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
[   25.095411][ T1203] udc-core: couldn't find an available UDC or it's busy
[   25.095416][ T1203] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
[   25.095896][ T1199] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
[   25.099620][ T1205] udc-core: couldn't find an available UDC or it's busy
[   25.100765][ T1201] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
[   25.102512][ T1205] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
[   25.104434][ T1207] udc-core: couldn't find an available UDC or it's busy
[   25.104439][ T1207] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
[   25.104483][ T1206] udc-core: couldn't find an available UDC or it's busy
[   25.104489][ T1206] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
[   25.104516][ T1208] udc-core: couldn't find an available UDC or it's busy
[   25.104521][ T1208] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
[   25.122543][   T12] hub 5-1:118.0: bad descriptor, ignoring hub
[   25.122597][ T1210] udc-core: couldn't find an available UDC or it's busy
[   25.122604][ T1210] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
[   25.122976][ T1211] udc-core: couldn't find an available UDC or it's busy
[   25.122983][ T1211] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
[   25.123591][ T1212] udc-core: couldn't find an available UDC or it's busy
[   25.123596][ T1212] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
[   25.123654][ T1209] udc-core: couldn't find an available UDC or it's busy
[   25.123659][ T1209] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
[   25.124649][ T1213] udc-core: couldn't find an available UDC or it's busy
[   25.124809][ T1214] udc-core: couldn't find an available UDC or it's busy
[   25.124814][ T1214] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
[   25.125168][ T1215] udc-core: couldn't find an available UDC or it's busy
[   25.125174][ T1215] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
[   25.126547][   T12] hub: probe of 5-1:118.0 failed with error -5
[   25.128767][ T1213] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
[   25.129656][ T1216] udc-core: couldn't find an available UDC or it's busy
[   25.129662][ T1216] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
[   25.129678][ T1217] udc-core: couldn't find an available UDC or it's busy
[   25.129684][ T1217] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
[   25.129968][ T1218] udc-core: couldn't find an available UDC or it's busy
[   25.129973][ T1218] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
[   25.132093][   T12] cdc_wdm 5-1:118.0: cdc-wdm0: USB WDM device
[   25.132549][ T1219] udc-core: couldn't find an available UDC or it's busy
[   25.132554][ T1219] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
[   25.133379][ T1222] udc-core: couldn't find an available UDC or it's busy
[   25.133385][ T1222] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
[   25.133552][ T1223] udc-core: couldn't find an available UDC or it's busy
[   25.133708][ T1221] udc-core: couldn't find an available UDC or it's busy
[   25.133713][ T1221] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
[   25.135592][ T1224] udc-core: couldn't find an available UDC or it's busy
[   25.135598][ T1224] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
[   25.136158][   T12] wdm_suspend : kill_urbs() ffff88822dffd000
[   25.136578][ T1225] udc-core: couldn't find an available UDC or it's busy
[   25.136583][ T1225] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
[   25.136638][ T1226] udc-core: couldn't find an available UDC or it's busy
[   25.136642][ T1226] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
[   25.136928][ T1227] udc-core: couldn't find an available UDC or it's busy
[   25.136933][ T1227] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
[   25.138275][ T1228] udc-core: couldn't find an available UDC or it's busy
[   25.138281][ T1228] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
[   25.138293][ T1223] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
[   25.138821][ T1229] udc-core: couldn't find an available UDC or it's busy
[   25.138948][ T1230] udc-core: couldn't find an available UDC or it's busy
[   25.138952][ T1230] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
[   25.139091][ T1231] udc-core: couldn't find an available UDC or it's busy
[   25.139096][ T1231] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
[   25.141700][ T1232] udc-core: couldn't find an available UDC or it's busy
[   25.142014][ T1233] udc-core: couldn't find an available UDC or it's busy
[   25.142020][ T1233] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
[   25.142097][ T1233] wdm_open : start ffff88822dffd000
[   25.142251][ T1234] udc-core: couldn't find an available UDC or it's busy
[   25.142258][ T1234] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
[   25.142285][ T1235] udc-core: couldn't find an available UDC or it's busy
[   25.142289][ T1235] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
[   25.142372][ T1229] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
[   25.144150][ T1232] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
[   25.319563][ T1233] wdm_open : done ffff88822dffd000
[   25.732316][ T1235] wdm_open : start ffff88822dffd000
[   25.732389][ T1233] wdm_write : set WDM_IN_USE ffff88822dffd000 ffff888234392f00
[   25.736342][ T1233] usb_hcd_submit_urb : start ffff888234392f00
[   25.738112][ T1233] usb_hcd_submit_urb : end ffff888234392f00 0
[   25.739989][ T1233] wdm_write : waiting for wdm_out_callback ffff88822dffd000
[   25.742837][ T1233] wdm_flush : start wait_event ffff88822dffd000
[   25.742948][ T1235] wdm_open : done ffff88822dffd000
[   25.747707][ T1234] wdm_open : start ffff88822dffd000
[   30.141494][ T1234] wdm_open : done ffff88822dffd000
[   30.141711][ T1235] wdm_flush : start wait_event ffff88822dffd000
[   30.143693][ T1232] wdm_open : start ffff88822dffd000
[   30.143916][ T1234] wdm_flush : start wait_event ffff88822dffd000
[   30.151409][ T1232] wdm_open : done ffff88822dffd000
[   30.153610][  T841] wdm_open : start ffff88822dffd000
[   30.153788][ T1232] wdm_flush : start wait_event ffff88822dffd000
[   30.155816][  T841] wdm_open : done ffff88822dffd000
[   30.159953][ T1229] wdm_open : start ffff88822dffd000
[   30.160146][  T841] wdm_flush : start wait_event ffff88822dffd000
[   30.162005][ T1229] wdm_open : done ffff88822dffd000
[   30.165691][ T1229] wdm_flush : start wait_event ffff88822dffd000
[   63.298753][    C3] random: crng init done
[   63.299926][    C3] random: 7 urandom warning(s) missed due to ratelimiting
[  168.509958][   T52] INFO: task a.out:1233 blocked for more than 141 seconds.
[  168.516579][   T52]       Not tainted 5.7.0-rc5+ #13
[  168.520995][   T52] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  168.528400][   T52] a.out           D13024  1233    838 0x00000004
[  168.532156][   T52] Call Trace:
[  168.533970][   T52]  __schedule+0x363/0x840
[  168.535991][   T52]  ? wdm_suspend+0x160/0x160
[  168.538199][   T52]  schedule+0x5a/0xd0
[  168.539782][   T52]  wdm_flush+0x124/0x190
[  168.541542][   T52]  ? remove_wait_queue+0x60/0x60
[  168.543503][   T52]  filp_close+0x3d/0xa0
[  168.544874][   T52]  __close_fd+0x8f/0x110
[  168.546267][   T52]  __x64_sys_close+0x2a/0x60
[  168.547843][   T52]  do_syscall_64+0x58/0x270
[  168.549214][   T52]  entry_SYSCALL_64_after_hwframe+0x49/0xb3
[  168.550889][   T52] RIP: 0033:0x7f54551f4f30
[  168.552150][   T52] Code: Bad RIP value.
[  168.553323][   T52] RSP: 002b:00007ffe39c17878 EFLAGS: 00000246 ORIG_RAX: 0000000000000003
[  168.555439][   T52] RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 00007f54551f4f30
[  168.558429][   T52] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000004
[  168.561016][   T52] RBP: 0000000000000000 R08: 000000000000000f R09: 00007ffe39c178ae
[  168.563155][   T52] R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
[  168.565136][   T52] R13: 0000000000005fb3 R14: 0000000000000000 R15: 0000000000000000
[  168.567108][   T52] 
[  168.567108][   T52] Showing all locks held in the system:
[  168.569123][   T52] 1 lock held by khungtaskd/52:
[  168.570400][   T52]  #0: ffffffff83a6baa0 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0xe/0x1b0
[  168.573387][   T52] 1 lock held by in:imklog/465:
[  168.574597][   T52]  #0: ffff88822d28fcf0 (&f->f_pos_lock){+.+.}-{3:3}, at: __fdget_pos+0x55/0x60
[  168.576825][   T52] 
[  168.577408][   T52] =============================================
[  168.577408][   T52] 
[  168.579559][   T52] NMI backtrace for cpu 1
[  168.580819][   T52] CPU: 1 PID: 52 Comm: khungtaskd Not tainted 5.7.0-rc5+ #13
[  168.582916][   T52] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/29/2019
[  168.585746][   T52] Call Trace:
[  168.586565][   T52]  dump_stack+0x97/0xde
[  168.588235][   T52]  ? irq_force_complete_move+0x61/0x80
[  168.590020][   T52]  nmi_cpu_backtrace+0xe2/0xf0
[  168.591382][   T52]  ? lapic_can_unplug_cpu+0x90/0x90
[  168.592679][   T52]  nmi_trigger_cpumask_backtrace+0x124/0x170
[  168.594338][   T52]  watchdog+0x4e4/0x950
[  168.595543][   T52]  ? hungtask_pm_notify+0x70/0x70
[  168.596790][   T52]  kthread+0x128/0x170
[  168.597801][   T52]  ? kthread_create_worker_on_cpu+0x90/0x90
[  168.599257][   T52]  ret_from_fork+0x24/0x30
[  168.600448][   T52] Sending NMI from CPU 1 to CPUs 0,2-7:
[  168.602261][    C2] NMI backtrace for cpu 2 skipped: idling at default_idle+0x25/0x180
[  168.602268][    C4] NMI backtrace for cpu 4 skipped: idling at default_idle+0x25/0x180
[  168.602272][    C0] NMI backtrace for cpu 0 skipped: idling at default_idle+0x25/0x180
[  168.602277][    C3] NMI backtrace for cpu 3 skipped: idling at default_idle+0x25/0x180
[  168.602294][    C7] NMI backtrace for cpu 7 skipped: idling at default_idle+0x25/0x180
[  168.602299][    C6] NMI backtrace for cpu 6 skipped: idling at default_idle+0x25/0x180
[  168.602308][    C5] NMI backtrace for cpu 5 skipped: idling at default_idle+0x25/0x180
[  168.603243][   T52] Kernel panic - not syncing: hung_task: blocked tasks
[  168.623795][   T52] CPU: 1 PID: 52 Comm: khungtaskd Not tainted 5.7.0-rc5+ #13
[  168.625777][   T52] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/29/2019
[  168.629089][   T52] Call Trace:
[  168.630281][   T52]  dump_stack+0x97/0xde
[  168.631495][   T52]  panic+0x135/0x3d5
[  168.632577][   T52]  ? watchdog+0x4e4/0x950
[  168.633730][   T52]  watchdog+0x4f0/0x950
[  168.634829][   T52]  ? hungtask_pm_notify+0x70/0x70
[  168.636160][   T52]  kthread+0x128/0x170
[  168.637238][   T52]  ? kthread_create_worker_on_cpu+0x90/0x90
[  168.638792][   T52]  ret_from_fork+0x24/0x30
[  168.640142][   T52] Kernel Offset: disabled
[  168.663256][   T52] Rebooting in 86400 seconds..
----------------------------------------
Andrey Konovalov May 28, 2020, 3:18 p.m. UTC | #10
On Wed, May 27, 2020 at 6:47 AM Tetsuo Handa
<penguin-kernel@i-love.sakura.ne.jp> wrote:
>
> On 2020/05/25 22:32, Tetsuo Handa wrote:
> > On 2020/05/25 21:06, Oliver Neukum wrote:
> >> Am Freitag, den 22.05.2020, 17:26 +0900 schrieb Tetsuo Handa:
> >>> On 2020/05/22 17:04, Oliver Neukum wrote:
> >>>> May I ask you to redo the patch with comments added stating
> >>>> that the wake up is done for the sake of wdm_flush(), change
> >>>> the description and add the link to syzkaller?
> >>>
> >>> You can take over this patch. syzbot tried this patch on 2020/02/11 01:23 at
> >>> https://syzkaller.appspot.com/bug?id=e7b761593b23eb50855b9ea31e3be5472b711186 ,
> >>> but this patch did not solve the problem syzbot has found. Thus, I don't add
> >>> a link to syzkaller...
> >>
> >> Hi,
> >>
> >> this is odd. I looked at the driver and it looks to me like
> >> pre/post_reset() is the only other place that touches WDM_IN_USE.
> >> And it does so correctly. Any idea what could be wrong?
> >
> > I haven't understood the root cause of this problem. You were in the thread at
> > https://groups.google.com/forum/#!msg/syzkaller-bugs/q3QMjt_t83c/GqP58GaTAQAJ .
> > My patch was made from that thread (which simply makes sure to call wake_up_all()
> > when clearing WDM_IN_USE bit) apart from the root cause of this problem.
> >
>
> I might have found what is wrong.
>
> My understanding is that a process using /dev/raw-gadget is responsible for
> reacting to every USB request. I don't know whether /dev/raw-gadget already
> provides callback for aborting the in-flight USB requests (in order to resume
> wdm_flush()) when /dev/raw-gadget is closed (due to explicit close() syscall or
> implicit exit_files() from do_exit() upon SIGKILL). I assume /dev/raw-gadget
> already provides such callback in the following paragraphs.

raw-gadget should kill all unfishished USB requests when the file is closed.

>
> Since the reproducer is opening both /dev/raw-gadget (which is char-10-62) and
> /dev/cdc-wdm0 (which is char-180-0), it seems that the kernel is falling into
> deadlock condition due to the need to close both files when the reproducer is
> killed. My guess is that since that process is stuck at wdm_flush() (due to
> explicit close() syscall or implicit exit_files() from do_exit() upon SIGKILL),
> that process cannot react to USB requests which are needed for resuming wdm_flush().
> Unexpectedly blocking a process which is responsible for reacting to USB requests
> will look as if it is a broken hardware.

Hm, so wdm_flush() is unable to finish unless an expected USB request
is received from the device? This is a bug in the wdm driver then.
Should we use wait_event_interruptible() instead of wait_event() in
wdm_flush()?

>
> A debug printk() patch (shown below) says that the desc->command urb issued by
> wdm_write() never gets wdm_out_callback(). I guess that it is an evidence of
> such dependency exists, and I think that a process had better avoid opening both
> /dev/raw-gadget and /dev/cdc-wdm0 at the same time.
>
> ----------------------------------------
> diff --git a/arch/x86/kernel/unwind_orc.c b/arch/x86/kernel/unwind_orc.c
> index 5b0bd8581fe6..e352b54dbbf2 100644
> --- a/arch/x86/kernel/unwind_orc.c
> +++ b/arch/x86/kernel/unwind_orc.c
> @@ -320,12 +320,19 @@ EXPORT_SYMBOL_GPL(unwind_get_return_address);
>
>  unsigned long *unwind_get_return_address_ptr(struct unwind_state *state)
>  {
> +       struct task_struct *task = state->task;
> +
>         if (unwind_done(state))
>                 return NULL;
>
>         if (state->regs)
>                 return &state->regs->ip;
>
> +       if (task != current && state->sp == task->thread.sp) {
> +               struct inactive_task_frame *frame = (void *)task->thread.sp;
> +               return &frame->ret_addr;
> +       }
> +
>         if (state->sp)
>                 return (unsigned long *)state->sp - 1;
>
> diff --git a/drivers/usb/class/cdc-wdm.c b/drivers/usb/class/cdc-wdm.c
> index e3db6fbeadef..5d28a1a2521d 100644
> --- a/drivers/usb/class/cdc-wdm.c
> +++ b/drivers/usb/class/cdc-wdm.c
> @@ -150,8 +150,9 @@ static void wdm_out_callback(struct urb *urb)
>         spin_unlock_irqrestore(&desc->iuspin, flags);
>         kfree(desc->outbuf);
>         desc->outbuf = NULL;
> +       printk("%s : clear WDM_IN_USE %px\n", __func__, desc);
>         clear_bit(WDM_IN_USE, &desc->flags);
> -       wake_up(&desc->wait);
> +       wake_up_all(&desc->wait);
>  }
>
>  static void wdm_in_callback(struct urb *urb)
> @@ -417,17 +418,21 @@ static ssize_t wdm_write
>         req->wValue = 0;
>         req->wIndex = desc->inum; /* already converted */
>         req->wLength = cpu_to_le16(count);
> +       printk("%s : set WDM_IN_USE %px %px\n", __func__, desc, desc->command);
>         set_bit(WDM_IN_USE, &desc->flags);
>         desc->outbuf = buf;
>
>         rv = usb_submit_urb(desc->command, GFP_KERNEL);
>         if (rv < 0) {
>                 desc->outbuf = NULL;
> +               printk("%s : clear WDM_IN_USE %px\n", __func__, desc);
>                 clear_bit(WDM_IN_USE, &desc->flags);
>                 dev_err(&desc->intf->dev, "Tx URB error: %d\n", rv);
>                 rv = usb_translate_errors(rv);
> +               wake_up_all(&desc->wait);
>                 goto out_free_mem_pm;
>         } else {
> +               printk("%s : waiting for wdm_out_callback %px\n", __func__, desc);
>                 dev_dbg(&desc->intf->dev, "Tx URB has been submitted index=%d\n",
>                         le16_to_cpu(req->wIndex));
>         }
> @@ -587,6 +592,7 @@ static int wdm_flush(struct file *file, fl_owner_t id)
>  {
>         struct wdm_device *desc = file->private_data;
>
> +       printk("%s : start wait_event %px\n", __func__, desc);
>         wait_event(desc->wait,
>                         /*
>                          * needs both flags. We cannot do with one
> @@ -596,6 +602,7 @@ static int wdm_flush(struct file *file, fl_owner_t id)
>                          */
>                         !test_bit(WDM_IN_USE, &desc->flags) ||
>                         test_bit(WDM_DISCONNECTING, &desc->flags));
> +       printk("%s : end wait_event %px\n", __func__, desc);
>
>         /* cannot dereference desc->intf if WDM_DISCONNECTING */
>         if (test_bit(WDM_DISCONNECTING, &desc->flags))
> @@ -642,6 +649,7 @@ static int wdm_open(struct inode *inode, struct file *file)
>
>         mutex_lock(&wdm_mutex);
>         desc = wdm_find_device_by_minor(minor);
> +       printk("%s : start %px\n", __func__, desc);
>         if (!desc)
>                 goto out;
>
> @@ -676,6 +684,7 @@ static int wdm_open(struct inode *inode, struct file *file)
>                 desc->manage_power(intf, 1);
>         usb_autopm_put_interface(desc->intf);
>  out:
> +       printk("%s : done %px\n", __func__, desc);
>         mutex_unlock(&wdm_mutex);
>         return rv;
>  }
> @@ -685,6 +694,7 @@ static int wdm_release(struct inode *inode, struct file *file)
>         struct wdm_device *desc = file->private_data;
>
>         mutex_lock(&wdm_mutex);
> +       printk("%s : start %px\n", __func__, desc);
>
>         /* using write lock to protect desc->count */
>         mutex_lock(&desc->wlock);
> @@ -694,6 +704,7 @@ static int wdm_release(struct inode *inode, struct file *file)
>         if (!desc->count) {
>                 if (!test_bit(WDM_DISCONNECTING, &desc->flags)) {
>                         dev_dbg(&desc->intf->dev, "wdm_release: cleanup\n");
> +                       printk("%s : kill_urbs() %px\n", __func__, desc);
>                         kill_urbs(desc);
>                         spin_lock_irq(&desc->iuspin);
>                         desc->resp_count = 0;
> @@ -705,6 +716,7 @@ static int wdm_release(struct inode *inode, struct file *file)
>                         cleanup(desc);
>                 }
>         }
> +       printk("%s : done %px\n", __func__, desc);
>         mutex_unlock(&wdm_mutex);
>         return 0;
>  }
> @@ -832,6 +844,7 @@ static int wdm_create(struct usb_interface *intf, struct usb_endpoint_descriptor
>         desc->command = usb_alloc_urb(0, GFP_KERNEL);
>         if (!desc->command)
>                 goto err;
> +       desc->command->trace_me = true;
>
>         desc->ubuf = kmalloc(desc->wMaxCommand, GFP_KERNEL);
>         if (!desc->ubuf)
> @@ -979,15 +992,18 @@ static void wdm_disconnect(struct usb_interface *intf)
>         usb_deregister_dev(intf, &wdm_class);
>         desc = wdm_find_device(intf);
>         mutex_lock(&wdm_mutex);
> +       printk("%s : start %px\n", __func__, desc);
>
>         /* the spinlock makes sure no new urbs are generated in the callbacks */
>         spin_lock_irqsave(&desc->iuspin, flags);
> +       printk("%s : set WDM_DISCONNECTING %px\n", __func__, desc);
>         set_bit(WDM_DISCONNECTING, &desc->flags);
>         set_bit(WDM_READ, &desc->flags);
>         spin_unlock_irqrestore(&desc->iuspin, flags);
>         wake_up_all(&desc->wait);
>         mutex_lock(&desc->rlock);
>         mutex_lock(&desc->wlock);
> +       printk("%s : kill_urbs() %px\n", __func__, desc);
>         kill_urbs(desc);
>         cancel_work_sync(&desc->rxwork);
>         cancel_work_sync(&desc->service_outs_intr);
> @@ -1003,6 +1019,7 @@ static void wdm_disconnect(struct usb_interface *intf)
>                 cleanup(desc);
>         else
>                 dev_dbg(&intf->dev, "%d open files - postponing cleanup\n", desc->count);
> +       printk("%s : done %px\n", __func__, desc);
>         mutex_unlock(&wdm_mutex);
>  }
>
> @@ -1031,6 +1048,7 @@ static int wdm_suspend(struct usb_interface *intf, pm_message_t message)
>                 set_bit(WDM_SUSPENDING, &desc->flags);
>                 spin_unlock_irq(&desc->iuspin);
>                 /* callback submits work - order is essential */
> +               printk("%s : kill_urbs() %px\n", __func__, desc);
>                 kill_urbs(desc);
>                 cancel_work_sync(&desc->rxwork);
>                 cancel_work_sync(&desc->service_outs_intr);
> @@ -1082,15 +1100,19 @@ static int wdm_pre_reset(struct usb_interface *intf)
>          * must be done before recovery lest a spontaneous
>          * message from the device is lost
>          */
> +       printk("%s : start\n", __func__);
>         spin_lock_irq(&desc->iuspin);
> +       printk("%s : set WDM_RESETTING %px\n", __func__, desc);
>         set_bit(WDM_RESETTING, &desc->flags);   /* inform read/write */
>         set_bit(WDM_READ, &desc->flags);        /* unblock read */
> +       printk("%s : clear WDM_IN_USE %px\n", __func__, desc);
>         clear_bit(WDM_IN_USE, &desc->flags);    /* unblock write */
>         desc->rerr = -EINTR;
>         spin_unlock_irq(&desc->iuspin);
>         wake_up_all(&desc->wait);
>         mutex_lock(&desc->rlock);
>         mutex_lock(&desc->wlock);
> +       printk("%s : kill_urbs() %px\n", __func__, desc);
>         kill_urbs(desc);
>         cancel_work_sync(&desc->rxwork);
>         cancel_work_sync(&desc->service_outs_intr);
> @@ -1103,10 +1125,12 @@ static int wdm_post_reset(struct usb_interface *intf)
>         int rv;
>
>         clear_bit(WDM_OVERFLOW, &desc->flags);
> +       printk("%s : clear WDM_RESETTING %px\n", __func__, desc);
>         clear_bit(WDM_RESETTING, &desc->flags);
>         rv = recover_from_urb_loss(desc);
>         mutex_unlock(&desc->wlock);
>         mutex_unlock(&desc->rlock);
> +       printk("%s : done\n", __func__);
>         return rv;
>  }
>
> diff --git a/drivers/usb/core/hcd.c b/drivers/usb/core/hcd.c
> index aa45840d8273..ab3d3b65a6cf 100644
> --- a/drivers/usb/core/hcd.c
> +++ b/drivers/usb/core/hcd.c
> @@ -1521,6 +1521,8 @@ int usb_hcd_submit_urb (struct urb *urb, gfp_t mem_flags)
>         int                     status;
>         struct usb_hcd          *hcd = bus_to_hcd(urb->dev->bus);
>
> +       if (urb->trace_me)
> +               printk("%s : start %px\n", __func__, urb);
>         /* increment urb's reference count as part of giving it to the HCD
>          * (which will control it).  HCD guarantees that it either returns
>          * an error or calls giveback(), but not both.
> @@ -1550,6 +1552,8 @@ int usb_hcd_submit_urb (struct urb *urb, gfp_t mem_flags)
>         }
>
>         if (unlikely(status)) {
> +               if (urb->trace_me)
> +                       printk("%s : error %px %d\n", __func__, urb, status);
>                 usbmon_urb_submit_error(&hcd->self, urb, status);
>                 urb->hcpriv = NULL;
>                 INIT_LIST_HEAD(&urb->urb_list);
> @@ -1559,6 +1563,8 @@ int usb_hcd_submit_urb (struct urb *urb, gfp_t mem_flags)
>                         wake_up(&usb_kill_urb_queue);
>                 usb_put_urb(urb);
>         }
> +       if (urb->trace_me)
> +               printk("%s : end %px %d\n", __func__, urb, status);
>         return status;
>  }
>
> @@ -1582,6 +1588,8 @@ static int unlink1(struct usb_hcd *hcd, struct urb *urb, int status)
>                  * Such failures should be harmless. */
>                 value = hcd->driver->urb_dequeue(hcd, urb, status);
>         }
> +       if (urb->trace_me)
> +               printk("%s : %px %d\n", __func__, urb, value);
>         return value;
>  }
>
> @@ -1645,6 +1653,8 @@ static void __usb_hcd_giveback_urb(struct urb *urb)
>
>         /* pass ownership to the completion handler */
>         urb->status = status;
> +       if (urb->trace_me)
> +               printk("%s : complete %px %d\n", __func__, urb, status);
>         urb->complete(urb);
>
>         usb_anchor_resume_wakeups(anchor);
> diff --git a/drivers/usb/gadget/legacy/raw_gadget.c b/drivers/usb/gadget/legacy/raw_gadget.c
> index e01e366d89cd..9adcc978804e 100644
> --- a/drivers/usb/gadget/legacy/raw_gadget.c
> +++ b/drivers/usb/gadget/legacy/raw_gadget.c
> @@ -33,6 +33,10 @@ MODULE_DESCRIPTION(DRIVER_DESC);
>  MODULE_AUTHOR("Andrey Konovalov");
>  MODULE_LICENSE("GPL");
>
> +#undef dev_dbg
> +#define dev_dbg(dev, fmt, ...)                                         \
> +       dev_printk(KERN_INFO, dev, dev_fmt(fmt), ##__VA_ARGS__)
> +
>  /*----------------------------------------------------------------------*/
>
>  #define RAW_EVENT_QUEUE_SIZE   16
> diff --git a/include/linux/usb.h b/include/linux/usb.h
> index 9f3c721c70dc..0843e07f6cc2 100644
> --- a/include/linux/usb.h
> +++ b/include/linux/usb.h
> @@ -1589,6 +1589,7 @@ struct urb {
>         int error_count;                /* (return) number of ISO errors */
>         void *context;                  /* (in) context for completion */
>         usb_complete_t complete;        /* (in) completion routine */
> +       bool trace_me;
>         struct usb_iso_packet_descriptor iso_frame_desc[];
>                                         /* (in) ISO ONLY */
>  };
> ----------------------------------------
>
> ----------------------------------------
> [   25.056066][   T12] usb 5-1: New USB device found, idVendor=2400, idProduct=4200, bcdDevice=42.ef
> [   25.056724][ T1175] udc-core: couldn't find an available UDC or it's busy
> [   25.056730][ T1175] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
> [   25.057711][ T1176] udc-core: couldn't find an available UDC or it's busy
> [   25.057719][ T1176] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
> [   25.058090][ T1171] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
> [   25.058104][ T1178] udc-core: couldn't find an available UDC or it's busy
> [   25.058233][ T1177] udc-core: couldn't find an available UDC or it's busy
> [   25.058238][ T1177] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
> [   25.058685][ T1179] udc-core: couldn't find an available UDC or it's busy
> [   25.058692][ T1179] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
> [   25.060237][   T12] usb 5-1: New USB device strings: Mfr=0, Product=0, SerialNumber=2
> [   25.060250][ T1180] udc-core: couldn't find an available UDC or it's busy
> [   25.060255][ T1180] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
> [   25.060737][ T1181] udc-core: couldn't find an available UDC or it's busy
> [   25.060741][ T1181] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
> [   25.060836][ T1182] udc-core: couldn't find an available UDC or it's busy
> [   25.060840][ T1182] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
> [   25.061235][ T1183] udc-core: couldn't find an available UDC or it's busy
> [   25.061241][ T1183] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
> [   25.062374][ T1178] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
> [   25.064298][   T12] usb 5-1: SerialNumber: syz
> [   25.065151][ T1185] udc-core: couldn't find an available UDC or it's busy
> [   25.065161][ T1184] udc-core: couldn't find an available UDC or it's busy
> [   25.065168][ T1184] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
> [   25.065191][ T1186] udc-core: couldn't find an available UDC or it's busy
> [   25.065195][ T1186] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
> [   25.065451][ T1187] udc-core: couldn't find an available UDC or it's busy
> [   25.065457][ T1187] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
> [   25.068979][ T1189] udc-core: couldn't find an available UDC or it's busy
> [   25.070166][ T1185] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
> [   25.072145][ T1191] udc-core: couldn't find an available UDC or it's busy
> [   25.072151][ T1191] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
> [   25.072166][ T1190] udc-core: couldn't find an available UDC or it's busy
> [   25.072174][ T1190] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
> [   25.072254][ T1189] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
> [   25.073991][ T1193] udc-core: couldn't find an available UDC or it's busy
> [   25.073997][ T1193] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
> [   25.074002][ T1192] udc-core: couldn't find an available UDC or it's busy
> [   25.074006][ T1192] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
> [   25.087896][ T1196] udc-core: couldn't find an available UDC or it's busy
> [   25.087933][ T1195] udc-core: couldn't find an available UDC or it's busy
> [   25.087941][ T1195] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
> [   25.088229][ T1197] udc-core: couldn't find an available UDC or it's busy
> [   25.088235][ T1197] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
> [   25.088239][ T1194] udc-core: couldn't find an available UDC or it's busy
> [   25.088244][ T1194] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
> [   25.090788][ T1198] udc-core: couldn't find an available UDC or it's busy
> [   25.092032][ T1196] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
> [   25.093018][ T1199] udc-core: couldn't find an available UDC or it's busy
> [   25.093023][ T1200] udc-core: couldn't find an available UDC or it's busy
> [   25.093027][ T1200] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
> [   25.093886][ T1198] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
> [   25.094586][ T1201] udc-core: couldn't find an available UDC or it's busy
> [   25.095028][ T1202] udc-core: couldn't find an available UDC or it's busy
> [   25.095032][ T1202] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
> [   25.095403][ T1204] udc-core: couldn't find an available UDC or it's busy
> [   25.095409][ T1204] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
> [   25.095411][ T1203] udc-core: couldn't find an available UDC or it's busy
> [   25.095416][ T1203] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
> [   25.095896][ T1199] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
> [   25.099620][ T1205] udc-core: couldn't find an available UDC or it's busy
> [   25.100765][ T1201] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
> [   25.102512][ T1205] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
> [   25.104434][ T1207] udc-core: couldn't find an available UDC or it's busy
> [   25.104439][ T1207] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
> [   25.104483][ T1206] udc-core: couldn't find an available UDC or it's busy
> [   25.104489][ T1206] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
> [   25.104516][ T1208] udc-core: couldn't find an available UDC or it's busy
> [   25.104521][ T1208] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
> [   25.122543][   T12] hub 5-1:118.0: bad descriptor, ignoring hub
> [   25.122597][ T1210] udc-core: couldn't find an available UDC or it's busy
> [   25.122604][ T1210] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
> [   25.122976][ T1211] udc-core: couldn't find an available UDC or it's busy
> [   25.122983][ T1211] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
> [   25.123591][ T1212] udc-core: couldn't find an available UDC or it's busy
> [   25.123596][ T1212] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
> [   25.123654][ T1209] udc-core: couldn't find an available UDC or it's busy
> [   25.123659][ T1209] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
> [   25.124649][ T1213] udc-core: couldn't find an available UDC or it's busy
> [   25.124809][ T1214] udc-core: couldn't find an available UDC or it's busy
> [   25.124814][ T1214] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
> [   25.125168][ T1215] udc-core: couldn't find an available UDC or it's busy
> [   25.125174][ T1215] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
> [   25.126547][   T12] hub: probe of 5-1:118.0 failed with error -5
> [   25.128767][ T1213] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
> [   25.129656][ T1216] udc-core: couldn't find an available UDC or it's busy
> [   25.129662][ T1216] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
> [   25.129678][ T1217] udc-core: couldn't find an available UDC or it's busy
> [   25.129684][ T1217] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
> [   25.129968][ T1218] udc-core: couldn't find an available UDC or it's busy
> [   25.129973][ T1218] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
> [   25.132093][   T12] cdc_wdm 5-1:118.0: cdc-wdm0: USB WDM device
> [   25.132549][ T1219] udc-core: couldn't find an available UDC or it's busy
> [   25.132554][ T1219] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
> [   25.133379][ T1222] udc-core: couldn't find an available UDC or it's busy
> [   25.133385][ T1222] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
> [   25.133552][ T1223] udc-core: couldn't find an available UDC or it's busy
> [   25.133708][ T1221] udc-core: couldn't find an available UDC or it's busy
> [   25.133713][ T1221] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
> [   25.135592][ T1224] udc-core: couldn't find an available UDC or it's busy
> [   25.135598][ T1224] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
> [   25.136158][   T12] wdm_suspend : kill_urbs() ffff88822dffd000
> [   25.136578][ T1225] udc-core: couldn't find an available UDC or it's busy
> [   25.136583][ T1225] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
> [   25.136638][ T1226] udc-core: couldn't find an available UDC or it's busy
> [   25.136642][ T1226] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
> [   25.136928][ T1227] udc-core: couldn't find an available UDC or it's busy
> [   25.136933][ T1227] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
> [   25.138275][ T1228] udc-core: couldn't find an available UDC or it's busy
> [   25.138281][ T1228] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
> [   25.138293][ T1223] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
> [   25.138821][ T1229] udc-core: couldn't find an available UDC or it's busy
> [   25.138948][ T1230] udc-core: couldn't find an available UDC or it's busy
> [   25.138952][ T1230] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
> [   25.139091][ T1231] udc-core: couldn't find an available UDC or it's busy
> [   25.139096][ T1231] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
> [   25.141700][ T1232] udc-core: couldn't find an available UDC or it's busy
> [   25.142014][ T1233] udc-core: couldn't find an available UDC or it's busy
> [   25.142020][ T1233] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
> [   25.142097][ T1233] wdm_open : start ffff88822dffd000
> [   25.142251][ T1234] udc-core: couldn't find an available UDC or it's busy
> [   25.142258][ T1234] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
> [   25.142285][ T1235] udc-core: couldn't find an available UDC or it's busy
> [   25.142289][ T1235] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
> [   25.142372][ T1229] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
> [   25.144150][ T1232] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
> [   25.319563][ T1233] wdm_open : done ffff88822dffd000
> [   25.732316][ T1235] wdm_open : start ffff88822dffd000
> [   25.732389][ T1233] wdm_write : set WDM_IN_USE ffff88822dffd000 ffff888234392f00
> [   25.736342][ T1233] usb_hcd_submit_urb : start ffff888234392f00
> [   25.738112][ T1233] usb_hcd_submit_urb : end ffff888234392f00 0
> [   25.739989][ T1233] wdm_write : waiting for wdm_out_callback ffff88822dffd000
> [   25.742837][ T1233] wdm_flush : start wait_event ffff88822dffd000
> [   25.742948][ T1235] wdm_open : done ffff88822dffd000
> [   25.747707][ T1234] wdm_open : start ffff88822dffd000
> [   30.141494][ T1234] wdm_open : done ffff88822dffd000
> [   30.141711][ T1235] wdm_flush : start wait_event ffff88822dffd000
> [   30.143693][ T1232] wdm_open : start ffff88822dffd000
> [   30.143916][ T1234] wdm_flush : start wait_event ffff88822dffd000
> [   30.151409][ T1232] wdm_open : done ffff88822dffd000
> [   30.153610][  T841] wdm_open : start ffff88822dffd000
> [   30.153788][ T1232] wdm_flush : start wait_event ffff88822dffd000
> [   30.155816][  T841] wdm_open : done ffff88822dffd000
> [   30.159953][ T1229] wdm_open : start ffff88822dffd000
> [   30.160146][  T841] wdm_flush : start wait_event ffff88822dffd000
> [   30.162005][ T1229] wdm_open : done ffff88822dffd000
> [   30.165691][ T1229] wdm_flush : start wait_event ffff88822dffd000
> [   63.298753][    C3] random: crng init done
> [   63.299926][    C3] random: 7 urandom warning(s) missed due to ratelimiting
> [  168.509958][   T52] INFO: task a.out:1233 blocked for more than 141 seconds.
> [  168.516579][   T52]       Not tainted 5.7.0-rc5+ #13
> [  168.520995][   T52] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [  168.528400][   T52] a.out           D13024  1233    838 0x00000004
> [  168.532156][   T52] Call Trace:
> [  168.533970][   T52]  __schedule+0x363/0x840
> [  168.535991][   T52]  ? wdm_suspend+0x160/0x160
> [  168.538199][   T52]  schedule+0x5a/0xd0
> [  168.539782][   T52]  wdm_flush+0x124/0x190
> [  168.541542][   T52]  ? remove_wait_queue+0x60/0x60
> [  168.543503][   T52]  filp_close+0x3d/0xa0
> [  168.544874][   T52]  __close_fd+0x8f/0x110
> [  168.546267][   T52]  __x64_sys_close+0x2a/0x60
> [  168.547843][   T52]  do_syscall_64+0x58/0x270
> [  168.549214][   T52]  entry_SYSCALL_64_after_hwframe+0x49/0xb3
> [  168.550889][   T52] RIP: 0033:0x7f54551f4f30
> [  168.552150][   T52] Code: Bad RIP value.
> [  168.553323][   T52] RSP: 002b:00007ffe39c17878 EFLAGS: 00000246 ORIG_RAX: 0000000000000003
> [  168.555439][   T52] RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 00007f54551f4f30
> [  168.558429][   T52] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000004
> [  168.561016][   T52] RBP: 0000000000000000 R08: 000000000000000f R09: 00007ffe39c178ae
> [  168.563155][   T52] R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
> [  168.565136][   T52] R13: 0000000000005fb3 R14: 0000000000000000 R15: 0000000000000000
> [  168.567108][   T52]
> [  168.567108][   T52] Showing all locks held in the system:
> [  168.569123][   T52] 1 lock held by khungtaskd/52:
> [  168.570400][   T52]  #0: ffffffff83a6baa0 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0xe/0x1b0
> [  168.573387][   T52] 1 lock held by in:imklog/465:
> [  168.574597][   T52]  #0: ffff88822d28fcf0 (&f->f_pos_lock){+.+.}-{3:3}, at: __fdget_pos+0x55/0x60
> [  168.576825][   T52]
> [  168.577408][   T52] =============================================
> [  168.577408][   T52]
> [  168.579559][   T52] NMI backtrace for cpu 1
> [  168.580819][   T52] CPU: 1 PID: 52 Comm: khungtaskd Not tainted 5.7.0-rc5+ #13
> [  168.582916][   T52] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/29/2019
> [  168.585746][   T52] Call Trace:
> [  168.586565][   T52]  dump_stack+0x97/0xde
> [  168.588235][   T52]  ? irq_force_complete_move+0x61/0x80
> [  168.590020][   T52]  nmi_cpu_backtrace+0xe2/0xf0
> [  168.591382][   T52]  ? lapic_can_unplug_cpu+0x90/0x90
> [  168.592679][   T52]  nmi_trigger_cpumask_backtrace+0x124/0x170
> [  168.594338][   T52]  watchdog+0x4e4/0x950
> [  168.595543][   T52]  ? hungtask_pm_notify+0x70/0x70
> [  168.596790][   T52]  kthread+0x128/0x170
> [  168.597801][   T52]  ? kthread_create_worker_on_cpu+0x90/0x90
> [  168.599257][   T52]  ret_from_fork+0x24/0x30
> [  168.600448][   T52] Sending NMI from CPU 1 to CPUs 0,2-7:
> [  168.602261][    C2] NMI backtrace for cpu 2 skipped: idling at default_idle+0x25/0x180
> [  168.602268][    C4] NMI backtrace for cpu 4 skipped: idling at default_idle+0x25/0x180
> [  168.602272][    C0] NMI backtrace for cpu 0 skipped: idling at default_idle+0x25/0x180
> [  168.602277][    C3] NMI backtrace for cpu 3 skipped: idling at default_idle+0x25/0x180
> [  168.602294][    C7] NMI backtrace for cpu 7 skipped: idling at default_idle+0x25/0x180
> [  168.602299][    C6] NMI backtrace for cpu 6 skipped: idling at default_idle+0x25/0x180
> [  168.602308][    C5] NMI backtrace for cpu 5 skipped: idling at default_idle+0x25/0x180
> [  168.603243][   T52] Kernel panic - not syncing: hung_task: blocked tasks
> [  168.623795][   T52] CPU: 1 PID: 52 Comm: khungtaskd Not tainted 5.7.0-rc5+ #13
> [  168.625777][   T52] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/29/2019
> [  168.629089][   T52] Call Trace:
> [  168.630281][   T52]  dump_stack+0x97/0xde
> [  168.631495][   T52]  panic+0x135/0x3d5
> [  168.632577][   T52]  ? watchdog+0x4e4/0x950
> [  168.633730][   T52]  watchdog+0x4f0/0x950
> [  168.634829][   T52]  ? hungtask_pm_notify+0x70/0x70
> [  168.636160][   T52]  kthread+0x128/0x170
> [  168.637238][   T52]  ? kthread_create_worker_on_cpu+0x90/0x90
> [  168.638792][   T52]  ret_from_fork+0x24/0x30
> [  168.640142][   T52] Kernel Offset: disabled
> [  168.663256][   T52] Rebooting in 86400 seconds..
> ----------------------------------------
Tetsuo Handa May 28, 2020, 4:03 p.m. UTC | #11
On 2020/05/29 0:18, Andrey Konovalov wrote:
>> I might have found what is wrong.
>>
>> My understanding is that a process using /dev/raw-gadget is responsible for
>> reacting to every USB request. I don't know whether /dev/raw-gadget already
>> provides callback for aborting the in-flight USB requests (in order to resume
>> wdm_flush()) when /dev/raw-gadget is closed (due to explicit close() syscall or
>> implicit exit_files() from do_exit() upon SIGKILL). I assume /dev/raw-gadget
>> already provides such callback in the following paragraphs.
> 
> raw-gadget should kill all unfishished USB requests when the file is closed.

I see. But

> 
>>
>> Since the reproducer is opening both /dev/raw-gadget (which is char-10-62) and
>> /dev/cdc-wdm0 (which is char-180-0), it seems that the kernel is falling into
>> deadlock condition due to the need to close both files when the reproducer is
>> killed. My guess is that since that process is stuck at wdm_flush() (due to
>> explicit close() syscall or implicit exit_files() from do_exit() upon SIGKILL),
>> that process cannot react to USB requests which are needed for resuming wdm_flush().
>> Unexpectedly blocking a process which is responsible for reacting to USB requests
>> will look as if it is a broken hardware.
> 
> Hm, so wdm_flush() is unable to finish unless an expected USB request
> is received from the device? This is a bug in the wdm driver then.

this specific bug report is caused by being unable to close /dev/cdc-wdm0
due to /dev/raw-gadget API usage bug in the userspace process. In other words,
this bug report should be closed with "#syz invalid" like a bug report at
https://syzkaller.appspot.com/bug?id=287aa8708bc940d0ca1645223c53dd4c2d203be6
which unexpectedly did ioctl(FIFREEZE) without corresponding ioctl(FITHAW).

> Should we use wait_event_interruptible() instead of wait_event() in
> wdm_flush()?

That only shadows this kind of bug reports, by not using TASK_UNINTERRUPTIBLE.

The problem that the userspace process which is responsible for closing
/dev/raw-gadget gets stuck at wdm_flush() unless interrupted by a signal
when closing /dev/cdc-wdm0 is remaining. I think that a process should not
open /dev/raw-gadget and /dev/cdc-wdm0 at the same time.
Andrey Konovalov May 28, 2020, 7:03 p.m. UTC | #12
On Thu, May 28, 2020 at 6:03 PM Tetsuo Handa
<penguin-kernel@i-love.sakura.ne.jp> wrote:
>
> On 2020/05/29 0:18, Andrey Konovalov wrote:
> >> I might have found what is wrong.
> >>
> >> My understanding is that a process using /dev/raw-gadget is responsible for
> >> reacting to every USB request. I don't know whether /dev/raw-gadget already
> >> provides callback for aborting the in-flight USB requests (in order to resume
> >> wdm_flush()) when /dev/raw-gadget is closed (due to explicit close() syscall or
> >> implicit exit_files() from do_exit() upon SIGKILL). I assume /dev/raw-gadget
> >> already provides such callback in the following paragraphs.
> >
> > raw-gadget should kill all unfishished USB requests when the file is closed.
>
> I see. But
>
> >
> >>
> >> Since the reproducer is opening both /dev/raw-gadget (which is char-10-62) and
> >> /dev/cdc-wdm0 (which is char-180-0), it seems that the kernel is falling into
> >> deadlock condition due to the need to close both files when the reproducer is
> >> killed. My guess is that since that process is stuck at wdm_flush() (due to
> >> explicit close() syscall or implicit exit_files() from do_exit() upon SIGKILL),
> >> that process cannot react to USB requests which are needed for resuming wdm_flush().
> >> Unexpectedly blocking a process which is responsible for reacting to USB requests
> >> will look as if it is a broken hardware.
> >
> > Hm, so wdm_flush() is unable to finish unless an expected USB request
> > is received from the device? This is a bug in the wdm driver then.
>
> this specific bug report is caused by being unable to close /dev/cdc-wdm0
> due to /dev/raw-gadget API usage bug in the userspace process. In other words,
> this bug report should be closed with "#syz invalid" like a bug report at
> https://syzkaller.appspot.com/bug?id=287aa8708bc940d0ca1645223c53dd4c2d203be6
> which unexpectedly did ioctl(FIFREEZE) without corresponding ioctl(FITHAW).
>
> > Should we use wait_event_interruptible() instead of wait_event() in
> > wdm_flush()?
>
> That only shadows this kind of bug reports, by not using TASK_UNINTERRUPTIBLE.
>
> The problem that the userspace process which is responsible for closing
> /dev/raw-gadget gets stuck at wdm_flush() unless interrupted by a signal
> when closing /dev/cdc-wdm0 is remaining. I think that a process should not
> open /dev/raw-gadget and /dev/cdc-wdm0 at the same time.

Ah, so the problem is that when a process exits, it tries to close wdm
fd first, which ends up calling wdm_flush(), which can't finish
because the USB requests are not terminated before raw-gadget fd is
closed, which is supposed to happen after wdm fd is closed. Is this
correct? I wonder what will happen if a real device stays connected
and ignores wdm requests.

I don't understand though, how using wait_event_interruptible() will
shadow anything here.

Alan, Greg, is this acceptable behavior for a USB driver?
Alan Stern May 28, 2020, 7:40 p.m. UTC | #13
On Thu, May 28, 2020 at 09:03:43PM +0200, Andrey Konovalov wrote:

> Ah, so the problem is that when a process exits, it tries to close wdm
> fd first, which ends up calling wdm_flush(), which can't finish
> because the USB requests are not terminated before raw-gadget fd is
> closed, which is supposed to happen after wdm fd is closed. Is this
> correct? I wonder what will happen if a real device stays connected
> and ignores wdm requests.
> 
> I don't understand though, how using wait_event_interruptible() will
> shadow anything here.
> 
> Alan, Greg, is this acceptable behavior for a USB driver?

I don't understand what the problem is.  Can you explain in more general 
terms -- nothing specific to wdm or anything like that -- what you are 
concerned about?  Is this something that could happen to any gadget 
driver?  Or any USB class device driver?  Or does it only affect 
usespace components of raw-gadget drivers?

Alan Stern
Andrey Konovalov May 28, 2020, 7:51 p.m. UTC | #14
On Thu, May 28, 2020 at 9:40 PM Alan Stern <stern@rowland.harvard.edu> wrote:
>
> On Thu, May 28, 2020 at 09:03:43PM +0200, Andrey Konovalov wrote:
>
> > Ah, so the problem is that when a process exits, it tries to close wdm
> > fd first, which ends up calling wdm_flush(), which can't finish
> > because the USB requests are not terminated before raw-gadget fd is
> > closed, which is supposed to happen after wdm fd is closed. Is this
> > correct? I wonder what will happen if a real device stays connected
> > and ignores wdm requests.
> >
> > I don't understand though, how using wait_event_interruptible() will
> > shadow anything here.
> >
> > Alan, Greg, is this acceptable behavior for a USB driver?
>
> I don't understand what the problem is.  Can you explain in more general
> terms -- nothing specific to wdm or anything like that -- what you are
> concerned about?  Is this something that could happen to any gadget
> driver?  Or any USB class device driver?  Or does it only affect
> usespace components of raw-gadget drivers?

So, AFAIU, we have a driver whose flush() callback blocks on
wait_event(), which can only terminate when either 1) the driver
receives a particular USB response from the device or 2) the device
disconnects. For 1) the emulated device doesn't provide required
responses. For 2) the problem is that the emulated via raw-gadget
device disconnects when the process is killed (and raw-gadget fd is
closed). But that process is the same process that is currently stuck
on wait_event() in the flush callback(), and therefore unkillable.

This can generally happen with any driver that goes into
uninterruptible sleep within one of its code paths reachable from
userspace that can only be unblocked by a particular behavior from the
USB device. But I haven't seen any such drivers so far, wdm is the
first.
Alan Stern May 28, 2020, 8:58 p.m. UTC | #15
On Thu, May 28, 2020 at 09:51:35PM +0200, Andrey Konovalov wrote:
> On Thu, May 28, 2020 at 9:40 PM Alan Stern <stern@rowland.harvard.edu> wrote:
> >
> > On Thu, May 28, 2020 at 09:03:43PM +0200, Andrey Konovalov wrote:
> >
> > > Ah, so the problem is that when a process exits, it tries to close wdm
> > > fd first, which ends up calling wdm_flush(), which can't finish
> > > because the USB requests are not terminated before raw-gadget fd is
> > > closed, which is supposed to happen after wdm fd is closed. Is this
> > > correct? I wonder what will happen if a real device stays connected
> > > and ignores wdm requests.
> > >
> > > I don't understand though, how using wait_event_interruptible() will
> > > shadow anything here.
> > >
> > > Alan, Greg, is this acceptable behavior for a USB driver?
> >
> > I don't understand what the problem is.  Can you explain in more general
> > terms -- nothing specific to wdm or anything like that -- what you are
> > concerned about?  Is this something that could happen to any gadget
> > driver?  Or any USB class device driver?  Or does it only affect
> > usespace components of raw-gadget drivers?
> 
> So, AFAIU, we have a driver whose flush() callback blocks on
> wait_event(), which can only terminate when either 1) the driver
> receives a particular USB response from the device or 2) the device
> disconnects.

This sounds like a bug in the driver.  What would it do if someone had a 
genuine (not emulated) but buggy USB device which didn't send the 
desired response?  The only way to unblock the driver would be to unplug 
the device!  That isn't acceptable behavior.

> For 1) the emulated device doesn't provide required
> responses. For 2) the problem is that the emulated via raw-gadget
> device disconnects when the process is killed (and raw-gadget fd is
> closed). But that process is the same process that is currently stuck
> on wait_event() in the flush callback(), and therefore unkillable.

What would happen if you unload dummy-hcd at this point?  Or even just 
do: echo 0 >/sys/bus/usb/devices/usbN/bConfigurationValue, where N is 
the bus number of the dummy-hcd bus?

> This can generally happen with any driver that goes into
> uninterruptible sleep within one of its code paths reachable from
> userspace that can only be unblocked by a particular behavior from the
> USB device. But I haven't seen any such drivers so far, wdm is the
> first.

Drivers should never go into uninterruptible sleep states unless they 
can guarantee that the duration will be bounded somehow (for example, by 
a reasonable timeout).  Or that cutting the sleep state short would 
cause the system to crash -- but that's not an issue here.

Alan Stern
Andrey Konovalov May 29, 2020, 8:41 p.m. UTC | #16
On Thu, May 28, 2020 at 10:58 PM Alan Stern <stern@rowland.harvard.edu> wrote:
>
> On Thu, May 28, 2020 at 09:51:35PM +0200, Andrey Konovalov wrote:
> > On Thu, May 28, 2020 at 9:40 PM Alan Stern <stern@rowland.harvard.edu> wrote:
> > >
> > > On Thu, May 28, 2020 at 09:03:43PM +0200, Andrey Konovalov wrote:
> > >
> > > > Ah, so the problem is that when a process exits, it tries to close wdm
> > > > fd first, which ends up calling wdm_flush(), which can't finish
> > > > because the USB requests are not terminated before raw-gadget fd is
> > > > closed, which is supposed to happen after wdm fd is closed. Is this
> > > > correct? I wonder what will happen if a real device stays connected
> > > > and ignores wdm requests.
> > > >
> > > > I don't understand though, how using wait_event_interruptible() will
> > > > shadow anything here.
> > > >
> > > > Alan, Greg, is this acceptable behavior for a USB driver?
> > >
> > > I don't understand what the problem is.  Can you explain in more general
> > > terms -- nothing specific to wdm or anything like that -- what you are
> > > concerned about?  Is this something that could happen to any gadget
> > > driver?  Or any USB class device driver?  Or does it only affect
> > > usespace components of raw-gadget drivers?
> >
> > So, AFAIU, we have a driver whose flush() callback blocks on
> > wait_event(), which can only terminate when either 1) the driver
> > receives a particular USB response from the device or 2) the device
> > disconnects.
>
> This sounds like a bug in the driver.  What would it do if someone had a
> genuine (not emulated) but buggy USB device which didn't send the
> desired response?  The only way to unblock the driver would be to unplug
> the device!  That isn't acceptable behavior.

OK, that's what I thought.

>
> > For 1) the emulated device doesn't provide required
> > responses. For 2) the problem is that the emulated via raw-gadget
> > device disconnects when the process is killed (and raw-gadget fd is
> > closed). But that process is the same process that is currently stuck
> > on wait_event() in the flush callback(), and therefore unkillable.
>
> What would happen if you unload dummy-hcd at this point?  Or even just
> do: echo 0 >/sys/bus/usb/devices/usbN/bConfigurationValue, where N is
> the bus number of the dummy-hcd bus?

The device disconnects and flush() unblocks.

> > This can generally happen with any driver that goes into
> > uninterruptible sleep within one of its code paths reachable from
> > userspace that can only be unblocked by a particular behavior from the
> > USB device. But I haven't seen any such drivers so far, wdm is the
> > first.
>
> Drivers should never go into uninterruptible sleep states unless they
> can guarantee that the duration will be bounded somehow (for example, by
> a reasonable timeout).  Or that cutting the sleep state short would
> cause the system to crash -- but that's not an issue here.

OK, thank you, Alan!

Tetsuo, could you clarify why you think that using
wait_event_interruptible() is a bad fix here?
Tetsuo Handa May 30, 2020, 12:42 a.m. UTC | #17
On 2020/05/30 5:41, Andrey Konovalov wrote:
> On Thu, May 28, 2020 at 10:58 PM Alan Stern <stern@rowland.harvard.edu> wrote:
>>
>> On Thu, May 28, 2020 at 09:51:35PM +0200, Andrey Konovalov wrote:
>>> On Thu, May 28, 2020 at 9:40 PM Alan Stern <stern@rowland.harvard.edu> wrote:
>>>>
>>>> On Thu, May 28, 2020 at 09:03:43PM +0200, Andrey Konovalov wrote:
>>>>
>>>>> Ah, so the problem is that when a process exits, it tries to close wdm
>>>>> fd first, which ends up calling wdm_flush(), which can't finish
>>>>> because the USB requests are not terminated before raw-gadget fd is
>>>>> closed, which is supposed to happen after wdm fd is closed. Is this
>>>>> correct? I wonder what will happen if a real device stays connected
>>>>> and ignores wdm requests.
>>>>>
>>>>> I don't understand though, how using wait_event_interruptible() will
>>>>> shadow anything here.
>>>>>
>>>>> Alan, Greg, is this acceptable behavior for a USB driver?
>>>>
>>>> I don't understand what the problem is.  Can you explain in more general
>>>> terms -- nothing specific to wdm or anything like that -- what you are
>>>> concerned about?  Is this something that could happen to any gadget
>>>> driver?  Or any USB class device driver?  Or does it only affect
>>>> usespace components of raw-gadget drivers?
>>>
>>> So, AFAIU, we have a driver whose flush() callback blocks on
>>> wait_event(), which can only terminate when either 1) the driver
>>> receives a particular USB response from the device or 2) the device
>>> disconnects.
>>
>> This sounds like a bug in the driver.  What would it do if someone had a
>> genuine (not emulated) but buggy USB device which didn't send the
>> desired response?  The only way to unblock the driver would be to unplug
>> the device!  That isn't acceptable behavior.
> 
> OK, that's what I thought.

I believe that this is not a bug in the driver but a problem of hardware
failure. Unless this is high-availability code which is designed for safely
failing over to other node, we don't need to care about hardware failure.

> 
>>
>>> For 1) the emulated device doesn't provide required
>>> responses. For 2) the problem is that the emulated via raw-gadget
>>> device disconnects when the process is killed (and raw-gadget fd is
>>> closed). But that process is the same process that is currently stuck
>>> on wait_event() in the flush callback(), and therefore unkillable.
>>
>> What would happen if you unload dummy-hcd at this point?  Or even just
>> do: echo 0 >/sys/bus/usb/devices/usbN/bConfigurationValue, where N is
>> the bus number of the dummy-hcd bus?
> 
> The device disconnects and flush() unblocks.
> 
>>> This can generally happen with any driver that goes into
>>> uninterruptible sleep within one of its code paths reachable from
>>> userspace that can only be unblocked by a particular behavior from the
>>> USB device. But I haven't seen any such drivers so far, wdm is the
>>> first.
>>
>> Drivers should never go into uninterruptible sleep states unless they
>> can guarantee that the duration will be bounded somehow (for example, by
>> a reasonable timeout).  Or that cutting the sleep state short would
>> cause the system to crash -- but that's not an issue here.
> 
> OK, thank you, Alan!
> 
> Tetsuo, could you clarify why you think that using
> wait_event_interruptible() is a bad fix here?
> 

If wait_event() in wdm_flush() were using timeout or interruptible, can the
wdm driver handle that case safely? Since cleanup(desc) from wdm_release()
might release "desc", wouldn't "not-waiting-for-completion" has a risk of
"use-after-free write"? Please add comment block why it is safe if it is
actually safe.
Alan Stern May 30, 2020, 1:10 a.m. UTC | #18
On Sat, May 30, 2020 at 09:42:46AM +0900, Tetsuo Handa wrote:
> On 2020/05/30 5:41, Andrey Konovalov wrote:
> > On Thu, May 28, 2020 at 10:58 PM Alan Stern <stern@rowland.harvard.edu> wrote:

> >> This sounds like a bug in the driver.  What would it do if someone had a
> >> genuine (not emulated) but buggy USB device which didn't send the
> >> desired response?  The only way to unblock the driver would be to unplug
> >> the device!  That isn't acceptable behavior.
> > 
> > OK, that's what I thought.
> 
> I believe that this is not a bug in the driver but a problem of hardware
> failure. Unless this is high-availability code which is designed for safely
> failing over to other node, we don't need to care about hardware failure.

Oh my!  I can't even imagine what Linus would say if he saw that...  :-(

Have you heard of Bad USB?

The kernel most definitely does need to protect itself against 
misbehaving hardware.  Let's just leave it at that.  If you don't 
believe me, ask Greg KH.

I admit, causing a driver to hang isn't the worst thing a buggy device 
can do.  But the kernel is supposed to be able to cope with such things 
gracefully.

Alan Stern
Tetsuo Handa May 30, 2020, 4:58 a.m. UTC | #19
On 2020/05/30 10:10, Alan Stern wrote:
> On Sat, May 30, 2020 at 09:42:46AM +0900, Tetsuo Handa wrote:
>> On 2020/05/30 5:41, Andrey Konovalov wrote:
>>> On Thu, May 28, 2020 at 10:58 PM Alan Stern <stern@rowland.harvard.edu> wrote:
> 
>>>> This sounds like a bug in the driver.  What would it do if someone had a
>>>> genuine (not emulated) but buggy USB device which didn't send the
>>>> desired response?  The only way to unblock the driver would be to unplug
>>>> the device!  That isn't acceptable behavior.
>>>
>>> OK, that's what I thought.
>>
>> I believe that this is not a bug in the driver but a problem of hardware
>> failure. Unless this is high-availability code which is designed for safely
>> failing over to other node, we don't need to care about hardware failure.
> 
> Oh my!  I can't even imagine what Linus would say if he saw that...  :-(
> 
> Have you heard of Bad USB?

Of course, I've heard of that.

Please show me as a patch first.

> 
> The kernel most definitely does need to protect itself against 
> misbehaving hardware.  Let's just leave it at that.  If you don't 
> believe me, ask Greg KH.

I've made many locations killable (in order to reduce damage caused by OOM
condition). But I can't make locations killable where handling SIGKILL case is
too difficult to implement.

"struct file_operations"->flush() is called from filp_close() when there is
something which has to be done before "struct file_operations"->release() is
called.

As far as I read this thread, what you are trying to do sounds like allow
"not waiting for completion of wdm_out_callback()" with only
's/wait_event/wait_event_intrruptible/' in wdm_flush(). Then, please do remove
wdm_flush() call itself.

I'm not familiar with USB. But at least we would need to do something similar
to commit d0bd587a80960d7b ("usermodehelper: implement UMH_KILLABLE") in
addition to 's/wait_event/wait_event_intrruptible/' in wdm_flush().

> 
> I admit, causing a driver to hang isn't the worst thing a buggy device 
> can do.  But the kernel is supposed to be able to cope with such things 
> gracefully.

My understanding is that the "misbehaving hardware" in this bug report is not
"USB device itself" but "CPU used for receiving request from that USB device
and sending response to that USB device".

I don't know how wdm_flush() can recover when the CPU which is supposed to
unblock wait_event() is blocked inside that wait_event() itself. Unless you can
safely omit wdm_flush() by doing something similar to commit d0bd587a80960d7b,
this looks to me like a circular dependency which is impossible to solve.

Therefore, again, please show me as a patch first.
Greg Kroah-Hartman May 30, 2020, 6:08 a.m. UTC | #20
On Sat, May 30, 2020 at 09:42:46AM +0900, Tetsuo Handa wrote:
> On 2020/05/30 5:41, Andrey Konovalov wrote:
> > On Thu, May 28, 2020 at 10:58 PM Alan Stern <stern@rowland.harvard.edu> wrote:
> >>
> >> On Thu, May 28, 2020 at 09:51:35PM +0200, Andrey Konovalov wrote:
> >>> On Thu, May 28, 2020 at 9:40 PM Alan Stern <stern@rowland.harvard.edu> wrote:
> >>>>
> >>>> On Thu, May 28, 2020 at 09:03:43PM +0200, Andrey Konovalov wrote:
> >>>>
> >>>>> Ah, so the problem is that when a process exits, it tries to close wdm
> >>>>> fd first, which ends up calling wdm_flush(), which can't finish
> >>>>> because the USB requests are not terminated before raw-gadget fd is
> >>>>> closed, which is supposed to happen after wdm fd is closed. Is this
> >>>>> correct? I wonder what will happen if a real device stays connected
> >>>>> and ignores wdm requests.
> >>>>>
> >>>>> I don't understand though, how using wait_event_interruptible() will
> >>>>> shadow anything here.
> >>>>>
> >>>>> Alan, Greg, is this acceptable behavior for a USB driver?
> >>>>
> >>>> I don't understand what the problem is.  Can you explain in more general
> >>>> terms -- nothing specific to wdm or anything like that -- what you are
> >>>> concerned about?  Is this something that could happen to any gadget
> >>>> driver?  Or any USB class device driver?  Or does it only affect
> >>>> usespace components of raw-gadget drivers?
> >>>
> >>> So, AFAIU, we have a driver whose flush() callback blocks on
> >>> wait_event(), which can only terminate when either 1) the driver
> >>> receives a particular USB response from the device or 2) the device
> >>> disconnects.
> >>
> >> This sounds like a bug in the driver.  What would it do if someone had a
> >> genuine (not emulated) but buggy USB device which didn't send the
> >> desired response?  The only way to unblock the driver would be to unplug
> >> the device!  That isn't acceptable behavior.
> > 
> > OK, that's what I thought.
> 
> I believe that this is not a bug in the driver but a problem of hardware
> failure. Unless this is high-availability code which is designed for safely
> failing over to other node, we don't need to care about hardware failure.

As Alan said, that's just not true.  It's the job of an operating system
kernel to handle all of the crazy ways hardware is broken, and make it
work properly for people.  We deal with hardware failure all the time.

So don't do uninterruptable waits or loop for forever waiting for some
hardware value to change that might not change.  That's a sure way to
lock up the system and make users mad at you.

thanks,

greg k-h
Oliver Neukum May 30, 2020, 3:25 p.m. UTC | #21
Am Donnerstag, den 28.05.2020, 16:58 -0400 schrieb Alan Stern:
> On Thu, May 28, 2020 at 09:51:35PM +0200, Andrey Konovalov wrote:

> > > I don't understand what the problem is.  Can you explain in more general
> > > terms -- nothing specific to wdm or anything like that -- what you are
> > > concerned about?  Is this something that could happen to any gadget
> > > driver?  Or any USB class device driver?  Or does it only affect
> > > usespace components of raw-gadget drivers?
> > 
> > So, AFAIU, we have a driver whose flush() callback blocks on
> > wait_event(), which can only terminate when either 1) the driver
> > receives a particular USB response from the device or 2) the device
> > disconnects.
> 
> This sounds like a bug in the driver.  What would it do if someone had a 

Arguably yes. I will introduce a timeout. Unfortunately flush()
requires a non-interruptible sleep, as you cannot sanely return EAGAIN.

	Regards
		Oliver
Alan Stern May 30, 2020, 3:47 p.m. UTC | #22
On Sat, May 30, 2020 at 05:25:11PM +0200, Oliver Neukum wrote:
> Am Donnerstag, den 28.05.2020, 16:58 -0400 schrieb Alan Stern:
> > On Thu, May 28, 2020 at 09:51:35PM +0200, Andrey Konovalov wrote:
> 
> > > > I don't understand what the problem is.  Can you explain in more general
> > > > terms -- nothing specific to wdm or anything like that -- what you are
> > > > concerned about?  Is this something that could happen to any gadget
> > > > driver?  Or any USB class device driver?  Or does it only affect
> > > > usespace components of raw-gadget drivers?
> > > 
> > > So, AFAIU, we have a driver whose flush() callback blocks on
> > > wait_event(), which can only terminate when either 1) the driver
> > > receives a particular USB response from the device or 2) the device
> > > disconnects.
> > 
> > This sounds like a bug in the driver.  What would it do if someone had a 
> 
> Arguably yes. I will introduce a timeout. Unfortunately flush()
> requires a non-interruptible sleep, as you cannot sanely return EAGAIN.

But maybe you can kill some URBs and drop some data.

Alan Stern
Andrey Konovalov June 1, 2020, 12:26 p.m. UTC | #23
On Sat, May 30, 2020 at 2:43 AM Tetsuo Handa
<penguin-kernel@i-love.sakura.ne.jp> wrote:
>
> On 2020/05/30 5:41, Andrey Konovalov wrote:
> > On Thu, May 28, 2020 at 10:58 PM Alan Stern <stern@rowland.harvard.edu> wrote:
> >>
> >> On Thu, May 28, 2020 at 09:51:35PM +0200, Andrey Konovalov wrote:
> >>> On Thu, May 28, 2020 at 9:40 PM Alan Stern <stern@rowland.harvard.edu> wrote:
> >>>>
> >>>> On Thu, May 28, 2020 at 09:03:43PM +0200, Andrey Konovalov wrote:
> >>>>
> >>>>> Ah, so the problem is that when a process exits, it tries to close wdm
> >>>>> fd first, which ends up calling wdm_flush(), which can't finish
> >>>>> because the USB requests are not terminated before raw-gadget fd is
> >>>>> closed, which is supposed to happen after wdm fd is closed. Is this
> >>>>> correct? I wonder what will happen if a real device stays connected
> >>>>> and ignores wdm requests.
> >>>>>
> >>>>> I don't understand though, how using wait_event_interruptible() will
> >>>>> shadow anything here.
> >>>>>
> >>>>> Alan, Greg, is this acceptable behavior for a USB driver?
> >>>>
> >>>> I don't understand what the problem is.  Can you explain in more general
> >>>> terms -- nothing specific to wdm or anything like that -- what you are
> >>>> concerned about?  Is this something that could happen to any gadget
> >>>> driver?  Or any USB class device driver?  Or does it only affect
> >>>> usespace components of raw-gadget drivers?
> >>>
> >>> So, AFAIU, we have a driver whose flush() callback blocks on
> >>> wait_event(), which can only terminate when either 1) the driver
> >>> receives a particular USB response from the device or 2) the device
> >>> disconnects.
> >>
> >> This sounds like a bug in the driver.  What would it do if someone had a
> >> genuine (not emulated) but buggy USB device which didn't send the
> >> desired response?  The only way to unblock the driver would be to unplug
> >> the device!  That isn't acceptable behavior.
> >
> > OK, that's what I thought.
>
> I believe that this is not a bug in the driver but a problem of hardware
> failure. Unless this is high-availability code which is designed for safely
> failing over to other node, we don't need to care about hardware failure.
>
> >
> >>
> >>> For 1) the emulated device doesn't provide required
> >>> responses. For 2) the problem is that the emulated via raw-gadget
> >>> device disconnects when the process is killed (and raw-gadget fd is
> >>> closed). But that process is the same process that is currently stuck
> >>> on wait_event() in the flush callback(), and therefore unkillable.
> >>
> >> What would happen if you unload dummy-hcd at this point?  Or even just
> >> do: echo 0 >/sys/bus/usb/devices/usbN/bConfigurationValue, where N is
> >> the bus number of the dummy-hcd bus?
> >
> > The device disconnects and flush() unblocks.
> >
> >>> This can generally happen with any driver that goes into
> >>> uninterruptible sleep within one of its code paths reachable from
> >>> userspace that can only be unblocked by a particular behavior from the
> >>> USB device. But I haven't seen any such drivers so far, wdm is the
> >>> first.
> >>
> >> Drivers should never go into uninterruptible sleep states unless they
> >> can guarantee that the duration will be bounded somehow (for example, by
> >> a reasonable timeout).  Or that cutting the sleep state short would
> >> cause the system to crash -- but that's not an issue here.
> >
> > OK, thank you, Alan!
> >
> > Tetsuo, could you clarify why you think that using
> > wait_event_interruptible() is a bad fix here?
> >
>
> If wait_event() in wdm_flush() were using timeout or interruptible, can the
> wdm driver handle that case safely? Since cleanup(desc) from wdm_release()
> might release "desc", wouldn't "not-waiting-for-completion" has a risk of
> "use-after-free write"? Please add comment block why it is safe if it is
> actually safe.

Oh, it might be that just replacing wait_event() with
wait_event_interruptible() can lead to other issues, and a more
involved fix is needed. The suggestion was rather to avoid blocking
flush() indefinitely.
Tetsuo Handa June 8, 2020, 2:24 a.m. UTC | #24
On 2020/05/31 0:47, Alan Stern wrote:
> On Sat, May 30, 2020 at 05:25:11PM +0200, Oliver Neukum wrote:
>> Am Donnerstag, den 28.05.2020, 16:58 -0400 schrieb Alan Stern:
>>> On Thu, May 28, 2020 at 09:51:35PM +0200, Andrey Konovalov wrote:
>>
>>>>> I don't understand what the problem is.  Can you explain in more general
>>>>> terms -- nothing specific to wdm or anything like that -- what you are
>>>>> concerned about?  Is this something that could happen to any gadget
>>>>> driver?  Or any USB class device driver?  Or does it only affect
>>>>> usespace components of raw-gadget drivers?
>>>>
>>>> So, AFAIU, we have a driver whose flush() callback blocks on
>>>> wait_event(), which can only terminate when either 1) the driver
>>>> receives a particular USB response from the device or 2) the device
>>>> disconnects.
>>>
>>> This sounds like a bug in the driver.  What would it do if someone had a 
>>
>> Arguably yes. I will introduce a timeout. Unfortunately flush()
>> requires a non-interruptible sleep, as you cannot sanely return EAGAIN.
> 
> But maybe you can kill some URBs and drop some data.

You mean call usb_kill_urb() via kill_urbs() ?

As far as I tested, it seems that usb_kill_urb() sometimes fails to call
wdm_out_callback() despite the comment for usb_kill_urb() says

 * This routine cancels an in-progress request.  It is guaranteed that
 * upon return all completion handlers will have finished and the URB
 * will be totally idle and available for reuse.  These features make
 * this an ideal way to stop I/O in a disconnect() callback or close()
 * function.  If the request has not already finished or been unlinked
 * the completion handler will see urb->status == -ENOENT.

. Is something still wrong? Or just replacing

		BUG_ON(test_bit(WDM_IN_USE, &desc->flags) &&
		       !test_bit(WDM_DISCONNECTING, &desc->flags));

with

		wait_event(desc->wait, !test_bit(WDM_IN_USE, &desc->flags) ||
			   test_bit(WDM_DISCONNECTING, &desc->flags));

in the patch shown below is sufficient?

diff --git a/drivers/usb/class/cdc-wdm.c b/drivers/usb/class/cdc-wdm.c
index e3db6fbeadef..3e92e79ce0a0 100644
--- a/drivers/usb/class/cdc-wdm.c
+++ b/drivers/usb/class/cdc-wdm.c
@@ -151,7 +151,7 @@ static void wdm_out_callback(struct urb *urb)
 	kfree(desc->outbuf);
 	desc->outbuf = NULL;
 	clear_bit(WDM_IN_USE, &desc->flags);
-	wake_up(&desc->wait);
+	wake_up_all(&desc->wait);
 }
 
 static void wdm_in_callback(struct urb *urb)
@@ -424,6 +424,7 @@ static ssize_t wdm_write
 	if (rv < 0) {
 		desc->outbuf = NULL;
 		clear_bit(WDM_IN_USE, &desc->flags);
+		wake_up_all(&desc->wait);
 		dev_err(&desc->intf->dev, "Tx URB error: %d\n", rv);
 		rv = usb_translate_errors(rv);
 		goto out_free_mem_pm;
@@ -587,15 +588,16 @@ static int wdm_flush(struct file *file, fl_owner_t id)
 {
 	struct wdm_device *desc = file->private_data;
 
-	wait_event(desc->wait,
-			/*
-			 * needs both flags. We cannot do with one
-			 * because resetting it would cause a race
-			 * with write() yet we need to signal
-			 * a disconnect
-			 */
-			!test_bit(WDM_IN_USE, &desc->flags) ||
-			test_bit(WDM_DISCONNECTING, &desc->flags));
+	/*
+	 * needs both flags. We cannot do with one because resetting it would
+	 * cause a race with write() yet we need to signal a disconnect
+	 */
+	if (!wait_event_timeout(desc->wait, !test_bit(WDM_IN_USE, &desc->flags) ||
+				test_bit(WDM_DISCONNECTING, &desc->flags), 20 * HZ)) {
+		kill_urbs(desc);
+		BUG_ON(test_bit(WDM_IN_USE, &desc->flags) &&
+		       !test_bit(WDM_DISCONNECTING, &desc->flags));
+	}
 
 	/* cannot dereference desc->intf if WDM_DISCONNECTING */
 	if (test_bit(WDM_DISCONNECTING, &desc->flags))


[  208.339533] [ T5240] cdc_wdm 5-1:118.0: Error in flush path: -2
[  208.339614] [ T5224] cdc_wdm 5-1:118.0: Error in flush path: -2
[  208.339753] [ T5184] cdc_wdm 5-1:118.0: Error in flush path: -2
[  208.339811] [ T5223] cdc_wdm 5-1:118.0: Error in flush path: -2
[  208.339820] [ T5242] cdc_wdm 5-1:118.0: Error in flush path: -2
[  208.339902] [ T5222] cdc_wdm 5-1:118.0: Error in flush path: -2
[  208.339988] [ T5201] cdc_wdm 5-1:118.0: Error in flush path: -2
[  208.343226] [ T5270] udc-core: couldn't find an available UDC or it's busy
[  208.343237] [ T5270] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
[  208.343693] [ T5271] udc-core: couldn't find an available UDC or it's busy
[  208.343700] [ T5271] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
[  208.345655] [ T5273] udc-core: couldn't find an available UDC or it's busy
[  208.345863] [ T5275] udc-core: couldn't find an available UDC or it's busy
[  208.345870] [ T5275] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
[  208.345933] [ T5272] udc-core: couldn't find an available UDC or it's busy
[  208.345943] [ T5272] misc raw-gadget: fail, usb_gadget_probe_driver returned -16
[  208.345982] [ T5274] udc-core: couldn't find an available UDC or it's busy
[  208.346007] [ T5274] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
[  208.347209] [ T5276] udc-core: couldn't find an available UDC or it's busy
[  208.348035] [ T5273] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
[  208.356463] [ T5182] ------------[ cut here ]------------
[  208.356752] [ T5276] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
[  208.358439] [ T5182] kernel BUG at drivers/usb/class/cdc-wdm.c:599!
[  208.358481] [ T5182] invalid opcode: 0000 [#1] SMP KASAN
[  208.377041] [ T5271] cdc_wdm 5-1:118.0: Error in flush path: -32
[  208.378521] [ T5182] CPU: 1 PID: 5182 Comm: a.out Not tainted 5.7.0-rc5+ #22
[  208.406262] [ T5274] cdc_wdm 5-1:118.0: Error in flush path: -32
[  208.406692] [ T5182] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 02/27/2020
[  208.406703] [ T5182] RIP: 0010:wdm_flush+0x363/0x460
[  208.409106] [ T5277] udc-core: couldn't find an available UDC or it's busy
[  208.409114] [ T5277] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
[  208.411250] [ T5278] udc-core: couldn't find an available UDC or it's busy
[  208.411464] [ T5182] Code: 00 00 00 00 00 fc ff df 48 c1 ea 03 80 3c 02 00 0f 85 90 00 00 00 48 8b 83 90 00 00 00 a8 04 0f 85 85 fd ff ff e8 0d 28 bf fd <0f> 0b bd ed ff ff ff e9 00 fe ff ff 4c 89 e7 e8 79 73 e5 fd e9 d0
[  208.411467] [ T5182] RSP: 0018:ffff8881e8ed7df0 EFLAGS: 00010293
[  208.411470] [ T5182] RAX: 0000000000000000 RBX: ffff8881f44d5800 RCX: ffffffff837d9974
[  208.411472] [ T5182] RDX: ffff8881e734b180 RSI: ffffffff837d99a3 RDI: ffff8881f44d5890
[  208.411473] [ T5182] RBP: ffff8881f44d5890 R08: ffffed103e89ab13 R09: ffff8881f44d5897
[  208.411479] [ T5182] R10: ffffed103e89ab12 R11: ffffed103e89ab13 R12: 0000000000000000
[  208.414031] [ T5278] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
[  208.415966] [ T5182] R13: ffff8881e8ed7e18 R14: ffff8881f44d59e0 R15: ffffed103e89ab12
[  208.415973] [ T5182] FS:  00007f95f02a3440(0000) GS:ffff8881f6c40000(0000) knlGS:0000000000000000
[  208.456966] [ T5278] cdc_wdm 5-1:118.0: Error in flush path: -32
[  208.456971] [ T5277] cdc_wdm 5-1:118.0: Error in flush path: -32
[  208.456974] [ T5275] cdc_wdm 5-1:118.0: Error in flush path: -32
[  208.456988] [ T5276] cdc_wdm 5-1:118.0: Error in flush path: -32
[  208.456991] [ T5272] cdc_wdm 5-1:118.0: Error in flush path: -32
[  208.457405] [ T5273] cdc_wdm 5-1:118.0: Error in flush path: -32
[  208.457527] [ T5182] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  208.458378] [ T5279] udc-core: couldn't find an available UDC or it's busy
[  208.458384] [ T5279] misc raw-gadget: fail, usb_gadget_probe_driver returned -16
[  208.458606] [ T5280] udc-core: couldn't find an available UDC or it's busy
[  208.458611] [ T5280] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
[  208.460681] [ T5281] udc-core: couldn't find an available UDC or it's busy
[  208.460925] [ T5282] udc-core: couldn't find an available UDC or it's busy
[  208.460936] [ T5282] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
[  208.461031] [ T5283] udc-core: couldn't find an available UDC or it's busy
[  208.461037] [ T5283] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
[  208.461268] [ T5284] udc-core: couldn't find an available UDC or it's busy
[  208.461273] [ T5284] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
[  208.462427] [ T5182] CR2: 00007fb0ae7797a0 CR3: 00000001eb03d002 CR4: 00000000003606e0
[  208.462435] [ T5182] Call Trace:
[  208.462444] [ T5182]  ? wdm_poll+0x230/0x230
[  208.463320] [ T5285] udc-core: couldn't find an available UDC or it's busy
[  208.463325] [ T5285] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
[  208.464664] [ T5281] misc raw-gadget: fail, usb_gadget_probe_driver returned -16
[  208.466731] [ T5182]  ? remove_wait_queue+0x190/0x190
[  208.466737] [ T5182]  ? wdm_poll+0x230/0x230
[  208.466743] [ T5182]  filp_close+0xad/0x160
[  208.466749] [ T5182]  __close_fd+0x112/0x1d0
[  208.466755] [ T5182]  __x64_sys_close+0x67/0xc0
[  208.466762] [ T5182]  ? lockdep_hardirqs_on+0x3f1/0x5b0
[  208.466767] [ T5182]  do_syscall_64+0x9b/0x510
[  208.466774] [ T5182]  entry_SYSCALL_64_after_hwframe+0x49/0xb3
[  208.466780] [ T5182] RIP: 0033:0x7f95efdc3f30
[  208.466786] [ T5182] Code: 00 64 c7 00 0d 00 00 00 b8 ff ff ff ff eb 90 b8 ff ff ff ff eb 89 0f 1f 40 00 83 3d d9 27 2c 00 00 75 10 b8 03 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 be 95 01 00 48 89 04 24
[  208.466791] [ T5182] RSP: 002b:00007ffc958cd6d8 EFLAGS: 00000246 ORIG_RAX: 0000000000000003
[  208.466799] [ T5182] RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 00007f95efdc3f30
[  208.476513] [ T5280] cdc_wdm 5-1:118.0: Error in flush path: -32
[  208.478577] [ T5182] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000004
[  208.478581] [ T5182] RBP: 0000000000000000 R08: 000000000000000f R09: 00007ffc958cd70e
[  208.478586] [ T5182] R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
[  208.482884] [ T5286] udc-core: couldn't find an available UDC or it's busy
[  208.483534] [ T5182] R13: 000000000002daa7 R14: 0000000000000000 R15: 0000000000000000
[  208.483536] [ T5182] Modules linked in:
[  208.483635] [ T5182] ---[ end trace 6a73cddfff8988f4 ]---
[  208.485364] [ T5286] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
[  208.487479] [ T5182] RIP: 0010:wdm_flush+0x363/0x460
[  208.506707] [ T5282] cdc_wdm 5-1:118.0: Error in flush path: -32
[  208.508658] [ T5182] Code: 00 00 00 00 00 fc ff df 48 c1 ea 03 80 3c 02 00 0f 85 90 00 00 00 48 8b 83 90 00 00 00 a8 04 0f 85 85 fd ff ff e8 0d 28 bf fd <0f> 0b bd ed ff ff ff e9 00 fe ff ff 4c 89 e7 e8 79 73 e5 fd e9 d0
[  208.528782] [ T5287] udc-core: couldn't find an available UDC or it's busy
[  208.529695] [ T5182] RSP: 0018:ffff8881e8ed7df0 EFLAGS: 00010293
[  208.529699] [ T5182] RAX: 0000000000000000 RBX: ffff8881f44d5800 RCX: ffffffff837d9974
[  208.529701] [ T5182] RDX: ffff8881e734b180 RSI: ffffffff837d99a3 RDI: ffff8881f44d5890
[  208.529703] [ T5182] RBP: ffff8881f44d5890 R08: ffffed103e89ab13 R09: ffff8881f44d5897
[  208.529705] [ T5182] R10: ffffed103e89ab12 R11: ffffed103e89ab13 R12: 0000000000000000
[  208.529706] [ T5182] R13: ffff8881e8ed7e18 R14: ffff8881f44d59e0 R15: ffffed103e89ab12
[  208.529709] [ T5182] FS:  00007f95f02a3440(0000) GS:ffff8881f6c40000(0000) knlGS:0000000000000000
[  208.529718] [ T5182] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  208.530960] [ T5287] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
[  208.537715] [ T5182] CR2: 00007fb0ae7797a0 CR3: 00000001eb03d002 CR4: 00000000003606e0
[  208.577096] [ T5284] cdc_wdm 5-1:118.0: Error in flush path: -32
[  208.577105] [ T5281] cdc_wdm 5-1:118.0: Error in flush path: -32
[  208.577108] [ T5287] cdc_wdm 5-1:118.0: Error in flush path: -32
[  208.577111] [ T5283] cdc_wdm 5-1:118.0: Error in flush path: -32
[  208.577525] [ T5286] cdc_wdm 5-1:118.0: Error in flush path: -32
[  208.578527] [ T5288] udc-core: couldn't find an available UDC or it's busy
[  208.578533] [ T5288] misc raw-gadget: fail, usb_gadget_probe_driver returned -16
[  208.580556] [ T5291] udc-core: couldn't find an available UDC or it's busy
[  208.580644] [ T5289] udc-core: couldn't find an available UDC or it's busy
[  208.580646] [ T5290] udc-core: couldn't find an available UDC or it's busy
[  208.580652] [ T5289] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
[  208.580655] [ T5290] misc raw-gadget: fail, usb_gadget_probe_driver returned -16
[  208.580844] [ T5292] udc-core: couldn't find an available UDC or it's busy
[  208.580851] [ T5292] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
[  208.581053] [ T5293] udc-core: couldn't find an available UDC or it's busy
[  208.581058] [ T5293] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
[  208.599340] [ T5294] udc-core: couldn't find an available UDC or it's busy
[  208.599866] [ T5291] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
[  208.601707] [ T5294] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
[  208.616268] [ T5289] cdc_wdm 5-1:118.0: Error in flush path: -32
Tetsuo Handa June 18, 2020, 12:48 a.m. UTC | #25
On 2020/06/08 11:24, Tetsuo Handa wrote:
> As far as I tested, it seems that usb_kill_urb() sometimes fails to call
> wdm_out_callback() despite the comment for usb_kill_urb() says
> 
>  * This routine cancels an in-progress request.  It is guaranteed that
>  * upon return all completion handlers will have finished and the URB
>  * will be totally idle and available for reuse.  These features make
>  * this an ideal way to stop I/O in a disconnect() callback or close()
>  * function.  If the request has not already finished or been unlinked
>  * the completion handler will see urb->status == -ENOENT.
> 
> . Is something still wrong? Or just replacing
> 
> 		BUG_ON(test_bit(WDM_IN_USE, &desc->flags) &&
> 		       !test_bit(WDM_DISCONNECTING, &desc->flags));
> 
> with
> 
> 		wait_event(desc->wait, !test_bit(WDM_IN_USE, &desc->flags) ||
> 			   test_bit(WDM_DISCONNECTING, &desc->flags));
> 
> in the patch shown below is sufficient?

How is the progress?

Another debug printk() patch shown below says that while wdm_flush() waits for
clearing of WDM_IN_USE using wait_event(), concurrently executed wdm_write() also
waits for clearing of WDM_IN_USE using wait_event_interruptible(), and wdm_write() can
immediately set WDM_IN_USE again as soon as returning from wait_event_interruptible()
even if somebody was already waiting at wdm_flush() to clear WDM_IN_USE.

That is, wait_event() in wdm_flush() does not know whether there is usb_submit_urb()
request which is started after wait_event() found that WDM_IN_USE was cleared. Then,
what does this wait_event() in wdm_flush() want to flush? There is no need to wait for
completion of usb_submit_urb() ?

diff --git a/drivers/usb/class/cdc-wdm.c b/drivers/usb/class/cdc-wdm.c
index e3db6fbeadef..d2c4d8b9cc18 100644
--- a/drivers/usb/class/cdc-wdm.c
+++ b/drivers/usb/class/cdc-wdm.c
@@ -82,6 +82,7 @@ struct wdm_device {
 	spinlock_t		iuspin;
 
 	unsigned long		flags;
+	atomic_t                flushers;
 	u16			bufsize;
 	u16			wMaxCommand;
 	u16			wMaxPacketSize;
@@ -151,7 +152,7 @@ static void wdm_out_callback(struct urb *urb)
 	kfree(desc->outbuf);
 	desc->outbuf = NULL;
 	clear_bit(WDM_IN_USE, &desc->flags);
-	wake_up(&desc->wait);
+	wake_up_all(&desc->wait);
 }
 
 static void wdm_in_callback(struct urb *urb)
@@ -417,6 +418,13 @@ static ssize_t wdm_write
 	req->wValue = 0;
 	req->wIndex = desc->inum; /* already converted */
 	req->wLength = cpu_to_le16(count);
+	{
+		const int flushers = atomic_read(&desc->flushers);
+		if (flushers) {
+			printk(KERN_ERR "There is %u flushers.\n", flushers);
+			BUG_ON(1);
+		}
+	}
 	set_bit(WDM_IN_USE, &desc->flags);
 	desc->outbuf = buf;
 
@@ -424,6 +432,7 @@ static ssize_t wdm_write
 	if (rv < 0) {
 		desc->outbuf = NULL;
 		clear_bit(WDM_IN_USE, &desc->flags);
+		wake_up_all(&desc->wait);
 		dev_err(&desc->intf->dev, "Tx URB error: %d\n", rv);
 		rv = usb_translate_errors(rv);
 		goto out_free_mem_pm;
@@ -587,15 +596,20 @@ static int wdm_flush(struct file *file, fl_owner_t id)
 {
 	struct wdm_device *desc = file->private_data;
 
-	wait_event(desc->wait,
-			/*
-			 * needs both flags. We cannot do with one
-			 * because resetting it would cause a race
-			 * with write() yet we need to signal
-			 * a disconnect
-			 */
-			!test_bit(WDM_IN_USE, &desc->flags) ||
-			test_bit(WDM_DISCONNECTING, &desc->flags));
+	/*
+	 * needs both flags. We cannot do with one because resetting it would
+	 * cause a race with write() yet we need to signal a disconnect
+	 */
+	atomic_inc(&desc->flushers);
+	if (!wait_event_timeout(desc->wait, !test_bit(WDM_IN_USE, &desc->flags) ||
+				test_bit(WDM_DISCONNECTING, &desc->flags), 20 * HZ)) {
+		atomic_dec(&desc->flushers);
+		kill_urbs(desc);
+		BUG_ON(test_bit(WDM_IN_USE, &desc->flags) &&
+		       !test_bit(WDM_DISCONNECTING, &desc->flags));
+	} else {
+		atomic_dec(&desc->flushers);
+	}
 
 	/* cannot dereference desc->intf if WDM_DISCONNECTING */
 	if (test_bit(WDM_DISCONNECTING, &desc->flags))



[  103.934085][  T118] usb 5-1: New USB device found, idVendor=2400, idProduct=4200, bcdDevice=42.ef
[  103.934093][  T118] usb 5-1: New USB device strings: Mfr=0, Product=0, SerialNumber=2
[  103.934100][  T118] usb 5-1: SerialNumber: syz
[  103.934145][ T4108] udc-core: couldn't find an available UDC or it's busy
[  103.934151][ T4108] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
[  103.934810][ T4109] udc-core: couldn't find an available UDC or it's busy
[  103.934821][ T4109] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
[  103.935385][ T4110] udc-core: couldn't find an available UDC or it's busy
[  103.935392][ T4110] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
[  103.935411][ T4106] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
[  103.938481][ T4112] udc-core: couldn't find an available UDC or it's busy
[  103.939183][ T4113] udc-core: couldn't find an available UDC or it's busy
[  103.939192][ T4113] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
[  103.939415][ T4114] udc-core: couldn't find an available UDC or it's busy
[  103.939421][ T4114] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
[  103.943069][ T4115] udc-core: couldn't find an available UDC or it's busy
[  103.943257][ T4116] udc-core: couldn't find an available UDC or it's busy
[  103.943266][ T4116] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
[  103.944118][ T4117] udc-core: couldn't find an available UDC or it's busy
[  103.944129][ T4117] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
[  103.944160][ T4118] udc-core: couldn't find an available UDC or it's busy
[  103.944169][ T4118] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
[  103.944738][ T4112] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
[  103.946715][ T4115] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
[  103.947159][ T4119] udc-core: couldn't find an available UDC or it's busy
[  103.947165][ T4119] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
[  103.947315][ T4120] udc-core: couldn't find an available UDC or it's busy
[  103.947324][ T4120] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
[  103.948093][ T4121] udc-core: couldn't find an available UDC or it's busy
[  103.948103][ T4121] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
[  103.949190][ T4122] udc-core: couldn't find an available UDC or it's busy
[  103.950070][ T4123] udc-core: couldn't find an available UDC or it's busy
[  103.950080][ T4123] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
[  103.950580][ T4124] udc-core: couldn't find an available UDC or it's busy
[  103.950587][ T4124] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
[  103.953081][ T4125] udc-core: couldn't find an available UDC or it's busy
[  103.953430][ T4126] udc-core: couldn't find an available UDC or it's busy
[  103.953437][ T4126] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
[  103.953963][ T4122] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
[  103.954780][ T4128] udc-core: couldn't find an available UDC or it's busy
[  103.954786][ T4128] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
[  103.955864][ T4129] udc-core: couldn't find an available UDC or it's busy
[  103.955872][ T4129] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
[  103.955962][ T4127] udc-core: couldn't find an available UDC or it's busy
[  103.955968][ T4127] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
[  103.956010][ T4125] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
[  103.958600][ T4130] udc-core: couldn't find an available UDC or it's busy
[  103.958962][ T4131] udc-core: couldn't find an available UDC or it's busy
[  103.958972][ T4131] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
[  103.959737][ T4132] udc-core: couldn't find an available UDC or it's busy
[  103.959749][ T4132] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
[  103.963405][ T4133] udc-core: couldn't find an available UDC or it's busy
[  103.963413][ T4133] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
[  103.964329][ T4134] udc-core: couldn't find an available UDC or it's busy
[  103.965201][ T4135] udc-core: couldn't find an available UDC or it's busy
[  103.965211][ T4135] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
[  103.965511][ T4136] udc-core: couldn't find an available UDC or it's busy
[  103.965520][ T4136] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
[  103.966011][ T4130] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
[  103.966939][ T4137] udc-core: couldn't find an available UDC or it's busy
[  103.966946][ T4137] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
[  103.967467][ T4138] udc-core: couldn't find an available UDC or it's busy
[  103.967473][ T4138] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
[  103.967909][ T4134] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
[  103.969292][ T4139] udc-core: couldn't find an available UDC or it's busy
[  103.969301][ T4139] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
[  103.969708][ T4140] udc-core: couldn't find an available UDC or it's busy
[  103.969717][ T4140] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
[  103.970343][ T4141] udc-core: couldn't find an available UDC or it's busy
[  103.973658][  T118] hub 5-1:118.0: bad descriptor, ignoring hub
[  103.974100][ T4142] udc-core: couldn't find an available UDC or it's busy
[  103.974107][ T4142] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
[  103.974518][ T4141] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
[  103.974657][ T4143] udc-core: couldn't find an available UDC or it's busy
[  103.974663][ T4143] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
[  103.975767][ T4144] udc-core: couldn't find an available UDC or it's busy
[  103.975774][ T4144] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
[  103.976777][  T118] hub: probe of 5-1:118.0 failed with error -5
[  103.977257][ T4145] udc-core: couldn't find an available UDC or it's busy
[  103.977263][ T4145] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
[  103.978020][ T4146] udc-core: couldn't find an available UDC or it's busy
[  103.978029][ T4146] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
[  103.978223][ T4148] udc-core: couldn't find an available UDC or it's busy
[  103.978229][ T4148] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
[  103.978388][ T4147] udc-core: couldn't find an available UDC or it's busy
[  103.978396][ T4147] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
[  103.980149][ T4149] udc-core: couldn't find an available UDC or it's busy
[  103.982395][ T4151] udc-core: couldn't find an available UDC or it's busy
[  103.982468][ T4152] udc-core: couldn't find an available UDC or it's busy
[  103.982478][ T4152] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
[  103.982674][ T4153] udc-core: couldn't find an available UDC or it's busy
[  103.982682][ T4153] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
[  103.982780][ T4150] udc-core: couldn't find an available UDC or it's busy
[  103.982788][ T4150] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
[  103.983634][ T4149] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
[  103.983843][  T118] cdc_wdm 5-1:118.0: cdc-wdm0: USB WDM device
[  103.984589][ T4155] udc-core: couldn't find an available UDC or it's busy
[  103.984595][ T4155] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
[  103.985615][ T4156] udc-core: couldn't find an available UDC or it's busy
[  103.985624][ T4156] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
[  103.985747][ T4151] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
[  103.985906][ T4157] udc-core: couldn't find an available UDC or it's busy
[  103.985912][ T4157] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
[  103.987838][ T4158] udc-core: couldn't find an available UDC or it's busy
[  103.987845][ T4158] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
[  103.987955][ T4159] udc-core: couldn't find an available UDC or it's busy
[  103.987962][ T4159] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
[  103.988233][ T4160] udc-core: couldn't find an available UDC or it's busy
[  103.988239][ T4160] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
[  103.989778][ T4161] udc-core: couldn't find an available UDC or it's busy
[  103.990308][ T4162] udc-core: couldn't find an available UDC or it's busy
[  103.990319][ T4162] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
[  103.991055][ T4163] udc-core: couldn't find an available UDC or it's busy
[  103.993669][ T4161] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
[  103.994007][ T4164] udc-core: couldn't find an available UDC or it's busy
[  103.994142][ T4165] udc-core: couldn't find an available UDC or it's busy
[  103.994151][ T4165] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
[  103.995212][ T4166] udc-core: couldn't find an available UDC or it's busy
[  103.995222][ T4166] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
[  103.996400][ T4163] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
[  106.505076][ T4164] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
[  124.793950][ T3360] cdc_wdm 5-1:118.0: Error in flush path: -2
[  124.793977][ T4165] cdc_wdm 5-1:118.0: Error in flush path: -2
[  124.793987][ T4166] cdc_wdm 5-1:118.0: Error in flush path: -2
[  124.793997][ T4164] cdc_wdm 5-1:118.0: Error in flush path: -2
[  124.794461][ T4163] cdc_wdm 5-1:118.0: Error in flush path: -2
[  124.798193][ T4168] udc-core: couldn't find an available UDC or it's busy
[  124.798218][ T4168] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
[  124.798292][ T4169] udc-core: couldn't find an available UDC or it's busy
[  124.798315][ T4169] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
[  124.798531][ T4170] udc-core: couldn't find an available UDC or it's busy
[  124.798537][ T4170] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
[  124.812947][ T4170] There is 2 flushers.
[  124.812973][ T4161] cdc_wdm 5-1:118.0: Error in flush path: -32
[  124.812982][ T4168] cdc_wdm 5-1:118.0: Error in flush path: -32
[  124.812987][ T4169] cdc_wdm 5-1:118.0: Error in flush path: -32
[  124.823862][ T4170] ------------[ cut here ]------------
[  124.825440][ T4170] kernel BUG at drivers/usb/class/cdc-wdm.c:425!
[  124.827049][ T4170] invalid opcode: 0000 [#1] SMP KASAN
[  124.828439][ T4170] CPU: 3 PID: 4170 Comm: a.out Not tainted 5.7.0-rc5+ #27
[  124.830192][ T4170] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 02/27/2020
[  124.833192][ T4170] RIP: 0010:wdm_write+0xce7/0xf00
[  124.834683][ T4170] Code: 8b 7c 24 20 e8 ca 16 e5 fd 48 8b 54 24 18 e9 49 fa ff ff 89 34 24 e8 a8 ca be fd 8b 34 24 48 c7 c7 40 0f 01 86 e8 b2 2e aa fd <0f> 0b 48 89 ef e8 1f 16 e5 fd e9 ba f5 ff ff 48 89 cf 48 89 54 24
[  124.839523][ T4170] RSP: 0018:ffff8881e5ff7d30 EFLAGS: 00010286
[  124.841496][ T4170] RAX: 0000000000000014 RBX: ffff8881ee647800 RCX: ffffffff81d9c9dc
[  124.843673][ T4170] RDX: 0000000000000000 RSI: ffffffff81283169 RDI: ffff8881f6ce8dec
[  124.846399][ T4170] RBP: ffff8881ee647890 R08: ffffed103ed9e292 R09: ffff8881f6cf148f
[  124.848563][ T4170] R10: ffffed103dcc8f13 R11: ffffed103ed9e292 R12: 1ffff1103cbfefac
[  124.850559][ T4170] R13: 0000000000000010 R14: ffff8881e7fb5fac R15: 0000000000000000
[  124.852875][ T4170] FS:  00007fbf56199440(0000) GS:ffff8881f6cc0000(0000) knlGS:0000000000000000
[  124.855270][ T4170] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  124.857106][ T4170] CR2: 00007fbf55bef225 CR3: 00000001e61cd002 CR4: 00000000003606e0
[  124.859118][ T4170] Call Trace:
[  124.860123][ T4170]  ? wdm_probe+0x2b0/0x2b0
[  124.861567][ T4170]  ? remove_wait_queue+0x190/0x190
[  124.862896][ T4170]  ? wdm_probe+0x2b0/0x2b0
[  124.864214][ T4170]  __vfs_write+0x76/0x100
[  124.865314][ T4170]  vfs_write+0x185/0x510
[  124.866418][ T4170]  ksys_write+0xff/0x200
[  124.867653][ T4170]  ? __ia32_sys_read+0xb0/0xb0
[  124.868851][ T4170]  ? trace_hardirqs_off_caller+0x22/0x1c0
[  124.870283][ T4170]  do_syscall_64+0x9b/0x510
[  124.871575][ T4170]  entry_SYSCALL_64_after_hwframe+0x49/0xb3
[  124.873352][ T4170] RIP: 0033:0x7fbf55cc2469
[  124.874481][ T4170] Code: 00 f3 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d ff 49 2b 00 f7 d8 64 89 01 48
[  124.879954][ T4170] RSP: 002b:00007ffdc294dd58 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[  124.882090][ T4170] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007fbf55cc2469
[  124.884573][ T4170] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000004
[  124.887178][ T4170] RBP: 0000000000000000 R08: 000000000000000f R09: 00007ffdc294dd8e
[  124.889941][ T4170] R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
[  124.892208][ T4170] R13: 00000000000193a4 R14: 0000000000000000 R15: 0000000000000000
[  124.894103][ T4170] Modules linked in:
[  124.895465][ T4170] ---[ end trace 2aea2ce7d52c22e0 ]---
[  124.897307][ T4170] RIP: 0010:wdm_write+0xce7/0xf00
[  124.898975][ T4170] Code: 8b 7c 24 20 e8 ca 16 e5 fd 48 8b 54 24 18 e9 49 fa ff ff 89 34 24 e8 a8 ca be fd 8b 34 24 48 c7 c7 40 0f 01 86 e8 b2 2e aa fd <0f> 0b 48 89 ef e8 1f 16 e5 fd e9 ba f5 ff ff 48 89 cf 48 89 54 24
[  124.905012][ T4170] RSP: 0018:ffff8881e5ff7d30 EFLAGS: 00010286
[  124.906793][ T4170] RAX: 0000000000000014 RBX: ffff8881ee647800 RCX: ffffffff81d9c9dc
[  124.909151][ T4170] RDX: 0000000000000000 RSI: ffffffff81283169 RDI: ffff8881f6ce8dec
[  124.911276][ T4170] RBP: ffff8881ee647890 R08: ffffed103ed9e292 R09: ffff8881f6cf148f
[  124.913784][ T4170] R10: ffffed103dcc8f13 R11: ffffed103ed9e292 R12: 1ffff1103cbfefac
[  124.916882][ T4170] R13: 0000000000000010 R14: ffff8881e7fb5fac R15: 0000000000000000
[  124.919078][ T4170] FS:  00007fbf56199440(0000) GS:ffff8881f6cc0000(0000) knlGS:0000000000000000
[  124.921484][ T4170] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  124.923337][ T4170] CR2: 00007fbf55bef225 CR3: 00000001e61cd002 CR4: 00000000003606e0
[  124.925606][ T4170] cdc_wdm 5-1:118.0: Error in flush path: -32
Andrey Konovalov June 19, 2020, 1:56 p.m. UTC | #26
On Thu, Jun 18, 2020 at 2:49 AM Tetsuo Handa
<penguin-kernel@i-love.sakura.ne.jp> wrote:
>
> On 2020/06/08 11:24, Tetsuo Handa wrote:
> > As far as I tested, it seems that usb_kill_urb() sometimes fails to call
> > wdm_out_callback() despite the comment for usb_kill_urb() says
> >
> >  * This routine cancels an in-progress request.  It is guaranteed that
> >  * upon return all completion handlers will have finished and the URB
> >  * will be totally idle and available for reuse.  These features make
> >  * this an ideal way to stop I/O in a disconnect() callback or close()
> >  * function.  If the request has not already finished or been unlinked
> >  * the completion handler will see urb->status == -ENOENT.
> >
> > . Is something still wrong? Or just replacing
> >
> >               BUG_ON(test_bit(WDM_IN_USE, &desc->flags) &&
> >                      !test_bit(WDM_DISCONNECTING, &desc->flags));
> >
> > with
> >
> >               wait_event(desc->wait, !test_bit(WDM_IN_USE, &desc->flags) ||
> >                          test_bit(WDM_DISCONNECTING, &desc->flags));
> >
> > in the patch shown below is sufficient?
>
> How is the progress?
>
> Another debug printk() patch shown below says that while wdm_flush() waits for
> clearing of WDM_IN_USE using wait_event(), concurrently executed wdm_write() also
> waits for clearing of WDM_IN_USE using wait_event_interruptible(), and wdm_write() can
> immediately set WDM_IN_USE again as soon as returning from wait_event_interruptible()
> even if somebody was already waiting at wdm_flush() to clear WDM_IN_USE.
>
> That is, wait_event() in wdm_flush() does not know whether there is usb_submit_urb()
> request which is started after wait_event() found that WDM_IN_USE was cleared. Then,
> what does this wait_event() in wdm_flush() want to flush? There is no need to wait for
> completion of usb_submit_urb() ?

Oliver, any chance you could help us with fixing the hang in this
driver? You seem to be its original author. This hang is one of the
top crashers on syzbot, with over 32000 crashed kernels.

>
> diff --git a/drivers/usb/class/cdc-wdm.c b/drivers/usb/class/cdc-wdm.c
> index e3db6fbeadef..d2c4d8b9cc18 100644
> --- a/drivers/usb/class/cdc-wdm.c
> +++ b/drivers/usb/class/cdc-wdm.c
> @@ -82,6 +82,7 @@ struct wdm_device {
>         spinlock_t              iuspin;
>
>         unsigned long           flags;
> +       atomic_t                flushers;
>         u16                     bufsize;
>         u16                     wMaxCommand;
>         u16                     wMaxPacketSize;
> @@ -151,7 +152,7 @@ static void wdm_out_callback(struct urb *urb)
>         kfree(desc->outbuf);
>         desc->outbuf = NULL;
>         clear_bit(WDM_IN_USE, &desc->flags);
> -       wake_up(&desc->wait);
> +       wake_up_all(&desc->wait);
>  }
>
>  static void wdm_in_callback(struct urb *urb)
> @@ -417,6 +418,13 @@ static ssize_t wdm_write
>         req->wValue = 0;
>         req->wIndex = desc->inum; /* already converted */
>         req->wLength = cpu_to_le16(count);
> +       {
> +               const int flushers = atomic_read(&desc->flushers);
> +               if (flushers) {
> +                       printk(KERN_ERR "There is %u flushers.\n", flushers);
> +                       BUG_ON(1);
> +               }
> +       }
>         set_bit(WDM_IN_USE, &desc->flags);
>         desc->outbuf = buf;
>
> @@ -424,6 +432,7 @@ static ssize_t wdm_write
>         if (rv < 0) {
>                 desc->outbuf = NULL;
>                 clear_bit(WDM_IN_USE, &desc->flags);
> +               wake_up_all(&desc->wait);
>                 dev_err(&desc->intf->dev, "Tx URB error: %d\n", rv);
>                 rv = usb_translate_errors(rv);
>                 goto out_free_mem_pm;
> @@ -587,15 +596,20 @@ static int wdm_flush(struct file *file, fl_owner_t id)
>  {
>         struct wdm_device *desc = file->private_data;
>
> -       wait_event(desc->wait,
> -                       /*
> -                        * needs both flags. We cannot do with one
> -                        * because resetting it would cause a race
> -                        * with write() yet we need to signal
> -                        * a disconnect
> -                        */
> -                       !test_bit(WDM_IN_USE, &desc->flags) ||
> -                       test_bit(WDM_DISCONNECTING, &desc->flags));
> +       /*
> +        * needs both flags. We cannot do with one because resetting it would
> +        * cause a race with write() yet we need to signal a disconnect
> +        */
> +       atomic_inc(&desc->flushers);
> +       if (!wait_event_timeout(desc->wait, !test_bit(WDM_IN_USE, &desc->flags) ||
> +                               test_bit(WDM_DISCONNECTING, &desc->flags), 20 * HZ)) {
> +               atomic_dec(&desc->flushers);
> +               kill_urbs(desc);
> +               BUG_ON(test_bit(WDM_IN_USE, &desc->flags) &&
> +                      !test_bit(WDM_DISCONNECTING, &desc->flags));
> +       } else {
> +               atomic_dec(&desc->flushers);
> +       }
>
>         /* cannot dereference desc->intf if WDM_DISCONNECTING */
>         if (test_bit(WDM_DISCONNECTING, &desc->flags))
>
>
>
> [  103.934085][  T118] usb 5-1: New USB device found, idVendor=2400, idProduct=4200, bcdDevice=42.ef
> [  103.934093][  T118] usb 5-1: New USB device strings: Mfr=0, Product=0, SerialNumber=2
> [  103.934100][  T118] usb 5-1: SerialNumber: syz
> [  103.934145][ T4108] udc-core: couldn't find an available UDC or it's busy
> [  103.934151][ T4108] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
> [  103.934810][ T4109] udc-core: couldn't find an available UDC or it's busy
> [  103.934821][ T4109] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
> [  103.935385][ T4110] udc-core: couldn't find an available UDC or it's busy
> [  103.935392][ T4110] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
> [  103.935411][ T4106] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
> [  103.938481][ T4112] udc-core: couldn't find an available UDC or it's busy
> [  103.939183][ T4113] udc-core: couldn't find an available UDC or it's busy
> [  103.939192][ T4113] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
> [  103.939415][ T4114] udc-core: couldn't find an available UDC or it's busy
> [  103.939421][ T4114] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
> [  103.943069][ T4115] udc-core: couldn't find an available UDC or it's busy
> [  103.943257][ T4116] udc-core: couldn't find an available UDC or it's busy
> [  103.943266][ T4116] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
> [  103.944118][ T4117] udc-core: couldn't find an available UDC or it's busy
> [  103.944129][ T4117] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
> [  103.944160][ T4118] udc-core: couldn't find an available UDC or it's busy
> [  103.944169][ T4118] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
> [  103.944738][ T4112] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
> [  103.946715][ T4115] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
> [  103.947159][ T4119] udc-core: couldn't find an available UDC or it's busy
> [  103.947165][ T4119] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
> [  103.947315][ T4120] udc-core: couldn't find an available UDC or it's busy
> [  103.947324][ T4120] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
> [  103.948093][ T4121] udc-core: couldn't find an available UDC or it's busy
> [  103.948103][ T4121] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
> [  103.949190][ T4122] udc-core: couldn't find an available UDC or it's busy
> [  103.950070][ T4123] udc-core: couldn't find an available UDC or it's busy
> [  103.950080][ T4123] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
> [  103.950580][ T4124] udc-core: couldn't find an available UDC or it's busy
> [  103.950587][ T4124] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
> [  103.953081][ T4125] udc-core: couldn't find an available UDC or it's busy
> [  103.953430][ T4126] udc-core: couldn't find an available UDC or it's busy
> [  103.953437][ T4126] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
> [  103.953963][ T4122] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
> [  103.954780][ T4128] udc-core: couldn't find an available UDC or it's busy
> [  103.954786][ T4128] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
> [  103.955864][ T4129] udc-core: couldn't find an available UDC or it's busy
> [  103.955872][ T4129] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
> [  103.955962][ T4127] udc-core: couldn't find an available UDC or it's busy
> [  103.955968][ T4127] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
> [  103.956010][ T4125] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
> [  103.958600][ T4130] udc-core: couldn't find an available UDC or it's busy
> [  103.958962][ T4131] udc-core: couldn't find an available UDC or it's busy
> [  103.958972][ T4131] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
> [  103.959737][ T4132] udc-core: couldn't find an available UDC or it's busy
> [  103.959749][ T4132] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
> [  103.963405][ T4133] udc-core: couldn't find an available UDC or it's busy
> [  103.963413][ T4133] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
> [  103.964329][ T4134] udc-core: couldn't find an available UDC or it's busy
> [  103.965201][ T4135] udc-core: couldn't find an available UDC or it's busy
> [  103.965211][ T4135] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
> [  103.965511][ T4136] udc-core: couldn't find an available UDC or it's busy
> [  103.965520][ T4136] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
> [  103.966011][ T4130] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
> [  103.966939][ T4137] udc-core: couldn't find an available UDC or it's busy
> [  103.966946][ T4137] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
> [  103.967467][ T4138] udc-core: couldn't find an available UDC or it's busy
> [  103.967473][ T4138] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
> [  103.967909][ T4134] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
> [  103.969292][ T4139] udc-core: couldn't find an available UDC or it's busy
> [  103.969301][ T4139] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
> [  103.969708][ T4140] udc-core: couldn't find an available UDC or it's busy
> [  103.969717][ T4140] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
> [  103.970343][ T4141] udc-core: couldn't find an available UDC or it's busy
> [  103.973658][  T118] hub 5-1:118.0: bad descriptor, ignoring hub
> [  103.974100][ T4142] udc-core: couldn't find an available UDC or it's busy
> [  103.974107][ T4142] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
> [  103.974518][ T4141] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
> [  103.974657][ T4143] udc-core: couldn't find an available UDC or it's busy
> [  103.974663][ T4143] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
> [  103.975767][ T4144] udc-core: couldn't find an available UDC or it's busy
> [  103.975774][ T4144] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
> [  103.976777][  T118] hub: probe of 5-1:118.0 failed with error -5
> [  103.977257][ T4145] udc-core: couldn't find an available UDC or it's busy
> [  103.977263][ T4145] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
> [  103.978020][ T4146] udc-core: couldn't find an available UDC or it's busy
> [  103.978029][ T4146] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
> [  103.978223][ T4148] udc-core: couldn't find an available UDC or it's busy
> [  103.978229][ T4148] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
> [  103.978388][ T4147] udc-core: couldn't find an available UDC or it's busy
> [  103.978396][ T4147] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
> [  103.980149][ T4149] udc-core: couldn't find an available UDC or it's busy
> [  103.982395][ T4151] udc-core: couldn't find an available UDC or it's busy
> [  103.982468][ T4152] udc-core: couldn't find an available UDC or it's busy
> [  103.982478][ T4152] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
> [  103.982674][ T4153] udc-core: couldn't find an available UDC or it's busy
> [  103.982682][ T4153] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
> [  103.982780][ T4150] udc-core: couldn't find an available UDC or it's busy
> [  103.982788][ T4150] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
> [  103.983634][ T4149] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
> [  103.983843][  T118] cdc_wdm 5-1:118.0: cdc-wdm0: USB WDM device
> [  103.984589][ T4155] udc-core: couldn't find an available UDC or it's busy
> [  103.984595][ T4155] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
> [  103.985615][ T4156] udc-core: couldn't find an available UDC or it's busy
> [  103.985624][ T4156] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
> [  103.985747][ T4151] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
> [  103.985906][ T4157] udc-core: couldn't find an available UDC or it's busy
> [  103.985912][ T4157] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
> [  103.987838][ T4158] udc-core: couldn't find an available UDC or it's busy
> [  103.987845][ T4158] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
> [  103.987955][ T4159] udc-core: couldn't find an available UDC or it's busy
> [  103.987962][ T4159] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
> [  103.988233][ T4160] udc-core: couldn't find an available UDC or it's busy
> [  103.988239][ T4160] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
> [  103.989778][ T4161] udc-core: couldn't find an available UDC or it's busy
> [  103.990308][ T4162] udc-core: couldn't find an available UDC or it's busy
> [  103.990319][ T4162] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
> [  103.991055][ T4163] udc-core: couldn't find an available UDC or it's busy
> [  103.993669][ T4161] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
> [  103.994007][ T4164] udc-core: couldn't find an available UDC or it's busy
> [  103.994142][ T4165] udc-core: couldn't find an available UDC or it's busy
> [  103.994151][ T4165] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
> [  103.995212][ T4166] udc-core: couldn't find an available UDC or it's busy
> [  103.995222][ T4166] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
> [  103.996400][ T4163] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
> [  106.505076][ T4164] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
> [  124.793950][ T3360] cdc_wdm 5-1:118.0: Error in flush path: -2
> [  124.793977][ T4165] cdc_wdm 5-1:118.0: Error in flush path: -2
> [  124.793987][ T4166] cdc_wdm 5-1:118.0: Error in flush path: -2
> [  124.793997][ T4164] cdc_wdm 5-1:118.0: Error in flush path: -2
> [  124.794461][ T4163] cdc_wdm 5-1:118.0: Error in flush path: -2
> [  124.798193][ T4168] udc-core: couldn't find an available UDC or it's busy
> [  124.798218][ T4168] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
> [  124.798292][ T4169] udc-core: couldn't find an available UDC or it's busy
> [  124.798315][ T4169] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
> [  124.798531][ T4170] udc-core: couldn't find an available UDC or it's busy
> [  124.798537][ T4170] misc raw-gadget: fail, usb_gadget_probe_driver returned -19
> [  124.812947][ T4170] There is 2 flushers.
> [  124.812973][ T4161] cdc_wdm 5-1:118.0: Error in flush path: -32
> [  124.812982][ T4168] cdc_wdm 5-1:118.0: Error in flush path: -32
> [  124.812987][ T4169] cdc_wdm 5-1:118.0: Error in flush path: -32
> [  124.823862][ T4170] ------------[ cut here ]------------
> [  124.825440][ T4170] kernel BUG at drivers/usb/class/cdc-wdm.c:425!
> [  124.827049][ T4170] invalid opcode: 0000 [#1] SMP KASAN
> [  124.828439][ T4170] CPU: 3 PID: 4170 Comm: a.out Not tainted 5.7.0-rc5+ #27
> [  124.830192][ T4170] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 02/27/2020
> [  124.833192][ T4170] RIP: 0010:wdm_write+0xce7/0xf00
> [  124.834683][ T4170] Code: 8b 7c 24 20 e8 ca 16 e5 fd 48 8b 54 24 18 e9 49 fa ff ff 89 34 24 e8 a8 ca be fd 8b 34 24 48 c7 c7 40 0f 01 86 e8 b2 2e aa fd <0f> 0b 48 89 ef e8 1f 16 e5 fd e9 ba f5 ff ff 48 89 cf 48 89 54 24
> [  124.839523][ T4170] RSP: 0018:ffff8881e5ff7d30 EFLAGS: 00010286
> [  124.841496][ T4170] RAX: 0000000000000014 RBX: ffff8881ee647800 RCX: ffffffff81d9c9dc
> [  124.843673][ T4170] RDX: 0000000000000000 RSI: ffffffff81283169 RDI: ffff8881f6ce8dec
> [  124.846399][ T4170] RBP: ffff8881ee647890 R08: ffffed103ed9e292 R09: ffff8881f6cf148f
> [  124.848563][ T4170] R10: ffffed103dcc8f13 R11: ffffed103ed9e292 R12: 1ffff1103cbfefac
> [  124.850559][ T4170] R13: 0000000000000010 R14: ffff8881e7fb5fac R15: 0000000000000000
> [  124.852875][ T4170] FS:  00007fbf56199440(0000) GS:ffff8881f6cc0000(0000) knlGS:0000000000000000
> [  124.855270][ T4170] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  124.857106][ T4170] CR2: 00007fbf55bef225 CR3: 00000001e61cd002 CR4: 00000000003606e0
> [  124.859118][ T4170] Call Trace:
> [  124.860123][ T4170]  ? wdm_probe+0x2b0/0x2b0
> [  124.861567][ T4170]  ? remove_wait_queue+0x190/0x190
> [  124.862896][ T4170]  ? wdm_probe+0x2b0/0x2b0
> [  124.864214][ T4170]  __vfs_write+0x76/0x100
> [  124.865314][ T4170]  vfs_write+0x185/0x510
> [  124.866418][ T4170]  ksys_write+0xff/0x200
> [  124.867653][ T4170]  ? __ia32_sys_read+0xb0/0xb0
> [  124.868851][ T4170]  ? trace_hardirqs_off_caller+0x22/0x1c0
> [  124.870283][ T4170]  do_syscall_64+0x9b/0x510
> [  124.871575][ T4170]  entry_SYSCALL_64_after_hwframe+0x49/0xb3
> [  124.873352][ T4170] RIP: 0033:0x7fbf55cc2469
> [  124.874481][ T4170] Code: 00 f3 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d ff 49 2b 00 f7 d8 64 89 01 48
> [  124.879954][ T4170] RSP: 002b:00007ffdc294dd58 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
> [  124.882090][ T4170] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007fbf55cc2469
> [  124.884573][ T4170] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000004
> [  124.887178][ T4170] RBP: 0000000000000000 R08: 000000000000000f R09: 00007ffdc294dd8e
> [  124.889941][ T4170] R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
> [  124.892208][ T4170] R13: 00000000000193a4 R14: 0000000000000000 R15: 0000000000000000
> [  124.894103][ T4170] Modules linked in:
> [  124.895465][ T4170] ---[ end trace 2aea2ce7d52c22e0 ]---
> [  124.897307][ T4170] RIP: 0010:wdm_write+0xce7/0xf00
> [  124.898975][ T4170] Code: 8b 7c 24 20 e8 ca 16 e5 fd 48 8b 54 24 18 e9 49 fa ff ff 89 34 24 e8 a8 ca be fd 8b 34 24 48 c7 c7 40 0f 01 86 e8 b2 2e aa fd <0f> 0b 48 89 ef e8 1f 16 e5 fd e9 ba f5 ff ff 48 89 cf 48 89 54 24
> [  124.905012][ T4170] RSP: 0018:ffff8881e5ff7d30 EFLAGS: 00010286
> [  124.906793][ T4170] RAX: 0000000000000014 RBX: ffff8881ee647800 RCX: ffffffff81d9c9dc
> [  124.909151][ T4170] RDX: 0000000000000000 RSI: ffffffff81283169 RDI: ffff8881f6ce8dec
> [  124.911276][ T4170] RBP: ffff8881ee647890 R08: ffffed103ed9e292 R09: ffff8881f6cf148f
> [  124.913784][ T4170] R10: ffffed103dcc8f13 R11: ffffed103ed9e292 R12: 1ffff1103cbfefac
> [  124.916882][ T4170] R13: 0000000000000010 R14: ffff8881e7fb5fac R15: 0000000000000000
> [  124.919078][ T4170] FS:  00007fbf56199440(0000) GS:ffff8881f6cc0000(0000) knlGS:0000000000000000
> [  124.921484][ T4170] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  124.923337][ T4170] CR2: 00007fbf55bef225 CR3: 00000001e61cd002 CR4: 00000000003606e0
> [  124.925606][ T4170] cdc_wdm 5-1:118.0: Error in flush path: -32
>
Tetsuo Handa June 23, 2020, 11:20 a.m. UTC | #27
On 2020/06/19 22:56, Andrey Konovalov wrote:
> Oliver, any chance you could help us with fixing the hang in this
> driver? You seem to be its original author. This hang is one of the
> top crashers on syzbot, with over 32000 crashed kernels.
> 

Yes, I think that wdm_flush() has another bug and wdm_write() has yet another bug.
I need the authors' comments.

wdm_flush() says

	/* cannot dereference desc->intf if WDM_DISCONNECTING */
	if (test_bit(WDM_DISCONNECTING, &desc->flags))
		return -ENODEV;
	if (desc->werr < 0)
		dev_err(&desc->intf->dev, "Error in flush path: %d\n",
			desc->werr);

but it seems to me that nothing guarantees that test_bit(WDM_DISCONNECTING) == false
indicates dereferencing desc->intf->dev is safe, for wdm_flush() tests WDM_DISCONNECTING
without any lock whereas wdm_disconnect() sets WDM_DISCONNECTING under wdm_mutex and
desc->iuspin held. It might be safe to dereference from wdm_release() which holds wdm_mutex.

Also, if wait_event() in wdm_flush() might fail to wake up (due to close() dependency
problem this crash report is focusing on), wait_event_interruptible() in wdm_write() might
also fail to wake up (unless interrupted) due to the same dependency. Then, why can't we
wait for completion of wdm_out_callback() (with reasonable timeout) inside wdm_write() ?

I feel that wdm_flush() is so bogus (which could/should be removed).
Oliver Neukum June 24, 2020, 11:57 a.m. UTC | #28
Am Samstag, den 30.05.2020, 13:58 +0900 schrieb Tetsuo Handa:

Hi,

sorry for taking this long. There has been a family emergency.

> > The kernel most definitely does need to protect itself against 
> > misbehaving hardware.  Let's just leave it at that.  If you don't 
> > believe me, ask Greg KH.
> 
> I've made many locations killable (in order to reduce damage caused by OOM
> condition). But I can't make locations killable where handling SIGKILL case is
> too difficult to implement.

We can make flush interruptible. But that will not do the job. We would
get a file that cannot be closed.

> "struct file_operations"->flush() is called from filp_close() when there is
> something which has to be done before "struct file_operations"->release() is
> called.

Yes, in particular error reporting. Without flush() there is no way to
know whether the last write() has actually worked.

> As far as I read this thread, what you are trying to do sounds like allow
> "not waiting for completion of wdm_out_callback()" with only
> 's/wait_event/wait_event_intrruptible/' in wdm_flush(). Then, please do remove
> wdm_flush() call itself.

That would break error reporting. That flush() waits for IO to complete
is basically a side effect. You can know whether IO has worked after it
is finished.

> Therefore, again, please show me as a patch first.

Sure, attached. The difficulty here is that I see three possible
interacting errors, two of which are races.

	Regards
		Oliver
Tetsuo Handa June 24, 2020, 12:48 p.m. UTC | #29
On 2020/06/24 20:57, Oliver Neukum wrote:
> Am Samstag, den 30.05.2020, 13:58 +0900 schrieb Tetsuo Handa:
> 
> Hi,
> 
> sorry for taking this long. There has been a family emergency.

No problem. Thank you for responding.

>> Therefore, again, please show me as a patch first.
> 
> Sure, attached. The difficulty here is that I see three possible
> interacting errors, two of which are races.

This thread grew long, and you have many posts to read up to
https://lkml.kernel.org/r/c85331fc-874c-6e46-a77f-0ef1dc075308@i-love.sakura.ne.jp .
Oliver Neukum June 25, 2020, 9:56 a.m. UTC | #30
Am Montag, den 08.06.2020, 11:24 +0900 schrieb Tetsuo Handa:

Hi,

sorry for being late in reply. I have had an emergency to take
care of.

> On 2020/05/31 0:47, Alan Stern wrote:
> > On Sat, May 30, 2020 at 05:25:11PM +0200, Oliver Neukum wrote:
> > > Am Donnerstag, den 28.05.2020, 16:58 -0400 schrieb Alan Stern:

> > > > This sounds like a bug in the driver.  What would it do if someone had a 
> > > 
> > > Arguably yes. I will introduce a timeout. Unfortunately flush()
> > > requires a non-interruptible sleep, as you cannot sanely return EAGAIN.
> > 
> > But maybe you can kill some URBs and drop some data.
> 
> You mean call usb_kill_urb() via kill_urbs() ?

I have to correct myself. We can return -EINTR.
But that is no solution ultimately. We could not close the fd,
though we would not hang.

> As far as I tested, it seems that usb_kill_urb() sometimes fails to call
> wdm_out_callback() despite the comment for usb_kill_urb() says
> 
>  * This routine cancels an in-progress request.  It is guaranteed that
>  * upon return all completion handlers will have finished and the URB
>  * will be totally idle and available for reuse.  These features make
>  * this an ideal way to stop I/O in a disconnect() callback or close()
>  * function.  If the request has not already finished or been unlinked
>  * the completion handler will see urb->status == -ENOENT.

It looks like it does exactly as the description says. Cancelling
an URB is by necessity a race condition. It can always finish
before you can kill it.

> . Is something still wrong? Or just replacing
> 
> 		BUG_ON(test_bit(WDM_IN_USE, &desc->flags) &&
> 		       !test_bit(WDM_DISCONNECTING, &desc->flags));
> 
> with
> 
> 		wait_event(desc->wait, !test_bit(WDM_IN_USE, &desc->flags) ||
> 			   test_bit(WDM_DISCONNECTING, &desc->flags));
> 
> in the patch shown below is sufficient?
> 
> diff --git a/drivers/usb/class/cdc-wdm.c b/drivers/usb/class/cdc-wdm.c
> index e3db6fbeadef..3e92e79ce0a0 100644
> --- a/drivers/usb/class/cdc-wdm.c
> +++ b/drivers/usb/class/cdc-wdm.c
> @@ -151,7 +151,7 @@ static void wdm_out_callback(struct urb *urb)
>  	kfree(desc->outbuf);
>  	desc->outbuf = NULL;
>  	clear_bit(WDM_IN_USE, &desc->flags);
> -	wake_up(&desc->wait);
> +	wake_up_all(&desc->wait);
>  }
>  
>  static void wdm_in_callback(struct urb *urb)
> @@ -424,6 +424,7 @@ static ssize_t wdm_write
>  	if (rv < 0) {
>  		desc->outbuf = NULL;
>  		clear_bit(WDM_IN_USE, &desc->flags);
> +		wake_up_all(&desc->wait);
>  		dev_err(&desc->intf->dev, "Tx URB error: %d\n", rv);
>  		rv = usb_translate_errors(rv);
>  		goto out_free_mem_pm;
> @@ -587,15 +588,16 @@ static int wdm_flush(struct file *file, fl_owner_t id)
>  {
>  	struct wdm_device *desc = file->private_data;
>  
> -	wait_event(desc->wait,
> -			/*
> -			 * needs both flags. We cannot do with one
> -			 * because resetting it would cause a race
> -			 * with write() yet we need to signal
> -			 * a disconnect
> -			 */
> -			!test_bit(WDM_IN_USE, &desc->flags) ||
> -			test_bit(WDM_DISCONNECTING, &desc->flags));
> +	/*
> +	 * needs both flags. We cannot do with one because resetting it would
> +	 * cause a race with write() yet we need to signal a disconnect
> +	 */
> +	if (!wait_event_timeout(desc->wait, !test_bit(WDM_IN_USE, &desc->flags) ||
> +				test_bit(WDM_DISCONNECTING, &desc->flags), 20 * HZ)) {
> +		kill_urbs(desc);

No. We cannot just kill all URBs just because one fd's owner wants to
flush.

In fact we have multiple code paths that can reach the same hang.
Could you test the attached patches?

	Regards
		Oliver
Tetsuo Handa June 25, 2020, 11:15 a.m. UTC | #31
On 2020/06/25 18:56, Oliver Neukum wrote:
> I have to correct myself. We can return -EINTR.
> But that is no solution ultimately. We could not close the fd,
> though we would not hang.

Returning -EINTR upon close() is legal, but I think we should avoid it
because it is not safe for userspace process to retry close() upon -EINTR.

> In fact we have multiple code paths that can reach the same hang.
> Could you test the attached patches?

Have you already read 
https://lore.kernel.org/linux-usb/254939d4-f3a1-8c7e-94e5-9862c02774fa@i-love.sakura.ne.jp/ and
https://lore.kernel.org/linux-usb/c85331fc-874c-6e46-a77f-0ef1dc075308@i-love.sakura.ne.jp/ ?
Oliver Neukum July 1, 2020, 7:08 a.m. UTC | #32
Am Montag, den 08.06.2020, 11:24 +0900 schrieb Tetsuo Handa:

I am having mail troubles. This is a test, as mail seems not to
be delivered in this thread specifically.

	Sorry
		Oliver
Tetsuo Handa July 2, 2020, 5:44 a.m. UTC | #33
On 2020/06/23 20:20, Tetsuo Handa wrote:
> Also, if wait_event() in wdm_flush() might fail to wake up (due to close() dependency
> problem this crash report is focusing on), wait_event_interruptible() in wdm_write() might
> also fail to wake up (unless interrupted) due to the same dependency. Then, why can't we
> wait for completion of wdm_out_callback() (with reasonable timeout) inside wdm_write() ?
> 
> I feel that wdm_flush() is so bogus (which could/should be removed).
> 

I'd like to simplify like below.

diff --git a/drivers/usb/class/cdc-wdm.c b/drivers/usb/class/cdc-wdm.c
index e3db6fbeadef..9631d1054799 100644
--- a/drivers/usb/class/cdc-wdm.c
+++ b/drivers/usb/class/cdc-wdm.c
@@ -151,7 +151,7 @@ static void wdm_out_callback(struct urb *urb)
 	kfree(desc->outbuf);
 	desc->outbuf = NULL;
 	clear_bit(WDM_IN_USE, &desc->flags);
-	wake_up(&desc->wait);
+	wake_up_all(&desc->wait);
 }
 
 static void wdm_in_callback(struct urb *urb)
@@ -426,6 +426,7 @@ static ssize_t wdm_write
 		clear_bit(WDM_IN_USE, &desc->flags);
 		dev_err(&desc->intf->dev, "Tx URB error: %d\n", rv);
 		rv = usb_translate_errors(rv);
+		wake_up_all(&desc->wait);
 		goto out_free_mem_pm;
 	} else {
 		dev_dbg(&desc->intf->dev, "Tx URB has been submitted index=%d\n",
@@ -434,6 +435,24 @@ static ssize_t wdm_write
 
 	usb_autopm_put_interface(desc->intf);
 	mutex_unlock(&desc->wlock);
+	if (rv >= 0 &&
+	    /*
+	     * needs both flags. We cannot do with one
+	     * because resetting it would cause a race
+	     * with write() yet we need to signal
+	     * a disconnect
+	     */
+	    wait_event_killable_timeout(desc->wait,
+					!test_bit(WDM_IN_USE, &desc->flags) ||
+					test_bit(WDM_DISCONNECTING, &desc->flags), 30 * HZ) == 0) {
+		if (mutex_lock_killable(&desc->wlock) == 0) {
+			if (!test_bit(WDM_DISCONNECTING, &desc->flags))
+				dev_err(&desc->intf->dev,
+					"Tx URB not responding index=%d\n",
+					le16_to_cpu(req->wIndex));
+			mutex_unlock(&desc->wlock);
+		}
+	}
 	return count;
 
 out_free_mem_pm:
@@ -583,30 +602,6 @@ static ssize_t wdm_read
 	return rv;
 }
 
-static int wdm_flush(struct file *file, fl_owner_t id)
-{
-	struct wdm_device *desc = file->private_data;
-
-	wait_event(desc->wait,
-			/*
-			 * needs both flags. We cannot do with one
-			 * because resetting it would cause a race
-			 * with write() yet we need to signal
-			 * a disconnect
-			 */
-			!test_bit(WDM_IN_USE, &desc->flags) ||
-			test_bit(WDM_DISCONNECTING, &desc->flags));
-
-	/* cannot dereference desc->intf if WDM_DISCONNECTING */
-	if (test_bit(WDM_DISCONNECTING, &desc->flags))
-		return -ENODEV;
-	if (desc->werr < 0)
-		dev_err(&desc->intf->dev, "Error in flush path: %d\n",
-			desc->werr);
-
-	return usb_translate_errors(desc->werr);
-}
-
 static __poll_t wdm_poll(struct file *file, struct poll_table_struct *wait)
 {
 	struct wdm_device *desc = file->private_data;
@@ -730,7 +725,6 @@ static const struct file_operations wdm_fops = {
 	.read =		wdm_read,
 	.write =	wdm_write,
 	.open =		wdm_open,
-	.flush =	wdm_flush,
 	.release =	wdm_release,
 	.poll =		wdm_poll,
 	.unlocked_ioctl = wdm_ioctl,
Oliver Neukum July 2, 2020, 7:24 a.m. UTC | #34
Am Donnerstag, den 02.07.2020, 14:44 +0900 schrieb Tetsuo Handa:

>  
>  	usb_autopm_put_interface(desc->intf)
>  	mutex_unlock(&desc->wlock);
> +	if (rv >= 0 &&
> +	    /*
> +	     * needs both flags. We cannot do with one
> +	     * because resetting it would cause a race
> +	     * with write() yet we need to signal
> +	     * a disconnect
> +	     */
> +	    wait_event_killable_timeout(desc->wait,
> +					!test_bit(WDM_IN_USE, &desc->flags) ||
> +					test_bit(WDM_DISCONNECTING, &desc->flags), 30 * HZ) == 0) {
> +		if (mutex_lock_killable(&desc->wlock) == 0) {
> +			if (!test_bit(WDM_DISCONNECTING, &desc->flags))
> +				dev_err(&desc->intf->dev,
> +					"Tx URB not responding index=%d\n",
> +					le16_to_cpu(req->wIndex));
> +			mutex_unlock(&desc->wlock);
> +		}
> +	}

Hi,

I am afraid this would

1. serialize the driver, harming performance
2. introduce a race with every timer a task is running

	Regards
		Oliver
Tetsuo Handa July 15, 2020, 6:15 a.m. UTC | #35
On 2020/07/02 16:24, Oliver Neukum wrote:
> Am Donnerstag, den 02.07.2020, 14:44 +0900 schrieb Tetsuo Handa:
> 
>>  
>>  	usb_autopm_put_interface(desc->intf)
>>  	mutex_unlock(&desc->wlock);
>> +	if (rv >= 0 &&
>> +	    /*
>> +	     * needs both flags. We cannot do with one
>> +	     * because resetting it would cause a race
>> +	     * with write() yet we need to signal
>> +	     * a disconnect
>> +	     */
>> +	    wait_event_killable_timeout(desc->wait,
>> +					!test_bit(WDM_IN_USE, &desc->flags) ||
>> +					test_bit(WDM_DISCONNECTING, &desc->flags), 30 * HZ) == 0) {
>> +		if (mutex_lock_killable(&desc->wlock) == 0) {
>> +			if (!test_bit(WDM_DISCONNECTING, &desc->flags))
>> +				dev_err(&desc->intf->dev,
>> +					"Tx URB not responding index=%d\n",
>> +					le16_to_cpu(req->wIndex));
>> +			mutex_unlock(&desc->wlock);
>> +		}
>> +	}
> 
> Hi,
> 
> I am afraid this would
> 
> 1. serialize the driver, harming performance

Is wdm_write() called so frequently (e.g. 1000 times per one second) ?

> 2. introduce a race with every timer a task is running

What is estimated response time from usb_submit_urb() to wdm_out_callback() ?
Can it be many seconds?

I didn't try your patches at https://lkml.kernel.org/r/1593078968.28236.15.camel@suse.com
because it seems to me that your patch does not answer my 3 concerns:

(1) wdm_flush() says
    
            /* cannot dereference desc->intf if WDM_DISCONNECTING */
            if (test_bit(WDM_DISCONNECTING, &desc->flags))
                    return -ENODEV;
            if (desc->werr < 0)
                    dev_err(&desc->intf->dev, "Error in flush path: %d\n",
                            desc->werr);

    but it seems to me that nothing guarantees that test_bit(WDM_DISCONNECTING) == false
    indicates dereferencing desc->intf->dev is safe, for wdm_flush() tests WDM_DISCONNECTING
    without any lock whereas wdm_disconnect() sets WDM_DISCONNECTING under wdm_mutex and
    desc->iuspin held. It might be safe to dereference from wdm_release() which holds wdm_mutex.

(2) If wait_event() in wdm_flush() might fail to wake up (due to close() dependency
    problem this crash report is focusing on), wait_event_interruptible() in wdm_write() might
    also fail to wake up (unless interrupted) due to the same dependency. Then, why can't we
    wait for completion of wdm_out_callback() (with reasonable timeout) inside wdm_write() ?

(3) While wdm_flush() waits for clearing of WDM_IN_USE using wait_event(), concurrently
    executed wdm_write() also waits for clearing of WDM_IN_USE using wait_event_interruptible(),
    and wdm_write() can immediately set WDM_IN_USE again as soon as returning from
    wait_event_interruptible() even if somebody was already waiting at wdm_flush() to clear
    WDM_IN_USE.

    That is, wait_event() in wdm_flush() does not know whether there is usb_submit_urb()
    request which is started after wait_event() found that WDM_IN_USE was cleared. Then,
    why does this wait_event() in wdm_flush() want to flush which current thread might not
    have issued?

Current thread synchronously waits for completion of wdm_out_callback() issued by current
thread's usb_submit_urb() request might make sense. But how much value is there for current
thread waits for completion of wdm_out_callback() issued by other thread's usb_submit_urb()
request? Multiple threads can use the same "desc" pointer, and trying to flush upon close()
by each thread using that "desc" pointer...

If synchronous waiting harms performance so much, do we want to know the error at all?
wdm_write() already returned success (the number of bytes passed to write()). And there is
no guarantee that the error code which current thread will receive from wmd_flush()
corresponds with a request current thread has issued?

I'm skeptical about the value of trying to synchronously return an error code for
wmd_write() request to the caller. I'm really inclined to remove wdm_flush() completely.
Tetsuo Handa Aug. 10, 2020, 10:47 a.m. UTC | #36
On 2020/07/15 15:15, Tetsuo Handa wrote:
> I'm skeptical about the value of trying to synchronously return an error code for
> wmd_write() request to the caller. I'm really inclined to remove wdm_flush() completely.
> 

No response. It seems to me that this module is orphaned.

Can we go with simply removing wdm_flush() completely?
diff mbox series

Patch

diff --git a/drivers/usb/class/cdc-wdm.c b/drivers/usb/class/cdc-wdm.c
index e3db6fbeadef..bf2a6deb7b08 100644
--- a/drivers/usb/class/cdc-wdm.c
+++ b/drivers/usb/class/cdc-wdm.c
@@ -151,7 +151,7 @@  static void wdm_out_callback(struct urb *urb)
 	kfree(desc->outbuf);
 	desc->outbuf = NULL;
 	clear_bit(WDM_IN_USE, &desc->flags);
-	wake_up(&desc->wait);
+	wake_up_all(&desc->wait);
 }
 
 static void wdm_in_callback(struct urb *urb)
@@ -426,6 +426,7 @@  static ssize_t wdm_write
 		clear_bit(WDM_IN_USE, &desc->flags);
 		dev_err(&desc->intf->dev, "Tx URB error: %d\n", rv);
 		rv = usb_translate_errors(rv);
+		wake_up_all(&desc->wait);
 		goto out_free_mem_pm;
 	} else {
 		dev_dbg(&desc->intf->dev, "Tx URB has been submitted index=%d\n",