diff mbox series

[RFC] fixes for hangs and error reporting in CDC_WDM

Message ID 20200922112126.16919-1-oneukum@suse.com (mailing list archive)
State New, archived
Headers show
Series [RFC] fixes for hangs and error reporting in CDC_WDM | expand

Commit Message

Oliver Neukum Sept. 22, 2020, 11:21 a.m. UTC
Stress testing has shown that CDC-WDM has some issues with hangs
and error reporting

1. wakeups are not correctly handled in multhreaded environments
2. unresponsive hardware is not handled
3. errors are not correctly reported. This needs flush() to be
implemented.

This version makes wdm_flush() use interruptible sleep.

For easier review all squashed together:

Comments

Greg KH Sept. 25, 2020, 3:11 p.m. UTC | #1
On Tue, Sep 22, 2020 at 01:21:19PM +0200, Oliver Neukum wrote:
> Stress testing has shown that CDC-WDM has some issues with hangs
> and error reporting
> 
> 1. wakeups are not correctly handled in multhreaded environments
> 2. unresponsive hardware is not handled
> 3. errors are not correctly reported. This needs flush() to be
> implemented.
> 
> This version makes wdm_flush() use interruptible sleep.
> 
> For easier review all squashed together:
> 

I have like 3 or 4 different "RFC" series here from you for this driver,
which one is the "newest"?

And can you send a series that isn't RFC so that I can know you feel it
is good enough to be merged?

thanks,

greg k-h
Tetsuo Handa Sept. 25, 2020, 3:20 p.m. UTC | #2
On 2020/09/26 0:11, Greg KH wrote:
> On Tue, Sep 22, 2020 at 01:21:19PM +0200, Oliver Neukum wrote:
>> Stress testing has shown that CDC-WDM has some issues with hangs
>> and error reporting
>>
>> 1. wakeups are not correctly handled in multhreaded environments
>> 2. unresponsive hardware is not handled
>> 3. errors are not correctly reported. This needs flush() to be
>> implemented.
>>
>> This version makes wdm_flush() use interruptible sleep.
>>
>> For easier review all squashed together:
>>
> 
> I have like 3 or 4 different "RFC" series here from you for this driver,
> which one is the "newest"?

https://lkml.kernel.org/r/20200923092136.14824-1-oneukum@suse.com

is the newest series from Oliver. But

https://lkml.kernel.org/r/b27841ab-a88c-13e2-a66f-6df7af1f46b4@i-love.sakura.ne.jp

is the squashed version with updated comments and deduplicated code.

> 
> And can you send a series that isn't RFC so that I can know you feel it
> is good enough to be merged?

Do you want this fix as a series of patches (the former link)?
Since I think that the changeset should be atomically applied, I posted the latter link.
Greg KH Sept. 25, 2020, 3:28 p.m. UTC | #3
On Sat, Sep 26, 2020 at 12:20:57AM +0900, Tetsuo Handa wrote:
> On 2020/09/26 0:11, Greg KH wrote:
> > On Tue, Sep 22, 2020 at 01:21:19PM +0200, Oliver Neukum wrote:
> >> Stress testing has shown that CDC-WDM has some issues with hangs
> >> and error reporting
> >>
> >> 1. wakeups are not correctly handled in multhreaded environments
> >> 2. unresponsive hardware is not handled
> >> 3. errors are not correctly reported. This needs flush() to be
> >> implemented.
> >>
> >> This version makes wdm_flush() use interruptible sleep.
> >>
> >> For easier review all squashed together:
> >>
> > 
> > I have like 3 or 4 different "RFC" series here from you for this driver,
> > which one is the "newest"?
> 
> https://lkml.kernel.org/r/20200923092136.14824-1-oneukum@suse.com
> 
> is the newest series from Oliver. But
> 
> https://lkml.kernel.org/r/b27841ab-a88c-13e2-a66f-6df7af1f46b4@i-love.sakura.ne.jp
> 
> is the squashed version with updated comments and deduplicated code.
> 
> > 
> > And can you send a series that isn't RFC so that I can know you feel it
> > is good enough to be merged?
> 
> Do you want this fix as a series of patches (the former link)?
> Since I think that the changeset should be atomically applied, I posted the latter link.

A single patch would be good to send to me again, burried at the end of
a long thread is hard to dig out.

Also with proper authorship is needed, did Oliver write this, or did
you?

There is the co-developed-by: tag, which looks like it might be relevant
here, can you do that?

thanks,

greg k-h
Tetsuo Handa Sept. 25, 2020, 3:34 p.m. UTC | #4
On 2020/09/26 0:28, Greg KH wrote:
>> Do you want this fix as a series of patches (the former link)?
>> Since I think that the changeset should be atomically applied, I posted the latter link.
> 
> A single patch would be good to send to me again, burried at the end of
> a long thread is hard to dig out.
> 
> Also with proper authorship is needed, did Oliver write this, or did
> you?

Oliver wrote the majority part. I just squashed the series and updated comments
and deduplicated the code. Thus, I think main authorship should be given to Oliver.

> 
> There is the co-developed-by: tag, which looks like it might be relevant
> here, can you do that?

If you prefer the co-developed-by: tag, you can add:

Co-developed-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>

Oliver, any comments?
Greg KH Sept. 26, 2020, 5:40 a.m. UTC | #5
On Sat, Sep 26, 2020 at 12:34:24AM +0900, Tetsuo Handa wrote:
> On 2020/09/26 0:28, Greg KH wrote:
> >> Do you want this fix as a series of patches (the former link)?
> >> Since I think that the changeset should be atomically applied, I posted the latter link.
> > 
> > A single patch would be good to send to me again, burried at the end of
> > a long thread is hard to dig out.
> > 
> > Also with proper authorship is needed, did Oliver write this, or did
> > you?
> 
> Oliver wrote the majority part. I just squashed the series and updated comments
> and deduplicated the code. Thus, I think main authorship should be given to Oliver.
> 
> > 
> > There is the co-developed-by: tag, which looks like it might be relevant
> > here, can you do that?
> 
> If you prefer the co-developed-by: tag, you can add:
> 
> Co-developed-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>

That seems reasonable, then put Oliver's address on the first line with
a "From:" line to make this all work correctly when I apply it.

thanks,

greg k-h
Oliver Neukum Sept. 29, 2020, 8:46 a.m. UTC | #6
Am Samstag, den 26.09.2020, 07:40 +0200 schrieb Greg KH:
> On Sat, Sep 26, 2020 at 12:34:24AM +0900, Tetsuo Handa wrote:
> > On 2020/09/26 0:28, Greg KH wrote:
> > > > Do you want this fix as a series of patches (the former link)?
> > > > Since I think that the changeset should be atomically applied, I posted the latter link.
> > > 
> > > A single patch would be good to send to me again, burried at the end of
> > > a long thread is hard to dig out.
> > > 
> > > Also with proper authorship is needed, did Oliver write this, or did
> > > you?
> > 
> > Oliver wrote the majority part. I just squashed the series and updated comments
> > and deduplicated the code. Thus, I think main authorship should be given to Oliver.
> > 
> > > There is the co-developed-by: tag, which looks like it might be relevant
> > > here, can you do that?
> > 
> > If you prefer the co-developed-by: tag, you can add:
> > 
> > Co-developed-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> 
> That seems reasonable, then put Oliver's address on the first line with
> a "From:" line to make this all work correctly when I apply it.

Hi,

I did sent out the series with a PATCH label last week.
Should I resend?

	Regards
		Oliver
Tetsuo Handa Sept. 29, 2020, 10:17 a.m. UTC | #7
On 2020/09/29 17:46, Oliver Neukum wrote:
>> That seems reasonable, then put Oliver's address on the first line with
>> a "From:" line to make this all work correctly when I apply it.
> 
> Hi,
> 
> I did sent out the series with a PATCH label last week.
> Should I resend?
> 

No need to resend. I already sent a squashed version as
https://lkml.kernel.org/r/20200928141755.3476-1-penguin-kernel@I-love.SAKURA.ne.jp .
diff mbox series

Patch

--- a/drivers/usb/class/cdc-wdm.c
+++ b/drivers/usb/class/cdc-wdm.c
@@ -58,6 +58,9 @@  MODULE_DEVICE_TABLE (usb, wdm_ids);
 
 #define WDM_MAX                        16
 
+/* flush() is uninterruptible, but we cannot wait forever */
+#define WDM_FLUSH_TIMEOUT      (30 * HZ)
+
 /* CDC-WMC r1.1 requires wMaxCommand to be "at least 256 decimal (0x100)" */
 #define WDM_DEFAULT_BUFSIZE    256
 
@@ -151,7 +154,7 @@  static void wdm_out_callback(struct urb *urb)
        kfree(desc->outbuf);
        desc->outbuf = NULL;
        clear_bit(WDM_IN_USE, &desc->flags);
-       wake_up(&desc->wait);
+       wake_up_all(&desc->wait);
 }
 
 static void wdm_in_callback(struct urb *urb)
@@ -393,6 +396,9 @@  static ssize_t wdm_write
        if (test_bit(WDM_RESETTING, &desc->flags))
                r = -EIO;
 
+       if (test_bit(WDM_DISCONNECTING, &desc->flags))
+               r = -ENODEV;
+
        if (r < 0) {
                rv = r;
                goto out_free_mem_pm;
@@ -424,6 +430,7 @@  static ssize_t wdm_write
        if (rv < 0) {
                desc->outbuf = NULL;
                clear_bit(WDM_IN_USE, &desc->flags);
+               wake_up_all(&desc->wait); /* for flush() */
                dev_err(&desc->intf->dev, "Tx URB error: %d\n", rv);
                rv = usb_translate_errors(rv);
                goto out_free_mem_pm;
@@ -583,11 +590,39 @@  static ssize_t wdm_read
        return rv;
 }
 
+/*
+ * The difference to flush is that we wait forever. If you don't like
+ * that behavior, you need to send a signal.
+ */
+
+static int wdm_fsync(struct file *file, loff_t start, loff_t end, int datasync)
+{
+       struct wdm_device *desc = file->private_data;
+       int rv;
+
+       rv = wait_event_interruptible(desc->wait,
+                       !test_bit(WDM_IN_USE, &desc->flags) ||
+                       test_bit(WDM_DISCONNECTING, &desc->flags));
+
+       if (test_bit(WDM_DISCONNECTING, &desc->flags))
+               return -ENODEV;
+       if (rv < 0)
+               return -EINTR;
+
+       spin_lock_irq(&desc->iuspin);
+       rv = desc->werr;
+       desc->werr = 0;
+       spin_unlock_irq(&desc->iuspin);
+
+       return usb_translate_errors(rv);
+}
+
 static int wdm_flush(struct file *file, fl_owner_t id)
 {
        struct wdm_device *desc = file->private_data;
+       int rv;
 
-       wait_event(desc->wait,
+       rv = wait_event_interruptible_timeout(desc->wait,
                        /*
                         * needs both flags. We cannot do with one
                         * because resetting it would cause a race
@@ -595,16 +630,27 @@  static int wdm_flush(struct file *file, fl_owner_t id)
                         * a disconnect
                         */
                        !test_bit(WDM_IN_USE, &desc->flags) ||
-                       test_bit(WDM_DISCONNECTING, &desc->flags));
+                       test_bit(WDM_DISCONNECTING, &desc->flags),
+                       WDM_FLUSH_TIMEOUT);
 
-       /* cannot dereference desc->intf if WDM_DISCONNECTING */
+       /*
+        * to report the correct error.
+        * This is best effort
+        * We are inevitably racing with the hardware.
+        */
        if (test_bit(WDM_DISCONNECTING, &desc->flags))
                return -ENODEV;
-       if (desc->werr < 0)
-               dev_err(&desc->intf->dev, "Error in flush path: %d\n",
-                       desc->werr);
+       if (!rv)
+               return -EIO;
+       if (rv < 0)
+               return -EINTR;
+
+       spin_lock_irq(&desc->iuspin);
+       rv = desc->werr;
+       desc->werr = 0;
+       spin_unlock_irq(&desc->iuspin);
 
-       return usb_translate_errors(desc->werr);
+       return usb_translate_errors(rv);
 }
 
 static __poll_t wdm_poll(struct file *file, struct poll_table_struct *wait)
@@ -729,6 +775,7 @@  static const struct file_operations wdm_fops = {
        .owner =        THIS_MODULE,
        .read =         wdm_read,
        .write =        wdm_write,
+       .fsync =        wdm_fsync,
        .open =         wdm_open,
        .flush =        wdm_flush,
        .release =      wdm_release,