diff mbox series

USB: usbtmc: Fix RCU stall warning

Message ID 20210629033236.7107-1-qiang.zhang@windriver.com (mailing list archive)
State Superseded
Headers show
Series USB: usbtmc: Fix RCU stall warning | expand

Commit Message

Zhang, Qiang June 29, 2021, 3:32 a.m. UTC
From: Zqiang <qiang.zhang@windriver.com>

rcu: INFO: rcu_preempt self-detected stall on CPU
rcu:    1-...!: (2 ticks this GP) idle=d92/1/0x4000000000000000
        softirq=25390/25392 fqs=3
        (t=12164 jiffies g=31645 q=43226)
rcu: rcu_preempt kthread starved for 12162 jiffies! g31645 f0x0
     RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=0
rcu:    Unless rcu_preempt kthread gets sufficient CPU time,
        OOM is now expected behavior.
rcu: RCU grace-period kthread stack dump:
task:rcu_preempt     state:R  running task

In the case of system use dummy_hcd as usb controller, when the
usbtmc devices is disconnected, in usbtmc_interrupt(), if the urb
status is unknown, the urb will be resubmit, the urb may be insert
to dum_hcd->urbp_list again, this will cause the dummy_timer() not
to exit for a long time, beacause the dummy_timer() be called in
softirq and local_bh is disable, this not only causes the RCU reading
critical area to consume too much time but also makes the tasks in
the current CPU runq not run in time, and that triggered RCU stall.

return directly when find the urb status is not zero to fix it.

Reported-by: syzbot+e2eae5639e7203360018@syzkaller.appspotmail.com
Signed-off-by: Zqiang <qiang.zhang@windriver.com>
---
 drivers/usb/class/usbtmc.c | 12 ++----------
 1 file changed, 2 insertions(+), 10 deletions(-)

Comments

Greg KH July 7, 2021, 5:58 a.m. UTC | #1
On Wed, Jul 07, 2021 at 01:29:12AM +0000, Zhang, Qiang wrote:
> 
> 
> ________________________________________
> From: Zhang, Qiang <qiang.zhang@windriver.com>
> Sent: Tuesday, 29 June 2021 11:32
> To: gregkh@linuxfoundation.org; stern@rowland.harvard.edu; dvyukov@google.com
> Cc: paulmck@kernel.org; dpenkler@gmail.com; guido.kiener@rohde-schwarz.com; linux-usb@vger.kernel.org
> 
> Hello Greg KH
> 
> Have you reviewed this change? 

Nope!

You should have got a copy of my "it's the merge window, relax!"
response from my bot, for this patch.  If not, here it is below:

-----------------

Hi,

This is the friendly semi-automated patch-bot of Greg Kroah-Hartman.
You have sent him a patch that has triggered this response.

Right now, the development tree you have sent a patch for is "closed"
due to the timing of the merge window.  Don't worry, the patch(es) you
have sent are not lost, and will be looked at after the merge window is
over (after the -rc1 kernel is released by Linus).

So thank you for your patience and your patches will be reviewed at this
later time, you do not have to do anything further, this is just a short
note to let you know the patch status and so you don't worry they didn't
make it through.

thanks,

greg k-h's patch email bot
Greg KH July 21, 2021, 7:08 a.m. UTC | #2
On Tue, Jun 29, 2021 at 11:32:36AM +0800, qiang.zhang@windriver.com wrote:
> From: Zqiang <qiang.zhang@windriver.com>

I need a "full" name here, and in the signed-off-by line please.

> 
> rcu: INFO: rcu_preempt self-detected stall on CPU
> rcu:    1-...!: (2 ticks this GP) idle=d92/1/0x4000000000000000
>         softirq=25390/25392 fqs=3
>         (t=12164 jiffies g=31645 q=43226)
> rcu: rcu_preempt kthread starved for 12162 jiffies! g31645 f0x0
>      RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=0
> rcu:    Unless rcu_preempt kthread gets sufficient CPU time,
>         OOM is now expected behavior.
> rcu: RCU grace-period kthread stack dump:
> task:rcu_preempt     state:R  running task
> 
> In the case of system use dummy_hcd as usb controller, when the
> usbtmc devices is disconnected, in usbtmc_interrupt(), if the urb
> status is unknown, the urb will be resubmit, the urb may be insert
> to dum_hcd->urbp_list again, this will cause the dummy_timer() not
> to exit for a long time, beacause the dummy_timer() be called in
> softirq and local_bh is disable, this not only causes the RCU reading
> critical area to consume too much time but also makes the tasks in
> the current CPU runq not run in time, and that triggered RCU stall.
> 
> return directly when find the urb status is not zero to fix it.
> 
> Reported-by: syzbot+e2eae5639e7203360018@syzkaller.appspotmail.com
> Signed-off-by: Zqiang <qiang.zhang@windriver.com>

What commit does this fix?  Does it need to go to stable kernels?

What about the usbtmc maintainers, what do they think about this?

thanks,

greg k-h
dave penkler July 21, 2021, 7:41 a.m. UTC | #3
On Wed, 21 Jul 2021 at 09:08, Greg KH <gregkh@linuxfoundation.org> wrote:
>
> On Tue, Jun 29, 2021 at 11:32:36AM +0800, qiang.zhang@windriver.com wrote:
> > From: Zqiang <qiang.zhang@windriver.com>
>
> I need a "full" name here, and in the signed-off-by line please.
>
> >
> > rcu: INFO: rcu_preempt self-detected stall on CPU
> > rcu:    1-...!: (2 ticks this GP) idle=d92/1/0x4000000000000000
> >         softirq=25390/25392 fqs=3
> >         (t=12164 jiffies g=31645 q=43226)
> > rcu: rcu_preempt kthread starved for 12162 jiffies! g31645 f0x0
> >      RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=0
> > rcu:    Unless rcu_preempt kthread gets sufficient CPU time,
> >         OOM is now expected behavior.
> > rcu: RCU grace-period kthread stack dump:
> > task:rcu_preempt     state:R  running task
> >
> > In the case of system use dummy_hcd as usb controller, when the
> > usbtmc devices is disconnected, in usbtmc_interrupt(), if the urb
> > status is unknown, the urb will be resubmit, the urb may be insert
> > to dum_hcd->urbp_list again, this will cause the dummy_timer() not
> > to exit for a long time, beacause the dummy_timer() be called in
> > softirq and local_bh is disable, this not only causes the RCU reading
> > critical area to consume too much time but also makes the tasks in
> > the current CPU runq not run in time, and that triggered RCU stall.
> >
> > return directly when find the urb status is not zero to fix it.
> >
> > Reported-by: syzbot+e2eae5639e7203360018@syzkaller.appspotmail.com
> > Signed-off-by: Zqiang <qiang.zhang@windriver.com>
>
> What commit does this fix?  Does it need to go to stable kernels?
>
> What about the usbtmc maintainers, what do they think about this?

This patch makes the babbling endpoint retry/recovery code in the real
world usb host controller drivers redundant and would prevent usbtmc
applications from benefiting from it.

>
> thanks,
>
> greg k-h
Greg KH July 21, 2021, 7:52 a.m. UTC | #4
On Wed, Jul 21, 2021 at 09:41:15AM +0200, dave penkler wrote:
> On Wed, 21 Jul 2021 at 09:08, Greg KH <gregkh@linuxfoundation.org> wrote:
> >
> > On Tue, Jun 29, 2021 at 11:32:36AM +0800, qiang.zhang@windriver.com wrote:
> > > From: Zqiang <qiang.zhang@windriver.com>
> >
> > I need a "full" name here, and in the signed-off-by line please.
> >
> > >
> > > rcu: INFO: rcu_preempt self-detected stall on CPU
> > > rcu:    1-...!: (2 ticks this GP) idle=d92/1/0x4000000000000000
> > >         softirq=25390/25392 fqs=3
> > >         (t=12164 jiffies g=31645 q=43226)
> > > rcu: rcu_preempt kthread starved for 12162 jiffies! g31645 f0x0
> > >      RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=0
> > > rcu:    Unless rcu_preempt kthread gets sufficient CPU time,
> > >         OOM is now expected behavior.
> > > rcu: RCU grace-period kthread stack dump:
> > > task:rcu_preempt     state:R  running task
> > >
> > > In the case of system use dummy_hcd as usb controller, when the
> > > usbtmc devices is disconnected, in usbtmc_interrupt(), if the urb
> > > status is unknown, the urb will be resubmit, the urb may be insert
> > > to dum_hcd->urbp_list again, this will cause the dummy_timer() not
> > > to exit for a long time, beacause the dummy_timer() be called in
> > > softirq and local_bh is disable, this not only causes the RCU reading
> > > critical area to consume too much time but also makes the tasks in
> > > the current CPU runq not run in time, and that triggered RCU stall.
> > >
> > > return directly when find the urb status is not zero to fix it.
> > >
> > > Reported-by: syzbot+e2eae5639e7203360018@syzkaller.appspotmail.com
> > > Signed-off-by: Zqiang <qiang.zhang@windriver.com>
> >
> > What commit does this fix?  Does it need to go to stable kernels?
> >
> > What about the usbtmc maintainers, what do they think about this?
> 
> This patch makes the babbling endpoint retry/recovery code in the real
> world usb host controller drivers redundant and would prevent usbtmc
> applications from benefiting from it.

I do not understand, is this change ok or not?

Why do usbtmc applications need to know if babbling happens or not?

confused,

greg k-h
Greg KH July 21, 2021, 7:52 a.m. UTC | #5
On Wed, Jul 21, 2021 at 07:30:39AM +0000, Zhang, Qiang wrote:
> 
> 
> ________________________________________
> From: Greg KH <gregkh@linuxfoundation.org>
> Sent: Wednesday, 21 July 2021 15:08
> To: Zhang, Qiang
> Cc: stern@rowland.harvard.edu; dvyukov@google.com; paulmck@kernel.org; dpenkler@gmail.com; guido.kiener@rohde-schwarz.com; linux-usb@vger.kernel.org
> Subject: Re: [PATCH] USB: usbtmc: Fix RCU stall warning
> 
> [Please note: This e-mail is from an EXTERNAL e-mail address]
> 
> On Tue, Jun 29, 2021 at 11:32:36AM +0800, qiang.zhang@windriver.com wrote:
> > From: Zqiang <qiang.zhang@windriver.com>
> 
> >I need a "full" name here, and in the signed-off-by line please.
> 
> >
> > rcu: INFO: rcu_preempt self-detected stall on CPU
> > rcu:    1-...!: (2 ticks this GP) idle=d92/1/0x4000000000000000
> >         softirq=25390/25392 fqs=3
> >         (t=12164 jiffies g=31645 q=43226)
> > rcu: rcu_preempt kthread starved for 12162 jiffies! g31645 f0x0
> >      RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=0
> > rcu:    Unless rcu_preempt kthread gets sufficient CPU time,
> >         OOM is now expected behavior.
> > rcu: RCU grace-period kthread stack dump:
> > task:rcu_preempt     state:R  running task
> >
> > In the case of system use dummy_hcd as usb controller, when the
> > usbtmc devices is disconnected, in usbtmc_interrupt(), if the urb
> > status is unknown, the urb will be resubmit, the urb may be insert
> > to dum_hcd->urbp_list again, this will cause the dummy_timer() not
> > to exit for a long time, beacause the dummy_timer() be called in
> > softirq and local_bh is disable, this not only causes the RCU reading
> > critical area to consume too much time but also makes the tasks in
> > the current CPU runq not run in time, and that triggered RCU stall.
> >
> > return directly when find the urb status is not zero to fix it.
> >
> > Reported-by: syzbot+e2eae5639e7203360018@syzkaller.appspotmail.com
> > Signed-off-by: Zqiang <qiang.zhang@windriver.com>
> 
> >What commit does this fix?  Does it need to go to stable kernels?
> 
>  I will add fix tags resend,   need to go to stable kernel
> 
> >
> >What about the usbtmc maintainers, what do they think about this?
> 
>  Alan Stern reviewed this change before.

I do not see that on this commit :(
dave penkler July 21, 2021, 9:44 a.m. UTC | #6
On Wed, 21 Jul 2021 at 09:52, Greg KH <gregkh@linuxfoundation.org> wrote:
>
> On Wed, Jul 21, 2021 at 09:41:15AM +0200, dave penkler wrote:
> > On Wed, 21 Jul 2021 at 09:08, Greg KH <gregkh@linuxfoundation.org> wrote:
> > >
> > > On Tue, Jun 29, 2021 at 11:32:36AM +0800, qiang.zhang@windriver.com wrote:
> > > > From: Zqiang <qiang.zhang@windriver.com>
> > >
> > > I need a "full" name here, and in the signed-off-by line please.
> > >
> > > >
> > > > rcu: INFO: rcu_preempt self-detected stall on CPU
> > > > rcu:    1-...!: (2 ticks this GP) idle=d92/1/0x4000000000000000
> > > >         softirq=25390/25392 fqs=3
> > > >         (t=12164 jiffies g=31645 q=43226)
> > > > rcu: rcu_preempt kthread starved for 12162 jiffies! g31645 f0x0
> > > >      RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=0
> > > > rcu:    Unless rcu_preempt kthread gets sufficient CPU time,
> > > >         OOM is now expected behavior.
> > > > rcu: RCU grace-period kthread stack dump:
> > > > task:rcu_preempt     state:R  running task
> > > >
> > > > In the case of system use dummy_hcd as usb controller, when the
> > > > usbtmc devices is disconnected, in usbtmc_interrupt(), if the urb
> > > > status is unknown, the urb will be resubmit, the urb may be insert
> > > > to dum_hcd->urbp_list again, this will cause the dummy_timer() not
> > > > to exit for a long time, beacause the dummy_timer() be called in
> > > > softirq and local_bh is disable, this not only causes the RCU reading
> > > > critical area to consume too much time but also makes the tasks in
> > > > the current CPU runq not run in time, and that triggered RCU stall.
> > > >
> > > > return directly when find the urb status is not zero to fix it.
> > > >
> > > > Reported-by: syzbot+e2eae5639e7203360018@syzkaller.appspotmail.com
> > > > Signed-off-by: Zqiang <qiang.zhang@windriver.com>
> > >
> > > What commit does this fix?  Does it need to go to stable kernels?
> > >
> > > What about the usbtmc maintainers, what do they think about this?
> >
> > This patch makes the babbling endpoint retry/recovery code in the real
> > world usb host controller drivers redundant and would prevent usbtmc
> > applications from benefiting from it.
>
> I do not understand, is this change ok or not?
>
> Why do usbtmc applications need to know if babbling happens or not?
>
> confused,
Sorry, the issue this patch is trying to fix occurs because the
current usbtmc driver resubmits the URB when it gets an EPROTO return.
The dummy usb host controller driver used in the syzbot tests keeps
returning the resubmitted URB with EPROTO causing a loop that starves
RCU. With an actual HCI driver it either recovers or returns an EPIPE.
In either case no loop occurs. So for my part as a user and maintainer
this patch is not ok.
>
> greg k-h
Greg KH July 21, 2021, 9:47 a.m. UTC | #7
On Wed, Jul 21, 2021 at 11:44:23AM +0200, dave penkler wrote:
> On Wed, 21 Jul 2021 at 09:52, Greg KH <gregkh@linuxfoundation.org> wrote:
> >
> > On Wed, Jul 21, 2021 at 09:41:15AM +0200, dave penkler wrote:
> > > On Wed, 21 Jul 2021 at 09:08, Greg KH <gregkh@linuxfoundation.org> wrote:
> > > >
> > > > On Tue, Jun 29, 2021 at 11:32:36AM +0800, qiang.zhang@windriver.com wrote:
> > > > > From: Zqiang <qiang.zhang@windriver.com>
> > > >
> > > > I need a "full" name here, and in the signed-off-by line please.
> > > >
> > > > >
> > > > > rcu: INFO: rcu_preempt self-detected stall on CPU
> > > > > rcu:    1-...!: (2 ticks this GP) idle=d92/1/0x4000000000000000
> > > > >         softirq=25390/25392 fqs=3
> > > > >         (t=12164 jiffies g=31645 q=43226)
> > > > > rcu: rcu_preempt kthread starved for 12162 jiffies! g31645 f0x0
> > > > >      RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=0
> > > > > rcu:    Unless rcu_preempt kthread gets sufficient CPU time,
> > > > >         OOM is now expected behavior.
> > > > > rcu: RCU grace-period kthread stack dump:
> > > > > task:rcu_preempt     state:R  running task
> > > > >
> > > > > In the case of system use dummy_hcd as usb controller, when the
> > > > > usbtmc devices is disconnected, in usbtmc_interrupt(), if the urb
> > > > > status is unknown, the urb will be resubmit, the urb may be insert
> > > > > to dum_hcd->urbp_list again, this will cause the dummy_timer() not
> > > > > to exit for a long time, beacause the dummy_timer() be called in
> > > > > softirq and local_bh is disable, this not only causes the RCU reading
> > > > > critical area to consume too much time but also makes the tasks in
> > > > > the current CPU runq not run in time, and that triggered RCU stall.
> > > > >
> > > > > return directly when find the urb status is not zero to fix it.
> > > > >
> > > > > Reported-by: syzbot+e2eae5639e7203360018@syzkaller.appspotmail.com
> > > > > Signed-off-by: Zqiang <qiang.zhang@windriver.com>
> > > >
> > > > What commit does this fix?  Does it need to go to stable kernels?
> > > >
> > > > What about the usbtmc maintainers, what do they think about this?
> > >
> > > This patch makes the babbling endpoint retry/recovery code in the real
> > > world usb host controller drivers redundant and would prevent usbtmc
> > > applications from benefiting from it.
> >
> > I do not understand, is this change ok or not?
> >
> > Why do usbtmc applications need to know if babbling happens or not?
> >
> > confused,
> Sorry, the issue this patch is trying to fix occurs because the
> current usbtmc driver resubmits the URB when it gets an EPROTO return.
> The dummy usb host controller driver used in the syzbot tests keeps
> returning the resubmitted URB with EPROTO causing a loop that starves
> RCU. With an actual HCI driver it either recovers or returns an EPIPE.
> In either case no loop occurs. So for my part as a user and maintainer
> this patch is not ok.

Thanks for the review.

Zqiang, can you fix this patch up based on this please?

thanks,

greg k-h
Alan Stern July 21, 2021, 2:22 p.m. UTC | #8
On Wed, Jul 21, 2021 at 11:44:23AM +0200, dave penkler wrote:
> Sorry, the issue this patch is trying to fix occurs because the
> current usbtmc driver resubmits the URB when it gets an EPROTO return.
> The dummy usb host controller driver used in the syzbot tests keeps
> returning the resubmitted URB with EPROTO causing a loop that starves
> RCU. With an actual HCI driver it either recovers or returns an EPIPE.

Are you sure about that?  Have you ever observed a usbtmc device recovering 
and continuing to operate after an EPROTO error?

An EPIPE error also seems rather unlikely -- particularly if the device is 
not plugged into a high-speed hub.

Alan Stern

> In either case no loop occurs. So for my part as a user and maintainer
> this patch is not ok.
diff mbox series

Patch

diff --git a/drivers/usb/class/usbtmc.c b/drivers/usb/class/usbtmc.c
index 74d5a9c5238a..c4e1a88fff78 100644
--- a/drivers/usb/class/usbtmc.c
+++ b/drivers/usb/class/usbtmc.c
@@ -2324,17 +2324,9 @@  static void usbtmc_interrupt(struct urb *urb)
 		dev_err(dev, "overflow with length %d, actual length is %d\n",
 			data->iin_wMaxPacketSize, urb->actual_length);
 		fallthrough;
-	case -ECONNRESET:
-	case -ENOENT:
-	case -ESHUTDOWN:
-	case -EILSEQ:
-	case -ETIME:
-	case -EPIPE:
-		/* urb terminated, clean up */
-		dev_dbg(dev, "urb terminated, status: %d\n", status);
-		return;
 	default:
-		dev_err(dev, "unknown status received: %d\n", status);
+		dev_err(dev, "error status received: %d\n", status);
+		return;
 	}
 exit:
 	rv = usb_submit_urb(urb, GFP_ATOMIC);