
usb: dwc2: extend treatment for incomplete transfer

Message ID 20191105032922.GA3041@tungsten (mailing list archive)
State New, archived

Commit Message

Boris ARZUR Nov. 5, 2019, 3:29 a.m. UTC
Channel halt can happen with BULK endpoints when the
cpu is under high load. Treating it as an error leads
to a null-pointer dereference in dwc2_free_dma_aligned_buffer().

Signed-off-by: Boris Arzur <boris@konbu.org>
---
 drivers/usb/dwc2/hcd_intr.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

--
2.23.0

Comments

Boris ARZUR Nov. 5, 2019, 3:39 a.m. UTC | #1
Hi,

First post in this list, please be lenient.

Replying to self to give some context: I'm on an Asus c201 (rk3288)
and I see some crashes with cdc_ether.

Here is how to repro:
- create heavy usb network load: I tether my phone and
  netcat some file from it;
- create heavy CPU load (pushd linux; make -j 6)
- observe kernel messages:
dwc2 ff580000.usb: dwc2_hc_chhltd_intr_dma: Channel 7 - ChHltd set, but reason is unknown
dwc2 ff580000.usb: hcint 0x00000002, intsts 0x04200021
dwc2 ff580000.usb: ep_type 0x00000002 bulk /* ba: ADDED LOG */

The kernel then writes to address 0 at line 2494 of drivers/usb/dwc2/hcd.c, shown below:
2474 static void dwc2_free_dma_aligned_buffer(struct urb *urb)
2475 {
/* ... */
2482 	/* Restore urb->transfer_buffer from the end of the allocated area */
2483 	memcpy(&stored_xfer_buffer,
2484 	       PTR_ALIGN(urb->transfer_buffer + urb->transfer_buffer_length,
2485 			 dma_get_cache_alignment()),
2486 	       sizeof(urb->transfer_buffer));
/* ... */
2494 		memcpy(stored_xfer_buffer, urb->transfer_buffer, length);
/* ... */
2500 }

The fix I propose has been working fine on my machine, but I confess
I am less than familiar with this area...

My guess is that the kernel misses some deadlines due to contention, and that
this is what produces the channel halts. I tried treating these halts the same
way we already treat them for the other endpoint types (interrupt and
isochronous), and that stopped the crashes. I verified on next-20191030 that
the data is transferred over the network correctly (no corruption).

Thank you & regards,
Boris.

Guenter Roeck Jan. 31, 2020, 10:09 p.m. UTC | #2
Hi Boris,

On Tue, Nov 05, 2019 at 12:29:22PM +0900, Boris ARZUR wrote:
> Channel halt can happen with BULK endpoints when the
> cpu is under high load. Treating it as an error leads
> to a null-pointer dereference in dwc2_free_dma_aligned_buffer().
> 

good find, and good analysis. We started to see this problem as well in the
latest ChromeOS kernel.

I am still trying to understand what exactly happens. To do that, I'll need to
be able to reproduce the problem. Maybe you can help me. How do you tether
your phone through USB?

Thanks,
Guenter

Boris ARZUR Feb. 2, 2020, 5:15 a.m. UTC | #3
Hello Guenter,


>good find, and good analysis. We started to see this problem as well in the
>latest ChromeOS kernel.
I'm glad you find my report helpful.


>be able to reproduce the problem. Maybe you can help me. How do you tether
>your phone through USB ?
You mention tethering, so I think you have read my follow-up:
https://www.spinics.net/lists/linux-usb/msg187497.html


My setup is as follows:
- 'kenzo' phone (https://wiki.lineageos.org/devices/kenzo) on AICP 12.1
  (android 7.1.2 linux 3.10.105);
- 'veyron speedy' chromebook (https://wiki.gentoo.org/wiki/Asus_Chromebook_C201)
  on Arch Linux ARM, vanilla linux 5.2.14;


Here are my repro steps, sorry if tedious, I'm not sure of the level of
details you want, so I will go verbose squared :) :

0. plug in phone to chromebook, with a USB2 micro b cable;

1. activate usb tethering in phone settings:
   settings> more> tethering & portable hotspot> USB tethering 
   click and confirm "tethered";

2. chromebook sees phone as:
[ 2128.080551] rndis_host 2-1:1.0 usb0: register 'rndis_host' at usb-ff580000.usb-1, RNDIS device, 4a:5e:0c:89:ec:09

3. chromebook$ sudo dhcpcd --noarp usb0
usb0: adding default route via 192.168.42.129

4. on phone, start termux (https://f-droid.org/en/packages/com.termux/) 

5. phone$ dd if=/dev/urandom of=blob count=50 bs=1M

6. phone$ sha512sum blob
b9e...14d blob

7. phone$ pkg install netcat

8. phone$ while true; do <blob netcat -l -p 9999; done

9. chromebook$ sudo pacman -Syu extra/gnu-netcat community/pigz

10. chromebook$ dd if=/dev/urandom of=job count=10 bs=1M

11. chromebook terminal 0$ while true; do <job pigz -11 -i -p 1024 >/dev/null; done

12. chromebook terminal 1$ cat /proc/loadavg
28.18 8.76 3.74 54/521 8826
 
13. chromebook terminal 1$ while true; do netcat 192.168.42.129 9999 | sha512sum; done
b9e...14d -

14. chromebook will panic soon (I see repros in tens of seconds);

I managed to track the issue to:
> The kernel will write to 0 at line 2494 below in file drivers/usb/dwc2/hcd.c
>2474 static void dwc2_free_dma_aligned_buffer(struct urb *urb)
>2494 		memcpy(stored_xfer_buffer, urb->transfer_buffer, length);


I discussed the below patch with hminas@synopsys.com, who expressed doubts about its
correctness.

I tested it a while back and it seemed solid (no crash & correct hashes), but while
writing this mail I see that sometimes the output of sha512sum on the
chromebook is wrong... also, I'm thinking that the fix below may be a memory
leak.


In conclusion, do not commit, the fix needs more work :)

I hope to restart experimenting in a short while, when I get a bit more free
time.


I'm happy to answer any questions you may have. Thank you for your time.
Boris.

Guenter Roeck Feb. 2, 2020, 6:52 p.m. UTC | #4
Hi Boris,

On 2/1/20 9:15 PM, Boris ARZUR wrote:
> Hello Guenter,
> 
> 
>> good find, and good analysis. We started to see this problem as well in the
>> latest ChromeOS kernel.
> I'm glad you find my report helpful.
> 
> 
>> be able to reproduce the problem. Maybe you can help me. How do you tether
>> your phone through USB ?
> You mention tethering, so I think you have read my follow-up:
> https://www.spinics.net/lists/linux-usb/msg187497.html
> 
> 
> My setup is as follows:
> - 'kenzo' phone (https://wiki.lineageos.org/devices/kenzo) on AICP 12.1
>    (android 7.1.2 linux 3.10.105);
> - 'veyron speedy' chromebook (https://wiki.gentoo.org/wiki/Asus_Chromebook_C201)
>    on Arch Linux ARM, vanilla linux 5.2.14;
> 
> 
> Here are my repro steps, sorry if tedious, I'm not sure of the level of
> details you want, so I will go verbose squared :) :
> 
> 0. plug in phone to chromebook, with a USB2 micro b cable;
> 
> 1. activate usb tethering in phone settings:
>     settings> more> tethering & portable hotspot> USB tethering
>     click and confirm "tethered";
> 
> 2. chromebook sees phone as:
> [ 2128.080551] rndis_host 2-1:1.0 usb0: register 'rndis_host' at usb-ff580000.usb-1, RNDIS device, 4a:5e:0c:89:ec:09
> 
> 3. chromebook$ sudo dhcpcd --noarp usb0
> usb0: adding default route via 192.168.42.129
> 
> 4. on phone, start termux (https://f-droid.org/en/packages/com.termux/)
> 
> 5. phone$ dd if=/dev/urandom of=blob count=50 bs=1M
> 
> 6. phone$ sha512sum blob
> b9e...14d blob
> 
> 7. phone$ pkg install netcat
> 
> 8. phone$ while true; do <blob netcat -l -p 9999; done
> 
> 9. chromebook$ sudo pacman -Syu extra/gnu-netcat community/pigz
> 
> 10. chromebook$ dd if=/dev/urandom of=job count=10 bs=1M
> 
> 11. chromebook terminal 0$ while true; do <job pigz -11 -i -p 1024 >/dev/null; done
> 
> 12. chromebook terminal 1$ cat /proc/loadavg
> 28.18 8.76 3.74 54/521 8826
>   
> 13. chromebook terminal 1$ while true; do netcat 192.168.42.129 9999 | sha512sum; done
> b9e...14d -
> 
> 14. chromebook will panic soon (I see repros in tens of seconds);
> 
Thanks a lot for the information. I'll see if I can reproduce the problem using
this (or a similar) approach. Tethering an Android phone isn't really difficult,
but the traffic pattern seems to play a role as well.

> I managed to track the issue to:
>> The kernel will write to 0 at line 2494 below in file drivers/usb/dwc2/hcd.c
>> 2474 static void dwc2_free_dma_aligned_buffer(struct urb *urb)
>> 2494 		memcpy(stored_xfer_buffer, urb->transfer_buffer, length);
> 
> 
> I discussed the below patch with hminas@synopsys.com, who expressed doubts about its
> correctness.
> 
> I tested it a while back and it seemed solid (no crash & correct hashes), but while
> writing this mail I see that sometimes the output of sha512sum on the
> chromebook is wrong... also, I'm thinking that the fix below may be a memory
> leak.
> 
> 
> In conclusion, do not commit, the fix needs more work :)
> 
Yes, I suspect that your patch is not a real fix but rather a bandage; that is why I
want to reproduce the problem and spend some time trying to figure out what is
going on. In a nutshell, even if the current code doesn't handle the situation well,
it should not result in the observed problem (which looks like a memory overwrite).

> I hope to restart experimenting in a short while, when I get a bit more free
> time.
> 
> 
> I am waiting for any question you may have, thank you for your time.
> Boris.
> 
Thanks!

Guenter


Patch

diff --git a/drivers/usb/dwc2/hcd_intr.c b/drivers/usb/dwc2/hcd_intr.c
index a052d39b4375..697fed530aeb 100644
--- a/drivers/usb/dwc2/hcd_intr.c
+++ b/drivers/usb/dwc2/hcd_intr.c
@@ -1944,7 +1944,8 @@ static void dwc2_hc_chhltd_intr_dma(struct dwc2_hsotg *hsotg,
                         */
                        dwc2_hc_ack_intr(hsotg, chan, chnum, qtd);
                } else {
-                       if (chan->ep_type == USB_ENDPOINT_XFER_INT ||
+                       if (chan->ep_type == USB_ENDPOINT_XFER_BULK ||
+                           chan->ep_type == USB_ENDPOINT_XFER_INT ||
                            chan->ep_type == USB_ENDPOINT_XFER_ISOC) {
                                /*