Message ID | 20191105032922.GA3041@tungsten (mailing list archive) |
---|---|
State | New, archived |
Series | usb: dwc2: extend treatment for incomplete transfer |
Hi,

First post in this list, please be lenient.

Replying to self to give some context: I'm on an Asus c201 (rk3288) and I see
some crashes with cdc_ether. Here is how to repro:
- create heavy usb network load: I tether my phone and netcat some file from it;
- create heavy CPU load (pushd linux; make -j 6);
- observe kernel messages:

dwc2 ff580000.usb: dwc2_hc_chhltd_intr_dma: Channel 7 - ChHltd set, but reason is unknown
dwc2 ff580000.usb: hcint 0x00000002, intsts 0x04200021
dwc2 ff580000.usb: ep_type 0x00000002 bulk /* ba: ADDED LOG */

The kernel will write to 0 at line 2494 below in file drivers/usb/dwc2/hcd.c

2474 static void dwc2_free_dma_aligned_buffer(struct urb *urb)
2475 {
/* ... */
2482 	/* Restore urb->transfer_buffer from the end of the allocated area */
2483 	memcpy(&stored_xfer_buffer,
2484 	       PTR_ALIGN(urb->transfer_buffer + urb->transfer_buffer_length,
2485 			 dma_get_cache_alignment()),
2486 	       sizeof(urb->transfer_buffer));
/* ... */
2494 		memcpy(stored_xfer_buffer, urb->transfer_buffer, length);
/* ... */
2500 }

The fix I propose has been working fine on my machine, but I confess I am
less than familiar with this area...

My guess is that the kernel misses some deadlines due to contention and we
see channel halts. I tried treating these halts the same way we already treat
them for the other endpoint types, and it solved the crashes.

I verified on next-20191030 that the data is correctly transferred over the
network (no corruption).

Thank you & regards,
Boris.

>Channel halt can happen with BULK endpoints when the
>cpu is under high load. Treating it as an error leads
>to a null-pointer dereference in dwc2_free_dma_aligned_buffer().
>
>Signed-off-by: Boris Arzur <boris@konbu.org>
>---
> drivers/usb/dwc2/hcd_intr.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
>diff --git a/drivers/usb/dwc2/hcd_intr.c b/drivers/usb/dwc2/hcd_intr.c
>index a052d39b4375..697fed530aeb 100644
>--- a/drivers/usb/dwc2/hcd_intr.c
>+++ b/drivers/usb/dwc2/hcd_intr.c
>@@ -1944,7 +1944,8 @@ static void dwc2_hc_chhltd_intr_dma(struct dwc2_hsotg *hsotg,
> 		 */
> 		dwc2_hc_ack_intr(hsotg, chan, chnum, qtd);
> 	} else {
>-		if (chan->ep_type == USB_ENDPOINT_XFER_INT ||
>+		if (chan->ep_type == USB_ENDPOINT_XFER_BULK ||
>+		    chan->ep_type == USB_ENDPOINT_XFER_INT ||
> 		    chan->ep_type == USB_ENDPOINT_XFER_ISOC) {
> 			/*
> 			 * A periodic transfer halted with no other
>--
>2.23.0
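As a reading aid for the logged value "hcint 0x00000002", here is a minimal userspace sketch that names the bits set in a host channel interrupt word. The bit table is an assumption: it is meant to mirror the HCINTMSK_* definitions in drivers/usb/dwc2/hw.h and should be checked against the tree in use. The helper names hcint_reason_count() and hcint_names() are made up for this illustration and do not exist in the driver.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

struct hcint_bit {
	uint32_t mask;
	const char *name;
};

/* Assumed to mirror the HCINTMSK_* bits in drivers/usb/dwc2/hw.h. */
static const struct hcint_bit hcint_bits[] = {
	{ 1u << 0,  "XferCompl" },
	{ 1u << 1,  "ChHltd" },
	{ 1u << 2,  "AHBErr" },
	{ 1u << 3,  "STALL" },
	{ 1u << 4,  "NAK" },
	{ 1u << 5,  "ACK" },
	{ 1u << 6,  "NYET" },
	{ 1u << 7,  "XactErr" },
	{ 1u << 8,  "BblErr" },
	{ 1u << 9,  "FrmOvrun" },
	{ 1u << 10, "DataTglErr" },
};

#define HCINT_CHHLTD (1u << 1)
#define HCINT_NBITS  (sizeof(hcint_bits) / sizeof(hcint_bits[0]))

/* Count the set bits other than ChHltd, i.e. the candidate halt reasons. */
static int hcint_reason_count(uint32_t hcint)
{
	int n = 0;

	for (size_t i = 0; i < HCINT_NBITS; i++)
		if ((hcint & hcint_bits[i].mask) &&
		    hcint_bits[i].mask != HCINT_CHHLTD)
			n++;
	return n;
}

/* Write the names of all set bits into out, separated by single spaces. */
static void hcint_names(uint32_t hcint, char *out, size_t outlen)
{
	out[0] = '\0';
	for (size_t i = 0; i < HCINT_NBITS; i++) {
		if (!(hcint & hcint_bits[i].mask))
			continue;
		if (out[0] != '\0')
			strncat(out, " ", outlen - strlen(out) - 1);
		strncat(out, hcint_bits[i].name, outlen - strlen(out) - 1);
	}
}
```

Under that assumed table, the logged value 0x00000002 decodes to ChHltd alone with a reason count of zero, which is consistent with the "ChHltd set, but reason is unknown" message in the report.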
Hi Boris,

On Tue, Nov 05, 2019 at 12:29:22PM +0900, Boris ARZUR wrote:
> Channel halt can happen with BULK endpoints when the
> cpu is under high load. Treating it as an error leads
> to a null-pointer dereference in dwc2_free_dma_aligned_buffer().
>

good find, and good analysis. We started to see this problem as well in the
latest ChromeOS kernel.

I am still trying to understand what exactly happens. To do that, I'll need to
be able to reproduce the problem. Maybe you can help me. How do you tether
your phone through USB?

Thanks,
Guenter

> Signed-off-by: Boris Arzur <boris@konbu.org>
> ---
>  drivers/usb/dwc2/hcd_intr.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/usb/dwc2/hcd_intr.c b/drivers/usb/dwc2/hcd_intr.c
> index a052d39b4375..697fed530aeb 100644
> --- a/drivers/usb/dwc2/hcd_intr.c
> +++ b/drivers/usb/dwc2/hcd_intr.c
> @@ -1944,7 +1944,8 @@ static void dwc2_hc_chhltd_intr_dma(struct dwc2_hsotg *hsotg,
>  		 */
>  		dwc2_hc_ack_intr(hsotg, chan, chnum, qtd);
>  	} else {
> -		if (chan->ep_type == USB_ENDPOINT_XFER_INT ||
> +		if (chan->ep_type == USB_ENDPOINT_XFER_BULK ||
> +		    chan->ep_type == USB_ENDPOINT_XFER_INT ||
>  		    chan->ep_type == USB_ENDPOINT_XFER_ISOC) {
>  			/*
>  			 * A periodic transfer halted with no other
> --
> 2.23.0
Hello Guenter,

>good find, and good analysis. We started to see this problem as well in the
>latest ChromeOS kernel.
I'm glad you find my report helpful.

>be able to reproduce the problem. Maybe you can help me. How do you tether
>your phone through USB?
You mention tethering, so I think you have read my follow-up:
https://www.spinics.net/lists/linux-usb/msg187497.html

My setup is as follows:
- 'kenzo' phone (https://wiki.lineageos.org/devices/kenzo) on AICP 12.1
  (android 7.1.2 linux 3.10.105);
- 'veyron speedy' chromebook (https://wiki.gentoo.org/wiki/Asus_Chromebook_C201)
  on Arch Linux ARM, vanilla linux 5.2.14;

Here are my repro steps, sorry if tedious, I'm not sure of the level of
details you want, so I will go verbose squared :) :

0. plug in phone to chromebook, with a USB2 micro b cable;

1. activate usb tethering in phone settings:
   settings> more> tethering & portable hotspot> USB tethering
   click and confirm "tethered";

2. chromebook sees phone as:
   [ 2128.080551] rndis_host 2-1:1.0 usb0: register 'rndis_host' at usb-ff580000.usb-1, RNDIS device, 4a:5e:0c:89:ec:09

3. chromebook$ sudo dhcpcd --noarp usb0
   usb0: adding default route via 192.168.42.129

4. on phone, start termux (https://f-droid.org/en/packages/com.termux/)

5. phone$ dd if=/dev/urandom of=blob count=50 bs=1M

6. phone$ sha512sum blob
   b9e...14d blob

7. phone$ pkg install netcat

8. phone$ while true; do <blob netcat -l -p 9999; done

9. chromebook$ sudo pacman -Syu extra/gnu-netcat community/pigz

10. chromebook$ dd if=/dev/urandom of=job count=10 bs=1M

11. chromebook terminal 0$ while true; do <job pigz -11 -i -p 1024 >/dev/null; done

12. chromebook terminal 1$ cat /proc/loadavg
    28.18 8.76 3.74 54/521 8826

13. chromebook terminal 1$ while true; do netcat 192.168.42.129 9999 | sha512sum; done
    b9e...14d -

14. chromebook will panic soon (I see repros in tens of seconds);

I managed to track the issue to:
> The kernel will write to 0 at line 2494 below in file drivers/usb/dwc2/hcd.c
>2474 static void dwc2_free_dma_aligned_buffer(struct urb *urb)
>2494 memcpy(stored_xfer_buffer, urb->transfer_buffer, length);

I discussed the below patch with hminas@synopsys.com, who expressed doubts about its
correctness.

I tested it a while back and it seemed solid (no crash & correct hashes), but while
writing this mail I see that sometimes the output of sha512sum on the
chromebook is wrong... Also, I'm thinking that the fix below may introduce a
memory leak.

In conclusion, do not commit, the fix needs more work :)

I hope to restart experimenting in a short while, when I get a bit more free
time.

I am waiting for any question you may have, thank you for your time.
Boris.
Hi Boris,

On 2/1/20 9:15 PM, Boris ARZUR wrote:
> Hello Guenter,
>
>> good find, and good analysis. We started to see this problem as well in the
>> latest ChromeOS kernel.
> I'm glad you find my report helpful.
>
>> be able to reproduce the problem. Maybe you can help me. How do you tether
>> your phone through USB?
> You mention tethering, so I think you have read my follow-up:
> https://www.spinics.net/lists/linux-usb/msg187497.html
>
> My setup is as follows:
> - 'kenzo' phone (https://wiki.lineageos.org/devices/kenzo) on AICP 12.1
>   (android 7.1.2 linux 3.10.105);
> - 'veyron speedy' chromebook (https://wiki.gentoo.org/wiki/Asus_Chromebook_C201)
>   on Arch Linux ARM, vanilla linux 5.2.14;
>
> Here are my repro steps, sorry if tedious, I'm not sure of the level of
> details you want, so I will go verbose squared :) :
>
> 0. plug in phone to chromebook, with a USB2 micro b cable;
>
> 1. activate usb tethering in phone settings:
>    settings> more> tethering & portable hotspot> USB tethering
>    click and confirm "tethered";
>
> 2. chromebook sees phone as:
>    [ 2128.080551] rndis_host 2-1:1.0 usb0: register 'rndis_host' at usb-ff580000.usb-1, RNDIS device, 4a:5e:0c:89:ec:09
>
> 3. chromebook$ sudo dhcpcd --noarp usb0
>    usb0: adding default route via 192.168.42.129
>
> 4. on phone, start termux (https://f-droid.org/en/packages/com.termux/)
>
> 5. phone$ dd if=/dev/urandom of=blob count=50 bs=1M
>
> 6. phone$ sha512sum blob
>    b9e...14d blob
>
> 7. phone$ pkg install netcat
>
> 8. phone$ while true; do <blob netcat -l -p 9999; done
>
> 9. chromebook$ sudo pacman -Syu extra/gnu-netcat community/pigz
>
> 10. chromebook$ dd if=/dev/urandom of=job count=10 bs=1M
>
> 11. chromebook terminal 0$ while true; do <job pigz -11 -i -p 1024 >/dev/null; done
>
> 12. chromebook terminal 1$ cat /proc/loadavg
>     28.18 8.76 3.74 54/521 8826
>
> 13. chromebook terminal 1$ while true; do netcat 192.168.42.129 9999 | sha512sum; done
>     b9e...14d -
>
> 14. chromebook will panic soon (I see repros in tens of seconds);
>

Thanks a lot for the information. I'll see if I can reproduce the problem
using this (or a similar) approach. Tethering an Android phone isn't really
difficult, but the traffic pattern seems to play a role as well.

> I managed to track the issue to:
>> The kernel will write to 0 at line 2494 below in file drivers/usb/dwc2/hcd.c
>> 2474 static void dwc2_free_dma_aligned_buffer(struct urb *urb)
>> 2494 memcpy(stored_xfer_buffer, urb->transfer_buffer, length);
>
> I discussed the below patch with hminas@synopsys.com, who expressed doubts about its
> correctness.
>
> I tested it a while back and it seemed solid (no crash & correct hashes), but while
> writing this mail I see that sometimes the output of sha512sum on the
> chromebook is wrong... also, I'm thinking that the fix below may be a memory
> leak.
>
> In conclusion, do not commit, the fix needs more work :)
>

Yes, I suspect that your patch is not a real fix but rather a bandage;
that is why I want to reproduce the problem and spend some time trying
to figure out what is going on. In a nutshell, even if the current code
doesn't handle the situation well, it should not result in the observed
problem (which looks like a memory overwrite).

> I hope to restart experimenting in a short while, when I get a bit more free
> time.
>
> I am waiting for any question you may have, thank you for your time.
> Boris.
>

Thanks!
Guenter
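To make the "memory overwrite" suspicion concrete, here is a minimal userspace sketch of the pointer-stashing scheme visible in the quoted excerpt ("Restore urb->transfer_buffer from the end of the allocated area"). It is a simplified model and not the driver code: struct model_urb, MODEL_ALIGN and the model_* helpers are hypothetical, the 64-byte alignment is an arbitrary stand-in for dma_get_cache_alignment(), and the slot offset is computed from the length alone. It only illustrates how a write that runs past transfer_buffer_length could clobber the stashed pointer; it makes no claim that this is what happens on the rk3288.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define MODEL_ALIGN 64u /* hypothetical stand-in for dma_get_cache_alignment() */

/* Hypothetical, trimmed-down model of the urb fields involved. */
struct model_urb {
	void *transfer_buffer;
	size_t transfer_buffer_length;
	size_t actual_length;
};

/* Round n up to the next multiple of MODEL_ALIGN. */
static size_t align_up(size_t n)
{
	return (n + MODEL_ALIGN - 1) & ~(size_t)(MODEL_ALIGN - 1);
}

/* Swap in a temporary buffer and stash the original pointer right after it. */
static int model_bounce_in(struct model_urb *urb)
{
	size_t slot = align_up(urb->transfer_buffer_length);
	unsigned char *tmp = calloc(1, slot + sizeof(void *));

	if (!tmp)
		return -1;
	memcpy(tmp + slot, &urb->transfer_buffer, sizeof(void *));
	urb->transfer_buffer = tmp;
	return 0;
}

/* Read the stashed pointer back from the end of the temporary buffer. */
static void *model_stored_pointer(const struct model_urb *urb)
{
	const unsigned char *tmp = urb->transfer_buffer;
	void *stored;

	memcpy(&stored, tmp + align_up(urb->transfer_buffer_length),
	       sizeof(stored));
	return stored;
}

/*
 * Copy the received data back and restore the original pointer.
 * The NULL check is added for this model only; the quoted excerpt
 * shows the memcpy() at line 2494 with no such check in view.
 */
static int model_bounce_out(struct model_urb *urb)
{
	void *stored = model_stored_pointer(urb);

	if (!stored)
		return -1;
	memcpy(stored, urb->transfer_buffer, urb->actual_length);
	free(urb->transfer_buffer);
	urb->transfer_buffer = stored;
	return 0;
}
```

In this model, zeroing align_up(length) + sizeof(void *) bytes of the temporary buffer before calling model_bounce_out() makes the stashed pointer read back as NULL, and without the added check the copy-back would target address 0, the same symptom as the reported write to 0.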
Channel halt can happen with BULK endpoints when the
cpu is under high load. Treating it as an error leads
to a null-pointer dereference in dwc2_free_dma_aligned_buffer().

Signed-off-by: Boris Arzur <boris@konbu.org>
---
 drivers/usb/dwc2/hcd_intr.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/usb/dwc2/hcd_intr.c b/drivers/usb/dwc2/hcd_intr.c
index a052d39b4375..697fed530aeb 100644
--- a/drivers/usb/dwc2/hcd_intr.c
+++ b/drivers/usb/dwc2/hcd_intr.c
@@ -1944,7 +1944,8 @@ static void dwc2_hc_chhltd_intr_dma(struct dwc2_hsotg *hsotg,
 		 */
 		dwc2_hc_ack_intr(hsotg, chan, chnum, qtd);
 	} else {
-		if (chan->ep_type == USB_ENDPOINT_XFER_INT ||
+		if (chan->ep_type == USB_ENDPOINT_XFER_BULK ||
+		    chan->ep_type == USB_ENDPOINT_XFER_INT ||
 		    chan->ep_type == USB_ENDPOINT_XFER_ISOC) {
 			/*
 			 * A periodic transfer halted with no other
--
2.23.0
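The effect of the one-line change on the branch taken for an unexplained channel halt can be summarised with a small standalone sketch. The numeric transfer-type values follow the USB endpoint descriptor encoding (control 0, isochronous 1, bulk 2, interrupt 3), which matches the "ep_type 0x00000002 bulk" line in the report. The enum and function names are made up for illustration, and the two outcomes are labels for the two branches of the if statement in the diff, not driver symbols.

```c
#include <stdbool.h>

/* Transfer types as encoded in USB endpoint descriptors. */
enum {
	XFER_CONTROL = 0,
	XFER_ISOC    = 1,
	XFER_BULK    = 2,
	XFER_INT     = 3,
};

/* Hypothetical labels for the two branches touched by the patch. */
enum halt_action {
	HALT_ASSUME_INCOMPLETE, /* treated like an incomplete periodic transfer */
	HALT_REPORT_ERROR,      /* "ChHltd set, but reason is unknown" path */
};

/* Which branch an unexplained halt takes, before and after the patch. */
static enum halt_action unexplained_halt_action(int ep_type, bool patched)
{
	if (ep_type == XFER_INT || ep_type == XFER_ISOC)
		return HALT_ASSUME_INCOMPLETE;
	if (patched && ep_type == XFER_BULK)
		return HALT_ASSUME_INCOMPLETE;
	return HALT_REPORT_ERROR;
}
```

Only bulk endpoints change behaviour: control endpoints still take the error branch with the patch applied, and interrupt and isochronous endpoints were already handled as incomplete transfers.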