diff mbox

usb HC busted?

Message ID 20180717114104.irgdb5rmg2qxclgp@debian (mailing list archive)
State New, archived
Headers show

Commit Message

Sudip Mukherjee July 17, 2018, 11:41 a.m. UTC
Hi Mathias,

On Sat, Jun 30, 2018 at 10:07:04PM +0100, Sudip Mukherjee wrote:
> Hi Mathias,
> 
> On Fri, Jun 29, 2018 at 02:41:13PM +0300, Mathias Nyman wrote:
> > On 27.06.2018 14:59, Sudip Mukherjee wrote:
> > > > > Can you share a bit more details on the platform you are using, and what types of test you are running.
> > > > 
<snip>
> Then to track what is going on, I added the slub debugging and :(
> I have attached part of dmesg for you to check.
> Will appreciate your help in finding out the problem.

I did some more debugging. Tested with a KASAN enabled kernel and that
shows the problem. The report is attached.

To my understanding:

btusb_work() is calling usb_set_interface() with alternate = 0. which
again calls usb_hcd_alloc_bandwidth() and that frees the rings by
xhci_free_endpoint_ring(). But then usb_set_interface() continues and
calls usb_disable_interface() -> usb_hcd_flush_endpoint()->unlink1()->
xhci_urb_dequeue() which at the end gives the command to stop endpoint.

In all the cycles I have tested I see that only in the fail case
handle_cmd_completion() gets called, but in the cycles where the error
is not there handle_cmd_completion() is not called with that command.

I am not sure what is happening, and you are the best person to understand
what is happening. :)

But for now (untill you are back from holiday and suggest a proper solution),
I made a hacky patch (attached) which is working and I donot get any
corruption after that. Both KASAN and slub debug are also happy.

So, now waiting for you to analyze what is going on and suggest a proper
fix.

Thanks in advance.

--
Regards
Sudip
[  236.814156] ==================================================================
[  236.814187] BUG: KASAN: use-after-free in xhci_trb_virt_to_dma+0x2e/0x74 [xhci_hcd]
[  236.814193] Read of size 8 at addr ffff8800789329c8 by task weston/138

[  236.814203] CPU: 0 PID: 138 Comm: weston Tainted: G     U  W  O    4.14.47-20180606+ #7
[  236.814206] Hardware name: xxx, BIOS 2017.01-00087-g43e04de 08/30/2017
[  236.814209] Call Trace:
[  236.814214]  <IRQ>
[  236.814226]  dump_stack+0x46/0x59
[  236.814238]  print_address_description+0x6b/0x23b
[  236.814255]  ? xhci_trb_virt_to_dma+0x2e/0x74 [xhci_hcd]
[  236.814262]  kasan_report+0x220/0x246
[  236.814278]  xhci_trb_virt_to_dma+0x2e/0x74 [xhci_hcd]
[  236.814294]  trb_in_td+0x3b/0x1cd [xhci_hcd]
[  236.814311]  handle_cmd_completion+0x1181/0x2c9b [xhci_hcd]
[  236.814329]  ? xhci_queue_new_dequeue_state+0x5d9/0x5d9 [xhci_hcd]
[  236.814337]  ? drm_handle_vblank+0x4ec/0x590
[  236.814352]  xhci_irq+0x529/0x3294 [xhci_hcd]
[  236.814362]  ? __accumulate_pelt_segments+0x24/0x33
[  236.814378]  ? finish_td.isra.40+0x223/0x223 [xhci_hcd]
[  236.814384]  ? __accumulate_pelt_segments+0x24/0x33
[  236.814390]  ? __accumulate_pelt_segments+0x24/0x33
[  236.814405]  ? xhci_irq+0x3294/0x3294 [xhci_hcd]
[  236.814412]  __handle_irq_event_percpu+0x149/0x3db
[  236.814421]  handle_irq_event_percpu+0x65/0x109
[  236.814428]  ? __handle_irq_event_percpu+0x3db/0x3db
[  236.814436]  ? ttwu_do_wakeup.isra.18+0x3a2/0x3ce
[  236.814442]  handle_irq_event+0xa8/0x10a
[  236.814449]  handle_edge_irq+0x4b2/0x538
[  236.814458]  handle_irq+0x3e/0x45
[  236.814465]  do_IRQ+0x5c/0x126
[  236.814474]  common_interrupt+0x7a/0x7a
[  236.814478]  </IRQ>
[  236.814483] RIP: 0023:0xf79d3d82
[  236.814486] RSP: 002b:00000000ffc588e8 EFLAGS: 00200282 ORIG_RAX: ffffffffffffffdc
[  236.814493] RAX: 0000000000000000 RBX: 00000000f7bebd5c RCX: 0000000000000000
[  236.814496] RDX: 0000000008d4197c RSI: 0000000000000000 RDI: 00000000f746c020
[  236.814499] RBP: 00000000ffc588e8 R08: 0000000000000000 R09: 0000000000000000
[  236.814503] R10: 0000000000000000 R11: 0000000000200206 R12: 0000000000000000
[  236.814506] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000

[  236.814513] Allocated by task 2082:
[  236.814521]  kasan_kmalloc.part.1+0x51/0xc7
[  236.814526]  kmem_cache_alloc_trace+0x178/0x187
[  236.814540]  xhci_segment_alloc.isra.11+0x9d/0x3bf [xhci_hcd]
[  236.814553]  xhci_alloc_segments_for_ring+0x9e/0x176 [xhci_hcd]
[  236.814566]  xhci_ring_alloc.constprop.16+0x197/0x4ba [xhci_hcd]
[  236.814579]  xhci_endpoint_init+0x77a/0x9ba [xhci_hcd]
[  236.814592]  xhci_add_endpoint+0x3bc/0x43b [xhci_hcd]
[  236.814615]  usb_hcd_alloc_bandwidth+0x7ef/0x857 [usbcore]
[  236.814637]  usb_set_interface+0x294/0x681 [usbcore]
[  236.814645]  btusb_work+0x2e6/0x981 [btusb]
[  236.814651]  process_one_work+0x579/0x9e9
[  236.814656]  worker_thread+0x68f/0x804
[  236.814662]  kthread+0x31c/0x32b
[  236.814668]  ret_from_fork+0x35/0x40

[  236.814672] Freed by task 1533:
[  236.814678]  kasan_slab_free+0xb3/0x15e
[  236.814683]  kfree+0x103/0x1a9
[  236.814696]  xhci_ring_free+0x205/0x286 [xhci_hcd]
[  236.814709]  xhci_free_endpoint_ring+0x4d/0x83 [xhci_hcd]
[  236.814722]  xhci_check_bandwidth+0x57b/0x65a [xhci_hcd]
[  236.814743]  usb_hcd_alloc_bandwidth+0x665/0x857 [usbcore]
[  236.814765]  usb_set_interface+0x294/0x681 [usbcore]
[  236.814772]  btusb_work+0x664/0x981 [btusb]
[  236.814777]  process_one_work+0x579/0x9e9
[  236.814782]  worker_thread+0x68f/0x804
[  236.814788]  kthread+0x31c/0x32b
[  236.814793]  ret_from_fork+0x35/0x40

[  236.814799] The buggy address belongs to the object at ffff8800789329c8
 which belongs to the cache kmalloc-64 of size 64
[  236.814804] The buggy address is located 0 bytes inside of
 64-byte region [ffff8800789329c8, ffff880078932a08)
[  236.814806] The buggy address belongs to the page:
[  236.814812] page:ffffea0001e24c80 count:1 mapcount:0 mapping:          (null) index:0x0 compound_mapcount: 0
[  236.825813] flags: 0x4000000000008100(slab|head)
[  236.830981] raw: 4000000000008100 0000000000000000 0000000000000000 0000000100130013
[  236.830988] raw: ffffea0000cfbaa0 ffffea00010ddf20 ffff88013b80f640 0000000000000000
[  236.830990] page dumped because: kasan: bad access detected

[  236.830993] Memory state around the buggy address:
[  236.830999]  ffff880078932880: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[  236.831004]  ffff880078932900: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[  236.831008] >ffff880078932980: fc fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb
[  236.831011]                                               ^
[  236.831015]  ffff880078932a00: fb fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[  236.831019]  ffff880078932a80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[  236.831021] ==================================================================
[  236.831024] Disabling lock debugging due to kernel taint

Comments

Alan Stern July 17, 2018, 2:28 p.m. UTC | #1
On Tue, 17 Jul 2018, Sudip Mukherjee wrote:

> I did some more debugging. Tested with a KASAN enabled kernel and that
> shows the problem. The report is attached.
> 
> To my understanding:
> 
> btusb_work() is calling usb_set_interface() with alternate = 0. which
> again calls usb_hcd_alloc_bandwidth() and that frees the rings by
> xhci_free_endpoint_ring().

That doesn't sound like the right thing to do.  The rings shouldn't be 
freed until xhci_endpoint_disable() is called.  

On the other hand, there doesn't appear to be any 
xhci_endpoint_disable() routine, although a comment refers to it.  
Maybe this is the real problem?

Alan Stern

> But then usb_set_interface() continues and
> calls usb_disable_interface() -> usb_hcd_flush_endpoint()->unlink1()->
> xhci_urb_dequeue() which at the end gives the command to stop endpoint.
> 
> In all the cycles I have tested I see that only in the fail case
> handle_cmd_completion() gets called, but in the cycles where the error
> is not there handle_cmd_completion() is not called with that command.
> 
> I am not sure what is happening, and you are the best person to understand
> what is happening. :)
> 
> But for now (untill you are back from holiday and suggest a proper solution),
> I made a hacky patch (attached) which is working and I donot get any
> corruption after that. Both KASAN and slub debug are also happy.
> 
> So, now waiting for you to analyze what is going on and suggest a proper
> fix.
> 
> Thanks in advance.
> 
> --
> Regards
> Sudip
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Sudip Mukherjee July 17, 2018, 2:40 p.m. UTC | #2
Hi Alan,

On Tue, Jul 17, 2018 at 10:28:14AM -0400, Alan Stern wrote:
> On Tue, 17 Jul 2018, Sudip Mukherjee wrote:
> 
> > I did some more debugging. Tested with a KASAN enabled kernel and that
> > shows the problem. The report is attached.
> > 
> > To my understanding:
> > 
> > btusb_work() is calling usb_set_interface() with alternate = 0. which
> > again calls usb_hcd_alloc_bandwidth() and that frees the rings by
> > xhci_free_endpoint_ring().
> 
> That doesn't sound like the right thing to do.  The rings shouldn't be 
> freed until xhci_endpoint_disable() is called.  
> 
> On the other hand, there doesn't appear to be any 
> xhci_endpoint_disable() routine, although a comment refers to it.  
> Maybe this is the real problem?

one of your old mail might help :)

https://www.spinics.net/lists/linux-usb/msg98123.html

--
Regards
Sudip
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Sudip Mukherjee July 17, 2018, 2:49 p.m. UTC | #3
On Tue, Jul 17, 2018 at 03:40:22PM +0100, Sudip Mukherjee wrote:
> Hi Alan,
> 
> On Tue, Jul 17, 2018 at 10:28:14AM -0400, Alan Stern wrote:
> > On Tue, 17 Jul 2018, Sudip Mukherjee wrote:
> > 
> > > I did some more debugging. Tested with a KASAN enabled kernel and that
> > > shows the problem. The report is attached.
> > > 
> > > To my understanding:
> > > 
> > > btusb_work() is calling usb_set_interface() with alternate = 0. which
> > > again calls usb_hcd_alloc_bandwidth() and that frees the rings by
> > > xhci_free_endpoint_ring().
> > 
> > That doesn't sound like the right thing to do.  The rings shouldn't be 
> > freed until xhci_endpoint_disable() is called.  
> > 
> > On the other hand, there doesn't appear to be any 
> > xhci_endpoint_disable() routine, although a comment refers to it.  
> > Maybe this is the real problem?
> 
> one of your old mail might help :)
> 
> https://www.spinics.net/lists/linux-usb/msg98123.html

Wrote too soon.

Is it the one you are looking for -
usb_disable_endpoint() is in drivers/usb/core/message.c

--
Regards
Sudip
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alan Stern July 17, 2018, 3:08 p.m. UTC | #4
On Tue, 17 Jul 2018, Sudip Mukherjee wrote:

> On Tue, Jul 17, 2018 at 03:40:22PM +0100, Sudip Mukherjee wrote:
> > Hi Alan,
> > 
> > On Tue, Jul 17, 2018 at 10:28:14AM -0400, Alan Stern wrote:
> > > On Tue, 17 Jul 2018, Sudip Mukherjee wrote:
> > > 
> > > > I did some more debugging. Tested with a KASAN enabled kernel and that
> > > > shows the problem. The report is attached.
> > > > 
> > > > To my understanding:
> > > > 
> > > > btusb_work() is calling usb_set_interface() with alternate = 0. which
> > > > again calls usb_hcd_alloc_bandwidth() and that frees the rings by
> > > > xhci_free_endpoint_ring().
> > > 
> > > That doesn't sound like the right thing to do.  The rings shouldn't be 
> > > freed until xhci_endpoint_disable() is called.  
> > > 
> > > On the other hand, there doesn't appear to be any 
> > > xhci_endpoint_disable() routine, although a comment refers to it.  
> > > Maybe this is the real problem?
> > 
> > one of your old mail might help :)
> > 
> > https://www.spinics.net/lists/linux-usb/msg98123.html

That message seems to say the same thing as what I just wrote, more or 
less.

> Wrote too soon.
> 
> Is it the one you are looking for -
> usb_disable_endpoint() is in drivers/usb/core/message.c

No, I'm talking about xhci_endpoint_disable(), which would be called by 
usb_hcd_disable_endpoint() if it existed.  Of course, 
usb_hcd_disable_endpoint() is called by usb_disable_endpoint().

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Sudip Mukherjee July 17, 2018, 3:10 p.m. UTC | #5
Hi Alan, Greg,

On Tue, Jul 17, 2018 at 03:49:18PM +0100, Sudip Mukherjee wrote:
> On Tue, Jul 17, 2018 at 03:40:22PM +0100, Sudip Mukherjee wrote:
> > Hi Alan,
> > 
> > On Tue, Jul 17, 2018 at 10:28:14AM -0400, Alan Stern wrote:
> > > On Tue, 17 Jul 2018, Sudip Mukherjee wrote:
> > > 
> > > > I did some more debugging. Tested with a KASAN enabled kernel and that
> > > > shows the problem. The report is attached.
> > > > 
> > > > To my understanding:
> > > > 
> > > > btusb_work() is calling usb_set_interface() with alternate = 0. which
> > > > again calls usb_hcd_alloc_bandwidth() and that frees the rings by
> > > > xhci_free_endpoint_ring().
> > > 
> > > That doesn't sound like the right thing to do.  The rings shouldn't be 
> > > freed until xhci_endpoint_disable() is called.  
> > > 
> > > On the other hand, there doesn't appear to be any 
> > > xhci_endpoint_disable() routine, although a comment refers to it.  
> > > Maybe this is the real problem?
> > 
> > one of your old mail might help :)
> > 
> > https://www.spinics.net/lists/linux-usb/msg98123.html
> 
> Wrote too soon.
> 
> Is it the one you are looking for -
> usb_disable_endpoint() is in drivers/usb/core/message.c

I think now I understand what the problem is.
usb_set_interface() calls usb_disable_interface() which again calls
usb_disable_endpoint(). This usb_disable_endpoint() gets the pointer
to 'ep', marks it as NULL and sends the pointer to usb_hcd_flush_endpoint().
After flushing the endpoints usb_disable_endpoint() calls
usb_hcd_disable_endpoint() which tries to do:
	if (hcd->driver->endpoint_disable)
		hcd->driver->endpoint_disable(hcd, ep);
but there is no endpoint_disable() callback in xhci, so the endpoint is
never marked as disabled. So, next time usb_hcd_flush_endpoint() is
called I get this corruption. 
And this is exactly where I used to see the problem happening.

And, my hacky patch worked as I prevented it from calling
usb_disable_interface() in this particular case.

Greg - answering your question here. My hacky patch was based on the
fact that usb_hcd_alloc_bandwidth() is calling hcd->driver->drop_endpoint()
and hcd->driver->add_endpoint() if (cur_alt && new_alt). So, I prevented
usb_disable_interface() to be called for that same condition. And that
worked as the call to usb_hcd_flush_endpoint() was not executed.
I know it is not correct and I might be having memory leaks for this, but
I have the system working till we get the actual fix.

--
Regards
Sudip

--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Mathias Nyman July 19, 2018, 10:59 a.m. UTC | #6
On 17.07.2018 18:10, Sudip Mukherjee wrote:
> Hi Alan, Greg,
> 
> On Tue, Jul 17, 2018 at 03:49:18PM +0100, Sudip Mukherjee wrote:
>> On Tue, Jul 17, 2018 at 03:40:22PM +0100, Sudip Mukherjee wrote:
>>> Hi Alan,
>>>
>>> On Tue, Jul 17, 2018 at 10:28:14AM -0400, Alan Stern wrote:
>>>> On Tue, 17 Jul 2018, Sudip Mukherjee wrote:
>>>>
>>>>> I did some more debugging. Tested with a KASAN enabled kernel and that
>>>>> shows the problem. The report is attached.
>>>>>
>>>>> To my understanding:
>>>>>
>>>>> btusb_work() is calling usb_set_interface() with alternate = 0. which
>>>>> again calls usb_hcd_alloc_bandwidth() and that frees the rings by
>>>>> xhci_free_endpoint_ring().
>>>>
>>>> That doesn't sound like the right thing to do.  The rings shouldn't be
>>>> freed until xhci_endpoint_disable() is called.
>>>>
>>>> On the other hand, there doesn't appear to be any
>>>> xhci_endpoint_disable() routine, although a comment refers to it.
>>>> Maybe this is the real problem?
>>>
>>> one of your old mail might help :)
>>>
>>> https://www.spinics.net/lists/linux-usb/msg98123.html
>>
>> Wrote too soon.
>>
>> Is it the one you are looking for -
>> usb_disable_endpoint() is in drivers/usb/core/message.c
> 
> I think now I understand what the problem is.
> usb_set_interface() calls usb_disable_interface() which again calls
> usb_disable_endpoint(). This usb_disable_endpoint() gets the pointer
> to 'ep', marks it as NULL and sends the pointer to usb_hcd_flush_endpoint().
> After flushing the endpoints usb_disable_endpoint() calls
> usb_hcd_disable_endpoint() which tries to do:
> 	if (hcd->driver->endpoint_disable)
> 		hcd->driver->endpoint_disable(hcd, ep);
> but there is no endpoint_disable() callback in xhci, so the endpoint is
> never marked as disabled. So, next time usb_hcd_flush_endpoint() is
> called I get this corruption.
> And this is exactly where I used to see the problem happening.
> 
> And, my hacky patch worked as I prevented it from calling
> usb_disable_interface() in this particular case.
> 

Back for a few days, looking at this

xhci driver will set up all the endpoints for the new altsetting already in
usb_hcd_alloc_bandwidth().

New endpoints will be ready and rings running after this. I don't know the exact
history behind this, but I assume it is because xhci does all of the steps to
drop/add, disable/enable endpoints and check bandwidth in a single configure
endpoint command, that will return errors if there is not enough bandwidth.
This command is issued in hcd->driver->check_bandwidth()
This means that xhci doesn't really do much in hcd->driver->endpoint_disable or
hcd->driver->endpoint_enable

It also means that xhci driver assumes rings are empty when
hcd->driver->check_bandwidth is called. It will bluntly free dropped rings.
If there are URBs left on a endpoint ring that was dropped+added
(freed+reallocated) then those URBs will contain pointers to freed ring,
causing issues when usb_hcd_flush_endpoint() cancels those URBs.

usb_set_interface()
   usb_hcd_alloc_bandwidth()
     hcd->driver->drop_endpoint()
     hcd->driver->add_endpoint() // allocates new rings
     hcd->driver->check_bandwidth() // issues configure endpoint command, free rings.
   usb_disable_interface(iface, true)
     usb_disable_endpoint()
       usb_hcd_flush_endpoint() // will access freed ring if URBs found!!
       usb_hcd_disable_endpoint()
         hcd->driver->endpoint_disable()  // xhci does nothing
   usb_enable_interface(iface, true)
     usb_enable_endpoint(ep_addrss, true) // not really doing much on xhci side.

As first aid I could try to implement checks that make sure the flushed URBs
trb pointers really are on the current endpoint ring, and also add some warning
if we are we are dropping endpoints with URBs still queued.

But we need to fix this properly as well.
xhci needs to be more in sync with usb core in usb_set_interface(), currently xhci
has the altssetting up and running when usb core hasn't event started flushing endpoints.

-Mathias
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Sudip Mukherjee July 19, 2018, 11:34 a.m. UTC | #7
Hi Mathias,

On Thu, Jul 19, 2018 at 01:59:01PM +0300, Mathias Nyman wrote:
> On 17.07.2018 18:10, Sudip Mukherjee wrote:
> > Hi Alan, Greg,
> > 
> > On Tue, Jul 17, 2018 at 03:49:18PM +0100, Sudip Mukherjee wrote:
> > > On Tue, Jul 17, 2018 at 03:40:22PM +0100, Sudip Mukherjee wrote:
> > > > Hi Alan,
> > > > 
> > > > On Tue, Jul 17, 2018 at 10:28:14AM -0400, Alan Stern wrote:
> > > > > On Tue, 17 Jul 2018, Sudip Mukherjee wrote:
> > > > > 
> > > > > > I did some more debugging. Tested with a KASAN enabled kernel and that
> > > > > > shows the problem. The report is attached.
> > > > > > 
<snip>
> > 
> > And, my hacky patch worked as I prevented it from calling
> > usb_disable_interface() in this particular case.
> > 
> 
> Back for a few days, looking at this

I hope you had a good holiday. :)

> 
> xhci driver will set up all the endpoints for the new altsetting already in
> usb_hcd_alloc_bandwidth().
> 
<snip>
> 
> As first aid I could try to implement checks that make sure the flushed URBs
> trb pointers really are on the current endpoint ring, and also add some warning
> if we are we are dropping endpoints with URBs still queued.

Yes, please. I think your first-aid will be a much better option than
the hacky patch I am using atm.

> 
> But we need to fix this properly as well.
> xhci needs to be more in sync with usb core in usb_set_interface(), currently xhci
> has the altssetting up and running when usb core hasn't event started flushing endpoints.

I am able to reproduce this on almost all cycles, so I can always test
the fix for you after you are fully back from your holiday.

--
Regards
Sudip
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alan Stern July 19, 2018, 2:57 p.m. UTC | #8
On Thu, 19 Jul 2018, Mathias Nyman wrote:

> xhci driver will set up all the endpoints for the new altsetting already in
> usb_hcd_alloc_bandwidth().
> 
> New endpoints will be ready and rings running after this. I don't know the exact
> history behind this, but I assume it is because xhci does all of the steps to
> drop/add, disable/enable endpoints and check bandwidth in a single configure
> endpoint command, that will return errors if there is not enough bandwidth.

That's right; Sarah and I spent some time going over this while she was 
working on it.  But it looks like the approach isn't adequate.

> This command is issued in hcd->driver->check_bandwidth()
> This means that xhci doesn't really do much in hcd->driver->endpoint_disable or
> hcd->driver->endpoint_enable
> 
> It also means that xhci driver assumes rings are empty when
> hcd->driver->check_bandwidth is called. It will bluntly free dropped rings.
> If there are URBs left on a endpoint ring that was dropped+added
> (freed+reallocated) then those URBs will contain pointers to freed ring,
> causing issues when usb_hcd_flush_endpoint() cancels those URBs.
> 
> usb_set_interface()
>    usb_hcd_alloc_bandwidth()
>      hcd->driver->drop_endpoint()
>      hcd->driver->add_endpoint() // allocates new rings
>      hcd->driver->check_bandwidth() // issues configure endpoint command, free rings.
>    usb_disable_interface(iface, true)
>      usb_disable_endpoint()
>        usb_hcd_flush_endpoint() // will access freed ring if URBs found!!
>        usb_hcd_disable_endpoint()
>          hcd->driver->endpoint_disable()  // xhci does nothing
>    usb_enable_interface(iface, true)
>      usb_enable_endpoint(ep_addrss, true) // not really doing much on xhci side.
> 
> As first aid I could try to implement checks that make sure the flushed URBs
> trb pointers really are on the current endpoint ring, and also add some warning
> if we are we are dropping endpoints with URBs still queued.
> 
> But we need to fix this properly as well.
> xhci needs to be more in sync with usb core in usb_set_interface(), currently xhci
> has the altssetting up and running when usb core hasn't event started flushing endpoints.

Absolutely.  The core tries to be compatible with host controller
drivers that either allocate bandwidth as it is requested or else
allocate bandwidth all at once when an altsetting is installed.  

xhci-hcd falls into the second category.  However, this approach
requires the bandwidth verification for the new altsetting to be
performed before the old altsetting has been disabled, and the xHCI
hardware can't do this.

We may need to change the core so that the old endpoints are disabled 
before the bandwidth check is done, instead of after.  Of course, this 
leads to an awkward situation if the check fails -- we'd probably have 
to go back and re-install the old altsetting.

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Mathias Nyman July 20, 2018, 11:46 a.m. UTC | #9
On 19.07.2018 17:57, Alan Stern wrote:
> On Thu, 19 Jul 2018, Mathias Nyman wrote:
> 
>> xhci driver will set up all the endpoints for the new altsetting already in
>> usb_hcd_alloc_bandwidth().
>>
>> New endpoints will be ready and rings running after this. I don't know the exact
>> history behind this, but I assume it is because xhci does all of the steps to
>> drop/add, disable/enable endpoints and check bandwidth in a single configure
>> endpoint command, that will return errors if there is not enough bandwidth.
> 
> That's right; Sarah and I spent some time going over this while she was
> working on it.  But it looks like the approach isn't adequate.
> 
>> This command is issued in hcd->driver->check_bandwidth()
>> This means that xhci doesn't really do much in hcd->driver->endpoint_disable or
>> hcd->driver->endpoint_enable
>>
>> It also means that xhci driver assumes rings are empty when
>> hcd->driver->check_bandwidth is called. It will bluntly free dropped rings.
>> If there are URBs left on a endpoint ring that was dropped+added
>> (freed+reallocated) then those URBs will contain pointers to freed ring,
>> causing issues when usb_hcd_flush_endpoint() cancels those URBs.
>>
>> usb_set_interface()
>>     usb_hcd_alloc_bandwidth()
>>       hcd->driver->drop_endpoint()
>>       hcd->driver->add_endpoint() // allocates new rings
>>       hcd->driver->check_bandwidth() // issues configure endpoint command, free rings.
>>     usb_disable_interface(iface, true)
>>       usb_disable_endpoint()
>>         usb_hcd_flush_endpoint() // will access freed ring if URBs found!!
>>         usb_hcd_disable_endpoint()
>>           hcd->driver->endpoint_disable()  // xhci does nothing
>>     usb_enable_interface(iface, true)
>>       usb_enable_endpoint(ep_addrss, true) // not really doing much on xhci side.
>>
>> As first aid I could try to implement checks that make sure the flushed URBs
>> trb pointers really are on the current endpoint ring, and also add some warning
>> if we are we are dropping endpoints with URBs still queued.
>>
>> But we need to fix this properly as well.
>> xhci needs to be more in sync with usb core in usb_set_interface(), currently xhci
>> has the altssetting up and running when usb core hasn't event started flushing endpoints.
> 
> Absolutely.  The core tries to be compatible with host controller
> drivers that either allocate bandwidth as it is requested or else
> allocate bandwidth all at once when an altsetting is installed.
> 
> xhci-hcd falls into the second category.  However, this approach
> requires the bandwidth verification for the new altsetting to be
> performed before the old altsetting has been disabled, and the xHCI
> hardware can't do this.
> 
> We may need to change the core so that the old endpoints are disabled
> before the bandwidth check is done, instead of after.  Of course, this
> leads to an awkward situation if the check fails -- we'd probably have
> to go back and re-install the old altsetting.

That would help xhci a lot.

If we want to avoid the awkward altsetting re-install after bandwidth failure
then adding a extra endpoint flush before checking the bandwidth would already help a lot.

The endpoint disabling can then be remain after bandwidth checking.
Does that work for other host controllers?

-Mathias
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alan Stern July 20, 2018, 2:09 p.m. UTC | #10
On Fri, 20 Jul 2018, Mathias Nyman wrote:

> >> But we need to fix this properly as well.
> >> xhci needs to be more in sync with usb core in usb_set_interface(), currently xhci
> >> has the altssetting up and running when usb core hasn't event started flushing endpoints.
> > 
> > Absolutely.  The core tries to be compatible with host controller
> > drivers that either allocate bandwidth as it is requested or else
> > allocate bandwidth all at once when an altsetting is installed.
> > 
> > xhci-hcd falls into the second category.  However, this approach
> > requires the bandwidth verification for the new altsetting to be
> > performed before the old altsetting has been disabled, and the xHCI
> > hardware can't do this.
> > 
> > We may need to change the core so that the old endpoints are disabled
> > before the bandwidth check is done, instead of after.  Of course, this
> > leads to an awkward situation if the check fails -- we'd probably have
> > to go back and re-install the old altsetting.
> 
> That would help xhci a lot.
> 
> If we want to avoid the awkward altsetting re-install after bandwidth failure
> then adding a extra endpoint flush before checking the bandwidth would already help a lot.
> 
> The endpoint disabling can then be remain after bandwidth checking.
> Does that work for other host controllers?

As far as I know, the other host controller drivers don't really care 
how this is done.  xHCI is the only technology where the hardware has 
to verify the bandwidth requirements.  (Maybe some other SuperSpeed 
controller design also cares, but if so then this change is unlikely to 
hurt.)

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

From cbbe6dc59ac90a4f2c358de56e58e254320171e0 Mon Sep 17 00:00:00 2001
From: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Date: Tue, 10 Jul 2018 09:50:00 +0100
Subject: [PATCH] hacky solution to mem-corruption

Signed-off-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
---
 drivers/usb/core/message.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/usb/core/message.c b/drivers/usb/core/message.c
index 7cd4ec33dbf4..7fdf7a27611d 100644
--- a/drivers/usb/core/message.c
+++ b/drivers/usb/core/message.c
@@ -1398,7 +1398,8 @@  int usb_set_interface(struct usb_device *dev, int interface, int alternate)
 		remove_intf_ep_devs(iface);
 		usb_remove_sysfs_intf_files(iface);
 	}
-	usb_disable_interface(dev, iface, true);
+	if (!(iface->cur_altsetting && alt))
+		usb_disable_interface(dev, iface, true);
 
 	iface->cur_altsetting = alt;
 
-- 
2.11.0