diff mbox series

[1/1] usb: xhci: Fix NULL pointer dereference on certain command aborts

Message ID 20241203205249.513f1153@foxbook (mailing list archive)
State New
Headers show
Series usb: xhci: Fix NULL pointer dereference on certain command aborts | expand

Commit Message

Michal Pecio Dec. 3, 2024, 7:52 p.m. UTC
xhci_handle_stopped_cmd_ring() attempts to restart the command ring if
there are pending TRBs on it, which it detects by comparing the enqueue
and dequeue pointers for equality. It assumes that pending TRBs imply a
pending command, and blindly dereferences cur_cmd. This can be wrong.

If a command is queued to the final usable TRB of a ring segment, the
enqueue pointer is advanced to the subsequent link TRB and no further.
If the command is later aborted, when the abort completion is handled
the dequeue pointer is advanced to the first TRB of the next segment.

If no further commands are queued, the pointers stay this way and then
xhci_handle_stopped_cmd_ring() is called by xhci_abort_cmd_ring() with
NULL cur_cmd, which triggers cur_cmd dereference as described above.

Fix this by omitting timer setup if cur_cmd is NULL. Leave the rest
unchanged, including ringing the doorbell each time the ring pointers
aren't equal. An unnecessary doorbell ring should be harmless.

This is probably Bug 219532, but no confirmation has been received.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=219532
Fixes: c311e391a7ef ("xhci: rework command timeout and cancellation,")
CC: stable@vger.kernel.org
Signed-off-by: Michal Pecio <michal.pecio@gmail.com>
---
 drivers/usb/host/xhci-ring.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Comments

Michal Pecio Dec. 4, 2024, 5:53 p.m. UTC | #1
I confirmed that the bug is real and behaves exactly as expected, using
a USB microcontroller programmed to NAK the status stage of SET_ADDRESS
requests forever and to reconnect if the host gives up enumerating it.

Command timeout was reduced to 500ms to sooner reach the segment's end
and some relevant debug info was added, hopefully self-explanatory:

[  +0,378926] usb 10-1: new full-speed USB device number 109 using xhci_hcd
[  +0,501006] xhci_hcd 0000:03:00.0: cur_cmd 0000000000000000 enq ffff88814671bff0 deq ffff88814671b000
[  +0,000001] xhci_hcd 0000:03:00.0: Timeout while waiting for setup device command
[  +0,000005] xhci_hcd 0000:03:00.0: !!! avoiding dereferencing a NULL pointer !!!
[  +0,712001] xhci_hcd 0000:03:00.0: cur_cmd 0000000000000000 enq ffff88814671b010 deq ffff88814671b010
[  +0,000001] xhci_hcd 0000:03:00.0: Timeout while waiting for setup device command
[  +0,207981] usb 10-1: device not accepting address 109, error -62

The driver and host controller continue working normally after one hour
of testing and several avoided crashes.

The only thing I haven't tried is actually crashing the kernel, but
considering what's inside xhci_mod_cmd_timer() I think it's obvious
that this is exactly what would happen next without this patch.

Regards,
Michal
diff mbox series

Patch

diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index 4cf5363875c7..da26e317ab0c 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -422,7 +422,8 @@  static void xhci_handle_stopped_cmd_ring(struct xhci_hcd *xhci,
 	if ((xhci->cmd_ring->dequeue != xhci->cmd_ring->enqueue) &&
 	    !(xhci->xhc_state & XHCI_STATE_DYING)) {
 		xhci->current_cmd = cur_cmd;
-		xhci_mod_cmd_timer(xhci);
+		if (cur_cmd)
+			xhci_mod_cmd_timer(xhci);
 		xhci_ring_cmd_db(xhci);
 	}
 }