Message ID | 1567102335-5231-1-git-send-email-William.Kuzeja@stratus.com (mailing list archive) |
---|---|
State | Superseded |
Series | [RESEND] xhci: Prevent deadlock when xhci adapter breaks during init |
On Thu, Aug 29, 2019 at 02:12:15PM -0400, Bill Kuzeja wrote:
> The system can hit a deadlock if xhci adapter breaks while initializing.
> The deadlock is between two threads: thread 1 is tearing down the
> adapter and is stuck in usb_unlocked_disable_lpm waiting to lock the
> hcd->bandwidth_mutex. Thread 2 is holding this mutex (while still trying
> to add a usb device), but is stuck in xhci_endpoint_reset waiting for a
> stop or config command to complete. A reboot is required to resolve.
>
> It turns out when calling xhci_queue_stop_endpoint and
> xhci_queue_configure_endpoint in xhci_endpoint_reset, the return code is
> not checked for errors. If the timing is right and the adapter dies just
> before either of these commands get issued, we hang indefinitely waiting
> for a completion on a command that didn't get issued.
>
> This wasn't a problem before the following fix because we didn't send
> commands in xhci_endpoint_reset:
>
> commit f5249461b504 ("xhci: Clear the host side toggle manually when endpoint is soft reset")
>
> With the patch I am submitting, a duration test which breaks adapters
> during initialization (and which deadlocks with the standard kernel) runs
> without issue.
>
> Fixes: f5249461b504 ("xhci: Clear the host side toggle manually when endpoint is soft reset")
> Signed-off-by: Bill Kuzeja <william.kuzeja@stratus.com>
> ---
>  drivers/usb/host/xhci.c | 22 +++++++++++++++++++---

$ ./scripts/get_maintainer.pl --file drivers/usb/host/xhci.c
Mathias Nyman <mathias.nyman@intel.com> (supporter:USB XHCI DRIVER)
Greg Kroah-Hartman <gregkh@linuxfoundation.org> (supporter:USB SUBSYSTEM)
linux-usb@vger.kernel.org (open list:USB XHCI DRIVER)
linux-kernel@vger.kernel.org (open list)

I think you forgot to send this to the xhci driver maintainer for review :(
diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c
index 248cd7a..835708d 100644
--- a/drivers/usb/host/xhci.c
+++ b/drivers/usb/host/xhci.c
@@ -3132,7 +3132,16 @@ static void xhci_endpoint_reset(struct usb_hcd *hcd,
 		xhci_free_command(xhci, cfg_cmd);
 		goto cleanup;
 	}
-	xhci_queue_stop_endpoint(xhci, stop_cmd, udev->slot_id, ep_index, 0);
+
+	if (xhci_queue_stop_endpoint(xhci, stop_cmd, udev->slot_id,
+					ep_index, 0) < 0) {
+		spin_unlock_irqrestore(&xhci->lock, flags);
+		xhci_free_command(xhci, cfg_cmd);
+		xhci_warn(xhci, "%s: stop_cmd xhci_queue_stop_endpoint "
+				"returns error, exiting\n", __func__);
+		goto cleanup;
+	}
+
 	xhci_ring_cmd_db(xhci);
 	spin_unlock_irqrestore(&xhci->lock, flags);
 
@@ -3146,8 +3155,15 @@ static void xhci_endpoint_reset(struct usb_hcd *hcd,
 			   ctrl_ctx, ep_flag, ep_flag);
 	xhci_endpoint_copy(xhci, cfg_cmd->in_ctx, vdev->out_ctx, ep_index);
 
-	xhci_queue_configure_endpoint(xhci, cfg_cmd, cfg_cmd->in_ctx->dma,
-				      udev->slot_id, false);
+	if (xhci_queue_configure_endpoint(xhci, cfg_cmd, cfg_cmd->in_ctx->dma,
+				      udev->slot_id, false) < 0) {
+		spin_unlock_irqrestore(&xhci->lock, flags);
+		xhci_free_command(xhci, cfg_cmd);
+		xhci_warn(xhci, "%s: cfg_cmd xhci_queue_configure_endpoint "
+				"returns error, exiting\n", __func__);
+		goto cleanup;
+	}
+
 	xhci_ring_cmd_db(xhci);
 	spin_unlock_irqrestore(&xhci->lock, flags);
The system can hit a deadlock if xhci adapter breaks while initializing.
The deadlock is between two threads: thread 1 is tearing down the
adapter and is stuck in usb_unlocked_disable_lpm waiting to lock the
hcd->bandwidth_mutex. Thread 2 is holding this mutex (while still trying
to add a usb device), but is stuck in xhci_endpoint_reset waiting for a
stop or config command to complete. A reboot is required to resolve.

It turns out when calling xhci_queue_stop_endpoint and
xhci_queue_configure_endpoint in xhci_endpoint_reset, the return code is
not checked for errors. If the timing is right and the adapter dies just
before either of these commands get issued, we hang indefinitely waiting
for a completion on a command that didn't get issued.

This wasn't a problem before the following fix because we didn't send
commands in xhci_endpoint_reset:

commit f5249461b504 ("xhci: Clear the host side toggle manually when endpoint is soft reset")

With the patch I am submitting, a duration test which breaks adapters
during initialization (and which deadlocks with the standard kernel) runs
without issue.

Fixes: f5249461b504 ("xhci: Clear the host side toggle manually when endpoint is soft reset")
Signed-off-by: Bill Kuzeja <william.kuzeja@stratus.com>
---
 drivers/usb/host/xhci.c | 22 +++++++++++++++++++---
 1 file changed, 19 insertions(+), 3 deletions(-)
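
The failure mode described in the commit message (waiting on a completion for a command that was never queued) can be illustrated with a small userspace sketch. This is not the kernel code: `struct fake_hc`, `struct fake_cmd`, `queue_cmd`, `reset_unchecked_would_hang` and `reset_checked` are hypothetical names invented for illustration, and `-ENODEV` is an arbitrary negative error chosen for the sketch rather than the value the driver returns. Instead of actually blocking, the model reports whether a wait could ever finish.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for host controller state; not a kernel type. */
struct fake_hc {
	bool dead;	/* set once the adapter has died */
	int queued;	/* commands actually placed on the command ring */
};

/* Hypothetical stand-in for a command that carries a completion. */
struct fake_cmd {
	bool queued;	/* true only if queue_cmd() succeeded */
};

/* Models a queue function that refuses to queue on a dead adapter. */
static int queue_cmd(struct fake_hc *hc, struct fake_cmd *cmd)
{
	if (hc->dead)
		return -ENODEV;	/* assumed error value, for illustration only */
	cmd->queued = true;
	hc->queued++;
	return 0;
}

/* A wait can only finish if the command really reached the ring. */
static bool wait_would_finish(const struct fake_cmd *cmd)
{
	return cmd->queued;
}

/*
 * Pre-patch pattern: the return code is ignored, so the caller goes on
 * to wait even when nothing was queued. Returns true if it would hang.
 */
static bool reset_unchecked_would_hang(struct fake_hc *hc)
{
	struct fake_cmd cmd = { .queued = false };

	(void)queue_cmd(hc, &cmd);
	return !wait_would_finish(&cmd);
}

/*
 * Post-patch pattern: a negative return code causes an early exit
 * before any wait is attempted.
 */
static int reset_checked(struct fake_hc *hc)
{
	struct fake_cmd cmd = { .queued = false };
	int ret = queue_cmd(hc, &cmd);

	if (ret < 0)
		return ret;	/* bail out instead of waiting forever */
	return wait_would_finish(&cmd) ? 0 : -EIO;
}
```

In this model the unchecked variant reports a hang only when the adapter is already dead at queue time, which matches the timing window the commit message describes; the checked variant returns the error to its caller in that case.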