From patchwork Wed Dec 11 22:51:39 2013 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dan Williams X-Patchwork-Id: 3328861 Return-Path: X-Original-To: patchwork-linux-samsung-soc@patchwork.kernel.org Delivered-To: patchwork-parsemail@patchwork1.web.kernel.org Received: from mail.kernel.org (mail.kernel.org [198.145.19.201]) by patchwork1.web.kernel.org (Postfix) with ESMTP id CE1679F1F0 for ; Wed, 11 Dec 2013 22:51:45 +0000 (UTC) Received: from mail.kernel.org (localhost [127.0.0.1]) by mail.kernel.org (Postfix) with ESMTP id D75B920804 for ; Wed, 11 Dec 2013 22:51:44 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id D089A20803 for ; Wed, 11 Dec 2013 22:51:43 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751510Ab3LKWvn (ORCPT ); Wed, 11 Dec 2013 17:51:43 -0500 Received: from mail-qe0-f52.google.com ([209.85.128.52]:44043 "EHLO mail-qe0-f52.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751000Ab3LKWvl (ORCPT ); Wed, 11 Dec 2013 17:51:41 -0500 Received: by mail-qe0-f52.google.com with SMTP id ne12so5878221qeb.39 for ; Wed, 11 Dec 2013 14:51:40 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=4Jsq9u1/MnNJC1DAAPqLMD/c7hGJsdz8aYrGyVRQjLQ=; b=Pxed5ShbGJUHnnPyMrqJmoSCAKWGCrWE80+uKmJ2/q0MVsRg6I+ILp1wUKQ66nj4A+ azO3Ts/z9R/vYQ9xAPVWd+tVY0tU+5gAVsVdInISYTzL1TfKxpFFCIItt7PaoeVeeoGB 8KFbvOpL65vE4nCQxfAKDtTpfqkAjI1hc3cUOjAy7sG1PepAkdUUw7qzsqJ2Tm96xaET 9lZiEkKB3P76G48fUyrTWUMxt80Z7n60R4P5UaflqJHt/AIKeyh1PTUKyp0x3u2q/Rbl uypRcp9sN/gEluVuI3F7y9BvkU+owvvqhx5lyT7fEz4mju59FyDbRE2dryCeOymLDO3Z +UuQ== MIME-Version: 1.0 X-Received: by 10.224.61.198 with SMTP id u6mr33938qah.41.1386802299498; Wed, 11 Dec 2013 14:51:39 -0800 (PST) Received: by 10.140.88.112 with HTTP; Wed, 11 Dec 2013 14:51:39 -0800 (PST) In-Reply-To: <20131211193647.GA25483@xanatos> References: <20131211193647.GA25483@xanatos> Date: Wed, 11 Dec 2013 14:51:39 -0800 Message-ID: Subject: Re: [PATCH] USB: core: Add warm reset while reset-resuming SuperSpeed HUBs From: Dan Williams To: Sarah Sharp Cc: Julius Werner , Alan Stern , Vikas Sajjan , Vikas Sajjan , linux-samsung-soc , Kukjin Kim , LKML , "linux-usb@vger.kernel.org" , Vincent Palatin , Lan Tianyu , Ksenia Ragiadakou , Greg Kroah-Hartman , Vivek Gautam , Douglas Anderson , Felipe Balbi , sunil joshi Sender: linux-samsung-soc-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-samsung-soc@vger.kernel.org X-Spam-Status: No, score=-6.9 required=5.0 tests=BAYES_00, DKIM_ADSP_CUSTOM_MED, DKIM_SIGNED, FREEMAIL_FROM, RCVD_IN_DNSWL_HI, RP_MATCHES_RCVD, T_DKIM_INVALID, T_TVD_MIME_EPI,UNPARSEABLE_RELAY autolearn=ham version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on mail.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP On Wed, Dec 11, 2013 at 11:36 AM, Sarah Sharp wrote: > On Wed, Dec 11, 2013 at 11:00:13AM -0800, Julius Werner wrote: >> > I don't know what you mean by "fails". The system goes to sleep and >> > then later on wakes up, doesn't it? >> > >> > Do you mean that the Jetflash device gets disconnected when the system >> > wakes up? That's _supposed_ to happen under those circumstances. >> > When hub_activate() sees HUB_RESET_RESUME, all child devices get >> > disconnected except those where udev->persist_enabled is set. >> >> This patch was written in response to the same bug as my "usb: hub: >> Use correct reset for wedged USB3 devices that are NOTATTACHED" >> submission. My patch only helps when the port gets stuck in Compliance >> Mode, but Vikas reports that he can sometimes see it stuck in Polling >> or Recovery states as well. >> >> The underlying issue is a deadlock in the USB 3.0 link training state >> machine when the host controller is unilaterally reset on resume >> (without driving a reset on the bus). > > The xHCI spec requires that when the xHCI host is reset, a USB reset is > driven down the USB 3.0 ports. If hot reset fails, the port may migrate > to warm reset. See table 32 in the xHCI spec, in the definition of > HCRST. It sounds like this host doesn't drive a USB reset down USB 3.0 > ports at all on host controller reset? ...although, the spec says that it does not wait for the port resets to complete. As far as I can see re-issuing a warm reset and waiting is the only way to guarantee the core times the recovery. Presumably the portstatus debounce in hub_activate() mitigates this, but that 100ms is less than a full reset timeout. I have something similar in the port power rework patches [1], but I think something like the following (untested) is more generic, it arranges for reset_resume to start with a warm reset if necessary. (also attached) diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c index a7c04e24ca48..30ce237569dd 100644 --- a/drivers/usb/core/hub.c +++ b/drivers/usb/core/hub.c @@ -2783,8 +2783,14 @@ static int check_port_resume_type(struct usb_device *udev, struct usb_hub *hub, int port1, int status, unsigned portchange, unsigned portstatus) { + /* Did the port go SS.Inactive? Even if ->persist_enabled is cleared the + * device won't come back until a warm reset completes + */ + if (hub_port_warm_reset_required(hub, portstatus)) { + udev->reset_resume = 1; + udev->reset_resume_warm = 1; /* Is the device still present? */ - if (status || port_is_suspended(hub, portstatus) || + } else if (status || port_is_suspended(hub, portstatus) || !port_is_power_on(hub, portstatus) || !(portstatus & USB_PORT_STAT_CONNECTION)) { if (status >= 0) @@ -4022,7 +4028,8 @@ hub_port_init (struct usb_hub *hub, struct usb_device *udev, int port1, /* Reset the device; full speed may morph to high speed */ /* FIXME a USB 2.0 device may morph into SuperSpeed on reset. */ - retval = hub_port_reset(hub, port1, udev, delay, false); + retval = hub_port_reset(hub, port1, udev, delay, + udev->reset_resume_warm); if (retval < 0) /* error or disconnect */ goto fail; /* success, speed is known */ @@ -4730,7 +4737,8 @@ static void hub_events(void) /* deal with port status changes */ for (i = 1; i <= hdev->maxchild; i++) { - if (test_bit(i, hub->busy_bits)) + if (test_bit(i, hub->busy_bits) || + test_bit(i, hub->delayed_change_bits)) continue; connect_change = test_bit(i, hub->change_bits); wakeup_change = test_and_clear_bit(i, hub->wakeup_bits); diff --git a/include/linux/usb.h b/include/linux/usb.h index 7454865ad148..ff1b6fe4a0ff 100644 --- a/include/linux/usb.h +++ b/include/linux/usb.h @@ -572,6 +572,7 @@ struct usb_device { unsigned do_remote_wakeup:1; unsigned reset_resume:1; + unsigned reset_resume_warm:1; unsigned port_is_suspended:1; #endif struct wusb_dev *wusb_dev; > >> The host port starts out back in >> Rx.Detect without remembering anything about its previous state, but >> the device is still in U3. The host detects Rx terminations, moves to >> Polling and starts sending LFPS link training packets, but the device >> doesn't expect those and interprets them as link problems (moving to >> Recovery). What happens next seems to be device specific, but >> apparently the device can end up in SS.Inactive while the host port >> gets stuck in Polling or Recovery (or some kind of livelock between >> those). In testing the port power patches I see this particular device give up on its superspeed connection if it sees too many link failures and fallsback to connecting via its high speed connection. >> This patch tries to warm reset all USB 3.0 ports on reset-resume >> (after xhci_reset() was called) that had devices connected to them >> before suspend. This seems to be the only way to ensure the devices' >> state machines get back to a well-defined state that the host can work >> with. I don't think this is a specific hardware bug, it's just an >> unfortunate design flaw that the USB 3.0 spec doesn't account for a >> root hub port being reset independently of its connected device. I >> think Sarah is correct that it could be limited to root hubs, though. > I think we also need something like the following to hold off khubd while further resume actions may be taking place. Again we're mitigated by the debounce delay in most cases, but if a warm reset recovery is going to take longer than that khubd needs to be held off for the given port. diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c index 30ce237569dd..a99e3eb28aa5 100644 --- a/drivers/usb/core/hub.c +++ b/drivers/usb/core/hub.c @@ -1173,22 +1173,25 @@ static void hub_activate(struct usb_hub *hub, enum hub_activation_type type) port_resumed)) set_bit(port1, hub->change_bits); - } else if (udev->persist_enabled) { + } else { struct usb_port *port_dev = hub->ports[port1 - 1]; + if (udev->persist_enabled) { #ifdef CONFIG_PM - udev->reset_resume = 1; + udev->reset_resume = 1; #endif - /* Don't set the change_bits when the device - * was powered off. - */ - if (port_dev->power_is_on) - set_bit(port1, hub->change_bits); + /* Don't set the change_bits when the device + * was powered off. + */ + if (port_dev->power_is_on) + set_bit(port1, hub->change_bits); + } - } else { - /* The power session is gone; tell khubd */ - usb_set_device_state(udev, USB_STATE_NOTATTACHED); - set_bit(port1, hub->change_bits); + /* The power session may be gone; wait for port + * recovery before letting khubd take action on + * this port + */ + set_bit(port1, hub->delayed_change_bits); } } @@ -3285,6 +3288,11 @@ int usb_port_resume(struct usb_device *udev, pm_message_t msg) usb_unlocked_enable_lpm(udev); } + if (test_and_clear_bit(port1, hub->delayed_change_bits)) { + set_bit(port1, hub->change_bits); + kick_khubd(hub); + } + return status; } diff --git a/drivers/usb/core/hub.h b/drivers/usb/core/hub.h index 4e4790dea343..74e87c0380f8 100644 --- a/drivers/usb/core/hub.h +++ b/drivers/usb/core/hub.h @@ -45,8 +45,9 @@ struct usb_hub { unsigned long event_bits[1]; /* status change bitmask */ unsigned long change_bits[1]; /* ports with logical connect status change */ - unsigned long busy_bits[1]; /* ports being reset or - resumed */ + unsigned long busy_bits[1]; /* ports being reset */ + unsigned long delayed_change_bits[1]; /* ports awaiting + recovery */ unsigned long removed_bits[1]; /* ports with a "removed" device present */ unsigned long wakeup_bits[1]; /* ports that have signaled