From patchwork Mon Apr 16 11:55:48 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Mathias Nyman <mathias.nyman@linux.intel.com>
X-Patchwork-Id: 10342801
Return-Path: <linux-usb-owner@kernel.org>
Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org
	[172.30.200.125])
	by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id
	3C06560542 for <patchwork-linux-usb@patchwork.kernel.org>;
	Mon, 16 Apr 2018 11:53:44 +0000 (UTC)
Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1])
	by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 24F8D28698
	for <patchwork-linux-usb@patchwork.kernel.org>;
	Mon, 16 Apr 2018 11:53:44 +0000 (UTC)
Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486)
	id 158D52869A; Mon, 16 Apr 2018 11:53:44 +0000 (UTC)
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on
	pdx-wl-mail.web.codeaurora.org
X-Spam-Level: 
X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00, MAILING_LIST_MULTI,
	RCVD_IN_DNSWL_HI,T_TVD_MIME_EPI autolearn=ham version=3.3.1
Received: from vger.kernel.org (vger.kernel.org [209.132.180.67])
	by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 3228828698
	for <patchwork-linux-usb@patchwork.kernel.org>;
	Mon, 16 Apr 2018 11:53:43 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1754002AbeDPLxl (ORCPT
	<rfc822;patchwork-linux-usb@patchwork.kernel.org>);
	Mon, 16 Apr 2018 07:53:41 -0400
Received: from mga03.intel.com ([134.134.136.65]:29249 "EHLO mga03.intel.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1752849AbeDPLxl (ORCPT <rfc822;linux-usb@vger.kernel.org>);
	Mon, 16 Apr 2018 07:53:41 -0400
X-Amp-Result: UNSCANNABLE
X-Amp-File-Uploaded: False
Received: from orsmga008.jf.intel.com ([10.7.209.65])
	by orsmga103.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384;
	16 Apr 2018 04:53:40 -0700
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.48,459,1517904000";
	d="scan'208,223";a="34006676"
Received: from mattu-haswell.fi.intel.com (HELO [10.237.72.164])
	([10.237.72.164])
	by orsmga008.jf.intel.com with ESMTP; 16 Apr 2018 04:53:39 -0700
Subject: Re: Since Linux 4.13 tlp or powertop usage cause "xHCI host
	controller not responding, assume dead" on Dell 5855
To: russianneuromancer@ya.ru, linux-usb@vger.kernel.org
References: <dcb4bbbf2f292677ef756d4e8002d08c47500d11.camel@ya.ru>
From: Mathias Nyman <mathias.nyman@linux.intel.com>
Message-ID: <16a67206-6dce-01f1-1074-ee5d3b7e2602@linux.intel.com>
Date: Mon, 16 Apr 2018 14:55:48 +0300
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101
	Thunderbird/52.7.0
MIME-Version: 1.0
In-Reply-To: <dcb4bbbf2f292677ef756d4e8002d08c47500d11.camel@ya.ru>
Content-Language: en-US
Sender: linux-usb-owner@vger.kernel.org
Precedence: bulk
List-ID: <linux-usb.vger.kernel.org>
X-Mailing-List: linux-usb@vger.kernel.org
X-Virus-Scanned: ClamAV using ClamSMTP

On 10.04.2018 12:15, russianneuromancer@ya.ru wrote:
> Hello!
> 
> On Dell Venue 8 Pro 5855 tablet installing tlp or running
> "powertop --auto-tune" cause "xHCI host controller not responding,
> assume dead"
> error, when error happen two integrated USB devices (Bluetooth adapter
> and LTE modem) disappear until reboot. First time this issue was
> observer in Linux 4.13 and still present in Linux 4.16. Blacklisting
> both "Linux Foundation 3.0 root hub" from autosuspend in tlp
> configuration is workaround for this issue, however on other devices
> tlp works fine without blacklisting usb hub autosuspend, and on this
> tablet there was no such issue before (at least in Linux ~4.8-4.12
> range) so I assume there is regression somewhere.
> 
> Is there any related commits between 4.12 and 4.13 that I could try to
> revert?
> 

In 4.12 the driver became more sensitive to hotplug-removed xHC
controllers, i.e. if we read 0xffffffff from an xhci register
we assume the host is removed and start cleaning up.
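
As a rough illustration of that check, here is a minimal user-space sketch
in C. It is not the driver code: struct fake_regs, fake_readl() and
poll_status() are hypothetical names standing in for the MMIO register
space and the driver's status polling. The only behaviour taken from the
description above is that an all-ones read is treated as a removed host.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Value the description above treats as "host removed". */
#define REG_ALL_ONES 0xffffffffu

/* Hypothetical fake register file standing in for the MMIO space. */
struct fake_regs {
	uint32_t status;	/* value the next register read returns */
	bool dead;		/* set once the host is assumed removed */
};

/* Hypothetical stand-in for a 32-bit register read. */
static uint32_t fake_readl(const struct fake_regs *regs)
{
	assert(regs != NULL);
	return regs->status;
}

/*
 * Returns true if the host still looks usable. On an all-ones read the
 * host is marked dead, which is where cleanup would start in this model.
 */
static bool poll_status(struct fake_regs *regs)
{
	if (fake_readl(regs) == REG_ALL_ONES) {
		regs->dead = true;
		return false;
	}
	return true;
}
```

With status set to 0xffffffff, poll_status() returns false and marks the
host dead; any other value leaves the dead flag untouched.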

commit d9f11ba9f107aa335091ab8d7ba5eea714e46e8b
     xhci: Rework how we handle unresponsive or hoptlug removed hosts

You can try to revert that, but as a final solution we should
find the real root cause.

> How issue looks like in logs:
> 
> [  227.258385] xhci_hcd 0000:00:14.0: xHC is not running.
> [  329.671544] xhci_hcd 0000:00:14.0: xHC is not running.
> [  416.695796] xhci_hcd 0000:00:14.0: xHC is not running.

The "xHC is not running" message is the xhci driver handling a port event
interrupt for a resuming port while the whole host controller is not running.
We stop the host controller in xhci_suspend(), and start it in xhci_resume().

Attaching a patch that does a better job of preventing xhci host suspend
during USB2 resume signaling.
It could help, worth a shot.
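
The idea behind the patch can be modelled in user space as below. This is
a sketch under assumptions, not the kernel implementation: struct
model_bus and the model_*() functions are hypothetical, and the plain
counter stands in for whatever reference the real
usb_hcd_start_port_resume() and usb_hcd_end_port_resume() helpers hold to
block roothub autosuspend while a port is resuming.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the per-bus state used to block autosuspend. */
struct model_bus {
	unsigned long resuming_ports;	/* one bit per port in resume */
	int pm_usage;			/* stand-in for a PM usage count */
};

/* Mark a port as resuming; take one reference per port, not per call. */
static void model_start_port_resume(struct model_bus *bus, int port)
{
	assert(port >= 0 && port < 32);
	unsigned long bit = 1UL << port;

	if (!(bus->resuming_ports & bit)) {
		bus->resuming_ports |= bit;
		bus->pm_usage++;
	}
}

/* Clear the resuming mark and drop the reference if one was held. */
static void model_end_port_resume(struct model_bus *bus, int port)
{
	assert(port >= 0 && port < 32);
	unsigned long bit = 1UL << port;

	if (bus->resuming_ports & bit) {
		bus->resuming_ports &= ~bit;
		bus->pm_usage--;
	}
}

/* The roothub may autosuspend only when no port holds a reference. */
static bool model_can_autosuspend(const struct model_bus *bus)
{
	return bus->pm_usage == 0;
}
```

In this model, calling model_start_port_resume() twice for the same port
still needs only one model_end_port_resume() before autosuspend is
allowed again, which is why the patch can call the end helper on more
than one exit path.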

> [  416.695862] xhci_hcd 0000:00:14.0: xHCI host controller not
> responding, assume dead

This means xhci_hc_died() was called; there are many possible call sites.
Adding the code below could give a hint:


> [  416.695900] xhci_hcd 0000:00:14.0: HC died; cleaning up
> [  416.696052] usb 1-3: USB disconnect, device number 2
> [  416.815610] cdc_mbim 1-3:1.12 wwp0s20u3i12: unregister 'cdc_mbim'
> usb-0000:00:14.0-3, CDC MBIM
> [  416.847934] usb 1-4: USB disconnect, device number 3
> 
> After that Bluetooth adapter and LTE modem disappear from lsusb output,
> while xHCI controller itself remain visible.

We stop host activity in xhci_hc_died(), so no USB devices under this host will work.

> Complete dmesg: https://paste.fedoraproject.org/paste/7aMpVGLfZ82zppdGs56Oqg
> lsusb -v: https://paste.fedoraproject.org/paste/c7y8GisC13YdzcYE9B-JIw
> dsdt.dsl: https://paste.fedoraproject.org/paste/8g6mp2dafypUkFT4sa43iA

xhci traces and dynamic debug could help:

mount -t debugfs none /sys/kernel/debug
echo 81920 > /sys/kernel/debug/tracing/buffer_size_kb
echo 1 > /sys/kernel/debug/tracing/events/xhci-hcd/enable

echo -n 'module xhci_hcd =p' > /sys/kernel/debug/dynamic_debug/control

-Mathias

From 090b13a6df3f489a9781223dd959e03c2f81347b Mon Sep 17 00:00:00 2001
From: Mathias Nyman <mathias.nyman@linux.intel.com>
Date: Thu, 1 Mar 2018 18:48:32 +0200
Subject: [PATCH] xhci: prevent USB 2 roothub autosuspend during port resume
 signaling

xhci USB 2 roothub tries to autosuspend itself again immediately after
being resumed by a remote wake. This can be avoided by calling the
usb_hcd_start_port_resume() and usb_hcd_end_port_resume() implemented
especially for this purpose.

Use them, and prevent roothub autosuspend during resume signaling.

Suggested-by: Anshuman Gupta <anshuman.gupta@intel.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
---
 drivers/usb/host/xhci-hub.c  | 3 +++
 drivers/usb/host/xhci-ring.c | 2 ++
 2 files changed, 5 insertions(+)

diff --git a/drivers/usb/host/xhci-hub.c b/drivers/usb/host/xhci-hub.c
index 72ebbc9..671a336 100644
--- a/drivers/usb/host/xhci-hub.c
+++ b/drivers/usb/host/xhci-hub.c
@@ -905,6 +905,7 @@ static u32 xhci_get_port_status(struct usb_hcd *hcd,
 
 				set_bit(wIndex, &bus_state->resuming_ports);
 				bus_state->resume_done[wIndex] = timeout;
+				usb_hcd_start_port_resume(&hcd->self, wIndex);
 				mod_timer(&hcd->rh_timer, timeout);
 			}
 		/* Has resume been signalled for USB_RESUME_TIME yet? */
@@ -930,6 +931,7 @@ static u32 xhci_get_port_status(struct usb_hcd *hcd,
 					msecs_to_jiffies(
 						XHCI_MAX_REXIT_TIMEOUT));
 			spin_lock_irqsave(&xhci->lock, flags);
+			usb_hcd_end_port_resume(&hcd->self, wIndex);
 
 			if (time_left) {
 				slot_id = xhci_find_slot_id_by_port(hcd,
@@ -970,6 +972,7 @@ static u32 xhci_get_port_status(struct usb_hcd *hcd,
 	    (raw_port_status & PORT_PLS_MASK) != XDEV_RESUME) {
 		bus_state->resume_done[wIndex] = 0;
 		clear_bit(wIndex, &bus_state->resuming_ports);
+		usb_hcd_end_port_resume(&hcd->self, wIndex);
 	}
 
 
diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index daa94c3..a1cffe9 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -1666,6 +1666,8 @@ static void handle_port_status(struct xhci_hcd *xhci,
 			bus_state->resume_done[faked_port_index] = jiffies +
 				msecs_to_jiffies(USB_RESUME_TIMEOUT);
 			set_bit(faked_port_index, &bus_state->resuming_ports);
+			usb_hcd_start_port_resume(&hcd->self, faked_port_index);
+
 			/* Do the rest in GetPortStatus after resume time delay.
 			 * Avoid polling roothub status before that so that a
 			 * usb device auto-resume latency around ~40ms.
-- 
2.7.4