From patchwork Thu Oct 6 09:34:15 2016
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Serge Semin
X-Patchwork-Id: 9364423
X-Patchwork-Delegate: bhelgaas@google.com
From: Serge Semin
To: bhelgaas@google.com
Cc: shawn.lin@rock-chips.com, luto@kernel.org, Sergey.Semin@t-platforms.ru,
 linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org, Serge Semin
Subject: [RFC] PCI: Fix kernel panic of root-port-less PCIe enum due to ASPM
Date: Thu, 6 Oct 2016 12:34:15 +0300
Message-Id: <1475746455-20665-1-git-send-email-fancer.lancer@gmail.com>
X-Mailer: git-send-email 2.6.6
X-Mailing-List: linux-pci@vger.kernel.org

Hello linux folks,

Some time ago I discovered a kernel panic that pops up when the PCI subsystem
tries to enumerate a PCI Express bus with the ASPM
service enabled. Here it is:

[    5.089667] CPU 0 Unable to handle kernel paging request at virtual address 00000060, epc == 80317004, ra == 80316ac8
[    5.120952] Oops[#1]:
...
[    5.528438] Call Trace:
[    5.535640] [<80317004>] pcie_aspm_init_link_state+0x6c0/0x814
[    5.552843] [<80300c44>] pci_scan_slot+0x140/0x148
[    5.566957] [<80301dcc>] pci_scan_child_bus+0x50/0x1b0
[    5.582096] [<80301944>] pci_scan_bridge+0x25c/0x694
[    5.596724] [<80301e78>] pci_scan_child_bus+0xfc/0x1b0
[    5.611862] [<80301944>] pci_scan_bridge+0x25c/0x694
[    5.626488] [<80301e78>] pci_scan_child_bus+0xfc/0x1b0
[    5.641628] [<8030215c>] pci_scan_root_bus+0x64/0x124
[    5.656528] [<804ca298>] pcibios_scanbus+0xa8/0x188

I am more than sure you are familiar with the issue, since I found the
mailing list discussion:

"PCI: avoid NULL deref in alloc_pcie_link_state"
https://patchwork.kernel.org/patch/2751651/
https://bugzilla.kernel.org/show_bug.cgi?id=60111

You closed the bugzilla ticket with the following statement:

"I'm closing this as invalid because the simulated machine where the problem
occurs has an invalid PCIe topology (an Upstream Port with no Downstream Port
or Root Port above it). As far as I know, there is no valid topology, e.g., a
real hardware machine in the field, that would cause this failure."

I strongly disagree with it, since I have at least two hardware platforms
with the PCIe bus hierarchy described in that mailing list thread. One of
them is based on the Cavium Octeon III CN7020. Here is an ASCII diagram of
its PCIe bus:

-+-[0000:01]---00.0-[02-06]--+-02.0-[03-05]--+-00.0-[04-05]----00.0-[05]--
 |                           |               \-00.1  Device [111d:808f]
 |                           \-04.0-[06]----00.0  Device [126f:0750]
 \-[0000:00]-

where 01:00.0 is an Upstream Port of an IDT PCIe switch.
/ # /usr/local/sbin/lspci -v -s 01:00.0
01:00.0 Class 0604: Device 111d:8061
	Flags: bus master, fast devsel, latency 0
	Memory at (32-bit, non-prefetchable) [size=2]
	Memory at (32-bit, non-prefetchable) [size=2]
	Bus: primary=01, secondary=02, subordinate=06, sec-latency=0
	Memory behind bridge: 08000000-0dffffff
	Expansion ROM at [disabled] [size=2]
	Capabilities: [40] Express Upstream Port, MSI 00
	Capabilities: [c0] Power Management version 3
	Capabilities: [100] Advanced Error Reporting
	Capabilities: [200] Virtual Channel
	Kernel driver in use: pcieport

As you can see, the PCI bus hierarchy doesn't have a Root Port, and the very
first Upstream Port is directly connected to the Host-PCIe bridge of the MCU,
which of course is not listed by the lspci utility.

Unlike Radim Krčmář, whose suggested fix would de facto just turn ASPM off,
I found a quick solution that disables ASPM only on the first link
(Host-PCIe => Upstream Port) of the PCIe bus for such a hierarchy. ASPM for
other PCIe bus topologies should work the way it did before.

I hope the fix will be helpful.

Thanks,

=============================
Serge V. Semin
Leading Programmer
Embedded SW development group
T-platforms
=============================

Signed-off-by: Serge Semin
---
 drivers/pci/pcie/aspm.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
index 0ec649d..a9295f29 100644
--- a/drivers/pci/pcie/aspm.c
+++ b/drivers/pci/pcie/aspm.c
@@ -522,7 +522,8 @@ static struct pcie_link_state *alloc_pcie_link_state(struct pci_dev *pdev)
 	INIT_LIST_HEAD(&link->children);
 	INIT_LIST_HEAD(&link->link);
 	link->pdev = pdev;
-	if (pci_pcie_type(pdev) != PCI_EXP_TYPE_ROOT_PORT) {
+	if ((pci_pcie_type(pdev) != PCI_EXP_TYPE_ROOT_PORT) &&
+	    (!pci_is_root_bus(pdev->bus->parent))) {
 		struct pcie_link_state *parent;
 		parent = pdev->bus->parent->self->link_state;
 		if (!parent) {
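
For illustration only, the condition in the hunk above can be modelled in
user space. This is a rough sketch, not kernel code: the model_* structures
and functions are hypothetical stand-ins for struct pci_bus, struct pci_dev,
struct pcie_link_state and pci_is_root_bus(), and the bus layout is my
reading of the diagram earlier in this mail (buses 01 to 04 only). It assumes
the lookup is only ever called for a Root Port or for a device whose bus has
a bus above it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for the kernel's
 * struct pcie_link_state, struct pci_bus and struct pci_dev. */
struct model_link { int id; };
struct model_dev;

struct model_bus {
	struct model_bus *parent;	/* NULL for a root bus */
	struct model_dev *self;		/* bridge above this bus, NULL for a root bus */
};

struct model_dev {
	struct model_bus *bus;		/* bus the device sits on */
	bool is_root_port;
	struct model_link *link_state;
};

static bool model_is_root_bus(const struct model_bus *bus)
{
	return bus->parent == NULL;
}

/*
 * Mirrors the condition from the patch: look up a parent link only when the
 * device is not a Root Port and the bus above its own bus is not a root bus.
 * A NULL return means "no parent link is looked up for this device".
 */
struct model_link *model_parent_link(const struct model_dev *pdev)
{
	/* Assumption of this sketch: callers pass a Root Port or a device
	 * whose bus has a parent bus. */
	assert(pdev->is_root_port || pdev->bus->parent != NULL);

	if (!pdev->is_root_port && !model_is_root_bus(pdev->bus->parent))
		return pdev->bus->parent->self->link_state;
	return NULL;
}

/* Models buses 01..04 from the diagram; returns 0 when both lookups match. */
int model_run(void)
{
	struct model_link link_0202 = { .id = 1 };
	struct model_bus bus01 = { NULL, NULL };		/* root bus */
	struct model_dev up_0100 = { &bus01, false, NULL };	/* 01:00.0 */
	struct model_bus bus02 = { &bus01, &up_0100 };
	struct model_dev down_0202 = { &bus02, false, &link_0202 }; /* 02:02.0 */
	struct model_bus bus03 = { &bus02, &down_0202 };
	struct model_dev up_0300 = { &bus03, false, NULL };	/* 03:00.0 */
	struct model_bus bus04 = { &bus03, &up_0300 };
	struct model_dev down_0400 = { &bus04, false, NULL };	/* 04:00.0 */

	/* 02:02.0: bus 01 is a root bus with no bridge device above it,
	 * so the guard skips the bus01.self dereference entirely. */
	if (model_parent_link(&down_0202) != NULL)
		return 1;
	/* 04:00.0: bus 03 hangs off 02:02.0, so that link is the parent. */
	if (model_parent_link(&down_0400) != &link_0202)
		return 2;
	return 0;
}
```

Calling model_run() from a small main() exercises both cases. Without the
root-bus check, the first lookup would read link_state through bus01.self,
which is NULL in this model.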