From patchwork Thu Feb 14 17:00:27 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Logan Gunthorpe
X-Patchwork-Id: 10813293
X-Patchwork-Delegate: bhelgaas@google.com
Return-Path:
Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125])
	by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 6BEE513A4
	for ; Thu, 14 Feb 2019 17:00:35 +0000 (UTC)
Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1])
	by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 571432EBB9
	for ; Thu, 14 Feb 2019 17:00:35 +0000 (UTC)
Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486)
	id 4B0D22EBD1; Thu, 14 Feb 2019 17:00:35 +0000 (UTC)
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org
X-Spam-Level:
X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI,
	RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1
Received: from vger.kernel.org (vger.kernel.org [209.132.180.67])
	by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 3905F2EBB9
	for ; Thu, 14 Feb 2019 17:00:34 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1726358AbfBNRAd (ORCPT ); Thu, 14 Feb 2019 12:00:33 -0500
Received: from ale.deltatee.com ([207.54.116.67]:36462 "EHLO ale.deltatee.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1727997AbfBNRAd (ORCPT ); Thu, 14 Feb 2019 12:00:33 -0500
Received: from cgy1-donard.priv.deltatee.com ([172.16.1.31])
	by ale.deltatee.com with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
	(Exim 4.89) (envelope-from ) id 1guKNC-0004Pd-Uf; Thu, 14 Feb 2019 10:00:32 -0700
Received: from gunthorp by cgy1-donard.priv.deltatee.com with local (Exim 4.89)
	(envelope-from ) id 1guKNC-0007G7-Fl; Thu, 14 Feb 2019 10:00:30 -0700
From: Logan Gunthorpe
To: linux-kernel@vger.kernel.org, linux-pci@vger.kernel.org, Bjorn Helgaas
Cc: Kit Chow, Logan Gunthorpe,
	Yinghai Lu
Date: Thu, 14 Feb 2019 10:00:27 -0700
Message-Id: <20190214170028.27862-1-logang@deltatee.com>
X-Mailer: git-send-email 2.19.0
MIME-Version: 1.0
X-SA-Exim-Connect-IP: 172.16.1.31
X-SA-Exim-Rcpt-To: linux-kernel@vger.kernel.org, linux-pci@vger.kernel.org,
	bhelgaas@google.com, kchow@gigaio.com, logang@deltatee.com, yinghai@kernel.org
X-SA-Exim-Mail-From: gunthorp@deltatee.com
Subject: [PATCH 1/2] PCI: Prevent 64-bit resources from being counted in 32-bit bridge region
X-SA-Exim-Version: 4.2.1 (built Tue, 02 Aug 2016 21:08:31 +0000)
X-SA-Exim-Scanned: Yes (on ale.deltatee.com)
Sender: linux-pci-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-pci@vger.kernel.org
X-Virus-Scanned: ClamAV using ClamSMTP

When using the pci=realloc command line argument, with hpmemsize not
equal to zero, some hierarchies of 32-bit resources can fail to be
assigned in some situations. When this happens, the user will see some
PCI BAR resources being ignored and some PCI Bridge windows being left
unset. In lspci this may look like:

  Memory behind bridge: fff00000-000fffff

or

  Region 0: Memory at <ignored> (32-bit, non-prefetchable) [size=256K]

Ignored BARs mean the underlying device will not be usable.

The possible situations where this can happen will be quite varied and
depend highly on the exact hierarchy and how the realloc code ends up
trying to assign the regions. It's known to at least require a large
64-bit BAR (>1GB) below a PCI bridge.

The cause of this bug is in __pci_bus_size_bridges() which tries to
calculate the total resource space required for each of the bridge
windows (typically IO, 64-bit, and 32-bit / non-prefetchable). The code,
as written, tries to allocate all the 64-bit prefetchable resources
followed by all the remaining resources. It uses two calls to
pbus_size_mem() for this.
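
The two sizing passes described above can be pictured with a small standalone sketch. It is not the kernel's pbus_size_mem(); the DEMO_* flag bits and the demo_* helpers are hypothetical stand-ins, and the filter (a resource is counted when its masked flags equal one of the accepted types) is an assumed simplification of how the mask/type arguments select resources.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for IORESOURCE_MEM, _PREFETCH and _MEM_64. */
#define DEMO_MEM      0x1u
#define DEMO_PREFETCH 0x2u
#define DEMO_MEM_64   0x4u
#define DEMO_PREFMASK (DEMO_MEM | DEMO_PREFETCH | DEMO_MEM_64)

/* Assumed filter: count a resource if its masked flags match a type. */
static bool demo_counted(unsigned flags, unsigned mask, unsigned type,
			 unsigned type2, unsigned type3)
{
	unsigned masked = flags & mask;

	return masked == type || masked == type2 || masked == type3;
}

/* First pass: only 64-bit prefetchable resources are counted. */
static bool demo_first_pass(unsigned flags)
{
	return demo_counted(flags, DEMO_PREFMASK, DEMO_PREFMASK,
			    DEMO_PREFMASK, DEMO_PREFMASK);
}

/*
 * Second pass: sizes the 32-bit / non-prefetchable window.
 * narrowed == false models a catch-all pass whose mask is just
 * DEMO_MEM; narrowed == true models a pass that looks at all three
 * bits and therefore leaves 64-bit prefetchable resources out.
 */
static bool demo_second_pass(unsigned flags, bool narrowed)
{
	if (!narrowed)
		return demo_counted(flags, DEMO_MEM, DEMO_MEM,
				    DEMO_MEM, DEMO_MEM);

	return demo_counted(flags, DEMO_PREFMASK, DEMO_MEM,
			    DEMO_PREFMASK & ~DEMO_MEM_64,
			    DEMO_PREFMASK & ~DEMO_PREFETCH);
}
```

In this sketch a large BAR with flags DEMO_PREFMASK is counted by demo_first_pass() and also by demo_second_pass(flags, false), so a multi-GB BAR would inflate the 32-bit window; demo_second_pass(flags, true) leaves it out while still counting a plain DEMO_MEM BAR.
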
If the first call to pbus_size_mem() fails, it tries to fit all
resources into the 32-bit bridge window and it expects the size of the
32-bit bridge window to be multiple GBs which will never be assignable
under the 4GB limit imposed on it.

There are only two reasons for pbus_size_mem() to fail: if there is no
64-bit/prefetchable bridge window, or if that window is already
assigned (in other words, its resource already has a parent set). We
know the former case can't be true because, in
__pci_bus_size_bridges(), its existence is checked before making the
call. So if the pbus_size_mem() call in question fails, the window must
already be assigned, and in this case, we still do not want 64-bit
resources to be sized into the 32-bit catch-all resource.

So to fix the bug, we must always set mask, type2 and type3 in cases
where a 64-bit resource exists even if pbus_size_mem() fails.

Reported-by: Kit Chow
Fixes: 5b28541552ef ("PCI: Restrict 64-bit prefetchable bridge windows to 64-bit resources")
Signed-off-by: Logan Gunthorpe
Cc: Bjorn Helgaas
Cc: Yinghai Lu
---
 drivers/pci/setup-bus.c | 17 ++++++++---------
 1 file changed, 8 insertions(+), 9 deletions(-)

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index ed960436df5e..56b7077f37ff 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -1265,21 +1265,20 @@ void __pci_bus_size_bridges(struct pci_bus *bus, struct list_head *realloc_head)
 		prefmask = IORESOURCE_MEM | IORESOURCE_PREFETCH;
 		if (b_res[2].flags & IORESOURCE_MEM_64) {
 			prefmask |= IORESOURCE_MEM_64;
-			ret = pbus_size_mem(bus, prefmask, prefmask,
+			pbus_size_mem(bus, prefmask, prefmask,
 				  prefmask, prefmask,
 				  realloc_head ? 0 : additional_mem_size,
 				  additional_mem_size, realloc_head);
 
 			/*
-			 * If successful, all non-prefetchable resources
-			 * and any 32-bit prefetchable resources will go in
-			 * the non-prefetchable window.
+			 * Given the existence of a 64-bit resource for this
+			 * bus, all non-prefetchable resources and any 32-bit
+			 * prefetchable resources will go in the
+			 * non-prefetchable window.
 			 */
-			if (ret == 0) {
-				mask = prefmask;
-				type2 = prefmask & ~IORESOURCE_MEM_64;
-				type3 = prefmask & ~IORESOURCE_PREFETCH;
-			}
+			mask = prefmask;
+			type2 = prefmask & ~IORESOURCE_MEM_64;
+			type3 = prefmask & ~IORESOURCE_PREFETCH;
 		}
 
 		/*

From patchwork Thu Feb 14 17:00:28 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Logan Gunthorpe
X-Patchwork-Id: 10813295
X-Patchwork-Delegate: bhelgaas@google.com
Return-Path:
Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125])
	by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 9AEC413A4
	for ; Thu, 14 Feb 2019 17:00:40 +0000 (UTC)
Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1])
	by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 86CCE2EBB9
	for ; Thu, 14 Feb 2019 17:00:40 +0000 (UTC)
Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486)
	id 7AE432EBD1; Thu, 14 Feb 2019 17:00:40 +0000 (UTC)
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org
X-Spam-Level:
X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI,
	RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1
Received: from vger.kernel.org (vger.kernel.org [209.132.180.67])
	by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 0B4212EBB9
	for ; Thu, 14 Feb 2019 17:00:40 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S2394308AbfBNRAe (ORCPT ); Thu, 14 Feb 2019 12:00:34 -0500
Received: from ale.deltatee.com ([207.54.116.67]:36466 "EHLO ale.deltatee.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S2394094AbfBNRAd (ORCPT ); Thu, 14 Feb 2019 12:00:33 -0500
Received: from cgy1-donard.priv.deltatee.com ([172.16.1.31])
	by ale.deltatee.com with
	esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.89)
	(envelope-from ) id 1guKNC-0004Pe-Uf; Thu, 14 Feb 2019 10:00:33 -0700
Received: from gunthorp by cgy1-donard.priv.deltatee.com with local (Exim 4.89)
	(envelope-from ) id 1guKNC-0007G9-Iy; Thu, 14 Feb 2019 10:00:30 -0700
From: Logan Gunthorpe
To: linux-kernel@vger.kernel.org, linux-pci@vger.kernel.org, Bjorn Helgaas
Cc: Kit Chow, Logan Gunthorpe, Yinghai Lu
Date: Thu, 14 Feb 2019 10:00:28 -0700
Message-Id: <20190214170028.27862-2-logang@deltatee.com>
X-Mailer: git-send-email 2.19.0
In-Reply-To: <20190214170028.27862-1-logang@deltatee.com>
References: <20190214170028.27862-1-logang@deltatee.com>
MIME-Version: 1.0
X-SA-Exim-Connect-IP: 172.16.1.31
X-SA-Exim-Rcpt-To: linux-kernel@vger.kernel.org, linux-pci@vger.kernel.org,
	bhelgaas@google.com, kchow@gigaio.com, logang@deltatee.com, yinghai@kernel.org
X-SA-Exim-Mail-From: gunthorp@deltatee.com
Subject: [PATCH 2/2] PCI: Fix disabling of bridge BARs when assigning bus resources
X-SA-Exim-Version: 4.2.1 (built Tue, 02 Aug 2016 21:08:31 +0000)
X-SA-Exim-Scanned: Yes (on ale.deltatee.com)
Sender: linux-pci-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-pci@vger.kernel.org
X-Virus-Scanned: ClamAV using ClamSMTP

One odd quirk of PLX switches is that their upstream bridge port has
256K of space allocated behind its BAR0 (most other bridge
implementations do not report any BAR space). The lspci for such a
device looks like:

  04:00.0 PCI bridge: PLX Technology, Inc.
      PEX 8724 24-Lane, 6-Port PCI Express Gen 3 (8 GT/s) Switch,
      19 x 19mm FCBGA (rev ca) (prog-if 00 [Normal decode])
      Physical Slot: 1
      Flags: bus master, fast devsel, latency 0, IRQ 30, NUMA node 0
      Memory at 90a00000 (32-bit, non-prefetchable) [size=256K]
      Bus: primary=04, secondary=05, subordinate=0a, sec-latency=0
      I/O behind bridge: 00002000-00003fff
      Memory behind bridge: 90000000-909fffff
      Prefetchable memory behind bridge: 0000380000800000-0000380000bfffff
      Kernel driver in use: pcieport

It's not clear what the purpose of the memory at 0x90a00000 is, and
currently the kernel never actually uses it for anything. In most cases,
it's safely ignored and does not cause a problem.

However, when the kernel assigns the resource addresses (with the
pci=realloc command line parameter, for example) it can inadvertently
disable the struct resource corresponding to the BAR. When this happens,
lspci will report this memory as ignored:

  Region 0: Memory at <ignored> (32-bit, non-prefetchable) [size=256K]

This is because the kernel reports a zero start address and zero flags
in the corresponding sysfs resource file and in /proc/bus/pci/devices.
Investigation with 'lspci -x', however, shows that the BIOS-assigned
address will still be programmed in the device's BAR registers.

In many cases, this still isn't a problem. Nothing uses the memory, so
nothing is affected. However, a big problem shows up when an IOMMU is in
use: the IOMMU will not reserve this space in the IOVA because the
kernel no longer thinks the range is valid. (See
dmar_init_reserved_ranges() for the Intel implementation of this.)

Without the proper reserved range, we have a situation where a DMA
mapping may occasionally allocate an IOVA which the PCI bus will
actually route to a BAR in the PLX switch. This will result in some
random DMA writes not actually writing to the RAM they are supposed to,
or random DMA reads returning all FFs from the PLX BAR when it's
supposed to have read from RAM.
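
The collision described above can be sketched in a few lines. This is only an illustration: struct demo_bar, DEMO_FLAG_MEM and the demo_* helpers are hypothetical, and the rule that a range is reserved only while the kernel's view of the resource still carries a memory flag is an assumed simplification, not the logic of dmar_init_reserved_ranges().

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a memory resource flag. */
#define DEMO_FLAG_MEM 0x200u

/* What the hardware decodes versus what the kernel believes. */
struct demo_bar {
	uint64_t hw_start;	/* address still programmed in the BAR */
	uint64_t hw_end;	/* inclusive end of the decoded range */
	unsigned kernel_flags;	/* zero once the resource is disabled */
};

/* Assumption: only resources the kernel still sees get reserved. */
static bool demo_range_reserved(const struct demo_bar *bar)
{
	return (bar->kernel_flags & DEMO_FLAG_MEM) != 0;
}

/*
 * True when an IOVA range [iova, iova + len) could be handed out and
 * would be routed to the BAR rather than to RAM.
 */
static bool demo_iova_may_hit_bar(const struct demo_bar *bar,
				  uint64_t iova, uint64_t len)
{
	if (len == 0 || demo_range_reserved(bar))
		return false;

	return iova <= bar->hw_end && iova + len - 1 >= bar->hw_start;
}
```

With the values from the lspci output (0x90a00000, 256K), a 4K mapping at 0x90a10000 is harmless in this model while kernel_flags is set, and becomes a possible collision once the flags are zeroed.
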
The problem is caused in pci_assign_unassigned_root_bus_resources().
When any resource from a bridge device fails to get assigned, the code
sets the resource's flags to zero. This makes sense for bridge
resources, as they will be re-enabled later, but for regular BARs, it
disables them permanently.

To fix the problem, we only set the flags to zero for bridge resources
and treat any other resources like non-bridge devices.

Reported-by: Kit Chow
Fixes: da7822e5ad71 ("PCI: update bridge resources to get more big ranges when allocating space (again)")
Signed-off-by: Logan Gunthorpe
Cc: Bjorn Helgaas
Cc: Yinghai Lu
---
 drivers/pci/setup-bus.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 56b7077f37ff..3695edd9c256 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -1821,11 +1821,16 @@ void pci_assign_unassigned_root_bus_resources(struct pci_bus *bus)
 	/* restore size and flags */
 	list_for_each_entry(fail_res, &fail_head, list) {
 		struct resource *res = fail_res->res;
+		int idx;
 
 		res->start = fail_res->start;
 		res->end = fail_res->end;
 		res->flags = fail_res->flags;
-		if (fail_res->dev->subordinate)
+
+		idx = res - &fail_res->dev->resource[0];
+		if (fail_res->dev->subordinate &&
+		    idx >= PCI_BRIDGE_RESOURCES &&
+		    idx <= PCI_BRIDGE_RESOURCE_END)
 			res->flags = 0;
 	}
 	free_list(&fail_head);
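
For readers unfamiliar with the pointer-subtraction idiom used in the hunk above, here is a minimal userspace sketch. The demo_* types and the DEMO_* index values are hypothetical (the real PCI_BRIDGE_RESOURCES and PCI_BRIDGE_RESOURCE_END values depend on kernel configuration); the sketch only shows how the index of a resource within its device's array selects which entries get their flags cleared.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical layout: BARs and ROM first, then four bridge windows. */
#define DEMO_NUM_RESOURCES       11
#define DEMO_BRIDGE_RESOURCES     7
#define DEMO_BRIDGE_RESOURCE_END 10

struct demo_resource {
	unsigned long start;
	unsigned long end;
	unsigned long flags;
};

struct demo_dev {
	struct demo_resource resource[DEMO_NUM_RESOURCES];
	bool has_subordinate;	/* stands in for dev->subordinate != NULL */
};

/* Index of res within dev->resource[], found by pointer subtraction. */
static bool demo_is_bridge_window(const struct demo_dev *dev,
				  const struct demo_resource *res)
{
	ptrdiff_t idx = res - &dev->resource[0];

	return dev->has_subordinate &&
	       idx >= DEMO_BRIDGE_RESOURCES &&
	       idx <= DEMO_BRIDGE_RESOURCE_END;
}

/* Restore a saved resource; only bridge windows get flags cleared. */
static void demo_restore(struct demo_dev *dev, struct demo_resource *res,
			 const struct demo_resource *saved)
{
	*res = *saved;
	if (demo_is_bridge_window(dev, res))
		res->flags = 0;
}
```

In this model, restoring resource[0] of a bridge (the PLX BAR0 case) keeps its flags, while restoring resource[DEMO_BRIDGE_RESOURCES] clears them so the window can be sized again later.
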