From patchwork Tue Nov 22 13:13:50 2016 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Gabriele Paoloni X-Patchwork-Id: 9441047 X-Patchwork-Delegate: bhelgaas@google.com Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id 067FE6075D for ; Tue, 22 Nov 2016 13:14:13 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id EA359284B6 for ; Tue, 22 Nov 2016 13:14:12 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id DF33428529; Tue, 22 Nov 2016 13:14:12 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-6.9 required=2.0 tests=BAYES_00,RCVD_IN_DNSWL_HI autolearn=unavailable version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 0F8BE28527 for ; Tue, 22 Nov 2016 13:14:12 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754507AbcKVNOK convert rfc822-to-8bit (ORCPT ); Tue, 22 Nov 2016 08:14:10 -0500 Received: from lhrrgout.huawei.com ([194.213.3.17]:4294 "EHLO lhrrgout.huawei.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753651AbcKVNOJ (ORCPT ); Tue, 22 Nov 2016 08:14:09 -0500 Received: from 172.18.7.190 (EHLO lhreml708-cah.china.huawei.com) ([172.18.7.190]) by lhrrg02-dlp.huawei.com (MOS 4.3.7-GA FastPath queued) with ESMTP id CVT30649; Tue, 22 Nov 2016 13:13:53 +0000 (GMT) Received: from LHREML507-MBX.china.huawei.com ([10.201.4.190]) by lhreml708-cah.china.huawei.com ([10.201.5.202]) with mapi id 14.03.0235.001; Tue, 22 Nov 2016 13:13:50 +0000 From: Gabriele Paoloni To: Bjorn Helgaas CC: Bjorn Helgaas , "linux-pci@vger.kernel.org" , "linux-acpi@vger.kernel.org" , "linux-kernel@vger.kernel.org" , "linux-arm-kernel@lists.infradead.org" , "linaro-acpi@lists.linaro.org" Subject: RE: [PATCH] PCI: Add information about describing PCI in ACPI Thread-Topic: [PATCH] PCI: Add information about describing PCI in ACPI Thread-Index: AQHSQP1qztMZQ+naFEazAud1dK6ECqDe+nYQgAANDwCABB29MIAAhoaAgAAFfuCAADM5gIABCe8A Date: Tue, 22 Nov 2016 13:13:50 +0000 Message-ID: References: <20161117175938.17465.45820.stgit@bhelgaas-glaptop.roam.corp.google.com> <20161118175404.GI25762@bhelgaas-glaptop.roam.corp.google.com> <20161121164700.GL25762@bhelgaas-glaptop.roam.corp.google.com> <20161121201001.GA4832@bhelgaas-glaptop.roam.corp.google.com> In-Reply-To: <20161121201001.GA4832@bhelgaas-glaptop.roam.corp.google.com> Accept-Language: en-GB, en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-originating-ip: [10.203.181.151] MIME-Version: 1.0 X-CFilter-Loop: Reflected X-Mirapoint-Virus-RAPID-Raw: score=unknown(0), refid=str=0001.0A090204.58344492.0245, ss=1, re=0.000, recu=0.000, reip=0.000, cl=1, cld=1, fgs=0, ip=0.0.0.0, so=2013-06-18 04:22:30, dmn=2013-03-21 17:37:32 X-Mirapoint-Loop-Id: 989eba8c9bbc4b3cec054c8355bd811f Sender: linux-pci-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-pci@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Hi Bjorn > -----Original Message----- > From: Bjorn Helgaas [mailto:helgaas@kernel.org] > Sent: 21 November 2016 20:10 > To: Gabriele Paoloni > Cc: Bjorn Helgaas; linux-pci@vger.kernel.org; linux- > acpi@vger.kernel.org; linux-kernel@vger.kernel.org; linux-arm- > kernel@lists.infradead.org; linaro-acpi@lists.linaro.org > Subject: Re: [PATCH] PCI: Add information about describing PCI in ACPI > > On Mon, Nov 21, 2016 at 05:23:11PM +0000, Gabriele Paoloni wrote: > > Hi Bjorn > > > > > -----Original Message----- > > > From: linux-pci-owner@vger.kernel.org [mailto:linux-pci- > > > owner@vger.kernel.org] On Behalf Of Bjorn Helgaas > > > Sent: 21 November 2016 16:47 > > > To: Gabriele Paoloni > > > Cc: Bjorn Helgaas; linux-pci@vger.kernel.org; linux- > > > acpi@vger.kernel.org; linux-kernel@vger.kernel.org; linux-arm- > > > kernel@lists.infradead.org; linaro-acpi@lists.linaro.org > > > Subject: Re: [PATCH] PCI: Add information about describing PCI in > ACPI > > > > > > On Mon, Nov 21, 2016 at 08:52:52AM +0000, Gabriele Paoloni wrote: > > > > Hi Bjorn > > > > > > > > > -----Original Message----- > > > > > From: Bjorn Helgaas [mailto:helgaas@kernel.org] > > > > > Sent: 18 November 2016 17:54 > > > > > To: Gabriele Paoloni > > > > > Cc: Bjorn Helgaas; linux-pci@vger.kernel.org; linux- > > > > > acpi@vger.kernel.org; linux-kernel@vger.kernel.org; linux-arm- > > > > > kernel@lists.infradead.org; linaro-acpi@lists.linaro.org > > > > > Subject: Re: [PATCH] PCI: Add information about describing PCI > in > > > ACPI > > > > > > > > > > On Fri, Nov 18, 2016 at 05:17:34PM +0000, Gabriele Paoloni > wrote: > > > > > > > -----Original Message----- > > > > > > > From: linux-kernel-owner@vger.kernel.org [mailto:linux- > kernel- > > > > > > > owner@vger.kernel.org] On Behalf Of Bjorn Helgaas > > > > > > > Sent: 17 November 2016 18:00 > > > > > > > > > > > > +Static tables like MCFG, HPET, ECDT, etc., are *not* > > > mechanisms > > > > > for > > > > > > > +reserving address space! The static tables are for things > the > > > OS > > > > > > > +needs to know early in boot, before it can parse the ACPI > > > > > namespace. > > > > > > > +If a new table is defined, an old OS needs to operate > > > correctly > > > > > even > > > > > > > +though it ignores the table. _CRS allows that because it > is > > > > > generic > > > > > > > +and understood by the old OS; a static table does not. > > > > > > > > > > > > Right so if my understanding is correct you are saying that > > > resources > > > > > > described in the MCFG table should also be declared in > PNP0C02 > > > > > devices > > > > > > so that the PNP driver can reserve these resources. > > > > > > > > > > Yes. > > > > > > > > > > > On the other side the PCI Root bridge driver should not > reserve > > > such > > > > > > resources. > > > > > > > > > > > > Well if my understanding is correct I think we have a problem > > > here: > > > > > > http://lxr.free-electrons.com/source/drivers/pci/ecam.c#L74 > > > > > > > > > > > > As you can see pci_ecam_create() will conflict with the pnp > > > driver > > > > > > as it will try to reserve the resources from the MCFG > table... > > > > > > > > > > > > Maybe we need to rework pci_ecam_create() ? > > > > > > > > > > I think it's OK as it is. > > > > > > > > > > The pnp/system.c driver does try to reserve PNP0C02 resources, > and > > > it > > > > > marks them as "not busy". That way they appear in /proc/iomem > and > > > > > won't be allocated for anything else, but they can still be > > > requested > > > > > by drivers, e.g., pci/ecam.c, which will mark them "busy". > > > > > > > > > > This is analogous to what the PCI core does in > > > pci_claim_resource(). > > > > > This is really a function of the ACPI/PNP *core*, which should > > > reserve > > > > > all _CRS resources for all devices (not just PNP0C02 devices). > But > > > > > it's done by pnp/system.c, and only for PNP0C02, because > there's a > > > > > bunch of historical baggage there. > > > > > > > > > > You'll also notice that in this case, things are out of order: > > > > > logically the pnp/system.c reservation should happen first, but > in > > > > > fact the pci/ecam.c request happens *before* the pnp/system.c > one. > > > > > That means the pnp/system.c one might fail and complain "[mem > ...] > > > > > could not be reserved". > > > > > > > > Correct me if I am wrong... > > > > > > > > So currently we are relying on the fact that pci_ecam_create() is > > > called > > > > before the pnp driver. > > > > If the pnp driver came first we would end up in pci_ecam_create() > > > failing > > > > here: > > > > http://lxr.free-electrons.com/source/drivers/pci/ecam.c#L76 > > > > > > > > I am not sure but it seems to me like a bit weak condition to > rely > > > on... > > > > what about removing the error condition in pci_ecam_create() and > > > logging > > > > just a dev_info()? > > > > > > Huh. I'm confused. I *thought* it would be safe to reverse the > > > order, which would effectively be this: > > > > > > system_pnp_probe > > > reserve_resources_of_dev > > > reserve_range > > > request_mem_region([mem 0xb0000000-0xb1ffffff]) > > > ... > > > pci_ecam_create > > > request_resource_conflict([mem 0xb0000000-0xb1ffffff]) > > > > > > > > > but I experimented with the patch below on qemu, and it failed as > you > > > predicted: > > > > > > ** res test ** > > > requested [mem 0xa0000000-0xafffffff] > > > can't claim ECAM area [mem 0xa0000000-0xafffffff]: conflict with > ECAM > > > PNP [mem 0xa0000000-0xafffffff] > > > > > > I expected the request_resource_conflict() to succeed since it's > > > completely contained in the "ECAM PNP" region. But I guess I don't > > > understand kernel/resource.c well enough. > > > > I think it fails because effectively the PNP driver is populating the > > iomem_resource resource tree and therefore pci_ecam_create() finds > that > > it cannot add the cfg resource to the same hierarchy as it is already > > there... > > Right. I'm just surprised because the PNP reservation is marked > "not busy", and a driver (e.g., ECAM) should still be able to request > the resource. Yes unfortunately pci_ecam_create() is not flexible on the conflict as pci_request_regions(): http://lxr.free-electrons.com/source/kernel/resource.c#L1155 if the conflict resource is not busy pci_request_regions() will create a child resource under the conflict sibling and mark it as busy... or at least this is my understanding... > > > > I'm not sure we need to fix anything yet, since we currently do the > > > ecam.c request before the system.c one, and any change there would > be > > > a long ways off. If/when that *does* change, I think the correct > fix > > > would be to change ecam.c so its request succeeds (by changing the > way > > > it does the request, fixing kernel/resource.c, or whatever) rather > > > than to reduce the log level and ignore the failure. > > > > Well in my mind I didn't want just to make the error disappear... > > If all the resources should be reserved by the PNP driver then > ideally > > we could take away request_resource_conflict() from > pci_ecam_create(), > > but this would make buggy some systems with an already shipped BIOS > > that relied on pci_ecam_create() reservation rather than PNP > reservation. > > I don't want remove the request from ecam.c. Ideally, there should be > TWO lines in /proc/iomem: one from system.c for "pnp 00:01" or > whatever it is, and a second from ecam.c. The first is the generic > one saying "this region is consumed by a piece of hardware, so don't > put anything else here." The second is the driver-specific one saying > "PCI ECAM owns this region, nobody else can use it." > > This is the same way we handle PCI BAR resources. Here are two > examples from my laptop. The first (00:08.0) only has one line: > it has a BAR that consumes address space, but I don't have a driver > for it loaded. The second (00:16.0) does have a driver loaded, so it > has a second line showing that the driver owns the space: > > f124a000-f124afff : 0000:00:08.0 # from PCI core > > f124d000-f124dfff : 0000:00:16.0 # from PCI core > f124d000-f124dfff : mei_me # from mei_me driver > > > Just removing the error condition and converting dev_err() into > > dev_info() seems to me like accommodating already shipped BIOS images > > and flagging a reservation that is already done by somebody else > > without compromising the functionality of the PCI Root bridge driver > > (so far the only reason why I can see the error condition there is > > to catch a buggy MCFG with overlapping addresses; so if this is the > > case maybe we need to have a different diagnostic check to make sure > > that the MCFG table is alright) > > Ideally I think we should end up with this: > > a0000000-afffffff : pnp 00:01 > a0000000-afffffff : PCI ECAM I think that for PCIe device drivers it works ok because it is guaranteed that their own pci_request_regions() is called always after pci_claim_resource() of the bridge that is on top of them... I.e. pci_claim_resource() reserves the resources as not busy and pci_request_regions() will create a child busy resource > > Realistically right now we'll probably end up with only the "PCI ECAM" > line in /proc/iomem and a warning from system.c about not being able > to reserve the space. > > If we ever change things to do the generic PNP reservation first, then > we should fix things so ecam.c can claim the space without an error. Maybe the patch below could be a sort of solution...effectively pci_ecam should succeed in reserving a busy resource under the conflict resource in case of PNP driver allocating a non BUSY resource first... --- drivers/pci/ecam.c | 16 +++++----------- drivers/pci/host/pci-thunder-ecam.c | 2 +- include/linux/pci-ecam.h | 2 +- 3 files changed, 7 insertions(+), 13 deletions(-) diff --git a/drivers/pci/ecam.c b/drivers/pci/ecam.c index 43ed08d..999b6ef 100644 --- a/drivers/pci/ecam.c +++ b/drivers/pci/ecam.c @@ -66,16 +66,10 @@ struct pci_config_window *pci_ecam_create(struct device *dev, } bsz = 1 << ops->bus_shift; - cfg->res.start = cfgres->start; - cfg->res.end = cfgres->end; - cfg->res.flags = IORESOURCE_MEM | IORESOURCE_BUSY; - cfg->res.name = "PCI ECAM"; - - conflict = request_resource_conflict(&iomem_resource, &cfg->res); - if (conflict) { + cfg->res = request_mem_region(cfgres->start, resource_size(cfgres), "PCI ECAM"); + if (!cfg->res) { err = -EBUSY; - dev_err(dev, "can't claim ECAM area %pR: address conflict with %s %pR\n", - &cfg->res, conflict->name, conflict); + dev_err(dev, "can't claim ECAM area %pR\n", &cfg->res); goto err_exit; } @@ -126,8 +120,8 @@ void pci_ecam_free(struct pci_config_window *cfg) if (cfg->win) iounmap(cfg->win); } - if (cfg->res.parent) - release_resource(&cfg->res); + if (cfg->res->parent) + release_region(cfg->res->start, resource_size(cfg->res)); kfree(cfg); } diff --git a/drivers/pci/host/pci-thunder-ecam.c b/drivers/pci/host/pci-thunder-ecam.c index d50a3dc..2e48d9d 100644 --- a/drivers/pci/host/pci-thunder-ecam.c +++ b/drivers/pci/host/pci-thunder-ecam.c @@ -117,7 +117,7 @@ static int thunder_ecam_p2_config_read(struct pci_bus *bus, unsigned int devfn, * the config space access window. Since we are working with * the high-order 32 bits, shift everything down by 32 bits. */ - node_bits = (cfg->res.start >> 32) & (1 << 12); + node_bits = (cfg->res->start >> 32) & (1 << 12); v |= node_bits; set_val(v, where, size, val); diff --git a/include/linux/pci-ecam.h b/include/linux/pci-ecam.h index 7adad20..f30a4ea 100644 --- a/include/linux/pci-ecam.h +++ b/include/linux/pci-ecam.h @@ -36,7 +36,7 @@ struct pci_ecam_ops { * use ECAM. */ struct pci_config_window { - struct resource res; + struct resource *res; struct resource busr; void *priv; struct pci_ecam_ops *ops;