diff mbox

PCI: Add information about describing PCI in ACPI

Message ID EE11001F9E5DDD47B7634E2F8A612F2E1F92C09D@lhreml507-mbx (mailing list archive)
State New, archived
Delegated to: Bjorn Helgaas
Headers show

Commit Message

Gabriele Paoloni Nov. 22, 2016, 1:13 p.m. UTC
Hi Bjorn

> -----Original Message-----
> From: Bjorn Helgaas [mailto:helgaas@kernel.org]
> Sent: 21 November 2016 20:10
> To: Gabriele Paoloni
> Cc: Bjorn Helgaas; linux-pci@vger.kernel.org; linux-
> acpi@vger.kernel.org; linux-kernel@vger.kernel.org; linux-arm-
> kernel@lists.infradead.org; linaro-acpi@lists.linaro.org
> Subject: Re: [PATCH] PCI: Add information about describing PCI in ACPI
> 
> On Mon, Nov 21, 2016 at 05:23:11PM +0000, Gabriele Paoloni wrote:
> > Hi Bjorn
> >
> > > -----Original Message-----
> > > From: linux-pci-owner@vger.kernel.org [mailto:linux-pci-
> > > owner@vger.kernel.org] On Behalf Of Bjorn Helgaas
> > > Sent: 21 November 2016 16:47
> > > To: Gabriele Paoloni
> > > Cc: Bjorn Helgaas; linux-pci@vger.kernel.org; linux-
> > > acpi@vger.kernel.org; linux-kernel@vger.kernel.org; linux-arm-
> > > kernel@lists.infradead.org; linaro-acpi@lists.linaro.org
> > > Subject: Re: [PATCH] PCI: Add information about describing PCI in
> ACPI
> > >
> > > On Mon, Nov 21, 2016 at 08:52:52AM +0000, Gabriele Paoloni wrote:
> > > > Hi Bjorn
> > > >
> > > > > -----Original Message-----
> > > > > From: Bjorn Helgaas [mailto:helgaas@kernel.org]
> > > > > Sent: 18 November 2016 17:54
> > > > > To: Gabriele Paoloni
> > > > > Cc: Bjorn Helgaas; linux-pci@vger.kernel.org; linux-
> > > > > acpi@vger.kernel.org; linux-kernel@vger.kernel.org; linux-arm-
> > > > > kernel@lists.infradead.org; linaro-acpi@lists.linaro.org
> > > > > Subject: Re: [PATCH] PCI: Add information about describing PCI
> in
> > > ACPI
> > > > >
> > > > > On Fri, Nov 18, 2016 at 05:17:34PM +0000, Gabriele Paoloni
> wrote:
> > > > > > > -----Original Message-----
> > > > > > > From: linux-kernel-owner@vger.kernel.org [mailto:linux-
> kernel-
> > > > > > > owner@vger.kernel.org] On Behalf Of Bjorn Helgaas
> > > > > > > Sent: 17 November 2016 18:00
> > > > >
> > > > > > > +Static tables like MCFG, HPET, ECDT, etc., are *not*
> > > mechanisms
> > > > > for
> > > > > > > +reserving address space!  The static tables are for things
> the
> > > OS
> > > > > > > +needs to know early in boot, before it can parse the ACPI
> > > > > namespace.
> > > > > > > +If a new table is defined, an old OS needs to operate
> > > correctly
> > > > > even
> > > > > > > +though it ignores the table.  _CRS allows that because it
> is
> > > > > generic
> > > > > > > +and understood by the old OS; a static table does not.
> > > > > >
> > > > > > Right so if my understanding is correct you are saying that
> > > resources
> > > > > > described in the MCFG table should also be declared in
> PNP0C02
> > > > > devices
> > > > > > so that the PNP driver can reserve these resources.
> > > > >
> > > > > Yes.
> > > > >
> > > > > > On the other side the PCI Root bridge driver should not
> reserve
> > > such
> > > > > > resources.
> > > > > >
> > > > > > Well if my understanding is correct I think we have a problem
> > > here:
> > > > > > http://lxr.free-electrons.com/source/drivers/pci/ecam.c#L74
> > > > > >
> > > > > > As you can see pci_ecam_create() will conflict with the pnp
> > > driver
> > > > > > as it will try to reserve the resources from the MCFG
> table...
> > > > > >
> > > > > > Maybe we need to rework pci_ecam_create() ?
> > > > >
> > > > > I think it's OK as it is.
> > > > >
> > > > > The pnp/system.c driver does try to reserve PNP0C02 resources,
> and
> > > it
> > > > > marks them as "not busy".  That way they appear in /proc/iomem
> and
> > > > > won't be allocated for anything else, but they can still be
> > > requested
> > > > > by drivers, e.g., pci/ecam.c, which will mark them "busy".
> > > > >
> > > > > This is analogous to what the PCI core does in
> > > pci_claim_resource().
> > > > > This is really a function of the ACPI/PNP *core*, which should
> > > reserve
> > > > > all _CRS resources for all devices (not just PNP0C02 devices).
> But
> > > > > it's done by pnp/system.c, and only for PNP0C02, because
> there's a
> > > > > bunch of historical baggage there.
> > > > >
> > > > > You'll also notice that in this case, things are out of order:
> > > > > logically the pnp/system.c reservation should happen first, but
> in
> > > > > fact the pci/ecam.c request happens *before* the pnp/system.c
> one.
> > > > > That means the pnp/system.c one might fail and complain "[mem
> ...]
> > > > > could not be reserved".
> > > >
> > > > Correct me if I am wrong...
> > > >
> > > > So currently we are relying on the fact that pci_ecam_create() is
> > > called
> > > > before the pnp driver.
> > > > If the pnp driver came first we would end up in pci_ecam_create()
> > > failing
> > > > here:
> > > > http://lxr.free-electrons.com/source/drivers/pci/ecam.c#L76
> > > >
> > > > I am not sure but it seems to me like a bit weak condition to
> rely
> > > on...
> > > > what about removing the error condition in pci_ecam_create() and
> > > logging
> > > > just a dev_info()?
> > >
> > > Huh.  I'm confused.  I *thought* it would be safe to reverse the
> > > order, which would effectively be this:
> > >
> > >   system_pnp_probe
> > >     reserve_resources_of_dev
> > >       reserve_range
> > >         request_mem_region([mem 0xb0000000-0xb1ffffff])
> > >   ...
> > >   pci_ecam_create
> > >     request_resource_conflict([mem 0xb0000000-0xb1ffffff])
> > >
> > >
> > > but I experimented with the patch below on qemu, and it failed as
> you
> > > predicted:
> > >
> > >   ** res test **
> > >   requested [mem 0xa0000000-0xafffffff]
> > >   can't claim ECAM area [mem 0xa0000000-0xafffffff]: conflict with
> ECAM
> > > PNP [mem 0xa0000000-0xafffffff]
> > >
> > > I expected the request_resource_conflict() to succeed since it's
> > > completely contained in the "ECAM PNP" region.  But I guess I don't
> > > understand kernel/resource.c well enough.
> >
> > I think it fails because effectively the PNP driver is populating the
> > iomem_resource resource tree and therefore pci_ecam_create() finds
> that
> > it cannot add the cfg resource to the same hierarchy as it is already
> > there...
> 
> Right.  I'm just surprised because the PNP reservation is marked
> "not busy", and a driver (e.g., ECAM) should still be able to request
> the resource.

Yes unfortunately pci_ecam_create() is not flexible on the conflict as 
pci_request_regions():
http://lxr.free-electrons.com/source/kernel/resource.c#L1155
if the conflict resource is not busy pci_request_regions() will create
a child resource under the conflict sibling and mark it as busy...

or at least this is my understanding...

> 
> > > I'm not sure we need to fix anything yet, since we currently do the
> > > ecam.c request before the system.c one, and any change there would
> be
> > > a long ways off.  If/when that *does* change, I think the correct
> fix
> > > would be to change ecam.c so its request succeeds (by changing the
> way
> > > it does the request, fixing kernel/resource.c, or whatever) rather
> > > than to reduce the log level and ignore the failure.
> >
> > Well in my mind I didn't want just to make the error disappear...
> > If all the resources should be reserved by the PNP driver then
> ideally
> > we could take away request_resource_conflict() from
> pci_ecam_create(),
> > but this would make buggy some systems with an already shipped BIOS
> > that relied on pci_ecam_create() reservation rather than PNP
> reservation.
> 
> I don't want remove the request from ecam.c.  Ideally, there should be
> TWO lines in /proc/iomem: one from system.c for "pnp 00:01" or
> whatever it is, and a second from ecam.c.  The first is the generic
> one saying "this region is consumed by a piece of hardware, so don't
> put anything else here."  The second is the driver-specific one saying
> "PCI ECAM owns this region, nobody else can use it."
> 
> This is the same way we handle PCI BAR resources.  Here are two
> examples from my laptop.  The first (00:08.0) only has one line:
> it has a BAR that consumes address space, but I don't have a driver
> for it loaded.  The second (00:16.0) does have a driver loaded, so it
> has a second line showing that the driver owns the space:
> 
>   f124a000-f124afff : 0000:00:08.0         # from PCI core
> 
>   f124d000-f124dfff : 0000:00:16.0         # from PCI core
>     f124d000-f124dfff : mei_me             # from mei_me driver
> 
> > Just removing the error condition and converting dev_err() into
> > dev_info() seems to me like accommodating already shipped BIOS images
> > and flagging a reservation that is already done by somebody else
> > without compromising the functionality of the PCI Root bridge driver
> > (so far the only reason why I can see the error condition there is
> > to catch a buggy MCFG with overlapping addresses; so if this is the
> > case maybe we need to have a different diagnostic check to make sure
> > that the MCFG table is alright)
> 
> Ideally I think we should end up with this:
> 
>   a0000000-afffffff : pnp 00:01
>     a0000000-afffffff : PCI ECAM

I think that for PCIe device drivers it works ok because it is guaranteed
that their own pci_request_regions() is called always after
pci_claim_resource() of the bridge that is on top of them...
I.e. pci_claim_resource() reserves the resources as not busy and
pci_request_regions() will create a child busy resource 

> 
> Realistically right now we'll probably end up with only the "PCI ECAM"
> line in /proc/iomem and a warning from system.c about not being able
> to reserve the space.
> 
> If we ever change things to do the generic PNP reservation first, then
> we should fix things so ecam.c can claim the space without an error.

Maybe the patch below could be a sort of solution...effectively pci_ecam
should succeed in reserving a busy resource under the conflict resource
in case of PNP driver allocating a non BUSY resource first...

---
drivers/pci/ecam.c                  | 16 +++++-----------
 drivers/pci/host/pci-thunder-ecam.c |  2 +-
 include/linux/pci-ecam.h            |  2 +-
 3 files changed, 7 insertions(+), 13 deletions(-)
diff mbox

Patch

diff --git a/drivers/pci/ecam.c b/drivers/pci/ecam.c
index 43ed08d..999b6ef 100644
--- a/drivers/pci/ecam.c
+++ b/drivers/pci/ecam.c
@@ -66,16 +66,10 @@  struct pci_config_window *pci_ecam_create(struct device *dev,
 	}
 	bsz = 1 << ops->bus_shift;
 
-	cfg->res.start = cfgres->start;
-	cfg->res.end = cfgres->end;
-	cfg->res.flags = IORESOURCE_MEM | IORESOURCE_BUSY;
-	cfg->res.name = "PCI ECAM";
-
-	conflict = request_resource_conflict(&iomem_resource, &cfg->res);
-	if (conflict) {
+	cfg->res = request_mem_region(cfgres->start, resource_size(cfgres), "PCI ECAM");
+	if (!cfg->res) {
 		err = -EBUSY;
-		dev_err(dev, "can't claim ECAM area %pR: address conflict with %s %pR\n",
-			&cfg->res, conflict->name, conflict);
+		dev_err(dev, "can't claim ECAM area %pR\n", &cfg->res);
 		goto err_exit;
 	}
 
@@ -126,8 +120,8 @@  void pci_ecam_free(struct pci_config_window *cfg)
 		if (cfg->win)
 			iounmap(cfg->win);
 	}
-	if (cfg->res.parent)
-		release_resource(&cfg->res);
+	if (cfg->res->parent)
+		release_region(cfg->res->start, resource_size(cfg->res));
 	kfree(cfg);
 }
 
diff --git a/drivers/pci/host/pci-thunder-ecam.c b/drivers/pci/host/pci-thunder-ecam.c
index d50a3dc..2e48d9d 100644
--- a/drivers/pci/host/pci-thunder-ecam.c
+++ b/drivers/pci/host/pci-thunder-ecam.c
@@ -117,7 +117,7 @@  static int thunder_ecam_p2_config_read(struct pci_bus *bus, unsigned int devfn,
 	 * the config space access window.  Since we are working with
 	 * the high-order 32 bits, shift everything down by 32 bits.
 	 */
-	node_bits = (cfg->res.start >> 32) & (1 << 12);
+	node_bits = (cfg->res->start >> 32) & (1 << 12);
 
 	v |= node_bits;
 	set_val(v, where, size, val);
diff --git a/include/linux/pci-ecam.h b/include/linux/pci-ecam.h
index 7adad20..f30a4ea 100644
--- a/include/linux/pci-ecam.h
+++ b/include/linux/pci-ecam.h
@@ -36,7 +36,7 @@  struct pci_ecam_ops {
  * use ECAM.
  */
 struct pci_config_window {
-	struct resource			res;
+	struct resource			*res;
 	struct resource			busr;
 	void				*priv;
 	struct pci_ecam_ops		*ops;