diff mbox series

[RESEND] PCI: disable runtime PM for PLX switches

Message ID 20190415135903.wiyw34faiezdnbbs@yadro.com (mailing list archive)
State New, archived
Delegated to: Bjorn Helgaas
Headers show
Series [RESEND] PCI: disable runtime PM for PLX switches | expand

Commit Message

Alexander Fomichev April 15, 2019, 1:59 p.m. UTC
PLX switches have an issue that their internal registers become inaccessible
when runtime PM is enabled. Therefore PLX service tools can't communicate
with the hardware. A kernel option "pcie_port_pm=off" can be used as a
workaround. But it affects all the devices.
So this solution is to add PLX switch devices to the quirk list for
disabling runtime PM only for them.
---
 drivers/pci/quirks.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

Comments

Bjorn Helgaas April 15, 2019, 2:15 p.m. UTC | #1
This says it's a resend, but I don't see a previous posting; maybe it was
HTML and rejected by the mailing list?

On Mon, Apr 15, 2019 at 04:59:03PM +0300, Alexander Fomichev wrote:
> PLX switches have an issue that their internal registers become inaccessible
> when runtime PM is enabled. Therefore PLX service tools can't communicate
> with the hardware. A kernel option "pcie_port_pm=off" can be used as a
> workaround. But it affects all the devices.
> So this solution is to add PLX switch devices to the quirk list for
> disabling runtime PM only for them.

I assume the problem is actually that the config space registers are
inaccessible when the device is in D3hot?

I think config space access is supposed to work when a device is in D3hot
(see PCIe r4.0, sec 5.3.1.4).

If it doesn't work, wouldn't that mean that we couldn't even bring the
device *out* of D3hot, since that requires a config write?

If this is really the problem, it'd be nice to identify this specifically
instead of piggy-backing on the "is_hotplug_bridge" thing, which might be
coincidentally related, but also carries other meanings.

> ---
>  drivers/pci/quirks.c | 11 +++++++++++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index a59ad09..8ea99aa 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -2923,6 +2923,17 @@ static void quirk_hotplug_bridge(struct pci_dev *dev)
>  	dev->is_hotplug_bridge = 1;
>  }
>  DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_HINT, 0x0020, quirk_hotplug_bridge);
> +/*
> + * Disable runtime PM for PLX Draco (1,2), Capella (1,2) series PCIe switches
> + * to prevent service tools from failing to access hardware registers.
> + */
> +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_PLX, 0x8712, quirk_hotplug_bridge);
> +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_PLX, 0x8733, quirk_hotplug_bridge);
> +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_PLX, 0x8734, quirk_hotplug_bridge);
> +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_PLX, 0x8780, quirk_hotplug_bridge);
> +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_PLX, 0x8796, quirk_hotplug_bridge);
> +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_PLX, 0x9781, quirk_hotplug_bridge);
> +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_PLX, 0x9797, quirk_hotplug_bridge);
>  
>  /*
>   * This is a quirk for the Ricoh MMC controller found as a part of some
> -- 
> 2.7.4
Bjorn Helgaas April 23, 2019, 9:53 p.m. UTC | #2
[+cc Rafael, linux-pm]

On Mon, Apr 15, 2019 at 09:15:54AM -0500, Bjorn Helgaas wrote:
> This says it's a resend, but I don't see a previous posting; maybe it was
> HTML and rejected by the mailing list?
> 
> On Mon, Apr 15, 2019 at 04:59:03PM +0300, Alexander Fomichev wrote:
> > PLX switches have an issue that their internal registers become inaccessible
> > when runtime PM is enabled. Therefore PLX service tools can't communicate
> > with the hardware. A kernel option "pcie_port_pm=off" can be used as a
> > workaround. But it affects all the devices.
> > So this solution is to add PLX switch devices to the quirk list for
> > disabling runtime PM only for them.
> 
> I assume the problem is actually that the config space registers are
> inaccessible when the device is in D3hot?

Reading this again, I realize you said "internal registers".  I don't
know whether that actually means config space registers (which
*should* work even when the device is in D3hot (see the PCIe reference
below and PCI Power Management Spec r1.2, sec 5.4.1)), or MMIO
registers (which are not expected to work while in D3hot).

If the service tools read MMIO registers, presumably that goes through
some driver that should be able to manage runtime PM.  Or, if there's
no driver, I think your service tool could prevent runtime power
management by changing /sys/devices/.../power/control to "on" (see
Documentation/power/runtime_pm.txt).

Please repost this with more details.

> I think config space access is supposed to work when a device is in D3hot
> (see PCIe r4.0, sec 5.3.1.4).
> 
> If it doesn't work, wouldn't that mean that we couldn't even bring the
> device *out* of D3hot, since that requires a config write?
> 
> If this is really the problem, it'd be nice to identify this specifically
> instead of piggy-backing on the "is_hotplug_bridge" thing, which might be
> coincidentally related, but also carries other meanings.
> 
> > ---
> >  drivers/pci/quirks.c | 11 +++++++++++
> >  1 file changed, 11 insertions(+)
> > 
> > diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> > index a59ad09..8ea99aa 100644
> > --- a/drivers/pci/quirks.c
> > +++ b/drivers/pci/quirks.c
> > @@ -2923,6 +2923,17 @@ static void quirk_hotplug_bridge(struct pci_dev *dev)
> >  	dev->is_hotplug_bridge = 1;
> >  }
> >  DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_HINT, 0x0020, quirk_hotplug_bridge);
> > +/*
> > + * Disable runtime PM for PLX Draco (1,2), Capella (1,2) series PCIe switches
> > + * to prevent service tools from failing to access hardware registers.
> > + */
> > +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_PLX, 0x8712, quirk_hotplug_bridge);
> > +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_PLX, 0x8733, quirk_hotplug_bridge);
> > +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_PLX, 0x8734, quirk_hotplug_bridge);
> > +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_PLX, 0x8780, quirk_hotplug_bridge);
> > +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_PLX, 0x8796, quirk_hotplug_bridge);
> > +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_PLX, 0x9781, quirk_hotplug_bridge);
> > +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_PLX, 0x9797, quirk_hotplug_bridge);
> >  
> >  /*
> >   * This is a quirk for the Ricoh MMC controller found as a part of some
> > -- 
> > 2.7.4
Alexander Fomichev April 24, 2019, 10:01 a.m. UTC | #3
On Tue, Apr 23, 2019 at 04:53:40PM -0500, Bjorn Helgaas wrote:
> On Mon, Apr 15, 2019 at 09:15:54AM -0500, Bjorn Helgaas wrote:
> > This says it's a resend, but I don't see a previous posting; maybe it was
> > HTML and rejected by the mailing list?
> > 
The first post was niether rejected nor accepted in ML. So I added
"resend" tag in case it appears in the archive finally.

> > On Mon, Apr 15, 2019 at 04:59:03PM +0300, Alexander Fomichev wrote:
> > > PLX switches have an issue that their internal registers become inaccessible
> > > when runtime PM is enabled. Therefore PLX service tools can't communicate
> > > with the hardware. A kernel option "pcie_port_pm=off" can be used as a
> > > workaround. But it affects all the devices.
> > > So this solution is to add PLX switch devices to the quirk list for
> > > disabling runtime PM only for them.
> > 
> > I assume the problem is actually that the config space registers are
> > inaccessible when the device is in D3hot?
> 
> Reading this again, I realize you said "internal registers".  I don't
> know whether that actually means config space registers (which
> *should* work even when the device is in D3hot (see the PCIe reference
> below and PCI Power Management Spec r1.2, sec 5.4.1)), or MMIO
> registers (which are not expected to work while in D3hot).
> 
> If the service tools read MMIO registers, presumably that goes through
> some driver that should be able to manage runtime PM.  Or, if there's
> no driver, I think your service tool could prevent runtime power
> management by changing /sys/devices/.../power/control to "on" (see
> Documentation/power/runtime_pm.txt).
> 
You're right. Config space registers are accessible. The driver can't
read/write MMIO registers (Device-Specific Registers as they're called
by Broadcom).

> Please repost this with more details.
> 
> > I think config space access is supposed to work when a device is in D3hot
> > (see PCIe r4.0, sec 5.3.1.4).
> > 
> > If it doesn't work, wouldn't that mean that we couldn't even bring the
> > device *out* of D3hot, since that requires a config write?
> > 
> > If this is really the problem, it'd be nice to identify this specifically
> > instead of piggy-backing on the "is_hotplug_bridge" thing, which might be
> > coincidentally related, but also carries other meanings.
> > 
The proposed patch was a sort of workaround. If this approach is not
acceptable then it needs more research how to change PLX driver that it
can play with PM to access MMIO registers.

Regards,
  Alexander
Bjorn Helgaas April 24, 2019, 2:11 p.m. UTC | #4
[+cc Mika for runtime PM of bridges, Logan for switchtec question]

On Wed, Apr 24, 2019 at 01:01:02PM +0300, Alexander Fomichev wrote:
> On Tue, Apr 23, 2019 at 04:53:40PM -0500, Bjorn Helgaas wrote:
> > On Mon, Apr 15, 2019 at 09:15:54AM -0500, Bjorn Helgaas wrote:
> > > On Mon, Apr 15, 2019 at 04:59:03PM +0300, Alexander Fomichev wrote:
> > > > PLX switches have an issue that their internal registers
> > > > become inaccessible when runtime PM is enabled. Therefore PLX
> > > > service tools can't communicate with the hardware. A kernel
> > > > option "pcie_port_pm=off" can be used as a workaround. But it
> > > > affects all the devices.
> > > >
> > > > So this solution is to add PLX switch devices to the quirk
> > > > list for disabling runtime PM only for them.
> > >
> > > I assume the problem is actually that the config space registers
> > > are inaccessible when the device is in D3hot?
> >
> > Reading this again, I realize you said "internal registers".  I
> > don't know whether that actually means config space registers
> > (which *should* work even when the device is in D3hot (see the
> > PCIe reference below and PCI Power Management Spec r1.2, sec
> > 5.4.1)), or MMIO registers (which are not expected to work while
> > in D3hot).
> > 
> > If the service tools read MMIO registers, presumably that goes
> > through some driver that should be able to manage runtime PM.  Or,
> > if there's no driver, I think your service tool could prevent
> > runtime power management by changing
> > /sys/devices/.../power/control to "on" (see
> > Documentation/power/runtime_pm.txt).
>
> You're right. Config space registers are accessible. The driver
> can't read/write MMIO registers (Device-Specific Registers as
> they're called by Broadcom).

Ah, perfect.  That's exactly what's supposed to happen from a PCI
hardware point of view.  Unfortunately I don't know much about how
Linux power management works, but Rafael and Mika do.

How do your service tools access these MMIO registers?

  - via a PLX driver that provides read/write/ioctl on a char dev?
  - read/write on /sys/devices/pci*/.../resource<x> ?
  - mmap on /sys/devices/pci*/.../resource<x> ?
  - read/write on /dev/mem?
  - mmap on /dev/mem?
  - something else?

I think there are several ways we might be able to fix this:

  - Write a driver along the lines of drivers/pci/switch/switchtec.c.
    I don't see any runtime PM stuff in that driver, so maybe it's
    magically taken care of by the PM/PCI core?  There might also be
    an issue if both portdrv and your driver want to claim the same
    device.  I don't know how switchtec deals with that.

  - Maybe the PCI sysfs accessors (pci_mmap_resource(), etc) should
    turn off runtime PM?  If we allow mmap of a BAR and then put the
    device in D3hot, that seems like a bug that could affect lots of
    things.  But maybe that's already done magically elsewhere?

Bjorn
Mika Westerberg April 24, 2019, 2:58 p.m. UTC | #5
On Wed, Apr 24, 2019 at 09:11:48AM -0500, Bjorn Helgaas wrote:
>   - Maybe the PCI sysfs accessors (pci_mmap_resource(), etc) should
>     turn off runtime PM?  If we allow mmap of a BAR and then put the
>     device in D3hot, that seems like a bug that could affect lots of
>     things.  But maybe that's already done magically elsewhere?

IIRC there is no PM magic happening for MMIO userspace accesses.

What you suggest above sounds like a good way to fix it. We already do
similar for config space access from userspace (if the device is in
D3cold) so definitely makes sense to do the same for MMIO. However, I
don't think we need to disable runtime PM - it should be enough to
increase the reference count (pm_runtime_get_sync() and friends) during
the time the MMIO resource is mmapped.
Logan Gunthorpe April 24, 2019, 4:01 p.m. UTC | #6
On 2019-04-24 8:11 a.m., Bjorn Helgaas wrote:
> [+cc Mika for runtime PM of bridges, Logan for switchtec question]
> 
> On Wed, Apr 24, 2019 at 01:01:02PM +0300, Alexander Fomichev wrote:
>> On Tue, Apr 23, 2019 at 04:53:40PM -0500, Bjorn Helgaas wrote:
>>> On Mon, Apr 15, 2019 at 09:15:54AM -0500, Bjorn Helgaas wrote:
>>>> On Mon, Apr 15, 2019 at 04:59:03PM +0300, Alexander Fomichev wrote:
>>>>> PLX switches have an issue that their internal registers
>>>>> become inaccessible when runtime PM is enabled. Therefore PLX
>>>>> service tools can't communicate with the hardware. A kernel
>>>>> option "pcie_port_pm=off" can be used as a workaround. But it
>>>>> affects all the devices.
>>>>>
>>>>> So this solution is to add PLX switch devices to the quirk
>>>>> list for disabling runtime PM only for them.
>>>>
>>>> I assume the problem is actually that the config space registers
>>>> are inaccessible when the device is in D3hot?
>>>
>>> Reading this again, I realize you said "internal registers".  I
>>> don't know whether that actually means config space registers
>>> (which *should* work even when the device is in D3hot (see the
>>> PCIe reference below and PCI Power Management Spec r1.2, sec
>>> 5.4.1)), or MMIO registers (which are not expected to work while
>>> in D3hot).
>>>
>>> If the service tools read MMIO registers, presumably that goes
>>> through some driver that should be able to manage runtime PM.  Or,
>>> if there's no driver, I think your service tool could prevent
>>> runtime power management by changing
>>> /sys/devices/.../power/control to "on" (see
>>> Documentation/power/runtime_pm.txt).
>>
>> You're right. Config space registers are accessible. The driver
>> can't read/write MMIO registers (Device-Specific Registers as
>> they're called by Broadcom).
> 
> Ah, perfect.  That's exactly what's supposed to happen from a PCI
> hardware point of view.  Unfortunately I don't know much about how
> Linux power management works, but Rafael and Mika do.
> 
> How do your service tools access these MMIO registers?
> 
>    - via a PLX driver that provides read/write/ioctl on a char dev?
>    - read/write on /sys/devices/pci*/.../resource<x> ?
>    - mmap on /sys/devices/pci*/.../resource<x> ?
>    - read/write on /dev/mem?
>    - mmap on /dev/mem?
>    - something else?
> 
> I think there are several ways we might be able to fix this:
> 
>    - Write a driver along the lines of drivers/pci/switch/switchtec.c.
>      I don't see any runtime PM stuff in that driver, so maybe it's
>      magically taken care of by the PM/PCI core?  There might also be
>      an issue if both portdrv and your driver want to claim the same
>      device.  I don't know how switchtec deals with that.

Right, the switchtec device puts all its management MMIO registers 
behind a separate PCIe function which lets us bind a different driver 
and not conflict with the portdrv. I've noticed the PLX parts have an 
extra unused BAR in their upstream port function which means it will be 
an annoying hack to write a driver to use it seeing the portdrv needs to 
be registered to it.

Logan
Bjorn Helgaas April 24, 2019, 5:21 p.m. UTC | #7
On Wed, Apr 24, 2019 at 05:58:19PM +0300, Mika Westerberg wrote:
> On Wed, Apr 24, 2019 at 09:11:48AM -0500, Bjorn Helgaas wrote:
> >   - Maybe the PCI sysfs accessors (pci_mmap_resource(), etc) should
> >     turn off runtime PM?  If we allow mmap of a BAR and then put the
> >     device in D3hot, that seems like a bug that could affect lots of
> >     things.  But maybe that's already done magically elsewhere?
> 
> IIRC there is no PM magic happening for MMIO userspace accesses.
> 
> What you suggest above sounds like a good way to fix it. We already do
> similar for config space access from userspace (if the device is in
> D3cold) so definitely makes sense to do the same for MMIO. However, I
> don't think we need to disable runtime PM - it should be enough to
> increase the reference count (pm_runtime_get_sync() and friends) during
> the time the MMIO resource is mmapped.

OK, so if I understand correctly this would be basically adding
pm_runtime_get_sync(dev) in pci_mmap_resource().  I don't know what
the unmap path is, but there would have to be a matching
pm_runtime_put() somewhere.

And a similar change in the read/write path for /sys/.../resource<N>;
I think this must be related to the sysfs_create_bin_file() call in
pci_create_attr(), but I don't see the path where the actual
read/write to the device is done.

And probably something similar should be done in pci_resource_io(),
pci_map_rom(), and pci_unmap_rom().

Bjorn
Rafael J. Wysocki April 24, 2019, 9:09 p.m. UTC | #8
On Wed, Apr 24, 2019 at 8:01 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> On Wed, Apr 24, 2019 at 05:58:19PM +0300, Mika Westerberg wrote:
> > On Wed, Apr 24, 2019 at 09:11:48AM -0500, Bjorn Helgaas wrote:
> > >   - Maybe the PCI sysfs accessors (pci_mmap_resource(), etc) should
> > >     turn off runtime PM?  If we allow mmap of a BAR and then put the
> > >     device in D3hot, that seems like a bug that could affect lots of
> > >     things.  But maybe that's already done magically elsewhere?
> >
> > IIRC there is no PM magic happening for MMIO userspace accesses.
> >
> > What you suggest above sounds like a good way to fix it. We already do
> > similar for config space access from userspace (if the device is in
> > D3cold) so definitely makes sense to do the same for MMIO. However, I
> > don't think we need to disable runtime PM - it should be enough to
> > increase the reference count (pm_runtime_get_sync() and friends) during
> > the time the MMIO resource is mmapped.
>
> OK, so if I understand correctly this would be basically adding
> pm_runtime_get_sync(dev) in pci_mmap_resource().  I don't know what
> the unmap path is, but there would have to be a matching
> pm_runtime_put() somewhere.

Right.

> And a similar change in the read/write path for /sys/.../resource<N>;
> I think this must be related to the sysfs_create_bin_file() call in
> pci_create_attr(), but I don't see the path where the actual
> read/write to the device is done.
>
> And probably something similar should be done in pci_resource_io(),
> pci_map_rom(), and pci_unmap_rom().

In general, every path in which there is a memory or IO address space
access requires pm_runtime_get_sync()/pm_runtime_put() around it as
these accesses are only guaranteed to work in D0.

Cheers,
Rafael
Alexander Fomichev June 27, 2019, 11:06 a.m. UTC | #9
Hi.

On Wed, Apr 24, 2019 at 11:09:52PM +0200, Rafael J. Wysocki wrote:
> On Wed, Apr 24, 2019 at 8:01 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> >
> > On Wed, Apr 24, 2019 at 05:58:19PM +0300, Mika Westerberg wrote:
> > > On Wed, Apr 24, 2019 at 09:11:48AM -0500, Bjorn Helgaas wrote:
> > > >   - Maybe the PCI sysfs accessors (pci_mmap_resource(), etc) should
> > > >     turn off runtime PM?  If we allow mmap of a BAR and then put the
> > > >     device in D3hot, that seems like a bug that could affect lots of
> > > >     things.  But maybe that's already done magically elsewhere?
> > >
> > > IIRC there is no PM magic happening for MMIO userspace accesses.
> > >
> > > What you suggest above sounds like a good way to fix it. We already do
> > > similar for config space access from userspace (if the device is in
> > > D3cold) so definitely makes sense to do the same for MMIO. However, I
> > > don't think we need to disable runtime PM - it should be enough to
> > > increase the reference count (pm_runtime_get_sync() and friends) during
> > > the time the MMIO resource is mmapped.
> >
> > OK, so if I understand correctly this would be basically adding
> > pm_runtime_get_sync(dev) in pci_mmap_resource().  I don't know what
> > the unmap path is, but there would have to be a matching
> > pm_runtime_put() somewhere.
> 
> Right.
> 
> > And a similar change in the read/write path for /sys/.../resource<N>;
> > I think this must be related to the sysfs_create_bin_file() call in
> > pci_create_attr(), but I don't see the path where the actual
> > read/write to the device is done.
> >
> > And probably something similar should be done in pci_resource_io(),
> > pci_map_rom(), and pci_unmap_rom().
> 
> In general, every path in which there is a memory or IO address space
> access requires pm_runtime_get_sync()/pm_runtime_put() around it as
> these accesses are only guaranteed to work in D0.
> 

Tested a solution based on proposals by Logan, Bjorn, Mika, Rafael (thanks all
of you, guys), I managed to fix the problem inside the PLX driver code. So no
additional quirks or other modifications in Linux kernel needed. I think
my patch can be easily rejected.

Thanks for help.
Bjorn Helgaas July 17, 2019, 9:42 p.m. UTC | #10
On Thu, Jun 27, 2019 at 02:06:24PM +0300, Alexander Fomichev wrote:
> On Wed, Apr 24, 2019 at 11:09:52PM +0200, Rafael J. Wysocki wrote:
> > On Wed, Apr 24, 2019 at 8:01 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > On Wed, Apr 24, 2019 at 05:58:19PM +0300, Mika Westerberg wrote:
> > > > On Wed, Apr 24, 2019 at 09:11:48AM -0500, Bjorn Helgaas wrote:
> > > > >   - Maybe the PCI sysfs accessors (pci_mmap_resource(), etc) should
> > > > >     turn off runtime PM?  If we allow mmap of a BAR and then put the
> > > > >     device in D3hot, that seems like a bug that could affect lots of
> > > > >     things.  But maybe that's already done magically elsewhere?
> > > >
> > > > IIRC there is no PM magic happening for MMIO userspace accesses.
> > > >
> > > > What you suggest above sounds like a good way to fix it. We already do
> > > > similar for config space access from userspace (if the device is in
> > > > D3cold) so definitely makes sense to do the same for MMIO. However, I
> > > > don't think we need to disable runtime PM - it should be enough to
> > > > increase the reference count (pm_runtime_get_sync() and friends) during
> > > > the time the MMIO resource is mmapped.
> > >
> > > OK, so if I understand correctly this would be basically adding
> > > pm_runtime_get_sync(dev) in pci_mmap_resource().  I don't know what
> > > the unmap path is, but there would have to be a matching
> > > pm_runtime_put() somewhere.
> > 
> > Right.
> > 
> > > And a similar change in the read/write path for /sys/.../resource<N>;
> > > I think this must be related to the sysfs_create_bin_file() call in
> > > pci_create_attr(), but I don't see the path where the actual
> > > read/write to the device is done.
> > >
> > > And probably something similar should be done in pci_resource_io(),
> > > pci_map_rom(), and pci_unmap_rom().
> > 
> > In general, every path in which there is a memory or IO address space
> > access requires pm_runtime_get_sync()/pm_runtime_put() around it as
> > these accesses are only guaranteed to work in D0.
> 
> Tested a solution based on proposals by Logan, Bjorn, Mika, Rafael (thanks all
> of you, guys), I managed to fix the problem inside the PLX driver code. So no
> additional quirks or other modifications in Linux kernel needed. I think
> my patch can be easily rejected.

Can you fill us in a little bit on the solution?  Are you referring to
an out-of-tree PLX kernel driver?  I assume this is not a userspace
PLX tool because I don't think we have a solution to make sysfs mmap
safe yet.

Did you have to call pm_runtime_get() or similar from your driver?
Did your driver already call some PM interface before that?  (If you
could point us at the source, that would be ideal.)

Rafael, does a PCI driver have to indicate somehow that it's prepared
for runtime PM?  I assume the runtime PM core is designed in such a
way that it doesn't force driver changes (e.g., maybe driver changes
would enable more power savings, but at least things would *work*
unchanged).

Bjorn
Rafael J. Wysocki July 18, 2019, 8:35 a.m. UTC | #11
On Wednesday, July 17, 2019 11:42:05 PM CEST Bjorn Helgaas wrote:
> On Thu, Jun 27, 2019 at 02:06:24PM +0300, Alexander Fomichev wrote:
> > On Wed, Apr 24, 2019 at 11:09:52PM +0200, Rafael J. Wysocki wrote:
> > > On Wed, Apr 24, 2019 at 8:01 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > > On Wed, Apr 24, 2019 at 05:58:19PM +0300, Mika Westerberg wrote:
> > > > > On Wed, Apr 24, 2019 at 09:11:48AM -0500, Bjorn Helgaas wrote:
> > > > > >   - Maybe the PCI sysfs accessors (pci_mmap_resource(), etc) should
> > > > > >     turn off runtime PM?  If we allow mmap of a BAR and then put the
> > > > > >     device in D3hot, that seems like a bug that could affect lots of
> > > > > >     things.  But maybe that's already done magically elsewhere?
> > > > >
> > > > > IIRC there is no PM magic happening for MMIO userspace accesses.
> > > > >
> > > > > What you suggest above sounds like a good way to fix it. We already do
> > > > > similar for config space access from userspace (if the device is in
> > > > > D3cold) so definitely makes sense to do the same for MMIO. However, I
> > > > > don't think we need to disable runtime PM - it should be enough to
> > > > > increase the reference count (pm_runtime_get_sync() and friends) during
> > > > > the time the MMIO resource is mmapped.
> > > >
> > > > OK, so if I understand correctly this would be basically adding
> > > > pm_runtime_get_sync(dev) in pci_mmap_resource().  I don't know what
> > > > the unmap path is, but there would have to be a matching
> > > > pm_runtime_put() somewhere.
> > > 
> > > Right.
> > > 
> > > > And a similar change in the read/write path for /sys/.../resource<N>;
> > > > I think this must be related to the sysfs_create_bin_file() call in
> > > > pci_create_attr(), but I don't see the path where the actual
> > > > read/write to the device is done.
> > > >
> > > > And probably something similar should be done in pci_resource_io(),
> > > > pci_map_rom(), and pci_unmap_rom().
> > > 
> > > In general, every path in which there is a memory or IO address space
> > > access requires pm_runtime_get_sync()/pm_runtime_put() around it as
> > > these accesses are only guaranteed to work in D0.
> > 
> > Tested a solution based on proposals by Logan, Bjorn, Mika, Rafael (thanks all
> > of you, guys), I managed to fix the problem inside the PLX driver code. So no
> > additional quirks or other modifications in Linux kernel needed. I think
> > my patch can be easily rejected.
> 
> Can you fill us in a little bit on the solution?  Are you referring to
> an out-of-tree PLX kernel driver?  I assume this is not a userspace
> PLX tool because I don't think we have a solution to make sysfs mmap
> safe yet.
> 
> Did you have to call pm_runtime_get() or similar from your driver?
> Did your driver already call some PM interface before that?  (If you
> could point us at the source, that would be ideal.)
> 
> Rafael, does a PCI driver have to indicate somehow that it's prepared
> for runtime PM?

Yes, it does.

Please see the comment in local_pci_probe().

> I assume the runtime PM core is designed in such a
> way that it doesn't force driver changes (e.g., maybe driver changes
> would enable more power savings, but at least things would *work*
> unchanged).

Right.
diff mbox series

Patch

diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index a59ad09..8ea99aa 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -2923,6 +2923,17 @@  static void quirk_hotplug_bridge(struct pci_dev *dev)
 	dev->is_hotplug_bridge = 1;
 }
 DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_HINT, 0x0020, quirk_hotplug_bridge);
+/*
+ * Disable runtime PM for PLX Draco (1,2), Capella (1,2) series PCIe switches
+ * to prevent service tools from failing to access hardware registers.
+ */
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_PLX, 0x8712, quirk_hotplug_bridge);
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_PLX, 0x8733, quirk_hotplug_bridge);
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_PLX, 0x8734, quirk_hotplug_bridge);
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_PLX, 0x8780, quirk_hotplug_bridge);
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_PLX, 0x8796, quirk_hotplug_bridge);
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_PLX, 0x9781, quirk_hotplug_bridge);
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_PLX, 0x9797, quirk_hotplug_bridge);
 
 /*
  * This is a quirk for the Ricoh MMC controller found as a part of some