diff mbox

[RFC] PCI PM: Be extra careful when changing power states of devices

Message ID 200903210003.55161.rjw@sisk.pl (mailing list archive)
State Superseded, archived
Headers show

Commit Message

Rafael Wysocki March 20, 2009, 11:03 p.m. UTC
From: Rafael J. Wysocki <rjw@sisk.pl>

The story in http://bugzilla.kernel.org/show_bug.cgi?id=12846 shows
that setting the power state of a PCI device by
pci_raw_set_power_state() may sometimes fail.  For this reason,
pci_raw_set_power_state() should not assume that the power state of
the device has actually changed after writing into its PMCSR.
Instead, it should read the value from there and use it to update
dev->current_state.  It also is useful to print a warning if the
device's power state hasn't changed as expected.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
 drivers/pci/pci.c |   12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Rafael Wysocki March 22, 2009, 9:08 p.m. UTC | #1
On Saturday 21 March 2009, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rjw@sisk.pl>
> 
> The story in http://bugzilla.kernel.org/show_bug.cgi?id=12846 shows
> that setting the power state of a PCI device by
> pci_raw_set_power_state() may sometimes fail.  For this reason,
> pci_raw_set_power_state() should not assume that the power state of
> the device has actually changed after writing into its PMCSR.
> Instead, it should read the value from there and use it to update
> dev->current_state.  It also is useful to print a warning if the
> device's power state hasn't changed as expected.
> 
> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
> ---
OK, since the Ben's radeonfb fix for bug #12846 has been merged, I'd like
to do something a bit different.

Patch 1/2 introduces __pci_set_power_state() that will allow the radeonfb
driver not to open code PCI PM operations.

Patch 2/2 makes the driver use __pci_set_power_state().

Comments welcome.

Thanks,
Rafael

--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Benjamin Herrenschmidt March 23, 2009, 12:09 a.m. UTC | #2
On Sun, 2009-03-22 at 22:08 +0100, Rafael J. Wysocki wrote:
> On Saturday 21 March 2009, Rafael J. Wysocki wrote:
> > From: Rafael J. Wysocki <rjw@sisk.pl>
> > 
> > The story in http://bugzilla.kernel.org/show_bug.cgi?id=12846 shows
> > that setting the power state of a PCI device by
> > pci_raw_set_power_state() may sometimes fail.  For this reason,
> > pci_raw_set_power_state() should not assume that the power state of
> > the device has actually changed after writing into its PMCSR.
> > Instead, it should read the value from there and use it to update
> > dev->current_state.  It also is useful to print a warning if the
> > device's power state hasn't changed as expected.
> > 
> > Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
> > ---
> OK, since the Ben's radeonfb fix for bug #12846 has been merged, I'd like
> to do something a bit different.
> 
> Patch 1/2 introduces __pci_set_power_state() that will allow the radeonfb
> driver not to open code PCI PM operations.
> 
> Patch 2/2 makes the driver use __pci_set_power_state().
> 
> Comments welcome.

No objection.

Ben.


--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rafael Wysocki March 23, 2009, 9:30 p.m. UTC | #3
On Sunday 22 March 2009, Rafael J. Wysocki wrote:
> On Saturday 21 March 2009, Rafael J. Wysocki wrote:
> > From: Rafael J. Wysocki <rjw@sisk.pl>
> > 
> > The story in http://bugzilla.kernel.org/show_bug.cgi?id=12846 shows
> > that setting the power state of a PCI device by
> > pci_raw_set_power_state() may sometimes fail.  For this reason,
> > pci_raw_set_power_state() should not assume that the power state of
> > the device has actually changed after writing into its PMCSR.
> > Instead, it should read the value from there and use it to update
> > dev->current_state.  It also is useful to print a warning if the
> > device's power state hasn't changed as expected.
> > 
> > Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
> > ---
> OK, since the Ben's radeonfb fix for bug #12846 has been merged, I'd like
> to do something a bit different.
> 
> Patch 1/2 introduces __pci_set_power_state() that will allow the radeonfb
> driver not to open code PCI PM operations.
> 
> Patch 2/2 makes the driver use __pci_set_power_state().
> 
> Comments welcome.

Well, Jesse doesn't like these patches very much, so here's an alternative.

1/2 changes platform_pci_set_power_state() into an exported function (and uses
it to simplify pci_set_power_state() a bit) so that the radeonfb driver can use
it.

2/2 modifies the radeonfb driver itself.

Thanks,
Rafael

--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jesse Barnes March 23, 2009, 10:23 p.m. UTC | #4
On Mon, 23 Mar 2009 22:30:09 +0100
"Rafael J. Wysocki" <rjw@sisk.pl> wrote:

> On Sunday 22 March 2009, Rafael J. Wysocki wrote:
> > On Saturday 21 March 2009, Rafael J. Wysocki wrote:
> > > From: Rafael J. Wysocki <rjw@sisk.pl>
> > > 
> > > The story in http://bugzilla.kernel.org/show_bug.cgi?id=12846
> > > shows that setting the power state of a PCI device by
> > > pci_raw_set_power_state() may sometimes fail.  For this reason,
> > > pci_raw_set_power_state() should not assume that the power state
> > > of the device has actually changed after writing into its PMCSR.
> > > Instead, it should read the value from there and use it to update
> > > dev->current_state.  It also is useful to print a warning if the
> > > device's power state hasn't changed as expected.
> > > 
> > > Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
> > > ---
> > OK, since the Ben's radeonfb fix for bug #12846 has been merged,
> > I'd like to do something a bit different.
> > 
> > Patch 1/2 introduces __pci_set_power_state() that will allow the
> > radeonfb driver not to open code PCI PM operations.
> > 
> > Patch 2/2 makes the driver use __pci_set_power_state().
> > 
> > Comments welcome.
> 
> Well, Jesse doesn't like these patches very much, so here's an
> alternative.
> 
> 1/2 changes platform_pci_set_power_state() into an exported function
> (and uses it to simplify pci_set_power_state() a bit) so that the
> radeonfb driver can use it.
> 
> 2/2 modifies the radeonfb driver itself.

The thing I didn't like was that it made the radeon driver use an
internal interface; I'd really prefer a proper return value from
pci_set_power_state, which in turn means auditing all its current
callers. But that doesn't seem worth it unless we see other drivers
needing something similar...

And if we did go with something like your first patch, I'd still rather
see the timeout done in the driver, rather than having the attempts &
delay included in the function...
Rafael Wysocki March 23, 2009, 11:01 p.m. UTC | #5
On Monday 23 March 2009, Benjamin Herrenschmidt wrote:
> On Sun, 2009-03-22 at 22:08 +0100, Rafael J. Wysocki wrote:
> > On Saturday 21 March 2009, Rafael J. Wysocki wrote:
> > > From: Rafael J. Wysocki <rjw@sisk.pl>
> > > 
> > > The story in http://bugzilla.kernel.org/show_bug.cgi?id=12846 shows
> > > that setting the power state of a PCI device by
> > > pci_raw_set_power_state() may sometimes fail.  For this reason,
> > > pci_raw_set_power_state() should not assume that the power state of
> > > the device has actually changed after writing into its PMCSR.
> > > Instead, it should read the value from there and use it to update
> > > dev->current_state.  It also is useful to print a warning if the
> > > device's power state hasn't changed as expected.
> > > 
> > > Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
> > > ---
> > OK, since the Ben's radeonfb fix for bug #12846 has been merged, I'd like
> > to do something a bit different.
> > 
> > Patch 1/2 introduces __pci_set_power_state() that will allow the radeonfb
> > driver not to open code PCI PM operations.
> > 
> > Patch 2/2 makes the driver use __pci_set_power_state().
> > 
> > Comments welcome.
> 
> No objection.

Great, thanks.

I also discussed the patchset with Jesse on IRC and we agreed it was better
than the alternative approaches.  (So, please disregard the patches sent
earlier today).

Would you mind if I put 1/2 and 2/2 into the suspend tree?

Rafael
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Benjamin Herrenschmidt March 24, 2009, 12:57 a.m. UTC | #6
On Mon, 2009-03-23 at 15:23 -0700, Jesse Barnes wrote:
> The thing I didn't like was that it made the radeon driver use an
> internal interface; I'd really prefer a proper return value from
> pci_set_power_state, which in turn means auditing all its current
> callers. But that doesn't seem worth it unless we see other drivers
> needing something similar...
> 
> And if we did go with something like your first patch, I'd still rather
> see the timeout done in the driver, rather than having the attempts &
> delay included in the function...

So what ? The driver would call pci_set_power_state() until it stops
failing ?

I'm not too fan of that, because it will change the access pattern
to the chip:

 - write PM to 2
 - short delay
 - read PM, see 0, return error
 - driver does big delay
 - write PM to 2
 - short delay
 - read PM ....

vs. the current sequence which is

 - write PM to 2
 - long delay
 - read PM, be happy

Which -seems- to be pretty much what happens in practice, though on that chip,
I don't know for sure about others.

I'm worried of the possible side effects of the first sequence that you propose
since it would do 2 things potentially confusing to the HW:

 - read PM after a short delay... it -should- be harmless but you know
HW as well as I do ...

 - write PM to 2 a second time after the long delay. Again, it -should- be
harmless since the chip at this stage should already be in D2 state but god
knows how the HW will react.

I'm especially worried about the later in fact. Maybe we can minimize it by
having pci_set_power_state() dbl check the content of the PM reg before writing
to it...

Ben.


--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jesse Barnes March 24, 2009, 1:14 a.m. UTC | #7
On Tue, 24 Mar 2009 11:57:19 +1100
Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:

> On Mon, 2009-03-23 at 15:23 -0700, Jesse Barnes wrote:
> > The thing I didn't like was that it made the radeon driver use an
> > internal interface; I'd really prefer a proper return value from
> > pci_set_power_state, which in turn means auditing all its current
> > callers. But that doesn't seem worth it unless we see other drivers
> > needing something similar...
> > 
> > And if we did go with something like your first patch, I'd still
> > rather see the timeout done in the driver, rather than having the
> > attempts & delay included in the function...
> 
> So what ? The driver would call pci_set_power_state() until it stops
> failing ?

Yeah, that's what I had in mind.

> I'm not too fan of that, because it will change the access pattern
> to the chip:
> 
>  - write PM to 2
>  - short delay
>  - read PM, see 0, return error
>  - driver does big delay
>  - write PM to 2
>  - short delay
>  - read PM ....
> 
> vs. the current sequence which is
> 
>  - write PM to 2
>  - long delay
>  - read PM, be happy
> 
> Which -seems- to be pretty much what happens in practice, though on
> that chip, I don't know for sure about others.
> 
> I'm worried of the possible side effects of the first sequence that
> you propose since it would do 2 things potentially confusing to the
> HW:
> 
>  - read PM after a short delay... it -should- be harmless but you know
> HW as well as I do ...
> 
>  - write PM to 2 a second time after the long delay. Again, it
> -should- be harmless since the chip at this stage should already be
> in D2 state but god knows how the HW will react.
> 
> I'm especially worried about the later in fact. Maybe we can minimize
> it by having pci_set_power_state() dbl check the content of the PM
> reg before writing to it...

Honestly I'm not too happy about any of the approaches, but yeah I see
your point.  The main thing is to prevent any config space access for
a specified time after the first D-state transition, which I think we
do correctly in the core.  Beyond that, we just have to make sure the
core state is updated correctly; Rafael's first patch does that
correctly I think.

Actually now that I think of it, maybe Alex can get us details on the
errata here.  If we just need a longer wait between a D-state transition
and the next config space access for this chip (or set of chips), we
could add that as a proper quirk to the core...

Alex, any thoughts about the bug & patch in
http://bugzilla.kernel.org/show_bug.cgi?id=12846 ?  Looks like old
radeon chips need a workaround when transitioning from D0 -> D2...

Thanks,
Alex Deucher March 24, 2009, 10:04 p.m. UTC | #8
On 3/23/09, Jesse Barnes <jbarnes@virtuousgeek.org> wrote:
> On Tue, 24 Mar 2009 11:57:19 +1100
>  Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:
>
>  > On Mon, 2009-03-23 at 15:23 -0700, Jesse Barnes wrote:
>  > > The thing I didn't like was that it made the radeon driver use an
>  > > internal interface; I'd really prefer a proper return value from
>  > > pci_set_power_state, which in turn means auditing all its current
>  > > callers. But that doesn't seem worth it unless we see other drivers
>  > > needing something similar...
>  > >
>  > > And if we did go with something like your first patch, I'd still
>  > > rather see the timeout done in the driver, rather than having the
>  > > attempts & delay included in the function...
>  >
>  > So what ? The driver would call pci_set_power_state() until it stops
>  > failing ?
>
>  Yeah, that's what I had in mind.
>
>  > I'm not too fan of that, because it will change the access pattern
>  > to the chip:
>  >
>  >  - write PM to 2
>  >  - short delay
>  >  - read PM, see 0, return error
>  >  - driver does big delay
>  >  - write PM to 2
>  >  - short delay
>  >  - read PM ....
>  >
>  > vs. the current sequence which is
>  >
>  >  - write PM to 2
>  >  - long delay
>  >  - read PM, be happy
>  >
>  > Which -seems- to be pretty much what happens in practice, though on
>  > that chip, I don't know for sure about others.
>  >
>  > I'm worried of the possible side effects of the first sequence that
>  > you propose since it would do 2 things potentially confusing to the
>  > HW:
>  >
>  >  - read PM after a short delay... it -should- be harmless but you know
>  > HW as well as I do ...
>  >
>  >  - write PM to 2 a second time after the long delay. Again, it
>  > -should- be harmless since the chip at this stage should already be
>  > in D2 state but god knows how the HW will react.
>  >
>  > I'm especially worried about the later in fact. Maybe we can minimize
>  > it by having pci_set_power_state() dbl check the content of the PM
>  > reg before writing to it...
>
>  Honestly I'm not too happy about any of the approaches, but yeah I see
>  your point.  The main thing is to prevent any config space access for
>  a specified time after the first D-state transition, which I think we
>  do correctly in the core.  Beyond that, we just have to make sure the
>  core state is updated correctly; Rafael's first patch does that
>  correctly I think.
>
>  Actually now that I think of it, maybe Alex can get us details on the
>  errata here.  If we just need a longer wait between a D-state transition
>  and the next config space access for this chip (or set of chips), we
>  could add that as a proper quirk to the core...
>
>  Alex, any thoughts about the bug & patch in
>  http://bugzilla.kernel.org/show_bug.cgi?id=12846 ?  Looks like old
>  radeon chips need a workaround when transitioning from D0 -> D2...

I don't see any errata for this, but it's hard to find stuff as chips get older.

Alex
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

Index: linux-2.6/drivers/pci/pci.c
===================================================================
--- linux-2.6.orig/drivers/pci/pci.c
+++ linux-2.6/drivers/pci/pci.c
@@ -436,7 +436,7 @@  static inline int platform_pci_sleep_wak
  */
 static int pci_raw_set_power_state(struct pci_dev *dev, pci_power_t state)
 {
-	u16 pmcsr;
+	u16 pmcsr, new_pmcsr;
 	bool need_restore = false;
 
 	/* Check if we're already there */
@@ -498,7 +498,15 @@  static int pci_raw_set_power_state(struc
 	else if (state == PCI_D2 || dev->current_state == PCI_D2)
 		udelay(PCI_PM_D2_DELAY);
 
-	dev->current_state = state;
+	pci_read_config_word(dev, dev->pm_cap + PCI_PM_CTRL, &new_pmcsr);
+	if (new_pmcsr == pmcsr) {
+		dev->current_state = state;
+	} else {
+		dev->current_state = (new_pmcsr & PCI_PM_CTRL_STATE_MASK);
+		dev_warn(&dev->dev,
+			"failed to set power state to D%d, is D%d\n", state,
+			dev->current_state);
+	}
 
 	/* According to section 5.4.1 of the "PCI BUS POWER MANAGEMENT
 	 * INTERFACE SPECIFICATION, REV. 1.2", a device transitioning