diff mbox

[RFC,0/2] Export platform_pci_set_power_state() and make radeonfb use it

Message ID 200903241200.37982.rjw@sisk.pl
State Superseded, archived
Headers show

Commit Message

Rafael Wysocki March 24, 2009, 11 a.m. UTC
On Tuesday 24 March 2009, Jesse Barnes wrote:
> On Tue, 24 Mar 2009 11:57:19 +1100
> Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:
> 
> > On Mon, 2009-03-23 at 15:23 -0700, Jesse Barnes wrote:
> > > The thing I didn't like was that it made the radeon driver use an
> > > internal interface; I'd really prefer a proper return value from
> > > pci_set_power_state, which in turn means auditing all its current
> > > callers. But that doesn't seem worth it unless we see other drivers
> > > needing something similar...
> > > 
> > > And if we did go with something like your first patch, I'd still
> > > rather see the timeout done in the driver, rather than having the
> > > attempts & delay included in the function...
> > 
> > So what ? The driver would call pci_set_power_state() until it stops
> > failing ?
> 
> Yeah, that's what I had in mind.
> 
> > I'm not too fan of that, because it will change the access pattern
> > to the chip:
> > 
> >  - write PM to 2
> >  - short delay
> >  - read PM, see 0, return error
> >  - driver does big delay
> >  - write PM to 2
> >  - short delay
> >  - read PM ....
> > 
> > vs. the current sequence which is
> > 
> >  - write PM to 2
> >  - long delay
> >  - read PM, be happy
> > 
> > Which -seems- to be pretty much what happens in practice, though on
> > that chip, I don't know for sure about others.
> > 
> > I'm worried of the possible side effects of the first sequence that
> > you propose since it would do 2 things potentially confusing to the
> > HW:
> > 
> >  - read PM after a short delay... it -should- be harmless but you know
> > HW as well as I do ...
> > 
> >  - write PM to 2 a second time after the long delay. Again, it
> > -should- be harmless since the chip at this stage should already be
> > in D2 state but god knows how the HW will react.
> > 
> > I'm especially worried about the later in fact. Maybe we can minimize
> > it by having pci_set_power_state() dbl check the content of the PM
> > reg before writing to it...
> 
> Honestly I'm not too happy about any of the approaches, but yeah I see
> your point.  The main thing is to prevent any config space access for
> a specified time after the first D-state transition, which I think we
> do correctly in the core.  Beyond that, we just have to make sure the
> core state is updated correctly; Rafael's first patch does that
> correctly I think.

In fact I have yet another idea.  If we use the "retransmission with exponential
backoff" algorithm in pci_raw_set_power_state(), we won't have to add any
extra parameters to pci_set_power_state() and the radeon case will be covered
automatically.  That should also cover any other devices having similar
problems IMO.

A patch to implement that is appended, please tell me what you think.

Thanks,
Rafael

---
 drivers/pci/pci.c   |   50 ++++++++++++++++++++++++++++++++++++++++----------
 include/linux/pci.h |    1 +
 2 files changed, 41 insertions(+), 10 deletions(-)

--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Benjamin Herrenschmidt March 24, 2009, 9:12 p.m. UTC | #1
On Tue, 2009-03-24 at 12:00 +0100, Rafael J. Wysocki wrote:

> In fact I have yet another idea.  If we use the "retransmission with exponential
> backoff" algorithm in pci_raw_set_power_state(), we won't have to add any
> extra parameters to pci_set_power_state() and the radeon case will be covered
> automatically.  That should also cover any other devices having similar
> problems IMO.
> 
> A patch to implement that is appended, please tell me what you think.

This is crazy....

Most devices don't need that and those who are a bit "touchy" may well
puke on being written to too fast while they are in the middle of
the transition.

Let's wait to see if Alex from ATI has an answer here, and if not, I
would suggest keeping the dirt inside radeonfb.

Ben.


--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rafael Wysocki March 24, 2009, 10 p.m. UTC | #2
On Tuesday 24 March 2009, Benjamin Herrenschmidt wrote:
> On Tue, 2009-03-24 at 12:00 +0100, Rafael J. Wysocki wrote:
> 
> > In fact I have yet another idea.  If we use the "retransmission with exponential
> > backoff" algorithm in pci_raw_set_power_state(), we won't have to add any
> > extra parameters to pci_set_power_state() and the radeon case will be covered
> > automatically.  That should also cover any other devices having similar
> > problems IMO.
> > 
> > A patch to implement that is appended, please tell me what you think.
> 
> This is crazy....
> 
> Most devices don't need that and those who are a bit "touchy" may well
> puke on being written to too fast while they are in the middle of
> the transition.
> 
> Let's wait to see if Alex from ATI has an answer here, and if not, I
> would suggest keeping the dirt inside radeonfb.

OK, so what would you like to do exactly?

The only thing I see is not to set current_state to PCI_D2 before calling the
final pci_set_power_state().

Rafael
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rafael Wysocki March 24, 2009, 10:25 p.m. UTC | #3
On Tuesday 24 March 2009, Benjamin Herrenschmidt wrote:
> On Tue, 2009-03-24 at 12:00 +0100, Rafael J. Wysocki wrote:
> 
> > In fact I have yet another idea.  If we use the "retransmission with exponential
> > backoff" algorithm in pci_raw_set_power_state(), we won't have to add any
> > extra parameters to pci_set_power_state() and the radeon case will be covered
> > automatically.  That should also cover any other devices having similar
> > problems IMO.
> > 
> > A patch to implement that is appended, please tell me what you think.
> 
> This is crazy....
> 
> Most devices don't need that and those who are a bit "touchy" may well
> puke on being written to too fast while they are in the middle of
> the transition.
> 
> Let's wait to see if Alex from ATI has an answer here, and if not, I
> would suggest keeping the dirt inside radeonfb.

BTW, whatever you do inside of radeonfb will be based on specific assumptions
regarding how pci_set_power_state() works internally, which makes that
extremely fragile.  This way you'll also make the core depend on the radeonfb
driver to some extent, which I don't think is desirable.

Thanks,
Rafael
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

Index: linux-2.6/drivers/pci/pci.c
===================================================================
--- linux-2.6.orig/drivers/pci/pci.c
+++ linux-2.6/drivers/pci/pci.c
@@ -436,8 +436,10 @@  static inline int platform_pci_sleep_wak
  */
 static int pci_raw_set_power_state(struct pci_dev *dev, pci_power_t state)
 {
-	u16 pmcsr;
+	u16 pmcsr, pmcsr_r;
+	unsigned int delay;
 	bool need_restore = false;
+	int error = 0;
 
 	/* Check if we're already there */
 	if (dev->current_state == state)
@@ -488,17 +490,45 @@  static int pci_raw_set_power_state(struc
 		break;
 	}
 
-	/* enter specified state */
-	pci_write_config_word(dev, dev->pm_cap + PCI_PM_CTRL, pmcsr);
-
-	/* Mandatory power management transition delays */
-	/* see PCI PM 1.1 5.6.1 table 18 */
+	/*
+	 * Mandatory power management transition delays, in microseconds
+	 * (see PCI PM 1.1 5.6.1 table 18).
+	 */
 	if (state == PCI_D3hot || dev->current_state == PCI_D3hot)
-		msleep(pci_pm_d3_delay);
+		delay = pci_pm_d3_delay * 1000;
 	else if (state == PCI_D2 || dev->current_state == PCI_D2)
-		udelay(PCI_PM_D2_DELAY);
+		delay = PCI_PM_D2_DELAY;
+	else
+		delay = 0;
+
+	/*
+	 * Write the new value to the control register, wait as long as needed
+	 * and check if the value read back from the register is the same as
+	 * the written one.  If not, repeat with exponential backoff.
+	 */
+	do {
+		pci_write_config_word(dev, dev->pm_cap + PCI_PM_CTRL, pmcsr);
+		if (delay) {
+			if (delay < 1000)
+				udelay(delay);
+			else
+				msleep(DIV_ROUND_UP(delay, 1000));
+			delay <<= 1;
+		}
+		pci_read_config_word(dev, dev->pm_cap + PCI_PM_CTRL, &pmcsr_r);
+		if (pmcsr == pmcsr_r) {
+			dev->current_state = state;
+			break;
+		}
+	} while (delay && delay <= PCI_PM_MAX_DELAY);
 
-	dev->current_state = state;
+	if (pmcsr != pmcsr_r) {
+		dev->current_state = (pmcsr_r & PCI_PM_CTRL_STATE_MASK);
+		dev_warn(&dev->dev,
+			"failed to set power state to D%d, currently in D%d\n",
+			state, dev->current_state);
+		error = -ENODEV;
+	}
 
 	/* According to section 5.4.1 of the "PCI BUS POWER MANAGEMENT
 	 * INTERFACE SPECIFICATION, REV. 1.2", a device transitioning
@@ -518,7 +548,7 @@  static int pci_raw_set_power_state(struc
 	if (dev->bus->self)
 		pcie_aspm_pm_state_change(dev->bus->self);
 
-	return 0;
+	return error;
 }
 
 /**
Index: linux-2.6/include/linux/pci.h
===================================================================
--- linux-2.6.orig/include/linux/pci.h
+++ linux-2.6/include/linux/pci.h
@@ -117,6 +117,7 @@  typedef int __bitwise pci_power_t;
 #define PCI_UNKNOWN	((pci_power_t __force) 5)
 #define PCI_POWER_ERROR	((pci_power_t __force) -1)
 
+#define PCI_PM_MAX_DELAY 2000000
 #define PCI_PM_D2_DELAY	200
 #define PCI_PM_D3_WAIT	10
 #define PCI_PM_BUS_WAIT	50