Message ID | 1347972050-3509-1-git-send-email-sourav.poddar@ti.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Tue, Sep 18, 2012 at 06:10:50PM +0530, Sourav Poddar wrote: > Greg's tty-next is not booting on 2420 based N800. The failure is > observed at serial init itself. The reason might be that n800 tries to > resume even though it is not suspended before. > > Reported-by: Paul Walmsley <paul@pwsan.com> > Signed-off-by: Sourav Poddar <sourav.poddar@ti.com> FWIW: Reviewed-by: Felipe Balbi <balbi@ti.com> Paul does this fix the issue for you ? Note that it depends on a previous patch Sourav sent [1] [1] http://marc.info/?l=linux-omap&m=134796819607889&w=2 There's one thing that gets my attention, though, why only n800 would fail here ? I wonder if we should be using: pm_runtime_set_active(dev); pm_runtime_get_enable(dev); to prevent our runtime_resume() to be called from probe, but Sourav tested and it doesn't work on BeagleBoard, but it works on PandaBoard. Does it ring any bell ?? > --- > This patch is developed on top of greg's tty-next branch > CommitId: e740d8f tty: serial: Samsung: Fix return value + > the following patch which I have already posted to the mailing list. > http://comments.gmane.org/gmane.linux.ports.arm.omap/84729 > > drivers/tty/serial/omap-serial.c | 11 ++++++++++- > 1 files changed, 10 insertions(+), 1 deletions(-) > > diff --git a/drivers/tty/serial/omap-serial.c b/drivers/tty/serial/omap-serial.c > index 3c05c5e..bc355f2 100644 > --- a/drivers/tty/serial/omap-serial.c > +++ b/drivers/tty/serial/omap-serial.c > @@ -110,6 +110,7 @@ struct uart_omap_port { > u32 calc_latency; > struct work_struct qos_work; > struct pinctrl *pins; > + unsigned int suspended:1; > }; > > #define to_uart_omap_port(p) ((container_of((p), struct uart_omap_port, port))) > @@ -1545,14 +1546,20 @@ static int serial_omap_runtime_suspend(struct device *dev) > up->latency = PM_QOS_CPU_DMA_LAT_DEFAULT_VALUE; > schedule_work(&up->qos_work); > > + up->suspended = true; > + > return 0; > } > > static int serial_omap_runtime_resume(struct device *dev) > { > struct uart_omap_port *up = dev_get_drvdata(dev); > + u32 loss_cnt; > + > + if (!up->suspended) > + return 0; > > - u32 loss_cnt = serial_omap_get_context_loss_count(up); > + loss_cnt = serial_omap_get_context_loss_count(up); > > if (up->context_loss_cnt != loss_cnt) > serial_omap_restore_context(up); > @@ -1560,6 +1567,8 @@ static int serial_omap_runtime_resume(struct device *dev) > up->latency = up->calc_latency; > schedule_work(&up->qos_work); > > + up->suspended = false; > + > return 0; > } > #endif > -- > 1.7.1 >
On Tue, 18 Sep 2012, Felipe Balbi wrote: > On Tue, Sep 18, 2012 at 06:10:50PM +0530, Sourav Poddar wrote: > > Greg's tty-next is not booting on 2420 based N800. The failure is > > observed at serial init itself. The reason might be that n800 tries to > > resume even though it is not suspended before. > > > > Reported-by: Paul Walmsley <paul@pwsan.com> > > Signed-off-by: Sourav Poddar <sourav.poddar@ti.com> > > FWIW: > > Reviewed-by: Felipe Balbi <balbi@ti.com> > > Paul does this fix the issue for you ? Note that it depends on a > previous patch Sourav sent [1] > > [1] http://marc.info/?l=linux-omap&m=134796819607889&w=2 Yes, thanks - that fixes it, Tested-by: Paul Walmsley <paul@pwsan.com> > There's one thing that gets my attention, though, why only n800 would > fail here ? > > I wonder if we should be using: > > pm_runtime_set_active(dev); > pm_runtime_get_enable(dev); > > to prevent our runtime_resume() to be called from probe, but Sourav > tested and it doesn't work on BeagleBoard, but it works on PandaBoard. > > Does it ring any bell ?? Nothing off the top of my head but haven't looked into it -- maybe this rings a bell with Kevin? - Paul
On Tue, Sep 18, 2012 at 5:02 PM, Felipe Balbi <balbi@ti.com> wrote: > On Tue, Sep 18, 2012 at 06:10:50PM +0530, Sourav Poddar wrote: >> Greg's tty-next is not booting on 2420 based N800. The failure is >> observed at serial init itself. The reason might be that n800 tries to >> resume even though it is not suspended before. >> >> Reported-by: Paul Walmsley <paul@pwsan.com> >> Signed-off-by: Sourav Poddar <sourav.poddar@ti.com> > > FWIW: > > Reviewed-by: Felipe Balbi <balbi@ti.com> > > Paul does this fix the issue for you ? Note that it depends on a > previous patch Sourav sent [1] > > [1] http://marc.info/?l=linux-omap&m=134796819607889&w=2 > > There's one thing that gets my attention, though, why only n800 would > fail here ? > > I wonder if we should be using: > > pm_runtime_set_active(dev); > pm_runtime_get_enable(dev); > > to prevent our runtime_resume() to be called from probe, but Sourav > tested and it doesn't work on BeagleBoard, but it works on PandaBoard. > > Does it ring any bell ?? Well I guess it's normal resume callback is called during probe as pm_runtime_get_sync() is called there, and according to documentation [1], devices are assumed to be suspended before probe is called. According to [1], pm_runtime_set_active() should be called before pm_runtime_enable() to indicate to the core that device is active during probe if that's the case, I guess this means pm_runtime_get_sync() is not needed in that case, but I'm not sure what's the actual serial state during probe nowadays. [1] chapter 5 of Documentation/power/runtime_pm.txt: The initial runtime PM status of all devices is 'suspended', but it need not reflect the actual physical state of the device. Thus, if the device is initially active (i.e. it is able to process I/O), its runtime PM status must be changed to 'active', with the help of pm_runtime_set_active(), before pm_runtime_enable() is called for the device.
On Wed, Sep 19, 2012 at 02:52:18PM +0300, Grazvydas Ignotas wrote: > On Tue, Sep 18, 2012 at 5:02 PM, Felipe Balbi <balbi@ti.com> wrote: > > On Tue, Sep 18, 2012 at 06:10:50PM +0530, Sourav Poddar wrote: > >> Greg's tty-next is not booting on 2420 based N800. The failure is > >> observed at serial init itself. The reason might be that n800 tries to > >> resume even though it is not suspended before. > >> > >> Reported-by: Paul Walmsley <paul@pwsan.com> > >> Signed-off-by: Sourav Poddar <sourav.poddar@ti.com> > > > > FWIW: > > > > Reviewed-by: Felipe Balbi <balbi@ti.com> > > > > Paul does this fix the issue for you ? Note that it depends on a > > previous patch Sourav sent [1] > > > > [1] http://marc.info/?l=linux-omap&m=134796819607889&w=2 > > > > There's one thing that gets my attention, though, why only n800 would > > fail here ? > > > > I wonder if we should be using: > > > > pm_runtime_set_active(dev); > > pm_runtime_get_enable(dev); > > > > to prevent our runtime_resume() to be called from probe, but Sourav > > tested and it doesn't work on BeagleBoard, but it works on PandaBoard. > > > > Does it ring any bell ?? > > Well I guess it's normal resume callback is called during probe as > pm_runtime_get_sync() is called there, and according to documentation > [1], devices are assumed to be suspended before probe is called. > > According to [1], pm_runtime_set_active() should be called before > pm_runtime_enable() to indicate to the core that device is active > during probe if that's the case, I guess this means > pm_runtime_get_sync() is not needed in that case, but I'm not sure > what's the actual serial state during probe nowadays. > > [1] chapter 5 of Documentation/power/runtime_pm.txt: > The initial runtime PM status of all devices is 'suspended', but it > need not reflect the actual physical state of the device. Thus, if the > device is initially active (i.e. it is able to process I/O), its > runtime PM status must be changed to 'active', with the help of > pm_runtime_set_active(), before pm_runtime_enable() is called for the > device. this is what I mean, actually. If we remove pm_runtime_get_sync() in exchange for pm_runtime_set_active() before pm_runtime_enable(), it works on PandaBoard, but breaks BeagleBoard.
Hi Greg, Ping on this? On Tue, Sep 18, 2012 at 6:10 PM, Sourav Poddar <sourav.poddar@ti.com> wrote: > Greg's tty-next is not booting on 2420 based N800. The failure is > observed at serial init itself. The reason might be that n800 tries to > resume even though it is not suspended before. > > Reported-by: Paul Walmsley <paul@pwsan.com> > Signed-off-by: Sourav Poddar <sourav.poddar@ti.com> > --- > This patch is developed on top of greg's tty-next branch > CommitId: e740d8f tty: serial: Samsung: Fix return value + > the following patch which I have already posted to the mailing list. > http://comments.gmane.org/gmane.linux.ports.arm.omap/84729 > > drivers/tty/serial/omap-serial.c | 11 ++++++++++- > 1 files changed, 10 insertions(+), 1 deletions(-) > > diff --git a/drivers/tty/serial/omap-serial.c b/drivers/tty/serial/omap-serial.c > index 3c05c5e..bc355f2 100644 > --- a/drivers/tty/serial/omap-serial.c > +++ b/drivers/tty/serial/omap-serial.c > @@ -110,6 +110,7 @@ struct uart_omap_port { > u32 calc_latency; > struct work_struct qos_work; > struct pinctrl *pins; > + unsigned int suspended:1; > }; > > #define to_uart_omap_port(p) ((container_of((p), struct uart_omap_port, port))) > @@ -1545,14 +1546,20 @@ static int serial_omap_runtime_suspend(struct device *dev) > up->latency = PM_QOS_CPU_DMA_LAT_DEFAULT_VALUE; > schedule_work(&up->qos_work); > > + up->suspended = true; > + > return 0; > } > > static int serial_omap_runtime_resume(struct device *dev) > { > struct uart_omap_port *up = dev_get_drvdata(dev); > + u32 loss_cnt; > + > + if (!up->suspended) > + return 0; > > - u32 loss_cnt = serial_omap_get_context_loss_count(up); > + loss_cnt = serial_omap_get_context_loss_count(up); > > if (up->context_loss_cnt != loss_cnt) > serial_omap_restore_context(up); > @@ -1560,6 +1567,8 @@ static int serial_omap_runtime_resume(struct device *dev) > up->latency = up->calc_latency; > schedule_work(&up->qos_work); > > + up->suspended = false; > + > return 0; > } > #endif > -- > 1.7.1 > ~Sourav
On Tue, Sep 25, 2012 at 01:52:03PM +0530, Poddar, Sourav wrote: > Hi Greg, > > Ping on this? > > On Tue, Sep 18, 2012 at 6:10 PM, Sourav Poddar <sourav.poddar@ti.com> wrote: > > Greg's tty-next is not booting on 2420 based N800. The failure is > > observed at serial init itself. The reason might be that n800 tries to > > resume even though it is not suspended before. How is this happening? I think that needs proper investigation - or if it's had more investigation, then the results needs to be included in the commit description so that everyone can understand the issue here. We should not be resuming a device which hasn't been suspended. Maybe the runtime PM enable sequence is wrong, and that's what should be fixed instead? This sequence in the probe() function: pm_runtime_irq_safe(&pdev->dev); pm_runtime_enable(&pdev->dev); pm_runtime_get_sync(&pdev->dev); would enable runtime PM while the s/w state indicates that it's disabled, and then that pm_runtime_get_sync() will want to resume the device. See the section "5. Runtime PM Initialization, Device Probing and Removal" in Documentation/power/runtime_pm.txt, specifically the second paragraph of that section.
On Tue, Sep 25, 2012 at 09:30:29AM +0100, Russell King - ARM Linux wrote: > On Tue, Sep 25, 2012 at 01:52:03PM +0530, Poddar, Sourav wrote: > > Hi Greg, > > > > Ping on this? > > > > On Tue, Sep 18, 2012 at 6:10 PM, Sourav Poddar <sourav.poddar@ti.com> wrote: > > > Greg's tty-next is not booting on 2420 based N800. The failure is > > > observed at serial init itself. The reason might be that n800 tries to > > > resume even though it is not suspended before. > > How is this happening? I think that needs proper investigation - or if > it's had more investigation, then the results needs to be included in > the commit description so that everyone can understand the issue here. > > We should not be resuming a device which hasn't been suspended. Maybe > the runtime PM enable sequence is wrong, and that's what should be fixed > instead? > > This sequence in the probe() function: > > pm_runtime_irq_safe(&pdev->dev); > pm_runtime_enable(&pdev->dev); > pm_runtime_get_sync(&pdev->dev); > > would enable runtime PM while the s/w state indicates that it's disabled, > and then that pm_runtime_get_sync() will want to resume the device. See > the section "5. Runtime PM Initialization, Device Probing and Removal" > in Documentation/power/runtime_pm.txt, specifically the second paragraph > of that section. that was tested. It worked in pandaboard but didn't work on beagleboard XM. Sourav tried to start a discussion about that, but it simply died... In any case, pm_runtime_get_sync() in probe will always call runtime_resume callback, right ?
On Tue, Sep 25, 2012 at 10:12:28AM +0100, Russell King - ARM Linux wrote: > On Tue, Sep 25, 2012 at 11:31:20AM +0300, Felipe Balbi wrote: > > On Tue, Sep 25, 2012 at 09:30:29AM +0100, Russell King - ARM Linux wrote: > > > How is this happening? I think that needs proper investigation - or if > > > it's had more investigation, then the results needs to be included in > > > the commit description so that everyone can understand the issue here. > > > > > > We should not be resuming a device which hasn't been suspended. Maybe > > > the runtime PM enable sequence is wrong, and that's what should be fixed > > > instead? > > > > > > This sequence in the probe() function: > > > > > > pm_runtime_irq_safe(&pdev->dev); > > > pm_runtime_enable(&pdev->dev); > > > pm_runtime_get_sync(&pdev->dev); > > > > > > would enable runtime PM while the s/w state indicates that it's disabled, > > > and then that pm_runtime_get_sync() will want to resume the device. See > > > the section "5. Runtime PM Initialization, Device Probing and Removal" > > > in Documentation/power/runtime_pm.txt, specifically the second paragraph > > > of that section. > > > > that was tested. It worked in pandaboard but didn't work on beagleboard > > XM. Sourav tried to start a discussion about that, but it simply died... > > > > In any case, pm_runtime_get_sync() in probe will always call > > runtime_resume callback, right ? > > Well, if the runtime PM state says it's suspended, and then you enable > runtime PM, the first call to pm_runtime_get_sync() will trigger a resume > attempt. The patch description is complaining about resume events without > there being a preceding suspend event. > > This could well be why. that's most likely, of course. But should we cause a regression to beagleboard XM because of that ? Also, if you look into chapter 9 of the runtime_pm documentation, starting on line 822 you'll see documentation suggests the use of mystruct->is_suspended flag.
On Tue, Sep 25, 2012 at 11:31:20AM +0300, Felipe Balbi wrote: > On Tue, Sep 25, 2012 at 09:30:29AM +0100, Russell King - ARM Linux wrote: > > How is this happening? I think that needs proper investigation - or if > > it's had more investigation, then the results needs to be included in > > the commit description so that everyone can understand the issue here. > > > > We should not be resuming a device which hasn't been suspended. Maybe > > the runtime PM enable sequence is wrong, and that's what should be fixed > > instead? > > > > This sequence in the probe() function: > > > > pm_runtime_irq_safe(&pdev->dev); > > pm_runtime_enable(&pdev->dev); > > pm_runtime_get_sync(&pdev->dev); > > > > would enable runtime PM while the s/w state indicates that it's disabled, > > and then that pm_runtime_get_sync() will want to resume the device. See > > the section "5. Runtime PM Initialization, Device Probing and Removal" > > in Documentation/power/runtime_pm.txt, specifically the second paragraph > > of that section. > > that was tested. It worked in pandaboard but didn't work on beagleboard > XM. Sourav tried to start a discussion about that, but it simply died... > > In any case, pm_runtime_get_sync() in probe will always call > runtime_resume callback, right ? Well, if the runtime PM state says it's suspended, and then you enable runtime PM, the first call to pm_runtime_get_sync() will trigger a resume attempt. The patch description is complaining about resume events without there being a preceding suspend event. This could well be why.
On Tue, Sep 25, 2012 at 12:11:14PM +0300, Felipe Balbi wrote: > On Tue, Sep 25, 2012 at 10:12:28AM +0100, Russell King - ARM Linux wrote: > > On Tue, Sep 25, 2012 at 11:31:20AM +0300, Felipe Balbi wrote: > > > On Tue, Sep 25, 2012 at 09:30:29AM +0100, Russell King - ARM Linux wrote: > > > > How is this happening? I think that needs proper investigation - or if > > > > it's had more investigation, then the results needs to be included in > > > > the commit description so that everyone can understand the issue here. > > > > > > > > We should not be resuming a device which hasn't been suspended. Maybe > > > > the runtime PM enable sequence is wrong, and that's what should be fixed > > > > instead? > > > > > > > > This sequence in the probe() function: > > > > > > > > pm_runtime_irq_safe(&pdev->dev); > > > > pm_runtime_enable(&pdev->dev); > > > > pm_runtime_get_sync(&pdev->dev); > > > > > > > > would enable runtime PM while the s/w state indicates that it's disabled, > > > > and then that pm_runtime_get_sync() will want to resume the device. See > > > > the section "5. Runtime PM Initialization, Device Probing and Removal" > > > > in Documentation/power/runtime_pm.txt, specifically the second paragraph > > > > of that section. > > > > > > that was tested. It worked in pandaboard but didn't work on beagleboard > > > XM. Sourav tried to start a discussion about that, but it simply died... > > > > > > In any case, pm_runtime_get_sync() in probe will always call > > > runtime_resume callback, right ? > > > > Well, if the runtime PM state says it's suspended, and then you enable > > runtime PM, the first call to pm_runtime_get_sync() will trigger a resume > > attempt. The patch description is complaining about resume events without > > there being a preceding suspend event. > > > > This could well be why. > > that's most likely, of course. But should we cause a regression to > beagleboard XM because of that ? What would cause a regression on beagleboard XM? I have not suggested any change other than more investigation of the issue and a fuller patch description - yet you're screaming (idiotically IMHO) that mere investigation would break beagleboard. Well, if it's _that_ fragile, that mere investigation of this issue by someone elsewhere on the planet would break your beagleboard, maybe it deserves to be broken!
Hi, On Tue, Sep 25, 2012 at 10:21:18AM +0100, Russell King - ARM Linux wrote: > On Tue, Sep 25, 2012 at 12:11:14PM +0300, Felipe Balbi wrote: > > On Tue, Sep 25, 2012 at 10:12:28AM +0100, Russell King - ARM Linux wrote: > > > On Tue, Sep 25, 2012 at 11:31:20AM +0300, Felipe Balbi wrote: > > > > On Tue, Sep 25, 2012 at 09:30:29AM +0100, Russell King - ARM Linux wrote: > > > > > How is this happening? I think that needs proper investigation - or if > > > > > it's had more investigation, then the results needs to be included in > > > > > the commit description so that everyone can understand the issue here. > > > > > > > > > > We should not be resuming a device which hasn't been suspended. Maybe > > > > > the runtime PM enable sequence is wrong, and that's what should be fixed > > > > > instead? > > > > > > > > > > This sequence in the probe() function: > > > > > > > > > > pm_runtime_irq_safe(&pdev->dev); > > > > > pm_runtime_enable(&pdev->dev); > > > > > pm_runtime_get_sync(&pdev->dev); > > > > > > > > > > would enable runtime PM while the s/w state indicates that it's disabled, > > > > > and then that pm_runtime_get_sync() will want to resume the device. See > > > > > the section "5. Runtime PM Initialization, Device Probing and Removal" > > > > > in Documentation/power/runtime_pm.txt, specifically the second paragraph > > > > > of that section. > > > > > > > > that was tested. It worked in pandaboard but didn't work on beagleboard > > > > XM. Sourav tried to start a discussion about that, but it simply died... > > > > > > > > In any case, pm_runtime_get_sync() in probe will always call > > > > runtime_resume callback, right ? > > > > > > Well, if the runtime PM state says it's suspended, and then you enable > > > runtime PM, the first call to pm_runtime_get_sync() will trigger a resume > > > attempt. The patch description is complaining about resume events without > > > there being a preceding suspend event. > > > > > > This could well be why. > > > > that's most likely, of course. But should we cause a regression to > > beagleboard XM because of that ? > > What would cause a regression on beagleboard XM? I have not suggested > any change other than more investigation of the issue and a fuller patch > description - yet you're screaming (idiotically IMHO) that mere > investigation would break beagleboard. > > Well, if it's _that_ fragile, that mere investigation of this issue by > someone elsewhere on the planet would break your beagleboard, maybe it > deserves to be broken! why are you always so over the top like that ? This is just counter-productive to say the least. What I'm trying to say here, is that there might be a bug either on pm core or on OMAP3's implementation and I'd like to get input from Tony, Santosh, Paul, etc before going forward. Maybe it's something they've already been through, or maybe it rings a bell and points to somewhere we should look for. anyway, instead of feeding your ego, we can stop this discussion and let Sourav see what he can come up with.
Hi, On Tue, Sep 25, 2012 at 2:51 PM, Russell King - ARM Linux <linux@arm.linux.org.uk> wrote: > On Tue, Sep 25, 2012 at 12:11:14PM +0300, Felipe Balbi wrote: >> On Tue, Sep 25, 2012 at 10:12:28AM +0100, Russell King - ARM Linux wrote: >> > On Tue, Sep 25, 2012 at 11:31:20AM +0300, Felipe Balbi wrote: >> > > On Tue, Sep 25, 2012 at 09:30:29AM +0100, Russell King - ARM Linux wrote: >> > > > How is this happening? I think that needs proper investigation - or if >> > > > it's had more investigation, then the results needs to be included in >> > > > the commit description so that everyone can understand the issue here. >> > > > >> > > > We should not be resuming a device which hasn't been suspended. Maybe >> > > > the runtime PM enable sequence is wrong, and that's what should be fixed >> > > > instead? >> > > > >> > > > This sequence in the probe() function: >> > > > >> > > > pm_runtime_irq_safe(&pdev->dev); >> > > > pm_runtime_enable(&pdev->dev); >> > > > pm_runtime_get_sync(&pdev->dev); >> > > > >> > > > would enable runtime PM while the s/w state indicates that it's disabled, >> > > > and then that pm_runtime_get_sync() will want to resume the device. See >> > > > the section "5. Runtime PM Initialization, Device Probing and Removal" >> > > > in Documentation/power/runtime_pm.txt, specifically the second paragraph >> > > > of that section. >> > > >> > > that was tested. It worked in pandaboard but didn't work on beagleboard >> > > XM. Sourav tried to start a discussion about that, but it simply died... >> > > >> > > In any case, pm_runtime_get_sync() in probe will always call >> > > runtime_resume callback, right ? >> > >> > Well, if the runtime PM state says it's suspended, and then you enable >> > runtime PM, the first call to pm_runtime_get_sync() will trigger a resume >> > attempt. The patch description is complaining about resume events without >> > there being a preceding suspend event. >> > >> > This could well be why. >> >> that's most likely, of course. But should we cause a regression to >> beagleboard XM because of that ? > > What would cause a regression on beagleboard XM? I have not suggested > any change other than more investigation of the issue and a fuller patch > description - yet you're screaming (idiotically IMHO) that mere > investigation would break beagleboard. > > Well, if it's _that_ fragile, that mere investigation of this issue by > someone elsewhere on the planet would break your beagleboard, maybe it > deserves to be broken! The issue was observed at serial init itself in the N800 board and the log does not show up much. http://www.pwsan.com/omap/testlogs/test_tty_next_e36851d0/20120910020323/boot/2420n800/2420n800_log.txt What we thought the problem might be with n800 is that it tries to resume when it didn't suspend before. There are two ways through which we thought of handling this issue: a) set device as active before enabling pm (which will prevent pm_runtime_set_active(dev); pm_runtime_enable(dev); OR b) adding a "suspended" flag to struct omap_uart_port which gets set on suspend and cleared on resume. Then on resume you can check: if (!up->suspended) return 0; But using "pm_runtime_set_active" approach breaks things even on beagle board xm, though it works fine on Panda. Therefore, we used the "suspended" flag approach. So. I just wanted to get some feedback from community about how using "pm_runtime_set_active" behaves differently in omap3 and omap4.
On Tue, Sep 25, 2012 at 12:48:16PM +0300, Felipe Balbi wrote: > Hi, > > On Tue, Sep 25, 2012 at 10:21:18AM +0100, Russell King - ARM Linux wrote: > > On Tue, Sep 25, 2012 at 12:11:14PM +0300, Felipe Balbi wrote: > > > On Tue, Sep 25, 2012 at 10:12:28AM +0100, Russell King - ARM Linux wrote: > > > > On Tue, Sep 25, 2012 at 11:31:20AM +0300, Felipe Balbi wrote: > > > > > On Tue, Sep 25, 2012 at 09:30:29AM +0100, Russell King - ARM Linux wrote: > > > > > > How is this happening? I think that needs proper investigation - or if > > > > > > it's had more investigation, then the results needs to be included in > > > > > > the commit description so that everyone can understand the issue here. > > > > > > > > > > > > We should not be resuming a device which hasn't been suspended. Maybe > > > > > > the runtime PM enable sequence is wrong, and that's what should be fixed > > > > > > instead? > > > > > > > > > > > > This sequence in the probe() function: > > > > > > > > > > > > pm_runtime_irq_safe(&pdev->dev); > > > > > > pm_runtime_enable(&pdev->dev); > > > > > > pm_runtime_get_sync(&pdev->dev); > > > > > > > > > > > > would enable runtime PM while the s/w state indicates that it's disabled, > > > > > > and then that pm_runtime_get_sync() will want to resume the device. See > > > > > > the section "5. Runtime PM Initialization, Device Probing and Removal" > > > > > > in Documentation/power/runtime_pm.txt, specifically the second paragraph > > > > > > of that section. > > > > > > > > > > that was tested. It worked in pandaboard but didn't work on beagleboard > > > > > XM. Sourav tried to start a discussion about that, but it simply died... > > > > > > > > > > In any case, pm_runtime_get_sync() in probe will always call > > > > > runtime_resume callback, right ? > > > > > > > > Well, if the runtime PM state says it's suspended, and then you enable > > > > runtime PM, the first call to pm_runtime_get_sync() will trigger a resume > > > > attempt. The patch description is complaining about resume events without > > > > there being a preceding suspend event. > > > > > > > > This could well be why. > > > > > > that's most likely, of course. But should we cause a regression to > > > beagleboard XM because of that ? > > > > What would cause a regression on beagleboard XM? I have not suggested > > any change other than more investigation of the issue and a fuller patch > > description - yet you're screaming (idiotically IMHO) that mere > > investigation would break beagleboard. > > > > Well, if it's _that_ fragile, that mere investigation of this issue by > > someone elsewhere on the planet would break your beagleboard, maybe it > > deserves to be broken! > > why are you always so over the top like that ? This is just > counter-productive to say the least. Because you are accusing me of potentially breaking your beagleboard for merely suggesting further investigation and a better commit message. You are the one going over the top, not me.
On Tue, Sep 25, 2012 at 11:29:58AM +0100, Russell King - ARM Linux wrote: > On Tue, Sep 25, 2012 at 12:48:16PM +0300, Felipe Balbi wrote: > > Hi, > > > > On Tue, Sep 25, 2012 at 10:21:18AM +0100, Russell King - ARM Linux wrote: > > > On Tue, Sep 25, 2012 at 12:11:14PM +0300, Felipe Balbi wrote: > > > > On Tue, Sep 25, 2012 at 10:12:28AM +0100, Russell King - ARM Linux wrote: > > > > > On Tue, Sep 25, 2012 at 11:31:20AM +0300, Felipe Balbi wrote: > > > > > > On Tue, Sep 25, 2012 at 09:30:29AM +0100, Russell King - ARM Linux wrote: > > > > > > > How is this happening? I think that needs proper investigation - or if > > > > > > > it's had more investigation, then the results needs to be included in > > > > > > > the commit description so that everyone can understand the issue here. > > > > > > > > > > > > > > We should not be resuming a device which hasn't been suspended. Maybe > > > > > > > the runtime PM enable sequence is wrong, and that's what should be fixed > > > > > > > instead? > > > > > > > > > > > > > > This sequence in the probe() function: > > > > > > > > > > > > > > pm_runtime_irq_safe(&pdev->dev); > > > > > > > pm_runtime_enable(&pdev->dev); > > > > > > > pm_runtime_get_sync(&pdev->dev); > > > > > > > > > > > > > > would enable runtime PM while the s/w state indicates that it's disabled, > > > > > > > and then that pm_runtime_get_sync() will want to resume the device. See > > > > > > > the section "5. Runtime PM Initialization, Device Probing and Removal" > > > > > > > in Documentation/power/runtime_pm.txt, specifically the second paragraph > > > > > > > of that section. > > > > > > > > > > > > that was tested. It worked in pandaboard but didn't work on beagleboard > > > > > > XM. Sourav tried to start a discussion about that, but it simply died... > > > > > > > > > > > > In any case, pm_runtime_get_sync() in probe will always call > > > > > > runtime_resume callback, right ? > > > > > > > > > > Well, if the runtime PM state says it's suspended, and then you enable > > > > > runtime PM, the first call to pm_runtime_get_sync() will trigger a resume > > > > > attempt. The patch description is complaining about resume events without > > > > > there being a preceding suspend event. > > > > > > > > > > This could well be why. > > > > > > > > that's most likely, of course. But should we cause a regression to > > > > beagleboard XM because of that ? > > > > > > What would cause a regression on beagleboard XM? I have not suggested > > > any change other than more investigation of the issue and a fuller patch > > > description - yet you're screaming (idiotically IMHO) that mere > > > investigation would break beagleboard. > > > > > > Well, if it's _that_ fragile, that mere investigation of this issue by > > > someone elsewhere on the planet would break your beagleboard, maybe it > > > deserves to be broken! > > > > why are you always so over the top like that ? This is just > > counter-productive to say the least. > > Because you are accusing me of potentially breaking your beagleboard > for merely suggesting further investigation and a better commit message. Where did I accuse you of anyting ? I just mentioned we experienced a regression with beagleboard XM when using pm_runtime_set_active(). here's my quote: > that was tested. It worked in pandaboard but didn't work on > beagleboard XM. Sourav tried to start a discussion about that, but it > simply died... To add extra info, here you go: We pinged Paul and asked if he had seen that before, he had no pointers... Because Documentation/power/runtime_pm.txt was using a mystruct->is_suspended flag, we just decided to follow the same "design" since no-one was able to suggest why pm_runtime_set_active() was breaking beagleXM nor how it was supposed to actually work. Reading the code: pm_runtime_set_active() would tell pm_runtime core the device is actually active by setting runtime_status to RPM_ACTIVE, thus the following pm_runtime_get_sync() wouldn't actually call runtime_resume() callback, but it would increment usage_counter. I can't see why this would fail on beagleXM, but it does and we'd like to hear in which situations this could fail...
Right, let's get this thread back onto a constructive footing and try to understand the problems here. On Tue, Sep 25, 2012 at 03:26:06PM +0530, Poddar, Sourav wrote: > The issue was observed at serial init itself in the N800 board and the > log does not show up much. > http://www.pwsan.com/omap/testlogs/test_tty_next_e36851d0/20120910020323/boot/2420n800/2420n800_log.txt Right, so output stops when ttyO2 is initialized. > What we thought the problem might be with n800 is that it tries to > resume when it didn't suspend before. > > There are two ways through which we thought of handling this issue: > > a) set device as active before enabling pm (which will prevent > > pm_runtime_set_active(dev); > pm_runtime_enable(dev); > > OR > > b) adding a "suspended" flag to struct omap_uart_port which gets set on > suspend and cleared on resume. Then on resume you can check: > > if (!up->suspended) > return 0; > > But using "pm_runtime_set_active" approach breaks things even on > beagle board xm, though it works fine on Panda. Right, so now the question becomes - what is different on the Beagle Board that prevents solution (a) from working - or put it another way, have we uncovered _another_ bug. For N800, for ttyO0 and ttyO1 which aren't in use, it may be correct. We don't know for certain. For ttyO2, the port is clearly already in use as it's being used for console output. So that means it's _not_ in suspended state, and therefore the initial runtime PM state is wrong. For Beagleboard - who knows, we have no information on that. All we know is that your (a) sequence causes a regression. Anyway, let's analyse the code paths. What is pm_runtime_get_sync() doing? Well, it calls __pm_runtime_resume(, RPM_GET_PUT). That alters the parent device's state (which doesn't matter for this, as the platform device is a child of the main platform device, which has no runtime PM.) It then calls the resume callback (from the pm domain/type/class/bus/ driver) and then does an idle check. OMAP devices have a pm domain, so this is the candidate for the callback at this level, which gets us into _od_runtime_resume(). This calls omap_device_enable() before then moving on to the driver's runtime resume method, and at that point the runtime PM resume is complete. So, with your (a) solution, what's different is that we omit calling omap_device_enable(). Therefore, my hunch is the reason that Beagleboard doesn't work is because it doesn't like missing that call. I bet if you do this: omap_device_enable(dev); pm_runtime_set_active(dev); pm_runtime_enable(dev); that this will solve your problem, and Beagleboard will continue working. What we have is a mismatch in both your case, and the Beagleboard case, between the real state of the hardware, the runtime PM state, and the OMAP hwmod state, and the above should bring all those states entirely into line with each other for _every_ case. It doesn't need any hacks to prevent resume callbacks without prior suspends (which, incidentally, the runtime PM core already guarantees provided you get the initial state correct.)
On Tue, Sep 25, 2012 at 01:37:03PM +0300, Felipe Balbi wrote: > On Tue, Sep 25, 2012 at 11:29:58AM +0100, Russell King - ARM Linux wrote: > > Because you are accusing me of potentially breaking your beagleboard > > for merely suggesting further investigation and a better commit message. > > Where did I accuse you of anyting ? I just mentioned we experienced a > regression with beagleboard XM when using pm_runtime_set_active(). I quote: :> But should we cause a regression to beagleboard XM because of that ? ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ I say again: I was _only_ suggesting further investigation, yet you were mouthing off about causing a regression to beagleboard "because of that", effectively saying that no, we should not do any further investigation and this is the only fix. > To add extra info, here you go: Finally, something constructive. > We pinged Paul and asked if he had seen that before, he had no > pointers... Because Documentation/power/runtime_pm.txt was using a > mystruct->is_suspended flag, we just decided to follow the same > "design" since no-one was able to suggest why pm_runtime_set_active() > was breaking beagleXM nor how it was supposed to actually work. > > Reading the code: pm_runtime_set_active() would tell pm_runtime core > the device is actually active by setting runtime_status to RPM_ACTIVE, > thus the following pm_runtime_get_sync() wouldn't actually call > runtime_resume() callback, but it would increment usage_counter. > > I can't see why this would fail on beagleXM, but it does and we'd like > to hear in which situations this could fail... Well, I've just spent five minutes analysing the code paths - which I hadn't looked at before - and I've pointed out what's probably causing the problem for Beagle. I think you owe me an appology over your aggressive attitude towards my suggestions that there needed to be some further investigation.
Hi, On Tue, Sep 25, 2012 at 12:07:03PM +0100, Russell King - ARM Linux wrote: > On Tue, Sep 25, 2012 at 01:37:03PM +0300, Felipe Balbi wrote: > > On Tue, Sep 25, 2012 at 11:29:58AM +0100, Russell King - ARM Linux wrote: > > > Because you are accusing me of potentially breaking your beagleboard > > > for merely suggesting further investigation and a better commit message. > > > > Where did I accuse you of anyting ? I just mentioned we experienced a > > regression with beagleboard XM when using pm_runtime_set_active(). > > I quote: > :> But should we cause a regression to beagleboard XM because of that ? > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ maybe that wasn't the best way to put it, but I was trying to point out that either there was a bug on pm core or omap_device/hwmod, not that we shouldn't investigate. > I say again: I was _only_ suggesting further investigation, yet you > were mouthing off about causing a regression to beagleboard "because > of that", effectively saying that no, we should not do any further > investigation and this is the only fix. not what I meant, but fair enough... The "because of that" was supposed to mean "because of pm_runtime_set_active()". I find the documentation for that particular call to be rather poor and a bit confusing, specially when further down, the very same document uses a "is_suspended" flag which, in fact, shouldn't be needed when we have pm_runtime_is_suspended() and the like. > > We pinged Paul and asked if he had seen that before, he had no > > pointers... Because Documentation/power/runtime_pm.txt was using a > > mystruct->is_suspended flag, we just decided to follow the same > > "design" since no-one was able to suggest why pm_runtime_set_active() > > was breaking beagleXM nor how it was supposed to actually work. > > > > Reading the code: pm_runtime_set_active() would tell pm_runtime core > > the device is actually active by setting runtime_status to RPM_ACTIVE, > > thus the following pm_runtime_get_sync() wouldn't actually call > > runtime_resume() callback, but it would increment usage_counter. > > > > I can't see why this would fail on beagleXM, but it does and we'd like > > to hear in which situations this could fail... > > Well, I've just spent five minutes analysing the code paths - which I > hadn't looked at before - and I've pointed out what's probably causing > the problem for Beagle. I think you owe me an appology over your > aggressive attitude towards my suggestions that there needed to be > some further investigation. I don't see any aggressive attitude towards what you suggested, actually. Mailing list archives are available to check, but the one cursing around was always yourself and THAT deserves an apology. If you still think I've been at all aggressive, then sure, I apologize, it wasn't what I meant though.
On Tue, Sep 25, 2012 at 12:11:14PM +0300, Felipe Balbi wrote: > that's most likely, of course. But should we cause a regression to > beagleboard XM because of that ? Also, if you look into chapter 9 of the > runtime_pm documentation, starting on line 822 you'll see documentation > suggests the use of mystruct->is_suspended flag. BTW, I'll point out a fatal flaw in your justification above. If you read the entire example, you'll see that the is_suspended flag is _not_ used to prevent resumes without suspends, but is used as a flag to control whether the driver processes requests or not. That's entirely functionally different from using a "is_suspended" flag in the way you mention above. Section 5 is quite clear about the requirements at initialization time for runtime PM, and nothing in section 9 contradicts that, and the is_suspended flag in that example has nothing to do with any of this.
On Tue, Sep 25, 2012 at 02:12:43PM +0300, Felipe Balbi wrote: > I don't see any aggressive attitude towards what you suggested, > actually. Mailing list archives are available to check, but the one > cursing around was always yourself and THAT deserves an apology. Total rubbish. No apology, because I wasn't cursing you. Whatever, I'm not going to re-explain the email thread. What I said was that your raising of beagleboard breakage (twice) was idiotic to a request for investigation. That's a statement I _still_ fully agree with, and I'll say it again if I have to. Trying to shut down investigation because investigation may break something _is_ idiotic - especially if the investigation reveals potential reasons why that breakage would occur. Further investigation is precisely how we arrive at better solutions.
On 19 September 2012 17:29, Felipe Balbi <balbi@ti.com> wrote: > this is what I mean, actually. If we remove pm_runtime_get_sync() in > exchange for pm_runtime_set_active() before pm_runtime_enable(), it > works on PandaBoard, but breaks BeagleBoard. > Perhaps it suggests that OMAP4 (PandaBoard) serial port is already activated but OMAP3 (BeagleBoard) isn't. So we need to reflect that either by doing pm_runtime_set_active() only on OMAP4 or by bringing up the OMAP3 too before the pm_runtime_set_active() call. BTW I understand get_sync(), set_active() and enable() are for different purposes, they can't be traded for one another.
On Tue, Sep 25, 2012 at 01:52:03PM +0530, Poddar, Sourav wrote: > Hi Greg, > > Ping on this? It was a RFT patch, with a huge thread. What's the resolution here? Did people figure out the real problem here or not? If so, care to resend the proper patch so I know what to apply? thanks, greg k-h
"Poddar, Sourav" <sourav.poddar@ti.com> writes: > Hi, > > On Tue, Sep 25, 2012 at 2:51 PM, Russell King - ARM Linux > <linux@arm.linux.org.uk> wrote: >> On Tue, Sep 25, 2012 at 12:11:14PM +0300, Felipe Balbi wrote: >>> On Tue, Sep 25, 2012 at 10:12:28AM +0100, Russell King - ARM Linux wrote: >>> > On Tue, Sep 25, 2012 at 11:31:20AM +0300, Felipe Balbi wrote: >>> > > On Tue, Sep 25, 2012 at 09:30:29AM +0100, Russell King - ARM Linux wrote: >>> > > > How is this happening? I think that needs proper investigation - or if >>> > > > it's had more investigation, then the results needs to be included in >>> > > > the commit description so that everyone can understand the issue here. >>> > > > >>> > > > We should not be resuming a device which hasn't been suspended. Maybe >>> > > > the runtime PM enable sequence is wrong, and that's what should be fixed >>> > > > instead? >>> > > > >>> > > > This sequence in the probe() function: >>> > > > >>> > > > pm_runtime_irq_safe(&pdev->dev); >>> > > > pm_runtime_enable(&pdev->dev); >>> > > > pm_runtime_get_sync(&pdev->dev); >>> > > > >>> > > > would enable runtime PM while the s/w state indicates that it's disabled, >>> > > > and then that pm_runtime_get_sync() will want to resume the device. See >>> > > > the section "5. Runtime PM Initialization, Device Probing and Removal" >>> > > > in Documentation/power/runtime_pm.txt, specifically the second paragraph >>> > > > of that section. >>> > > >>> > > that was tested. It worked in pandaboard but didn't work on beagleboard >>> > > XM. Sourav tried to start a discussion about that, but it simply died... >>> > > >>> > > In any case, pm_runtime_get_sync() in probe will always call >>> > > runtime_resume callback, right ? >>> > >>> > Well, if the runtime PM state says it's suspended, and then you enable >>> > runtime PM, the first call to pm_runtime_get_sync() will trigger a resume >>> > attempt. The patch description is complaining about resume events without >>> > there being a preceding suspend event. >>> > >>> > This could well be why. >>> >>> that's most likely, of course. But should we cause a regression to >>> beagleboard XM because of that ? >> >> What would cause a regression on beagleboard XM? I have not suggested >> any change other than more investigation of the issue and a fuller patch >> description - yet you're screaming (idiotically IMHO) that mere >> investigation would break beagleboard. >> >> Well, if it's _that_ fragile, that mere investigation of this issue by >> someone elsewhere on the planet would break your beagleboard, maybe it >> deserves to be broken! > > The issue was observed at serial init itself in the N800 board and the > log does not > show up much. > http://www.pwsan.com/omap/testlogs/test_tty_next_e36851d0/20120910020323/boot/2420n800/2420n800_log.txt > What we thought the problem might be with n800 is that it tries to > resume when it didn't suspend before. > > There are two ways through which we thought of handling this issue: > > a) set device as active before enabling pm (which will prevent > > pm_runtime_set_active(dev); > pm_runtime_enable(dev); > > OR > > b) adding a "suspended" flag to struct omap_uart_port which gets set on > suspend and cleared on resume. Then on resume you can check: > > if (!up->suspended) > return 0; > > But using "pm_runtime_set_active" approach breaks things even on > beagle board xm, though > it works fine on Panda. > Therefore, we used the "suspended" flag approach. > > So. I just wanted to get some feedback from community about how using > "pm_runtime_set_active" > behaves differently in omap3 and omap4. As Russell has already pointed out in great detail, the difference is simply a mismatch between assumed HW stated and actual hardware state between various boards. Put simply, the driver assumes the HW is disabled (runtime suspended) when it loads, and the first runtime resume is meant to enable the HW. If that assumption is wrong, it needs to be fixed. Have you figured out why the HW is already active on OMAP2? (probably bootloader?) That being said, already active HW should not cause this problem. In fact, because of possible early console use, the hwmod init of the UART hwmods does not idle/reset them on boot, so they are left in the state that the bootloader set them in. When the hwmod is later enabled for real during probe, the hwmod muxing is done for that IP. So, I suspect what is really happening is that the mux settings are not right for the UARTS on n800, so when the probe happens, the UART mux settings are changed and you lose the UART. Can you double check the UART mux settings for that board? You might need some different mux settings in the board file. Kevin
Hi Sourav, Felipe, any progress on fixing the N800 problem? Would be good to keep it booting since we use it as our primary 2420 test platform. - Paul
diff --git a/drivers/tty/serial/omap-serial.c b/drivers/tty/serial/omap-serial.c index 3c05c5e..bc355f2 100644 --- a/drivers/tty/serial/omap-serial.c +++ b/drivers/tty/serial/omap-serial.c @@ -110,6 +110,7 @@ struct uart_omap_port { u32 calc_latency; struct work_struct qos_work; struct pinctrl *pins; + unsigned int suspended:1; }; #define to_uart_omap_port(p) ((container_of((p), struct uart_omap_port, port))) @@ -1545,14 +1546,20 @@ static int serial_omap_runtime_suspend(struct device *dev) up->latency = PM_QOS_CPU_DMA_LAT_DEFAULT_VALUE; schedule_work(&up->qos_work); + up->suspended = true; + return 0; } static int serial_omap_runtime_resume(struct device *dev) { struct uart_omap_port *up = dev_get_drvdata(dev); + u32 loss_cnt; + + if (!up->suspended) + return 0; - u32 loss_cnt = serial_omap_get_context_loss_count(up); + loss_cnt = serial_omap_get_context_loss_count(up); if (up->context_loss_cnt != loss_cnt) serial_omap_restore_context(up); @@ -1560,6 +1567,8 @@ static int serial_omap_runtime_resume(struct device *dev) up->latency = up->calc_latency; schedule_work(&up->qos_work); + up->suspended = false; + return 0; } #endif
Greg's tty-next is not booting on 2420 based N800. The failure is observed at serial init itself. The reason might be that n800 tries to resume even though it is not suspended before. Reported-by: Paul Walmsley <paul@pwsan.com> Signed-off-by: Sourav Poddar <sourav.poddar@ti.com> --- This patch is developed on top of greg's tty-next branch CommitId: e740d8f tty: serial: Samsung: Fix return value + the following patch which I have already posted to the mailing list. http://comments.gmane.org/gmane.linux.ports.arm.omap/84729 drivers/tty/serial/omap-serial.c | 11 ++++++++++- 1 files changed, 10 insertions(+), 1 deletions(-)