diff mbox

Strange OMAP3 LCD display regression - bisected.

Message ID 20130602165019.655da2eb@notabene.brown (mailing list archive)
State New, archived
Headers show

Commit Message

NeilBrown June 2, 2013, 6:50 a.m. UTC
Hi Tomi, Rajendra and other,
 I have run into a rather strange regression with the LCD panel display on my
 GTA04 (OMAP3630 based phone) that you might be able to help me with.

 I bisected down to two patches, neither of which cause the problem by
 themselves, but both together do.

 They are:

commit bd0f5cc3641cb76ae8fa2cc4924c29da157f8b2d
Author: Tomi Valkeinen <tomi.valkeinen@ti.com>
Date:   Thu Dec 13 14:21:30 2012 +0200

    OMAPDSS: fix TV-out issue with DSI PLL

and 

commit f7f73aab3d5bd19ba0f72ff00fb6b3be28e1a4d3
Author: Rajendra Nayak <rnayak@ti.com>
Date:   Fri Apr 27 16:27:51 2012 +0530

    ARM: OMAP: clock: Switch to COMMON clk


Yes - rather odd.

The details:
 I'm currently trying to move from a 3.7 kernel on my GTA04 to a 3.10-rc
 kernel (hopefully to have 3.10 fully working by the time it comes out..).

 Once I got it compiling and booting I found that while the display worked
 perfectly at boot, after it has been turned off (ioctl(FBIOBLANK)) and back
 on again, it is all white - no discernible image at all.
 A suspend/resume cycle at this point might bring it back, but it is often
 shimmery (low vsync?) and once it had inverse colours.

 I had noticed that the panel which normally gets a 22153 kHz pixel clock was
 only getting a 21600 kHz clock.  This turned out to be due to the fact
 that dpi_calc_dispc_cb rejects odd divisors, and it used to use a divisor of
 '3'.


 I disabled that code and then the screen would wake successfully resume from
 blank, but there was a strange artefact in the image.  My display has a grey
 background with some blue text.  On the lines that have text, the grey
 background had a very slight green tinge, especially when viewed at the
 right angle.

 I decided to try bisecting to find the cause of this, because I couldn't
 even begin to guess what it might be.  I found:

commit 2b8318881ddbcb67c5e8d2178b42284749442222
Merge: e81d372 e7f5c9a
Author: Linus Torvalds <torvalds@linux-foundation.org>
Date:   Sat Dec 15 13:03:48 2012 -0800

    Merge tag 'fbdev-for-3.8' of git://gitorious.org/linux-omap-dss2/linux
    
    Pull fbdev changes from Tomi Valkeinen:
     "OMAPDSS changes, including:


 yes - a merge commit. Neither branch by itself caused the problem but the two
 together do.
 I then decided "in for a penny, in for a pound" and proceeded to bisect both
 branches, merging in the other one at each test point.
 This lead me to the two commits listed above.

  bd0f5cc3641cb76 merged with e81d372 has the problem,
  bd0f5cc3641cb76~1 merged with e81d372 is fine.

 Similarly
  f7f73aab3d5bd19b merged with e7f5c9a has the problem, while
  f7f73aab3d5bd19b~1 merged with e7f5c9a is fine.

 I tried reverting the "OMAPDSS: fix TV-out issue with DSI PLL" patch from
 3.10-rc as below, and it seems to behave better, returning from blank
 properly. This is without disabling the "no odd divisors" code.
 I'm not really sure how it survives suspend, as suspend generally isn't
 reliable for me on 3.10-rc yet - haven't look into why as yet.

 Any help anyone could provide here would be greatly appreciated.  I'm happy
 to run any test patches you would like to suggest.

Thanks,
NeilBrown

Comments

Tomi Valkeinen June 3, 2013, 7:24 a.m. UTC | #1
Hi,

On 02/06/13 09:50, NeilBrown wrote:

> The details:
>  I'm currently trying to move from a 3.7 kernel on my GTA04 to a 3.10-rc
>  kernel (hopefully to have 3.10 fully working by the time it comes out..).

I presume the panel driver is not in the mainline? Is there anything
special with the panel, or is it just a "dummy" DPI panel that doesn't
need any kind of configuration?

>  Once I got it compiling and booting I found that while the display worked
>  perfectly at boot, after it has been turned off (ioctl(FBIOBLANK)) and back
>  on again, it is all white - no discernible image at all.

I'll try blanking on my omap3 board with 3.10-rc (I think I haven't
tried it). Did you check if the DSS hardware lost context (visible from
the PM counters)?

>  A suspend/resume cycle at this point might bring it back, but it is often
>  shimmery (low vsync?) and once it had inverse colours.
> 
>  I had noticed that the panel which normally gets a 22153 kHz pixel clock was
>  only getting a 21600 kHz clock.  This turned out to be due to the fact
>  that dpi_calc_dispc_cb rejects odd divisors, and it used to use a divisor of
>  '3'.

Normally panels should not be that picky. In my experience the pixel
clock has to be very far from the nominal, until the panel start to fail.

However, the code in dpi_calc_dispc_cb only rejects odd divisors, if the
pixel clock is greater than 100MHz. So I don't understand how you're
seeing it affect here. Are you sure the pclk is ~22MHz?

>  I tried reverting the "OMAPDSS: fix TV-out issue with DSI PLL" patch from
>  3.10-rc as below, and it seems to behave better, returning from blank
>  properly. This is without disabling the "no odd divisors" code.

Odd, indeed. Without reverting the patch, the DSS uses a clock from the
PRCM as func clock and for pixel clock. As the common clock framework is
somehow involved in the breakage, maybe (pure guess) something related
to the PRCM clock is configured wrong.

And with reverting the above patch, DSS uses DSI PLL for fclk and pclk,
and DSI PLL in turn only needs sysclk, so maybe the possible problem
with PRCM doesn't affect this case.

 Tomi
Rajendra Nayak June 3, 2013, 7:33 a.m. UTC | #2
Tomi,

> 
> Odd, indeed. Without reverting the patch, the DSS uses a clock from the
> PRCM as func clock and for pixel clock. As the common clock framework is
> somehow involved in the breakage, maybe (pure guess) something related
> to the PRCM clock is configured wrong.

So whats the specific PRCM clock thats used here? I can go check if there is
something different in the way its modeled with/without common clk.

regards,
Rajendra

> 
> And with reverting the above patch, DSS uses DSI PLL for fclk and pclk,
> and DSI PLL in turn only needs sysclk, so maybe the possible problem
> with PRCM doesn't affect this case.
> 
>  Tomi
> 
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-omap" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Tomi Valkeinen June 3, 2013, 7:39 a.m. UTC | #3
On 03/06/13 10:33, Rajendra Nayak wrote:
> Tomi,
> 
>>
>> Odd, indeed. Without reverting the patch, the DSS uses a clock from the
>> PRCM as func clock and for pixel clock. As the common clock framework is
>> somehow involved in the breakage, maybe (pure guess) something related
>> to the PRCM clock is configured wrong.
> 
> So whats the specific PRCM clock thats used here? I can go check if there is
> something different in the way its modeled with/without common clk.

I don't know which OMAP version we're talking about, but I guess it's
OMAP3xxx.

From the TRM, it's DPLL4_ALWON_FCLKOUTM4X2 for OMAP34xx and OMAP36xx.
Although that is a bit odd, as I think the x2 is only for OMAP34xx.
Maybe the clock name is wrong for OMAP36xx on my TRM...

 Tomi
Paul Walmsley June 3, 2013, 7:40 a.m. UTC | #4
Hi Neil,

On Sun, 2 Jun 2013, NeilBrown wrote:

>  Any help anyone could provide here would be greatly appreciated.  I'm happy
>  to run any test patches you would like to suggest.

Just for kicks, you might consider this patch:

http://marc.info/?l=linux-arm-kernel&m=136990713027683&w=2


- Paul
--
To unsubscribe from this list: send the line "unsubscribe linux-omap" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jean-Philippe François June 3, 2013, 8:13 a.m. UTC | #5
2013/6/2 NeilBrown <neilb@suse.de>:
>
> Hi Tomi, Rajendra and other,
>  I have run into a rather strange regression with the LCD panel display on my
>  GTA04 (OMAP3630 based phone) that you might be able to help me with.
>
>  I bisected down to two patches, neither of which cause the problem by
>  themselves, but both together do.
>
>  They are:
>
> commit bd0f5cc3641cb76ae8fa2cc4924c29da157f8b2d
> Author: Tomi Valkeinen <tomi.valkeinen@ti.com>
> Date:   Thu Dec 13 14:21:30 2012 +0200
>
>     OMAPDSS: fix TV-out issue with DSI PLL
>
> and
>
> commit f7f73aab3d5bd19ba0f72ff00fb6b3be28e1a4d3
> Author: Rajendra Nayak <rnayak@ti.com>
> Date:   Fri Apr 27 16:27:51 2012 +0530
>
>     ARM: OMAP: clock: Switch to COMMON clk
>
>
> Yes - rather odd.
>

Does this patch solve your issue ?

https://patchwork.kernel.org/patch/2634161/

Regards,
Jean-Philippe François

> The details:
>  I'm currently trying to move from a 3.7 kernel on my GTA04 to a 3.10-rc
>  kernel (hopefully to have 3.10 fully working by the time it comes out..).
>
>  Once I got it compiling and booting I found that while the display worked
>  perfectly at boot, after it has been turned off (ioctl(FBIOBLANK)) and back
>  on again, it is all white - no discernible image at all.
>  A suspend/resume cycle at this point might bring it back, but it is often
>  shimmery (low vsync?) and once it had inverse colours.
>
>  I had noticed that the panel which normally gets a 22153 kHz pixel clock was
>  only getting a 21600 kHz clock.  This turned out to be due to the fact
>  that dpi_calc_dispc_cb rejects odd divisors, and it used to use a divisor of
>  '3'.
>
>
>  I disabled that code and then the screen would wake successfully resume from
>  blank, but there was a strange artefact in the image.  My display has a grey
>  background with some blue text.  On the lines that have text, the grey
>  background had a very slight green tinge, especially when viewed at the
>  right angle.
>
>  I decided to try bisecting to find the cause of this, because I couldn't
>  even begin to guess what it might be.  I found:
>
> commit 2b8318881ddbcb67c5e8d2178b42284749442222
> Merge: e81d372 e7f5c9a
> Author: Linus Torvalds <torvalds@linux-foundation.org>
> Date:   Sat Dec 15 13:03:48 2012 -0800
>
>     Merge tag 'fbdev-for-3.8' of git://gitorious.org/linux-omap-dss2/linux
>
>     Pull fbdev changes from Tomi Valkeinen:
>      "OMAPDSS changes, including:
>
>
>  yes - a merge commit. Neither branch by itself caused the problem but the two
>  together do.
>  I then decided "in for a penny, in for a pound" and proceeded to bisect both
>  branches, merging in the other one at each test point.
>  This lead me to the two commits listed above.
>
>   bd0f5cc3641cb76 merged with e81d372 has the problem,
>   bd0f5cc3641cb76~1 merged with e81d372 is fine.
>
>  Similarly
>   f7f73aab3d5bd19b merged with e7f5c9a has the problem, while
>   f7f73aab3d5bd19b~1 merged with e7f5c9a is fine.
>
>  I tried reverting the "OMAPDSS: fix TV-out issue with DSI PLL" patch from
>  3.10-rc as below, and it seems to behave better, returning from blank
>  properly. This is without disabling the "no odd divisors" code.
>  I'm not really sure how it survives suspend, as suspend generally isn't
>  reliable for me on 3.10-rc yet - haven't look into why as yet.
>
>  Any help anyone could provide here would be greatly appreciated.  I'm happy
>  to run any test patches you would like to suggest.
>
> Thanks,
> NeilBrown
>
>
>
> diff --git a/drivers/video/omap2/dss/dpi.c b/drivers/video/omap2/dss/dpi.c
> index 757b57f..4d4f775 100644
> --- a/drivers/video/omap2/dss/dpi.c
> +++ b/drivers/video/omap2/dss/dpi.c
> @@ -62,7 +62,7 @@ static struct platform_device *dpi_get_dsidev(enum omap_channel channel)
>         case OMAPDSS_VER_OMAP34xx_ES3:
>         case OMAPDSS_VER_OMAP3630:
>         case OMAPDSS_VER_AM35xx:
> -               return NULL;
> +//             return NULL;
>
>         case OMAPDSS_VER_OMAP4430_ES1:
>         case OMAPDSS_VER_OMAP4430_ES2:
--
To unsubscribe from this list: send the line "unsubscribe linux-omap" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
NeilBrown June 3, 2013, 8:57 a.m. UTC | #6
On Mon, 3 Jun 2013 10:24:18 +0300 Tomi Valkeinen <tomi.valkeinen@ti.com>
wrote:

> Hi,
> 
> On 02/06/13 09:50, NeilBrown wrote:
> 
> > The details:
> >  I'm currently trying to move from a 3.7 kernel on my GTA04 to a 3.10-rc
> >  kernel (hopefully to have 3.10 fully working by the time it comes out..).
> 
> I presume the panel driver is not in the mainline? Is there anything
> special with the panel, or is it just a "dummy" DPI panel that doesn't
> need any kind of configuration?

Not in mainline.  If you are interested it is here:

http://git.neil.brown.name/?p=gta04.git;a=blob;f=drivers/video/omap2/displays/panel-tpo-td028ttec1.c;h=c127e47b88ff2822fc0f168ed6706e16d6115abf;hb=69c0e01e14bd9651a1768f8108d21d39334e724b

Needs lots of cleanup.  Pokes at a bunch of SPI registers which I might have
documentation for somewhere but haven't had time to look at it yet.


> 
> >  Once I got it compiling and booting I found that while the display worked
> >  perfectly at boot, after it has been turned off (ioctl(FBIOBLANK)) and back
> >  on again, it is all white - no discernible image at all.
> 
> I'll try blanking on my omap3 board with 3.10-rc (I think I haven't
> tried it). Did you check if the DSS hardware lost context (visible from
> the PM counters)?

I turned on some debugging statements and the mentioned lost-context counters
in a way that suggested they were being handled correctly (save and restore
reported appropriately matching numbers).

> 
> >  A suspend/resume cycle at this point might bring it back, but it is often
> >  shimmery (low vsync?) and once it had inverse colours.
> > 
> >  I had noticed that the panel which normally gets a 22153 kHz pixel clock was
> >  only getting a 21600 kHz clock.  This turned out to be due to the fact
> >  that dpi_calc_dispc_cb rejects odd divisors, and it used to use a divisor of
> >  '3'.
> 
> Normally panels should not be that picky. In my experience the pixel
> clock has to be very far from the nominal, until the panel start to fail.

As I said, it works fine at first boot.  So the panel obviously copes.  But
something weird happens at blank/unblack which is affect by the pixel clock
setting in a non-obvious way.

> 
> However, the code in dpi_calc_dispc_cb only rejects odd divisors, if the
> pixel clock is greater than 100MHz. So I don't understand how you're
> seeing it affect here. Are you sure the pclk is ~22MHz?

If think you got some maths wrong.
dpi_calc_dispc_cb() contains:
	if (ctx->pck_min >= 1000000) {

which isn't 100MHz, unless the units are 0.1KHz, which seems unlikely.
It looks to me like the units for pck_min is Hz, so the cut off is 1MHz.
So that should be:
	if (ctx->pck_min >= 100000000) {
??

> 
> >  I tried reverting the "OMAPDSS: fix TV-out issue with DSI PLL" patch from
> >  3.10-rc as below, and it seems to behave better, returning from blank
> >  properly. This is without disabling the "no odd divisors" code.
> 
> Odd, indeed. Without reverting the patch, the DSS uses a clock from the
> PRCM as func clock and for pixel clock. As the common clock framework is
> somehow involved in the breakage, maybe (pure guess) something related
> to the PRCM clock is configured wrong.
> 
> And with reverting the above patch, DSS uses DSI PLL for fclk and pclk,
> and DSI PLL in turn only needs sysclk, so maybe the possible problem
> with PRCM doesn't affect this case.
> 

I'll try the patch that Paul and Jean-Philippe suggested and report the
results.

Thanks

NeilBrown
Tomi Valkeinen June 3, 2013, 9:24 a.m. UTC | #7
On 03/06/13 11:57, NeilBrown wrote:
> On Mon, 3 Jun 2013 10:24:18 +0300 Tomi Valkeinen <tomi.valkeinen@ti.com>

>> I'll try blanking on my omap3 board with 3.10-rc (I think I haven't
>> tried it). Did you check if the DSS hardware lost context (visible from
>> the PM counters)?
> 
> I turned on some debugging statements and the mentioned lost-context counters
> in a way that suggested they were being handled correctly (save and restore
> reported appropriately matching numbers).

I was mostly interested in whether the DSS goes into OFF mode when you
blank the panel or not. If it does, and it doesn't work after unblank,
it could suggest that somehow the DSS registers are not restored correctly.

I did some testing on my Overo board (3430), and it works fine if I
blank the display, or suspend the device. However, I think I'm missing
something here, as the DSS domain just stays in ON-state, even if disabled.

>>>  A suspend/resume cycle at this point might bring it back, but it is often
>>>  shimmery (low vsync?) and once it had inverse colours.
>>>
>>>  I had noticed that the panel which normally gets a 22153 kHz pixel clock was
>>>  only getting a 21600 kHz clock.  This turned out to be due to the fact
>>>  that dpi_calc_dispc_cb rejects odd divisors, and it used to use a divisor of
>>>  '3'.
>>
>> Normally panels should not be that picky. In my experience the pixel
>> clock has to be very far from the nominal, until the panel start to fail.
> 
> As I said, it works fine at first boot.  So the panel obviously copes.  But
> something weird happens at blank/unblack which is affect by the pixel clock
> setting in a non-obvious way.

Right. You don't have the option to measure the pixel clock with an
oscilloscope?

>> However, the code in dpi_calc_dispc_cb only rejects odd divisors, if the
>> pixel clock is greater than 100MHz. So I don't understand how you're
>> seeing it affect here. Are you sure the pclk is ~22MHz?
> 
> If think you got some maths wrong.
> dpi_calc_dispc_cb() contains:
> 	if (ctx->pck_min >= 1000000) {
> 
> which isn't 100MHz, unless the units are 0.1KHz, which seems unlikely.
> It looks to me like the units for pck_min is Hz, so the cut off is 1MHz.
> So that should be:
> 	if (ctx->pck_min >= 100000000) {
> ??

Indeed, it's plain wrong. The original patch introducing that limit is
b7f1fe541b01f6edaff0a5dd48027de6ac711ab6 , from which I copied the limit
to the new clock calculation. Both are from me, both are wrong. Sigh.

Thanks for pointing this out, I'll send a fix.

But, as I said earlier, I doubt that it affects your case. Especially if
the panel works after boot.

 Tomi
NeilBrown June 3, 2013, 11:20 a.m. UTC | #8
On Mon, 3 Jun 2013 10:13:18 +0200 jean-philippe francois
<jp.francois@cynove.com> wrote:

> 2013/6/2 NeilBrown <neilb@suse.de>:
> >
> > Hi Tomi, Rajendra and other,
> >  I have run into a rather strange regression with the LCD panel display on my
> >  GTA04 (OMAP3630 based phone) that you might be able to help me with.
> >
> >  I bisected down to two patches, neither of which cause the problem by
> >  themselves, but both together do.
> >
> >  They are:
> >
> > commit bd0f5cc3641cb76ae8fa2cc4924c29da157f8b2d
> > Author: Tomi Valkeinen <tomi.valkeinen@ti.com>
> > Date:   Thu Dec 13 14:21:30 2012 +0200
> >
> >     OMAPDSS: fix TV-out issue with DSI PLL
> >
> > and
> >
> > commit f7f73aab3d5bd19ba0f72ff00fb6b3be28e1a4d3
> > Author: Rajendra Nayak <rnayak@ti.com>
> > Date:   Fri Apr 27 16:27:51 2012 +0530
> >
> >     ARM: OMAP: clock: Switch to COMMON clk
> >
> >
> > Yes - rather odd.
> >
> 
> Does this patch solve your issue ?
> 
> https://patchwork.kernel.org/patch/2634161/
> 

Yes.  Thank you and Paul for pointing me to this patch.  I apply it and the
strange artefact goes away.

Panel isn't 100% yet - sometimes uniform white again after resume from
suspend, and I see

[  202.881500] omapdss APPLY error: cannot set timings for lcd: manager needs to be disabled
[  202.881530] omapdss APPLY error: cannot apply lcd config for lcd: manager needs to be disabled

which I don't think I saw before, but I'm happy to track that down myself

So you can add a

   Tested-by: NeilBrown <neilb@suse.de>

to that patch if you like.

Thanks,
NeilBrown
Tomi Valkeinen June 3, 2013, 12:16 p.m. UTC | #9
On 03/06/13 14:20, NeilBrown wrote:

> Panel isn't 100% yet - sometimes uniform white again after resume from
> suspend, and I see
> 
> [  202.881500] omapdss APPLY error: cannot set timings for lcd: manager needs to be disabled
> [  202.881530] omapdss APPLY error: cannot apply lcd config for lcd: manager needs to be disabled
> 
> which I don't think I saw before, but I'm happy to track that down myself

One quite recent change was that omapdss only allows changing output
config, including video timings, when the output is disabled. Changing
timings when output is enabled did not work very reliably even before,
but now omapdss makes sure it's not done. There may be a bug somewhere
related to that.

You only see those when resuming?

 Tomi
diff mbox

Patch

diff --git a/drivers/video/omap2/dss/dpi.c b/drivers/video/omap2/dss/dpi.c
index 757b57f..4d4f775 100644
--- a/drivers/video/omap2/dss/dpi.c
+++ b/drivers/video/omap2/dss/dpi.c
@@ -62,7 +62,7 @@  static struct platform_device *dpi_get_dsidev(enum omap_channel channel)
 	case OMAPDSS_VER_OMAP34xx_ES3:
 	case OMAPDSS_VER_OMAP3630:
 	case OMAPDSS_VER_AM35xx:
-		return NULL;
+//		return NULL;
 
 	case OMAPDSS_VER_OMAP4430_ES1:
 	case OMAPDSS_VER_OMAP4430_ES2: