Unable to boot mainline on snow chromebook since 3.15

Message ID 540C202E.2060009@collabora.co.uk (mailing list archive)
State New, archived

Commit Message

Javier Martinez Canillas Sept. 7, 2014, 9:06 a.m. UTC
[adding Mark Brown to cc since regulators are involved]

Hello Will,

On 09/05/2014 10:25 PM, Doug Anderson wrote:
> Will,
> 
> On Fri, Sep 5, 2014 at 5:22 AM, Will Deacon <will.deacon@arm.com> wrote:
>> [Looks like it's not just Rutland that can't spell the address of the
>>  mailing list today. Fixed here, so please use this post in any replies].
>>
>> On Fri, Sep 05, 2014 at 12:57:04PM +0100, Will Deacon wrote:
>>> Hi all,
>>>
>>> I'm one of the few, foolish people to try running mainline on my 5250-based
>>> Samsung Chromebook (snow). I can live without wireless, usb3 and video
>>> acceleration, so actually it makes a reasonable development platform for
>>> doing A15-based (micro)-architectural work.
>>>
>>> However, since 3.15 I've not been able to boot *any* mainline kernels on
>>> this board. I did mean to report this earlier, but I have other machines
>>> that can run mainline so this has fallen by the wayside.
>>>
>>> The problems started with 3.16, where simple-fb would fail to initialise
>>> and I lost my display. Note that I don't have a serial console on this
>>> machine (I looked at the PCB and there's no way I can solder one of those
>>> myself :) I bisected the issue at the time, and I could get my display back
>>> by removing some of the new regulator and hdmi properties from the DT. At
>>> that point, I could boot, but DMA didn't initialise for the MMC controller
>>> so I couldn't mount my root filesystem.
>>>
>>> With 3.17-rc3, it seems a lot worse -- I don't get any output after nv-uboot
>>> (i.e. the nv-uboot screen just remains on the display, with the last line
>>> reading "Stashed 20 records").
>>>
>>> I'd usually try to debug this a bit further, but without a console it's
>>> really painful to get anywhere. I've been working with 3.15, but now I'm
>>> having to backport patches when I want to test them, which is more effort
>>> than I can be bothered with.
>>>
>>> Is anybody else running mainline on this device and are these known/fixed
>>> problems?
> 
> I've added Javier, who says he'll try to take a look at the problem on
> Monday.  He's got a snow and I think he's got a serial console hooked
> up to it (but I don't think he's tried the simplefb workflow).
> 

I'm back from holidays with access to the machine again so I was able to look
at your issue.

> 
> He also added the following thoughts:
> 
>> Have you seen the very long "[PATCH 4/4] simplefb: add clock handling
>> code" thread [0]?. I wonder if the problem is that the display clocks were
>> not known to the kernel before 3.15 but now are getting disabled and thus
>> the simplefb driver not working?
>>
>> So it is probably worth trying to pass clk_ignore_unused as a parameter on
>> the kernel command line.
>>
>> [0]: https://www.mail-archive.com/linux-sunxi@googlegroups.com/msg06623.html
> 

So my assumption was correct: the kernel disables the resources (clocks and
regulators) needed to keep the display working, and because simplefb expects
the display hardware to have already been initialized by the
bootloader/firmware, it simply fails.

You didn't face this issue up to 3.15 because the default bootargs set by
nv_uboot-snow already include the "clk_ignore_unused" parameter and the
kernel didn't know about the regulators, but the latter changed with commit:

b16be76 ("ARM: dts: add tps65090 power regulator for exynos5250-snow")

This was included in 3.16, so the mentioned commit is what "broke" your
workflow since now the kernel is aware of the tps65090 fet1 and fet6
regulators (used as supplies for the backlight and panel respectively) and
disables them because nothing uses them from a kernel POV.

You will have the same issue even with 3.15 if you don't pass the
"clk_ignore_unused" parameter to the kernel command line.

The sunxi folks faced the same issue and tried to solve it by making the
simplefb driver claim the needed resources, thus preventing the kernel from
disabling them as unused. But that spawned the very long thread [0] mentioned
above and I have zero interest in joining that discussion...

So, following is a workaround patch [1] that just forces the needed regulators
to be always-on but I don't think this is the proper solution. The right thing
to do IMHO is to use the needed Exynos DRM/KMS patches as Ajay mentioned
before since AFAIU the simplefb is only to have a frame-buffer console working
on platforms where a KMS/DRM driver is not available yet.
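
To make the behaviour described above concrete, here is a minimal userspace
sketch in C. It is not the regulator core's code: the struct, the function
and the third regulator name are hypothetical and only model the rule that a
regulator left on by the bootloader is switched off late in boot unless it is
marked always-on (the "regulator-always-on" DT property) or a consumer has
claimed it.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a regulator; not the kernel's struct. */
struct sketch_regulator {
	const char *name;
	bool enabled;    /* hardware state, e.g. left on by the bootloader */
	bool always_on;  /* models the "regulator-always-on" DT property */
	int use_count;   /* consumers that have claimed and enabled it */
};

/*
 * Models the late-boot cleanup pass: a regulator that is still on, has no
 * consumers and is not marked always-on gets switched off.
 * Returns the number of regulators that were disabled.
 */
size_t sketch_disable_unused(struct sketch_regulator *regs, size_t n)
{
	size_t disabled = 0;

	for (size_t i = 0; i < n; i++) {
		struct sketch_regulator *r = &regs[i];

		if (!r->enabled || r->always_on || r->use_count > 0)
			continue;

		r->enabled = false;
		disabled++;
	}

	return disabled;
}
```

In this model simplefb never raises use_count, so fet1 and fet6 survive only
if they are marked always-on, which is what the workaround patch does.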

But maybe we could add a boot argument similar to "clk_ignore_unused" but for
regulators? Something like "regulator_ignore_unused" that would prevent the
regulator core from disabling unused regulators? If Mark agrees with that idea
I'll be glad to propose a patch.

Best regards,
Javier

[0]: https://www.mail-archive.com/linux-sunxi@googlegroups.com/msg06623.html
[1]:
From bdbb3bc1d69c10dce58affe74e6b64636f7810b5 Mon Sep 17 00:00:00 2001
From: Javier Martinez Canillas <javier.martinez@collabora.co.uk>
Date: Sun, 7 Sep 2014 10:58:29 +0200
Subject: [PATCH 1/1] ARM: dts: prevent exynos5250-snow display regulators to
 be disabled

The tps65090 fet1 and fet6 regulators are used in the Exynos5250
based Snow as power supplies for the PWM backlight and LCD panel
respectively. The bootloader enables those regulators in order to
have display working but the kernel still doesn't support the eDP
to LVDS bridge used in the Snow and the display device nodes are
not present in the Device Tree so these regulators are disabled.

The problem is that the simple frame-buffer driver can't be used
anymore since this driver assumes that the display hardware has
been initialized before the kernel boots and does not claim any
needed resources (regulators, clocks, etc), so the subsystems
disable them if they are not used.

This used to work before just because support for the tps65090
PMU was not present in the DT so the kernel didn't know about it.

Signed-off-by: Javier Martinez Canillas <javier.martinez@collabora.co.uk>
---
 arch/arm/boot/dts/exynos5250-snow.dts | 2 ++
 1 file changed, 2 insertions(+)

Comments

Mark Brown Sept. 7, 2014, 3:01 p.m. UTC | #1
On Sun, Sep 07, 2014 at 11:06:54AM +0200, Javier Martinez Canillas wrote:

> But maybe we could add a boot argument similar to "clk_ignore_unused" but for
> regulators? Something like "regulator_ignore_unused" that would prevent the
> regulator core from disabling unused regulators? If Mark agrees with that idea
> I'll be glad to propose a patch.

I'm not all that sympathetic to the idea; we already have quite enough
quality problems with the way people hook up regulators without
providing yet another way for them to hack around things, I'm concerned
it'll just make things more fragile as people require magic command line
arguments to get things working.
Javier Martinez Canillas Sept. 7, 2014, 3:51 p.m. UTC | #2
Hello Mark,

On 09/07/2014 05:01 PM, Mark Brown wrote:
> On Sun, Sep 07, 2014 at 11:06:54AM +0200, Javier Martinez Canillas wrote:
> 
>> But maybe we could add a boot argument similar to "clk_ignore_unused" but for
>> regulators? Something like "regulator_ignore_unused" that would prevent the
>> regulator core from disabling unused regulators? If Mark agrees with that idea
>> I'll be glad to propose a patch.
> 
> I'm not all that sympathetic to the idea; we already have quite enough
> quality problems with the way people hook up regulators without
> providing yet another way for them to hack around things, I'm concerned
> it'll just make things more fragile as people require magic command line
> arguments to get things working.
> 

I understand your position and I fully agree; I was just thinking aloud.

It seems the simplefb approach is somewhat fragile since the driver relies on
the bootloader to correctly set up the display hardware and its needed
resources (clocks, regulators, etc) but also relies on the kernel not
disabling those resources even when they are unused from its point of view.

So, the best option for Will is to just use Ajay's proposed in-flight Exynos
DRM patches or if he really wants to have simplefb working then he can carry
the patch I shared to force tps65090 fet1 and fet6 regulators to be always on.

Best regards,
Javier
Tomasz Figa Sept. 7, 2014, 3:52 p.m. UTC | #3
>>> Have you seen the very long "[PATCH 4/4] simplefb: add clock handling
>>> code" thread [0]?. I wonder if the problem is that the display clocks were
>>> not known to the kernel before 3.15 but now are getting disabled and thus
>>> the simplefb driver not working?
>>>
>>> So it is probably worth trying to pass clk_ignore_unused as a parameter on
>>> the kernel command line.
>>>
>>> [0]: https://www.mail-archive.com/linux-sunxi@googlegroups.com/msg06623.html
>>
> 
> So my assumption was correct and the issue is that the kernel disables the
> resources (clocks and regulators) needed to have display working and because
> the simplefb expects the display hardware to have been already initialized by
> the bootloader/firmware, it simply fails.
> 
> You didn't face this issue up to 3.15 because the default bootargs set by
> nv_uboot-snow already include the "clk_ignore_unused" parameter and the
> kernel didn't know about the regulators, but the latter changed with commit:
> 
> b16be76 ("ARM: dts: add tps65090 power regulator for exynos5250-snow")
> 
> This was included in 3.16, so the mentioned commit is what "broke" your
> workflow since now the kernel is aware of the tps65090 fet1 and fet6
> regulators (used as supplies for the backlight and panel respectively) and
> disables them because nothing uses them from a kernel POV.

So I believe we've got a process issue here. If you don't have normal
support for display hardware, but you want to keep the display
operational thanks to bootloader already initializing it, you should not
add anything to the kernel which breaks it, until full support comes in.

This means that respective regulators should be either always-on or not
listed at all (I'd favor the former) and respective clocks either
somehow enabled at boot-up or completely ignored, including all their
parents capable of being gated.

Now with regulators this is pretty straightforward, but with clocks I
believe it's an open issue. AFAIR we've discussed this on MLs some time
ago (at least I remember Doug commenting on that topic) and kind of
concluded that SoC clock drivers could include lists of clocks to be
enabled at boot-up (as a HACK to enable things like simplefb until
proper support for the respective features is added).

I believe this would be the proper solution for $subject.

Best regards,
Tomasz
Javier Martinez Canillas Sept. 7, 2014, 4:12 p.m. UTC | #4
Hello Tomasz,

On 09/07/2014 05:52 PM, Tomasz Figa wrote:
> 
> So I believe we've got a process issue here. If you don't have normal
> support for display hardware, but you want to keep the display
> operational thanks to bootloader already initializing it, you should not
> add anything to the kernel which breaks it, until full support comes in.
> 
> This means that respective regulators should be either always-on or not
> listed at all (I'd favor the former)

So does that mean you think the workaround patch I shared in the
previous email could be considered a correct solution? In that case I can
post it as a proper patch.

> somehow enabled at boot-up or completely ignored, including all their
> parents capable of being gated.
>

AFAIU from the thread I mentioned before, Nvidia folks proposed the same to
fix the simplefb issue on sunxi, to avoid the clocks in question being turned
off at boot by modifying the sunxi clock driver.

> Now with regulators this is pretty straightforward, but with clocks I
> believe it's an open issue. AFAIR we've discussed this on MLs some time
> ago (at least I remember Doug commenting on that topic) and kind of
> concluded that SoC clock drivers could include lists of clocks to be
> enabled at boot-up (as a HACK to enable things like simplefb until
> proper support for respective features are added).
>
> I believe this would be the proper solution for $subject.
> 

Clocks are not an issue, at least on this machine, since the bootloader already
passes the clk_ignore_unused parameter on the kernel command line, so in that
sense there isn't a regression compared with older kernels. If possible I
would prefer to leave it this way instead of adding quirks to the clock driver,
especially since there are proposed patches to get the display working using
the Exynos DRM driver on this machine.

> Best regards,
> Tomasz
> 

Best regards,
Javier
Tomasz Figa Sept. 7, 2014, 4:19 p.m. UTC | #5
On 07.09.2014 18:12, Javier Martinez Canillas wrote:
> Hello Tomasz,
> 
> On 09/07/2014 05:52 PM, Tomasz Figa wrote:
>>
>> So I believe we've got a process issue here. If you don't have normal
>> support for display hardware, but you want to keep the display
>> operational thanks to bootloader already initializing it, you should not
>> add anything to the kernel which breaks it, until full support comes in.
>>
>> This means that respective regulators should be either always-on or not
>> listed at all (I'd favor the former)
> 
> So does that mean you think the workaround patch I shared in the
> previous email could be considered a correct solution? In that case I can
> post it as a proper patch.

Right.

> 
>> somehow enabled at boot-up or completely ignored, including all their
>> parents capable of being gated.
>>
> 
> AFAIU from the thread I mentioned before, Nvidia folks proposed the same to
> fix the simplefb issue on sunxi, to avoid the clocks in question being turned
> off at boot by modifying the sunxi clock driver.

OK.

> 
>> Now with regulators this is pretty straightforward, but with clocks I
>> believe it's an open issue. AFAIR we've discussed this on MLs some time
>> ago (at least I remember Doug commenting on that topic) and kind of
>> concluded that SoC clock drivers could include lists of clocks to be
>> enabled at boot-up (as a HACK to enable things like simplefb until
>> proper support for respective features are added).
>>
>> I believe this would be the proper solution for $subject.
>>
> 
> Clocks are not an issue, at least on this machine, since the bootloader already
> passes the clk_ignore_unused parameter on the kernel command line, so in that
> sense there isn't a regression compared with older kernels. If possible I
> would prefer to leave it this way instead of adding quirks to the clock driver,
> especially since there are proposed patches to get the display working using
> the Exynos DRM driver on this machine.

Well, clk_ignore_unused seems a bit too coarse grained to me. Also
forcing the user to add it in his bootloader (or any other way) is not
really the best practice IMHO.

At least for next 3.17-rc I'd suggest fixing this up in respective clock
driver and dropping the hack only after Exynos DRM patches are merged
and confirmed working.

Best regards,
Tomasz
Javier Martinez Canillas Sept. 7, 2014, 4:40 p.m. UTC | #6
Hello Tomasz,

On 09/07/2014 06:19 PM, Tomasz Figa wrote:
> On 07.09.2014 18:12, Javier Martinez Canillas wrote:
>> Clocks are not an issue, at least on this machine, since the bootloader already
>> passes the clk_ignore_unused parameter on the kernel command line, so in that
>> sense there isn't a regression compared with older kernels. If possible I
>> would prefer to leave it this way instead of adding quirks to the clock driver,
>> especially since there are proposed patches to get the display working using
>> the Exynos DRM driver on this machine.
> 
> Well, clk_ignore_unused seems a bit too coarse grained to me. Also
> forcing the user to add it in his bootloader (or any other way) is not
> really the best practice IMHO.
>

Fair enough.

> At least for next 3.17-rc I'd suggest fixing this up in respective clock
> driver and dropping the hack only after Exynos DRM patches are merged
> and confirmed working.
> 

Ok, I'll prepare a patch to add the CLK_IGNORE_UNUSED flag to the needed
clocks in drivers/clk/samsung/clk-exynos5250.c. That will be a more
fine-grained solution since the clk_ignore_unused kernel parameter won't be
needed.
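
The difference between the global parameter and a per-clock flag can be
modelled with a small userspace sketch in C. This is not the common clock
framework's code: the struct, the function and SKETCH_CLK_IGNORE_UNUSED are
hypothetical stand-ins for the clk_ignore_unused boot parameter and the
CLK_IGNORE_UNUSED flag, and the clock names used with it are made up.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bit for this sketch; the kernel defines its own value. */
#define SKETCH_CLK_IGNORE_UNUSED (1u << 0)

/* Hypothetical, simplified stand-in for a gate clock. */
struct sketch_clk {
	const char *name;
	bool enabled;            /* hardware state left by the bootloader */
	unsigned int flags;      /* per-clock flags set by the clock driver */
	int enable_count;        /* number of active users in the kernel */
};

/*
 * Models the late-boot pass that gates unused clocks.
 * ignore_unused_param stands for "clk_ignore_unused" on the command line:
 * when set, nothing is gated at all (coarse grained).
 * A per-clock SKETCH_CLK_IGNORE_UNUSED flag spares only that clock.
 * Returns the number of clocks that were gated.
 */
size_t sketch_clk_disable_unused(struct sketch_clk *clks, size_t n,
				 bool ignore_unused_param)
{
	size_t gated = 0;

	if (ignore_unused_param)
		return 0;

	for (size_t i = 0; i < n; i++) {
		struct sketch_clk *c = &clks[i];

		if (!c->enabled || c->enable_count > 0)
			continue;
		if (c->flags & SKETCH_CLK_IGNORE_UNUSED)
			continue;

		c->enabled = false;
		gated++;
	}

	return gated;
}
```

With the flag set only on the display clocks, every other unused clock is
still gated, which is the fine-grained behaviour the global parameter cannot
provide.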

> Best regards,
> Tomasz
> 

Best regards,
Javier
Doug Anderson Sept. 8, 2014, 4:36 a.m. UTC | #7
Hi,

On Sun, Sep 7, 2014 at 8:52 AM, Tomasz Figa <tomasz.figa@gmail.com> wrote:
> So I believe we've got a process issue here. If you don't have normal
> support for display hardware, but you want to keep the display
> operational thanks to bootloader already initializing it, you should not
> add anything to the kernel which breaks it, until full support comes in.
>
> This means that respective regulators should be either always-on or not
> listed at all (I'd favor the former) and respective clocks either
> somehow enabled at boot-up or completely ignored, including all their
> parents capable of being gated.

It seems slightly broken to hack the device tree in this way.  I'll be
the first to admit that I often list regulators as "always-on" during
bringup when not everything is done, and I guess it's not that
different.  ...but given everything going on upstream (and people
working on Suspend/Resume, DRM, etc) it seems like it might be a bit
of a pain.  ...but if that's what everyone agrees on, I won't disagree
too strongly.

One (ugly?) solution would be to add a feature to your bootloader to
modify the device tree to mark regulators as "always-on".  Since the
bootloader gets to touch the device tree and the bootloader is involved
in communicating info about SimpleFB, it kinda makes sense.
Doug Anderson Sept. 8, 2014, 4:43 a.m. UTC | #8
Hi,

On Sun, Sep 7, 2014 at 8:52 AM, Tomasz Figa <tomasz.figa@gmail.com> wrote:
> Now with regulators this is pretty straightforward, but with clocks I
> believe it's an open issue. AFAIR we've discussed this on MLs some time
> ago (at least I remember Doug commenting on that topic) and kind of
> concluded that SoC clock drivers could include lists of clocks to be
> enabled at boot-up (as a HACK to enable things like simplefb until
> proper support for respective features are added).

I think my old problem was with earlyprintk and a core clock getting
disabled.  See (44ff025 clk: exynos5420: Remove aclk66_peric from the
clock tree description).  I think I've seen others solve the same
problem with the concept of "critical clocks".

I agree that regulator and clock frameworks allow very different "hacks".  ;)

-Doug
Javier Martinez Canillas Sept. 8, 2014, 6:09 a.m. UTC | #9
Hello Doug,

On 09/08/2014 06:36 AM, Doug Anderson wrote:
> 
> One (ugly?) solution would be to add a feature to your bootloader to
> modify the device tree to mark regulators as "always-on".  Since the
> bootloader gets to touch the device tree and the bootloader is involved
> in communicating info about SimpleFB, it kinda makes sense.
> 

I can't say I like marking the regulators as always-on in the DT, and that's
why I copied the patch into the response instead of posting it as a proper
patch, but I think relying on the bootloader to modify the DT is no better.

IMHO U-Boot should only modify what is strictly necessary, like the /chosen
node, even though lately I've seen some attempts in the OMAP community to
(ab)use U-Boot's fdt command to mangle the DT before passing it to the kernel
in order to support different BeagleBone Black capes.

Best regards,
Javier
Mark Brown Sept. 8, 2014, 10:20 a.m. UTC | #10
On Sun, Sep 07, 2014 at 09:36:56PM -0700, Doug Anderson wrote:
> On Sun, Sep 7, 2014 at 8:52 AM, Tomasz Figa <tomasz.figa@gmail.com> wrote:

> > So I believe we've got a process issue here. If you don't have normal
> > support for display hardware, but you want to keep the display
> > operational thanks to bootloader already initializing it, you should not
> > add anything to the kernel which breaks it, until full support comes in.

> > This means that respective regulators should be either always-on or not
> > listed at all (I'd favor the former) and respective clocks either
> > somehow enabled at boot-up or completely ignored, including all their
> > parents capable of being gated.

> It seems slightly broken to hack the device tree in this way.  I'll be
> the first to admit that I often list regulators as "always-on" during
> bringup when not everything is done, and I guess it's not that
> different.  ...but given everything going on upstream (and people
> working on Suspend/Resume, DRM, etc) it seems like it might be a bit
> of a pain.  ...but if that's what everyone agrees on, I won't disagree
> too strongly.

> One (ugly?) solution would be to add a feature to your bootloader to
> modify the device tree to mark regulators as "always-on".  Since the
> bootloader gets to touch the device tree and the bootloader is involved
> in communicating info about SimpleFB, it kinda makes sense.

That would seem to make sense, yes - we're apparently communicating this
as a virtual device so we should make sure that virtual device has the
resources it needs either directly or by reference to other devices so
the driver can keep them on.  Ideally we'd be doing this with fallback
compatibles or something but this will probably work OK.

I'd expect we're also going to run into the same problems with what
people are currently doing with the SoC power domains, we also have code
to power them off when they're idle, and this whole performance with
adding hacks isn't going to be robust or scale - it's essentially just
praying that nothing turns off the resources we need as far as I can
tell.
Will Deacon Sept. 8, 2014, 11:21 a.m. UTC | #11
On Sun, Sep 07, 2014 at 05:19:03PM +0100, Tomasz Figa wrote:
> At least for next 3.17-rc I'd suggest fixing this up in respective clock
> driver and dropping the hack only after Exynos DRM patches are merged
> and confirmed working.

Whilst I'm sympathetic to people working to enable DRM, I think this is
the right solution to the problem. The transition from simplefb to DRM
shouldn't break display for a bunch of kernel revisions whilst the code is
in flux.

Will
Javier Martinez Canillas Sept. 8, 2014, 11:55 a.m. UTC | #12
Hello Will,

On 09/08/2014 01:21 PM, Will Deacon wrote:
> On Sun, Sep 07, 2014 at 05:19:03PM +0100, Tomasz Figa wrote:
>> At least for next 3.17-rc I'd suggest fixing this up in respective clock
>> driver and dropping the hack only after Exynos DRM patches are merged
>> and confirmed working.
> 
> Whilst I'm sympathetic to people working to enable DRM, I think this is
> the right solution to the problem. The transition from simplefb to DRM
> shouldn't break display for a bunch of kernel revisions whilst the code is
> in flux.
> 

Agreed, I'm preparing a patch-set to prevent the kernel from disabling both the
clocks and regulators needed for the display. I'll try to post it today or
tomorrow at worst.

> Will
> 

Best regards,
Javier
Grant Likely Sept. 8, 2014, 12:20 p.m. UTC | #13
On Mon, Sep 8, 2014 at 12:21 PM, Will Deacon <will.deacon@arm.com> wrote:
> On Sun, Sep 07, 2014 at 05:19:03PM +0100, Tomasz Figa wrote:
>> At least for next 3.17-rc I'd suggest fixing this up in respective clock
>> driver and dropping the hack only after Exynos DRM patches are merged
>> and confirmed working.
>
> Whilst I'm sympathetic to people working to enable DRM, I think this is
> the right solution to the problem. The transition from simplefb to DRM
> shouldn't break display for a bunch of kernel revisions whilst the code is
> in flux.

I would go further. The kernel behaviour has changed, and we have to
deal with platforms that assume the old behaviour. That means either
defaulting to leaving enabled regulators/clocks alone unless there is
a flag in the DT saying they can be power managed, or black listing
platforms that are known to depend on the regulator being on.

Updating the device tree must not be required to get the kernel to
boot, but it is valid to require a DT upgrade to get better
performance (battery life) out of the platform.

g.
Will Deacon Sept. 8, 2014, 12:46 p.m. UTC | #14
On Mon, Sep 08, 2014 at 12:55:29PM +0100, Javier Martinez Canillas wrote:
> On 09/08/2014 01:21 PM, Will Deacon wrote:
> > On Sun, Sep 07, 2014 at 05:19:03PM +0100, Tomasz Figa wrote:
> >> At least for next 3.17-rc I'd suggest fixing this up in respective clock
> >> driver and dropping the hack only after Exynos DRM patches are merged
> >> and confirmed working.
> > 
> > Whilst I'm sympathetic to people working to enable DRM, I think this is
> > the right solution to the problem. The transition from simplefb to DRM
> > shouldn't break display for a bunch of kernel revisions whilst the code is
> > in flux.
> > 
> 
> Agreed, I'm preparing a patch-set to prevent the kernel from disabling both the
> clocks and regulators needed for the display. I'll try to post it today or
> tomorrow at worst.

Thanks, Javier. Please CC me on the patch and I'll take it for a spin.

Will
Mark Brown Sept. 8, 2014, 1:49 p.m. UTC | #15
On Mon, Sep 08, 2014 at 01:20:11PM +0100, Grant Likely wrote:
> On Mon, Sep 8, 2014 at 12:21 PM, Will Deacon <will.deacon@arm.com> wrote:

> > Whilst I'm sympathetic to people working to enable DRM, I think this is
> > the right solution to the problem. The transition from simplefb to DRM
> > shouldn't break display for a bunch of kernel revisions whilst the code is
> > in flux.

> I would go further. The kernel behaviour has changed, and we have to
> deal with platforms that assume the old behaviour. That means either
> defaulting to leaving enabled regulators/clocks alone unless there is
> a flag in the DT saying they can be power managed, or black listing
> platforms that are known to depend on the regulator being on.

For regulators there is essentially a flag in DT already - the
regulators should not be described in DT if the OS isn't supposed to be
managing them.

> Updating the device tree must not be required to get the kernel to
> boot, but it is valid to require a DT upgrade to get better
> performance (battery life) out of the platform.

This has got to be a blacklist then, and it seems like we've got to fix
simplefb to actually support managing the resources it's using.  The
current plan does not seem at all sensible - we're talking about adding
hacks in every subsystem that provides resources and bodging DTs in
order to work around simplefb.
Javier Martinez Canillas Sept. 8, 2014, 2:05 p.m. UTC | #16
Hello Will,

On 09/08/2014 03:49 PM, Mark Brown wrote:
> On Mon, Sep 08, 2014 at 01:20:11PM +0100, Grant Likely wrote:
>> On Mon, Sep 8, 2014 at 12:21 PM, Will Deacon <will.deacon@arm.com> wrote:
> 
>> > Whilst I'm sympathetic to people working to enable DRM, I think this is
>> > the right solution to the problem. The transition from simplefb to DRM
>> > shouldn't break display for a bunch of kernel revisions whilst the code is
>> > in flux.
> 
>> I would go further. The kernel behaviour has changed, and we have to
>> deal with platforms that assume the old behaviour. That means either
>> defaulting to leaving enabled regulators/clocks alone unless there is
>> a flag in the DT saying they can be power managed, or black listing
>> platforms that are known to depend on the regulator being on.
> 
> For regulators there is essentially a flag in DT already - the
> regulators should not be described in DT if the OS isn't supposed to be
> managing them.
> 
>> Updating the device tree must not be required to get the kernel to
>> boot, but it is valid to require a DT upgrade to get better
>> performance (battery life) out of the platform.
> 
> This has got to be a blacklist then, and it seems like we've got to fix
> simplefb to actually support managing the resources it's using.  The
> current plan does not seem at all sensible - we're talking about adding
> hacks in every subsystem that provides resources and bodging DTs in
> order to work around simplefb.
> 

Since many folks don't agree that hacking different subsystems is the way
forward, I'll hold the patches and not post them. The sunxi thread [0]
already shows how strongly people disagree on the correct approach to
handle this.

For now you can just disable the tps65090 PMIC support by not enabling the
CONFIG_REGULATOR_TPS65090 kconfig symbol in your kernel config. That will give
you exactly the same behavior as before tps65090 support was added to the
Snow DT in commit b16be76 ("ARM: dts: add tps65090 power regulator for
exynos5250-snow"), which AFAIU was good enough for your workflow.

Best regards,
Javier

[0]: https://www.mail-archive.com/linux-sunxi@googlegroups.com/msg06623.html
Doug Anderson Sept. 8, 2014, 3:55 p.m. UTC | #17
Javier,

On Sun, Sep 7, 2014 at 11:09 PM, Javier Martinez Canillas
<javier.martinez@collabora.co.uk> wrote:
> Hello Doug,
>
> On 09/08/2014 06:36 AM, Doug Anderson wrote:
>>
>> One (ugly?) solution would be to add a feature to your bootloader to
>> modify the device tree to mark regulators as "always-on".  Since the
>> bootloader gets to touch the device tree and the bootloader is involved
>> in communicating info about SimpleFB, it kinda makes sense.
>>
>
> I can't say I like marking the regulators as always-on in the DT, and that's
> why I copied the patch into the response instead of posting it as a proper
> patch, but I think relying on the bootloader to modify the DT is no better.
>
> IMHO U-Boot should only modify what is strictly necessary, like the /chosen
> node, even though lately I've seen some attempts in the OMAP community to
> (ab)use U-Boot's fdt command to mangle the DT before passing it to the kernel
> in order to support different BeagleBone Black capes.

So "simple-framebuffer" is added to the device tree here:

https://chromium-review.googlesource.com/#/c/49358/2/board/samsung/smdk5250/smdk5250.c

That's one of the two patches to build your own U-Boot for enabling
simplefb.  You'll notice that's not a super official thing.  It's a
"DO NOT SUBMIT" patch sitting in a gerrit code review server, so I
wouldn't exactly call it a stable ABI that we can't break.  It's not
something shipping in real products and it's not even landed in a git
tree (I suppose maybe someone somewhere landed it, but...).

To me, that means that if someone is using that patch and it works for
them, then that's great!  If it stops working (possibly because it was
making assumptions about the state of the kernel) then it should be
fixed up.

In this case, that patch really should be adding references to
regulators (and possibly clocks) that are needed.  Given that this
patch is already reaching into the device tree to add the
"simple-framebuffer" node, it doesn't seem unreasonable to say that it
should be grabbing the proper references (or mark regulators as
always-on).


...as always, though, remember that my opinion doesn't count for much.
I also sympathize with the problems people are running into.  :(
Doug Anderson Sept. 8, 2014, 3:58 p.m. UTC | #18
Grant,

On Mon, Sep 8, 2014 at 5:20 AM, Grant Likely <grant.likely@secretlab.ca> wrote:
> On Mon, Sep 8, 2014 at 12:21 PM, Will Deacon <will.deacon@arm.com> wrote:
>> On Sun, Sep 07, 2014 at 05:19:03PM +0100, Tomasz Figa wrote:
>>> At least for next 3.17-rc I'd suggest fixing this up in respective clock
>>> driver and dropping the hack only after Exynos DRM patches are merged
>>> and confirmed working.
>>
>> Whilst I'm sympathetic to people working to enable DRM, I think this is
>> the right solution to the problem. The transition from simplefb to DRM
>> shouldn't break display for a bunch of kernel revisions whilst the code is
>> in flux.
>
> I would go further. The kernel behaviour has changed, and we have to
> deal with platforms that assume the old behaviour. That means either
> defaulting to leaving enabled regulators/clocks alone unless there is
> a flag in the DT saying they can be power managed, or black listing
> platforms that are known to depend on the regulator being on.
>
> Updating the device tree must not be required to get the kernel to
> boot, but it is valid to require a DT upgrade to get better
> performance (battery life) out of the platform.

In this case people using Simple FB are not really using an officially
sanctioned device tree.  The simple-fb fragment is created on the fly
via a "DO NOT SUBMIT" patch sitting on a code review server.  It's not
something that's shipped with real firmware nor is it something
present in the kernel.  See
<https://chromium-review.googlesource.com/#/c/49358/2/board/samsung/smdk5250/smdk5250.c>
as I mentioned above.

Is this really a device tree that we need to guarantee backward
compatibility with?


-Doug
Will Deacon Sept. 8, 2014, 4:07 p.m. UTC | #19
On Mon, Sep 08, 2014 at 04:55:31PM +0100, Doug Anderson wrote:
> So "simple-framebuffer" is added to the device tree here:
> 
> https://chromium-review.googlesource.com/#/c/49358/2/board/samsung/smdk5250/smdk5250.c
> 
> That's one of the two patches to build your own U-Boot for enabling
> simplefb.  You'll notice that's not a super official thing.  It's a
> "DO NOT SUBMIT" patch sitting in a gerrit code review server, so I
> wouldn't exactly call it a stable ABI that we can't break.  It's not
> something shipping in real products and it's not even landed in a git
> tree (I suppose maybe someone somewhere landed it, but...).

I just took the uboot image linked to from the chromium.org page here:

  http://www.chromium.org/chromium-os/u-boot-porting-guide/using-nv-u-boot-on-the-samsung-arm-chromebook#TOC-Getting-nv-U-Boot

Will
Doug Anderson Sept. 8, 2014, 4:12 p.m. UTC | #20
Will,

On Mon, Sep 8, 2014 at 9:07 AM, Will Deacon <will.deacon@arm.com> wrote:
> On Mon, Sep 08, 2014 at 04:55:31PM +0100, Doug Anderson wrote:
>> So "simple-framebuffer" is added to the device tree here:
>>
>> https://chromium-review.googlesource.com/#/c/49358/2/board/samsung/smdk5250/smdk5250.c
>>
>> That's one of the two patches to build your own U-Boot for enabling
>> simplefb.  You'll notice that's not a super official thing.  It's a
>> "DO NOT SUBMIT" patch sitting in a gerrit code review server, so I
>> wouldn't exactly call it a stable ABI that we can't break.  It's not
>> something shipping in real products and it's not even landed in a git
>> tree (I suppose maybe someone somewhere landed it, but...).
>
> I just took the uboot image linked to from the chromium.org page here:
>
>   http://www.chromium.org/chromium-os/u-boot-porting-guide/using-nv-u-boot-on-the-samsung-arm-chromebook#TOC-Getting-nv-U-Boot

Ah, OK.  It's still using the "DO NOT SUBMIT" patches, but I guess
given that there is a binary built up there by a fairly official
source...  Hrmm.  I think Olof is the one that built that.  Perhaps
he'd be willing to muck with that and see if he can grab the
regulators?

-Doug
Grant Likely Sept. 8, 2014, 7:40 p.m. UTC | #21
On Mon, Sep 8, 2014 at 4:58 PM, Doug Anderson <dianders@chromium.org> wrote:
> Grant,
>
> On Mon, Sep 8, 2014 at 5:20 AM, Grant Likely <grant.likely@secretlab.ca> wrote:
>> On Mon, Sep 8, 2014 at 12:21 PM, Will Deacon <will.deacon@arm.com> wrote:
>>> On Sun, Sep 07, 2014 at 05:19:03PM +0100, Tomasz Figa wrote:
>>>> At least for next 3.17-rc I'd suggest fixing this up in respective clock
>>>> driver and dropping the hack only after Exynos DRM patches are merged
>>>> and confirmed working.
>>>
>>> Whilst I'm sympathetic to people working to enable DRM, I think this is
>>> the right solution to the problem. The transition from simplefb to DRM
>>> shouldn't break display for a bunch of kernel revisions whilst the code is
>>> in flux.
>>
>> I would go further. The kernel behaviour has changed, and we have to
>> deal with platforms that assume the old behaviour. That means either
>> defaulting to leaving enabled regulators/clocks alone unless there is
>> a flag in the DT saying they can be power managed, or black listing
>> platforms that are known to depend on the regulator being on.
>>
>> Updating the device tree must not be required to get the kernel to
>> boot, but it is valid to require a DT upgrade to get better
>> performance (battery life) out of the platform.
>
> In this case people using SImple FB are not really using an officially
> sanctioned device tree.  The simple-fb fragment is created on the fly
> via a "DO NOT SUBMIT" patch sitting on a code review server.  It's not
> something that's shipped with real firmware nor is it something
> present in the kernel.  See
> <https://chromium-review.googlesource.com/#/c/49358/2/board/samsung/smdk5250/smdk5250.c>
> as I mentioned above.
>
> Is this really a device tree that we need to guarantee backward
> compatibility with?

Well, let's see... We've got a real user complaining about a platform
that used to work on mainline, and no longer does. The only loophole
for ignoring breakage is if nobody cares that it is broken. That
currently isn't the case. So even though it's based on a patch that
has "DO NOT SUBMIT" in large friendly letters on the front cover, it
doesn't change the situation that mainline has a regression.

g.
Olof Johansson Sept. 10, 2014, 1:06 p.m. UTC | #22
Hi,

Been travelling and I'm buried in email, so a bit slow at responding.

On Mon, Sep 8, 2014 at 12:40 PM, Grant Likely <grant.likely@secretlab.ca> wrote:
> On Mon, Sep 8, 2014 at 4:58 PM, Doug Anderson <dianders@chromium.org> wrote:
>> Grant,
>>
>> On Mon, Sep 8, 2014 at 5:20 AM, Grant Likely <grant.likely@secretlab.ca> wrote:
>>> On Mon, Sep 8, 2014 at 12:21 PM, Will Deacon <will.deacon@arm.com> wrote:
>>>> On Sun, Sep 07, 2014 at 05:19:03PM +0100, Tomasz Figa wrote:
>>>>> At least for next 3.17-rc I'd suggest fixing this up in respective clock
>>>>> driver and dropping the hack only after Exynos DRM patches are merged
>>>>> and confirmed working.
>>>>
>>>> Whilst I'm sympathetic to people working to enable DRM, I think this is
>>>> the right solution to the problem. The transition from simplefb to DRM
>>>> shouldn't break display for a bunch of kernel revisions whilst the code is
>>>> in flux.
>>>
>>> I would go further. The kernel behaviour has changed, and we have to
>>> deal with platforms that assume the old behaviour. That means either
>>> defaulting to leaving enabled regulators/clocks alone unless there is
>>> a flag in the DT saying they can be power managed, or black listing
>>> platforms that are known to depend on the regulator being on.
>>>
>>> Updating the device tree must not be required to get the kernel to
>>> boot, but it is valid to require a DT upgrade to get better
>>> performance (battery life) out of the platform.
>>
>> In this case people using SImple FB are not really using an officially
>> sanctioned device tree.  The simple-fb fragment is created on the fly
>> via a "DO NOT SUBMIT" patch sitting on a code review server.  It's not
>> something that's shipped with real firmware nor is it something
>> present in the kernel.  See
>> <https://chromium-review.googlesource.com/#/c/49358/2/board/samsung/smdk5250/smdk5250.c>
>> as I mentioned above.
>>
>> Is this really a device tree that we need to guarantee backward
>> compatibility with?
>
> Well, lets see... We've got a real user complaining about a platform
> that used to work on mainline, and no longer does. The only loophole
> for ignoring breakage is if there nobody cares that it is broken. That
> currently isn't the case. So even though it's based on a patch that
> has "DO NOT SUBMIT" in large friendly letters on the front cover, it
> doesn't change the situation that mainline has a regression.

Yeah, I'm with you on this Grant, it doesn't matter what the patch is
labelled as.

For extra added complication, the firmware that is referenced above
isn't what most people use; they use another binary, built by someone
I don't even know, that boots the kernel in HYP mode. I expect the ARM
guys to be using that version since they make use of KVM, etc.

One way to deal with this could be to add a quirk at boot time that
looks for the simplefb and, if found, modifies the regulators to keep
them on. That'd go in the kernel, not in firmware.
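
A minimal sketch of what such a quirk might look like, written as a
self-contained user-space model rather than real kernel code.  The
struct node and struct regulator types and the simplefb_quirk() name
are hypothetical stand-ins; only the "simple-framebuffer" compatible
string comes from the thread.  It also assumes the bluntest possible
policy, pinning every regulator on, where a real quirk would presumably
touch only the supplies the display depends on.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for device tree nodes and
 * regulators; these are not the kernel's own data structures. */
struct node {
    const char *compatible;
};

struct regulator {
    const char *name;
    bool always_on;
};

/* Return true if any node describes a firmware-provided framebuffer. */
bool has_simplefb(const struct node *nodes, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (nodes[i].compatible &&
            strcmp(nodes[i].compatible, "simple-framebuffer") == 0)
            return true;
    return false;
}

/* Boot-time quirk: if a simplefb node exists, pin every regulator on.
 * Returns the number of regulators whose flag was changed. */
size_t simplefb_quirk(const struct node *nodes, size_t n_nodes,
                      struct regulator *regs, size_t n_regs)
{
    size_t changed = 0;

    if (!has_simplefb(nodes, n_nodes))
        return 0;

    for (size_t i = 0; i < n_regs; i++) {
        if (!regs[i].always_on) {
            regs[i].always_on = true;
            changed++;
        }
    }
    return changed;
}
```

On a tree without a simplefb node the function changes nothing, so
boards that never relied on firmware display state keep their normal
power management.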

Much better would have been if the DRM changes worked when they
landed, so that the migration from simplefb to drm was invisible to
the user. Or at least, to get them working ASAP since they're still
broken. :(


-Olof
Mark Brown Sept. 10, 2014, 2:31 p.m. UTC | #23
On Wed, Sep 10, 2014 at 06:06:46AM -0700, Olof Johansson wrote:
> On Mon, Sep 8, 2014 at 12:40 PM, Grant Likely <grant.likely@secretlab.ca> wrote:

> > Well, lets see... We've got a real user complaining about a platform
> > that used to work on mainline, and no longer does. The only loophole
> > for ignoring breakage is if there nobody cares that it is broken. That
> > currently isn't the case. So even though it's based on a patch that
> > has "DO NOT SUBMIT" in large friendly letters on the front cover, it
> > doesn't change the situation that mainline has a regression.

> Yeah, I'm with you on this Grant, it doesn't matter what the patch is
> labelled as.

> One way to deal with this could be to add a quirk at boot time --
> looking for the simplefb and if found, modifies the regulators to keep
> them on. That'd go in the kernel, not in firmware.

Well, we should also be fixing simplefb to manage the resources it uses
though that doesn't clean up after the broken DTs that are currently
deployed.

As well as the regulators we'll also need to fix the clocks.  If we're
going to start adding these fixups perhaps we want to consider having a
wrapper stage that deals with rewriting DTs prior to trying to use them?
I'm not sure if it makes much difference but there's overlap with other
tools like the ATAGs conversion wrapper and building separately would
let the fixup code run early without directly going into the early init
code (which seems a bit scary).

> Much better would have been if the DRM changes worked when they
> landed, so that the migration form simplefb to drm was invisible to
> the user. Or at least, to get them working ASAP since they're still
> broken. :(

As far as I can tell the problem here is coming from the decision to
have simplefb use resources without knowing about them - can we agree
that this is a bad idea?
Grant Likely Sept. 10, 2014, 2:56 p.m. UTC | #24
On Wed, Sep 10, 2014 at 3:31 PM, Mark Brown <broonie@kernel.org> wrote:
> On Wed, Sep 10, 2014 at 06:06:46AM -0700, Olof Johansson wrote:
>> On Mon, Sep 8, 2014 at 12:40 PM, Grant Likely <grant.likely@secretlab.ca> wrote:
>
>> > Well, lets see... We've got a real user complaining about a platform
>> > that used to work on mainline, and no longer does. The only loophole
>> > for ignoring breakage is if there nobody cares that it is broken. That
>> > currently isn't the case. So even though it's based on a patch that
>> > has "DO NOT SUBMIT" in large friendly letters on the front cover, it
>> > doesn't change the situation that mainline has a regression.
>
>> Yeah, I'm with you on this Grant, it doesn't matter what the patch is
>> labelled as.
>
>> One way to deal with this could be to add a quirk at boot time --
>> looking for the simplefb and if found, modifies the regulators to keep
>> them on. That'd go in the kernel, not in firmware.
>
> Well, we should also be fixing simplefb to manage the resources it uses
> though that doesn't clean up after the broken DTs that are currently
> deployed.
>
> As well as the regulators we'll also need to fix the clocks.  If we're
> going to start adding these fixups perhaps we want to consider having a
> wrapper stage that deals with rewriting DTs prior to trying to use them?
> I'm not sure if it makes much difference but there's overlap with other
> tools like the ATAGs conversion wrapper and building separately would
> let the fixup code run early without directly going into the early init
> code (which seems a bit scary).
>
>> Much better would have been if the DRM changes worked when they
>> landed, so that the migration form simplefb to drm was invisible to
>> the user. Or at least, to get them working ASAP since they're still
>> broken. :(
>
> As far as I can tell the problem here is coming from the decision to
> have simplefb use resources without knowing about them - can we agree
> that this is a bad idea?

No, I don't think we can... there is a certain amount of "firmware got
things working for us, and we're going to use it for a while" that is
absolutely reasonable. simplefb is a good example, but there are
certainly others.

I /do/ think it would be better for the simplefb data to get embedded
or linked into the node of the graphics controller so that it can be
torn down appropriately, and we need a rule for how long boot-state
can be considered valid so that a proper driver can either reserve the
resources for a given SoC, or do a full handoff from the simplefb.
Even without that though, we need to be able to handle the case of an
anonymous simplefb node with no regulator information. If that means
the default simplefb behaviour is to inhibit runtime pm on all
resources until a real driver shows up, then that might just be what we
need to do.

Two things should probably be changed from the current setup. 1)
simplefb shouldn't be a platform driver. It is a boot thing that
handles initial state from the graphics chip. By implementing it as a
platform driver, it prevents the real driver from binding to the real
device if the simplefb data is embedded into it. 2) make sure that an SoC
driver can protect the needed resources before they are automatically
disabled. Either by putting them in an earlier initcall, or handling
it in the subsystem code. I don't know enough about the regulator and
clock runtime PM to know what the best way to do this is.

g.
Mark Brown Sept. 10, 2014, 3:39 p.m. UTC | #25
On Wed, Sep 10, 2014 at 03:56:16PM +0100, Grant Likely wrote:
> On Wed, Sep 10, 2014 at 3:31 PM, Mark Brown <broonie@kernel.org> wrote:

> > As far as I can tell the problem here is coming from the decision to
> > have simplefb use resources without knowing about them - can we agree
> > that this is a bad idea?

> No, I don't think we can... there is a certain amount of "firmware got
> things working for us, and we're going to use it for a while" that is
> absolutely reasonable. simplefb is a good example, but there are
> certainly others.

That bit is fine - I definitely think it's reasonable to have things
like this where the device is initialized prior to the kernel starting
and we use some simplified subset.  What I think is a big problem here
is that we're not being told what parts of the system state are relevant
to this initialization (worse, we're being told things that are actively
wrong for some of the resources).  This seems inherently fragile.

> I /do/ think it would be better for the simplefb data to get embedded
> or linked into the node of the graphics controller so that it can be
> torn down appropriately, and we need a rule for how long boot-state
> can be considered valid so that a proper driver can either reserve the
> resources for a given SoC, or do a full handoff from the simplefb.
> Even without that though, we need to be able to handle the case of an
> anonymous simplefb node with no regulator information. If that means
> the default simplefb behaviour is to inhibit runtime pm on all
> resources until a real driver show up, then that might just be what we
> need to do.

I think saying that it's a good idea to have a simplefb node without
resource management is exactly the problem here - if we start from the
assumption that this is a good idea we do get dragged down this path but
it seems like we took a wrong turn going that way in the first place.

It's not just regulators - we've got exactly the same problem with
clocks on this system for example, they're also getting disabled because
they seem unused and users have to pass in a kernel command line bodge
to avoid that.  We'd also have an issue if something decided to change
the rates of some of the clocks, and power domains have the same problem
(Ulf's patches to genericise their code have the same behaviour with
regard to powering off unused domains, some of the existing
implementations do that already).

> Two things should probably be changed from the current setup. 1)
> simplefb shouldn't be a platform driver. It is a boot thing that
> handles initial state from the graphics chip. By implementing it as a
> platform driver, it prevents the real driver from binding to the real
> device if the simplefb data embedded into it. 2) make sure that an SoC
> driver can protect the needed resources before they are automatically
> disabled. Either by putting them in an earlier initcall, or handling
> it in the subsystem code. I don't know enough about the regulator and
> clock runtime PM to know what the best way to do this is.

Right, I agree with what you're saying here but what I'm saying is that
the way to ensure that the resources are protected is for the simplefb
node to tell the kernel what resources are being used, otherwise it
seems like we're just guessing and will fall over ourselves sooner or
later.  
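
As a rough illustration of that argument, here is a small user-space
model (not kernel code) in which a consumer claims exactly the
resources its node lists, and a late cleanup pass switches off
whatever is still unclaimed.  The struct resource layout and the
claim_listed() and disable_unused() names are assumptions made for
this sketch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a clock or regulator left running by firmware. */
struct resource {
    const char *name;
    bool enabled;        /* current hardware state */
    unsigned use_count;  /* number of consumers that claimed it */
};

/* A consumer claims the resources its node lists (indices into pool). */
void claim_listed(struct resource *pool, const size_t *listed, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        pool[listed[i]].use_count++;
        pool[listed[i]].enabled = true;
    }
}

/* Late cleanup: switch off anything that is on but unclaimed.
 * Returns the number of resources disabled. */
size_t disable_unused(struct resource *pool, size_t n)
{
    size_t off = 0;

    for (size_t i = 0; i < n; i++) {
        if (pool[i].enabled && pool[i].use_count == 0) {
            pool[i].enabled = false;
            off++;
        }
    }
    return off;
}
```

If the simplefb node lists nothing, claim_listed() is a no-op and the
cleanup pass turns the display supplies off, which is the failure
being discussed here.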

We can't use initcall hacks as these only work in cases where we will at
some point hand over to a real driver and there seems to be a clear use
case for using simplefb prior to that driver being written; even where
we will hand over to a real driver we can't put a definite timescale on
that happening since in the distro case it might be being loaded from
disk at some point after userspace is running.

There's not a lot the subsystem can do without breaking other users or
making the system substantially worse if we don't tell it what resources
are affected, we'll end up being forced to tend too far towards being
conservative about what we allow to happen.
Grant Likely Sept. 10, 2014, 4:29 p.m. UTC | #26
On Wed, Sep 10, 2014 at 4:39 PM, Mark Brown <broonie@kernel.org> wrote:
> On Wed, Sep 10, 2014 at 03:56:16PM +0100, Grant Likely wrote:
>> On Wed, Sep 10, 2014 at 3:31 PM, Mark Brown <broonie@kernel.org> wrote:
>
>> > As far as I can tell the problem here is coming from the decision to
>> > have simplefb use resources without knowing about them - can we agree
>> > that this is a bad idea?
>
>> No, I don't think we can... there is a certain amount of "firmware got
>> things working for us, and we're going to use it for a while" that is
>> absolutely reasonable. simplefb is a good example, but there are
>> certainly others.
>
> That bit is fine - I definitely think it's reasonable to have things
> like this where the device is initialized prior to the kernel starting
> and we use some simplified subset.  What I think is a big problem here
> is that we're not being told what parts of the system state are relevant
> to this initialization (worse, we're being told things that are actively
> wrong for some of the resources).  This seems inherently fragile.
>
>> I /do/ think it would be better for the simplefb data to get embedded
>> or linked into the node of the graphics controller so that it can be
>> torn down appropriately, and we need a rule for how long boot-state
>> can be considered valid so that a proper driver can either reserve the
>> resources for a given SoC, or do a full handoff from the simplefb.
>> Even without that though, we need to be able to handle the case of an
>> anonymous simplefb node with no regulator information. If that means
>> the default simplefb behaviour is to inhibit runtime pm on all
>> resources until a real driver show up, then that might just be what we
>> need to do.
>
> I think saying that it's a good idea to have an simplefb node without
> resource management is exactly the problem here - if we start from the
> assumption that this is a good idea we do get dragged down this path but
> it seems like we took a wrong turn going that way in the first place.
>
> It's not just regulators - we've got exactly the same problem with
> clocks on this system for example, they're also getting disabled because
> they seem unused and users have to pass in a kernel command line bodge
> to avoid that.  We'd also have an issue if something decided to change
> the rates of some of the clocks, and power domains have the same problem
> (Ulf's patches to genericise their code has the same behaviour with
> regard to powering off unused domains, some of the existing
> implementations do that already).
>
>> Two things should probably be changed from the current setup. 1)
>> simplefb shouldn't be a platform driver. It is a boot thing that
>> handles initial state from the graphics chip. By implementing it as a
>> platform driver, it prevents the real driver from binding to the real
>> device if the simplefb data embedded into it. 2) make sure that an SoC
>> driver can protect the needed resources before they are automatically
>> disabled. Either by putting them in an earlier initcall, or handling
>> it in the subsystem code. I don't know enough about the regulator and
>> clock runtime PM to know what the best way to do this is.
>
> Right, I agree with what you're saying here but what I'm saying is that
> the way to ensure that the resources are protected is for the simplefb
> node to tell the kernel what resources are being used, otherwise it
> seems like we're just guessing and will fall over ourselves sooner or
> later.
>
> We can't use initcall hacks as these only work in cases where we will at
> some point hand over to a real driver and there seems to be a clear use
> case for using simplefb prior to that driver being written; even where
> we will hand over to a real driver we can't put a definite timescale on
> that happening since in the distro case it might be being loaded from
> disk at some point after userspace is running.

What we can do is have an inhibit flag for
simplefb/simpleuart/simplewhatever that holds off PM. When a real
driver, or a stub that understands parsing the resource dependencies,
takes ownership of the device (or userspace tells the kernel to stop
caring) it can clear the inhibit.
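
A sketch of that idea as a user-space model, with every name in it
(struct pm_inhibit, pm_inhibit_hold(), pm_inhibit_release(),
pm_disable_unused()) invented for illustration: a counter of "simple"
boot devices holds off the cleanup of unused resources until the last
holder lets go.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model; none of these names are kernel APIs. */
struct boot_res {
    const char *name;
    bool enabled;        /* state inherited from firmware */
    unsigned use_count;  /* claims made by real drivers */
};

struct pm_inhibit {
    unsigned holders;    /* simplefb, simpleuart, ... relying on boot state */
};

void pm_inhibit_hold(struct pm_inhibit *inh)
{
    inh->holders++;
}

void pm_inhibit_release(struct pm_inhibit *inh)
{
    if (inh->holders)
        inh->holders--;
}

/* Disable unclaimed resources unless someone still holds the inhibit.
 * Returns the number of resources disabled. */
size_t pm_disable_unused(const struct pm_inhibit *inh,
                         struct boot_res *pool, size_t n)
{
    size_t off = 0;

    if (inh->holders)
        return 0;  /* boot state is still in use, touch nothing */

    for (size_t i = 0; i < n; i++) {
        if (pool[i].enabled && pool[i].use_count == 0) {
            pool[i].enabled = false;
            off++;
        }
    }
    return off;
}
```

In this model a real driver (or userspace) would call
pm_inhibit_release() when it takes over, after which the usual cleanup
can run.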

I don't want to build knowledge of resource dependencies into the
simple case. We'll simply frequently get it wrong. For example: A
future kernel will have better PM and will turn off more devices which
isn't accounted for in an older DT.

g.
Olof Johansson Sept. 10, 2014, 4:36 p.m. UTC | #27
On Wed, Sep 10, 2014 at 7:31 AM, Mark Brown <broonie@kernel.org> wrote:
> On Wed, Sep 10, 2014 at 06:06:46AM -0700, Olof Johansson wrote:
>> On Mon, Sep 8, 2014 at 12:40 PM, Grant Likely <grant.likely@secretlab.ca> wrote:
>
>> > Well, lets see... We've got a real user complaining about a platform
>> > that used to work on mainline, and no longer does. The only loophole
>> > for ignoring breakage is if there nobody cares that it is broken. That
>> > currently isn't the case. So even though it's based on a patch that
>> > has "DO NOT SUBMIT" in large friendly letters on the front cover, it
>> > doesn't change the situation that mainline has a regression.
>
>> Yeah, I'm with you on this Grant, it doesn't matter what the patch is
>> labelled as.
>
>> One way to deal with this could be to add a quirk at boot time --
>> looking for the simplefb and if found, modifies the regulators to keep
>> them on. That'd go in the kernel, not in firmware.
>
> Well, we should also be fixing simplefb to manage the resources it uses
> though that doesn't clean up after the broken DTs that are currently
> deployed.
>
> As well as the regulators we'll also need to fix the clocks.  If we're
> going to start adding these fixups perhaps we want to consider having a
> wrapper stage that deals with rewriting DTs prior to trying to use them?
> I'm not sure if it makes much difference but there's overlap with other
> tools like the ATAGs conversion wrapper and building separately would
> let the fixup code run early without directly going into the early init
> code (which seems a bit scary).

Yes, having a stage that fixes up broken device trees makes a lot of
sense. It can likely be plugged into the machine descriptor today per
platform, since I think most things we have going on right now are
platform-specific quirks.

I'm strongly against doing this outside of the kernel, since they're
closely tied together today. We've always had the quirk tables for
devices in the kernel, and we used to do this a long time ago on
powerpc as well (we did it before we built the flat DT out of the OF
equivalent there, most of the time).

>> Much better would have been if the DRM changes worked when they
>> landed, so that the migration form simplefb to drm was invisible to
>> the user. Or at least, to get them working ASAP since they're still
>> broken. :(
>
> As far as I can tell the problem here is coming from the decision to
> have simplefb use resources without knowing about them - can we agree
> that this is a bad idea?

As already argued, there are good reasons to sometimes allow this, as
long as it can be expected that it's something that's just used during
early boot. For example, having DEBUG_LL output on a pre-mapped
framebuffer could be really useful. Once DRM comes up, it'll tear down
the existing one.


-Olof
Doug Anderson Sept. 10, 2014, 4:45 p.m. UTC | #28
Grant,

On Wed, Sep 10, 2014 at 9:29 AM, Grant Likely <grant.likely@secretlab.ca> wrote:
> On Wed, Sep 10, 2014 at 4:39 PM, Mark Brown <broonie@kernel.org> wrote:
>> On Wed, Sep 10, 2014 at 03:56:16PM +0100, Grant Likely wrote:
>>> On Wed, Sep 10, 2014 at 3:31 PM, Mark Brown <broonie@kernel.org> wrote:
>>
>>> > As far as I can tell the problem here is coming from the decision to
>>> > have simplefb use resources without knowing about them - can we agree
>>> > that this is a bad idea?
>>
>>> No, I don't think we can... there is a certain amount of "firmware got
>>> things working for us, and we're going to use it for a while" that is
>>> absolutely reasonable. simplefb is a good example, but there are
>>> certainly others.
>>
>> That bit is fine - I definitely think it's reasonable to have things
>> like this where the device is initialized prior to the kernel starting
>> and we use some simplified subset.  What I think is a big problem here
>> is that we're not being told what parts of the system state are relevant
>> to this initialization (worse, we're being told things that are actively
>> wrong for some of the resources).  This seems inherently fragile.
>>
>>> I /do/ think it would be better for the simplefb data to get embedded
>>> or linked into the node of the graphics controller so that it can be
>>> torn down appropriately, and we need a rule for how long boot-state
>>> can be considered valid so that a proper driver can either reserve the
>>> resources for a given SoC, or do a full handoff from the simplefb.
>>> Even without that though, we need to be able to handle the case of an
>>> anonymous simplefb node with no regulator information. If that means
>>> the default simplefb behaviour is to inhibit runtime pm on all
>>> resources until a real driver show up, then that might just be what we
>>> need to do.
>>
>> I think saying that it's a good idea to have an simplefb node without
>> resource management is exactly the problem here - if we start from the
>> assumption that this is a good idea we do get dragged down this path but
>> it seems like we took a wrong turn going that way in the first place.
>>
>> It's not just regulators - we've got exactly the same problem with
>> clocks on this system for example, they're also getting disabled because
>> they seem unused and users have to pass in a kernel command line bodge
>> to avoid that.  We'd also have an issue if something decided to change
>> the rates of some of the clocks, and power domains have the same problem
>> (Ulf's patches to genericise their code has the same behaviour with
>> regard to powering off unused domains, some of the existing
>> implementations do that already).
>>
>>> Two things should probably be changed from the current setup. 1)
>>> simplefb shouldn't be a platform driver. It is a boot thing that
>>> handles initial state from the graphics chip. By implementing it as a
>>> platform driver, it prevents the real driver from binding to the real
>>> device if the simplefb data embedded into it. 2) make sure that an SoC
>>> driver can protect the needed resources before they are automatically
>>> disabled. Either by putting them in an earlier initcall, or handling
>>> it in the subsystem code. I don't know enough about the regulator and
>>> clock runtime PM to know what the best way to do this is.
>>
>> Right, I agree with what you're saying here but what I'm saying is that
>> the way to ensure that the resources are protected is for the simplefb
>> node to tell the kernel what resources are being used, otherwise it
>> seems like we're just guessing and will fall over ourselves sooner or
>> later.
>>
>> We can't use initcall hacks as these only work in cases where we will at
>> some point hand over to a real driver and there seems to be a clear use
>> case for using simplefb prior to that driver being written; even where
>> we will hand over to a real driver we can't put a definite timescale on
>> that happening since in the distro case it might be being loaded from
>> disk at some point after userspace is running.
>
> What we can do is have an inhibit flag for
> simplefb/simpleuart/simplewhatever that holds off PM. When a real
> driver, or a stub that understands parsing the resource dependencies,
> takes ownership of the device (or userspace tells the kernel to stop
> caring) it can clear the inhibit.

This doesn't seem crazy, though it means that if you're planning on
using nothing but simplefb then you're never going to be able to get
any power savings anywhere.

Right now I know that clock disabling is supposed to be inhibited
during the early boot process.  I think regulators too?


> I don't want to build knowledge of resource dependencies into the
> simple case. We'll simply frequently get it wrong. For example: A
> future kernel will have better PM and will turn off more devices which
> isn't accounted for in an older DT.

In IRC I made a suggestion that perhaps the "simplefb" ought to be put
in the main in-kernel "dts" file but with no address information and
set to "disabled".  Then the firmware can do something very simple:
set to enabled and fill in the address.

Right now the firmware is taking a dts that it doesn't really own (it
grabs it from the kernel FIT image) and making changes to it to add
simplefb.  That's inherently pretty fragile.  If the kernel DTS file
adds regulators then, as we saw, things break.  I can imagine lots of
other breakages, too.  If the dts is in kernel then we can add
regulator / clock references very easily.  As regulators are added to
the kernel dts file then the references can be added to simplefb.

That doesn't solve the problem with people who have old copies of
U-Boot.  All of those people have already flashed a custom firmware
though, so it doesn't seem unreasonable (to me) to solve their problem
by giving them a new custom firmware.  I agree that something needs to
be done to help those people, but I don't feel like adding extra hacks
to the kernel is the right answer for people who are clearly
living on the bleeding edge.


Note that I haven't actually worked with simplefb or the custom
U-Boots, so perhaps there's something I'm missing...

-Doug
Mark Brown Sept. 10, 2014, 4:57 p.m. UTC | #29
On Wed, Sep 10, 2014 at 05:29:32PM +0100, Grant Likely wrote:

> What we can do is have an inhibit flag for
> simplefb/simpleuart/simplewhatever that holds off PM. When a real
> driver, or a stub that understands parsing the resource dependencies,
> takes ownership of the device (or userspace tells the kernel to stop
> caring) it can clear the inhibit.

It's not quite as simple as just disabling PM - for example in the
clocks case we've also got to worry about what happens with rate changes
(which is going to get more and more risky as we get smarter about being
able to push configuration changes back up the tree), regulators have a
similar thing with voltage changes.  With simple enables and disables we
have to worry about things like handling users who actively want to
power things on and off but may potentially be sharing a resource
with an undeclared dependency.

If we are going to go with an approach like you suggest I think that
rather than require a userspace notification that everything is OK we
should have the stub drivers do something which causes the appropriate
behaviour to happen so long as they're loaded.  This means userspace
doesn't need an update and ensures it doesn't have to worry about cases
where we're using the stub driver at runtime due to a real driver not
being available - we can figure this stuff out within the kernel
ourselves.

That said a kick from userspace when the first round of module loading
has finished would be very helpful, I just don't think we should rely on
it for this behaviour.

> I don't want to build knowledge of resource dependencies into the
> simple case. We'll simply frequently get it wrong. For example: A
> future kernel will have better PM and will turn off more devices which
> isn't accounted for in an older DT.

That is tricky and there will be problems.  Being fairly aggressive
about doing these things and avoiding runtime configuration hacks
makes it harder for people to introduce problems without noticing
them, while requiring an explicit request to do resource management at
all is the most conservative option.  Between them those strategies
should help for anything that's getting tested at least; they make it
hard for the kernel to learn about a resource without it being handled
safely from the get go.
Mark Brown Sept. 10, 2014, 6:17 p.m. UTC | #30
On Wed, Sep 10, 2014 at 09:36:32AM -0700, Olof Johansson wrote:
> On Wed, Sep 10, 2014 at 7:31 AM, Mark Brown <broonie@kernel.org> wrote:

> > As well as the regulators we'll also need to fix the clocks.  If we're
> > going to start adding these fixups perhaps we want to consider having a
> > wrapper stage that deals with rewriting DTs prior to trying to use them?
> > I'm not sure if it makes much difference but there's overlap with other
> > tools like the ATAGs conversion wrapper and building separately would
> > let the fixup code run early without directly going into the early init
> > code (which seems a bit scary).

> I'm strongly against doing this outside of the kernel, since they're
> closely tied together today. We've always had the quirk tables for
> devices in the kernel, and we used to do this a long time ago on
> powerpc as well (we did it before we built the flat DT out of the OF
> equivalent there, most of the time).

Indeed - sorry, the above wasn't adequately clear.  I think that we
should build this separately but keep it part of the kernel source.  The
split I was thinking of was purely technical.

> > As far as I can tell the problem here is coming from the decision to
> > have simplefb use resources without knowing about them - can we agree
> > that this is a bad idea?

> As already argued, there are good reasons to sometimes allow this, as
> long as it can be expected that it's something that's just used during
> early boot. For example, having DEBUG_LL output on a pre-mapped
> framebuffer could be really useful. Once DRM comes up, it'll tear down
> the existing one.

The problem here seems to be that the "just during early boot"
assumption isn't playing out so well...
Mark Brown Sept. 10, 2014, 7:45 p.m. UTC | #31
On Wed, Sep 10, 2014 at 09:45:21AM -0700, Doug Anderson wrote:

> Right now I know that clock disabling is supposed to be inhibited
> during the early boot process.  I think regulators too?

No, for regulators we'll quite happily disable anything a consumer asks
us to at any point but we'll only do a sweep for regulators that were
enabled on startup then not subsequently referenced and turn them off in
late_initcall.
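
To make that ordering concrete, here is a toy userspace model in C.  It
is only a sketch: struct model_regulator, model_get(),
model_consumer_disable() and model_late_sweep() are names made up for
illustration, not the regulator API, and the always_on flag merely
stands in for the regulator-always-on DT property.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a regulator; not the kernel's own structures. */
struct model_regulator {
	const char *name;
	bool enabled;	/* current hardware state */
	bool always_on;	/* stands in for regulator-always-on */
	int users;	/* consumers that have referenced this regulator */
};

/* A consumer looks the regulator up; from now on it counts as referenced. */
void model_get(struct model_regulator *r)
{
	r->users++;
}

/* A consumer that holds the regulator may switch it off at any point. */
void model_consumer_disable(struct model_regulator *r)
{
	assert(r->users > 0);
	if (!r->always_on)
		r->enabled = false;
}

/*
 * Models the late sweep: anything firmware left enabled that no consumer
 * ever referenced is switched off.  Returns how many were turned off.
 */
int model_late_sweep(struct model_regulator *regs, size_t n)
{
	int turned_off = 0;

	for (size_t i = 0; i < n; i++) {
		struct model_regulator *r = &regs[i];

		if (r->enabled && r->users == 0 && !r->always_on) {
			r->enabled = false;
			turned_off++;
		}
	}
	return turned_off;
}
```

In this model a supply that firmware left on for the display, with no
consumer and no always_on flag, is exactly what the sweep turns off.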
Doug Anderson Sept. 10, 2014, 7:51 p.m. UTC | #32
Mark,

On Wed, Sep 10, 2014 at 12:45 PM, Mark Brown <broonie@kernel.org> wrote:
> On Wed, Sep 10, 2014 at 09:45:21AM -0700, Doug Anderson wrote:
>
>> Right now I know that clock disabling is supposed to be inhibited
>> during the early boot process.  I think regulators too?
>
> No, for regulators we'll quite happily disable anything a consumer asks
> us to at any point but we'll only do a sweep for regulators that were
> enabled on startup then not subsequently referenced and turn them off in
> late_initcall.

Ah, that sounds exactly like the clock framework then.  I think I just
didn't explain the clock framework properly.

-Doug
Grant Likely Sept. 11, 2014, 9:06 a.m. UTC | #33
On Wed, 10 Sep 2014 15:31:44 +0100, Mark Brown <broonie@kernel.org> wrote:
> On Wed, Sep 10, 2014 at 06:06:46AM -0700, Olof Johansson wrote:
> > On Mon, Sep 8, 2014 at 12:40 PM, Grant Likely <grant.likely@secretlab.ca> wrote:
> 
> > > Well, let's see... We've got a real user complaining about a platform
> > > that used to work on mainline, and no longer does. The only loophole
> > > for ignoring breakage is if nobody cares that it is broken. That
> > > currently isn't the case. So even though it's based on a patch that
> > > has "DO NOT SUBMIT" in large friendly letters on the front cover, it
> > > doesn't change the situation that mainline has a regression.
> 
> > Yeah, I'm with you on this Grant, it doesn't matter what the patch is
> > labelled as.
> 
> > One way to deal with this could be to add a quirk at boot time --
> > looking for the simplefb and if found, modifies the regulators to keep
> > them on. That'd go in the kernel, not in firmware.
> 
> Well, we should also be fixing simplefb to manage the resources it uses
> though that doesn't clean up after the broken DTs that are currently
> deployed.
> 
> As well as the regulators we'll also need to fix the clocks.  If we're
> going to start adding these fixups perhaps we want to consider having a
> wrapper stage that deals with rewriting DTs prior to trying to use them?
> I'm not sure if it makes much difference but there's overlap with other
> tools like the ATAGs conversion wrapper and building separately would
> let the fixup code run early without directly going into the early init
> code (which seems a bit scary).

We've already got a dt fixup hook in the machine struct, created for
exactly this reason. Fixing an incorrect DT provided by firmware:

arch/arm/include/asm/mach/arch.h:
struct machine_desc {
	...
	void (*dt_fixup)(void);
	...
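
As a rough illustration of how such a hook gets used, the standalone C
model below mimics the pattern: the descriptor carries an optional
callback and the setup path calls it only when it is set.  Everything
in it (struct model_machine_desc, model_setup_machine(), snow_fixup()
and the keep_display_on flag) is invented for the example; it is not
the kernel's machine_desc or its setup code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for board state that a fixup might adjust (hypothetical). */
bool keep_display_on;

/* Cut-down model of a machine descriptor with an optional fixup hook. */
struct model_machine_desc {
	const char *name;
	void (*dt_fixup)(void);
};

/* Example fixup: pretend the DT was patched so display supplies stay on. */
void snow_fixup(void)
{
	keep_display_on = true;
}

/*
 * Setup path: run the fixup, if there is one, before the DT is consumed.
 * Returns true if a hook ran.
 */
bool model_setup_machine(const struct model_machine_desc *mdesc)
{
	if (mdesc->dt_fixup == NULL)
		return false;
	mdesc->dt_fixup();
	return true;
}
```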

g.
Grant Likely Sept. 11, 2014, 9:22 a.m. UTC | #34
On Wed, 10 Sep 2014 17:57:23 +0100, Mark Brown <broonie@kernel.org> wrote:
> On Wed, Sep 10, 2014 at 05:29:32PM +0100, Grant Likely wrote:
> 
> > What we can do is have an inhibit flag for
> > simplefb/simpleuart/simplewhatever that holds off PM. When a real
> > driver, or a stub that understands parsing the resource dependencies,
> > takes ownership of the device (or userspace tells the kernel to stop
> > caring) it can clear the inhibit.
> 
> It's not quite as simple as just disabling PM - for example in the
> clocks case we've also got to worry about what happens with rate changes
> (which is going to get more and more risky as we get smarter about being
> able to push configuration changes back up the tree), regulators have a
> similar thing with voltage changes.  With simple enables and disables we
> have to worry about things like handling users who actively want to
> power things on and off but may potentially be sharing a resource
> with an undeclared dependency.

I think we can be okay with the above. This is a best-effort situation
where we don't want to tear down how firmware has set up the board if
it can be reasonably assumed that something depends on it (simplefb).
However, if clocks or regulators are shared with other devices and those
drivers ask for other settings, then there is simply no recourse. In
that situation there must be a driver for the video device that takes
care of any constraints.

g.
Mark Brown Sept. 11, 2014, 4:16 p.m. UTC | #35
On Thu, Sep 11, 2014 at 10:06:08AM +0100, Grant Likely wrote:
> On Wed, 10 Sep 2014 15:31:44 +0100, Mark Brown <broonie@kernel.org> wrote:

> > As well as the regulators we'll also need to fix the clocks.  If we're
> > going to start adding these fixups perhaps we want to consider having a
> > wrapper stage that deals with rewriting DTs prior to trying to use them?
> > I'm not sure if it makes much difference but there's overlap with other
> > tools like the ATAGs conversion wrapper and building separately would
> > let the fixup code run early without directly going into the early init
> > code (which seems a bit scary).

> We've already got a dt fixup hook in the machine struct, created for
> exactly this reason. Fixing an incorrect DT provided by firmware:

> arch/arm/include/asm/mach/arch.h:
> struct machine_desc {
> 	...
> 	void (*dt_fixup)(void);
> 	...

Hrm, that's in the machine descriptor which doesn't seem the ideal
place - something keying off machine ID would be nicer.  But that's
relatively speaking just detail.
Mark Brown Sept. 11, 2014, 6:03 p.m. UTC | #36
On Thu, Sep 11, 2014 at 10:22:32AM +0100, Grant Likely wrote:
> On Wed, 10 Sep 2014 17:57:23 +0100, Mark Brown <broonie@kernel.org> wrote:

> > It's not quite as simple as just disabling PM - for example in the
> > clocks case we've also got to worry about what happens with rate changes
> > (which is going to get more and more risky as we get smarter about being
> > able to push configuration changes back up the tree), regulators have a
> > similar thing with voltage changes.  With simple enables and disables we
> > have to worry about things like handling users who actively want to
> > power things on and off but may potentially be sharing a resource
> > with an undeclared dependency.

> I think we can be okay with the above. This is a best-effort situation
> where we don't want to tear down how firmware has set up the board if
> it can be reasonably assumed that something depends on it (simplefb).
> However, if clocks or regulators are shared with other devices and those
> drivers ask for other settings, then there is simply no recourse. In
> that situation there must be a driver for the video device that takes
> care of any constraints.

When things break I'm not sure that users are going to understand that
something that used to work for them was only provided on a best effort
basis, I think they will expect things to carry on working.  It's not
going to be great if enabling some driver for a device that happens to
be in the same power domain as a component used in a framebuffer causes
the display to vanish, or if better power management in an existing
driver causes breakage.  It's relatively OK to have a brief hiccup
during boot but usage seems to have expanded beyond that point and I
think we need to take robustness more seriously.

Given that we have straightforward ways to communicate resource usage it
seems sensible to add robustness to the system by making use of them.
Doug Anderson Sept. 11, 2014, 10:54 p.m. UTC | #37
Hi,

On Thu, Sep 11, 2014 at 11:03 AM, Mark Brown <broonie@kernel.org> wrote:
> On Thu, Sep 11, 2014 at 10:22:32AM +0100, Grant Likely wrote:
>> On Wed, 10 Sep 2014 17:57:23 +0100, Mark Brown <broonie@kernel.org> wrote:
>
>> > It's not quite as simple as just disabling PM - for example in the
>> > clocks case we've also got to worry about what happens with rate changes
>> > (which is going to get more and more risky as we get smarter about being
>> > able to push configuration changes back up the tree), regulators have a
>> > similar thing with voltage changes.  With simple enables and disables we
>> > have to worry about things like handling users who actively want to
>> > power things on and off but may potentially be sharing a resource
>> > with an undeclared dependency.
>
>> I think we can be okay with the above. This is a best-effort situation
>> where we don't want to tear down how firmware has set up the board if
>> it can be reasonably assumed that something depends on it (simplefb).
>> However, if clocks or regulators are shared with other devices and those
>> drivers ask for other settings, then there is simply no recourse. In
>> that situation there must be a driver for the video device that takes
>> care of any constraints.
>
> When things break I'm not sure that users are going to understand that
> something that used to work for them was only provided on a best effort
> basis, I think they will expect things to carry on working.

Right.  This is exactly what happened at the start of this thread.
SimpleFB was working only on a best effort basis and then it stopped
working.  I agree that's pretty non-ideal.

-Doug
Thierry Reding Sept. 29, 2014, 12:57 p.m. UTC | #38
On Wed, Sep 10, 2014 at 03:56:16PM +0100, Grant Likely wrote:
> On Wed, Sep 10, 2014 at 3:31 PM, Mark Brown <broonie@kernel.org> wrote:
> > On Wed, Sep 10, 2014 at 06:06:46AM -0700, Olof Johansson wrote:
> >> On Mon, Sep 8, 2014 at 12:40 PM, Grant Likely <grant.likely@secretlab.ca> wrote:
> >
> >> > Well, let's see... We've got a real user complaining about a platform
> >> > that used to work on mainline, and no longer does. The only loophole
> >> > for ignoring breakage is if nobody cares that it is broken. That
> >> > currently isn't the case. So even though it's based on a patch that
> >> > has "DO NOT SUBMIT" in large friendly letters on the front cover, it
> >> > doesn't change the situation that mainline has a regression.
> >
> >> Yeah, I'm with you on this Grant, it doesn't matter what the patch is
> >> labelled as.
> >
> >> One way to deal with this could be to add a quirk at boot time --
> >> looking for the simplefb and if found, modifies the regulators to keep
> >> them on. That'd go in the kernel, not in firmware.
> >
> > Well, we should also be fixing simplefb to manage the resources it uses
> > though that doesn't clean up after the broken DTs that are currently
> > deployed.
> >
> > As well as the regulators we'll also need to fix the clocks.  If we're
> > going to start adding these fixups perhaps we want to consider having a
> > wrapper stage that deals with rewriting DTs prior to trying to use them?
> > I'm not sure if it makes much difference but there's overlap with other
> > tools like the ATAGs conversion wrapper and building separately would
> > let the fixup code run early without directly going into the early init
> > code (which seems a bit scary).
> >
> >> Much better would have been if the DRM changes worked when they
> >> landed, so that the migration from simplefb to drm was invisible to
> >> the user. Or at least, to get them working ASAP since they're still
> >> broken. :(
> >
> > As far as I can tell the problem here is coming from the decision to
> > have simplefb use resources without knowing about them - can we agree
> > that this is a bad idea?
> 
> No, I don't think we can... there is a certain amount of "firmware got
> things working for us, and we're going to use it for a while" that is
> absolutely reasonable. simplefb is a good example, but there are
> certainly others.
> 
> I /do/ think it would be better for the simplefb data to get embedded
> or linked into the node of the graphics controller so that it can be
> torn down appropriately, and we need a rule for how long boot-state
> can be considered valid so that a proper driver can either reserve the
> resources for a given SoC, or do a full handoff from the simplefb.
> Even without that though, we need to be able to handle the case of an
> anonymous simplefb node with no regulator information. If that means
> the default simplefb behaviour is to inhibit runtime pm on all
> resources until a real driver shows up, then that might just be what we
> need to do.
> 
> Two things should probably be changed from the current setup. 1)
> simplefb shouldn't be a platform driver. It is a boot thing that
> handles initial state from the graphics chip. By implementing it as a
> platform driver, it prevents the real driver from binding to the real
> device if the simplefb data is embedded into it. 2) make sure that an SoC
> driver can protect the needed resources before they are automatically
> disabled. Either by putting them in an earlier initcall, or handling
> it in the subsystem code. I don't know enough about the regulator and
> clock runtime PM to know what the best way to do this is.

I posted a patch[0] earlier to do this for the clock framework in "that
other thread". The idea is that shim drivers for these types of firmware
devices can tell the various subsystems that they might need resources
that aren't explicitly requested. The current implementation simply uses
the existing infrastructure already present for the clk_ignore_unused
command-line argument and allows drivers to declare this requirement. It
also allows these drivers to retire the request once they've properly
handed off to the real driver.
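
To illustrate the idea (and only the idea, this is not the patch
itself), here is a small userspace model in C.  All names in it are
hypothetical: struct model_clk_core, model_keep_unused(),
model_release_unused() and model_disable_unused() are stand-ins, and
the ignore_unused_cmdline flag merely models booting with
clk_ignore_unused.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a clock as the framework sees it. */
struct model_clk {
	const char *name;
	bool enabled;		/* hardware state left by firmware or drivers */
	int enable_count;	/* explicit enables from drivers */
};

/* Hypothetical framework-wide state. */
struct model_clk_core {
	bool ignore_unused_cmdline;	/* models clk_ignore_unused */
	int keep_requests;		/* shim drivers asking to keep unused clocks */
};

/* A shim driver declares that it may depend on unclaimed clocks. */
void model_keep_unused(struct model_clk_core *core)
{
	core->keep_requests++;
}

/* The shim retires its request after handing off to the real driver. */
void model_release_unused(struct model_clk_core *core)
{
	assert(core->keep_requests > 0);
	core->keep_requests--;
}

/*
 * Gate every enabled-but-unclaimed clock unless something asked us not to.
 * Returns the number of clocks gated.
 */
int model_disable_unused(struct model_clk_core *core,
			 struct model_clk *clks, size_t n)
{
	int gated = 0;

	if (core->ignore_unused_cmdline || core->keep_requests > 0)
		return 0;

	for (size_t i = 0; i < n; i++) {
		if (clks[i].enabled && clks[i].enable_count == 0) {
			clks[i].enabled = false;
			gated++;
		}
	}
	return gated;
}
```

Note that in this model the caller has to run the sweep again after the
last request is retired; how and when that would happen in the real
framework is outside what the sketch covers.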

Something similar could be done for other frameworks.

One of the objections to that in the other thread is that it won't
prevent clocks from being disabled if some other driver was using those
same clocks and doing a clk_enable()/clk_disable() on them. But quite
frankly I don't think that's something we need to worry about.

Though there are two cases: one is to use simplefb as a means to have
early boot messages on a graphical display (and optionally hand off to a
real driver). The other is to use simplefb as the only framebuffer
driver until a proper driver has been implemented. The latter would have
the disadvantage of preventing unused resources from being garbage
collected at all. Then again, I don't think power consumption is going
to be a very big issue on hardware where no proper display driver is
available.

Thierry

[0]: http://lists.infradead.org/pipermail/linux-arm-kernel/2014-September/291295.html
Grant Likely Sept. 29, 2014, 1:12 p.m. UTC | #39
On Mon, Sep 29, 2014 at 1:57 PM, Thierry Reding
<thierry.reding@gmail.com> wrote:
> On Wed, Sep 10, 2014 at 03:56:16PM +0100, Grant Likely wrote:
>> On Wed, Sep 10, 2014 at 3:31 PM, Mark Brown <broonie@kernel.org> wrote:
>> > On Wed, Sep 10, 2014 at 06:06:46AM -0700, Olof Johansson wrote:
>> >> On Mon, Sep 8, 2014 at 12:40 PM, Grant Likely <grant.likely@secretlab.ca> wrote:
>> >
>> >> > Well, let's see... We've got a real user complaining about a platform
>> >> > that used to work on mainline, and no longer does. The only loophole
>> >> > for ignoring breakage is if nobody cares that it is broken. That
>> >> > currently isn't the case. So even though it's based on a patch that
>> >> > has "DO NOT SUBMIT" in large friendly letters on the front cover, it
>> >> > doesn't change the situation that mainline has a regression.
>> >
>> >> Yeah, I'm with you on this Grant, it doesn't matter what the patch is
>> >> labelled as.
>> >
>> >> One way to deal with this could be to add a quirk at boot time --
>> >> looking for the simplefb and if found, modifies the regulators to keep
>> >> them on. That'd go in the kernel, not in firmware.
>> >
>> > Well, we should also be fixing simplefb to manage the resources it uses
>> > though that doesn't clean up after the broken DTs that are currently
>> > deployed.
>> >
>> > As well as the regulators we'll also need to fix the clocks.  If we're
>> > going to start adding these fixups perhaps we want to consider having a
>> > wrapper stage that deals with rewriting DTs prior to trying to use them?
>> > I'm not sure if it makes much difference but there's overlap with other
>> > tools like the ATAGs conversion wrapper and building separately would
>> > let the fixup code run early without directly going into the early init
>> > code (which seems a bit scary).
>> >
>> >> Much better would have been if the DRM changes worked when they
>> >> landed, so that the migration from simplefb to drm was invisible to
>> >> the user. Or at least, to get them working ASAP since they're still
>> >> broken. :(
>> >
>> > As far as I can tell the problem here is coming from the decision to
>> > have simplefb use resources without knowing about them - can we agree
>> > that this is a bad idea?
>>
>> No, I don't think we can... there is a certain amount of "firmware got
>> things working for us, and we're going to use it for a while" that is
>> absolutely reasonable. simplefb is a good example, but there are
>> certainly others.
>>
>> I /do/ think it would be better for the simplefb data to get embedded
>> or linked into the node of the graphics controller so that it can be
>> torn down appropriately, and we need a rule for how long boot-state
>> can be considered valid so that a proper driver can either reserve the
>> resources for a given SoC, or do a full handoff from the simplefb.
>> Even without that though, we need to be able to handle the case of an
>> anonymous simplefb node with no regulator information. If that means
>> the default simplefb behaviour is to inhibit runtime pm on all
>> resources until a real driver shows up, then that might just be what we
>> need to do.
>>
>> Two things should probably be changed from the current setup. 1)
>> simplefb shouldn't be a platform driver. It is a boot thing that
>> handles initial state from the graphics chip. By implementing it as a
>> platform driver, it prevents the real driver from binding to the real
>> device if the simplefb data is embedded into it. 2) make sure that an SoC
>> driver can protect the needed resources before they are automatically
>> disabled. Either by putting them in an earlier initcall, or handling
>> it in the subsystem code. I don't know enough about the regulator and
>> clock runtime PM to know what the best way to do this is.
>
> I posted a patch[0] earlier to do this for the clock framework in "that
> other thread". The idea is that shim drivers for these types of firmware
> devices can tell the various subsystems that they might need resources
> that aren't explicitly requested. The current implementation simply uses
> the existing infrastructure already present for the clk_ignore_unused
> command-line argument and allows drivers to declare this requirement. It
> also allows these drivers to retire the request once they've properly
> handed off to the real driver.
>
> Something similar could be done for other frameworks.
>
> One of the objections to that in the other thread is that it won't
> prevent clocks from being disabled if some other driver was using those
> same clocks and doing a clk_enable()/clk_disable() on them. But quite
> frankly I don't think that's something we need to worry about.

Agreed

> Though there are two cases: one is to use simplefb as a means to have
> early boot messages on a graphical display (and optionally hand off to a
> real driver). The other is to use simplefb as the only framebuffer
> driver until a proper driver has been implemented. The latter would have
> the disadvantage of preventing unused resources from being garbage
> collected at all. Then again, I don't think power consumption is going
> to be a very big issue on hardware where no proper display driver is
> available.

When simplefb is the only framebuffer to get a platform working, it is
reasonable to have a placeholder driver that grabs the resources and
nothing else. When a real driver is implemented, and merged, the
placeholder driver should drop compatibility with the device node at
the same time.

g.
Mark Brown Sept. 29, 2014, 4:37 p.m. UTC | #40
On Mon, Sep 29, 2014 at 02:12:43PM +0100, Grant Likely wrote:
> On Mon, Sep 29, 2014 at 1:57 PM, Thierry Reding

> > Though there are two cases: one is to use simplefb as a means to have
> > early boot messages on a graphical display (and optionally hand off to a
> > real driver). The other is to use simplefb as the only framebuffer
> > driver until a proper driver has been implemented. The latter would have
> > the disadvantage of preventing unused resources from being garbage
> > collected at all. Then again, I don't think power consumption is going
> > to be a very big issue on hardware where no proper display driver is
> > available.

> When simplefb is the only framebuffer to get a platform working, it is
> reasonable to have a placeholder driver that grabs the resources and
> nothing else. When a real driver is implemented, and merged, the
> placeholder driver should drop compatibility with the device node at
> the same time.

I'd thought there was some objection to doing this?  It does seem like a
sensible approach.
Maxime Ripard Sept. 29, 2014, 8:46 p.m. UTC | #41
On Mon, Sep 29, 2014 at 02:57:19PM +0200, Thierry Reding wrote:
> On Wed, Sep 10, 2014 at 03:56:16PM +0100, Grant Likely wrote:
> > On Wed, Sep 10, 2014 at 3:31 PM, Mark Brown <broonie@kernel.org> wrote:
> > > On Wed, Sep 10, 2014 at 06:06:46AM -0700, Olof Johansson wrote:
> > >> On Mon, Sep 8, 2014 at 12:40 PM, Grant Likely <grant.likely@secretlab.ca> wrote:
> > >
> > >> > Well, let's see... We've got a real user complaining about a platform
> > >> > that used to work on mainline, and no longer does. The only loophole
> > >> > for ignoring breakage is if nobody cares that it is broken. That
> > >> > currently isn't the case. So even though it's based on a patch that
> > >> > has "DO NOT SUBMIT" in large friendly letters on the front cover, it
> > >> > doesn't change the situation that mainline has a regression.
> > >
> > >> Yeah, I'm with you on this Grant, it doesn't matter what the patch is
> > >> labelled as.
> > >
> > >> One way to deal with this could be to add a quirk at boot time --
> > >> looking for the simplefb and if found, modifies the regulators to keep
> > >> them on. That'd go in the kernel, not in firmware.
> > >
> > > Well, we should also be fixing simplefb to manage the resources it uses
> > > though that doesn't clean up after the broken DTs that are currently
> > > deployed.
> > >
> > > As well as the regulators we'll also need to fix the clocks.  If we're
> > > going to start adding these fixups perhaps we want to consider having a
> > > wrapper stage that deals with rewriting DTs prior to trying to use them?
> > > I'm not sure if it makes much difference but there's overlap with other
> > > tools like the ATAGs conversion wrapper and building separately would
> > > let the fixup code run early without directly going into the early init
> > > code (which seems a bit scary).
> > >
> > >> Much better would have been if the DRM changes worked when they
> > >> landed, so that the migration from simplefb to drm was invisible to
> > >> the user. Or at least, to get them working ASAP since they're still
> > >> broken. :(
> > >
> > > As far as I can tell the problem here is coming from the decision to
> > > have simplefb use resources without knowing about them - can we agree
> > > that this is a bad idea?
> > 
> > No, I don't think we can... there is a certain amount of "firmware got
> > things working for us, and we're going to use it for a while" that is
> > absolutely reasonable. simplefb is a good example, but there are
> > certainly others.
> > 
> > I /do/ think it would be better for the simplefb data to get embedded
> > or linked into the node of the graphics controller so that it can be
> > torn down appropriately, and we need a rule for how long boot-state
> > can be considered valid so that a proper driver can either reserve the
> > resources for a given SoC, or do a full handoff from the simplefb.
> > Even without that though, we need to be able to handle the case of an
> > anonymous simplefb node with no regulator information. If that means
> > the default simplefb behaviour is to inhibit runtime pm on all
> > resources until a real driver shows up, then that might just be what we
> > need to do.
> > 
> > Two things should probably be changed from the current setup. 1)
> > simplefb shouldn't be a platform driver. It is a boot thing that
> > handles initial state from the graphics chip. By implementing it as a
> > platform driver, it prevents the real driver from binding to the real
> > device if the simplefb data is embedded into it. 2) make sure that an SoC
> > driver can protect the needed resources before they are automatically
> > disabled. Either by putting them in an earlier initcall, or handling
> > it in the subsystem code. I don't know enough about the regulator and
> > clock runtime PM to know what the best way to do this is.
> 
> I posted a patch[0] earlier to do this for the clock framework in "that
> other thread". The idea is that shim drivers for these types of firmware
> devices can tell the various subsystems that they might need resources
> that aren't explicitly requested. The current implementation simply uses
> the existing infrastructure already present for the clk_ignore_unused
> command-line argument and allows drivers to declare this requirement. It
> also allows these drivers to retire the request once they've properly
> handed off to the real driver.
> 
> > Something similar could be done for other frameworks.
> 
> One of the objections to that in the other thread is that it won't
> prevent clocks from being disabled if some other driver was using those
> same clocks and doing a clk_enable()/clk_disable() on them. But quite
> frankly I don't think that's something we need to worry about.

That's not what has been said.

What might happen is this.

parent (gate)
  |
  +------> clock A (sound)
  |
  +------> clock B (display)

So, let's say that at boot we have parent enabled and display
enabled, and that the display has been set up.

We have a sound driver that is going to probe. If at *any* point in
time, the sound driver is to disable its clock, the clock framework,
since there's no registered user left of the parent clock, will
disable the parent clock as well, effectively disabling the display
clock.

This can happen for various reasons: failed probe, PM, whatever, or
even if the sound clock is to be reparented.
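
The scenario can be reproduced with a toy refcounting model in
userspace C.  This is a sketch under stated assumptions, not the clock
framework: struct tree_clk and the model_clk_*() helpers are invented
names, and hw_on simply represents a gate that firmware or a driver
has switched on.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical clock node: a gate with an optional parent. */
struct tree_clk {
	const char *name;
	struct tree_clk *parent;
	int enable_count;	/* enables the framework knows about */
	bool hw_on;		/* what the hardware gate is actually doing */
};

/* First enable of a clock also enables its parent chain. */
void model_clk_enable(struct tree_clk *c)
{
	if (c == NULL)
		return;
	if (c->enable_count++ == 0) {
		model_clk_enable(c->parent);
		c->hw_on = true;
	}
}

/* Last disable gates the clock and drops the reference on its parent. */
void model_clk_disable(struct tree_clk *c)
{
	if (c == NULL)
		return;
	assert(c->enable_count > 0);
	if (--c->enable_count == 0) {
		c->hw_on = false;
		model_clk_disable(c->parent);
	}
}

/* A clock only ticks if its own gate and every gate above it are on. */
bool model_clk_running(const struct tree_clk *c)
{
	for (; c != NULL; c = c->parent)
		if (!c->hw_on)
			return false;
	return true;
}
```

With parent and display left on by firmware but with an enable_count of
zero, a single enable/disable pair on the sound clock is enough to gate
the parent, and model_clk_running() then reports the display clock as
stopped.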

Quite frankly, I think that's definitely something we need to worry
about.

> Though there are two cases: one is to use simplefb as a means to have
> early boot messages on a graphical display (and optionally hand off to a
> real driver). The other is to use simplefb as the only framebuffer
> driver until a proper driver has been implemented. The latter would have
> the disadvantage of preventing unused resources from being garbage
> collected at all. Then again, I don't think power consumption is going
> to be a very big issue on hardware where no proper display driver is
> available.

Two use cases, one single driver, and a proper way to prevent all the
issues your solution doesn't address, like the one we saw above, or
the clock rate being changed.

Maxime
Thierry Reding Sept. 30, 2014, 6:12 a.m. UTC | #42
On Mon, Sep 29, 2014 at 02:12:43PM +0100, Grant Likely wrote:
> On Mon, Sep 29, 2014 at 1:57 PM, Thierry Reding <thierry.reding@gmail.com> wrote:
> > Though there are two cases: one is to use simplefb as a means to have
> > early boot messages on a graphical display (and optionally hand off to a
> > real driver). The other is to use simplefb as the only framebuffer
> > driver until a proper driver has been implemented. The latter would have
> > the disadvantage of preventing unused resources from being garbage
> > collected at all. Then again, I don't think power consumption is going
> > to be a very big issue on hardware where no proper display driver is
> > available.
> 
> When simplefb is the only framebuffer to get a platform working, it is
> reasonable to have a placeholder driver that grabs the resources and
> nothing else. When a real driver is implemented, and merged, the
> placeholder driver should drop compatibility with the device node at
> the same time.

You mean the device node for the real device should be compatible with
"simplefb"? One problem I see with that is that there may be multiple
dummy drivers for different pieces of hardware, all of them binding to
the simplefb compatible and conflicting.

Also this assumes that a device tree node exists for the device. One of
the reasons for using simplefb is so that you don't have to write that
device tree node and its binding yet.

Presumably, though, if the firmware already knows what resources are
needed and generates them at runtime, it should be possible to come up
with a static device tree node, too.

Thierry

Patch

diff --git a/arch/arm/boot/dts/exynos5250-snow.dts b/arch/arm/boot/dts/exynos5250-snow.dts
index 2a62459..6a29b44 100644
--- a/arch/arm/boot/dts/exynos5250-snow.dts
+++ b/arch/arm/boot/dts/exynos5250-snow.dts
@@ -196,6 +196,7 @@ 
 					};
 					fet1 {
 						regulator-name = "vcd_led";
+						regulator-always-on;
 						ti,overcurrent-wait = <3>;
 					};
 					tps65090_fet2: fet2 {
@@ -219,6 +220,7 @@ 
 					};
 					fet6 {
 						regulator-name = "lcd_vdd";
+						regulator-always-on;
 						ti,overcurrent-wait = <3>;
 					};
 					tps65090_fet7: fet7 {