diff mbox series

drm/Makefile: Move tiny drivers before native drivers

Message ID 20231108024613.2898921-1-chenhuacai@loongson.cn (mailing list archive)
State New, archived
Headers show
Series drm/Makefile: Move tiny drivers before native drivers | expand

Commit Message

Huacai Chen Nov. 8, 2023, 2:46 a.m. UTC
After commit 60aebc9559492cea ("drivers/firmware: Move sysfb_init() from
device_initcall to subsys_initcall_sync") some Lenovo laptops get a blank
screen until the display manager starts.

This regression occurs with such a Kconfig combination:
CONFIG_SYSFB=y
CONFIG_SYSFB_SIMPLEFB=y
CONFIG_DRM_SIMPLEDRM=y
CONFIG_DRM_I915=y      # Or other native drivers such as radeon, amdgpu

If replace CONFIG_DRM_SIMPLEDRM with CONFIG_FB_SIMPLE (they use the same
device), there is no blank screen. The root cause is the initialization
order, and this order depends on the Makefile.

FB_SIMPLE is before native DRM drivers (e.g. i915, radeon, amdgpu, and
so on), but DRM_SIMPLEDRM is after them. Thus, if we use FB_SIMPLE, I915
will takeover FB_SIMPLE, then no problem; and if we use DRM_SIMPLEDRM,
DRM_SIMPLEDRM will try to takeover I915, but fails to work.

So we can move the "tiny" directory before native DRM drivers to solve
this problem.

Fixes: 60aebc9559492cea ("drivers/firmware: Move sysfb_init() from device_initcall to subsys_initcall_sync")
Closes: https://lore.kernel.org/dri-devel/ZUnNi3q3yB3zZfTl@P70.localdomain/T/#t
Reported-by: Jaak Ristioja <jaak@ristioja.ee>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
---
 drivers/gpu/drm/Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Thomas Zimmermann Nov. 8, 2023, 8:14 a.m. UTC | #1
Hi,

thanks for the patch.

Am 08.11.23 um 03:46 schrieb Huacai Chen:
> After commit 60aebc9559492cea ("drivers/firmware: Move sysfb_init() from
> device_initcall to subsys_initcall_sync") some Lenovo laptops get a blank
> screen until the display manager starts.
> 
> This regression occurs with such a Kconfig combination:
> CONFIG_SYSFB=y
> CONFIG_SYSFB_SIMPLEFB=y
> CONFIG_DRM_SIMPLEDRM=y
> CONFIG_DRM_I915=y      # Or other native drivers such as radeon, amdgpu
> 
> If replace CONFIG_DRM_SIMPLEDRM with CONFIG_FB_SIMPLE (they use the same
> device), there is no blank screen. The root cause is the initialization
> order, and this order depends on the Makefile.
> 
> FB_SIMPLE is before native DRM drivers (e.g. i915, radeon, amdgpu, and
> so on), but DRM_SIMPLEDRM is after them. Thus, if we use FB_SIMPLE, I915
> will takeover FB_SIMPLE, then no problem; and if we use DRM_SIMPLEDRM,
> DRM_SIMPLEDRM will try to takeover I915, but fails to work.

But what exactly is the problem? From the lengthy discussion threat, it 
looks like you've stumbled across a long-known problem, where the 
firmware driver probes a device that has already been taken by a native 
driver. But that should not be possible.

As you know, there's a platform device that represents the firmware 
framebuffer. The firmware drivers, such as simpledrm, bind to it. In 
i915 and the other native drivers we remove that platform device, so 
that simpledrm does not run.

We call the DRM aperture helpers at [1]. It's implemented at [2]. The 
function contains a call to sysfb_disable(), [3] which should be invoked 
for the i915 device and remove the platform device.

[1] 
https://elixir.bootlin.com/linux/v6.5/source/drivers/gpu/drm/i915/i915_driver.c#L489
[2] 
https://elixir.bootlin.com/linux/v6.5/source/drivers/video/aperture.c#L347
[3] 
https://elixir.bootlin.com/linux/v6.5/source/drivers/firmware/sysfb.c#L63

Can you investigate why this does not work? Is sysfb_disable() not being 
called? Does it remove the platform device?

> 
> So we can move the "tiny" directory before native DRM drivers to solve
> this problem.

Relying on linking order is just as unreliable. The usual workaround is 
to build native drivers as modules. But first, please investigate where 
the current code fails.

Best regards
Thomas

> 
> Fixes: 60aebc9559492cea ("drivers/firmware: Move sysfb_init() from device_initcall to subsys_initcall_sync")
> Closes: https://lore.kernel.org/dri-devel/ZUnNi3q3yB3zZfTl@P70.localdomain/T/#t
> Reported-by: Jaak Ristioja <jaak@ristioja.ee>
> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
> ---
>   drivers/gpu/drm/Makefile | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/Makefile b/drivers/gpu/drm/Makefile
> index 8e1bde059170..db0f3d3aff43 100644
> --- a/drivers/gpu/drm/Makefile
> +++ b/drivers/gpu/drm/Makefile
> @@ -141,6 +141,7 @@ obj-y			+= arm/
>   obj-y			+= display/
>   obj-$(CONFIG_DRM_TTM)	+= ttm/
>   obj-$(CONFIG_DRM_SCHED)	+= scheduler/
> +obj-y			+= tiny/
>   obj-$(CONFIG_DRM_RADEON)+= radeon/
>   obj-$(CONFIG_DRM_AMDGPU)+= amd/amdgpu/
>   obj-$(CONFIG_DRM_AMDGPU)+= amd/amdxcp/
> @@ -182,7 +183,6 @@ obj-$(CONFIG_DRM_FSL_DCU) += fsl-dcu/
>   obj-$(CONFIG_DRM_ETNAVIV) += etnaviv/
>   obj-y			+= hisilicon/
>   obj-y			+= mxsfb/
> -obj-y			+= tiny/
>   obj-$(CONFIG_DRM_PL111) += pl111/
>   obj-$(CONFIG_DRM_TVE200) += tve200/
>   obj-$(CONFIG_DRM_XEN) += xen/
Javier Martinez Canillas Nov. 8, 2023, 8:24 a.m. UTC | #2
Hello,

On Wed, Nov 8, 2023 at 9:14 AM Thomas Zimmermann <tzimmermann@suse.de> wrote:
>
> Hi,
>

[...]

>
> Relying on linking order is just as unreliable. The usual workaround is
> to build native drivers as modules. But first, please investigate where
> the current code fails.
>

I fully agree with Thomas here. This is just papering over the issue.

I'll read the lengthy thread now to see if I can better understand
what's going on here.
Huacai Chen Nov. 8, 2023, 2:45 p.m. UTC | #3
Hi, Thomas,

On Wed, Nov 8, 2023 at 4:14 PM Thomas Zimmermann <tzimmermann@suse.de> wrote:
>
> Hi,
>
> thanks for the patch.
>
> Am 08.11.23 um 03:46 schrieb Huacai Chen:
> > After commit 60aebc9559492cea ("drivers/firmware: Move sysfb_init() from
> > device_initcall to subsys_initcall_sync") some Lenovo laptops get a blank
> > screen until the display manager starts.
> >
> > This regression occurs with such a Kconfig combination:
> > CONFIG_SYSFB=y
> > CONFIG_SYSFB_SIMPLEFB=y
> > CONFIG_DRM_SIMPLEDRM=y
> > CONFIG_DRM_I915=y      # Or other native drivers such as radeon, amdgpu
> >
> > If replace CONFIG_DRM_SIMPLEDRM with CONFIG_FB_SIMPLE (they use the same
> > device), there is no blank screen. The root cause is the initialization
> > order, and this order depends on the Makefile.
> >
> > FB_SIMPLE is before native DRM drivers (e.g. i915, radeon, amdgpu, and
> > so on), but DRM_SIMPLEDRM is after them. Thus, if we use FB_SIMPLE, I915
> > will takeover FB_SIMPLE, then no problem; and if we use DRM_SIMPLEDRM,
> > DRM_SIMPLEDRM will try to takeover I915, but fails to work.
>
> But what exactly is the problem? From the lengthy discussion threat, it
> looks like you've stumbled across a long-known problem, where the
> firmware driver probes a device that has already been taken by a native
> driver. But that should not be possible.
Yes, it is a long-known problem but only exposed after commit
60aebc9559492cea ("drivers/firmware: Move sysfb_init() from
device_initcall to subsys_initcall_sync"), because the platform device
for simpledrm was not created as early as possible in old kernels.

>
> As you know, there's a platform device that represents the firmware
> framebuffer. The firmware drivers, such as simpledrm, bind to it. In
> i915 and the other native drivers we remove that platform device, so
> that simpledrm does not run.
>
> We call the DRM aperture helpers at [1]. It's implemented at [2]. The
> function contains a call to sysfb_disable(), [3] which should be invoked
> for the i915 device and remove the platform device.
Yes, this looks a little strange, which I haven't investigated before.
Jaak, could you please try to see whether sysfb_disable() is called in
bad kernels?

>
> [1]
> https://elixir.bootlin.com/linux/v6.5/source/drivers/gpu/drm/i915/i915_driver.c#L489
> [2]
> https://elixir.bootlin.com/linux/v6.5/source/drivers/video/aperture.c#L347
> [3]
> https://elixir.bootlin.com/linux/v6.5/source/drivers/firmware/sysfb.c#L63
>
> Can you investigate why this does not work? Is sysfb_disable() not being
> called? Does it remove the platform device?
>
> >
> > So we can move the "tiny" directory before native DRM drivers to solve
> > this problem.
>
> Relying on linking order is just as unreliable. The usual workaround is
> to build native drivers as modules. But first, please investigate where
> the current code fails.
Yes, native drivers are usually built as modules, but Jaak tries to
built-in, and then reports this bug. :)

Huacai

>
> Best regards
> Thomas
>
> >
> > Fixes: 60aebc9559492cea ("drivers/firmware: Move sysfb_init() from device_initcall to subsys_initcall_sync")
> > Closes: https://lore.kernel.org/dri-devel/ZUnNi3q3yB3zZfTl@P70.localdomain/T/#t
> > Reported-by: Jaak Ristioja <jaak@ristioja.ee>
> > Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
> > ---
> >   drivers/gpu/drm/Makefile | 2 +-
> >   1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/Makefile b/drivers/gpu/drm/Makefile
> > index 8e1bde059170..db0f3d3aff43 100644
> > --- a/drivers/gpu/drm/Makefile
> > +++ b/drivers/gpu/drm/Makefile
> > @@ -141,6 +141,7 @@ obj-y                     += arm/
> >   obj-y                       += display/
> >   obj-$(CONFIG_DRM_TTM)       += ttm/
> >   obj-$(CONFIG_DRM_SCHED)     += scheduler/
> > +obj-y                        += tiny/
> >   obj-$(CONFIG_DRM_RADEON)+= radeon/
> >   obj-$(CONFIG_DRM_AMDGPU)+= amd/amdgpu/
> >   obj-$(CONFIG_DRM_AMDGPU)+= amd/amdxcp/
> > @@ -182,7 +183,6 @@ obj-$(CONFIG_DRM_FSL_DCU) += fsl-dcu/
> >   obj-$(CONFIG_DRM_ETNAVIV) += etnaviv/
> >   obj-y                       += hisilicon/
> >   obj-y                       += mxsfb/
> > -obj-y                        += tiny/
> >   obj-$(CONFIG_DRM_PL111) += pl111/
> >   obj-$(CONFIG_DRM_TVE200) += tve200/
> >   obj-$(CONFIG_DRM_XEN) += xen/
>
> --
> Thomas Zimmermann
> Graphics Driver Developer
> SUSE Software Solutions Germany GmbH
> Frankenstrasse 146, 90461 Nuernberg, Germany
> GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman
> HRB 36809 (AG Nuernberg)
Huacai Chen Nov. 8, 2023, 2:46 p.m. UTC | #4
Hi, Javier,

On Wed, Nov 8, 2023 at 4:24 PM Javier Martinez Canillas
<javierm@redhat.com> wrote:
>
> Hello,
>
> On Wed, Nov 8, 2023 at 9:14 AM Thomas Zimmermann <tzimmermann@suse.de> wrote:
> >
> > Hi,
> >
>
> [...]
>
> >
> > Relying on linking order is just as unreliable. The usual workaround is
> > to build native drivers as modules. But first, please investigate where
> > the current code fails.
> >
>
> I fully agree with Thomas here. This is just papering over the issue.
>
> I'll read the lengthy thread now to see if I can better understand
> what's going on here.
Thank you very much.

Huacai

> --
> Best regards,
>
> Javier Martinez Canillas
> Core Platforms
> Red Hat
>
Huacai Chen Dec. 11, 2023, 3:08 a.m. UTC | #5
Hi, Javier,

On Wed, Nov 8, 2023 at 4:24 PM Javier Martinez Canillas
<javierm@redhat.com> wrote:
>
> Hello,
>
> On Wed, Nov 8, 2023 at 9:14 AM Thomas Zimmermann <tzimmermann@suse.de> wrote:
> >
> > Hi,
> >
>
> [...]
>
> >
> > Relying on linking order is just as unreliable. The usual workaround is
> > to build native drivers as modules. But first, please investigate where
> > the current code fails.
> >
>
> I fully agree with Thomas here. This is just papering over the issue.
>
> I'll read the lengthy thread now to see if I can better understand
> what's going on here.
Have you understood enough now? I really don't want the original patch
to be reverted.

And Jaak, could you please test with the below patch (but keep the
original order in Makefile) and then give me the dmesg output?

diff --git a/drivers/video/aperture.c b/drivers/video/aperture.c
index 561be8feca96..cc2e39fb98f5 100644
--- a/drivers/video/aperture.c
+++ b/drivers/video/aperture.c
@@ -350,21 +350,29 @@ int
aperture_remove_conflicting_pci_devices(struct pci_dev *pdev, const
char *na
        resource_size_t base, size;
        int bar, ret = 0;

-       if (pdev == vga_default_device())
+       printk("DEBUG: remove 1\n");
+
+       if (pdev == vga_default_device()) {
+               printk("DEBUG: primary = true\n");
                primary = true;
+       }

-       if (primary)
+       if (primary) {
+               printk("DEBUG: disable sysfb\n");
                sysfb_disable();
+       }

        for (bar = 0; bar < PCI_STD_NUM_BARS; ++bar) {
                if (!(pci_resource_flags(pdev, bar) & IORESOURCE_MEM))
                        continue;

+               printk("DEBUG: remove 2\n");
                base = pci_resource_start(pdev, bar);
                size = pci_resource_len(pdev, bar);
                aperture_detach_devices(base, size);
        }

+       printk("DEBUG: remove 3\n");
        /*
         * If this is the primary adapter, there could be a VGA device
         * that consumes the VGA framebuffer I/O range. Remove this

[1]  https://lore.kernel.org/lkml/170222766284.86103.11020060769330721008@leemhuis.info/T/#u

> --
> Best regards,
>
> Javier Martinez Canillas
> Core Platforms
> Red Hat
>
Javier Martinez Canillas Dec. 11, 2023, 8:33 a.m. UTC | #6
Huacai Chen <chenhuacai@kernel.org> writes:

Hello Huacai,

> Hi, Javier,
>
> On Wed, Nov 8, 2023 at 4:24 PM Javier Martinez Canillas
> <javierm@redhat.com> wrote:
>>
>> Hello,
>>
>> On Wed, Nov 8, 2023 at 9:14 AM Thomas Zimmermann <tzimmermann@suse.de> wrote:
>> >
>> > Hi,
>> >
>>
>> [...]
>>
>> >
>> > Relying on linking order is just as unreliable. The usual workaround is
>> > to build native drivers as modules. But first, please investigate where
>> > the current code fails.
>> >
>>
>> I fully agree with Thomas here. This is just papering over the issue.
>>
>> I'll read the lengthy thread now to see if I can better understand
>> what's going on here.
> Have you understood enough now? I really don't want the original patch
> to be reverted.
>

I discussed this with Thomas but we didn't fully understand what was going
on. In theory, it should work since the native driver should disable sysfb
and remove the registered platform device. But it seems that this does not
happen for Jaak and others who reported the same issue.

Something that we noticed is that PCI fixups happen in fs_initcall_sync()
and since the sysfb_init() should happen after the PCI subsystem for EFI
quirks, we think that at least should be moved after that initcall level.

That means rootfs_initcall() onwards, and that takes it almost at the same
before your patch. The safest would be to move sysfb_init() to initcall
device_initcall_sync() and make sure that happens after all the native DRM
drivers, since module_init() happens at device_initcall().

I think that Thomas meant to send a patch to do that change.
Huacai Chen Dec. 11, 2023, 9:11 a.m. UTC | #7
Hi, Javier,

On Mon, Dec 11, 2023 at 4:33 PM Javier Martinez Canillas
<javierm@redhat.com> wrote:
>
> Huacai Chen <chenhuacai@kernel.org> writes:
>
> Hello Huacai,
>
> > Hi, Javier,
> >
> > On Wed, Nov 8, 2023 at 4:24 PM Javier Martinez Canillas
> > <javierm@redhat.com> wrote:
> >>
> >> Hello,
> >>
> >> On Wed, Nov 8, 2023 at 9:14 AM Thomas Zimmermann <tzimmermann@suse.de> wrote:
> >> >
> >> > Hi,
> >> >
> >>
> >> [...]
> >>
> >> >
> >> > Relying on linking order is just as unreliable. The usual workaround is
> >> > to build native drivers as modules. But first, please investigate where
> >> > the current code fails.
> >> >
> >>
> >> I fully agree with Thomas here. This is just papering over the issue.
> >>
> >> I'll read the lengthy thread now to see if I can better understand
> >> what's going on here.
> > Have you understood enough now? I really don't want the original patch
> > to be reverted.
> >
>
> I discussed this with Thomas but we didn't fully understand what was going
> on. In theory, it should work since the native driver should disable sysfb
> and remove the registered platform device. But it seems that this does not
> happen for Jaak and others who reported the same issue.
>
> Something that we noticed is that PCI fixups happen in fs_initcall_sync()
> and since the sysfb_init() should happen after the PCI subsystem for EFI
> quirks, we think that at least should be moved after that initcall level.
>
> That means rootfs_initcall() onwards, and that takes it almost at the same
> before your patch. The safest would be to move sysfb_init() to initcall
> device_initcall_sync() and make sure that happens after all the native DRM
> drivers, since module_init() happens at device_initcall().
>
> I think that Thomas meant to send a patch to do that change.
Thank you very much. I guess things may be like this:
i915 init at first, then simpledrm init in parallel and finished
before i915 call sysfb_disable(), so in my previous reply I provide a
debug patch for Jaak to see what happens.

Huacai

>
> --
> Best regards,
>
> Javier Martinez Canillas
> Core Platforms
> Red Hat
>
Javier Martinez Canillas Dec. 11, 2023, 9:16 a.m. UTC | #8
Huacai Chen <chenhuacai@kernel.org> writes:

> Hi, Javier,
>
> On Mon, Dec 11, 2023 at 4:33 PM Javier Martinez Canillas
> <javierm@redhat.com> wrote:
>>
>> Huacai Chen <chenhuacai@kernel.org> writes:
>>
>> Hello Huacai,
>>
>> > Hi, Javier,
>> >
>> > On Wed, Nov 8, 2023 at 4:24 PM Javier Martinez Canillas
>> > <javierm@redhat.com> wrote:
>> >>
>> >> Hello,
>> >>
>> >> On Wed, Nov 8, 2023 at 9:14 AM Thomas Zimmermann <tzimmermann@suse.de> wrote:
>> >> >
>> >> > Hi,
>> >> >
>> >>
>> >> [...]
>> >>
>> >> >
>> >> > Relying on linking order is just as unreliable. The usual workaround is
>> >> > to build native drivers as modules. But first, please investigate where
>> >> > the current code fails.
>> >> >
>> >>
>> >> I fully agree with Thomas here. This is just papering over the issue.
>> >>
>> >> I'll read the lengthy thread now to see if I can better understand
>> >> what's going on here.
>> > Have you understood enough now? I really don't want the original patch
>> > to be reverted.
>> >
>>
>> I discussed this with Thomas but we didn't fully understand what was going
>> on. In theory, it should work since the native driver should disable sysfb
>> and remove the registered platform device. But it seems that this does not
>> happen for Jaak and others who reported the same issue.
>>
>> Something that we noticed is that PCI fixups happen in fs_initcall_sync()
>> and since the sysfb_init() should happen after the PCI subsystem for EFI
>> quirks, we think that at least should be moved after that initcall level.
>>
>> That means rootfs_initcall() onwards, and that takes it almost at the same
>> before your patch. The safest would be to move sysfb_init() to initcall
>> device_initcall_sync() and make sure that happens after all the native DRM
>> drivers, since module_init() happens at device_initcall().
>>
>> I think that Thomas meant to send a patch to do that change.
> Thank you very much. I guess things may be like this:
> i915 init at first, then simpledrm init in parallel and finished
> before i915 call sysfb_disable(), so in my previous reply I provide a
> debug patch for Jaak to see what happens.
>

Yes, specially with async probing although neither i915 nor simpledrm use
it right now AFAICT.

Is still unclear to me what's going on in this particular case, although
moving it to device_initcall_sync() seems to be the correct thing to do
regardless of this issue.
Huacai Chen Dec. 11, 2023, 9:26 a.m. UTC | #9
Hi, Javier,

On Mon, Dec 11, 2023 at 5:17 PM Javier Martinez Canillas
<javierm@redhat.com> wrote:
>
> Huacai Chen <chenhuacai@kernel.org> writes:
>
> > Hi, Javier,
> >
> > On Mon, Dec 11, 2023 at 4:33 PM Javier Martinez Canillas
> > <javierm@redhat.com> wrote:
> >>
> >> Huacai Chen <chenhuacai@kernel.org> writes:
> >>
> >> Hello Huacai,
> >>
> >> > Hi, Javier,
> >> >
> >> > On Wed, Nov 8, 2023 at 4:24 PM Javier Martinez Canillas
> >> > <javierm@redhat.com> wrote:
> >> >>
> >> >> Hello,
> >> >>
> >> >> On Wed, Nov 8, 2023 at 9:14 AM Thomas Zimmermann <tzimmermann@suse.de> wrote:
> >> >> >
> >> >> > Hi,
> >> >> >
> >> >>
> >> >> [...]
> >> >>
> >> >> >
> >> >> > Relying on linking order is just as unreliable. The usual workaround is
> >> >> > to build native drivers as modules. But first, please investigate where
> >> >> > the current code fails.
> >> >> >
> >> >>
> >> >> I fully agree with Thomas here. This is just papering over the issue.
> >> >>
> >> >> I'll read the lengthy thread now to see if I can better understand
> >> >> what's going on here.
> >> > Have you understood enough now? I really don't want the original patch
> >> > to be reverted.
> >> >
> >>
> >> I discussed this with Thomas but we didn't fully understand what was going
> >> on. In theory, it should work since the native driver should disable sysfb
> >> and remove the registered platform device. But it seems that this does not
> >> happen for Jaak and others who reported the same issue.
> >>
> >> Something that we noticed is that PCI fixups happen in fs_initcall_sync()
> >> and since the sysfb_init() should happen after the PCI subsystem for EFI
> >> quirks, we think that at least should be moved after that initcall level.
> >>
> >> That means rootfs_initcall() onwards, and that takes it almost at the same
> >> before your patch. The safest would be to move sysfb_init() to initcall
> >> device_initcall_sync() and make sure that happens after all the native DRM
> >> drivers, since module_init() happens at device_initcall().
> >>
> >> I think that Thomas meant to send a patch to do that change.
> > Thank you very much. I guess things may be like this:
> > i915 init at first, then simpledrm init in parallel and finished
> > before i915 call sysfb_disable(), so in my previous reply I provide a
> > debug patch for Jaak to see what happens.
> >
>
> Yes, specially with async probing although neither i915 nor simpledrm use
> it right now AFAICT.
>
> Is still unclear to me what's going on in this particular case, although
> moving it to device_initcall_sync() seems to be the correct thing to do
> regardless of this issue.
But that cannot "make the screen display as early as possible". Maybe
we can wait for Jaak's result.

Huacai

>
> --
> Best regards,
>
> Javier Martinez Canillas
> Core Platforms
> Red Hat
>
Jani Nikula Jan. 23, 2024, 8:53 a.m. UTC | #10
On Wed, 08 Nov 2023, Thomas Zimmermann <tzimmermann@suse.de> wrote:
> Hi,
>
> thanks for the patch.
>
> Am 08.11.23 um 03:46 schrieb Huacai Chen:
>> After commit 60aebc9559492cea ("drivers/firmware: Move sysfb_init() from
>> device_initcall to subsys_initcall_sync") some Lenovo laptops get a blank
>> screen until the display manager starts.
>> 
>> This regression occurs with such a Kconfig combination:
>> CONFIG_SYSFB=y
>> CONFIG_SYSFB_SIMPLEFB=y
>> CONFIG_DRM_SIMPLEDRM=y
>> CONFIG_DRM_I915=y      # Or other native drivers such as radeon, amdgpu
>> 
>> If replace CONFIG_DRM_SIMPLEDRM with CONFIG_FB_SIMPLE (they use the same
>> device), there is no blank screen. The root cause is the initialization
>> order, and this order depends on the Makefile.
>> 
>> FB_SIMPLE is before native DRM drivers (e.g. i915, radeon, amdgpu, and
>> so on), but DRM_SIMPLEDRM is after them. Thus, if we use FB_SIMPLE, I915
>> will takeover FB_SIMPLE, then no problem; and if we use DRM_SIMPLEDRM,
>> DRM_SIMPLEDRM will try to takeover I915, but fails to work.
>
> But what exactly is the problem? From the lengthy discussion threat, it 
> looks like you've stumbled across a long-known problem, where the 
> firmware driver probes a device that has already been taken by a native 
> driver. But that should not be possible.
>
> As you know, there's a platform device that represents the firmware 
> framebuffer. The firmware drivers, such as simpledrm, bind to it. In 
> i915 and the other native drivers we remove that platform device, so 
> that simpledrm does not run.

The problem is still not resolved. Another bug report at [1].

The commit message here points at 60aebc955949 ("drivers/firmware: Move
sysfb_init() from device_initcall to subsys_initcall_sync") as
regressing, and Jaak also bisected it (see Closes:).

I agree the patch here is just papering over the issue, but lacking a
proper fix, for months, a revert would be in order, no?


BR,
Jani.


[1] https://gitlab.freedesktop.org/drm/intel/-/issues/10133


>
> We call the DRM aperture helpers at [1]. It's implemented at [2]. The 
> function contains a call to sysfb_disable(), [3] which should be invoked 
> for the i915 device and remove the platform device.
>
> [1] 
> https://elixir.bootlin.com/linux/v6.5/source/drivers/gpu/drm/i915/i915_driver.c#L489
> [2] 
> https://elixir.bootlin.com/linux/v6.5/source/drivers/video/aperture.c#L347
> [3] 
> https://elixir.bootlin.com/linux/v6.5/source/drivers/firmware/sysfb.c#L63
>
> Can you investigate why this does not work? Is sysfb_disable() not being 
> called? Does it remove the platform device?
>
>> 
>> So we can move the "tiny" directory before native DRM drivers to solve
>> this problem.
>
> Relying on linking order is just as unreliable. The usual workaround is 
> to build native drivers as modules. But first, please investigate where 
> the current code fails.
>
> Best regards
> Thomas
>
>> 
>> Fixes: 60aebc9559492cea ("drivers/firmware: Move sysfb_init() from device_initcall to subsys_initcall_sync")
>> Closes: https://lore.kernel.org/dri-devel/ZUnNi3q3yB3zZfTl@P70.localdomain/T/#t
>> Reported-by: Jaak Ristioja <jaak@ristioja.ee>
>> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
>> ---
>>   drivers/gpu/drm/Makefile | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>> 
>> diff --git a/drivers/gpu/drm/Makefile b/drivers/gpu/drm/Makefile
>> index 8e1bde059170..db0f3d3aff43 100644
>> --- a/drivers/gpu/drm/Makefile
>> +++ b/drivers/gpu/drm/Makefile
>> @@ -141,6 +141,7 @@ obj-y			+= arm/
>>   obj-y			+= display/
>>   obj-$(CONFIG_DRM_TTM)	+= ttm/
>>   obj-$(CONFIG_DRM_SCHED)	+= scheduler/
>> +obj-y			+= tiny/
>>   obj-$(CONFIG_DRM_RADEON)+= radeon/
>>   obj-$(CONFIG_DRM_AMDGPU)+= amd/amdgpu/
>>   obj-$(CONFIG_DRM_AMDGPU)+= amd/amdxcp/
>> @@ -182,7 +183,6 @@ obj-$(CONFIG_DRM_FSL_DCU) += fsl-dcu/
>>   obj-$(CONFIG_DRM_ETNAVIV) += etnaviv/
>>   obj-y			+= hisilicon/
>>   obj-y			+= mxsfb/
>> -obj-y			+= tiny/
>>   obj-$(CONFIG_DRM_PL111) += pl111/
>>   obj-$(CONFIG_DRM_TVE200) += tve200/
>>   obj-$(CONFIG_DRM_XEN) += xen/
Linux regression tracking (Thorsten Leemhuis) Jan. 23, 2024, 9:17 a.m. UTC | #11
On 23.01.24 09:53, Jani Nikula wrote:
> On Wed, 08 Nov 2023, Thomas Zimmermann <tzimmermann@suse.de> wrote:
>>
>> thanks for the patch.
>>
>> Am 08.11.23 um 03:46 schrieb Huacai Chen:
>>> After commit 60aebc9559492cea ("drivers/firmware: Move sysfb_init() from
>>> device_initcall to subsys_initcall_sync") some Lenovo laptops get a blank
>>> screen until the display manager starts.
>>>
>>> This regression occurs with such a Kconfig combination:
>>> CONFIG_SYSFB=y
>>> CONFIG_SYSFB_SIMPLEFB=y
>>> CONFIG_DRM_SIMPLEDRM=y
>>> CONFIG_DRM_I915=y      # Or other native drivers such as radeon, amdgpu
>>>
>>> If replace CONFIG_DRM_SIMPLEDRM with CONFIG_FB_SIMPLE (they use the same
>>> device), there is no blank screen. The root cause is the initialization
>>> order, and this order depends on the Makefile.
>>>
>>> FB_SIMPLE is before native DRM drivers (e.g. i915, radeon, amdgpu, and
>>> so on), but DRM_SIMPLEDRM is after them. Thus, if we use FB_SIMPLE, I915
>>> will takeover FB_SIMPLE, then no problem; and if we use DRM_SIMPLEDRM,
>>> DRM_SIMPLEDRM will try to takeover I915, but fails to work.
>>
>> But what exactly is the problem? From the lengthy discussion threat, it 
>> looks like you've stumbled across a long-known problem, where the 
>> firmware driver probes a device that has already been taken by a native 
>> driver. But that should not be possible.
>>
>> As you know, there's a platform device that represents the firmware 
>> framebuffer. The firmware drivers, such as simpledrm, bind to it. In 
>> i915 and the other native drivers we remove that platform device, so 
>> that simpledrm does not run.
> 
> The problem is still not resolved. Another bug report at [1].
> 
> The commit message here points at 60aebc955949 ("drivers/firmware: Move
> sysfb_init() from device_initcall to subsys_initcall_sync") as
> regressing, and Jaak also bisected it (see Closes:).
> 
> I agree the patch here is just papering over the issue, but lacking a
> proper fix, for months, a revert would be in order, no?
> 
> [1] https://gitlab.freedesktop.org/drm/intel/-/issues/10133

Interesting.

JFYI for those that don't follow this closely: Huacai Chen proposed a
fix and asked a earlier reporter to test it, but afaics heard nothing back:

https://lore.kernel.org/all/CAAhV-H5eXM7FNzuRCMthAziG_jg75XwQV3grpw=sdyJ-9GXgvA@mail.gmail.com/

That's afaics why this got stuck (and why I didn't request on a escalate
this weeks ago).

Ciao, Thorsten

>> We call the DRM aperture helpers at [1]. It's implemented at [2]. The 
>> function contains a call to sysfb_disable(), [3] which should be invoked 
>> for the i915 device and remove the platform device.
>>
>> [1] 
>> https://elixir.bootlin.com/linux/v6.5/source/drivers/gpu/drm/i915/i915_driver.c#L489
>> [2] 
>> https://elixir.bootlin.com/linux/v6.5/source/drivers/video/aperture.c#L347
>> [3] 
>> https://elixir.bootlin.com/linux/v6.5/source/drivers/firmware/sysfb.c#L63
>>
>> Can you investigate why this does not work? Is sysfb_disable() not being 
>> called? Does it remove the platform device?
>>
>>>
>>> So we can move the "tiny" directory before native DRM drivers to solve
>>> this problem.
>>
>> Relying on linking order is just as unreliable. The usual workaround is 
>> to build native drivers as modules. But first, please investigate where 
>> the current code fails.
>>
>> Best regards
>> Thomas
>>
>>>
>>> Fixes: 60aebc9559492cea ("drivers/firmware: Move sysfb_init() from device_initcall to subsys_initcall_sync")
>>> Closes: https://lore.kernel.org/dri-devel/ZUnNi3q3yB3zZfTl@P70.localdomain/T/#t
>>> Reported-by: Jaak Ristioja <jaak@ristioja.ee>
>>> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
>>> ---
>>>   drivers/gpu/drm/Makefile | 2 +-
>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/gpu/drm/Makefile b/drivers/gpu/drm/Makefile
>>> index 8e1bde059170..db0f3d3aff43 100644
>>> --- a/drivers/gpu/drm/Makefile
>>> +++ b/drivers/gpu/drm/Makefile
>>> @@ -141,6 +141,7 @@ obj-y			+= arm/
>>>   obj-y			+= display/
>>>   obj-$(CONFIG_DRM_TTM)	+= ttm/
>>>   obj-$(CONFIG_DRM_SCHED)	+= scheduler/
>>> +obj-y			+= tiny/
>>>   obj-$(CONFIG_DRM_RADEON)+= radeon/
>>>   obj-$(CONFIG_DRM_AMDGPU)+= amd/amdgpu/
>>>   obj-$(CONFIG_DRM_AMDGPU)+= amd/amdxcp/
>>> @@ -182,7 +183,6 @@ obj-$(CONFIG_DRM_FSL_DCU) += fsl-dcu/
>>>   obj-$(CONFIG_DRM_ETNAVIV) += etnaviv/
>>>   obj-y			+= hisilicon/
>>>   obj-y			+= mxsfb/
>>> -obj-y			+= tiny/
>>>   obj-$(CONFIG_DRM_PL111) += pl111/
>>>   obj-$(CONFIG_DRM_TVE200) += tve200/
>>>   obj-$(CONFIG_DRM_XEN) += xen/
>
Javier Martinez Canillas Jan. 23, 2024, 9:41 a.m. UTC | #12
Thorsten Leemhuis <regressions@leemhuis.info> writes:

> On 23.01.24 09:53, Jani Nikula wrote:
>> On Wed, 08 Nov 2023, Thomas Zimmermann <tzimmermann@suse.de> wrote:

[...]

>
>>> As you know, there's a platform device that represents the firmware 
>>> framebuffer. The firmware drivers, such as simpledrm, bind to it. In 
>>> i915 and the other native drivers we remove that platform device, so 
>>> that simpledrm does not run.
>> 
>> The problem is still not resolved. Another bug report at [1].
>> 
>> The commit message here points at 60aebc955949 ("drivers/firmware: Move
>> sysfb_init() from device_initcall to subsys_initcall_sync") as
>> regressing, and Jaak also bisected it (see Closes:).
>> 
>> I agree the patch here is just papering over the issue, but lacking a
>> proper fix, for months, a revert would be in order, no?
>>

Yes, I agree that this patch has to be reverted, since as you said the
issue has not been fixed and is causing regressions for multiple users.

>> [1] https://gitlab.freedesktop.org/drm/intel/-/issues/10133
>
> Interesting.
>
> JFYI for those that don't follow this closely: Huacai Chen proposed a
> fix and asked a earlier reporter to test it, but afaics heard nothing back:
>
> https://lore.kernel.org/all/CAAhV-H5eXM7FNzuRCMthAziG_jg75XwQV3grpw=sdyJ-9GXgvA@mail.gmail.com/
>

It's not a fix but a debug patch for the patch author to get more info ?
Linux regression tracking (Thorsten Leemhuis) Jan. 23, 2024, 9:44 a.m. UTC | #13
On 23.01.24 10:17, Thorsten Leemhuis wrote:
> On 23.01.24 09:53, Jani Nikula wrote:
>> On Wed, 08 Nov 2023, Thomas Zimmermann <tzimmermann@suse.de> wrote:
>>>
>>> thanks for the patch.
>>>
>>> Am 08.11.23 um 03:46 schrieb Huacai Chen:
>>>> After commit 60aebc9559492cea ("drivers/firmware: Move sysfb_init() from
>>>> device_initcall to subsys_initcall_sync") some Lenovo laptops get a blank
>>>> screen until the display manager starts.
>>>>
>>>> This regression occurs with such a Kconfig combination:
>>>> CONFIG_SYSFB=y
>>>> CONFIG_SYSFB_SIMPLEFB=y
>>>> CONFIG_DRM_SIMPLEDRM=y
>>>> CONFIG_DRM_I915=y      # Or other native drivers such as radeon, amdgpu
>>>>
>>>> If replace CONFIG_DRM_SIMPLEDRM with CONFIG_FB_SIMPLE (they use the same
>>>> device), there is no blank screen. The root cause is the initialization
>>>> order, and this order depends on the Makefile.
>>>>
>>>> FB_SIMPLE is before native DRM drivers (e.g. i915, radeon, amdgpu, and
>>>> so on), but DRM_SIMPLEDRM is after them. Thus, if we use FB_SIMPLE, I915
>>>> will takeover FB_SIMPLE, then no problem; and if we use DRM_SIMPLEDRM,
>>>> DRM_SIMPLEDRM will try to takeover I915, but fails to work.
>>>
>>> But what exactly is the problem? From the lengthy discussion threat, it 
>>> looks like you've stumbled across a long-known problem, where the 
>>> firmware driver probes a device that has already been taken by a native 
>>> driver. But that should not be possible.
>>>
>>> As you know, there's a platform device that represents the firmware 
>>> framebuffer. The firmware drivers, such as simpledrm, bind to it. In 
>>> i915 and the other native drivers we remove that platform device, so 
>>> that simpledrm does not run.
>>
>> The problem is still not resolved. Another bug report at [1].
>>
>> The commit message here points at 60aebc955949 ("drivers/firmware: Move
>> sysfb_init() from device_initcall to subsys_initcall_sync") as
>> regressing, and Jaak also bisected it (see Closes:).
>>
>> I agree the patch here is just papering over the issue, but lacking a
>> proper fix, for months, a revert would be in order, no?
>>
>> [1] https://gitlab.freedesktop.org/drm/intel/-/issues/10133
> 
> Interesting.
> 
> JFYI for those that don't follow this closely: Huacai Chen proposed a
> fix

Sorry, I did not look closely and misremembered: that was not a fix, it
was just a test patch for gather more data for debugging.

Ciao, Thorsten

> and asked a earlier reporter to test it, but afaics heard nothing back:
> 
> https://lore.kernel.org/all/CAAhV-H5eXM7FNzuRCMthAziG_jg75XwQV3grpw=sdyJ-9GXgvA@mail.gmail.com/
> 
> That's afaics why this got stuck (and why I didn't request on a escalate
> this weeks ago).
> 
> Ciao, Thorsten
> 
>>> We call the DRM aperture helpers at [1]. It's implemented at [2]. The 
>>> function contains a call to sysfb_disable(), [3] which should be invoked 
>>> for the i915 device and remove the platform device.
>>>
>>> [1] 
>>> https://elixir.bootlin.com/linux/v6.5/source/drivers/gpu/drm/i915/i915_driver.c#L489
>>> [2] 
>>> https://elixir.bootlin.com/linux/v6.5/source/drivers/video/aperture.c#L347
>>> [3] 
>>> https://elixir.bootlin.com/linux/v6.5/source/drivers/firmware/sysfb.c#L63
>>>
>>> Can you investigate why this does not work? Is sysfb_disable() not being 
>>> called? Does it remove the platform device?
>>>
>>>>
>>>> So we can move the "tiny" directory before native DRM drivers to solve
>>>> this problem.
>>>
>>> Relying on linking order is just as unreliable. The usual workaround is 
>>> to build native drivers as modules. But first, please investigate where 
>>> the current code fails.
>>>
>>> Best regards
>>> Thomas
>>>
>>>>
>>>> Fixes: 60aebc9559492cea ("drivers/firmware: Move sysfb_init() from device_initcall to subsys_initcall_sync")
>>>> Closes: https://lore.kernel.org/dri-devel/ZUnNi3q3yB3zZfTl@P70.localdomain/T/#t
>>>> Reported-by: Jaak Ristioja <jaak@ristioja.ee>
>>>> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
>>>> ---
>>>>   drivers/gpu/drm/Makefile | 2 +-
>>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/Makefile b/drivers/gpu/drm/Makefile
>>>> index 8e1bde059170..db0f3d3aff43 100644
>>>> --- a/drivers/gpu/drm/Makefile
>>>> +++ b/drivers/gpu/drm/Makefile
>>>> @@ -141,6 +141,7 @@ obj-y			+= arm/
>>>>   obj-y			+= display/
>>>>   obj-$(CONFIG_DRM_TTM)	+= ttm/
>>>>   obj-$(CONFIG_DRM_SCHED)	+= scheduler/
>>>> +obj-y			+= tiny/
>>>>   obj-$(CONFIG_DRM_RADEON)+= radeon/
>>>>   obj-$(CONFIG_DRM_AMDGPU)+= amd/amdgpu/
>>>>   obj-$(CONFIG_DRM_AMDGPU)+= amd/amdxcp/
>>>> @@ -182,7 +183,6 @@ obj-$(CONFIG_DRM_FSL_DCU) += fsl-dcu/
>>>>   obj-$(CONFIG_DRM_ETNAVIV) += etnaviv/
>>>>   obj-y			+= hisilicon/
>>>>   obj-y			+= mxsfb/
>>>> -obj-y			+= tiny/
>>>>   obj-$(CONFIG_DRM_PL111) += pl111/
>>>>   obj-$(CONFIG_DRM_TVE200) += tve200/
>>>>   obj-$(CONFIG_DRM_XEN) += xen/
>>
Thomas Zimmermann Jan. 23, 2024, 10:19 a.m. UTC | #14
Hi

Am 23.01.24 um 10:41 schrieb Javier Martinez Canillas:
> Thorsten Leemhuis <regressions@leemhuis.info> writes:
> 
>> On 23.01.24 09:53, Jani Nikula wrote:
>>> On Wed, 08 Nov 2023, Thomas Zimmermann <tzimmermann@suse.de> wrote:
> 
> [...]
> 
>>
>>>> As you know, there's a platform device that represents the firmware
>>>> framebuffer. The firmware drivers, such as simpledrm, bind to it. In
>>>> i915 and the other native drivers we remove that platform device, so
>>>> that simpledrm does not run.
>>>
>>> The problem is still not resolved. Another bug report at [1].
>>>
>>> The commit message here points at 60aebc955949 ("drivers/firmware: Move
>>> sysfb_init() from device_initcall to subsys_initcall_sync") as
>>> regressing, and Jaak also bisected it (see Closes:).
>>>
>>> I agree the patch here is just papering over the issue, but lacking a
>>> proper fix, for months, a revert would be in order, no?
>>>
> 
> Yes, I agree that this patch has to be reverted, since as you said the
> issue has not been fixed and is causing regressions for multiple users.

I wanted to send out a revert anyway. I'll do this in a bit.

Best regards
Thomas

> 
>>> [1] https://gitlab.freedesktop.org/drm/intel/-/issues/10133
>>
>> Interesting.
>>
>> JFYI for those that don't follow this closely: Huacai Chen proposed a
>> fix and asked a earlier reporter to test it, but afaics heard nothing back:
>>
>> https://lore.kernel.org/all/CAAhV-H5eXM7FNzuRCMthAziG_jg75XwQV3grpw=sdyJ-9GXgvA@mail.gmail.com/
>>
> 
> It's not a fix but a debug patch for the patch author to get more info ?
>
Jaak Ristioja Jan. 23, 2024, 9:20 p.m. UTC | #15
Hi,

I apologize for not finding the time to test this earlier.

On 11.12.23 05:08, Huacai Chen wrote:
> And Jaak, could you please test with the below patch (but keep the
> original order in Makefile) and then give me the dmesg output?
> 
> diff --git a/drivers/video/aperture.c b/drivers/video/aperture.c
> index 561be8feca96..cc2e39fb98f5 100644
> --- a/drivers/video/aperture.c
> +++ b/drivers/video/aperture.c
> @@ -350,21 +350,29 @@ int
> aperture_remove_conflicting_pci_devices(struct pci_dev *pdev, const
> char *na
>          resource_size_t base, size;
>          int bar, ret = 0;
> 
> -       if (pdev == vga_default_device())
> +       printk("DEBUG: remove 1\n");
> +
> +       if (pdev == vga_default_device()) {
> +               printk("DEBUG: primary = true\n");
>                  primary = true;
> +       }
> 
> -       if (primary)
> +       if (primary) {
> +               printk("DEBUG: disable sysfb\n");
>                  sysfb_disable();
> +       }
> 
>          for (bar = 0; bar < PCI_STD_NUM_BARS; ++bar) {
>                  if (!(pci_resource_flags(pdev, bar) & IORESOURCE_MEM))
>                          continue;
> 
> +               printk("DEBUG: remove 2\n");
>                  base = pci_resource_start(pdev, bar);
>                  size = pci_resource_len(pdev, bar);
>                  aperture_detach_devices(base, size);
>          }
> 
> +       printk("DEBUG: remove 3\n");
>          /*
>           * If this is the primary adapter, there could be a VGA device
>           * that consumes the VGA framebuffer I/O range. Remove this
> 
> [1]  https://lore.kernel.org/lkml/170222766284.86103.11020060769330721008@leemhuis.info/T/#u

Copy-pasting this from the e-mail body didn't work well, but I applied 
the changes manually to a 6.5.9 kernel without any of the other patches. 
Here's the relevant dmesg output on the Lenovo L570:

...
[    2.953405] ACPI: bus type drm_connector registered
[    2.954014] i915 0000:00:02.0: [drm] VT-d active for gfx access
[    2.954018] DEBUG: remove 1
[    2.954020] DEBUG: remove 2
[    2.954021] DEBUG: remove 2
[    2.954023] DEBUG: remove 3
[    2.954029] resource: resource sanity check: requesting [mem 
0x00000000e0000000-0x00000000efffffff], which spans more than BOOTFB 
[mem 0xe0000000-0xe012bfff]
[    2.954035] caller i915_ggtt_init_hw+0x88/0x120 mapping multiple BARs
[    2.954061] i915 0000:00:02.0: [drm] Using Transparent Hugepages
[    2.955103] Loading firmware: i915/kbl_dmc_ver1_04.bin
[    2.955384] i915 0000:00:02.0: [drm] Finished loading DMC firmware 
i915/kbl_dmc_ver1_04.bin (v1.4)
...
[    4.145013] [drm] Initialized i915 1.6.0 20201103 for 0000:00:02.0 on 
minor 0
[    4.147101] ACPI: video: Video Device [GFX0] (multi-head: yes  rom: 
no  post: no)
[    4.147244] input: Video Bus as 
/devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A08:00/LNXVIDEO:00/input/input4
[    4.147410] [drm] Initialized vgem 1.0.0 20120112 for vgem on minor 1
[    4.147420] usbcore: registered new interface driver udl
[    4.147500] [drm] Initialized simpledrm 1.0.0 20200625 for 
simple-framebuffer.0 on minor 2
[    4.148643] Console: switching to colour frame buffer device 80x30
[    4.153216] simple-framebuffer simple-framebuffer.0: [drm] fb0: 
simpledrmdrmfb frame buffer device
[    4.154043] loop: module loaded
[    4.156017] ahci 0000:00:17.0: version 3.0
[    4.157373] i915 0000:00:02.0: [drm] fb1: i915drmfb frame buffer device
...

J
Huacai Chen Jan. 24, 2024, 3 a.m. UTC | #16
Hi, Javier and Thomas,

On Wed, Jan 24, 2024 at 5:21 AM Jaak Ristioja <jaak@ristioja.ee> wrote:
>
> Hi,
>
> I apologize for not finding the time to test this earlier.
>
> On 11.12.23 05:08, Huacai Chen wrote:
> > And Jaak, could you please test with the below patch (but keep the
> > original order in Makefile) and then give me the dmesg output?
> >
> > diff --git a/drivers/video/aperture.c b/drivers/video/aperture.c
> > index 561be8feca96..cc2e39fb98f5 100644
> > --- a/drivers/video/aperture.c
> > +++ b/drivers/video/aperture.c
> > @@ -350,21 +350,29 @@ int
> > aperture_remove_conflicting_pci_devices(struct pci_dev *pdev, const
> > char *na
> >          resource_size_t base, size;
> >          int bar, ret = 0;
> >
> > -       if (pdev == vga_default_device())
> > +       printk("DEBUG: remove 1\n");
> > +
> > +       if (pdev == vga_default_device()) {
> > +               printk("DEBUG: primary = true\n");
> >                  primary = true;
> > +       }
> >
> > -       if (primary)
> > +       if (primary) {
> > +               printk("DEBUG: disable sysfb\n");
> >                  sysfb_disable();
> > +       }
> >
> >          for (bar = 0; bar < PCI_STD_NUM_BARS; ++bar) {
> >                  if (!(pci_resource_flags(pdev, bar) & IORESOURCE_MEM))
> >                          continue;
> >
> > +               printk("DEBUG: remove 2\n");
> >                  base = pci_resource_start(pdev, bar);
> >                  size = pci_resource_len(pdev, bar);
> >                  aperture_detach_devices(base, size);
> >          }
> >
> > +       printk("DEBUG: remove 3\n");
> >          /*
> >           * If this is the primary adapter, there could be a VGA device
> >           * that consumes the VGA framebuffer I/O range. Remove this
> >
> > [1]  https://lore.kernel.org/lkml/170222766284.86103.11020060769330721008@leemhuis.info/T/#u
>
> Copy-pasting this from the e-mail body didn't work well, but I applied
> the changes manually to a 6.5.9 kernel without any of the other patches.
> Here's the relevant dmesg output on the Lenovo L570:
>
> ...
> [    2.953405] ACPI: bus type drm_connector registered
> [    2.954014] i915 0000:00:02.0: [drm] VT-d active for gfx access
> [    2.954018] DEBUG: remove 1
> [    2.954020] DEBUG: remove 2
> [    2.954021] DEBUG: remove 2
> [    2.954023] DEBUG: remove 3

My tmp patch is as follows:

diff --git a/drivers/video/aperture.c b/drivers/video/aperture.c
index 561be8feca96..cc2e39fb98f5 100644
--- a/drivers/video/aperture.c
+++ b/drivers/video/aperture.c
@@ -350,21 +350,29 @@ int
aperture_remove_conflicting_pci_devices(struct pci_dev *pdev, const
char *na
        resource_size_t base, size;
        int bar, ret = 0;

-       if (pdev == vga_default_device())
+       printk("DEBUG: remove 1\n");
+
+       if (pdev == vga_default_device()) {
+               printk("DEBUG: primary = true\n");
                primary = true;
+       }

-       if (primary)
+       if (primary) {
+               printk("DEBUG: disable sysfb\n");
                sysfb_disable();
+       }

        for (bar = 0; bar < PCI_STD_NUM_BARS; ++bar) {
                if (!(pci_resource_flags(pdev, bar) & IORESOURCE_MEM))
                        continue;

+               printk("DEBUG: remove 2\n");
                base = pci_resource_start(pdev, bar);
                size = pci_resource_len(pdev, bar);
                aperture_detach_devices(base, size);
        }

+       printk("DEBUG: remove 3\n");
        /*
         * If this is the primary adapter, there could be a VGA device
         * that consumes the VGA framebuffer I/O range. Remove this

From the Jaak's dmesg, it is obvious that "pdev ==
vga_default_device()" is false, which causes sysfb_disable() to be not
called. And I think the simple-drm device is not provided by the i915
device in this case. So, can we unconditionally call sysfb_disable()
here, which is the same as aperture_remove_conflicting_devices()?

Huacai

> [    2.954029] resource: resource sanity check: requesting [mem
> 0x00000000e0000000-0x00000000efffffff], which spans more than BOOTFB
> [mem 0xe0000000-0xe012bfff]
> [    2.954035] caller i915_ggtt_init_hw+0x88/0x120 mapping multiple BARs
> [    2.954061] i915 0000:00:02.0: [drm] Using Transparent Hugepages
> [    2.955103] Loading firmware: i915/kbl_dmc_ver1_04.bin
> [    2.955384] i915 0000:00:02.0: [drm] Finished loading DMC firmware
> i915/kbl_dmc_ver1_04.bin (v1.4)
> ...
> [    4.145013] [drm] Initialized i915 1.6.0 20201103 for 0000:00:02.0 on
> minor 0
> [    4.147101] ACPI: video: Video Device [GFX0] (multi-head: yes  rom:
> no  post: no)
> [    4.147244] input: Video Bus as
> /devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A08:00/LNXVIDEO:00/input/input4
> [    4.147410] [drm] Initialized vgem 1.0.0 20120112 for vgem on minor 1
> [    4.147420] usbcore: registered new interface driver udl
> [    4.147500] [drm] Initialized simpledrm 1.0.0 20200625 for
> simple-framebuffer.0 on minor 2
> [    4.148643] Console: switching to colour frame buffer device 80x30
> [    4.153216] simple-framebuffer simple-framebuffer.0: [drm] fb0:
> simpledrmdrmfb frame buffer device
> [    4.154043] loop: module loaded
> [    4.156017] ahci 0000:00:17.0: version 3.0
> [    4.157373] i915 0000:00:02.0: [drm] fb1: i915drmfb frame buffer device
> ...
>
> J
>
Thomas Zimmermann Jan. 24, 2024, 8:16 a.m. UTC | #17
Hi

Am 24.01.24 um 04:00 schrieb Huacai Chen:
> Hi, Javier and Thomas,
> 
> On Wed, Jan 24, 2024 at 5:21 AM Jaak Ristioja <jaak@ristioja.ee> wrote:
>>
>> Hi,
>>
>> I apologize for not finding the time to test this earlier.
>>
>> On 11.12.23 05:08, Huacai Chen wrote:
>>> And Jaak, could you please test with the below patch (but keep the
>>> original order in Makefile) and then give me the dmesg output?
>>>
>>> diff --git a/drivers/video/aperture.c b/drivers/video/aperture.c
>>> index 561be8feca96..cc2e39fb98f5 100644
>>> --- a/drivers/video/aperture.c
>>> +++ b/drivers/video/aperture.c
>>> @@ -350,21 +350,29 @@ int
>>> aperture_remove_conflicting_pci_devices(struct pci_dev *pdev, const
>>> char *na
>>>           resource_size_t base, size;
>>>           int bar, ret = 0;
>>>
>>> -       if (pdev == vga_default_device())
>>> +       printk("DEBUG: remove 1\n");
>>> +
>>> +       if (pdev == vga_default_device()) {
>>> +               printk("DEBUG: primary = true\n");
>>>                   primary = true;
>>> +       }
>>>
>>> -       if (primary)
>>> +       if (primary) {
>>> +               printk("DEBUG: disable sysfb\n");
>>>                   sysfb_disable();
>>> +       }
>>>
>>>           for (bar = 0; bar < PCI_STD_NUM_BARS; ++bar) {
>>>                   if (!(pci_resource_flags(pdev, bar) & IORESOURCE_MEM))
>>>                           continue;
>>>
>>> +               printk("DEBUG: remove 2\n");
>>>                   base = pci_resource_start(pdev, bar);
>>>                   size = pci_resource_len(pdev, bar);
>>>                   aperture_detach_devices(base, size);
>>>           }
>>>
>>> +       printk("DEBUG: remove 3\n");
>>>           /*
>>>            * If this is the primary adapter, there could be a VGA device
>>>            * that consumes the VGA framebuffer I/O range. Remove this
>>>
>>> [1]  https://lore.kernel.org/lkml/170222766284.86103.11020060769330721008@leemhuis.info/T/#u
>>
>> Copy-pasting this from the e-mail body didn't work well, but I applied
>> the changes manually to a 6.5.9 kernel without any of the other patches.
>> Here's the relevant dmesg output on the Lenovo L570:
>>
>> ...
>> [    2.953405] ACPI: bus type drm_connector registered
>> [    2.954014] i915 0000:00:02.0: [drm] VT-d active for gfx access
>> [    2.954018] DEBUG: remove 1
>> [    2.954020] DEBUG: remove 2
>> [    2.954021] DEBUG: remove 2
>> [    2.954023] DEBUG: remove 3
> 
> My tmp patch is as follows:
> 
> diff --git a/drivers/video/aperture.c b/drivers/video/aperture.c
> index 561be8feca96..cc2e39fb98f5 100644
> --- a/drivers/video/aperture.c
> +++ b/drivers/video/aperture.c
> @@ -350,21 +350,29 @@ int
> aperture_remove_conflicting_pci_devices(struct pci_dev *pdev, const
> char *na
>          resource_size_t base, size;
>          int bar, ret = 0;
> 
> -       if (pdev == vga_default_device())
> +       printk("DEBUG: remove 1\n");
> +
> +       if (pdev == vga_default_device()) {
> +               printk("DEBUG: primary = true\n");
>                  primary = true;
> +       }
> 
> -       if (primary)
> +       if (primary) {
> +               printk("DEBUG: disable sysfb\n");
>                  sysfb_disable();
> +       }
> 
>          for (bar = 0; bar < PCI_STD_NUM_BARS; ++bar) {
>                  if (!(pci_resource_flags(pdev, bar) & IORESOURCE_MEM))
>                          continue;
> 
> +               printk("DEBUG: remove 2\n");
>                  base = pci_resource_start(pdev, bar);
>                  size = pci_resource_len(pdev, bar);
>                  aperture_detach_devices(base, size);
>          }
> 
> +       printk("DEBUG: remove 3\n");
>          /*
>           * If this is the primary adapter, there could be a VGA device
>           * that consumes the VGA framebuffer I/O range. Remove this
> 
>  From the Jaak's dmesg, it is obvious that "pdev ==
> vga_default_device()" is false, which causes sysfb_disable() to be not
> called. And I think the simple-drm device is not provided by the i915
> device in this case. So, can we unconditionally call sysfb_disable()
> here, which is the same as aperture_remove_conflicting_devices()?

Unfortunately, we cannot call sysfb_disable() unconditionally. 
Otherwise, we'd possibly remove simpledrm et al on the primary driver 
even pdev is not the primary device.

Both, sysfb and vgaarb, are initialized with subsys_initcall_sync() and 
the order of initialization is most likely wrong in the broken builds. 
Hence, reordering the linking mitigates the problem.

I've long been thinking about how the graphics init code could be 
streamlined into something more manageable. But it's a larger effort.

Best regards
Thomas

> 
> Huacai
> 
>> [    2.954029] resource: resource sanity check: requesting [mem
>> 0x00000000e0000000-0x00000000efffffff], which spans more than BOOTFB
>> [mem 0xe0000000-0xe012bfff]
>> [    2.954035] caller i915_ggtt_init_hw+0x88/0x120 mapping multiple BARs
>> [    2.954061] i915 0000:00:02.0: [drm] Using Transparent Hugepages
>> [    2.955103] Loading firmware: i915/kbl_dmc_ver1_04.bin
>> [    2.955384] i915 0000:00:02.0: [drm] Finished loading DMC firmware
>> i915/kbl_dmc_ver1_04.bin (v1.4)
>> ...
>> [    4.145013] [drm] Initialized i915 1.6.0 20201103 for 0000:00:02.0 on
>> minor 0
>> [    4.147101] ACPI: video: Video Device [GFX0] (multi-head: yes  rom:
>> no  post: no)
>> [    4.147244] input: Video Bus as
>> /devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A08:00/LNXVIDEO:00/input/input4
>> [    4.147410] [drm] Initialized vgem 1.0.0 20120112 for vgem on minor 1
>> [    4.147420] usbcore: registered new interface driver udl
>> [    4.147500] [drm] Initialized simpledrm 1.0.0 20200625 for
>> simple-framebuffer.0 on minor 2
>> [    4.148643] Console: switching to colour frame buffer device 80x30
>> [    4.153216] simple-framebuffer simple-framebuffer.0: [drm] fb0:
>> simpledrmdrmfb frame buffer device
>> [    4.154043] loop: module loaded
>> [    4.156017] ahci 0000:00:17.0: version 3.0
>> [    4.157373] i915 0000:00:02.0: [drm] fb1: i915drmfb frame buffer device
>> ...
>>
>> J
>>
Huacai Chen Jan. 24, 2024, 9:24 a.m. UTC | #18
Hi, Thomas,

On Wed, Jan 24, 2024 at 4:16 PM Thomas Zimmermann <tzimmermann@suse.de> wrote:
>
> Hi
>
> Am 24.01.24 um 04:00 schrieb Huacai Chen:
> > Hi, Javier and Thomas,
> >
> > On Wed, Jan 24, 2024 at 5:21 AM Jaak Ristioja <jaak@ristioja.ee> wrote:
> >>
> >> Hi,
> >>
> >> I apologize for not finding the time to test this earlier.
> >>
> >> On 11.12.23 05:08, Huacai Chen wrote:
> >>> And Jaak, could you please test with the below patch (but keep the
> >>> original order in Makefile) and then give me the dmesg output?
> >>>
> >>> diff --git a/drivers/video/aperture.c b/drivers/video/aperture.c
> >>> index 561be8feca96..cc2e39fb98f5 100644
> >>> --- a/drivers/video/aperture.c
> >>> +++ b/drivers/video/aperture.c
> >>> @@ -350,21 +350,29 @@ int
> >>> aperture_remove_conflicting_pci_devices(struct pci_dev *pdev, const
> >>> char *na
> >>>           resource_size_t base, size;
> >>>           int bar, ret = 0;
> >>>
> >>> -       if (pdev == vga_default_device())
> >>> +       printk("DEBUG: remove 1\n");
> >>> +
> >>> +       if (pdev == vga_default_device()) {
> >>> +               printk("DEBUG: primary = true\n");
> >>>                   primary = true;
> >>> +       }
> >>>
> >>> -       if (primary)
> >>> +       if (primary) {
> >>> +               printk("DEBUG: disable sysfb\n");
> >>>                   sysfb_disable();
> >>> +       }
> >>>
> >>>           for (bar = 0; bar < PCI_STD_NUM_BARS; ++bar) {
> >>>                   if (!(pci_resource_flags(pdev, bar) & IORESOURCE_MEM))
> >>>                           continue;
> >>>
> >>> +               printk("DEBUG: remove 2\n");
> >>>                   base = pci_resource_start(pdev, bar);
> >>>                   size = pci_resource_len(pdev, bar);
> >>>                   aperture_detach_devices(base, size);
> >>>           }
> >>>
> >>> +       printk("DEBUG: remove 3\n");
> >>>           /*
> >>>            * If this is the primary adapter, there could be a VGA device
> >>>            * that consumes the VGA framebuffer I/O range. Remove this
> >>>
> >>> [1]  https://lore.kernel.org/lkml/170222766284.86103.11020060769330721008@leemhuis.info/T/#u
> >>
> >> Copy-pasting this from the e-mail body didn't work well, but I applied
> >> the changes manually to a 6.5.9 kernel without any of the other patches.
> >> Here's the relevant dmesg output on the Lenovo L570:
> >>
> >> ...
> >> [    2.953405] ACPI: bus type drm_connector registered
> >> [    2.954014] i915 0000:00:02.0: [drm] VT-d active for gfx access
> >> [    2.954018] DEBUG: remove 1
> >> [    2.954020] DEBUG: remove 2
> >> [    2.954021] DEBUG: remove 2
> >> [    2.954023] DEBUG: remove 3
> >
> > My tmp patch is as follows:
> >
> > diff --git a/drivers/video/aperture.c b/drivers/video/aperture.c
> > index 561be8feca96..cc2e39fb98f5 100644
> > --- a/drivers/video/aperture.c
> > +++ b/drivers/video/aperture.c
> > @@ -350,21 +350,29 @@ int
> > aperture_remove_conflicting_pci_devices(struct pci_dev *pdev, const
> > char *na
> >          resource_size_t base, size;
> >          int bar, ret = 0;
> >
> > -       if (pdev == vga_default_device())
> > +       printk("DEBUG: remove 1\n");
> > +
> > +       if (pdev == vga_default_device()) {
> > +               printk("DEBUG: primary = true\n");
> >                  primary = true;
> > +       }
> >
> > -       if (primary)
> > +       if (primary) {
> > +               printk("DEBUG: disable sysfb\n");
> >                  sysfb_disable();
> > +       }
> >
> >          for (bar = 0; bar < PCI_STD_NUM_BARS; ++bar) {
> >                  if (!(pci_resource_flags(pdev, bar) & IORESOURCE_MEM))
> >                          continue;
> >
> > +               printk("DEBUG: remove 2\n");
> >                  base = pci_resource_start(pdev, bar);
> >                  size = pci_resource_len(pdev, bar);
> >                  aperture_detach_devices(base, size);
> >          }
> >
> > +       printk("DEBUG: remove 3\n");
> >          /*
> >           * If this is the primary adapter, there could be a VGA device
> >           * that consumes the VGA framebuffer I/O range. Remove this
> >
> >  From the Jaak's dmesg, it is obvious that "pdev ==
> > vga_default_device()" is false, which causes sysfb_disable() to be not
> > called. And I think the simple-drm device is not provided by the i915
> > device in this case. So, can we unconditionally call sysfb_disable()
> > here, which is the same as aperture_remove_conflicting_devices()?
>
> Unfortunately, we cannot call sysfb_disable() unconditionally.
> Otherwise, we'd possibly remove simpledrm et al on the primary driver
> even pdev is not the primary device.
>
> Both, sysfb and vgaarb, are initialized with subsys_initcall_sync() and
> the order of initialization is most likely wrong in the broken builds.
> Hence, reordering the linking mitigates the problem.
OK, sysfb_init() should be after vgaarb, so I think we  can move
sysfb_init to be fs_initcall(). Though sysfb has nothing to do with
"file system", I found that there are already a lot of init functions
using fs_initcall().

Hi, Jaak, could you please make sysfb_initcall from
subsys_initcall_sync to be fs_initcall and see if your problem can be
fixed? Thank you very much.

Huacai

>
> I've long been thinking about how the graphics init code could be
> streamlined into something more manageable. But it's a larger effort.
>
> Best regards
> Thomas
>
> >
> > Huacai
> >
> >> [    2.954029] resource: resource sanity check: requesting [mem
> >> 0x00000000e0000000-0x00000000efffffff], which spans more than BOOTFB
> >> [mem 0xe0000000-0xe012bfff]
> >> [    2.954035] caller i915_ggtt_init_hw+0x88/0x120 mapping multiple BARs
> >> [    2.954061] i915 0000:00:02.0: [drm] Using Transparent Hugepages
> >> [    2.955103] Loading firmware: i915/kbl_dmc_ver1_04.bin
> >> [    2.955384] i915 0000:00:02.0: [drm] Finished loading DMC firmware
> >> i915/kbl_dmc_ver1_04.bin (v1.4)
> >> ...
> >> [    4.145013] [drm] Initialized i915 1.6.0 20201103 for 0000:00:02.0 on
> >> minor 0
> >> [    4.147101] ACPI: video: Video Device [GFX0] (multi-head: yes  rom:
> >> no  post: no)
> >> [    4.147244] input: Video Bus as
> >> /devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A08:00/LNXVIDEO:00/input/input4
> >> [    4.147410] [drm] Initialized vgem 1.0.0 20120112 for vgem on minor 1
> >> [    4.147420] usbcore: registered new interface driver udl
> >> [    4.147500] [drm] Initialized simpledrm 1.0.0 20200625 for
> >> simple-framebuffer.0 on minor 2
> >> [    4.148643] Console: switching to colour frame buffer device 80x30
> >> [    4.153216] simple-framebuffer simple-framebuffer.0: [drm] fb0:
> >> simpledrmdrmfb frame buffer device
> >> [    4.154043] loop: module loaded
> >> [    4.156017] ahci 0000:00:17.0: version 3.0
> >> [    4.157373] i915 0000:00:02.0: [drm] fb1: i915drmfb frame buffer device
> >> ...
> >>
> >> J
> >>
>
> --
> Thomas Zimmermann
> Graphics Driver Developer
> SUSE Software Solutions Germany GmbH
> Frankenstrasse 146, 90461 Nuernberg, Germany
> GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman
> HRB 36809 (AG Nuernberg)
Thomas Zimmermann Jan. 24, 2024, 9:44 a.m. UTC | #19
Hi

Am 24.01.24 um 10:24 schrieb Huacai Chen:
> Hi, Thomas,
> 
> On Wed, Jan 24, 2024 at 4:16 PM Thomas Zimmermann <tzimmermann@suse.de> wrote:
>>
>> Hi
>>
>> Am 24.01.24 um 04:00 schrieb Huacai Chen:
>>> Hi, Javier and Thomas,
>>>
>>> On Wed, Jan 24, 2024 at 5:21 AM Jaak Ristioja <jaak@ristioja.ee> wrote:
>>>>
>>>> Hi,
>>>>
>>>> I apologize for not finding the time to test this earlier.
>>>>
>>>> On 11.12.23 05:08, Huacai Chen wrote:
>>>>> And Jaak, could you please test with the below patch (but keep the
>>>>> original order in Makefile) and then give me the dmesg output?
>>>>>
>>>>> diff --git a/drivers/video/aperture.c b/drivers/video/aperture.c
>>>>> index 561be8feca96..cc2e39fb98f5 100644
>>>>> --- a/drivers/video/aperture.c
>>>>> +++ b/drivers/video/aperture.c
>>>>> @@ -350,21 +350,29 @@ int
>>>>> aperture_remove_conflicting_pci_devices(struct pci_dev *pdev, const
>>>>> char *na
>>>>>            resource_size_t base, size;
>>>>>            int bar, ret = 0;
>>>>>
>>>>> -       if (pdev == vga_default_device())
>>>>> +       printk("DEBUG: remove 1\n");
>>>>> +
>>>>> +       if (pdev == vga_default_device()) {
>>>>> +               printk("DEBUG: primary = true\n");
>>>>>                    primary = true;
>>>>> +       }
>>>>>
>>>>> -       if (primary)
>>>>> +       if (primary) {
>>>>> +               printk("DEBUG: disable sysfb\n");
>>>>>                    sysfb_disable();
>>>>> +       }
>>>>>
>>>>>            for (bar = 0; bar < PCI_STD_NUM_BARS; ++bar) {
>>>>>                    if (!(pci_resource_flags(pdev, bar) & IORESOURCE_MEM))
>>>>>                            continue;
>>>>>
>>>>> +               printk("DEBUG: remove 2\n");
>>>>>                    base = pci_resource_start(pdev, bar);
>>>>>                    size = pci_resource_len(pdev, bar);
>>>>>                    aperture_detach_devices(base, size);
>>>>>            }
>>>>>
>>>>> +       printk("DEBUG: remove 3\n");
>>>>>            /*
>>>>>             * If this is the primary adapter, there could be a VGA device
>>>>>             * that consumes the VGA framebuffer I/O range. Remove this
>>>>>
>>>>> [1]  https://lore.kernel.org/lkml/170222766284.86103.11020060769330721008@leemhuis.info/T/#u
>>>>
>>>> Copy-pasting this from the e-mail body didn't work well, but I applied
>>>> the changes manually to a 6.5.9 kernel without any of the other patches.
>>>> Here's the relevant dmesg output on the Lenovo L570:
>>>>
>>>> ...
>>>> [    2.953405] ACPI: bus type drm_connector registered
>>>> [    2.954014] i915 0000:00:02.0: [drm] VT-d active for gfx access
>>>> [    2.954018] DEBUG: remove 1
>>>> [    2.954020] DEBUG: remove 2
>>>> [    2.954021] DEBUG: remove 2
>>>> [    2.954023] DEBUG: remove 3
>>>
>>> My tmp patch is as follows:
>>>
>>> diff --git a/drivers/video/aperture.c b/drivers/video/aperture.c
>>> index 561be8feca96..cc2e39fb98f5 100644
>>> --- a/drivers/video/aperture.c
>>> +++ b/drivers/video/aperture.c
>>> @@ -350,21 +350,29 @@ int
>>> aperture_remove_conflicting_pci_devices(struct pci_dev *pdev, const
>>> char *na
>>>           resource_size_t base, size;
>>>           int bar, ret = 0;
>>>
>>> -       if (pdev == vga_default_device())
>>> +       printk("DEBUG: remove 1\n");
>>> +
>>> +       if (pdev == vga_default_device()) {
>>> +               printk("DEBUG: primary = true\n");
>>>                   primary = true;
>>> +       }
>>>
>>> -       if (primary)
>>> +       if (primary) {
>>> +               printk("DEBUG: disable sysfb\n");
>>>                   sysfb_disable();
>>> +       }
>>>
>>>           for (bar = 0; bar < PCI_STD_NUM_BARS; ++bar) {
>>>                   if (!(pci_resource_flags(pdev, bar) & IORESOURCE_MEM))
>>>                           continue;
>>>
>>> +               printk("DEBUG: remove 2\n");
>>>                   base = pci_resource_start(pdev, bar);
>>>                   size = pci_resource_len(pdev, bar);
>>>                   aperture_detach_devices(base, size);
>>>           }
>>>
>>> +       printk("DEBUG: remove 3\n");
>>>           /*
>>>            * If this is the primary adapter, there could be a VGA device
>>>            * that consumes the VGA framebuffer I/O range. Remove this
>>>
>>>   From the Jaak's dmesg, it is obvious that "pdev ==
>>> vga_default_device()" is false, which causes sysfb_disable() to be not
>>> called. And I think the simple-drm device is not provided by the i915
>>> device in this case. So, can we unconditionally call sysfb_disable()
>>> here, which is the same as aperture_remove_conflicting_devices()?
>>
>> Unfortunately, we cannot call sysfb_disable() unconditionally.
>> Otherwise, we'd possibly remove simpledrm et al on the primary driver
>> even pdev is not the primary device.
>>
>> Both, sysfb and vgaarb, are initialized with subsys_initcall_sync() and
>> the order of initialization is most likely wrong in the broken builds.
>> Hence, reordering the linking mitigates the problem.
> OK, sysfb_init() should be after vgaarb, so I think we  can move
> sysfb_init to be fs_initcall(). Though sysfb has nothing to do with
> "file system", I found that there are already a lot of init functions
> using fs_initcall().
> 
> Hi, Jaak, could you please make sysfb_initcall from
> subsys_initcall_sync to be fs_initcall and see if your problem can be
> fixed? Thank you very much.

Thanks for helping. I already supplied a patch to fix sysfb. No further 
action is required.

Best regards
Thomas

> 
> Huacai
> 
>>
>> I've long been thinking about how the graphics init code could be
>> streamlined into something more manageable. But it's a larger effort.
>>
>> Best regards
>> Thomas
>>
>>>
>>> Huacai
>>>
>>>> [    2.954029] resource: resource sanity check: requesting [mem
>>>> 0x00000000e0000000-0x00000000efffffff], which spans more than BOOTFB
>>>> [mem 0xe0000000-0xe012bfff]
>>>> [    2.954035] caller i915_ggtt_init_hw+0x88/0x120 mapping multiple BARs
>>>> [    2.954061] i915 0000:00:02.0: [drm] Using Transparent Hugepages
>>>> [    2.955103] Loading firmware: i915/kbl_dmc_ver1_04.bin
>>>> [    2.955384] i915 0000:00:02.0: [drm] Finished loading DMC firmware
>>>> i915/kbl_dmc_ver1_04.bin (v1.4)
>>>> ...
>>>> [    4.145013] [drm] Initialized i915 1.6.0 20201103 for 0000:00:02.0 on
>>>> minor 0
>>>> [    4.147101] ACPI: video: Video Device [GFX0] (multi-head: yes  rom:
>>>> no  post: no)
>>>> [    4.147244] input: Video Bus as
>>>> /devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A08:00/LNXVIDEO:00/input/input4
>>>> [    4.147410] [drm] Initialized vgem 1.0.0 20120112 for vgem on minor 1
>>>> [    4.147420] usbcore: registered new interface driver udl
>>>> [    4.147500] [drm] Initialized simpledrm 1.0.0 20200625 for
>>>> simple-framebuffer.0 on minor 2
>>>> [    4.148643] Console: switching to colour frame buffer device 80x30
>>>> [    4.153216] simple-framebuffer simple-framebuffer.0: [drm] fb0:
>>>> simpledrmdrmfb frame buffer device
>>>> [    4.154043] loop: module loaded
>>>> [    4.156017] ahci 0000:00:17.0: version 3.0
>>>> [    4.157373] i915 0000:00:02.0: [drm] fb1: i915drmfb frame buffer device
>>>> ...
>>>>
>>>> J
>>>>
>>
>> --
>> Thomas Zimmermann
>> Graphics Driver Developer
>> SUSE Software Solutions Germany GmbH
>> Frankenstrasse 146, 90461 Nuernberg, Germany
>> GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman
>> HRB 36809 (AG Nuernberg)
Huacai Chen Jan. 24, 2024, 2:56 p.m. UTC | #20
Hi, Thomas,

On Wed, Jan 24, 2024 at 5:44 PM Thomas Zimmermann <tzimmermann@suse.de> wrote:
>
> Hi
>
> Am 24.01.24 um 10:24 schrieb Huacai Chen:
> > Hi, Thomas,
> >
> > On Wed, Jan 24, 2024 at 4:16 PM Thomas Zimmermann <tzimmermann@suse.de> wrote:
> >>
> >> Hi
> >>
> >> Am 24.01.24 um 04:00 schrieb Huacai Chen:
> >>> Hi, Javier and Thomas,
> >>>
> >>> On Wed, Jan 24, 2024 at 5:21 AM Jaak Ristioja <jaak@ristioja.ee> wrote:
> >>>>
> >>>> Hi,
> >>>>
> >>>> I apologize for not finding the time to test this earlier.
> >>>>
> >>>> On 11.12.23 05:08, Huacai Chen wrote:
> >>>>> And Jaak, could you please test with the below patch (but keep the
> >>>>> original order in Makefile) and then give me the dmesg output?
> >>>>>
> >>>>> diff --git a/drivers/video/aperture.c b/drivers/video/aperture.c
> >>>>> index 561be8feca96..cc2e39fb98f5 100644
> >>>>> --- a/drivers/video/aperture.c
> >>>>> +++ b/drivers/video/aperture.c
> >>>>> @@ -350,21 +350,29 @@ int
> >>>>> aperture_remove_conflicting_pci_devices(struct pci_dev *pdev, const
> >>>>> char *na
> >>>>>            resource_size_t base, size;
> >>>>>            int bar, ret = 0;
> >>>>>
> >>>>> -       if (pdev == vga_default_device())
> >>>>> +       printk("DEBUG: remove 1\n");
> >>>>> +
> >>>>> +       if (pdev == vga_default_device()) {
> >>>>> +               printk("DEBUG: primary = true\n");
> >>>>>                    primary = true;
> >>>>> +       }
> >>>>>
> >>>>> -       if (primary)
> >>>>> +       if (primary) {
> >>>>> +               printk("DEBUG: disable sysfb\n");
> >>>>>                    sysfb_disable();
> >>>>> +       }
> >>>>>
> >>>>>            for (bar = 0; bar < PCI_STD_NUM_BARS; ++bar) {
> >>>>>                    if (!(pci_resource_flags(pdev, bar) & IORESOURCE_MEM))
> >>>>>                            continue;
> >>>>>
> >>>>> +               printk("DEBUG: remove 2\n");
> >>>>>                    base = pci_resource_start(pdev, bar);
> >>>>>                    size = pci_resource_len(pdev, bar);
> >>>>>                    aperture_detach_devices(base, size);
> >>>>>            }
> >>>>>
> >>>>> +       printk("DEBUG: remove 3\n");
> >>>>>            /*
> >>>>>             * If this is the primary adapter, there could be a VGA device
> >>>>>             * that consumes the VGA framebuffer I/O range. Remove this
> >>>>>
> >>>>> [1]  https://lore.kernel.org/lkml/170222766284.86103.11020060769330721008@leemhuis.info/T/#u
> >>>>
> >>>> Copy-pasting this from the e-mail body didn't work well, but I applied
> >>>> the changes manually to a 6.5.9 kernel without any of the other patches.
> >>>> Here's the relevant dmesg output on the Lenovo L570:
> >>>>
> >>>> ...
> >>>> [    2.953405] ACPI: bus type drm_connector registered
> >>>> [    2.954014] i915 0000:00:02.0: [drm] VT-d active for gfx access
> >>>> [    2.954018] DEBUG: remove 1
> >>>> [    2.954020] DEBUG: remove 2
> >>>> [    2.954021] DEBUG: remove 2
> >>>> [    2.954023] DEBUG: remove 3
> >>>
> >>> My tmp patch is as follows:
> >>>
> >>> diff --git a/drivers/video/aperture.c b/drivers/video/aperture.c
> >>> index 561be8feca96..cc2e39fb98f5 100644
> >>> --- a/drivers/video/aperture.c
> >>> +++ b/drivers/video/aperture.c
> >>> @@ -350,21 +350,29 @@ int
> >>> aperture_remove_conflicting_pci_devices(struct pci_dev *pdev, const
> >>> char *na
> >>>           resource_size_t base, size;
> >>>           int bar, ret = 0;
> >>>
> >>> -       if (pdev == vga_default_device())
> >>> +       printk("DEBUG: remove 1\n");
> >>> +
> >>> +       if (pdev == vga_default_device()) {
> >>> +               printk("DEBUG: primary = true\n");
> >>>                   primary = true;
> >>> +       }
> >>>
> >>> -       if (primary)
> >>> +       if (primary) {
> >>> +               printk("DEBUG: disable sysfb\n");
> >>>                   sysfb_disable();
> >>> +       }
> >>>
> >>>           for (bar = 0; bar < PCI_STD_NUM_BARS; ++bar) {
> >>>                   if (!(pci_resource_flags(pdev, bar) & IORESOURCE_MEM))
> >>>                           continue;
> >>>
> >>> +               printk("DEBUG: remove 2\n");
> >>>                   base = pci_resource_start(pdev, bar);
> >>>                   size = pci_resource_len(pdev, bar);
> >>>                   aperture_detach_devices(base, size);
> >>>           }
> >>>
> >>> +       printk("DEBUG: remove 3\n");
> >>>           /*
> >>>            * If this is the primary adapter, there could be a VGA device
> >>>            * that consumes the VGA framebuffer I/O range. Remove this
> >>>
> >>>   From the Jaak's dmesg, it is obvious that "pdev ==
> >>> vga_default_device()" is false, which causes sysfb_disable() to be not
> >>> called. And I think the simple-drm device is not provided by the i915
> >>> device in this case. So, can we unconditionally call sysfb_disable()
> >>> here, which is the same as aperture_remove_conflicting_devices()?
> >>
> >> Unfortunately, we cannot call sysfb_disable() unconditionally.
> >> Otherwise, we'd possibly remove simpledrm et al on the primary driver
> >> even pdev is not the primary device.
> >>
> >> Both, sysfb and vgaarb, are initialized with subsys_initcall_sync() and
> >> the order of initialization is most likely wrong in the broken builds.
> >> Hence, reordering the linking mitigates the problem.
> > OK, sysfb_init() should be after vgaarb, so I think we  can move
> > sysfb_init to be fs_initcall(). Though sysfb has nothing to do with
> > "file system", I found that there are already a lot of init functions
> > using fs_initcall().
> >
> > Hi, Jaak, could you please make sysfb_initcall from
> > subsys_initcall_sync to be fs_initcall and see if your problem can be
> > fixed? Thank you very much.
>
> Thanks for helping. I already supplied a patch to fix sysfb. No further
> action is required.
Do you mean reverting "drivers/firmware: Move sysfb_init() from
device_initcall to subsys_initcall_sync"? I still want to keep that,
since we can probably solve both the original problem and Jaak's
problem now.

And if you mean others, please ignore what I said. :)

Huacai

>
> Best regards
> Thomas
>
> >
> > Huacai
> >
> >>
> >> I've long been thinking about how the graphics init code could be
> >> streamlined into something more manageable. But it's a larger effort.
> >>
> >> Best regards
> >> Thomas
> >>
> >>>
> >>> Huacai
> >>>
> >>>> [    2.954029] resource: resource sanity check: requesting [mem
> >>>> 0x00000000e0000000-0x00000000efffffff], which spans more than BOOTFB
> >>>> [mem 0xe0000000-0xe012bfff]
> >>>> [    2.954035] caller i915_ggtt_init_hw+0x88/0x120 mapping multiple BARs
> >>>> [    2.954061] i915 0000:00:02.0: [drm] Using Transparent Hugepages
> >>>> [    2.955103] Loading firmware: i915/kbl_dmc_ver1_04.bin
> >>>> [    2.955384] i915 0000:00:02.0: [drm] Finished loading DMC firmware
> >>>> i915/kbl_dmc_ver1_04.bin (v1.4)
> >>>> ...
> >>>> [    4.145013] [drm] Initialized i915 1.6.0 20201103 for 0000:00:02.0 on
> >>>> minor 0
> >>>> [    4.147101] ACPI: video: Video Device [GFX0] (multi-head: yes  rom:
> >>>> no  post: no)
> >>>> [    4.147244] input: Video Bus as
> >>>> /devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A08:00/LNXVIDEO:00/input/input4
> >>>> [    4.147410] [drm] Initialized vgem 1.0.0 20120112 for vgem on minor 1
> >>>> [    4.147420] usbcore: registered new interface driver udl
> >>>> [    4.147500] [drm] Initialized simpledrm 1.0.0 20200625 for
> >>>> simple-framebuffer.0 on minor 2
> >>>> [    4.148643] Console: switching to colour frame buffer device 80x30
> >>>> [    4.153216] simple-framebuffer simple-framebuffer.0: [drm] fb0:
> >>>> simpledrmdrmfb frame buffer device
> >>>> [    4.154043] loop: module loaded
> >>>> [    4.156017] ahci 0000:00:17.0: version 3.0
> >>>> [    4.157373] i915 0000:00:02.0: [drm] fb1: i915drmfb frame buffer device
> >>>> ...
> >>>>
> >>>> J
> >>>>
> >>
> >> --
> >> Thomas Zimmermann
> >> Graphics Driver Developer
> >> SUSE Software Solutions Germany GmbH
> >> Frankenstrasse 146, 90461 Nuernberg, Germany
> >> GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman
> >> HRB 36809 (AG Nuernberg)
>
> --
> Thomas Zimmermann
> Graphics Driver Developer
> SUSE Software Solutions Germany GmbH
> Frankenstrasse 146, 90461 Nuernberg, Germany
> GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman
> HRB 36809 (AG Nuernberg)
Huacai Chen March 18, 2024, 2:43 p.m. UTC | #21
Hi, Jaak,

I'm still here as a boring man. :)
Have you ever tested whether using "fs_initcall(sysfb_init)" works
fine on your machine?

Huacai

On Wed, Jan 24, 2024 at 5:21 AM Jaak Ristioja <jaak@ristioja.ee> wrote:
>
> Hi,
>
> I apologize for not finding the time to test this earlier.
>
> On 11.12.23 05:08, Huacai Chen wrote:
> > And Jaak, could you please test with the below patch (but keep the
> > original order in Makefile) and then give me the dmesg output?
> >
> > diff --git a/drivers/video/aperture.c b/drivers/video/aperture.c
> > index 561be8feca96..cc2e39fb98f5 100644
> > --- a/drivers/video/aperture.c
> > +++ b/drivers/video/aperture.c
> > @@ -350,21 +350,29 @@ int
> > aperture_remove_conflicting_pci_devices(struct pci_dev *pdev, const
> > char *na
> >          resource_size_t base, size;
> >          int bar, ret = 0;
> >
> > -       if (pdev == vga_default_device())
> > +       printk("DEBUG: remove 1\n");
> > +
> > +       if (pdev == vga_default_device()) {
> > +               printk("DEBUG: primary = true\n");
> >                  primary = true;
> > +       }
> >
> > -       if (primary)
> > +       if (primary) {
> > +               printk("DEBUG: disable sysfb\n");
> >                  sysfb_disable();
> > +       }
> >
> >          for (bar = 0; bar < PCI_STD_NUM_BARS; ++bar) {
> >                  if (!(pci_resource_flags(pdev, bar) & IORESOURCE_MEM))
> >                          continue;
> >
> > +               printk("DEBUG: remove 2\n");
> >                  base = pci_resource_start(pdev, bar);
> >                  size = pci_resource_len(pdev, bar);
> >                  aperture_detach_devices(base, size);
> >          }
> >
> > +       printk("DEBUG: remove 3\n");
> >          /*
> >           * If this is the primary adapter, there could be a VGA device
> >           * that consumes the VGA framebuffer I/O range. Remove this
> >
> > [1]  https://lore.kernel.org/lkml/170222766284.86103.11020060769330721008@leemhuis.info/T/#u
>
> Copy-pasting this from the e-mail body didn't work well, but I applied
> the changes manually to a 6.5.9 kernel without any of the other patches.
> Here's the relevant dmesg output on the Lenovo L570:
>
> ...
> [    2.953405] ACPI: bus type drm_connector registered
> [    2.954014] i915 0000:00:02.0: [drm] VT-d active for gfx access
> [    2.954018] DEBUG: remove 1
> [    2.954020] DEBUG: remove 2
> [    2.954021] DEBUG: remove 2
> [    2.954023] DEBUG: remove 3
> [    2.954029] resource: resource sanity check: requesting [mem
> 0x00000000e0000000-0x00000000efffffff], which spans more than BOOTFB
> [mem 0xe0000000-0xe012bfff]
> [    2.954035] caller i915_ggtt_init_hw+0x88/0x120 mapping multiple BARs
> [    2.954061] i915 0000:00:02.0: [drm] Using Transparent Hugepages
> [    2.955103] Loading firmware: i915/kbl_dmc_ver1_04.bin
> [    2.955384] i915 0000:00:02.0: [drm] Finished loading DMC firmware
> i915/kbl_dmc_ver1_04.bin (v1.4)
> ...
> [    4.145013] [drm] Initialized i915 1.6.0 20201103 for 0000:00:02.0 on
> minor 0
> [    4.147101] ACPI: video: Video Device [GFX0] (multi-head: yes  rom:
> no  post: no)
> [    4.147244] input: Video Bus as
> /devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A08:00/LNXVIDEO:00/input/input4
> [    4.147410] [drm] Initialized vgem 1.0.0 20120112 for vgem on minor 1
> [    4.147420] usbcore: registered new interface driver udl
> [    4.147500] [drm] Initialized simpledrm 1.0.0 20200625 for
> simple-framebuffer.0 on minor 2
> [    4.148643] Console: switching to colour frame buffer device 80x30
> [    4.153216] simple-framebuffer simple-framebuffer.0: [drm] fb0:
> simpledrmdrmfb frame buffer device
> [    4.154043] loop: module loaded
> [    4.156017] ahci 0000:00:17.0: version 3.0
> [    4.157373] i915 0000:00:02.0: [drm] fb1: i915drmfb frame buffer device
> ...
>
> J
>
Jaak Ristioja March 18, 2024, 3:42 p.m. UTC | #22
Hi Huacai,

Uh, no, sorry, I did not get to test such changes. From what Thomas 
wrote I presumed that this got fixed and no further action would be 
required.

To speed things up I would appreciate it if you provided a patch. As I 
worked around the problem for the end user by changing the kernel 
configuration, I have long forgotten the exact details. It would 
otherwise probably take me a while to understand what and where you 
propose to change exactly.

Also, do you want me to test it on some newer kernel or should I 
re-download the 6.5.# version sources?


Best regards,
Jaak

On 18.03.24 16:43, Huacai Chen wrote:
> Hi, Jaak,
> 
> I'm still here as a boring man. :)
> Have you ever tested whether using "fs_initcall(sysfb_init)" works
> fine on your machine?
> 
> Huacai
> 
> On Wed, Jan 24, 2024 at 5:21 AM Jaak Ristioja <jaak@ristioja.ee> wrote:
>>
>> Hi,
>>
>> I apologize for not finding the time to test this earlier.
>>
>> On 11.12.23 05:08, Huacai Chen wrote:
>>> And Jaak, could you please test with the below patch (but keep the
>>> original order in Makefile) and then give me the dmesg output?
>>>
>>> diff --git a/drivers/video/aperture.c b/drivers/video/aperture.c
>>> index 561be8feca96..cc2e39fb98f5 100644
>>> --- a/drivers/video/aperture.c
>>> +++ b/drivers/video/aperture.c
>>> @@ -350,21 +350,29 @@ int
>>> aperture_remove_conflicting_pci_devices(struct pci_dev *pdev, const
>>> char *na
>>>           resource_size_t base, size;
>>>           int bar, ret = 0;
>>>
>>> -       if (pdev == vga_default_device())
>>> +       printk("DEBUG: remove 1\n");
>>> +
>>> +       if (pdev == vga_default_device()) {
>>> +               printk("DEBUG: primary = true\n");
>>>                   primary = true;
>>> +       }
>>>
>>> -       if (primary)
>>> +       if (primary) {
>>> +               printk("DEBUG: disable sysfb\n");
>>>                   sysfb_disable();
>>> +       }
>>>
>>>           for (bar = 0; bar < PCI_STD_NUM_BARS; ++bar) {
>>>                   if (!(pci_resource_flags(pdev, bar) & IORESOURCE_MEM))
>>>                           continue;
>>>
>>> +               printk("DEBUG: remove 2\n");
>>>                   base = pci_resource_start(pdev, bar);
>>>                   size = pci_resource_len(pdev, bar);
>>>                   aperture_detach_devices(base, size);
>>>           }
>>>
>>> +       printk("DEBUG: remove 3\n");
>>>           /*
>>>            * If this is the primary adapter, there could be a VGA device
>>>            * that consumes the VGA framebuffer I/O range. Remove this
>>>
>>> [1]  https://lore.kernel.org/lkml/170222766284.86103.11020060769330721008@leemhuis.info/T/#u
>>
>> Copy-pasting this from the e-mail body didn't work well, but I applied
>> the changes manually to a 6.5.9 kernel without any of the other patches.
>> Here's the relevant dmesg output on the Lenovo L570:
>>
>> ...
>> [    2.953405] ACPI: bus type drm_connector registered
>> [    2.954014] i915 0000:00:02.0: [drm] VT-d active for gfx access
>> [    2.954018] DEBUG: remove 1
>> [    2.954020] DEBUG: remove 2
>> [    2.954021] DEBUG: remove 2
>> [    2.954023] DEBUG: remove 3
>> [    2.954029] resource: resource sanity check: requesting [mem
>> 0x00000000e0000000-0x00000000efffffff], which spans more than BOOTFB
>> [mem 0xe0000000-0xe012bfff]
>> [    2.954035] caller i915_ggtt_init_hw+0x88/0x120 mapping multiple BARs
>> [    2.954061] i915 0000:00:02.0: [drm] Using Transparent Hugepages
>> [    2.955103] Loading firmware: i915/kbl_dmc_ver1_04.bin
>> [    2.955384] i915 0000:00:02.0: [drm] Finished loading DMC firmware
>> i915/kbl_dmc_ver1_04.bin (v1.4)
>> ...
>> [    4.145013] [drm] Initialized i915 1.6.0 20201103 for 0000:00:02.0 on
>> minor 0
>> [    4.147101] ACPI: video: Video Device [GFX0] (multi-head: yes  rom:
>> no  post: no)
>> [    4.147244] input: Video Bus as
>> /devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A08:00/LNXVIDEO:00/input/input4
>> [    4.147410] [drm] Initialized vgem 1.0.0 20120112 for vgem on minor 1
>> [    4.147420] usbcore: registered new interface driver udl
>> [    4.147500] [drm] Initialized simpledrm 1.0.0 20200625 for
>> simple-framebuffer.0 on minor 2
>> [    4.148643] Console: switching to colour frame buffer device 80x30
>> [    4.153216] simple-framebuffer simple-framebuffer.0: [drm] fb0:
>> simpledrmdrmfb frame buffer device
>> [    4.154043] loop: module loaded
>> [    4.156017] ahci 0000:00:17.0: version 3.0
>> [    4.157373] i915 0000:00:02.0: [drm] fb1: i915drmfb frame buffer device
>> ...
>>
>> J
>>
Thomas Zimmermann March 18, 2024, 4:17 p.m. UTC | #23
Hi

Am 18.03.24 um 16:42 schrieb Jaak Ristioja:
> Hi Huacai,
>
> Uh, no, sorry, I did not get to test such changes. From what Thomas 
> wrote I presumed that this got fixed and no further action would be 
> required.

Right. We reverted the problematic patch and the issue should be gone now.

Best regards
Thomas

>
> To speed things up I would appreciate it if you provided a patch. As I 
> worked around the problem for the end user by changing the kernel 
> configuration, I have long forgotten the exact details. It would 
> otherwise probably take me a while to understand what and where you 
> propose to change exactly.
>
> Also, do you want me to test it on some newer kernel or should I 
> re-download the 6.5.# version sources?
>
>
> Best regards,
> Jaak
>
> On 18.03.24 16:43, Huacai Chen wrote:
>> Hi, Jaak,
>>
>> I'm still here as a boring man. :)
>> Have you ever tested whether using "fs_initcall(sysfb_init)" works
>> fine on your machine?
>>
>> Huacai
>>
>> On Wed, Jan 24, 2024 at 5:21 AM Jaak Ristioja <jaak@ristioja.ee> wrote:
>>>
>>> Hi,
>>>
>>> I apologize for not finding the time to test this earlier.
>>>
>>> On 11.12.23 05:08, Huacai Chen wrote:
>>>> And Jaak, could you please test with the below patch (but keep the
>>>> original order in Makefile) and then give me the dmesg output?
>>>>
>>>> diff --git a/drivers/video/aperture.c b/drivers/video/aperture.c
>>>> index 561be8feca96..cc2e39fb98f5 100644
>>>> --- a/drivers/video/aperture.c
>>>> +++ b/drivers/video/aperture.c
>>>> @@ -350,21 +350,29 @@ int
>>>> aperture_remove_conflicting_pci_devices(struct pci_dev *pdev, const
>>>> char *na
>>>>           resource_size_t base, size;
>>>>           int bar, ret = 0;
>>>>
>>>> -       if (pdev == vga_default_device())
>>>> +       printk("DEBUG: remove 1\n");
>>>> +
>>>> +       if (pdev == vga_default_device()) {
>>>> +               printk("DEBUG: primary = true\n");
>>>>                   primary = true;
>>>> +       }
>>>>
>>>> -       if (primary)
>>>> +       if (primary) {
>>>> +               printk("DEBUG: disable sysfb\n");
>>>>                   sysfb_disable();
>>>> +       }
>>>>
>>>>           for (bar = 0; bar < PCI_STD_NUM_BARS; ++bar) {
>>>>                   if (!(pci_resource_flags(pdev, bar) & 
>>>> IORESOURCE_MEM))
>>>>                           continue;
>>>>
>>>> +               printk("DEBUG: remove 2\n");
>>>>                   base = pci_resource_start(pdev, bar);
>>>>                   size = pci_resource_len(pdev, bar);
>>>>                   aperture_detach_devices(base, size);
>>>>           }
>>>>
>>>> +       printk("DEBUG: remove 3\n");
>>>>           /*
>>>>            * If this is the primary adapter, there could be a VGA 
>>>> device
>>>>            * that consumes the VGA framebuffer I/O range. Remove this
>>>>
>>>> [1] 
>>>> https://lore.kernel.org/lkml/170222766284.86103.11020060769330721008@leemhuis.info/T/#u
>>>
>>> Copy-pasting this from the e-mail body didn't work well, but I applied
>>> the changes manually to a 6.5.9 kernel without any of the other 
>>> patches.
>>> Here's the relevant dmesg output on the Lenovo L570:
>>>
>>> ...
>>> [    2.953405] ACPI: bus type drm_connector registered
>>> [    2.954014] i915 0000:00:02.0: [drm] VT-d active for gfx access
>>> [    2.954018] DEBUG: remove 1
>>> [    2.954020] DEBUG: remove 2
>>> [    2.954021] DEBUG: remove 2
>>> [    2.954023] DEBUG: remove 3
>>> [    2.954029] resource: resource sanity check: requesting [mem
>>> 0x00000000e0000000-0x00000000efffffff], which spans more than BOOTFB
>>> [mem 0xe0000000-0xe012bfff]
>>> [    2.954035] caller i915_ggtt_init_hw+0x88/0x120 mapping multiple 
>>> BARs
>>> [    2.954061] i915 0000:00:02.0: [drm] Using Transparent Hugepages
>>> [    2.955103] Loading firmware: i915/kbl_dmc_ver1_04.bin
>>> [    2.955384] i915 0000:00:02.0: [drm] Finished loading DMC firmware
>>> i915/kbl_dmc_ver1_04.bin (v1.4)
>>> ...
>>> [    4.145013] [drm] Initialized i915 1.6.0 20201103 for 
>>> 0000:00:02.0 on
>>> minor 0
>>> [    4.147101] ACPI: video: Video Device [GFX0] (multi-head: yes  rom:
>>> no  post: no)
>>> [    4.147244] input: Video Bus as
>>> /devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A08:00/LNXVIDEO:00/input/input4
>>> [    4.147410] [drm] Initialized vgem 1.0.0 20120112 for vgem on 
>>> minor 1
>>> [    4.147420] usbcore: registered new interface driver udl
>>> [    4.147500] [drm] Initialized simpledrm 1.0.0 20200625 for
>>> simple-framebuffer.0 on minor 2
>>> [    4.148643] Console: switching to colour frame buffer device 80x30
>>> [    4.153216] simple-framebuffer simple-framebuffer.0: [drm] fb0:
>>> simpledrmdrmfb frame buffer device
>>> [    4.154043] loop: module loaded
>>> [    4.156017] ahci 0000:00:17.0: version 3.0
>>> [    4.157373] i915 0000:00:02.0: [drm] fb1: i915drmfb frame buffer 
>>> device
>>> ...
>>>
>>> J
>>>
>
Huacai Chen March 19, 2024, 2:16 p.m. UTC | #24
Hi, Jaak,

On Mon, Mar 18, 2024 at 11:42 PM Jaak Ristioja <jaak@ristioja.ee> wrote:
>
> Hi Huacai,
>
> Uh, no, sorry, I did not get to test such changes. From what Thomas
> wrote I presumed that this got fixed and no further action would be
> required.
>
> To speed things up I would appreciate it if you provided a patch. As I
> worked around the problem for the end user by changing the kernel
> configuration, I have long forgotten the exact details. It would
> otherwise probably take me a while to understand what and where you
> propose to change exactly.
>
> Also, do you want me to test it on some newer kernel or should I
> re-download the 6.5.# version sources?
Yes, it is better to use 6.5, you can simply change the last line of
drivers/firmware/sysfb.c to fs_initcall(sysfb_init), So no patch
needed.

And to Thomas,

I'm sorry that reverting 60aebc95594 solve Jaak's problem, but my
original problem exist (at least in 6.5), and I want to know the
result of the above experiment to understand what happens.

Huacai

>
>
> Best regards,
> Jaak
>
> On 18.03.24 16:43, Huacai Chen wrote:
> > Hi, Jaak,
> >
> > I'm still here as a boring man. :)
> > Have you ever tested whether using "fs_initcall(sysfb_init)" works
> > fine on your machine?
> >
> > Huacai
> >
> > On Wed, Jan 24, 2024 at 5:21 AM Jaak Ristioja <jaak@ristioja.ee> wrote:
> >>
> >> Hi,
> >>
> >> I apologize for not finding the time to test this earlier.
> >>
> >> On 11.12.23 05:08, Huacai Chen wrote:
> >>> And Jaak, could you please test with the below patch (but keep the
> >>> original order in Makefile) and then give me the dmesg output?
> >>>
> >>> diff --git a/drivers/video/aperture.c b/drivers/video/aperture.c
> >>> index 561be8feca96..cc2e39fb98f5 100644
> >>> --- a/drivers/video/aperture.c
> >>> +++ b/drivers/video/aperture.c
> >>> @@ -350,21 +350,29 @@ int
> >>> aperture_remove_conflicting_pci_devices(struct pci_dev *pdev, const
> >>> char *na
> >>>           resource_size_t base, size;
> >>>           int bar, ret = 0;
> >>>
> >>> -       if (pdev == vga_default_device())
> >>> +       printk("DEBUG: remove 1\n");
> >>> +
> >>> +       if (pdev == vga_default_device()) {
> >>> +               printk("DEBUG: primary = true\n");
> >>>                   primary = true;
> >>> +       }
> >>>
> >>> -       if (primary)
> >>> +       if (primary) {
> >>> +               printk("DEBUG: disable sysfb\n");
> >>>                   sysfb_disable();
> >>> +       }
> >>>
> >>>           for (bar = 0; bar < PCI_STD_NUM_BARS; ++bar) {
> >>>                   if (!(pci_resource_flags(pdev, bar) & IORESOURCE_MEM))
> >>>                           continue;
> >>>
> >>> +               printk("DEBUG: remove 2\n");
> >>>                   base = pci_resource_start(pdev, bar);
> >>>                   size = pci_resource_len(pdev, bar);
> >>>                   aperture_detach_devices(base, size);
> >>>           }
> >>>
> >>> +       printk("DEBUG: remove 3\n");
> >>>           /*
> >>>            * If this is the primary adapter, there could be a VGA device
> >>>            * that consumes the VGA framebuffer I/O range. Remove this
> >>>
> >>> [1]  https://lore.kernel.org/lkml/170222766284.86103.11020060769330721008@leemhuis.info/T/#u
> >>
> >> Copy-pasting this from the e-mail body didn't work well, but I applied
> >> the changes manually to a 6.5.9 kernel without any of the other patches.
> >> Here's the relevant dmesg output on the Lenovo L570:
> >>
> >> ...
> >> [    2.953405] ACPI: bus type drm_connector registered
> >> [    2.954014] i915 0000:00:02.0: [drm] VT-d active for gfx access
> >> [    2.954018] DEBUG: remove 1
> >> [    2.954020] DEBUG: remove 2
> >> [    2.954021] DEBUG: remove 2
> >> [    2.954023] DEBUG: remove 3
> >> [    2.954029] resource: resource sanity check: requesting [mem
> >> 0x00000000e0000000-0x00000000efffffff], which spans more than BOOTFB
> >> [mem 0xe0000000-0xe012bfff]
> >> [    2.954035] caller i915_ggtt_init_hw+0x88/0x120 mapping multiple BARs
> >> [    2.954061] i915 0000:00:02.0: [drm] Using Transparent Hugepages
> >> [    2.955103] Loading firmware: i915/kbl_dmc_ver1_04.bin
> >> [    2.955384] i915 0000:00:02.0: [drm] Finished loading DMC firmware
> >> i915/kbl_dmc_ver1_04.bin (v1.4)
> >> ...
> >> [    4.145013] [drm] Initialized i915 1.6.0 20201103 for 0000:00:02.0 on
> >> minor 0
> >> [    4.147101] ACPI: video: Video Device [GFX0] (multi-head: yes  rom:
> >> no  post: no)
> >> [    4.147244] input: Video Bus as
> >> /devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A08:00/LNXVIDEO:00/input/input4
> >> [    4.147410] [drm] Initialized vgem 1.0.0 20120112 for vgem on minor 1
> >> [    4.147420] usbcore: registered new interface driver udl
> >> [    4.147500] [drm] Initialized simpledrm 1.0.0 20200625 for
> >> simple-framebuffer.0 on minor 2
> >> [    4.148643] Console: switching to colour frame buffer device 80x30
> >> [    4.153216] simple-framebuffer simple-framebuffer.0: [drm] fb0:
> >> simpledrmdrmfb frame buffer device
> >> [    4.154043] loop: module loaded
> >> [    4.156017] ahci 0000:00:17.0: version 3.0
> >> [    4.157373] i915 0000:00:02.0: [drm] fb1: i915drmfb frame buffer device
> >> ...
> >>
> >> J
> >>
>
Jaak Ristioja March 20, 2024, 8:55 p.m. UTC | #25
Hi Huacai,

On 19.03.24 16:16, Huacai Chen wrote:
> Hi, Jaak,
> 
> On Mon, Mar 18, 2024 at 11:42 PM Jaak Ristioja <jaak@ristioja.ee> wrote:
>>
>> Hi Huacai,
>>
>> Uh, no, sorry, I did not get to test such changes. From what Thomas
>> wrote I presumed that this got fixed and no further action would be
>> required.
>>
>> To speed things up I would appreciate it if you provided a patch. As I
>> worked around the problem for the end user by changing the kernel
>> configuration, I have long forgotten the exact details. It would
>> otherwise probably take me a while to understand what and where you
>> propose to change exactly.
>>
>> Also, do you want me to test it on some newer kernel or should I
>> re-download the 6.5.# version sources?
> Yes, it is better to use 6.5, you can simply change the last line of
> drivers/firmware/sysfb.c to fs_initcall(sysfb_init), So no patch
> needed.
> 
> And to Thomas,
> 
> I'm sorry that reverting 60aebc95594 solve Jaak's problem, but my
> original problem exist (at least in 6.5), and I want to know the
> result of the above experiment to understand what happens.

Using the sources for 6.5.9 and fs_initcall(sysfb_init) instead of 
subsys_initcall_sync(sysfb_init) in drivers/firmware/sysfb.c did not 
help. The screen still went black during the boot and stayed black until 
SDDM started.

Jaak
Huacai Chen March 22, 2024, 2:06 p.m. UTC | #26
On Thu, Mar 21, 2024 at 4:55 AM Jaak Ristioja <jaak@ristioja.ee> wrote:
>
> Hi Huacai,
>
> On 19.03.24 16:16, Huacai Chen wrote:
> > Hi, Jaak,
> >
> > On Mon, Mar 18, 2024 at 11:42 PM Jaak Ristioja <jaak@ristioja.ee> wrote:
> >>
> >> Hi Huacai,
> >>
> >> Uh, no, sorry, I did not get to test such changes. From what Thomas
> >> wrote I presumed that this got fixed and no further action would be
> >> required.
> >>
> >> To speed things up I would appreciate it if you provided a patch. As I
> >> worked around the problem for the end user by changing the kernel
> >> configuration, I have long forgotten the exact details. It would
> >> otherwise probably take me a while to understand what and where you
> >> propose to change exactly.
> >>
> >> Also, do you want me to test it on some newer kernel or should I
> >> re-download the 6.5.# version sources?
> > Yes, it is better to use 6.5, you can simply change the last line of
> > drivers/firmware/sysfb.c to fs_initcall(sysfb_init), So no patch
> > needed.
> >
> > And to Thomas,
> >
> > I'm sorry that reverting 60aebc95594 solve Jaak's problem, but my
> > original problem exist (at least in 6.5), and I want to know the
> > result of the above experiment to understand what happens.
>
> Using the sources for 6.5.9 and fs_initcall(sysfb_init) instead of
> subsys_initcall_sync(sysfb_init) in drivers/firmware/sysfb.c did not
> help. The screen still went black during the boot and stayed black until
> SDDM started.
OK, then can you try rootfs_initcall(sysfb_init)?

Huacai

>
> Jaak
Jaak Ristioja March 27, 2024, 10:51 a.m. UTC | #27
Hi,

On 22.03.24 16:06, Huacai Chen wrote:
> On Thu, Mar 21, 2024 at 4:55 AM Jaak Ristioja <jaak@ristioja.ee> wrote:
>>
>> Hi Huacai,
>>
>> On 19.03.24 16:16, Huacai Chen wrote:
>>> Hi, Jaak,
>>>
>>> On Mon, Mar 18, 2024 at 11:42 PM Jaak Ristioja <jaak@ristioja.ee> wrote:
>>>>
>>>> Hi Huacai,
>>>>
>>>> Uh, no, sorry, I did not get to test such changes. From what Thomas
>>>> wrote I presumed that this got fixed and no further action would be
>>>> required.
>>>>
>>>> To speed things up I would appreciate it if you provided a patch. As I
>>>> worked around the problem for the end user by changing the kernel
>>>> configuration, I have long forgotten the exact details. It would
>>>> otherwise probably take me a while to understand what and where you
>>>> propose to change exactly.
>>>>
>>>> Also, do you want me to test it on some newer kernel or should I
>>>> re-download the 6.5.# version sources?
>>> Yes, it is better to use 6.5, you can simply change the last line of
>>> drivers/firmware/sysfb.c to fs_initcall(sysfb_init), So no patch
>>> needed.
>>>
>>> And to Thomas,
>>>
>>> I'm sorry that reverting 60aebc95594 solve Jaak's problem, but my
>>> original problem exist (at least in 6.5), and I want to know the
>>> result of the above experiment to understand what happens.
>>
>> Using the sources for 6.5.9 and fs_initcall(sysfb_init) instead of
>> subsys_initcall_sync(sysfb_init) in drivers/firmware/sysfb.c did not
>> help. The screen still went black during the boot and stayed black until
>> SDDM started.
> OK, then can you try rootfs_initcall(sysfb_init)?

Unfortunately, this did not help, I still get the black screen until 
SDDM starts.

Jaak
diff mbox series

Patch

diff --git a/drivers/gpu/drm/Makefile b/drivers/gpu/drm/Makefile
index 8e1bde059170..db0f3d3aff43 100644
--- a/drivers/gpu/drm/Makefile
+++ b/drivers/gpu/drm/Makefile
@@ -141,6 +141,7 @@  obj-y			+= arm/
 obj-y			+= display/
 obj-$(CONFIG_DRM_TTM)	+= ttm/
 obj-$(CONFIG_DRM_SCHED)	+= scheduler/
+obj-y			+= tiny/
 obj-$(CONFIG_DRM_RADEON)+= radeon/
 obj-$(CONFIG_DRM_AMDGPU)+= amd/amdgpu/
 obj-$(CONFIG_DRM_AMDGPU)+= amd/amdxcp/
@@ -182,7 +183,6 @@  obj-$(CONFIG_DRM_FSL_DCU) += fsl-dcu/
 obj-$(CONFIG_DRM_ETNAVIV) += etnaviv/
 obj-y			+= hisilicon/
 obj-y			+= mxsfb/
-obj-y			+= tiny/
 obj-$(CONFIG_DRM_PL111) += pl111/
 obj-$(CONFIG_DRM_TVE200) += tve200/
 obj-$(CONFIG_DRM_XEN) += xen/