[v2,18/19] Revert "fbdev: Prevent probing generic drivers if a FB is already registered"

Message ID 20220208210824.2238981-19-daniel.vetter@ffwll.ch (mailing list archive)
State Handled Elsewhere
Series fbcon patches, take two

Commit Message

Daniel Vetter Feb. 8, 2022, 9:08 p.m. UTC
This reverts commit fb561bf9abde49f7e00fdbf9ed2ccf2d86cac8ee.

With

commit 27599aacbaefcbf2af7b06b0029459bbf682000d
Author: Thomas Zimmermann <tzimmermann@suse.de>
Date:   Tue Jan 25 10:12:18 2022 +0100

    fbdev: Hot-unplug firmware fb devices on forced removal

this should be fixed properly and we can remove this somewhat hackish
check here (e.g. this won't catch drm drivers if fbdev emulation isn't
enabled).

Cc: Thomas Zimmermann <tzimmermann@suse.de>
Cc: Zack Rusin <zackr@vmware.com>
Cc: Javier Martinez Canillas <javierm@redhat.com>
Cc: Hans de Goede <hdegoede@redhat.com>
Cc: Ilya Trukhanov <lahvuun@gmail.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Cc: Peter Jones <pjones@redhat.com>
Cc: linux-fbdev@vger.kernel.org
---
 drivers/video/fbdev/efifb.c    | 11 -----------
 drivers/video/fbdev/simplefb.c | 11 -----------
 2 files changed, 22 deletions(-)

Comments

Javier Martinez Canillas Feb. 9, 2022, 12:19 a.m. UTC | #1
On 2/8/22 22:08, Daniel Vetter wrote:
> This reverts commit fb561bf9abde49f7e00fdbf9ed2ccf2d86cac8ee.
> 
> With
> 
> commit 27599aacbaefcbf2af7b06b0029459bbf682000d
> Author: Thomas Zimmermann <tzimmermann@suse.de>
> Date:   Tue Jan 25 10:12:18 2022 +0100
> 
>     fbdev: Hot-unplug firmware fb devices on forced removal
> 
> this should be fixed properly and we can remove this somewhat hackish
> check here (e.g. this won't catch drm drivers if fbdev emulation isn't
> enabled).
>

Unfortunately this hack can't be reverted yet. Thomas' patch makes sure that
platform devices matched with fbdev drivers are properly unregistered when
a DRM driver attempts to remove all the conflicting framebuffers.

But the problem that fb561bf9abde ("fbdev: Prevent probing generic drivers if
a FB is already registered") worked around is different. It happens when the
DRM driver is probed before the {efi,simple}fb and other fbdev drivers: the
kicking out of conflicting framebuffers has already happened by then, so these
drivers are allowed to probe even though a DRM driver is already present.

We need a clearer way to prevent it, but can't revert fb561bf9abde until then.

Best regards,
Daniel Vetter April 5, 2022, 8:36 a.m. UTC | #2
On Wed, Feb 09, 2022 at 01:19:26AM +0100, Javier Martinez Canillas wrote:
> On 2/8/22 22:08, Daniel Vetter wrote:
> > This reverts commit fb561bf9abde49f7e00fdbf9ed2ccf2d86cac8ee.
> > 
> > With
> > 
> > commit 27599aacbaefcbf2af7b06b0029459bbf682000d
> > Author: Thomas Zimmermann <tzimmermann@suse.de>
> > Date:   Tue Jan 25 10:12:18 2022 +0100
> > 
> >     fbdev: Hot-unplug firmware fb devices on forced removal
> > 
> > this should be fixed properly and we can remove this somewhat hackish
> > check here (e.g. this won't catch drm drivers if fbdev emulation isn't
> > enabled).
> >
> 
> Unfortunately this hack can't be reverted yet. Thomas' patch solves the issue
> of platform devices matched with fbdev drivers to be properly unregistered if
> a DRM driver attempts to remove all the conflicting framebuffers.
> 
> But the problem that fb561bf9abde ("fbdev: Prevent probing generic drivers if
> a FB is already registered") worked around is different. It happens when the
> DRM driver is probed before the {efi,simple}fb and other fbdev drivers, the
> kicking out of conflicting framebuffers already happened and these drivers
> will be allowed to probe even when a DRM driver is already present.
> 
> We need a clearer way to prevent it, but can't revert fb561bf9abde until that.

Yeah that entire area is a mess still, ideally we'd have something else
creating the platform devices, and efifb/offb and all these would just
bind against them.

Hm one idea that just crossed my mind: Could we have a flag in fb_info for
fw drivers, and check this in register_framebuffer()? Then at least all the
logic would be in the fbdev core.
-Daniel
Daniel Vetter April 5, 2022, 8:40 a.m. UTC | #3
On Tue, Apr 05, 2022 at 10:36:35AM +0200, Daniel Vetter wrote:
> On Wed, Feb 09, 2022 at 01:19:26AM +0100, Javier Martinez Canillas wrote:
> > On 2/8/22 22:08, Daniel Vetter wrote:
> > > This reverts commit fb561bf9abde49f7e00fdbf9ed2ccf2d86cac8ee.
> > > 
> > > With
> > > 
> > > commit 27599aacbaefcbf2af7b06b0029459bbf682000d
> > > Author: Thomas Zimmermann <tzimmermann@suse.de>
> > > Date:   Tue Jan 25 10:12:18 2022 +0100
> > > 
> > >     fbdev: Hot-unplug firmware fb devices on forced removal
> > > 
> > > this should be fixed properly and we can remove this somewhat hackish
> > > check here (e.g. this won't catch drm drivers if fbdev emulation isn't
> > > enabled).
> > >
> > 
> > Unfortunately this hack can't be reverted yet. Thomas' patch solves the issue
> > of platform devices matched with fbdev drivers to be properly unregistered if
> > a DRM driver attempts to remove all the conflicting framebuffers.
> > 
> > But the problem that fb561bf9abde ("fbdev: Prevent probing generic drivers if
> > a FB is already registered") worked around is different. It happens when the
> > DRM driver is probed before the {efi,simple}fb and other fbdev drivers, the
> > kicking out of conflicting framebuffers already happened and these drivers
> > will be allowed to probe even when a DRM driver is already present.
> > 
> > We need a clearer way to prevent it, but can't revert fb561bf9abde until that.
> 
> Yeah that entire area is a mess still, ideally we'd have something else
> creating the platform devices, and efifb/offb and all these would just
> bind against them.
> 
> Hm one idea that just crossed my mind: Could we have a flag in fb_info for
> fw drivers, and check this in framebuffer_register? Then at least all the
> logic would be in the fbdev core.

Ok coffee just kicked in, how exactly does your scenario work?

This code I'm reverting here is in the platform_dev->probe function.
Thomas' patch removes the platform_dev. How exactly can you still probe
against a platform dev if that platform dev is gone?

Iow, now that I re-ponder your case after a few weeks I'm no longer sure
things work like you claim.

There is the issue that offb still binds without a platform_dev, but
that's not affected by this patch here.
-Daniel
Javier Martinez Canillas April 5, 2022, 9:19 a.m. UTC | #4
Hello Daniel,

On 4/5/22 10:40, Daniel Vetter wrote:
> On Tue, Apr 05, 2022 at 10:36:35AM +0200, Daniel Vetter wrote:
>> On Wed, Feb 09, 2022 at 01:19:26AM +0100, Javier Martinez Canillas wrote:
>>> On 2/8/22 22:08, Daniel Vetter wrote:
>>>> This reverts commit fb561bf9abde49f7e00fdbf9ed2ccf2d86cac8ee.
>>>>
>>>> With
>>>>
>>>> commit 27599aacbaefcbf2af7b06b0029459bbf682000d
>>>> Author: Thomas Zimmermann <tzimmermann@suse.de>
>>>> Date:   Tue Jan 25 10:12:18 2022 +0100
>>>>
>>>>     fbdev: Hot-unplug firmware fb devices on forced removal
>>>>
>>>> this should be fixed properly and we can remove this somewhat hackish
>>>> check here (e.g. this won't catch drm drivers if fbdev emulation isn't
>>>> enabled).
>>>>
>>>
>>> Unfortunately this hack can't be reverted yet. Thomas' patch solves the issue
>>> of platform devices matched with fbdev drivers to be properly unregistered if
>>> a DRM driver attempts to remove all the conflicting framebuffers.
>>>
>>> But the problem that fb561bf9abde ("fbdev: Prevent probing generic drivers if
>>> a FB is already registered") worked around is different. It happens when the
>>> DRM driver is probed before the {efi,simple}fb and other fbdev drivers, the
>>> kicking out of conflicting framebuffers already happened and these drivers
>>> will be allowed to probe even when a DRM driver is already present.
>>>
>>> We need a clearer way to prevent it, but can't revert fb561bf9abde until that.
>>
>> Yeah that entire area is a mess still, ideally we'd have something else
>> creating the platform devices, and efifb/offb and all these would just
>> bind against them.
>>
>> Hm one idea that just crossed my mind: Could we have a flag in fb_info for
>> fw drivers, and check this in framebuffer_register? Then at least all the
>> logic would be in the fbdev core.
>

I can't answer right away, since I've forgotten this part of the code
and will need to read it in detail to refresh my memory.

I'll answer later, but wanted to address the other question ASAP.
 
> Ok coffee just kicked in, how exactly does your scenario work?
> 
> This code I'm reverting here is in the platform_dev->probe function.
> Thomas' patch removes the platform_dev. How exactly can you still probe
> against a platform dev if that platform dev is gone?
>

Because the platform device was not even registered by the time the DRM
driver probed and all the devices for the conflicting drivers were unregistered.
 
> Iow, now that I reponder your case after a few weeks I'm no longer sure
> things work like you claim.
>

This is how I think it works; please let me know if you see something
wrong in my logic:

1) A PCI device or OF device is registered for the GPU. This attempts to
   match a registered driver, but no matching driver is registered yet.

2) The efifb driver is built-in, will be initialized according to the link
   order of the objects under drivers/video and the fbdev driver is registered.

   There is no platform device or PCI/OF device registered that matches.

3) The DRM driver is built-in, will be initialized according to the link
   order of the objects under drivers/gpu and the DRM driver is registered.
   
   This matches the device registered in (1) and the DRM driver probes.

4) The DRM driver .probe kicks out any conflicting DRM drivers and pdev
   before registering the DRM device.

   There are no conflicting drivers or platform device at this point.

5) Later, at some point, the drivers/firmware/sysfb.c init function is
   executed, and this registers a platform device for the generic fb.

   This device matches the efifb driver registered in (2) and the fbdev
   driver probes.
   
   Since that happens *after* the DRM driver already matched, probed
   and registered the DRM device, that is a bug and what the reverted
   patch worked around.

So we need to prevent (5) if (1) and (3) already happened. Having a flag
set in the fbdev core somewhere when remove_conflicting_framebuffers()
is called could be a solution indeed.

That is, the fbdev core needs to know that a DRM driver already probed
and make register_framebuffer() fail if info->flags & FBINFO_MISC_FIRMWARE
is set.

I can attempt to write a patch for that.
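
The flag idea above can be sketched in plain userspace C. This is only a model of the ordering problem, not kernel code: every `model_` name is hypothetical, the flag value is a placeholder rather than the real FBINFO_MISC_FIRMWARE bit, and returning -EBUSY is an assumption about what a refusal could look like.

```c
#include <errno.h>
#include <stdbool.h>

/* Placeholder bit for illustration only; the real FBINFO_MISC_FIRMWARE
 * value is defined in include/linux/fb.h. */
#define MODEL_FBINFO_MISC_FIRMWARE 0x1

struct model_fb_info {
	int flags;
	bool registered;
};

/* Hypothetical core state: set once a native driver has taken over. */
static bool model_native_driver_present;

/* Stand-in for remove_conflicting_framebuffers(): besides kicking out
 * existing firmware framebuffers, remember that a takeover happened. */
static void model_remove_conflicting_framebuffers(void)
{
	model_native_driver_present = true;
}

/* Stand-in for register_framebuffer(): refuse firmware framebuffers
 * that show up after a native driver has already probed (step 5). */
static int model_register_framebuffer(struct model_fb_info *info)
{
	if (model_native_driver_present &&
	    (info->flags & MODEL_FBINFO_MISC_FIRMWARE))
		return -EBUSY;

	info->registered = true;
	return 0;
}
```

In this model, a late efifb-style registration fails once the takeover has been recorded, while a framebuffer without the firmware flag still registers normally.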
Daniel Vetter April 5, 2022, 9:24 a.m. UTC | #5
On Tue, 5 Apr 2022 at 11:19, Javier Martinez Canillas
<javierm@redhat.com> wrote:
>
> Hello Daniel,
>
> On 4/5/22 10:40, Daniel Vetter wrote:
> > On Tue, Apr 05, 2022 at 10:36:35AM +0200, Daniel Vetter wrote:
> >> On Wed, Feb 09, 2022 at 01:19:26AM +0100, Javier Martinez Canillas wrote:
> >>> On 2/8/22 22:08, Daniel Vetter wrote:
> >>>> This reverts commit fb561bf9abde49f7e00fdbf9ed2ccf2d86cac8ee.
> >>>>
> >>>> With
> >>>>
> >>>> commit 27599aacbaefcbf2af7b06b0029459bbf682000d
> >>>> Author: Thomas Zimmermann <tzimmermann@suse.de>
> >>>> Date:   Tue Jan 25 10:12:18 2022 +0100
> >>>>
> >>>>     fbdev: Hot-unplug firmware fb devices on forced removal
> >>>>
> >>>> this should be fixed properly and we can remove this somewhat hackish
> >>>> check here (e.g. this won't catch drm drivers if fbdev emulation isn't
> >>>> enabled).
> >>>>
> >>>
> >>> Unfortunately this hack can't be reverted yet. Thomas' patch solves the issue
> >>> of platform devices matched with fbdev drivers to be properly unregistered if
> >>> a DRM driver attempts to remove all the conflicting framebuffers.
> >>>
> >>> But the problem that fb561bf9abde ("fbdev: Prevent probing generic drivers if
> >>> a FB is already registered") worked around is different. It happens when the
> >>> DRM driver is probed before the {efi,simple}fb and other fbdev drivers, the
> >>> kicking out of conflicting framebuffers already happened and these drivers
> >>> will be allowed to probe even when a DRM driver is already present.
> >>>
> >>> We need a clearer way to prevent it, but can't revert fb561bf9abde until that.
> >>
> >> Yeah that entire area is a mess still, ideally we'd have something else
> >> creating the platform devices, and efifb/offb and all these would just
> >> bind against them.
> >>
> >> Hm one idea that just crossed my mind: Could we have a flag in fb_info for
> >> fw drivers, and check this in framebuffer_register? Then at least all the
> >> logic would be in the fbdev core.
> >
>
> I can't answer right away since I've since forgotten this part of the code
> and will require to do a detailed read to refresh my memory.
>
> I'll answer later but preferred to mention the other question ASAP.
>
> > Ok coffee just kicked in, how exactly does your scenario work?
> >
> > This code I'm reverting here is in the platform_dev->probe function.
> > Thomas' patch removes the platform_dev. How exactly can you still probe
> > against a platform dev if that platform dev is gone?
> >
>
> Because the platform was not even registered by the time the DRM driver
> probed and all the devices for the conflicting drivers were unregistered.
>
> > Iow, now that I reponder your case after a few weeks I'm no longer sure
> > things work like you claim.
> >
>
> This is how I think that work, please let me know if you see something
> wrong in my logic:
>
> 1) A PCI device of OF device is registered for the GPU, this attempt to
>    match a registered driver but no driver was registered that match yet.
>
> 2) The efifb driver is built-in, will be initialized according to the link
>    order of the objects under drivers/video and the fbdev driver is registered.
>
>    There is no platform device or PCI/OF device registered that matches.
>
> 3) The DRM driver is built-in, will be initialized according to the link
>    order of the objects under drivers/gpu and the DRM driver is registered.
>
>    This matches the device registered in (1) and the DRM driver probes.
>
> 4) The DRM driver .probe kicks out any conflicting DRM drivers and pdev
>    before registering the DRM device.
>
>    There are no conflicting drivers or platform device at this point.
>
> 5) Latter at some point the drivers/firmware/sysfb.c init function is
>    executed, and this registers a platform device for the generic fb.
>
>    This device matches the efifb driver registered in (2) and the fbdev
>    driver probes.
>
>    Since that happens *after* the DRM driver already matched, probed
>    and registered the DRM device, that is a bug and what the reverted
>    patch worked around.
>
> So we need to prevent (5) if (1) and (3) already happened. Having a flag
> set in the fbdev core somewhere when remove_conflicting_framebuffers()
> is called could be a solution indeed.
>
> That is, the fbdev core needs to know that a DRM driver already probed
> and make register_framebuffer() fail if info->flag & FBINFO_MISC_FIRMWARE
>
> I can attempt to write a patch for that.

Ah yeah that could be an issue. I think the right fix is to replace
the platform dev unregister with a sysfb_unregister() function in
sysfb.c, which is synced with a common lock with the sysfb_init
function and a small boolean. I think I can type that up quickly for
v3.
-Daniel
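
A minimal userspace sketch of the "common lock plus a small boolean" idea, assuming a pthread mutex as a stand-in for a kernel mutex. All `model_` names are hypothetical; sysfb_unregister() did not exist at the time of this message, so its shape here is a guess.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

struct model_platform_device {
	const char *name;
};

static pthread_mutex_t model_sysfb_lock = PTHREAD_MUTEX_INITIALIZER;
static bool model_sysfb_disabled;	/* the "small boolean" */
static struct model_platform_device *model_sysfb_pdev;
static struct model_platform_device model_generic_fb = { "efi-framebuffer" };

/* Stand-in for sysfb_init(): creates the firmware device unless a native
 * driver already asked for it to go away. Returns true if it was created. */
static bool model_sysfb_init(void)
{
	bool created = false;

	pthread_mutex_lock(&model_sysfb_lock);
	if (!model_sysfb_disabled) {
		model_sysfb_pdev = &model_generic_fb;
		created = true;
	}
	pthread_mutex_unlock(&model_sysfb_lock);
	return created;
}

/* Hypothetical sysfb_unregister(): drops the device if it exists and
 * blocks any later model_sysfb_init() from creating it again. */
static void model_sysfb_unregister(void)
{
	pthread_mutex_lock(&model_sysfb_lock);
	model_sysfb_pdev = NULL;
	model_sysfb_disabled = true;
	pthread_mutex_unlock(&model_sysfb_lock);
}
```

Because both functions take the same lock, either order is safe in the model: if the native driver wins the race, the later init is a no-op; if init runs first, the device is removed on takeover.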

>
> --
> Best regards,
>
> Javier Martinez Canillas
> Linux Engineering
> Red Hat
>
Javier Martinez Canillas April 5, 2022, 9:52 a.m. UTC | #6
On 4/5/22 11:24, Daniel Vetter wrote:
> On Tue, 5 Apr 2022 at 11:19, Javier Martinez Canillas

[snip]

>>
>> This is how I think that work, please let me know if you see something
>> wrong in my logic:
>>
>> 1) A PCI device of OF device is registered for the GPU, this attempt to
>>    match a registered driver but no driver was registered that match yet.
>>
>> 2) The efifb driver is built-in, will be initialized according to the link
>>    order of the objects under drivers/video and the fbdev driver is registered.
>>
>>    There is no platform device or PCI/OF device registered that matches.
>>
>> 3) The DRM driver is built-in, will be initialized according to the link
>>    order of the objects under drivers/gpu and the DRM driver is registered.
>>
>>    This matches the device registered in (1) and the DRM driver probes.
>>
>> 4) The DRM driver .probe kicks out any conflicting DRM drivers and pdev
>>    before registering the DRM device.
>>
>>    There are no conflicting drivers or platform device at this point.
>>
>> 5) Latter at some point the drivers/firmware/sysfb.c init function is
>>    executed, and this registers a platform device for the generic fb.
>>
>>    This device matches the efifb driver registered in (2) and the fbdev
>>    driver probes.
>>
>>    Since that happens *after* the DRM driver already matched, probed
>>    and registered the DRM device, that is a bug and what the reverted
>>    patch worked around.
>>
>> So we need to prevent (5) if (1) and (3) already happened. Having a flag
>> set in the fbdev core somewhere when remove_conflicting_framebuffers()
>> is called could be a solution indeed.
>>
>> That is, the fbdev core needs to know that a DRM driver already probed
>> and make register_framebuffer() fail if info->flag & FBINFO_MISC_FIRMWARE
>>
>> I can attempt to write a patch for that.
> 
> Ah yeah that could be an issue. I think the right fix is to replace
> the platform dev unregister with a sysfb_unregister() function in
> sysfb.c, which is synced with a common lock with the sysfb_init
> function and a small boolean. I think I can type that up quickly for
> v3.

It's more complicated than that since sysfb is just *one* of the several
places where platform devices can be registered for video devices.

For instance, the vga16fb driver registers its own platform device in
its module_init() function so that can also happen after the conflicting
framebuffers (and associated devices) were removed by a DRM driver probe.

I tried to minimize the issue for that particular driver with commit:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0499f419b76f

But the point stands: it all boils down to the fact that you have two
different subsystems registering video drivers, and they don't know enough
about each other to make a proper decision.

Right now the drm_aperture_remove_conflicting_framebuffers() call signals
in one direction, from DRM to fbdev, but there is no communication in the
other direction, from fbdev to DRM.

I believe the correct fix would be for the fbdev core to keep a list of
the aperture structs that are passed to remove_conflicting_framebuffers().
That way it will know which apertures are no longer available and can refuse
to register any fbdev framebuffer that conflicts with one already present.

Let me know if you think that makes sense and I can attempt to write a fix.
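
The aperture-list proposal could look roughly like the following userspace model. The struct, the fixed-size table, and all `model_` names are assumptions made for illustration; they are not the kernel's own aperture types.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct model_aperture {
	uint64_t base;
	uint64_t size;
};

#define MODEL_MAX_CLAIMED 8
static struct model_aperture model_claimed[MODEL_MAX_CLAIMED];
static size_t model_nr_claimed;

/* Two half-open ranges [base, base + size) overlap if each one starts
 * before the other one ends. */
static bool model_apertures_overlap(const struct model_aperture *a,
				    const struct model_aperture *b)
{
	return a->base < b->base + b->size && b->base < a->base + a->size;
}

/* Would be called from remove_conflicting_framebuffers(): remember the
 * range a native driver has taken over. Returns false if the table is full. */
static bool model_claim_aperture(struct model_aperture ap)
{
	if (model_nr_claimed == MODEL_MAX_CLAIMED)
		return false;
	model_claimed[model_nr_claimed++] = ap;
	return true;
}

/* Would be checked when a firmware framebuffer tries to register. */
static bool model_aperture_available(const struct model_aperture *ap)
{
	for (size_t i = 0; i < model_nr_claimed; i++)
		if (model_apertures_overlap(&model_claimed[i], ap))
			return false;
	return true;
}
```

With the GPU's range recorded, a later firmware framebuffer inside that range is rejected, while an unrelated range stays available.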
Daniel Vetter April 5, 2022, 10:34 a.m. UTC | #7
On Tue, 5 Apr 2022 at 11:52, Javier Martinez Canillas
<javierm@redhat.com> wrote:
>
> On 4/5/22 11:24, Daniel Vetter wrote:
> > On Tue, 5 Apr 2022 at 11:19, Javier Martinez Canillas
>
> [snip]
>
> >>
> >> This is how I think that work, please let me know if you see something
> >> wrong in my logic:
> >>
> >> 1) A PCI device of OF device is registered for the GPU, this attempt to
> >>    match a registered driver but no driver was registered that match yet.
> >>
> >> 2) The efifb driver is built-in, will be initialized according to the link
> >>    order of the objects under drivers/video and the fbdev driver is registered.
> >>
> >>    There is no platform device or PCI/OF device registered that matches.
> >>
> >> 3) The DRM driver is built-in, will be initialized according to the link
> >>    order of the objects under drivers/gpu and the DRM driver is registered.
> >>
> >>    This matches the device registered in (1) and the DRM driver probes.
> >>
> >> 4) The DRM driver .probe kicks out any conflicting DRM drivers and pdev
> >>    before registering the DRM device.
> >>
> >>    There are no conflicting drivers or platform device at this point.
> >>
> >> 5) Latter at some point the drivers/firmware/sysfb.c init function is
> >>    executed, and this registers a platform device for the generic fb.
> >>
> >>    This device matches the efifb driver registered in (2) and the fbdev
> >>    driver probes.
> >>
> >>    Since that happens *after* the DRM driver already matched, probed
> >>    and registered the DRM device, that is a bug and what the reverted
> >>    patch worked around.
> >>
> >> So we need to prevent (5) if (1) and (3) already happened. Having a flag
> >> set in the fbdev core somewhere when remove_conflicting_framebuffers()
> >> is called could be a solution indeed.
> >>
> >> That is, the fbdev core needs to know that a DRM driver already probed
> >> and make register_framebuffer() fail if info->flag & FBINFO_MISC_FIRMWARE
> >>
> >> I can attempt to write a patch for that.
> >
> > Ah yeah that could be an issue. I think the right fix is to replace
> > the platform dev unregister with a sysfb_unregister() function in
> > sysfb.c, which is synced with a common lock with the sysfb_init
> > function and a small boolean. I think I can type that up quickly for
> > v3.
>
> It's more complicated than that since sysfb is just *one* of the several
> places where platform devices can be registered for video devices.
>
> For instance, the vga16fb driver registers its own platform device in
> its module_init() function so that can also happen after the conflicting
> framebuffers (and associated devices) were removed by a DRM driver probe.
>
> I tried to minimize the issue for that particular driver with commit:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0499f419b76f
>
> But the point stands, it all boils down to the fact that you have two
> different subsystems registering video drivers and they don't know all
> about each other to take a proper decision.
>
> Right now the drm_aperture_remove_conflicting_framebuffers() call signals
> in one direction from DRM to fbdev but there isn't a communication in the
> other direction, from fbdev to DRM.
>
> I believe the correct fix would be for the fbdev core to keep a list of
> the apertures struct that are passed to remove_conflicting_framebuffers(),
> that way it will know what apertures are not available anymore and prevent
> to register any fbdev framebuffer that conflicts with one already present.

Hm that still feels like reinventing a driver model, badly.

I think there's two cleaner solutions:
- move all the firmware driver platform_dev into sysfb.c, and then
just bind the special cases against that (e.g. offb, vga16fb and all
these). Then we'd have one sysfb_try_unregister(struct device *dev)
interface that fbmem.c uses.
- let fbmem.c call into each of these firmware device providers, which
means some loops most likely (like we can't call into vga16fb), so
probably need to move that into fbmem.c and it all gets a bit messy.

> Let me know if you think that makes sense and I can attempt to write a fix.

I still think unregistering the platform_dev properly makes the most
sense, and feels like the most proper linux device model solution
instead of hacks on top - if the firmware fb is unusable because a
native driver has taken over, we should nuke that. And also the
firmware fb driver would then just bind to that platform_dev if it
exists, and only if it exists. Also I think it should be the
responsibility of whichever piece of code that registers these
platform devices to ensure that platform_dev actually still exists.
That's why I think pushing all that code into sysfb.c is probably the
cleanest solution.

fbdev predates all that stuff by a lot, hence the hand-rolling.

But maybe Greg has some more thoughts here too?
-Daniel

>
> --
> Best regards,
>
> Javier Martinez Canillas
> Linux Engineering
> Red Hat
>
Geert Uytterhoeven April 5, 2022, 1:24 p.m. UTC | #8
Hi Daniel,

On Tue, Apr 5, 2022 at 1:48 PM Daniel Vetter <daniel@ffwll.ch> wrote:
> On Tue, 5 Apr 2022 at 11:52, Javier Martinez Canillas
> <javierm@redhat.com> wrote:
> > On 4/5/22 11:24, Daniel Vetter wrote:
> > > On Tue, 5 Apr 2022 at 11:19, Javier Martinez Canillas
> > >> This is how I think that work, please let me know if you see something
> > >> wrong in my logic:
> > >>
> > >> 1) A PCI device of OF device is registered for the GPU, this attempt to
> > >>    match a registered driver but no driver was registered that match yet.
> > >>
> > >> 2) The efifb driver is built-in, will be initialized according to the link
> > >>    order of the objects under drivers/video and the fbdev driver is registered.
> > >>
> > >>    There is no platform device or PCI/OF device registered that matches.
> > >>
> > >> 3) The DRM driver is built-in, will be initialized according to the link
> > >>    order of the objects under drivers/gpu and the DRM driver is registered.
> > >>
> > >>    This matches the device registered in (1) and the DRM driver probes.
> > >>
> > >> 4) The DRM driver .probe kicks out any conflicting DRM drivers and pdev
> > >>    before registering the DRM device.
> > >>
> > >>    There are no conflicting drivers or platform device at this point.
> > >>
> > >> 5) Latter at some point the drivers/firmware/sysfb.c init function is
> > >>    executed, and this registers a platform device for the generic fb.
> > >>
> > >>    This device matches the efifb driver registered in (2) and the fbdev
> > >>    driver probes.
> > >>
> > >>    Since that happens *after* the DRM driver already matched, probed
> > >>    and registered the DRM device, that is a bug and what the reverted
> > >>    patch worked around.
> > >>
> > >> So we need to prevent (5) if (1) and (3) already happened. Having a flag
> > >> set in the fbdev core somewhere when remove_conflicting_framebuffers()
> > >> is called could be a solution indeed.
> > >>
> > >> That is, the fbdev core needs to know that a DRM driver already probed
> > >> and make register_framebuffer() fail if info->flag & FBINFO_MISC_FIRMWARE
> > >>
> > >> I can attempt to write a patch for that.
> > >
> > > Ah yeah that could be an issue. I think the right fix is to replace
> > > the platform dev unregister with a sysfb_unregister() function in
> > > sysfb.c, which is synced with a common lock with the sysfb_init
> > > function and a small boolean. I think I can type that up quickly for
> > > v3.
> >
> > It's more complicated than that since sysfb is just *one* of the several
> > places where platform devices can be registered for video devices.
> >
> > For instance, the vga16fb driver registers its own platform device in
> > its module_init() function so that can also happen after the conflicting
> > framebuffers (and associated devices) were removed by a DRM driver probe.
> >
> > I tried to minimize the issue for that particular driver with commit:
> >
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0499f419b76f
> >
> > But the point stands, it all boils down to the fact that you have two
> > different subsystems registering video drivers and they don't know all
> > about each other to take a proper decision.
> >
> > Right now the drm_aperture_remove_conflicting_framebuffers() call signals
> > in one direction from DRM to fbdev but there isn't a communication in the
> > other direction, from fbdev to DRM.
> >
> > I believe the correct fix would be for the fbdev core to keep a list of
> > the apertures struct that are passed to remove_conflicting_framebuffers(),
> > that way it will know what apertures are not available anymore and prevent
> > to register any fbdev framebuffer that conflicts with one already present.
>
> Hm that still feels like reinventing a driver model, badly.
>
> I think there's two cleaner solutions:
> - move all the firmware driver platform_dev into sysfb.c, and then
> just bind the special cases against that (e.g. offb, vga16fb and all
> these). Then we'd have one sysfb_try_unregister(struct device *dev)
> interface that fbmem.c uses.
> - let fbmem.c call into each of these firmware device providers, which
> means some loops most likely (like we can't call into vga16fb), so
> probably need to move that into fbmem.c and it all gets a bit messy.
>
> > Let me know if you think that makes sense and I can attempt to write a fix.
>
> I still think unregistering the platform_dev properly makes the most

That doesn't sound very driver-model-aware to me. The device is what
the driver binds to; it does not cease to exist.

> sense, and feels like the most proper linux device model solution
> instead of hacks on top - if the firmware fb is unuseable because a
> native driver has taken over, we should nuke that. And also the
> firmware fb driver would then just bind to that platform_dev if it
> exists, and only if it exists. Also I think it should be the
> responsibility of whichever piece of code that registers these
> platform devices to ensure that platform_dev actually still exists.
> That's why I think pushing all that code into sysfb.c is probably the
> cleanest solution.

Can't you unbind the generic driver first, and bind the specific driver
afterwards? Like writing to sysfs unbind/driver_override/bind,
but from code?
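
For reference, the userspace sysfs sequence alluded to here can be sketched as below. It assumes a device on the platform bus; the device and driver names passed in are placeholders chosen by the caller, and the helper names are made up for this example. It does not show how the same thing would be done from inside the kernel.

```c
#include <stdio.h>

/* Write a string to a sysfs attribute. Returns 0 on success, -1 on error. */
static int sysfs_write(const char *path, const char *value)
{
	FILE *f = fopen(path, "w");
	int ok;

	if (!f)
		return -1;
	ok = fputs(value, f) >= 0;
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}

/* Unbind dev from old_drv, pin it to new_drv through driver_override,
 * then ask new_drv to bind. Stops at the first step that fails. */
static int rebind_platform_device(const char *dev, const char *old_drv,
				  const char *new_drv)
{
	char path[256];

	snprintf(path, sizeof(path),
		 "/sys/bus/platform/drivers/%s/unbind", old_drv);
	if (sysfs_write(path, dev))
		return -1;

	snprintf(path, sizeof(path),
		 "/sys/bus/platform/devices/%s/driver_override", dev);
	if (sysfs_write(path, new_drv))
		return -1;

	snprintf(path, sizeof(path),
		 "/sys/bus/platform/drivers/%s/bind", new_drv);
	return sysfs_write(path, dev);
}
```

Note that this rebinds one device to a different driver. In the case discussed in this thread the firmware framebuffer and the native GPU are separate devices, so the sequence is an analogy rather than a drop-in fix.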

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds
Javier Martinez Canillas April 5, 2022, 1:25 p.m. UTC | #9
On 4/5/22 12:34, Daniel Vetter wrote:
> On Tue, 5 Apr 2022 at 11:52, Javier Martinez Canillas
> <javierm@redhat.com> wrote:

[snip]

>>
>> I believe the correct fix would be for the fbdev core to keep a list of
>> the apertures struct that are passed to remove_conflicting_framebuffers(),
>> that way it will know what apertures are not available anymore and prevent
>> to register any fbdev framebuffer that conflicts with one already present.
> 
> Hm that still feels like reinventing a driver model, badly.
>

Yeah, you are correct.
 
> I think there's two cleaner solutions:
> - move all the firmware driver platform_dev into sysfb.c, and then
> just bind the special cases against that (e.g. offb, vga16fb and all
> these). Then we'd have one sysfb_try_unregister(struct device *dev)
> interface that fbmem.c uses.

I think this is the cleaner option, and it makes sense to consolidate all
the firmware drivers' platform device registration in sysfb.c.

sysfb.c already does this for VIDEO_TYPE_EFI ("efi-framebuffer") and
VIDEO_TYPE_VLFB ("vesa-framebuffer"), so it also needs to cope with
VIDEO_TYPE_EGAC and VIDEO_TYPE_VGAC ("vga16fb").

For offb it is less clear, since currently the offb driver does not really
use the Linux device model: the driver does not match a registered device.
There is no device at all, which is the bug that was reported to Thomas in
the other thread.

It's unclear how to properly fix that, since we would need to convert the
offb driver to register a platform driver and match against a device that
is registered by some platform code that parses the OF...

> - let fbmem.c call into each of these firmware device providers, which
> means some loops most likely (like we can't call into vga16fb), so
> probably need to move that into fbmem.c and it all gets a bit messy.
> 

Yup, that would get messy indeed so not a good option.

>> Let me know if you think that makes sense and I can attempt to write a fix.
> 
> I still think unregistering the platform_dev properly makes the most
> sense, and feels like the most proper linux device model solution
> instead of hacks on top - if the firmware fb is unuseable because a
> native driver has taken over, we should nuke that. And also the
> firmware fb driver would then just bind to that platform_dev if it
> exists, and only if it exists. Also I think it should be the
> responsibility of whichever piece of code that registers these
> platform devices to ensure that platform_dev actually still exists.
> That's why I think pushing all that code into sysfb.c is probably the
> cleanest solution.
>

Agreed. Not registering the platform devices if there is already a DRM
driver for the same device is what makes the most sense. What I don't
understand is how sysfb would know that if it runs after a DRM driver has
registered.

The only way it could know is if sysfb kept a list of apertures for all
the registered DRM drivers, or if the DRM core somehow notified sysfb that
a native driver was already registered.

Another option, probably the cleanest although the hardest, is to finally
bite the bullet and make all the DRM drivers request their memory regions.

Or, as you mentioned in the past, to move that logic into the device model
and then not allow registering devices that require an overlapping region.

And there could be a request_mem_region_remove_conflicting() or something
that real DRM drivers could use to force a memory region request and make
the device model unregister any device that may already hold that memory.
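
To make the last idea a bit more concrete, here is a minimal userspace
sketch of the semantics I have in mind. It is a toy model and not kernel
code: struct region_model, region_model_request() and
region_model_request_remove_conflicting() are hypothetical names, and a
fixed-size table stands in for whatever the device model would really
track.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model: one entry per claimed memory region (hypothetical type). */
struct region_model {
	uint64_t base;
	uint64_t size;
	bool in_use;
};

#define MAX_REGIONS 8
static struct region_model regions[MAX_REGIONS];

/* Two half-open ranges [b, b + s) overlap if each starts before the other ends. */
static bool regions_overlap(uint64_t b1, uint64_t s1, uint64_t b2, uint64_t s2)
{
	return b1 < b2 + s2 && b2 < b1 + s1;
}

/* Plain request: returns the slot index, or -1 on overlap or a full table. */
int region_model_request(uint64_t base, uint64_t size)
{
	int free_slot = -1;

	for (int i = 0; i < MAX_REGIONS; i++) {
		if (!regions[i].in_use) {
			if (free_slot < 0)
				free_slot = i;
			continue;
		}
		if (regions_overlap(base, size, regions[i].base, regions[i].size))
			return -1;
	}
	if (free_slot < 0)
		return -1;

	regions[free_slot] = (struct region_model){ base, size, true };
	return free_slot;
}

/* Forced request: drop every overlapping owner first, then claim the region. */
int region_model_request_remove_conflicting(uint64_t base, uint64_t size)
{
	for (int i = 0; i < MAX_REGIONS; i++) {
		if (regions[i].in_use &&
		    regions_overlap(base, size, regions[i].base, regions[i].size))
			regions[i].in_use = false;
	}
	return region_model_request(base, size);
}
```

In this model a firmware framebuffer that claims its aperture first is
evicted by the forced request, and a firmware framebuffer that shows up
late fails its plain request because the native driver already owns the
range. Both orderings are covered without fbdev and DRM having to know
about each other.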
Greg Kroah-Hartman April 5, 2022, 1:33 p.m. UTC | #10
On Tue, Apr 05, 2022 at 03:24:40PM +0200, Geert Uytterhoeven wrote:
> Hi Daniel,
> 
> On Tue, Apr 5, 2022 at 1:48 PM Daniel Vetter <daniel@ffwll.ch> wrote:
> > On Tue, 5 Apr 2022 at 11:52, Javier Martinez Canillas
> > <javierm@redhat.com> wrote:
> > > On 4/5/22 11:24, Daniel Vetter wrote:
> > > > On Tue, 5 Apr 2022 at 11:19, Javier Martinez Canillas
> > > >> This is how I think that work, please let me know if you see something
> > > >> wrong in my logic:
> > > >>
> > > >> 1) A PCI device of OF device is registered for the GPU, this attempt to
> > > >>    match a registered driver but no driver was registered that match yet.
> > > >>
> > > >> 2) The efifb driver is built-in, will be initialized according to the link
> > > >>    order of the objects under drivers/video and the fbdev driver is registered.
> > > >>
> > > >>    There is no platform device or PCI/OF device registered that matches.
> > > >>
> > > >> 3) The DRM driver is built-in, will be initialized according to the link
> > > >>    order of the objects under drivers/gpu and the DRM driver is registered.
> > > >>
> > > >>    This matches the device registered in (1) and the DRM driver probes.
> > > >>
> > > >> 4) The DRM driver .probe kicks out any conflicting DRM drivers and pdev
> > > >>    before registering the DRM device.
> > > >>
> > > >>    There are no conflicting drivers or platform device at this point.
> > > >>
> > > >> 5) Latter at some point the drivers/firmware/sysfb.c init function is
> > > >>    executed, and this registers a platform device for the generic fb.
> > > >>
> > > >>    This device matches the efifb driver registered in (2) and the fbdev
> > > >>    driver probes.
> > > >>
> > > >>    Since that happens *after* the DRM driver already matched, probed
> > > >>    and registered the DRM device, that is a bug and what the reverted
> > > >>    patch worked around.
> > > >>
> > > >> So we need to prevent (5) if (1) and (3) already happened. Having a flag
> > > >> set in the fbdev core somewhere when remove_conflicting_framebuffers()
> > > >> is called could be a solution indeed.
> > > >>
> > > >> That is, the fbdev core needs to know that a DRM driver already probed
> > > >> and make register_framebuffer() fail if info->flag & FBINFO_MISC_FIRMWARE
> > > >>
> > > >> I can attempt to write a patch for that.
> > > >
> > > > Ah yeah that could be an issue. I think the right fix is to replace
> > > > the platform dev unregister with a sysfb_unregister() function in
> > > > sysfb.c, which is synced with a common lock with the sysfb_init
> > > > function and a small boolean. I think I can type that up quickly for
> > > > v3.
> > >
> > > It's more complicated than that since sysfb is just *one* of the several
> > > places where platform devices can be registered for video devices.
> > >
> > > For instance, the vga16fb driver registers its own platform device in
> > > its module_init() function so that can also happen after the conflicting
> > > framebuffers (and associated devices) were removed by a DRM driver probe.
> > >
> > > I tried to minimize the issue for that particular driver with commit:
> > >
> > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0499f419b76f
> > >
> > > But the point stands, it all boils down to the fact that you have two
> > > different subsystems registering video drivers and they don't know all
> > > about each other to take a proper decision.
> > >
> > > Right now the drm_aperture_remove_conflicting_framebuffers() call signals
> > > in one direction from DRM to fbdev but there isn't a communication in the
> > > other direction, from fbdev to DRM.
> > >
> > > I believe the correct fix would be for the fbdev core to keep a list of
> > > the apertures struct that are passed to remove_conflicting_framebuffers(),
> > > that way it will know what apertures are not available anymore and prevent
> > > to register any fbdev framebuffer that conflicts with one already present.
> >
> > Hm that still feels like reinventing a driver model, badly.
> >
> > I think there's two cleaner solutions:
> > - move all the firmware driver platform_dev into sysfb.c, and then
> > just bind the special cases against that (e.g. offb, vga16fb and all
> > these). Then we'd have one sysfb_try_unregister(struct device *dev)
> > interface that fbmem.c uses.
> > - let fbmem.c call into each of these firmware device providers, which
> > means some loops most likely (like we can't call into vga16fb), so
> > probably need to move that into fbmem.c and it all gets a bit messy.
> >
> > > Let me know if you think that makes sense and I can attempt to write a fix.
> >
> > I still think unregistering the platform_dev properly makes the most
> 
> That doesn't sound very driver-model-aware to me. The device is what
> the driver binds to; it does not cease to exist.

I agree, that sounds odd.

The device should always stick around (as the bus creates it), it's up
to the driver to bind to the device as needed.

> > sense, and feels like the most proper linux device model solution
> > instead of hacks on top - if the firmware fb is unuseable because a
> > native driver has taken over, we should nuke that. And also the
> > firmware fb driver would then just bind to that platform_dev if it
> > exists, and only if it exists. Also I think it should be the
> > responsibility of whichever piece of code that registers these
> > platform devices to ensure that platform_dev actually still exists.
> > That's why I think pushing all that code into sysfb.c is probably the
> > cleanest solution.
> 
> Can't you unbind the generic driver first, and bind the specific driver
> afterwards? Alike writing to sysfs unbind/driver_override/bind,
> but from code?

That too feels odd, what is so special about the fbdev code that the
normal driver functions do not work for them?  It shouldn't matter if
multiple subsystems register video devices, why can't we handle more
than one fb device?

thanks,

greg k-h
Daniel Vetter April 5, 2022, 4:12 p.m. UTC | #11
On Tue, Apr 05, 2022 at 03:33:17PM +0200, Greg KH wrote:
> On Tue, Apr 05, 2022 at 03:24:40PM +0200, Geert Uytterhoeven wrote:
> > Hi Daniel,
> > 
> > On Tue, Apr 5, 2022 at 1:48 PM Daniel Vetter <daniel@ffwll.ch> wrote:
> > > On Tue, 5 Apr 2022 at 11:52, Javier Martinez Canillas
> > > <javierm@redhat.com> wrote:
> > > > On 4/5/22 11:24, Daniel Vetter wrote:
> > > > > On Tue, 5 Apr 2022 at 11:19, Javier Martinez Canillas
> > > > >> This is how I think that work, please let me know if you see something
> > > > >> wrong in my logic:
> > > > >>
> > > > >> 1) A PCI device of OF device is registered for the GPU, this attempt to
> > > > >>    match a registered driver but no driver was registered that match yet.
> > > > >>
> > > > >> 2) The efifb driver is built-in, will be initialized according to the link
> > > > >>    order of the objects under drivers/video and the fbdev driver is registered.
> > > > >>
> > > > >>    There is no platform device or PCI/OF device registered that matches.
> > > > >>
> > > > >> 3) The DRM driver is built-in, will be initialized according to the link
> > > > >>    order of the objects under drivers/gpu and the DRM driver is registered.
> > > > >>
> > > > >>    This matches the device registered in (1) and the DRM driver probes.
> > > > >>
> > > > >> 4) The DRM driver .probe kicks out any conflicting DRM drivers and pdev
> > > > >>    before registering the DRM device.
> > > > >>
> > > > >>    There are no conflicting drivers or platform device at this point.
> > > > >>
> > > > >> 5) Latter at some point the drivers/firmware/sysfb.c init function is
> > > > >>    executed, and this registers a platform device for the generic fb.
> > > > >>
> > > > >>    This device matches the efifb driver registered in (2) and the fbdev
> > > > >>    driver probes.
> > > > >>
> > > > >>    Since that happens *after* the DRM driver already matched, probed
> > > > >>    and registered the DRM device, that is a bug and what the reverted
> > > > >>    patch worked around.
> > > > >>
> > > > >> So we need to prevent (5) if (1) and (3) already happened. Having a flag
> > > > >> set in the fbdev core somewhere when remove_conflicting_framebuffers()
> > > > >> is called could be a solution indeed.
> > > > >>
> > > > >> That is, the fbdev core needs to know that a DRM driver already probed
> > > > >> and make register_framebuffer() fail if info->flag & FBINFO_MISC_FIRMWARE
> > > > >>
> > > > >> I can attempt to write a patch for that.
> > > > >
> > > > > Ah yeah that could be an issue. I think the right fix is to replace
> > > > > the platform dev unregister with a sysfb_unregister() function in
> > > > > sysfb.c, which is synced with a common lock with the sysfb_init
> > > > > function and a small boolean. I think I can type that up quickly for
> > > > > v3.
> > > >
> > > > It's more complicated than that since sysfb is just *one* of the several
> > > > places where platform devices can be registered for video devices.
> > > >
> > > > For instance, the vga16fb driver registers its own platform device in
> > > > its module_init() function so that can also happen after the conflicting
> > > > framebuffers (and associated devices) were removed by a DRM driver probe.
> > > >
> > > > I tried to minimize the issue for that particular driver with commit:
> > > >
> > > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0499f419b76f
> > > >
> > > > But the point stands, it all boils down to the fact that you have two
> > > > different subsystems registering video drivers and they don't know all
> > > > about each other to take a proper decision.
> > > >
> > > > Right now the drm_aperture_remove_conflicting_framebuffers() call signals
> > > > in one direction from DRM to fbdev but there isn't a communication in the
> > > > other direction, from fbdev to DRM.
> > > >
> > > > I believe the correct fix would be for the fbdev core to keep a list of
> > > > the apertures struct that are passed to remove_conflicting_framebuffers(),
> > > > that way it will know what apertures are not available anymore and prevent
> > > > to register any fbdev framebuffer that conflicts with one already present.
> > >
> > > Hm that still feels like reinventing a driver model, badly.
> > >
> > > I think there's two cleaner solutions:
> > > - move all the firmware driver platform_dev into sysfb.c, and then
> > > just bind the special cases against that (e.g. offb, vga16fb and all
> > > these). Then we'd have one sysfb_try_unregister(struct device *dev)
> > > interface that fbmem.c uses.
> > > - let fbmem.c call into each of these firmware device providers, which
> > > means some loops most likely (like we can't call into vga16fb), so
> > > probably need to move that into fbmem.c and it all gets a bit messy.
> > >
> > > > Let me know if you think that makes sense and I can attempt to write a fix.
> > >
> > > I still think unregistering the platform_dev properly makes the most
> > 
> > That doesn't sound very driver-model-aware to me. The device is what
> > the driver binds to; it does not cease to exist.
> 
> I agree, that sounds odd.
> 
> The device should always stick around (as the bus creates it), it's up
> to the driver to bind to the device as needed.

The device actually disappears when the real driver takes over.

The firmware fb is a special thing which only really exists as long as the
firmware is in charge of the display hardware. As soon as a real driver
takes over, it stops being a thing.

And since a driver without a device is a bit of a funny thing, we have been
pushing towards a model where the firmware code sets up a platform_device
for this fw interface, and the fw driver (efifb, simplefb and others like
that) binds against it. And then we started to throw out that
platform_device (which unbinds the fw driver and prevents it from ever
rebinding), except in the wrong layer, so there are a few races.

Should we throw out all that code and replace it with something else? What
would that be like?

Note that the fw side generally has not much clue which real device on
some bus it corresponds to; that part is done through a bunch of magic
tricks. Some of them are simply "I'm taking over a display, pls throw
out all fw drivers just to be sure".

> > > sense, and feels like the most proper linux device model solution
> > > instead of hacks on top - if the firmware fb is unuseable because a
> > > native driver has taken over, we should nuke that. And also the
> > > firmware fb driver would then just bind to that platform_dev if it
> > > exists, and only if it exists. Also I think it should be the
> > > responsibility of whichever piece of code that registers these
> > > platform devices to ensure that platform_dev actually still exists.
> > > That's why I think pushing all that code into sysfb.c is probably the
> > > cleanest solution.
> > 
> > Can't you unbind the generic driver first, and bind the specific driver
> > afterwards? Alike writing to sysfs unbind/driver_override/bind,
> > but from code?
> 
> That too feels odd, what is so special about the fbdev code that the
> normal driver functions do not work for them?  It shouldn't matter if
> multiple subsystems register video devices, why can't we handle more
> than one fb device?

The specific driver binds to a completely different device (this one is
more real), and sometimes has not much clue about what exactly the
fw/legacy driver is doing.

The special thing is that in fbdev we have "drivers" which are extremely
thin shims around the fw driver, which has done all the real display setup
for us. I don't think any other subsystem bothers with this, e.g. input
just tells the fw to get lost and never tries to use the fw input support
(stuff like the old horrors of emulating usb kbd as a ps/2 device and
things like that which fw tended to do). Only with display drivers do we
have this world where fairly often a fw driver is loaded first, and then
quite a bit later in the boot process, the real driver loads. It's a bit
like early serial console perhaps, to reduce the gap between when the
kernel loads and when the real display driver is ready.

Cheers, Daniel


> 
> thanks,
> 
> greg k-h
Greg Kroah-Hartman April 5, 2022, 4:44 p.m. UTC | #12
On Tue, Apr 05, 2022 at 06:12:59PM +0200, Daniel Vetter wrote:
> On Tue, Apr 05, 2022 at 03:33:17PM +0200, Greg KH wrote:
> > On Tue, Apr 05, 2022 at 03:24:40PM +0200, Geert Uytterhoeven wrote:
> > > Hi Daniel,
> > > 
> > > On Tue, Apr 5, 2022 at 1:48 PM Daniel Vetter <daniel@ffwll.ch> wrote:
> > > > On Tue, 5 Apr 2022 at 11:52, Javier Martinez Canillas
> > > > <javierm@redhat.com> wrote:
> > > > > On 4/5/22 11:24, Daniel Vetter wrote:
> > > > > > On Tue, 5 Apr 2022 at 11:19, Javier Martinez Canillas
> > > > > >> This is how I think that work, please let me know if you see something
> > > > > >> wrong in my logic:
> > > > > >>
> > > > > >> 1) A PCI device of OF device is registered for the GPU, this attempt to
> > > > > >>    match a registered driver but no driver was registered that match yet.
> > > > > >>
> > > > > >> 2) The efifb driver is built-in, will be initialized according to the link
> > > > > >>    order of the objects under drivers/video and the fbdev driver is registered.
> > > > > >>
> > > > > >>    There is no platform device or PCI/OF device registered that matches.
> > > > > >>
> > > > > >> 3) The DRM driver is built-in, will be initialized according to the link
> > > > > >>    order of the objects under drivers/gpu and the DRM driver is registered.
> > > > > >>
> > > > > >>    This matches the device registered in (1) and the DRM driver probes.
> > > > > >>
> > > > > >> 4) The DRM driver .probe kicks out any conflicting DRM drivers and pdev
> > > > > >>    before registering the DRM device.
> > > > > >>
> > > > > >>    There are no conflicting drivers or platform device at this point.
> > > > > >>
> > > > > >> 5) Latter at some point the drivers/firmware/sysfb.c init function is
> > > > > >>    executed, and this registers a platform device for the generic fb.
> > > > > >>
> > > > > >>    This device matches the efifb driver registered in (2) and the fbdev
> > > > > >>    driver probes.
> > > > > >>
> > > > > >>    Since that happens *after* the DRM driver already matched, probed
> > > > > >>    and registered the DRM device, that is a bug and what the reverted
> > > > > >>    patch worked around.
> > > > > >>
> > > > > >> So we need to prevent (5) if (1) and (3) already happened. Having a flag
> > > > > >> set in the fbdev core somewhere when remove_conflicting_framebuffers()
> > > > > >> is called could be a solution indeed.
> > > > > >>
> > > > > >> That is, the fbdev core needs to know that a DRM driver already probed
> > > > > >> and make register_framebuffer() fail if info->flag & FBINFO_MISC_FIRMWARE
> > > > > >>
> > > > > >> I can attempt to write a patch for that.
> > > > > >
> > > > > > Ah yeah that could be an issue. I think the right fix is to replace
> > > > > > the platform dev unregister with a sysfb_unregister() function in
> > > > > > sysfb.c, which is synced with a common lock with the sysfb_init
> > > > > > function and a small boolean. I think I can type that up quickly for
> > > > > > v3.
> > > > >
> > > > > It's more complicated than that since sysfb is just *one* of the several
> > > > > places where platform devices can be registered for video devices.
> > > > >
> > > > > For instance, the vga16fb driver registers its own platform device in
> > > > > its module_init() function so that can also happen after the conflicting
> > > > > framebuffers (and associated devices) were removed by a DRM driver probe.
> > > > >
> > > > > I tried to minimize the issue for that particular driver with commit:
> > > > >
> > > > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0499f419b76f
> > > > >
> > > > > But the point stands, it all boils down to the fact that you have two
> > > > > different subsystems registering video drivers and they don't know all
> > > > > about each other to take a proper decision.
> > > > >
> > > > > Right now the drm_aperture_remove_conflicting_framebuffers() call signals
> > > > > in one direction from DRM to fbdev but there isn't a communication in the
> > > > > other direction, from fbdev to DRM.
> > > > >
> > > > > I believe the correct fix would be for the fbdev core to keep a list of
> > > > > the apertures struct that are passed to remove_conflicting_framebuffers(),
> > > > > that way it will know what apertures are not available anymore and prevent
> > > > > to register any fbdev framebuffer that conflicts with one already present.
> > > >
> > > > Hm that still feels like reinventing a driver model, badly.
> > > >
> > > > I think there's two cleaner solutions:
> > > > - move all the firmware driver platform_dev into sysfb.c, and then
> > > > just bind the special cases against that (e.g. offb, vga16fb and all
> > > > these). Then we'd have one sysfb_try_unregister(struct device *dev)
> > > > interface that fbmem.c uses.
> > > > - let fbmem.c call into each of these firmware device providers, which
> > > > means some loops most likely (like we can't call into vga16fb), so
> > > > probably need to move that into fbmem.c and it all gets a bit messy.
> > > >
> > > > > Let me know if you think that makes sense and I can attempt to write a fix.
> > > >
> > > > I still think unregistering the platform_dev properly makes the most
> > > 
> > > That doesn't sound very driver-model-aware to me. The device is what
> > > the driver binds to; it does not cease to exist.
> > 
> > I agree, that sounds odd.
> > 
> > The device should always stick around (as the bus creates it), it's up
> > to the driver to bind to the device as needed.
> 
> The device actually disappears when the real driver takes over.
> 
> The firmware fb is a special thing which only really exists as long as the
> firmware is in charge of the display hardware. As soon as a real driver
> takes over, it stops being a thing.
> 
> And since a driver without a device is a bit a funny thing, we have been
> pushing towards a model where the firmware code sets up a platform_device
> for this fw interface, and the fw driver (efifb, simplefb and others like
> that) bind against it. And then we started to throw out that
> platform_device (which unbinds the fw driver and prevents it from ever
> rebinding), except in the wrong layer so there's a few races.
> 
> Should we throw out all that code and replace it with something else? What
> would that be like?

Ah, no, sorry, I didn't know that at all.

That sounds semi-sane, just fix the races by moving the layer elsewhere?
Daniel Vetter April 5, 2022, 5:29 p.m. UTC | #13
On Tue, 5 Apr 2022 at 18:45, Greg KH <gregkh@linuxfoundation.org> wrote:
>
> On Tue, Apr 05, 2022 at 06:12:59PM +0200, Daniel Vetter wrote:
> > On Tue, Apr 05, 2022 at 03:33:17PM +0200, Greg KH wrote:
> > > On Tue, Apr 05, 2022 at 03:24:40PM +0200, Geert Uytterhoeven wrote:
> > > > Hi Daniel,
> > > >
> > > > On Tue, Apr 5, 2022 at 1:48 PM Daniel Vetter <daniel@ffwll.ch> wrote:
> > > > > On Tue, 5 Apr 2022 at 11:52, Javier Martinez Canillas
> > > > > <javierm@redhat.com> wrote:
> > > > > > On 4/5/22 11:24, Daniel Vetter wrote:
> > > > > > > On Tue, 5 Apr 2022 at 11:19, Javier Martinez Canillas
> > > > > > >> This is how I think that work, please let me know if you see something
> > > > > > >> wrong in my logic:
> > > > > > >>
> > > > > > >> 1) A PCI device of OF device is registered for the GPU, this attempt to
> > > > > > >>    match a registered driver but no driver was registered that match yet.
> > > > > > >>
> > > > > > >> 2) The efifb driver is built-in, will be initialized according to the link
> > > > > > >>    order of the objects under drivers/video and the fbdev driver is registered.
> > > > > > >>
> > > > > > >>    There is no platform device or PCI/OF device registered that matches.
> > > > > > >>
> > > > > > >> 3) The DRM driver is built-in, will be initialized according to the link
> > > > > > >>    order of the objects under drivers/gpu and the DRM driver is registered.
> > > > > > >>
> > > > > > >>    This matches the device registered in (1) and the DRM driver probes.
> > > > > > >>
> > > > > > >> 4) The DRM driver .probe kicks out any conflicting DRM drivers and pdev
> > > > > > >>    before registering the DRM device.
> > > > > > >>
> > > > > > >>    There are no conflicting drivers or platform device at this point.
> > > > > > >>
> > > > > > >> 5) Latter at some point the drivers/firmware/sysfb.c init function is
> > > > > > >>    executed, and this registers a platform device for the generic fb.
> > > > > > >>
> > > > > > >>    This device matches the efifb driver registered in (2) and the fbdev
> > > > > > >>    driver probes.
> > > > > > >>
> > > > > > >>    Since that happens *after* the DRM driver already matched, probed
> > > > > > >>    and registered the DRM device, that is a bug and what the reverted
> > > > > > >>    patch worked around.
> > > > > > >>
> > > > > > >> So we need to prevent (5) if (1) and (3) already happened. Having a flag
> > > > > > >> set in the fbdev core somewhere when remove_conflicting_framebuffers()
> > > > > > >> is called could be a solution indeed.
> > > > > > >>
> > > > > > >> That is, the fbdev core needs to know that a DRM driver already probed
> > > > > > >> and make register_framebuffer() fail if info->flag & FBINFO_MISC_FIRMWARE
> > > > > > >>
> > > > > > >> I can attempt to write a patch for that.
> > > > > > >
> > > > > > > Ah yeah that could be an issue. I think the right fix is to replace
> > > > > > > the platform dev unregister with a sysfb_unregister() function in
> > > > > > > sysfb.c, which is synced with a common lock with the sysfb_init
> > > > > > > function and a small boolean. I think I can type that up quickly for
> > > > > > > v3.
> > > > > >
> > > > > > It's more complicated than that since sysfb is just *one* of the several
> > > > > > places where platform devices can be registered for video devices.
> > > > > >
> > > > > > For instance, the vga16fb driver registers its own platform device in
> > > > > > its module_init() function so that can also happen after the conflicting
> > > > > > framebuffers (and associated devices) were removed by a DRM driver probe.
> > > > > >
> > > > > > I tried to minimize the issue for that particular driver with commit:
> > > > > >
> > > > > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0499f419b76f
> > > > > >
> > > > > > But the point stands, it all boils down to the fact that you have two
> > > > > > different subsystems registering video drivers and they don't know all
> > > > > > about each other to take a proper decision.
> > > > > >
> > > > > > Right now the drm_aperture_remove_conflicting_framebuffers() call signals
> > > > > > in one direction from DRM to fbdev but there isn't a communication in the
> > > > > > other direction, from fbdev to DRM.
> > > > > >
> > > > > > I believe the correct fix would be for the fbdev core to keep a list of
> > > > > > the apertures struct that are passed to remove_conflicting_framebuffers(),
> > > > > > that way it will know what apertures are not available anymore and prevent
> > > > > > to register any fbdev framebuffer that conflicts with one already present.
> > > > >
> > > > > Hm that still feels like reinventing a driver model, badly.
> > > > >
> > > > > I think there's two cleaner solutions:
> > > > > - move all the firmware driver platform_dev into sysfb.c, and then
> > > > > just bind the special cases against that (e.g. offb, vga16fb and all
> > > > > these). Then we'd have one sysfb_try_unregister(struct device *dev)
> > > > > interface that fbmem.c uses.
> > > > > - let fbmem.c call into each of these firmware device providers, which
> > > > > means some loops most likely (like we can't call into vga16fb), so
> > > > > probably need to move that into fbmem.c and it all gets a bit messy.
> > > > >
> > > > > > Let me know if you think that makes sense and I can attempt to write a fix.
> > > > >
> > > > > I still think unregistering the platform_dev properly makes the most
> > > >
> > > > That doesn't sound very driver-model-aware to me. The device is what
> > > > the driver binds to; it does not cease to exist.
> > >
> > > I agree, that sounds odd.
> > >
> > > The device should always stick around (as the bus creates it), it's up
> > > to the driver to bind to the device as needed.
> >
> > The device actually disappears when the real driver takes over.
> >
> > The firmware fb is a special thing which only really exists as long as the
> > firmware is in charge of the display hardware. As soon as a real driver
> > takes over, it stops being a thing.
> >
> > And since a driver without a device is a bit a funny thing, we have been
> > pushing towards a model where the firmware code sets up a platform_device
> > for this fw interface, and the fw driver (efifb, simplefb and others like
> > that) bind against it. And then we started to throw out that
> > platform_device (which unbinds the fw driver and prevents it from ever
> > rebinding), except in the wrong layer so there's a few races.
> >
> > Should we throw out all that code and replace it with something else? What
> > would that be like?
>
> Ah, no, sorry, I didn't know that at all.
>
> That sounds semi-sane, just fix the races by moving the layer elsewhere?

Yeah, essentially move it all into drivers/firmware/sysfb.c, for all
drivers, both the registering and the nuking, and wrap that in a
local mutex. Currently parts are in there, parts are in fbmem.c, parts
are in some of the drivers like vga16fb, and some drivers (iirc only offb)
still don't even have any platform_dev underneath their driver. So
ideally the drivers would all just have their platform_driver probe
functions, and that's it. It does mean though that some of that stuff
needs to be moved to sysfb.c or into the relevant fw code that sets
stuff up.
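
Very roughly, the locking part could behave like the userspace toy model
below. This is a sketch only, not the planned sysfb.c code: all the
sysfb_model_* names are made up for illustration, a pthread mutex stands
in for the local mutex, and a plain boolean stands in for the registered
platform_device.

```c
#include <pthread.h>
#include <stdbool.h>

/* One lock covers both the registering and the nuking. */
static pthread_mutex_t sysfb_model_lock = PTHREAD_MUTEX_INITIALIZER;
static bool sysfb_model_disabled;   /* set once a native driver took over */
static bool sysfb_model_registered; /* stands in for the fw platform_device */

/*
 * Models the firmware side: register the fw device unless a native driver
 * has already taken over. Returns true if the device exists afterwards.
 */
bool sysfb_model_init(void)
{
	bool registered;

	pthread_mutex_lock(&sysfb_model_lock);
	if (!sysfb_model_disabled)
		sysfb_model_registered = true;
	registered = sysfb_model_registered;
	pthread_mutex_unlock(&sysfb_model_lock);

	return registered;
}

/*
 * Models the native driver taking over: throw out the fw device if it is
 * there and make sure it can never be registered afterwards.
 */
void sysfb_model_disable(void)
{
	pthread_mutex_lock(&sysfb_model_lock);
	sysfb_model_registered = false;
	sysfb_model_disabled = true;
	pthread_mutex_unlock(&sysfb_model_lock);
}
```

Because both paths take the same lock and check the same flag, it doesn't
matter whether the native driver or the fw device registration runs
first: once the disable path has run, the fw device is gone and stays
gone.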

It'll take some time, so this is really just a direction check before we
move further. You should get cc'ed on the patches (like with the sysfb
stuff) anyway. Sounds roughly right?
-Daniel
Greg Kroah-Hartman April 7, 2022, 5:26 p.m. UTC | #14
On Tue, Apr 05, 2022 at 07:29:22PM +0200, Daniel Vetter wrote:
> On Tue, 5 Apr 2022 at 18:45, Greg KH <gregkh@linuxfoundation.org> wrote:
> >
> > On Tue, Apr 05, 2022 at 06:12:59PM +0200, Daniel Vetter wrote:
> > > On Tue, Apr 05, 2022 at 03:33:17PM +0200, Greg KH wrote:
> > > > On Tue, Apr 05, 2022 at 03:24:40PM +0200, Geert Uytterhoeven wrote:
> > > > > Hi Daniel,
> > > > >
> > > > > On Tue, Apr 5, 2022 at 1:48 PM Daniel Vetter <daniel@ffwll.ch> wrote:
> > > > > > On Tue, 5 Apr 2022 at 11:52, Javier Martinez Canillas
> > > > > > <javierm@redhat.com> wrote:
> > > > > > > On 4/5/22 11:24, Daniel Vetter wrote:
> > > > > > > > On Tue, 5 Apr 2022 at 11:19, Javier Martinez Canillas
> > > > > > > >> This is how I think that work, please let me know if you see something
> > > > > > > >> wrong in my logic:
> > > > > > > >>
> > > > > > > >> 1) A PCI device of OF device is registered for the GPU, this attempt to
> > > > > > > >>    match a registered driver but no driver was registered that match yet.
> > > > > > > >>
> > > > > > > >> 2) The efifb driver is built-in, will be initialized according to the link
> > > > > > > >>    order of the objects under drivers/video and the fbdev driver is registered.
> > > > > > > >>
> > > > > > > >>    There is no platform device or PCI/OF device registered that matches.
> > > > > > > >>
> > > > > > > >> 3) The DRM driver is built-in, will be initialized according to the link
> > > > > > > >>    order of the objects under drivers/gpu and the DRM driver is registered.
> > > > > > > >>
> > > > > > > >>    This matches the device registered in (1) and the DRM driver probes.
> > > > > > > >>
> > > > > > > >> 4) The DRM driver .probe kicks out any conflicting DRM drivers and pdev
> > > > > > > >>    before registering the DRM device.
> > > > > > > >>
> > > > > > > >>    There are no conflicting drivers or platform device at this point.
> > > > > > > >>
> > > > > > > >> 5) Latter at some point the drivers/firmware/sysfb.c init function is
> > > > > > > >>    executed, and this registers a platform device for the generic fb.
> > > > > > > >>
> > > > > > > >>    This device matches the efifb driver registered in (2) and the fbdev
> > > > > > > >>    driver probes.
> > > > > > > >>
> > > > > > > >>    Since that happens *after* the DRM driver already matched, probed
> > > > > > > >>    and registered the DRM device, that is a bug and what the reverted
> > > > > > > >>    patch worked around.
> > > > > > > >>
> > > > > > > >> So we need to prevent (5) if (1) and (3) already happened. Having a flag
> > > > > > > >> set in the fbdev core somewhere when remove_conflicting_framebuffers()
> > > > > > > >> is called could be a solution indeed.
> > > > > > > >>
> > > > > > > >> That is, the fbdev core needs to know that a DRM driver already probed
> > > > > > > > >> and make register_framebuffer() fail if info->flags & FBINFO_MISC_FIRMWARE
> > > > > > > >>
> > > > > > > >> I can attempt to write a patch for that.
> > > > > > > >
> > > > > > > > Ah yeah that could be an issue. I think the right fix is to replace
> > > > > > > > the platform dev unregister with a sysfb_unregister() function in
> > > > > > > > sysfb.c, which is synced with a common lock with the sysfb_init
> > > > > > > > function and a small boolean. I think I can type that up quickly for
> > > > > > > > v3.
> > > > > > >
> > > > > > > It's more complicated than that since sysfb is just *one* of the several
> > > > > > > places where platform devices can be registered for video devices.
> > > > > > >
> > > > > > > For instance, the vga16fb driver registers its own platform device in
> > > > > > > its module_init() function so that can also happen after the conflicting
> > > > > > > framebuffers (and associated devices) were removed by a DRM driver probe.
> > > > > > >
> > > > > > > I tried to minimize the issue for that particular driver with commit:
> > > > > > >
> > > > > > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0499f419b76f
> > > > > > >
> > > > > > > But the point stands, it all boils down to the fact that you have two
> > > > > > > different subsystems registering video drivers and they don't know all
> > > > > > > > about each other to make a proper decision.
> > > > > > >
> > > > > > > Right now the drm_aperture_remove_conflicting_framebuffers() call signals
> > > > > > > in one direction from DRM to fbdev but there isn't a communication in the
> > > > > > > other direction, from fbdev to DRM.
> > > > > > >
> > > > > > > I believe the correct fix would be for the fbdev core to keep a list of
> > > > > > > the apertures struct that are passed to remove_conflicting_framebuffers(),
> > > > > > > > that way it will know which apertures are not available anymore and prevent
> > > > > > > > registering any fbdev framebuffer that conflicts with one already present.
> > > > > >
> > > > > > Hm that still feels like reinventing a driver model, badly.
> > > > > >
> > > > > > > I think there are two cleaner solutions:
> > > > > > - move all the firmware driver platform_dev into sysfb.c, and then
> > > > > > just bind the special cases against that (e.g. offb, vga16fb and all
> > > > > > these). Then we'd have one sysfb_try_unregister(struct device *dev)
> > > > > > interface that fbmem.c uses.
> > > > > > - let fbmem.c call into each of these firmware device providers, which
> > > > > > means some loops most likely (like we can't call into vga16fb), so
> > > > > > probably need to move that into fbmem.c and it all gets a bit messy.
> > > > > >
> > > > > > > Let me know if you think that makes sense and I can attempt to write a fix.
> > > > > >
> > > > > > I still think unregistering the platform_dev properly makes the most
> > > > >
> > > > > That doesn't sound very driver-model-aware to me. The device is what
> > > > > the driver binds to; it does not cease to exist.
> > > >
> > > > I agree, that sounds odd.
> > > >
> > > > The device should always stick around (as the bus creates it), it's up
> > > > to the driver to bind to the device as needed.
> > >
> > > The device actually disappears when the real driver takes over.
> > >
> > > The firmware fb is a special thing which only really exists as long as the
> > > firmware is in charge of the display hardware. As soon as a real driver
> > > takes over, it stops being a thing.
> > >
> > > > And since a driver without a device is a bit of a funny thing, we have been
> > > pushing towards a model where the firmware code sets up a platform_device
> > > for this fw interface, and the fw driver (efifb, simplefb and others like
> > > that) bind against it. And then we started to throw out that
> > > platform_device (which unbinds the fw driver and prevents it from ever
> > > rebinding), except in the wrong layer so there's a few races.
> > >
> > > Should we throw out all that code and replace it with something else? What
> > > would that be like?
> >
> > Ah, no, sorry, I didn't know that at all.
> >
> > That sounds semi-sane, just fix the races by moving the layer elsewhere?
> 
> Yeah essentially move it all into drivers/firmware/sysfb.c, for all
> drivers, both the registering and the nuking, and wrap that in a
> local mutex. Currently parts are in there, parts are in fbmem.c, parts
> in some of the drivers like vga16fb, and some drivers (iirc only offb)
> still don't even have any platform_dev underneath their driver. So
> ideally the drivers would all just have their platform_driver probe
> functions, and that's it. It does mean though that some of that stuff
> needs to be moved to sysfb.c or into the relevant fw code that sets
> stuff up.
> 
> It'll take some time, so really just a direction check before we move
> further. You should get cc'ed on the patches (like with the sysfb
> stuff) anyway. Sounds roughly right?

That's fine with me, thanks.
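
For readers following the thread, the direction agreed on above (registration
and removal of the firmware framebuffer device both living in one place,
serialized by a local mutex plus a small boolean) can be sketched roughly as
below. This is only a userspace model built on pthreads, not kernel code and
not the patch that was eventually posted. The names struct fw_fb_device,
sysfb_model_register() and sysfb_model_disable() are hypothetical and exist
only for this illustration.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the firmware framebuffer platform device. */
struct fw_fb_device {
	const char *name;
	bool registered;
};

/* One lock serializes both registration and removal. */
static pthread_mutex_t sysfb_lock = PTHREAD_MUTEX_INITIALIZER;
/* Set once a native driver has taken over the display hardware. */
static bool sysfb_disabled;
/* The currently registered firmware device, or NULL. */
static struct fw_fb_device *sysfb_dev;

/*
 * Models the sysfb init path: register the firmware device unless a
 * native driver already claimed the hardware or a device is already
 * registered. Returns 0 on success, -1 if registration is refused.
 */
int sysfb_model_register(struct fw_fb_device *dev)
{
	int ret = -1;

	pthread_mutex_lock(&sysfb_lock);
	if (!sysfb_disabled && !sysfb_dev) {
		dev->registered = true;
		sysfb_dev = dev;
		ret = 0;
	}
	pthread_mutex_unlock(&sysfb_lock);

	return ret;
}

/*
 * Models a native driver taking over: drop the firmware device if one
 * is registered and refuse every later registration attempt.
 */
void sysfb_model_disable(void)
{
	pthread_mutex_lock(&sysfb_lock);
	if (sysfb_dev) {
		sysfb_dev->registered = false;
		sysfb_dev = NULL;
	}
	sysfb_disabled = true;
	pthread_mutex_unlock(&sysfb_lock);
}
```

Because the flag is checked and set under the same lock as the device pointer,
the ordering from step (5) earlier in the thread (firmware device registered
after the native driver already probed) ends in a refused registration in this
model, regardless of which side runs first.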

Patch

diff --git a/drivers/video/fbdev/efifb.c b/drivers/video/fbdev/efifb.c
index ea42ba6445b2..edca3703b964 100644
--- a/drivers/video/fbdev/efifb.c
+++ b/drivers/video/fbdev/efifb.c
@@ -351,17 +351,6 @@  static int efifb_probe(struct platform_device *dev)
 	char *option = NULL;
 	efi_memory_desc_t md;
 
-	/*
-	 * Generic drivers must not be registered if a framebuffer exists.
-	 * If a native driver was probed, the display hardware was already
-	 * taken and attempting to use the system framebuffer is dangerous.
-	 */
-	if (num_registered_fb > 0) {
-		dev_err(&dev->dev,
-			"efifb: a framebuffer is already registered\n");
-		return -EINVAL;
-	}
-
 	if (screen_info.orig_video_isVGA != VIDEO_TYPE_EFI || pci_dev_disabled)
 		return -ENODEV;
 
diff --git a/drivers/video/fbdev/simplefb.c b/drivers/video/fbdev/simplefb.c
index 94fc9c6d0411..0ef41173325a 100644
--- a/drivers/video/fbdev/simplefb.c
+++ b/drivers/video/fbdev/simplefb.c
@@ -413,17 +413,6 @@  static int simplefb_probe(struct platform_device *pdev)
 	struct simplefb_par *par;
 	struct resource *res, *mem;
 
-	/*
-	 * Generic drivers must not be registered if a framebuffer exists.
-	 * If a native driver was probed, the display hardware was already
-	 * taken and attempting to use the system framebuffer is dangerous.
-	 */
-	if (num_registered_fb > 0) {
-		dev_err(&pdev->dev,
-			"simplefb: a framebuffer is already registered\n");
-		return -EINVAL;
-	}
-
 	if (fb_get_options("simplefb", NULL))
 		return -ENODEV;