[RFC,0/2] drm/panic: Add a drm panic handler

Message ID	20230809192514.158062-1-jfalempe@redhat.com (mailing list archive)
Headers	show Return-Path: <dri-devel-bounces@lists.freedesktop.org> From: Jocelyn Falempe <jfalempe@redhat.com> To: dri-devel@lists.freedesktop.org, tzimmermann@suse.de, airlied@redhat.com, maarten.lankhorst@linux.intel.com, mripard@kernel.org, daniel@ffwll.ch, javierm@redhat.com, bluescreen_avenger@verizon.net Subject: [RFC][PATCH 0/2] drm/panic: Add a drm panic handler Date: Wed, 9 Aug 2023 21:17:27 +0200 Message-ID: <20230809192514.158062-1-jfalempe@redhat.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Content-Type: text/plain; charset="US-ASCII"; x-default=true Precedence: list Cc: Jocelyn Falempe <jfalempe@redhat.com> Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" <dri-devel-bounces@lists.freedesktop.org>
Series	drm/panic: Add a drm panic handler \| expand [RFC,0/2] drm/panic: Add a drm panic handler [1/2] drm/panic: Add a drm panic handler [2/2] drm/simpledrm: Add drm_panic support

Jocelyn Falempe Aug. 9, 2023, 7:17 p.m. UTC

This introduces a new drm panic handler, which displays a message when a panic occurs.
So when fbcon is disabled, you can still see a kernel panic.

This is one of the missing feature, when disabling VT/fbcon in the kernel:
https://www.reddit.com/r/linux/comments/10eccv9/config_vtn_in_2023/
Fbcon can be replaced by a userspace kms console, but the panic screen must be done in the kernel.

This is a proof of concept, and works only with simpledrm, using the drm_client API.
This implementation with the drm client API, allocates new framebuffers, and looks a bit too complex to run in a panic handler.
Maybe we should add an API to "steal" the current framebuffer instead, because in a panic handler user-space is already stopped.

To test it, make sure you're using the simpledrm driver, and trigger a panic:
echo c > /proc/sysrq-trigger

There is one thing I don't know how to do, is to unregister the drm_panic when the graphic driver is unloaded.
drm_client_register() says it will automatically unregister on driver unload. But then I don't know how to remove it from my linked list, and free the drm_client_dev struct.

This is a first draft, so let me know what do you think about it.

Best regards,

Jocelyn Falempe (2):
drm/panic: Add a drm panic handler
drm/simpledrm: Add drm_panic support

drivers/gpu/drm/Kconfig | 11 ++
drivers/gpu/drm/Makefile | 1 +
drivers/gpu/drm/drm_drv.c | 3 +
drivers/gpu/drm/drm_panic.c | 286 +++++++++++++++++++++++++++++++
drivers/gpu/drm/tiny/simpledrm.c | 2 +
include/drm/drm_panic.h | 26 +++
6 files changed, 329 insertions(+)
create mode 100644 drivers/gpu/drm/drm_panic.c
create mode 100644 include/drm/drm_panic.h

base-commit: 6995e2de6891c724bfeb2db33d7b87775f913ad1

nerdopolis Aug. 13, 2023, 2:20 a.m. UTC | #1

On Wednesday, August 9, 2023 3:17:27 PM EDT Jocelyn Falempe wrote:
> This introduces a new drm panic handler, which displays a message when a panic occurs.
> So when fbcon is disabled, you can still see a kernel panic.
> 
> This is one of the missing feature, when disabling VT/fbcon in the kernel:
> https://www.reddit.com/r/linux/comments/10eccv9/config_vtn_in_2023/
> Fbcon can be replaced by a userspace kms console, but the panic screen must be done in the kernel.
> 
> This is a proof of concept, and works only with simpledrm, using the drm_client API.
> This implementation with the drm client API, allocates new framebuffers, and looks a bit too complex to run in a panic handler.
> Maybe we should add an API to "steal" the current framebuffer instead, because in a panic handler user-space is already stopped.
> 
> To test it, make sure you're using the simpledrm driver, and trigger a panic:
> echo c > /proc/sysrq-trigger
> 
> There is one thing I don't know how to do, is to unregister the drm_panic when the graphic driver is unloaded.
> drm_client_register() says it will automatically unregister on driver unload. But then I don't know how to remove it from my linked list, and free the drm_client_dev struct.
> 
> This is a first draft, so let me know what do you think about it.
Hi, 

Oh wow, that's my post. I'm sorry about the late reply, I only saw this late yesterday, and I wanted to test it first.
I had to edit my test QEMU script a bit to use TianoCore for virtual UEFI boot as there is no gfxmode=keep for SimpleDRM to work otherwise when specifying -kernel to qemu AFAIK

I tested it, and it works! That's pretty cool, although is it possible to show the message, such as "attempted to kill init"?

I like the little ASCII Tux. Maybe an ASCII /!\ or [X] on the belly would make it more obvious to users that it is an error condition. 
Especially for non-English speaking users


I will tweak my script a bit so I can test it more quickly in the future too.

Thanks!
> 
> Best regards,
> 
> 
> 
> 
> Jocelyn Falempe (2):
>   drm/panic: Add a drm panic handler
>   drm/simpledrm: Add drm_panic support
> 
>  drivers/gpu/drm/Kconfig          |  11 ++
>  drivers/gpu/drm/Makefile         |   1 +
>  drivers/gpu/drm/drm_drv.c        |   3 +
>  drivers/gpu/drm/drm_panic.c      | 286 +++++++++++++++++++++++++++++++
>  drivers/gpu/drm/tiny/simpledrm.c |   2 +
>  include/drm/drm_panic.h          |  26 +++
>  6 files changed, 329 insertions(+)
>  create mode 100644 drivers/gpu/drm/drm_panic.c
>  create mode 100644 include/drm/drm_panic.h
> 
> 
> base-commit: 6995e2de6891c724bfeb2db33d7b87775f913ad1
>

Jocelyn Falempe Aug. 21, 2023, 1:34 p.m. UTC | #2

On 13/08/2023 04:20, nerdopolis wrote:
> On Wednesday, August 9, 2023 3:17:27 PM EDT Jocelyn Falempe wrote:
>> This introduces a new drm panic handler, which displays a message when a panic occurs.
>> So when fbcon is disabled, you can still see a kernel panic.
>>
>> This is one of the missing feature, when disabling VT/fbcon in the kernel:
>> https://www.reddit.com/r/linux/comments/10eccv9/config_vtn_in_2023/
>> Fbcon can be replaced by a userspace kms console, but the panic screen must be done in the kernel.
>>
>> This is a proof of concept, and works only with simpledrm, using the drm_client API.
>> This implementation with the drm client API, allocates new framebuffers, and looks a bit too complex to run in a panic handler.
>> Maybe we should add an API to "steal" the current framebuffer instead, because in a panic handler user-space is already stopped.
>>
>> To test it, make sure you're using the simpledrm driver, and trigger a panic:
>> echo c > /proc/sysrq-trigger
>>
>> There is one thing I don't know how to do, is to unregister the drm_panic when the graphic driver is unloaded.
>> drm_client_register() says it will automatically unregister on driver unload. But then I don't know how to remove it from my linked list, and free the drm_client_dev struct.
>>
>> This is a first draft, so let me know what do you think about it.
> Hi,
> 
> Oh wow, that's my post. I'm sorry about the late reply, I only saw this late yesterday, and I wanted to test it first.
> I had to edit my test QEMU script a bit to use TianoCore for virtual UEFI boot as there is no gfxmode=keep for SimpleDRM to work otherwise when specifying -kernel to qemu AFAIK
> 
> I tested it, and it works! That's pretty cool, although is it possible to show the message, such as "attempted to kill init"?

Thanks for taking time to test it. Yes it's possible to show the panic 
reason, as it's a parameter in the panic callback.
> 
> I like the little ASCII Tux. Maybe an ASCII /!\ or [X] on the belly would make it more obvious to users that it is an error condition.
> Especially for non-English speaking users

That's a good idea. It's also probably possible to re-use the tux boot 
logo, but I didn't try it yet.

But currently, my priority is to see if it can get accepted, and if it 
can work with a wide range of drivers.

> 
> 
> I will tweak my script a bit so I can test it more quickly in the future too.


Best Regards,

nerdopolis Aug. 22, 2023, 10:29 p.m. UTC | #3

On Monday, August 21, 2023 9:34:52 AM EDT Jocelyn Falempe wrote:
> On 13/08/2023 04:20, nerdopolis wrote:
> > On Wednesday, August 9, 2023 3:17:27 PM EDT Jocelyn Falempe wrote:
> >> This introduces a new drm panic handler, which displays a message when a panic occurs.
> >> So when fbcon is disabled, you can still see a kernel panic.
> >>
> >> This is one of the missing feature, when disabling VT/fbcon in the kernel:
> >> https://www.reddit.com/r/linux/comments/10eccv9/config_vtn_in_2023/
> >> Fbcon can be replaced by a userspace kms console, but the panic screen must be done in the kernel.
> >>
> >> This is a proof of concept, and works only with simpledrm, using the drm_client API.
> >> This implementation with the drm client API, allocates new framebuffers, and looks a bit too complex to run in a panic handler.
> >> Maybe we should add an API to "steal" the current framebuffer instead, because in a panic handler user-space is already stopped.
> >>
> >> To test it, make sure you're using the simpledrm driver, and trigger a panic:
> >> echo c > /proc/sysrq-trigger
> >>
> >> There is one thing I don't know how to do, is to unregister the drm_panic when the graphic driver is unloaded.
> >> drm_client_register() says it will automatically unregister on driver unload. But then I don't know how to remove it from my linked list, and free the drm_client_dev struct.
> >>
> >> This is a first draft, so let me know what do you think about it.
> > Hi,
> > 
> > Oh wow, that's my post. I'm sorry about the late reply, I only saw this late yesterday, and I wanted to test it first.
> > I had to edit my test QEMU script a bit to use TianoCore for virtual UEFI boot as there is no gfxmode=keep for SimpleDRM to work otherwise when specifying -kernel to qemu AFAIK
> > 
> > I tested it, and it works! That's pretty cool, although is it possible to show the message, such as "attempted to kill init"?
> 
> Thanks for taking time to test it. Yes it's possible to show the panic 
> reason, as it's a parameter in the panic callback.
> > 
> > I like the little ASCII Tux. Maybe an ASCII /!\ or [X] on the belly would make it more obvious to users that it is an error condition.
> > Especially for non-English speaking users
> 
> That's a good idea. It's also probably possible to re-use the tux boot 
> logo, but I didn't try it yet.
> 
> But currently, my priority is to see if it can get accepted, and if it 
> can work with a wide range of drivers.
> 
Alright, that makes sense. I don't have a huge variety of hardware, but I have a udl (displaylink 2) device that can be tested once ready
> > 
> > 
> > I will tweak my script a bit so I can test it more quickly in the future too.
> 
> 
> Best Regards,
> 
> > 
> > Thanks!
> >>
> >> Best regards,
> >>
> >>
> >>
> >>
> >> Jocelyn Falempe (2):
> >>    drm/panic: Add a drm panic handler
> >>    drm/simpledrm: Add drm_panic support
> >>
> >>   drivers/gpu/drm/Kconfig          |  11 ++
> >>   drivers/gpu/drm/Makefile         |   1 +
> >>   drivers/gpu/drm/drm_drv.c        |   3 +
> >>   drivers/gpu/drm/drm_panic.c      | 286 +++++++++++++++++++++++++++++++
> >>   drivers/gpu/drm/tiny/simpledrm.c |   2 +
> >>   include/drm/drm_panic.h          |  26 +++
> >>   6 files changed, 329 insertions(+)
> >>   create mode 100644 drivers/gpu/drm/drm_panic.c
> >>   create mode 100644 include/drm/drm_panic.h
> >>
> >>
> >> base-commit: 6995e2de6891c724bfeb2db33d7b87775f913ad1
> >>
> > 
> > 
> > 
> > 
> 
> 
>

Thomas Zimmermann Sept. 4, 2023, 2:29 p.m. UTC | #4

Hi Jocelyn,

thanks for moving this effort forward. It's much appreciated. I looked 
through the patches and tried the patchset on my test machine.

Am 09.08.23 um 21:17 schrieb Jocelyn Falempe:
> This introduces a new drm panic handler, which displays a message when a panic occurs.
> So when fbcon is disabled, you can still see a kernel panic.
> 
> This is one of the missing feature, when disabling VT/fbcon in the kernel:
> https://www.reddit.com/r/linux/comments/10eccv9/config_vtn_in_2023/
> Fbcon can be replaced by a userspace kms console, but the panic screen must be done in the kernel.
> 
> This is a proof of concept, and works only with simpledrm, using the drm_client API.
> This implementation with the drm client API, allocates new framebuffers, and looks a bit too complex to run in a panic handler.
> Maybe we should add an API to "steal" the current framebuffer instead, because in a panic handler user-space is already stopped.

Yes, that was also my first thought. I'd use an extra callback in struct 
drm_driver, like this:

struct drm_driver {
   int (*get_scanout_buffer)(/* return HW scanout */)
}

The scanout buffer would be described by kernel virtual address address, 
resolution, color format and scanline pitch. And that's what the panic 
handler uses.

Any driver implementing this interface would support the panic handler. 
If there's a concurrent display update, we'd have to synchronize.

> 
> To test it, make sure you're using the simpledrm driver, and trigger a panic:
> echo c > /proc/sysrq-trigger

The penguin was cute. :)

This only works if the display is already running. I had to start Gnome 
to set a display mode. Then let the panic handler take over the output.

But with simpledrm, we could even display a message without an output, 
as the framebuffer is always there.

> 
> There is one thing I don't know how to do, is to unregister the drm_panic when the graphic driver is unloaded.
> drm_client_register() says it will automatically unregister on driver unload. But then I don't know how to remove it from my linked list, and free the drm_client_dev struct.

Unregistering wouldn't be necessary with this proposed 
get_scanout_buffer. In the case of a panic, just remain silent if 
there's no driver that provides such a callback.

> 
> This is a first draft, so let me know what do you think about it.

One thing that will need serious work is the raw output. The current 
blitting for XRGB8888 is really just a quick-and-dirty hack.

I think we should try to reuse fbdev's blitting code, if possible. The 
fbdev core, helpers and console come with all the features we need. We 
really only need to make them work without the struct fb_info, which is 
a full fbdev device.

In struct fb_ops, there are callbacks for modifying the framebuffer. [1] 
They are used by fbcon foir drawing. But they operate on fb_info.

For a while I've been thinking about using something like a drawable to 
provide some abstractions:

struct drawable {
	/* store buffer parameters here */
	...

	struct drawable_funcs *funcs;
};

struct drawable_funcs {
	/* have function pointers similar to struct fb_ops */
	fill_rect()
	copy_area()
	image_blit()
};

We cannot rewrite all the existing fbdev drivers. To make this work with 
fbdev, we'd need adapter code that converts from drawable to fb_info and 
forwards to the existing helpers in fb_ops.

But for DRM's panic output, drawable_funcs would have to point to the 
scanout buffer and compatible callback funcs, for which we have 
implementations in fbdev.

We might be able to create console-like output that is independent from 
the fb_info. Hence, we could possible reuse a good chunk of the current 
panic output.

Best regards
Thomas

[1] https://elixir.bootlin.com/linux/v6.5.1/source/include/linux/fb.h#L273

> 
> Best regards,
> 
> 
> 
> 
> Jocelyn Falempe (2):
>    drm/panic: Add a drm panic handler
>    drm/simpledrm: Add drm_panic support
> 
>   drivers/gpu/drm/Kconfig          |  11 ++
>   drivers/gpu/drm/Makefile         |   1 +
>   drivers/gpu/drm/drm_drv.c        |   3 +
>   drivers/gpu/drm/drm_panic.c      | 286 +++++++++++++++++++++++++++++++
>   drivers/gpu/drm/tiny/simpledrm.c |   2 +
>   include/drm/drm_panic.h          |  26 +++
>   6 files changed, 329 insertions(+)
>   create mode 100644 drivers/gpu/drm/drm_panic.c
>   create mode 100644 include/drm/drm_panic.h
> 
> 
> base-commit: 6995e2de6891c724bfeb2db33d7b87775f913ad1

Jocelyn Falempe Sept. 5, 2023, 2:46 p.m. UTC | #5

On 04/09/2023 16:29, Thomas Zimmermann wrote:
> Hi Jocelyn,
> 
> thanks for moving this effort forward. It's much appreciated. I looked 
> through the patches and tried the patchset on my test machine.

Thanks for taking the time to review and test it.
> 
> Am 09.08.23 um 21:17 schrieb Jocelyn Falempe:
>> This introduces a new drm panic handler, which displays a message when 
>> a panic occurs.
>> So when fbcon is disabled, you can still see a kernel panic.
>>
>> This is one of the missing feature, when disabling VT/fbcon in the 
>> kernel:
>> https://www.reddit.com/r/linux/comments/10eccv9/config_vtn_in_2023/
>> Fbcon can be replaced by a userspace kms console, but the panic screen 
>> must be done in the kernel.
>>
>> This is a proof of concept, and works only with simpledrm, using the 
>> drm_client API.
>> This implementation with the drm client API, allocates new 
>> framebuffers, and looks a bit too complex to run in a panic handler.
>> Maybe we should add an API to "steal" the current framebuffer instead, 
>> because in a panic handler user-space is already stopped.
> 
> Yes, that was also my first thought. I'd use an extra callback in struct 
> drm_driver, like this:
> 
> struct drm_driver {
>    int (*get_scanout_buffer)(/* return HW scanout */)
> }
> 
> The scanout buffer would be described by kernel virtual address address, 
> resolution, color format and scanline pitch. And that's what the panic 
> handler uses.

Thanks, I will try this solution. It shouldn't be too hard for simpledrm.
> 
> Any driver implementing this interface would support the panic handler. 
> If there's a concurrent display update, we'd have to synchronize.

Normally in the panic handler, the kernel is already single-threaded, so 
there won't be another task doing things in parallel. (But the GPU might 
still be busy doing other stuff, if we're in the middle of a game).
I think this drm_panic should be a "best effort", we can't guarantee the 
user will see the panic screen in all panic situations.

> 
>>
>> To test it, make sure you're using the simpledrm driver, and trigger a 
>> panic:
>> echo c > /proc/sysrq-trigger
> 
> The penguin was cute. :)
> 
> This only works if the display is already running. I had to start Gnome 
> to set a display mode. Then let the panic handler take over the output.

oh, I didn't expect this limitation. I will try to test that too. It 
might also get fixed by using the get_scanout_buffer() above.
> 
> But with simpledrm, we could even display a message without an output, 
> as the framebuffer is always there.
> 
>>
>> There is one thing I don't know how to do, is to unregister the 
>> drm_panic when the graphic driver is unloaded.
>> drm_client_register() says it will automatically unregister on driver 
>> unload. But then I don't know how to remove it from my linked list, 
>> and free the drm_client_dev struct.
> 
> Unregistering wouldn't be necessary with this proposed 
> get_scanout_buffer. In the case of a panic, just remain silent if 
> there's no driver that provides such a callback.

Is there a way to loop over all drm_devices ?
otherwise drm_panic may still call get_scanout_buffer() on a freed 
device, which would be problematic.
> 
>>
>> This is a first draft, so let me know what do you think about it.
> 
> One thing that will need serious work is the raw output. The current 
> blitting for XRGB8888 is really just a quick-and-dirty hack.
> 
> I think we should try to reuse fbdev's blitting code, if possible. The 
> fbdev core, helpers and console come with all the features we need. We 
> really only need to make them work without the struct fb_info, which is 
> a full fbdev device.

I'm a bit reluctant to re-use the fbdev code, for a few reasons:
  * I want drm_panic to work if CONFIG_FB and CONFIG_DRM_FBDEV_EMULATION 
are not set.
  * drm_panic only needs two things, to clear the framebuffer, and then 
draw white pixels where needed. As the frame is static, and the amount 
of white pixels is low, that should be good enough. So copy_area() or 
image_blit() are not that useful.
  * For this particular use-case, performances are not a priority, it 
doesn't matter if it takes 10us or 100ms to draw the panic screen, as it 
will stay the same until the user reboots.
  * Some aggressive optimizations might cause issues in a panic handler, 
like if you use workqueue/tasklet.

On the other hand, writing the code for all supported formats is a bit 
tedious. drm_log [1] did it in ~300 lines, which should keep code 
duplication low.

> 
> In struct fb_ops, there are callbacks for modifying the framebuffer. [1] 
> They are used by fbcon foir drawing. But they operate on fb_info.
> 
> For a while I've been thinking about using something like a drawable to 
> provide some abstractions:
> 
> struct drawable {
>      /* store buffer parameters here */
>      ...
> 
>      struct drawable_funcs *funcs;
> };
> 
> struct drawable_funcs {
>      /* have function pointers similar to struct fb_ops */
>      fill_rect()
>      copy_area()
>      image_blit()
> };
> 
> We cannot rewrite all the existing fbdev drivers. To make this work with 
> fbdev, we'd need adapter code that converts from drawable to fb_info and 
> forwards to the existing helpers in fb_ops.
> 
> But for DRM's panic output, drawable_funcs would have to point to the 
> scanout buffer and compatible callback funcs, for which we have 
> implementations in fbdev.
> 
> We might be able to create console-like output that is independent from 
> the fb_info. Hence, we could possible reuse a good chunk of the current 
> panic output.

I think that was the goal of drm_log, but this can be done better in 
userspace, for example there is work ongoing to make plymouth display 
them during the boot [2].

For the panic screen, only the kernel can do it. I also think the 
current fbcon/kernel log is good for developer, but too technical for 
most end-user.

Best regards,

Thomas Zimmermann Sept. 6, 2023, 9:14 a.m. UTC | #6

Hi Jocelyn

Am 05.09.23 um 16:46 schrieb Jocelyn Falempe:
> On 04/09/2023 16:29, Thomas Zimmermann wrote:
>> Hi Jocelyn,
>>
>> thanks for moving this effort forward. It's much appreciated. I looked 
>> through the patches and tried the patchset on my test machine.
> 
> Thanks for taking the time to review and test it.
>>
>> Am 09.08.23 um 21:17 schrieb Jocelyn Falempe:
>>> This introduces a new drm panic handler, which displays a message 
>>> when a panic occurs.
>>> So when fbcon is disabled, you can still see a kernel panic.
>>>
>>> This is one of the missing feature, when disabling VT/fbcon in the 
>>> kernel:
>>> https://www.reddit.com/r/linux/comments/10eccv9/config_vtn_in_2023/
>>> Fbcon can be replaced by a userspace kms console, but the panic 
>>> screen must be done in the kernel.
>>>
>>> This is a proof of concept, and works only with simpledrm, using the 
>>> drm_client API.
>>> This implementation with the drm client API, allocates new 
>>> framebuffers, and looks a bit too complex to run in a panic handler.
>>> Maybe we should add an API to "steal" the current framebuffer 
>>> instead, because in a panic handler user-space is already stopped.
>>
>> Yes, that was also my first thought. I'd use an extra callback in 
>> struct drm_driver, like this:
>>
>> struct drm_driver {
>>    int (*get_scanout_buffer)(/* return HW scanout */)
>> }
>>
>> The scanout buffer would be described by kernel virtual address 
>> address, resolution, color format and scanline pitch. And that's what 
>> the panic handler uses.
> 
> Thanks, I will try this solution. It shouldn't be too hard for simpledrm.
>>
>> Any driver implementing this interface would support the panic 
>> handler. If there's a concurrent display update, we'd have to 
>> synchronize.
> 
> Normally in the panic handler, the kernel is already single-threaded, so 
> there won't be another task doing things in parallel. (But the GPU might 
> still be busy doing other stuff, if we're in the middle of a game).
> I think this drm_panic should be a "best effort", we can't guarantee the 
> user will see the panic screen in all panic situations.
> 
>>
>>>
>>> To test it, make sure you're using the simpledrm driver, and trigger 
>>> a panic:
>>> echo c > /proc/sysrq-trigger
>>
>> The penguin was cute. :)
>>
>> This only works if the display is already running. I had to start 
>> Gnome to set a display mode. Then let the panic handler take over the 
>> output.
> 
> oh, I didn't expect this limitation. I will try to test that too. It 
> might also get fixed by using the get_scanout_buffer() above.

I guess it depends. We'd need a working display mode, or we'd require 
get_scanout_buffer() to set it up for us. IDK how much that is possible 
in the case of a kernel panic.

Apart from that, we should use whatever has been programmed into 
hardware. No need for DRM clients or GEM buffers, etc.

>>
>> But with simpledrm, we could even display a message without an output, 
>> as the framebuffer is always there.
>>
>>>
>>> There is one thing I don't know how to do, is to unregister the 
>>> drm_panic when the graphic driver is unloaded.
>>> drm_client_register() says it will automatically unregister on driver 
>>> unload. But then I don't know how to remove it from my linked list, 
>>> and free the drm_client_dev struct.
>>
>> Unregistering wouldn't be necessary with this proposed 
>> get_scanout_buffer. In the case of a panic, just remain silent if 
>> there's no driver that provides such a callback.
> 
> Is there a way to loop over all drm_devices ?
> otherwise drm_panic may still call get_scanout_buffer() on a freed 
> device, which would be problematic.

I don't see such a list. It could be added as part of the registering 
the device.

>>
>>>
>>> This is a first draft, so let me know what do you think about it.
>>
>> One thing that will need serious work is the raw output. The current 
>> blitting for XRGB8888 is really just a quick-and-dirty hack.
>>
>> I think we should try to reuse fbdev's blitting code, if possible. The 
>> fbdev core, helpers and console come with all the features we need. We 
>> really only need to make them work without the struct fb_info, which 
>> is a full fbdev device.
> 
> I'm a bit reluctant to re-use the fbdev code, for a few reasons:
>   * I want drm_panic to work if CONFIG_FB and CONFIG_DRM_FBDEV_EMULATION 
> are not set.

The code would live in video/ and be independend from fbdev. It's quite 
a bit of refactoring, but might be worth it in the long term.

>   * drm_panic only needs two things, to clear the framebuffer, and then 
> draw white pixels where needed. As the frame is static, and the amount 
> of white pixels is low, that should be good enough. So copy_area() or 
> image_blit() are not that useful.

It's still a lot of color formats. And there's panel rotation to take 
into account. Fbcon has support for this already and knows how to get 
glyphs onto the screen. I'd expect that all this gets useful if you want 
to display a stack trace.

>   * For this particular use-case, performances are not a priority, it 
> doesn't matter if it takes 10us or 100ms to draw the panic screen, as it 
> will stay the same until the user reboots.
>   * Some aggressive optimizations might cause issues in a panic handler, 
> like if you use workqueue/tasklet.
> 
> On the other hand, writing the code for all supported formats is a bit 
> tedious. drm_log [1] did it in ~300 lines, which should keep code 
> duplication low.

I'm concerned that we'll now add yet another set format-conversion 
helpers. We already have the fbdev drawing helpers and DRM's 
format-conversion helpers. IMHO we should try to reuse more of the 
existing code. I assume that we need some extensive prototyping first to 
see how to do this correctly.

Best regards
Thomas

> 
>>
>> In struct fb_ops, there are callbacks for modifying the framebuffer. 
>> [1] They are used by fbcon foir drawing. But they operate on fb_info.
>>
>> For a while I've been thinking about using something like a drawable 
>> to provide some abstractions:
>>
>> struct drawable {
>>      /* store buffer parameters here */
>>      ...
>>
>>      struct drawable_funcs *funcs;
>> };
>>
>> struct drawable_funcs {
>>      /* have function pointers similar to struct fb_ops */
>>      fill_rect()
>>      copy_area()
>>      image_blit()
>> };
>>
>> We cannot rewrite all the existing fbdev drivers. To make this work 
>> with fbdev, we'd need adapter code that converts from drawable to 
>> fb_info and forwards to the existing helpers in fb_ops.
>>
>> But for DRM's panic output, drawable_funcs would have to point to the 
>> scanout buffer and compatible callback funcs, for which we have 
>> implementations in fbdev.
>>
>> We might be able to create console-like output that is independent 
>> from the fb_info. Hence, we could possible reuse a good chunk of the 
>> current panic output.
> 
> I think that was the goal of drm_log, but this can be done better in 
> userspace, for example there is work ongoing to make plymouth display 
> them during the boot [2].
> 
> For the panic screen, only the kernel can do it. I also think the 
> current fbcon/kernel log is good for developer, but too technical for 
> most end-user.
> 
> Best regards,
>

Javier Martinez Canillas Sept. 6, 2023, 9:29 a.m. UTC | #7

Thomas Zimmermann <tzimmermann@suse.de> writes:

Hello Jocelyn and Thomas,

> Hi Jocelyn
>
> Am 05.09.23 um 16:46 schrieb Jocelyn Falempe:

[...]

>> 
>> I'm a bit reluctant to re-use the fbdev code, for a few reasons:
>>   * I want drm_panic to work if CONFIG_FB and CONFIG_DRM_FBDEV_EMULATION 
>> are not set.
>
> The code would live in video/ and be independend from fbdev. It's quite 
> a bit of refactoring, but might be worth it in the long term.
>

FWIW, I agree with Thomas here. The drawing/blitting logic seems to be
useful regardless of fbdev and moving it to video/ makes sense to me,
instead of adding yet another infrastructure to do the same that will
only be used for the DRM panic handler.

[RFC,0/2] drm/panic: Add a drm panic handler

Message

Comments