[v4,2/4] drm/panic: Add a drm panic handler

Message ID	20231003142508.190246-3-jfalempe@redhat.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <dri-devel-bounces@lists.freedesktop.org> From: Jocelyn Falempe <jfalempe@redhat.com> To: dri-devel@lists.freedesktop.org, tzimmermann@suse.de, airlied@redhat.com, maarten.lankhorst@linux.intel.com, mripard@kernel.org, daniel@ffwll.ch, javierm@redhat.com, bluescreen_avenger@verizon.net Subject: [PATCH v4 2/4] drm/panic: Add a drm panic handler Date: Tue, 3 Oct 2023 16:22:45 +0200 Message-ID: <20231003142508.190246-3-jfalempe@redhat.com> In-Reply-To: <20231003142508.190246-1-jfalempe@redhat.com> References: <20231003142508.190246-1-jfalempe@redhat.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Content-Type: text/plain; charset="US-ASCII"; x-default=true Precedence: list Cc: gpiccoli@igalia.com, Jocelyn Falempe <jfalempe@redhat.com> Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" <dri-devel-bounces@lists.freedesktop.org>
Series	drm/panic: Add a drm panic handler \| expand [RFC,v4,0/4] drm/panic: Add a drm panic handler [v4,1/4] drm/format-helper: Export line conversion helper for drm_panic [v4,2/4] drm/panic: Add a drm panic handler [v4,3/4] drm/simpledrm: Add drm_panic support [v4,4/4] drm/mgag200: Add drm_panic support

Jocelyn Falempe Oct. 3, 2023, 2:22 p.m. UTC

This module displays a user friendly message when a kernel panic
occurs. It currently doesn't contain any debug information,
but that can be added later.

v2
 * Use get_scanout_buffer() instead of the drm client API.
  (Thomas Zimmermann)
 * Add the panic reason to the panic message (Nerdopolis)
 * Add an exclamation mark (Nerdopolis)

v3
 * Rework the drawing functions, to write the pixels line by line and
 to use the drm conversion helper to support other formats.
 (Thomas Zimmermann)

v4
 * Use drm_fb_r1_to_32bit for fonts (Thomas Zimmermann)
 * Remove the default y to DRM_PANIC config option (Thomas Zimmermann)
 * Add foreground/background color config option
 * Fix the bottom lines not painted if the framebuffer height
   is not a multiple of the font height.
 * Automatically register the device to drm_panic, if the function
   get_scanout_buffer exists. (Thomas Zimmermann)

Signed-off-by: Jocelyn Falempe <jfalempe@redhat.com>
---
 drivers/gpu/drm/Kconfig     |  22 ++
 drivers/gpu/drm/Makefile    |   1 +
 drivers/gpu/drm/drm_drv.c   |   8 +
 drivers/gpu/drm/drm_panic.c | 413 ++++++++++++++++++++++++++++++++++++
 include/drm/drm_drv.h       |  14 ++
 include/drm/drm_panic.h     |  41 ++++
 6 files changed, 499 insertions(+)
 create mode 100644 drivers/gpu/drm/drm_panic.c
 create mode 100644 include/drm/drm_panic.h

kernel test robot Oct. 5, 2023, 3:39 a.m. UTC | #1

Hi Jocelyn,

kernel test robot noticed the following build warnings:

[auto build test WARNING on 2dde18cd1d8fac735875f2e4987f11817cc0bc2c]

url:    https://github.com/intel-lab-lkp/linux/commits/Jocelyn-Falempe/drm-format-helper-Export-line-conversion-helper-for-drm_panic/20231003-222642
base:   2dde18cd1d8fac735875f2e4987f11817cc0bc2c
patch link:    https://lore.kernel.org/r/20231003142508.190246-3-jfalempe%40redhat.com
patch subject: [PATCH v4 2/4] drm/panic: Add a drm panic handler
config: i386-randconfig-063-20231005 (https://download.01.org/0day-ci/archive/20231005/202310051128.nptC5NyW-lkp@intel.com/config)
compiler: gcc-12 (Debian 12.2.0-14) 12.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20231005/202310051128.nptC5NyW-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202310051128.nptC5NyW-lkp@intel.com/

sparse warnings: (new ones prefixed by >>)
>> drivers/gpu/drm/drm_panic.c:339:23: sparse: sparse: symbol 'drm_panic_notifier' was not declared. Should it be static?

Maxime Ripard Oct. 5, 2023, 8:18 a.m. UTC | #2

Hi,

On Tue, Oct 03, 2023 at 04:22:45PM +0200, Jocelyn Falempe wrote:
> diff --git a/include/drm/drm_drv.h b/include/drm/drm_drv.h
> index 89e2706cac56..e538c87116d3 100644
> --- a/include/drm/drm_drv.h
> +++ b/include/drm/drm_drv.h
> @@ -43,6 +43,7 @@ struct dma_buf_attachment;
>  struct drm_display_mode;
>  struct drm_mode_create_dumb;
>  struct drm_printer;
> +struct drm_scanout_buffer;
>  struct sg_table;
>  
>  /**
> @@ -408,6 +409,19 @@ struct drm_driver {
>  	 */
>  	void (*show_fdinfo)(struct drm_printer *p, struct drm_file *f);
>  
> +	/**
> +	 * @get_scanout_buffer:
> +	 *
> +	 * Get the current scanout buffer, to display a panic message with drm_panic.
> +	 * It is called from a panic callback, and must follow its restrictions.
> +	 *
> +	 * Returns:
> +	 *
> +	 * Zero on success, negative errno on failure.
> +	 */
> +	int (*get_scanout_buffer)(struct drm_device *dev,
> +				  struct drm_scanout_buffer *sb);
> +

What is the format of that buffer? What is supposed to happen if the
planes / CRTC are setup in a way that is incompatible with the buffer
format?

I've said it in that other series, but I really think we should be
having a proper state on the side to deal with those properly.

Maxime

Jocelyn Falempe Oct. 5, 2023, 9:16 a.m. UTC | #3

On 05/10/2023 10:18, Maxime Ripard wrote:
> Hi,
> 
> On Tue, Oct 03, 2023 at 04:22:45PM +0200, Jocelyn Falempe wrote:
>> diff --git a/include/drm/drm_drv.h b/include/drm/drm_drv.h
>> index 89e2706cac56..e538c87116d3 100644
>> --- a/include/drm/drm_drv.h
>> +++ b/include/drm/drm_drv.h
>> @@ -43,6 +43,7 @@ struct dma_buf_attachment;
>>   struct drm_display_mode;
>>   struct drm_mode_create_dumb;
>>   struct drm_printer;
>> +struct drm_scanout_buffer;
>>   struct sg_table;
>>   
>>   /**
>> @@ -408,6 +409,19 @@ struct drm_driver {
>>   	 */
>>   	void (*show_fdinfo)(struct drm_printer *p, struct drm_file *f);
>>   
>> +	/**
>> +	 * @get_scanout_buffer:
>> +	 *
>> +	 * Get the current scanout buffer, to display a panic message with drm_panic.
>> +	 * It is called from a panic callback, and must follow its restrictions.
>> +	 *
>> +	 * Returns:
>> +	 *
>> +	 * Zero on success, negative errno on failure.
>> +	 */
>> +	int (*get_scanout_buffer)(struct drm_device *dev,
>> +				  struct drm_scanout_buffer *sb);
>> +
> 
> What is the format of that buffer? What is supposed to happen if the
> planes / CRTC are setup in a way that is incompatible with the buffer
> format?

Currently, it only supports linear format, either in system memory, or 
iomem.
But really what is needed is the screen size, and a way to write pixels 
to it.
For more complex GPU, I don't know if it's easier to reprogram the GPU 
to linear format, or to add a simple "tiled" support to drm_panic.
What would you propose as a panic interface to handle those complex format ?

Sometime it's also just not possible to write pixels to the screen, like 
if the panic occurs in the middle of suspend/resume, or during a 
mode-setting, and the hardware state is broken. In this case it's ok to 
return an error, and nothing will get displayed.

Best regards,

--

Jocelyn

> 
> I've said it in that other series, but I really think we should be
> having a proper state on the side to deal with those properly.
> 
> Maxime

Maxime Ripard Oct. 6, 2023, 2:35 p.m. UTC | #4

Hi Jocelyn,

On Thu, Oct 05, 2023 at 11:16:15AM +0200, Jocelyn Falempe wrote:
> On 05/10/2023 10:18, Maxime Ripard wrote:
> > Hi,
> > 
> > On Tue, Oct 03, 2023 at 04:22:45PM +0200, Jocelyn Falempe wrote:
> > > diff --git a/include/drm/drm_drv.h b/include/drm/drm_drv.h
> > > index 89e2706cac56..e538c87116d3 100644
> > > --- a/include/drm/drm_drv.h
> > > +++ b/include/drm/drm_drv.h
> > > @@ -43,6 +43,7 @@ struct dma_buf_attachment;
> > >   struct drm_display_mode;
> > >   struct drm_mode_create_dumb;
> > >   struct drm_printer;
> > > +struct drm_scanout_buffer;
> > >   struct sg_table;
> > >   /**
> > > @@ -408,6 +409,19 @@ struct drm_driver {
> > >   	 */
> > >   	void (*show_fdinfo)(struct drm_printer *p, struct drm_file *f);
> > > +	/**
> > > +	 * @get_scanout_buffer:
> > > +	 *
> > > +	 * Get the current scanout buffer, to display a panic message with drm_panic.
> > > +	 * It is called from a panic callback, and must follow its restrictions.
> > > +	 *
> > > +	 * Returns:
> > > +	 *
> > > +	 * Zero on success, negative errno on failure.
> > > +	 */
> > > +	int (*get_scanout_buffer)(struct drm_device *dev,
> > > +				  struct drm_scanout_buffer *sb);
> > > +
> > 
> > What is the format of that buffer? What is supposed to happen if the
> > planes / CRTC are setup in a way that is incompatible with the buffer
> > format?
> 
> Currently, it only supports linear format, either in system memory, or
> iomem.
> But really what is needed is the screen size, and a way to write pixels to
> it.
> For more complex GPU, I don't know if it's easier to reprogram the GPU to
> linear format, or to add a simple "tiled" support to drm_panic.
> What would you propose as a panic interface to handle those complex format ?

It's not just about tiling, but also about YUV formats. If the display
engine is currently playing a video at the moment, it's probably going
to output some variation of multi-planar YUV and you won't have an RGB
buffer available.

Same story if you're using a dma-buf buffer. You might not even be able
to access that buffer at all from the CPU or the kernel.

I really think we should have some emergency state ready to commit on
the side, and possibly a panic_commit function to prevent things like
sleeping or waiting that regular atomic_commit can use.

That way, you know have all the resources available to you any time.

> Sometime it's also just not possible to write pixels to the screen, like if
> the panic occurs in the middle of suspend/resume, or during a mode-setting,
> and the hardware state is broken. In this case it's ok to return an error,
> and nothing will get displayed.

And yeah, you won't be able to do it every time, but if it's never for
some workload it's going to be a concern.

Anyway, we should at the very least document what we expect here.

Maxime

Noralf Trønnes Oct. 6, 2023, 4:54 p.m. UTC | #5

On 10/6/23 16:35, Maxime Ripard wrote:
> Hi Jocelyn,
> 
> On Thu, Oct 05, 2023 at 11:16:15AM +0200, Jocelyn Falempe wrote:
>> On 05/10/2023 10:18, Maxime Ripard wrote:
>>> Hi,
>>>
>>> On Tue, Oct 03, 2023 at 04:22:45PM +0200, Jocelyn Falempe wrote:
>>>> diff --git a/include/drm/drm_drv.h b/include/drm/drm_drv.h
>>>> index 89e2706cac56..e538c87116d3 100644
>>>> --- a/include/drm/drm_drv.h
>>>> +++ b/include/drm/drm_drv.h
>>>> @@ -43,6 +43,7 @@ struct dma_buf_attachment;
>>>>   struct drm_display_mode;
>>>>   struct drm_mode_create_dumb;
>>>>   struct drm_printer;
>>>> +struct drm_scanout_buffer;
>>>>   struct sg_table;
>>>>   /**
>>>> @@ -408,6 +409,19 @@ struct drm_driver {
>>>>   	 */
>>>>   	void (*show_fdinfo)(struct drm_printer *p, struct drm_file *f);
>>>> +	/**
>>>> +	 * @get_scanout_buffer:
>>>> +	 *
>>>> +	 * Get the current scanout buffer, to display a panic message with drm_panic.
>>>> +	 * It is called from a panic callback, and must follow its restrictions.
>>>> +	 *
>>>> +	 * Returns:
>>>> +	 *
>>>> +	 * Zero on success, negative errno on failure.
>>>> +	 */
>>>> +	int (*get_scanout_buffer)(struct drm_device *dev,
>>>> +				  struct drm_scanout_buffer *sb);
>>>> +
>>>
>>> What is the format of that buffer? What is supposed to happen if the
>>> planes / CRTC are setup in a way that is incompatible with the buffer
>>> format?
>>
>> Currently, it only supports linear format, either in system memory, or
>> iomem.
>> But really what is needed is the screen size, and a way to write pixels to
>> it.
>> For more complex GPU, I don't know if it's easier to reprogram the GPU to
>> linear format, or to add a simple "tiled" support to drm_panic.
>> What would you propose as a panic interface to handle those complex format ?
> 
> It's not just about tiling, but also about YUV formats. If the display
> engine is currently playing a video at the moment, it's probably going
> to output some variation of multi-planar YUV and you won't have an RGB
> buffer available.
> 

I had support for some YUV formats in my 2019 attempt on a panic
handler[1] and I made a recording of a test run as well[2] (see 4:30 for
YUV). There was a discussion about challenges and i915 can disable
tiling by flipping a bit in a register[3] and AMD has a debug
interface[4] they can use to write pixels.

Noralf.

[1]
https://lore.kernel.org/dri-devel/20190311174218.51899-1-noralf@tronnes.org/
[2] https://youtu.be/lZ80vL4dgpE
[3]
https://lore.kernel.org/dri-devel/20190314095004.GP2665@phenom.ffwll.local/
[4]
https://lore.kernel.org/dri-devel/d233c376-ed07-2127-6084-8292d313dac7@amd.com/

> Same story if you're using a dma-buf buffer. You might not even be able
> to access that buffer at all from the CPU or the kernel.
> 
> I really think we should have some emergency state ready to commit on
> the side, and possibly a panic_commit function to prevent things like
> sleeping or waiting that regular atomic_commit can use.
> 
> That way, you know have all the resources available to you any time.
> 
>> Sometime it's also just not possible to write pixels to the screen, like if
>> the panic occurs in the middle of suspend/resume, or during a mode-setting,
>> and the hardware state is broken. In this case it's ok to return an error,
>> and nothing will get displayed.
> 
> And yeah, you won't be able to do it every time, but if it's never for
> some workload it's going to be a concern.
> 
> Anyway, we should at the very least document what we expect here.
> 
> Maxime

Noralf Trønnes Oct. 7, 2023, 12:38 p.m. UTC | #6

On 10/3/23 16:22, Jocelyn Falempe wrote:
> This module displays a user friendly message when a kernel panic
> occurs. It currently doesn't contain any debug information,
> but that can be added later.
> 
> v2
>  * Use get_scanout_buffer() instead of the drm client API.
>   (Thomas Zimmermann)
>  * Add the panic reason to the panic message (Nerdopolis)
>  * Add an exclamation mark (Nerdopolis)
> 
> v3
>  * Rework the drawing functions, to write the pixels line by line and
>  to use the drm conversion helper to support other formats.
>  (Thomas Zimmermann)
> 
> v4
>  * Use drm_fb_r1_to_32bit for fonts (Thomas Zimmermann)
>  * Remove the default y to DRM_PANIC config option (Thomas Zimmermann)
>  * Add foreground/background color config option
>  * Fix the bottom lines not painted if the framebuffer height
>    is not a multiple of the font height.
>  * Automatically register the device to drm_panic, if the function
>    get_scanout_buffer exists. (Thomas Zimmermann)
> 
> Signed-off-by: Jocelyn Falempe <jfalempe@redhat.com>
> ---
>  drivers/gpu/drm/Kconfig     |  22 ++
>  drivers/gpu/drm/Makefile    |   1 +
>  drivers/gpu/drm/drm_drv.c   |   8 +
>  drivers/gpu/drm/drm_panic.c | 413 ++++++++++++++++++++++++++++++++++++
>  include/drm/drm_drv.h       |  14 ++
>  include/drm/drm_panic.h     |  41 ++++
>  6 files changed, 499 insertions(+)
>  create mode 100644 drivers/gpu/drm/drm_panic.c
>  create mode 100644 include/drm/drm_panic.h
> 

> diff --git a/drivers/gpu/drm/drm_panic.c b/drivers/gpu/drm/drm_panic.c
> new file mode 100644

> +static void draw_panic_device(struct drm_device *dev, const char *msg)
> +{
> +	struct drm_scanout_buffer sb;
> +
> +	if (dev->driver->get_scanout_buffer(dev, &sb))
> +		return;
> +	if (!drm_panic_line_buf)
> +		return;
> +

Unless something has changed since 2019 we need to make sure fbcon
hasn't already printed the panic to avoid jumbled output. See [1] for
more info.

Noralf.

[1]
https://lore.kernel.org/dri-devel/20190312095320.GX2665@phenom.ffwll.local/

> +	/* to avoid buffer overflow on drm_panic_line_buf */
> +	if (sb.width > DRM_PANIC_MAX_WIDTH)
> +		sb.width = DRM_PANIC_MAX_WIDTH;
> +
> +	draw_panic_static(&sb, msg);
> +}

Jocelyn Falempe Oct. 9, 2023, 7:47 a.m. UTC | #7

On 06/10/2023 18:54, Noralf Trønnes wrote:
> 
> 
> On 10/6/23 16:35, Maxime Ripard wrote:
>> Hi Jocelyn,
>>
>> On Thu, Oct 05, 2023 at 11:16:15AM +0200, Jocelyn Falempe wrote:
>>> On 05/10/2023 10:18, Maxime Ripard wrote:
>>>> Hi,
>>>>
>>>> On Tue, Oct 03, 2023 at 04:22:45PM +0200, Jocelyn Falempe wrote:
>>>>> diff --git a/include/drm/drm_drv.h b/include/drm/drm_drv.h
>>>>> index 89e2706cac56..e538c87116d3 100644
>>>>> --- a/include/drm/drm_drv.h
>>>>> +++ b/include/drm/drm_drv.h
>>>>> @@ -43,6 +43,7 @@ struct dma_buf_attachment;
>>>>>    struct drm_display_mode;
>>>>>    struct drm_mode_create_dumb;
>>>>>    struct drm_printer;
>>>>> +struct drm_scanout_buffer;
>>>>>    struct sg_table;
>>>>>    /**
>>>>> @@ -408,6 +409,19 @@ struct drm_driver {
>>>>>    	 */
>>>>>    	void (*show_fdinfo)(struct drm_printer *p, struct drm_file *f);
>>>>> +	/**
>>>>> +	 * @get_scanout_buffer:
>>>>> +	 *
>>>>> +	 * Get the current scanout buffer, to display a panic message with drm_panic.
>>>>> +	 * It is called from a panic callback, and must follow its restrictions.
>>>>> +	 *
>>>>> +	 * Returns:
>>>>> +	 *
>>>>> +	 * Zero on success, negative errno on failure.
>>>>> +	 */
>>>>> +	int (*get_scanout_buffer)(struct drm_device *dev,
>>>>> +				  struct drm_scanout_buffer *sb);
>>>>> +
>>>>
>>>> What is the format of that buffer? What is supposed to happen if the
>>>> planes / CRTC are setup in a way that is incompatible with the buffer
>>>> format?
>>>
>>> Currently, it only supports linear format, either in system memory, or
>>> iomem.
>>> But really what is needed is the screen size, and a way to write pixels to
>>> it.
>>> For more complex GPU, I don't know if it's easier to reprogram the GPU to
>>> linear format, or to add a simple "tiled" support to drm_panic.
>>> What would you propose as a panic interface to handle those complex format ?
>>
>> It's not just about tiling, but also about YUV formats. If the display
>> engine is currently playing a video at the moment, it's probably going
>> to output some variation of multi-planar YUV and you won't have an RGB
>> buffer available.
>>
> 
> I had support for some YUV formats in my 2019 attempt on a panic
> handler[1] and I made a recording of a test run as well[2] (see 4:30 for
> YUV). There was a discussion about challenges and i915 can disable
> tiling by flipping a bit in a register[3] and AMD has a debug
> interface[4] they can use to write pixels.

I only added support for the format used by simpledrm, because I don't 
want to add support for all possible format if no driver are using it.
It should be possible to add YUV format too.

I also prefer to convert only the foreground/background color, and then 
write directly into the buffers, instead of converting line by line.
It works for all format where pixel size is a multiple of byte.

> 
> Noralf.
> 
> [1]
> https://lore.kernel.org/dri-devel/20190311174218.51899-1-noralf@tronnes.org/
> [2] https://youtu.be/lZ80vL4dgpE
> [3]
> https://lore.kernel.org/dri-devel/20190314095004.GP2665@phenom.ffwll.local/
> [4]
> https://lore.kernel.org/dri-devel/d233c376-ed07-2127-6084-8292d313dac7@amd.com/
> 
>> Same story if you're using a dma-buf buffer. You might not even be able
>> to access that buffer at all from the CPU or the kernel.
>>
>> I really think we should have some emergency state ready to commit on
>> the side, and possibly a panic_commit function to prevent things like
>> sleeping or waiting that regular atomic_commit can use.
>>
>> That way, you know have all the resources available to you any time.

I think reusing the atomic commit functions might be hard, because there 
are locks/allocation/threads hidden in drivers callback. I'm more in 
favor of an emergency function, that each driver has to implement, and 
use what the hardware can do to display a simple frame quickly.
get_scanout_buffer() is a good start for simple driver, but will need 
refactoring for the more complex case, like adding a callback to write 
pixels one by one, if there is no memory mapped buffer available.

>>
>>> Sometime it's also just not possible to write pixels to the screen, like if
>>> the panic occurs in the middle of suspend/resume, or during a mode-setting,
>>> and the hardware state is broken. In this case it's ok to return an error,
>>> and nothing will get displayed.
>>
>> And yeah, you won't be able to do it every time, but if it's never for
>> some workload it's going to be a concern.
>>
>> Anyway, we should at the very least document what we expect here.

Yes I should better document the drm panic feature, and the 
get_scanout_buffer() interface.

>>
>> Maxime
>

Jocelyn Falempe Oct. 9, 2023, 8:01 a.m. UTC | #8

On 07/10/2023 14:38, Noralf Trønnes wrote:
> 
> 
> On 10/3/23 16:22, Jocelyn Falempe wrote:
>> This module displays a user friendly message when a kernel panic
>> occurs. It currently doesn't contain any debug information,
>> but that can be added later.
>>
>> v2
>>   * Use get_scanout_buffer() instead of the drm client API.
>>    (Thomas Zimmermann)
>>   * Add the panic reason to the panic message (Nerdopolis)
>>   * Add an exclamation mark (Nerdopolis)
>>
>> v3
>>   * Rework the drawing functions, to write the pixels line by line and
>>   to use the drm conversion helper to support other formats.
>>   (Thomas Zimmermann)
>>
>> v4
>>   * Use drm_fb_r1_to_32bit for fonts (Thomas Zimmermann)
>>   * Remove the default y to DRM_PANIC config option (Thomas Zimmermann)
>>   * Add foreground/background color config option
>>   * Fix the bottom lines not painted if the framebuffer height
>>     is not a multiple of the font height.
>>   * Automatically register the device to drm_panic, if the function
>>     get_scanout_buffer exists. (Thomas Zimmermann)
>>
>> Signed-off-by: Jocelyn Falempe <jfalempe@redhat.com>
>> ---
>>   drivers/gpu/drm/Kconfig     |  22 ++
>>   drivers/gpu/drm/Makefile    |   1 +
>>   drivers/gpu/drm/drm_drv.c   |   8 +
>>   drivers/gpu/drm/drm_panic.c | 413 ++++++++++++++++++++++++++++++++++++
>>   include/drm/drm_drv.h       |  14 ++
>>   include/drm/drm_panic.h     |  41 ++++
>>   6 files changed, 499 insertions(+)
>>   create mode 100644 drivers/gpu/drm/drm_panic.c
>>   create mode 100644 include/drm/drm_panic.h
>>
> 
>> diff --git a/drivers/gpu/drm/drm_panic.c b/drivers/gpu/drm/drm_panic.c
>> new file mode 100644
> 
>> +static void draw_panic_device(struct drm_device *dev, const char *msg)
>> +{
>> +	struct drm_scanout_buffer sb;
>> +
>> +	if (dev->driver->get_scanout_buffer(dev, &sb))
>> +		return;
>> +	if (!drm_panic_line_buf)
>> +		return;
>> +
> 
> Unless something has changed since 2019 we need to make sure fbcon
> hasn't already printed the panic to avoid jumbled output. See [1] for
> more info.

I think DRM_PANIC and fbcon are incompatible, so in Kconfig I prevent 
them to be built together:

config DRM_PANIC
   bool "Display a user-friendly message when a kernel panic occurs"
   depends on DRM && !FRAMEBUFFER_CONSOLE

So DRM_PANIC should be used with a userspace console, either kmscon, or 
some lightweight terminal emulator/wayland compositor, like cage/foot.

As fbcon has lost scrolling support, it's time to switch to something 
better. [1]

Maxime Ripard Oct. 9, 2023, 8:28 a.m. UTC | #9

Hi,

On Mon, Oct 09, 2023 at 09:47:49AM +0200, Jocelyn Falempe wrote:
> On 06/10/2023 18:54, Noralf Trønnes wrote:
> > 
> > 
> > On 10/6/23 16:35, Maxime Ripard wrote:
> > > Hi Jocelyn,
> > > 
> > > On Thu, Oct 05, 2023 at 11:16:15AM +0200, Jocelyn Falempe wrote:
> > > > On 05/10/2023 10:18, Maxime Ripard wrote:
> > > > > Hi,
> > > > > 
> > > > > On Tue, Oct 03, 2023 at 04:22:45PM +0200, Jocelyn Falempe wrote:
> > > > > > diff --git a/include/drm/drm_drv.h b/include/drm/drm_drv.h
> > > > > > index 89e2706cac56..e538c87116d3 100644
> > > > > > --- a/include/drm/drm_drv.h
> > > > > > +++ b/include/drm/drm_drv.h
> > > > > > @@ -43,6 +43,7 @@ struct dma_buf_attachment;
> > > > > >    struct drm_display_mode;
> > > > > >    struct drm_mode_create_dumb;
> > > > > >    struct drm_printer;
> > > > > > +struct drm_scanout_buffer;
> > > > > >    struct sg_table;
> > > > > >    /**
> > > > > > @@ -408,6 +409,19 @@ struct drm_driver {
> > > > > >    	 */
> > > > > >    	void (*show_fdinfo)(struct drm_printer *p, struct drm_file *f);
> > > > > > +	/**
> > > > > > +	 * @get_scanout_buffer:
> > > > > > +	 *
> > > > > > +	 * Get the current scanout buffer, to display a panic message with drm_panic.
> > > > > > +	 * It is called from a panic callback, and must follow its restrictions.
> > > > > > +	 *
> > > > > > +	 * Returns:
> > > > > > +	 *
> > > > > > +	 * Zero on success, negative errno on failure.
> > > > > > +	 */
> > > > > > +	int (*get_scanout_buffer)(struct drm_device *dev,
> > > > > > +				  struct drm_scanout_buffer *sb);
> > > > > > +
> > > > > 
> > > > > What is the format of that buffer? What is supposed to happen if the
> > > > > planes / CRTC are setup in a way that is incompatible with the buffer
> > > > > format?
> > > > 
> > > > Currently, it only supports linear format, either in system memory, or
> > > > iomem.
> > > > But really what is needed is the screen size, and a way to write pixels to
> > > > it.
> > > > For more complex GPU, I don't know if it's easier to reprogram the GPU to
> > > > linear format, or to add a simple "tiled" support to drm_panic.
> > > > What would you propose as a panic interface to handle those complex format ?
> > > 
> > > It's not just about tiling, but also about YUV formats. If the display
> > > engine is currently playing a video at the moment, it's probably going
> > > to output some variation of multi-planar YUV and you won't have an RGB
> > > buffer available.
> > > 
> > 
> > I had support for some YUV formats in my 2019 attempt on a panic
> > handler[1] and I made a recording of a test run as well[2] (see 4:30 for
> > YUV). There was a discussion about challenges and i915 can disable
> > tiling by flipping a bit in a register[3] and AMD has a debug
> > interface[4] they can use to write pixels.
> 
> I only added support for the format used by simpledrm, because I don't want
> to add support for all possible format if no driver are using it.

Sure.

> It should be possible to add YUV format too.
>
> I also prefer to convert only the foreground/background color, and then
> write directly into the buffers, instead of converting line by line.
> It works for all format where pixel size is a multiple of byte.

My point was that there might not be a buffer to write to.
DMA_ATTR_NO_KERNEL_MAPPING exists, dma-buf might be unaccessible, unsafe
or be completely out of control of the kernel space, or even not be
accessible by the system at all (when doing secure playback for example,
where the "trusted" component will do the decoding and will only give
back a dma-buf handle to a secure memory buffer).

I appreciate that we probably don't want to address these scenarios
right now, but we should have a path forward to support them eventually.
Copying the panic handler content to the buffer is optimistic and won't
work in all the scenarios described above, pretty much requiring to
start from scratch that effort.

> > https://lore.kernel.org/dri-devel/20190311174218.51899-1-noralf@tronnes.org/
> > [2] https://youtu.be/lZ80vL4dgpE
> > [3]
> > https://lore.kernel.org/dri-devel/20190314095004.GP2665@phenom.ffwll.local/
> > [4]
> > https://lore.kernel.org/dri-devel/d233c376-ed07-2127-6084-8292d313dac7@amd.com/
> > 
> > > Same story if you're using a dma-buf buffer. You might not even be able
> > > to access that buffer at all from the CPU or the kernel.
> > > 
> > > I really think we should have some emergency state ready to commit on
> > > the side, and possibly a panic_commit function to prevent things like
> > > sleeping or waiting that regular atomic_commit can use.
> > > 
> > > That way, you know have all the resources available to you any time.
> 
> I think reusing the atomic commit functions might be hard, because there are
> locks/allocation/threads hidden in drivers callback.

Allocations are bugs as far as I'm concerned. Allocations in
atomic_commit path are pretty big hint that you're doing something wrong
so I wouldn't worry too much about them. For locking, yeah... Which is
why I was suggesting an emergency atomic_commit of some sorts (for
planes only?). Switching back to whatever we were doing to an RGB plane
should be simple enough for most drivers and probably can be done safely
enough on most drivers without any locks.

And you don't need to support all kinds of tiling, YUV or RGB variants.

> I'm more in favor of an emergency function, that each driver has to
> implement, and use what the hardware can do to display a simple frame
> quickly. get_scanout_buffer() is a good start for simple driver, but
> will need refactoring for the more complex case, like adding a
> callback to write pixels one by one, if there is no memory mapped
> buffer available.

Sorry, I'm not quite sure what you mean there, where would you write the
pixel to?

Maxime

Jocelyn Falempe Oct. 9, 2023, 9:48 a.m. UTC | #10

On 09/10/2023 10:28, Maxime Ripard wrote:
> Hi,
> 
> On Mon, Oct 09, 2023 at 09:47:49AM +0200, Jocelyn Falempe wrote:
>> On 06/10/2023 18:54, Noralf Trønnes wrote:
>>>
>>>
>>> On 10/6/23 16:35, Maxime Ripard wrote:
>>>> Hi Jocelyn,
>>>>
>>>> On Thu, Oct 05, 2023 at 11:16:15AM +0200, Jocelyn Falempe wrote:
>>>>> On 05/10/2023 10:18, Maxime Ripard wrote:
>>>>>> Hi,
>>>>>>
>>>>>> On Tue, Oct 03, 2023 at 04:22:45PM +0200, Jocelyn Falempe wrote:
>>>>>>> diff --git a/include/drm/drm_drv.h b/include/drm/drm_drv.h
>>>>>>> index 89e2706cac56..e538c87116d3 100644
>>>>>>> --- a/include/drm/drm_drv.h
>>>>>>> +++ b/include/drm/drm_drv.h
>>>>>>> @@ -43,6 +43,7 @@ struct dma_buf_attachment;
>>>>>>>     struct drm_display_mode;
>>>>>>>     struct drm_mode_create_dumb;
>>>>>>>     struct drm_printer;
>>>>>>> +struct drm_scanout_buffer;
>>>>>>>     struct sg_table;
>>>>>>>     /**
>>>>>>> @@ -408,6 +409,19 @@ struct drm_driver {
>>>>>>>     	 */
>>>>>>>     	void (*show_fdinfo)(struct drm_printer *p, struct drm_file *f);
>>>>>>> +	/**
>>>>>>> +	 * @get_scanout_buffer:
>>>>>>> +	 *
>>>>>>> +	 * Get the current scanout buffer, to display a panic message with drm_panic.
>>>>>>> +	 * It is called from a panic callback, and must follow its restrictions.
>>>>>>> +	 *
>>>>>>> +	 * Returns:
>>>>>>> +	 *
>>>>>>> +	 * Zero on success, negative errno on failure.
>>>>>>> +	 */
>>>>>>> +	int (*get_scanout_buffer)(struct drm_device *dev,
>>>>>>> +				  struct drm_scanout_buffer *sb);
>>>>>>> +
>>>>>>
>>>>>> What is the format of that buffer? What is supposed to happen if the
>>>>>> planes / CRTC are setup in a way that is incompatible with the buffer
>>>>>> format?
>>>>>
>>>>> Currently, it only supports linear format, either in system memory, or
>>>>> iomem.
>>>>> But really what is needed is the screen size, and a way to write pixels to
>>>>> it.
>>>>> For more complex GPU, I don't know if it's easier to reprogram the GPU to
>>>>> linear format, or to add a simple "tiled" support to drm_panic.
>>>>> What would you propose as a panic interface to handle those complex format ?
>>>>
>>>> It's not just about tiling, but also about YUV formats. If the display
>>>> engine is currently playing a video at the moment, it's probably going
>>>> to output some variation of multi-planar YUV and you won't have an RGB
>>>> buffer available.
>>>>
>>>
>>> I had support for some YUV formats in my 2019 attempt on a panic
>>> handler[1] and I made a recording of a test run as well[2] (see 4:30 for
>>> YUV). There was a discussion about challenges and i915 can disable
>>> tiling by flipping a bit in a register[3] and AMD has a debug
>>> interface[4] they can use to write pixels.
>>
>> I only added support for the format used by simpledrm, because I don't want
>> to add support for all possible format if no driver are using it.
> 
> Sure.
> 
>> It should be possible to add YUV format too.
>>
>> I also prefer to convert only the foreground/background color, and then
>> write directly into the buffers, instead of converting line by line.
>> It works for all format where pixel size is a multiple of byte.
> 
> My point was that there might not be a buffer to write to.
> DMA_ATTR_NO_KERNEL_MAPPING exists, dma-buf might be unaccessible, unsafe
> or be completely out of control of the kernel space, or even not be
> accessible by the system at all (when doing secure playback for example,
> where the "trusted" component will do the decoding and will only give
> back a dma-buf handle to a secure memory buffer).
> 
> I appreciate that we probably don't want to address these scenarios
> right now, but we should have a path forward to support them eventually.
> Copying the panic handler content to the buffer is optimistic and won't
> work in all the scenarios described above, pretty much requiring to
> start from scratch that effort.
> 
>>> https://lore.kernel.org/dri-devel/20190311174218.51899-1-noralf@tronnes.org/
>>> [2] https://youtu.be/lZ80vL4dgpE
>>> [3]
>>> https://lore.kernel.org/dri-devel/20190314095004.GP2665@phenom.ffwll.local/
>>> [4]
>>> https://lore.kernel.org/dri-devel/d233c376-ed07-2127-6084-8292d313dac7@amd.com/
>>>
>>>> Same story if you're using a dma-buf buffer. You might not even be able
>>>> to access that buffer at all from the CPU or the kernel.
>>>>
>>>> I really think we should have some emergency state ready to commit on
>>>> the side, and possibly a panic_commit function to prevent things like
>>>> sleeping or waiting that regular atomic_commit can use.
>>>>
>>>> That way, you know have all the resources available to you any time.
>>
>> I think reusing the atomic commit functions might be hard, because there are
>> locks/allocation/threads hidden in drivers callback.
> 
> Allocations are bugs as far as I'm concerned. Allocations in
> atomic_commit path are pretty big hint that you're doing something wrong
> so I wouldn't worry too much about them. For locking, yeah... Which is
> why I was suggesting an emergency atomic_commit of some sorts (for
> planes only?). Switching back to whatever we were doing to an RGB plane
> should be simple enough for most drivers and probably can be done safely
> enough on most drivers without any locks.
> 
> And you don't need to support all kinds of tiling, YUV or RGB variants.

So if I understand correctly, drm_panic would pre-allocate a 
plane/commit, and use that when a panic occurs ?
I have two concern about this approach:
- How much memory would be allocated for this ? a whole framebuffer can 
be big for just this use case.
- I find it risky to completely reconfigure the hardware in a panic handler.

Also how many drivers would need this ?

Currently I was mostly considering x86 platform, so:

simpledrm/ast/mgag200 which works well with the get_scanout_buffer().

i915/amdgpu/nouveau, which are quite complex, and will need to do their 
own thing anyway.


> 
>> I'm more in favor of an emergency function, that each driver has to
>> implement, and use what the hardware can do to display a simple frame
>> quickly. get_scanout_buffer() is a good start for simple driver, but
>> will need refactoring for the more complex case, like adding a
>> callback to write pixels one by one, if there is no memory mapped
>> buffer available.
> 
> Sorry, I'm not quite sure what you mean there, where would you write the
> pixel to?

It was mentioned by Noralf, about the amdgpu driver:

https://lore.kernel.org/dri-devel/d233c376-ed07-2127-6084-8292d313dac7@amd.com/

They have a slow "debug interface" that you can write to, and can be 
used for the panic handler. It's not memory mapped, so you have to write 
pixels one by one.

So for the struct drm_scanout_buffer, I can add a function pointer to a 
write_pixel(u32 x, u32 y, u32 color)
So if the iosys map is null, it will use that instead.

> 
> Maxime

Maxime Ripard Oct. 9, 2023, 11:33 a.m. UTC | #11

On Mon, Oct 09, 2023 at 11:48:29AM +0200, Jocelyn Falempe wrote:
> On 09/10/2023 10:28, Maxime Ripard wrote:
> > Hi,
> > 
> > On Mon, Oct 09, 2023 at 09:47:49AM +0200, Jocelyn Falempe wrote:
> > > On 06/10/2023 18:54, Noralf Trønnes wrote:
> > > > 
> > > > 
> > > > On 10/6/23 16:35, Maxime Ripard wrote:
> > > > > Hi Jocelyn,
> > > > > 
> > > > > On Thu, Oct 05, 2023 at 11:16:15AM +0200, Jocelyn Falempe wrote:
> > > > > > On 05/10/2023 10:18, Maxime Ripard wrote:
> > > > > > > Hi,
> > > > > > > 
> > > > > > > On Tue, Oct 03, 2023 at 04:22:45PM +0200, Jocelyn Falempe wrote:
> > > > > > > > diff --git a/include/drm/drm_drv.h b/include/drm/drm_drv.h
> > > > > > > > index 89e2706cac56..e538c87116d3 100644
> > > > > > > > --- a/include/drm/drm_drv.h
> > > > > > > > +++ b/include/drm/drm_drv.h
> > > > > > > > @@ -43,6 +43,7 @@ struct dma_buf_attachment;
> > > > > > > >     struct drm_display_mode;
> > > > > > > >     struct drm_mode_create_dumb;
> > > > > > > >     struct drm_printer;
> > > > > > > > +struct drm_scanout_buffer;
> > > > > > > >     struct sg_table;
> > > > > > > >     /**
> > > > > > > > @@ -408,6 +409,19 @@ struct drm_driver {
> > > > > > > >     	 */
> > > > > > > >     	void (*show_fdinfo)(struct drm_printer *p, struct drm_file *f);
> > > > > > > > +	/**
> > > > > > > > +	 * @get_scanout_buffer:
> > > > > > > > +	 *
> > > > > > > > +	 * Get the current scanout buffer, to display a panic message with drm_panic.
> > > > > > > > +	 * It is called from a panic callback, and must follow its restrictions.
> > > > > > > > +	 *
> > > > > > > > +	 * Returns:
> > > > > > > > +	 *
> > > > > > > > +	 * Zero on success, negative errno on failure.
> > > > > > > > +	 */
> > > > > > > > +	int (*get_scanout_buffer)(struct drm_device *dev,
> > > > > > > > +				  struct drm_scanout_buffer *sb);
> > > > > > > > +
> > > > > > > 
> > > > > > > What is the format of that buffer? What is supposed to happen if the
> > > > > > > planes / CRTC are setup in a way that is incompatible with the buffer
> > > > > > > format?
> > > > > > 
> > > > > > Currently, it only supports linear format, either in system memory, or
> > > > > > iomem.
> > > > > > But really what is needed is the screen size, and a way to write pixels to
> > > > > > it.
> > > > > > For more complex GPU, I don't know if it's easier to reprogram the GPU to
> > > > > > linear format, or to add a simple "tiled" support to drm_panic.
> > > > > > What would you propose as a panic interface to handle those complex format ?
> > > > > 
> > > > > It's not just about tiling, but also about YUV formats. If the display
> > > > > engine is currently playing a video at the moment, it's probably going
> > > > > to output some variation of multi-planar YUV and you won't have an RGB
> > > > > buffer available.
> > > > > 
> > > > 
> > > > I had support for some YUV formats in my 2019 attempt on a panic
> > > > handler[1] and I made a recording of a test run as well[2] (see 4:30 for
> > > > YUV). There was a discussion about challenges and i915 can disable
> > > > tiling by flipping a bit in a register[3] and AMD has a debug
> > > > interface[4] they can use to write pixels.
> > > 
> > > I only added support for the format used by simpledrm, because I don't want
> > > to add support for all possible format if no driver are using it.
> > 
> > Sure.
> > 
> > > It should be possible to add YUV format too.
> > > 
> > > I also prefer to convert only the foreground/background color, and then
> > > write directly into the buffers, instead of converting line by line.
> > > It works for all format where pixel size is a multiple of byte.
> > 
> > My point was that there might not be a buffer to write to.
> > DMA_ATTR_NO_KERNEL_MAPPING exists, dma-buf might be unaccessible, unsafe
> > or be completely out of control of the kernel space, or even not be
> > accessible by the system at all (when doing secure playback for example,
> > where the "trusted" component will do the decoding and will only give
> > back a dma-buf handle to a secure memory buffer).
> > 
> > I appreciate that we probably don't want to address these scenarios
> > right now, but we should have a path forward to support them eventually.
> > Copying the panic handler content to the buffer is optimistic and won't
> > work in all the scenarios described above, pretty much requiring to
> > start from scratch that effort.
> > 
> > > > https://lore.kernel.org/dri-devel/20190311174218.51899-1-noralf@tronnes.org/
> > > > [2] https://youtu.be/lZ80vL4dgpE
> > > > [3]
> > > > https://lore.kernel.org/dri-devel/20190314095004.GP2665@phenom.ffwll.local/
> > > > [4]
> > > > https://lore.kernel.org/dri-devel/d233c376-ed07-2127-6084-8292d313dac7@amd.com/
> > > > 
> > > > > Same story if you're using a dma-buf buffer. You might not even be able
> > > > > to access that buffer at all from the CPU or the kernel.
> > > > > 
> > > > > I really think we should have some emergency state ready to commit on
> > > > > the side, and possibly a panic_commit function to prevent things like
> > > > > sleeping or waiting that regular atomic_commit can use.
> > > > > 
> > > > > That way, you know have all the resources available to you any time.
> > > 
> > > I think reusing the atomic commit functions might be hard, because there are
> > > locks/allocation/threads hidden in drivers callback.
> > 
> > Allocations are bugs as far as I'm concerned. Allocations in
> > atomic_commit path are pretty big hint that you're doing something wrong
> > so I wouldn't worry too much about them. For locking, yeah... Which is
> > why I was suggesting an emergency atomic_commit of some sorts (for
> > planes only?). Switching back to whatever we were doing to an RGB plane
> > should be simple enough for most drivers and probably can be done safely
> > enough on most drivers without any locks.
> > 
> > And you don't need to support all kinds of tiling, YUV or RGB variants.
> 
> So if I understand correctly, drm_panic would pre-allocate a plane/commit,
> and use that when a panic occurs ?

And have it checked already, yes. We would only wait for a panic to
happen to pull the trigger on the commit.

> I have two concern about this approach:
> - How much memory would be allocated for this ? a whole framebuffer can be
> big for just this use case.

I'dd expect a whole framebuffer for the current
configuration/resolution. It would be typically 4MB for a full-HD system
which isn't a lot really and I guess we can always add an option to
disable the mechanism if needed.

> - I find it risky to completely reconfigure the hardware in a panic handler.

I would expect to only change the format and base address of the
framebuffer. I guess it can fail, but it doesn't seem that different to
the async plane update we already have and works well.

> Also how many drivers would need this ?
> 
> Currently I was mostly considering x86 platform, so:
> 
> simpledrm/ast/mgag200 which works well with the get_scanout_buffer().
> 
> i915/amdgpu/nouveau, which are quite complex, and will need to do their own
> thing anyway.

I guess we're not entirely aligned there then. I would expect that
mechanism to work with any atomic KMS driver. You are right that i915,
amdgpu and nouveau are special enough that some extra internal plumbing
is going to be required, but I'd expect it to be easy to support with
any other driver for a memory-mapped device.

I guess what I'm trying to say is, even though it's totally fine that
you only support those drivers at first, supporting in vc4 for example
shouldn't require to rewrite the whole thing.

> > > I'm more in favor of an emergency function, that each driver has to
> > > implement, and use what the hardware can do to display a simple frame
> > > quickly. get_scanout_buffer() is a good start for simple driver, but
> > > will need refactoring for the more complex case, like adding a
> > > callback to write pixels one by one, if there is no memory mapped
> > > buffer available.
> > 
> > Sorry, I'm not quite sure what you mean there, where would you write the
> > pixel to?
> 
> It was mentioned by Noralf, about the amdgpu driver:
> 
> https://lore.kernel.org/dri-devel/d233c376-ed07-2127-6084-8292d313dac7@amd.com/
> 
> They have a slow "debug interface" that you can write to, and can be used
> for the panic handler. It's not memory mapped, so you have to write pixels
> one by one.
> 
> So for the struct drm_scanout_buffer, I can add a function pointer to a
> write_pixel(u32 x, u32 y, u32 color)
> So if the iosys map is null, it will use that instead.

It would be nice to support indeed, but it's a fairly rare feature afaik
so I'd rather focus on getting something that can work for everyone first

Maxime

Jocelyn Falempe Oct. 9, 2023, 2:05 p.m. UTC | #12

On 09/10/2023 13:33, Maxime Ripard wrote:
> On Mon, Oct 09, 2023 at 11:48:29AM +0200, Jocelyn Falempe wrote:
>> On 09/10/2023 10:28, Maxime Ripard wrote:
>>> Hi,
>>>
>>> On Mon, Oct 09, 2023 at 09:47:49AM +0200, Jocelyn Falempe wrote:
>>>> On 06/10/2023 18:54, Noralf Trønnes wrote:
>>>>>
>>>>>
>>>>> On 10/6/23 16:35, Maxime Ripard wrote:
>>>>>> Hi Jocelyn,
>>>>>>
>>>>>> On Thu, Oct 05, 2023 at 11:16:15AM +0200, Jocelyn Falempe wrote:
>>>>>>> On 05/10/2023 10:18, Maxime Ripard wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> On Tue, Oct 03, 2023 at 04:22:45PM +0200, Jocelyn Falempe wrote:
>>>>>>>>> diff --git a/include/drm/drm_drv.h b/include/drm/drm_drv.h
>>>>>>>>> index 89e2706cac56..e538c87116d3 100644
>>>>>>>>> --- a/include/drm/drm_drv.h
>>>>>>>>> +++ b/include/drm/drm_drv.h
>>>>>>>>> @@ -43,6 +43,7 @@ struct dma_buf_attachment;
>>>>>>>>>      struct drm_display_mode;
>>>>>>>>>      struct drm_mode_create_dumb;
>>>>>>>>>      struct drm_printer;
>>>>>>>>> +struct drm_scanout_buffer;
>>>>>>>>>      struct sg_table;
>>>>>>>>>      /**
>>>>>>>>> @@ -408,6 +409,19 @@ struct drm_driver {
>>>>>>>>>      	 */
>>>>>>>>>      	void (*show_fdinfo)(struct drm_printer *p, struct drm_file *f);
>>>>>>>>> +	/**
>>>>>>>>> +	 * @get_scanout_buffer:
>>>>>>>>> +	 *
>>>>>>>>> +	 * Get the current scanout buffer, to display a panic message with drm_panic.
>>>>>>>>> +	 * It is called from a panic callback, and must follow its restrictions.
>>>>>>>>> +	 *
>>>>>>>>> +	 * Returns:
>>>>>>>>> +	 *
>>>>>>>>> +	 * Zero on success, negative errno on failure.
>>>>>>>>> +	 */
>>>>>>>>> +	int (*get_scanout_buffer)(struct drm_device *dev,
>>>>>>>>> +				  struct drm_scanout_buffer *sb);
>>>>>>>>> +
>>>>>>>>
>>>>>>>> What is the format of that buffer? What is supposed to happen if the
>>>>>>>> planes / CRTC are setup in a way that is incompatible with the buffer
>>>>>>>> format?
>>>>>>>
>>>>>>> Currently, it only supports linear format, either in system memory, or
>>>>>>> iomem.
>>>>>>> But really what is needed is the screen size, and a way to write pixels to
>>>>>>> it.
>>>>>>> For more complex GPU, I don't know if it's easier to reprogram the GPU to
>>>>>>> linear format, or to add a simple "tiled" support to drm_panic.
>>>>>>> What would you propose as a panic interface to handle those complex format ?
>>>>>>
>>>>>> It's not just about tiling, but also about YUV formats. If the display
>>>>>> engine is currently playing a video at the moment, it's probably going
>>>>>> to output some variation of multi-planar YUV and you won't have an RGB
>>>>>> buffer available.
>>>>>>
>>>>>
>>>>> I had support for some YUV formats in my 2019 attempt on a panic
>>>>> handler[1] and I made a recording of a test run as well[2] (see 4:30 for
>>>>> YUV). There was a discussion about challenges and i915 can disable
>>>>> tiling by flipping a bit in a register[3] and AMD has a debug
>>>>> interface[4] they can use to write pixels.
>>>>
>>>> I only added support for the format used by simpledrm, because I don't want
>>>> to add support for all possible format if no driver are using it.
>>>
>>> Sure.
>>>
>>>> It should be possible to add YUV format too.
>>>>
>>>> I also prefer to convert only the foreground/background color, and then
>>>> write directly into the buffers, instead of converting line by line.
>>>> It works for all format where pixel size is a multiple of byte.
>>>
>>> My point was that there might not be a buffer to write to.
>>> DMA_ATTR_NO_KERNEL_MAPPING exists, dma-buf might be unaccessible, unsafe
>>> or be completely out of control of the kernel space, or even not be
>>> accessible by the system at all (when doing secure playback for example,
>>> where the "trusted" component will do the decoding and will only give
>>> back a dma-buf handle to a secure memory buffer).
>>>
>>> I appreciate that we probably don't want to address these scenarios
>>> right now, but we should have a path forward to support them eventually.
>>> Copying the panic handler content to the buffer is optimistic and won't
>>> work in all the scenarios described above, pretty much requiring to
>>> start from scratch that effort.
>>>
>>>>> https://lore.kernel.org/dri-devel/20190311174218.51899-1-noralf@tronnes.org/
>>>>> [2] https://youtu.be/lZ80vL4dgpE
>>>>> [3]
>>>>> https://lore.kernel.org/dri-devel/20190314095004.GP2665@phenom.ffwll.local/
>>>>> [4]
>>>>> https://lore.kernel.org/dri-devel/d233c376-ed07-2127-6084-8292d313dac7@amd.com/
>>>>>
>>>>>> Same story if you're using a dma-buf buffer. You might not even be able
>>>>>> to access that buffer at all from the CPU or the kernel.
>>>>>>
>>>>>> I really think we should have some emergency state ready to commit on
>>>>>> the side, and possibly a panic_commit function to prevent things like
>>>>>> sleeping or waiting that regular atomic_commit can use.
>>>>>>
>>>>>> That way, you know have all the resources available to you any time.
>>>>
>>>> I think reusing the atomic commit functions might be hard, because there are
>>>> locks/allocation/threads hidden in drivers callback.
>>>
>>> Allocations are bugs as far as I'm concerned. Allocations in
>>> atomic_commit path are pretty big hint that you're doing something wrong
>>> so I wouldn't worry too much about them. For locking, yeah... Which is
>>> why I was suggesting an emergency atomic_commit of some sorts (for
>>> planes only?). Switching back to whatever we were doing to an RGB plane
>>> should be simple enough for most drivers and probably can be done safely
>>> enough on most drivers without any locks.
>>>
>>> And you don't need to support all kinds of tiling, YUV or RGB variants.
>>
>> So if I understand correctly, drm_panic would pre-allocate a plane/commit,
>> and use that when a panic occurs ?
> 
> And have it checked already, yes. We would only wait for a panic to
> happen to pull the trigger on the commit.
> 
>> I have two concern about this approach:
>> - How much memory would be allocated for this ? a whole framebuffer can be
>> big for just this use case.
> 
> I'dd expect a whole framebuffer for the current
> configuration/resolution. It would be typically 4MB for a full-HD system
> which isn't a lot really and I guess we can always add an option to
> disable the mechanism if needed.

Ok, that's a bit big, but if there is no other way.

> 
>> - I find it risky to completely reconfigure the hardware in a panic handler.
> 
> I would expect to only change the format and base address of the
> framebuffer. I guess it can fail, but it doesn't seem that different to
> the async plane update we already have and works well.
> 
In this case it can work, but by using generic drm api, it's hard to 
know what the driver will do.


>> Also how many drivers would need this ?
>>
>> Currently I was mostly considering x86 platform, so:
>>
>> simpledrm/ast/mgag200 which works well with the get_scanout_buffer().
>>
>> i915/amdgpu/nouveau, which are quite complex, and will need to do their own
>> thing anyway.
> 
> I guess we're not entirely aligned there then. I would expect that
> mechanism to work with any atomic KMS driver. You are right that i915,
> amdgpu and nouveau are special enough that some extra internal plumbing
> is going to be required, but I'd expect it to be easy to support with
> any other driver for a memory-mapped device.
> 
> I guess what I'm trying to say is, even though it's totally fine that
> you only support those drivers at first, supporting in vc4 for example
> shouldn't require to rewrite the whole thing.

Would that work for you to put that in a drm_panic_helper.c,
so that drivers can opt-in ?

So the driver can call a drm_panic_helper_prepare_commit() at 
initialization, and then in the get_scanout_buffer() function, run the 
atomic_update() on it, and return this commit's framebuffer ?

That way each driver have a better control on what the panic handler 
will do.
It can even call directly its internal functions, to avoid the locks of 
the drm generic functions, and make sure it will only change the format 
and base address.
That's a bit more work for each driver, but should be more reliable I think.

Maxime Ripard Oct. 9, 2023, 4:07 p.m. UTC | #13

On Mon, Oct 09, 2023 at 04:05:19PM +0200, Jocelyn Falempe wrote:
> > > - I find it risky to completely reconfigure the hardware in a panic handler.
> > 
> > I would expect to only change the format and base address of the
> > framebuffer. I guess it can fail, but it doesn't seem that different to
> > the async plane update we already have and works well.
> > 
> In this case it can work, but by using generic drm api, it's hard to know
> what the driver will do.

We should document extensively what we expect drivers to do in those
hooks, and possibly call cant_sleep() in the framework function to have
some reporting at least.

> > > Also how many drivers would need this ?
> > > 
> > > Currently I was mostly considering x86 platform, so:
> > > 
> > > simpledrm/ast/mgag200 which works well with the get_scanout_buffer().
> > > 
> > > i915/amdgpu/nouveau, which are quite complex, and will need to do their own
> > > thing anyway.
> > 
> > I guess we're not entirely aligned there then. I would expect that
> > mechanism to work with any atomic KMS driver. You are right that i915,
> > amdgpu and nouveau are special enough that some extra internal plumbing
> > is going to be required, but I'd expect it to be easy to support with
> > any other driver for a memory-mapped device.
> > 
> > I guess what I'm trying to say is, even though it's totally fine that
> > you only support those drivers at first, supporting in vc4 for example
> > shouldn't require to rewrite the whole thing.
> 
> Would that work for you to put that in a drm_panic_helper.c,
> so that drivers can opt-in ?
> 
> So the driver can call a drm_panic_helper_prepare_commit() at
> initialization, and then in the get_scanout_buffer() function

If we have a full blown commit with a new framebuffer, why do we need
get_scanout_buffer? It should be either the framebuffer itself, or in
the plane state if you have a conversion.

> run the atomic_update() on it, and return this commit's framebuffer ?
> 
> That way each driver have a better control on what the panic handler will
> do.
> It can even call directly its internal functions, to avoid the locks of the
> drm generic functions, and make sure it will only change the format and base
> address.
> That's a bit more work for each driver, but should be more reliable I think.

I don't think that better control there is a good idea, it's a path that
won't get tested much so we'd be better off not allowing drivers to
deviate too much from the "ideal" design.

What I had in mind is something like:

  - Add a panic hook in drm_mode_config_funcs, with a
    drm_atomic_helper_panic helper;

  - Provide an atomic_panic hook or something in drm_plane_helper_funcs;

  - If they are set, we register the drm_panic handler;

  - The handler will call drm_mode_config_funcs.panic, which will take
    its prepared state, fill the framebuffer it allocated with the
    penguin and backtrace, call drm_plane_helper_funcs.atomic_panic().

  - The driver now updates the format and fb address.

  - Halt and catch fire

Does that make sense?
Maxime

Jocelyn Falempe Oct. 10, 2023, 7:55 a.m. UTC | #14

On 09/10/2023 18:07, Maxime Ripard wrote:
> On Mon, Oct 09, 2023 at 04:05:19PM +0200, Jocelyn Falempe wrote:
>>>> - I find it risky to completely reconfigure the hardware in a panic handler.
>>>
>>> I would expect to only change the format and base address of the
>>> framebuffer. I guess it can fail, but it doesn't seem that different to
>>> the async plane update we already have and works well.
>>>
>> In this case it can work, but by using generic drm api, it's hard to know
>> what the driver will do.
> 
> We should document extensively what we expect drivers to do in those
> hooks, and possibly call cant_sleep() in the framework function to have
> some reporting at least.
> 
>>>> Also how many drivers would need this ?
>>>>
>>>> Currently I was mostly considering x86 platform, so:
>>>>
>>>> simpledrm/ast/mgag200 which works well with the get_scanout_buffer().
>>>>
>>>> i915/amdgpu/nouveau, which are quite complex, and will need to do their own
>>>> thing anyway.
>>>
>>> I guess we're not entirely aligned there then. I would expect that
>>> mechanism to work with any atomic KMS driver. You are right that i915,
>>> amdgpu and nouveau are special enough that some extra internal plumbing
>>> is going to be required, but I'd expect it to be easy to support with
>>> any other driver for a memory-mapped device.
>>>
>>> I guess what I'm trying to say is, even though it's totally fine that
>>> you only support those drivers at first, supporting in vc4 for example
>>> shouldn't require to rewrite the whole thing.
>>
>> Would that work for you to put that in a drm_panic_helper.c,
>> so that drivers can opt-in ?
>>
>> So the driver can call a drm_panic_helper_prepare_commit() at
>> initialization, and then in the get_scanout_buffer() function
> 
> If we have a full blown commit with a new framebuffer, why do we need
> get_scanout_buffer? It should be either the framebuffer itself, or in
> the plane state if you have a conversion.
> 
>> run the atomic_update() on it, and return this commit's framebuffer ?
>>
>> That way each driver have a better control on what the panic handler will
>> do.
>> It can even call directly its internal functions, to avoid the locks of the
>> drm generic functions, and make sure it will only change the format and base
>> address.
>> That's a bit more work for each driver, but should be more reliable I think.
> 
> I don't think that better control there is a good idea, it's a path that
> won't get tested much so we'd be better off not allowing drivers to
> deviate too much from the "ideal" design.
> 
> What I had in mind is something like:
> 
>    - Add a panic hook in drm_mode_config_funcs, with a
>      drm_atomic_helper_panic helper;
> 
>    - Provide an atomic_panic hook or something in drm_plane_helper_funcs;
> 
>    - If they are set, we register the drm_panic handler;
> 
>    - The handler will call drm_mode_config_funcs.panic, which will take
>      its prepared state, fill the framebuffer it allocated with the
>      penguin and backtrace, call drm_plane_helper_funcs.atomic_panic().
> 
>    - The driver now updates the format and fb address.
> 
>    - Halt and catch fire
> 
> Does that make sense?

Yes, I will do some experiment with that, and see if I can make it work 
this way.
If possible I still want to have a way for simple drivers like 
simpledrm/mgag200 to not allocate a full framebuffer.

> Maxime

Maxime Ripard Oct. 10, 2023, 8:30 a.m. UTC | #15

On Tue, Oct 10, 2023 at 09:55:46AM +0200, Jocelyn Falempe wrote:
> On 09/10/2023 18:07, Maxime Ripard wrote:
> > On Mon, Oct 09, 2023 at 04:05:19PM +0200, Jocelyn Falempe wrote:
> > > > > - I find it risky to completely reconfigure the hardware in a panic handler.
> > > > 
> > > > I would expect to only change the format and base address of the
> > > > framebuffer. I guess it can fail, but it doesn't seem that different to
> > > > the async plane update we already have and works well.
> > > > 
> > > In this case it can work, but by using generic drm api, it's hard to know
> > > what the driver will do.
> > 
> > We should document extensively what we expect drivers to do in those
> > hooks, and possibly call cant_sleep() in the framework function to have
> > some reporting at least.
> > 
> > > > > Also how many drivers would need this ?
> > > > > 
> > > > > Currently I was mostly considering x86 platform, so:
> > > > > 
> > > > > simpledrm/ast/mgag200 which works well with the get_scanout_buffer().
> > > > > 
> > > > > i915/amdgpu/nouveau, which are quite complex, and will need to do their own
> > > > > thing anyway.
> > > > 
> > > > I guess we're not entirely aligned there then. I would expect that
> > > > mechanism to work with any atomic KMS driver. You are right that i915,
> > > > amdgpu and nouveau are special enough that some extra internal plumbing
> > > > is going to be required, but I'd expect it to be easy to support with
> > > > any other driver for a memory-mapped device.
> > > > 
> > > > I guess what I'm trying to say is, even though it's totally fine that
> > > > you only support those drivers at first, supporting in vc4 for example
> > > > shouldn't require to rewrite the whole thing.
> > > 
> > > Would that work for you to put that in a drm_panic_helper.c,
> > > so that drivers can opt-in ?
> > > 
> > > So the driver can call a drm_panic_helper_prepare_commit() at
> > > initialization, and then in the get_scanout_buffer() function
> > 
> > If we have a full blown commit with a new framebuffer, why do we need
> > get_scanout_buffer? It should be either the framebuffer itself, or in
> > the plane state if you have a conversion.
> > 
> > > run the atomic_update() on it, and return this commit's framebuffer ?
> > > 
> > > That way each driver have a better control on what the panic handler will
> > > do.
> > > It can even call directly its internal functions, to avoid the locks of the
> > > drm generic functions, and make sure it will only change the format and base
> > > address.
> > > That's a bit more work for each driver, but should be more reliable I think.
> > 
> > I don't think that better control there is a good idea, it's a path that
> > won't get tested much so we'd be better off not allowing drivers to
> > deviate too much from the "ideal" design.
> > 
> > What I had in mind is something like:
> > 
> >    - Add a panic hook in drm_mode_config_funcs, with a
> >      drm_atomic_helper_panic helper;
> > 
> >    - Provide an atomic_panic hook or something in drm_plane_helper_funcs;
> > 
> >    - If they are set, we register the drm_panic handler;
> > 
> >    - The handler will call drm_mode_config_funcs.panic, which will take
> >      its prepared state, fill the framebuffer it allocated with the
> >      penguin and backtrace, call drm_plane_helper_funcs.atomic_panic().
> > 
> >    - The driver now updates the format and fb address.
> > 
> >    - Halt and catch fire
> > 
> > Does that make sense?
> 
> Yes, I will do some experiment with that, and see if I can make it
> work this way.

Thanks :)

> If possible I still want to have a way for simple drivers like
> simpledrm/mgag200 to not allocate a full framebuffer.

I guess the split isn't going to be whether the driver is simple or not,
but whether it always has access to the buffer used by the scanout.

Like for simpledrm, we have that guarantee so it makes sense. For other,
if we allow "direct" dma-buf, it's game over and we just can't.

Maxime

Thomas Zimmermann Oct. 10, 2023, 8:55 a.m. UTC | #16

Hi

Am 09.10.23 um 13:33 schrieb Maxime Ripard:
[...]
>>> And you don't need to support all kinds of tiling, YUV or RGB variants.

We should indeed not use YUV at all. For RGB, we already have plenty of 
conversion code from XRGB8888-to-<whatever>, so we are more flexible there.

>>
>> So if I understand correctly, drm_panic would pre-allocate a plane/commit,
>> and use that when a panic occurs ?
> 
> And have it checked already, yes. We would only wait for a panic to
> happen to pull the trigger on the commit.
> 
>> I have two concern about this approach:
>> - How much memory would be allocated for this ? a whole framebuffer can be
>> big for just this use case.

As I outlined in my email at [1], there are a number of different 
scenarios. The question of atomic state and commits is entirely separate 
from the DRM panic handler. We should not throw them together. Whatever 
is necessary is get a scanout buffer, should happen on the driver side 
of get_scanout_buffer, not on the drm_panic side.

[1] 
https://lore.kernel.org/dri-devel/39bd4c35-8a61-42ee-8675-ccea4f5d4ac6@suse.de/T/#m22f116e9438e00a5f0a9dc43987d4153424f8be1

> 
> I'dd expect a whole framebuffer for the current
> configuration/resolution. It would be typically 4MB for a full-HD system
> which isn't a lot really and I guess we can always add an option to
> disable the mechanism if needed.
> 
>> - I find it risky to completely reconfigure the hardware in a panic handler.
> 
> I would expect to only change the format and base address of the
> framebuffer. I guess it can fail, but it doesn't seem that different to
> the async plane update we already have and works well.

The one thing I don't understand is: Why should we use atomic commits in 
the first place? It doesn't make sense for firmware-based drivers. In 
some drivers, even the simple ast, we hold locks during the regular 
commit. Trying to run the panic commit concurrently will likely give a 
deadlock.

In the end it's a per-driver decision, but in most cases, the driver can 
easily switch to a default mode with some ad-hoc code.

Best regards
Thomas

> 
>> Also how many drivers would need this ?
>>
>> Currently I was mostly considering x86 platform, so:
>>
>> simpledrm/ast/mgag200 which works well with the get_scanout_buffer().
>>
>> i915/amdgpu/nouveau, which are quite complex, and will need to do their own
>> thing anyway.
> 
> I guess we're not entirely aligned there then. I would expect that
> mechanism to work with any atomic KMS driver. You are right that i915,
> amdgpu and nouveau are special enough that some extra internal plumbing
> is going to be required, but I'd expect it to be easy to support with
> any other driver for a memory-mapped device.
> 
> I guess what I'm trying to say is, even though it's totally fine that
> you only support those drivers at first, supporting in vc4 for example
> shouldn't require to rewrite the whole thing.
> 
>>>> I'm more in favor of an emergency function, that each driver has to
>>>> implement, and use what the hardware can do to display a simple frame
>>>> quickly. get_scanout_buffer() is a good start for simple driver, but
>>>> will need refactoring for the more complex case, like adding a
>>>> callback to write pixels one by one, if there is no memory mapped
>>>> buffer available.
>>>
>>> Sorry, I'm not quite sure what you mean there, where would you write the
>>> pixel to?
>>
>> It was mentioned by Noralf, about the amdgpu driver:
>>
>> https://lore.kernel.org/dri-devel/d233c376-ed07-2127-6084-8292d313dac7@amd.com/
>>
>> They have a slow "debug interface" that you can write to, and can be used
>> for the panic handler. It's not memory mapped, so you have to write pixels
>> one by one.
>>
>> So for the struct drm_scanout_buffer, I can add a function pointer to a
>> write_pixel(u32 x, u32 y, u32 color)
>> So if the iosys map is null, it will use that instead.
> 
> It would be nice to support indeed, but it's a fairly rare feature afaik
> so I'd rather focus on getting something that can work for everyone first
> 
> Maxime

Thomas Zimmermann Oct. 10, 2023, 9:04 a.m. UTC | #17

Hi

Am 09.10.23 um 18:07 schrieb Maxime Ripard:
> On Mon, Oct 09, 2023 at 04:05:19PM +0200, Jocelyn Falempe wrote:
>>>> - I find it risky to completely reconfigure the hardware in a panic handler.
>>>
>>> I would expect to only change the format and base address of the
>>> framebuffer. I guess it can fail, but it doesn't seem that different to
>>> the async plane update we already have and works well.
>>>
>> In this case it can work, but by using generic drm api, it's hard to know
>> what the driver will do.
> 
> We should document extensively what we expect drivers to do in those
> hooks, and possibly call cant_sleep() in the framework function to have
> some reporting at least.
> 
>>>> Also how many drivers would need this ?
>>>>
>>>> Currently I was mostly considering x86 platform, so:
>>>>
>>>> simpledrm/ast/mgag200 which works well with the get_scanout_buffer().
>>>>
>>>> i915/amdgpu/nouveau, which are quite complex, and will need to do their own
>>>> thing anyway.
>>>
>>> I guess we're not entirely aligned there then. I would expect that
>>> mechanism to work with any atomic KMS driver. You are right that i915,
>>> amdgpu and nouveau are special enough that some extra internal plumbing
>>> is going to be required, but I'd expect it to be easy to support with
>>> any other driver for a memory-mapped device.
>>>
>>> I guess what I'm trying to say is, even though it's totally fine that
>>> you only support those drivers at first, supporting in vc4 for example
>>> shouldn't require to rewrite the whole thing.
>>
>> Would that work for you to put that in a drm_panic_helper.c,
>> so that drivers can opt-in ?
>>
>> So the driver can call a drm_panic_helper_prepare_commit() at
>> initialization, and then in the get_scanout_buffer() function
> 
> If we have a full blown commit with a new framebuffer, why do we need
> get_scanout_buffer? It should be either the framebuffer itself, or in
> the plane state if you have a conversion.

We also have discussions about kexec/kdump support. Here we'd need to 
retrieve the scanout address, forward it to the kexec kernel and put 
simpledrm onto that framebuffer until the regular driver takes over. An 
interface like get_scanout_buffer will be helpful for this use case. So 
it makes sense to use it for the panic handler as well.

> 
>> run the atomic_update() on it, and return this commit's framebuffer ?
>>
>> That way each driver have a better control on what the panic handler will
>> do.
>> It can even call directly its internal functions, to avoid the locks of the
>> drm generic functions, and make sure it will only change the format and base
>> address.
>> That's a bit more work for each driver, but should be more reliable I think.
> 
> I don't think that better control there is a good idea, it's a path that
> won't get tested much so we'd be better off not allowing drivers to
> deviate too much from the "ideal" design.
> 
> What I had in mind is something like:
> 
>    - Add a panic hook in drm_mode_config_funcs, with a
>      drm_atomic_helper_panic helper;
> 
>    - Provide an atomic_panic hook or something in drm_plane_helper_funcs;
> 
>    - If they are set, we register the drm_panic handler;
> 
>    - The handler will call drm_mode_config_funcs.panic, which will take
>      its prepared state, fill the framebuffer it allocated with the
>      penguin and backtrace, call drm_plane_helper_funcs.atomic_panic().
> 
>    - The driver now updates the format and fb address.
> 
>    - Halt and catch fire
> 
> Does that make sense?

Please see my other replies. I find this fragile, and unnecessary for 
cases where there already is a working scanout buffer in place. It's 
something a driver could implement internally to provide a scanout 
buffer if none has been set up already.

Best regards
Thomas

> Maxime

Maxime Ripard Oct. 10, 2023, 9:25 a.m. UTC | #18

On Tue, Oct 10, 2023 at 10:55:09AM +0200, Thomas Zimmermann wrote:
> > > So if I understand correctly, drm_panic would pre-allocate a plane/commit,
> > > and use that when a panic occurs ?
> > 
> > And have it checked already, yes. We would only wait for a panic to
> > happen to pull the trigger on the commit.
> > 
> > > I have two concern about this approach:
> > > - How much memory would be allocated for this ? a whole framebuffer can be
> > > big for just this use case.
> 
> As I outlined in my email at [1], there are a number of different scenarios.
> The question of atomic state and commits is entirely separate from the DRM
> panic handler. We should not throw them together. Whatever is necessary is
> get a scanout buffer, should happen on the driver side of
> get_scanout_buffer, not on the drm_panic side.
> 
> [1] https://lore.kernel.org/dri-devel/39bd4c35-8a61-42ee-8675-ccea4f5d4ac6@suse.de/T/#m22f116e9438e00a5f0a9dc43987d4153424f8be1
> 
> > 
> > I'dd expect a whole framebuffer for the current
> > configuration/resolution. It would be typically 4MB for a full-HD system
> > which isn't a lot really and I guess we can always add an option to
> > disable the mechanism if needed.
> > 
> > > - I find it risky to completely reconfigure the hardware in a panic handler.
> > 
> > I would expect to only change the format and base address of the
> > framebuffer. I guess it can fail, but it doesn't seem that different to
> > the async plane update we already have and works well.
> 
> The one thing I don't understand is: Why should we use atomic commits in the
> first place? It doesn't make sense for firmware-based drivers.

Because this is generic infrastructure that is valuable for any drivers
and not only firmware-based drivers?

> In some drivers, even the simple ast, we hold locks during the regular
> commit. Trying to run the panic commit concurrently will likely give a
> deadlock.

You're in the middle of a panic. Don't take any locks and you won't deadlock.

> In the end it's a per-driver decision, but in most cases, the driver can
> easily switch to a default mode with some ad-hoc code.

When was the last time a per-driver decision has been a good thing? I'm
sorry, but the get_scanout_buffer approach buffer won't work for any
driver out there.

I'm fine with discussing alternatives if you don't like the ones I
suggested, but they must allow the panic handler infrastructure to work
with any driver we have, not just 4.

Maxime

Maxime Ripard Oct. 10, 2023, 9:33 a.m. UTC | #19

On Tue, Oct 10, 2023 at 11:04:33AM +0200, Thomas Zimmermann wrote:
> Am 09.10.23 um 18:07 schrieb Maxime Ripard:
> > On Mon, Oct 09, 2023 at 04:05:19PM +0200, Jocelyn Falempe wrote:
> > > > > - I find it risky to completely reconfigure the hardware in a panic handler.
> > > > 
> > > > I would expect to only change the format and base address of the
> > > > framebuffer. I guess it can fail, but it doesn't seem that different to
> > > > the async plane update we already have and works well.
> > > > 
> > > In this case it can work, but by using generic drm api, it's hard to know
> > > what the driver will do.
> > 
> > We should document extensively what we expect drivers to do in those
> > hooks, and possibly call cant_sleep() in the framework function to have
> > some reporting at least.
> > 
> > > > > Also how many drivers would need this ?
> > > > > 
> > > > > Currently I was mostly considering x86 platform, so:
> > > > > 
> > > > > simpledrm/ast/mgag200 which works well with the get_scanout_buffer().
> > > > > 
> > > > > i915/amdgpu/nouveau, which are quite complex, and will need to do their own
> > > > > thing anyway.
> > > > 
> > > > I guess we're not entirely aligned there then. I would expect that
> > > > mechanism to work with any atomic KMS driver. You are right that i915,
> > > > amdgpu and nouveau are special enough that some extra internal plumbing
> > > > is going to be required, but I'd expect it to be easy to support with
> > > > any other driver for a memory-mapped device.
> > > > 
> > > > I guess what I'm trying to say is, even though it's totally fine that
> > > > you only support those drivers at first, supporting in vc4 for example
> > > > shouldn't require to rewrite the whole thing.
> > > 
> > > Would that work for you to put that in a drm_panic_helper.c,
> > > so that drivers can opt-in ?
> > > 
> > > So the driver can call a drm_panic_helper_prepare_commit() at
> > > initialization, and then in the get_scanout_buffer() function
> > 
> > If we have a full blown commit with a new framebuffer, why do we need
> > get_scanout_buffer? It should be either the framebuffer itself, or in
> > the plane state if you have a conversion.
> 
> We also have discussions about kexec/kdump support. Here we'd need to
> retrieve the scanout address, forward it to the kexec kernel and put
> simpledrm onto that framebuffer until the regular driver takes over.

Generically speaking, there's strictly no guarantee that the current
scanout buffer is accessible by the CPU. It's not even a driver issue,
it's a workload issue, so it will affect 100% of the times some users,
and some 0% of the time.

> An interface like get_scanout_buffer will be helpful for this use
> case. So it makes sense to use it for the panic handler as well.

It won't be helpful because the vast majority of the ARM drivers (and
thus the vast majority of the KMS drivers) won't be able to implement it
properly and reliably.

> > > run the atomic_update() on it, and return this commit's framebuffer ?
> > > 
> > > That way each driver have a better control on what the panic handler will
> > > do.
> > > It can even call directly its internal functions, to avoid the locks of the
> > > drm generic functions, and make sure it will only change the format and base
> > > address.
> > > That's a bit more work for each driver, but should be more reliable I think.
> > 
> > I don't think that better control there is a good idea, it's a path that
> > won't get tested much so we'd be better off not allowing drivers to
> > deviate too much from the "ideal" design.
> > 
> > What I had in mind is something like:
> > 
> >    - Add a panic hook in drm_mode_config_funcs, with a
> >      drm_atomic_helper_panic helper;
> > 
> >    - Provide an atomic_panic hook or something in drm_plane_helper_funcs;
> > 
> >    - If they are set, we register the drm_panic handler;
> > 
> >    - The handler will call drm_mode_config_funcs.panic, which will take
> >      its prepared state, fill the framebuffer it allocated with the
> >      penguin and backtrace, call drm_plane_helper_funcs.atomic_panic().
> > 
> >    - The driver now updates the format and fb address.
> > 
> >    - Halt and catch fire
> > 
> > Does that make sense?
> 
> Please see my other replies. I find this fragile, and unnecessary for cases
> where there already is a working scanout buffer in place.

And please see my other replies. Depending on a working scanout buffer
in place being accessible by the CPU is incredibly limiting. Ignoring it
when I'm pointing that out won't get us to a solution acceptable for
everyone.

> It's something a driver could implement internally to provide a
> scanout buffer if none has been set up already.

Some drivers need extra resources when setting up a plane. VC4 for
example need for every plane to allocate multiple internal SRAM buffers.
Just allocating an extra framebuffer won't cut it.

This is tied to the state so far, so I guess we would need to allocate a
new state. Oh, and if we have several drivers that need to allocate that
extra framebuffer and state, we could make it part of the core or turn
it into a helper?

Maxime

Noralf Trønnes Oct. 10, 2023, 11:29 a.m. UTC | #20

On 10/10/23 11:25, Maxime Ripard wrote:
> 
> 
> On Tue, Oct 10, 2023 at 10:55:09AM +0200, Thomas Zimmermann wrote:
>>>> So if I understand correctly, drm_panic would pre-allocate a plane/commit,
>>>> and use that when a panic occurs ?
>>>
>>> And have it checked already, yes. We would only wait for a panic to
>>> happen to pull the trigger on the commit.
>>>
>>>> I have two concern about this approach:
>>>> - How much memory would be allocated for this ? a whole framebuffer can be
>>>> big for just this use case.
>>
>> As I outlined in my email at [1], there are a number of different scenarios.
>> The question of atomic state and commits is entirely separate from the DRM
>> panic handler. We should not throw them together. Whatever is necessary is
>> get a scanout buffer, should happen on the driver side of
>> get_scanout_buffer, not on the drm_panic side.
>>
>> [1] https://lore.kernel.org/dri-devel/39bd4c35-8a61-42ee-8675-ccea4f5d4ac6@suse.de/T/#m22f116e9438e00a5f0a9dc43987d4153424f8be1
>>
>>>
>>> I'dd expect a whole framebuffer for the current
>>> configuration/resolution. It would be typically 4MB for a full-HD system
>>> which isn't a lot really and I guess we can always add an option to
>>> disable the mechanism if needed.
>>>
>>>> - I find it risky to completely reconfigure the hardware in a panic handler.
>>>
>>> I would expect to only change the format and base address of the
>>> framebuffer. I guess it can fail, but it doesn't seem that different to
>>> the async plane update we already have and works well.
>>
>> The one thing I don't understand is: Why should we use atomic commits in the
>> first place? It doesn't make sense for firmware-based drivers.
> 
> Because this is generic infrastructure that is valuable for any drivers
> and not only firmware-based drivers?
> 
>> In some drivers, even the simple ast, we hold locks during the regular
>> commit. Trying to run the panic commit concurrently will likely give a
>> deadlock.
> 
> You're in the middle of a panic. Don't take any locks and you won't deadlock.
> 
>> In the end it's a per-driver decision, but in most cases, the driver can
>> easily switch to a default mode with some ad-hoc code.
> 
> When was the last time a per-driver decision has been a good thing? I'm
> sorry, but the get_scanout_buffer approach buffer won't work for any
> driver out there.
> 
> I'm fine with discussing alternatives if you don't like the ones I
> suggested, but they must allow the panic handler infrastructure to work
> with any driver we have, not just 4.
> 

Why can't we use the model[1] suggested by Daniel using a draw_pixel
callback giving drivers full control on how they can put a pixel on the
display?

This will even work for the AMD debug interface.
In the linear CPU accessible buffer case, we can provide a helper for
that, maybe we can do helpers for other common cases as well.

Adding to that we would need a panic_setup/enter and panic_teardown/exit
callback.

Noralf.

[1]
https://lore.kernel.org/dri-devel/20160810091529.GQ6232@phenom.ffwll.local/

Maxime Ripard Oct. 10, 2023, 12:15 p.m. UTC | #21

On Tue, Oct 10, 2023 at 01:29:52PM +0200, Noralf Trønnes wrote:
> 
> 
> On 10/10/23 11:25, Maxime Ripard wrote:
> > 
> > 
> > On Tue, Oct 10, 2023 at 10:55:09AM +0200, Thomas Zimmermann wrote:
> >>>> So if I understand correctly, drm_panic would pre-allocate a plane/commit,
> >>>> and use that when a panic occurs ?
> >>>
> >>> And have it checked already, yes. We would only wait for a panic to
> >>> happen to pull the trigger on the commit.
> >>>
> >>>> I have two concern about this approach:
> >>>> - How much memory would be allocated for this ? a whole framebuffer can be
> >>>> big for just this use case.
> >>
> >> As I outlined in my email at [1], there are a number of different scenarios.
> >> The question of atomic state and commits is entirely separate from the DRM
> >> panic handler. We should not throw them together. Whatever is necessary is
> >> get a scanout buffer, should happen on the driver side of
> >> get_scanout_buffer, not on the drm_panic side.
> >>
> >> [1] https://lore.kernel.org/dri-devel/39bd4c35-8a61-42ee-8675-ccea4f5d4ac6@suse.de/T/#m22f116e9438e00a5f0a9dc43987d4153424f8be1
> >>
> >>>
> >>> I'dd expect a whole framebuffer for the current
> >>> configuration/resolution. It would be typically 4MB for a full-HD system
> >>> which isn't a lot really and I guess we can always add an option to
> >>> disable the mechanism if needed.
> >>>
> >>>> - I find it risky to completely reconfigure the hardware in a panic handler.
> >>>
> >>> I would expect to only change the format and base address of the
> >>> framebuffer. I guess it can fail, but it doesn't seem that different to
> >>> the async plane update we already have and works well.
> >>
> >> The one thing I don't understand is: Why should we use atomic commits in the
> >> first place? It doesn't make sense for firmware-based drivers.
> > 
> > Because this is generic infrastructure that is valuable for any drivers
> > and not only firmware-based drivers?
> > 
> >> In some drivers, even the simple ast, we hold locks during the regular
> >> commit. Trying to run the panic commit concurrently will likely give a
> >> deadlock.
> > 
> > You're in the middle of a panic. Don't take any locks and you won't deadlock.
> > 
> >> In the end it's a per-driver decision, but in most cases, the driver can
> >> easily switch to a default mode with some ad-hoc code.
> > 
> > When was the last time a per-driver decision has been a good thing? I'm
> > sorry, but the get_scanout_buffer approach buffer won't work for any
> > driver out there.
> > 
> > I'm fine with discussing alternatives if you don't like the ones I
> > suggested, but they must allow the panic handler infrastructure to work
> > with any driver we have, not just 4.
> > 
> 
> Why can't we use the model[1] suggested by Daniel using a draw_pixel
> callback giving drivers full control on how they can put a pixel on the
> display?

I share kind of the same general ideas/conclusions: "qthe idea is that
all the fb selection and lookup is handled in shared code (and with
proper locking, but only for atomic drivers)."

2016 is a bit old though and multiple developments happened since
(secure playback is a thing now, framebuffer compression too), so I
still think that their expectation that the framebuffer is accessible to
/ writable by the CPU no longer holds true.

> This will even work for the AMD debug interface.
> In the linear CPU accessible buffer case, we can provide a helper for
> that, maybe we can do helpers for other common cases as well.

Yeah, their idea of a panic_draw would work great for that.

> Adding to that we would need a panic_setup/enter and panic_teardown/exit
> callback.

What for?

Maxime

Daniel Vetter Oct. 10, 2023, 12:59 p.m. UTC | #22

On Tue, Oct 10, 2023 at 02:15:47PM +0200, Maxime Ripard wrote:
> On Tue, Oct 10, 2023 at 01:29:52PM +0200, Noralf Trønnes wrote:
> > 
> > 
> > On 10/10/23 11:25, Maxime Ripard wrote:
> > > 
> > > 
> > > On Tue, Oct 10, 2023 at 10:55:09AM +0200, Thomas Zimmermann wrote:
> > >>>> So if I understand correctly, drm_panic would pre-allocate a plane/commit,
> > >>>> and use that when a panic occurs ?
> > >>>
> > >>> And have it checked already, yes. We would only wait for a panic to
> > >>> happen to pull the trigger on the commit.
> > >>>
> > >>>> I have two concern about this approach:
> > >>>> - How much memory would be allocated for this ? a whole framebuffer can be
> > >>>> big for just this use case.
> > >>
> > >> As I outlined in my email at [1], there are a number of different scenarios.
> > >> The question of atomic state and commits is entirely separate from the DRM
> > >> panic handler. We should not throw them together. Whatever is necessary is
> > >> get a scanout buffer, should happen on the driver side of
> > >> get_scanout_buffer, not on the drm_panic side.
> > >>
> > >> [1] https://lore.kernel.org/dri-devel/39bd4c35-8a61-42ee-8675-ccea4f5d4ac6@suse.de/T/#m22f116e9438e00a5f0a9dc43987d4153424f8be1
> > >>
> > >>>
> > >>> I'dd expect a whole framebuffer for the current
> > >>> configuration/resolution. It would be typically 4MB for a full-HD system
> > >>> which isn't a lot really and I guess we can always add an option to
> > >>> disable the mechanism if needed.
> > >>>
> > >>>> - I find it risky to completely reconfigure the hardware in a panic handler.
> > >>>
> > >>> I would expect to only change the format and base address of the
> > >>> framebuffer. I guess it can fail, but it doesn't seem that different to
> > >>> the async plane update we already have and works well.
> > >>
> > >> The one thing I don't understand is: Why should we use atomic commits in the
> > >> first place? It doesn't make sense for firmware-based drivers.
> > > 
> > > Because this is generic infrastructure that is valuable for any drivers
> > > and not only firmware-based drivers?
> > > 
> > >> In some drivers, even the simple ast, we hold locks during the regular
> > >> commit. Trying to run the panic commit concurrently will likely give a
> > >> deadlock.
> > > 
> > > You're in the middle of a panic. Don't take any locks and you won't deadlock.
> > > 
> > >> In the end it's a per-driver decision, but in most cases, the driver can
> > >> easily switch to a default mode with some ad-hoc code.
> > > 
> > > When was the last time a per-driver decision has been a good thing? I'm
> > > sorry, but the get_scanout_buffer approach buffer won't work for any
> > > driver out there.
> > > 
> > > I'm fine with discussing alternatives if you don't like the ones I
> > > suggested, but they must allow the panic handler infrastructure to work
> > > with any driver we have, not just 4.
> > > 
> > 
> > Why can't we use the model[1] suggested by Daniel using a draw_pixel
> > callback giving drivers full control on how they can put a pixel on the
> > display?
> 
> I share kind of the same general ideas/conclusions: "qthe idea is that
> all the fb selection and lookup is handled in shared code (and with
> proper locking, but only for atomic drivers)."
> 
> 2016 is a bit old though and multiple developments happened since
> (secure playback is a thing now, framebuffer compression too), so I
> still think that their expectation that the framebuffer is accessible to
> / writable by the CPU no longer holds true.

I think largely it should still be ok, because the idea behind the draw_xy
callback was that the driver could take care of anything special, like
- tiling
- clearing compression bits so that just writing the raw pixels works (if
  your compression format allows for that)
- handling any differences in how the pixels need to be written, like
  cache flushing, mmio_write vs normal memory, amd also has peek/poke
  registers to be able to write even into unmappable memory

What would probably be a good idea is to do an s/void */uinptr_t/ over my
interface proposal, or maybe an even more opaque cookie since really the
only thing you can do is get the void * that ->panic_vmap returns and pass
it into ->panic_draw_xy.

I thought (but I didn't dig through details) that the panic fb struct is
essentially meant to serve this purpose in the current series?

> > This will even work for the AMD debug interface.
> > In the linear CPU accessible buffer case, we can provide a helper for
> > that, maybe we can do helpers for other common cases as well.
> 
> Yeah, their idea of a panic_draw would work great for that.
> 
> > Adding to that we would need a panic_setup/enter and panic_teardown/exit
> > callback.
> 
> What for?

So panic teardown would be for testing in CI, to make it non-destructive
and clean up anything panic_vmap (or _enter or whatever you call it) has
done. I thought John Oggness was also looking into how the new panic
handlers/consoles could be made testable in the panic paths, including
lock stealing and getting called from nmi.
-Sima

Thomas Zimmermann Oct. 10, 2023, 1:05 p.m. UTC | #23

Hi

Am 10.10.23 um 11:33 schrieb Maxime Ripard:
[...]
>> We also have discussions about kexec/kdump support. Here we'd need to
>> retrieve the scanout address, forward it to the kexec kernel and put
>> simpledrm onto that framebuffer until the regular driver takes over.
> 
> Generically speaking, there's strictly no guarantee that the current
> scanout buffer is accessible by the CPU. It's not even a driver issue,
> it's a workload issue, so it will affect 100% of the times some users,
> and some 0% of the time.

And I'd be OK with that. It's all best effort anyway.

> 
>> An interface like get_scanout_buffer will be helpful for this use
>> case. So it makes sense to use it for the panic handler as well.
> 
> It won't be helpful because the vast majority of the ARM drivers (and
> thus the vast majority of the KMS drivers) won't be able to implement it
> properly and reliably.

The barrier for firmware-based drivers is low: a pre-configured display 
and mmap'able framebuffer memory. And it's assumed that a callback for 
kexec would attempt to configure hardware accordingly. If a system 
really cannot provide this, so be it.

> 
>>>> run the atomic_update() on it, and return this commit's framebuffer ?
>>>>
>>>> That way each driver have a better control on what the panic handler will
>>>> do.
>>>> It can even call directly its internal functions, to avoid the locks of the
>>>> drm generic functions, and make sure it will only change the format and base
>>>> address.
>>>> That's a bit more work for each driver, but should be more reliable I think.
>>>
>>> I don't think that better control there is a good idea, it's a path that
>>> won't get tested much so we'd be better off not allowing drivers to
>>> deviate too much from the "ideal" design.
>>>
>>> What I had in mind is something like:
>>>
>>>     - Add a panic hook in drm_mode_config_funcs, with a
>>>       drm_atomic_helper_panic helper;
>>>
>>>     - Provide an atomic_panic hook or something in drm_plane_helper_funcs;
>>>
>>>     - If they are set, we register the drm_panic handler;
>>>
>>>     - The handler will call drm_mode_config_funcs.panic, which will take
>>>       its prepared state, fill the framebuffer it allocated with the
>>>       penguin and backtrace, call drm_plane_helper_funcs.atomic_panic().
>>>
>>>     - The driver now updates the format and fb address.
>>>
>>>     - Halt and catch fire
>>>
>>> Does that make sense?
>>
>> Please see my other replies. I find this fragile, and unnecessary for cases
>> where there already is a working scanout buffer in place.
> 
> And please see my other replies. Depending on a working scanout buffer
> in place being accessible by the CPU is incredibly limiting. Ignoring it
> when I'm pointing that out won't get us to a solution acceptable for
> everyone.
> 
>> It's something a driver could implement internally to provide a
>> scanout buffer if none has been set up already.
> 
> Some drivers need extra resources when setting up a plane. VC4 for
> example need for every plane to allocate multiple internal SRAM buffers.
> Just allocating an extra framebuffer won't cut it.
> 
> This is tied to the state so far, so I guess we would need to allocate a
> new state. Oh, and if we have several drivers that need to allocate that
> extra framebuffer and state, we could make it part of the core or turn
> it into a helper?

I mentioned that even the simple drivers hold locks during the atomic 
commit. Some of the drivers use vmap/vunmap, which might make problems 
as well. It's used in the context of damage handling, which also makes 
no sense here. So the regular atomic helpers most likely won't work.

It sounds to me as if you're essentially asking for something like a 
flush or sync operation.

Stepping back from get_scanout_buffer for a moment and adopting your 
proposal from above, I can see something like this working:

   - the driver provides a panic callback in struct drm_driver

   - DRM registers a panic handler to invokes the callback

   - the panic callback has a scanout buffer from somewhere (currently 
active, prepared, from firmware, plain memory, etc)

   - we provide a helper that fills the scanout buffer with information

   - the driver then updates the hardware from/with the scanout buffer; 
details depend on the hardware

The final step is like a flush or commit operation. To give some 
examples: The simple drivers of this patchset probably don't have to do 
anything. Drivers with command/packet queues could attempt to send the 
necessary commands to the device.

Best regards
Thomas

> 
> Maxime

Thomas Zimmermann Oct. 10, 2023, 1:24 p.m. UTC | #24

Hi

Am 10.10.23 um 14:59 schrieb Daniel Vetter:
[...]
>>>>
>>>
>>> Why can't we use the model[1] suggested by Daniel using a draw_pixel
>>> callback giving drivers full control on how they can put a pixel on the
>>> display?
>>
>> I share kind of the same general ideas/conclusions: "qthe idea is that
>> all the fb selection and lookup is handled in shared code (and with
>> proper locking, but only for atomic drivers)."
>>
>> 2016 is a bit old though and multiple developments happened since
>> (secure playback is a thing now, framebuffer compression too), so I
>> still think that their expectation that the framebuffer is accessible to
>> / writable by the CPU no longer holds true.
> 
> I think largely it should still be ok, because the idea behind the draw_xy
> callback was that the driver could take care of anything special, like
> - tiling
> - clearing compression bits so that just writing the raw pixels works (if
>    your compression format allows for that)
> - handling any differences in how the pixels need to be written, like
>    cache flushing, mmio_write vs normal memory, amd also has peek/poke
>    registers to be able to write even into unmappable memory
> 
> What would probably be a good idea is to do an s/void */uinptr_t/ over my
> interface proposal, or maybe an even more opaque cookie since really the
> only thing you can do is get the void * that ->panic_vmap returns and pass
> it into ->panic_draw_xy.
> 
> I thought (but I didn't dig through details) that the panic fb struct is
> essentially meant to serve this purpose in the current series?

I have one detail about the code: While working on the format-conversion 
code, I've always found struct drm_framebuffer to be clunky to work 
with. It's build around the idea of managing GEM buffers, but not 
accessing them.

So I've been thinking about something like a drm_pixmap that contains 
size, color format and data pointers in one place. Reads and writes 
would be callbacks. It could abstract access to any kind of buffer. 
IIRC Jocelyn had something like that in the very first patchset. or 
maybe fb_pixmap could be repurposed.

Best regards
Thomas

> 
>>> This will even work for the AMD debug interface.
>>> In the linear CPU accessible buffer case, we can provide a helper for
>>> that, maybe we can do helpers for other common cases as well.
>>
>> Yeah, their idea of a panic_draw would work great for that.
>>
>>> Adding to that we would need a panic_setup/enter and panic_teardown/exit
>>> callback.
>>
>> What for?
> 
> So panic teardown would be for testing in CI, to make it non-destructive
> and clean up anything panic_vmap (or _enter or whatever you call it) has
> done. I thought John Oggness was also looking into how the new panic
> handlers/consoles could be made testable in the panic paths, including
> lock stealing and getting called from nmi.
> -Sima

Jocelyn Falempe Oct. 10, 2023, 1:24 p.m. UTC | #25

On 10/10/2023 14:59, Daniel Vetter wrote:
> On Tue, Oct 10, 2023 at 02:15:47PM +0200, Maxime Ripard wrote:
>> On Tue, Oct 10, 2023 at 01:29:52PM +0200, Noralf Trønnes wrote:
>>>
>>>
>>> On 10/10/23 11:25, Maxime Ripard wrote:
>>>>
>>>>
>>>> On Tue, Oct 10, 2023 at 10:55:09AM +0200, Thomas Zimmermann wrote:
>>>>>>> So if I understand correctly, drm_panic would pre-allocate a plane/commit,
>>>>>>> and use that when a panic occurs ?
>>>>>>
>>>>>> And have it checked already, yes. We would only wait for a panic to
>>>>>> happen to pull the trigger on the commit.
>>>>>>
>>>>>>> I have two concern about this approach:
>>>>>>> - How much memory would be allocated for this ? a whole framebuffer can be
>>>>>>> big for just this use case.
>>>>>
>>>>> As I outlined in my email at [1], there are a number of different scenarios.
>>>>> The question of atomic state and commits is entirely separate from the DRM
>>>>> panic handler. We should not throw them together. Whatever is necessary is
>>>>> get a scanout buffer, should happen on the driver side of
>>>>> get_scanout_buffer, not on the drm_panic side.
>>>>>
>>>>> [1] https://lore.kernel.org/dri-devel/39bd4c35-8a61-42ee-8675-ccea4f5d4ac6@suse.de/T/#m22f116e9438e00a5f0a9dc43987d4153424f8be1
>>>>>
>>>>>>
>>>>>> I'dd expect a whole framebuffer for the current
>>>>>> configuration/resolution. It would be typically 4MB for a full-HD system
>>>>>> which isn't a lot really and I guess we can always add an option to
>>>>>> disable the mechanism if needed.
>>>>>>
>>>>>>> - I find it risky to completely reconfigure the hardware in a panic handler.
>>>>>>
>>>>>> I would expect to only change the format and base address of the
>>>>>> framebuffer. I guess it can fail, but it doesn't seem that different to
>>>>>> the async plane update we already have and works well.
>>>>>
>>>>> The one thing I don't understand is: Why should we use atomic commits in the
>>>>> first place? It doesn't make sense for firmware-based drivers.
>>>>
>>>> Because this is generic infrastructure that is valuable for any drivers
>>>> and not only firmware-based drivers?
>>>>
>>>>> In some drivers, even the simple ast, we hold locks during the regular
>>>>> commit. Trying to run the panic commit concurrently will likely give a
>>>>> deadlock.
>>>>
>>>> You're in the middle of a panic. Don't take any locks and you won't deadlock.
>>>>
>>>>> In the end it's a per-driver decision, but in most cases, the driver can
>>>>> easily switch to a default mode with some ad-hoc code.
>>>>
>>>> When was the last time a per-driver decision has been a good thing? I'm
>>>> sorry, but the get_scanout_buffer approach buffer won't work for any
>>>> driver out there.
>>>>
>>>> I'm fine with discussing alternatives if you don't like the ones I
>>>> suggested, but they must allow the panic handler infrastructure to work
>>>> with any driver we have, not just 4.
>>>>
>>>
>>> Why can't we use the model[1] suggested by Daniel using a draw_pixel
>>> callback giving drivers full control on how they can put a pixel on the
>>> display?
>>
>> I share kind of the same general ideas/conclusions: "qthe idea is that
>> all the fb selection and lookup is handled in shared code (and with
>> proper locking, but only for atomic drivers)."
>>
>> 2016 is a bit old though and multiple developments happened since
>> (secure playback is a thing now, framebuffer compression too), so I
>> still think that their expectation that the framebuffer is accessible to
>> / writable by the CPU no longer holds true.
> 
> I think largely it should still be ok, because the idea behind the draw_xy
> callback was that the driver could take care of anything special, like
> - tiling
> - clearing compression bits so that just writing the raw pixels works (if
>    your compression format allows for that)
> - handling any differences in how the pixels need to be written, like
>    cache flushing, mmio_write vs normal memory, amd also has peek/poke
>    registers to be able to write even into unmappable memory
> 
> What would probably be a good idea is to do an s/void */uinptr_t/ over my
> interface proposal, or maybe an even more opaque cookie since really the
> only thing you can do is get the void * that ->panic_vmap returns and pass
> it into ->panic_draw_xy.
> 
> I thought (but I didn't dig through details) that the panic fb struct is
> essentially meant to serve this purpose in the current series?
> 

Yes, in this series there is a struct drm_scanout_buffer, that the 
driver has to provide when a panic occurs:

struct drm_scanout_buffer {
	const struct drm_format_info *format;
	struct iosys_map map;
	unsigned int pitch;
	unsigned int width;
	unsigned int height;
};

That works well for CPU accessible, linear format.
It should be possible to support YUV, or simple tiling with that, but 
for the more complex case, I proposed to add a draw_pixel() callback.

>>> This will even work for the AMD debug interface.
>>> In the linear CPU accessible buffer case, we can provide a helper for
>>> that, maybe we can do helpers for other common cases as well.
>>
>> Yeah, their idea of a panic_draw would work great for that.
>>
>>> Adding to that we would need a panic_setup/enter and panic_teardown/exit
>>> callback.
>>
>> What for?
> 
> So panic teardown would be for testing in CI, to make it non-destructive
> and clean up anything panic_vmap (or _enter or whatever you call it) has
> done. I thought John Oggness was also looking into how the new panic
> handlers/consoles could be made testable in the panic paths, including
> lock stealing and getting called from nmi.

Thanks, I was also wondering why a panic teardown would be necessary, 
since after the panic handler has run, the system will probably halt. 
(or maybe run kdump and reboot).

> -Sima

Jocelyn Falempe Oct. 10, 2023, 1:32 p.m. UTC | #26

On 10/10/2023 15:05, Thomas Zimmermann wrote:
> Hi
> 
> Am 10.10.23 um 11:33 schrieb Maxime Ripard:
> [...]
>>> We also have discussions about kexec/kdump support. Here we'd need to
>>> retrieve the scanout address, forward it to the kexec kernel and put
>>> simpledrm onto that framebuffer until the regular driver takes over.
>>
>> Generically speaking, there's strictly no guarantee that the current
>> scanout buffer is accessible by the CPU. It's not even a driver issue,
>> it's a workload issue, so it will affect 100% of the times some users,
>> and some 0% of the time.
> 
> And I'd be OK with that. It's all best effort anyway.
> 
>>
>>> An interface like get_scanout_buffer will be helpful for this use
>>> case. So it makes sense to use it for the panic handler as well.
>>
>> It won't be helpful because the vast majority of the ARM drivers (and
>> thus the vast majority of the KMS drivers) won't be able to implement it
>> properly and reliably.
> 
> The barrier for firmware-based drivers is low: a pre-configured display 
> and mmap'able framebuffer memory. And it's assumed that a callback for 
> kexec would attempt to configure hardware accordingly. If a system 
> really cannot provide this, so be it.
> 
>>
>>>>> run the atomic_update() on it, and return this commit's framebuffer ?
>>>>>
>>>>> That way each driver have a better control on what the panic 
>>>>> handler will
>>>>> do.
>>>>> It can even call directly its internal functions, to avoid the 
>>>>> locks of the
>>>>> drm generic functions, and make sure it will only change the format 
>>>>> and base
>>>>> address.
>>>>> That's a bit more work for each driver, but should be more reliable 
>>>>> I think.
>>>>
>>>> I don't think that better control there is a good idea, it's a path 
>>>> that
>>>> won't get tested much so we'd be better off not allowing drivers to
>>>> deviate too much from the "ideal" design.
>>>>
>>>> What I had in mind is something like:
>>>>
>>>>     - Add a panic hook in drm_mode_config_funcs, with a
>>>>       drm_atomic_helper_panic helper;
>>>>
>>>>     - Provide an atomic_panic hook or something in 
>>>> drm_plane_helper_funcs;
>>>>
>>>>     - If they are set, we register the drm_panic handler;
>>>>
>>>>     - The handler will call drm_mode_config_funcs.panic, which will 
>>>> take
>>>>       its prepared state, fill the framebuffer it allocated with the
>>>>       penguin and backtrace, call 
>>>> drm_plane_helper_funcs.atomic_panic().
>>>>
>>>>     - The driver now updates the format and fb address.
>>>>
>>>>     - Halt and catch fire
>>>>
>>>> Does that make sense?
>>>
>>> Please see my other replies. I find this fragile, and unnecessary for 
>>> cases
>>> where there already is a working scanout buffer in place.
>>
>> And please see my other replies. Depending on a working scanout buffer
>> in place being accessible by the CPU is incredibly limiting. Ignoring it
>> when I'm pointing that out won't get us to a solution acceptable for
>> everyone.
>>
>>> It's something a driver could implement internally to provide a
>>> scanout buffer if none has been set up already.
>>
>> Some drivers need extra resources when setting up a plane. VC4 for
>> example need for every plane to allocate multiple internal SRAM buffers.
>> Just allocating an extra framebuffer won't cut it.
>>
>> This is tied to the state so far, so I guess we would need to allocate a
>> new state. Oh, and if we have several drivers that need to allocate that
>> extra framebuffer and state, we could make it part of the core or turn
>> it into a helper?
> 
> I mentioned that even the simple drivers hold locks during the atomic 
> commit. Some of the drivers use vmap/vunmap, which might make problems 
> as well. It's used in the context of damage handling, which also makes 
> no sense here. So the regular atomic helpers most likely won't work.
> 
> It sounds to me as if you're essentially asking for something like a 
> flush or sync operation.
> 
> Stepping back from get_scanout_buffer for a moment and adopting your 
> proposal from above, I can see something like this working:
> 
>    - the driver provides a panic callback in struct drm_driver
> 
>    - DRM registers a panic handler to invokes the callback
> 
>    - the panic callback has a scanout buffer from somewhere (currently 
> active, prepared, from firmware, plain memory, etc)
> 
>    - we provide a helper that fills the scanout buffer with information
> 
>    - the driver then updates the hardware from/with the scanout buffer; 
> details depend on the hardware
> 
> The final step is like a flush or commit operation. To give some 
> examples: The simple drivers of this patchset probably don't have to do 
> anything. Drivers with command/packet queues could attempt to send the 
> necessary commands to the device.

Yes that was the second question in the cover letter.
Some drivers probably want a flush_scanout_buffer() to send the 
framebuffer to the hardware.
That callback can be in struct drm_driver as well.

[v4,2/4] drm/panic: Add a drm panic handler

Commit Message

Comments

Patch