[RFC,00/16] drm/rockchip: Rockchip EBC ("E-Book Controller") display driver

Message ID: 20220413221916.50995-1-samuel@sholland.org

Message

Samuel Holland April 13, 2022, 10:19 p.m. UTC
This series adds a DRM driver for the electrophoretic display controller
found in a few different Rockchip SoCs, specifically the RK3566/RK3568
variant[0] used by the PineNote tablet[1].

This is my first real involvement with the DRM subsystem, so please let
me know where I am misunderstanding things.

This is now the second SoC-integrated EPD controller with a DRM driver
submission -- the first one being the i.MX6 EPDC[2]. I want to thank
Andreas for sending that series, and for his advice while writing this
driver.

One goal I have with sending this series is to discuss how to support
EPDs more generally within the DRM subsystem, so the interfaces with
panels and PMICs and waveform LUTs can be controller-independent.

My understanding is that the i.MX6 EPDC series is at least partly based
on the downstream vendor driver. This driver is a clean-sheet design for
hardware with different (read: fewer) capabilities, so we took some
different design paths, but we ran into many of the same sharp edges.

Here are some of the areas I would like input on:

Panel Lifecycle
===============
Panels use prepare/unprepare callbacks for their power supply. EPDs
should only be powered up when the display contents are changed. Should
the controller call both drm_panel_(un)prepare during each atomic update
when the framebuffer is dirty?

Similarly, panel enable/disable callbacks are tied to backlight state.
For an EPD, it makes sense to have the backlight enabled while the panel
is powered down (because the contents are static). Is it acceptable to
call drm_panel_{en,dis}able while the panel is not prepared?

With panel_bridge, the "normal" callback ordering is enforced, and tied
to the atomic state, so neither of these is possible.

As a result, neither the backlight nor the voltage regulators are tied
to the panel. The panel's regulators are consumed by the EBC itself.
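
For illustration, a minimal sketch of what calling the panel hooks from
the atomic update path could look like. The function and helper names
here (rockchip_ebc_atomic_update, plane_to_ebc, plane_fb_dirty,
ebc_refresh) are assumptions for the sketch, not what the driver
actually does:

static void rockchip_ebc_atomic_update(struct drm_plane *plane,
                                       struct drm_atomic_state *state)
{
        struct rockchip_ebc *ebc = plane_to_ebc(plane);

        if (!plane_fb_dirty(plane, state))
                return;

        drm_panel_prepare(ebc->panel);   /* power up the panel rails */
        ebc_refresh(ebc, state);         /* drive the waveform out */
        drm_panel_unprepare(ebc->panel); /* power down again */
}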

Panel Timing Parameters
=======================
EPDs have more timing parameters than LCDs, and there are several
different ways of labeling these parameters. See for example the timing
diagrams on pp. 2237-2239 of the RK3568 TRM[0], the descriptions in the
ED103TC2 panel datasheet[3], and the submitted EPDC bindings[2].

Both the EPDC and EBC vendor drivers put all of the timing parameters in
the controller's OF node. There is no panel device/node.

I was able to squeeze everything needed for my specific case into a
struct drm_display_mode (see patches 5 and 14), but I don't know if this
is an acceptable use of those fields, or if it will work with other
controllers. Is adding more fields to drm_display_mode an option?

See also the discussion of "dumb" LCD TCONs below.
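
For illustration, a sketch of how such timings might be squeezed into
struct drm_display_mode. The 1872x1404 resolution matches the ED103TC2
class of panels, but the clock, porch, and sync values below are
placeholders, not the real datasheet numbers:

/* Placeholder values only -- not the real ED103TC2 timings. */
static const struct drm_display_mode epd_mode_sketch = {
        .clock       = 40000,              /* pixel clock, in kHz */
        .hdisplay    = 1872,               /* pixels, not bus clock cycles */
        .hsync_start = 1872 + 18,          /* + horizontal front porch */
        .hsync_end   = 1872 + 18 + 2,      /* + sync width (e.g. SPH) */
        .htotal      = 1872 + 18 + 2 + 30, /* + back porch */
        .vdisplay    = 1404,
        .vsync_start = 1404 + 4,
        .vsync_end   = 1404 + 4 + 1,
        .vtotal      = 1404 + 4 + 1 + 6,
};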

Panel Connector Type / Media Bus Format
=======================================
The EBC supports either an 8-bit or 16-bit wide data bus, where each
pair of data lines represents the source driver polarity (positive,
negative, or neutral) for a pixel.

The only effect of the data bus width is the number of pixels that are
transferred per clock cycle. It has no impact on the number of possible
grayscale levels.

How does that translate to DRM_MODE_CONNECTOR_* or MEDIA_BUS_FMT_*?

Panel Reflection
================
The ED103TC2 panel scans from right to left. Currently, there is no API
or OF property to represent this. I can add something similar to
drm_panel_orientation.

Should this be exposed to userspace? Is it acceptable for the kernel
driver to flip the image when blitting from the framebuffer?
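
For reference, the kernel-side flip is cheap, since the blit only has to
walk the source row backwards. A rough sketch with made-up names,
assuming one byte per grayscale pixel:

static void blit_row_reflected(u8 *dst, const u8 *src, unsigned int width)
{
        unsigned int x;

        /* Mirror horizontally by reading the source row backwards. */
        for (x = 0; x < width; x++)
                dst[x] = src[width - 1 - x];
}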

CRTC "active" and "enabled" states
==================================
What do these states mean in the context of an EPD? Currently, the
driver ignores "active" and clears the screen to solid white when the
CRTC is disabled.

The vendor drivers can switch to a user-provided image when the CRTC is
disabled. Is this something that can/should be supported upstream? If
so, how? Would userspace provide the image to the kernel, or just tell
the kernel not to clear the screen?

VBLANK Events and Asynchronous Commits
======================================
When should the VBLANK event complete? When the pixels have been blitted
to the kernel's shadow buffer? When the first frame of the waveform is
sent to the panel? When the last frame is sent to the panel?

Currently, the driver is taking the first option, letting
drm_atomic_helper_fake_vblank() send the VBLANK event without waiting on
the refresh thread. This is the only way I was able to get good
performance with existing userspace.
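
Concretely, with the standard atomic helpers this falls out of never
calling drm_vblank_init(): the helpers then set crtc_state->no_vblank,
and drm_atomic_helper_fake_vblank() completes the event. A sketch of a
matching commit tail (this mirrors the default helper flow; the
function name is illustrative):

static void ebc_commit_tail(struct drm_atomic_state *state)
{
        struct drm_device *dev = state->dev;

        drm_atomic_helper_commit_modeset_disables(dev, state);
        drm_atomic_helper_commit_planes(dev, state, 0);
        drm_atomic_helper_commit_modeset_enables(dev, state);

        /* Sends the VBLANK event now, for CRTCs with no_vblank set,
         * without waiting on the refresh thread. */
        drm_atomic_helper_fake_vblank(state);

        drm_atomic_helper_commit_hw_done(state);
        drm_atomic_helper_cleanup_planes(dev, state);
}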

Waveform Loading
================
Waveform files are calibrated for each batch of panels. So while a
single waveform file may be "good enough" for all panels of a certain
model, the correctly-calibrated file will have better image quality.

I don't know of a good way to choose the calibrated file. Even the
board's compatible string may not be specific enough, if the board is
manufactured with multiple batches of panels.

Maybe the filename should just be the panel compatible, and the user is
responsible for putting the right file there? In that case, how should I
get the compatible string from the panel_bridge? Traverse the OF graph
myself?
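
For what it's worth, the traversal itself would be short. A hedged
sketch, where the port/endpoint indices and the firmware path scheme
are assumptions:

static int ebc_request_waveform(struct device *dev,
                                const struct firmware **fw)
{
        struct device_node *panel;
        const char *compatible;
        char name[64];
        int ret;

        /* Assumed: the panel is at port 0, endpoint 0 of the EBC. */
        panel = of_graph_get_remote_node(dev->of_node, 0, 0);
        if (!panel)
                return -ENODEV;

        ret = of_property_read_string(panel, "compatible", &compatible);
        of_node_put(panel);
        if (ret)
                return ret;

        /* Assumed path scheme: rockchip/<panel compatible>.wbf */
        snprintf(name, sizeof(name), "rockchip/%s.wbf", compatible);

        return request_firmware(fw, name, dev);
}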

There is also the issue that different controllers need the waveform
data in different formats. ".wbf" appears to be the format provided by
PVI/eInk, the panel manufacturer. The Rockchip EBC hardware expects a
single waveform in a flat array, so the driver has to extract/decompress
that from the .wbf file (this is done in patch 1). On the other hand,
the i.MX EPDC expects a ".wrf" file containing multiple waveforms[8].

I propose that the waveform file on disk should always be what was
shipped with the panel -- the .wbf file -- and any extracting or
reformatting is done in the kernel.

Waveform Selection From Userspace
=================================
EPDs use different waveforms for different purposes: high-quality
grayscale vs. monochrome text vs. dithered monochrome video. How can
userspace select which waveform to use? Should this be a plane property?

It is also likely that userspace will want to use different waveforms at
the same time for different parts of the screen, for example a fast
monochrome waveform for the drawing area of a note-taking app, but a
grayscale waveform for surrounding UI and window manager.

I believe the i.MX6 EPDC supports multiple planes, each with their own
waveform choice. That seems like a good abstraction, but the EBC only
supports one plane in hardware. So using this abstraction with the EBC
would require blending pixels and doing waveform lookups in software.
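
If this did become a plane property, the generic property API already
covers the mechanics. A sketch, where the property name and the
waveform list are assumptions (GC16/DU/A2 are common eInk waveform
names, used here for illustration):

static const struct drm_prop_enum_list waveform_names[] = {
        { 0, "GC16" }, /* 16-level grayscale */
        { 1, "DU" },   /* fast monochrome */
        { 2, "A2" },   /* dithered monochrome video */
};

static int ebc_attach_waveform_prop(struct drm_plane *plane)
{
        struct drm_property *prop;

        prop = drm_property_create_enum(plane->dev, 0, "waveform",
                                        waveform_names,
                                        ARRAY_SIZE(waveform_names));
        if (!prop)
                return -ENOMEM;

        drm_object_attach_property(&plane->base, prop, 0);
        return 0;
}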

Blitting/Blending in Software
=============================
There are multiple layers to this topic (pun slightly intended):
 1) Today's userspace does not expect a grayscale framebuffer.
    Currently, the driver advertises XRGB8888 and converts to Y4
    in software. This seems to match other drivers (e.g. repaper).
    (A conversion sketch follows this list.)

 2) Ignoring what userspace "wants", the closest existing format is
    DRM_FORMAT_R8. Geert sent a series[4] adding DRM_FORMAT_R1 through
    DRM_FORMAT_R4 (patch 9), which I believe are the "correct" formats
    to use.

 3) The RK356x SoCs have an "RGA" hardware block that can do the
    RGB-to-grayscale conversion, and also RGB-to-dithered-monochrome
    which is needed for animation/video. Currently this is exposed with
    a V4L2 platform driver. Can this be inserted into the pipeline in a
    way that is transparent to userspace? Or must some userspace library
    be responsible for setting up the RGA => EBC pipeline?

 4) Supporting multiple planes (for multiple concurrent waveforms)
    implies blending in software. Is that acceptable?

 5) Thoughts on SIMD-optimized blitting and waveform lookup functions?

 6) Currently the driver uses kmalloc() and dma_sync_single_for_device()
    for its buffers, because it needs both fast reads and fast writes to
    several of them. Maybe cma_alloc() or dma_alloc_from_contiguous()
    would be more appropriate, but I don't see any drivers using those
    directly.
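
To make items 1 and 2 concrete, here is the kind of XRGB8888-to-Y4
conversion involved: a plain integer luma approximation, packing two
pixels per byte. This is an illustration, not the driver's actual blit
routine:

static u8 xrgb_to_gray(u32 px)
{
        u8 r = px >> 16, g = px >> 8, b = px;

        /* Rough integer approximation of Rec. 601 luma. */
        return (3 * r + 6 * g + b) / 10;
}

/* Convert a row of XRGB8888 to packed Y4; assumes an even pixel count. */
static void xrgb8888_to_y4(u8 *dst, const u32 *src, unsigned int pixels)
{
        unsigned int i;

        for (i = 0; i < pixels; i += 2) {
                u8 y0 = xrgb_to_gray(src[i]) >> 4;
                u8 y1 = xrgb_to_gray(src[i + 1]) >> 4;

                *dst++ = y0 | (y1 << 4);
        }
}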

EPDs connected to "dumb" LCD TCONs
==================================
This topic is mostly related to my first patch. Some boards exist that
hook up an EPD to a normal LCD TCON, not a dedicated EPD controller. For
example, there's the reMarkable 2[5] and some PocketBook models[6][7].

I have some concerns about this:
 1) If we put EPD panel timings in panel drivers (e.g. panel-simple),
    can the same timings work with LCD TCONs and EPD controllers?

    For example: one cycle of the 16-bit data bus is "one pixel" to an
    LCD controller, but is "8 pixels" to an EPD controller. So there is
    a factor-of-8 difference in horizontal resolution depending on your
    perspective. Should we have the "number of pixel clock cycles" or
    "number of pixels" in .hdisplay/.htotal in the panel timings?

    Patch 14 adds a panel with "number of pixels" horizontal resolution,
    so the correct resolution is reported to userspace, but the existing
    eink_vb3300_kca_timing in panel-simple.c appears to use "number of
    pixel clocks" for its horizontal resolution. This makes the panel
    timing definitions incompatible across controllers.

 2) Using fbdev/fbcon with an EPD hooked up to an LCD TCON will have
    unintended consequences, and possibly damage the panel. Currently,
    there is no way to mark the framebuffer as expecting "source driver
    polarity waveforms" and not pixel data. Is there a specific
    DRM_FORMAT_* we should use for these cases to prevent accidental use
    by userspace?

    Or should we disallow this entirely, and have some wrapper layer to
    do the waveform lookups in kernelspace?

    I like the wrapper layer idea because it allows normal userspace and
    fbcon to work. It would not be much new code, especially since this
    driver already supports doing the whole pipeline in software. So
    that's why I wrote a separate helper library; I hope this code can
    be reused.
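
To make item 1 above concrete: with two data lines carrying one pixel's
source-driver polarity, a 16-bit bus moves 8 pixels per clock, so a
1872-pixel line takes only 1872 / 8 = 234 clock cycles from the
controller's perspective. A sketch of the conversion an EPD controller
driver would do if modes carried "number of pixels" (the helper name is
made up):

static unsigned int ebc_hactive_cycles(const struct drm_display_mode *mode,
                                       unsigned int bus_width)
{
        /* Two data lines carry one pixel's polarity per clock. */
        unsigned int pixels_per_cycle = bus_width / 2;

        return mode->hdisplay / pixels_per_cycle; /* 1872 / 8 = 234 */
}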

Thanks for any input!
Samuel

[0]: https://dl.radxa.com/rock3/docs/hw/datasheet/Rockchip%20RK3568%20TRM%20Part2%20V1.1-20210301.pdf
[1]: https://wiki.pine64.org/wiki/PineNote
[2]: https://lore.kernel.org/lkml/20220206080016.796556-1-andreas@kemnade.info/T/
[3]: https://files.pine64.org/doc/quartz64/Eink%20P-511-828-V1_ED103TC2%20Formal%20Spec%20V1.0_20190514.pdf
[4]: https://lore.kernel.org/lkml/cover.1646683502.git.geert@linux-m68k.org/T/
[5]: https://lore.kernel.org/lkml/20220330094126.30252-1-alistair@alistair23.me/T/
[6]: https://github.com/megous/linux/commits/pb-5.17
[7]: https://github.com/megous/linux/commit/3cdf84388959
[8]: https://github.com/fread-ink/inkwave


Samuel Holland (16):
  drm: Add a helper library for EPD drivers
  dt-bindings: display: rockchip: Add EBC binding
  drm/rockchip: Add EBC platform driver skeleton
  drm/rockchip: ebc: Add DRM driver skeleton
  drm/rockchip: ebc: Add CRTC mode setting
  drm/rockchip: ebc: Add CRTC refresh thread
  drm/rockchip: ebc: Add CRTC buffer management
  drm/rockchip: ebc: Add LUT loading
  drm/rockchip: ebc: Implement global refreshes
  drm/rockchip: ebc: Implement partial refreshes
  drm/rockchip: ebc: Enable diff mode for partial refreshes
  drm/rockchip: ebc: Add support for direct mode
  drm/rockchip: ebc: Add a panel reflection option
  drm/panel-simple: Add eInk ED103TC2
  arm64: dts: rockchip: rk356x: Add EBC node
  [DO NOT MERGE] arm64: dts: rockchip: pinenote: Enable EBC display
    pipeline

 .../display/rockchip/rockchip,rk3568-ebc.yaml |  106 ++
 .../boot/dts/rockchip/rk3566-pinenote.dtsi    |   80 +
 arch/arm64/boot/dts/rockchip/rk356x.dtsi      |   14 +
 drivers/gpu/drm/Kconfig                       |    6 +
 drivers/gpu/drm/Makefile                      |    4 +-
 drivers/gpu/drm/drm_epd_helper.c              |  663 +++++++
 drivers/gpu/drm/panel/panel-simple.c          |   31 +
 drivers/gpu/drm/rockchip/Kconfig              |   12 +
 drivers/gpu/drm/rockchip/Makefile             |    2 +
 drivers/gpu/drm/rockchip/rockchip_ebc.c       | 1586 +++++++++++++++++
 include/drm/drm_epd_helper.h                  |  104 ++
 11 files changed, 2607 insertions(+), 1 deletion(-)
 create mode 100644 Documentation/devicetree/bindings/display/rockchip/rockchip,rk3568-ebc.yaml
 create mode 100644 drivers/gpu/drm/drm_epd_helper.c
 create mode 100644 drivers/gpu/drm/rockchip/rockchip_ebc.c
 create mode 100644 include/drm/drm_epd_helper.h

Comments

Maxime Ripard April 14, 2022, 8:50 a.m. UTC | #1
On Wed, Apr 13, 2022 at 05:19:00PM -0500, Samuel Holland wrote:
> This series adds a DRM driver for the electrophoretic display controller
> found in a few different Rockchip SoCs, specifically the RK3566/RK3568
> variant[0] used by the PineNote tablet[1].
> 
> This is my first real involvement with the DRM subsystem, so please let
> me know where I am misunderstanding things.
> 
> This is now the second SoC-integrated EPD controller with a DRM driver
> submission -- the first one being the i.MX6 EPDC[2]. I want to thank
> Andreas for sending that series, and for his advice while writing this
> driver.
> 
> One goal I have with sending this series is to discuss how to support
> EPDs more generally within the DRM subsystem, so the interfaces with
> panels and PMICs and waveform LUTs can be controller-independent.
> 
> My understanding is that the i.MX6 EPDC series is at least partly based
> on the downstream vendor driver. This driver is a clean-sheet design for
> hardware with different (read: fewer) capabilities, so we took some
> different design paths, but we ran into many of the same sharp edges.
> 
> Here are some of the areas I would like input on:
> 
> Panel Lifecycle
> ===============
> Panels use prepare/unprepare callbacks for their power supply. EPDs
> should only be powered up when the display contents are changed. Should
> the controller call both drm_panel_(un)prepare during each atomic update
> when the framebuffer is dirty?
> 
> Similarly, panel enable/disable callbacks are tied to backlight state.
> For an EPD, it makes sense to have the backlight enabled while the panel
> is powered down (because the contents are static). Is it acceptable to
> call drm_panel_{en,dis}able while the panel is not prepared?
> 
> With panel_bridge, the "normal" callback ordering is enforced, and tied
> to the atomic state, so neither of these is possible.
> 
> As a result, neither the backlight nor the voltage regulators are tied
> to the panel. The panel's regulators are consumed by the EBC itself.

At least to manage the power state, that looks fairly similar to what we
have already to enter / exit from panel self refresh, so maybe we can
leverage that infrastructure?

And thus we would have something like enabling the backlight when we
prepare the panel, but only enable / disable the regulator when we exit
/ enter PSR mode?

Would that make sense?

> Panel Timing Parameters
> =======================
> EPDs have more timing parameters than LCDs, and there are several
> different ways of labeling these parameters. See for example the timing
> diagrams on pp. 2237-2239 of the RK3568 TRM[0], the descriptions in the
> ED103TC2 panel datasheet[3], and the submitted EPDC bindings[2].
> 
> Both the EPDC and EBC vendor drivers put all of the timing parameters in
> the controller's OF node. There is no panel device/node.
> 
> I was able to squeeze everything needed for my specific case into a
> struct drm_display_mode (see patches 5 and 14), but I don't know if this
> is an acceptable use of those fields, or if it will work with other
> controllers. Is adding more fields to drm_display_mode an option?
> 
> See also the discussion of "dumb" LCD TCONs below.

Reading that datasheet and patch series, it's not clear to me whether
it's just a set of generic parameters for E-ink displays, or if it's some
hardware-specific representation of those timings.

Generally speaking, drm_display_mode is an approximation of what the
timings are. The exact clock rate for example will be widely different
between RGB, HDMI or MIPI-DSI (with or without burst). I think that as
long as you can derive a drm_display_mode from those parameters, and can
infer those parameters from a drm_display_mode, you can definitely reuse
it.

> Panel Connector Type / Media Bus Format
> =======================================
> The EBC supports either an 8-bit or 16-bit wide data bus, where each
> pair of data lines represents the source driver polarity (positive,
> negative, or neutral) for a pixel.
> 
> The only effect of the data bus width is the number of pixels that are
> transferred per clock cycle. It has no impact on the number of possible
> grayscale levels.
> 
> How does that translate to DRM_MODE_CONNECTOR_* or MEDIA_BUS_FMT_*?

We'll probably want a separate connector mode, but you could add a
parameter on the OF-graph endpoint to set the media bus format.

> Panel Reflection
> ================
> The ED103TC2 panel scans from right to left. Currently, there is no API
> or OF property to represent this. I can add something similar to
> drm_panel_orientation.

Yeah, leveraging DRM_MODE_REFLECT_X into something similar to
drm_panel_orientation makes sense

> Should this be exposed to userspace? Is it acceptable for the kernel
> driver to flip the image when blitting from the framebuffer?

I'm not sure about whether or not we should expose it to userspace. I'd
say yes, but I'll leave it to others :)

> CRTC "active" and "enabled" states
> ==================================
> What do these states mean in the context of an EPD? Currently, the
> driver ignores "active" and clears the screen to solid white when the
> CRTC is disabled.
> 
> The vendor drivers can switch to a user-provided image when the CRTC is
> disabled. Is this something that can/should be supported upstream? If
> so, how? Would userspace provide the image to the kernel, or just tell
> the kernel not to clear the screen?

I think the semantics are that whenever the CRTC is disabled, the panel
is expected to be blank.

Leaving an image on after it's been disabled would have a bunch of
side-effects we probably don't want. For example, let's assume we have
that support, an application sets a "disabled image" and quits. Should
we leave the content on? If so, for how long exactly?

Either way, this is likely to be doable with PSR as well, so I think
it's a bit out of scope of this series for now.

> VBLANK Events and Asynchronous Commits
> ======================================
> When should the VBLANK event complete? When the pixels have been blitted
> to the kernel's shadow buffer? When the first frame of the waveform is
> sent to the panel? When the last frame is sent to the panel?
> 
> Currently, the driver is taking the first option, letting
> drm_atomic_helper_fake_vblank() send the VBLANK event without waiting on
> the refresh thread. This is the only way I was able to get good
> performance with existing userspace.

I've been having the same kind of discussions in private lately, so I'm
interested by the answer as well :)

It would be worth looking into the SPI/I2C panels for this, since it's
basically the same case.

> Waveform Loading
> ================
> Waveform files are calibrated for each batch of panels. So while a
> single waveform file may be "good enough" for all panels of a certain
> model, the correctly-calibrated file will have better image quality.
> 
> I don't know of a good way to choose the calibrated file. Even the
> board's compatible string may not be specific enough, if the board is
> manufactured with multiple batches of panels.
> 
> Maybe the filename should just be the panel compatible, and the user is
> responsible for putting the right file there? In that case, how should I
> get the compatible string from the panel_bridge? Traverse the OF graph
> myself?

It's not really clear to me what panel_bridge has to do with it? I'm
assuming that file has to be uploaded some way or another to the
encoder?

If so, yeah, you should just follow through the OF-graph and use the
panel compatible. We have a similar case already with panel-mipi-dbi
(even though it's standalone)

> There is also the issue that different controllers need the waveform
> data in different formats. ".wbf" appears to be the format provided by
> PVI/eInk, the panel manufacturer. The Rockchip EBC hardware expects a
> single waveform in a flat array, so the driver has to extract/decompress
> that from the .wbf file (this is done in patch 1). On the other hand,
> the i.MX EPDC expects a ".wrf" file containing multiple waveforms[8].
> 
> I propose that the waveform file on disk should always be what was
> shipped with the panel -- the .wbf file -- and any extracting or
> reformatting is done in the kernel.

Any kind of parsing in the kernel from a file you have no control over
always irks me :)

Why and how are those files different in the first place?

> Waveform Selection From Userspace
> =================================
> EPDs use different waveforms for different purposes: high-quality
> grayscale vs. monochrome text vs. dithered monochrome video. How can
> userspace select which waveform to use? Should this be a plane property?
>
> It is also likely that userspace will want to use different waveforms at
> the same time for different parts of the screen, for example a fast
> monochrome waveform for the drawing area of a note-taking app, but a
> grayscale waveform for surrounding UI and window manager.
> 
> I believe the i.MX6 EPDC supports multiple planes, each with their own
> waveform choice. That seems like a good abstraction,

I agree

> but the EBC only supports one plane in hardware. So using this
> abstraction with the EBC would require blending pixels and doing
> waveform lookups in software.

Not really? You'd have a single plane available, with only one waveform
pick for that plane?

> Blitting/Blending in Software
> =============================
> There are multiple layers to this topic (pun slightly intended):
>  1) Today's userspace does not expect a grayscale framebuffer.
>     Currently, the driver advertises XRGB8888 and converts to Y4
>     in software. This seems to match other drivers (e.g. repaper).
>
>  2) Ignoring what userspace "wants", the closest existing format is
>     DRM_FORMAT_R8. Geert sent a series[4] adding DRM_FORMAT_R1 through
>     DRM_FORMAT_R4 (patch 9), which I believe are the "correct" formats
>     to use.
> 
>  3) The RK356x SoCs have an "RGA" hardware block that can do the
>     RGB-to-grayscale conversion, and also RGB-to-dithered-monochrome
>     which is needed for animation/video. Currently this is exposed with
>     a V4L2 platform driver. Can this be inserted into the pipeline in a
>     way that is transparent to userspace? Or must some userspace library
>     be responsible for setting up the RGA => EBC pipeline?

I'm very interested in this answer as well :)

I think the current consensus is that it's up to userspace to set this
up though.

>  4) Supporting multiple planes (for multiple concurrent waveforms)
>     implies blending in software. Is that acceptable?
> 
>  5) Thoughts on SIMD-optimized blitting and waveform lookup functions?
> 
>  6) Currently the driver uses kmalloc() and dma_sync_single_for_device()
>     for its buffers, because it needs both fast reads and fast writes to
>     several of them. Maybe cma_alloc() or dma_alloc_from_contiguous()
>     would be more appropriate, but I don't see any drivers using those
>     directly.

cma_alloc isn't meant to be used directly by drivers anyway, one of the
main reasons being that CMA might not be available (or desirable) in the
first place on the platform the code will run on.

The most common option would be dma_alloc_coherent. It often means that
the buffer will be mapped non-cacheable, so it kills the access
performance. So it completely depends on your access patterns whether
it makes sense in your driver or not. kmalloc + dma_sync_single or
dma_map_single is also a valid option.
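
For illustration, that pattern looks roughly like this (the function
name, size, and direction are made up for the sketch):

static void *ebc_alloc_buf(struct device *dev, size_t size, dma_addr_t *dma)
{
        void *buf = kmalloc(size, GFP_KERNEL);

        if (!buf)
                return NULL;

        *dma = dma_map_single(dev, buf, size, DMA_TO_DEVICE);
        if (dma_mapping_error(dev, *dma)) {
                kfree(buf);
                return NULL;
        }

        /* The CPU writes to buf, then before each hardware read:
         * dma_sync_single_for_device(dev, *dma, size, DMA_TO_DEVICE);
         */
        return buf;
}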

> EPDs connected to "dumb" LCD TCONs
> ==================================
> This topic is mostly related to my first patch. Some boards exist that
> hook up an EPD to a normal LCD TCON, not a dedicated EPD controller. For
> example, there's the reMarkable 2[5] and some PocketBook models[6][7].
> 
> I have some concerns about this:
>  1) If we put EPD panel timings in panel drivers (e.g. panel-simple),
>     can the same timings work with LCD TCONs and EPD controllers?

I think we'll need a separate panel driver for this anyway

>     For example: one cycle of the 16-bit data bus is "one pixel" to an
>     LCD controller, but is "8 pixels" to an EPD controller. So there is
>     a factor-of-8 difference in horizontal resolution depending on your
>     perspective. Should we have the "number of pixel clock cycles" or
>     "number of pixels" in .hdisplay/.htotal in the panel timings?
> 
>     Patch 14 adds a panel with "number of pixels" horizontal resolution,
>     so the correct resolution is reported to userspace, but the existing
>     eink_vb3300_kca_timing in panel-simple.c appears to use "number of
>     pixel clocks" for its horizontal resolution. This makes the panel
>     timing definitions incompatible across controllers.
> 
>  2) Using fbdev/fbcon with an EPD hooked up to an LCD TCON will have
>     unintended consequences, and possibly damage the panel. Currently,
>     there is no way to mark the framebuffer as expecting "source driver
>     polarity waveforms" and not pixel data. Is there a specific
>     DRM_FORMAT_* we should use for these cases to prevent accidental use
>     by userspace?
> 
>     Or should we disallow this entirely, and have some wrapper layer to
>     do the waveform lookups in kernelspace?
> 
>     I like the wrapper layer idea because it allows normal userspace and
>     fbcon to work. It would not be much new code, especially since this
>     driver already supports doing the whole pipeline in software. So
>     that's why I wrote a separate helper library; I hope this code can
>     be reused.

If exposing the panel as a KMS connector can damage the display, I don't
think we should expose it at all. Even a property or something won't
work, because older applications won't know about that property and will
try to use it anyway.

So whatever the solution is, it can't be "you have to know that this
device is special, or else...". The default, trivial, case where an
application just comes up and tries to display something should somewhat
work (even if it might be a bit absurd, like ignoring non_desktop)

Maxime
Andreas Kemnade April 21, 2022, 6:43 a.m. UTC | #2
On Wed, 13 Apr 2022 17:19:00 -0500
Samuel Holland <samuel@sholland.org> wrote:

[...]
> Waveform Selection From Userspace
> =================================
> EPDs use different waveforms for different purposes: high-quality
> grayscale vs. monochrome text vs. dithered monochrome video. How can
> userspace select which waveform to use? Should this be a plane property?
> 
Or does userspace rather select a QoS, like low-latency vs. high
quality? Or indicate that the content will not change for a longer
time, e.g. that full refreshes should be done?

> It is also likely that userspace will want to use different waveforms at
> the same time for different parts of the screen, for example a fast
> monochrome waveform for the drawing area of a note-taking app, but a
> grayscale waveform for surrounding UI and window manager.
> 

> I believe the i.MX6 EPDC supports multiple planes, each with their own
> waveform choice. That seems like a good abstraction, but the EBC only
> supports one plane in hardware. So using this abstraction with the EBC
> would require blending pixels and doing waveform lookups in software.
> 
The iMX6 EPDC has one working buffer containing the old+new state of
the pixel. That is 16bpp. Then for each update you can specify a
rectangle in an independent 8bpp buffer as a source. For now I am just
using a single buffer. But yes, that construction could be used to do
some multi plane stuff.

> Blitting/Blending in Software
> =============================
> There are multiple layers to this topic (pun slightly intended):
>  1) Today's userspace does not expect a grayscale framebuffer.
>     Currently, the driver advertises XRGB8888 and converts to Y4
>     in software. This seems to match other drivers (e.g. repaper).
> 
>  2) Ignoring what userspace "wants", the closest existing format is
>     DRM_FORMAT_R8. Geert sent a series[4] adding DRM_FORMAT_R1 through
>     DRM_FORMAT_R4 (patch 9), which I believe are the "correct" formats
>     to use.
>
hmm R=red? That sounds strange. I am unsure whether doing things with
lower bit depths actually helps.

>  3) The RK356x SoCs have an "RGA" hardware block that can do the
>     RGB-to-grayscale conversion, and also RGB-to-dithered-monochrome
>     which is needed for animation/video. Currently this is exposed with
>     a V4L2 platform driver. Can this be inserted into the pipeline in a
>     way that is transparent to userspace? Or must some userspace library
>     be responsible for setting up the RGA => EBC pipeline?

hmm, we have other drivers with some hardware block doing rotation, but
in those cases it is not exposed as a v4l2 mem2mem device.

On IMX6 there is also the PXP doing RGB-to-grayscale and rotation, but
exposed as a v4l2 device. But it can also be used to do undocumented
stuff writing to the 16bpp working buffer. So basically it is similar.
But I would do those things in a second step and just have the basic
stuff upstreamed.

Regards,
Andreas
Nicolas Frattaroli April 21, 2022, 7:10 a.m. UTC | #3
On Thursday, 21 April 2022 08:43:38 CEST Andreas Kemnade wrote:
> On Wed, 13 Apr 2022 17:19:00 -0500
> Samuel Holland <samuel@sholland.org> wrote:
> > Blitting/Blending in Software
> > =============================
> > There are multiple layers to this topic (pun slightly intended):
> >  1) Today's userspace does not expect a grayscale framebuffer.
> >     Currently, the driver advertises XRGB8888 and converts to Y4
> >     in software. This seems to match other drivers (e.g. repaper).
> > 
> >  2) Ignoring what userspace "wants", the closest existing format is
> >     DRM_FORMAT_R8. Geert sent a series[4] adding DRM_FORMAT_R1 through
> >     DRM_FORMAT_R4 (patch 9), which I believe are the "correct" formats
> >     to use.
> >
> hmm R=red? That sounds strange. I am unsure whether doing things with
> lower bit depths actually helps.

Hi,

for single-component formats, the name of the component plays
practically no role. Even if said component was really red,
it makes little difference to either side.

For example, the OpenGL straight up refers to all single component
image formats as only using the red component:

	OpenGL only allows "R", "RG", "RGB", or "RGBA"; other
	combinations are not allowed as internal image formats.

from https://www.khronos.org/opengl/wiki/Image_Format

In truth it would of course be nice if the world agreed on
not making the name of data structures imply some way they
are to be processed, but humanity hasn't gotten there yet.

> 
> Regards,
> Andreas
> 

Regards,
Nicolas Frattaroli
Daniel Vetter May 25, 2022, 5:18 p.m. UTC | #4
Some comments on this from my side too, not sure how good they are when it
comes more to the hw side of things :-)

On Thu, Apr 14, 2022 at 10:50:18AM +0200, Maxime Ripard wrote:
> On Wed, Apr 13, 2022 at 05:19:00PM -0500, Samuel Holland wrote:
> > This series adds a DRM driver for the electrophoretic display controller
> > found in a few different Rockchip SoCs, specifically the RK3566/RK3568
> > variant[0] used by the PineNote tablet[1].
> > 
> > This is my first real involvement with the DRM subsystem, so please let
> > me know where I am misunderstanding things.
> > 
> > This is now the second SoC-integrated EPD controller with a DRM driver
> > submission -- the first one being the i.MX6 EPDC[2]. I want to thank
> > Andreas for sending that series, and for his advice while writing this
> > driver.
> > 
> > One goal I have with sending this series is to discuss how to support
> > EPDs more generally within the DRM subsystem, so the interfaces with
> > panels and PMICs and waveform LUTs can be controller-independent.
> > 
> > My understanding is that the i.MX6 EPDC series is at least partly based
> > on the downstream vendor driver. This driver is a clean-sheet design for
> > hardware with different (read: fewer) capabilities, so we took some
> > different design paths, but we ran into many of the same sharp edges.
> > 
> > Here are some of the areas I would like input on:
> > 
> > Panel Lifecycle
> > ===============
> > Panels use prepare/unprepare callbacks for their power supply. EPDs
> > should only be powered up when the display contents are changed. Should
> > the controller call both drm_panel_(un)prepare during each atomic update
> > when the framebuffer is dirty?
> > 
> > Similarly, panel enable/disable callbacks are tied to backlight state.
> > For an EPD, it makes sense to have the backlight enabled while the panel
> > is powered down (because the contents are static). Is it acceptable to
> > call drm_panel_{en,dis}able while the panel is not prepared?
> > 
> > With panel_bridge, the "normal" callback ordering is enforced, and tied
> > to the atomic state, so neither of these is possible.
> > 
> > As a result, neither the backlight nor the voltage regulators are tied
> > to the panel. The panel's regulators are consumed by the EBC itself.
> 
> At least to manage the power state, that looks fairly similar to what we
> have already to enter / exit from panel self refresh, so maybe we can
> leverage that infrastructure?
> 
> And thus we would have something like enabling the backlight when we
> prepare the panel, but only enable / disable the regulator when we exit
> / enter PSR mode?
> 
> Would that make sense?
> 
> > Panel Timing Parameters
> > =======================
> > EPDs have more timing parameters than LCDs, and there are several
> > different ways of labeling these parameters. See for example the timing
> > diagrams on pp. 2237-2239 of the RK3568 TRM[0], the descriptions in the
> > ED103TC2 panel datasheet[3], and the submitted EPDC bindings[2].
> > 
> > Both the EPDC and EBC vendor drivers put all of the timing parameters in
> > the controller's OF node. There is no panel device/node.
> > 
> > I was able to squeeze everything needed for my specific case into a
> > struct drm_display_mode (see patches 5 and 14), but I don't know if this
> > is an acceptable use of those fields, or if it will work with other
> > controllers. Is adding more fields to drm_display_mode an option?
> > 
> > See also the discussion of "dumb" LCD TCONs below.
> 
> Reading that datasheet and patch series, it's not clear to me whether
> it's just a set of generic parameters for E-ink displays, or if it's some
> hardware-specific representation of those timings.
> 
> Generally speaking, drm_display_mode is an approximation of what the
> timings are. The exact clock rate for example will be widely different
> between RGB, HDMI or MIPI-DSI (with or without burst). I think that as
> long as you can derive a drm_display_mode from those parameters, and can
> infer those parameters from a drm_display_mode, you can definitely reuse
> it.
> 
> > Panel Connector Type / Media Bus Format
> > =======================================
> > The EBC supports either an 8-bit or 16-bit wide data bus, where each
> > pair of data lines represents the source driver polarity (positive,
> > negative, or neutral) for a pixel.
> > 
> > The only effect of the data bus width is the number of pixels that are
> > transferred per clock cycle. It has no impact on the number of possible
> > grayscale levels.
> > 
> > How does that translate to DRM_MODE_CONNECTOR_* or MEDIA_BUS_FMT_*?
> 
> We'll probably want a separate connector mode, but you could add a
> parameter on the OF-graph endpoint to set the media bus format.
> 
> > Panel Reflection
> > ================
> > The ED103TC2 panel scans from right to left. Currently, there is no API
> > or OF property to represent this. I can add something similar to
> > drm_panel_orientation.
> 
> Yeah, leveraging DRM_MODE_REFLECT_X into something similar to
> drm_panel_orientation makes sense

Yeah

> > Should this be exposed to userspace? Is it acceptable for the kernel
> > driver to flip the image when blitting from the framebuffer?
> 
> I'm not sure about whether or not we should expose it to userspace. I'd
> say yes, but I'll leave it to others :)

Same. I'm very grumpily accepting that we need sw conversion tools from
xrgb8888 to more unusual framebuffer formats, but everything else should
be userspace problems imo.

It's a bit more awkward than a wrongly rotated screen if it's mirrored,
but I guess that's it.

What is surprising is that your hw really doesn't have any hw support to
mirror things, since that's generally really easy to implement.

For the blitter I guess that would be a v4l mem2mem device?

> > CRTC "active" and "enabled" states
> > ==================================
> > What do these states mean in the context of an EPD? Currently, the
> > driver ignores "active" and clears the screen to solid white when the
> > CRTC is disabled.
> > 
> > The vendor drivers can switch to a user-provided image when the CRTC is
> > disabled. Is this something that can/should be supported upstream? If
> > so, how? Would userspace provide the image to the kernel, or just tell
> > the kernel not to clear the screen?
> 
> I think the semantics are that whenever the CRTC is disabled, the panel
> is expected to be blank.
> 
> Leaving an image on after it's been disabled would have a bunch of
> side-effects we probably don't want. For example, let's assume we have
> that support, an application sets a "disabled image" and quits. Should
> we leave the content on? If so, for how long exactly?
> 
> Either way, this is likely to be doable with PSR as well, so I think
> it's a bit out of scope of this series for now.

active is hw state

enabled is a pure sw state on top, to make sure that all the hw resources
you need are still reserved. E.g. when you have 2 crtc and you enable one,
but keep it off (i.e. active = false), then the clocks, memory bw and all
that are still reserved. This is to be able to guarantee that dpms off ->
on transitions always work.

Iow, in atomic_check you need to look at enabled, in atomic commit you
need to look at active.

With a single crtc there should never be any issue here really, since
there's no other crtc where you can steal clocks or similar things from.

Note that kerneldoc should explain this all, pls double check and if it's
not clear submit a patch please.
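
For illustration, that split looks roughly like this in driver code (a
sketch, not taken from this driver):

static int ebc_atomic_check(struct drm_crtc *crtc,
                            struct drm_atomic_state *state)
{
        struct drm_crtc_state *crtc_state =
                drm_atomic_get_new_crtc_state(state, crtc);

        if (crtc_state->enable) {
                /* atomic_check: validate/reserve clocks, bandwidth, ... */
        }

        return 0;
}

static void ebc_atomic_enable(struct drm_crtc *crtc,
                              struct drm_atomic_state *state)
{
        struct drm_crtc_state *crtc_state =
                drm_atomic_get_new_crtc_state(state, crtc);

        if (crtc_state->active) {
                /* commit: actually light up the hardware */
        }
}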

> > VBLANK Events and Asynchronous Commits
> > ======================================
> > When should the VBLANK event complete? When the pixels have been blitted
> > to the kernel's shadow buffer? When the first frame of the waveform is
> > sent to the panel? When the last frame is sent to the panel?
> > 
> > Currently, the driver is taking the first option, letting
> > drm_atomic_helper_fake_vblank() send the VBLANK event without waiting on
> > the refresh thread. This is the only way I was able to get good
> > performance with existing userspace.
> 
> I've been having the same kind of discussions in private lately, so I'm
> interested by the answer as well :)
> 
> It would be worth looking into the SPI/I2C panels for this, since it's
> basically the same case.

So it's maybe a bit misnamed and maybe kerneldocs aren't super clear (pls
help improve them), but there's two modes:

- drivers which have vblank, which might be somewhat variable (VRR) or
  become simulated (self-refresh panels), but otherwise is a more-or-less
  regular clock. For this case the atomic commit event must match the
  vblank events exactly (frame count and timestamp)

- drivers which don't have vblank at all, mostly these are i2c/spi panels
  or virtual hw and stuff like that. In this case the event simply happens
  when the driver is done with refresh/upload, and the frame count should
  be zero (since it's meaningless).

Unfortunately the helper to do the right thing has fake_vblank in its
name, maybe it should be renamed to no_vblank or so (the various flags
that control it are a bit better named).

Again the docs should explain it all, but maybe we should clarify them or
perhaps rename that helper to be more meaningful.

> > Waveform Loading
> > ================
> > Waveform files are calibrated for each batch of panels. So while a
> > single waveform file may be "good enough" for all panels of a certain
> > model, the correctly-calibrated file will have better image quality.
> > 
> > I don't know of a good way to choose the calibrated file. Even the
> > board's compatible string may not be specific enough, if the board is
> > manufactured with multiple batches of panels.
> > 
> > Maybe the filename should just be the panel compatible, and the user is
> > responsible for putting the right file there? In that case, how should I
> > get the compatible string from the panel_bridge? Traverse the OF graph
> > myself?
> 
> It's not really clear to me what panel_bridge has to do with it? I'm
> assuming that file has to be uploaded some way or another to the
> encoder?
> 
> If so, yeah, you should just follow through the OF-graph and use the
> panel compatible. We have a similar case already with panel-mipi-dbi
> (even though it's standalone)

Yeah if there's really no difference then I guess the best we can do is
"make sure you put the right file into the firmware directory". Sucks but
anything else isn't really better.

> > There is also the issue that different controllers need the waveform
> > data in different formats. ".wbf" appears to be the format provided by
> > PVI/eInk, the panel manufacturer. The Rockchip EBC hardware expects a
> > single waveform in a flat array, so the driver has to extract/decompress
> > that from the .wbf file (this is done in patch 1). On the other hand,
> > the i.MX EPDC expects a ".wrf" file containing multiple waveforms[8].
> > 
> > I propose that the waveform file on disk should always be what was
> > shipped with the panel -- the .wbf file -- and any extracting or
> > reformatting is done in the kernel.
> 
> Any kind of parsing in the kernel from a file you have no control over
> always irks me :)
> 
> Why and how are those files different in the first place?
> 
> > Waveform Selection From Userspace
> > =================================
> > EPDs use different waveforms for different purposes: high-quality
> > grayscale vs. monochrome text vs. dithered monochrome video. How can
> > userspace select which waveform to use? Should this be a plane property?
> >
> > It is also likely that userspace will want to use different waveforms at
> > the same time for different parts of the screen, for example a fast
> > monochrome waveform for the drawing area of a note-taking app, but a
> > grayscale waveform for surrounding UI and window manager.
> > 
> > I believe the i.MX6 EPDC supports multiple planes, each with their own
> > waveform choice. That seems like a good abstraction,
> 
> I agree
> 
> > but the EBC only supports one plane in hardware. So using this
> > abstraction with the EBC would require blending pixels and doing
> > waveform lookups in software.
> 
> Not really? You'd have a single plane available, with only one waveform
> pick for that plane?
> 
> > Blitting/Blending in Software
> > =============================
> > There are multiple layers to this topic (pun slightly intended):
> >  1) Today's userspace does not expect a grayscale framebuffer.
> >     Currently, the driver advertises XRGB8888 and converts to Y4
> >     in software. This seems to match other drivers (e.g. repaper).
> >
> >  2) Ignoring what userspace "wants", the closest existing format is
> >     DRM_FORMAT_R8. Geert sent a series[4] adding DRM_FORMAT_R1 through
> >     DRM_FORMAT_R4 (patch 9), which I believe are the "correct" formats
> >     to use.
> > 
> >  3) The RK356x SoCs have an "RGA" hardware block that can do the
> >     RGB-to-grayscale conversion, and also RGB-to-dithered-monochrome
> >     which is needed for animation/video. Currently this is exposed with
> >     a V4L2 platform driver. Can this be inserted into the pipeline in a
> >     way that is transparent to userspace? Or must some userspace library
> >     be responsible for setting up the RGA => EBC pipeline?
> 
> I'm very interested in this answer as well :)
> 
> I think the current consensus is that it's up to userspace to set this
> up though.

Yeah I think v4l mem2mem device is the answer for these, and then
userspace gets to set it all up.

> >  4) Supporting multiple planes (for multiple concurrent waveforms)
> >     implies blending in software. Is that acceptable?
> > 
> >  5) Thoughts on SIMD-optimized blitting and waveform lookup functions?
> > 
> >  6) Currently the driver uses kmalloc() and dma_sync_single_for_device()
> >     for its buffers, because it needs both fast reads and fast writes to
> >     several of them. Maybe cma_alloc() or dma_alloc_from_contiguous()
> >     would be more appropriate, but I don't see any drivers using those
> >     directly.
> 
> cma_alloc isn't meant to be used directly by drivers anyway, one of the
> main reasons being that CMA might not be available (or desirable) in the
> first place on the platform the code will run on.
> 
> The most common option would be dma_alloc_coherent. It often means that
> the buffer will be mapped non-cacheable, so it kills the access
> performance. So it completely depends on your access patterns whether
> it makes sense in your driver or not. kmalloc + dma_sync_single or
> dma_map_single is also a valid option.
> 
> > EPDs connected to "dumb" LCD TCONs
> > ==================================
> > This topic is mostly related to my first patch. Some boards exist that
> > hook up an EPD to a normal LCD TCON, not a dedicated EPD controller. For
> > example, there's the reMarkable 2[5] and some PocketBook models[6][7].
> > 
> > I have some concerns about this:
> >  1) If we put EPD panel timings in panel drivers (e.g. panel-simple),
> >     can the same timings work with LCD TCONs and EPD controllers?
> 
> I think we'll need a separate panel driver for this anyway
> 
> >     For example: one cycle of the 16-bit data bus is "one pixel" to an
> >     LCD controller, but is "8 pixels" to an EPD controller. So there is
> >     a factor-of-8 difference in horizontal resolution depending on your
> >     perspective. Should we have the "number of pixel clock cycles" or
> >     "number of pixels" in .hdisplay/.htotal in the panel timings?
> > 
> >     Patch 14 adds a panel with "number of pixels" horizontal resolution,
> >     so the correct resolution is reported to userspace, but the existing
> >     eink_vb3300_kca_timing in panel-simple.c appears to use "number of
> >     pixel clocks" for its horizontal resolution. This makes the panel
> >     timing definitions incompatible across controllers.

Yeah that sounds bad. And I guess this really should be "number of pixels"
and the drivers need to be adjusted/fixed to be consistent.

> > 
> >  2) Using fbdev/fbcon with an EPD hooked up to an LCD TCON will have
> >     unintended consequences, and possibly damage the panel. Currently,
> >     there is no way to mark the framebuffer as expecting "source driver
> >     polarity waveforms" and not pixel data. Is there a specific
> >     DRM_FORMAT_* we should use for these cases to prevent accidental use
> >     by userspace?
> > 
> >     Or should we disallow this entirely, and have some wrapper layer to
> >     do the waveform lookups in kernelspace?
> > 
> >     I like the wrapper layer idea because it allows normal userspace and
> >     fbcon to work. It would not be much new code, especially since this
> >     driver already supports doing the whole pipeline in software. So
> >     that's why I wrote a separate helper library; I hope this code can
> >     be reused.
> 
> If exposing the panel as a KMS connector can damage the display, I don't
> think we should expose it at all. Even a property or something won't
> work, because older applications won't know about that property and will
> try to use it anyway.
> 
> So whatever the solution is, it can't be "you have to know that this
> device is special, or else...". The default, trivial, case where an
> application just comes up and tries to display something should somewhat
> work (even if it might be a bit absurd, like ignoring non_desktop)

Yeah I think if you can wreck the panel that's no good, and should be
hidden I guess. So I guess for these the kernel gets to apply the waveform
stuff internally, which I really don't like but oh well. We have plenty of
cpu slicing and dicing in other spi/i2c/usb drivers too.
-Daniel
Maxime Ripard May 31, 2022, 8:58 a.m. UTC | #5
Hi Daniel,

Thanks for your feedback

On Wed, May 25, 2022 at 07:18:07PM +0200, Daniel Vetter wrote:
> > > VBLANK Events and Asynchronous Commits
> > > ======================================
> > > When should the VBLANK event complete? When the pixels have been blitted
> > > to the kernel's shadow buffer? When the first frame of the waveform is
> > > sent to the panel? When the last frame is sent to the panel?
> > > 
> > > Currently, the driver is taking the first option, letting
> > > drm_atomic_helper_fake_vblank() send the VBLANK event without waiting on
> > > the refresh thread. This is the only way I was able to get good
> > > performance with existing userspace.
> > 
> > I've been having the same kind of discussions in private lately, so I'm
> > interested by the answer as well :)
> > 
> > It would be worth looking into the SPI/I2C panels for this, since it's
> > basically the same case.
> 
> So it's maybe a bit misnamed and maybe kerneldocs aren't super clear (pls
> help improve them), but there's two modes:
> 
> - drivers which have vblank, which might be somewhat variable (VRR) or
>   become simulated (self-refresh panels), but otherwise is a more-or-less
>   regular clock. For this case the atomic commit event must match the
>   vblank events exactly (frame count and timestamp)

Part of my question there is: do we have any kind of expectation that,
when we commit, the next vblank is going to be the one matching that
commit, or are we allowed to defer it by an arbitrary number of frames
(provided that the frame count and timestamps are correct)?

> - drivers which don't have vblank at all, mostly these are i2c/spi panels
>   or virtual hw and stuff like that. In this case the event simply happens
>   when the driver is done with refresh/upload, and the frame count should
>   be zero (since it's meaningless).
> 
> Unfortunately the helper to do the right thing has fake_vblank in its
> name, maybe it should be renamed to no_vblank or so (the various flags
> that control it are a bit better named).
> 
> Again the docs should explain it all, but maybe we should clarify them or
> perhaps rename that helper to be more meaningful.
> 
> > > Blitting/Blending in Software
> > > =============================
> > > There are multiple layers to this topic (pun slightly intended):
> > >  1) Today's userspace does not expect a grayscale framebuffer.
> > >     Currently, the driver advertises XRGB8888 and converts to Y4
> > >     in software. This seems to match other drivers (e.g. repaper).
> > >
> > >  2) Ignoring what userspace "wants", the closest existing format is
> > >     DRM_FORMAT_R8. Geert sent a series[4] adding DRM_FORMAT_R1 through
> > >     DRM_FORMAT_R4 (patch 9), which I believe are the "correct" formats
> > >     to use.
> > > 
> > >  3) The RK356x SoCs have an "RGA" hardware block that can do the
> > >     RGB-to-grayscale conversion, and also RGB-to-dithered-monochrome
> > >     which is needed for animation/video. Currently this is exposed with
> > >     a V4L2 platform driver. Can this be inserted into the pipeline in a
> > >     way that is transparent to userspace? Or must some userspace library
> > >     be responsible for setting up the RGA => EBC pipeline?
> > 
> > I'm very interested in this answer as well :)
> > 
> > I think the current consensus is that it's up to userspace to set this
> > up though.
> 
> Yeah I think v4l mem2mem device is the answer for these, and then
> userspace gets to set it all up.

I think the question wasn't really about where that driver should be,
but more about who gets to set it up, and whether the kernel could have
some component that exposes the formats supported by the converter, and
whenever a commit is being done, pipes it through the v4l2 device before
doing the page flip.

We have a similar use-case for the RaspberryPi where the hardware
codec will produce a framebuffer format that isn't standard. That
format is understood by the display pipeline, and it can do
writeback.

However, some people are using a separate display (like a SPI display
supported by tinydrm) and we would still like to be able to output the
decoded frames there.

Is there some way we could plumb things to "route" that buffer through
the writeback engine to perform a format conversion before sending it
over to the SPI display automatically?

Maxime
Daniel Vetter June 1, 2022, 12:35 p.m. UTC | #6
On Tue, May 31, 2022 at 10:58:35AM +0200, Maxime Ripard wrote:
> Hi Daniel,
> 
> Thanks for your feedback
> 
> On Wed, May 25, 2022 at 07:18:07PM +0200, Daniel Vetter wrote:
> > > > VBLANK Events and Asynchronous Commits
> > > > ======================================
> > > > When should the VBLANK event complete? When the pixels have been blitted
> > > > to the kernel's shadow buffer? When the first frame of the waveform is
> > > > sent to the panel? When the last frame is sent to the panel?
> > > > 
> > > > Currently, the driver is taking the first option, letting
> > > > drm_atomic_helper_fake_vblank() send the VBLANK event without waiting on
> > > > the refresh thread. This is the only way I was able to get good
> > > > performance with existing userspace.
> > > 
> > > I've been having the same kind of discussions in private lately, so I'm
> > > interested by the answer as well :)
> > > 
> > > It would be worth looking into the SPI/I2C panels for this, since it's
> > > basically the same case.
> > 
> > So it's maybe a bit misnamed and maybe kerneldocs aren't super clear (pls
> > help improve them), but there's two modes:
> > 
> > - drivers which have vblank, which might be somewhat variable (VRR) or
> >   become simulated (self-refresh panels), but otherwise is a more-or-less
> >   regular clock. For this case the atomic commit event must match the
> >   vblank events exactly (frame count and timestamp)
> 
> Part of my question there is: do we have any kind of expectation that,
> when we commit, the next vblank is going to be the one matching that
> commit, or are we allowed to defer it by an arbitrary number of frames
> (provided that the frame count and timestamps are correct)?

In general yes, but there's no guarantee. The only guarantee we give for
drivers with vblank counters is that if you receive a vblank event (flip
complete or vblank event) for frame #n, then an immediate flip/atomic
ioctl call will display earliest for frame #n+1.

Also usually you should be able to hit #n+1, but even today with fun stuff
like self refresh panels, getting out of self refresh mode might take a bit
more than a few frames, and so you might end up being late. But otoh if
you just do a page flip loop then on average (after the crtc is fully
resumed) you should be able to update at vrefresh rate exactly.

> > - drivers which don't have vblank at all, mostly these are i2c/spi panels
> >   or virtual hw and stuff like that. In this case the event simply happens
> >   when the driver is done with refresh/upload, and the frame count should
> >   be zero (since it's meaningless).
> > 
> > Unfortunately the helper to do the right thing has fake_vblank in its
> > name, maybe it should be renamed to no_vblank or so (the various flags
> > that control it are a bit better named).
> > 
> > Again the docs should explain it all, but maybe we should clarify them or
> > perhaps rename that helper to be more meaningful.
> > 
> > > > Blitting/Blending in Software
> > > > =============================
> > > > There are multiple layers to this topic (pun slightly intended):
> > > >  1) Today's userspace does not expect a grayscale framebuffer.
> > > >     Currently, the driver advertises XRGB8888 and converts to Y4
> > > >     in software. This seems to match other drivers (e.g. repaper).
> > > >
> > > >  2) Ignoring what userspace "wants", the closest existing format is
> > > >     DRM_FORMAT_R8. Geert sent a series[4] adding DRM_FORMAT_R1 through
> > > >     DRM_FORMAT_R4 (patch 9), which I believe are the "correct" formats
> > > >     to use.
> > > > 
> > > >  3) The RK356x SoCs have an "RGA" hardware block that can do the
> > > >     RGB-to-grayscale conversion, and also RGB-to-dithered-monochrome
> > > >     which is needed for animation/video. Currently this is exposed with
> > > >     a V4L2 platform driver. Can this be inserted into the pipeline in a
> > > >     way that is transparent to userspace? Or must some userspace library
> > > >     be responsible for setting up the RGA => EBC pipeline?
> > > 
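
As a concrete reference for 1) above: the XRGB8888-to-Y4 blit is
conceptually just a luma computation plus 4-bit packing. A minimal
sketch (the integer luma coefficients, nibble order, and names are
illustrative assumptions, not the driver's actual code):

#include <linux/types.h>

/* Convert one row of XRGB8888 to packed Y4, two pixels per byte,
 * low nibble first; assumes an even width. */
static void xrgb8888_to_y4_row(u8 *dst, const u32 *src, unsigned int width)
{
	unsigned int x;

	for (x = 0; x < width; x += 2) {
		u32 p0 = src[x], p1 = src[x + 1];
		u8 y0, y1;

		/* Cheap BT.601-ish luma: (3R + 6G + B) / 10, then
		 * truncated to the 4 most significant bits. */
		y0 = ((3 * ((p0 >> 16) & 0xff) + 6 * ((p0 >> 8) & 0xff) +
		       (p0 & 0xff)) / 10) >> 4;
		y1 = ((3 * ((p1 >> 16) & 0xff) + 6 * ((p1 >> 8) & 0xff) +
		       (p1 & 0xff)) / 10) >> 4;

		dst[x / 2] = y0 | (y1 << 4);
	}
}
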
> > > I'm very interested in this answer as well :)
> > > 
> > > I think the current consensus is that it's up to userspace to set this
> > > up though.
> > 
> > Yeah I think v4l mem2mem device is the answer for these, and then
> > userspace gets to set it all up.
> 
> I think the question wasn't really about where that driver should be,
> but more about who gets to set it up, and whether the kernel could have
> some component that exposes the formats supported by the converter but,
> whenever a commit is done, pipes the buffer through the v4l2 device
> before doing the page flip.
> 
> We have a similar use-case for the RaspberryPi where the hardware
> codec will produce a framebuffer format that isn't standard. That
> format is understood by the display pipeline, and it can do
> writeback.
> 
> However, some people are using a separate display (like a SPI display
> supported by tinydrm) and we would still like to be able to output the
> decoded frames there.
> 
> Is there some way we could plumb things to "route" that buffer through
> the writeback engine to perform a format conversion before sending it
> over to the SPI display automatically?

Currently not transparently. Or at least no one has done that, and I'm not
sure that's really a great idea. With big gpus all that stuff is done with
separate command submission to the render side of things, and you can
fully pipeline all that with in/out-fences.
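
Concretely, that pipelining is dmabuf plumbing userspace already has
today: export the converted buffer from the v4l2 m2m device, import it
into DRM, and wrap it in a framebuffer. A sketch, with buffer setup
(REQBUFS etc.) and error handling elided:

#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/videodev2.h>
#include <xf86drm.h>
#include <xf86drmMode.h>

/* Export buffer 0 of the m2m capture queue (the converted frames come
 * out there) as a dmabuf and turn it into a DRM framebuffer. */
static uint32_t import_m2m_buffer(int v4l2_fd, int drm_fd,
				  uint32_t width, uint32_t height,
				  uint32_t drm_format, uint32_t pitch)
{
	struct v4l2_exportbuffer expbuf = {
		.type = V4L2_BUF_TYPE_VIDEO_CAPTURE,
		.index = 0,
	};
	uint32_t handle, fb_id;
	uint32_t handles[4] = {0}, pitches[4] = {0}, offsets[4] = {0};

	ioctl(v4l2_fd, VIDIOC_EXPBUF, &expbuf);		/* fills expbuf.fd */
	drmPrimeFDToHandle(drm_fd, expbuf.fd, &handle);	/* dmabuf -> GEM */

	handles[0] = handle;
	pitches[0] = pitch;
	drmModeAddFB2(drm_fd, width, height, drm_format,
		      handles, pitches, offsets, &fb_id, 0);
	return fb_id;	/* ready for an atomic commit / page flip */
}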

Doing that on the kms driver side in the kernel feels very wrong to me :-/
-Daniel
Maxime Ripard June 8, 2022, 2:48 p.m. UTC | #7
On Wed, Jun 01, 2022 at 02:35:35PM +0200, Daniel Vetter wrote:
> On Tue, May 31, 2022 at 10:58:35AM +0200, Maxime Ripard wrote:
> > Hi Daniel,
> > 
> > Thanks for your feedback
> > 
> > On Wed, May 25, 2022 at 07:18:07PM +0200, Daniel Vetter wrote:
> > > > > VBLANK Events and Asynchronous Commits
> > > > > ======================================
> > > > > When should the VBLANK event complete? When the pixels have been blitted
> > > > > to the kernel's shadow buffer? When the first frame of the waveform is
> > > > > sent to the panel? When the last frame is sent to the panel?
> > > > > 
> > > > > Currently, the driver is taking the first option, letting
> > > > > drm_atomic_helper_fake_vblank() send the VBLANK event without waiting on
> > > > > the refresh thread. This is the only way I was able to get good
> > > > > performance with existing userspace.
> > > > 
> > > > I've been having the same kind of discussions in private lately, so I'm
> > > > interested by the answer as well :)
> > > > 
> > > > It would be worth looking into the SPI/I2C panels for this, since it's
> > > > basically the same case.
> > > 
> > > So it's maybe a bit misnamed and maybe kerneldocs aren't super clear (pls
> > > help improve them), but there are two modes:
> > > 
> > > - drivers which have vblank, which might be somewhat variable (VRR) or
> > >   become simulated (self-refresh panels), but otherwise is a more-or-less
> > >   regular clock. For this case the atomic commit event must match the
> > >   vblank events exactly (frame count and timestamp)
> > 
> > Part of my question there is: do we have any kind of expectation that,
> > when we commit, the next vblank is going to be the one matching that
> > commit, or are we allowed to defer it by an arbitrary number of frames
> > (provided that the frame count and timestamps are correct)?
> 
> In general yes, but there's no guarantee. The only guarantee we give for
> drivers with vblank counters is that if you receive a vblank event (flip
> complete or vblank event) for frame #n, then an immediate flip/atomic
> ioctl call will show up at frame #n+1 at the earliest.
> 
> Also, usually you should be able to hit #n+1, but even today, with fun
> stuff like self-refresh panels, getting out of self-refresh mode might
> take more than a few frames, so you might end up being late. But otoh,
> if you just do a page flip loop, then on average (after the crtc is
> fully resumed) you should be able to update at vrefresh rate exactly.

I was thinking more of the next item there: if we were to write something in
the kernel that would transparently behave like a full-blown KMS driver,
but would pipe the commits through a KMS writeback driver before sending
them to our SPI panel, we would always be at best two vblanks late.

So this would mean that userspace would do a page flip, get a first
vblank, but the actual vblank for that commit would be the next one (at
best), consistently.
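
For comparison, doing that routing from userspace with the existing
writeback uAPI looks roughly like the sketch below; lookup_prop_id() is
a hypothetical helper that resolves a property name to its id, and
error handling is elided:

#include <poll.h>
#include <stdint.h>
#include <xf86drm.h>
#include <xf86drmMode.h>

uint32_t lookup_prop_id(int fd, uint32_t obj_id, const char *name);

/* One conversion pass: the commit scans out the source framebuffer and
 * writes the converted result into dst_fb; the returned sync_file fd
 * signals when dst_fb is ready to be uploaded to the SPI panel. */
static int writeback_convert(int fd, uint32_t wb_conn, uint32_t dst_fb)
{
	drmModeAtomicReq *req = drmModeAtomicAlloc();
	int out_fence = -1;
	struct pollfd pfd;

	drmModeAtomicAddProperty(req, wb_conn,
			lookup_prop_id(fd, wb_conn, "WRITEBACK_FB_ID"),
			dst_fb);
	drmModeAtomicAddProperty(req, wb_conn,
			lookup_prop_id(fd, wb_conn, "WRITEBACK_OUT_FENCE_PTR"),
			(uint64_t)(uintptr_t)&out_fence);
	/* ... plus the usual CRTC/plane/source-framebuffer properties ... */
	drmModeAtomicCommit(fd, req, DRM_MODE_ATOMIC_ALLOW_MODESET, NULL);
	drmModeAtomicFree(req);

	pfd.fd = out_fence;
	pfd.events = POLLIN;
	poll(&pfd, 1, -1);	/* fence signals: conversion finished */
	return out_fence;
}

This is exactly the two-step (and hence two-vblank) dance described
above, just without hiding it behind the KMS interface.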

> > > - drivers which don't have vblank at all, mostly these are i2c/spi panels
> > >   or virtual hw and stuff like that. In this case the event simply happens
> > >   when the driver is done with refresh/upload, and the frame count should
> > >   be zero (since it's meaningless).
> > > 
> > > Unfortunately the helper to dtrt has fake_vblank in its name; maybe it
> > > should be renamed to no_vblank or so (the various flags that control it
> > > are a bit better named).
> > > 
> > > Again the docs should explain it all, but maybe we should clarify them or
> > > perhaps rename that helper to be more meaningful.
> > > 
> > > > > Blitting/Blending in Software
> > > > > =============================
> > > > > There are multiple layers to this topic (pun slightly intended):
> > > > >  1) Today's userspace does not expect a grayscale framebuffer.
> > > > >     Currently, the driver advertises XRGB8888 and converts to Y4
> > > > >     in software. This seems to match other drivers (e.g. repaper).
> > > > >
> > > > >  2) Ignoring what userspace "wants", the closest existing format is
> > > > >     DRM_FORMAT_R8. Geert sent a series[4] adding DRM_FORMAT_R1 through
> > > > >     DRM_FORMAT_R4 (patch 9), which I believe are the "correct" formats
> > > > >     to use.
> > > > > 
> > > > >  3) The RK356x SoCs have an "RGA" hardware block that can do the
> > > > >     RGB-to-grayscale conversion, and also RGB-to-dithered-monochrome
> > > > >     which is needed for animation/video. Currently this is exposed with
> > > > >     a V4L2 platform driver. Can this be inserted into the pipeline in a
> > > > >     way that is transparent to userspace? Or must some userspace library
> > > > >     be responsible for setting up the RGA => EBC pipeline?
> > > > 
> > > > I'm very interested in this answer as well :)
> > > > 
> > > > I think the current consensus is that it's up to userspace to set this
> > > > up though.
> > > 
> > > Yeah I think v4l mem2mem device is the answer for these, and then
> > > userspace gets to set it all up.
> > 
> > I think the question wasn't really about where that driver should be,
> > but more about who gets to set it up, and whether the kernel could have
> > some component that exposes the formats supported by the converter but,
> > whenever a commit is done, pipes the buffer through the v4l2 device
> > before doing the page flip.
> > 
> > We have a similar use-case for the RaspberryPi where the hardware
> > codec will produce a framebuffer format that isn't standard. That
> > format is understood by the display pipeline, and it can do
> > writeback.
> > 
> > However, some people are using a separate display (like a SPI display
> > supported by tinydrm) and we would still like to be able to output the
> > decoded frames there.
> > 
> > Is there some way we could plumb things to "route" that buffer through
> > the writeback engine to perform a format conversion before sending it
> > over to the SPI display automatically?
> 
> Currently not transparently. Or at least no one has done that, and I'm not
> sure that's really a great idea. With big gpus all that stuff is done with
> separate command submission to the render side of things, and you can
> fully pipeline all that with in/out-fences.
> 
> Doing that on the kms driver side in the kernel feels very wrong to me :-/

So I guess what you're saying is that there's a close to 0% chance of it
being accepted if we were to come up with such an architecture?

Thanks!
Maxime
Daniel Vetter June 8, 2022, 3:34 p.m. UTC | #8
On Wed, Jun 08, 2022 at 04:48:47PM +0200, Maxime Ripard wrote:
> On Wed, Jun 01, 2022 at 02:35:35PM +0200, Daniel Vetter wrote:
> > On Tue, May 31, 2022 at 10:58:35AM +0200, Maxime Ripard wrote:
> > > Hi Daniel,
> > > 
> > > Thanks for your feedback
> > > 
> > > On Wed, May 25, 2022 at 07:18:07PM +0200, Daniel Vetter wrote:
> > > > > > VBLANK Events and Asynchronous Commits
> > > > > > ======================================
> > > > > > When should the VBLANK event complete? When the pixels have been blitted
> > > > > > to the kernel's shadow buffer? When the first frame of the waveform is
> > > > > > sent to the panel? When the last frame is sent to the panel?
> > > > > > 
> > > > > > Currently, the driver is taking the first option, letting
> > > > > > drm_atomic_helper_fake_vblank() send the VBLANK event without waiting on
> > > > > > the refresh thread. This is the only way I was able to get good
> > > > > > performance with existing userspace.
> > > > > 
> > > > > I've been having the same kind of discussions in private lately, so I'm
> > > > > interested by the answer as well :)
> > > > > 
> > > > > It would be worth looking into the SPI/I2C panels for this, since it's
> > > > > basically the same case.
> > > > 
> > > > So it's maybe a bit misnamed and maybe kerneldocs aren't super clear (pls
> > > > help improve them), but there are two modes:
> > > > 
> > > > - drivers which have vblank, which might be somewhat variable (VRR) or
> > > >   become simulated (self-refresh panels), but otherwise is a more-or-less
> > > >   regular clock. For this case the atomic commit event must match the
> > > >   vblank events exactly (frame count and timestamp)
> > > 
> > > Part of my question there is: do we have any kind of expectation that,
> > > when we commit, the next vblank is going to be the one matching that
> > > commit, or are we allowed to defer it by an arbitrary number of frames
> > > (provided that the frame count and timestamps are correct)?
> > 
> > In general yes, but there's no guarantee. The only guarantee we give for
> > drivers with vblank counters is that if you receive a vblank event (flip
> > complete or vblank event) for frame #n, then an immediate flip/atomic
> > ioctl call will show up at frame #n+1 at the earliest.
> > 
> > Also, usually you should be able to hit #n+1, but even today, with fun
> > stuff like self-refresh panels, getting out of self-refresh mode might
> > take more than a few frames, so you might end up being late. But otoh,
> > if you just do a page flip loop, then on average (after the crtc is
> > fully resumed) you should be able to update at vrefresh rate exactly.
> 
> I was thinking more of the next item there: if we were to write something in
> the kernel that would transparently behave like a full-blown KMS driver,
> but would pipe the commits through a KMS writeback driver before sending
> them to our SPI panel, we would always be at best two vblanks late.
> 
> So this would mean that userspace would do a page flip, get a first
> vblank, but the actual vblank for that commit would be the next one (at
> best), consistently.
> 
> > > > - drivers which don't have vblank at all, mostly these are i2c/spi panels
> > > >   or virtual hw and stuff like that. In this case the event simply happens
> > > >   when the driver is done with refresh/upload, and the frame count should
> > > >   be zero (since it's meaningless).
> > > > 
> > > > Unfortunately the helper to dtrt has fake_vblank in its name; maybe it
> > > > should be renamed to no_vblank or so (the various flags that control it
> > > > are a bit better named).
> > > > 
> > > > Again the docs should explain it all, but maybe we should clarify them or
> > > > perhaps rename that helper to be more meaningful.
> > > > 
> > > > > > Blitting/Blending in Software
> > > > > > =============================
> > > > > > There are multiple layers to this topic (pun slightly intended):
> > > > > >  1) Today's userspace does not expect a grayscale framebuffer.
> > > > > >     Currently, the driver advertises XRGB8888 and converts to Y4
> > > > > >     in software. This seems to match other drivers (e.g. repaper).
> > > > > >
> > > > > >  2) Ignoring what userspace "wants", the closest existing format is
> > > > > >     DRM_FORMAT_R8. Geert sent a series[4] adding DRM_FORMAT_R1 through
> > > > > >     DRM_FORMAT_R4 (patch 9), which I believe are the "correct" formats
> > > > > >     to use.
> > > > > > 
> > > > > >  3) The RK356x SoCs have an "RGA" hardware block that can do the
> > > > > >     RGB-to-grayscale conversion, and also RGB-to-dithered-monochrome
> > > > > >     which is needed for animation/video. Currently this is exposed with
> > > > > >     a V4L2 platform driver. Can this be inserted into the pipeline in a
> > > > > >     way that is transparent to userspace? Or must some userspace library
> > > > > >     be responsible for setting up the RGA => EBC pipeline?
> > > > > 
> > > > > I'm very interested in this answer as well :)
> > > > > 
> > > > > I think the current consensus is that it's up to userspace to set this
> > > > > up though.
> > > > 
> > > > Yeah I think v4l mem2mem device is the answer for these, and then
> > > > userspace gets to set it all up.
> > > 
> > > I think the question wasn't really about where that driver should be,
> > > but more about who gets to set it up, and whether the kernel could have
> > > some component that exposes the formats supported by the converter but,
> > > whenever a commit is done, pipes the buffer through the v4l2 device
> > > before doing the page flip.
> > > 
> > > We have a similar use-case for the RaspberryPi where the hardware
> > > codec will produce a framebuffer format that isn't standard. That
> > > format is understood by the display pipeline, and it can do
> > > writeback.
> > > 
> > > However, some people are using a separate display (like a SPI display
> > > supported by tinydrm) and we would still like to be able to output the
> > > decoded frames there.
> > > 
> > > Is there some way we could plumb things to "route" that buffer through
> > > the writeback engine to perform a format conversion before sending it
> > > over to the SPI display automatically?
> > 
> > Currently not transparently. Or at least no one has done that, and I'm not
> > sure that's really a great idea. With big gpus all that stuff is done with
> > separate command submission to the render side of things, and you can
> > fully pipeline all that with in/out-fences.
> > 
> > Doing that on the kms driver side in the kernel feels very wrong to me :-/
> 
> So I guess what you're saying is that there's a close to 0% chance of it
> being accepted if we were to come up with such an architecture?

Yup.

I think the only exception is if you have a multi-region memory manager
using ttm (or hand-rolled, but please don't), where we first have to move
the buffer into the right region before it can be scanned out. And that's
generally done with a copy engine, for performance reasons.

But that copy engine is really just a very dumb (but fast!) memcpy, and
doesn't do any format conversion or stride/orientation changes like a
full-blown blitter engine (or mem2mem in v4l speak) can do.

So if it's really just memory management, then I think it's fine, but
anything beyond that is a no, imo.
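
For reference, that "dumb memcpy" exception is essentially what a TTM
driver's move callback does when it has no copy engine, or as its
fallback. A sketch, assuming the ttm_device_funcs.move signature of
recent kernels:

#include <drm/ttm/ttm_bo_api.h>
#include <drm/ttm/ttm_bo_driver.h>

/* Pure memory management: bytes are shuffled into a scanout-capable
 * region, with no format conversion or stride/orientation changes.
 * A real driver would kick its copy engine here and return a fence. */
static int sketch_bo_move(struct ttm_buffer_object *bo, bool evict,
			  struct ttm_operation_ctx *ctx,
			  struct ttm_resource *new_mem,
			  struct ttm_place *hop)
{
	return ttm_bo_move_memcpy(bo, ctx, new_mem);
}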

Now for an overall full-featured stack we clearly need that, and it would
be great if there were some common userspace libraries for hosting such
code. But thus far all attempts have fallen short :-/ which I guess is
another indicator that we really shouldn't try to solve this problem in a
generic fashion, and hence really shouldn't try to solve it with magic
behind the generic kms interface in the kernel.

For even more context I do think my old "why is 2d so hard" blogpost rant
still applies:

https://blog.ffwll.ch/2018/08/no-2d-in-drm.html

The "why no 2d api for the more limited problem of handling framebuffers"
is really just a small, but not any less complex, subset of that bigger
conundrum.
-Daniel