mbox series

[v14,RESEND,0/6] drm/imx: Introduce i.MX8qm/qxp DPU DRM

Message ID 20230822085949.816844-1-victor.liu@nxp.com (mailing list archive)
Headers show
Series drm/imx: Introduce i.MX8qm/qxp DPU DRM | expand

Message

Liu Ying Aug. 22, 2023, 8:59 a.m. UTC
Hi,


This is the v14 series to introduce i.MX8qm/qxp Display Processing Unit(DPU)
DRM support.

DPU is comprised of a blit engine for 2D graphics, a display controller
and a command sequencer.  Outside of DPU, optional prefetch engines can
fetch data from memory prior to some DPU fetchunits of blit engine and
display controller.  The pre-fetchers support linear formats and Vivante
GPU tile formats.

Reference manual can be found at:
https://www.nxp.com/webapp/Download?colCode=IMX8DQXPRM


This patch set adds kernel modesetting support for the display controller part.
It supports two CRTCs per display controller, several planes, prefetch
engines and some properties of CRTC and plane.  Currently, the registers of
the controller is accessed without command sequencer involved, instead just by
using CPU.  DRM connectors would be created from the DPU KMS driver.


Patch 1 ~ 3 add dt-bindings for DPU and prefetch engines.
Patch 4 is a minor improvement of a macro to suppress warning as the KMS driver
uses it.
Patch 5 introduces the DPU DRM support.
Patch 6 updates MAINTAINERS.

Welcome comments, thanks.

v13->v14:
* Rebase the patch series to the latest drm-misc-next branch(v6.1-rc6 based).
* Include drm_fbdev_generic.h in dpu_drv.c due to the rebase.
* Fix dpu drm driver suspend/resume by properly get drm device through
  dev_get_drvdata().
* Use pm_ptr() macro for dpu core driver PM operations.
* Use pm_sleep_ptr() macro for dpu drm driver PM operations.
* Use DEFINE_SIMPLE_DEV_PM_OPS() macro to define dpu drm driver PM operations,
  instead of SIMPLE_DEV_PM_OPS().
* Update year of Copyright.
* Add SoC series name 'i.MX8'/'IMX8'/'imx8' to dpu driver module decription,
  Kconfig name, dpu driver names and dpu driver object name.
* Resend based on the latest drm-misc-next branch.

v12->v13:
* Drop 'drm->irq_enabled = true;' from patch 5/6 to fix a potential build
  break reported by 'kernel test robot <lkp@intel.com>'.  drm->irq_enabled
  should not be used by imx-dpu drm as it is only used by legacy drivers
  with userspace modesetting.

v11->v12:
* Rebase the series upon v6.1-rc1.
* Minor update on Kconfigs, struct names and macro names for patch 5/6
  due to the rebase.

v10->v11:
* Rebase the series upon v6.0-rc1.
* Include drm_blend.h and drm_framebuffer.h in dpu-kms.c and dpu-plane.c
  to fix build errors due to the rebase.
* Fix a checkpatch warning for dpu-crtc.c.
* Properly use dev_err_probe() to return it's return value directly where
  possible.

v9->v10:
* Rebase the series upon v5.18-rc1.
* Make 'checkpatch.pl --strict' happier for patch 5/6.
* Add Rob's R-b tag on patch 3/6.
* Add Laurentiu's R-b tag on patch 5/6.
* Add Laurentiu's A-b tag on patch 6/6.

v8->v9:
* Use drm_atomic_get_new_plane_state() in dpu_plane_atomic_update() for
  patch 5/6. (Laurentiu)
* Drop getting DPU DT alias ID for patch 5/6, as it is unused.
* Reference 'interrupts-extended' schema instead of 'interrupts' for patch 3/6
  to require an additional DPR interrupt(r_rtram_stall) because the reference
  manual does mention it, though the driver doesn't get/use it for now.
  Reference 'interrupt-names' schema to define the two DPR interrupt names -
  'dpr_wrap' and 'r_rtram_stall'.  Accordingly, patch 5/6 gets the 'dpr_wrap'
  interrupt by name.
* Drop Rob's R-b tag on patch 3/6, as review is needed.

v7->v8:
* Rebase this series up onto the latest drm-misc-next branch, due to DRM plane
  helper functions API change(atomic_check and atomic_update) from DRM atomic
  core.  So, dpu_plane_atomic_check() and dpu_plane_atomic_update() are updated
  accordingly in patch 5/6.  Also, rename plane->state variables and relevant
  DPU plane state variables in those two functions to reflect they are new
  states, like the patch 'drm: Rename plane->state variables in atomic update
  and disable' recently landed in drm-misc-next.
* Replace drm_gem_fb_prepare_fb() with drm_gem_plane_helper_prepare_fb() in
  patch 5/6, due to DRM core API change.
* Improve DPR burst length for GPU standard tile and 32bpp GPU super tile in
  patch 5/6 to align with the latest version of internal HW documention.

v6->v7:
* Fix return value of dpu_get_irqs() if platform_get_irq() fails. (Laurentiu)
* Use the function array dpu_irq_handler[] to store individual DPU irq handlers.
  (Laurentiu)
* Call get/put() hooks directly to get/put DPU fetchunits for DPU plane groups.
  (Laurentiu)
* Shorten the names of individual DPU irq handlers by using DPU unit abbrev
  names to make writing dpu_irq_handler[] easier.
* Add Rob's R-b tag back on DPU dt-binding patch as change in v6 was reviewed.

v5->v6:
* Use graph schema in the DPU dt-binding.
* Do not use macros where possible in the DPU DRM driver. (Laurentiu)
* Break dpu_plane_atomic_check() into some smaller functions. (Laurentiu)
* Address some minor comments from Laurentiu on the DPU DRM driver.
* Add dpu_crtc_err() helper marco in the DPU DRM driver to tell dmesg
  which CRTC generates error.
* Drop calling dev_set_drvdata() from dpu_drm_bind/unbind() in the DPU DRM
  driver as it is done in dpu_drm_probe().
* Some trivial tweaks.

v4->v5:
* Rebase up onto the latest drm-misc-next branch and remove the hook to
  drm_atomic_helper_legacy_gamma_set() from patch 5/6, because it was dropped
  by the newly landed commit 'drm: automatic legacy gamma support'.
* Remove a redundant blank line from dpu_plane_atomic_update() in patch 5/6.

v3->v4:
* Improve compatible properties in DPU and prefetch engines' dt bindings
  by using enum instead of oneOf+const.
* Add Rob's R-b tags on dt binding patches(patch 1/6, 2/6 and 3/6).
* Add Daniel's A-b tag on patch 4/6.

v2->v3:
* Fix DPU DRM driver build warnings which are
  Reported-by: kernel test robot <lkp@intel.com>.
* Drop DPU DRM driver build dependency on IMX_SCU, as dummy SCU functions have
  been added in header files by the patch 'firmware: imx: add dummy functions'
  which has landed in linux-next/master branch.
* Add a missing blank line in include/drm/drm_atomic.h.

v1->v2:
* Test this patch set also with i.MX8qm LVDS displays.
* Drop the device tree patches because we'll use new dt binding way to
  support i.MX8qm/qxp clocks.  This depends on a not-yet-landed patch set
  to do basic conversions for the platforms.
* Fix dt binding yamllint warnings.
* Require bypass0 and bypass1 clocks for both i.MX8qxp and i.MX8qm in DPU's
  dt binding documentation.
* Use new dt binding way to add clocks in the dt binding examples.
* Address several comments from Laurentiu on the DPU DRM patch.


Liu Ying (6):
  dt-bindings: display: imx: Add i.MX8qxp/qm DPU binding
  dt-bindings: display: imx: Add i.MX8qxp/qm PRG binding
  dt-bindings: display: imx: Add i.MX8qxp/qm DPR channel binding
  drm/atomic: Avoid unused-but-set-variable warning on
    for_each_old_plane_in_state
  drm/imx: Introduce i.MX8qm/qxp DPU DRM
  MAINTAINERS: add maintainer for i.MX8qxp DPU DRM driver

 .../display/imx/fsl,imx8qxp-dprc.yaml         |  100 ++
 .../bindings/display/imx/fsl,imx8qxp-dpu.yaml |  387 ++++++
 .../bindings/display/imx/fsl,imx8qxp-prg.yaml |   60 +
 MAINTAINERS                                   |    9 +
 drivers/gpu/drm/imx/Kconfig                   |    1 +
 drivers/gpu/drm/imx/Makefile                  |    1 +
 drivers/gpu/drm/imx/dpu/Kconfig               |    9 +
 drivers/gpu/drm/imx/dpu/Makefile              |   10 +
 drivers/gpu/drm/imx/dpu/dpu-constframe.c      |  171 +++
 drivers/gpu/drm/imx/dpu/dpu-core.c            | 1044 +++++++++++++++++
 drivers/gpu/drm/imx/dpu/dpu-crtc.c            |  969 +++++++++++++++
 drivers/gpu/drm/imx/dpu/dpu-crtc.h            |   72 ++
 drivers/gpu/drm/imx/dpu/dpu-disengcfg.c       |  117 ++
 drivers/gpu/drm/imx/dpu/dpu-dprc.c            |  715 +++++++++++
 drivers/gpu/drm/imx/dpu/dpu-dprc.h            |   40 +
 drivers/gpu/drm/imx/dpu/dpu-drv.c             |  294 +++++
 drivers/gpu/drm/imx/dpu/dpu-drv.h             |   28 +
 drivers/gpu/drm/imx/dpu/dpu-extdst.c          |  299 +++++
 drivers/gpu/drm/imx/dpu/dpu-fetchdecode.c     |  292 +++++
 drivers/gpu/drm/imx/dpu/dpu-fetcheco.c        |  224 ++++
 drivers/gpu/drm/imx/dpu/dpu-fetchlayer.c      |  152 +++
 drivers/gpu/drm/imx/dpu/dpu-fetchunit.c       |  610 ++++++++++
 drivers/gpu/drm/imx/dpu/dpu-fetchunit.h       |  195 +++
 drivers/gpu/drm/imx/dpu/dpu-fetchwarp.c       |  248 ++++
 drivers/gpu/drm/imx/dpu/dpu-framegen.c        |  395 +++++++
 drivers/gpu/drm/imx/dpu/dpu-gammacor.c        |  223 ++++
 drivers/gpu/drm/imx/dpu/dpu-hscaler.c         |  275 +++++
 drivers/gpu/drm/imx/dpu/dpu-kms.c             |  542 +++++++++
 drivers/gpu/drm/imx/dpu/dpu-kms.h             |   23 +
 drivers/gpu/drm/imx/dpu/dpu-layerblend.c      |  348 ++++++
 drivers/gpu/drm/imx/dpu/dpu-plane.c           |  804 +++++++++++++
 drivers/gpu/drm/imx/dpu/dpu-plane.h           |   59 +
 drivers/gpu/drm/imx/dpu/dpu-prg.c             |  433 +++++++
 drivers/gpu/drm/imx/dpu/dpu-prg.h             |   45 +
 drivers/gpu/drm/imx/dpu/dpu-prv.h             |  231 ++++
 drivers/gpu/drm/imx/dpu/dpu-tcon.c            |  250 ++++
 drivers/gpu/drm/imx/dpu/dpu-vscaler.c         |  308 +++++
 drivers/gpu/drm/imx/dpu/dpu.h                 |  385 ++++++
 include/drm/drm_atomic.h                      |    5 +-
 39 files changed, 10372 insertions(+), 1 deletion(-)
 create mode 100644 Documentation/devicetree/bindings/display/imx/fsl,imx8qxp-dprc.yaml
 create mode 100644 Documentation/devicetree/bindings/display/imx/fsl,imx8qxp-dpu.yaml
 create mode 100644 Documentation/devicetree/bindings/display/imx/fsl,imx8qxp-prg.yaml
 create mode 100644 drivers/gpu/drm/imx/dpu/Kconfig
 create mode 100644 drivers/gpu/drm/imx/dpu/Makefile
 create mode 100644 drivers/gpu/drm/imx/dpu/dpu-constframe.c
 create mode 100644 drivers/gpu/drm/imx/dpu/dpu-core.c
 create mode 100644 drivers/gpu/drm/imx/dpu/dpu-crtc.c
 create mode 100644 drivers/gpu/drm/imx/dpu/dpu-crtc.h
 create mode 100644 drivers/gpu/drm/imx/dpu/dpu-disengcfg.c
 create mode 100644 drivers/gpu/drm/imx/dpu/dpu-dprc.c
 create mode 100644 drivers/gpu/drm/imx/dpu/dpu-dprc.h
 create mode 100644 drivers/gpu/drm/imx/dpu/dpu-drv.c
 create mode 100644 drivers/gpu/drm/imx/dpu/dpu-drv.h
 create mode 100644 drivers/gpu/drm/imx/dpu/dpu-extdst.c
 create mode 100644 drivers/gpu/drm/imx/dpu/dpu-fetchdecode.c
 create mode 100644 drivers/gpu/drm/imx/dpu/dpu-fetcheco.c
 create mode 100644 drivers/gpu/drm/imx/dpu/dpu-fetchlayer.c
 create mode 100644 drivers/gpu/drm/imx/dpu/dpu-fetchunit.c
 create mode 100644 drivers/gpu/drm/imx/dpu/dpu-fetchunit.h
 create mode 100644 drivers/gpu/drm/imx/dpu/dpu-fetchwarp.c
 create mode 100644 drivers/gpu/drm/imx/dpu/dpu-framegen.c
 create mode 100644 drivers/gpu/drm/imx/dpu/dpu-gammacor.c
 create mode 100644 drivers/gpu/drm/imx/dpu/dpu-hscaler.c
 create mode 100644 drivers/gpu/drm/imx/dpu/dpu-kms.c
 create mode 100644 drivers/gpu/drm/imx/dpu/dpu-kms.h
 create mode 100644 drivers/gpu/drm/imx/dpu/dpu-layerblend.c
 create mode 100644 drivers/gpu/drm/imx/dpu/dpu-plane.c
 create mode 100644 drivers/gpu/drm/imx/dpu/dpu-plane.h
 create mode 100644 drivers/gpu/drm/imx/dpu/dpu-prg.c
 create mode 100644 drivers/gpu/drm/imx/dpu/dpu-prg.h
 create mode 100644 drivers/gpu/drm/imx/dpu/dpu-prv.h
 create mode 100644 drivers/gpu/drm/imx/dpu/dpu-tcon.c
 create mode 100644 drivers/gpu/drm/imx/dpu/dpu-vscaler.c
 create mode 100644 drivers/gpu/drm/imx/dpu/dpu.h

 base-commit: 504245a5ab6b6e1bfe0280baa4885c551e082099

Comments

mripard@kernel.org Aug. 22, 2023, 12:59 p.m. UTC | #1
Hi,

Aside from the discussion on the binding and the general architecture, I
have some comments there.

On Tue, Aug 22, 2023 at 04:59:48PM +0800, Liu Ying wrote:
> +int dpu_cf_init(struct dpu_soc *dpu, unsigned int index,
> +		unsigned int id, enum dpu_unit_type type,
> +		unsigned long pec_base, unsigned long base)
> +{
> +	struct dpu_constframe *cf;
> +
> +	cf = devm_kzalloc(dpu->dev, sizeof(*cf), GFP_KERNEL);
> +	if (!cf)
> +		return -ENOMEM;
> +
> +	dpu->cf_priv[index] = cf;

You can't store structures related to KMS in a device managed structure.
The DRM KMS device will stick around (and be accessible from userspace)
after the device has been removed until the last application closed its
file descriptor to the device.

This can be checked by enabling KASAN and manually unbinding the driver
through sysfs.

> +	cf->pec_base = devm_ioremap(dpu->dev, pec_base, SZ_16);
> +	if (!cf->pec_base)
> +		return -ENOMEM;
> +
> +	cf->base = devm_ioremap(dpu->dev, base, SZ_32);
> +	if (!cf->base)
> +		return -ENOMEM;

For the same reason, you need to protect any access to a device managed
resource (so clocks, registers, regulators, etc.) by a call to
drm_dev_enter/drm_dev_exit and you need to call drm_dev_unplug instead
of drm_dev_unregister.

> +static int dpu_crtc_pm_runtime_get_sync(struct dpu_crtc *dpu_crtc)
> +{
> +	int ret;
> +
> +	ret = pm_runtime_get_sync(dpu_crtc->dev->parent);
> +	if (ret < 0) {
> +		pm_runtime_put_noidle(dpu_crtc->dev->parent);
> +		dpu_crtc_err(&dpu_crtc->base,
> +			     "failed to get parent device RPM sync: %d\n", ret);
> +	}
> +
> +	return ret;
> +}

That's pm_runtime_resume_and_get.

> +static int dpu_crtc_pm_runtime_put(struct dpu_crtc *dpu_crtc)
> +{
> +	int ret;
> +
> +	ret = pm_runtime_put(dpu_crtc->dev->parent);
> +	if (ret < 0)
> +		dpu_crtc_err(&dpu_crtc->base,
> +			     "failed to put parent device RPM: %d\n", ret);
> +
> +	return ret;
> +}
> +
> +static void dpu_crtc_mode_set_nofb(struct drm_crtc *crtc)
> +{
> +	struct dpu_crtc *dpu_crtc = to_dpu_crtc(crtc);
> +	struct drm_display_mode *adj = &crtc->state->adjusted_mode;
> +	enum dpu_link_id cf_link;
> +
> +	dpu_crtc_dbg(crtc, "mode " DRM_MODE_FMT "\n", DRM_MODE_ARG(adj));
> +
> +	/* request power-on when we start to set mode for CRTC */
> +	dpu_crtc_pm_runtime_get_sync(dpu_crtc);

From the drm_crtc_helper_funcs documentation:

"""
	 * Note that the display pipe is completely off when this function is
	 * called. Atomic drivers which need hardware to be running before they
	 * program the new display mode (e.g. because they implement runtime PM)
	 * should not use this hook. This is because the helper library calls
	 * this hook only once per mode change and not every time the display
	 * pipeline is suspended using either DPMS or the new "ACTIVE" property.
	 * Which means register values set in this callback might get reset when
	 * the CRTC is suspended, but not restored.  Such drivers should instead
	 * move all their CRTC setup into the @atomic_enable callback.
"""

> +static void dpu_crtc_atomic_enable(struct drm_crtc *crtc,
> +				   struct drm_atomic_state *state)
> +{
> +	struct dpu_crtc *dpu_crtc = to_dpu_crtc(crtc);
> +	unsigned long flags;
> +
> +	drm_crtc_vblank_on(crtc);
> +
> +	enable_irq(dpu_crtc->dec_shdld_irq);
> +	enable_irq(dpu_crtc->ed_cont_shdld_irq);
> +	enable_irq(dpu_crtc->ed_safe_shdld_irq);
> +
> +	dpu_fg_enable_clock(dpu_crtc->fg);
> +	dpu_ed_pec_sync_trigger(dpu_crtc->ed_cont);
> +	dpu_ed_pec_sync_trigger(dpu_crtc->ed_safe);
> +	if (crtc->state->gamma_lut)
> +		dpu_crtc_set_gammacor(dpu_crtc);
> +	else
> +		dpu_crtc_disable_gammacor(dpu_crtc);
> +	dpu_fg_shdtokgen(dpu_crtc->fg);
> +
> +	/* don't relinquish CPU until TCON is set to operation mode */
> +	local_irq_save(flags);
> +	preempt_disable();
> +	dpu_fg_enable(dpu_crtc->fg);

That's super fishy. You shouldn't need that, at all. What is going on
there?

> +
> +	/*
> +	 * TKT320590:

Those are NXP internal references as far as as I can tell. They
shouldn't be here.

> +	 * Turn TCON into operation mode as soon as the first dumb
> +	 * frame is generated by DPU(we don't relinquish CPU to ensure
> +	 * this).  This makes DPR/PRG be able to evade the frame.
> +	 */
> +	DPU_CRTC_WAIT_FOR_FRAMEGEN_FRAME_CNT_MOVING(dpu_crtc->fg);
> +	dpu_tcon_set_operation_mode(dpu_crtc->tcon);
> +	local_irq_restore(flags);
> +	preempt_enable();
> +
> +	DPU_CRTC_WAIT_FOR_COMPLETION_TIMEOUT(ed_safe_shdld_done);
> +	DPU_CRTC_WAIT_FOR_COMPLETION_TIMEOUT(ed_cont_shdld_done);
> +	DPU_CRTC_WAIT_FOR_COMPLETION_TIMEOUT(dec_shdld_done);
> +
> +	disable_irq(dpu_crtc->ed_safe_shdld_irq);
> +	disable_irq(dpu_crtc->ed_cont_shdld_irq);
> +	disable_irq(dpu_crtc->dec_shdld_irq);
> +
> +	DPU_CRTC_WAIT_FOR_FRAMEGEN_SECONDARY_SYNCUP(dpu_crtc->fg);

The dance around the interrupts doesn't look great either. This need a
proper description of the problem this was trying to solve. Also, what
happens if any of those interrupts fail to trigger before you timeout?

> +	DPU_CRTC_CHECK_FRAMEGEN_FIFO(dpu_crtc->fg);
> +
> +	dpu_crtc_queue_state_event(crtc);
> +}
> +
> +static void dpu_crtc_atomic_disable(struct drm_crtc *crtc,
> +				    struct drm_atomic_state *state)
> +{
> +	struct dpu_crtc *dpu_crtc = to_dpu_crtc(crtc);
> +	struct drm_plane *plane;
> +	struct drm_plane_state *old_plane_state;
> +	struct dpu_plane_state *old_dpstate;
> +	struct dpu_fetchunit *fu;
> +	struct dpu_dprc *dprc;
> +	const struct dpu_fetchunit_ops *fu_ops;
> +	unsigned long flags;
> +	int i;
> +
> +	enable_irq(dpu_crtc->dec_seq_complete_irq);
> +
> +	/* don't relinquish CPU until DPRC repeat_en is disabled */
> +	local_irq_save(flags);
> +	preempt_disable();
> +	/*
> +	 * Sync to FrameGen frame counter moving so that
> +	 * FrameGen can be disabled in the next frame.
> +	 */
> +	DPU_CRTC_WAIT_FOR_FRAMEGEN_FRAME_CNT_MOVING(dpu_crtc->fg);
> +	dpu_fg_disable(dpu_crtc->fg);
> +	/*
> +	 * There is one frame leftover after FrameGen disablement.
> +	 * Sync to FrameGen frame counter moving so that
> +	 * DPRC repeat_en can be disabled in the next frame.
> +	 */
> +	DPU_CRTC_WAIT_FOR_FRAMEGEN_FRAME_CNT_MOVING(dpu_crtc->fg);
> +
> +	for_each_old_plane_in_state(state, plane, old_plane_state, i) {
> +		old_dpstate = to_dpu_plane_state(old_plane_state);
> +
> +		if (!old_plane_state->fb)
> +			continue;
> +
> +		if (old_plane_state->crtc != crtc)
> +			continue;
> +
> +		fu = old_dpstate->source;
> +
> +		fu_ops = dpu_fu_get_ops(fu);
> +
> +		dprc = fu_ops->get_dprc(fu);
> +		dpu_dprc_disable_repeat_en(dprc);
> +	}
> +
> +	local_irq_restore(flags);
> +	preempt_enable();
> +
> +	DPU_CRTC_WAIT_FOR_COMPLETION_TIMEOUT(dec_seq_complete_done);
> +
> +	disable_irq(dpu_crtc->dec_seq_complete_irq);
> +
> +	dpu_fg_disable_clock(dpu_crtc->fg);
> +
> +	drm_crtc_vblank_off(crtc);
> +
> +	spin_lock_irq(&crtc->dev->event_lock);
> +	if (crtc->state->event && !crtc->state->active) {
> +		drm_crtc_send_vblank_event(crtc, crtc->state->event);
> +		crtc->state->event = NULL;
> +	}
> +	spin_unlock_irq(&crtc->dev->event_lock);
> +
> +	/* request power-off when CRTC is disabled */
> +	dpu_crtc_pm_runtime_put(dpu_crtc);
> +}

Same story than in atomic_enable here.


> +static int legacyfb_depth = 32;
> +module_param(legacyfb_depth, uint, 0444);

No custom module parameter

> +static void dpu_atomic_put_plane_state(struct drm_atomic_state *state,
> +				       struct drm_plane *plane)
> +{
> +	int index = drm_plane_index(plane);
> +
> +	plane->funcs->atomic_destroy_state(plane, state->planes[index].state);
> +	state->planes[index].ptr = NULL;
> +	state->planes[index].state = NULL;
> +	state->planes[index].old_state = NULL;
> +	state->planes[index].new_state = NULL;
> +
> +	drm_modeset_unlock(&plane->mutex);
> +
> +	dpu_plane_dbg(plane, "put state\n");
> +}
> +
> +static void dpu_atomic_put_crtc_state(struct drm_atomic_state *state,
> +				      struct drm_crtc *crtc)
> +{
> +	int index = drm_crtc_index(crtc);
> +
> +	crtc->funcs->atomic_destroy_state(crtc, state->crtcs[index].state);
> +	state->crtcs[index].ptr = NULL;
> +	state->crtcs[index].state = NULL;
> +	state->crtcs[index].old_state = NULL;
> +	state->crtcs[index].new_state = NULL;
> +
> +	drm_modeset_unlock(&crtc->mutex);
> +
> +	dpu_crtc_dbg(crtc, "put state\n");
> +}
> +
> +static void
> +dpu_atomic_put_possible_states_per_crtc(struct drm_crtc_state *crtc_state)
> +{
> +	struct drm_atomic_state *state = crtc_state->state;
> +	struct drm_crtc *crtc = crtc_state->crtc;
> +	struct drm_plane *plane;
> +	struct drm_plane_state *old_plane_state, *new_plane_state;
> +	struct dpu_plane_state *old_dpstate, *new_dpstate;
> +
> +	drm_atomic_crtc_state_for_each_plane(plane, crtc_state) {
> +		old_plane_state = drm_atomic_get_old_plane_state(state, plane);
> +		new_plane_state = drm_atomic_get_new_plane_state(state, plane);
> +
> +		old_dpstate = to_dpu_plane_state(old_plane_state);
> +		new_dpstate = to_dpu_plane_state(new_plane_state);
> +
> +		/* Should be enough to check the below HW plane resources. */
> +		if (old_dpstate->stage.ptr != new_dpstate->stage.ptr ||
> +		    old_dpstate->source != new_dpstate->source ||
> +		    old_dpstate->blend != new_dpstate->blend)
> +			return;
> +	}
> +
> +	drm_atomic_crtc_state_for_each_plane(plane, crtc_state)
> +		dpu_atomic_put_plane_state(state, plane);
> +
> +	dpu_atomic_put_crtc_state(state, crtc);
> +}

That's super suspicious too. Are you really going around and dropping
and destroying plane and crtc states in a global state?

At the very least, you need to describe what this addresses and why you
think it's a good solution.

I kind of skimmed over the last part of the driver, but we should really
address these first comments. There's a larger discussion on the fact
that this driver does much more that it should and needs to (especially in
atomic_check, but not only), and this applies to the rest of patch.

Maxime
Liu Ying Sept. 5, 2023, 3:32 a.m. UTC | #2
On Tuesday, August 22, 2023 8:59 PM, Maxime Ripard <mripard@kernel.org> wrote:
> 
> Hi,

Hi,

> 
> Aside from the discussion on the binding and the general architecture, I
> have some comments there.

Thanks for your comments.

> 
> On Tue, Aug 22, 2023 at 04:59:48PM +0800, Liu Ying wrote:
> > +int dpu_cf_init(struct dpu_soc *dpu, unsigned int index,
> > +		unsigned int id, enum dpu_unit_type type,
> > +		unsigned long pec_base, unsigned long base)
> > +{
> > +	struct dpu_constframe *cf;
> > +
> > +	cf = devm_kzalloc(dpu->dev, sizeof(*cf), GFP_KERNEL);
> > +	if (!cf)
> > +		return -ENOMEM;
> > +
> > +	dpu->cf_priv[index] = cf;
> 
> You can't store structures related to KMS in a device managed structure.
> The DRM KMS device will stick around (and be accessible from userspace)
> after the device has been removed until the last application closed its
> file descriptor to the device.

The DRM device is registered after component_bind_all() is called in
dpu_drm_bind().  The CRTC components' platform devices are created
in the dpu_core_probe() where the device managed resources are
created.   So, it looks those resources are safe because the DRM device
will be unregistered before those resources are freed.

> 
> This can be checked by enabling KASAN and manually unbinding the driver
> through sysfs.

I enabled KASAN and manually unbound the dpu-core driver with command:

echo 56180000.dpu > /sys/bus/platform/drivers/dpu-core/56180000.dpu/driver/unbind 

KASAN didin't report memory issue regarding those device managed
resources.  However, it did report another issue in dpu_drm_unbind(),
where drm_device should be got from drv_data->drm_dev instead of
dev_get_drvdata(dev).  I'll fix that in next version.

BTW, the dpu-core driver was successfully bound again after unbinding with
command:

echo  56180000.dpu > /sys/bus/platform/drivers/dpu-core/bind

> 
> > +	cf->pec_base = devm_ioremap(dpu->dev, pec_base, SZ_16);
> > +	if (!cf->pec_base)
> > +		return -ENOMEM;
> > +
> > +	cf->base = devm_ioremap(dpu->dev, base, SZ_32);
> > +	if (!cf->base)
> > +		return -ENOMEM;
> 
> For the same reason, you need to protect any access to a device managed
> resource (so clocks, registers, regulators, etc.) by a call to
> drm_dev_enter/drm_dev_exit and you need to call drm_dev_unplug instead
> of drm_dev_unregister.

That's a good point.  I've tried to do that, but it turns out that the display controller
cannot be enabled again after binding the dpu-core driver manually again.  It seems
that the display controller requires a proper disablement procedure, but the "driver
instance overview " kdoc mentions the shortcoming of no proper disablement if
drm_dev_unplug() is used:

"""
* Drivers that want to support device unplugging (USB, DT overlay unload) should
 * use drm_dev_unplug() instead of drm_dev_unregister(). The driver must protect
 * regions that is accessing device resources to prevent use after they're
 * released. This is done using drm_dev_enter() and drm_dev_exit(). There is one
 * shortcoming however, drm_dev_unplug() marks the drm_device as unplugged before
 * drm_atomic_helper_shutdown() is called. This means that if the disable code
 * paths are protected, they will not run on regular driver module unload,
 * possibly leaving the hardware enabled.
"""

A DPU reset in dpu_core() might be helpful, but I'm not sure if there is any
reset line provided by the embodying system.

Even if the reset works, the 2nd DPU instance in i.MX8qm would be a problem,
because it won't be reset or properly disabled if the 1st DPU instance is unbound.
Although the two DPU instances could be wrapped by two DRM devices, I tend
not to do that because downstream bridges in future SoCs might be able to mux
to different DPU instances at runtime.

Due to the disablement issue, can we set drm_dev_enter/exit/unplug aside first?

> 
> > +static int dpu_crtc_pm_runtime_get_sync(struct dpu_crtc *dpu_crtc)
> > +{
> > +	int ret;
> > +
> > +	ret = pm_runtime_get_sync(dpu_crtc->dev->parent);
> > +	if (ret < 0) {
> > +		pm_runtime_put_noidle(dpu_crtc->dev->parent);
> > +		dpu_crtc_err(&dpu_crtc->base,
> > +			     "failed to get parent device RPM sync: %d\n", ret);
> > +	}
> > +
> > +	return ret;
> > +}
> 
> That's pm_runtime_resume_and_get.

Ok, will use it.

> 
> > +static int dpu_crtc_pm_runtime_put(struct dpu_crtc *dpu_crtc)
> > +{
> > +	int ret;
> > +
> > +	ret = pm_runtime_put(dpu_crtc->dev->parent);
> > +	if (ret < 0)
> > +		dpu_crtc_err(&dpu_crtc->base,
> > +			     "failed to put parent device RPM: %d\n", ret);
> > +
> > +	return ret;
> > +}
> > +
> > +static void dpu_crtc_mode_set_nofb(struct drm_crtc *crtc)
> > +{
> > +	struct dpu_crtc *dpu_crtc = to_dpu_crtc(crtc);
> > +	struct drm_display_mode *adj = &crtc->state->adjusted_mode;
> > +	enum dpu_link_id cf_link;
> > +
> > +	dpu_crtc_dbg(crtc, "mode " DRM_MODE_FMT "\n",
> DRM_MODE_ARG(adj));
> > +
> > +	/* request power-on when we start to set mode for CRTC */
> > +	dpu_crtc_pm_runtime_get_sync(dpu_crtc);
> 
> From the drm_crtc_helper_funcs documentation:
> 
> """
> 	 * Note that the display pipe is completely off when this function is
> 	 * called. Atomic drivers which need hardware to be running before
> they
> 	 * program the new display mode (e.g. because they implement
> runtime PM)
> 	 * should not use this hook. This is because the helper library calls
> 	 * this hook only once per mode change and not every time the display
> 	 * pipeline is suspended using either DPMS or the new "ACTIVE"
> property.
> 	 * Which means register values set in this callback might get reset
> when
> 	 * the CRTC is suspended, but not restored.  Such drivers should
> instead
> 	 * move all their CRTC setup into the @atomic_enable callback.
> """

I can use drm_atomic_helper_commit_tail() but not
drm_atomic_helper_commit_tail_rpm() because the planes need to be
updated prior to modeset_enables(where trigger shadow registers in
plane's HW resources to take effect).   Using the former one means that
RPM needs to be get/put in drm_atomic_helper_commit_planes(), which
doesn't seem good because in some cases the power of the display controller
might be lost after RPM put and I'm not sure if the registers set there will
be lost or not.   So, to avoid the issue the documentation mentions,
crtc_state->mode_changed is forced to be true in dpu_crtc_atomic_check()
if the CRTC is changed to active.  Is this acceptable?

> 
> > +static void dpu_crtc_atomic_enable(struct drm_crtc *crtc,
> > +				   struct drm_atomic_state *state)
> > +{
> > +	struct dpu_crtc *dpu_crtc = to_dpu_crtc(crtc);
> > +	unsigned long flags;
> > +
> > +	drm_crtc_vblank_on(crtc);
> > +
> > +	enable_irq(dpu_crtc->dec_shdld_irq);
> > +	enable_irq(dpu_crtc->ed_cont_shdld_irq);
> > +	enable_irq(dpu_crtc->ed_safe_shdld_irq);
> > +
> > +	dpu_fg_enable_clock(dpu_crtc->fg);
> > +	dpu_ed_pec_sync_trigger(dpu_crtc->ed_cont);
> > +	dpu_ed_pec_sync_trigger(dpu_crtc->ed_safe);
> > +	if (crtc->state->gamma_lut)
> > +		dpu_crtc_set_gammacor(dpu_crtc);
> > +	else
> > +		dpu_crtc_disable_gammacor(dpu_crtc);
> > +	dpu_fg_shdtokgen(dpu_crtc->fg);
> > +
> > +	/* don't relinquish CPU until TCON is set to operation mode */
> > +	local_irq_save(flags);
> > +	preempt_disable();
> > +	dpu_fg_enable(dpu_crtc->fg);
> 
> That's super fishy. You shouldn't need that, at all. What is going on
> there?

This aims to fully workaround the below errata recently released by
NXP.

ERR010910: DC: Display Subsystem First Frame Programming Restriction
Link: https://www.nxp.com/docs/en/errata/IMX8_1N94W.pdf

In short, to make sure the display controller can be enabled properly with
prefetch engine(DPRC + PRG), the TCON must be switch from bypass mode
to operation mode after FrameGen(FG) generates the first frame.

Timing is critical here, so local irq and preemption are disabled during the
switch procedure.

BTW, the driver always use prefetch engines for KMS, although they can
be bypassed.

> 
> > +
> > +	/*
> > +	 * TKT320590:
> 
> Those are NXP internal references as far as as I can tell. They
> shouldn't be here.

Ok, will change it to be ERR010910.

> 
> > +	 * Turn TCON into operation mode as soon as the first dumb
> > +	 * frame is generated by DPU(we don't relinquish CPU to ensure
> > +	 * this).  This makes DPR/PRG be able to evade the frame.
> > +	 */
> > +	DPU_CRTC_WAIT_FOR_FRAMEGEN_FRAME_CNT_MOVING(dpu_crtc-
> >fg);
> > +	dpu_tcon_set_operation_mode(dpu_crtc->tcon);
> > +	local_irq_restore(flags);
> > +	preempt_enable();
> > +
> > +	DPU_CRTC_WAIT_FOR_COMPLETION_TIMEOUT(ed_safe_shdld_done);
> > +	DPU_CRTC_WAIT_FOR_COMPLETION_TIMEOUT(ed_cont_shdld_done);
> > +	DPU_CRTC_WAIT_FOR_COMPLETION_TIMEOUT(dec_shdld_done);
> > +
> > +	disable_irq(dpu_crtc->ed_safe_shdld_irq);
> > +	disable_irq(dpu_crtc->ed_cont_shdld_irq);
> > +	disable_irq(dpu_crtc->dec_shdld_irq);
> > +
> > +	DPU_CRTC_WAIT_FOR_FRAMEGEN_SECONDARY_SYNCUP(dpu_crtc-
> >fg);
> 
> The dance around the interrupts doesn't look great either. This need a
> proper description of the problem this was trying to solve. Also, what
> happens if any of those interrupts fail to trigger before you timeout?

Hmm, this is just following the display controller spec which provides detail
steps to enable the display pipeline which include waiting for the interrupts
and the FrameGen primary or secondary channel syncup.

If the interrupts fail to trigger and we timeout, then there must be something
wrong either caused by DPU HW itself or driver bug.   Here, warnings are
generated only.

> 
> > +	DPU_CRTC_CHECK_FRAMEGEN_FIFO(dpu_crtc->fg);
> > +
> > +	dpu_crtc_queue_state_event(crtc);
> > +}
> > +
> > +static void dpu_crtc_atomic_disable(struct drm_crtc *crtc,
> > +				    struct drm_atomic_state *state)
> > +{
> > +	struct dpu_crtc *dpu_crtc = to_dpu_crtc(crtc);
> > +	struct drm_plane *plane;
> > +	struct drm_plane_state *old_plane_state;
> > +	struct dpu_plane_state *old_dpstate;
> > +	struct dpu_fetchunit *fu;
> > +	struct dpu_dprc *dprc;
> > +	const struct dpu_fetchunit_ops *fu_ops;
> > +	unsigned long flags;
> > +	int i;
> > +
> > +	enable_irq(dpu_crtc->dec_seq_complete_irq);
> > +
> > +	/* don't relinquish CPU until DPRC repeat_en is disabled */
> > +	local_irq_save(flags);
> > +	preempt_disable();
> > +	/*
> > +	 * Sync to FrameGen frame counter moving so that
> > +	 * FrameGen can be disabled in the next frame.
> > +	 */
> > +	DPU_CRTC_WAIT_FOR_FRAMEGEN_FRAME_CNT_MOVING(dpu_crtc-
> >fg);
> > +	dpu_fg_disable(dpu_crtc->fg);
> > +	/*
> > +	 * There is one frame leftover after FrameGen disablement.
> > +	 * Sync to FrameGen frame counter moving so that
> > +	 * DPRC repeat_en can be disabled in the next frame.
> > +	 */
> > +	DPU_CRTC_WAIT_FOR_FRAMEGEN_FRAME_CNT_MOVING(dpu_crtc-
> >fg);
> > +
> > +	for_each_old_plane_in_state(state, plane, old_plane_state, i) {
> > +		old_dpstate = to_dpu_plane_state(old_plane_state);
> > +
> > +		if (!old_plane_state->fb)
> > +			continue;
> > +
> > +		if (old_plane_state->crtc != crtc)
> > +			continue;
> > +
> > +		fu = old_dpstate->source;
> > +
> > +		fu_ops = dpu_fu_get_ops(fu);
> > +
> > +		dprc = fu_ops->get_dprc(fu);
> > +		dpu_dprc_disable_repeat_en(dprc);
> > +	}
> > +
> > +	local_irq_restore(flags);
> > +	preempt_enable();
> > +
> > +
> 	DPU_CRTC_WAIT_FOR_COMPLETION_TIMEOUT(dec_seq_complete_d
> one);
> > +
> > +	disable_irq(dpu_crtc->dec_seq_complete_irq);
> > +
> > +	dpu_fg_disable_clock(dpu_crtc->fg);
> > +
> > +	drm_crtc_vblank_off(crtc);
> > +
> > +	spin_lock_irq(&crtc->dev->event_lock);
> > +	if (crtc->state->event && !crtc->state->active) {
> > +		drm_crtc_send_vblank_event(crtc, crtc->state->event);
> > +		crtc->state->event = NULL;
> > +	}
> > +	spin_unlock_irq(&crtc->dev->event_lock);
> > +
> > +	/* request power-off when CRTC is disabled */
> > +	dpu_crtc_pm_runtime_put(dpu_crtc);
> > +}
> 
> Same story than in atomic_enable here.

Similar to atomic_enable, strict procedure needs to be followed
to disable the CRTC properly, including disabling FrameGen by syncing
to frame counter moving and disabling DPRC repeat_en as soon as
possible.   The critical timing is achieved by disabling local irq and
preemption during the procedure.

> 
> 
> > +static int legacyfb_depth = 32;
> > +module_param(legacyfb_depth, uint, 0444);
> 
> No custom module parameter

Ok, will remove them.

> 
> > +static void dpu_atomic_put_plane_state(struct drm_atomic_state *state,
> > +				       struct drm_plane *plane)
> > +{
> > +	int index = drm_plane_index(plane);
> > +
> > +	plane->funcs->atomic_destroy_state(plane, state->planes[index].state);
> > +	state->planes[index].ptr = NULL;
> > +	state->planes[index].state = NULL;
> > +	state->planes[index].old_state = NULL;
> > +	state->planes[index].new_state = NULL;
> > +
> > +	drm_modeset_unlock(&plane->mutex);
> > +
> > +	dpu_plane_dbg(plane, "put state\n");
> > +}
> > +
> > +static void dpu_atomic_put_crtc_state(struct drm_atomic_state *state,
> > +				      struct drm_crtc *crtc)
> > +{
> > +	int index = drm_crtc_index(crtc);
> > +
> > +	crtc->funcs->atomic_destroy_state(crtc, state->crtcs[index].state);
> > +	state->crtcs[index].ptr = NULL;
> > +	state->crtcs[index].state = NULL;
> > +	state->crtcs[index].old_state = NULL;
> > +	state->crtcs[index].new_state = NULL;
> > +
> > +	drm_modeset_unlock(&crtc->mutex);
> > +
> > +	dpu_crtc_dbg(crtc, "put state\n");
> > +}
> > +
> > +static void
> > +dpu_atomic_put_possible_states_per_crtc(struct drm_crtc_state
> *crtc_state)
> > +{
> > +	struct drm_atomic_state *state = crtc_state->state;
> > +	struct drm_crtc *crtc = crtc_state->crtc;
> > +	struct drm_plane *plane;
> > +	struct drm_plane_state *old_plane_state, *new_plane_state;
> > +	struct dpu_plane_state *old_dpstate, *new_dpstate;
> > +
> > +	drm_atomic_crtc_state_for_each_plane(plane, crtc_state) {
> > +		old_plane_state = drm_atomic_get_old_plane_state(state,
> plane);
> > +		new_plane_state = drm_atomic_get_new_plane_state(state,
> plane);
> > +
> > +		old_dpstate = to_dpu_plane_state(old_plane_state);
> > +		new_dpstate = to_dpu_plane_state(new_plane_state);
> > +
> > +		/* Should be enough to check the below HW plane resources.
> */
> > +		if (old_dpstate->stage.ptr != new_dpstate->stage.ptr ||
> > +		    old_dpstate->source != new_dpstate->source ||
> > +		    old_dpstate->blend != new_dpstate->blend)
> > +			return;
> > +	}
> > +
> > +	drm_atomic_crtc_state_for_each_plane(plane, crtc_state)
> > +		dpu_atomic_put_plane_state(state, plane);
> > +
> > +	dpu_atomic_put_crtc_state(state, crtc);
> > +}
> 
> That's super suspicious too. Are you really going around and dropping
> and destroying plane and crtc states in a global state?

Yes.

> 
> At the very least, you need to describe what this addresses and why you
> think it's a good solution.

This is the solution to assign HW resources of a plane group to the two CRTCs
in one DPU or one CRTC group _dynamically_ at runtime.  Dpu.h has some
comments which hint this:

"""
/*
 * fetchunit/scaler/layerblend resources of a plane group are
 * shared by the two CRTCs in a CRTC group
 */
"""

I can add a DPU display controller block diagram in dpu_kms.c to tell the HW
architecture and some SW architecture to clarify this more.

Regards,
Liu Ying

> 
> I kind of skimmed over the last part of the driver, but we should really
> address these first comments. There's a larger discussion on the fact
> that this driver does much more that it should and needs to (especially in
> atomic_check, but not only), and this applies to the rest of patch.
> 
> Maxime
mripard@kernel.org Sept. 5, 2023, 8:37 a.m. UTC | #3
On Tue, Sep 05, 2023 at 03:32:47AM +0000, Ying Liu wrote:
> > On Tue, Aug 22, 2023 at 04:59:48PM +0800, Liu Ying wrote:
> > > +int dpu_cf_init(struct dpu_soc *dpu, unsigned int index,
> > > +		unsigned int id, enum dpu_unit_type type,
> > > +		unsigned long pec_base, unsigned long base)
> > > +{
> > > +	struct dpu_constframe *cf;
> > > +
> > > +	cf = devm_kzalloc(dpu->dev, sizeof(*cf), GFP_KERNEL);
> > > +	if (!cf)
> > > +		return -ENOMEM;
> > > +
> > > +	dpu->cf_priv[index] = cf;
> > 
> > You can't store structures related to KMS in a device managed structure.
> > The DRM KMS device will stick around (and be accessible from userspace)
> > after the device has been removed until the last application closed its
> > file descriptor to the device.
> 
> The DRM device is registered after component_bind_all() is called in
> dpu_drm_bind().  The CRTC components' platform devices are created
> in the dpu_core_probe() where the device managed resources are
> created.   So, it looks those resources are safe because the DRM device
> will be unregistered before those resources are freed.

Not, it's not, because the KMS device isn't freed when devices will be
unbound/removed, but when the last application closes its fd to it, and
so you'll get dangling pointers.

The general rule to get it right is to use drmm for anything but device
resources (like clocks, regulators, memory mapping, etc.). You can
deviate from the rule, of course, but you'll need a long and clear
explanation on why it doesn't work, and why your new approach works.
Your current approach doesn't though.

> > This can be checked by enabling KASAN and manually unbinding the driver
> > through sysfs.
> 
> I enabled KASAN and manually unbound the dpu-core driver with command:
> 
> echo 56180000.dpu > /sys/bus/platform/drivers/dpu-core/56180000.dpu/driver/unbind 
> 
> KASAN didin't report memory issue regarding those device managed
> resources.  However, it did report another issue in dpu_drm_unbind(),
> where drm_device should be got from drv_data->drm_dev instead of
> dev_get_drvdata(dev).  I'll fix that in next version.
> 
> BTW, the dpu-core driver was successfully bound again after unbinding with
> command:
> 
> echo  56180000.dpu > /sys/bus/platform/drivers/dpu-core/bind

Guess you're lucky. That doesn't make it safe or right.

> > > +	cf->pec_base = devm_ioremap(dpu->dev, pec_base, SZ_16);
> > > +	if (!cf->pec_base)
> > > +		return -ENOMEM;
> > > +
> > > +	cf->base = devm_ioremap(dpu->dev, base, SZ_32);
> > > +	if (!cf->base)
> > > +		return -ENOMEM;
> > 
> > For the same reason, you need to protect any access to a device managed
> > resource (so clocks, registers, regulators, etc.) by a call to
> > drm_dev_enter/drm_dev_exit and you need to call drm_dev_unplug instead
> > of drm_dev_unregister.
> 
> That's a good point. I've tried to do that, but it turns out that the
> display controller cannot be enabled again after binding the dpu-core
> driver manually again. It seems that the display controller requires a
> proper disablement procedure, but the "driver instance overview " kdoc
> mentions the shortcoming of no proper disablement if drm_dev_unplug()
> is used:
> 
> """
> * Drivers that want to support device unplugging (USB, DT overlay unload) should
>  * use drm_dev_unplug() instead of drm_dev_unregister(). The driver must protect
>  * regions that is accessing device resources to prevent use after they're
>  * released. This is done using drm_dev_enter() and drm_dev_exit(). There is one
>  * shortcoming however, drm_dev_unplug() marks the drm_device as unplugged before
>  * drm_atomic_helper_shutdown() is called. This means that if the disable code
>  * paths are protected, they will not run on regular driver module unload,
>  * possibly leaving the hardware enabled.
> """
>
> A DPU reset in dpu_core() might be helpful, but I'm not sure if there is any
> reset line provided by the embodying system.

Generally speaking, you shouldn't rely on the device being in any
particuliar state before your driver loads. So a reset at probe/bind
time is a good idea.

> Even if the reset works, the 2nd DPU instance in i.MX8qm would be a
> problem, because it won't be reset or properly disabled if the 1st DPU
> instance is unbound.

Why it wouldn't be reset?

> Although the two DPU instances could be wrapped by two DRM devices, I
> tend not to do that because downstream bridges in future SoCs might be
> able to mux to different DPU instances at runtime.
>
> Due to the disablement issue, can we set drm_dev_enter/exit/unplug
> aside first?

I'd rather have that figured out prior to merging.
> >
> > > +static int dpu_crtc_pm_runtime_put(struct dpu_crtc *dpu_crtc)
> > > +{
> > > +	int ret;
> > > +
> > > +	ret = pm_runtime_put(dpu_crtc->dev->parent);
> > > +	if (ret < 0)
> > > +		dpu_crtc_err(&dpu_crtc->base,
> > > +			     "failed to put parent device RPM: %d\n", ret);
> > > +
> > > +	return ret;
> > > +}
> > > +
> > > +static void dpu_crtc_mode_set_nofb(struct drm_crtc *crtc)
> > > +{
> > > +	struct dpu_crtc *dpu_crtc = to_dpu_crtc(crtc);
> > > +	struct drm_display_mode *adj = &crtc->state->adjusted_mode;
> > > +	enum dpu_link_id cf_link;
> > > +
> > > +	dpu_crtc_dbg(crtc, "mode " DRM_MODE_FMT "\n",
> > DRM_MODE_ARG(adj));
> > > +
> > > +	/* request power-on when we start to set mode for CRTC */
> > > +	dpu_crtc_pm_runtime_get_sync(dpu_crtc);
> > 
> > From the drm_crtc_helper_funcs documentation:
> > 
> > """
> > 	 * Note that the display pipe is completely off when this function is
> > 	 * called. Atomic drivers which need hardware to be running before
> > they
> > 	 * program the new display mode (e.g. because they implement
> > runtime PM)
> > 	 * should not use this hook. This is because the helper library calls
> > 	 * this hook only once per mode change and not every time the display
> > 	 * pipeline is suspended using either DPMS or the new "ACTIVE"
> > property.
> > 	 * Which means register values set in this callback might get reset
> > when
> > 	 * the CRTC is suspended, but not restored.  Such drivers should
> > instead
> > 	 * move all their CRTC setup into the @atomic_enable callback.
> > """
> 
> I can use drm_atomic_helper_commit_tail() but not
> drm_atomic_helper_commit_tail_rpm() because the planes need to be
> updated prior to modeset_enables(where trigger shadow registers in
> plane's HW resources to take effect).   Using the former one means that
> RPM needs to be get/put in drm_atomic_helper_commit_planes(), which
> doesn't seem good because in some cases the power of the display controller
> might be lost after RPM put and I'm not sure if the registers set there will
> be lost or not.   So, to avoid the issue the documentation mentions,
> crtc_state->mode_changed is forced to be true in dpu_crtc_atomic_check()
> if the CRTC is changed to active.  Is this acceptable?

No, just move the crtc setup to atomic_enable like the doc suggests.

> > > +static void dpu_crtc_atomic_enable(struct drm_crtc *crtc,
> > > +				   struct drm_atomic_state *state)
> > > +{
> > > +	struct dpu_crtc *dpu_crtc = to_dpu_crtc(crtc);
> > > +	unsigned long flags;
> > > +
> > > +	drm_crtc_vblank_on(crtc);
> > > +
> > > +	enable_irq(dpu_crtc->dec_shdld_irq);
> > > +	enable_irq(dpu_crtc->ed_cont_shdld_irq);
> > > +	enable_irq(dpu_crtc->ed_safe_shdld_irq);
> > > +
> > > +	dpu_fg_enable_clock(dpu_crtc->fg);
> > > +	dpu_ed_pec_sync_trigger(dpu_crtc->ed_cont);
> > > +	dpu_ed_pec_sync_trigger(dpu_crtc->ed_safe);
> > > +	if (crtc->state->gamma_lut)
> > > +		dpu_crtc_set_gammacor(dpu_crtc);
> > > +	else
> > > +		dpu_crtc_disable_gammacor(dpu_crtc);
> > > +	dpu_fg_shdtokgen(dpu_crtc->fg);
> > > +
> > > +	/* don't relinquish CPU until TCON is set to operation mode */
> > > +	local_irq_save(flags);
> > > +	preempt_disable();
> > > +	dpu_fg_enable(dpu_crtc->fg);
> > 
> > That's super fishy. You shouldn't need that, at all. What is going on
> > there?
> 
> This aims to fully workaround the below errata recently released by
> NXP.
> 
> ERR010910: DC: Display Subsystem First Frame Programming Restriction
> Link: https://www.nxp.com/docs/en/errata/IMX8_1N94W.pdf
> 
> In short, to make sure the display controller can be enabled properly with
> prefetch engine(DPRC + PRG), the TCON must be switch from bypass mode
> to operation mode after FrameGen(FG) generates the first frame.
> 
> Timing is critical here, so local irq and preemption are disabled during the
> switch procedure.
> 
> BTW, the driver always use prefetch engines for KMS, although they can
> be bypassed.

Ok. So I think it would help your driver getting merged to split that
workaround into a separate patch. It's hard to tell what are the
implications of that workaround on your driver when it's lost in the
middle of, well, the driver :)

I guess it would be much easier for everyone if you submitted that
driver without the errata handling, or with prefetch bypassed, at first.
And then you can submit / rework the hard parts.

> > 
> > > +
> > > +	/*
> > > +	 * TKT320590:
> > 
> > Those are NXP internal references as far as as I can tell. They
> > shouldn't be here.
> 
> Ok, will change it to be ERR010910.

A link to the errata description would be a good idea too.

> > > +static void dpu_atomic_put_plane_state(struct drm_atomic_state *state,
> > > +				       struct drm_plane *plane)
> > > +{
> > > +	int index = drm_plane_index(plane);
> > > +
> > > +	plane->funcs->atomic_destroy_state(plane, state->planes[index].state);
> > > +	state->planes[index].ptr = NULL;
> > > +	state->planes[index].state = NULL;
> > > +	state->planes[index].old_state = NULL;
> > > +	state->planes[index].new_state = NULL;
> > > +
> > > +	drm_modeset_unlock(&plane->mutex);
> > > +
> > > +	dpu_plane_dbg(plane, "put state\n");
> > > +}
> > > +
> > > +static void dpu_atomic_put_crtc_state(struct drm_atomic_state *state,
> > > +				      struct drm_crtc *crtc)
> > > +{
> > > +	int index = drm_crtc_index(crtc);
> > > +
> > > +	crtc->funcs->atomic_destroy_state(crtc, state->crtcs[index].state);
> > > +	state->crtcs[index].ptr = NULL;
> > > +	state->crtcs[index].state = NULL;
> > > +	state->crtcs[index].old_state = NULL;
> > > +	state->crtcs[index].new_state = NULL;
> > > +
> > > +	drm_modeset_unlock(&crtc->mutex);
> > > +
> > > +	dpu_crtc_dbg(crtc, "put state\n");
> > > +}
> > > +
> > > +static void
> > > +dpu_atomic_put_possible_states_per_crtc(struct drm_crtc_state
> > *crtc_state)
> > > +{
> > > +	struct drm_atomic_state *state = crtc_state->state;
> > > +	struct drm_crtc *crtc = crtc_state->crtc;
> > > +	struct drm_plane *plane;
> > > +	struct drm_plane_state *old_plane_state, *new_plane_state;
> > > +	struct dpu_plane_state *old_dpstate, *new_dpstate;
> > > +
> > > +	drm_atomic_crtc_state_for_each_plane(plane, crtc_state) {
> > > +		old_plane_state = drm_atomic_get_old_plane_state(state,
> > plane);
> > > +		new_plane_state = drm_atomic_get_new_plane_state(state,
> > plane);
> > > +
> > > +		old_dpstate = to_dpu_plane_state(old_plane_state);
> > > +		new_dpstate = to_dpu_plane_state(new_plane_state);
> > > +
> > > +		/* Should be enough to check the below HW plane resources.
> > */
> > > +		if (old_dpstate->stage.ptr != new_dpstate->stage.ptr ||
> > > +		    old_dpstate->source != new_dpstate->source ||
> > > +		    old_dpstate->blend != new_dpstate->blend)
> > > +			return;
> > > +	}
> > > +
> > > +	drm_atomic_crtc_state_for_each_plane(plane, crtc_state)
> > > +		dpu_atomic_put_plane_state(state, plane);
> > > +
> > > +	dpu_atomic_put_crtc_state(state, crtc);
> > > +}
> > 
> > That's super suspicious too. Are you really going around and dropping
> > and destroying plane and crtc states in a global state?
> 
> Yes.

That's really not a good idea. Adding states are fine, dropping ones
aren't.

> > 
> > At the very least, you need to describe what this addresses and why you
> > think it's a good solution.
> 
> This is the solution to assign HW resources of a plane group to the two CRTCs
> in one DPU or one CRTC group _dynamically_ at runtime.  Dpu.h has some
> comments which hint this:
> 
> """
> /*
>  * fetchunit/scaler/layerblend resources of a plane group are
>  * shared by the two CRTCs in a CRTC group
>  */
> """
> 
> I can add a DPU display controller block diagram in dpu_kms.c to tell the HW
> architecture and some SW architecture to clarify this more.

It's not so much the diagram that I'm looking for, but an accurate
description of the problem. What resource is there, why and how does it
need to be shared, so we can understand what you are doing there, and
possibly suggest other things.

Maxime
Liu Ying Sept. 26, 2023, 3:55 a.m. UTC | #4
On Tuesday, September 5, 2023 4:37 PM, Maxime Ripard <mripard@kernel.org> wrote:
> On Tue, Sep 05, 2023 at 03:32:47AM +0000, Ying Liu wrote:
> > > On Tue, Aug 22, 2023 at 04:59:48PM +0800, Liu Ying wrote:
> > > > +int dpu_cf_init(struct dpu_soc *dpu, unsigned int index,
> > > > +		unsigned int id, enum dpu_unit_type type,
> > > > +		unsigned long pec_base, unsigned long base)
> > > > +{
> > > > +	struct dpu_constframe *cf;
> > > > +
> > > > +	cf = devm_kzalloc(dpu->dev, sizeof(*cf), GFP_KERNEL);
> > > > +	if (!cf)
> > > > +		return -ENOMEM;
> > > > +
> > > > +	dpu->cf_priv[index] = cf;
> > >
> > > You can't store structures related to KMS in a device managed structure.
> > > The DRM KMS device will stick around (and be accessible from userspace)
> > > after the device has been removed until the last application closed its
> > > file descriptor to the device.
> >
> > The DRM device is registered after component_bind_all() is called in
> > dpu_drm_bind().  The CRTC components' platform devices are created
> > in the dpu_core_probe() where the device managed resources are
> > created.   So, it looks those resources are safe because the DRM device
> > will be unregistered before those resources are freed.
> 
> Not, it's not, because the KMS device isn't freed when devices will be
> unbound/removed, but when the last application closes its fd to it, and
> so you'll get dangling pointers.
> 
> The general rule to get it right is to use drmm for anything but device
> resources (like clocks, regulators, memory mapping, etc.). You can
> deviate from the rule, of course, but you'll need a long and clear
> explanation on why it doesn't work, and why your new approach works.
> Your current approach doesn't though.

I'll try to follow the rule in next version. Will call drmm_kzalloc() instead of
devm_kzalloc() here.

> 
> > > This can be checked by enabling KASAN and manually unbinding the
> driver
> > > through sysfs.
> >
> > I enabled KASAN and manually unbound the dpu-core driver with
> command:
> >
> > echo 56180000.dpu > /sys/bus/platform/drivers/dpu-
> core/56180000.dpu/driver/unbind
> >
> > KASAN didin't report memory issue regarding those device managed
> > resources.  However, it did report another issue in dpu_drm_unbind(),
> > where drm_device should be got from drv_data->drm_dev instead of
> > dev_get_drvdata(dev).  I'll fix that in next version.
> >
> > BTW, the dpu-core driver was successfully bound again after unbinding
> with
> > command:
> >
> > echo  56180000.dpu > /sys/bus/platform/drivers/dpu-core/bind
> 
> Guess you're lucky. That doesn't make it safe or right.
> 
> > > > +	cf->pec_base = devm_ioremap(dpu->dev, pec_base, SZ_16);
> > > > +	if (!cf->pec_base)
> > > > +		return -ENOMEM;
> > > > +
> > > > +	cf->base = devm_ioremap(dpu->dev, base, SZ_32);
> > > > +	if (!cf->base)
> > > > +		return -ENOMEM;
> > >
> > > For the same reason, you need to protect any access to a device managed
> > > resource (so clocks, registers, regulators, etc.) by a call to
> > > drm_dev_enter/drm_dev_exit and you need to call drm_dev_unplug
> instead
> > > of drm_dev_unregister.
> >
> > That's a good point. I've tried to do that, but it turns out that the
> > display controller cannot be enabled again after binding the dpu-core
> > driver manually again. It seems that the display controller requires a
> > proper disablement procedure, but the "driver instance overview " kdoc
> > mentions the shortcoming of no proper disablement if drm_dev_unplug()
> > is used:
> >
> > """
> > * Drivers that want to support device unplugging (USB, DT overlay unload)
> should
> >  * use drm_dev_unplug() instead of drm_dev_unregister(). The driver must
> protect
> >  * regions that is accessing device resources to prevent use after they're
> >  * released. This is done using drm_dev_enter() and drm_dev_exit(). There
> is one
> >  * shortcoming however, drm_dev_unplug() marks the drm_device as
> unplugged before
> >  * drm_atomic_helper_shutdown() is called. This means that if the disable
> code
> >  * paths are protected, they will not run on regular driver module unload,
> >  * possibly leaving the hardware enabled.
> > """
> >
> > A DPU reset in dpu_core() might be helpful, but I'm not sure if there is any
> > reset line provided by the embodying system.
> 
> Generally speaking, you shouldn't rely on the device being in any
> particuliar state before your driver loads. So a reset at probe/bind
> time is a good idea.

Yes. I'll drop the platform device creations for CRTCs from dpu-core.c 
and drop the aggregation of CRTC components from different DPU
instances into one DRM device.  This way, there will be only two CRTCs
of one DPU in one DRM device. Then, the driver will be simpler and
users cannot unbind the driver of one of the two DPU instances, which
means drm_dev_unplug() won't be needed any more(?) and the reset
issue will be gone.  The controller will be shutdown properly through
drm_atomic_helper_shutdown() when the driver module is removed.

> 
> > Even if the reset works, the 2nd DPU instance in i.MX8qm would be a
> > problem, because it won't be reset or properly disabled if the 1st DPU
> > instance is unbound.
> 
> Why it wouldn't be reset?

Because dpu_core_remove() is not called for the 2nd DPU instance.
Anyway, with the above new design, this doesn't seem to be a problem
any more.

> 
> > Although the two DPU instances could be wrapped by two DRM devices, I
> > tend not to do that because downstream bridges in future SoCs might be
> > able to mux to different DPU instances at runtime.
> >
> > Due to the disablement issue, can we set drm_dev_enter/exit/unplug
> > aside first?
> 
> I'd rather have that figured out prior to merging.

I'm assuming that drm_dev_enter/exit/unplug won't be needed with the above
new design - one DPU instance wrapped by one DRM device.

> > >
> > > > +static int dpu_crtc_pm_runtime_put(struct dpu_crtc *dpu_crtc)
> > > > +{
> > > > +	int ret;
> > > > +
> > > > +	ret = pm_runtime_put(dpu_crtc->dev->parent);
> > > > +	if (ret < 0)
> > > > +		dpu_crtc_err(&dpu_crtc->base,
> > > > +			     "failed to put parent device RPM: %d\n", ret);
> > > > +
> > > > +	return ret;
> > > > +}
> > > > +
> > > > +static void dpu_crtc_mode_set_nofb(struct drm_crtc *crtc)
> > > > +{
> > > > +	struct dpu_crtc *dpu_crtc = to_dpu_crtc(crtc);
> > > > +	struct drm_display_mode *adj = &crtc->state->adjusted_mode;
> > > > +	enum dpu_link_id cf_link;
> > > > +
> > > > +	dpu_crtc_dbg(crtc, "mode " DRM_MODE_FMT "\n",
> > > DRM_MODE_ARG(adj));
> > > > +
> > > > +	/* request power-on when we start to set mode for CRTC */
> > > > +	dpu_crtc_pm_runtime_get_sync(dpu_crtc);
> > >
> > > From the drm_crtc_helper_funcs documentation:
> > >
> > > """
> > > 	 * Note that the display pipe is completely off when this function is
> > > 	 * called. Atomic drivers which need hardware to be running before
> > > they
> > > 	 * program the new display mode (e.g. because they implement
> > > runtime PM)
> > > 	 * should not use this hook. This is because the helper library calls
> > > 	 * this hook only once per mode change and not every time the
> display
> > > 	 * pipeline is suspended using either DPMS or the new "ACTIVE"
> > > property.
> > > 	 * Which means register values set in this callback might get reset
> > > when
> > > 	 * the CRTC is suspended, but not restored.  Such drivers should
> > > instead
> > > 	 * move all their CRTC setup into the @atomic_enable callback.
> > > """
> >
> > I can use drm_atomic_helper_commit_tail() but not
> > drm_atomic_helper_commit_tail_rpm() because the planes need to be
> > updated prior to modeset_enables(where trigger shadow registers in
> > plane's HW resources to take effect).   Using the former one means that
> > RPM needs to be get/put in drm_atomic_helper_commit_planes(), which
> > doesn't seem good because in some cases the power of the display
> controller
> > might be lost after RPM put and I'm not sure if the registers set there will
> > be lost or not.   So, to avoid the issue the documentation mentions,
> > crtc_state->mode_changed is forced to be true in dpu_crtc_atomic_check()
> > if the CRTC is changed to active.  Is this acceptable?
> 
> No, just move the crtc setup to atomic_enable like the doc suggests.

Ok, will do.

> 
> > > > +static void dpu_crtc_atomic_enable(struct drm_crtc *crtc,
> > > > +				   struct drm_atomic_state *state)
> > > > +{
> > > > +	struct dpu_crtc *dpu_crtc = to_dpu_crtc(crtc);
> > > > +	unsigned long flags;
> > > > +
> > > > +	drm_crtc_vblank_on(crtc);
> > > > +
> > > > +	enable_irq(dpu_crtc->dec_shdld_irq);
> > > > +	enable_irq(dpu_crtc->ed_cont_shdld_irq);
> > > > +	enable_irq(dpu_crtc->ed_safe_shdld_irq);
> > > > +
> > > > +	dpu_fg_enable_clock(dpu_crtc->fg);
> > > > +	dpu_ed_pec_sync_trigger(dpu_crtc->ed_cont);
> > > > +	dpu_ed_pec_sync_trigger(dpu_crtc->ed_safe);
> > > > +	if (crtc->state->gamma_lut)
> > > > +		dpu_crtc_set_gammacor(dpu_crtc);
> > > > +	else
> > > > +		dpu_crtc_disable_gammacor(dpu_crtc);
> > > > +	dpu_fg_shdtokgen(dpu_crtc->fg);
> > > > +
> > > > +	/* don't relinquish CPU until TCON is set to operation mode */
> > > > +	local_irq_save(flags);
> > > > +	preempt_disable();
> > > > +	dpu_fg_enable(dpu_crtc->fg);
> > >
> > > That's super fishy. You shouldn't need that, at all. What is going on
> > > there?
> >
> > This aims to fully workaround the below errata recently released by
> > NXP.
> >
> > ERR010910: DC: Display Subsystem First Frame Programming Restriction
> > Link:
> https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.
> nxp.com%2Fdocs%2Fen%2Ferrata%2FIMX8_1N94W.pdf&data=05%7C01%7C
> victor.liu%40nxp.com%7C1250637791414caf559b08dbadeb597e%7C686ea1d
> 3bc2b4c6fa92cd99c5c301635%7C0%7C1%7C638294998555649537%7CUnkno
> wn%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1h
> aWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=mKAtsgN30jqfUiczAFe3
> dXPUZOXhM%2Fnlq9%2F%2B%2F8%2Bjj24%3D&reserved=0
> >
> > In short, to make sure the display controller can be enabled properly with
> > prefetch engine(DPRC + PRG), the TCON must be switch from bypass mode
> > to operation mode after FrameGen(FG) generates the first frame.
> >
> > Timing is critical here, so local irq and preemption are disabled during the
> > switch procedure.
> >
> > BTW, the driver always use prefetch engines for KMS, although they can
> > be bypassed.
> 
> Ok. So I think it would help your driver getting merged to split that
> workaround into a separate patch. It's hard to tell what are the
> implications of that workaround on your driver when it's lost in the
> middle of, well, the driver :)
> 
> I guess it would be much easier for everyone if you submitted that
> driver without the errata handling, or with prefetch bypassed, at first.
> And then you can submit / rework the hard parts.

Ok, I'll drop the prefetch engine support and the errata handling.
Also, I'll drop add-on features, like gamma, csc and alpha, to achieve
kind of minimal feature set.

> 
> > >
> > > > +
> > > > +	/*
> > > > +	 * TKT320590:
> > >
> > > Those are NXP internal references as far as as I can tell. They
> > > shouldn't be here.
> >
> > Ok, will change it to be ERR010910.
> 
> A link to the errata description would be a good idea too.

Ok, will add a link when errata handling is supported.

> 
> > > > +static void dpu_atomic_put_plane_state(struct drm_atomic_state
> *state,
> > > > +				       struct drm_plane *plane)
> > > > +{
> > > > +	int index = drm_plane_index(plane);
> > > > +
> > > > +	plane->funcs->atomic_destroy_state(plane, state-
> >planes[index].state);
> > > > +	state->planes[index].ptr = NULL;
> > > > +	state->planes[index].state = NULL;
> > > > +	state->planes[index].old_state = NULL;
> > > > +	state->planes[index].new_state = NULL;
> > > > +
> > > > +	drm_modeset_unlock(&plane->mutex);
> > > > +
> > > > +	dpu_plane_dbg(plane, "put state\n");
> > > > +}
> > > > +
> > > > +static void dpu_atomic_put_crtc_state(struct drm_atomic_state *state,
> > > > +				      struct drm_crtc *crtc)
> > > > +{
> > > > +	int index = drm_crtc_index(crtc);
> > > > +
> > > > +	crtc->funcs->atomic_destroy_state(crtc, state->crtcs[index].state);
> > > > +	state->crtcs[index].ptr = NULL;
> > > > +	state->crtcs[index].state = NULL;
> > > > +	state->crtcs[index].old_state = NULL;
> > > > +	state->crtcs[index].new_state = NULL;
> > > > +
> > > > +	drm_modeset_unlock(&crtc->mutex);
> > > > +
> > > > +	dpu_crtc_dbg(crtc, "put state\n");
> > > > +}
> > > > +
> > > > +static void
> > > > +dpu_atomic_put_possible_states_per_crtc(struct drm_crtc_state
> > > *crtc_state)
> > > > +{
> > > > +	struct drm_atomic_state *state = crtc_state->state;
> > > > +	struct drm_crtc *crtc = crtc_state->crtc;
> > > > +	struct drm_plane *plane;
> > > > +	struct drm_plane_state *old_plane_state, *new_plane_state;
> > > > +	struct dpu_plane_state *old_dpstate, *new_dpstate;
> > > > +
> > > > +	drm_atomic_crtc_state_for_each_plane(plane, crtc_state) {
> > > > +		old_plane_state = drm_atomic_get_old_plane_state(state,
> > > plane);
> > > > +		new_plane_state = drm_atomic_get_new_plane_state(state,
> > > plane);
> > > > +
> > > > +		old_dpstate = to_dpu_plane_state(old_plane_state);
> > > > +		new_dpstate = to_dpu_plane_state(new_plane_state);
> > > > +
> > > > +		/* Should be enough to check the below HW plane resources.
> > > */
> > > > +		if (old_dpstate->stage.ptr != new_dpstate->stage.ptr ||
> > > > +		    old_dpstate->source != new_dpstate->source ||
> > > > +		    old_dpstate->blend != new_dpstate->blend)
> > > > +			return;
> > > > +	}
> > > > +
> > > > +	drm_atomic_crtc_state_for_each_plane(plane, crtc_state)
> > > > +		dpu_atomic_put_plane_state(state, plane);
> > > > +
> > > > +	dpu_atomic_put_crtc_state(state, crtc);
> > > > +}
> > >
> > > That's super suspicious too. Are you really going around and dropping
> > > and destroying plane and crtc states in a global state?
> >
> > Yes.
> 
> That's really not a good idea. Adding states are fine, dropping ones
> aren't.

The idea is to add all active planes on two CRTCs into one commit and
try to allocate fetchunits for those planes as a whole to best meet user's
requirements, because ...

> 
> > >
> > > At the very least, you need to describe what this addresses and why you
> > > think it's a good solution.
> >
> > This is the solution to assign HW resources of a plane group to the two
> CRTCs
> > in one DPU or one CRTC group _dynamically_ at runtime.  Dpu.h has some
> > comments which hint this:
> >
> > """
> > /*
> >  * fetchunit/scaler/layerblend resources of a plane group are
> >  * shared by the two CRTCs in a CRTC group
> >  */
> > """
> >
> > I can add a DPU display controller block diagram in dpu_kms.c to tell the
> HW
> > architecture and some SW architecture to clarify this more.
> 
> It's not so much the diagram that I'm looking for, but an accurate
> description of the problem. What resource is there, why and how does it
> need to be shared, so we can understand what you are doing there, and
> possibly suggest other things.

... the problem is that fetchunits(fetchdecode/fetchlayer/fetchwarp) have
different capabilities/feature sets and they can be built upon either of the
two CRTCs through layerblends.  The 4 fetchunits for one DPU display
controller are fetchdecode0, fetchdecode1, fetchlayer0 and fetchwarp2.
Correspondingly, there are 4 layerblends(0 to 3).  Layerblends blend two
input sources(primary input and secondary input). The primary input can
be, say, constframe or another layerblend's output.  The secondary input
can be, say, one of those fetchunits.  For example, a real use case could
be:
- CRTC0:
  Primary plane:
    Layerblend0:
      Primary:  constframe4
      Secondary: fetchlayer0
  Overlay plane:
    Layerblend1:
      Primary: Layerblend0 output
      Secondary: fetchdecode1 + vscaler5 + hscaler4

- CRTC1:
  Primary plane:
    Layerblend2:
      Primary:  constframe5
      Secondary: fetchwarp2 + fetcheco2
  Overlay plane:
    Layerblend3:
      Primary: Layerblend2 output
      Secondary: fetchdecode0 + fetcheco0 + vscaler4

Some fetchunit features:
- fetchdecode: Horizontoal/vertical downscaling through video
   processing blocks and YUV fb with two planars(use fetcheco).
- fetchwarp: YUV fb with two planars(use fetcheco), up to
  8 sub-layers and warp.
- fetchlayer: up to 8 sub-layers.

All fetchunits support RGB fb.


What I do is:
- Add all active planes(including primary and overlays) on two CRTCs
  into one commit even if the user only wants to update the plane(s)
  on one CRTC.
- Those manually added planes/CRTCs are prone to put, because
   the relevant fetchunits and layerblends are likely/eventually
   unchanged after the allocation.
- Traverse through fetchunit list to try to find an available one to
   meet a plane's requirements(say CSC? Scalers?). Those prone-to-put
   planes are handled first to use the current fetchunits if modeset
   is not allowed, otherwise planes with lower z-positions go first.
- Do not allow fetchunit hot-migration between two CRTCs.
- Assign layerblend with the lowest possible index to planes.  Planes
   with lower z-positions go first.
- Put the prone-to-put planes/CRTC if possible to gain some
  performance .

Regards,
Liu Ying

> 
> Maxime
mripard@kernel.org Oct. 3, 2023, 11:52 a.m. UTC | #5
On Tue, Sep 26, 2023 at 03:55:35AM +0000, Ying Liu wrote:
> > > > > +	cf->pec_base = devm_ioremap(dpu->dev, pec_base, SZ_16);
> > > > > +	if (!cf->pec_base)
> > > > > +		return -ENOMEM;
> > > > > +
> > > > > +	cf->base = devm_ioremap(dpu->dev, base, SZ_32);
> > > > > +	if (!cf->base)
> > > > > +		return -ENOMEM;
> > > >
> > > > For the same reason, you need to protect any access to a device managed
> > > > resource (so clocks, registers, regulators, etc.) by a call to
> > > > drm_dev_enter/drm_dev_exit and you need to call drm_dev_unplug
> > instead
> > > > of drm_dev_unregister.
> > >
> > > That's a good point. I've tried to do that, but it turns out that the
> > > display controller cannot be enabled again after binding the dpu-core
> > > driver manually again. It seems that the display controller requires a
> > > proper disablement procedure, but the "driver instance overview " kdoc
> > > mentions the shortcoming of no proper disablement if drm_dev_unplug()
> > > is used:
> > >
> > > """
> > > * Drivers that want to support device unplugging (USB, DT overlay unload)
> > should
> > >  * use drm_dev_unplug() instead of drm_dev_unregister(). The driver must
> > protect
> > >  * regions that is accessing device resources to prevent use after they're
> > >  * released. This is done using drm_dev_enter() and drm_dev_exit(). There
> > is one
> > >  * shortcoming however, drm_dev_unplug() marks the drm_device as
> > unplugged before
> > >  * drm_atomic_helper_shutdown() is called. This means that if the disable
> > code
> > >  * paths are protected, they will not run on regular driver module unload,
> > >  * possibly leaving the hardware enabled.
> > > """
> > >
> > > A DPU reset in dpu_core() might be helpful, but I'm not sure if there is any
> > > reset line provided by the embodying system.
> > 
> > Generally speaking, you shouldn't rely on the device being in any
> > particuliar state before your driver loads. So a reset at probe/bind
> > time is a good idea.
> 
> Yes. I'll drop the platform device creations for CRTCs from dpu-core.c 
> and drop the aggregation of CRTC components from different DPU
> instances into one DRM device.  This way, there will be only two CRTCs
> of one DPU in one DRM device.

Ok.

> Then, the driver will be simpler and users cannot unbind the driver of
> one of the two DPU instances,

Uh? They would still be able to do that.

> which means drm_dev_unplug() won't be needed any more(?)

So this would still be needed

> and the reset issue will be gone. The controller will be shutdown
> properly through drm_atomic_helper_shutdown() when the driver module
> is removed.

Again, you shouldn't rely on a particular state at boot. For all you
know, you could have been reset by some watchdog or been kexec'd.

> > > Even if the reset works, the 2nd DPU instance in i.MX8qm would be a
> > > problem, because it won't be reset or properly disabled if the 1st DPU
> > > instance is unbound.
> > 
> > Why it wouldn't be reset?
> 
> Because dpu_core_remove() is not called for the 2nd DPU instance.
> Anyway, with the above new design, this doesn't seem to be a problem
> any more.

Ok.

> > 
> > > Although the two DPU instances could be wrapped by two DRM devices, I
> > > tend not to do that because downstream bridges in future SoCs might be
> > > able to mux to different DPU instances at runtime.
> > >
> > > Due to the disablement issue, can we set drm_dev_enter/exit/unplug
> > > aside first?
> > 
> > I'd rather have that figured out prior to merging.
> 
> I'm assuming that drm_dev_enter/exit/unplug won't be needed with the above
> new design - one DPU instance wrapped by one DRM device.

I'm not sure why you are making that claim. And again, that's good
practice: it does no harm while preventing unsafe behaviour in the
future.

> > > > > +static void dpu_atomic_put_plane_state(struct drm_atomic_state
> > *state,
> > > > > +				       struct drm_plane *plane)
> > > > > +{
> > > > > +	int index = drm_plane_index(plane);
> > > > > +
> > > > > +	plane->funcs->atomic_destroy_state(plane, state-
> > >planes[index].state);
> > > > > +	state->planes[index].ptr = NULL;
> > > > > +	state->planes[index].state = NULL;
> > > > > +	state->planes[index].old_state = NULL;
> > > > > +	state->planes[index].new_state = NULL;
> > > > > +
> > > > > +	drm_modeset_unlock(&plane->mutex);
> > > > > +
> > > > > +	dpu_plane_dbg(plane, "put state\n");
> > > > > +}
> > > > > +
> > > > > +static void dpu_atomic_put_crtc_state(struct drm_atomic_state *state,
> > > > > +				      struct drm_crtc *crtc)
> > > > > +{
> > > > > +	int index = drm_crtc_index(crtc);
> > > > > +
> > > > > +	crtc->funcs->atomic_destroy_state(crtc, state->crtcs[index].state);
> > > > > +	state->crtcs[index].ptr = NULL;
> > > > > +	state->crtcs[index].state = NULL;
> > > > > +	state->crtcs[index].old_state = NULL;
> > > > > +	state->crtcs[index].new_state = NULL;
> > > > > +
> > > > > +	drm_modeset_unlock(&crtc->mutex);
> > > > > +
> > > > > +	dpu_crtc_dbg(crtc, "put state\n");
> > > > > +}
> > > > > +
> > > > > +static void
> > > > > +dpu_atomic_put_possible_states_per_crtc(struct drm_crtc_state
> > > > *crtc_state)
> > > > > +{
> > > > > +	struct drm_atomic_state *state = crtc_state->state;
> > > > > +	struct drm_crtc *crtc = crtc_state->crtc;
> > > > > +	struct drm_plane *plane;
> > > > > +	struct drm_plane_state *old_plane_state, *new_plane_state;
> > > > > +	struct dpu_plane_state *old_dpstate, *new_dpstate;
> > > > > +
> > > > > +	drm_atomic_crtc_state_for_each_plane(plane, crtc_state) {
> > > > > +		old_plane_state = drm_atomic_get_old_plane_state(state,
> > > > plane);
> > > > > +		new_plane_state = drm_atomic_get_new_plane_state(state,
> > > > plane);
> > > > > +
> > > > > +		old_dpstate = to_dpu_plane_state(old_plane_state);
> > > > > +		new_dpstate = to_dpu_plane_state(new_plane_state);
> > > > > +
> > > > > +		/* Should be enough to check the below HW plane resources.
> > > > */
> > > > > +		if (old_dpstate->stage.ptr != new_dpstate->stage.ptr ||
> > > > > +		    old_dpstate->source != new_dpstate->source ||
> > > > > +		    old_dpstate->blend != new_dpstate->blend)
> > > > > +			return;
> > > > > +	}
> > > > > +
> > > > > +	drm_atomic_crtc_state_for_each_plane(plane, crtc_state)
> > > > > +		dpu_atomic_put_plane_state(state, plane);
> > > > > +
> > > > > +	dpu_atomic_put_crtc_state(state, crtc);
> > > > > +}
> > > >
> > > > That's super suspicious too. Are you really going around and dropping
> > > > and destroying plane and crtc states in a global state?
> > >
> > > Yes.
> > 
> > That's really not a good idea. Adding states are fine, dropping ones
> > aren't.
> 
> The idea is to add all active planes on two CRTCs into one commit and
> try to allocate fetchunits for those planes as a whole to best meet user's
> requirements, because ...
> 
> > 
> > > >
> > > > At the very least, you need to describe what this addresses and why you
> > > > think it's a good solution.
> > >
> > > This is the solution to assign HW resources of a plane group to the two
> > CRTCs
> > > in one DPU or one CRTC group _dynamically_ at runtime.  Dpu.h has some
> > > comments which hint this:
> > >
> > > """
> > > /*
> > >  * fetchunit/scaler/layerblend resources of a plane group are
> > >  * shared by the two CRTCs in a CRTC group
> > >  */
> > > """
> > >
> > > I can add a DPU display controller block diagram in dpu_kms.c to tell the
> > HW
> > > architecture and some SW architecture to clarify this more.
> > 
> > It's not so much the diagram that I'm looking for, but an accurate
> > description of the problem. What resource is there, why and how does it
> > need to be shared, so we can understand what you are doing there, and
> > possibly suggest other things.
> 
> ... the problem is that fetchunits(fetchdecode/fetchlayer/fetchwarp) have
> different capabilities/feature sets and they can be built upon either of the
> two CRTCs through layerblends.  The 4 fetchunits for one DPU display
> controller are fetchdecode0, fetchdecode1, fetchlayer0 and fetchwarp2.
> Correspondingly, there are 4 layerblends(0 to 3).  Layerblends blend two
> input sources(primary input and secondary input). The primary input can
> be, say, constframe or another layerblend's output.  The secondary input
> can be, say, one of those fetchunits.  For example, a real use case could
> be:
> - CRTC0:
>   Primary plane:
>     Layerblend0:
>       Primary:  constframe4
>       Secondary: fetchlayer0
>   Overlay plane:
>     Layerblend1:
>       Primary: Layerblend0 output
>       Secondary: fetchdecode1 + vscaler5 + hscaler4
> 
> - CRTC1:
>   Primary plane:
>     Layerblend2:
>       Primary:  constframe5
>       Secondary: fetchwarp2 + fetcheco2
>   Overlay plane:
>     Layerblend3:
>       Primary: Layerblend2 output
>       Secondary: fetchdecode0 + fetcheco0 + vscaler4
> 
> Some fetchunit features:
> - fetchdecode: Horizontoal/vertical downscaling through video
>    processing blocks and YUV fb with two planars(use fetcheco).
> - fetchwarp: YUV fb with two planars(use fetcheco), up to
>   8 sub-layers and warp.
> - fetchlayer: up to 8 sub-layers.
> 
> All fetchunits support RGB fb.
> 
> 
> What I do is:
> - Add all active planes(including primary and overlays) on two CRTCs
>   into one commit even if the user only wants to update the plane(s)
>   on one CRTC.
> - Those manually added planes/CRTCs are prone to put, because
>    the relevant fetchunits and layerblends are likely/eventually
>    unchanged after the allocation.
> - Traverse through fetchunit list to try to find an available one to
>    meet a plane's requirements(say CSC? Scalers?). Those prone-to-put
>    planes are handled first to use the current fetchunits if modeset
>    is not allowed, otherwise planes with lower z-positions go first.
> - Do not allow fetchunit hot-migration between two CRTCs.
> - Assign layerblend with the lowest possible index to planes.  Planes
>    with lower z-positions go first.
> - Put the prone-to-put planes/CRTC if possible to gain some
>   performance .

So, you shouldn't do it the way you did it so far, but if I got you
right, this seems similar to what we have on vc4 where all planes go
through another device (called HVS) that we maintain a global state for.
That way, any plane change will pull that global state in, and you are
getting a global view of what resources are being used in the system.

See vc4_pv_muxing_atomic_check() for an example.

Maxime