diff mbox series

[v3] drm/rockchip: update cursors asynchronously through atomic.

Message ID 20181119190805.19139-1-helen.koike@collabora.com (mailing list archive)
State New, archived
Headers show
Series [v3] drm/rockchip: update cursors asynchronously through atomic. | expand

Commit Message

Helen Koike Nov. 19, 2018, 7:08 p.m. UTC
From: Enric Balletbo i Serra <enric.balletbo@collabora.com>

Add support to async updates of cursors by using the new atomic
interface for that.

Signed-off-by: Enric Balletbo i Serra <enric.balletbo@collabora.com>
[updated for upstream]
Signed-off-by: Helen Koike <helen.koike@collabora.com>

---
Hello,

This is the third version of the async-plane update suport to the
Rockchip driver.

I tested running igt kms_cursor_legacy and kms_atomic tests using a 96Boards Ficus.

Note that before the patch, the following igt tests failed:

        basic-flip-before-cursor-atomic
        basic-flip-before-cursor-legacy
        cursor-vs-flip-atomic
        cursor-vs-flip-legacy
        cursor-vs-flip-toggle
        flip-vs-cursor-atomic
        flip-vs-cursor-busy-crc-atomic
        flip-vs-cursor-busy-crc-legacy
        flip-vs-cursor-crc-atomic
        flip-vs-cursor-crc-legacy
        flip-vs-cursor-legacy

Full log: https://people.collabora.com/~koike/results-4.20/html/

Now with the patch applied the following were fixed:
        basic-flip-before-cursor-atomic
        basic-flip-before-cursor-legacy
        flip-vs-cursor-atomic
        flip-vs-cursor-legacy

Full log: https://people.collabora.com/~koike/results-4.20-async/html/

Tomasz, as you mentined in v2 about waiting the hardware before updating
the framebuffer, now I call the loop you pointed out in the async path,
was that what you had in mind? Or do you think I would make sense to
call the vop_crtc_atomic_flush() instead of just exposing that loop?

Thanks
Helen

Changes in v3:
- Rebased on top of drm-misc
- Fix missing include in rockchip_drm_vop.c
- New function vop_crtc_atomic_commit_flush

Changes in v2:
- v2: https://patchwork.freedesktop.org/patch/254180/
- Change the framebuffer as well to cover jumpy cursor when hovering
  text boxes or hyperlink. (Tomasz)
- Use the PSR inhibit mechanism when accessing VOP hardware instead of
  PSR flushing (Tomasz)

Changes in v1:
- Rebased on top of drm-misc
- In async_check call drm_atomic_helper_check_plane_state to check that
  the desired plane is valid and update various bits of derived state
  (clipped coordinates etc.)
- In async_check allow to configure new scaling in the fast path.
- In async_update force to flush all registered PSR encoders.
- In async_update call atomic_update directly.
- In async_update call vop_cfg_done needed to set the vop registers and take effect.

 drivers/gpu/drm/rockchip/rockchip_drm_fb.c  |  36 -------
 drivers/gpu/drm/rockchip/rockchip_drm_psr.c |  37 +++++++
 drivers/gpu/drm/rockchip/rockchip_drm_psr.h |   3 +
 drivers/gpu/drm/rockchip/rockchip_drm_vop.c | 108 +++++++++++++++++---
 4 files changed, 131 insertions(+), 53 deletions(-)

Comments

Tomasz Figa Nov. 20, 2018, 6:48 a.m. UTC | #1
Hi Helen,

On Tue, Nov 20, 2018 at 4:08 AM Helen Koike <helen.koike@collabora.com> wrote:
>
> From: Enric Balletbo i Serra <enric.balletbo@collabora.com>
>
> Add support to async updates of cursors by using the new atomic
> interface for that.
>
> Signed-off-by: Enric Balletbo i Serra <enric.balletbo@collabora.com>
> [updated for upstream]
> Signed-off-by: Helen Koike <helen.koike@collabora.com>
>
> ---
> Hello,
>
> This is the third version of the async-plane update suport to the
> Rockchip driver.
>

Thanks for a quick respin. Please see my comments inline. (I'll try to
be better at responding from now on...)

> I tested running igt kms_cursor_legacy and kms_atomic tests using a 96Boards Ficus.
>
> Note that before the patch, the following igt tests failed:
>
>         basic-flip-before-cursor-atomic
>         basic-flip-before-cursor-legacy
>         cursor-vs-flip-atomic
>         cursor-vs-flip-legacy
>         cursor-vs-flip-toggle
>         flip-vs-cursor-atomic
>         flip-vs-cursor-busy-crc-atomic
>         flip-vs-cursor-busy-crc-legacy
>         flip-vs-cursor-crc-atomic
>         flip-vs-cursor-crc-legacy
>         flip-vs-cursor-legacy
>
> Full log: https://people.collabora.com/~koike/results-4.20/html/
>
> Now with the patch applied the following were fixed:
>         basic-flip-before-cursor-atomic
>         basic-flip-before-cursor-legacy
>         flip-vs-cursor-atomic
>         flip-vs-cursor-legacy
>
> Full log: https://people.collabora.com/~koike/results-4.20-async/html/

Could you also test modetest, with the -C switch to test the legacy
cursor API? I remember it triggering crashes due to synchronization
issues easily.

>
> Tomasz, as you mentined in v2 about waiting the hardware before updating
> the framebuffer, now I call the loop you pointed out in the async path,
> was that what you had in mind? Or do you think I would make sense to
> call the vop_crtc_atomic_flush() instead of just exposing that loop?
>
> Thanks
> Helen
>
> Changes in v3:
> - Rebased on top of drm-misc
> - Fix missing include in rockchip_drm_vop.c
> - New function vop_crtc_atomic_commit_flush
>
> Changes in v2:
> - v2: https://patchwork.freedesktop.org/patch/254180/
> - Change the framebuffer as well to cover jumpy cursor when hovering
>   text boxes or hyperlink. (Tomasz)
> - Use the PSR inhibit mechanism when accessing VOP hardware instead of
>   PSR flushing (Tomasz)
>
> Changes in v1:
> - Rebased on top of drm-misc
> - In async_check call drm_atomic_helper_check_plane_state to check that
>   the desired plane is valid and update various bits of derived state
>   (clipped coordinates etc.)
> - In async_check allow to configure new scaling in the fast path.
> - In async_update force to flush all registered PSR encoders.
> - In async_update call atomic_update directly.
> - In async_update call vop_cfg_done needed to set the vop registers and take effect.
>
>  drivers/gpu/drm/rockchip/rockchip_drm_fb.c  |  36 -------
>  drivers/gpu/drm/rockchip/rockchip_drm_psr.c |  37 +++++++
>  drivers/gpu/drm/rockchip/rockchip_drm_psr.h |   3 +
>  drivers/gpu/drm/rockchip/rockchip_drm_vop.c | 108 +++++++++++++++++---
>  4 files changed, 131 insertions(+), 53 deletions(-)
>
> diff --git a/drivers/gpu/drm/rockchip/rockchip_drm_fb.c b/drivers/gpu/drm/rockchip/rockchip_drm_fb.c
> index ea18cb2a76c0..08bec50d9c5d 100644
> --- a/drivers/gpu/drm/rockchip/rockchip_drm_fb.c
> +++ b/drivers/gpu/drm/rockchip/rockchip_drm_fb.c
> @@ -127,42 +127,6 @@ rockchip_user_fb_create(struct drm_device *dev, struct drm_file *file_priv,
>         return ERR_PTR(ret);
>  }
>
> -static void
> -rockchip_drm_psr_inhibit_get_state(struct drm_atomic_state *state)
> -{
> -       struct drm_crtc *crtc;
> -       struct drm_crtc_state *crtc_state;
> -       struct drm_encoder *encoder;
> -       u32 encoder_mask = 0;
> -       int i;
> -
> -       for_each_old_crtc_in_state(state, crtc, crtc_state, i) {
> -               encoder_mask |= crtc_state->encoder_mask;
> -               encoder_mask |= crtc->state->encoder_mask;
> -       }
> -
> -       drm_for_each_encoder_mask(encoder, state->dev, encoder_mask)
> -               rockchip_drm_psr_inhibit_get(encoder);
> -}
> -
> -static void
> -rockchip_drm_psr_inhibit_put_state(struct drm_atomic_state *state)
> -{
> -       struct drm_crtc *crtc;
> -       struct drm_crtc_state *crtc_state;
> -       struct drm_encoder *encoder;
> -       u32 encoder_mask = 0;
> -       int i;
> -
> -       for_each_old_crtc_in_state(state, crtc, crtc_state, i) {
> -               encoder_mask |= crtc_state->encoder_mask;
> -               encoder_mask |= crtc->state->encoder_mask;
> -       }
> -
> -       drm_for_each_encoder_mask(encoder, state->dev, encoder_mask)
> -               rockchip_drm_psr_inhibit_put(encoder);
> -}
> -
>  static void
>  rockchip_atomic_helper_commit_tail_rpm(struct drm_atomic_state *old_state)
>  {
> diff --git a/drivers/gpu/drm/rockchip/rockchip_drm_psr.c b/drivers/gpu/drm/rockchip/rockchip_drm_psr.c
> index 01ff3c858875..22a70ab6e214 100644
> --- a/drivers/gpu/drm/rockchip/rockchip_drm_psr.c
> +++ b/drivers/gpu/drm/rockchip/rockchip_drm_psr.c
> @@ -13,6 +13,7 @@
>   */
>
>  #include <drm/drmP.h>
> +#include <drm/drm_atomic.h>
>  #include <drm/drm_crtc_helper.h>
>
>  #include "rockchip_drm_drv.h"
> @@ -109,6 +110,42 @@ int rockchip_drm_psr_inhibit_put(struct drm_encoder *encoder)
>  }
>  EXPORT_SYMBOL(rockchip_drm_psr_inhibit_put);
>
> +void rockchip_drm_psr_inhibit_get_state(struct drm_atomic_state *state)
> +{
> +       struct drm_crtc *crtc;
> +       struct drm_crtc_state *crtc_state;
> +       struct drm_encoder *encoder;
> +       u32 encoder_mask = 0;
> +       int i;
> +
> +       for_each_old_crtc_in_state(state, crtc, crtc_state, i) {
> +               encoder_mask |= crtc_state->encoder_mask;
> +               encoder_mask |= crtc->state->encoder_mask;
> +       }
> +
> +       drm_for_each_encoder_mask(encoder, state->dev, encoder_mask)
> +               rockchip_drm_psr_inhibit_get(encoder);
> +}
> +EXPORT_SYMBOL(rockchip_drm_psr_inhibit_get_state);
> +
> +void rockchip_drm_psr_inhibit_put_state(struct drm_atomic_state *state)
> +{
> +       struct drm_crtc *crtc;
> +       struct drm_crtc_state *crtc_state;
> +       struct drm_encoder *encoder;
> +       u32 encoder_mask = 0;
> +       int i;
> +
> +       for_each_old_crtc_in_state(state, crtc, crtc_state, i) {
> +               encoder_mask |= crtc_state->encoder_mask;
> +               encoder_mask |= crtc->state->encoder_mask;
> +       }
> +
> +       drm_for_each_encoder_mask(encoder, state->dev, encoder_mask)
> +               rockchip_drm_psr_inhibit_put(encoder);
> +}
> +EXPORT_SYMBOL(rockchip_drm_psr_inhibit_put_state);
> +
>  /**
>   * rockchip_drm_psr_inhibit_get - acquire PSR inhibit on given encoder
>   * @encoder: encoder to obtain the PSR encoder
> diff --git a/drivers/gpu/drm/rockchip/rockchip_drm_psr.h b/drivers/gpu/drm/rockchip/rockchip_drm_psr.h
> index 860c62494496..25350ba3237b 100644
> --- a/drivers/gpu/drm/rockchip/rockchip_drm_psr.h
> +++ b/drivers/gpu/drm/rockchip/rockchip_drm_psr.h
> @@ -20,6 +20,9 @@ void rockchip_drm_psr_flush_all(struct drm_device *dev);
>  int rockchip_drm_psr_inhibit_put(struct drm_encoder *encoder);
>  int rockchip_drm_psr_inhibit_get(struct drm_encoder *encoder);
>
> +void rockchip_drm_psr_inhibit_get_state(struct drm_atomic_state *state);
> +void rockchip_drm_psr_inhibit_put_state(struct drm_atomic_state *state);
> +
>  int rockchip_drm_psr_register(struct drm_encoder *encoder,
>                         int (*psr_set)(struct drm_encoder *, bool enable));
>  void rockchip_drm_psr_unregister(struct drm_encoder *encoder);
> diff --git a/drivers/gpu/drm/rockchip/rockchip_drm_vop.c b/drivers/gpu/drm/rockchip/rockchip_drm_vop.c
> index fb70fb486fbf..176d6e8207ed 100644
> --- a/drivers/gpu/drm/rockchip/rockchip_drm_vop.c
> +++ b/drivers/gpu/drm/rockchip/rockchip_drm_vop.c
> @@ -15,6 +15,7 @@
>  #include <drm/drm.h>
>  #include <drm/drmP.h>
>  #include <drm/drm_atomic.h>
> +#include <drm/drm_atomic_uapi.h>
>  #include <drm/drm_crtc.h>
>  #include <drm/drm_crtc_helper.h>
>  #include <drm/drm_flip_work.h>
> @@ -819,10 +820,99 @@ static void vop_plane_atomic_update(struct drm_plane *plane,
>         spin_unlock(&vop->reg_lock);
>  }
>
> +static int vop_plane_atomic_async_check(struct drm_plane *plane,
> +                                       struct drm_plane_state *state)
> +{
> +       struct vop_win *vop_win = to_vop_win(plane);
> +       const struct vop_win_data *win = vop_win->data;
> +       int min_scale = win->phy->scl ? FRAC_16_16(1, 8) :
> +                                       DRM_PLANE_HELPER_NO_SCALING;
> +       int max_scale = win->phy->scl ? FRAC_16_16(8, 1) :
> +                                       DRM_PLANE_HELPER_NO_SCALING;
> +       struct drm_crtc_state *crtc_state;
> +       int ret;
> +
> +       if (plane != state->crtc->cursor)
> +               return -EINVAL;
> +
> +       if (!plane->state)
> +               return -EINVAL;
> +
> +       if (!plane->state->fb)
> +               return -EINVAL;
> +
> +       if (state->state)
> +               crtc_state = drm_atomic_get_existing_crtc_state(state->state,
> +                                                               state->crtc);
> +       else /* Special case for asynchronous cursor updates. */
> +               crtc_state = plane->crtc->state;
> +
> +       ret = drm_atomic_helper_check_plane_state(plane->state,
> +                                                 crtc_state,
> +                                                 min_scale, max_scale,
> +                                                 true, true);
> +       return ret;
> +}
> +
> +static void vop_crtc_atomic_commit_flush(struct drm_crtc *crtc,
> +                                        struct drm_crtc_state *old_crtc_state)
> +{
> +       struct drm_atomic_state *old_state = old_crtc_state->state;
> +       struct drm_plane_state *old_plane_state, *new_plane_state;
> +       struct vop *vop = to_vop(crtc);
> +       struct drm_plane *plane;
> +       int i;
> +
> +       for_each_oldnew_plane_in_state(old_state, plane, old_plane_state,
> +                                      new_plane_state, i) {

Hmm, from what I can see, we're not going through the full atomic
commit sequence, with state flip, so I'm not sure where we would get
the new state here from.

> +               if (!old_plane_state->fb)
> +                       continue;
> +
> +               if (old_plane_state->fb == new_plane_state->fb)
> +                       continue;
> +
> +               drm_framebuffer_get(old_plane_state->fb);
> +               WARN_ON(drm_crtc_vblank_get(crtc) != 0);
> +               drm_flip_work_queue(&vop->fb_unref_work, old_plane_state->fb);
> +               set_bit(VOP_PENDING_FB_UNREF, &vop->pending);
> +       }
> +}
> +
> +static void vop_plane_atomic_async_update(struct drm_plane *plane,
> +                                         struct drm_plane_state *new_state)
> +{
> +       struct vop *vop = to_vop(plane->state->crtc);
> +
> +       if (vop->crtc.state->state)
> +               vop_crtc_atomic_commit_flush(&vop->crtc, vop->crtc.state);

Since we just operate on one plane here, we could just do like this:

if (plane->state->fb && plane->state->fb != new_state->fb) {
               drm_framebuffer_get(plane->state->fb);
               WARN_ON(drm_crtc_vblank_get(crtc) != 0);
               drm_flip_work_queue(&vop->fb_unref_work, plane->state->fb);
               set_bit(VOP_PENDING_FB_UNREF, &vop->pending);
}

However, we cannot simply to this here, because it races with the
vblank interrupt. We need to program the hw plane with the new fb
first and trigger the update. This needs all the careful handling that
is done in vop_crtc_atomic_flush() and so my original suggestion to
just call it.

Of course to call it in its current shape, one needs to have a full
atomic state from a commit, after a flip, but we only have the new
plane state here. Perhaps you could duplicate existing state, update
the desired plane state, flip and then call vop_crtc_atomic_flush()?

Best regards,
Tomasz
Helen Koike Nov. 22, 2018, 11:30 p.m. UTC | #2
Hi Tomasz,

On 11/20/18 4:48 AM, Tomasz Figa wrote:
> Hi Helen,
> 
> On Tue, Nov 20, 2018 at 4:08 AM Helen Koike <helen.koike@collabora.com> wrote:
>>
>> From: Enric Balletbo i Serra <enric.balletbo@collabora.com>
>>
>> Add support to async updates of cursors by using the new atomic
>> interface for that.
>>
>> Signed-off-by: Enric Balletbo i Serra <enric.balletbo@collabora.com>
>> [updated for upstream]
>> Signed-off-by: Helen Koike <helen.koike@collabora.com>
>>
>> ---
>> Hello,
>>
>> This is the third version of the async-plane update suport to the
>> Rockchip driver.
>>
> 
> Thanks for a quick respin. Please see my comments inline. (I'll try to
> be better at responding from now on...)
> 
>> I tested running igt kms_cursor_legacy and kms_atomic tests using a 96Boards Ficus.
>>
>> Note that before the patch, the following igt tests failed:
>>
>>         basic-flip-before-cursor-atomic
>>         basic-flip-before-cursor-legacy
>>         cursor-vs-flip-atomic
>>         cursor-vs-flip-legacy
>>         cursor-vs-flip-toggle
>>         flip-vs-cursor-atomic
>>         flip-vs-cursor-busy-crc-atomic
>>         flip-vs-cursor-busy-crc-legacy
>>         flip-vs-cursor-crc-atomic
>>         flip-vs-cursor-crc-legacy
>>         flip-vs-cursor-legacy
>>
>> Full log: https://people.collabora.com/~koike/results-4.20/html/
>>
>> Now with the patch applied the following were fixed:
>>         basic-flip-before-cursor-atomic
>>         basic-flip-before-cursor-legacy
>>         flip-vs-cursor-atomic
>>         flip-vs-cursor-legacy
>>
>> Full log: https://people.collabora.com/~koike/results-4.20-async/html/
> 
> Could you also test modetest, with the -C switch to test the legacy
> cursor API? I remember it triggering crashes due to synchronization
> issues easily.

Sure. I tested with
$ modetest -M rockchip -s 37:1920x1080 -C

I also vary the mode but I couldn't trigger any crashes.

> 
>>
>> Tomasz, as you mentined in v2 about waiting the hardware before updating
>> the framebuffer, now I call the loop you pointed out in the async path,
>> was that what you had in mind? Or do you think I would make sense to
>> call the vop_crtc_atomic_flush() instead of just exposing that loop?
>>
>> Thanks
>> Helen
>>
>> Changes in v3:
>> - Rebased on top of drm-misc
>> - Fix missing include in rockchip_drm_vop.c
>> - New function vop_crtc_atomic_commit_flush
>>
>> Changes in v2:
>> - v2: https://patchwork.freedesktop.org/patch/254180/
>> - Change the framebuffer as well to cover jumpy cursor when hovering
>>   text boxes or hyperlink. (Tomasz)
>> - Use the PSR inhibit mechanism when accessing VOP hardware instead of
>>   PSR flushing (Tomasz)
>>
>> Changes in v1:
>> - Rebased on top of drm-misc
>> - In async_check call drm_atomic_helper_check_plane_state to check that
>>   the desired plane is valid and update various bits of derived state
>>   (clipped coordinates etc.)
>> - In async_check allow to configure new scaling in the fast path.
>> - In async_update force to flush all registered PSR encoders.
>> - In async_update call atomic_update directly.
>> - In async_update call vop_cfg_done needed to set the vop registers and take effect.
>>
>>  drivers/gpu/drm/rockchip/rockchip_drm_fb.c  |  36 -------
>>  drivers/gpu/drm/rockchip/rockchip_drm_psr.c |  37 +++++++
>>  drivers/gpu/drm/rockchip/rockchip_drm_psr.h |   3 +
>>  drivers/gpu/drm/rockchip/rockchip_drm_vop.c | 108 +++++++++++++++++---
>>  4 files changed, 131 insertions(+), 53 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/rockchip/rockchip_drm_fb.c b/drivers/gpu/drm/rockchip/rockchip_drm_fb.c
>> index ea18cb2a76c0..08bec50d9c5d 100644
>> --- a/drivers/gpu/drm/rockchip/rockchip_drm_fb.c
>> +++ b/drivers/gpu/drm/rockchip/rockchip_drm_fb.c
>> @@ -127,42 +127,6 @@ rockchip_user_fb_create(struct drm_device *dev, struct drm_file *file_priv,
>>         return ERR_PTR(ret);
>>  }
>>
>> -static void
>> -rockchip_drm_psr_inhibit_get_state(struct drm_atomic_state *state)
>> -{
>> -       struct drm_crtc *crtc;
>> -       struct drm_crtc_state *crtc_state;
>> -       struct drm_encoder *encoder;
>> -       u32 encoder_mask = 0;
>> -       int i;
>> -
>> -       for_each_old_crtc_in_state(state, crtc, crtc_state, i) {
>> -               encoder_mask |= crtc_state->encoder_mask;
>> -               encoder_mask |= crtc->state->encoder_mask;
>> -       }
>> -
>> -       drm_for_each_encoder_mask(encoder, state->dev, encoder_mask)
>> -               rockchip_drm_psr_inhibit_get(encoder);
>> -}
>> -
>> -static void
>> -rockchip_drm_psr_inhibit_put_state(struct drm_atomic_state *state)
>> -{
>> -       struct drm_crtc *crtc;
>> -       struct drm_crtc_state *crtc_state;
>> -       struct drm_encoder *encoder;
>> -       u32 encoder_mask = 0;
>> -       int i;
>> -
>> -       for_each_old_crtc_in_state(state, crtc, crtc_state, i) {
>> -               encoder_mask |= crtc_state->encoder_mask;
>> -               encoder_mask |= crtc->state->encoder_mask;
>> -       }
>> -
>> -       drm_for_each_encoder_mask(encoder, state->dev, encoder_mask)
>> -               rockchip_drm_psr_inhibit_put(encoder);
>> -}
>> -
>>  static void
>>  rockchip_atomic_helper_commit_tail_rpm(struct drm_atomic_state *old_state)
>>  {
>> diff --git a/drivers/gpu/drm/rockchip/rockchip_drm_psr.c b/drivers/gpu/drm/rockchip/rockchip_drm_psr.c
>> index 01ff3c858875..22a70ab6e214 100644
>> --- a/drivers/gpu/drm/rockchip/rockchip_drm_psr.c
>> +++ b/drivers/gpu/drm/rockchip/rockchip_drm_psr.c
>> @@ -13,6 +13,7 @@
>>   */
>>
>>  #include <drm/drmP.h>
>> +#include <drm/drm_atomic.h>
>>  #include <drm/drm_crtc_helper.h>
>>
>>  #include "rockchip_drm_drv.h"
>> @@ -109,6 +110,42 @@ int rockchip_drm_psr_inhibit_put(struct drm_encoder *encoder)
>>  }
>>  EXPORT_SYMBOL(rockchip_drm_psr_inhibit_put);
>>
>> +void rockchip_drm_psr_inhibit_get_state(struct drm_atomic_state *state)
>> +{
>> +       struct drm_crtc *crtc;
>> +       struct drm_crtc_state *crtc_state;
>> +       struct drm_encoder *encoder;
>> +       u32 encoder_mask = 0;
>> +       int i;
>> +
>> +       for_each_old_crtc_in_state(state, crtc, crtc_state, i) {
>> +               encoder_mask |= crtc_state->encoder_mask;
>> +               encoder_mask |= crtc->state->encoder_mask;
>> +       }
>> +
>> +       drm_for_each_encoder_mask(encoder, state->dev, encoder_mask)
>> +               rockchip_drm_psr_inhibit_get(encoder);
>> +}
>> +EXPORT_SYMBOL(rockchip_drm_psr_inhibit_get_state);
>> +
>> +void rockchip_drm_psr_inhibit_put_state(struct drm_atomic_state *state)
>> +{
>> +       struct drm_crtc *crtc;
>> +       struct drm_crtc_state *crtc_state;
>> +       struct drm_encoder *encoder;
>> +       u32 encoder_mask = 0;
>> +       int i;
>> +
>> +       for_each_old_crtc_in_state(state, crtc, crtc_state, i) {
>> +               encoder_mask |= crtc_state->encoder_mask;
>> +               encoder_mask |= crtc->state->encoder_mask;
>> +       }
>> +
>> +       drm_for_each_encoder_mask(encoder, state->dev, encoder_mask)
>> +               rockchip_drm_psr_inhibit_put(encoder);
>> +}
>> +EXPORT_SYMBOL(rockchip_drm_psr_inhibit_put_state);
>> +
>>  /**
>>   * rockchip_drm_psr_inhibit_get - acquire PSR inhibit on given encoder
>>   * @encoder: encoder to obtain the PSR encoder
>> diff --git a/drivers/gpu/drm/rockchip/rockchip_drm_psr.h b/drivers/gpu/drm/rockchip/rockchip_drm_psr.h
>> index 860c62494496..25350ba3237b 100644
>> --- a/drivers/gpu/drm/rockchip/rockchip_drm_psr.h
>> +++ b/drivers/gpu/drm/rockchip/rockchip_drm_psr.h
>> @@ -20,6 +20,9 @@ void rockchip_drm_psr_flush_all(struct drm_device *dev);
>>  int rockchip_drm_psr_inhibit_put(struct drm_encoder *encoder);
>>  int rockchip_drm_psr_inhibit_get(struct drm_encoder *encoder);
>>
>> +void rockchip_drm_psr_inhibit_get_state(struct drm_atomic_state *state);
>> +void rockchip_drm_psr_inhibit_put_state(struct drm_atomic_state *state);
>> +
>>  int rockchip_drm_psr_register(struct drm_encoder *encoder,
>>                         int (*psr_set)(struct drm_encoder *, bool enable));
>>  void rockchip_drm_psr_unregister(struct drm_encoder *encoder);
>> diff --git a/drivers/gpu/drm/rockchip/rockchip_drm_vop.c b/drivers/gpu/drm/rockchip/rockchip_drm_vop.c
>> index fb70fb486fbf..176d6e8207ed 100644
>> --- a/drivers/gpu/drm/rockchip/rockchip_drm_vop.c
>> +++ b/drivers/gpu/drm/rockchip/rockchip_drm_vop.c
>> @@ -15,6 +15,7 @@
>>  #include <drm/drm.h>
>>  #include <drm/drmP.h>
>>  #include <drm/drm_atomic.h>
>> +#include <drm/drm_atomic_uapi.h>
>>  #include <drm/drm_crtc.h>
>>  #include <drm/drm_crtc_helper.h>
>>  #include <drm/drm_flip_work.h>
>> @@ -819,10 +820,99 @@ static void vop_plane_atomic_update(struct drm_plane *plane,
>>         spin_unlock(&vop->reg_lock);
>>  }
>>
>> +static int vop_plane_atomic_async_check(struct drm_plane *plane,
>> +                                       struct drm_plane_state *state)
>> +{
>> +       struct vop_win *vop_win = to_vop_win(plane);
>> +       const struct vop_win_data *win = vop_win->data;
>> +       int min_scale = win->phy->scl ? FRAC_16_16(1, 8) :
>> +                                       DRM_PLANE_HELPER_NO_SCALING;
>> +       int max_scale = win->phy->scl ? FRAC_16_16(8, 1) :
>> +                                       DRM_PLANE_HELPER_NO_SCALING;
>> +       struct drm_crtc_state *crtc_state;
>> +       int ret;
>> +
>> +       if (plane != state->crtc->cursor)
>> +               return -EINVAL;
>> +
>> +       if (!plane->state)
>> +               return -EINVAL;
>> +
>> +       if (!plane->state->fb)
>> +               return -EINVAL;
>> +
>> +       if (state->state)
>> +               crtc_state = drm_atomic_get_existing_crtc_state(state->state,
>> +                                                               state->crtc);
>> +       else /* Special case for asynchronous cursor updates. */
>> +               crtc_state = plane->crtc->state;
>> +
>> +       ret = drm_atomic_helper_check_plane_state(plane->state,
>> +                                                 crtc_state,
>> +                                                 min_scale, max_scale,
>> +                                                 true, true);
>> +       return ret;
>> +}
>> +
>> +static void vop_crtc_atomic_commit_flush(struct drm_crtc *crtc,
>> +                                        struct drm_crtc_state *old_crtc_state)
>> +{
>> +       struct drm_atomic_state *old_state = old_crtc_state->state;
>> +       struct drm_plane_state *old_plane_state, *new_plane_state;
>> +       struct vop *vop = to_vop(crtc);
>> +       struct drm_plane *plane;
>> +       int i;
>> +
>> +       for_each_oldnew_plane_in_state(old_state, plane, old_plane_state,
>> +                                      new_plane_state, i) {
> 
> Hmm, from what I can see, we're not going through the full atomic
> commit sequence, with state flip, so I'm not sure where we would get
> the new state here from.
> 
>> +               if (!old_plane_state->fb)
>> +                       continue;
>> +
>> +               if (old_plane_state->fb == new_plane_state->fb)
>> +                       continue;
>> +
>> +               drm_framebuffer_get(old_plane_state->fb);
>> +               WARN_ON(drm_crtc_vblank_get(crtc) != 0);
>> +               drm_flip_work_queue(&vop->fb_unref_work, old_plane_state->fb);
>> +               set_bit(VOP_PENDING_FB_UNREF, &vop->pending);
>> +       }
>> +}
>> +
>> +static void vop_plane_atomic_async_update(struct drm_plane *plane,
>> +                                         struct drm_plane_state *new_state)
>> +{
>> +       struct vop *vop = to_vop(plane->state->crtc);
>> +
>> +       if (vop->crtc.state->state)
>> +               vop_crtc_atomic_commit_flush(&vop->crtc, vop->crtc.state);
> 
> Since we just operate on one plane here, we could just do like this:
> 
> if (plane->state->fb && plane->state->fb != new_state->fb) {
>                drm_framebuffer_get(plane->state->fb);
>                WARN_ON(drm_crtc_vblank_get(crtc) != 0);
>                drm_flip_work_queue(&vop->fb_unref_work, plane->state->fb);
>                set_bit(VOP_PENDING_FB_UNREF, &vop->pending);
> }
> 
> However, we cannot simply to this here, because it races with the
> vblank interrupt. We need to program the hw plane with the new fb
> first and trigger the update. This needs all the careful handling that
> is done in vop_crtc_atomic_flush() and so my original suggestion to
> just call it.

vop_crtc_atomic_flush() also updates the crtc->state->event, I don't
think we want that.

And actually I don't think we have this race condition, please see below

> 
> Of course to call it in its current shape, one needs to have a full
> atomic state from a commit, after a flip, but we only have the new
> plane state here. Perhaps you could duplicate existing state, update
> the desired plane state, flip and then call vop_crtc_atomic_flush()?

Could you please clarify your proposal? You mean duplicating
plane->state ? I'm trying to see how this would fit it in the code.
The drm_atomic_state structure at plate->state->state is actually always
NULL (as an async update is applied right away).


> 
> Best regards,
> Tomasz
> 

From your comment in v2:

> Isn't this going to drop the old fb reference on the floor without
> waiting for the hardware to actually stop scanning out from it?

I've been trying to analyze this better, I also got some help from
Gustavo Padovan, and I think there is no problem here as the
configuration we are doing here will just be taken into consideration by
the hardware in the next vblank, so I guess we can set and re-set
plane->fb as much as we want.

From the async_update docs:

     *  - Some hw might still scan out the old buffer until the next
     *    vblank, however we let go of the fb references as soon as
     *    we run this hook. For now drivers must implement their own workers
     *    for deferring if needed, until a common solution is created.

So the idea is to let the reference go asap, then we shouldn't need an
extra drm_framebuffer_get() as done by vop_crtc_atomic_flush().

Does it make sense to you?

Unless I am missing something, the patch v2 is essentially correct, if
not, then vc4 driver is also broken.

Please, let me know what you think and I'll send the v4.

Regards,
Helen
Tomasz Figa Nov. 23, 2018, 2:27 a.m. UTC | #3
Hi Helen,

On Fri, Nov 23, 2018 at 8:31 AM Helen Koike <helen.koike@collabora.com> wrote:
>
> Hi Tomasz,
>
> On 11/20/18 4:48 AM, Tomasz Figa wrote:
> > Hi Helen,
> >
> > On Tue, Nov 20, 2018 at 4:08 AM Helen Koike <helen.koike@collabora.com> wrote:
> >>
> >> From: Enric Balletbo i Serra <enric.balletbo@collabora.com>
> >>
> >> Add support to async updates of cursors by using the new atomic
> >> interface for that.
> >>
> >> Signed-off-by: Enric Balletbo i Serra <enric.balletbo@collabora.com>
> >> [updated for upstream]
> >> Signed-off-by: Helen Koike <helen.koike@collabora.com>
> >>
> >> ---
> >> Hello,
> >>
> >> This is the third version of the async-plane update suport to the
> >> Rockchip driver.
> >>
> >
> > Thanks for a quick respin. Please see my comments inline. (I'll try to
> > be better at responding from now on...)
> >
> >> I tested running igt kms_cursor_legacy and kms_atomic tests using a 96Boards Ficus.
> >>
> >> Note that before the patch, the following igt tests failed:
> >>
> >>         basic-flip-before-cursor-atomic
> >>         basic-flip-before-cursor-legacy
> >>         cursor-vs-flip-atomic
> >>         cursor-vs-flip-legacy
> >>         cursor-vs-flip-toggle
> >>         flip-vs-cursor-atomic
> >>         flip-vs-cursor-busy-crc-atomic
> >>         flip-vs-cursor-busy-crc-legacy
> >>         flip-vs-cursor-crc-atomic
> >>         flip-vs-cursor-crc-legacy
> >>         flip-vs-cursor-legacy
> >>
> >> Full log: https://people.collabora.com/~koike/results-4.20/html/
> >>
> >> Now with the patch applied the following were fixed:
> >>         basic-flip-before-cursor-atomic
> >>         basic-flip-before-cursor-legacy
> >>         flip-vs-cursor-atomic
> >>         flip-vs-cursor-legacy
> >>
> >> Full log: https://people.collabora.com/~koike/results-4.20-async/html/
> >
> > Could you also test modetest, with the -C switch to test the legacy
> > cursor API? I remember it triggering crashes due to synchronization
> > issues easily.
>
> Sure. I tested with
> $ modetest -M rockchip -s 37:1920x1080 -C
>
> I also vary the mode but I couldn't trigger any crashes.
>
> >
> >>
> >> Tomasz, as you mentined in v2 about waiting the hardware before updating
> >> the framebuffer, now I call the loop you pointed out in the async path,
> >> was that what you had in mind? Or do you think I would make sense to
> >> call the vop_crtc_atomic_flush() instead of just exposing that loop?
> >>
> >> Thanks
> >> Helen
> >>
> >> Changes in v3:
> >> - Rebased on top of drm-misc
> >> - Fix missing include in rockchip_drm_vop.c
> >> - New function vop_crtc_atomic_commit_flush
> >>
> >> Changes in v2:
> >> - v2: https://patchwork.freedesktop.org/patch/254180/
> >> - Change the framebuffer as well to cover jumpy cursor when hovering
> >>   text boxes or hyperlink. (Tomasz)
> >> - Use the PSR inhibit mechanism when accessing VOP hardware instead of
> >>   PSR flushing (Tomasz)
> >>
> >> Changes in v1:
> >> - Rebased on top of drm-misc
> >> - In async_check call drm_atomic_helper_check_plane_state to check that
> >>   the desired plane is valid and update various bits of derived state
> >>   (clipped coordinates etc.)
> >> - In async_check allow to configure new scaling in the fast path.
> >> - In async_update force to flush all registered PSR encoders.
> >> - In async_update call atomic_update directly.
> >> - In async_update call vop_cfg_done needed to set the vop registers and take effect.
> >>
> >>  drivers/gpu/drm/rockchip/rockchip_drm_fb.c  |  36 -------
> >>  drivers/gpu/drm/rockchip/rockchip_drm_psr.c |  37 +++++++
> >>  drivers/gpu/drm/rockchip/rockchip_drm_psr.h |   3 +
> >>  drivers/gpu/drm/rockchip/rockchip_drm_vop.c | 108 +++++++++++++++++---
> >>  4 files changed, 131 insertions(+), 53 deletions(-)
> >>
> >> diff --git a/drivers/gpu/drm/rockchip/rockchip_drm_fb.c b/drivers/gpu/drm/rockchip/rockchip_drm_fb.c
> >> index ea18cb2a76c0..08bec50d9c5d 100644
> >> --- a/drivers/gpu/drm/rockchip/rockchip_drm_fb.c
> >> +++ b/drivers/gpu/drm/rockchip/rockchip_drm_fb.c
> >> @@ -127,42 +127,6 @@ rockchip_user_fb_create(struct drm_device *dev, struct drm_file *file_priv,
> >>         return ERR_PTR(ret);
> >>  }
> >>
> >> -static void
> >> -rockchip_drm_psr_inhibit_get_state(struct drm_atomic_state *state)
> >> -{
> >> -       struct drm_crtc *crtc;
> >> -       struct drm_crtc_state *crtc_state;
> >> -       struct drm_encoder *encoder;
> >> -       u32 encoder_mask = 0;
> >> -       int i;
> >> -
> >> -       for_each_old_crtc_in_state(state, crtc, crtc_state, i) {
> >> -               encoder_mask |= crtc_state->encoder_mask;
> >> -               encoder_mask |= crtc->state->encoder_mask;
> >> -       }
> >> -
> >> -       drm_for_each_encoder_mask(encoder, state->dev, encoder_mask)
> >> -               rockchip_drm_psr_inhibit_get(encoder);
> >> -}
> >> -
> >> -static void
> >> -rockchip_drm_psr_inhibit_put_state(struct drm_atomic_state *state)
> >> -{
> >> -       struct drm_crtc *crtc;
> >> -       struct drm_crtc_state *crtc_state;
> >> -       struct drm_encoder *encoder;
> >> -       u32 encoder_mask = 0;
> >> -       int i;
> >> -
> >> -       for_each_old_crtc_in_state(state, crtc, crtc_state, i) {
> >> -               encoder_mask |= crtc_state->encoder_mask;
> >> -               encoder_mask |= crtc->state->encoder_mask;
> >> -       }
> >> -
> >> -       drm_for_each_encoder_mask(encoder, state->dev, encoder_mask)
> >> -               rockchip_drm_psr_inhibit_put(encoder);
> >> -}
> >> -
> >>  static void
> >>  rockchip_atomic_helper_commit_tail_rpm(struct drm_atomic_state *old_state)
> >>  {
> >> diff --git a/drivers/gpu/drm/rockchip/rockchip_drm_psr.c b/drivers/gpu/drm/rockchip/rockchip_drm_psr.c
> >> index 01ff3c858875..22a70ab6e214 100644
> >> --- a/drivers/gpu/drm/rockchip/rockchip_drm_psr.c
> >> +++ b/drivers/gpu/drm/rockchip/rockchip_drm_psr.c
> >> @@ -13,6 +13,7 @@
> >>   */
> >>
> >>  #include <drm/drmP.h>
> >> +#include <drm/drm_atomic.h>
> >>  #include <drm/drm_crtc_helper.h>
> >>
> >>  #include "rockchip_drm_drv.h"
> >> @@ -109,6 +110,42 @@ int rockchip_drm_psr_inhibit_put(struct drm_encoder *encoder)
> >>  }
> >>  EXPORT_SYMBOL(rockchip_drm_psr_inhibit_put);
> >>
> >> +void rockchip_drm_psr_inhibit_get_state(struct drm_atomic_state *state)
> >> +{
> >> +       struct drm_crtc *crtc;
> >> +       struct drm_crtc_state *crtc_state;
> >> +       struct drm_encoder *encoder;
> >> +       u32 encoder_mask = 0;
> >> +       int i;
> >> +
> >> +       for_each_old_crtc_in_state(state, crtc, crtc_state, i) {
> >> +               encoder_mask |= crtc_state->encoder_mask;
> >> +               encoder_mask |= crtc->state->encoder_mask;
> >> +       }
> >> +
> >> +       drm_for_each_encoder_mask(encoder, state->dev, encoder_mask)
> >> +               rockchip_drm_psr_inhibit_get(encoder);
> >> +}
> >> +EXPORT_SYMBOL(rockchip_drm_psr_inhibit_get_state);
> >> +
> >> +void rockchip_drm_psr_inhibit_put_state(struct drm_atomic_state *state)
> >> +{
> >> +       struct drm_crtc *crtc;
> >> +       struct drm_crtc_state *crtc_state;
> >> +       struct drm_encoder *encoder;
> >> +       u32 encoder_mask = 0;
> >> +       int i;
> >> +
> >> +       for_each_old_crtc_in_state(state, crtc, crtc_state, i) {
> >> +               encoder_mask |= crtc_state->encoder_mask;
> >> +               encoder_mask |= crtc->state->encoder_mask;
> >> +       }
> >> +
> >> +       drm_for_each_encoder_mask(encoder, state->dev, encoder_mask)
> >> +               rockchip_drm_psr_inhibit_put(encoder);
> >> +}
> >> +EXPORT_SYMBOL(rockchip_drm_psr_inhibit_put_state);
> >> +
> >>  /**
> >>   * rockchip_drm_psr_inhibit_get - acquire PSR inhibit on given encoder
> >>   * @encoder: encoder to obtain the PSR encoder
> >> diff --git a/drivers/gpu/drm/rockchip/rockchip_drm_psr.h b/drivers/gpu/drm/rockchip/rockchip_drm_psr.h
> >> index 860c62494496..25350ba3237b 100644
> >> --- a/drivers/gpu/drm/rockchip/rockchip_drm_psr.h
> >> +++ b/drivers/gpu/drm/rockchip/rockchip_drm_psr.h
> >> @@ -20,6 +20,9 @@ void rockchip_drm_psr_flush_all(struct drm_device *dev);
> >>  int rockchip_drm_psr_inhibit_put(struct drm_encoder *encoder);
> >>  int rockchip_drm_psr_inhibit_get(struct drm_encoder *encoder);
> >>
> >> +void rockchip_drm_psr_inhibit_get_state(struct drm_atomic_state *state);
> >> +void rockchip_drm_psr_inhibit_put_state(struct drm_atomic_state *state);
> >> +
> >>  int rockchip_drm_psr_register(struct drm_encoder *encoder,
> >>                         int (*psr_set)(struct drm_encoder *, bool enable));
> >>  void rockchip_drm_psr_unregister(struct drm_encoder *encoder);
> >> diff --git a/drivers/gpu/drm/rockchip/rockchip_drm_vop.c b/drivers/gpu/drm/rockchip/rockchip_drm_vop.c
> >> index fb70fb486fbf..176d6e8207ed 100644
> >> --- a/drivers/gpu/drm/rockchip/rockchip_drm_vop.c
> >> +++ b/drivers/gpu/drm/rockchip/rockchip_drm_vop.c
> >> @@ -15,6 +15,7 @@
> >>  #include <drm/drm.h>
> >>  #include <drm/drmP.h>
> >>  #include <drm/drm_atomic.h>
> >> +#include <drm/drm_atomic_uapi.h>
> >>  #include <drm/drm_crtc.h>
> >>  #include <drm/drm_crtc_helper.h>
> >>  #include <drm/drm_flip_work.h>
> >> @@ -819,10 +820,99 @@ static void vop_plane_atomic_update(struct drm_plane *plane,
> >>         spin_unlock(&vop->reg_lock);
> >>  }
> >>
> >> +static int vop_plane_atomic_async_check(struct drm_plane *plane,
> >> +                                       struct drm_plane_state *state)
> >> +{
> >> +       struct vop_win *vop_win = to_vop_win(plane);
> >> +       const struct vop_win_data *win = vop_win->data;
> >> +       int min_scale = win->phy->scl ? FRAC_16_16(1, 8) :
> >> +                                       DRM_PLANE_HELPER_NO_SCALING;
> >> +       int max_scale = win->phy->scl ? FRAC_16_16(8, 1) :
> >> +                                       DRM_PLANE_HELPER_NO_SCALING;
> >> +       struct drm_crtc_state *crtc_state;
> >> +       int ret;
> >> +
> >> +       if (plane != state->crtc->cursor)
> >> +               return -EINVAL;
> >> +
> >> +       if (!plane->state)
> >> +               return -EINVAL;
> >> +
> >> +       if (!plane->state->fb)
> >> +               return -EINVAL;
> >> +
> >> +       if (state->state)
> >> +               crtc_state = drm_atomic_get_existing_crtc_state(state->state,
> >> +                                                               state->crtc);
> >> +       else /* Special case for asynchronous cursor updates. */
> >> +               crtc_state = plane->crtc->state;
> >> +
> >> +       ret = drm_atomic_helper_check_plane_state(plane->state,
> >> +                                                 crtc_state,
> >> +                                                 min_scale, max_scale,
> >> +                                                 true, true);
> >> +       return ret;
> >> +}
> >> +
> >> +static void vop_crtc_atomic_commit_flush(struct drm_crtc *crtc,
> >> +                                        struct drm_crtc_state *old_crtc_state)
> >> +{
> >> +       struct drm_atomic_state *old_state = old_crtc_state->state;
> >> +       struct drm_plane_state *old_plane_state, *new_plane_state;
> >> +       struct vop *vop = to_vop(crtc);
> >> +       struct drm_plane *plane;
> >> +       int i;
> >> +
> >> +       for_each_oldnew_plane_in_state(old_state, plane, old_plane_state,
> >> +                                      new_plane_state, i) {
> >
> > Hmm, from what I can see, we're not going through the full atomic
> > commit sequence, with state flip, so I'm not sure where we would get
> > the new state here from.
> >
> >> +               if (!old_plane_state->fb)
> >> +                       continue;
> >> +
> >> +               if (old_plane_state->fb == new_plane_state->fb)
> >> +                       continue;
> >> +
> >> +               drm_framebuffer_get(old_plane_state->fb);
> >> +               WARN_ON(drm_crtc_vblank_get(crtc) != 0);
> >> +               drm_flip_work_queue(&vop->fb_unref_work, old_plane_state->fb);
> >> +               set_bit(VOP_PENDING_FB_UNREF, &vop->pending);
> >> +       }
> >> +}
> >> +
> >> +static void vop_plane_atomic_async_update(struct drm_plane *plane,
> >> +                                         struct drm_plane_state *new_state)
> >> +{
> >> +       struct vop *vop = to_vop(plane->state->crtc);
> >> +
> >> +       if (vop->crtc.state->state)
> >> +               vop_crtc_atomic_commit_flush(&vop->crtc, vop->crtc.state);
> >
> > Since we just operate on one plane here, we could just do like this:
> >
> > if (plane->state->fb && plane->state->fb != new_state->fb) {
> >                drm_framebuffer_get(plane->state->fb);
> >                WARN_ON(drm_crtc_vblank_get(crtc) != 0);
> >                drm_flip_work_queue(&vop->fb_unref_work, plane->state->fb);
> >                set_bit(VOP_PENDING_FB_UNREF, &vop->pending);
> > }
> >
> > However, we cannot simply to this here, because it races with the
> > vblank interrupt. We need to program the hw plane with the new fb
> > first and trigger the update. This needs all the careful handling that
> > is done in vop_crtc_atomic_flush() and so my original suggestion to
> > just call it.
>
> vop_crtc_atomic_flush() also updates the crtc->state->event, I don't
> think we want that.

Good point, we don't want to touch the event.

>
> And actually I don't think we have this race condition, please see below
>

Just to clarify, the race conditions I'm talking here are not anything
new to this patch alone. I had actually observed those when
implementing the Chrome OS downstream async cursor code before:

https://chromium.googlesource.com/chromiumos/third_party/kernel/+/65d4ff0af3f8c7ebecad8f3b402b546f277d9225/drivers/gpu/drm/rockchip/rockchip_drm_vop.c#1015

Note that the code there actually duplicates the current plane state,
updates the copy, swaps both states and then update the hardware based
on both new and old states, as the regular atomic commit flow would
do. With that, no reference counts are dropped until the old state
gets destroyed at the end of the function. Also note the if block that
handles the condition of (plane_state->fb && plane_state->fb !=
plane->state->fb) (plane_state is the old state after the swap).

> >
> > Of course to call it in its current shape, one needs to have a full
> > atomic state from a commit, after a flip, but we only have the new
> > plane state here. Perhaps you could duplicate existing state, update
> > the desired plane state, flip and then call vop_crtc_atomic_flush()?
>
> Could you please clarify your proposal? You mean duplicating
> plane->state ? I'm trying to see how this would fit it in the code.
> The drm_atomic_state structure at plate->state->state is actually always
> NULL (as an async update is applied right away).
>
>
> >
> > Best regards,
> > Tomasz
> >
>
> From your comment in v2:
>
> > Isn't this going to drop the old fb reference on the floor without
> > waiting for the hardware to actually stop scanning out from it?
>
> I've been trying to analyze this better, I also got some help from
> Gustavo Padovan, and I think there is no problem here as the
> configuration we are doing here will just be taken into consideration by
> the hardware in the next vblank, so I guess we can set and re-set
> plane->fb as much as we want.

The point here is not about setting and resetting the plane->fb
pointer. It's about what happens inside drm_atomic_set_fb_for_plane().

It calls drm_framebuffer_get() for the new fb and
drm_framebuffer_put() for the old fb. In result, if the fb changes,
the old fb, which had its reference count incremented in the atomic
commit that set it to the plane before, has its reference count
decremented. Moreover, if the new reference count becomes 0,
drm_framebuffer_put() will immediately free the buffer.

Freeing a buffer when the hardware is still scanning out of it isn't a
good idea, is it?

>
> From the async_update docs:
>
>      *  - Some hw might still scan out the old buffer until the next
>      *    vblank, however we let go of the fb references as soon as
>      *    we run this hook. For now drivers must implement their own workers
>      *    for deferring if needed, until a common solution is created.
>
> So the idea is to let the reference go asap, then we shouldn't need an
> extra drm_framebuffer_get() as done by vop_crtc_atomic_flush().

The comment above says exactly what I said. The driver must implement
a worker to keep the reference until the hardware stops using the
buffer.

>
> Does it make sense to you?
>
> Unless I am missing something, the patch v2 is essentially correct, if
> not, then vc4 driver is also broken.
>
> Please, let me know what you think and I'll send the v4.

The vc4 driver seems to be able to program the hardware to switch the
scanout to the new buffer immediately:

https://elixir.bootlin.com/linux/v4.20-rc3/source/drivers/gpu/drm/vc4/vc4_plane.c#L794

Although I wonder if there isn't still a tiny race there - the
hardware may have just started refilling the FIFO from the old
address. Still, if the FIFO is small, the FIFO refill operation may be
much shorter than it takes for the kernel code to actually free the
buffer. Eric and Michael, could you confirm?

Best regards,
Tomasz
Michael Zoran Nov. 23, 2018, 4:58 a.m. UTC | #4
On Fri, 2018-11-23 at 11:27 +0900, Tomasz Figa wrote:
> 
> The point here is not about setting and resetting the plane->fb
> pointer. It's about what happens inside
> drm_atomic_set_fb_for_plane().
> 
> It calls drm_framebuffer_get() for the new fb and
> drm_framebuffer_put() for the old fb. In result, if the fb changes,
> the old fb, which had its reference count incremented in the atomic
> commit that set it to the plane before, has its reference count
> decremented. Moreover, if the new reference count becomes 0,
> drm_framebuffer_put() will immediately free the buffer.
> 
> Freeing a buffer when the hardware is still scanning out of it isn't
> a
> good idea, is it?

No, it's not.  But the board I submitted the patch for doesn't have
anything like hot swapable ram.  The ram access is still going to work,
just it might display something it shouldn't. Say for example if that
frame buffer got reused by somethig else and filled with new data in
the very small window.

But yes, I agree the best solution would be to not release the buffer
until the next vblank.

Perhaps a good solution would be for the DRM api to have the concept of
a deferred release?  Meaning if the put() call just added the frame
buffer to a list that DRM core could walk during the vblank.  That
might be better then every single driver trying to work up a custom
solution.

> The vc4 driver seems to be able to program the hardware to switch the
> scanout to the new buffer immediately:
> 
> https://elixir.bootlin.com/linux/v4.20-rc3/source/drivers/gpu/drm/vc4/vc4_plane.c#L794
> 
> Although I wonder if there isn't still a tiny race there - the
> hardware may have just started refilling the FIFO from the old
> address. Still, if the FIFO is small, the FIFO refill operation may
> be
> much shorter than it takes for the kernel code to actually free the
> buffer. Eric and Michael, could you confirm?
> 

I don't have those boards anymore, and I don't have access to any
technical documentation on the GPU so I can't really add much here.  
Eric can probably provide the best information.

I submitted the patch because I was working on arm64 support for fun
and was becomming very annoyed by desktop lockups for long periods of
time on the desktop enviroment of my choice due to the driver being
flooded with curser animation updates.  I sent the patch to Eric who
was kind enough to review it and suggest some improvements.
Tomasz Figa Nov. 23, 2018, 5:35 a.m. UTC | #5
Hi Michael,

On Fri, Nov 23, 2018 at 1:58 PM Michael Zoran <mzoran@crowfest.net> wrote:
>
> On Fri, 2018-11-23 at 11:27 +0900, Tomasz Figa wrote:
> >
> > The point here is not about setting and resetting the plane->fb
> > pointer. It's about what happens inside
> > drm_atomic_set_fb_for_plane().
> >
> > It calls drm_framebuffer_get() for the new fb and
> > drm_framebuffer_put() for the old fb. In result, if the fb changes,
> > the old fb, which had its reference count incremented in the atomic
> > commit that set it to the plane before, has its reference count
> > decremented. Moreover, if the new reference count becomes 0,
> > drm_framebuffer_put() will immediately free the buffer.
> >
> > Freeing a buffer when the hardware is still scanning out of it isn't
> > a
> > good idea, is it?
>
> No, it's not.  But the board I submitted the patch for doesn't have
> anything like hot swapable ram.  The ram access is still going to work,
> just it might display something it shouldn't. Say for example if that
> frame buffer got reused by somethig else and filled with new data in
> the very small window.

Thanks for a quick reply!

To clarify, on the Rockchip platform this patch is for (and many other
arm/arm64 SoCs) the display controller is behind an IOMMU. Freeing the
buffer would mean unmapping the related IOVAs from the IOMMU. If the
hardware is still scanning out from the unmapped addresses, it would
cause IOMMU page faults. We don't have any good IOMMU page fault
handling in the kernel, so on most platforms that would likely end up
stalling the display controller completely (on Rockchip it does).

>
> But yes, I agree the best solution would be to not release the buffer
> until the next vblank.
>
> Perhaps a good solution would be for the DRM api to have the concept of
> a deferred release?  Meaning if the put() call just added the frame
> buffer to a list that DRM core could walk during the vblank.  That
> might be better then every single driver trying to work up a custom
> solution.

Agreed.

Best regards,
Tomasz
Eric Anholt Nov. 26, 2018, 8:36 p.m. UTC | #6
Michael Zoran <mzoran@crowfest.net> writes:

> On Fri, 2018-11-23 at 11:27 +0900, Tomasz Figa wrote:
>> 
>> The point here is not about setting and resetting the plane->fb
>> pointer. It's about what happens inside
>> drm_atomic_set_fb_for_plane().
>> 
>> It calls drm_framebuffer_get() for the new fb and
>> drm_framebuffer_put() for the old fb. In result, if the fb changes,
>> the old fb, which had its reference count incremented in the atomic
>> commit that set it to the plane before, has its reference count
>> decremented. Moreover, if the new reference count becomes 0,
>> drm_framebuffer_put() will immediately free the buffer.
>> 
>> Freeing a buffer when the hardware is still scanning out of it isn't
>> a
>> good idea, is it?
>
> No, it's not.  But the board I submitted the patch for doesn't have
> anything like hot swapable ram.  The ram access is still going to work,
> just it might display something it shouldn't. Say for example if that
> frame buffer got reused by somethig else and filled with new data in
> the very small window.
>
> But yes, I agree the best solution would be to not release the buffer
> until the next vblank.
>
> Perhaps a good solution would be for the DRM api to have the concept of
> a deferred release?  Meaning if the put() call just added the frame
> buffer to a list that DRM core could walk during the vblank.  That
> might be better then every single driver trying to work up a custom
> solution.
>
>> The vc4 driver seems to be able to program the hardware to switch the
>> scanout to the new buffer immediately:
>> 
>> https://elixir.bootlin.com/linux/v4.20-rc3/source/drivers/gpu/drm/vc4/vc4_plane.c#L794
>> 
>> Although I wonder if there isn't still a tiny race there - the
>> hardware may have just started refilling the FIFO from the old
>> address. Still, if the FIFO is small, the FIFO refill operation may
>> be
>> much shorter than it takes for the kernel code to actually free the
>> buffer. Eric and Michael, could you confirm?
>> 
>
> I don't have those boards anymore, and I don't have access to any
> technical documentation on the GPU so I can't really add much here.  
> Eric can probably provide the best information.

I don't think I understood my scanout hardware well enough when I
started on the async update stuff for rpi.  vc4 probably needs to wait
until the HW starts scanning out a new line before letting the old BO
get freed.
Boris Brezillon Nov. 26, 2018, 9:41 p.m. UTC | #7
On Mon, 26 Nov 2018 12:36:03 -0800
Eric Anholt <eric@anholt.net> wrote:

> Michael Zoran <mzoran@crowfest.net> writes:
> 
> > On Fri, 2018-11-23 at 11:27 +0900, Tomasz Figa wrote:  
> >> 
> >> The point here is not about setting and resetting the plane->fb
> >> pointer. It's about what happens inside
> >> drm_atomic_set_fb_for_plane().
> >> 
> >> It calls drm_framebuffer_get() for the new fb and
> >> drm_framebuffer_put() for the old fb. In result, if the fb changes,
> >> the old fb, which had its reference count incremented in the atomic
> >> commit that set it to the plane before, has its reference count
> >> decremented. Moreover, if the new reference count becomes 0,
> >> drm_framebuffer_put() will immediately free the buffer.
> >> 
> >> Freeing a buffer when the hardware is still scanning out of it isn't
> >> a
> >> good idea, is it?  
> >
> > No, it's not.  But the board I submitted the patch for doesn't have
> > anything like hot swapable ram.  The ram access is still going to work,
> > just it might display something it shouldn't. Say for example if that
> > frame buffer got reused by somethig else and filled with new data in
> > the very small window.
> >
> > But yes, I agree the best solution would be to not release the buffer
> > until the next vblank.
> >
> > Perhaps a good solution would be for the DRM api to have the concept of
> > a deferred release?  Meaning if the put() call just added the frame
> > buffer to a list that DRM core could walk during the vblank.  That
> > might be better then every single driver trying to work up a custom
> > solution.
> >  
> >> The vc4 driver seems to be able to program the hardware to switch the
> >> scanout to the new buffer immediately:
> >> 
> >> https://elixir.bootlin.com/linux/v4.20-rc3/source/drivers/gpu/drm/vc4/vc4_plane.c#L794
> >> 
> >> Although I wonder if there isn't still a tiny race there - the
> >> hardware may have just started refilling the FIFO from the old
> >> address. Still, if the FIFO is small, the FIFO refill operation may
> >> be
> >> much shorter than it takes for the kernel code to actually free the
> >> buffer. Eric and Michael, could you confirm?
> >>   
> >
> > I don't have those boards anymore, and I don't have access to any
> > technical documentation on the GPU so I can't really add much here.  
> > Eric can probably provide the best information.  
> 
> I don't think I understood my scanout hardware well enough when I
> started on the async update stuff for rpi.  vc4 probably needs to wait
> until the HW starts scanning out a new line before letting the old BO
> get freed.

That's also my understanding. Note that even if the BO is freed before
the HVS has finished generating a line it shouldn't crash, because
accesses are done through DMA. Might be a security issue though if we
start re-using the memory for something else while it's still being
accessed.

The solution would be to wait for the EOL (End Of Line) interrupt in
the async update path, but I'm not sure this is allowed.
Gustavo Padovan Nov. 26, 2018, 11:54 p.m. UTC | #8
Hi Tomasz,

On 11/23/18 12:27 AM, Tomasz Figa wrote:
> Hi Helen,
>
> On Fri, Nov 23, 2018 at 8:31 AM Helen Koike <helen.koike@collabora.com> wrote:
>> Hi Tomasz,
>>
>> On 11/20/18 4:48 AM, Tomasz Figa wrote:
>>> Hi Helen,
>>>
>>> On Tue, Nov 20, 2018 at 4:08 AM Helen Koike <helen.koike@collabora.com> wrote:
>>>> From: Enric Balletbo i Serra <enric.balletbo@collabora.com>
>>>>
>>>> Add support to async updates of cursors by using the new atomic
>>>> interface for that.
>>>>
>>>> Signed-off-by: Enric Balletbo i Serra <enric.balletbo@collabora.com>
>>>> [updated for upstream]
>>>> Signed-off-by: Helen Koike <helen.koike@collabora.com>
>>>>
>>>> ---
>>>> Hello,
>>>>
>>>> This is the third version of the async-plane update suport to the
>>>> Rockchip driver.
>>>>
>>> Thanks for a quick respin. Please see my comments inline. (I'll try to
>>> be better at responding from now on...)
>>>
>>>> I tested running igt kms_cursor_legacy and kms_atomic tests using a 96Boards Ficus.
>>>>
>>>> Note that before the patch, the following igt tests failed:
>>>>
>>>>          basic-flip-before-cursor-atomic
>>>>          basic-flip-before-cursor-legacy
>>>>          cursor-vs-flip-atomic
>>>>          cursor-vs-flip-legacy
>>>>          cursor-vs-flip-toggle
>>>>          flip-vs-cursor-atomic
>>>>          flip-vs-cursor-busy-crc-atomic
>>>>          flip-vs-cursor-busy-crc-legacy
>>>>          flip-vs-cursor-crc-atomic
>>>>          flip-vs-cursor-crc-legacy
>>>>          flip-vs-cursor-legacy
>>>>
>>>> Full log: https://people.collabora.com/~koike/results-4.20/html/
>>>>
>>>> Now with the patch applied the following were fixed:
>>>>          basic-flip-before-cursor-atomic
>>>>          basic-flip-before-cursor-legacy
>>>>          flip-vs-cursor-atomic
>>>>          flip-vs-cursor-legacy
>>>>
>>>> Full log: https://people.collabora.com/~koike/results-4.20-async/html/
>>> Could you also test modetest, with the -C switch to test the legacy
>>> cursor API? I remember it triggering crashes due to synchronization
>>> issues easily.
>> Sure. I tested with
>> $ modetest -M rockchip -s 37:1920x1080 -C
>>
>> I also vary the mode but I couldn't trigger any crashes.
>>
>>>> Tomasz, as you mentined in v2 about waiting the hardware before updating
>>>> the framebuffer, now I call the loop you pointed out in the async path,
>>>> was that what you had in mind? Or do you think I would make sense to
>>>> call the vop_crtc_atomic_flush() instead of just exposing that loop?
>>>>
>>>> Thanks
>>>> Helen
>>>>
>>>> Changes in v3:
>>>> - Rebased on top of drm-misc
>>>> - Fix missing include in rockchip_drm_vop.c
>>>> - New function vop_crtc_atomic_commit_flush
>>>>
>>>> Changes in v2:
>>>> - v2: https://patchwork.freedesktop.org/patch/254180/
>>>> - Change the framebuffer as well to cover jumpy cursor when hovering
>>>>    text boxes or hyperlink. (Tomasz)
>>>> - Use the PSR inhibit mechanism when accessing VOP hardware instead of
>>>>    PSR flushing (Tomasz)
>>>>
>>>> Changes in v1:
>>>> - Rebased on top of drm-misc
>>>> - In async_check call drm_atomic_helper_check_plane_state to check that
>>>>    the desired plane is valid and update various bits of derived state
>>>>    (clipped coordinates etc.)
>>>> - In async_check allow to configure new scaling in the fast path.
>>>> - In async_update force to flush all registered PSR encoders.
>>>> - In async_update call atomic_update directly.
>>>> - In async_update call vop_cfg_done needed to set the vop registers and take effect.
>>>>
>>>>   drivers/gpu/drm/rockchip/rockchip_drm_fb.c  |  36 -------
>>>>   drivers/gpu/drm/rockchip/rockchip_drm_psr.c |  37 +++++++
>>>>   drivers/gpu/drm/rockchip/rockchip_drm_psr.h |   3 +
>>>>   drivers/gpu/drm/rockchip/rockchip_drm_vop.c | 108 +++++++++++++++++---
>>>>   4 files changed, 131 insertions(+), 53 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/rockchip/rockchip_drm_fb.c b/drivers/gpu/drm/rockchip/rockchip_drm_fb.c
>>>> index ea18cb2a76c0..08bec50d9c5d 100644
>>>> --- a/drivers/gpu/drm/rockchip/rockchip_drm_fb.c
>>>> +++ b/drivers/gpu/drm/rockchip/rockchip_drm_fb.c
>>>> @@ -127,42 +127,6 @@ rockchip_user_fb_create(struct drm_device *dev, struct drm_file *file_priv,
>>>>          return ERR_PTR(ret);
>>>>   }
>>>>
>>>> -static void
>>>> -rockchip_drm_psr_inhibit_get_state(struct drm_atomic_state *state)
>>>> -{
>>>> -       struct drm_crtc *crtc;
>>>> -       struct drm_crtc_state *crtc_state;
>>>> -       struct drm_encoder *encoder;
>>>> -       u32 encoder_mask = 0;
>>>> -       int i;
>>>> -
>>>> -       for_each_old_crtc_in_state(state, crtc, crtc_state, i) {
>>>> -               encoder_mask |= crtc_state->encoder_mask;
>>>> -               encoder_mask |= crtc->state->encoder_mask;
>>>> -       }
>>>> -
>>>> -       drm_for_each_encoder_mask(encoder, state->dev, encoder_mask)
>>>> -               rockchip_drm_psr_inhibit_get(encoder);
>>>> -}
>>>> -
>>>> -static void
>>>> -rockchip_drm_psr_inhibit_put_state(struct drm_atomic_state *state)
>>>> -{
>>>> -       struct drm_crtc *crtc;
>>>> -       struct drm_crtc_state *crtc_state;
>>>> -       struct drm_encoder *encoder;
>>>> -       u32 encoder_mask = 0;
>>>> -       int i;
>>>> -
>>>> -       for_each_old_crtc_in_state(state, crtc, crtc_state, i) {
>>>> -               encoder_mask |= crtc_state->encoder_mask;
>>>> -               encoder_mask |= crtc->state->encoder_mask;
>>>> -       }
>>>> -
>>>> -       drm_for_each_encoder_mask(encoder, state->dev, encoder_mask)
>>>> -               rockchip_drm_psr_inhibit_put(encoder);
>>>> -}
>>>> -
>>>>   static void
>>>>   rockchip_atomic_helper_commit_tail_rpm(struct drm_atomic_state *old_state)
>>>>   {
>>>> diff --git a/drivers/gpu/drm/rockchip/rockchip_drm_psr.c b/drivers/gpu/drm/rockchip/rockchip_drm_psr.c
>>>> index 01ff3c858875..22a70ab6e214 100644
>>>> --- a/drivers/gpu/drm/rockchip/rockchip_drm_psr.c
>>>> +++ b/drivers/gpu/drm/rockchip/rockchip_drm_psr.c
>>>> @@ -13,6 +13,7 @@
>>>>    */
>>>>
>>>>   #include <drm/drmP.h>
>>>> +#include <drm/drm_atomic.h>
>>>>   #include <drm/drm_crtc_helper.h>
>>>>
>>>>   #include "rockchip_drm_drv.h"
>>>> @@ -109,6 +110,42 @@ int rockchip_drm_psr_inhibit_put(struct drm_encoder *encoder)
>>>>   }
>>>>   EXPORT_SYMBOL(rockchip_drm_psr_inhibit_put);
>>>>
>>>> +void rockchip_drm_psr_inhibit_get_state(struct drm_atomic_state *state)
>>>> +{
>>>> +       struct drm_crtc *crtc;
>>>> +       struct drm_crtc_state *crtc_state;
>>>> +       struct drm_encoder *encoder;
>>>> +       u32 encoder_mask = 0;
>>>> +       int i;
>>>> +
>>>> +       for_each_old_crtc_in_state(state, crtc, crtc_state, i) {
>>>> +               encoder_mask |= crtc_state->encoder_mask;
>>>> +               encoder_mask |= crtc->state->encoder_mask;
>>>> +       }
>>>> +
>>>> +       drm_for_each_encoder_mask(encoder, state->dev, encoder_mask)
>>>> +               rockchip_drm_psr_inhibit_get(encoder);
>>>> +}
>>>> +EXPORT_SYMBOL(rockchip_drm_psr_inhibit_get_state);
>>>> +
>>>> +void rockchip_drm_psr_inhibit_put_state(struct drm_atomic_state *state)
>>>> +{
>>>> +       struct drm_crtc *crtc;
>>>> +       struct drm_crtc_state *crtc_state;
>>>> +       struct drm_encoder *encoder;
>>>> +       u32 encoder_mask = 0;
>>>> +       int i;
>>>> +
>>>> +       for_each_old_crtc_in_state(state, crtc, crtc_state, i) {
>>>> +               encoder_mask |= crtc_state->encoder_mask;
>>>> +               encoder_mask |= crtc->state->encoder_mask;
>>>> +       }
>>>> +
>>>> +       drm_for_each_encoder_mask(encoder, state->dev, encoder_mask)
>>>> +               rockchip_drm_psr_inhibit_put(encoder);
>>>> +}
>>>> +EXPORT_SYMBOL(rockchip_drm_psr_inhibit_put_state);
>>>> +
>>>>   /**
>>>>    * rockchip_drm_psr_inhibit_get - acquire PSR inhibit on given encoder
>>>>    * @encoder: encoder to obtain the PSR encoder
>>>> diff --git a/drivers/gpu/drm/rockchip/rockchip_drm_psr.h b/drivers/gpu/drm/rockchip/rockchip_drm_psr.h
>>>> index 860c62494496..25350ba3237b 100644
>>>> --- a/drivers/gpu/drm/rockchip/rockchip_drm_psr.h
>>>> +++ b/drivers/gpu/drm/rockchip/rockchip_drm_psr.h
>>>> @@ -20,6 +20,9 @@ void rockchip_drm_psr_flush_all(struct drm_device *dev);
>>>>   int rockchip_drm_psr_inhibit_put(struct drm_encoder *encoder);
>>>>   int rockchip_drm_psr_inhibit_get(struct drm_encoder *encoder);
>>>>
>>>> +void rockchip_drm_psr_inhibit_get_state(struct drm_atomic_state *state);
>>>> +void rockchip_drm_psr_inhibit_put_state(struct drm_atomic_state *state);
>>>> +
>>>>   int rockchip_drm_psr_register(struct drm_encoder *encoder,
>>>>                          int (*psr_set)(struct drm_encoder *, bool enable));
>>>>   void rockchip_drm_psr_unregister(struct drm_encoder *encoder);
>>>> diff --git a/drivers/gpu/drm/rockchip/rockchip_drm_vop.c b/drivers/gpu/drm/rockchip/rockchip_drm_vop.c
>>>> index fb70fb486fbf..176d6e8207ed 100644
>>>> --- a/drivers/gpu/drm/rockchip/rockchip_drm_vop.c
>>>> +++ b/drivers/gpu/drm/rockchip/rockchip_drm_vop.c
>>>> @@ -15,6 +15,7 @@
>>>>   #include <drm/drm.h>
>>>>   #include <drm/drmP.h>
>>>>   #include <drm/drm_atomic.h>
>>>> +#include <drm/drm_atomic_uapi.h>
>>>>   #include <drm/drm_crtc.h>
>>>>   #include <drm/drm_crtc_helper.h>
>>>>   #include <drm/drm_flip_work.h>
>>>> @@ -819,10 +820,99 @@ static void vop_plane_atomic_update(struct drm_plane *plane,
>>>>          spin_unlock(&vop->reg_lock);
>>>>   }
>>>>
>>>> +static int vop_plane_atomic_async_check(struct drm_plane *plane,
>>>> +                                       struct drm_plane_state *state)
>>>> +{
>>>> +       struct vop_win *vop_win = to_vop_win(plane);
>>>> +       const struct vop_win_data *win = vop_win->data;
>>>> +       int min_scale = win->phy->scl ? FRAC_16_16(1, 8) :
>>>> +                                       DRM_PLANE_HELPER_NO_SCALING;
>>>> +       int max_scale = win->phy->scl ? FRAC_16_16(8, 1) :
>>>> +                                       DRM_PLANE_HELPER_NO_SCALING;
>>>> +       struct drm_crtc_state *crtc_state;
>>>> +       int ret;
>>>> +
>>>> +       if (plane != state->crtc->cursor)
>>>> +               return -EINVAL;
>>>> +
>>>> +       if (!plane->state)
>>>> +               return -EINVAL;
>>>> +
>>>> +       if (!plane->state->fb)
>>>> +               return -EINVAL;
>>>> +
>>>> +       if (state->state)
>>>> +               crtc_state = drm_atomic_get_existing_crtc_state(state->state,
>>>> +                                                               state->crtc);
>>>> +       else /* Special case for asynchronous cursor updates. */
>>>> +               crtc_state = plane->crtc->state;
>>>> +
>>>> +       ret = drm_atomic_helper_check_plane_state(plane->state,
>>>> +                                                 crtc_state,
>>>> +                                                 min_scale, max_scale,
>>>> +                                                 true, true);
>>>> +       return ret;
>>>> +}
>>>> +
>>>> +static void vop_crtc_atomic_commit_flush(struct drm_crtc *crtc,
>>>> +                                        struct drm_crtc_state *old_crtc_state)
>>>> +{
>>>> +       struct drm_atomic_state *old_state = old_crtc_state->state;
>>>> +       struct drm_plane_state *old_plane_state, *new_plane_state;
>>>> +       struct vop *vop = to_vop(crtc);
>>>> +       struct drm_plane *plane;
>>>> +       int i;
>>>> +
>>>> +       for_each_oldnew_plane_in_state(old_state, plane, old_plane_state,
>>>> +                                      new_plane_state, i) {
>>> Hmm, from what I can see, we're not going through the full atomic
>>> commit sequence, with state flip, so I'm not sure where we would get
>>> the new state here from.
>>>
>>>> +               if (!old_plane_state->fb)
>>>> +                       continue;
>>>> +
>>>> +               if (old_plane_state->fb == new_plane_state->fb)
>>>> +                       continue;
>>>> +
>>>> +               drm_framebuffer_get(old_plane_state->fb);
>>>> +               WARN_ON(drm_crtc_vblank_get(crtc) != 0);
>>>> +               drm_flip_work_queue(&vop->fb_unref_work, old_plane_state->fb);
>>>> +               set_bit(VOP_PENDING_FB_UNREF, &vop->pending);
>>>> +       }
>>>> +}
>>>> +
>>>> +static void vop_plane_atomic_async_update(struct drm_plane *plane,
>>>> +                                         struct drm_plane_state *new_state)
>>>> +{
>>>> +       struct vop *vop = to_vop(plane->state->crtc);
>>>> +
>>>> +       if (vop->crtc.state->state)
>>>> +               vop_crtc_atomic_commit_flush(&vop->crtc, vop->crtc.state);
>>> Since we just operate on one plane here, we could just do like this:
>>>
>>> if (plane->state->fb && plane->state->fb != new_state->fb) {
>>>                 drm_framebuffer_get(plane->state->fb);
>>>                 WARN_ON(drm_crtc_vblank_get(crtc) != 0);
>>>                 drm_flip_work_queue(&vop->fb_unref_work, plane->state->fb);
>>>                 set_bit(VOP_PENDING_FB_UNREF, &vop->pending);
>>> }
>>>
>>> However, we cannot simply to this here, because it races with the
>>> vblank interrupt. We need to program the hw plane with the new fb
>>> first and trigger the update. This needs all the careful handling that
>>> is done in vop_crtc_atomic_flush() and so my original suggestion to
>>> just call it.
>> vop_crtc_atomic_flush() also updates the crtc->state->event, I don't
>> think we want that.
> Good point, we don't want to touch the event.
>
>> And actually I don't think we have this race condition, please see below
>>
> Just to clarify, the race conditions I'm talking here are not anything
> new to this patch alone. I had actually observed those when
> implementing the Chrome OS downstream async cursor code before:
>
> https://chromium.googlesource.com/chromiumos/third_party/kernel/+/65d4ff0af3f8c7ebecad8f3b402b546f277d9225/drivers/gpu/drm/rockchip/rockchip_drm_vop.c#1015
>
> Note that the code there actually duplicates the current plane state,
> updates the copy, swaps both states and then update the hardware based
> on both new and old states, as the regular atomic commit flow would
> do. With that, no reference counts are dropped until the old state
> gets destroyed at the end of the function. Also note the if block that
> handles the condition of (plane_state->fb && plane_state->fb !=
> plane->state->fb) (plane_state is the old state after the swap).
>
>>> Of course to call it in its current shape, one needs to have a full
>>> atomic state from a commit, after a flip, but we only have the new
>>> plane state here. Perhaps you could duplicate existing state, update
>>> the desired plane state, flip and then call vop_crtc_atomic_flush()?
>> Could you please clarify your proposal? You mean duplicating
>> plane->state ? I'm trying to see how this would fit it in the code.
>> The drm_atomic_state structure at plate->state->state is actually always
>> NULL (as an async update is applied right away).
>>
>>
>>> Best regards,
>>> Tomasz
>>>
>>  From your comment in v2:
>>
>>> Isn't this going to drop the old fb reference on the floor without
>>> waiting for the hardware to actually stop scanning out from it?
>> I've been trying to analyze this better, I also got some help from
>> Gustavo Padovan, and I think there is no problem here as the
>> configuration we are doing here will just be taken into consideration by
>> the hardware in the next vblank, so I guess we can set and re-set
>> plane->fb as much as we want.
> The point here is not about setting and resetting the plane->fb
> pointer. It's about what happens inside drm_atomic_set_fb_for_plane().
>
> It calls drm_framebuffer_get() for the new fb and
> drm_framebuffer_put() for the old fb. In result, if the fb changes,
> the old fb, which had its reference count incremented in the atomic
> commit that set it to the plane before, has its reference count
> decremented. Moreover, if the new reference count becomes 0,
> drm_framebuffer_put() will immediately free the buffer.
>
> Freeing a buffer when the hardware is still scanning out of it isn't a
> good idea, is it?

My understanding is that in the async update path we never touch the fb 
that is still being scanned out - what you are referring as old fb. That 
is the design of async update. drm_atomic_set_fb_for_plane() replace 
pointers and refs between the fb we just received from userspace with 
the one on plane->state (that after the atomic swap holds the new state) 
which won't be scanned out until the next flip. However I don't 
understand this part of the rockchip hardware, the fb is not the one 
being scanned out. Is this still a problem?

Regards,

Gustavo
Tomasz Figa Nov. 27, 2018, 7:54 a.m. UTC | #9
Hi Gustavo,

On Tue, Nov 27, 2018 at 8:54 AM Gustavo Padovan
<gustavo.padovan@collabora.com> wrote:
>
> Hi Tomasz,
>
> On 11/23/18 12:27 AM, Tomasz Figa wrote:
> > Hi Helen,
> >
> > On Fri, Nov 23, 2018 at 8:31 AM Helen Koike <helen.koike@collabora.com> wrote:
> >> Hi Tomasz,
> >>
> >> On 11/20/18 4:48 AM, Tomasz Figa wrote:
> >>> Hi Helen,
> >>>
> >>> On Tue, Nov 20, 2018 at 4:08 AM Helen Koike <helen.koike@collabora.com> wrote:
> >>>> From: Enric Balletbo i Serra <enric.balletbo@collabora.com>
> >>>>
> >>>> Add support to async updates of cursors by using the new atomic
> >>>> interface for that.
> >>>>
> >>>> Signed-off-by: Enric Balletbo i Serra <enric.balletbo@collabora.com>
> >>>> [updated for upstream]
> >>>> Signed-off-by: Helen Koike <helen.koike@collabora.com>
> >>>>
> >>>> ---
> >>>> Hello,
> >>>>
> >>>> This is the third version of the async-plane update suport to the
> >>>> Rockchip driver.
> >>>>
> >>> Thanks for a quick respin. Please see my comments inline. (I'll try to
> >>> be better at responding from now on...)
> >>>
> >>>> I tested running igt kms_cursor_legacy and kms_atomic tests using a 96Boards Ficus.
> >>>>
> >>>> Note that before the patch, the following igt tests failed:
> >>>>
> >>>>          basic-flip-before-cursor-atomic
> >>>>          basic-flip-before-cursor-legacy
> >>>>          cursor-vs-flip-atomic
> >>>>          cursor-vs-flip-legacy
> >>>>          cursor-vs-flip-toggle
> >>>>          flip-vs-cursor-atomic
> >>>>          flip-vs-cursor-busy-crc-atomic
> >>>>          flip-vs-cursor-busy-crc-legacy
> >>>>          flip-vs-cursor-crc-atomic
> >>>>          flip-vs-cursor-crc-legacy
> >>>>          flip-vs-cursor-legacy
> >>>>
> >>>> Full log: https://people.collabora.com/~koike/results-4.20/html/
> >>>>
> >>>> Now with the patch applied the following were fixed:
> >>>>          basic-flip-before-cursor-atomic
> >>>>          basic-flip-before-cursor-legacy
> >>>>          flip-vs-cursor-atomic
> >>>>          flip-vs-cursor-legacy
> >>>>
> >>>> Full log: https://people.collabora.com/~koike/results-4.20-async/html/
> >>> Could you also test modetest, with the -C switch to test the legacy
> >>> cursor API? I remember it triggering crashes due to synchronization
> >>> issues easily.
> >> Sure. I tested with
> >> $ modetest -M rockchip -s 37:1920x1080 -C
> >>
> >> I also vary the mode but I couldn't trigger any crashes.
> >>
> >>>> Tomasz, as you mentined in v2 about waiting the hardware before updating
> >>>> the framebuffer, now I call the loop you pointed out in the async path,
> >>>> was that what you had in mind? Or do you think I would make sense to
> >>>> call the vop_crtc_atomic_flush() instead of just exposing that loop?
> >>>>
> >>>> Thanks
> >>>> Helen
> >>>>
> >>>> Changes in v3:
> >>>> - Rebased on top of drm-misc
> >>>> - Fix missing include in rockchip_drm_vop.c
> >>>> - New function vop_crtc_atomic_commit_flush
> >>>>
> >>>> Changes in v2:
> >>>> - v2: https://patchwork.freedesktop.org/patch/254180/
> >>>> - Change the framebuffer as well to cover jumpy cursor when hovering
> >>>>    text boxes or hyperlink. (Tomasz)
> >>>> - Use the PSR inhibit mechanism when accessing VOP hardware instead of
> >>>>    PSR flushing (Tomasz)
> >>>>
> >>>> Changes in v1:
> >>>> - Rebased on top of drm-misc
> >>>> - In async_check call drm_atomic_helper_check_plane_state to check that
> >>>>    the desired plane is valid and update various bits of derived state
> >>>>    (clipped coordinates etc.)
> >>>> - In async_check allow to configure new scaling in the fast path.
> >>>> - In async_update force to flush all registered PSR encoders.
> >>>> - In async_update call atomic_update directly.
> >>>> - In async_update call vop_cfg_done needed to set the vop registers and take effect.
> >>>>
> >>>>   drivers/gpu/drm/rockchip/rockchip_drm_fb.c  |  36 -------
> >>>>   drivers/gpu/drm/rockchip/rockchip_drm_psr.c |  37 +++++++
> >>>>   drivers/gpu/drm/rockchip/rockchip_drm_psr.h |   3 +
> >>>>   drivers/gpu/drm/rockchip/rockchip_drm_vop.c | 108 +++++++++++++++++---
> >>>>   4 files changed, 131 insertions(+), 53 deletions(-)
> >>>>
> >>>> diff --git a/drivers/gpu/drm/rockchip/rockchip_drm_fb.c b/drivers/gpu/drm/rockchip/rockchip_drm_fb.c
> >>>> index ea18cb2a76c0..08bec50d9c5d 100644
> >>>> --- a/drivers/gpu/drm/rockchip/rockchip_drm_fb.c
> >>>> +++ b/drivers/gpu/drm/rockchip/rockchip_drm_fb.c
> >>>> @@ -127,42 +127,6 @@ rockchip_user_fb_create(struct drm_device *dev, struct drm_file *file_priv,
> >>>>          return ERR_PTR(ret);
> >>>>   }
> >>>>
> >>>> -static void
> >>>> -rockchip_drm_psr_inhibit_get_state(struct drm_atomic_state *state)
> >>>> -{
> >>>> -       struct drm_crtc *crtc;
> >>>> -       struct drm_crtc_state *crtc_state;
> >>>> -       struct drm_encoder *encoder;
> >>>> -       u32 encoder_mask = 0;
> >>>> -       int i;
> >>>> -
> >>>> -       for_each_old_crtc_in_state(state, crtc, crtc_state, i) {
> >>>> -               encoder_mask |= crtc_state->encoder_mask;
> >>>> -               encoder_mask |= crtc->state->encoder_mask;
> >>>> -       }
> >>>> -
> >>>> -       drm_for_each_encoder_mask(encoder, state->dev, encoder_mask)
> >>>> -               rockchip_drm_psr_inhibit_get(encoder);
> >>>> -}
> >>>> -
> >>>> -static void
> >>>> -rockchip_drm_psr_inhibit_put_state(struct drm_atomic_state *state)
> >>>> -{
> >>>> -       struct drm_crtc *crtc;
> >>>> -       struct drm_crtc_state *crtc_state;
> >>>> -       struct drm_encoder *encoder;
> >>>> -       u32 encoder_mask = 0;
> >>>> -       int i;
> >>>> -
> >>>> -       for_each_old_crtc_in_state(state, crtc, crtc_state, i) {
> >>>> -               encoder_mask |= crtc_state->encoder_mask;
> >>>> -               encoder_mask |= crtc->state->encoder_mask;
> >>>> -       }
> >>>> -
> >>>> -       drm_for_each_encoder_mask(encoder, state->dev, encoder_mask)
> >>>> -               rockchip_drm_psr_inhibit_put(encoder);
> >>>> -}
> >>>> -
> >>>>   static void
> >>>>   rockchip_atomic_helper_commit_tail_rpm(struct drm_atomic_state *old_state)
> >>>>   {
> >>>> diff --git a/drivers/gpu/drm/rockchip/rockchip_drm_psr.c b/drivers/gpu/drm/rockchip/rockchip_drm_psr.c
> >>>> index 01ff3c858875..22a70ab6e214 100644
> >>>> --- a/drivers/gpu/drm/rockchip/rockchip_drm_psr.c
> >>>> +++ b/drivers/gpu/drm/rockchip/rockchip_drm_psr.c
> >>>> @@ -13,6 +13,7 @@
> >>>>    */
> >>>>
> >>>>   #include <drm/drmP.h>
> >>>> +#include <drm/drm_atomic.h>
> >>>>   #include <drm/drm_crtc_helper.h>
> >>>>
> >>>>   #include "rockchip_drm_drv.h"
> >>>> @@ -109,6 +110,42 @@ int rockchip_drm_psr_inhibit_put(struct drm_encoder *encoder)
> >>>>   }
> >>>>   EXPORT_SYMBOL(rockchip_drm_psr_inhibit_put);
> >>>>
> >>>> +void rockchip_drm_psr_inhibit_get_state(struct drm_atomic_state *state)
> >>>> +{
> >>>> +       struct drm_crtc *crtc;
> >>>> +       struct drm_crtc_state *crtc_state;
> >>>> +       struct drm_encoder *encoder;
> >>>> +       u32 encoder_mask = 0;
> >>>> +       int i;
> >>>> +
> >>>> +       for_each_old_crtc_in_state(state, crtc, crtc_state, i) {
> >>>> +               encoder_mask |= crtc_state->encoder_mask;
> >>>> +               encoder_mask |= crtc->state->encoder_mask;
> >>>> +       }
> >>>> +
> >>>> +       drm_for_each_encoder_mask(encoder, state->dev, encoder_mask)
> >>>> +               rockchip_drm_psr_inhibit_get(encoder);
> >>>> +}
> >>>> +EXPORT_SYMBOL(rockchip_drm_psr_inhibit_get_state);
> >>>> +
> >>>> +void rockchip_drm_psr_inhibit_put_state(struct drm_atomic_state *state)
> >>>> +{
> >>>> +       struct drm_crtc *crtc;
> >>>> +       struct drm_crtc_state *crtc_state;
> >>>> +       struct drm_encoder *encoder;
> >>>> +       u32 encoder_mask = 0;
> >>>> +       int i;
> >>>> +
> >>>> +       for_each_old_crtc_in_state(state, crtc, crtc_state, i) {
> >>>> +               encoder_mask |= crtc_state->encoder_mask;
> >>>> +               encoder_mask |= crtc->state->encoder_mask;
> >>>> +       }
> >>>> +
> >>>> +       drm_for_each_encoder_mask(encoder, state->dev, encoder_mask)
> >>>> +               rockchip_drm_psr_inhibit_put(encoder);
> >>>> +}
> >>>> +EXPORT_SYMBOL(rockchip_drm_psr_inhibit_put_state);
> >>>> +
> >>>>   /**
> >>>>    * rockchip_drm_psr_inhibit_get - acquire PSR inhibit on given encoder
> >>>>    * @encoder: encoder to obtain the PSR encoder
> >>>> diff --git a/drivers/gpu/drm/rockchip/rockchip_drm_psr.h b/drivers/gpu/drm/rockchip/rockchip_drm_psr.h
> >>>> index 860c62494496..25350ba3237b 100644
> >>>> --- a/drivers/gpu/drm/rockchip/rockchip_drm_psr.h
> >>>> +++ b/drivers/gpu/drm/rockchip/rockchip_drm_psr.h
> >>>> @@ -20,6 +20,9 @@ void rockchip_drm_psr_flush_all(struct drm_device *dev);
> >>>>   int rockchip_drm_psr_inhibit_put(struct drm_encoder *encoder);
> >>>>   int rockchip_drm_psr_inhibit_get(struct drm_encoder *encoder);
> >>>>
> >>>> +void rockchip_drm_psr_inhibit_get_state(struct drm_atomic_state *state);
> >>>> +void rockchip_drm_psr_inhibit_put_state(struct drm_atomic_state *state);
> >>>> +
> >>>>   int rockchip_drm_psr_register(struct drm_encoder *encoder,
> >>>>                          int (*psr_set)(struct drm_encoder *, bool enable));
> >>>>   void rockchip_drm_psr_unregister(struct drm_encoder *encoder);
> >>>> diff --git a/drivers/gpu/drm/rockchip/rockchip_drm_vop.c b/drivers/gpu/drm/rockchip/rockchip_drm_vop.c
> >>>> index fb70fb486fbf..176d6e8207ed 100644
> >>>> --- a/drivers/gpu/drm/rockchip/rockchip_drm_vop.c
> >>>> +++ b/drivers/gpu/drm/rockchip/rockchip_drm_vop.c
> >>>> @@ -15,6 +15,7 @@
> >>>>   #include <drm/drm.h>
> >>>>   #include <drm/drmP.h>
> >>>>   #include <drm/drm_atomic.h>
> >>>> +#include <drm/drm_atomic_uapi.h>
> >>>>   #include <drm/drm_crtc.h>
> >>>>   #include <drm/drm_crtc_helper.h>
> >>>>   #include <drm/drm_flip_work.h>
> >>>> @@ -819,10 +820,99 @@ static void vop_plane_atomic_update(struct drm_plane *plane,
> >>>>          spin_unlock(&vop->reg_lock);
> >>>>   }
> >>>>
> >>>> +static int vop_plane_atomic_async_check(struct drm_plane *plane,
> >>>> +                                       struct drm_plane_state *state)
> >>>> +{
> >>>> +       struct vop_win *vop_win = to_vop_win(plane);
> >>>> +       const struct vop_win_data *win = vop_win->data;
> >>>> +       int min_scale = win->phy->scl ? FRAC_16_16(1, 8) :
> >>>> +                                       DRM_PLANE_HELPER_NO_SCALING;
> >>>> +       int max_scale = win->phy->scl ? FRAC_16_16(8, 1) :
> >>>> +                                       DRM_PLANE_HELPER_NO_SCALING;
> >>>> +       struct drm_crtc_state *crtc_state;
> >>>> +       int ret;
> >>>> +
> >>>> +       if (plane != state->crtc->cursor)
> >>>> +               return -EINVAL;
> >>>> +
> >>>> +       if (!plane->state)
> >>>> +               return -EINVAL;
> >>>> +
> >>>> +       if (!plane->state->fb)
> >>>> +               return -EINVAL;
> >>>> +
> >>>> +       if (state->state)
> >>>> +               crtc_state = drm_atomic_get_existing_crtc_state(state->state,
> >>>> +                                                               state->crtc);
> >>>> +       else /* Special case for asynchronous cursor updates. */
> >>>> +               crtc_state = plane->crtc->state;
> >>>> +
> >>>> +       ret = drm_atomic_helper_check_plane_state(plane->state,
> >>>> +                                                 crtc_state,
> >>>> +                                                 min_scale, max_scale,
> >>>> +                                                 true, true);
> >>>> +       return ret;
> >>>> +}
> >>>> +
> >>>> +static void vop_crtc_atomic_commit_flush(struct drm_crtc *crtc,
> >>>> +                                        struct drm_crtc_state *old_crtc_state)
> >>>> +{
> >>>> +       struct drm_atomic_state *old_state = old_crtc_state->state;
> >>>> +       struct drm_plane_state *old_plane_state, *new_plane_state;
> >>>> +       struct vop *vop = to_vop(crtc);
> >>>> +       struct drm_plane *plane;
> >>>> +       int i;
> >>>> +
> >>>> +       for_each_oldnew_plane_in_state(old_state, plane, old_plane_state,
> >>>> +                                      new_plane_state, i) {
> >>> Hmm, from what I can see, we're not going through the full atomic
> >>> commit sequence, with state flip, so I'm not sure where we would get
> >>> the new state here from.
> >>>
> >>>> +               if (!old_plane_state->fb)
> >>>> +                       continue;
> >>>> +
> >>>> +               if (old_plane_state->fb == new_plane_state->fb)
> >>>> +                       continue;
> >>>> +
> >>>> +               drm_framebuffer_get(old_plane_state->fb);
> >>>> +               WARN_ON(drm_crtc_vblank_get(crtc) != 0);
> >>>> +               drm_flip_work_queue(&vop->fb_unref_work, old_plane_state->fb);
> >>>> +               set_bit(VOP_PENDING_FB_UNREF, &vop->pending);
> >>>> +       }
> >>>> +}
> >>>> +
> >>>> +static void vop_plane_atomic_async_update(struct drm_plane *plane,
> >>>> +                                         struct drm_plane_state *new_state)
> >>>> +{
> >>>> +       struct vop *vop = to_vop(plane->state->crtc);
> >>>> +
> >>>> +       if (vop->crtc.state->state)
> >>>> +               vop_crtc_atomic_commit_flush(&vop->crtc, vop->crtc.state);
> >>> Since we just operate on one plane here, we could just do like this:
> >>>
> >>> if (plane->state->fb && plane->state->fb != new_state->fb) {
> >>>                 drm_framebuffer_get(plane->state->fb);
> >>>                 WARN_ON(drm_crtc_vblank_get(crtc) != 0);
> >>>                 drm_flip_work_queue(&vop->fb_unref_work, plane->state->fb);
> >>>                 set_bit(VOP_PENDING_FB_UNREF, &vop->pending);
> >>> }
> >>>
> >>> However, we cannot simply to this here, because it races with the
> >>> vblank interrupt. We need to program the hw plane with the new fb
> >>> first and trigger the update. This needs all the careful handling that
> >>> is done in vop_crtc_atomic_flush() and so my original suggestion to
> >>> just call it.
> >> vop_crtc_atomic_flush() also updates the crtc->state->event, I don't
> >> think we want that.
> > Good point, we don't want to touch the event.
> >
> >> And actually I don't think we have this race condition, please see below
> >>
> > Just to clarify, the race conditions I'm talking here are not anything
> > new to this patch alone. I had actually observed those when
> > implementing the Chrome OS downstream async cursor code before:
> >
> > https://chromium.googlesource.com/chromiumos/third_party/kernel/+/65d4ff0af3f8c7ebecad8f3b402b546f277d9225/drivers/gpu/drm/rockchip/rockchip_drm_vop.c#1015
> >
> > Note that the code there actually duplicates the current plane state,
> > updates the copy, swaps both states and then update the hardware based
> > on both new and old states, as the regular atomic commit flow would
> > do. With that, no reference counts are dropped until the old state
> > gets destroyed at the end of the function. Also note the if block that
> > handles the condition of (plane_state->fb && plane_state->fb !=
> > plane->state->fb) (plane_state is the old state after the swap).
> >
> >>> Of course to call it in its current shape, one needs to have a full
> >>> atomic state from a commit, after a flip, but we only have the new
> >>> plane state here. Perhaps you could duplicate existing state, update
> >>> the desired plane state, flip and then call vop_crtc_atomic_flush()?
> >> Could you please clarify your proposal? You mean duplicating
> >> plane->state ? I'm trying to see how this would fit it in the code.
> >> The drm_atomic_state structure at plate->state->state is actually always
> >> NULL (as an async update is applied right away).
> >>
> >>
> >>> Best regards,
> >>> Tomasz
> >>>
> >>  From your comment in v2:
> >>
> >>> Isn't this going to drop the old fb reference on the floor without
> >>> waiting for the hardware to actually stop scanning out from it?
> >> I've been trying to analyze this better, I also got some help from
> >> Gustavo Padovan, and I think there is no problem here as the
> >> configuration we are doing here will just be taken into consideration by
> >> the hardware in the next vblank, so I guess we can set and re-set
> >> plane->fb as much as we want.
> > The point here is not about setting and resetting the plane->fb
> > pointer. It's about what happens inside drm_atomic_set_fb_for_plane().
> >
> > It calls drm_framebuffer_get() for the new fb and
> > drm_framebuffer_put() for the old fb. In result, if the fb changes,
> > the old fb, which had its reference count incremented in the atomic
> > commit that set it to the plane before, has its reference count
> > decremented. Moreover, if the new reference count becomes 0,
> > drm_framebuffer_put() will immediately free the buffer.
> >
> > Freeing a buffer when the hardware is still scanning out of it isn't a
> > good idea, is it?
>
> My understanding is that in the async update path we never touch the fb
> that is still being scanned out - what you are referring as old fb. That
> is the design of async update. drm_atomic_set_fb_for_plane() replace
> pointers and refs between the fb we just received from userspace with
> the one on plane->state (that after the atomic swap holds the new state)
> which won't be scanned out until the next flip.

My understanding of async_update is that it is expected to instantly
program the hardware with the new plane state.

You can see that an async update can only happen if
drm_atomic_helper_async_check() succeeds:
https://github.com/freedesktop/drm-misc/blob/drm-misc-fixes/drivers/gpu/drm/drm_atomic_helper.c#L927

In turn drm_atomic_helper_async_check() succeeds only if all the
conditions are met:
1) No CRTC in the new state needs a modeset.
2) There is only one plane state in the new state.
3) The plane is not moved between CRTCs.
4) The plane ops include .atomic_async_update().
5) There is no fence attached to the new state.
6) Current plane state is already committed to hardware, i.e. the
hardware started scanning out according to that state.
(https://github.com/freedesktop/drm-misc/blob/drm-misc-fixes/drivers/gpu/drm/drm_atomic_helper.c#L1545)

Given that, the current plane state that the .atomic_async_update()
sees is the state that is currently used by the hardware. That in turn
means that the hardware is currently scanning out from the fb pointed
out by that state and care must be taken so that the access to it is
not taken away from the hardware until it switches to scanning out the
new fb.

> However I don't
> understand this part of the rockchip hardware, the fb is not the one
> being scanned out. Is this still a problem?

Given the above, I think this question doesn't hold anymore, right?

Best regards,
Tomasz
Daniel Vetter Nov. 27, 2018, 7:56 a.m. UTC | #10
On Mon, Nov 26, 2018 at 10:41:02PM +0100, Boris Brezillon wrote:
> On Mon, 26 Nov 2018 12:36:03 -0800
> Eric Anholt <eric@anholt.net> wrote:
> 
> > Michael Zoran <mzoran@crowfest.net> writes:
> > 
> > > On Fri, 2018-11-23 at 11:27 +0900, Tomasz Figa wrote:  
> > >> 
> > >> The point here is not about setting and resetting the plane->fb
> > >> pointer. It's about what happens inside
> > >> drm_atomic_set_fb_for_plane().
> > >> 
> > >> It calls drm_framebuffer_get() for the new fb and
> > >> drm_framebuffer_put() for the old fb. In result, if the fb changes,
> > >> the old fb, which had its reference count incremented in the atomic
> > >> commit that set it to the plane before, has its reference count
> > >> decremented. Moreover, if the new reference count becomes 0,
> > >> drm_framebuffer_put() will immediately free the buffer.
> > >> 
> > >> Freeing a buffer when the hardware is still scanning out of it isn't
> > >> a
> > >> good idea, is it?  
> > >
> > > No, it's not.  But the board I submitted the patch for doesn't have
> > > anything like hot swapable ram.  The ram access is still going to work,
> > > just it might display something it shouldn't. Say for example if that
> > > frame buffer got reused by somethig else and filled with new data in
> > > the very small window.
> > >
> > > But yes, I agree the best solution would be to not release the buffer
> > > until the next vblank.
> > >
> > > Perhaps a good solution would be for the DRM api to have the concept of
> > > a deferred release?  Meaning if the put() call just added the frame
> > > buffer to a list that DRM core could walk during the vblank.  That
> > > might be better then every single driver trying to work up a custom
> > > solution.
> > >  
> > >> The vc4 driver seems to be able to program the hardware to switch the
> > >> scanout to the new buffer immediately:
> > >> 
> > >> https://elixir.bootlin.com/linux/v4.20-rc3/source/drivers/gpu/drm/vc4/vc4_plane.c#L794
> > >> 
> > >> Although I wonder if there isn't still a tiny race there - the
> > >> hardware may have just started refilling the FIFO from the old
> > >> address. Still, if the FIFO is small, the FIFO refill operation may
> > >> be
> > >> much shorter than it takes for the kernel code to actually free the
> > >> buffer. Eric and Michael, could you confirm?
> > >>   
> > >
> > > I don't have those boards anymore, and I don't have access to any
> > > technical documentation on the GPU so I can't really add much here.  
> > > Eric can probably provide the best information.  
> > 
> > I don't think I understood my scanout hardware well enough when I
> > started on the async update stuff for rpi.  vc4 probably needs to wait
> > until the HW starts scanning out a new line before letting the old BO
> > get freed.
> 
> That's also my understanding. Note that even if the BO is freed before
> the HVS has finished generating a line it shouldn't crash, because
> accesses are done through DMA. Might be a security issue though if we
> start re-using the memory for something else while it's still being
> accessed.
> 
> The solution would be to wait for the EOL (End Of Line) interrupt in
> the async update path, but I'm not sure this is allowed.

For subsuming async page_flip into the async atomic commit stuff we need
completion events anyway. Firing those from the EOL (or some other
hw-dependent interrupt, could also be the next vblank) should be hard to
arrange for drivers. And the heleprs could then easily offload the
cleanup_planes work to a worker and delay it until after the event has
fired. Like it already does for normal atomic commits.

But yeah right now that's all missing. The biggest issue here is figuring
out what these events should look like, and how generic userspace needs to
use them.
-Daniel
diff mbox series

Patch

diff --git a/drivers/gpu/drm/rockchip/rockchip_drm_fb.c b/drivers/gpu/drm/rockchip/rockchip_drm_fb.c
index ea18cb2a76c0..08bec50d9c5d 100644
--- a/drivers/gpu/drm/rockchip/rockchip_drm_fb.c
+++ b/drivers/gpu/drm/rockchip/rockchip_drm_fb.c
@@ -127,42 +127,6 @@  rockchip_user_fb_create(struct drm_device *dev, struct drm_file *file_priv,
 	return ERR_PTR(ret);
 }
 
-static void
-rockchip_drm_psr_inhibit_get_state(struct drm_atomic_state *state)
-{
-	struct drm_crtc *crtc;
-	struct drm_crtc_state *crtc_state;
-	struct drm_encoder *encoder;
-	u32 encoder_mask = 0;
-	int i;
-
-	for_each_old_crtc_in_state(state, crtc, crtc_state, i) {
-		encoder_mask |= crtc_state->encoder_mask;
-		encoder_mask |= crtc->state->encoder_mask;
-	}
-
-	drm_for_each_encoder_mask(encoder, state->dev, encoder_mask)
-		rockchip_drm_psr_inhibit_get(encoder);
-}
-
-static void
-rockchip_drm_psr_inhibit_put_state(struct drm_atomic_state *state)
-{
-	struct drm_crtc *crtc;
-	struct drm_crtc_state *crtc_state;
-	struct drm_encoder *encoder;
-	u32 encoder_mask = 0;
-	int i;
-
-	for_each_old_crtc_in_state(state, crtc, crtc_state, i) {
-		encoder_mask |= crtc_state->encoder_mask;
-		encoder_mask |= crtc->state->encoder_mask;
-	}
-
-	drm_for_each_encoder_mask(encoder, state->dev, encoder_mask)
-		rockchip_drm_psr_inhibit_put(encoder);
-}
-
 static void
 rockchip_atomic_helper_commit_tail_rpm(struct drm_atomic_state *old_state)
 {
diff --git a/drivers/gpu/drm/rockchip/rockchip_drm_psr.c b/drivers/gpu/drm/rockchip/rockchip_drm_psr.c
index 01ff3c858875..22a70ab6e214 100644
--- a/drivers/gpu/drm/rockchip/rockchip_drm_psr.c
+++ b/drivers/gpu/drm/rockchip/rockchip_drm_psr.c
@@ -13,6 +13,7 @@ 
  */
 
 #include <drm/drmP.h>
+#include <drm/drm_atomic.h>
 #include <drm/drm_crtc_helper.h>
 
 #include "rockchip_drm_drv.h"
@@ -109,6 +110,42 @@  int rockchip_drm_psr_inhibit_put(struct drm_encoder *encoder)
 }
 EXPORT_SYMBOL(rockchip_drm_psr_inhibit_put);
 
+void rockchip_drm_psr_inhibit_get_state(struct drm_atomic_state *state)
+{
+	struct drm_crtc *crtc;
+	struct drm_crtc_state *crtc_state;
+	struct drm_encoder *encoder;
+	u32 encoder_mask = 0;
+	int i;
+
+	for_each_old_crtc_in_state(state, crtc, crtc_state, i) {
+		encoder_mask |= crtc_state->encoder_mask;
+		encoder_mask |= crtc->state->encoder_mask;
+	}
+
+	drm_for_each_encoder_mask(encoder, state->dev, encoder_mask)
+		rockchip_drm_psr_inhibit_get(encoder);
+}
+EXPORT_SYMBOL(rockchip_drm_psr_inhibit_get_state);
+
+void rockchip_drm_psr_inhibit_put_state(struct drm_atomic_state *state)
+{
+	struct drm_crtc *crtc;
+	struct drm_crtc_state *crtc_state;
+	struct drm_encoder *encoder;
+	u32 encoder_mask = 0;
+	int i;
+
+	for_each_old_crtc_in_state(state, crtc, crtc_state, i) {
+		encoder_mask |= crtc_state->encoder_mask;
+		encoder_mask |= crtc->state->encoder_mask;
+	}
+
+	drm_for_each_encoder_mask(encoder, state->dev, encoder_mask)
+		rockchip_drm_psr_inhibit_put(encoder);
+}
+EXPORT_SYMBOL(rockchip_drm_psr_inhibit_put_state);
+
 /**
  * rockchip_drm_psr_inhibit_get - acquire PSR inhibit on given encoder
  * @encoder: encoder to obtain the PSR encoder
diff --git a/drivers/gpu/drm/rockchip/rockchip_drm_psr.h b/drivers/gpu/drm/rockchip/rockchip_drm_psr.h
index 860c62494496..25350ba3237b 100644
--- a/drivers/gpu/drm/rockchip/rockchip_drm_psr.h
+++ b/drivers/gpu/drm/rockchip/rockchip_drm_psr.h
@@ -20,6 +20,9 @@  void rockchip_drm_psr_flush_all(struct drm_device *dev);
 int rockchip_drm_psr_inhibit_put(struct drm_encoder *encoder);
 int rockchip_drm_psr_inhibit_get(struct drm_encoder *encoder);
 
+void rockchip_drm_psr_inhibit_get_state(struct drm_atomic_state *state);
+void rockchip_drm_psr_inhibit_put_state(struct drm_atomic_state *state);
+
 int rockchip_drm_psr_register(struct drm_encoder *encoder,
 			int (*psr_set)(struct drm_encoder *, bool enable));
 void rockchip_drm_psr_unregister(struct drm_encoder *encoder);
diff --git a/drivers/gpu/drm/rockchip/rockchip_drm_vop.c b/drivers/gpu/drm/rockchip/rockchip_drm_vop.c
index fb70fb486fbf..176d6e8207ed 100644
--- a/drivers/gpu/drm/rockchip/rockchip_drm_vop.c
+++ b/drivers/gpu/drm/rockchip/rockchip_drm_vop.c
@@ -15,6 +15,7 @@ 
 #include <drm/drm.h>
 #include <drm/drmP.h>
 #include <drm/drm_atomic.h>
+#include <drm/drm_atomic_uapi.h>
 #include <drm/drm_crtc.h>
 #include <drm/drm_crtc_helper.h>
 #include <drm/drm_flip_work.h>
@@ -819,10 +820,99 @@  static void vop_plane_atomic_update(struct drm_plane *plane,
 	spin_unlock(&vop->reg_lock);
 }
 
+static int vop_plane_atomic_async_check(struct drm_plane *plane,
+					struct drm_plane_state *state)
+{
+	struct vop_win *vop_win = to_vop_win(plane);
+	const struct vop_win_data *win = vop_win->data;
+	int min_scale = win->phy->scl ? FRAC_16_16(1, 8) :
+					DRM_PLANE_HELPER_NO_SCALING;
+	int max_scale = win->phy->scl ? FRAC_16_16(8, 1) :
+					DRM_PLANE_HELPER_NO_SCALING;
+	struct drm_crtc_state *crtc_state;
+	int ret;
+
+	if (plane != state->crtc->cursor)
+		return -EINVAL;
+
+	if (!plane->state)
+		return -EINVAL;
+
+	if (!plane->state->fb)
+		return -EINVAL;
+
+	if (state->state)
+		crtc_state = drm_atomic_get_existing_crtc_state(state->state,
+								state->crtc);
+	else /* Special case for asynchronous cursor updates. */
+		crtc_state = plane->crtc->state;
+
+	ret = drm_atomic_helper_check_plane_state(plane->state,
+						  crtc_state,
+						  min_scale, max_scale,
+						  true, true);
+	return ret;
+}
+
+static void vop_crtc_atomic_commit_flush(struct drm_crtc *crtc,
+					 struct drm_crtc_state *old_crtc_state)
+{
+	struct drm_atomic_state *old_state = old_crtc_state->state;
+	struct drm_plane_state *old_plane_state, *new_plane_state;
+	struct vop *vop = to_vop(crtc);
+	struct drm_plane *plane;
+	int i;
+
+	for_each_oldnew_plane_in_state(old_state, plane, old_plane_state,
+				       new_plane_state, i) {
+		if (!old_plane_state->fb)
+			continue;
+
+		if (old_plane_state->fb == new_plane_state->fb)
+			continue;
+
+		drm_framebuffer_get(old_plane_state->fb);
+		WARN_ON(drm_crtc_vblank_get(crtc) != 0);
+		drm_flip_work_queue(&vop->fb_unref_work, old_plane_state->fb);
+		set_bit(VOP_PENDING_FB_UNREF, &vop->pending);
+	}
+}
+
+static void vop_plane_atomic_async_update(struct drm_plane *plane,
+					  struct drm_plane_state *new_state)
+{
+	struct vop *vop = to_vop(plane->state->crtc);
+
+	if (vop->crtc.state->state)
+		vop_crtc_atomic_commit_flush(&vop->crtc, vop->crtc.state);
+
+	plane->state->crtc_x = new_state->crtc_x;
+	plane->state->crtc_y = new_state->crtc_y;
+	plane->state->crtc_h = new_state->crtc_h;
+	plane->state->crtc_w = new_state->crtc_w;
+	plane->state->src_x = new_state->src_x;
+	plane->state->src_y = new_state->src_y;
+	plane->state->src_h = new_state->src_h;
+	plane->state->src_w = new_state->src_w;
+
+	drm_atomic_set_fb_for_plane(plane->state, new_state->fb);
+
+	if (vop->is_enabled) {
+		rockchip_drm_psr_inhibit_get_state(new_state->state);
+		vop_plane_atomic_update(plane, plane->state);
+		spin_lock(&vop->reg_lock);
+		vop_cfg_done(vop);
+		spin_unlock(&vop->reg_lock);
+		rockchip_drm_psr_inhibit_put_state(new_state->state);
+	}
+}
+
 static const struct drm_plane_helper_funcs plane_helper_funcs = {
 	.atomic_check = vop_plane_atomic_check,
 	.atomic_update = vop_plane_atomic_update,
 	.atomic_disable = vop_plane_atomic_disable,
+	.atomic_async_check = vop_plane_atomic_async_check,
+	.atomic_async_update = vop_plane_atomic_async_update,
 };
 
 static const struct drm_plane_funcs vop_plane_funcs = {
@@ -1010,11 +1100,7 @@  static void vop_wait_for_irq_handler(struct vop *vop)
 static void vop_crtc_atomic_flush(struct drm_crtc *crtc,
 				  struct drm_crtc_state *old_crtc_state)
 {
-	struct drm_atomic_state *old_state = old_crtc_state->state;
-	struct drm_plane_state *old_plane_state, *new_plane_state;
 	struct vop *vop = to_vop(crtc);
-	struct drm_plane *plane;
-	int i;
 
 	if (WARN_ON(!vop->is_enabled))
 		return;
@@ -1042,19 +1128,7 @@  static void vop_crtc_atomic_flush(struct drm_crtc *crtc,
 	}
 	spin_unlock_irq(&crtc->dev->event_lock);
 
-	for_each_oldnew_plane_in_state(old_state, plane, old_plane_state,
-				       new_plane_state, i) {
-		if (!old_plane_state->fb)
-			continue;
-
-		if (old_plane_state->fb == new_plane_state->fb)
-			continue;
-
-		drm_framebuffer_get(old_plane_state->fb);
-		WARN_ON(drm_crtc_vblank_get(crtc) != 0);
-		drm_flip_work_queue(&vop->fb_unref_work, old_plane_state->fb);
-		set_bit(VOP_PENDING_FB_UNREF, &vop->pending);
-	}
+	vop_crtc_atomic_commit_flush(crtc, old_crtc_state);
 }
 
 static const struct drm_crtc_helper_funcs vop_crtc_helper_funcs = {