diff mbox

[RFC] drm/atomic+msm: add helper to implement legacy dirtyfb

Message ID 20180403224225.26776-1-robdclark@gmail.com (mailing list archive)
State Not Applicable, archived
Headers show

Commit Message

Rob Clark April 3, 2018, 10:42 p.m. UTC
Add an atomic helper to implement dirtyfb support.  This is needed to
support DSI command-mode panels with x11 userspace (ie. when we can't
rely on pageflips to trigger a flush to the panel).

To signal to the driver that the async atomic update needs to
synchronize with fences, even though the fb didn't change, the
drm_atomic_state::dirty flag is added.

Signed-off-by: Rob Clark <robdclark@gmail.com>
---
Background: there are a number of different folks working on getting
upstream kernel working on various different phones/tablets with qcom
SoC's.. many of them have command mode panels, so we kind of need a
way to support the legacy dirtyfb ioctl for x11 support.

I know there is work on a proprer non-legacy atomic property for
userspace to communicate dirty-rect(s) to the kernel, so this can
be improved from triggering a full-frame flush once that is in
place.  But we kinda needa a stop-gap solution.

I had considered an in-driver solution for this, but things get a
bit tricky if userspace ands up combining dirtyfb ioctls with page-
flips, because we need to synchronize setting various CTL.FLUSH bits
with setting the CTL.START bit.  (ie. really all we need to do for
cmd mode panels is bang CTL.START, but is this ends up racing with
pageflips setting FLUSH bits, then bad things.)  The easiest soln
is to wrap this up as an atomic commit and rely on the worker to
serialize things.  Hence adding an atomic dirtyfb helper.

I guess at least the helper, with some small addition to translate
and pass-thru the dirty rect(s) is useful to the final atomic dirty-
rect property solution.  Depending on how far off that is, a stop-
gap solution could be useful.

 drivers/gpu/drm/drm_atomic_helper.c | 66 +++++++++++++++++++++++++++++++++++++
 drivers/gpu/drm/msm/msm_atomic.c    |  5 ++-
 drivers/gpu/drm/msm/msm_fb.c        |  1 +
 include/drm/drm_atomic_helper.h     |  4 +++
 include/drm/drm_plane.h             |  9 +++++
 5 files changed, 84 insertions(+), 1 deletion(-)

Comments

Daniel Vetter April 4, 2018, 6:58 a.m. UTC | #1
On Wed, Apr 4, 2018 at 12:42 AM, Rob Clark <robdclark@gmail.com> wrote:
> Add an atomic helper to implement dirtyfb support.  This is needed to
> support DSI command-mode panels with x11 userspace (ie. when we can't
> rely on pageflips to trigger a flush to the panel).
>
> To signal to the driver that the async atomic update needs to
> synchronize with fences, even though the fb didn't change, the
> drm_atomic_state::dirty flag is added.
>
> Signed-off-by: Rob Clark <robdclark@gmail.com>
> ---
> Background: there are a number of different folks working on getting
> upstream kernel working on various different phones/tablets with qcom
> SoC's.. many of them have command mode panels, so we kind of need a
> way to support the legacy dirtyfb ioctl for x11 support.
>
> I know there is work on a proprer non-legacy atomic property for
> userspace to communicate dirty-rect(s) to the kernel, so this can
> be improved from triggering a full-frame flush once that is in
> place.  But we kinda needa a stop-gap solution.
>
> I had considered an in-driver solution for this, but things get a
> bit tricky if userspace ands up combining dirtyfb ioctls with page-
> flips, because we need to synchronize setting various CTL.FLUSH bits
> with setting the CTL.START bit.  (ie. really all we need to do for
> cmd mode panels is bang CTL.START, but is this ends up racing with
> pageflips setting FLUSH bits, then bad things.)  The easiest soln
> is to wrap this up as an atomic commit and rely on the worker to
> serialize things.  Hence adding an atomic dirtyfb helper.
>
> I guess at least the helper, with some small addition to translate
> and pass-thru the dirty rect(s) is useful to the final atomic dirty-
> rect property solution.  Depending on how far off that is, a stop-
> gap solution could be useful.

Adding Noralf, who iirc already posted the full dirty helpers already somewhere.
-Daniel

>
>  drivers/gpu/drm/drm_atomic_helper.c | 66 +++++++++++++++++++++++++++++++++++++
>  drivers/gpu/drm/msm/msm_atomic.c    |  5 ++-
>  drivers/gpu/drm/msm/msm_fb.c        |  1 +
>  include/drm/drm_atomic_helper.h     |  4 +++
>  include/drm/drm_plane.h             |  9 +++++
>  5 files changed, 84 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/drm_atomic_helper.c b/drivers/gpu/drm/drm_atomic_helper.c
> index c35654591c12..a578dc681b27 100644
> --- a/drivers/gpu/drm/drm_atomic_helper.c
> +++ b/drivers/gpu/drm/drm_atomic_helper.c
> @@ -3504,6 +3504,7 @@ void __drm_atomic_helper_plane_duplicate_state(struct drm_plane *plane,
>         if (state->fb)
>                 drm_framebuffer_get(state->fb);
>
> +       state->dirty = false;
>         state->fence = NULL;
>         state->commit = NULL;
>  }
> @@ -3847,6 +3848,71 @@ int drm_atomic_helper_legacy_gamma_set(struct drm_crtc *crtc,
>  }
>  EXPORT_SYMBOL(drm_atomic_helper_legacy_gamma_set);
>
> +/**
> + * drm_atomic_helper_dirtyfb - helper for dirtyfb
> + *
> + * A helper to implement drm_framebuffer_funcs::dirty
> + */
> +int drm_atomic_helper_dirtyfb(struct drm_framebuffer *fb,
> +                             struct drm_file *file_priv, unsigned flags,
> +                             unsigned color, struct drm_clip_rect *clips,
> +                             unsigned num_clips)
> +{
> +       struct drm_modeset_acquire_ctx ctx;
> +       struct drm_atomic_state *state;
> +       struct drm_plane *plane;
> +       int ret = 0;
> +
> +       /*
> +        * When called from ioctl, we are interruptable, but not when
> +        * called internally (ie. defio worker)
> +        */
> +       drm_modeset_acquire_init(&ctx,
> +               file_priv ? DRM_MODESET_ACQUIRE_INTERRUPTIBLE : 0);
> +
> +       state = drm_atomic_state_alloc(fb->dev);
> +       if (!state) {
> +               ret = -ENOMEM;
> +               goto out;
> +       }
> +       state->acquire_ctx = &ctx;
> +
> +retry:
> +       drm_for_each_plane(plane, fb->dev) {
> +               struct drm_plane_state *plane_state;
> +
> +               if (plane->state->fb != fb)
> +                       continue;
> +
> +               plane_state = drm_atomic_get_plane_state(state, plane);
> +               if (IS_ERR(plane_state)) {
> +                       ret = PTR_ERR(plane_state);
> +                       goto out;
> +               }
> +
> +               plane_state->dirty = true;
> +       }
> +
> +       ret = drm_atomic_nonblocking_commit(state);
> +
> +out:
> +       if (ret == -EDEADLK) {
> +               drm_atomic_state_clear(state);
> +               ret = drm_modeset_backoff(&ctx);
> +               if (!ret)
> +                       goto retry;
> +       }
> +
> +       drm_atomic_state_put(state);
> +
> +       drm_modeset_drop_locks(&ctx);
> +       drm_modeset_acquire_fini(&ctx);
> +
> +       return ret;
> +
> +}
> +EXPORT_SYMBOL(drm_atomic_helper_dirtyfb);
> +
>  /**
>   * __drm_atomic_helper_private_duplicate_state - copy atomic private state
>   * @obj: CRTC object
> diff --git a/drivers/gpu/drm/msm/msm_atomic.c b/drivers/gpu/drm/msm/msm_atomic.c
> index bf5f8c39f34d..bb55a048e98b 100644
> --- a/drivers/gpu/drm/msm/msm_atomic.c
> +++ b/drivers/gpu/drm/msm/msm_atomic.c
> @@ -201,7 +201,10 @@ int msm_atomic_commit(struct drm_device *dev,
>          * Figure out what fence to wait for:
>          */
>         for_each_oldnew_plane_in_state(state, plane, old_plane_state, new_plane_state, i) {
> -               if ((new_plane_state->fb != old_plane_state->fb) && new_plane_state->fb) {
> +               bool sync_fb = new_plane_state->fb &&
> +                       ((new_plane_state->fb != old_plane_state->fb) ||
> +                        new_plane_state->dirty);
> +               if (sync_fb) {
>                         struct drm_gem_object *obj = msm_framebuffer_bo(new_plane_state->fb, 0);
>                         struct msm_gem_object *msm_obj = to_msm_bo(obj);
>                         struct dma_fence *fence = reservation_object_get_excl_rcu(msm_obj->resv);
> diff --git a/drivers/gpu/drm/msm/msm_fb.c b/drivers/gpu/drm/msm/msm_fb.c
> index 0e0c87252ab0..a5d882a34a33 100644
> --- a/drivers/gpu/drm/msm/msm_fb.c
> +++ b/drivers/gpu/drm/msm/msm_fb.c
> @@ -62,6 +62,7 @@ static void msm_framebuffer_destroy(struct drm_framebuffer *fb)
>  static const struct drm_framebuffer_funcs msm_framebuffer_funcs = {
>         .create_handle = msm_framebuffer_create_handle,
>         .destroy = msm_framebuffer_destroy,
> +       .dirty = drm_atomic_helper_dirtyfb,
>  };
>
>  #ifdef CONFIG_DEBUG_FS
> diff --git a/include/drm/drm_atomic_helper.h b/include/drm/drm_atomic_helper.h
> index 26aaba58d6ce..9b7a95c2643d 100644
> --- a/include/drm/drm_atomic_helper.h
> +++ b/include/drm/drm_atomic_helper.h
> @@ -183,6 +183,10 @@ int drm_atomic_helper_legacy_gamma_set(struct drm_crtc *crtc,
>                                        u16 *red, u16 *green, u16 *blue,
>                                        uint32_t size,
>                                        struct drm_modeset_acquire_ctx *ctx);
> +int drm_atomic_helper_dirtyfb(struct drm_framebuffer *fb,
> +                             struct drm_file *file_priv, unsigned flags,
> +                             unsigned color, struct drm_clip_rect *clips,
> +                             unsigned num_clips);
>  void __drm_atomic_helper_private_obj_duplicate_state(struct drm_private_obj *obj,
>                                                      struct drm_private_state *state);
>
> diff --git a/include/drm/drm_plane.h b/include/drm/drm_plane.h
> index f7bf4a48b1c3..296fa22bda7a 100644
> --- a/include/drm/drm_plane.h
> +++ b/include/drm/drm_plane.h
> @@ -76,6 +76,15 @@ struct drm_plane_state {
>          */
>         struct drm_framebuffer *fb;
>
> +       /**
> +        * @dirty:
> +        *
> +        * Flag that indicates the fb contents have changed even though the
> +        * fb has not.  This is mostly a stop-gap solution until we have
> +        * atomic dirty-rect(s) property.
> +        */
> +       bool dirty;
> +
>         /**
>          * @fence:
>          *
> --
> 2.14.3
>
Thomas Hellström (VMware) April 4, 2018, 8:22 a.m. UTC | #2
Hi,

On 04/04/2018 08:58 AM, Daniel Vetter wrote:
> On Wed, Apr 4, 2018 at 12:42 AM, Rob Clark <robdclark@gmail.com> wrote:
>> Add an atomic helper to implement dirtyfb support.  This is needed to
>> support DSI command-mode panels with x11 userspace (ie. when we can't
>> rely on pageflips to trigger a flush to the panel).
>>
>> To signal to the driver that the async atomic update needs to
>> synchronize with fences, even though the fb didn't change, the
>> drm_atomic_state::dirty flag is added.
>>
>> Signed-off-by: Rob Clark <robdclark@gmail.com>
>> ---
>> Background: there are a number of different folks working on getting
>> upstream kernel working on various different phones/tablets with qcom
>> SoC's.. many of them have command mode panels, so we kind of need a
>> way to support the legacy dirtyfb ioctl for x11 support.
>>
>> I know there is work on a proprer non-legacy atomic property for
>> userspace to communicate dirty-rect(s) to the kernel, so this can
>> be improved from triggering a full-frame flush once that is in
>> place.  But we kinda needa a stop-gap solution.
>>
>> I had considered an in-driver solution for this, but things get a
>> bit tricky if userspace ands up combining dirtyfb ioctls with page-
>> flips, because we need to synchronize setting various CTL.FLUSH bits
>> with setting the CTL.START bit.  (ie. really all we need to do for
>> cmd mode panels is bang CTL.START, but is this ends up racing with
>> pageflips setting FLUSH bits, then bad things.)  The easiest soln
>> is to wrap this up as an atomic commit and rely on the worker to
>> serialize things.  Hence adding an atomic dirtyfb helper.
>>
>> I guess at least the helper, with some small addition to translate
>> and pass-thru the dirty rect(s) is useful to the final atomic dirty-
>> rect property solution.  Depending on how far off that is, a stop-
>> gap solution could be useful.
> Adding Noralf, who iirc already posted the full dirty helpers already somewhere.
> -Daniel

I've asked Deepak to RFC the core changes suggested for the full dirty 
blob on dri-devel. It builds on DisplayLink's suggestion, with a simple 
helper to get to the desired coordinates.

One thing to perhaps discuss is how we would like to fit this with 
front-buffer rendering and the dirty ioctl. In the page-flip context, 
the dirty rects, like egl's swapbuffer_with_damage is a hint to restrict 
the damage region that can be fully ignored by the driver, new content 
is indicated by a new framebuffer.

We could do the same for frontbuffer rendering: Either set a dirty flag 
like you do here, or provide a content_age state member. Since we clear 
the dirty flag on state copies, I guess that would be sufficient. The 
blob rectangles would then become a hint to restrict the damage region.

Another approach would be to have the presence of dirty rects without 
framebuffer change to indicate frontbuffer rendering.

I think I like the first approach best, although it may be tempting for 
user-space apps to just set the dirty bit instead of providing the full 
damage region.

/Thomas

--
To unsubscribe from this list: send the line "unsubscribe linux-arm-msm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Daniel Vetter April 4, 2018, 8:43 a.m. UTC | #3
On Wed, Apr 04, 2018 at 10:22:21AM +0200, Thomas Hellstrom wrote:
> Hi,
> 
> On 04/04/2018 08:58 AM, Daniel Vetter wrote:
> > On Wed, Apr 4, 2018 at 12:42 AM, Rob Clark <robdclark@gmail.com> wrote:
> > > Add an atomic helper to implement dirtyfb support.  This is needed to
> > > support DSI command-mode panels with x11 userspace (ie. when we can't
> > > rely on pageflips to trigger a flush to the panel).
> > > 
> > > To signal to the driver that the async atomic update needs to
> > > synchronize with fences, even though the fb didn't change, the
> > > drm_atomic_state::dirty flag is added.
> > > 
> > > Signed-off-by: Rob Clark <robdclark@gmail.com>
> > > ---
> > > Background: there are a number of different folks working on getting
> > > upstream kernel working on various different phones/tablets with qcom
> > > SoC's.. many of them have command mode panels, so we kind of need a
> > > way to support the legacy dirtyfb ioctl for x11 support.
> > > 
> > > I know there is work on a proprer non-legacy atomic property for
> > > userspace to communicate dirty-rect(s) to the kernel, so this can
> > > be improved from triggering a full-frame flush once that is in
> > > place.  But we kinda needa a stop-gap solution.
> > > 
> > > I had considered an in-driver solution for this, but things get a
> > > bit tricky if userspace ands up combining dirtyfb ioctls with page-
> > > flips, because we need to synchronize setting various CTL.FLUSH bits
> > > with setting the CTL.START bit.  (ie. really all we need to do for
> > > cmd mode panels is bang CTL.START, but is this ends up racing with
> > > pageflips setting FLUSH bits, then bad things.)  The easiest soln
> > > is to wrap this up as an atomic commit and rely on the worker to
> > > serialize things.  Hence adding an atomic dirtyfb helper.
> > > 
> > > I guess at least the helper, with some small addition to translate
> > > and pass-thru the dirty rect(s) is useful to the final atomic dirty-
> > > rect property solution.  Depending on how far off that is, a stop-
> > > gap solution could be useful.
> > Adding Noralf, who iirc already posted the full dirty helpers already somewhere.
> > -Daniel
> 
> I've asked Deepak to RFC the core changes suggested for the full dirty blob
> on dri-devel. It builds on DisplayLink's suggestion, with a simple helper to
> get to the desired coordinates.
> 
> One thing to perhaps discuss is how we would like to fit this with
> front-buffer rendering and the dirty ioctl. In the page-flip context, the
> dirty rects, like egl's swapbuffer_with_damage is a hint to restrict the
> damage region that can be fully ignored by the driver, new content is
> indicated by a new framebuffer.
> 
> We could do the same for frontbuffer rendering: Either set a dirty flag like
> you do here, or provide a content_age state member. Since we clear the dirty
> flag on state copies, I guess that would be sufficient. The blob rectangles
> would then become a hint to restrict the damage region.

I'm not entirely following here - I thought for frontbuffer rendering the
dirty rects have always just been a hint, and that the driver was always
free to re-upload the entire buffer to the screen.

And through a helper like Rob's proposing here (and have floated around in
different versions already) we'd essentially map a frontbuffer dirtyfb
call to a fake flip with dirty rect. Manual upload drivers already need to
upload the entire screen if they get a flip, since some userspace uses
that to flush out frontbuffer rendering (instead of calling dirtyfb).

So from that pov the new dirty flag is kinda not necessary imo.

> Another approach would be to have the presence of dirty rects without
> framebuffer change to indicate frontbuffer rendering.
> 
> I think I like the first approach best, although it may be tempting for
> user-space apps to just set the dirty bit instead of providing the full
> damage region.

Or I'm not following you here, because I don't quite see the difference
between dirtyfb and a flip.
-Daniel
Thomas Hellström (VMware) April 4, 2018, 9:10 a.m. UTC | #4
On 04/04/2018 10:43 AM, Daniel Vetter wrote:
> On Wed, Apr 04, 2018 at 10:22:21AM +0200, Thomas Hellstrom wrote:
>> Hi,
>>
>> On 04/04/2018 08:58 AM, Daniel Vetter wrote:
>>> On Wed, Apr 4, 2018 at 12:42 AM, Rob Clark <robdclark@gmail.com> wrote:
>>>> Add an atomic helper to implement dirtyfb support.  This is needed to
>>>> support DSI command-mode panels with x11 userspace (ie. when we can't
>>>> rely on pageflips to trigger a flush to the panel).
>>>>
>>>> To signal to the driver that the async atomic update needs to
>>>> synchronize with fences, even though the fb didn't change, the
>>>> drm_atomic_state::dirty flag is added.
>>>>
>>>> Signed-off-by: Rob Clark <robdclark@gmail.com>
>>>> ---
>>>> Background: there are a number of different folks working on getting
>>>> upstream kernel working on various different phones/tablets with qcom
>>>> SoC's.. many of them have command mode panels, so we kind of need a
>>>> way to support the legacy dirtyfb ioctl for x11 support.
>>>>
>>>> I know there is work on a proprer non-legacy atomic property for
>>>> userspace to communicate dirty-rect(s) to the kernel, so this can
>>>> be improved from triggering a full-frame flush once that is in
>>>> place.  But we kinda needa a stop-gap solution.
>>>>
>>>> I had considered an in-driver solution for this, but things get a
>>>> bit tricky if userspace ands up combining dirtyfb ioctls with page-
>>>> flips, because we need to synchronize setting various CTL.FLUSH bits
>>>> with setting the CTL.START bit.  (ie. really all we need to do for
>>>> cmd mode panels is bang CTL.START, but is this ends up racing with
>>>> pageflips setting FLUSH bits, then bad things.)  The easiest soln
>>>> is to wrap this up as an atomic commit and rely on the worker to
>>>> serialize things.  Hence adding an atomic dirtyfb helper.
>>>>
>>>> I guess at least the helper, with some small addition to translate
>>>> and pass-thru the dirty rect(s) is useful to the final atomic dirty-
>>>> rect property solution.  Depending on how far off that is, a stop-
>>>> gap solution could be useful.
>>> Adding Noralf, who iirc already posted the full dirty helpers already somewhere.
>>> -Daniel
>> I've asked Deepak to RFC the core changes suggested for the full dirty blob
>> on dri-devel. It builds on DisplayLink's suggestion, with a simple helper to
>> get to the desired coordinates.
>>
>> One thing to perhaps discuss is how we would like to fit this with
>> front-buffer rendering and the dirty ioctl. In the page-flip context, the
>> dirty rects, like egl's swapbuffer_with_damage is a hint to restrict the
>> damage region that can be fully ignored by the driver, new content is
>> indicated by a new framebuffer.
>>
>> We could do the same for frontbuffer rendering: Either set a dirty flag like
>> you do here, or provide a content_age state member. Since we clear the dirty
>> flag on state copies, I guess that would be sufficient. The blob rectangles
>> would then become a hint to restrict the damage region.
> I'm not entirely following here - I thought for frontbuffer rendering the
> dirty rects have always just been a hint, and that the driver was always
> free to re-upload the entire buffer to the screen.
>
> And through a helper like Rob's proposing here (and have floated around in
> different versions already) we'd essentially map a frontbuffer dirtyfb
> call to a fake flip with dirty rect. Manual upload drivers already need to
> upload the entire screen if they get a flip, since some userspace uses
> that to flush out frontbuffer rendering (instead of calling dirtyfb).
>
> So from that pov the new dirty flag is kinda not necessary imo.
>
>> Another approach would be to have the presence of dirty rects without
>> framebuffer change to indicate frontbuffer rendering.
>>
>> I think I like the first approach best, although it may be tempting for
>> user-space apps to just set the dirty bit instead of providing the full
>> damage region.
> Or I'm not following you here, because I don't quite see the difference
> between dirtyfb and a flip.
> -Daniel

OK, let me rephrase:

 From the driver's point-of-view, in the atomic world, new content and 
the need for manual upload is indicated by a change in fb attached to 
the plane.

With Rob's patch here, (correct me if I'm wrong) in addition new content 
and the need for manual upload is identified by the dirty flag, (since 
the fb stays the same and the driver thus never identifies a page-flip).

In both these situations, clip rects can provide a hint to restrict the 
dirty region.

Now I was thinking about the preferred way for user-space to communicate 
front buffer rendering through the atomic ioctl:

1) Expose a dirty (or content_age property)
2) Attach a clip-rect blob property.
3) Fake a page-flip by ping-ponging two frame-buffers pointing to the 
same underlying buffer object.

Are you saying that people are already using 3) and we should keep using 
that?

/Thomas




--
To unsubscribe from this list: send the line "unsubscribe linux-arm-msm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Daniel Vetter April 4, 2018, 9:56 a.m. UTC | #5
On Wed, Apr 04, 2018 at 11:10:08AM +0200, Thomas Hellstrom wrote:
> On 04/04/2018 10:43 AM, Daniel Vetter wrote:
> > On Wed, Apr 04, 2018 at 10:22:21AM +0200, Thomas Hellstrom wrote:
> > > Hi,
> > > 
> > > On 04/04/2018 08:58 AM, Daniel Vetter wrote:
> > > > On Wed, Apr 4, 2018 at 12:42 AM, Rob Clark <robdclark@gmail.com> wrote:
> > > > > Add an atomic helper to implement dirtyfb support.  This is needed to
> > > > > support DSI command-mode panels with x11 userspace (ie. when we can't
> > > > > rely on pageflips to trigger a flush to the panel).
> > > > > 
> > > > > To signal to the driver that the async atomic update needs to
> > > > > synchronize with fences, even though the fb didn't change, the
> > > > > drm_atomic_state::dirty flag is added.
> > > > > 
> > > > > Signed-off-by: Rob Clark <robdclark@gmail.com>
> > > > > ---
> > > > > Background: there are a number of different folks working on getting
> > > > > upstream kernel working on various different phones/tablets with qcom
> > > > > SoC's.. many of them have command mode panels, so we kind of need a
> > > > > way to support the legacy dirtyfb ioctl for x11 support.
> > > > > 
> > > > > I know there is work on a proprer non-legacy atomic property for
> > > > > userspace to communicate dirty-rect(s) to the kernel, so this can
> > > > > be improved from triggering a full-frame flush once that is in
> > > > > place.  But we kinda needa a stop-gap solution.
> > > > > 
> > > > > I had considered an in-driver solution for this, but things get a
> > > > > bit tricky if userspace ands up combining dirtyfb ioctls with page-
> > > > > flips, because we need to synchronize setting various CTL.FLUSH bits
> > > > > with setting the CTL.START bit.  (ie. really all we need to do for
> > > > > cmd mode panels is bang CTL.START, but is this ends up racing with
> > > > > pageflips setting FLUSH bits, then bad things.)  The easiest soln
> > > > > is to wrap this up as an atomic commit and rely on the worker to
> > > > > serialize things.  Hence adding an atomic dirtyfb helper.
> > > > > 
> > > > > I guess at least the helper, with some small addition to translate
> > > > > and pass-thru the dirty rect(s) is useful to the final atomic dirty-
> > > > > rect property solution.  Depending on how far off that is, a stop-
> > > > > gap solution could be useful.
> > > > Adding Noralf, who iirc already posted the full dirty helpers already somewhere.
> > > > -Daniel
> > > I've asked Deepak to RFC the core changes suggested for the full dirty blob
> > > on dri-devel. It builds on DisplayLink's suggestion, with a simple helper to
> > > get to the desired coordinates.
> > > 
> > > One thing to perhaps discuss is how we would like to fit this with
> > > front-buffer rendering and the dirty ioctl. In the page-flip context, the
> > > dirty rects, like egl's swapbuffer_with_damage is a hint to restrict the
> > > damage region that can be fully ignored by the driver, new content is
> > > indicated by a new framebuffer.
> > > 
> > > We could do the same for frontbuffer rendering: Either set a dirty flag like
> > > you do here, or provide a content_age state member. Since we clear the dirty
> > > flag on state copies, I guess that would be sufficient. The blob rectangles
> > > would then become a hint to restrict the damage region.
> > I'm not entirely following here - I thought for frontbuffer rendering the
> > dirty rects have always just been a hint, and that the driver was always
> > free to re-upload the entire buffer to the screen.
> > 
> > And through a helper like Rob's proposing here (and have floated around in
> > different versions already) we'd essentially map a frontbuffer dirtyfb
> > call to a fake flip with dirty rect. Manual upload drivers already need to
> > upload the entire screen if they get a flip, since some userspace uses
> > that to flush out frontbuffer rendering (instead of calling dirtyfb).
> > 
> > So from that pov the new dirty flag is kinda not necessary imo.
> > 
> > > Another approach would be to have the presence of dirty rects without
> > > framebuffer change to indicate frontbuffer rendering.
> > > 
> > > I think I like the first approach best, although it may be tempting for
> > > user-space apps to just set the dirty bit instead of providing the full
> > > damage region.
> > Or I'm not following you here, because I don't quite see the difference
> > between dirtyfb and a flip.
> > -Daniel
> 
> OK, let me rephrase:
> 
> From the driver's point-of-view, in the atomic world, new content and the
> need for manual upload is indicated by a change in fb attached to the plane.
> 
> With Rob's patch here, (correct me if I'm wrong) in addition new content and
> the need for manual upload is identified by the dirty flag, (since the fb
> stays the same and the driver thus never identifies a page-flip).

Hm, I'm not entirely sure Rob's approach is correct. Imo a pageflip
(atomic or not) should result in the entire buffer getting uploaded. The
dirty flag is kinda redundant, a flip with the same buffer works the same
way as a dirtyfb with the entire buffer as the dirty rectangle.

> In both these situations, clip rects can provide a hint to restrict the
> dirty region.
> 
> Now I was thinking about the preferred way for user-space to communicate
> front buffer rendering through the atomic ioctl:
> 
> 1) Expose a dirty (or content_age property)
> 2) Attach a clip-rect blob property.
> 3) Fake a page-flip by ping-ponging two frame-buffers pointing to the same
> underlying buffer object.
> 
> Are you saying that people are already using 3) and we should keep using
> that?

I'm saying they're using 3b), flip the same buffer wrapped in the same
drm_framebuffer, and expect it to work.

The only advantage dirtyfb has is that it allows you to supply the
optimized upload rectangles, but at the cost of a funny api (it's working
on the fb, not the plane/crtc you want to upload) and lack of drm_event to
confirm when exactly you uploaded your stuff. But imo they should be the
same underlying operation.

Also note that atomic helpers don't optimize out plane flips for same
buffers. We only optimize out some of the waiting, in a failed attempt at
making cursors stall less, but that's not fixed with the async plane
update stuff. And we can obviously optimize out the prepare/cleanup hooks,
because the buffer should be pinned already.

So imo for a quick fix I think we need:
1) Fix drivers (or at least msm) to upload buffers for manual upload on
all flips (well, all screen updates). Probably should do the fence waiting
unconditionally too (and handle cursors with the new async stuff).
2) Tiny helper that remaps a dirtyfb call into a nonblocking atomic
commit, which currently means we're throwing the dirty rect optimization
into the wind. But that could easily be added to the drm_plane_state,
without exposing it to userspace as a property just yet.

Cheers, Daniel
Daniel Vetter April 4, 2018, 10:03 a.m. UTC | #6
On Tue, Apr 03, 2018 at 06:42:23PM -0400, Rob Clark wrote:
> Add an atomic helper to implement dirtyfb support.  This is needed to
> support DSI command-mode panels with x11 userspace (ie. when we can't
> rely on pageflips to trigger a flush to the panel).
> 
> To signal to the driver that the async atomic update needs to
> synchronize with fences, even though the fb didn't change, the
> drm_atomic_state::dirty flag is added.
> 
> Signed-off-by: Rob Clark <robdclark@gmail.com>
> ---
> Background: there are a number of different folks working on getting
> upstream kernel working on various different phones/tablets with qcom
> SoC's.. many of them have command mode panels, so we kind of need a
> way to support the legacy dirtyfb ioctl for x11 support.
> 
> I know there is work on a proprer non-legacy atomic property for
> userspace to communicate dirty-rect(s) to the kernel, so this can
> be improved from triggering a full-frame flush once that is in
> place.  But we kinda needa a stop-gap solution.
> 
> I had considered an in-driver solution for this, but things get a
> bit tricky if userspace ands up combining dirtyfb ioctls with page-
> flips, because we need to synchronize setting various CTL.FLUSH bits
> with setting the CTL.START bit.  (ie. really all we need to do for
> cmd mode panels is bang CTL.START, but is this ends up racing with
> pageflips setting FLUSH bits, then bad things.)  The easiest soln
> is to wrap this up as an atomic commit and rely on the worker to
> serialize things.  Hence adding an atomic dirtyfb helper.
> 
> I guess at least the helper, with some small addition to translate
> and pass-thru the dirty rect(s) is useful to the final atomic dirty-
> rect property solution.  Depending on how far off that is, a stop-
> gap solution could be useful.
> 
>  drivers/gpu/drm/drm_atomic_helper.c | 66 +++++++++++++++++++++++++++++++++++++
>  drivers/gpu/drm/msm/msm_atomic.c    |  5 ++-
>  drivers/gpu/drm/msm/msm_fb.c        |  1 +
>  include/drm/drm_atomic_helper.h     |  4 +++
>  include/drm/drm_plane.h             |  9 +++++
>  5 files changed, 84 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/drm_atomic_helper.c b/drivers/gpu/drm/drm_atomic_helper.c
> index c35654591c12..a578dc681b27 100644
> --- a/drivers/gpu/drm/drm_atomic_helper.c
> +++ b/drivers/gpu/drm/drm_atomic_helper.c
> @@ -3504,6 +3504,7 @@ void __drm_atomic_helper_plane_duplicate_state(struct drm_plane *plane,
>  	if (state->fb)
>  		drm_framebuffer_get(state->fb);
>  
> +	state->dirty = false;
>  	state->fence = NULL;
>  	state->commit = NULL;
>  }
> @@ -3847,6 +3848,71 @@ int drm_atomic_helper_legacy_gamma_set(struct drm_crtc *crtc,
>  }
>  EXPORT_SYMBOL(drm_atomic_helper_legacy_gamma_set);
>  
> +/**
> + * drm_atomic_helper_dirtyfb - helper for dirtyfb
> + *
> + * A helper to implement drm_framebuffer_funcs::dirty
> + */
> +int drm_atomic_helper_dirtyfb(struct drm_framebuffer *fb,
> +			      struct drm_file *file_priv, unsigned flags,
> +			      unsigned color, struct drm_clip_rect *clips,
> +			      unsigned num_clips)
> +{
> +	struct drm_modeset_acquire_ctx ctx;
> +	struct drm_atomic_state *state;
> +	struct drm_plane *plane;
> +	int ret = 0;
> +
> +	/*
> +	 * When called from ioctl, we are interruptable, but not when
> +	 * called internally (ie. defio worker)
> +	 */
> +	drm_modeset_acquire_init(&ctx,
> +		file_priv ? DRM_MODESET_ACQUIRE_INTERRUPTIBLE : 0);
> +
> +	state = drm_atomic_state_alloc(fb->dev);
> +	if (!state) {
> +		ret = -ENOMEM;
> +		goto out;
> +	}
> +	state->acquire_ctx = &ctx;
> +
> +retry:
> +	drm_for_each_plane(plane, fb->dev) {
> +		struct drm_plane_state *plane_state;
> +
> +		if (plane->state->fb != fb)
> +			continue;
> +
> +		plane_state = drm_atomic_get_plane_state(state, plane);
> +		if (IS_ERR(plane_state)) {
> +			ret = PTR_ERR(plane_state);
> +			goto out;
> +		}
> +
> +		plane_state->dirty = true;
> +	}
> +
> +	ret = drm_atomic_nonblocking_commit(state);
> +
> +out:
> +	if (ret == -EDEADLK) {
> +		drm_atomic_state_clear(state);
> +		ret = drm_modeset_backoff(&ctx);
> +		if (!ret)
> +			goto retry;
> +	}
> +
> +	drm_atomic_state_put(state);
> +
> +	drm_modeset_drop_locks(&ctx);
> +	drm_modeset_acquire_fini(&ctx);
> +
> +	return ret;
> +
> +}
> +EXPORT_SYMBOL(drm_atomic_helper_dirtyfb);
> +
>  /**
>   * __drm_atomic_helper_private_duplicate_state - copy atomic private state
>   * @obj: CRTC object
> diff --git a/drivers/gpu/drm/msm/msm_atomic.c b/drivers/gpu/drm/msm/msm_atomic.c
> index bf5f8c39f34d..bb55a048e98b 100644
> --- a/drivers/gpu/drm/msm/msm_atomic.c
> +++ b/drivers/gpu/drm/msm/msm_atomic.c
> @@ -201,7 +201,10 @@ int msm_atomic_commit(struct drm_device *dev,
>  	 * Figure out what fence to wait for:
>  	 */
>  	for_each_oldnew_plane_in_state(state, plane, old_plane_state, new_plane_state, i) {
> -		if ((new_plane_state->fb != old_plane_state->fb) && new_plane_state->fb) {
> +		bool sync_fb = new_plane_state->fb &&
> +			((new_plane_state->fb != old_plane_state->fb) ||
> +			 new_plane_state->dirty);

Why do you have this optimization even here? Imo flipping to the same fb
should result in the fb getting fully uploaded, whether you're doing a
legacy page_flip, and atomic one or just a plane update.

Iirc some userspace does use that as essentially a full-plane frontbuffer
rendering flush already. IOW I don't think we need your
plane_state->dirty, it's implied to always be true - why would userspace
do a flip otherwise?

The helper itself to map dirtyfb to a nonblocking atomic commit looks
reasonable, but misses a bunch of the trickery discussed with Noralf and
others I think.
-Daniel

> +		if (sync_fb) {
>  			struct drm_gem_object *obj = msm_framebuffer_bo(new_plane_state->fb, 0);
>  			struct msm_gem_object *msm_obj = to_msm_bo(obj);
>  			struct dma_fence *fence = reservation_object_get_excl_rcu(msm_obj->resv);
> diff --git a/drivers/gpu/drm/msm/msm_fb.c b/drivers/gpu/drm/msm/msm_fb.c
> index 0e0c87252ab0..a5d882a34a33 100644
> --- a/drivers/gpu/drm/msm/msm_fb.c
> +++ b/drivers/gpu/drm/msm/msm_fb.c
> @@ -62,6 +62,7 @@ static void msm_framebuffer_destroy(struct drm_framebuffer *fb)
>  static const struct drm_framebuffer_funcs msm_framebuffer_funcs = {
>  	.create_handle = msm_framebuffer_create_handle,
>  	.destroy = msm_framebuffer_destroy,
> +	.dirty = drm_atomic_helper_dirtyfb,
>  };
>  
>  #ifdef CONFIG_DEBUG_FS
> diff --git a/include/drm/drm_atomic_helper.h b/include/drm/drm_atomic_helper.h
> index 26aaba58d6ce..9b7a95c2643d 100644
> --- a/include/drm/drm_atomic_helper.h
> +++ b/include/drm/drm_atomic_helper.h
> @@ -183,6 +183,10 @@ int drm_atomic_helper_legacy_gamma_set(struct drm_crtc *crtc,
>  				       u16 *red, u16 *green, u16 *blue,
>  				       uint32_t size,
>  				       struct drm_modeset_acquire_ctx *ctx);
> +int drm_atomic_helper_dirtyfb(struct drm_framebuffer *fb,
> +			      struct drm_file *file_priv, unsigned flags,
> +			      unsigned color, struct drm_clip_rect *clips,
> +			      unsigned num_clips);
>  void __drm_atomic_helper_private_obj_duplicate_state(struct drm_private_obj *obj,
>  						     struct drm_private_state *state);
>  
> diff --git a/include/drm/drm_plane.h b/include/drm/drm_plane.h
> index f7bf4a48b1c3..296fa22bda7a 100644
> --- a/include/drm/drm_plane.h
> +++ b/include/drm/drm_plane.h
> @@ -76,6 +76,15 @@ struct drm_plane_state {
>  	 */
>  	struct drm_framebuffer *fb;
>  
> +	/**
> +	 * @dirty:
> +	 *
> +	 * Flag that indicates the fb contents have changed even though the
> +	 * fb has not.  This is mostly a stop-gap solution until we have
> +	 * atomic dirty-rect(s) property.
> +	 */
> +	bool dirty;
> +
>  	/**
>  	 * @fence:
>  	 *
> -- 
> 2.14.3
>
Daniel Vetter April 4, 2018, 10:21 a.m. UTC | #7
On Wed, Apr 04, 2018 at 12:03:00PM +0200, Daniel Vetter wrote:
> On Tue, Apr 03, 2018 at 06:42:23PM -0400, Rob Clark wrote:
> > Add an atomic helper to implement dirtyfb support.  This is needed to
> > support DSI command-mode panels with x11 userspace (ie. when we can't
> > rely on pageflips to trigger a flush to the panel).
> > 
> > To signal to the driver that the async atomic update needs to
> > synchronize with fences, even though the fb didn't change, the
> > drm_atomic_state::dirty flag is added.
> > 
> > Signed-off-by: Rob Clark <robdclark@gmail.com>
> > ---
> > Background: there are a number of different folks working on getting
> > upstream kernel working on various different phones/tablets with qcom
> > SoC's.. many of them have command mode panels, so we kind of need a
> > way to support the legacy dirtyfb ioctl for x11 support.
> > 
> > I know there is work on a proprer non-legacy atomic property for
> > userspace to communicate dirty-rect(s) to the kernel, so this can
> > be improved from triggering a full-frame flush once that is in
> > place.  But we kinda needa a stop-gap solution.
> > 
> > I had considered an in-driver solution for this, but things get a
> > bit tricky if userspace ands up combining dirtyfb ioctls with page-
> > flips, because we need to synchronize setting various CTL.FLUSH bits
> > with setting the CTL.START bit.  (ie. really all we need to do for
> > cmd mode panels is bang CTL.START, but is this ends up racing with
> > pageflips setting FLUSH bits, then bad things.)  The easiest soln
> > is to wrap this up as an atomic commit and rely on the worker to
> > serialize things.  Hence adding an atomic dirtyfb helper.
> > 
> > I guess at least the helper, with some small addition to translate
> > and pass-thru the dirty rect(s) is useful to the final atomic dirty-
> > rect property solution.  Depending on how far off that is, a stop-
> > gap solution could be useful.
> > 
> >  drivers/gpu/drm/drm_atomic_helper.c | 66 +++++++++++++++++++++++++++++++++++++
> >  drivers/gpu/drm/msm/msm_atomic.c    |  5 ++-
> >  drivers/gpu/drm/msm/msm_fb.c        |  1 +
> >  include/drm/drm_atomic_helper.h     |  4 +++
> >  include/drm/drm_plane.h             |  9 +++++
> >  5 files changed, 84 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/gpu/drm/drm_atomic_helper.c b/drivers/gpu/drm/drm_atomic_helper.c
> > index c35654591c12..a578dc681b27 100644
> > --- a/drivers/gpu/drm/drm_atomic_helper.c
> > +++ b/drivers/gpu/drm/drm_atomic_helper.c
> > @@ -3504,6 +3504,7 @@ void __drm_atomic_helper_plane_duplicate_state(struct drm_plane *plane,
> >  	if (state->fb)
> >  		drm_framebuffer_get(state->fb);
> >  
> > +	state->dirty = false;
> >  	state->fence = NULL;
> >  	state->commit = NULL;
> >  }
> > @@ -3847,6 +3848,71 @@ int drm_atomic_helper_legacy_gamma_set(struct drm_crtc *crtc,
> >  }
> >  EXPORT_SYMBOL(drm_atomic_helper_legacy_gamma_set);
> >  
> > +/**
> > + * drm_atomic_helper_dirtyfb - helper for dirtyfb
> > + *
> > + * A helper to implement drm_framebuffer_funcs::dirty
> > + */
> > +int drm_atomic_helper_dirtyfb(struct drm_framebuffer *fb,
> > +			      struct drm_file *file_priv, unsigned flags,
> > +			      unsigned color, struct drm_clip_rect *clips,
> > +			      unsigned num_clips)
> > +{
> > +	struct drm_modeset_acquire_ctx ctx;
> > +	struct drm_atomic_state *state;
> > +	struct drm_plane *plane;
> > +	int ret = 0;
> > +
> > +	/*
> > +	 * When called from ioctl, we are interruptable, but not when
> > +	 * called internally (ie. defio worker)
> > +	 */
> > +	drm_modeset_acquire_init(&ctx,
> > +		file_priv ? DRM_MODESET_ACQUIRE_INTERRUPTIBLE : 0);
> > +
> > +	state = drm_atomic_state_alloc(fb->dev);
> > +	if (!state) {
> > +		ret = -ENOMEM;
> > +		goto out;
> > +	}
> > +	state->acquire_ctx = &ctx;
> > +
> > +retry:
> > +	drm_for_each_plane(plane, fb->dev) {
> > +		struct drm_plane_state *plane_state;
> > +
> > +		if (plane->state->fb != fb)
> > +			continue;
> > +
> > +		plane_state = drm_atomic_get_plane_state(state, plane);
> > +		if (IS_ERR(plane_state)) {
> > +			ret = PTR_ERR(plane_state);
> > +			goto out;
> > +		}
> > +
> > +		plane_state->dirty = true;
> > +	}
> > +
> > +	ret = drm_atomic_nonblocking_commit(state);
> > +
> > +out:
> > +	if (ret == -EDEADLK) {
> > +		drm_atomic_state_clear(state);
> > +		ret = drm_modeset_backoff(&ctx);
> > +		if (!ret)
> > +			goto retry;
> > +	}
> > +
> > +	drm_atomic_state_put(state);
> > +
> > +	drm_modeset_drop_locks(&ctx);
> > +	drm_modeset_acquire_fini(&ctx);
> > +
> > +	return ret;
> > +
> > +}
> > +EXPORT_SYMBOL(drm_atomic_helper_dirtyfb);
> > +
> >  /**
> >   * __drm_atomic_helper_private_duplicate_state - copy atomic private state
> >   * @obj: CRTC object
> > diff --git a/drivers/gpu/drm/msm/msm_atomic.c b/drivers/gpu/drm/msm/msm_atomic.c
> > index bf5f8c39f34d..bb55a048e98b 100644
> > --- a/drivers/gpu/drm/msm/msm_atomic.c
> > +++ b/drivers/gpu/drm/msm/msm_atomic.c
> > @@ -201,7 +201,10 @@ int msm_atomic_commit(struct drm_device *dev,
> >  	 * Figure out what fence to wait for:
> >  	 */
> >  	for_each_oldnew_plane_in_state(state, plane, old_plane_state, new_plane_state, i) {
> > -		if ((new_plane_state->fb != old_plane_state->fb) && new_plane_state->fb) {
> > +		bool sync_fb = new_plane_state->fb &&
> > +			((new_plane_state->fb != old_plane_state->fb) ||
> > +			 new_plane_state->dirty);
> 
> Why do you have this optimization even here? Imo flipping to the same fb
> should result in the fb getting fully uploaded, whether you're doing a
> legacy page_flip, and atomic one or just a plane update.
> 
> Iirc some userspace does use that as essentially a full-plane frontbuffer
> rendering flush already. IOW I don't think we need your
> plane_state->dirty, it's implied to always be true - why would userspace
> do a flip otherwise?
> 
> The helper itself to map dirtyfb to a nonblocking atomic commit looks
> reasonable, but misses a bunch of the trickery discussed with Noralf and
> others I think.

Ok, I've done some history digging:

- i915 and nouveau unconditionally wait for fences, even for same-fb
  flips.
- no idea what amdgpu and vmwgfx are doing, they're not using
  plane_state->fence for implicit fences.
- most arm-soc drivers do have this "optimization" in their code, and it
  even managed to get into the new drm_gem_fb_prepare_fb helper (which I
  reviewed, or well claimed to have ... oops). Afaict it goes back to the
  original msm atomic code, and was then dutifully copypasted all over the
  place.

If folks are ok I'll do a patch series to align drivers with i915/nouveau.
Well, any driver using reservation_object_get_excl_rcu +
drm_atomic_set_fence_for_plane combo, since amdgpu and vmwgfx don't I have
no idea what they're doing or whether they might have the same bug.

From looking at at least the various prepare_fb callbacks I don't see any
other drivers doing funny stuff around implicit fences.
-Daniel

> > +		if (sync_fb) {
> >  			struct drm_gem_object *obj = msm_framebuffer_bo(new_plane_state->fb, 0);
> >  			struct msm_gem_object *msm_obj = to_msm_bo(obj);
> >  			struct dma_fence *fence = reservation_object_get_excl_rcu(msm_obj->resv);
> > diff --git a/drivers/gpu/drm/msm/msm_fb.c b/drivers/gpu/drm/msm/msm_fb.c
> > index 0e0c87252ab0..a5d882a34a33 100644
> > --- a/drivers/gpu/drm/msm/msm_fb.c
> > +++ b/drivers/gpu/drm/msm/msm_fb.c
> > @@ -62,6 +62,7 @@ static void msm_framebuffer_destroy(struct drm_framebuffer *fb)
> >  static const struct drm_framebuffer_funcs msm_framebuffer_funcs = {
> >  	.create_handle = msm_framebuffer_create_handle,
> >  	.destroy = msm_framebuffer_destroy,
> > +	.dirty = drm_atomic_helper_dirtyfb,
> >  };
> >  
> >  #ifdef CONFIG_DEBUG_FS
> > diff --git a/include/drm/drm_atomic_helper.h b/include/drm/drm_atomic_helper.h
> > index 26aaba58d6ce..9b7a95c2643d 100644
> > --- a/include/drm/drm_atomic_helper.h
> > +++ b/include/drm/drm_atomic_helper.h
> > @@ -183,6 +183,10 @@ int drm_atomic_helper_legacy_gamma_set(struct drm_crtc *crtc,
> >  				       u16 *red, u16 *green, u16 *blue,
> >  				       uint32_t size,
> >  				       struct drm_modeset_acquire_ctx *ctx);
> > +int drm_atomic_helper_dirtyfb(struct drm_framebuffer *fb,
> > +			      struct drm_file *file_priv, unsigned flags,
> > +			      unsigned color, struct drm_clip_rect *clips,
> > +			      unsigned num_clips);
> >  void __drm_atomic_helper_private_obj_duplicate_state(struct drm_private_obj *obj,
> >  						     struct drm_private_state *state);
> >  
> > diff --git a/include/drm/drm_plane.h b/include/drm/drm_plane.h
> > index f7bf4a48b1c3..296fa22bda7a 100644
> > --- a/include/drm/drm_plane.h
> > +++ b/include/drm/drm_plane.h
> > @@ -76,6 +76,15 @@ struct drm_plane_state {
> >  	 */
> >  	struct drm_framebuffer *fb;
> >  
> > +	/**
> > +	 * @dirty:
> > +	 *
> > +	 * Flag that indicates the fb contents have changed even though the
> > +	 * fb has not.  This is mostly a stop-gap solution until we have
> > +	 * atomic dirty-rect(s) property.
> > +	 */
> > +	bool dirty;
> > +
> >  	/**
> >  	 * @fence:
> >  	 *
> > -- 
> > 2.14.3
> > 
> 
> -- 
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch
Thomas Hellström (VMware) April 4, 2018, 10:28 a.m. UTC | #8
On 04/04/2018 11:56 AM, Daniel Vetter wrote:
> On Wed, Apr 04, 2018 at 11:10:08AM +0200, Thomas Hellstrom wrote:
>> On 04/04/2018 10:43 AM, Daniel Vetter wrote:
>>> On Wed, Apr 04, 2018 at 10:22:21AM +0200, Thomas Hellstrom wrote:
>>>> Hi,
>>>>
>>>> On 04/04/2018 08:58 AM, Daniel Vetter wrote:
>>>>> On Wed, Apr 4, 2018 at 12:42 AM, Rob Clark <robdclark@gmail.com> wrote:
>>>>>> Add an atomic helper to implement dirtyfb support.  This is needed to
>>>>>> support DSI command-mode panels with x11 userspace (ie. when we can't
>>>>>> rely on pageflips to trigger a flush to the panel).
>>>>>>
>>>>>> To signal to the driver that the async atomic update needs to
>>>>>> synchronize with fences, even though the fb didn't change, the
>>>>>> drm_atomic_state::dirty flag is added.
>>>>>>
>>>>>> Signed-off-by: Rob Clark <robdclark@gmail.com>
>>>>>> ---
>>>>>> Background: there are a number of different folks working on getting
>>>>>> upstream kernel working on various different phones/tablets with qcom
>>>>>> SoC's.. many of them have command mode panels, so we kind of need a
>>>>>> way to support the legacy dirtyfb ioctl for x11 support.
>>>>>>
>>>>>> I know there is work on a proprer non-legacy atomic property for
>>>>>> userspace to communicate dirty-rect(s) to the kernel, so this can
>>>>>> be improved from triggering a full-frame flush once that is in
>>>>>> place.  But we kinda needa a stop-gap solution.
>>>>>>
>>>>>> I had considered an in-driver solution for this, but things get a
>>>>>> bit tricky if userspace ands up combining dirtyfb ioctls with page-
>>>>>> flips, because we need to synchronize setting various CTL.FLUSH bits
>>>>>> with setting the CTL.START bit.  (ie. really all we need to do for
>>>>>> cmd mode panels is bang CTL.START, but is this ends up racing with
>>>>>> pageflips setting FLUSH bits, then bad things.)  The easiest soln
>>>>>> is to wrap this up as an atomic commit and rely on the worker to
>>>>>> serialize things.  Hence adding an atomic dirtyfb helper.
>>>>>>
>>>>>> I guess at least the helper, with some small addition to translate
>>>>>> and pass-thru the dirty rect(s) is useful to the final atomic dirty-
>>>>>> rect property solution.  Depending on how far off that is, a stop-
>>>>>> gap solution could be useful.
>>>>> Adding Noralf, who iirc already posted the full dirty helpers already somewhere.
>>>>> -Daniel
>>>> I've asked Deepak to RFC the core changes suggested for the full dirty blob
>>>> on dri-devel. It builds on DisplayLink's suggestion, with a simple helper to
>>>> get to the desired coordinates.
>>>>
>>>> One thing to perhaps discuss is how we would like to fit this with
>>>> front-buffer rendering and the dirty ioctl. In the page-flip context, the
>>>> dirty rects, like egl's swapbuffer_with_damage is a hint to restrict the
>>>> damage region that can be fully ignored by the driver, new content is
>>>> indicated by a new framebuffer.
>>>>
>>>> We could do the same for frontbuffer rendering: Either set a dirty flag like
>>>> you do here, or provide a content_age state member. Since we clear the dirty
>>>> flag on state copies, I guess that would be sufficient. The blob rectangles
>>>> would then become a hint to restrict the damage region.
>>> I'm not entirely following here - I thought for frontbuffer rendering the
>>> dirty rects have always just been a hint, and that the driver was always
>>> free to re-upload the entire buffer to the screen.
>>>
>>> And through a helper like Rob's proposing here (and have floated around in
>>> different versions already) we'd essentially map a frontbuffer dirtyfb
>>> call to a fake flip with dirty rect. Manual upload drivers already need to
>>> upload the entire screen if they get a flip, since some userspace uses
>>> that to flush out frontbuffer rendering (instead of calling dirtyfb).
>>>
>>> So from that pov the new dirty flag is kinda not necessary imo.
>>>
>>>> Another approach would be to have the presence of dirty rects without
>>>> framebuffer change to indicate frontbuffer rendering.
>>>>
>>>> I think I like the first approach best, although it may be tempting for
>>>> user-space apps to just set the dirty bit instead of providing the full
>>>> damage region.
>>> Or I'm not following you here, because I don't quite see the difference
>>> between dirtyfb and a flip.
>>> -Daniel
>> OK, let me rephrase:
>>
>>  From the driver's point-of-view, in the atomic world, new content and the
>> need for manual upload is indicated by a change in fb attached to the plane.
>>
>> With Rob's patch here, (correct me if I'm wrong) in addition new content and
>> the need for manual upload is identified by the dirty flag, (since the fb
>> stays the same and the driver thus never identifies a page-flip).
> Hm, I'm not entirely sure Rob's approach is correct. Imo a pageflip
> (atomic or not) should result in the entire buffer getting uploaded. The
> dirty flag is kinda redundant, a flip with the same buffer works the same
> way as a dirtyfb with the entire buffer as the dirty rectangle.
>
>> In both these situations, clip rects can provide a hint to restrict the
>> dirty region.
>>
>> Now I was thinking about the preferred way for user-space to communicate
>> front buffer rendering through the atomic ioctl:
>>
>> 1) Expose a dirty (or content_age property)
>> 2) Attach a clip-rect blob property.
>> 3) Fake a page-flip by ping-ponging two frame-buffers pointing to the same
>> underlying buffer object.
>>
>> Are you saying that people are already using 3) and we should keep using
>> that?
> I'm saying they're using 3b), flip the same buffer wrapped in the same
> drm_framebuffer, and expect it to work.
>
> The only advantage dirtyfb has is that it allows you to supply the
> optimized upload rectangles, but at the cost of a funny api (it's working
> on the fb, not the plane/crtc you want to upload) and lack of drm_event to
> confirm when exactly you uploaded your stuff. But imo they should be the
> same underlying operation.
>
> Also note that atomic helpers don't optimize out plane flips for same
> buffers. We only optimize out some of the waiting, in a failed attempt at
> making cursors stall less, but that's not fixed with the async plane
> update stuff. And we can obviously optimize out the prepare/cleanup hooks,
> because the buffer should be pinned already.
>

I'm a bit confused.

Apparently we have different opinions in when an uploading driver should 
assume altered plane content and the need for re-upload.
vmwgfx is from what I know currently assuming that this happens only on 
changed fb attachment (what we call a page-flip) whereas if I understand 
you correctly it should happen on each atomic state commit?

If we should assume the latter, then it has odd implications, let's say 
you have 8 screens up, and you pan one of them on the large fb, why 
would you upload the contents of the other 7?

Likewise for cursors,  why would you want to upload the cursor image on 
each cursor move?

So in my POW, option 1) is the option that aligns with the current 
vmwgfx implementation and from what I can tell, what Rob has implemented 
in his patch.

/Thomas


--
To unsubscribe from this list: send the line "unsubscribe linux-arm-msm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Maarten Lankhorst April 4, 2018, 10:36 a.m. UTC | #9
Op 04-04-18 om 12:21 schreef Daniel Vetter:
> On Wed, Apr 04, 2018 at 12:03:00PM +0200, Daniel Vetter wrote:
>> On Tue, Apr 03, 2018 at 06:42:23PM -0400, Rob Clark wrote:
>>> Add an atomic helper to implement dirtyfb support.  This is needed to
>>> support DSI command-mode panels with x11 userspace (ie. when we can't
>>> rely on pageflips to trigger a flush to the panel).
>>>
>>> To signal to the driver that the async atomic update needs to
>>> synchronize with fences, even though the fb didn't change, the
>>> drm_atomic_state::dirty flag is added.
>>>
>>> Signed-off-by: Rob Clark <robdclark@gmail.com>
>>> ---
>>> Background: there are a number of different folks working on getting
>>> upstream kernel working on various different phones/tablets with qcom
>>> SoC's.. many of them have command mode panels, so we kind of need a
>>> way to support the legacy dirtyfb ioctl for x11 support.
>>>
>>> I know there is work on a proprer non-legacy atomic property for
>>> userspace to communicate dirty-rect(s) to the kernel, so this can
>>> be improved from triggering a full-frame flush once that is in
>>> place.  But we kinda needa a stop-gap solution.
>>>
>>> I had considered an in-driver solution for this, but things get a
>>> bit tricky if userspace ands up combining dirtyfb ioctls with page-
>>> flips, because we need to synchronize setting various CTL.FLUSH bits
>>> with setting the CTL.START bit.  (ie. really all we need to do for
>>> cmd mode panels is bang CTL.START, but is this ends up racing with
>>> pageflips setting FLUSH bits, then bad things.)  The easiest soln
>>> is to wrap this up as an atomic commit and rely on the worker to
>>> serialize things.  Hence adding an atomic dirtyfb helper.
>>>
>>> I guess at least the helper, with some small addition to translate
>>> and pass-thru the dirty rect(s) is useful to the final atomic dirty-
>>> rect property solution.  Depending on how far off that is, a stop-
>>> gap solution could be useful.
>>>
>>>  drivers/gpu/drm/drm_atomic_helper.c | 66 +++++++++++++++++++++++++++++++++++++
>>>  drivers/gpu/drm/msm/msm_atomic.c    |  5 ++-
>>>  drivers/gpu/drm/msm/msm_fb.c        |  1 +
>>>  include/drm/drm_atomic_helper.h     |  4 +++
>>>  include/drm/drm_plane.h             |  9 +++++
>>>  5 files changed, 84 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/gpu/drm/drm_atomic_helper.c b/drivers/gpu/drm/drm_atomic_helper.c
>>> index c35654591c12..a578dc681b27 100644
>>> --- a/drivers/gpu/drm/drm_atomic_helper.c
>>> +++ b/drivers/gpu/drm/drm_atomic_helper.c
>>> @@ -3504,6 +3504,7 @@ void __drm_atomic_helper_plane_duplicate_state(struct drm_plane *plane,
>>>  	if (state->fb)
>>>  		drm_framebuffer_get(state->fb);
>>>  
>>> +	state->dirty = false;
>>>  	state->fence = NULL;
>>>  	state->commit = NULL;
>>>  }
>>> @@ -3847,6 +3848,71 @@ int drm_atomic_helper_legacy_gamma_set(struct drm_crtc *crtc,
>>>  }
>>>  EXPORT_SYMBOL(drm_atomic_helper_legacy_gamma_set);
>>>  
>>> +/**
>>> + * drm_atomic_helper_dirtyfb - helper for dirtyfb
>>> + *
>>> + * A helper to implement drm_framebuffer_funcs::dirty
>>> + */
>>> +int drm_atomic_helper_dirtyfb(struct drm_framebuffer *fb,
>>> +			      struct drm_file *file_priv, unsigned flags,
>>> +			      unsigned color, struct drm_clip_rect *clips,
>>> +			      unsigned num_clips)
>>> +{
>>> +	struct drm_modeset_acquire_ctx ctx;
>>> +	struct drm_atomic_state *state;
>>> +	struct drm_plane *plane;
>>> +	int ret = 0;
>>> +
>>> +	/*
>>> +	 * When called from ioctl, we are interruptable, but not when
>>> +	 * called internally (ie. defio worker)
>>> +	 */
>>> +	drm_modeset_acquire_init(&ctx,
>>> +		file_priv ? DRM_MODESET_ACQUIRE_INTERRUPTIBLE : 0);
>>> +
>>> +	state = drm_atomic_state_alloc(fb->dev);
>>> +	if (!state) {
>>> +		ret = -ENOMEM;
>>> +		goto out;
>>> +	}
>>> +	state->acquire_ctx = &ctx;
>>> +
>>> +retry:
>>> +	drm_for_each_plane(plane, fb->dev) {
>>> +		struct drm_plane_state *plane_state;
>>> +
>>> +		if (plane->state->fb != fb)
>>> +			continue;
>>> +
>>> +		plane_state = drm_atomic_get_plane_state(state, plane);
>>> +		if (IS_ERR(plane_state)) {
>>> +			ret = PTR_ERR(plane_state);
>>> +			goto out;
>>> +		}
>>> +
>>> +		plane_state->dirty = true;
>>> +	}
>>> +
>>> +	ret = drm_atomic_nonblocking_commit(state);
>>> +
>>> +out:
>>> +	if (ret == -EDEADLK) {
>>> +		drm_atomic_state_clear(state);
>>> +		ret = drm_modeset_backoff(&ctx);
>>> +		if (!ret)
>>> +			goto retry;
>>> +	}
>>> +
>>> +	drm_atomic_state_put(state);
>>> +
>>> +	drm_modeset_drop_locks(&ctx);
>>> +	drm_modeset_acquire_fini(&ctx);
>>> +
>>> +	return ret;
>>> +
>>> +}
>>> +EXPORT_SYMBOL(drm_atomic_helper_dirtyfb);
>>> +
>>>  /**
>>>   * __drm_atomic_helper_private_duplicate_state - copy atomic private state
>>>   * @obj: CRTC object
>>> diff --git a/drivers/gpu/drm/msm/msm_atomic.c b/drivers/gpu/drm/msm/msm_atomic.c
>>> index bf5f8c39f34d..bb55a048e98b 100644
>>> --- a/drivers/gpu/drm/msm/msm_atomic.c
>>> +++ b/drivers/gpu/drm/msm/msm_atomic.c
>>> @@ -201,7 +201,10 @@ int msm_atomic_commit(struct drm_device *dev,
>>>  	 * Figure out what fence to wait for:
>>>  	 */
>>>  	for_each_oldnew_plane_in_state(state, plane, old_plane_state, new_plane_state, i) {
>>> -		if ((new_plane_state->fb != old_plane_state->fb) && new_plane_state->fb) {
>>> +		bool sync_fb = new_plane_state->fb &&
>>> +			((new_plane_state->fb != old_plane_state->fb) ||
>>> +			 new_plane_state->dirty);
>> Why do you have this optimization even here? Imo flipping to the same fb
>> should result in the fb getting fully uploaded, whether you're doing a
>> legacy page_flip, and atomic one or just a plane update.
>>
>> Iirc some userspace does use that as essentially a full-plane frontbuffer
>> rendering flush already. IOW I don't think we need your
>> plane_state->dirty, it's implied to always be true - why would userspace
>> do a flip otherwise?
>>
>> The helper itself to map dirtyfb to a nonblocking atomic commit looks
>> reasonable, but misses a bunch of the trickery discussed with Noralf and
>> others I think.
> Ok, I've done some history digging:
>
> - i915 and nouveau unconditionally wait for fences, even for same-fb
>   flips.
> - no idea what amdgpu and vmwgfx are doing, they're not using
>   plane_state->fence for implicit fences.
I thought plane_state->fence was used for explicit fences, so its use by drivers
would interfere with it? I don't think fencing would work on msm or vc4..
> - most arm-soc drivers do have this "optimization" in their code, and it
>   even managed to get into the new drm_gem_fb_prepare_fb helper (which I
>   reviewed, or well claimed to have ... oops). Afaict it goes back to the
>   original msm atomic code, and was then dutifully copypasted all over the
>   place.
>
> If folks are ok I'll do a patch series to align drivers with i915/nouveau.
> Well, any driver using reservation_object_get_excl_rcu +
> drm_atomic_set_fence_for_plane combo, since amdgpu and vmwgfx don't I have
> no idea what they're doing or whether they might have the same bug.
>
> From looking at at least the various prepare_fb callbacks I don't see any
> other drivers doing funny stuff around implicit fences.
> -Daniel
>
>>> +		if (sync_fb) {
>>>  			struct drm_gem_object *obj = msm_framebuffer_bo(new_plane_state->fb, 0);
>>>  			struct msm_gem_object *msm_obj = to_msm_bo(obj);
>>>  			struct dma_fence *fence = reservation_object_get_excl_rcu(msm_obj->resv);
>>> diff --git a/drivers/gpu/drm/msm/msm_fb.c b/drivers/gpu/drm/msm/msm_fb.c
>>> index 0e0c87252ab0..a5d882a34a33 100644
>>> --- a/drivers/gpu/drm/msm/msm_fb.c
>>> +++ b/drivers/gpu/drm/msm/msm_fb.c
>>> @@ -62,6 +62,7 @@ static void msm_framebuffer_destroy(struct drm_framebuffer *fb)
>>>  static const struct drm_framebuffer_funcs msm_framebuffer_funcs = {
>>>  	.create_handle = msm_framebuffer_create_handle,
>>>  	.destroy = msm_framebuffer_destroy,
>>> +	.dirty = drm_atomic_helper_dirtyfb,
>>>  };
>>>  
>>>  #ifdef CONFIG_DEBUG_FS
>>> diff --git a/include/drm/drm_atomic_helper.h b/include/drm/drm_atomic_helper.h
>>> index 26aaba58d6ce..9b7a95c2643d 100644
>>> --- a/include/drm/drm_atomic_helper.h
>>> +++ b/include/drm/drm_atomic_helper.h
>>> @@ -183,6 +183,10 @@ int drm_atomic_helper_legacy_gamma_set(struct drm_crtc *crtc,
>>>  				       u16 *red, u16 *green, u16 *blue,
>>>  				       uint32_t size,
>>>  				       struct drm_modeset_acquire_ctx *ctx);
>>> +int drm_atomic_helper_dirtyfb(struct drm_framebuffer *fb,
>>> +			      struct drm_file *file_priv, unsigned flags,
>>> +			      unsigned color, struct drm_clip_rect *clips,
>>> +			      unsigned num_clips);
>>>  void __drm_atomic_helper_private_obj_duplicate_state(struct drm_private_obj *obj,
>>>  						     struct drm_private_state *state);
>>>  
>>> diff --git a/include/drm/drm_plane.h b/include/drm/drm_plane.h
>>> index f7bf4a48b1c3..296fa22bda7a 100644
>>> --- a/include/drm/drm_plane.h
>>> +++ b/include/drm/drm_plane.h
>>> @@ -76,6 +76,15 @@ struct drm_plane_state {
>>>  	 */
>>>  	struct drm_framebuffer *fb;
>>>  
>>> +	/**
>>> +	 * @dirty:
>>> +	 *
>>> +	 * Flag that indicates the fb contents have changed even though the
>>> +	 * fb has not.  This is mostly a stop-gap solution until we have
>>> +	 * atomic dirty-rect(s) property.
>>> +	 */
>>> +	bool dirty;
>>> +
>>>  	/**
>>>  	 * @fence:
>>>  	 *
>>> -- 
>>> 2.14.3
>>>
>> -- 
>> Daniel Vetter
>> Software Engineer, Intel Corporation
>> http://blog.ffwll.ch


--
To unsubscribe from this list: send the line "unsubscribe linux-arm-msm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rob Clark April 4, 2018, 11:35 a.m. UTC | #10
On Wed, Apr 4, 2018 at 6:21 AM, Daniel Vetter <daniel@ffwll.ch> wrote:
> On Wed, Apr 04, 2018 at 12:03:00PM +0200, Daniel Vetter wrote:
>> On Tue, Apr 03, 2018 at 06:42:23PM -0400, Rob Clark wrote:
>> > Add an atomic helper to implement dirtyfb support.  This is needed to
>> > support DSI command-mode panels with x11 userspace (ie. when we can't
>> > rely on pageflips to trigger a flush to the panel).
>> >
>> > To signal to the driver that the async atomic update needs to
>> > synchronize with fences, even though the fb didn't change, the
>> > drm_atomic_state::dirty flag is added.
>> >
>> > Signed-off-by: Rob Clark <robdclark@gmail.com>
>> > ---
>> > Background: there are a number of different folks working on getting
>> > upstream kernel working on various different phones/tablets with qcom
>> > SoC's.. many of them have command mode panels, so we kind of need a
>> > way to support the legacy dirtyfb ioctl for x11 support.
>> >
>> > I know there is work on a proprer non-legacy atomic property for
>> > userspace to communicate dirty-rect(s) to the kernel, so this can
>> > be improved from triggering a full-frame flush once that is in
>> > place.  But we kinda needa a stop-gap solution.
>> >
>> > I had considered an in-driver solution for this, but things get a
>> > bit tricky if userspace ands up combining dirtyfb ioctls with page-
>> > flips, because we need to synchronize setting various CTL.FLUSH bits
>> > with setting the CTL.START bit.  (ie. really all we need to do for
>> > cmd mode panels is bang CTL.START, but is this ends up racing with
>> > pageflips setting FLUSH bits, then bad things.)  The easiest soln
>> > is to wrap this up as an atomic commit and rely on the worker to
>> > serialize things.  Hence adding an atomic dirtyfb helper.
>> >
>> > I guess at least the helper, with some small addition to translate
>> > and pass-thru the dirty rect(s) is useful to the final atomic dirty-
>> > rect property solution.  Depending on how far off that is, a stop-
>> > gap solution could be useful.
>> >
>> >  drivers/gpu/drm/drm_atomic_helper.c | 66 +++++++++++++++++++++++++++++++++++++
>> >  drivers/gpu/drm/msm/msm_atomic.c    |  5 ++-
>> >  drivers/gpu/drm/msm/msm_fb.c        |  1 +
>> >  include/drm/drm_atomic_helper.h     |  4 +++
>> >  include/drm/drm_plane.h             |  9 +++++
>> >  5 files changed, 84 insertions(+), 1 deletion(-)
>> >
>> > diff --git a/drivers/gpu/drm/drm_atomic_helper.c b/drivers/gpu/drm/drm_atomic_helper.c
>> > index c35654591c12..a578dc681b27 100644
>> > --- a/drivers/gpu/drm/drm_atomic_helper.c
>> > +++ b/drivers/gpu/drm/drm_atomic_helper.c
>> > @@ -3504,6 +3504,7 @@ void __drm_atomic_helper_plane_duplicate_state(struct drm_plane *plane,
>> >     if (state->fb)
>> >             drm_framebuffer_get(state->fb);
>> >
>> > +   state->dirty = false;
>> >     state->fence = NULL;
>> >     state->commit = NULL;
>> >  }
>> > @@ -3847,6 +3848,71 @@ int drm_atomic_helper_legacy_gamma_set(struct drm_crtc *crtc,
>> >  }
>> >  EXPORT_SYMBOL(drm_atomic_helper_legacy_gamma_set);
>> >
>> > +/**
>> > + * drm_atomic_helper_dirtyfb - helper for dirtyfb
>> > + *
>> > + * A helper to implement drm_framebuffer_funcs::dirty
>> > + */
>> > +int drm_atomic_helper_dirtyfb(struct drm_framebuffer *fb,
>> > +                         struct drm_file *file_priv, unsigned flags,
>> > +                         unsigned color, struct drm_clip_rect *clips,
>> > +                         unsigned num_clips)
>> > +{
>> > +   struct drm_modeset_acquire_ctx ctx;
>> > +   struct drm_atomic_state *state;
>> > +   struct drm_plane *plane;
>> > +   int ret = 0;
>> > +
>> > +   /*
>> > +    * When called from ioctl, we are interruptable, but not when
>> > +    * called internally (ie. defio worker)
>> > +    */
>> > +   drm_modeset_acquire_init(&ctx,
>> > +           file_priv ? DRM_MODESET_ACQUIRE_INTERRUPTIBLE : 0);
>> > +
>> > +   state = drm_atomic_state_alloc(fb->dev);
>> > +   if (!state) {
>> > +           ret = -ENOMEM;
>> > +           goto out;
>> > +   }
>> > +   state->acquire_ctx = &ctx;
>> > +
>> > +retry:
>> > +   drm_for_each_plane(plane, fb->dev) {
>> > +           struct drm_plane_state *plane_state;
>> > +
>> > +           if (plane->state->fb != fb)
>> > +                   continue;
>> > +
>> > +           plane_state = drm_atomic_get_plane_state(state, plane);
>> > +           if (IS_ERR(plane_state)) {
>> > +                   ret = PTR_ERR(plane_state);
>> > +                   goto out;
>> > +           }
>> > +
>> > +           plane_state->dirty = true;
>> > +   }
>> > +
>> > +   ret = drm_atomic_nonblocking_commit(state);
>> > +
>> > +out:
>> > +   if (ret == -EDEADLK) {
>> > +           drm_atomic_state_clear(state);
>> > +           ret = drm_modeset_backoff(&ctx);
>> > +           if (!ret)
>> > +                   goto retry;
>> > +   }
>> > +
>> > +   drm_atomic_state_put(state);
>> > +
>> > +   drm_modeset_drop_locks(&ctx);
>> > +   drm_modeset_acquire_fini(&ctx);
>> > +
>> > +   return ret;
>> > +
>> > +}
>> > +EXPORT_SYMBOL(drm_atomic_helper_dirtyfb);
>> > +
>> >  /**
>> >   * __drm_atomic_helper_private_duplicate_state - copy atomic private state
>> >   * @obj: CRTC object
>> > diff --git a/drivers/gpu/drm/msm/msm_atomic.c b/drivers/gpu/drm/msm/msm_atomic.c
>> > index bf5f8c39f34d..bb55a048e98b 100644
>> > --- a/drivers/gpu/drm/msm/msm_atomic.c
>> > +++ b/drivers/gpu/drm/msm/msm_atomic.c
>> > @@ -201,7 +201,10 @@ int msm_atomic_commit(struct drm_device *dev,
>> >      * Figure out what fence to wait for:
>> >      */
>> >     for_each_oldnew_plane_in_state(state, plane, old_plane_state, new_plane_state, i) {
>> > -           if ((new_plane_state->fb != old_plane_state->fb) && new_plane_state->fb) {
>> > +           bool sync_fb = new_plane_state->fb &&
>> > +                   ((new_plane_state->fb != old_plane_state->fb) ||
>> > +                    new_plane_state->dirty);
>>
>> Why do you have this optimization even here? Imo flipping to the same fb
>> should result in the fb getting fully uploaded, whether you're doing a
>> legacy page_flip, and atomic one or just a plane update.
>>
>> Iirc some userspace does use that as essentially a full-plane frontbuffer
>> rendering flush already. IOW I don't think we need your
>> plane_state->dirty, it's implied to always be true - why would userspace
>> do a flip otherwise?
>>
>> The helper itself to map dirtyfb to a nonblocking atomic commit looks
>> reasonable, but misses a bunch of the trickery discussed with Noralf and
>> others I think.
>
> Ok, I've done some history digging:
>
> - i915 and nouveau unconditionally wait for fences, even for same-fb
>   flips.
> - no idea what amdgpu and vmwgfx are doing, they're not using
>   plane_state->fence for implicit fences.
> - most arm-soc drivers do have this "optimization" in their code, and it
>   even managed to get into the new drm_gem_fb_prepare_fb helper (which I
>   reviewed, or well claimed to have ... oops). Afaict it goes back to the
>   original msm atomic code, and was then dutifully copypasted all over the
>   place.
>
> If folks are ok I'll do a patch series to align drivers with i915/nouveau.
> Well, any driver using reservation_object_get_excl_rcu +
> drm_atomic_set_fence_for_plane combo, since amdgpu and vmwgfx don't I have
> no idea what they're doing or whether they might have the same bug.

if you do a patch series, I think do it the other way around and align
to msm ;-)

I think otherwise we could end up stalling while x11 does front buffer
rendering for something unrelated (maybe I was hitting that with
cursor updates?)

BR,
-R

> From looking at at least the various prepare_fb callbacks I don't see any
> other drivers doing funny stuff around implicit fences.
> -Daniel
>
>> > +           if (sync_fb) {
>> >                     struct drm_gem_object *obj = msm_framebuffer_bo(new_plane_state->fb, 0);
>> >                     struct msm_gem_object *msm_obj = to_msm_bo(obj);
>> >                     struct dma_fence *fence = reservation_object_get_excl_rcu(msm_obj->resv);
>> > diff --git a/drivers/gpu/drm/msm/msm_fb.c b/drivers/gpu/drm/msm/msm_fb.c
>> > index 0e0c87252ab0..a5d882a34a33 100644
>> > --- a/drivers/gpu/drm/msm/msm_fb.c
>> > +++ b/drivers/gpu/drm/msm/msm_fb.c
>> > @@ -62,6 +62,7 @@ static void msm_framebuffer_destroy(struct drm_framebuffer *fb)
>> >  static const struct drm_framebuffer_funcs msm_framebuffer_funcs = {
>> >     .create_handle = msm_framebuffer_create_handle,
>> >     .destroy = msm_framebuffer_destroy,
>> > +   .dirty = drm_atomic_helper_dirtyfb,
>> >  };
>> >
>> >  #ifdef CONFIG_DEBUG_FS
>> > diff --git a/include/drm/drm_atomic_helper.h b/include/drm/drm_atomic_helper.h
>> > index 26aaba58d6ce..9b7a95c2643d 100644
>> > --- a/include/drm/drm_atomic_helper.h
>> > +++ b/include/drm/drm_atomic_helper.h
>> > @@ -183,6 +183,10 @@ int drm_atomic_helper_legacy_gamma_set(struct drm_crtc *crtc,
>> >                                    u16 *red, u16 *green, u16 *blue,
>> >                                    uint32_t size,
>> >                                    struct drm_modeset_acquire_ctx *ctx);
>> > +int drm_atomic_helper_dirtyfb(struct drm_framebuffer *fb,
>> > +                         struct drm_file *file_priv, unsigned flags,
>> > +                         unsigned color, struct drm_clip_rect *clips,
>> > +                         unsigned num_clips);
>> >  void __drm_atomic_helper_private_obj_duplicate_state(struct drm_private_obj *obj,
>> >                                                  struct drm_private_state *state);
>> >
>> > diff --git a/include/drm/drm_plane.h b/include/drm/drm_plane.h
>> > index f7bf4a48b1c3..296fa22bda7a 100644
>> > --- a/include/drm/drm_plane.h
>> > +++ b/include/drm/drm_plane.h
>> > @@ -76,6 +76,15 @@ struct drm_plane_state {
>> >      */
>> >     struct drm_framebuffer *fb;
>> >
>> > +   /**
>> > +    * @dirty:
>> > +    *
>> > +    * Flag that indicates the fb contents have changed even though the
>> > +    * fb has not.  This is mostly a stop-gap solution until we have
>> > +    * atomic dirty-rect(s) property.
>> > +    */
>> > +   bool dirty;
>> > +
>> >     /**
>> >      * @fence:
>> >      *
>> > --
>> > 2.14.3
>> >
>>
>> --
>> Daniel Vetter
>> Software Engineer, Intel Corporation
>> http://blog.ffwll.ch
>
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch
--
To unsubscribe from this list: send the line "unsubscribe linux-arm-msm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rob Clark April 4, 2018, 11:37 a.m. UTC | #11
On Wed, Apr 4, 2018 at 6:36 AM, Maarten Lankhorst
<maarten.lankhorst@linux.intel.com> wrote:
> Op 04-04-18 om 12:21 schreef Daniel Vetter:
>> On Wed, Apr 04, 2018 at 12:03:00PM +0200, Daniel Vetter wrote:
>>> On Tue, Apr 03, 2018 at 06:42:23PM -0400, Rob Clark wrote:
>>>> Add an atomic helper to implement dirtyfb support.  This is needed to
>>>> support DSI command-mode panels with x11 userspace (ie. when we can't
>>>> rely on pageflips to trigger a flush to the panel).
>>>>
>>>> To signal to the driver that the async atomic update needs to
>>>> synchronize with fences, even though the fb didn't change, the
>>>> drm_atomic_state::dirty flag is added.
>>>>
>>>> Signed-off-by: Rob Clark <robdclark@gmail.com>
>>>> ---
>>>> Background: there are a number of different folks working on getting
>>>> upstream kernel working on various different phones/tablets with qcom
>>>> SoC's.. many of them have command mode panels, so we kind of need a
>>>> way to support the legacy dirtyfb ioctl for x11 support.
>>>>
>>>> I know there is work on a proprer non-legacy atomic property for
>>>> userspace to communicate dirty-rect(s) to the kernel, so this can
>>>> be improved from triggering a full-frame flush once that is in
>>>> place.  But we kinda needa a stop-gap solution.
>>>>
>>>> I had considered an in-driver solution for this, but things get a
>>>> bit tricky if userspace ands up combining dirtyfb ioctls with page-
>>>> flips, because we need to synchronize setting various CTL.FLUSH bits
>>>> with setting the CTL.START bit.  (ie. really all we need to do for
>>>> cmd mode panels is bang CTL.START, but is this ends up racing with
>>>> pageflips setting FLUSH bits, then bad things.)  The easiest soln
>>>> is to wrap this up as an atomic commit and rely on the worker to
>>>> serialize things.  Hence adding an atomic dirtyfb helper.
>>>>
>>>> I guess at least the helper, with some small addition to translate
>>>> and pass-thru the dirty rect(s) is useful to the final atomic dirty-
>>>> rect property solution.  Depending on how far off that is, a stop-
>>>> gap solution could be useful.
>>>>
>>>>  drivers/gpu/drm/drm_atomic_helper.c | 66 +++++++++++++++++++++++++++++++++++++
>>>>  drivers/gpu/drm/msm/msm_atomic.c    |  5 ++-
>>>>  drivers/gpu/drm/msm/msm_fb.c        |  1 +
>>>>  include/drm/drm_atomic_helper.h     |  4 +++
>>>>  include/drm/drm_plane.h             |  9 +++++
>>>>  5 files changed, 84 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/drm_atomic_helper.c b/drivers/gpu/drm/drm_atomic_helper.c
>>>> index c35654591c12..a578dc681b27 100644
>>>> --- a/drivers/gpu/drm/drm_atomic_helper.c
>>>> +++ b/drivers/gpu/drm/drm_atomic_helper.c
>>>> @@ -3504,6 +3504,7 @@ void __drm_atomic_helper_plane_duplicate_state(struct drm_plane *plane,
>>>>     if (state->fb)
>>>>             drm_framebuffer_get(state->fb);
>>>>
>>>> +   state->dirty = false;
>>>>     state->fence = NULL;
>>>>     state->commit = NULL;
>>>>  }
>>>> @@ -3847,6 +3848,71 @@ int drm_atomic_helper_legacy_gamma_set(struct drm_crtc *crtc,
>>>>  }
>>>>  EXPORT_SYMBOL(drm_atomic_helper_legacy_gamma_set);
>>>>
>>>> +/**
>>>> + * drm_atomic_helper_dirtyfb - helper for dirtyfb
>>>> + *
>>>> + * A helper to implement drm_framebuffer_funcs::dirty
>>>> + */
>>>> +int drm_atomic_helper_dirtyfb(struct drm_framebuffer *fb,
>>>> +                         struct drm_file *file_priv, unsigned flags,
>>>> +                         unsigned color, struct drm_clip_rect *clips,
>>>> +                         unsigned num_clips)
>>>> +{
>>>> +   struct drm_modeset_acquire_ctx ctx;
>>>> +   struct drm_atomic_state *state;
>>>> +   struct drm_plane *plane;
>>>> +   int ret = 0;
>>>> +
>>>> +   /*
>>>> +    * When called from ioctl, we are interruptable, but not when
>>>> +    * called internally (ie. defio worker)
>>>> +    */
>>>> +   drm_modeset_acquire_init(&ctx,
>>>> +           file_priv ? DRM_MODESET_ACQUIRE_INTERRUPTIBLE : 0);
>>>> +
>>>> +   state = drm_atomic_state_alloc(fb->dev);
>>>> +   if (!state) {
>>>> +           ret = -ENOMEM;
>>>> +           goto out;
>>>> +   }
>>>> +   state->acquire_ctx = &ctx;
>>>> +
>>>> +retry:
>>>> +   drm_for_each_plane(plane, fb->dev) {
>>>> +           struct drm_plane_state *plane_state;
>>>> +
>>>> +           if (plane->state->fb != fb)
>>>> +                   continue;
>>>> +
>>>> +           plane_state = drm_atomic_get_plane_state(state, plane);
>>>> +           if (IS_ERR(plane_state)) {
>>>> +                   ret = PTR_ERR(plane_state);
>>>> +                   goto out;
>>>> +           }
>>>> +
>>>> +           plane_state->dirty = true;
>>>> +   }
>>>> +
>>>> +   ret = drm_atomic_nonblocking_commit(state);
>>>> +
>>>> +out:
>>>> +   if (ret == -EDEADLK) {
>>>> +           drm_atomic_state_clear(state);
>>>> +           ret = drm_modeset_backoff(&ctx);
>>>> +           if (!ret)
>>>> +                   goto retry;
>>>> +   }
>>>> +
>>>> +   drm_atomic_state_put(state);
>>>> +
>>>> +   drm_modeset_drop_locks(&ctx);
>>>> +   drm_modeset_acquire_fini(&ctx);
>>>> +
>>>> +   return ret;
>>>> +
>>>> +}
>>>> +EXPORT_SYMBOL(drm_atomic_helper_dirtyfb);
>>>> +
>>>>  /**
>>>>   * __drm_atomic_helper_private_duplicate_state - copy atomic private state
>>>>   * @obj: CRTC object
>>>> diff --git a/drivers/gpu/drm/msm/msm_atomic.c b/drivers/gpu/drm/msm/msm_atomic.c
>>>> index bf5f8c39f34d..bb55a048e98b 100644
>>>> --- a/drivers/gpu/drm/msm/msm_atomic.c
>>>> +++ b/drivers/gpu/drm/msm/msm_atomic.c
>>>> @@ -201,7 +201,10 @@ int msm_atomic_commit(struct drm_device *dev,
>>>>      * Figure out what fence to wait for:
>>>>      */
>>>>     for_each_oldnew_plane_in_state(state, plane, old_plane_state, new_plane_state, i) {
>>>> -           if ((new_plane_state->fb != old_plane_state->fb) && new_plane_state->fb) {
>>>> +           bool sync_fb = new_plane_state->fb &&
>>>> +                   ((new_plane_state->fb != old_plane_state->fb) ||
>>>> +                    new_plane_state->dirty);
>>> Why do you have this optimization even here? Imo flipping to the same fb
>>> should result in the fb getting fully uploaded, whether you're doing a
>>> legacy page_flip, and atomic one or just a plane update.
>>>
>>> Iirc some userspace does use that as essentially a full-plane frontbuffer
>>> rendering flush already. IOW I don't think we need your
>>> plane_state->dirty, it's implied to always be true - why would userspace
>>> do a flip otherwise?
>>>
>>> The helper itself to map dirtyfb to a nonblocking atomic commit looks
>>> reasonable, but misses a bunch of the trickery discussed with Noralf and
>>> others I think.
>> Ok, I've done some history digging:
>>
>> - i915 and nouveau unconditionally wait for fences, even for same-fb
>>   flips.
>> - no idea what amdgpu and vmwgfx are doing, they're not using
>>   plane_state->fence for implicit fences.
> I thought plane_state->fence was used for explicit fences, so its use by drivers
> would interfere with it? I don't think fencing would work on msm or vc4..

for implicit fencing we fish out the implicit fence and stuff it in
plane_state->fence..

BR,
-R

>> - most arm-soc drivers do have this "optimization" in their code, and it
>>   even managed to get into the new drm_gem_fb_prepare_fb helper (which I
>>   reviewed, or well claimed to have ... oops). Afaict it goes back to the
>>   original msm atomic code, and was then dutifully copypasted all over the
>>   place.
>>
>> If folks are ok I'll do a patch series to align drivers with i915/nouveau.
>> Well, any driver using reservation_object_get_excl_rcu +
>> drm_atomic_set_fence_for_plane combo, since amdgpu and vmwgfx don't I have
>> no idea what they're doing or whether they might have the same bug.
>>
>> From looking at at least the various prepare_fb callbacks I don't see any
>> other drivers doing funny stuff around implicit fences.
>> -Daniel
>>
>>>> +           if (sync_fb) {
>>>>                     struct drm_gem_object *obj = msm_framebuffer_bo(new_plane_state->fb, 0);
>>>>                     struct msm_gem_object *msm_obj = to_msm_bo(obj);
>>>>                     struct dma_fence *fence = reservation_object_get_excl_rcu(msm_obj->resv);
>>>> diff --git a/drivers/gpu/drm/msm/msm_fb.c b/drivers/gpu/drm/msm/msm_fb.c
>>>> index 0e0c87252ab0..a5d882a34a33 100644
>>>> --- a/drivers/gpu/drm/msm/msm_fb.c
>>>> +++ b/drivers/gpu/drm/msm/msm_fb.c
>>>> @@ -62,6 +62,7 @@ static void msm_framebuffer_destroy(struct drm_framebuffer *fb)
>>>>  static const struct drm_framebuffer_funcs msm_framebuffer_funcs = {
>>>>     .create_handle = msm_framebuffer_create_handle,
>>>>     .destroy = msm_framebuffer_destroy,
>>>> +   .dirty = drm_atomic_helper_dirtyfb,
>>>>  };
>>>>
>>>>  #ifdef CONFIG_DEBUG_FS
>>>> diff --git a/include/drm/drm_atomic_helper.h b/include/drm/drm_atomic_helper.h
>>>> index 26aaba58d6ce..9b7a95c2643d 100644
>>>> --- a/include/drm/drm_atomic_helper.h
>>>> +++ b/include/drm/drm_atomic_helper.h
>>>> @@ -183,6 +183,10 @@ int drm_atomic_helper_legacy_gamma_set(struct drm_crtc *crtc,
>>>>                                    u16 *red, u16 *green, u16 *blue,
>>>>                                    uint32_t size,
>>>>                                    struct drm_modeset_acquire_ctx *ctx);
>>>> +int drm_atomic_helper_dirtyfb(struct drm_framebuffer *fb,
>>>> +                         struct drm_file *file_priv, unsigned flags,
>>>> +                         unsigned color, struct drm_clip_rect *clips,
>>>> +                         unsigned num_clips);
>>>>  void __drm_atomic_helper_private_obj_duplicate_state(struct drm_private_obj *obj,
>>>>                                                  struct drm_private_state *state);
>>>>
>>>> diff --git a/include/drm/drm_plane.h b/include/drm/drm_plane.h
>>>> index f7bf4a48b1c3..296fa22bda7a 100644
>>>> --- a/include/drm/drm_plane.h
>>>> +++ b/include/drm/drm_plane.h
>>>> @@ -76,6 +76,15 @@ struct drm_plane_state {
>>>>      */
>>>>     struct drm_framebuffer *fb;
>>>>
>>>> +   /**
>>>> +    * @dirty:
>>>> +    *
>>>> +    * Flag that indicates the fb contents have changed even though the
>>>> +    * fb has not.  This is mostly a stop-gap solution until we have
>>>> +    * atomic dirty-rect(s) property.
>>>> +    */
>>>> +   bool dirty;
>>>> +
>>>>     /**
>>>>      * @fence:
>>>>      *
>>>> --
>>>> 2.14.3
>>>>
>>> --
>>> Daniel Vetter
>>> Software Engineer, Intel Corporation
>>> http://blog.ffwll.ch
>
>
--
To unsubscribe from this list: send the line "unsubscribe linux-arm-msm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rob Clark April 4, 2018, 11:40 a.m. UTC | #12
On Wed, Apr 4, 2018 at 5:56 AM, Daniel Vetter <daniel@ffwll.ch> wrote:
> On Wed, Apr 04, 2018 at 11:10:08AM +0200, Thomas Hellstrom wrote:
>> On 04/04/2018 10:43 AM, Daniel Vetter wrote:
>> > On Wed, Apr 04, 2018 at 10:22:21AM +0200, Thomas Hellstrom wrote:
>> > > Hi,
>> > >
>> > > On 04/04/2018 08:58 AM, Daniel Vetter wrote:
>> > > > On Wed, Apr 4, 2018 at 12:42 AM, Rob Clark <robdclark@gmail.com> wrote:
>> > > > > Add an atomic helper to implement dirtyfb support.  This is needed to
>> > > > > support DSI command-mode panels with x11 userspace (ie. when we can't
>> > > > > rely on pageflips to trigger a flush to the panel).
>> > > > >
>> > > > > To signal to the driver that the async atomic update needs to
>> > > > > synchronize with fences, even though the fb didn't change, the
>> > > > > drm_atomic_state::dirty flag is added.
>> > > > >
>> > > > > Signed-off-by: Rob Clark <robdclark@gmail.com>
>> > > > > ---
>> > > > > Background: there are a number of different folks working on getting
>> > > > > upstream kernel working on various different phones/tablets with qcom
>> > > > > SoC's.. many of them have command mode panels, so we kind of need a
>> > > > > way to support the legacy dirtyfb ioctl for x11 support.
>> > > > >
>> > > > > I know there is work on a proprer non-legacy atomic property for
>> > > > > userspace to communicate dirty-rect(s) to the kernel, so this can
>> > > > > be improved from triggering a full-frame flush once that is in
>> > > > > place.  But we kinda needa a stop-gap solution.
>> > > > >
>> > > > > I had considered an in-driver solution for this, but things get a
>> > > > > bit tricky if userspace ands up combining dirtyfb ioctls with page-
>> > > > > flips, because we need to synchronize setting various CTL.FLUSH bits
>> > > > > with setting the CTL.START bit.  (ie. really all we need to do for
>> > > > > cmd mode panels is bang CTL.START, but is this ends up racing with
>> > > > > pageflips setting FLUSH bits, then bad things.)  The easiest soln
>> > > > > is to wrap this up as an atomic commit and rely on the worker to
>> > > > > serialize things.  Hence adding an atomic dirtyfb helper.
>> > > > >
>> > > > > I guess at least the helper, with some small addition to translate
>> > > > > and pass-thru the dirty rect(s) is useful to the final atomic dirty-
>> > > > > rect property solution.  Depending on how far off that is, a stop-
>> > > > > gap solution could be useful.
>> > > > Adding Noralf, who iirc already posted the full dirty helpers already somewhere.
>> > > > -Daniel
>> > > I've asked Deepak to RFC the core changes suggested for the full dirty blob
>> > > on dri-devel. It builds on DisplayLink's suggestion, with a simple helper to
>> > > get to the desired coordinates.
>> > >
>> > > One thing to perhaps discuss is how we would like to fit this with
>> > > front-buffer rendering and the dirty ioctl. In the page-flip context, the
>> > > dirty rects, like egl's swapbuffer_with_damage is a hint to restrict the
>> > > damage region that can be fully ignored by the driver, new content is
>> > > indicated by a new framebuffer.
>> > >
>> > > We could do the same for frontbuffer rendering: Either set a dirty flag like
>> > > you do here, or provide a content_age state member. Since we clear the dirty
>> > > flag on state copies, I guess that would be sufficient. The blob rectangles
>> > > would then become a hint to restrict the damage region.
>> > I'm not entirely following here - I thought for frontbuffer rendering the
>> > dirty rects have always just been a hint, and that the driver was always
>> > free to re-upload the entire buffer to the screen.
>> >
>> > And through a helper like Rob's proposing here (and have floated around in
>> > different versions already) we'd essentially map a frontbuffer dirtyfb
>> > call to a fake flip with dirty rect. Manual upload drivers already need to
>> > upload the entire screen if they get a flip, since some userspace uses
>> > that to flush out frontbuffer rendering (instead of calling dirtyfb).
>> >
>> > So from that pov the new dirty flag is kinda not necessary imo.
>> >
>> > > Another approach would be to have the presence of dirty rects without
>> > > framebuffer change to indicate frontbuffer rendering.
>> > >
>> > > I think I like the first approach best, although it may be tempting for
>> > > user-space apps to just set the dirty bit instead of providing the full
>> > > damage region.
>> > Or I'm not following you here, because I don't quite see the difference
>> > between dirtyfb and a flip.
>> > -Daniel
>>
>> OK, let me rephrase:
>>
>> From the driver's point-of-view, in the atomic world, new content and the
>> need for manual upload is indicated by a change in fb attached to the plane.
>>
>> With Rob's patch here, (correct me if I'm wrong) in addition new content and
>> the need for manual upload is identified by the dirty flag, (since the fb
>> stays the same and the driver thus never identifies a page-flip).
>
> Hm, I'm not entirely sure Rob's approach is correct. Imo a pageflip
> (atomic or not) should result in the entire buffer getting uploaded. The
> dirty flag is kinda redundant, a flip with the same buffer works the same
> way as a dirtyfb with the entire buffer as the dirty rectangle.

Userspace could do a pageflip, but (with buffer-age extension) only
re-render part of the frame.  So there is a use-case for full frame
pageflip with hints about what region(s) changed since previous frame.

(Of course userspace could screw that up and get different results on
"command" vs "video" style display.. in that case, it gets to keep
both pieces)

BR,
-R


>> In both these situations, clip rects can provide a hint to restrict the
>> dirty region.
>>
>> Now I was thinking about the preferred way for user-space to communicate
>> front buffer rendering through the atomic ioctl:
>>
>> 1) Expose a dirty (or content_age property)
>> 2) Attach a clip-rect blob property.
>> 3) Fake a page-flip by ping-ponging two frame-buffers pointing to the same
>> underlying buffer object.
>>
>> Are you saying that people are already using 3) and we should keep using
>> that?
>
> I'm saying they're using 3b), flip the same buffer wrapped in the same
> drm_framebuffer, and expect it to work.
>
> The only advantage dirtyfb has is that it allows you to supply the
> optimized upload rectangles, but at the cost of a funny api (it's working
> on the fb, not the plane/crtc you want to upload) and lack of drm_event to
> confirm when exactly you uploaded your stuff. But imo they should be the
> same underlying operation.
>
> Also note that atomic helpers don't optimize out plane flips for same
> buffers. We only optimize out some of the waiting, in a failed attempt at
> making cursors stall less, but that's not fixed with the async plane
> update stuff. And we can obviously optimize out the prepare/cleanup hooks,
> because the buffer should be pinned already.
>
> So imo for a quick fix I think we need:
> 1) Fix drivers (or at least msm) to upload buffers for manual upload on
> all flips (well, all screen updates). Probably should do the fence waiting
> unconditionally too (and handle cursors with the new async stuff).
> 2) Tiny helper that remaps a dirtyfb call into a nonblocking atomic
> commit, which currently means we're throwing the dirty rect optimization
> into the wind. But that could easily be added to the drm_plane_state,
> without exposing it to userspace as a property just yet.
>
> Cheers, Daniel
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch
--
To unsubscribe from this list: send the line "unsubscribe linux-arm-msm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Thomas Hellström (VMware) April 4, 2018, 11:46 a.m. UTC | #13
On 04/04/2018 12:28 PM, Thomas Hellstrom wrote:
> On 04/04/2018 11:56 AM, Daniel Vetter wrote:
>> On Wed, Apr 04, 2018 at 11:10:08AM +0200, Thomas Hellstrom wrote:
>>> On 04/04/2018 10:43 AM, Daniel Vetter wrote:
>>>> On Wed, Apr 04, 2018 at 10:22:21AM +0200, Thomas Hellstrom wrote:
>>>>> Hi,
>>>>>
>>>>> On 04/04/2018 08:58 AM, Daniel Vetter wrote:
>>>>>> On Wed, Apr 4, 2018 at 12:42 AM, Rob Clark <robdclark@gmail.com> 
>>>>>> wrote:
>>>>>>> Add an atomic helper to implement dirtyfb support.  This is 
>>>>>>> needed to
>>>>>>> support DSI command-mode panels with x11 userspace (ie. when we 
>>>>>>> can't
>>>>>>> rely on pageflips to trigger a flush to the panel).
>>>>>>>
>>>>>>> To signal to the driver that the async atomic update needs to
>>>>>>> synchronize with fences, even though the fb didn't change, the
>>>>>>> drm_atomic_state::dirty flag is added.
>>>>>>>
>>>>>>> Signed-off-by: Rob Clark <robdclark@gmail.com>
>>>>>>> ---
>>>>>>> Background: there are a number of different folks working on 
>>>>>>> getting
>>>>>>> upstream kernel working on various different phones/tablets with 
>>>>>>> qcom
>>>>>>> SoC's.. many of them have command mode panels, so we kind of need a
>>>>>>> way to support the legacy dirtyfb ioctl for x11 support.
>>>>>>>
>>>>>>> I know there is work on a proprer non-legacy atomic property for
>>>>>>> userspace to communicate dirty-rect(s) to the kernel, so this can
>>>>>>> be improved from triggering a full-frame flush once that is in
>>>>>>> place.  But we kinda needa a stop-gap solution.
>>>>>>>
>>>>>>> I had considered an in-driver solution for this, but things get a
>>>>>>> bit tricky if userspace ands up combining dirtyfb ioctls with page-
>>>>>>> flips, because we need to synchronize setting various CTL.FLUSH 
>>>>>>> bits
>>>>>>> with setting the CTL.START bit.  (ie. really all we need to do for
>>>>>>> cmd mode panels is bang CTL.START, but is this ends up racing with
>>>>>>> pageflips setting FLUSH bits, then bad things.)  The easiest soln
>>>>>>> is to wrap this up as an atomic commit and rely on the worker to
>>>>>>> serialize things.  Hence adding an atomic dirtyfb helper.
>>>>>>>
>>>>>>> I guess at least the helper, with some small addition to translate
>>>>>>> and pass-thru the dirty rect(s) is useful to the final atomic 
>>>>>>> dirty-
>>>>>>> rect property solution.  Depending on how far off that is, a stop-
>>>>>>> gap solution could be useful.
>>>>>> Adding Noralf, who iirc already posted the full dirty helpers 
>>>>>> already somewhere.
>>>>>> -Daniel
>>>>> I've asked Deepak to RFC the core changes suggested for the full 
>>>>> dirty blob
>>>>> on dri-devel. It builds on DisplayLink's suggestion, with a simple 
>>>>> helper to
>>>>> get to the desired coordinates.
>>>>>
>>>>> One thing to perhaps discuss is how we would like to fit this with
>>>>> front-buffer rendering and the dirty ioctl. In the page-flip 
>>>>> context, the
>>>>> dirty rects, like egl's swapbuffer_with_damage is a hint to 
>>>>> restrict the
>>>>> damage region that can be fully ignored by the driver, new content is
>>>>> indicated by a new framebuffer.
>>>>>
>>>>> We could do the same for frontbuffer rendering: Either set a dirty 
>>>>> flag like
>>>>> you do here, or provide a content_age state member. Since we clear 
>>>>> the dirty
>>>>> flag on state copies, I guess that would be sufficient. The blob 
>>>>> rectangles
>>>>> would then become a hint to restrict the damage region.
>>>> I'm not entirely following here - I thought for frontbuffer 
>>>> rendering the
>>>> dirty rects have always just been a hint, and that the driver was 
>>>> always
>>>> free to re-upload the entire buffer to the screen.
>>>>
>>>> And through a helper like Rob's proposing here (and have floated 
>>>> around in
>>>> different versions already) we'd essentially map a frontbuffer dirtyfb
>>>> call to a fake flip with dirty rect. Manual upload drivers already 
>>>> need to
>>>> upload the entire screen if they get a flip, since some userspace uses
>>>> that to flush out frontbuffer rendering (instead of calling dirtyfb).
>>>>
>>>> So from that pov the new dirty flag is kinda not necessary imo.
>>>>
>>>>> Another approach would be to have the presence of dirty rects without
>>>>> framebuffer change to indicate frontbuffer rendering.
>>>>>
>>>>> I think I like the first approach best, although it may be 
>>>>> tempting for
>>>>> user-space apps to just set the dirty bit instead of providing the 
>>>>> full
>>>>> damage region.
>>>> Or I'm not following you here, because I don't quite see the 
>>>> difference
>>>> between dirtyfb and a flip.
>>>> -Daniel
>>> OK, let me rephrase:
>>>
>>>  From the driver's point-of-view, in the atomic world, new content 
>>> and the
>>> need for manual upload is indicated by a change in fb attached to 
>>> the plane.
>>>
>>> With Rob's patch here, (correct me if I'm wrong) in addition new 
>>> content and
>>> the need for manual upload is identified by the dirty flag, (since 
>>> the fb
>>> stays the same and the driver thus never identifies a page-flip).
>> Hm, I'm not entirely sure Rob's approach is correct. Imo a pageflip
>> (atomic or not) should result in the entire buffer getting uploaded. The
>> dirty flag is kinda redundant, a flip with the same buffer works the 
>> same
>> way as a dirtyfb with the entire buffer as the dirty rectangle.
>>
>>> In both these situations, clip rects can provide a hint to restrict the
>>> dirty region.
>>>
>>> Now I was thinking about the preferred way for user-space to 
>>> communicate
>>> front buffer rendering through the atomic ioctl:
>>>
>>> 1) Expose a dirty (or content_age property)
>>> 2) Attach a clip-rect blob property.
>>> 3) Fake a page-flip by ping-ponging two frame-buffers pointing to 
>>> the same
>>> underlying buffer object.
>>>
>>> Are you saying that people are already using 3) and we should keep 
>>> using
>>> that?
>> I'm saying they're using 3b), flip the same buffer wrapped in the same
>> drm_framebuffer, and expect it to work.
>>
>> The only advantage dirtyfb has is that it allows you to supply the
>> optimized upload rectangles, but at the cost of a funny api (it's 
>> working
>> on the fb, not the plane/crtc you want to upload) and lack of 
>> drm_event to
>> confirm when exactly you uploaded your stuff. But imo they should be the
>> same underlying operation.
>>
>> Also note that atomic helpers don't optimize out plane flips for same
>> buffers. We only optimize out some of the waiting, in a failed 
>> attempt at
>> making cursors stall less, but that's not fixed with the async plane
>> update stuff. And we can obviously optimize out the prepare/cleanup 
>> hooks,
>> because the buffer should be pinned already.
>>
>
> I'm a bit confused.
>
> Apparently we have different opinions in when an uploading driver 
> should assume altered plane content and the need for re-upload.
> vmwgfx is from what I know currently assuming that this happens only 
> on changed fb attachment (what we call a page-flip) whereas if I 
> understand you correctly it should happen on each atomic state commit?
>
> If we should assume the latter, then it has odd implications, let's 
> say you have 8 screens up, and you pan one of them on the large fb, 
> why would you upload the contents of the other 7?
>
> Likewise for cursors,  why would you want to upload the cursor image 
> on each cursor move?
>
> So in my POW, option 1) is the option that aligns with the current 
> vmwgfx implementation and from what I can tell, what Rob has 
> implemented in his patch.
>

Hmm.

After doing some apparently well needed reading up on the code it looks 
like vmwgfx is actually doing a full upload on each plane state change,
only those planes that actually got changed are referenced in the 
update. So that takes care of the panning example, assuming user-space 
is smart enough to leave the unchanged planes / crtcs out of the update.

However, the cursor example still holds, and IMHO we should have a 
better way to define content change than plane state update...

/Thomas



> /Thomas
>
>

--
To unsubscribe from this list: send the line "unsubscribe linux-arm-msm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Maarten Lankhorst April 4, 2018, 11:49 a.m. UTC | #14
Op 04-04-18 om 13:37 schreef Rob Clark:
> On Wed, Apr 4, 2018 at 6:36 AM, Maarten Lankhorst
> <maarten.lankhorst@linux.intel.com> wrote:
>> Op 04-04-18 om 12:21 schreef Daniel Vetter:
>>> On Wed, Apr 04, 2018 at 12:03:00PM +0200, Daniel Vetter wrote:
>>>> On Tue, Apr 03, 2018 at 06:42:23PM -0400, Rob Clark wrote:
>>>>> Add an atomic helper to implement dirtyfb support.  This is needed to
>>>>> support DSI command-mode panels with x11 userspace (ie. when we can't
>>>>> rely on pageflips to trigger a flush to the panel).
>>>>>
>>>>> To signal to the driver that the async atomic update needs to
>>>>> synchronize with fences, even though the fb didn't change, the
>>>>> drm_atomic_state::dirty flag is added.
>>>>>
>>>>> Signed-off-by: Rob Clark <robdclark@gmail.com>
>>>>> ---
>>>>> Background: there are a number of different folks working on getting
>>>>> upstream kernel working on various different phones/tablets with qcom
>>>>> SoC's.. many of them have command mode panels, so we kind of need a
>>>>> way to support the legacy dirtyfb ioctl for x11 support.
>>>>>
>>>>> I know there is work on a proprer non-legacy atomic property for
>>>>> userspace to communicate dirty-rect(s) to the kernel, so this can
>>>>> be improved from triggering a full-frame flush once that is in
>>>>> place.  But we kinda needa a stop-gap solution.
>>>>>
>>>>> I had considered an in-driver solution for this, but things get a
>>>>> bit tricky if userspace ands up combining dirtyfb ioctls with page-
>>>>> flips, because we need to synchronize setting various CTL.FLUSH bits
>>>>> with setting the CTL.START bit.  (ie. really all we need to do for
>>>>> cmd mode panels is bang CTL.START, but is this ends up racing with
>>>>> pageflips setting FLUSH bits, then bad things.)  The easiest soln
>>>>> is to wrap this up as an atomic commit and rely on the worker to
>>>>> serialize things.  Hence adding an atomic dirtyfb helper.
>>>>>
>>>>> I guess at least the helper, with some small addition to translate
>>>>> and pass-thru the dirty rect(s) is useful to the final atomic dirty-
>>>>> rect property solution.  Depending on how far off that is, a stop-
>>>>> gap solution could be useful.
>>>>>
>>>>>  drivers/gpu/drm/drm_atomic_helper.c | 66 +++++++++++++++++++++++++++++++++++++
>>>>>  drivers/gpu/drm/msm/msm_atomic.c    |  5 ++-
>>>>>  drivers/gpu/drm/msm/msm_fb.c        |  1 +
>>>>>  include/drm/drm_atomic_helper.h     |  4 +++
>>>>>  include/drm/drm_plane.h             |  9 +++++
>>>>>  5 files changed, 84 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/drm_atomic_helper.c b/drivers/gpu/drm/drm_atomic_helper.c
>>>>> index c35654591c12..a578dc681b27 100644
>>>>> --- a/drivers/gpu/drm/drm_atomic_helper.c
>>>>> +++ b/drivers/gpu/drm/drm_atomic_helper.c
>>>>> @@ -3504,6 +3504,7 @@ void __drm_atomic_helper_plane_duplicate_state(struct drm_plane *plane,
>>>>>     if (state->fb)
>>>>>             drm_framebuffer_get(state->fb);
>>>>>
>>>>> +   state->dirty = false;
>>>>>     state->fence = NULL;
>>>>>     state->commit = NULL;
>>>>>  }
>>>>> @@ -3847,6 +3848,71 @@ int drm_atomic_helper_legacy_gamma_set(struct drm_crtc *crtc,
>>>>>  }
>>>>>  EXPORT_SYMBOL(drm_atomic_helper_legacy_gamma_set);
>>>>>
>>>>> +/**
>>>>> + * drm_atomic_helper_dirtyfb - helper for dirtyfb
>>>>> + *
>>>>> + * A helper to implement drm_framebuffer_funcs::dirty
>>>>> + */
>>>>> +int drm_atomic_helper_dirtyfb(struct drm_framebuffer *fb,
>>>>> +                         struct drm_file *file_priv, unsigned flags,
>>>>> +                         unsigned color, struct drm_clip_rect *clips,
>>>>> +                         unsigned num_clips)
>>>>> +{
>>>>> +   struct drm_modeset_acquire_ctx ctx;
>>>>> +   struct drm_atomic_state *state;
>>>>> +   struct drm_plane *plane;
>>>>> +   int ret = 0;
>>>>> +
>>>>> +   /*
>>>>> +    * When called from ioctl, we are interruptable, but not when
>>>>> +    * called internally (ie. defio worker)
>>>>> +    */
>>>>> +   drm_modeset_acquire_init(&ctx,
>>>>> +           file_priv ? DRM_MODESET_ACQUIRE_INTERRUPTIBLE : 0);
>>>>> +
>>>>> +   state = drm_atomic_state_alloc(fb->dev);
>>>>> +   if (!state) {
>>>>> +           ret = -ENOMEM;
>>>>> +           goto out;
>>>>> +   }
>>>>> +   state->acquire_ctx = &ctx;
>>>>> +
>>>>> +retry:
>>>>> +   drm_for_each_plane(plane, fb->dev) {
>>>>> +           struct drm_plane_state *plane_state;
>>>>> +
>>>>> +           if (plane->state->fb != fb)
>>>>> +                   continue;
>>>>> +
>>>>> +           plane_state = drm_atomic_get_plane_state(state, plane);
>>>>> +           if (IS_ERR(plane_state)) {
>>>>> +                   ret = PTR_ERR(plane_state);
>>>>> +                   goto out;
>>>>> +           }
>>>>> +
>>>>> +           plane_state->dirty = true;
>>>>> +   }
>>>>> +
>>>>> +   ret = drm_atomic_nonblocking_commit(state);
>>>>> +
>>>>> +out:
>>>>> +   if (ret == -EDEADLK) {
>>>>> +           drm_atomic_state_clear(state);
>>>>> +           ret = drm_modeset_backoff(&ctx);
>>>>> +           if (!ret)
>>>>> +                   goto retry;
>>>>> +   }
>>>>> +
>>>>> +   drm_atomic_state_put(state);
>>>>> +
>>>>> +   drm_modeset_drop_locks(&ctx);
>>>>> +   drm_modeset_acquire_fini(&ctx);
>>>>> +
>>>>> +   return ret;
>>>>> +
>>>>> +}
>>>>> +EXPORT_SYMBOL(drm_atomic_helper_dirtyfb);
>>>>> +
>>>>>  /**
>>>>>   * __drm_atomic_helper_private_duplicate_state - copy atomic private state
>>>>>   * @obj: CRTC object
>>>>> diff --git a/drivers/gpu/drm/msm/msm_atomic.c b/drivers/gpu/drm/msm/msm_atomic.c
>>>>> index bf5f8c39f34d..bb55a048e98b 100644
>>>>> --- a/drivers/gpu/drm/msm/msm_atomic.c
>>>>> +++ b/drivers/gpu/drm/msm/msm_atomic.c
>>>>> @@ -201,7 +201,10 @@ int msm_atomic_commit(struct drm_device *dev,
>>>>>      * Figure out what fence to wait for:
>>>>>      */
>>>>>     for_each_oldnew_plane_in_state(state, plane, old_plane_state, new_plane_state, i) {
>>>>> -           if ((new_plane_state->fb != old_plane_state->fb) && new_plane_state->fb) {
>>>>> +           bool sync_fb = new_plane_state->fb &&
>>>>> +                   ((new_plane_state->fb != old_plane_state->fb) ||
>>>>> +                    new_plane_state->dirty);
>>>> Why do you have this optimization even here? Imo flipping to the same fb
>>>> should result in the fb getting fully uploaded, whether you're doing a
>>>> legacy page_flip, and atomic one or just a plane update.
>>>>
>>>> Iirc some userspace does use that as essentially a full-plane frontbuffer
>>>> rendering flush already. IOW I don't think we need your
>>>> plane_state->dirty, it's implied to always be true - why would userspace
>>>> do a flip otherwise?
>>>>
>>>> The helper itself to map dirtyfb to a nonblocking atomic commit looks
>>>> reasonable, but misses a bunch of the trickery discussed with Noralf and
>>>> others I think.
>>> Ok, I've done some history digging:
>>>
>>> - i915 and nouveau unconditionally wait for fences, even for same-fb
>>>   flips.
>>> - no idea what amdgpu and vmwgfx are doing, they're not using
>>>   plane_state->fence for implicit fences.
>> I thought plane_state->fence was used for explicit fences, so its use by drivers
>> would interfere with it? I don't think fencing would work on msm or vc4..
> for implicit fencing we fish out the implicit fence and stuff it in
> plane_state->fence..
What happens when userspace passes a fence fd to in_fence_fd?
--
To unsubscribe from this list: send the line "unsubscribe linux-arm-msm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rob Clark April 4, 2018, 12:05 p.m. UTC | #15
On Wed, Apr 4, 2018 at 7:49 AM, Maarten Lankhorst
<maarten.lankhorst@linux.intel.com> wrote:
> Op 04-04-18 om 13:37 schreef Rob Clark:
>> On Wed, Apr 4, 2018 at 6:36 AM, Maarten Lankhorst
>> <maarten.lankhorst@linux.intel.com> wrote:
>>> Op 04-04-18 om 12:21 schreef Daniel Vetter:
>>>> On Wed, Apr 04, 2018 at 12:03:00PM +0200, Daniel Vetter wrote:
>>>>> On Tue, Apr 03, 2018 at 06:42:23PM -0400, Rob Clark wrote:
>>>>>> Add an atomic helper to implement dirtyfb support.  This is needed to
>>>>>> support DSI command-mode panels with x11 userspace (ie. when we can't
>>>>>> rely on pageflips to trigger a flush to the panel).
>>>>>>
>>>>>> To signal to the driver that the async atomic update needs to
>>>>>> synchronize with fences, even though the fb didn't change, the
>>>>>> drm_atomic_state::dirty flag is added.
>>>>>>
>>>>>> Signed-off-by: Rob Clark <robdclark@gmail.com>
>>>>>> ---
>>>>>> Background: there are a number of different folks working on getting
>>>>>> upstream kernel working on various different phones/tablets with qcom
>>>>>> SoC's.. many of them have command mode panels, so we kind of need a
>>>>>> way to support the legacy dirtyfb ioctl for x11 support.
>>>>>>
>>>>>> I know there is work on a proprer non-legacy atomic property for
>>>>>> userspace to communicate dirty-rect(s) to the kernel, so this can
>>>>>> be improved from triggering a full-frame flush once that is in
>>>>>> place.  But we kinda needa a stop-gap solution.
>>>>>>
>>>>>> I had considered an in-driver solution for this, but things get a
>>>>>> bit tricky if userspace ands up combining dirtyfb ioctls with page-
>>>>>> flips, because we need to synchronize setting various CTL.FLUSH bits
>>>>>> with setting the CTL.START bit.  (ie. really all we need to do for
>>>>>> cmd mode panels is bang CTL.START, but is this ends up racing with
>>>>>> pageflips setting FLUSH bits, then bad things.)  The easiest soln
>>>>>> is to wrap this up as an atomic commit and rely on the worker to
>>>>>> serialize things.  Hence adding an atomic dirtyfb helper.
>>>>>>
>>>>>> I guess at least the helper, with some small addition to translate
>>>>>> and pass-thru the dirty rect(s) is useful to the final atomic dirty-
>>>>>> rect property solution.  Depending on how far off that is, a stop-
>>>>>> gap solution could be useful.
>>>>>>
>>>>>>  drivers/gpu/drm/drm_atomic_helper.c | 66 +++++++++++++++++++++++++++++++++++++
>>>>>>  drivers/gpu/drm/msm/msm_atomic.c    |  5 ++-
>>>>>>  drivers/gpu/drm/msm/msm_fb.c        |  1 +
>>>>>>  include/drm/drm_atomic_helper.h     |  4 +++
>>>>>>  include/drm/drm_plane.h             |  9 +++++
>>>>>>  5 files changed, 84 insertions(+), 1 deletion(-)
>>>>>>
>>>>>> diff --git a/drivers/gpu/drm/drm_atomic_helper.c b/drivers/gpu/drm/drm_atomic_helper.c
>>>>>> index c35654591c12..a578dc681b27 100644
>>>>>> --- a/drivers/gpu/drm/drm_atomic_helper.c
>>>>>> +++ b/drivers/gpu/drm/drm_atomic_helper.c
>>>>>> @@ -3504,6 +3504,7 @@ void __drm_atomic_helper_plane_duplicate_state(struct drm_plane *plane,
>>>>>>     if (state->fb)
>>>>>>             drm_framebuffer_get(state->fb);
>>>>>>
>>>>>> +   state->dirty = false;
>>>>>>     state->fence = NULL;
>>>>>>     state->commit = NULL;
>>>>>>  }
>>>>>> @@ -3847,6 +3848,71 @@ int drm_atomic_helper_legacy_gamma_set(struct drm_crtc *crtc,
>>>>>>  }
>>>>>>  EXPORT_SYMBOL(drm_atomic_helper_legacy_gamma_set);
>>>>>>
>>>>>> +/**
>>>>>> + * drm_atomic_helper_dirtyfb - helper for dirtyfb
>>>>>> + *
>>>>>> + * A helper to implement drm_framebuffer_funcs::dirty
>>>>>> + */
>>>>>> +int drm_atomic_helper_dirtyfb(struct drm_framebuffer *fb,
>>>>>> +                         struct drm_file *file_priv, unsigned flags,
>>>>>> +                         unsigned color, struct drm_clip_rect *clips,
>>>>>> +                         unsigned num_clips)
>>>>>> +{
>>>>>> +   struct drm_modeset_acquire_ctx ctx;
>>>>>> +   struct drm_atomic_state *state;
>>>>>> +   struct drm_plane *plane;
>>>>>> +   int ret = 0;
>>>>>> +
>>>>>> +   /*
>>>>>> +    * When called from ioctl, we are interruptable, but not when
>>>>>> +    * called internally (ie. defio worker)
>>>>>> +    */
>>>>>> +   drm_modeset_acquire_init(&ctx,
>>>>>> +           file_priv ? DRM_MODESET_ACQUIRE_INTERRUPTIBLE : 0);
>>>>>> +
>>>>>> +   state = drm_atomic_state_alloc(fb->dev);
>>>>>> +   if (!state) {
>>>>>> +           ret = -ENOMEM;
>>>>>> +           goto out;
>>>>>> +   }
>>>>>> +   state->acquire_ctx = &ctx;
>>>>>> +
>>>>>> +retry:
>>>>>> +   drm_for_each_plane(plane, fb->dev) {
>>>>>> +           struct drm_plane_state *plane_state;
>>>>>> +
>>>>>> +           if (plane->state->fb != fb)
>>>>>> +                   continue;
>>>>>> +
>>>>>> +           plane_state = drm_atomic_get_plane_state(state, plane);
>>>>>> +           if (IS_ERR(plane_state)) {
>>>>>> +                   ret = PTR_ERR(plane_state);
>>>>>> +                   goto out;
>>>>>> +           }
>>>>>> +
>>>>>> +           plane_state->dirty = true;
>>>>>> +   }
>>>>>> +
>>>>>> +   ret = drm_atomic_nonblocking_commit(state);
>>>>>> +
>>>>>> +out:
>>>>>> +   if (ret == -EDEADLK) {
>>>>>> +           drm_atomic_state_clear(state);
>>>>>> +           ret = drm_modeset_backoff(&ctx);
>>>>>> +           if (!ret)
>>>>>> +                   goto retry;
>>>>>> +   }
>>>>>> +
>>>>>> +   drm_atomic_state_put(state);
>>>>>> +
>>>>>> +   drm_modeset_drop_locks(&ctx);
>>>>>> +   drm_modeset_acquire_fini(&ctx);
>>>>>> +
>>>>>> +   return ret;
>>>>>> +
>>>>>> +}
>>>>>> +EXPORT_SYMBOL(drm_atomic_helper_dirtyfb);
>>>>>> +
>>>>>>  /**
>>>>>>   * __drm_atomic_helper_private_duplicate_state - copy atomic private state
>>>>>>   * @obj: CRTC object
>>>>>> diff --git a/drivers/gpu/drm/msm/msm_atomic.c b/drivers/gpu/drm/msm/msm_atomic.c
>>>>>> index bf5f8c39f34d..bb55a048e98b 100644
>>>>>> --- a/drivers/gpu/drm/msm/msm_atomic.c
>>>>>> +++ b/drivers/gpu/drm/msm/msm_atomic.c
>>>>>> @@ -201,7 +201,10 @@ int msm_atomic_commit(struct drm_device *dev,
>>>>>>      * Figure out what fence to wait for:
>>>>>>      */
>>>>>>     for_each_oldnew_plane_in_state(state, plane, old_plane_state, new_plane_state, i) {
>>>>>> -           if ((new_plane_state->fb != old_plane_state->fb) && new_plane_state->fb) {
>>>>>> +           bool sync_fb = new_plane_state->fb &&
>>>>>> +                   ((new_plane_state->fb != old_plane_state->fb) ||
>>>>>> +                    new_plane_state->dirty);
>>>>> Why do you have this optimization even here? Imo flipping to the same fb
>>>>> should result in the fb getting fully uploaded, whether you're doing a
>>>>> legacy page_flip, and atomic one or just a plane update.
>>>>>
>>>>> Iirc some userspace does use that as essentially a full-plane frontbuffer
>>>>> rendering flush already. IOW I don't think we need your
>>>>> plane_state->dirty, it's implied to always be true - why would userspace
>>>>> do a flip otherwise?
>>>>>
>>>>> The helper itself to map dirtyfb to a nonblocking atomic commit looks
>>>>> reasonable, but misses a bunch of the trickery discussed with Noralf and
>>>>> others I think.
>>>> Ok, I've done some history digging:
>>>>
>>>> - i915 and nouveau unconditionally wait for fences, even for same-fb
>>>>   flips.
>>>> - no idea what amdgpu and vmwgfx are doing, they're not using
>>>>   plane_state->fence for implicit fences.
>>> I thought plane_state->fence was used for explicit fences, so its use by drivers
>>> would interfere with it? I don't think fencing would work on msm or vc4..
>> for implicit fencing we fish out the implicit fence and stuff it in
>> plane_state->fence..
> What happens when userspace passes a fence fd to in_fence_fd?

mixing implicit sync and explicit sync is undefined

BR,
-R
--
To unsubscribe from this list: send the line "unsubscribe linux-arm-msm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Daniel Vetter April 4, 2018, 12:13 p.m. UTC | #16
On Wed, Apr 04, 2018 at 01:46:37PM +0200, Thomas Hellstrom wrote:
> On 04/04/2018 12:28 PM, Thomas Hellstrom wrote:
> > On 04/04/2018 11:56 AM, Daniel Vetter wrote:
> > > On Wed, Apr 04, 2018 at 11:10:08AM +0200, Thomas Hellstrom wrote:
> > > > On 04/04/2018 10:43 AM, Daniel Vetter wrote:
> > > > > On Wed, Apr 04, 2018 at 10:22:21AM +0200, Thomas Hellstrom wrote:
> > > > > > Hi,
> > > > > > 
> > > > > > On 04/04/2018 08:58 AM, Daniel Vetter wrote:
> > > > > > > On Wed, Apr 4, 2018 at 12:42 AM, Rob Clark
> > > > > > > <robdclark@gmail.com> wrote:
> > > > > > > > Add an atomic helper to implement dirtyfb
> > > > > > > > support.  This is needed to
> > > > > > > > support DSI command-mode panels with x11
> > > > > > > > userspace (ie. when we can't
> > > > > > > > rely on pageflips to trigger a flush to the panel).
> > > > > > > > 
> > > > > > > > To signal to the driver that the async atomic update needs to
> > > > > > > > synchronize with fences, even though the fb didn't change, the
> > > > > > > > drm_atomic_state::dirty flag is added.
> > > > > > > > 
> > > > > > > > Signed-off-by: Rob Clark <robdclark@gmail.com>
> > > > > > > > ---
> > > > > > > > Background: there are a number of different
> > > > > > > > folks working on getting
> > > > > > > > upstream kernel working on various different
> > > > > > > > phones/tablets with qcom
> > > > > > > > SoC's.. many of them have command mode panels, so we kind of need a
> > > > > > > > way to support the legacy dirtyfb ioctl for x11 support.
> > > > > > > > 
> > > > > > > > I know there is work on a proprer non-legacy atomic property for
> > > > > > > > userspace to communicate dirty-rect(s) to the kernel, so this can
> > > > > > > > be improved from triggering a full-frame flush once that is in
> > > > > > > > place.  But we kinda needa a stop-gap solution.
> > > > > > > > 
> > > > > > > > I had considered an in-driver solution for this, but things get a
> > > > > > > > bit tricky if userspace ands up combining dirtyfb ioctls with page-
> > > > > > > > flips, because we need to synchronize setting
> > > > > > > > various CTL.FLUSH bits
> > > > > > > > with setting the CTL.START bit.  (ie. really all we need to do for
> > > > > > > > cmd mode panels is bang CTL.START, but is this ends up racing with
> > > > > > > > pageflips setting FLUSH bits, then bad things.)  The easiest soln
> > > > > > > > is to wrap this up as an atomic commit and rely on the worker to
> > > > > > > > serialize things.  Hence adding an atomic dirtyfb helper.
> > > > > > > > 
> > > > > > > > I guess at least the helper, with some small addition to translate
> > > > > > > > and pass-thru the dirty rect(s) is useful to the
> > > > > > > > final atomic dirty-
> > > > > > > > rect property solution.  Depending on how far off that is, a stop-
> > > > > > > > gap solution could be useful.
> > > > > > > Adding Noralf, who iirc already posted the full
> > > > > > > dirty helpers already somewhere.
> > > > > > > -Daniel
> > > > > > I've asked Deepak to RFC the core changes suggested for
> > > > > > the full dirty blob
> > > > > > on dri-devel. It builds on DisplayLink's suggestion,
> > > > > > with a simple helper to
> > > > > > get to the desired coordinates.
> > > > > > 
> > > > > > One thing to perhaps discuss is how we would like to fit this with
> > > > > > front-buffer rendering and the dirty ioctl. In the
> > > > > > page-flip context, the
> > > > > > dirty rects, like egl's swapbuffer_with_damage is a hint
> > > > > > to restrict the
> > > > > > damage region that can be fully ignored by the driver, new content is
> > > > > > indicated by a new framebuffer.
> > > > > > 
> > > > > > We could do the same for frontbuffer rendering: Either
> > > > > > set a dirty flag like
> > > > > > you do here, or provide a content_age state member.
> > > > > > Since we clear the dirty
> > > > > > flag on state copies, I guess that would be sufficient.
> > > > > > The blob rectangles
> > > > > > would then become a hint to restrict the damage region.
> > > > > I'm not entirely following here - I thought for frontbuffer
> > > > > rendering the
> > > > > dirty rects have always just been a hint, and that the
> > > > > driver was always
> > > > > free to re-upload the entire buffer to the screen.
> > > > > 
> > > > > And through a helper like Rob's proposing here (and have
> > > > > floated around in
> > > > > different versions already) we'd essentially map a frontbuffer dirtyfb
> > > > > call to a fake flip with dirty rect. Manual upload drivers
> > > > > already need to
> > > > > upload the entire screen if they get a flip, since some userspace uses
> > > > > that to flush out frontbuffer rendering (instead of calling dirtyfb).
> > > > > 
> > > > > So from that pov the new dirty flag is kinda not necessary imo.
> > > > > 
> > > > > > Another approach would be to have the presence of dirty rects without
> > > > > > framebuffer change to indicate frontbuffer rendering.
> > > > > > 
> > > > > > I think I like the first approach best, although it may
> > > > > > be tempting for
> > > > > > user-space apps to just set the dirty bit instead of
> > > > > > providing the full
> > > > > > damage region.
> > > > > Or I'm not following you here, because I don't quite see the
> > > > > difference
> > > > > between dirtyfb and a flip.
> > > > > -Daniel
> > > > OK, let me rephrase:
> > > > 
> > > >  From the driver's point-of-view, in the atomic world, new
> > > > content and the
> > > > need for manual upload is indicated by a change in fb attached
> > > > to the plane.
> > > > 
> > > > With Rob's patch here, (correct me if I'm wrong) in addition new
> > > > content and
> > > > the need for manual upload is identified by the dirty flag,
> > > > (since the fb
> > > > stays the same and the driver thus never identifies a page-flip).
> > > Hm, I'm not entirely sure Rob's approach is correct. Imo a pageflip
> > > (atomic or not) should result in the entire buffer getting uploaded. The
> > > dirty flag is kinda redundant, a flip with the same buffer works the
> > > same
> > > way as a dirtyfb with the entire buffer as the dirty rectangle.
> > > 
> > > > In both these situations, clip rects can provide a hint to restrict the
> > > > dirty region.
> > > > 
> > > > Now I was thinking about the preferred way for user-space to
> > > > communicate
> > > > front buffer rendering through the atomic ioctl:
> > > > 
> > > > 1) Expose a dirty (or content_age property)
> > > > 2) Attach a clip-rect blob property.
> > > > 3) Fake a page-flip by ping-ponging two frame-buffers pointing
> > > > to the same
> > > > underlying buffer object.
> > > > 
> > > > Are you saying that people are already using 3) and we should
> > > > keep using
> > > > that?
> > > I'm saying they're using 3b), flip the same buffer wrapped in the same
> > > drm_framebuffer, and expect it to work.
> > > 
> > > The only advantage dirtyfb has is that it allows you to supply the
> > > optimized upload rectangles, but at the cost of a funny api (it's
> > > working
> > > on the fb, not the plane/crtc you want to upload) and lack of
> > > drm_event to
> > > confirm when exactly you uploaded your stuff. But imo they should be the
> > > same underlying operation.
> > > 
> > > Also note that atomic helpers don't optimize out plane flips for same
> > > buffers. We only optimize out some of the waiting, in a failed
> > > attempt at
> > > making cursors stall less, but that's not fixed with the async plane
> > > update stuff. And we can obviously optimize out the prepare/cleanup
> > > hooks,
> > > because the buffer should be pinned already.
> > > 
> > 
> > I'm a bit confused.
> > 
> > Apparently we have different opinions in when an uploading driver should
> > assume altered plane content and the need for re-upload.
> > vmwgfx is from what I know currently assuming that this happens only on
> > changed fb attachment (what we call a page-flip) whereas if I understand
> > you correctly it should happen on each atomic state commit?
> > 
> > If we should assume the latter, then it has odd implications, let's say
> > you have 8 screens up, and you pan one of them on the large fb, why
> > would you upload the contents of the other 7?
> > 
> > Likewise for cursors,  why would you want to upload the cursor image on
> > each cursor move?
> > 
> > So in my POW, option 1) is the option that aligns with the current
> > vmwgfx implementation and from what I can tell, what Rob has implemented
> > in his patch.
> > 
> 
> Hmm.
> 
> After doing some apparently well needed reading up on the code it looks like
> vmwgfx is actually doing a full upload on each plane state change,
> only those planes that actually got changed are referenced in the update. So
> that takes care of the panning example, assuming user-space is smart enough
> to leave the unchanged planes / crtcs out of the update.
> 
> However, the cursor example still holds, and IMHO we should have a better
> way to define content change than plane state update...

We're definitely inconsistent here. I guess for manual upload drivers on
real hw panning isn't a note-worthy special case, since if you pan you
need to upload the entire damaged area on the screen anyway. But virtual
hw drivers like vmwgfx are different this way, since panning only means
you transmit the new position to the host driver, but don't have to upload
the entire thing.

And the legacy cursor ioctl seems to be the only thing with that
expectation (although I think funny experiments on i915 used gpu
rendering for that too).

Simply for robustness reasons I'd say that with maybe the exception of
cursors, we should assume that a flip must upload the entire buffer that's
getting flipped, since that's how it works on most drivers.

And then we can add the dirty rectangle stuff so that clever userspace and
clever drives can optimize this all again. And preferrably the cursor
trick would be implemented by having an empty dirty rect, instead of one
that spans the entire fb.
-Daniel

> 
> /Thomas
> 
> 
> 
> > /Thomas
> > 
> > 
> 
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel
Daniel Vetter April 4, 2018, 12:16 p.m. UTC | #17
On Wed, Apr 04, 2018 at 07:40:32AM -0400, Rob Clark wrote:
> On Wed, Apr 4, 2018 at 5:56 AM, Daniel Vetter <daniel@ffwll.ch> wrote:
> > On Wed, Apr 04, 2018 at 11:10:08AM +0200, Thomas Hellstrom wrote:
> >> On 04/04/2018 10:43 AM, Daniel Vetter wrote:
> >> > On Wed, Apr 04, 2018 at 10:22:21AM +0200, Thomas Hellstrom wrote:
> >> > > Hi,
> >> > >
> >> > > On 04/04/2018 08:58 AM, Daniel Vetter wrote:
> >> > > > On Wed, Apr 4, 2018 at 12:42 AM, Rob Clark <robdclark@gmail.com> wrote:
> >> > > > > Add an atomic helper to implement dirtyfb support.  This is needed to
> >> > > > > support DSI command-mode panels with x11 userspace (ie. when we can't
> >> > > > > rely on pageflips to trigger a flush to the panel).
> >> > > > >
> >> > > > > To signal to the driver that the async atomic update needs to
> >> > > > > synchronize with fences, even though the fb didn't change, the
> >> > > > > drm_atomic_state::dirty flag is added.
> >> > > > >
> >> > > > > Signed-off-by: Rob Clark <robdclark@gmail.com>
> >> > > > > ---
> >> > > > > Background: there are a number of different folks working on getting
> >> > > > > upstream kernel working on various different phones/tablets with qcom
> >> > > > > SoC's.. many of them have command mode panels, so we kind of need a
> >> > > > > way to support the legacy dirtyfb ioctl for x11 support.
> >> > > > >
> >> > > > > I know there is work on a proprer non-legacy atomic property for
> >> > > > > userspace to communicate dirty-rect(s) to the kernel, so this can
> >> > > > > be improved from triggering a full-frame flush once that is in
> >> > > > > place.  But we kinda needa a stop-gap solution.
> >> > > > >
> >> > > > > I had considered an in-driver solution for this, but things get a
> >> > > > > bit tricky if userspace ands up combining dirtyfb ioctls with page-
> >> > > > > flips, because we need to synchronize setting various CTL.FLUSH bits
> >> > > > > with setting the CTL.START bit.  (ie. really all we need to do for
> >> > > > > cmd mode panels is bang CTL.START, but is this ends up racing with
> >> > > > > pageflips setting FLUSH bits, then bad things.)  The easiest soln
> >> > > > > is to wrap this up as an atomic commit and rely on the worker to
> >> > > > > serialize things.  Hence adding an atomic dirtyfb helper.
> >> > > > >
> >> > > > > I guess at least the helper, with some small addition to translate
> >> > > > > and pass-thru the dirty rect(s) is useful to the final atomic dirty-
> >> > > > > rect property solution.  Depending on how far off that is, a stop-
> >> > > > > gap solution could be useful.
> >> > > > Adding Noralf, who iirc already posted the full dirty helpers already somewhere.
> >> > > > -Daniel
> >> > > I've asked Deepak to RFC the core changes suggested for the full dirty blob
> >> > > on dri-devel. It builds on DisplayLink's suggestion, with a simple helper to
> >> > > get to the desired coordinates.
> >> > >
> >> > > One thing to perhaps discuss is how we would like to fit this with
> >> > > front-buffer rendering and the dirty ioctl. In the page-flip context, the
> >> > > dirty rects, like egl's swapbuffer_with_damage is a hint to restrict the
> >> > > damage region that can be fully ignored by the driver, new content is
> >> > > indicated by a new framebuffer.
> >> > >
> >> > > We could do the same for frontbuffer rendering: Either set a dirty flag like
> >> > > you do here, or provide a content_age state member. Since we clear the dirty
> >> > > flag on state copies, I guess that would be sufficient. The blob rectangles
> >> > > would then become a hint to restrict the damage region.
> >> > I'm not entirely following here - I thought for frontbuffer rendering the
> >> > dirty rects have always just been a hint, and that the driver was always
> >> > free to re-upload the entire buffer to the screen.
> >> >
> >> > And through a helper like Rob's proposing here (and have floated around in
> >> > different versions already) we'd essentially map a frontbuffer dirtyfb
> >> > call to a fake flip with dirty rect. Manual upload drivers already need to
> >> > upload the entire screen if they get a flip, since some userspace uses
> >> > that to flush out frontbuffer rendering (instead of calling dirtyfb).
> >> >
> >> > So from that pov the new dirty flag is kinda not necessary imo.
> >> >
> >> > > Another approach would be to have the presence of dirty rects without
> >> > > framebuffer change to indicate frontbuffer rendering.
> >> > >
> >> > > I think I like the first approach best, although it may be tempting for
> >> > > user-space apps to just set the dirty bit instead of providing the full
> >> > > damage region.
> >> > Or I'm not following you here, because I don't quite see the difference
> >> > between dirtyfb and a flip.
> >> > -Daniel
> >>
> >> OK, let me rephrase:
> >>
> >> From the driver's point-of-view, in the atomic world, new content and the
> >> need for manual upload is indicated by a change in fb attached to the plane.
> >>
> >> With Rob's patch here, (correct me if I'm wrong) in addition new content and
> >> the need for manual upload is identified by the dirty flag, (since the fb
> >> stays the same and the driver thus never identifies a page-flip).
> >
> > Hm, I'm not entirely sure Rob's approach is correct. Imo a pageflip
> > (atomic or not) should result in the entire buffer getting uploaded. The
> > dirty flag is kinda redundant, a flip with the same buffer works the same
> > way as a dirtyfb with the entire buffer as the dirty rectangle.
> 
> Userspace could do a pageflip, but (with buffer-age extension) only
> re-render part of the frame.  So there is a use-case for full frame
> pageflip with hints about what region(s) changed since previous frame.
> 
> (Of course userspace could screw that up and get different results on
> "command" vs "video" style display.. in that case, it gets to keep
> both pieces)

I'm not against dirty on flips with the same buffer as an optimization.
Disagreement here is around whether a flip to the same buffer means:
a) nothing changed, upload nothing
b) entire buffer might have changed

I'm proposing we standardize on b) (with maybe the exception of legacy
cursor ioctls because those are special) and add the dirty rects as an
option to optimize this more.

Either way you'd always have to follow the implicit fence (which is what
your msm hunk seems to be doing, but only for dirtyfb).
-Daniel

> 
> BR,
> -R
> 
> 
> >> In both these situations, clip rects can provide a hint to restrict the
> >> dirty region.
> >>
> >> Now I was thinking about the preferred way for user-space to communicate
> >> front buffer rendering through the atomic ioctl:
> >>
> >> 1) Expose a dirty (or content_age property)
> >> 2) Attach a clip-rect blob property.
> >> 3) Fake a page-flip by ping-ponging two frame-buffers pointing to the same
> >> underlying buffer object.
> >>
> >> Are you saying that people are already using 3) and we should keep using
> >> that?
> >
> > I'm saying they're using 3b), flip the same buffer wrapped in the same
> > drm_framebuffer, and expect it to work.
> >
> > The only advantage dirtyfb has is that it allows you to supply the
> > optimized upload rectangles, but at the cost of a funny api (it's working
> > on the fb, not the plane/crtc you want to upload) and lack of drm_event to
> > confirm when exactly you uploaded your stuff. But imo they should be the
> > same underlying operation.
> >
> > Also note that atomic helpers don't optimize out plane flips for same
> > buffers. We only optimize out some of the waiting, in a failed attempt at
> > making cursors stall less, but that's not fixed with the async plane
> > update stuff. And we can obviously optimize out the prepare/cleanup hooks,
> > because the buffer should be pinned already.
> >
> > So imo for a quick fix I think we need:
> > 1) Fix drivers (or at least msm) to upload buffers for manual upload on
> > all flips (well, all screen updates). Probably should do the fence waiting
> > unconditionally too (and handle cursors with the new async stuff).
> > 2) Tiny helper that remaps a dirtyfb call into a nonblocking atomic
> > commit, which currently means we're throwing the dirty rect optimization
> > into the wind. But that could easily be added to the drm_plane_state,
> > without exposing it to userspace as a property just yet.
> >
> > Cheers, Daniel
> > --
> > Daniel Vetter
> > Software Engineer, Intel Corporation
> > http://blog.ffwll.ch
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel
Daniel Vetter April 4, 2018, 12:18 p.m. UTC | #18
On Wed, Apr 04, 2018 at 07:35:58AM -0400, Rob Clark wrote:
> On Wed, Apr 4, 2018 at 6:21 AM, Daniel Vetter <daniel@ffwll.ch> wrote:
> > On Wed, Apr 04, 2018 at 12:03:00PM +0200, Daniel Vetter wrote:
> >> On Tue, Apr 03, 2018 at 06:42:23PM -0400, Rob Clark wrote:
> >> > Add an atomic helper to implement dirtyfb support.  This is needed to
> >> > support DSI command-mode panels with x11 userspace (ie. when we can't
> >> > rely on pageflips to trigger a flush to the panel).
> >> >
> >> > To signal to the driver that the async atomic update needs to
> >> > synchronize with fences, even though the fb didn't change, the
> >> > drm_atomic_state::dirty flag is added.
> >> >
> >> > Signed-off-by: Rob Clark <robdclark@gmail.com>
> >> > ---
> >> > Background: there are a number of different folks working on getting
> >> > upstream kernel working on various different phones/tablets with qcom
> >> > SoC's.. many of them have command mode panels, so we kind of need a
> >> > way to support the legacy dirtyfb ioctl for x11 support.
> >> >
> >> > I know there is work on a proprer non-legacy atomic property for
> >> > userspace to communicate dirty-rect(s) to the kernel, so this can
> >> > be improved from triggering a full-frame flush once that is in
> >> > place.  But we kinda needa a stop-gap solution.
> >> >
> >> > I had considered an in-driver solution for this, but things get a
> >> > bit tricky if userspace ands up combining dirtyfb ioctls with page-
> >> > flips, because we need to synchronize setting various CTL.FLUSH bits
> >> > with setting the CTL.START bit.  (ie. really all we need to do for
> >> > cmd mode panels is bang CTL.START, but is this ends up racing with
> >> > pageflips setting FLUSH bits, then bad things.)  The easiest soln
> >> > is to wrap this up as an atomic commit and rely on the worker to
> >> > serialize things.  Hence adding an atomic dirtyfb helper.
> >> >
> >> > I guess at least the helper, with some small addition to translate
> >> > and pass-thru the dirty rect(s) is useful to the final atomic dirty-
> >> > rect property solution.  Depending on how far off that is, a stop-
> >> > gap solution could be useful.
> >> >
> >> >  drivers/gpu/drm/drm_atomic_helper.c | 66 +++++++++++++++++++++++++++++++++++++
> >> >  drivers/gpu/drm/msm/msm_atomic.c    |  5 ++-
> >> >  drivers/gpu/drm/msm/msm_fb.c        |  1 +
> >> >  include/drm/drm_atomic_helper.h     |  4 +++
> >> >  include/drm/drm_plane.h             |  9 +++++
> >> >  5 files changed, 84 insertions(+), 1 deletion(-)
> >> >
> >> > diff --git a/drivers/gpu/drm/drm_atomic_helper.c b/drivers/gpu/drm/drm_atomic_helper.c
> >> > index c35654591c12..a578dc681b27 100644
> >> > --- a/drivers/gpu/drm/drm_atomic_helper.c
> >> > +++ b/drivers/gpu/drm/drm_atomic_helper.c
> >> > @@ -3504,6 +3504,7 @@ void __drm_atomic_helper_plane_duplicate_state(struct drm_plane *plane,
> >> >     if (state->fb)
> >> >             drm_framebuffer_get(state->fb);
> >> >
> >> > +   state->dirty = false;
> >> >     state->fence = NULL;
> >> >     state->commit = NULL;
> >> >  }
> >> > @@ -3847,6 +3848,71 @@ int drm_atomic_helper_legacy_gamma_set(struct drm_crtc *crtc,
> >> >  }
> >> >  EXPORT_SYMBOL(drm_atomic_helper_legacy_gamma_set);
> >> >
> >> > +/**
> >> > + * drm_atomic_helper_dirtyfb - helper for dirtyfb
> >> > + *
> >> > + * A helper to implement drm_framebuffer_funcs::dirty
> >> > + */
> >> > +int drm_atomic_helper_dirtyfb(struct drm_framebuffer *fb,
> >> > +                         struct drm_file *file_priv, unsigned flags,
> >> > +                         unsigned color, struct drm_clip_rect *clips,
> >> > +                         unsigned num_clips)
> >> > +{
> >> > +   struct drm_modeset_acquire_ctx ctx;
> >> > +   struct drm_atomic_state *state;
> >> > +   struct drm_plane *plane;
> >> > +   int ret = 0;
> >> > +
> >> > +   /*
> >> > +    * When called from ioctl, we are interruptable, but not when
> >> > +    * called internally (ie. defio worker)
> >> > +    */
> >> > +   drm_modeset_acquire_init(&ctx,
> >> > +           file_priv ? DRM_MODESET_ACQUIRE_INTERRUPTIBLE : 0);
> >> > +
> >> > +   state = drm_atomic_state_alloc(fb->dev);
> >> > +   if (!state) {
> >> > +           ret = -ENOMEM;
> >> > +           goto out;
> >> > +   }
> >> > +   state->acquire_ctx = &ctx;
> >> > +
> >> > +retry:
> >> > +   drm_for_each_plane(plane, fb->dev) {
> >> > +           struct drm_plane_state *plane_state;
> >> > +
> >> > +           if (plane->state->fb != fb)
> >> > +                   continue;
> >> > +
> >> > +           plane_state = drm_atomic_get_plane_state(state, plane);
> >> > +           if (IS_ERR(plane_state)) {
> >> > +                   ret = PTR_ERR(plane_state);
> >> > +                   goto out;
> >> > +           }
> >> > +
> >> > +           plane_state->dirty = true;
> >> > +   }
> >> > +
> >> > +   ret = drm_atomic_nonblocking_commit(state);
> >> > +
> >> > +out:
> >> > +   if (ret == -EDEADLK) {
> >> > +           drm_atomic_state_clear(state);
> >> > +           ret = drm_modeset_backoff(&ctx);
> >> > +           if (!ret)
> >> > +                   goto retry;
> >> > +   }
> >> > +
> >> > +   drm_atomic_state_put(state);
> >> > +
> >> > +   drm_modeset_drop_locks(&ctx);
> >> > +   drm_modeset_acquire_fini(&ctx);
> >> > +
> >> > +   return ret;
> >> > +
> >> > +}
> >> > +EXPORT_SYMBOL(drm_atomic_helper_dirtyfb);
> >> > +
> >> >  /**
> >> >   * __drm_atomic_helper_private_duplicate_state - copy atomic private state
> >> >   * @obj: CRTC object
> >> > diff --git a/drivers/gpu/drm/msm/msm_atomic.c b/drivers/gpu/drm/msm/msm_atomic.c
> >> > index bf5f8c39f34d..bb55a048e98b 100644
> >> > --- a/drivers/gpu/drm/msm/msm_atomic.c
> >> > +++ b/drivers/gpu/drm/msm/msm_atomic.c
> >> > @@ -201,7 +201,10 @@ int msm_atomic_commit(struct drm_device *dev,
> >> >      * Figure out what fence to wait for:
> >> >      */
> >> >     for_each_oldnew_plane_in_state(state, plane, old_plane_state, new_plane_state, i) {
> >> > -           if ((new_plane_state->fb != old_plane_state->fb) && new_plane_state->fb) {
> >> > +           bool sync_fb = new_plane_state->fb &&
> >> > +                   ((new_plane_state->fb != old_plane_state->fb) ||
> >> > +                    new_plane_state->dirty);
> >>
> >> Why do you have this optimization even here? Imo flipping to the same fb
> >> should result in the fb getting fully uploaded, whether you're doing a
> >> legacy page_flip, and atomic one or just a plane update.
> >>
> >> Iirc some userspace does use that as essentially a full-plane frontbuffer
> >> rendering flush already. IOW I don't think we need your
> >> plane_state->dirty, it's implied to always be true - why would userspace
> >> do a flip otherwise?
> >>
> >> The helper itself to map dirtyfb to a nonblocking atomic commit looks
> >> reasonable, but misses a bunch of the trickery discussed with Noralf and
> >> others I think.
> >
> > Ok, I've done some history digging:
> >
> > - i915 and nouveau unconditionally wait for fences, even for same-fb
> >   flips.
> > - no idea what amdgpu and vmwgfx are doing, they're not using
> >   plane_state->fence for implicit fences.
> > - most arm-soc drivers do have this "optimization" in their code, and it
> >   even managed to get into the new drm_gem_fb_prepare_fb helper (which I
> >   reviewed, or well claimed to have ... oops). Afaict it goes back to the
> >   original msm atomic code, and was then dutifully copypasted all over the
> >   place.
> >
> > If folks are ok I'll do a patch series to align drivers with i915/nouveau.
> > Well, any driver using reservation_object_get_excl_rcu +
> > drm_atomic_set_fence_for_plane combo, since amdgpu and vmwgfx don't I have
> > no idea what they're doing or whether they might have the same bug.
> 
> if you do a patch series, I think do it the other way around and align
> to msm ;-)

The problem is that this would break i915 userspace afaik, which I think
is using a frontbuffer page flip to flush out uploads (and re-enable self
refresh and stuff like that). So that we can't do easily.

> I think otherwise we could end up stalling while x11 does front buffer
> rendering for something unrelated (maybe I was hitting that with
> cursor updates?)

Why would you stall on something unrelated if we unconditionally fish out
the fence? Especially cursor rendering isn't done through the gpu at all
afaik.
-Daniel

> 
> BR,
> -R
> 
> > From looking at at least the various prepare_fb callbacks I don't see any
> > other drivers doing funny stuff around implicit fences.
> > -Daniel
> >
> >> > +           if (sync_fb) {
> >> >                     struct drm_gem_object *obj = msm_framebuffer_bo(new_plane_state->fb, 0);
> >> >                     struct msm_gem_object *msm_obj = to_msm_bo(obj);
> >> >                     struct dma_fence *fence = reservation_object_get_excl_rcu(msm_obj->resv);
> >> > diff --git a/drivers/gpu/drm/msm/msm_fb.c b/drivers/gpu/drm/msm/msm_fb.c
> >> > index 0e0c87252ab0..a5d882a34a33 100644
> >> > --- a/drivers/gpu/drm/msm/msm_fb.c
> >> > +++ b/drivers/gpu/drm/msm/msm_fb.c
> >> > @@ -62,6 +62,7 @@ static void msm_framebuffer_destroy(struct drm_framebuffer *fb)
> >> >  static const struct drm_framebuffer_funcs msm_framebuffer_funcs = {
> >> >     .create_handle = msm_framebuffer_create_handle,
> >> >     .destroy = msm_framebuffer_destroy,
> >> > +   .dirty = drm_atomic_helper_dirtyfb,
> >> >  };
> >> >
> >> >  #ifdef CONFIG_DEBUG_FS
> >> > diff --git a/include/drm/drm_atomic_helper.h b/include/drm/drm_atomic_helper.h
> >> > index 26aaba58d6ce..9b7a95c2643d 100644
> >> > --- a/include/drm/drm_atomic_helper.h
> >> > +++ b/include/drm/drm_atomic_helper.h
> >> > @@ -183,6 +183,10 @@ int drm_atomic_helper_legacy_gamma_set(struct drm_crtc *crtc,
> >> >                                    u16 *red, u16 *green, u16 *blue,
> >> >                                    uint32_t size,
> >> >                                    struct drm_modeset_acquire_ctx *ctx);
> >> > +int drm_atomic_helper_dirtyfb(struct drm_framebuffer *fb,
> >> > +                         struct drm_file *file_priv, unsigned flags,
> >> > +                         unsigned color, struct drm_clip_rect *clips,
> >> > +                         unsigned num_clips);
> >> >  void __drm_atomic_helper_private_obj_duplicate_state(struct drm_private_obj *obj,
> >> >                                                  struct drm_private_state *state);
> >> >
> >> > diff --git a/include/drm/drm_plane.h b/include/drm/drm_plane.h
> >> > index f7bf4a48b1c3..296fa22bda7a 100644
> >> > --- a/include/drm/drm_plane.h
> >> > +++ b/include/drm/drm_plane.h
> >> > @@ -76,6 +76,15 @@ struct drm_plane_state {
> >> >      */
> >> >     struct drm_framebuffer *fb;
> >> >
> >> > +   /**
> >> > +    * @dirty:
> >> > +    *
> >> > +    * Flag that indicates the fb contents have changed even though the
> >> > +    * fb has not.  This is mostly a stop-gap solution until we have
> >> > +    * atomic dirty-rect(s) property.
> >> > +    */
> >> > +   bool dirty;
> >> > +
> >> >     /**
> >> >      * @fence:
> >> >      *
> >> > --
> >> > 2.14.3
> >> >
> >>
> >> --
> >> Daniel Vetter
> >> Software Engineer, Intel Corporation
> >> http://blog.ffwll.ch
> >
> > --
> > Daniel Vetter
> > Software Engineer, Intel Corporation
> > http://blog.ffwll.ch
Maarten Lankhorst April 4, 2018, 12:23 p.m. UTC | #19
Op 04-04-18 om 14:05 schreef Rob Clark:
> On Wed, Apr 4, 2018 at 7:49 AM, Maarten Lankhorst
> <maarten.lankhorst@linux.intel.com> wrote:
>> Op 04-04-18 om 13:37 schreef Rob Clark:
>>> On Wed, Apr 4, 2018 at 6:36 AM, Maarten Lankhorst
>>> <maarten.lankhorst@linux.intel.com> wrote:
>>>> Op 04-04-18 om 12:21 schreef Daniel Vetter:
>>>>> On Wed, Apr 04, 2018 at 12:03:00PM +0200, Daniel Vetter wrote:
>>>>>> On Tue, Apr 03, 2018 at 06:42:23PM -0400, Rob Clark wrote:
>>>>>>> Add an atomic helper to implement dirtyfb support.  This is needed to
>>>>>>> support DSI command-mode panels with x11 userspace (ie. when we can't
>>>>>>> rely on pageflips to trigger a flush to the panel).
>>>>>>>
>>>>>>> To signal to the driver that the async atomic update needs to
>>>>>>> synchronize with fences, even though the fb didn't change, the
>>>>>>> drm_atomic_state::dirty flag is added.
>>>>>>>
>>>>>>> Signed-off-by: Rob Clark <robdclark@gmail.com>
>>>>>>> ---
>>>>>>> Background: there are a number of different folks working on getting
>>>>>>> upstream kernel working on various different phones/tablets with qcom
>>>>>>> SoC's.. many of them have command mode panels, so we kind of need a
>>>>>>> way to support the legacy dirtyfb ioctl for x11 support.
>>>>>>>
>>>>>>> I know there is work on a proprer non-legacy atomic property for
>>>>>>> userspace to communicate dirty-rect(s) to the kernel, so this can
>>>>>>> be improved from triggering a full-frame flush once that is in
>>>>>>> place.  But we kinda needa a stop-gap solution.
>>>>>>>
>>>>>>> I had considered an in-driver solution for this, but things get a
>>>>>>> bit tricky if userspace ands up combining dirtyfb ioctls with page-
>>>>>>> flips, because we need to synchronize setting various CTL.FLUSH bits
>>>>>>> with setting the CTL.START bit.  (ie. really all we need to do for
>>>>>>> cmd mode panels is bang CTL.START, but is this ends up racing with
>>>>>>> pageflips setting FLUSH bits, then bad things.)  The easiest soln
>>>>>>> is to wrap this up as an atomic commit and rely on the worker to
>>>>>>> serialize things.  Hence adding an atomic dirtyfb helper.
>>>>>>>
>>>>>>> I guess at least the helper, with some small addition to translate
>>>>>>> and pass-thru the dirty rect(s) is useful to the final atomic dirty-
>>>>>>> rect property solution.  Depending on how far off that is, a stop-
>>>>>>> gap solution could be useful.
>>>>>>>
>>>>>>>  drivers/gpu/drm/drm_atomic_helper.c | 66 +++++++++++++++++++++++++++++++++++++
>>>>>>>  drivers/gpu/drm/msm/msm_atomic.c    |  5 ++-
>>>>>>>  drivers/gpu/drm/msm/msm_fb.c        |  1 +
>>>>>>>  include/drm/drm_atomic_helper.h     |  4 +++
>>>>>>>  include/drm/drm_plane.h             |  9 +++++
>>>>>>>  5 files changed, 84 insertions(+), 1 deletion(-)
>>>>>>>
>>>>>>> diff --git a/drivers/gpu/drm/drm_atomic_helper.c b/drivers/gpu/drm/drm_atomic_helper.c
>>>>>>> index c35654591c12..a578dc681b27 100644
>>>>>>> --- a/drivers/gpu/drm/drm_atomic_helper.c
>>>>>>> +++ b/drivers/gpu/drm/drm_atomic_helper.c
>>>>>>> @@ -3504,6 +3504,7 @@ void __drm_atomic_helper_plane_duplicate_state(struct drm_plane *plane,
>>>>>>>     if (state->fb)
>>>>>>>             drm_framebuffer_get(state->fb);
>>>>>>>
>>>>>>> +   state->dirty = false;
>>>>>>>     state->fence = NULL;
>>>>>>>     state->commit = NULL;
>>>>>>>  }
>>>>>>> @@ -3847,6 +3848,71 @@ int drm_atomic_helper_legacy_gamma_set(struct drm_crtc *crtc,
>>>>>>>  }
>>>>>>>  EXPORT_SYMBOL(drm_atomic_helper_legacy_gamma_set);
>>>>>>>
>>>>>>> +/**
>>>>>>> + * drm_atomic_helper_dirtyfb - helper for dirtyfb
>>>>>>> + *
>>>>>>> + * A helper to implement drm_framebuffer_funcs::dirty
>>>>>>> + */
>>>>>>> +int drm_atomic_helper_dirtyfb(struct drm_framebuffer *fb,
>>>>>>> +                         struct drm_file *file_priv, unsigned flags,
>>>>>>> +                         unsigned color, struct drm_clip_rect *clips,
>>>>>>> +                         unsigned num_clips)
>>>>>>> +{
>>>>>>> +   struct drm_modeset_acquire_ctx ctx;
>>>>>>> +   struct drm_atomic_state *state;
>>>>>>> +   struct drm_plane *plane;
>>>>>>> +   int ret = 0;
>>>>>>> +
>>>>>>> +   /*
>>>>>>> +    * When called from ioctl, we are interruptable, but not when
>>>>>>> +    * called internally (ie. defio worker)
>>>>>>> +    */
>>>>>>> +   drm_modeset_acquire_init(&ctx,
>>>>>>> +           file_priv ? DRM_MODESET_ACQUIRE_INTERRUPTIBLE : 0);
>>>>>>> +
>>>>>>> +   state = drm_atomic_state_alloc(fb->dev);
>>>>>>> +   if (!state) {
>>>>>>> +           ret = -ENOMEM;
>>>>>>> +           goto out;
>>>>>>> +   }
>>>>>>> +   state->acquire_ctx = &ctx;
>>>>>>> +
>>>>>>> +retry:
>>>>>>> +   drm_for_each_plane(plane, fb->dev) {
>>>>>>> +           struct drm_plane_state *plane_state;
>>>>>>> +
>>>>>>> +           if (plane->state->fb != fb)
>>>>>>> +                   continue;
>>>>>>> +
>>>>>>> +           plane_state = drm_atomic_get_plane_state(state, plane);
>>>>>>> +           if (IS_ERR(plane_state)) {
>>>>>>> +                   ret = PTR_ERR(plane_state);
>>>>>>> +                   goto out;
>>>>>>> +           }
>>>>>>> +
>>>>>>> +           plane_state->dirty = true;
>>>>>>> +   }
>>>>>>> +
>>>>>>> +   ret = drm_atomic_nonblocking_commit(state);
>>>>>>> +
>>>>>>> +out:
>>>>>>> +   if (ret == -EDEADLK) {
>>>>>>> +           drm_atomic_state_clear(state);
>>>>>>> +           ret = drm_modeset_backoff(&ctx);
>>>>>>> +           if (!ret)
>>>>>>> +                   goto retry;
>>>>>>> +   }
>>>>>>> +
>>>>>>> +   drm_atomic_state_put(state);
>>>>>>> +
>>>>>>> +   drm_modeset_drop_locks(&ctx);
>>>>>>> +   drm_modeset_acquire_fini(&ctx);
>>>>>>> +
>>>>>>> +   return ret;
>>>>>>> +
>>>>>>> +}
>>>>>>> +EXPORT_SYMBOL(drm_atomic_helper_dirtyfb);
>>>>>>> +
>>>>>>>  /**
>>>>>>>   * __drm_atomic_helper_private_duplicate_state - copy atomic private state
>>>>>>>   * @obj: CRTC object
>>>>>>> diff --git a/drivers/gpu/drm/msm/msm_atomic.c b/drivers/gpu/drm/msm/msm_atomic.c
>>>>>>> index bf5f8c39f34d..bb55a048e98b 100644
>>>>>>> --- a/drivers/gpu/drm/msm/msm_atomic.c
>>>>>>> +++ b/drivers/gpu/drm/msm/msm_atomic.c
>>>>>>> @@ -201,7 +201,10 @@ int msm_atomic_commit(struct drm_device *dev,
>>>>>>>      * Figure out what fence to wait for:
>>>>>>>      */
>>>>>>>     for_each_oldnew_plane_in_state(state, plane, old_plane_state, new_plane_state, i) {
>>>>>>> -           if ((new_plane_state->fb != old_plane_state->fb) && new_plane_state->fb) {
>>>>>>> +           bool sync_fb = new_plane_state->fb &&
>>>>>>> +                   ((new_plane_state->fb != old_plane_state->fb) ||
>>>>>>> +                    new_plane_state->dirty);
>>>>>> Why do you have this optimization even here? Imo flipping to the same fb
>>>>>> should result in the fb getting fully uploaded, whether you're doing a
>>>>>> legacy page_flip, and atomic one or just a plane update.
>>>>>>
>>>>>> Iirc some userspace does use that as essentially a full-plane frontbuffer
>>>>>> rendering flush already. IOW I don't think we need your
>>>>>> plane_state->dirty, it's implied to always be true - why would userspace
>>>>>> do a flip otherwise?
>>>>>>
>>>>>> The helper itself to map dirtyfb to a nonblocking atomic commit looks
>>>>>> reasonable, but misses a bunch of the trickery discussed with Noralf and
>>>>>> others I think.
>>>>> Ok, I've done some history digging:
>>>>>
>>>>> - i915 and nouveau unconditionally wait for fences, even for same-fb
>>>>>   flips.
>>>>> - no idea what amdgpu and vmwgfx are doing, they're not using
>>>>>   plane_state->fence for implicit fences.
>>>> I thought plane_state->fence was used for explicit fences, so its use by drivers
>>>> would interfere with it? I don't think fencing would work on msm or vc4..
>>> for implicit fencing we fish out the implicit fence and stuff it in
>>> plane_state->fence..
>> What happens when userspace passes a fence fd to in_fence_fd?
> mixing implicit sync and explicit sync is undefined
So a drivers implicit sync may override the explicit sync? Would make more sense if the fences were merged.
--
To unsubscribe from this list: send the line "unsubscribe linux-arm-msm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Daniel Vetter April 4, 2018, 12:26 p.m. UTC | #20
On Wed, Apr 4, 2018 at 2:05 PM, Rob Clark <robdclark@gmail.com> wrote:
> On Wed, Apr 4, 2018 at 7:49 AM, Maarten Lankhorst
> <maarten.lankhorst@linux.intel.com> wrote:
>> Op 04-04-18 om 13:37 schreef Rob Clark:
>>> On Wed, Apr 4, 2018 at 6:36 AM, Maarten Lankhorst
>>> <maarten.lankhorst@linux.intel.com> wrote:
>>>> Op 04-04-18 om 12:21 schreef Daniel Vetter:
>>>>> On Wed, Apr 04, 2018 at 12:03:00PM +0200, Daniel Vetter wrote:
>>>>>> On Tue, Apr 03, 2018 at 06:42:23PM -0400, Rob Clark wrote:
>>>>>>> Add an atomic helper to implement dirtyfb support.  This is needed to
>>>>>>> support DSI command-mode panels with x11 userspace (ie. when we can't
>>>>>>> rely on pageflips to trigger a flush to the panel).
>>>>>>>
>>>>>>> To signal to the driver that the async atomic update needs to
>>>>>>> synchronize with fences, even though the fb didn't change, the
>>>>>>> drm_atomic_state::dirty flag is added.
>>>>>>>
>>>>>>> Signed-off-by: Rob Clark <robdclark@gmail.com>
>>>>>>> ---
>>>>>>> Background: there are a number of different folks working on getting
>>>>>>> upstream kernel working on various different phones/tablets with qcom
>>>>>>> SoC's.. many of them have command mode panels, so we kind of need a
>>>>>>> way to support the legacy dirtyfb ioctl for x11 support.
>>>>>>>
>>>>>>> I know there is work on a proprer non-legacy atomic property for
>>>>>>> userspace to communicate dirty-rect(s) to the kernel, so this can
>>>>>>> be improved from triggering a full-frame flush once that is in
>>>>>>> place.  But we kinda needa a stop-gap solution.
>>>>>>>
>>>>>>> I had considered an in-driver solution for this, but things get a
>>>>>>> bit tricky if userspace ands up combining dirtyfb ioctls with page-
>>>>>>> flips, because we need to synchronize setting various CTL.FLUSH bits
>>>>>>> with setting the CTL.START bit.  (ie. really all we need to do for
>>>>>>> cmd mode panels is bang CTL.START, but is this ends up racing with
>>>>>>> pageflips setting FLUSH bits, then bad things.)  The easiest soln
>>>>>>> is to wrap this up as an atomic commit and rely on the worker to
>>>>>>> serialize things.  Hence adding an atomic dirtyfb helper.
>>>>>>>
>>>>>>> I guess at least the helper, with some small addition to translate
>>>>>>> and pass-thru the dirty rect(s) is useful to the final atomic dirty-
>>>>>>> rect property solution.  Depending on how far off that is, a stop-
>>>>>>> gap solution could be useful.
>>>>>>>
>>>>>>>  drivers/gpu/drm/drm_atomic_helper.c | 66 +++++++++++++++++++++++++++++++++++++
>>>>>>>  drivers/gpu/drm/msm/msm_atomic.c    |  5 ++-
>>>>>>>  drivers/gpu/drm/msm/msm_fb.c        |  1 +
>>>>>>>  include/drm/drm_atomic_helper.h     |  4 +++
>>>>>>>  include/drm/drm_plane.h             |  9 +++++
>>>>>>>  5 files changed, 84 insertions(+), 1 deletion(-)
>>>>>>>
>>>>>>> diff --git a/drivers/gpu/drm/drm_atomic_helper.c b/drivers/gpu/drm/drm_atomic_helper.c
>>>>>>> index c35654591c12..a578dc681b27 100644
>>>>>>> --- a/drivers/gpu/drm/drm_atomic_helper.c
>>>>>>> +++ b/drivers/gpu/drm/drm_atomic_helper.c
>>>>>>> @@ -3504,6 +3504,7 @@ void __drm_atomic_helper_plane_duplicate_state(struct drm_plane *plane,
>>>>>>>     if (state->fb)
>>>>>>>             drm_framebuffer_get(state->fb);
>>>>>>>
>>>>>>> +   state->dirty = false;
>>>>>>>     state->fence = NULL;
>>>>>>>     state->commit = NULL;
>>>>>>>  }
>>>>>>> @@ -3847,6 +3848,71 @@ int drm_atomic_helper_legacy_gamma_set(struct drm_crtc *crtc,
>>>>>>>  }
>>>>>>>  EXPORT_SYMBOL(drm_atomic_helper_legacy_gamma_set);
>>>>>>>
>>>>>>> +/**
>>>>>>> + * drm_atomic_helper_dirtyfb - helper for dirtyfb
>>>>>>> + *
>>>>>>> + * A helper to implement drm_framebuffer_funcs::dirty
>>>>>>> + */
>>>>>>> +int drm_atomic_helper_dirtyfb(struct drm_framebuffer *fb,
>>>>>>> +                         struct drm_file *file_priv, unsigned flags,
>>>>>>> +                         unsigned color, struct drm_clip_rect *clips,
>>>>>>> +                         unsigned num_clips)
>>>>>>> +{
>>>>>>> +   struct drm_modeset_acquire_ctx ctx;
>>>>>>> +   struct drm_atomic_state *state;
>>>>>>> +   struct drm_plane *plane;
>>>>>>> +   int ret = 0;
>>>>>>> +
>>>>>>> +   /*
>>>>>>> +    * When called from ioctl, we are interruptable, but not when
>>>>>>> +    * called internally (ie. defio worker)
>>>>>>> +    */
>>>>>>> +   drm_modeset_acquire_init(&ctx,
>>>>>>> +           file_priv ? DRM_MODESET_ACQUIRE_INTERRUPTIBLE : 0);
>>>>>>> +
>>>>>>> +   state = drm_atomic_state_alloc(fb->dev);
>>>>>>> +   if (!state) {
>>>>>>> +           ret = -ENOMEM;
>>>>>>> +           goto out;
>>>>>>> +   }
>>>>>>> +   state->acquire_ctx = &ctx;
>>>>>>> +
>>>>>>> +retry:
>>>>>>> +   drm_for_each_plane(plane, fb->dev) {
>>>>>>> +           struct drm_plane_state *plane_state;
>>>>>>> +
>>>>>>> +           if (plane->state->fb != fb)
>>>>>>> +                   continue;
>>>>>>> +
>>>>>>> +           plane_state = drm_atomic_get_plane_state(state, plane);
>>>>>>> +           if (IS_ERR(plane_state)) {
>>>>>>> +                   ret = PTR_ERR(plane_state);
>>>>>>> +                   goto out;
>>>>>>> +           }
>>>>>>> +
>>>>>>> +           plane_state->dirty = true;
>>>>>>> +   }
>>>>>>> +
>>>>>>> +   ret = drm_atomic_nonblocking_commit(state);
>>>>>>> +
>>>>>>> +out:
>>>>>>> +   if (ret == -EDEADLK) {
>>>>>>> +           drm_atomic_state_clear(state);
>>>>>>> +           ret = drm_modeset_backoff(&ctx);
>>>>>>> +           if (!ret)
>>>>>>> +                   goto retry;
>>>>>>> +   }
>>>>>>> +
>>>>>>> +   drm_atomic_state_put(state);
>>>>>>> +
>>>>>>> +   drm_modeset_drop_locks(&ctx);
>>>>>>> +   drm_modeset_acquire_fini(&ctx);
>>>>>>> +
>>>>>>> +   return ret;
>>>>>>> +
>>>>>>> +}
>>>>>>> +EXPORT_SYMBOL(drm_atomic_helper_dirtyfb);
>>>>>>> +
>>>>>>>  /**
>>>>>>>   * __drm_atomic_helper_private_duplicate_state - copy atomic private state
>>>>>>>   * @obj: CRTC object
>>>>>>> diff --git a/drivers/gpu/drm/msm/msm_atomic.c b/drivers/gpu/drm/msm/msm_atomic.c
>>>>>>> index bf5f8c39f34d..bb55a048e98b 100644
>>>>>>> --- a/drivers/gpu/drm/msm/msm_atomic.c
>>>>>>> +++ b/drivers/gpu/drm/msm/msm_atomic.c
>>>>>>> @@ -201,7 +201,10 @@ int msm_atomic_commit(struct drm_device *dev,
>>>>>>>      * Figure out what fence to wait for:
>>>>>>>      */
>>>>>>>     for_each_oldnew_plane_in_state(state, plane, old_plane_state, new_plane_state, i) {
>>>>>>> -           if ((new_plane_state->fb != old_plane_state->fb) && new_plane_state->fb) {
>>>>>>> +           bool sync_fb = new_plane_state->fb &&
>>>>>>> +                   ((new_plane_state->fb != old_plane_state->fb) ||
>>>>>>> +                    new_plane_state->dirty);
>>>>>> Why do you have this optimization even here? Imo flipping to the same fb
>>>>>> should result in the fb getting fully uploaded, whether you're doing a
>>>>>> legacy page_flip, and atomic one or just a plane update.
>>>>>>
>>>>>> Iirc some userspace does use that as essentially a full-plane frontbuffer
>>>>>> rendering flush already. IOW I don't think we need your
>>>>>> plane_state->dirty, it's implied to always be true - why would userspace
>>>>>> do a flip otherwise?
>>>>>>
>>>>>> The helper itself to map dirtyfb to a nonblocking atomic commit looks
>>>>>> reasonable, but misses a bunch of the trickery discussed with Noralf and
>>>>>> others I think.
>>>>> Ok, I've done some history digging:
>>>>>
>>>>> - i915 and nouveau unconditionally wait for fences, even for same-fb
>>>>>   flips.
>>>>> - no idea what amdgpu and vmwgfx are doing, they're not using
>>>>>   plane_state->fence for implicit fences.
>>>> I thought plane_state->fence was used for explicit fences, so its use by drivers
>>>> would interfere with it? I don't think fencing would work on msm or vc4..
>>> for implicit fencing we fish out the implicit fence and stuff it in
>>> plane_state->fence..
>> What happens when userspace passes a fence fd to in_fence_fd?
>
> mixing implicit sync and explicit sync is undefined

It's very well-defined, explicit sync wins. See the kerneldoc for
drm_atomic_set_fence_for_plane. Since a pile of drivers don't use that
the reality might be undefined though :-)
-Daniel

>
> BR,
> -R
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel
Maarten Lankhorst April 4, 2018, 12:28 p.m. UTC | #21
Op 04-04-18 om 14:26 schreef Daniel Vetter:
> On Wed, Apr 4, 2018 at 2:05 PM, Rob Clark <robdclark@gmail.com> wrote:
>> On Wed, Apr 4, 2018 at 7:49 AM, Maarten Lankhorst
>> <maarten.lankhorst@linux.intel.com> wrote:
>>> Op 04-04-18 om 13:37 schreef Rob Clark:
>>>> On Wed, Apr 4, 2018 at 6:36 AM, Maarten Lankhorst
>>>> <maarten.lankhorst@linux.intel.com> wrote:
>>>>> Op 04-04-18 om 12:21 schreef Daniel Vetter:
>>>>>> On Wed, Apr 04, 2018 at 12:03:00PM +0200, Daniel Vetter wrote:
>>>>>>> On Tue, Apr 03, 2018 at 06:42:23PM -0400, Rob Clark wrote:
>>>>>>>> Add an atomic helper to implement dirtyfb support.  This is needed to
>>>>>>>> support DSI command-mode panels with x11 userspace (ie. when we can't
>>>>>>>> rely on pageflips to trigger a flush to the panel).
>>>>>>>>
>>>>>>>> To signal to the driver that the async atomic update needs to
>>>>>>>> synchronize with fences, even though the fb didn't change, the
>>>>>>>> drm_atomic_state::dirty flag is added.
>>>>>>>>
>>>>>>>> Signed-off-by: Rob Clark <robdclark@gmail.com>
>>>>>>>> ---
>>>>>>>> Background: there are a number of different folks working on getting
>>>>>>>> upstream kernel working on various different phones/tablets with qcom
>>>>>>>> SoC's.. many of them have command mode panels, so we kind of need a
>>>>>>>> way to support the legacy dirtyfb ioctl for x11 support.
>>>>>>>>
>>>>>>>> I know there is work on a proprer non-legacy atomic property for
>>>>>>>> userspace to communicate dirty-rect(s) to the kernel, so this can
>>>>>>>> be improved from triggering a full-frame flush once that is in
>>>>>>>> place.  But we kinda needa a stop-gap solution.
>>>>>>>>
>>>>>>>> I had considered an in-driver solution for this, but things get a
>>>>>>>> bit tricky if userspace ands up combining dirtyfb ioctls with page-
>>>>>>>> flips, because we need to synchronize setting various CTL.FLUSH bits
>>>>>>>> with setting the CTL.START bit.  (ie. really all we need to do for
>>>>>>>> cmd mode panels is bang CTL.START, but is this ends up racing with
>>>>>>>> pageflips setting FLUSH bits, then bad things.)  The easiest soln
>>>>>>>> is to wrap this up as an atomic commit and rely on the worker to
>>>>>>>> serialize things.  Hence adding an atomic dirtyfb helper.
>>>>>>>>
>>>>>>>> I guess at least the helper, with some small addition to translate
>>>>>>>> and pass-thru the dirty rect(s) is useful to the final atomic dirty-
>>>>>>>> rect property solution.  Depending on how far off that is, a stop-
>>>>>>>> gap solution could be useful.
>>>>>>>>
>>>>>>>>  drivers/gpu/drm/drm_atomic_helper.c | 66 +++++++++++++++++++++++++++++++++++++
>>>>>>>>  drivers/gpu/drm/msm/msm_atomic.c    |  5 ++-
>>>>>>>>  drivers/gpu/drm/msm/msm_fb.c        |  1 +
>>>>>>>>  include/drm/drm_atomic_helper.h     |  4 +++
>>>>>>>>  include/drm/drm_plane.h             |  9 +++++
>>>>>>>>  5 files changed, 84 insertions(+), 1 deletion(-)
>>>>>>>>
>>>>>>>> diff --git a/drivers/gpu/drm/drm_atomic_helper.c b/drivers/gpu/drm/drm_atomic_helper.c
>>>>>>>> index c35654591c12..a578dc681b27 100644
>>>>>>>> --- a/drivers/gpu/drm/drm_atomic_helper.c
>>>>>>>> +++ b/drivers/gpu/drm/drm_atomic_helper.c
>>>>>>>> @@ -3504,6 +3504,7 @@ void __drm_atomic_helper_plane_duplicate_state(struct drm_plane *plane,
>>>>>>>>     if (state->fb)
>>>>>>>>             drm_framebuffer_get(state->fb);
>>>>>>>>
>>>>>>>> +   state->dirty = false;
>>>>>>>>     state->fence = NULL;
>>>>>>>>     state->commit = NULL;
>>>>>>>>  }
>>>>>>>> @@ -3847,6 +3848,71 @@ int drm_atomic_helper_legacy_gamma_set(struct drm_crtc *crtc,
>>>>>>>>  }
>>>>>>>>  EXPORT_SYMBOL(drm_atomic_helper_legacy_gamma_set);
>>>>>>>>
>>>>>>>> +/**
>>>>>>>> + * drm_atomic_helper_dirtyfb - helper for dirtyfb
>>>>>>>> + *
>>>>>>>> + * A helper to implement drm_framebuffer_funcs::dirty
>>>>>>>> + */
>>>>>>>> +int drm_atomic_helper_dirtyfb(struct drm_framebuffer *fb,
>>>>>>>> +                         struct drm_file *file_priv, unsigned flags,
>>>>>>>> +                         unsigned color, struct drm_clip_rect *clips,
>>>>>>>> +                         unsigned num_clips)
>>>>>>>> +{
>>>>>>>> +   struct drm_modeset_acquire_ctx ctx;
>>>>>>>> +   struct drm_atomic_state *state;
>>>>>>>> +   struct drm_plane *plane;
>>>>>>>> +   int ret = 0;
>>>>>>>> +
>>>>>>>> +   /*
>>>>>>>> +    * When called from ioctl, we are interruptable, but not when
>>>>>>>> +    * called internally (ie. defio worker)
>>>>>>>> +    */
>>>>>>>> +   drm_modeset_acquire_init(&ctx,
>>>>>>>> +           file_priv ? DRM_MODESET_ACQUIRE_INTERRUPTIBLE : 0);
>>>>>>>> +
>>>>>>>> +   state = drm_atomic_state_alloc(fb->dev);
>>>>>>>> +   if (!state) {
>>>>>>>> +           ret = -ENOMEM;
>>>>>>>> +           goto out;
>>>>>>>> +   }
>>>>>>>> +   state->acquire_ctx = &ctx;
>>>>>>>> +
>>>>>>>> +retry:
>>>>>>>> +   drm_for_each_plane(plane, fb->dev) {
>>>>>>>> +           struct drm_plane_state *plane_state;
>>>>>>>> +
>>>>>>>> +           if (plane->state->fb != fb)
>>>>>>>> +                   continue;
>>>>>>>> +
>>>>>>>> +           plane_state = drm_atomic_get_plane_state(state, plane);
>>>>>>>> +           if (IS_ERR(plane_state)) {
>>>>>>>> +                   ret = PTR_ERR(plane_state);
>>>>>>>> +                   goto out;
>>>>>>>> +           }
>>>>>>>> +
>>>>>>>> +           plane_state->dirty = true;
>>>>>>>> +   }
>>>>>>>> +
>>>>>>>> +   ret = drm_atomic_nonblocking_commit(state);
>>>>>>>> +
>>>>>>>> +out:
>>>>>>>> +   if (ret == -EDEADLK) {
>>>>>>>> +           drm_atomic_state_clear(state);
>>>>>>>> +           ret = drm_modeset_backoff(&ctx);
>>>>>>>> +           if (!ret)
>>>>>>>> +                   goto retry;
>>>>>>>> +   }
>>>>>>>> +
>>>>>>>> +   drm_atomic_state_put(state);
>>>>>>>> +
>>>>>>>> +   drm_modeset_drop_locks(&ctx);
>>>>>>>> +   drm_modeset_acquire_fini(&ctx);
>>>>>>>> +
>>>>>>>> +   return ret;
>>>>>>>> +
>>>>>>>> +}
>>>>>>>> +EXPORT_SYMBOL(drm_atomic_helper_dirtyfb);
>>>>>>>> +
>>>>>>>>  /**
>>>>>>>>   * __drm_atomic_helper_private_duplicate_state - copy atomic private state
>>>>>>>>   * @obj: CRTC object
>>>>>>>> diff --git a/drivers/gpu/drm/msm/msm_atomic.c b/drivers/gpu/drm/msm/msm_atomic.c
>>>>>>>> index bf5f8c39f34d..bb55a048e98b 100644
>>>>>>>> --- a/drivers/gpu/drm/msm/msm_atomic.c
>>>>>>>> +++ b/drivers/gpu/drm/msm/msm_atomic.c
>>>>>>>> @@ -201,7 +201,10 @@ int msm_atomic_commit(struct drm_device *dev,
>>>>>>>>      * Figure out what fence to wait for:
>>>>>>>>      */
>>>>>>>>     for_each_oldnew_plane_in_state(state, plane, old_plane_state, new_plane_state, i) {
>>>>>>>> -           if ((new_plane_state->fb != old_plane_state->fb) && new_plane_state->fb) {
>>>>>>>> +           bool sync_fb = new_plane_state->fb &&
>>>>>>>> +                   ((new_plane_state->fb != old_plane_state->fb) ||
>>>>>>>> +                    new_plane_state->dirty);
>>>>>>> Why do you have this optimization even here? Imo flipping to the same fb
>>>>>>> should result in the fb getting fully uploaded, whether you're doing a
>>>>>>> legacy page_flip, and atomic one or just a plane update.
>>>>>>>
>>>>>>> Iirc some userspace does use that as essentially a full-plane frontbuffer
>>>>>>> rendering flush already. IOW I don't think we need your
>>>>>>> plane_state->dirty, it's implied to always be true - why would userspace
>>>>>>> do a flip otherwise?
>>>>>>>
>>>>>>> The helper itself to map dirtyfb to a nonblocking atomic commit looks
>>>>>>> reasonable, but misses a bunch of the trickery discussed with Noralf and
>>>>>>> others I think.
>>>>>> Ok, I've done some history digging:
>>>>>>
>>>>>> - i915 and nouveau unconditionally wait for fences, even for same-fb
>>>>>>   flips.
>>>>>> - no idea what amdgpu and vmwgfx are doing, they're not using
>>>>>>   plane_state->fence for implicit fences.
>>>>> I thought plane_state->fence was used for explicit fences, so its use by drivers
>>>>> would interfere with it? I don't think fencing would work on msm or vc4..
>>>> for implicit fencing we fish out the implicit fence and stuff it in
>>>> plane_state->fence..
>>> What happens when userspace passes a fence fd to in_fence_fd?
>> mixing implicit sync and explicit sync is undefined
> It's very well-defined, explicit sync wins. See the kerneldoc for
> drm_atomic_set_fence_for_plane. Since a pile of drivers don't use that
> the reality might be undefined though :-)
Ah right, missed that part.

~Maarten
--
To unsubscribe from this list: send the line "unsubscribe linux-arm-msm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rob Clark April 4, 2018, 1:24 p.m. UTC | #22
On Wed, Apr 4, 2018 at 8:16 AM, Daniel Vetter <daniel@ffwll.ch> wrote:
> On Wed, Apr 04, 2018 at 07:40:32AM -0400, Rob Clark wrote:
>> On Wed, Apr 4, 2018 at 5:56 AM, Daniel Vetter <daniel@ffwll.ch> wrote:
>> > On Wed, Apr 04, 2018 at 11:10:08AM +0200, Thomas Hellstrom wrote:
>> >> On 04/04/2018 10:43 AM, Daniel Vetter wrote:
>> >> > On Wed, Apr 04, 2018 at 10:22:21AM +0200, Thomas Hellstrom wrote:
>> >> > > Hi,
>> >> > >
>> >> > > On 04/04/2018 08:58 AM, Daniel Vetter wrote:
>> >> > > > On Wed, Apr 4, 2018 at 12:42 AM, Rob Clark <robdclark@gmail.com> wrote:
>> >> > > > > Add an atomic helper to implement dirtyfb support.  This is needed to
>> >> > > > > support DSI command-mode panels with x11 userspace (ie. when we can't
>> >> > > > > rely on pageflips to trigger a flush to the panel).
>> >> > > > >
>> >> > > > > To signal to the driver that the async atomic update needs to
>> >> > > > > synchronize with fences, even though the fb didn't change, the
>> >> > > > > drm_atomic_state::dirty flag is added.
>> >> > > > >
>> >> > > > > Signed-off-by: Rob Clark <robdclark@gmail.com>
>> >> > > > > ---
>> >> > > > > Background: there are a number of different folks working on getting
>> >> > > > > upstream kernel working on various different phones/tablets with qcom
>> >> > > > > SoC's.. many of them have command mode panels, so we kind of need a
>> >> > > > > way to support the legacy dirtyfb ioctl for x11 support.
>> >> > > > >
>> >> > > > > I know there is work on a proprer non-legacy atomic property for
>> >> > > > > userspace to communicate dirty-rect(s) to the kernel, so this can
>> >> > > > > be improved from triggering a full-frame flush once that is in
>> >> > > > > place.  But we kinda needa a stop-gap solution.
>> >> > > > >
>> >> > > > > I had considered an in-driver solution for this, but things get a
>> >> > > > > bit tricky if userspace ands up combining dirtyfb ioctls with page-
>> >> > > > > flips, because we need to synchronize setting various CTL.FLUSH bits
>> >> > > > > with setting the CTL.START bit.  (ie. really all we need to do for
>> >> > > > > cmd mode panels is bang CTL.START, but is this ends up racing with
>> >> > > > > pageflips setting FLUSH bits, then bad things.)  The easiest soln
>> >> > > > > is to wrap this up as an atomic commit and rely on the worker to
>> >> > > > > serialize things.  Hence adding an atomic dirtyfb helper.
>> >> > > > >
>> >> > > > > I guess at least the helper, with some small addition to translate
>> >> > > > > and pass-thru the dirty rect(s) is useful to the final atomic dirty-
>> >> > > > > rect property solution.  Depending on how far off that is, a stop-
>> >> > > > > gap solution could be useful.
>> >> > > > Adding Noralf, who iirc already posted the full dirty helpers already somewhere.
>> >> > > > -Daniel
>> >> > > I've asked Deepak to RFC the core changes suggested for the full dirty blob
>> >> > > on dri-devel. It builds on DisplayLink's suggestion, with a simple helper to
>> >> > > get to the desired coordinates.
>> >> > >
>> >> > > One thing to perhaps discuss is how we would like to fit this with
>> >> > > front-buffer rendering and the dirty ioctl. In the page-flip context, the
>> >> > > dirty rects, like egl's swapbuffer_with_damage is a hint to restrict the
>> >> > > damage region that can be fully ignored by the driver, new content is
>> >> > > indicated by a new framebuffer.
>> >> > >
>> >> > > We could do the same for frontbuffer rendering: Either set a dirty flag like
>> >> > > you do here, or provide a content_age state member. Since we clear the dirty
>> >> > > flag on state copies, I guess that would be sufficient. The blob rectangles
>> >> > > would then become a hint to restrict the damage region.
>> >> > I'm not entirely following here - I thought for frontbuffer rendering the
>> >> > dirty rects have always just been a hint, and that the driver was always
>> >> > free to re-upload the entire buffer to the screen.
>> >> >
>> >> > And through a helper like Rob's proposing here (and have floated around in
>> >> > different versions already) we'd essentially map a frontbuffer dirtyfb
>> >> > call to a fake flip with dirty rect. Manual upload drivers already need to
>> >> > upload the entire screen if they get a flip, since some userspace uses
>> >> > that to flush out frontbuffer rendering (instead of calling dirtyfb).
>> >> >
>> >> > So from that pov the new dirty flag is kinda not necessary imo.
>> >> >
>> >> > > Another approach would be to have the presence of dirty rects without
>> >> > > framebuffer change to indicate frontbuffer rendering.
>> >> > >
>> >> > > I think I like the first approach best, although it may be tempting for
>> >> > > user-space apps to just set the dirty bit instead of providing the full
>> >> > > damage region.
>> >> > Or I'm not following you here, because I don't quite see the difference
>> >> > between dirtyfb and a flip.
>> >> > -Daniel
>> >>
>> >> OK, let me rephrase:
>> >>
>> >> From the driver's point-of-view, in the atomic world, new content and the
>> >> need for manual upload is indicated by a change in fb attached to the plane.
>> >>
>> >> With Rob's patch here, (correct me if I'm wrong) in addition new content and
>> >> the need for manual upload is identified by the dirty flag, (since the fb
>> >> stays the same and the driver thus never identifies a page-flip).
>> >
>> > Hm, I'm not entirely sure Rob's approach is correct. Imo a pageflip
>> > (atomic or not) should result in the entire buffer getting uploaded. The
>> > dirty flag is kinda redundant, a flip with the same buffer works the same
>> > way as a dirtyfb with the entire buffer as the dirty rectangle.
>>
>> Userspace could do a pageflip, but (with buffer-age extension) only
>> re-render part of the frame.  So there is a use-case for full frame
>> pageflip with hints about what region(s) changed since previous frame.
>>
>> (Of course userspace could screw that up and get different results on
>> "command" vs "video" style display.. in that case, it gets to keep
>> both pieces)
>
> I'm not against dirty on flips with the same buffer as an optimization.
> Disagreement here is around whether a flip to the same buffer means:
> a) nothing changed, upload nothing
> b) entire buffer might have changed
>
> I'm proposing we standardize on b) (with maybe the exception of legacy
> cursor ioctls because those are special) and add the dirty rects as an
> option to optimize this more.

ok, I can agree w/ flip to same buffer, if FB_ID is in the incoming
property list..  I just want a way to differentiate that from "plane
got added to new state for unrelated reasons"..  so new_fb != old_fb
isn't a good test for "should we synchronize", but new plane state
isn't either.

Keeping the dirty flag (or sequence count or whatever mechanism) to
indicate that an FB_ID that happened to be the same fb was part of
what userspace passed in would do the trick.

BR,
-R

> Either way you'd always have to follow the implicit fence (which is what
> your msm hunk seems to be doing, but only for dirtyfb).
> -Daniel
>
>>
>> BR,
>> -R
>>
>>
>> >> In both these situations, clip rects can provide a hint to restrict the
>> >> dirty region.
>> >>
>> >> Now I was thinking about the preferred way for user-space to communicate
>> >> front buffer rendering through the atomic ioctl:
>> >>
>> >> 1) Expose a dirty (or content_age property)
>> >> 2) Attach a clip-rect blob property.
>> >> 3) Fake a page-flip by ping-ponging two frame-buffers pointing to the same
>> >> underlying buffer object.
>> >>
>> >> Are you saying that people are already using 3) and we should keep using
>> >> that?
>> >
>> > I'm saying they're using 3b), flip the same buffer wrapped in the same
>> > drm_framebuffer, and expect it to work.
>> >
>> > The only advantage dirtyfb has is that it allows you to supply the
>> > optimized upload rectangles, but at the cost of a funny api (it's working
>> > on the fb, not the plane/crtc you want to upload) and lack of drm_event to
>> > confirm when exactly you uploaded your stuff. But imo they should be the
>> > same underlying operation.
>> >
>> > Also note that atomic helpers don't optimize out plane flips for same
>> > buffers. We only optimize out some of the waiting, in a failed attempt at
>> > making cursors stall less, but that's not fixed with the async plane
>> > update stuff. And we can obviously optimize out the prepare/cleanup hooks,
>> > because the buffer should be pinned already.
>> >
>> > So imo for a quick fix I think we need:
>> > 1) Fix drivers (or at least msm) to upload buffers for manual upload on
>> > all flips (well, all screen updates). Probably should do the fence waiting
>> > unconditionally too (and handle cursors with the new async stuff).
>> > 2) Tiny helper that remaps a dirtyfb call into a nonblocking atomic
>> > commit, which currently means we're throwing the dirty rect optimization
>> > into the wind. But that could easily be added to the drm_plane_state,
>> > without exposing it to userspace as a property just yet.
>> >
>> > Cheers, Daniel
>> > --
>> > Daniel Vetter
>> > Software Engineer, Intel Corporation
>> > http://blog.ffwll.ch
>> _______________________________________________
>> dri-devel mailing list
>> dri-devel@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/dri-devel
>
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch
--
To unsubscribe from this list: send the line "unsubscribe linux-arm-msm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rob Clark April 4, 2018, 1:26 p.m. UTC | #23
On Wed, Apr 4, 2018 at 8:26 AM, Daniel Vetter <daniel@ffwll.ch> wrote:
> On Wed, Apr 4, 2018 at 2:05 PM, Rob Clark <robdclark@gmail.com> wrote:
>> On Wed, Apr 4, 2018 at 7:49 AM, Maarten Lankhorst
>> <maarten.lankhorst@linux.intel.com> wrote:
>>> Op 04-04-18 om 13:37 schreef Rob Clark:
>>>> On Wed, Apr 4, 2018 at 6:36 AM, Maarten Lankhorst
>>>> <maarten.lankhorst@linux.intel.com> wrote:
>>>>> Op 04-04-18 om 12:21 schreef Daniel Vetter:
>>>>>> On Wed, Apr 04, 2018 at 12:03:00PM +0200, Daniel Vetter wrote:
>>>>>>> On Tue, Apr 03, 2018 at 06:42:23PM -0400, Rob Clark wrote:
>>>>>>>> Add an atomic helper to implement dirtyfb support.  This is needed to
>>>>>>>> support DSI command-mode panels with x11 userspace (ie. when we can't
>>>>>>>> rely on pageflips to trigger a flush to the panel).
>>>>>>>>
>>>>>>>> To signal to the driver that the async atomic update needs to
>>>>>>>> synchronize with fences, even though the fb didn't change, the
>>>>>>>> drm_atomic_state::dirty flag is added.
>>>>>>>>
>>>>>>>> Signed-off-by: Rob Clark <robdclark@gmail.com>
>>>>>>>> ---
>>>>>>>> Background: there are a number of different folks working on getting
>>>>>>>> upstream kernel working on various different phones/tablets with qcom
>>>>>>>> SoC's.. many of them have command mode panels, so we kind of need a
>>>>>>>> way to support the legacy dirtyfb ioctl for x11 support.
>>>>>>>>
>>>>>>>> I know there is work on a proprer non-legacy atomic property for
>>>>>>>> userspace to communicate dirty-rect(s) to the kernel, so this can
>>>>>>>> be improved from triggering a full-frame flush once that is in
>>>>>>>> place.  But we kinda needa a stop-gap solution.
>>>>>>>>
>>>>>>>> I had considered an in-driver solution for this, but things get a
>>>>>>>> bit tricky if userspace ands up combining dirtyfb ioctls with page-
>>>>>>>> flips, because we need to synchronize setting various CTL.FLUSH bits
>>>>>>>> with setting the CTL.START bit.  (ie. really all we need to do for
>>>>>>>> cmd mode panels is bang CTL.START, but is this ends up racing with
>>>>>>>> pageflips setting FLUSH bits, then bad things.)  The easiest soln
>>>>>>>> is to wrap this up as an atomic commit and rely on the worker to
>>>>>>>> serialize things.  Hence adding an atomic dirtyfb helper.
>>>>>>>>
>>>>>>>> I guess at least the helper, with some small addition to translate
>>>>>>>> and pass-thru the dirty rect(s) is useful to the final atomic dirty-
>>>>>>>> rect property solution.  Depending on how far off that is, a stop-
>>>>>>>> gap solution could be useful.
>>>>>>>>
>>>>>>>>  drivers/gpu/drm/drm_atomic_helper.c | 66 +++++++++++++++++++++++++++++++++++++
>>>>>>>>  drivers/gpu/drm/msm/msm_atomic.c    |  5 ++-
>>>>>>>>  drivers/gpu/drm/msm/msm_fb.c        |  1 +
>>>>>>>>  include/drm/drm_atomic_helper.h     |  4 +++
>>>>>>>>  include/drm/drm_plane.h             |  9 +++++
>>>>>>>>  5 files changed, 84 insertions(+), 1 deletion(-)
>>>>>>>>
>>>>>>>> diff --git a/drivers/gpu/drm/drm_atomic_helper.c b/drivers/gpu/drm/drm_atomic_helper.c
>>>>>>>> index c35654591c12..a578dc681b27 100644
>>>>>>>> --- a/drivers/gpu/drm/drm_atomic_helper.c
>>>>>>>> +++ b/drivers/gpu/drm/drm_atomic_helper.c
>>>>>>>> @@ -3504,6 +3504,7 @@ void __drm_atomic_helper_plane_duplicate_state(struct drm_plane *plane,
>>>>>>>>     if (state->fb)
>>>>>>>>             drm_framebuffer_get(state->fb);
>>>>>>>>
>>>>>>>> +   state->dirty = false;
>>>>>>>>     state->fence = NULL;
>>>>>>>>     state->commit = NULL;
>>>>>>>>  }
>>>>>>>> @@ -3847,6 +3848,71 @@ int drm_atomic_helper_legacy_gamma_set(struct drm_crtc *crtc,
>>>>>>>>  }
>>>>>>>>  EXPORT_SYMBOL(drm_atomic_helper_legacy_gamma_set);
>>>>>>>>
>>>>>>>> +/**
>>>>>>>> + * drm_atomic_helper_dirtyfb - helper for dirtyfb
>>>>>>>> + *
>>>>>>>> + * A helper to implement drm_framebuffer_funcs::dirty
>>>>>>>> + */
>>>>>>>> +int drm_atomic_helper_dirtyfb(struct drm_framebuffer *fb,
>>>>>>>> +                         struct drm_file *file_priv, unsigned flags,
>>>>>>>> +                         unsigned color, struct drm_clip_rect *clips,
>>>>>>>> +                         unsigned num_clips)
>>>>>>>> +{
>>>>>>>> +   struct drm_modeset_acquire_ctx ctx;
>>>>>>>> +   struct drm_atomic_state *state;
>>>>>>>> +   struct drm_plane *plane;
>>>>>>>> +   int ret = 0;
>>>>>>>> +
>>>>>>>> +   /*
>>>>>>>> +    * When called from ioctl, we are interruptable, but not when
>>>>>>>> +    * called internally (ie. defio worker)
>>>>>>>> +    */
>>>>>>>> +   drm_modeset_acquire_init(&ctx,
>>>>>>>> +           file_priv ? DRM_MODESET_ACQUIRE_INTERRUPTIBLE : 0);
>>>>>>>> +
>>>>>>>> +   state = drm_atomic_state_alloc(fb->dev);
>>>>>>>> +   if (!state) {
>>>>>>>> +           ret = -ENOMEM;
>>>>>>>> +           goto out;
>>>>>>>> +   }
>>>>>>>> +   state->acquire_ctx = &ctx;
>>>>>>>> +
>>>>>>>> +retry:
>>>>>>>> +   drm_for_each_plane(plane, fb->dev) {
>>>>>>>> +           struct drm_plane_state *plane_state;
>>>>>>>> +
>>>>>>>> +           if (plane->state->fb != fb)
>>>>>>>> +                   continue;
>>>>>>>> +
>>>>>>>> +           plane_state = drm_atomic_get_plane_state(state, plane);
>>>>>>>> +           if (IS_ERR(plane_state)) {
>>>>>>>> +                   ret = PTR_ERR(plane_state);
>>>>>>>> +                   goto out;
>>>>>>>> +           }
>>>>>>>> +
>>>>>>>> +           plane_state->dirty = true;
>>>>>>>> +   }
>>>>>>>> +
>>>>>>>> +   ret = drm_atomic_nonblocking_commit(state);
>>>>>>>> +
>>>>>>>> +out:
>>>>>>>> +   if (ret == -EDEADLK) {
>>>>>>>> +           drm_atomic_state_clear(state);
>>>>>>>> +           ret = drm_modeset_backoff(&ctx);
>>>>>>>> +           if (!ret)
>>>>>>>> +                   goto retry;
>>>>>>>> +   }
>>>>>>>> +
>>>>>>>> +   drm_atomic_state_put(state);
>>>>>>>> +
>>>>>>>> +   drm_modeset_drop_locks(&ctx);
>>>>>>>> +   drm_modeset_acquire_fini(&ctx);
>>>>>>>> +
>>>>>>>> +   return ret;
>>>>>>>> +
>>>>>>>> +}
>>>>>>>> +EXPORT_SYMBOL(drm_atomic_helper_dirtyfb);
>>>>>>>> +
>>>>>>>>  /**
>>>>>>>>   * __drm_atomic_helper_private_duplicate_state - copy atomic private state
>>>>>>>>   * @obj: CRTC object
>>>>>>>> diff --git a/drivers/gpu/drm/msm/msm_atomic.c b/drivers/gpu/drm/msm/msm_atomic.c
>>>>>>>> index bf5f8c39f34d..bb55a048e98b 100644
>>>>>>>> --- a/drivers/gpu/drm/msm/msm_atomic.c
>>>>>>>> +++ b/drivers/gpu/drm/msm/msm_atomic.c
>>>>>>>> @@ -201,7 +201,10 @@ int msm_atomic_commit(struct drm_device *dev,
>>>>>>>>      * Figure out what fence to wait for:
>>>>>>>>      */
>>>>>>>>     for_each_oldnew_plane_in_state(state, plane, old_plane_state, new_plane_state, i) {
>>>>>>>> -           if ((new_plane_state->fb != old_plane_state->fb) && new_plane_state->fb) {
>>>>>>>> +           bool sync_fb = new_plane_state->fb &&
>>>>>>>> +                   ((new_plane_state->fb != old_plane_state->fb) ||
>>>>>>>> +                    new_plane_state->dirty);
>>>>>>> Why do you have this optimization even here? Imo flipping to the same fb
>>>>>>> should result in the fb getting fully uploaded, whether you're doing a
>>>>>>> legacy page_flip, and atomic one or just a plane update.
>>>>>>>
>>>>>>> Iirc some userspace does use that as essentially a full-plane frontbuffer
>>>>>>> rendering flush already. IOW I don't think we need your
>>>>>>> plane_state->dirty, it's implied to always be true - why would userspace
>>>>>>> do a flip otherwise?
>>>>>>>
>>>>>>> The helper itself to map dirtyfb to a nonblocking atomic commit looks
>>>>>>> reasonable, but misses a bunch of the trickery discussed with Noralf and
>>>>>>> others I think.
>>>>>> Ok, I've done some history digging:
>>>>>>
>>>>>> - i915 and nouveau unconditionally wait for fences, even for same-fb
>>>>>>   flips.
>>>>>> - no idea what amdgpu and vmwgfx are doing, they're not using
>>>>>>   plane_state->fence for implicit fences.
>>>>> I thought plane_state->fence was used for explicit fences, so its use by drivers
>>>>> would interfere with it? I don't think fencing would work on msm or vc4..
>>>> for implicit fencing we fish out the implicit fence and stuff it in
>>>> plane_state->fence..
>>> What happens when userspace passes a fence fd to in_fence_fd?
>>
>> mixing implicit sync and explicit sync is undefined
>
> It's very well-defined, explicit sync wins. See the kerneldoc for
> drm_atomic_set_fence_for_plane. Since a pile of drivers don't use that
> the reality might be undefined though :-)

ahh, msm_atomic does do drm_atomic_set_fence_for_plane() for
fished-out-fences so if that dtrt, it should be good..

BR,
-R

> -Daniel
>
>>
>> BR,
>> -R
>> _______________________________________________
>> dri-devel mailing list
>> dri-devel@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/dri-devel
>
>
>
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> +41 (0) 79 365 57 48 - http://blog.ffwll.ch
--
To unsubscribe from this list: send the line "unsubscribe linux-arm-msm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Daniel Vetter April 4, 2018, 1:28 p.m. UTC | #24
On Wed, Apr 4, 2018 at 3:24 PM, Rob Clark <robdclark@gmail.com> wrote:
> On Wed, Apr 4, 2018 at 8:16 AM, Daniel Vetter <daniel@ffwll.ch> wrote:
>> On Wed, Apr 04, 2018 at 07:40:32AM -0400, Rob Clark wrote:
>>> On Wed, Apr 4, 2018 at 5:56 AM, Daniel Vetter <daniel@ffwll.ch> wrote:
>>> > On Wed, Apr 04, 2018 at 11:10:08AM +0200, Thomas Hellstrom wrote:
>>> >> On 04/04/2018 10:43 AM, Daniel Vetter wrote:
>>> >> > On Wed, Apr 04, 2018 at 10:22:21AM +0200, Thomas Hellstrom wrote:
>>> >> > > Hi,
>>> >> > >
>>> >> > > On 04/04/2018 08:58 AM, Daniel Vetter wrote:
>>> >> > > > On Wed, Apr 4, 2018 at 12:42 AM, Rob Clark <robdclark@gmail.com> wrote:
>>> >> > > > > Add an atomic helper to implement dirtyfb support.  This is needed to
>>> >> > > > > support DSI command-mode panels with x11 userspace (ie. when we can't
>>> >> > > > > rely on pageflips to trigger a flush to the panel).
>>> >> > > > >
>>> >> > > > > To signal to the driver that the async atomic update needs to
>>> >> > > > > synchronize with fences, even though the fb didn't change, the
>>> >> > > > > drm_atomic_state::dirty flag is added.
>>> >> > > > >
>>> >> > > > > Signed-off-by: Rob Clark <robdclark@gmail.com>
>>> >> > > > > ---
>>> >> > > > > Background: there are a number of different folks working on getting
>>> >> > > > > upstream kernel working on various different phones/tablets with qcom
>>> >> > > > > SoC's.. many of them have command mode panels, so we kind of need a
>>> >> > > > > way to support the legacy dirtyfb ioctl for x11 support.
>>> >> > > > >
>>> >> > > > > I know there is work on a proprer non-legacy atomic property for
>>> >> > > > > userspace to communicate dirty-rect(s) to the kernel, so this can
>>> >> > > > > be improved from triggering a full-frame flush once that is in
>>> >> > > > > place.  But we kinda needa a stop-gap solution.
>>> >> > > > >
>>> >> > > > > I had considered an in-driver solution for this, but things get a
>>> >> > > > > bit tricky if userspace ands up combining dirtyfb ioctls with page-
>>> >> > > > > flips, because we need to synchronize setting various CTL.FLUSH bits
>>> >> > > > > with setting the CTL.START bit.  (ie. really all we need to do for
>>> >> > > > > cmd mode panels is bang CTL.START, but is this ends up racing with
>>> >> > > > > pageflips setting FLUSH bits, then bad things.)  The easiest soln
>>> >> > > > > is to wrap this up as an atomic commit and rely on the worker to
>>> >> > > > > serialize things.  Hence adding an atomic dirtyfb helper.
>>> >> > > > >
>>> >> > > > > I guess at least the helper, with some small addition to translate
>>> >> > > > > and pass-thru the dirty rect(s) is useful to the final atomic dirty-
>>> >> > > > > rect property solution.  Depending on how far off that is, a stop-
>>> >> > > > > gap solution could be useful.
>>> >> > > > Adding Noralf, who iirc already posted the full dirty helpers already somewhere.
>>> >> > > > -Daniel
>>> >> > > I've asked Deepak to RFC the core changes suggested for the full dirty blob
>>> >> > > on dri-devel. It builds on DisplayLink's suggestion, with a simple helper to
>>> >> > > get to the desired coordinates.
>>> >> > >
>>> >> > > One thing to perhaps discuss is how we would like to fit this with
>>> >> > > front-buffer rendering and the dirty ioctl. In the page-flip context, the
>>> >> > > dirty rects, like egl's swapbuffer_with_damage is a hint to restrict the
>>> >> > > damage region that can be fully ignored by the driver, new content is
>>> >> > > indicated by a new framebuffer.
>>> >> > >
>>> >> > > We could do the same for frontbuffer rendering: Either set a dirty flag like
>>> >> > > you do here, or provide a content_age state member. Since we clear the dirty
>>> >> > > flag on state copies, I guess that would be sufficient. The blob rectangles
>>> >> > > would then become a hint to restrict the damage region.
>>> >> > I'm not entirely following here - I thought for frontbuffer rendering the
>>> >> > dirty rects have always just been a hint, and that the driver was always
>>> >> > free to re-upload the entire buffer to the screen.
>>> >> >
>>> >> > And through a helper like Rob's proposing here (and have floated around in
>>> >> > different versions already) we'd essentially map a frontbuffer dirtyfb
>>> >> > call to a fake flip with dirty rect. Manual upload drivers already need to
>>> >> > upload the entire screen if they get a flip, since some userspace uses
>>> >> > that to flush out frontbuffer rendering (instead of calling dirtyfb).
>>> >> >
>>> >> > So from that pov the new dirty flag is kinda not necessary imo.
>>> >> >
>>> >> > > Another approach would be to have the presence of dirty rects without
>>> >> > > framebuffer change to indicate frontbuffer rendering.
>>> >> > >
>>> >> > > I think I like the first approach best, although it may be tempting for
>>> >> > > user-space apps to just set the dirty bit instead of providing the full
>>> >> > > damage region.
>>> >> > Or I'm not following you here, because I don't quite see the difference
>>> >> > between dirtyfb and a flip.
>>> >> > -Daniel
>>> >>
>>> >> OK, let me rephrase:
>>> >>
>>> >> From the driver's point-of-view, in the atomic world, new content and the
>>> >> need for manual upload is indicated by a change in fb attached to the plane.
>>> >>
>>> >> With Rob's patch here, (correct me if I'm wrong) in addition new content and
>>> >> the need for manual upload is identified by the dirty flag, (since the fb
>>> >> stays the same and the driver thus never identifies a page-flip).
>>> >
>>> > Hm, I'm not entirely sure Rob's approach is correct. Imo a pageflip
>>> > (atomic or not) should result in the entire buffer getting uploaded. The
>>> > dirty flag is kinda redundant, a flip with the same buffer works the same
>>> > way as a dirtyfb with the entire buffer as the dirty rectangle.
>>>
>>> Userspace could do a pageflip, but (with buffer-age extension) only
>>> re-render part of the frame.  So there is a use-case for full frame
>>> pageflip with hints about what region(s) changed since previous frame.
>>>
>>> (Of course userspace could screw that up and get different results on
>>> "command" vs "video" style display.. in that case, it gets to keep
>>> both pieces)
>>
>> I'm not against dirty on flips with the same buffer as an optimization.
>> Disagreement here is around whether a flip to the same buffer means:
>> a) nothing changed, upload nothing
>> b) entire buffer might have changed
>>
>> I'm proposing we standardize on b) (with maybe the exception of legacy
>> cursor ioctls because those are special) and add the dirty rects as an
>> option to optimize this more.
>
> ok, I can agree w/ flip to same buffer, if FB_ID is in the incoming
> property list..  I just want a way to differentiate that from "plane
> got added to new state for unrelated reasons"..  so new_fb != old_fb
> isn't a good test for "should we synchronize", but new plane state
> isn't either.

Why do you add planes for unrelated reasons? Also I'm not clear on how
looking at FB_ID will work for all the legacy ioctls (plane_update,
page_flip, setcrtc can also be used to do a blocking flip, and is
used).

Atomic core doesn't add any unecessary planes. Atomic helpers add only
the planes for all the crtc you're touching anyway, so maybe we need
to add a flag there. Anything else is up to your driver code.
-Daniel

> Keeping the dirty flag (or sequence count or whatever mechanism) to
> indicate that an FB_ID that happened to be the same fb was part of
> what userspace passed in would do the trick.
>
> BR,
> -R
>
>> Either way you'd always have to follow the implicit fence (which is what
>> your msm hunk seems to be doing, but only for dirtyfb).
>> -Daniel
>>
>>>
>>> BR,
>>> -R
>>>
>>>
>>> >> In both these situations, clip rects can provide a hint to restrict the
>>> >> dirty region.
>>> >>
>>> >> Now I was thinking about the preferred way for user-space to communicate
>>> >> front buffer rendering through the atomic ioctl:
>>> >>
>>> >> 1) Expose a dirty (or content_age property)
>>> >> 2) Attach a clip-rect blob property.
>>> >> 3) Fake a page-flip by ping-ponging two frame-buffers pointing to the same
>>> >> underlying buffer object.
>>> >>
>>> >> Are you saying that people are already using 3) and we should keep using
>>> >> that?
>>> >
>>> > I'm saying they're using 3b), flip the same buffer wrapped in the same
>>> > drm_framebuffer, and expect it to work.
>>> >
>>> > The only advantage dirtyfb has is that it allows you to supply the
>>> > optimized upload rectangles, but at the cost of a funny api (it's working
>>> > on the fb, not the plane/crtc you want to upload) and lack of drm_event to
>>> > confirm when exactly you uploaded your stuff. But imo they should be the
>>> > same underlying operation.
>>> >
>>> > Also note that atomic helpers don't optimize out plane flips for same
>>> > buffers. We only optimize out some of the waiting, in a failed attempt at
>>> > making cursors stall less, but that's not fixed with the async plane
>>> > update stuff. And we can obviously optimize out the prepare/cleanup hooks,
>>> > because the buffer should be pinned already.
>>> >
>>> > So imo for a quick fix I think we need:
>>> > 1) Fix drivers (or at least msm) to upload buffers for manual upload on
>>> > all flips (well, all screen updates). Probably should do the fence waiting
>>> > unconditionally too (and handle cursors with the new async stuff).
>>> > 2) Tiny helper that remaps a dirtyfb call into a nonblocking atomic
>>> > commit, which currently means we're throwing the dirty rect optimization
>>> > into the wind. But that could easily be added to the drm_plane_state,
>>> > without exposing it to userspace as a property just yet.
>>> >
>>> > Cheers, Daniel
>>> > --
>>> > Daniel Vetter
>>> > Software Engineer, Intel Corporation
>>> > http://blog.ffwll.ch
>>> _______________________________________________
>>> dri-devel mailing list
>>> dri-devel@lists.freedesktop.org
>>> https://lists.freedesktop.org/mailman/listinfo/dri-devel
>>
>> --
>> Daniel Vetter
>> Software Engineer, Intel Corporation
>> http://blog.ffwll.ch
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel
Thomas Hellström (VMware) April 5, 2018, 1:30 p.m. UTC | #25
On 04/04/2018 11:56 AM, Daniel Vetter wrote:
> On Wed, Apr 04, 2018 at 11:10:08AM +0200, Thomas Hellstrom wrote:
>> On 04/04/2018 10:43 AM, Daniel Vetter wrote:
>>> On Wed, Apr 04, 2018 at 10:22:21AM +0200, Thomas Hellstrom wrote:
>>>> Hi,
>>>>
>>>> On 04/04/2018 08:58 AM, Daniel Vetter wrote:
>>>>> On Wed, Apr 4, 2018 at 12:42 AM, Rob Clark <robdclark@gmail.com> wrote:
>>>>>> Add an atomic helper to implement dirtyfb support.  This is needed to
>>>>>> support DSI command-mode panels with x11 userspace (ie. when we can't
>>>>>> rely on pageflips to trigger a flush to the panel).
>>>>>>
>>>>>> To signal to the driver that the async atomic update needs to
>>>>>> synchronize with fences, even though the fb didn't change, the
>>>>>> drm_atomic_state::dirty flag is added.
>>>>>>
>>>>>> Signed-off-by: Rob Clark <robdclark@gmail.com>
>>>>>> ---
>>>>>> Background: there are a number of different folks working on getting
>>>>>> upstream kernel working on various different phones/tablets with qcom
>>>>>> SoC's.. many of them have command mode panels, so we kind of need a
>>>>>> way to support the legacy dirtyfb ioctl for x11 support.
>>>>>>
>>>>>> I know there is work on a proprer non-legacy atomic property for
>>>>>> userspace to communicate dirty-rect(s) to the kernel, so this can
>>>>>> be improved from triggering a full-frame flush once that is in
>>>>>> place.  But we kinda needa a stop-gap solution.
>>>>>>
>>>>>> I had considered an in-driver solution for this, but things get a
>>>>>> bit tricky if userspace ands up combining dirtyfb ioctls with page-
>>>>>> flips, because we need to synchronize setting various CTL.FLUSH bits
>>>>>> with setting the CTL.START bit.  (ie. really all we need to do for
>>>>>> cmd mode panels is bang CTL.START, but is this ends up racing with
>>>>>> pageflips setting FLUSH bits, then bad things.)  The easiest soln
>>>>>> is to wrap this up as an atomic commit and rely on the worker to
>>>>>> serialize things.  Hence adding an atomic dirtyfb helper.
>>>>>>
>>>>>> I guess at least the helper, with some small addition to translate
>>>>>> and pass-thru the dirty rect(s) is useful to the final atomic dirty-
>>>>>> rect property solution.  Depending on how far off that is, a stop-
>>>>>> gap solution could be useful.
>>>>> Adding Noralf, who iirc already posted the full dirty helpers already somewhere.
>>>>> -Daniel
>>>> I've asked Deepak to RFC the core changes suggested for the full dirty blob
>>>> on dri-devel. It builds on DisplayLink's suggestion, with a simple helper to
>>>> get to the desired coordinates.
>>>>
>>>> One thing to perhaps discuss is how we would like to fit this with
>>>> front-buffer rendering and the dirty ioctl. In the page-flip context, the
>>>> dirty rects, like egl's swapbuffer_with_damage is a hint to restrict the
>>>> damage region that can be fully ignored by the driver, new content is
>>>> indicated by a new framebuffer.
>>>>
>>>> We could do the same for frontbuffer rendering: Either set a dirty flag like
>>>> you do here, or provide a content_age state member. Since we clear the dirty
>>>> flag on state copies, I guess that would be sufficient. The blob rectangles
>>>> would then become a hint to restrict the damage region.
>>> I'm not entirely following here - I thought for frontbuffer rendering the
>>> dirty rects have always just been a hint, and that the driver was always
>>> free to re-upload the entire buffer to the screen.
>>>
>>> And through a helper like Rob's proposing here (and have floated around in
>>> different versions already) we'd essentially map a frontbuffer dirtyfb
>>> call to a fake flip with dirty rect. Manual upload drivers already need to
>>> upload the entire screen if they get a flip, since some userspace uses
>>> that to flush out frontbuffer rendering (instead of calling dirtyfb).
>>>
>>> So from that pov the new dirty flag is kinda not necessary imo.
>>>
>>>> Another approach would be to have the presence of dirty rects without
>>>> framebuffer change to indicate frontbuffer rendering.
>>>>
>>>> I think I like the first approach best, although it may be tempting for
>>>> user-space apps to just set the dirty bit instead of providing the full
>>>> damage region.
>>> Or I'm not following you here, because I don't quite see the difference
>>> between dirtyfb and a flip.
>>> -Daniel
>> OK, let me rephrase:
>>
>>  From the driver's point-of-view, in the atomic world, new content and the
>> need for manual upload is indicated by a change in fb attached to the plane.
>>
>> With Rob's patch here, (correct me if I'm wrong) in addition new content and
>> the need for manual upload is identified by the dirty flag, (since the fb
>> stays the same and the driver thus never identifies a page-flip).
> Hm, I'm not entirely sure Rob's approach is correct. Imo a pageflip
> (atomic or not) should result in the entire buffer getting uploaded. The
> dirty flag is kinda redundant, a flip with the same buffer works the same
> way as a dirtyfb with the entire buffer as the dirty rectangle.
>
>> In both these situations, clip rects can provide a hint to restrict the
>> dirty region.
>>
>> Now I was thinking about the preferred way for user-space to communicate
>> front buffer rendering through the atomic ioctl:
>>
>> 1) Expose a dirty (or content_age property)
>> 2) Attach a clip-rect blob property.
>> 3) Fake a page-flip by ping-ponging two frame-buffers pointing to the same
>> underlying buffer object.
>>
>> Are you saying that people are already using 3) and we should keep using
>> that?
> I'm saying they're using 3b), flip the same buffer wrapped in the same
> drm_framebuffer, and expect it to work.
>
> The only advantage dirtyfb has is that it allows you to supply the
> optimized upload rectangles, but at the cost of a funny api (it's working
> on the fb, not the plane/crtc you want to upload) and lack of drm_event to
> confirm when exactly you uploaded your stuff. But imo they should be the
> same underlying operation.
>

FWIW, vmwgfx has always treated a dirtyfb as a pure front-buffer like 
rendering operation without any synchronization,
We've guaranteed that only the rects that are present are uploaded, but 
only xf86-video-vmware has taken advantage of this to keep
CPU- and GPU rendered content apart.
I think we've at some point run into problems with number of cliprects, 
(Old KDE lock screen?) and use multiple dirtyfb for the same
update...

IIRC the reason for working with the fb, is that it's much easier for 
user-space, which doesn't have to care where planes are scanning out and 
why.
"Present my new content on any screen that's affected".

/Thomas


--
To unsubscribe from this list: send the line "unsubscribe linux-arm-msm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Daniel Vetter April 5, 2018, 1:35 p.m. UTC | #26
On Thu, Apr 5, 2018 at 3:30 PM, Thomas Hellstrom <thomas@shipmail.org> wrote:
> On 04/04/2018 11:56 AM, Daniel Vetter wrote:
>>
>> On Wed, Apr 04, 2018 at 11:10:08AM +0200, Thomas Hellstrom wrote:
>>>
>>> On 04/04/2018 10:43 AM, Daniel Vetter wrote:
>>>>
>>>> On Wed, Apr 04, 2018 at 10:22:21AM +0200, Thomas Hellstrom wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> On 04/04/2018 08:58 AM, Daniel Vetter wrote:
>>>>>>
>>>>>> On Wed, Apr 4, 2018 at 12:42 AM, Rob Clark <robdclark@gmail.com>
>>>>>> wrote:
>>>>>>>
>>>>>>> Add an atomic helper to implement dirtyfb support.  This is needed to
>>>>>>> support DSI command-mode panels with x11 userspace (ie. when we can't
>>>>>>> rely on pageflips to trigger a flush to the panel).
>>>>>>>
>>>>>>> To signal to the driver that the async atomic update needs to
>>>>>>> synchronize with fences, even though the fb didn't change, the
>>>>>>> drm_atomic_state::dirty flag is added.
>>>>>>>
>>>>>>> Signed-off-by: Rob Clark <robdclark@gmail.com>
>>>>>>> ---
>>>>>>> Background: there are a number of different folks working on getting
>>>>>>> upstream kernel working on various different phones/tablets with qcom
>>>>>>> SoC's.. many of them have command mode panels, so we kind of need a
>>>>>>> way to support the legacy dirtyfb ioctl for x11 support.
>>>>>>>
>>>>>>> I know there is work on a proprer non-legacy atomic property for
>>>>>>> userspace to communicate dirty-rect(s) to the kernel, so this can
>>>>>>> be improved from triggering a full-frame flush once that is in
>>>>>>> place.  But we kinda needa a stop-gap solution.
>>>>>>>
>>>>>>> I had considered an in-driver solution for this, but things get a
>>>>>>> bit tricky if userspace ands up combining dirtyfb ioctls with page-
>>>>>>> flips, because we need to synchronize setting various CTL.FLUSH bits
>>>>>>> with setting the CTL.START bit.  (ie. really all we need to do for
>>>>>>> cmd mode panels is bang CTL.START, but is this ends up racing with
>>>>>>> pageflips setting FLUSH bits, then bad things.)  The easiest soln
>>>>>>> is to wrap this up as an atomic commit and rely on the worker to
>>>>>>> serialize things.  Hence adding an atomic dirtyfb helper.
>>>>>>>
>>>>>>> I guess at least the helper, with some small addition to translate
>>>>>>> and pass-thru the dirty rect(s) is useful to the final atomic dirty-
>>>>>>> rect property solution.  Depending on how far off that is, a stop-
>>>>>>> gap solution could be useful.
>>>>>>
>>>>>> Adding Noralf, who iirc already posted the full dirty helpers already
>>>>>> somewhere.
>>>>>> -Daniel
>>>>>
>>>>> I've asked Deepak to RFC the core changes suggested for the full dirty
>>>>> blob
>>>>> on dri-devel. It builds on DisplayLink's suggestion, with a simple
>>>>> helper to
>>>>> get to the desired coordinates.
>>>>>
>>>>> One thing to perhaps discuss is how we would like to fit this with
>>>>> front-buffer rendering and the dirty ioctl. In the page-flip context,
>>>>> the
>>>>> dirty rects, like egl's swapbuffer_with_damage is a hint to restrict
>>>>> the
>>>>> damage region that can be fully ignored by the driver, new content is
>>>>> indicated by a new framebuffer.
>>>>>
>>>>> We could do the same for frontbuffer rendering: Either set a dirty flag
>>>>> like
>>>>> you do here, or provide a content_age state member. Since we clear the
>>>>> dirty
>>>>> flag on state copies, I guess that would be sufficient. The blob
>>>>> rectangles
>>>>> would then become a hint to restrict the damage region.
>>>>
>>>> I'm not entirely following here - I thought for frontbuffer rendering
>>>> the
>>>> dirty rects have always just been a hint, and that the driver was always
>>>> free to re-upload the entire buffer to the screen.
>>>>
>>>> And through a helper like Rob's proposing here (and have floated around
>>>> in
>>>> different versions already) we'd essentially map a frontbuffer dirtyfb
>>>> call to a fake flip with dirty rect. Manual upload drivers already need
>>>> to
>>>> upload the entire screen if they get a flip, since some userspace uses
>>>> that to flush out frontbuffer rendering (instead of calling dirtyfb).
>>>>
>>>> So from that pov the new dirty flag is kinda not necessary imo.
>>>>
>>>>> Another approach would be to have the presence of dirty rects without
>>>>> framebuffer change to indicate frontbuffer rendering.
>>>>>
>>>>> I think I like the first approach best, although it may be tempting for
>>>>> user-space apps to just set the dirty bit instead of providing the full
>>>>> damage region.
>>>>
>>>> Or I'm not following you here, because I don't quite see the difference
>>>> between dirtyfb and a flip.
>>>> -Daniel
>>>
>>> OK, let me rephrase:
>>>
>>>  From the driver's point-of-view, in the atomic world, new content and
>>> the
>>> need for manual upload is indicated by a change in fb attached to the
>>> plane.
>>>
>>> With Rob's patch here, (correct me if I'm wrong) in addition new content
>>> and
>>> the need for manual upload is identified by the dirty flag, (since the fb
>>> stays the same and the driver thus never identifies a page-flip).
>>
>> Hm, I'm not entirely sure Rob's approach is correct. Imo a pageflip
>> (atomic or not) should result in the entire buffer getting uploaded. The
>> dirty flag is kinda redundant, a flip with the same buffer works the same
>> way as a dirtyfb with the entire buffer as the dirty rectangle.
>>
>>> In both these situations, clip rects can provide a hint to restrict the
>>> dirty region.
>>>
>>> Now I was thinking about the preferred way for user-space to communicate
>>> front buffer rendering through the atomic ioctl:
>>>
>>> 1) Expose a dirty (or content_age property)
>>> 2) Attach a clip-rect blob property.
>>> 3) Fake a page-flip by ping-ponging two frame-buffers pointing to the
>>> same
>>> underlying buffer object.
>>>
>>> Are you saying that people are already using 3) and we should keep using
>>> that?
>>
>> I'm saying they're using 3b), flip the same buffer wrapped in the same
>> drm_framebuffer, and expect it to work.
>>
>> The only advantage dirtyfb has is that it allows you to supply the
>> optimized upload rectangles, but at the cost of a funny api (it's working
>> on the fb, not the plane/crtc you want to upload) and lack of drm_event to
>> confirm when exactly you uploaded your stuff. But imo they should be the
>> same underlying operation.
>>
>
> FWIW, vmwgfx has always treated a dirtyfb as a pure front-buffer like
> rendering operation without any synchronization,
> We've guaranteed that only the rects that are present are uploaded, but only
> xf86-video-vmware has taken advantage of this to keep
> CPU- and GPU rendered content apart.
> I think we've at some point run into problems with number of cliprects, (Old
> KDE lock screen?) and use multiple dirtyfb for the same
> update...

Ok, if we can hit this in practices then I think it's ok to have the
limit. Just need to make sure userspace actually condenses the
cliprects down to something within the limit, since with atomic flips
you can't just do multiple updates - that would tear badly.

Wrt not syncing: I think general use pretty clearly says lots of
userspace renders into buffers with gpus (not even necessarily your
own) and then expects dirtyfb or a flip to upload all the bits.
Otherwise Rob Clark wouldn't need his patch. Given that I think we
need to make general semantics follow that requirement - I don't
expect it'll harm vmwgfx since it doesn't render into the frontbuffer
anyway?

> IIRC the reason for working with the fb, is that it's much easier for
> user-space, which doesn't have to care where planes are scanning out and
> why.
> "Present my new content on any screen that's affected".

Yeah, dirtyfb makes tons of sense for frontbuffer-rendering X, which
also doesn't do per-scanout pixmaps. But for atomic flips you really
want to flip on the crtc (or well plane), since otherwise with
multiple planes it comes up all teared up. vmwgfx doesn't care I guess
since you only have 1 primary plane, but all the SoC chips very much
do.
-Daniel
Deepak Singh Rawat April 6, 2018, 12:19 a.m. UTC | #27
> >>> 1) Expose a dirty (or content_age property)

> >>> 2) Attach a clip-rect blob property.

> >>> 3) Fake a page-flip by ping-ponging two frame-buffers pointing to the

> >>> same

> >>> underlying buffer object.

> >>>

> >>> Are you saying that people are already using 3) and we should keep

> using

> >>> that?

> >>

> >> I'm saying they're using 3b), flip the same buffer wrapped in the same

> >> drm_framebuffer, and expect it to work.

> >>

> >> The only advantage dirtyfb has is that it allows you to supply the

> >> optimized upload rectangles, but at the cost of a funny api (it's working

> >> on the fb, not the plane/crtc you want to upload) and lack of drm_event

> to

> >> confirm when exactly you uploaded your stuff. But imo they should be

> the

> >> same underlying operation.

> >>

> >

> > FWIW, vmwgfx has always treated a dirtyfb as a pure front-buffer like

> > rendering operation without any synchronization,

> > We've guaranteed that only the rects that are present are uploaded, but

> only

> > xf86-video-vmware has taken advantage of this to keep

> > CPU- and GPU rendered content apart.

> > I think we've at some point run into problems with number of cliprects,

> (Old

> > KDE lock screen?) and use multiple dirtyfb for the same

> > update...

> 

> Ok, if we can hit this in practices then I think it's ok to have the

> limit. Just need to make sure userspace actually condenses the

> cliprects down to something within the limit, since with atomic flips

> you can't just do multiple updates - that would tear badly.


So I think the conclusion is to have damage clip rect limit with proper
documentation stating limitation of doing multiple updates.

> 

> Wrt not syncing: I think general use pretty clearly says lots of

> userspace renders into buffers with gpus (not even necessarily your

> own) and then expects dirtyfb or a flip to upload all the bits.

> Otherwise Rob Clark wouldn't need his patch. Given that I think we

> need to make general semantics follow that requirement - I don't

> expect it'll harm vmwgfx since it doesn't render into the frontbuffer

> anyway?

> 

> > IIRC the reason for working with the fb, is that it's much easier for

> > user-space, which doesn't have to care where planes are scanning out and

> > why.

> > "Present my new content on any screen that's affected".

> 

> Yeah, dirtyfb makes tons of sense for frontbuffer-rendering X, which

> also doesn't do per-scanout pixmaps. But for atomic flips you really

> want to flip on the crtc (or well plane), since otherwise with

> multiple planes it comes up all teared up. vmwgfx doesn't care I guess

> since you only have 1 primary plane, but all the SoC chips very much

> do.

> -Daniel

> --

> Daniel Vetter

> Software Engineer, Intel Corporation

> +41 (0) 79 365 57 48 - https://urldefense.proofpoint.com/v2/url?u=http-

> 3A__blog.ffwll.ch&d=DwIBaQ&c=uilaK90D4TOVoH58JNXRgQ&r=zOOG28inJK

> 0762SxAf-cyZdStnd2NQpRu98lJP2iYGw&m=XKgN7GPElFapBWdozPSC-

> 9rcfKmDvQC1QHhsFghexWc&s=SH9q5tw-

> UqpUBJVrr2v1mLpRo28Aau7iJ3YWlrgbPmU&e=
Rob Clark April 6, 2018, 12:36 a.m. UTC | #28
On Thu, Apr 5, 2018 at 9:30 AM, Thomas Hellstrom <thomas@shipmail.org> wrote:
> On 04/04/2018 11:56 AM, Daniel Vetter wrote:
>>
>> On Wed, Apr 04, 2018 at 11:10:08AM +0200, Thomas Hellstrom wrote:
>>>
>>> On 04/04/2018 10:43 AM, Daniel Vetter wrote:
>>>>
>>>> On Wed, Apr 04, 2018 at 10:22:21AM +0200, Thomas Hellstrom wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> On 04/04/2018 08:58 AM, Daniel Vetter wrote:
>>>>>>
>>>>>> On Wed, Apr 4, 2018 at 12:42 AM, Rob Clark <robdclark@gmail.com>
>>>>>> wrote:
>>>>>>>
>>>>>>> Add an atomic helper to implement dirtyfb support.  This is needed to
>>>>>>> support DSI command-mode panels with x11 userspace (ie. when we can't
>>>>>>> rely on pageflips to trigger a flush to the panel).
>>>>>>>
>>>>>>> To signal to the driver that the async atomic update needs to
>>>>>>> synchronize with fences, even though the fb didn't change, the
>>>>>>> drm_atomic_state::dirty flag is added.
>>>>>>>
>>>>>>> Signed-off-by: Rob Clark <robdclark@gmail.com>
>>>>>>> ---
>>>>>>> Background: there are a number of different folks working on getting
>>>>>>> upstream kernel working on various different phones/tablets with qcom
>>>>>>> SoC's.. many of them have command mode panels, so we kind of need a
>>>>>>> way to support the legacy dirtyfb ioctl for x11 support.
>>>>>>>
>>>>>>> I know there is work on a proprer non-legacy atomic property for
>>>>>>> userspace to communicate dirty-rect(s) to the kernel, so this can
>>>>>>> be improved from triggering a full-frame flush once that is in
>>>>>>> place.  But we kinda needa a stop-gap solution.
>>>>>>>
>>>>>>> I had considered an in-driver solution for this, but things get a
>>>>>>> bit tricky if userspace ands up combining dirtyfb ioctls with page-
>>>>>>> flips, because we need to synchronize setting various CTL.FLUSH bits
>>>>>>> with setting the CTL.START bit.  (ie. really all we need to do for
>>>>>>> cmd mode panels is bang CTL.START, but is this ends up racing with
>>>>>>> pageflips setting FLUSH bits, then bad things.)  The easiest soln
>>>>>>> is to wrap this up as an atomic commit and rely on the worker to
>>>>>>> serialize things.  Hence adding an atomic dirtyfb helper.
>>>>>>>
>>>>>>> I guess at least the helper, with some small addition to translate
>>>>>>> and pass-thru the dirty rect(s) is useful to the final atomic dirty-
>>>>>>> rect property solution.  Depending on how far off that is, a stop-
>>>>>>> gap solution could be useful.
>>>>>>
>>>>>> Adding Noralf, who iirc already posted the full dirty helpers already
>>>>>> somewhere.
>>>>>> -Daniel
>>>>>
>>>>> I've asked Deepak to RFC the core changes suggested for the full dirty
>>>>> blob
>>>>> on dri-devel. It builds on DisplayLink's suggestion, with a simple
>>>>> helper to
>>>>> get to the desired coordinates.
>>>>>
>>>>> One thing to perhaps discuss is how we would like to fit this with
>>>>> front-buffer rendering and the dirty ioctl. In the page-flip context,
>>>>> the
>>>>> dirty rects, like egl's swapbuffer_with_damage is a hint to restrict
>>>>> the
>>>>> damage region that can be fully ignored by the driver, new content is
>>>>> indicated by a new framebuffer.
>>>>>
>>>>> We could do the same for frontbuffer rendering: Either set a dirty flag
>>>>> like
>>>>> you do here, or provide a content_age state member. Since we clear the
>>>>> dirty
>>>>> flag on state copies, I guess that would be sufficient. The blob
>>>>> rectangles
>>>>> would then become a hint to restrict the damage region.
>>>>
>>>> I'm not entirely following here - I thought for frontbuffer rendering
>>>> the
>>>> dirty rects have always just been a hint, and that the driver was always
>>>> free to re-upload the entire buffer to the screen.
>>>>
>>>> And through a helper like Rob's proposing here (and have floated around
>>>> in
>>>> different versions already) we'd essentially map a frontbuffer dirtyfb
>>>> call to a fake flip with dirty rect. Manual upload drivers already need
>>>> to
>>>> upload the entire screen if they get a flip, since some userspace uses
>>>> that to flush out frontbuffer rendering (instead of calling dirtyfb).
>>>>
>>>> So from that pov the new dirty flag is kinda not necessary imo.
>>>>
>>>>> Another approach would be to have the presence of dirty rects without
>>>>> framebuffer change to indicate frontbuffer rendering.
>>>>>
>>>>> I think I like the first approach best, although it may be tempting for
>>>>> user-space apps to just set the dirty bit instead of providing the full
>>>>> damage region.
>>>>
>>>> Or I'm not following you here, because I don't quite see the difference
>>>> between dirtyfb and a flip.
>>>> -Daniel
>>>
>>> OK, let me rephrase:
>>>
>>>  From the driver's point-of-view, in the atomic world, new content and
>>> the
>>> need for manual upload is indicated by a change in fb attached to the
>>> plane.
>>>
>>> With Rob's patch here, (correct me if I'm wrong) in addition new content
>>> and
>>> the need for manual upload is identified by the dirty flag, (since the fb
>>> stays the same and the driver thus never identifies a page-flip).
>>
>> Hm, I'm not entirely sure Rob's approach is correct. Imo a pageflip
>> (atomic or not) should result in the entire buffer getting uploaded. The
>> dirty flag is kinda redundant, a flip with the same buffer works the same
>> way as a dirtyfb with the entire buffer as the dirty rectangle.
>>
>>> In both these situations, clip rects can provide a hint to restrict the
>>> dirty region.
>>>
>>> Now I was thinking about the preferred way for user-space to communicate
>>> front buffer rendering through the atomic ioctl:
>>>
>>> 1) Expose a dirty (or content_age property)
>>> 2) Attach a clip-rect blob property.
>>> 3) Fake a page-flip by ping-ponging two frame-buffers pointing to the
>>> same
>>> underlying buffer object.
>>>
>>> Are you saying that people are already using 3) and we should keep using
>>> that?
>>
>> I'm saying they're using 3b), flip the same buffer wrapped in the same
>> drm_framebuffer, and expect it to work.
>>
>> The only advantage dirtyfb has is that it allows you to supply the
>> optimized upload rectangles, but at the cost of a funny api (it's working
>> on the fb, not the plane/crtc you want to upload) and lack of drm_event to
>> confirm when exactly you uploaded your stuff. But imo they should be the
>> same underlying operation.
>>
>
> FWIW, vmwgfx has always treated a dirtyfb as a pure front-buffer like
> rendering operation without any synchronization,

I guess this works for you because dirtyfb msg to host and rendering
msgs to host end up serialized?

> We've guaranteed that only the rects that are present are uploaded, but only
> xf86-video-vmware has taken advantage of this to keep
> CPU- and GPU rendered content apart.
> I think we've at some point run into problems with number of cliprects, (Old
> KDE lock screen?) and use multiple dirtyfb for the same
> update...

fwiw, I haven't really looked at how it works on msm yet.. my memories
from omap where that we could set a single clip-rect on tx to panel
but we'd need to block tx of contents of next clip-rect until previous
was finished, so we kinda have a limit of 1.. and I expect msm (or
most other dsi drivers) are similar.. but I'm expecting that
serializing everything on atomic commit worker helps here

BR,
-R


>
> IIRC the reason for working with the fb, is that it's much easier for
> user-space, which doesn't have to care where planes are scanning out and
> why.
> "Present my new content on any screen that's affected".
>
> /Thomas
>
>
--
To unsubscribe from this list: send the line "unsubscribe linux-arm-msm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/gpu/drm/drm_atomic_helper.c b/drivers/gpu/drm/drm_atomic_helper.c
index c35654591c12..a578dc681b27 100644
--- a/drivers/gpu/drm/drm_atomic_helper.c
+++ b/drivers/gpu/drm/drm_atomic_helper.c
@@ -3504,6 +3504,7 @@  void __drm_atomic_helper_plane_duplicate_state(struct drm_plane *plane,
 	if (state->fb)
 		drm_framebuffer_get(state->fb);
 
+	state->dirty = false;
 	state->fence = NULL;
 	state->commit = NULL;
 }
@@ -3847,6 +3848,71 @@  int drm_atomic_helper_legacy_gamma_set(struct drm_crtc *crtc,
 }
 EXPORT_SYMBOL(drm_atomic_helper_legacy_gamma_set);
 
+/**
+ * drm_atomic_helper_dirtyfb - helper for dirtyfb
+ *
+ * A helper to implement drm_framebuffer_funcs::dirty
+ */
+int drm_atomic_helper_dirtyfb(struct drm_framebuffer *fb,
+			      struct drm_file *file_priv, unsigned flags,
+			      unsigned color, struct drm_clip_rect *clips,
+			      unsigned num_clips)
+{
+	struct drm_modeset_acquire_ctx ctx;
+	struct drm_atomic_state *state;
+	struct drm_plane *plane;
+	int ret = 0;
+
+	/*
+	 * When called from ioctl, we are interruptable, but not when
+	 * called internally (ie. defio worker)
+	 */
+	drm_modeset_acquire_init(&ctx,
+		file_priv ? DRM_MODESET_ACQUIRE_INTERRUPTIBLE : 0);
+
+	state = drm_atomic_state_alloc(fb->dev);
+	if (!state) {
+		ret = -ENOMEM;
+		goto out;
+	}
+	state->acquire_ctx = &ctx;
+
+retry:
+	drm_for_each_plane(plane, fb->dev) {
+		struct drm_plane_state *plane_state;
+
+		if (plane->state->fb != fb)
+			continue;
+
+		plane_state = drm_atomic_get_plane_state(state, plane);
+		if (IS_ERR(plane_state)) {
+			ret = PTR_ERR(plane_state);
+			goto out;
+		}
+
+		plane_state->dirty = true;
+	}
+
+	ret = drm_atomic_nonblocking_commit(state);
+
+out:
+	if (ret == -EDEADLK) {
+		drm_atomic_state_clear(state);
+		ret = drm_modeset_backoff(&ctx);
+		if (!ret)
+			goto retry;
+	}
+
+	drm_atomic_state_put(state);
+
+	drm_modeset_drop_locks(&ctx);
+	drm_modeset_acquire_fini(&ctx);
+
+	return ret;
+
+}
+EXPORT_SYMBOL(drm_atomic_helper_dirtyfb);
+
 /**
  * __drm_atomic_helper_private_duplicate_state - copy atomic private state
  * @obj: CRTC object
diff --git a/drivers/gpu/drm/msm/msm_atomic.c b/drivers/gpu/drm/msm/msm_atomic.c
index bf5f8c39f34d..bb55a048e98b 100644
--- a/drivers/gpu/drm/msm/msm_atomic.c
+++ b/drivers/gpu/drm/msm/msm_atomic.c
@@ -201,7 +201,10 @@  int msm_atomic_commit(struct drm_device *dev,
 	 * Figure out what fence to wait for:
 	 */
 	for_each_oldnew_plane_in_state(state, plane, old_plane_state, new_plane_state, i) {
-		if ((new_plane_state->fb != old_plane_state->fb) && new_plane_state->fb) {
+		bool sync_fb = new_plane_state->fb &&
+			((new_plane_state->fb != old_plane_state->fb) ||
+			 new_plane_state->dirty);
+		if (sync_fb) {
 			struct drm_gem_object *obj = msm_framebuffer_bo(new_plane_state->fb, 0);
 			struct msm_gem_object *msm_obj = to_msm_bo(obj);
 			struct dma_fence *fence = reservation_object_get_excl_rcu(msm_obj->resv);
diff --git a/drivers/gpu/drm/msm/msm_fb.c b/drivers/gpu/drm/msm/msm_fb.c
index 0e0c87252ab0..a5d882a34a33 100644
--- a/drivers/gpu/drm/msm/msm_fb.c
+++ b/drivers/gpu/drm/msm/msm_fb.c
@@ -62,6 +62,7 @@  static void msm_framebuffer_destroy(struct drm_framebuffer *fb)
 static const struct drm_framebuffer_funcs msm_framebuffer_funcs = {
 	.create_handle = msm_framebuffer_create_handle,
 	.destroy = msm_framebuffer_destroy,
+	.dirty = drm_atomic_helper_dirtyfb,
 };
 
 #ifdef CONFIG_DEBUG_FS
diff --git a/include/drm/drm_atomic_helper.h b/include/drm/drm_atomic_helper.h
index 26aaba58d6ce..9b7a95c2643d 100644
--- a/include/drm/drm_atomic_helper.h
+++ b/include/drm/drm_atomic_helper.h
@@ -183,6 +183,10 @@  int drm_atomic_helper_legacy_gamma_set(struct drm_crtc *crtc,
 				       u16 *red, u16 *green, u16 *blue,
 				       uint32_t size,
 				       struct drm_modeset_acquire_ctx *ctx);
+int drm_atomic_helper_dirtyfb(struct drm_framebuffer *fb,
+			      struct drm_file *file_priv, unsigned flags,
+			      unsigned color, struct drm_clip_rect *clips,
+			      unsigned num_clips);
 void __drm_atomic_helper_private_obj_duplicate_state(struct drm_private_obj *obj,
 						     struct drm_private_state *state);
 
diff --git a/include/drm/drm_plane.h b/include/drm/drm_plane.h
index f7bf4a48b1c3..296fa22bda7a 100644
--- a/include/drm/drm_plane.h
+++ b/include/drm/drm_plane.h
@@ -76,6 +76,15 @@  struct drm_plane_state {
 	 */
 	struct drm_framebuffer *fb;
 
+	/**
+	 * @dirty:
+	 *
+	 * Flag that indicates the fb contents have changed even though the
+	 * fb has not.  This is mostly a stop-gap solution until we have
+	 * atomic dirty-rect(s) property.
+	 */
+	bool dirty;
+
 	/**
 	 * @fence:
 	 *