diff mbox series

[13/14] drm/vc4: crtc: Fix out of order frames during asynchronous page flips

Message ID 20220503121341.983842-14-maxime@cerno.tech (mailing list archive)
State New, archived
Headers show
Series drm/vc4: Properly separate v3d on BCM2711, and fix frame ordering | expand

Commit Message

Maxime Ripard May 3, 2022, 12:13 p.m. UTC
When doing an asynchronous page flip (PAGE_FLIP ioctl with the
DRM_MODE_PAGE_FLIP_ASYNC flag set), the current code waits for the
possible GPU buffer being rendered through a call to
vc4_queue_seqno_cb().

On the BCM2835-37, the GPU driver is part of the vc4 driver and that
function is defined in vc4_gem.c to wait for the buffer to be rendered,
and once it's done, call a callback.

However, on the BCM2711 used on the RaspberryPi4, the GPU driver is
separate (v3d) and that function won't do anything. This was working
because we were going into a path, due to uninitialized variables, that
was always scheduling the callback.

However, we were never actually waiting for the buffer to be rendered
which was resulting in frames being displayed out of order.

The generic API to signal those kind of completion in the kernel are the
DMA fences, and fortunately the v3d drivers supports them and signal
when its job is done. That API also provides an equivalent function that
allows to have a callback being executed when the fence is signalled as
done.

Let's change our driver a bit to rely on the previous function for the
older SoCs, and on DMA fences for the BCM2711.

Signed-off-by: Maxime Ripard <maxime@cerno.tech>
---
 drivers/gpu/drm/vc4/vc4_crtc.c | 41 ++++++++++++++++++++++++++++++++--
 1 file changed, 39 insertions(+), 2 deletions(-)

Comments

kernel test robot May 3, 2022, 1:53 p.m. UTC | #1
Hi Maxime,

I love your patch! Perhaps something to improve:

[auto build test WARNING on drm-intel/for-linux-next]
[also build test WARNING on drm-tip/drm-tip next-20220503]
[cannot apply to anholt/for-next v5.18-rc5]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/intel-lab-lkp/linux/commits/Maxime-Ripard/drm-vc4-Properly-separate-v3d-on-BCM2711-and-fix-frame-ordering/20220503-201745
base:   git://anongit.freedesktop.org/drm-intel for-linux-next
config: arc-allyesconfig (https://download.01.org/0day-ci/archive/20220503/202205032134.wMMUQ85T-lkp@intel.com/config)
compiler: arceb-elf-gcc (GCC) 11.3.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/intel-lab-lkp/linux/commit/0db44bd12236a0a64cf67bbef6b11df5bbe37134
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Maxime-Ripard/drm-vc4-Properly-separate-v3d-on-BCM2711-and-fix-frame-ordering/20220503-201745
        git checkout 0db44bd12236a0a64cf67bbef6b11df5bbe37134
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.3.0 make.cross W=1 O=build_dir ARCH=arc SHELL=/bin/bash drivers/gpu/drm/vc4/

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

   drivers/gpu/drm/vc4/vc4_crtc.c: In function 'vc4_async_set_fence_cb':
>> drivers/gpu/drm/vc4/vc4_crtc.c:865:57: warning: implicit conversion from 'enum <anonymous>' to 'enum dma_resv_usage' [-Wenum-conversion]
     865 |         ret = dma_resv_get_singleton(cma_bo->base.resv, false, &fence);
         |                                                         ^~~~~


vim +865 drivers/gpu/drm/vc4/vc4_crtc.c

   848	
   849	static int vc4_async_set_fence_cb(struct drm_device *dev,
   850					  struct vc4_async_flip_state *flip_state)
   851	{
   852		struct drm_framebuffer *fb = flip_state->fb;
   853		struct drm_gem_cma_object *cma_bo = drm_fb_cma_get_gem_obj(fb, 0);
   854		struct vc4_dev *vc4 = to_vc4_dev(dev);
   855		struct dma_fence *fence;
   856		int ret;
   857	
   858		if (!vc4->is_vc5) {
   859			struct vc4_bo *bo = to_vc4_bo(&cma_bo->base);
   860	
   861			return vc4_queue_seqno_cb(dev, &flip_state->cb.seqno, bo->seqno,
   862						  vc4_async_page_flip_seqno_complete);
   863		}
   864	
 > 865		ret = dma_resv_get_singleton(cma_bo->base.resv, false, &fence);
   866		if (ret)
   867			return ret;
   868	
   869		if (dma_fence_add_callback(fence, &flip_state->cb.fence,
   870					   vc4_async_page_flip_fence_complete))
   871			vc4_async_page_flip_fence_complete(fence, &flip_state->cb.fence);
   872	
   873		return 0;
   874	}
   875
Melissa Wen May 9, 2022, 5:10 p.m. UTC | #2
On 05/03, Maxime Ripard wrote:
> When doing an asynchronous page flip (PAGE_FLIP ioctl with the
> DRM_MODE_PAGE_FLIP_ASYNC flag set), the current code waits for the
> possible GPU buffer being rendered through a call to
> vc4_queue_seqno_cb().
> 
> On the BCM2835-37, the GPU driver is part of the vc4 driver and that
> function is defined in vc4_gem.c to wait for the buffer to be rendered,
> and once it's done, call a callback.
> 
> However, on the BCM2711 used on the RaspberryPi4, the GPU driver is
> separate (v3d) and that function won't do anything. This was working
> because we were going into a path, due to uninitialized variables, that
> was always scheduling the callback.
> 
> However, we were never actually waiting for the buffer to be rendered
> which was resulting in frames being displayed out of order.
> 
> The generic API to signal those kind of completion in the kernel are the
> DMA fences, and fortunately the v3d drivers supports them and signal
> when its job is done. That API also provides an equivalent function that
> allows to have a callback being executed when the fence is signalled as
> done.
> 
> Let's change our driver a bit to rely on the previous function for the
> older SoCs, and on DMA fences for the BCM2711.
> 
> Signed-off-by: Maxime Ripard <maxime@cerno.tech>
> ---
>  drivers/gpu/drm/vc4/vc4_crtc.c | 41 ++++++++++++++++++++++++++++++++--
>  1 file changed, 39 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/vc4/vc4_crtc.c b/drivers/gpu/drm/vc4/vc4_crtc.c
> index e0ae7bef08fa..8e1369fca937 100644
> --- a/drivers/gpu/drm/vc4/vc4_crtc.c
> +++ b/drivers/gpu/drm/vc4/vc4_crtc.c
> @@ -776,6 +776,7 @@ struct vc4_async_flip_state {
>  	struct drm_pending_vblank_event *event;
>  
>  	union {
> +		struct dma_fence_cb fence;
>  		struct vc4_seqno_cb seqno;
>  	} cb;
>  };
> @@ -835,6 +836,43 @@ static void vc4_async_page_flip_seqno_complete(struct vc4_seqno_cb *cb)
>  		vc4_bo_dec_usecnt(bo);
>  }
>  
> +static void vc4_async_page_flip_fence_complete(struct dma_fence *fence,
> +					       struct dma_fence_cb *cb)
> +{
> +	struct vc4_async_flip_state *flip_state =
> +		container_of(cb, struct vc4_async_flip_state, cb.fence);
> +
> +	vc4_async_page_flip_complete(flip_state);
> +	dma_fence_put(fence);
> +}
> +
> +static int vc4_async_set_fence_cb(struct drm_device *dev,
> +				  struct vc4_async_flip_state *flip_state)
> +{
> +	struct drm_framebuffer *fb = flip_state->fb;
> +	struct drm_gem_cma_object *cma_bo = drm_fb_cma_get_gem_obj(fb, 0);
> +	struct vc4_dev *vc4 = to_vc4_dev(dev);
> +	struct dma_fence *fence;
> +	int ret;
> +
> +	if (!vc4->is_vc5) {
> +		struct vc4_bo *bo = to_vc4_bo(&cma_bo->base);
> +
> +		return vc4_queue_seqno_cb(dev, &flip_state->cb.seqno, bo->seqno,
> +					  vc4_async_page_flip_seqno_complete);
> +	}
> +
> +	ret = dma_resv_get_singleton(cma_bo->base.resv, false, &fence);
> +	if (ret)
> +		return ret;
> +
> +	if (dma_fence_add_callback(fence, &flip_state->cb.fence,
> +				   vc4_async_page_flip_fence_complete))
> +		vc4_async_page_flip_fence_complete(fence, &flip_state->cb.fence);
> +
> +	return 0;
> +}
> +
>  static int
>  vc4_async_page_flip_common(struct drm_crtc *crtc,
>  			   struct drm_framebuffer *fb,
> @@ -874,8 +912,7 @@ vc4_async_page_flip_common(struct drm_crtc *crtc,
>  	 */
>  	drm_atomic_set_fb_for_plane(plane->state, fb);
>  
> -	vc4_queue_seqno_cb(dev, &flip_state->cb.seqno, bo->seqno,
> -			   vc4_async_page_flip_seqno_complete);
> +	vc4_async_set_fence_cb(dev, flip_state);

I tried to run igt kms_async_flip, but the test still seems very tied to
the i915 driver. So, as far as I could check, I didn't see any red
flags and sync page flip behaviour seems preserved.

>  
>  	/* Driver takes ownership of state on successful async commit. */
>  	return 0;
> -- 
> 2.35.1
>
Melissa Wen May 9, 2022, 5:15 p.m. UTC | #3
O 05/09, Melissa Wen wrote:
> On 05/03, Maxime Ripard wrote:
> > When doing an asynchronous page flip (PAGE_FLIP ioctl with the
> > DRM_MODE_PAGE_FLIP_ASYNC flag set), the current code waits for the
> > possible GPU buffer being rendered through a call to
> > vc4_queue_seqno_cb().
> > 
> > On the BCM2835-37, the GPU driver is part of the vc4 driver and that
> > function is defined in vc4_gem.c to wait for the buffer to be rendered,
> > and once it's done, call a callback.
> > 
> > However, on the BCM2711 used on the RaspberryPi4, the GPU driver is
> > separate (v3d) and that function won't do anything. This was working
> > because we were going into a path, due to uninitialized variables, that
> > was always scheduling the callback.
> > 
> > However, we were never actually waiting for the buffer to be rendered
> > which was resulting in frames being displayed out of order.
> > 
> > The generic API to signal those kind of completion in the kernel are the
> > DMA fences, and fortunately the v3d drivers supports them and signal
> > when its job is done. That API also provides an equivalent function that
> > allows to have a callback being executed when the fence is signalled as
> > done.
> > 
> > Let's change our driver a bit to rely on the previous function for the
> > older SoCs, and on DMA fences for the BCM2711.
> > 
> > Signed-off-by: Maxime Ripard <maxime@cerno.tech>
> > ---
> >  drivers/gpu/drm/vc4/vc4_crtc.c | 41 ++++++++++++++++++++++++++++++++--
> >  1 file changed, 39 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/vc4/vc4_crtc.c b/drivers/gpu/drm/vc4/vc4_crtc.c
> > index e0ae7bef08fa..8e1369fca937 100644
> > --- a/drivers/gpu/drm/vc4/vc4_crtc.c
> > +++ b/drivers/gpu/drm/vc4/vc4_crtc.c
> > @@ -776,6 +776,7 @@ struct vc4_async_flip_state {
> >  	struct drm_pending_vblank_event *event;
> >  
> >  	union {
> > +		struct dma_fence_cb fence;
> >  		struct vc4_seqno_cb seqno;
> >  	} cb;
> >  };
> > @@ -835,6 +836,43 @@ static void vc4_async_page_flip_seqno_complete(struct vc4_seqno_cb *cb)
> >  		vc4_bo_dec_usecnt(bo);
> >  }
> >  
> > +static void vc4_async_page_flip_fence_complete(struct dma_fence *fence,
> > +					       struct dma_fence_cb *cb)
> > +{
> > +	struct vc4_async_flip_state *flip_state =
> > +		container_of(cb, struct vc4_async_flip_state, cb.fence);
> > +
> > +	vc4_async_page_flip_complete(flip_state);
> > +	dma_fence_put(fence);
> > +}
> > +
> > +static int vc4_async_set_fence_cb(struct drm_device *dev,
> > +				  struct vc4_async_flip_state *flip_state)
> > +{
> > +	struct drm_framebuffer *fb = flip_state->fb;
> > +	struct drm_gem_cma_object *cma_bo = drm_fb_cma_get_gem_obj(fb, 0);
> > +	struct vc4_dev *vc4 = to_vc4_dev(dev);
> > +	struct dma_fence *fence;
> > +	int ret;
> > +
> > +	if (!vc4->is_vc5) {
> > +		struct vc4_bo *bo = to_vc4_bo(&cma_bo->base);
> > +
> > +		return vc4_queue_seqno_cb(dev, &flip_state->cb.seqno, bo->seqno,
> > +					  vc4_async_page_flip_seqno_complete);
> > +	}
> > +
> > +	ret = dma_resv_get_singleton(cma_bo->base.resv, false, &fence);
+ for kernel bot complaint, I replaced false with `DMA_RESV_USAGE_READ`
to run some tests

> > +	if (ret)
> > +		return ret;
> > +
> > +	if (dma_fence_add_callback(fence, &flip_state->cb.fence,
> > +				   vc4_async_page_flip_fence_complete))
> > +		vc4_async_page_flip_fence_complete(fence, &flip_state->cb.fence);
> > +
> > +	return 0;
> > +}
> > +
> >  static int
> >  vc4_async_page_flip_common(struct drm_crtc *crtc,
> >  			   struct drm_framebuffer *fb,
> > @@ -874,8 +912,7 @@ vc4_async_page_flip_common(struct drm_crtc *crtc,
> >  	 */
> >  	drm_atomic_set_fb_for_plane(plane->state, fb);
> >  
> > -	vc4_queue_seqno_cb(dev, &flip_state->cb.seqno, bo->seqno,
> > -			   vc4_async_page_flip_seqno_complete);
> > +	vc4_async_set_fence_cb(dev, flip_state);
> 
> I tried to run igt kms_async_flip, but the test still seems very tied to
> the i915 driver. So, as far as I could check, I didn't see any red
> flags and sync page flip behaviour seems preserved.
> 
> >  
> >  	/* Driver takes ownership of state on successful async commit. */
> >  	return 0;
> > -- 
> > 2.35.1
> >
Melissa Wen May 12, 2022, 10:44 a.m. UTC | #4
On 05/09, Melissa Wen wrote:
> O 05/09, Melissa Wen wrote:
> > On 05/03, Maxime Ripard wrote:
> > > When doing an asynchronous page flip (PAGE_FLIP ioctl with the
> > > DRM_MODE_PAGE_FLIP_ASYNC flag set), the current code waits for the
> > > possible GPU buffer being rendered through a call to
> > > vc4_queue_seqno_cb().
> > > 
> > > On the BCM2835-37, the GPU driver is part of the vc4 driver and that
> > > function is defined in vc4_gem.c to wait for the buffer to be rendered,
> > > and once it's done, call a callback.
> > > 
> > > However, on the BCM2711 used on the RaspberryPi4, the GPU driver is
> > > separate (v3d) and that function won't do anything. This was working
> > > because we were going into a path, due to uninitialized variables, that
> > > was always scheduling the callback.
> > > 
> > > However, we were never actually waiting for the buffer to be rendered
> > > which was resulting in frames being displayed out of order.
> > > 
> > > The generic API to signal those kind of completion in the kernel are the
> > > DMA fences, and fortunately the v3d drivers supports them and signal
> > > when its job is done. That API also provides an equivalent function that
> > > allows to have a callback being executed when the fence is signalled as
> > > done.
> > > 
> > > Let's change our driver a bit to rely on the previous function for the
> > > older SoCs, and on DMA fences for the BCM2711.
> > > 
> > > Signed-off-by: Maxime Ripard <maxime@cerno.tech>
> > > ---
> > >  drivers/gpu/drm/vc4/vc4_crtc.c | 41 ++++++++++++++++++++++++++++++++--
> > >  1 file changed, 39 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/vc4/vc4_crtc.c b/drivers/gpu/drm/vc4/vc4_crtc.c
> > > index e0ae7bef08fa..8e1369fca937 100644
> > > --- a/drivers/gpu/drm/vc4/vc4_crtc.c
> > > +++ b/drivers/gpu/drm/vc4/vc4_crtc.c
> > > @@ -776,6 +776,7 @@ struct vc4_async_flip_state {
> > >  	struct drm_pending_vblank_event *event;
> > >  
> > >  	union {
> > > +		struct dma_fence_cb fence;
> > >  		struct vc4_seqno_cb seqno;
> > >  	} cb;
> > >  };
> > > @@ -835,6 +836,43 @@ static void vc4_async_page_flip_seqno_complete(struct vc4_seqno_cb *cb)
> > >  		vc4_bo_dec_usecnt(bo);
> > >  }
> > >  
> > > +static void vc4_async_page_flip_fence_complete(struct dma_fence *fence,
> > > +					       struct dma_fence_cb *cb)
> > > +{
> > > +	struct vc4_async_flip_state *flip_state =
> > > +		container_of(cb, struct vc4_async_flip_state, cb.fence);
> > > +
> > > +	vc4_async_page_flip_complete(flip_state);
> > > +	dma_fence_put(fence);
> > > +}
> > > +
> > > +static int vc4_async_set_fence_cb(struct drm_device *dev,
> > > +				  struct vc4_async_flip_state *flip_state)
> > > +{
> > > +	struct drm_framebuffer *fb = flip_state->fb;
> > > +	struct drm_gem_cma_object *cma_bo = drm_fb_cma_get_gem_obj(fb, 0);
> > > +	struct vc4_dev *vc4 = to_vc4_dev(dev);
> > > +	struct dma_fence *fence;
> > > +	int ret;
> > > +
> > > +	if (!vc4->is_vc5) {
> > > +		struct vc4_bo *bo = to_vc4_bo(&cma_bo->base);
> > > +
> > > +		return vc4_queue_seqno_cb(dev, &flip_state->cb.seqno, bo->seqno,
> > > +					  vc4_async_page_flip_seqno_complete);
> > > +	}
> > > +
> > > +	ret = dma_resv_get_singleton(cma_bo->base.resv, false, &fence);
> + for kernel bot complaint, I replaced false with `DMA_RESV_USAGE_READ`
> to run some tests
> 
> > > +	if (ret)
> > > +		return ret;
> > > +
> > > +	if (dma_fence_add_callback(fence, &flip_state->cb.fence,
me again :)

I was thinking if we should add a check here for !fence and just complete the page flip,
instead of letting `dma_fence_add_callback` warns whenever fence is NULL.
I think there are situation in which fence is NULL and it is not an
issue, right? Does it make sense?

> > > +				   vc4_async_page_flip_fence_complete))
> > > +		vc4_async_page_flip_fence_complete(fence, &flip_state->cb.fence);
> > > +
> > > +	return 0;
> > > +}
> > > +
> > >  static int
> > >  vc4_async_page_flip_common(struct drm_crtc *crtc,
> > >  			   struct drm_framebuffer *fb,
> > > @@ -874,8 +912,7 @@ vc4_async_page_flip_common(struct drm_crtc *crtc,
> > >  	 */
> > >  	drm_atomic_set_fb_for_plane(plane->state, fb);
> > >  
> > > -	vc4_queue_seqno_cb(dev, &flip_state->cb.seqno, bo->seqno,
> > > -			   vc4_async_page_flip_seqno_complete);
> > > +	vc4_async_set_fence_cb(dev, flip_state);
> > 
> > I tried to run igt kms_async_flip, but the test still seems very tied to
> > the i915 driver. So, as far as I could check, I didn't see any red
> > flags and sync page flip behaviour seems preserved.
> > 
> > >  
> > >  	/* Driver takes ownership of state on successful async commit. */
> > >  	return 0;
> > > -- 
> > > 2.35.1
> > > 
> 
>
Maxime Ripard June 6, 2022, 1:59 p.m. UTC | #5
Hi,

On Thu, May 12, 2022 at 09:44:42AM -0100, Melissa Wen wrote:
> On 05/09, Melissa Wen wrote:
> > O 05/09, Melissa Wen wrote:
> > > On 05/03, Maxime Ripard wrote:
> > > > When doing an asynchronous page flip (PAGE_FLIP ioctl with the
> > > > DRM_MODE_PAGE_FLIP_ASYNC flag set), the current code waits for the
> > > > possible GPU buffer being rendered through a call to
> > > > vc4_queue_seqno_cb().
> > > > 
> > > > On the BCM2835-37, the GPU driver is part of the vc4 driver and that
> > > > function is defined in vc4_gem.c to wait for the buffer to be rendered,
> > > > and once it's done, call a callback.
> > > > 
> > > > However, on the BCM2711 used on the RaspberryPi4, the GPU driver is
> > > > separate (v3d) and that function won't do anything. This was working
> > > > because we were going into a path, due to uninitialized variables, that
> > > > was always scheduling the callback.
> > > > 
> > > > However, we were never actually waiting for the buffer to be rendered
> > > > which was resulting in frames being displayed out of order.
> > > > 
> > > > The generic API to signal those kind of completion in the kernel are the
> > > > DMA fences, and fortunately the v3d drivers supports them and signal
> > > > when its job is done. That API also provides an equivalent function that
> > > > allows to have a callback being executed when the fence is signalled as
> > > > done.
> > > > 
> > > > Let's change our driver a bit to rely on the previous function for the
> > > > older SoCs, and on DMA fences for the BCM2711.
> > > > 
> > > > Signed-off-by: Maxime Ripard <maxime@cerno.tech>
> > > > ---
> > > >  drivers/gpu/drm/vc4/vc4_crtc.c | 41 ++++++++++++++++++++++++++++++++--
> > > >  1 file changed, 39 insertions(+), 2 deletions(-)
> > > > 
> > > > diff --git a/drivers/gpu/drm/vc4/vc4_crtc.c b/drivers/gpu/drm/vc4/vc4_crtc.c
> > > > index e0ae7bef08fa..8e1369fca937 100644
> > > > --- a/drivers/gpu/drm/vc4/vc4_crtc.c
> > > > +++ b/drivers/gpu/drm/vc4/vc4_crtc.c
> > > > @@ -776,6 +776,7 @@ struct vc4_async_flip_state {
> > > >  	struct drm_pending_vblank_event *event;
> > > >  
> > > >  	union {
> > > > +		struct dma_fence_cb fence;
> > > >  		struct vc4_seqno_cb seqno;
> > > >  	} cb;
> > > >  };
> > > > @@ -835,6 +836,43 @@ static void vc4_async_page_flip_seqno_complete(struct vc4_seqno_cb *cb)
> > > >  		vc4_bo_dec_usecnt(bo);
> > > >  }
> > > >  
> > > > +static void vc4_async_page_flip_fence_complete(struct dma_fence *fence,
> > > > +					       struct dma_fence_cb *cb)
> > > > +{
> > > > +	struct vc4_async_flip_state *flip_state =
> > > > +		container_of(cb, struct vc4_async_flip_state, cb.fence);
> > > > +
> > > > +	vc4_async_page_flip_complete(flip_state);
> > > > +	dma_fence_put(fence);
> > > > +}
> > > > +
> > > > +static int vc4_async_set_fence_cb(struct drm_device *dev,
> > > > +				  struct vc4_async_flip_state *flip_state)
> > > > +{
> > > > +	struct drm_framebuffer *fb = flip_state->fb;
> > > > +	struct drm_gem_cma_object *cma_bo = drm_fb_cma_get_gem_obj(fb, 0);
> > > > +	struct vc4_dev *vc4 = to_vc4_dev(dev);
> > > > +	struct dma_fence *fence;
> > > > +	int ret;
> > > > +
> > > > +	if (!vc4->is_vc5) {
> > > > +		struct vc4_bo *bo = to_vc4_bo(&cma_bo->base);
> > > > +
> > > > +		return vc4_queue_seqno_cb(dev, &flip_state->cb.seqno, bo->seqno,
> > > > +					  vc4_async_page_flip_seqno_complete);
> > > > +	}
> > > > +
> > > > +	ret = dma_resv_get_singleton(cma_bo->base.resv, false, &fence);
> > + for kernel bot complaint, I replaced false with `DMA_RESV_USAGE_READ`
> > to run some tests
> > 
> > > > +	if (ret)
> > > > +		return ret;
> > > > +
> > > > +	if (dma_fence_add_callback(fence, &flip_state->cb.fence,
> me again :)
> 
> I was thinking if we should add a check here for !fence and just complete the page flip,
> instead of letting `dma_fence_add_callback` warns whenever fence is NULL.
> I think there are situation in which fence is NULL and it is not an
> issue, right? Does it make sense?

I'm not sure. What situation do you have in mind?

Maxime
Melissa Wen June 8, 2022, 11:24 a.m. UTC | #6
On 06/06, Maxime Ripard wrote:
> Hi,
> 
> On Thu, May 12, 2022 at 09:44:42AM -0100, Melissa Wen wrote:
> > On 05/09, Melissa Wen wrote:
> > > O 05/09, Melissa Wen wrote:
> > > > On 05/03, Maxime Ripard wrote:
> > > > > When doing an asynchronous page flip (PAGE_FLIP ioctl with the
> > > > > DRM_MODE_PAGE_FLIP_ASYNC flag set), the current code waits for the
> > > > > possible GPU buffer being rendered through a call to
> > > > > vc4_queue_seqno_cb().
> > > > > 
> > > > > On the BCM2835-37, the GPU driver is part of the vc4 driver and that
> > > > > function is defined in vc4_gem.c to wait for the buffer to be rendered,
> > > > > and once it's done, call a callback.
> > > > > 
> > > > > However, on the BCM2711 used on the RaspberryPi4, the GPU driver is
> > > > > separate (v3d) and that function won't do anything. This was working
> > > > > because we were going into a path, due to uninitialized variables, that
> > > > > was always scheduling the callback.
> > > > > 
> > > > > However, we were never actually waiting for the buffer to be rendered
> > > > > which was resulting in frames being displayed out of order.
> > > > > 
> > > > > The generic API to signal those kind of completion in the kernel are the
> > > > > DMA fences, and fortunately the v3d drivers supports them and signal
> > > > > when its job is done. That API also provides an equivalent function that
> > > > > allows to have a callback being executed when the fence is signalled as
> > > > > done.
> > > > > 
> > > > > Let's change our driver a bit to rely on the previous function for the
> > > > > older SoCs, and on DMA fences for the BCM2711.
> > > > > 
> > > > > Signed-off-by: Maxime Ripard <maxime@cerno.tech>
> > > > > ---
> > > > >  drivers/gpu/drm/vc4/vc4_crtc.c | 41 ++++++++++++++++++++++++++++++++--
> > > > >  1 file changed, 39 insertions(+), 2 deletions(-)
> > > > > 
> > > > > diff --git a/drivers/gpu/drm/vc4/vc4_crtc.c b/drivers/gpu/drm/vc4/vc4_crtc.c
> > > > > index e0ae7bef08fa..8e1369fca937 100644
> > > > > --- a/drivers/gpu/drm/vc4/vc4_crtc.c
> > > > > +++ b/drivers/gpu/drm/vc4/vc4_crtc.c
> > > > > @@ -776,6 +776,7 @@ struct vc4_async_flip_state {
> > > > >  	struct drm_pending_vblank_event *event;
> > > > >  
> > > > >  	union {
> > > > > +		struct dma_fence_cb fence;
> > > > >  		struct vc4_seqno_cb seqno;
> > > > >  	} cb;
> > > > >  };
> > > > > @@ -835,6 +836,43 @@ static void vc4_async_page_flip_seqno_complete(struct vc4_seqno_cb *cb)
> > > > >  		vc4_bo_dec_usecnt(bo);
> > > > >  }
> > > > >  
> > > > > +static void vc4_async_page_flip_fence_complete(struct dma_fence *fence,
> > > > > +					       struct dma_fence_cb *cb)
> > > > > +{
> > > > > +	struct vc4_async_flip_state *flip_state =
> > > > > +		container_of(cb, struct vc4_async_flip_state, cb.fence);
> > > > > +
> > > > > +	vc4_async_page_flip_complete(flip_state);
> > > > > +	dma_fence_put(fence);
> > > > > +}
> > > > > +
> > > > > +static int vc4_async_set_fence_cb(struct drm_device *dev,
> > > > > +				  struct vc4_async_flip_state *flip_state)
> > > > > +{
> > > > > +	struct drm_framebuffer *fb = flip_state->fb;
> > > > > +	struct drm_gem_cma_object *cma_bo = drm_fb_cma_get_gem_obj(fb, 0);
> > > > > +	struct vc4_dev *vc4 = to_vc4_dev(dev);
> > > > > +	struct dma_fence *fence;
> > > > > +	int ret;
> > > > > +
> > > > > +	if (!vc4->is_vc5) {
> > > > > +		struct vc4_bo *bo = to_vc4_bo(&cma_bo->base);
> > > > > +
> > > > > +		return vc4_queue_seqno_cb(dev, &flip_state->cb.seqno, bo->seqno,
> > > > > +					  vc4_async_page_flip_seqno_complete);
> > > > > +	}
> > > > > +
> > > > > +	ret = dma_resv_get_singleton(cma_bo->base.resv, false, &fence);
> > > + for kernel bot complaint, I replaced false with `DMA_RESV_USAGE_READ`
> > > to run some tests
> > > 
> > > > > +	if (ret)
> > > > > +		return ret;
> > > > > +
> > > > > +	if (dma_fence_add_callback(fence, &flip_state->cb.fence,
> > me again :)
> > 
> > I was thinking if we should add a check here for !fence and just complete the page flip,
> > instead of letting `dma_fence_add_callback` warns whenever fence is NULL.
> > I think there are situation in which fence is NULL and it is not an
> > issue, right? Does it make sense?
> 
> I'm not sure. What situation do you have in mind?

I mean, if no implicity fence was attached to this bo, it's safe to just
do the page flip. This behaviour will happen anyway, after
dma_fence_add_callback() checks fence is NULL and return -EINVAL. But
this check will also trigger a warning for something that it's not an
issue here, I think. So, if we just check `fence` before calling
dma_fence_add_callback(), we keep the same behaviour and prevent the
warning.

Melissa
> 
> Maxime
diff mbox series

Patch

diff --git a/drivers/gpu/drm/vc4/vc4_crtc.c b/drivers/gpu/drm/vc4/vc4_crtc.c
index e0ae7bef08fa..8e1369fca937 100644
--- a/drivers/gpu/drm/vc4/vc4_crtc.c
+++ b/drivers/gpu/drm/vc4/vc4_crtc.c
@@ -776,6 +776,7 @@  struct vc4_async_flip_state {
 	struct drm_pending_vblank_event *event;
 
 	union {
+		struct dma_fence_cb fence;
 		struct vc4_seqno_cb seqno;
 	} cb;
 };
@@ -835,6 +836,43 @@  static void vc4_async_page_flip_seqno_complete(struct vc4_seqno_cb *cb)
 		vc4_bo_dec_usecnt(bo);
 }
 
+static void vc4_async_page_flip_fence_complete(struct dma_fence *fence,
+					       struct dma_fence_cb *cb)
+{
+	struct vc4_async_flip_state *flip_state =
+		container_of(cb, struct vc4_async_flip_state, cb.fence);
+
+	vc4_async_page_flip_complete(flip_state);
+	dma_fence_put(fence);
+}
+
+static int vc4_async_set_fence_cb(struct drm_device *dev,
+				  struct vc4_async_flip_state *flip_state)
+{
+	struct drm_framebuffer *fb = flip_state->fb;
+	struct drm_gem_cma_object *cma_bo = drm_fb_cma_get_gem_obj(fb, 0);
+	struct vc4_dev *vc4 = to_vc4_dev(dev);
+	struct dma_fence *fence;
+	int ret;
+
+	if (!vc4->is_vc5) {
+		struct vc4_bo *bo = to_vc4_bo(&cma_bo->base);
+
+		return vc4_queue_seqno_cb(dev, &flip_state->cb.seqno, bo->seqno,
+					  vc4_async_page_flip_seqno_complete);
+	}
+
+	ret = dma_resv_get_singleton(cma_bo->base.resv, false, &fence);
+	if (ret)
+		return ret;
+
+	if (dma_fence_add_callback(fence, &flip_state->cb.fence,
+				   vc4_async_page_flip_fence_complete))
+		vc4_async_page_flip_fence_complete(fence, &flip_state->cb.fence);
+
+	return 0;
+}
+
 static int
 vc4_async_page_flip_common(struct drm_crtc *crtc,
 			   struct drm_framebuffer *fb,
@@ -874,8 +912,7 @@  vc4_async_page_flip_common(struct drm_crtc *crtc,
 	 */
 	drm_atomic_set_fb_for_plane(plane->state, fb);
 
-	vc4_queue_seqno_cb(dev, &flip_state->cb.seqno, bo->seqno,
-			   vc4_async_page_flip_seqno_complete);
+	vc4_async_set_fence_cb(dev, flip_state);
 
 	/* Driver takes ownership of state on successful async commit. */
 	return 0;