Message ID | 20220503121341.983842-14-maxime@cerno.tech (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | drm/vc4: Properly separate v3d on BCM2711, and fix frame ordering | expand |
Hi Maxime, I love your patch! Perhaps something to improve: [auto build test WARNING on drm-intel/for-linux-next] [also build test WARNING on drm-tip/drm-tip next-20220503] [cannot apply to anholt/for-next v5.18-rc5] [If your patch is applied to the wrong git tree, kindly drop us a note. And when submitting patch, we suggest to use '--base' as documented in https://git-scm.com/docs/git-format-patch] url: https://github.com/intel-lab-lkp/linux/commits/Maxime-Ripard/drm-vc4-Properly-separate-v3d-on-BCM2711-and-fix-frame-ordering/20220503-201745 base: git://anongit.freedesktop.org/drm-intel for-linux-next config: arc-allyesconfig (https://download.01.org/0day-ci/archive/20220503/202205032134.wMMUQ85T-lkp@intel.com/config) compiler: arceb-elf-gcc (GCC) 11.3.0 reproduce (this is a W=1 build): wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross # https://github.com/intel-lab-lkp/linux/commit/0db44bd12236a0a64cf67bbef6b11df5bbe37134 git remote add linux-review https://github.com/intel-lab-lkp/linux git fetch --no-tags linux-review Maxime-Ripard/drm-vc4-Properly-separate-v3d-on-BCM2711-and-fix-frame-ordering/20220503-201745 git checkout 0db44bd12236a0a64cf67bbef6b11df5bbe37134 # save the config file mkdir build_dir && cp config build_dir/.config COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.3.0 make.cross W=1 O=build_dir ARCH=arc SHELL=/bin/bash drivers/gpu/drm/vc4/ If you fix the issue, kindly add following tag as appropriate Reported-by: kernel test robot <lkp@intel.com> All warnings (new ones prefixed by >>): drivers/gpu/drm/vc4/vc4_crtc.c: In function 'vc4_async_set_fence_cb': >> drivers/gpu/drm/vc4/vc4_crtc.c:865:57: warning: implicit conversion from 'enum <anonymous>' to 'enum dma_resv_usage' [-Wenum-conversion] 865 | ret = dma_resv_get_singleton(cma_bo->base.resv, false, &fence); | ^~~~~ vim +865 drivers/gpu/drm/vc4/vc4_crtc.c 848 849 static int vc4_async_set_fence_cb(struct drm_device *dev, 850 struct vc4_async_flip_state *flip_state) 851 { 852 struct drm_framebuffer *fb = flip_state->fb; 853 struct drm_gem_cma_object *cma_bo = drm_fb_cma_get_gem_obj(fb, 0); 854 struct vc4_dev *vc4 = to_vc4_dev(dev); 855 struct dma_fence *fence; 856 int ret; 857 858 if (!vc4->is_vc5) { 859 struct vc4_bo *bo = to_vc4_bo(&cma_bo->base); 860 861 return vc4_queue_seqno_cb(dev, &flip_state->cb.seqno, bo->seqno, 862 vc4_async_page_flip_seqno_complete); 863 } 864 > 865 ret = dma_resv_get_singleton(cma_bo->base.resv, false, &fence); 866 if (ret) 867 return ret; 868 869 if (dma_fence_add_callback(fence, &flip_state->cb.fence, 870 vc4_async_page_flip_fence_complete)) 871 vc4_async_page_flip_fence_complete(fence, &flip_state->cb.fence); 872 873 return 0; 874 } 875
On 05/03, Maxime Ripard wrote: > When doing an asynchronous page flip (PAGE_FLIP ioctl with the > DRM_MODE_PAGE_FLIP_ASYNC flag set), the current code waits for the > possible GPU buffer being rendered through a call to > vc4_queue_seqno_cb(). > > On the BCM2835-37, the GPU driver is part of the vc4 driver and that > function is defined in vc4_gem.c to wait for the buffer to be rendered, > and once it's done, call a callback. > > However, on the BCM2711 used on the RaspberryPi4, the GPU driver is > separate (v3d) and that function won't do anything. This was working > because we were going into a path, due to uninitialized variables, that > was always scheduling the callback. > > However, we were never actually waiting for the buffer to be rendered > which was resulting in frames being displayed out of order. > > The generic API to signal those kind of completion in the kernel are the > DMA fences, and fortunately the v3d drivers supports them and signal > when its job is done. That API also provides an equivalent function that > allows to have a callback being executed when the fence is signalled as > done. > > Let's change our driver a bit to rely on the previous function for the > older SoCs, and on DMA fences for the BCM2711. > > Signed-off-by: Maxime Ripard <maxime@cerno.tech> > --- > drivers/gpu/drm/vc4/vc4_crtc.c | 41 ++++++++++++++++++++++++++++++++-- > 1 file changed, 39 insertions(+), 2 deletions(-) > > diff --git a/drivers/gpu/drm/vc4/vc4_crtc.c b/drivers/gpu/drm/vc4/vc4_crtc.c > index e0ae7bef08fa..8e1369fca937 100644 > --- a/drivers/gpu/drm/vc4/vc4_crtc.c > +++ b/drivers/gpu/drm/vc4/vc4_crtc.c > @@ -776,6 +776,7 @@ struct vc4_async_flip_state { > struct drm_pending_vblank_event *event; > > union { > + struct dma_fence_cb fence; > struct vc4_seqno_cb seqno; > } cb; > }; > @@ -835,6 +836,43 @@ static void vc4_async_page_flip_seqno_complete(struct vc4_seqno_cb *cb) > vc4_bo_dec_usecnt(bo); > } > > +static void vc4_async_page_flip_fence_complete(struct dma_fence *fence, > + struct dma_fence_cb *cb) > +{ > + struct vc4_async_flip_state *flip_state = > + container_of(cb, struct vc4_async_flip_state, cb.fence); > + > + vc4_async_page_flip_complete(flip_state); > + dma_fence_put(fence); > +} > + > +static int vc4_async_set_fence_cb(struct drm_device *dev, > + struct vc4_async_flip_state *flip_state) > +{ > + struct drm_framebuffer *fb = flip_state->fb; > + struct drm_gem_cma_object *cma_bo = drm_fb_cma_get_gem_obj(fb, 0); > + struct vc4_dev *vc4 = to_vc4_dev(dev); > + struct dma_fence *fence; > + int ret; > + > + if (!vc4->is_vc5) { > + struct vc4_bo *bo = to_vc4_bo(&cma_bo->base); > + > + return vc4_queue_seqno_cb(dev, &flip_state->cb.seqno, bo->seqno, > + vc4_async_page_flip_seqno_complete); > + } > + > + ret = dma_resv_get_singleton(cma_bo->base.resv, false, &fence); > + if (ret) > + return ret; > + > + if (dma_fence_add_callback(fence, &flip_state->cb.fence, > + vc4_async_page_flip_fence_complete)) > + vc4_async_page_flip_fence_complete(fence, &flip_state->cb.fence); > + > + return 0; > +} > + > static int > vc4_async_page_flip_common(struct drm_crtc *crtc, > struct drm_framebuffer *fb, > @@ -874,8 +912,7 @@ vc4_async_page_flip_common(struct drm_crtc *crtc, > */ > drm_atomic_set_fb_for_plane(plane->state, fb); > > - vc4_queue_seqno_cb(dev, &flip_state->cb.seqno, bo->seqno, > - vc4_async_page_flip_seqno_complete); > + vc4_async_set_fence_cb(dev, flip_state); I tried to run igt kms_async_flip, but the test still seems very tied to the i915 driver. So, as far as I could check, I didn't see any red flags and sync page flip behaviour seems preserved. > > /* Driver takes ownership of state on successful async commit. */ > return 0; > -- > 2.35.1 >
O 05/09, Melissa Wen wrote: > On 05/03, Maxime Ripard wrote: > > When doing an asynchronous page flip (PAGE_FLIP ioctl with the > > DRM_MODE_PAGE_FLIP_ASYNC flag set), the current code waits for the > > possible GPU buffer being rendered through a call to > > vc4_queue_seqno_cb(). > > > > On the BCM2835-37, the GPU driver is part of the vc4 driver and that > > function is defined in vc4_gem.c to wait for the buffer to be rendered, > > and once it's done, call a callback. > > > > However, on the BCM2711 used on the RaspberryPi4, the GPU driver is > > separate (v3d) and that function won't do anything. This was working > > because we were going into a path, due to uninitialized variables, that > > was always scheduling the callback. > > > > However, we were never actually waiting for the buffer to be rendered > > which was resulting in frames being displayed out of order. > > > > The generic API to signal those kind of completion in the kernel are the > > DMA fences, and fortunately the v3d drivers supports them and signal > > when its job is done. That API also provides an equivalent function that > > allows to have a callback being executed when the fence is signalled as > > done. > > > > Let's change our driver a bit to rely on the previous function for the > > older SoCs, and on DMA fences for the BCM2711. > > > > Signed-off-by: Maxime Ripard <maxime@cerno.tech> > > --- > > drivers/gpu/drm/vc4/vc4_crtc.c | 41 ++++++++++++++++++++++++++++++++-- > > 1 file changed, 39 insertions(+), 2 deletions(-) > > > > diff --git a/drivers/gpu/drm/vc4/vc4_crtc.c b/drivers/gpu/drm/vc4/vc4_crtc.c > > index e0ae7bef08fa..8e1369fca937 100644 > > --- a/drivers/gpu/drm/vc4/vc4_crtc.c > > +++ b/drivers/gpu/drm/vc4/vc4_crtc.c > > @@ -776,6 +776,7 @@ struct vc4_async_flip_state { > > struct drm_pending_vblank_event *event; > > > > union { > > + struct dma_fence_cb fence; > > struct vc4_seqno_cb seqno; > > } cb; > > }; > > @@ -835,6 +836,43 @@ static void vc4_async_page_flip_seqno_complete(struct vc4_seqno_cb *cb) > > vc4_bo_dec_usecnt(bo); > > } > > > > +static void vc4_async_page_flip_fence_complete(struct dma_fence *fence, > > + struct dma_fence_cb *cb) > > +{ > > + struct vc4_async_flip_state *flip_state = > > + container_of(cb, struct vc4_async_flip_state, cb.fence); > > + > > + vc4_async_page_flip_complete(flip_state); > > + dma_fence_put(fence); > > +} > > + > > +static int vc4_async_set_fence_cb(struct drm_device *dev, > > + struct vc4_async_flip_state *flip_state) > > +{ > > + struct drm_framebuffer *fb = flip_state->fb; > > + struct drm_gem_cma_object *cma_bo = drm_fb_cma_get_gem_obj(fb, 0); > > + struct vc4_dev *vc4 = to_vc4_dev(dev); > > + struct dma_fence *fence; > > + int ret; > > + > > + if (!vc4->is_vc5) { > > + struct vc4_bo *bo = to_vc4_bo(&cma_bo->base); > > + > > + return vc4_queue_seqno_cb(dev, &flip_state->cb.seqno, bo->seqno, > > + vc4_async_page_flip_seqno_complete); > > + } > > + > > + ret = dma_resv_get_singleton(cma_bo->base.resv, false, &fence); + for kernel bot complaint, I replaced false with `DMA_RESV_USAGE_READ` to run some tests > > + if (ret) > > + return ret; > > + > > + if (dma_fence_add_callback(fence, &flip_state->cb.fence, > > + vc4_async_page_flip_fence_complete)) > > + vc4_async_page_flip_fence_complete(fence, &flip_state->cb.fence); > > + > > + return 0; > > +} > > + > > static int > > vc4_async_page_flip_common(struct drm_crtc *crtc, > > struct drm_framebuffer *fb, > > @@ -874,8 +912,7 @@ vc4_async_page_flip_common(struct drm_crtc *crtc, > > */ > > drm_atomic_set_fb_for_plane(plane->state, fb); > > > > - vc4_queue_seqno_cb(dev, &flip_state->cb.seqno, bo->seqno, > > - vc4_async_page_flip_seqno_complete); > > + vc4_async_set_fence_cb(dev, flip_state); > > I tried to run igt kms_async_flip, but the test still seems very tied to > the i915 driver. So, as far as I could check, I didn't see any red > flags and sync page flip behaviour seems preserved. > > > > > /* Driver takes ownership of state on successful async commit. */ > > return 0; > > -- > > 2.35.1 > >
On 05/09, Melissa Wen wrote: > O 05/09, Melissa Wen wrote: > > On 05/03, Maxime Ripard wrote: > > > When doing an asynchronous page flip (PAGE_FLIP ioctl with the > > > DRM_MODE_PAGE_FLIP_ASYNC flag set), the current code waits for the > > > possible GPU buffer being rendered through a call to > > > vc4_queue_seqno_cb(). > > > > > > On the BCM2835-37, the GPU driver is part of the vc4 driver and that > > > function is defined in vc4_gem.c to wait for the buffer to be rendered, > > > and once it's done, call a callback. > > > > > > However, on the BCM2711 used on the RaspberryPi4, the GPU driver is > > > separate (v3d) and that function won't do anything. This was working > > > because we were going into a path, due to uninitialized variables, that > > > was always scheduling the callback. > > > > > > However, we were never actually waiting for the buffer to be rendered > > > which was resulting in frames being displayed out of order. > > > > > > The generic API to signal those kind of completion in the kernel are the > > > DMA fences, and fortunately the v3d drivers supports them and signal > > > when its job is done. That API also provides an equivalent function that > > > allows to have a callback being executed when the fence is signalled as > > > done. > > > > > > Let's change our driver a bit to rely on the previous function for the > > > older SoCs, and on DMA fences for the BCM2711. > > > > > > Signed-off-by: Maxime Ripard <maxime@cerno.tech> > > > --- > > > drivers/gpu/drm/vc4/vc4_crtc.c | 41 ++++++++++++++++++++++++++++++++-- > > > 1 file changed, 39 insertions(+), 2 deletions(-) > > > > > > diff --git a/drivers/gpu/drm/vc4/vc4_crtc.c b/drivers/gpu/drm/vc4/vc4_crtc.c > > > index e0ae7bef08fa..8e1369fca937 100644 > > > --- a/drivers/gpu/drm/vc4/vc4_crtc.c > > > +++ b/drivers/gpu/drm/vc4/vc4_crtc.c > > > @@ -776,6 +776,7 @@ struct vc4_async_flip_state { > > > struct drm_pending_vblank_event *event; > > > > > > union { > > > + struct dma_fence_cb fence; > > > struct vc4_seqno_cb seqno; > > > } cb; > > > }; > > > @@ -835,6 +836,43 @@ static void vc4_async_page_flip_seqno_complete(struct vc4_seqno_cb *cb) > > > vc4_bo_dec_usecnt(bo); > > > } > > > > > > +static void vc4_async_page_flip_fence_complete(struct dma_fence *fence, > > > + struct dma_fence_cb *cb) > > > +{ > > > + struct vc4_async_flip_state *flip_state = > > > + container_of(cb, struct vc4_async_flip_state, cb.fence); > > > + > > > + vc4_async_page_flip_complete(flip_state); > > > + dma_fence_put(fence); > > > +} > > > + > > > +static int vc4_async_set_fence_cb(struct drm_device *dev, > > > + struct vc4_async_flip_state *flip_state) > > > +{ > > > + struct drm_framebuffer *fb = flip_state->fb; > > > + struct drm_gem_cma_object *cma_bo = drm_fb_cma_get_gem_obj(fb, 0); > > > + struct vc4_dev *vc4 = to_vc4_dev(dev); > > > + struct dma_fence *fence; > > > + int ret; > > > + > > > + if (!vc4->is_vc5) { > > > + struct vc4_bo *bo = to_vc4_bo(&cma_bo->base); > > > + > > > + return vc4_queue_seqno_cb(dev, &flip_state->cb.seqno, bo->seqno, > > > + vc4_async_page_flip_seqno_complete); > > > + } > > > + > > > + ret = dma_resv_get_singleton(cma_bo->base.resv, false, &fence); > + for kernel bot complaint, I replaced false with `DMA_RESV_USAGE_READ` > to run some tests > > > > + if (ret) > > > + return ret; > > > + > > > + if (dma_fence_add_callback(fence, &flip_state->cb.fence, me again :) I was thinking if we should add a check here for !fence and just complete the page flip, instead of letting `dma_fence_add_callback` warns whenever fence is NULL. I think there are situation in which fence is NULL and it is not an issue, right? Does it make sense? > > > + vc4_async_page_flip_fence_complete)) > > > + vc4_async_page_flip_fence_complete(fence, &flip_state->cb.fence); > > > + > > > + return 0; > > > +} > > > + > > > static int > > > vc4_async_page_flip_common(struct drm_crtc *crtc, > > > struct drm_framebuffer *fb, > > > @@ -874,8 +912,7 @@ vc4_async_page_flip_common(struct drm_crtc *crtc, > > > */ > > > drm_atomic_set_fb_for_plane(plane->state, fb); > > > > > > - vc4_queue_seqno_cb(dev, &flip_state->cb.seqno, bo->seqno, > > > - vc4_async_page_flip_seqno_complete); > > > + vc4_async_set_fence_cb(dev, flip_state); > > > > I tried to run igt kms_async_flip, but the test still seems very tied to > > the i915 driver. So, as far as I could check, I didn't see any red > > flags and sync page flip behaviour seems preserved. > > > > > > > > /* Driver takes ownership of state on successful async commit. */ > > > return 0; > > > -- > > > 2.35.1 > > > > >
Hi, On Thu, May 12, 2022 at 09:44:42AM -0100, Melissa Wen wrote: > On 05/09, Melissa Wen wrote: > > O 05/09, Melissa Wen wrote: > > > On 05/03, Maxime Ripard wrote: > > > > When doing an asynchronous page flip (PAGE_FLIP ioctl with the > > > > DRM_MODE_PAGE_FLIP_ASYNC flag set), the current code waits for the > > > > possible GPU buffer being rendered through a call to > > > > vc4_queue_seqno_cb(). > > > > > > > > On the BCM2835-37, the GPU driver is part of the vc4 driver and that > > > > function is defined in vc4_gem.c to wait for the buffer to be rendered, > > > > and once it's done, call a callback. > > > > > > > > However, on the BCM2711 used on the RaspberryPi4, the GPU driver is > > > > separate (v3d) and that function won't do anything. This was working > > > > because we were going into a path, due to uninitialized variables, that > > > > was always scheduling the callback. > > > > > > > > However, we were never actually waiting for the buffer to be rendered > > > > which was resulting in frames being displayed out of order. > > > > > > > > The generic API to signal those kind of completion in the kernel are the > > > > DMA fences, and fortunately the v3d drivers supports them and signal > > > > when its job is done. That API also provides an equivalent function that > > > > allows to have a callback being executed when the fence is signalled as > > > > done. > > > > > > > > Let's change our driver a bit to rely on the previous function for the > > > > older SoCs, and on DMA fences for the BCM2711. > > > > > > > > Signed-off-by: Maxime Ripard <maxime@cerno.tech> > > > > --- > > > > drivers/gpu/drm/vc4/vc4_crtc.c | 41 ++++++++++++++++++++++++++++++++-- > > > > 1 file changed, 39 insertions(+), 2 deletions(-) > > > > > > > > diff --git a/drivers/gpu/drm/vc4/vc4_crtc.c b/drivers/gpu/drm/vc4/vc4_crtc.c > > > > index e0ae7bef08fa..8e1369fca937 100644 > > > > --- a/drivers/gpu/drm/vc4/vc4_crtc.c > > > > +++ b/drivers/gpu/drm/vc4/vc4_crtc.c > > > > @@ -776,6 +776,7 @@ struct vc4_async_flip_state { > > > > struct drm_pending_vblank_event *event; > > > > > > > > union { > > > > + struct dma_fence_cb fence; > > > > struct vc4_seqno_cb seqno; > > > > } cb; > > > > }; > > > > @@ -835,6 +836,43 @@ static void vc4_async_page_flip_seqno_complete(struct vc4_seqno_cb *cb) > > > > vc4_bo_dec_usecnt(bo); > > > > } > > > > > > > > +static void vc4_async_page_flip_fence_complete(struct dma_fence *fence, > > > > + struct dma_fence_cb *cb) > > > > +{ > > > > + struct vc4_async_flip_state *flip_state = > > > > + container_of(cb, struct vc4_async_flip_state, cb.fence); > > > > + > > > > + vc4_async_page_flip_complete(flip_state); > > > > + dma_fence_put(fence); > > > > +} > > > > + > > > > +static int vc4_async_set_fence_cb(struct drm_device *dev, > > > > + struct vc4_async_flip_state *flip_state) > > > > +{ > > > > + struct drm_framebuffer *fb = flip_state->fb; > > > > + struct drm_gem_cma_object *cma_bo = drm_fb_cma_get_gem_obj(fb, 0); > > > > + struct vc4_dev *vc4 = to_vc4_dev(dev); > > > > + struct dma_fence *fence; > > > > + int ret; > > > > + > > > > + if (!vc4->is_vc5) { > > > > + struct vc4_bo *bo = to_vc4_bo(&cma_bo->base); > > > > + > > > > + return vc4_queue_seqno_cb(dev, &flip_state->cb.seqno, bo->seqno, > > > > + vc4_async_page_flip_seqno_complete); > > > > + } > > > > + > > > > + ret = dma_resv_get_singleton(cma_bo->base.resv, false, &fence); > > + for kernel bot complaint, I replaced false with `DMA_RESV_USAGE_READ` > > to run some tests > > > > > > + if (ret) > > > > + return ret; > > > > + > > > > + if (dma_fence_add_callback(fence, &flip_state->cb.fence, > me again :) > > I was thinking if we should add a check here for !fence and just complete the page flip, > instead of letting `dma_fence_add_callback` warns whenever fence is NULL. > I think there are situation in which fence is NULL and it is not an > issue, right? Does it make sense? I'm not sure. What situation do you have in mind? Maxime
On 06/06, Maxime Ripard wrote: > Hi, > > On Thu, May 12, 2022 at 09:44:42AM -0100, Melissa Wen wrote: > > On 05/09, Melissa Wen wrote: > > > O 05/09, Melissa Wen wrote: > > > > On 05/03, Maxime Ripard wrote: > > > > > When doing an asynchronous page flip (PAGE_FLIP ioctl with the > > > > > DRM_MODE_PAGE_FLIP_ASYNC flag set), the current code waits for the > > > > > possible GPU buffer being rendered through a call to > > > > > vc4_queue_seqno_cb(). > > > > > > > > > > On the BCM2835-37, the GPU driver is part of the vc4 driver and that > > > > > function is defined in vc4_gem.c to wait for the buffer to be rendered, > > > > > and once it's done, call a callback. > > > > > > > > > > However, on the BCM2711 used on the RaspberryPi4, the GPU driver is > > > > > separate (v3d) and that function won't do anything. This was working > > > > > because we were going into a path, due to uninitialized variables, that > > > > > was always scheduling the callback. > > > > > > > > > > However, we were never actually waiting for the buffer to be rendered > > > > > which was resulting in frames being displayed out of order. > > > > > > > > > > The generic API to signal those kind of completion in the kernel are the > > > > > DMA fences, and fortunately the v3d drivers supports them and signal > > > > > when its job is done. That API also provides an equivalent function that > > > > > allows to have a callback being executed when the fence is signalled as > > > > > done. > > > > > > > > > > Let's change our driver a bit to rely on the previous function for the > > > > > older SoCs, and on DMA fences for the BCM2711. > > > > > > > > > > Signed-off-by: Maxime Ripard <maxime@cerno.tech> > > > > > --- > > > > > drivers/gpu/drm/vc4/vc4_crtc.c | 41 ++++++++++++++++++++++++++++++++-- > > > > > 1 file changed, 39 insertions(+), 2 deletions(-) > > > > > > > > > > diff --git a/drivers/gpu/drm/vc4/vc4_crtc.c b/drivers/gpu/drm/vc4/vc4_crtc.c > > > > > index e0ae7bef08fa..8e1369fca937 100644 > > > > > --- a/drivers/gpu/drm/vc4/vc4_crtc.c > > > > > +++ b/drivers/gpu/drm/vc4/vc4_crtc.c > > > > > @@ -776,6 +776,7 @@ struct vc4_async_flip_state { > > > > > struct drm_pending_vblank_event *event; > > > > > > > > > > union { > > > > > + struct dma_fence_cb fence; > > > > > struct vc4_seqno_cb seqno; > > > > > } cb; > > > > > }; > > > > > @@ -835,6 +836,43 @@ static void vc4_async_page_flip_seqno_complete(struct vc4_seqno_cb *cb) > > > > > vc4_bo_dec_usecnt(bo); > > > > > } > > > > > > > > > > +static void vc4_async_page_flip_fence_complete(struct dma_fence *fence, > > > > > + struct dma_fence_cb *cb) > > > > > +{ > > > > > + struct vc4_async_flip_state *flip_state = > > > > > + container_of(cb, struct vc4_async_flip_state, cb.fence); > > > > > + > > > > > + vc4_async_page_flip_complete(flip_state); > > > > > + dma_fence_put(fence); > > > > > +} > > > > > + > > > > > +static int vc4_async_set_fence_cb(struct drm_device *dev, > > > > > + struct vc4_async_flip_state *flip_state) > > > > > +{ > > > > > + struct drm_framebuffer *fb = flip_state->fb; > > > > > + struct drm_gem_cma_object *cma_bo = drm_fb_cma_get_gem_obj(fb, 0); > > > > > + struct vc4_dev *vc4 = to_vc4_dev(dev); > > > > > + struct dma_fence *fence; > > > > > + int ret; > > > > > + > > > > > + if (!vc4->is_vc5) { > > > > > + struct vc4_bo *bo = to_vc4_bo(&cma_bo->base); > > > > > + > > > > > + return vc4_queue_seqno_cb(dev, &flip_state->cb.seqno, bo->seqno, > > > > > + vc4_async_page_flip_seqno_complete); > > > > > + } > > > > > + > > > > > + ret = dma_resv_get_singleton(cma_bo->base.resv, false, &fence); > > > + for kernel bot complaint, I replaced false with `DMA_RESV_USAGE_READ` > > > to run some tests > > > > > > > > + if (ret) > > > > > + return ret; > > > > > + > > > > > + if (dma_fence_add_callback(fence, &flip_state->cb.fence, > > me again :) > > > > I was thinking if we should add a check here for !fence and just complete the page flip, > > instead of letting `dma_fence_add_callback` warns whenever fence is NULL. > > I think there are situation in which fence is NULL and it is not an > > issue, right? Does it make sense? > > I'm not sure. What situation do you have in mind? I mean, if no implicity fence was attached to this bo, it's safe to just do the page flip. This behaviour will happen anyway, after dma_fence_add_callback() checks fence is NULL and return -EINVAL. But this check will also trigger a warning for something that it's not an issue here, I think. So, if we just check `fence` before calling dma_fence_add_callback(), we keep the same behaviour and prevent the warning. Melissa > > Maxime
diff --git a/drivers/gpu/drm/vc4/vc4_crtc.c b/drivers/gpu/drm/vc4/vc4_crtc.c index e0ae7bef08fa..8e1369fca937 100644 --- a/drivers/gpu/drm/vc4/vc4_crtc.c +++ b/drivers/gpu/drm/vc4/vc4_crtc.c @@ -776,6 +776,7 @@ struct vc4_async_flip_state { struct drm_pending_vblank_event *event; union { + struct dma_fence_cb fence; struct vc4_seqno_cb seqno; } cb; }; @@ -835,6 +836,43 @@ static void vc4_async_page_flip_seqno_complete(struct vc4_seqno_cb *cb) vc4_bo_dec_usecnt(bo); } +static void vc4_async_page_flip_fence_complete(struct dma_fence *fence, + struct dma_fence_cb *cb) +{ + struct vc4_async_flip_state *flip_state = + container_of(cb, struct vc4_async_flip_state, cb.fence); + + vc4_async_page_flip_complete(flip_state); + dma_fence_put(fence); +} + +static int vc4_async_set_fence_cb(struct drm_device *dev, + struct vc4_async_flip_state *flip_state) +{ + struct drm_framebuffer *fb = flip_state->fb; + struct drm_gem_cma_object *cma_bo = drm_fb_cma_get_gem_obj(fb, 0); + struct vc4_dev *vc4 = to_vc4_dev(dev); + struct dma_fence *fence; + int ret; + + if (!vc4->is_vc5) { + struct vc4_bo *bo = to_vc4_bo(&cma_bo->base); + + return vc4_queue_seqno_cb(dev, &flip_state->cb.seqno, bo->seqno, + vc4_async_page_flip_seqno_complete); + } + + ret = dma_resv_get_singleton(cma_bo->base.resv, false, &fence); + if (ret) + return ret; + + if (dma_fence_add_callback(fence, &flip_state->cb.fence, + vc4_async_page_flip_fence_complete)) + vc4_async_page_flip_fence_complete(fence, &flip_state->cb.fence); + + return 0; +} + static int vc4_async_page_flip_common(struct drm_crtc *crtc, struct drm_framebuffer *fb, @@ -874,8 +912,7 @@ vc4_async_page_flip_common(struct drm_crtc *crtc, */ drm_atomic_set_fb_for_plane(plane->state, fb); - vc4_queue_seqno_cb(dev, &flip_state->cb.seqno, bo->seqno, - vc4_async_page_flip_seqno_complete); + vc4_async_set_fence_cb(dev, flip_state); /* Driver takes ownership of state on successful async commit. */ return 0;
When doing an asynchronous page flip (PAGE_FLIP ioctl with the DRM_MODE_PAGE_FLIP_ASYNC flag set), the current code waits for the possible GPU buffer being rendered through a call to vc4_queue_seqno_cb(). On the BCM2835-37, the GPU driver is part of the vc4 driver and that function is defined in vc4_gem.c to wait for the buffer to be rendered, and once it's done, call a callback. However, on the BCM2711 used on the RaspberryPi4, the GPU driver is separate (v3d) and that function won't do anything. This was working because we were going into a path, due to uninitialized variables, that was always scheduling the callback. However, we were never actually waiting for the buffer to be rendered which was resulting in frames being displayed out of order. The generic API to signal those kind of completion in the kernel are the DMA fences, and fortunately the v3d drivers supports them and signal when its job is done. That API also provides an equivalent function that allows to have a callback being executed when the fence is signalled as done. Let's change our driver a bit to rely on the previous function for the older SoCs, and on DMA fences for the BCM2711. Signed-off-by: Maxime Ripard <maxime@cerno.tech> --- drivers/gpu/drm/vc4/vc4_crtc.c | 41 ++++++++++++++++++++++++++++++++-- 1 file changed, 39 insertions(+), 2 deletions(-)