[v2,0/6] drm/vc4: kms: Misc fixes for HVS commits

Message ID 20211117094527.146275-1-maxime@cerno.tech (mailing list archive)

Message

Maxime Ripard Nov. 17, 2021, 9:45 a.m. UTC
Hi,

The conversion to DRM commit helpers (f3c420fe19f8, "drm/vc4: kms: Convert to
atomic helpers") introduced a number of issues in corner cases, most of them
showing themselves in the form of either a vblank timeout or use-after-free
error.

These patches should fix most of them, some of them still being debugged.

Maxime

Changes from v1:
  - Prevent a null pointer dereference

Maxime Ripard (6):
  drm/vc4: kms: Wait for the commit before increasing our clock rate
  drm/vc4: kms: Fix return code check
  drm/vc4: kms: Add missing drm_crtc_commit_put
  drm/vc4: kms: Clear the HVS FIFO commit pointer once done
  drm/vc4: kms: Don't duplicate pending commit
  drm/vc4: kms: Fix previous HVS commit wait

 drivers/gpu/drm/vc4/vc4_kms.c | 42 ++++++++++++++++-------------------
 1 file changed, 19 insertions(+), 23 deletions(-)

Comments

Jian-Hong Pan Nov. 18, 2021, 6:42 a.m. UTC | #1
I tested the v2 patches on top of the latest mainline kernel with an RPi 4B.
The system boots into the desktop environment.

Although it still hits bug [1], which may still be under investigation, I
think these patches LGTM.

[1] https://bugzilla.kernel.org/show_bug.cgi?id=214991

Tested-by: Jian-Hong Pan <jhp@endlessos.org>
Maxime Ripard Nov. 18, 2021, 10:40 a.m. UTC | #2
On Thu, Nov 18, 2021 at 02:42:58PM +0800, Jian-Hong Pan wrote:
> I tested the v2 patches based on latest mainline kernel with RPi 4B.
> System can boot up into desktop environment.

So the thing that was broken initially isn't anymore?

> Although it still hit the bug [1], which might be under debugging, I
> think these patches LGTM.

The vblank timeout stuff is a symptom of various different bugs. Can you
share your setup, your config.txt, and what you're doing to trigger it?

> [1] https://bugzilla.kernel.org/show_bug.cgi?id=214991
> 
> Tested-by: Jian-Hong Pan <jhp@endlessos.org>

Thanks!
Maxime
Jian-Hong Pan Nov. 19, 2021, 10:24 a.m. UTC | #3
Maxime Ripard <maxime@cerno.tech> wrote on Thu, Nov 18, 2021 at 6:40 PM:
>
> On Thu, Nov 18, 2021 at 02:42:58PM +0800, Jian-Hong Pan wrote:
> > I tested the v2 patches based on latest mainline kernel with RPi 4B.
> > System can boot up into desktop environment.
>
> So the thing that was broken initially isn't anymore?

Right, it is no longer broken.  The initial issue does not appear anymore.

> > Although it still hit the bug [1], which might be under debugging, I
> > think these patches LGTM.
>
> The vblank timeout stuff is a symptom of various different bugs. Can you
> share your setup, your config.txt, and what you're doing to trigger it?

I took the RPi boot firmware files from raspberrypi's repository at tag
1.20211029 [1].
The system boots through this chain:
RPi firmware -> U-Boot -> Linux kernel (aarch64) with corresponding DTB

The config.txt only has:
enable_uart=1
arm_64bit=1
kernel=uboot.bin

This bug can be reproduced easily with the es2gears command.  You may
need to let it run for a while.
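
Since the failure can take a while to show up, it may be easier to scan
the kernel log than to watch the screen. The following is a minimal
sketch, not part of the original report: it filters log text for the
"flip_done timed out" message mentioned in this thread. The function name
and the sample log lines are illustrative assumptions; only the error
phrase itself comes from the discussion.

```python
import re

# Error phrase taken from the thread; the surrounding log format is assumed.
FLIP_TIMEOUT = re.compile(r"flip_done timed out")

# Hypothetical log excerpt, for illustration only.
SAMPLE_LOG = """\
[   10.000000] vc4-drm gpu: bound fe400000.hvs (ops vc4_hvs_ops)
[  123.456789] [drm:drm_atomic_helper_wait_for_flip_done] *ERROR* [CRTC:68:crtc-3] flip_done timed out
[  124.000000] some unrelated message
"""


def find_flip_timeouts(log_text):
    """Return the log lines that mention a flip_done timeout."""
    return [line for line in log_text.splitlines() if FLIP_TIMEOUT.search(line)]


for line in find_flip_timeouts(SAMPLE_LOG):
    print(line)
```

On a live system the text could come from the output of dmesg, which may
require root depending on the distribution.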

Mesa: 21.2.2
libdrm: 2.4.107
xserver/wayland: 1.20.11  Using wayland

This bug happens with direct boot path as well:
RPi firmware -> Linux kernel (aarch64) with corresponding DTB

I tried with Endless OS and Ubuntu's RPi 4B images.
An easy setup is using Ubuntu 21.10 RPi 4B image [2].  Then, replace
the kernel & device tree blob and edit the config.txt.

[1] https://github.com/raspberrypi/firmware/tree/1.20211029/boot
[2] https://ubuntu.com/download/raspberry-pi

Jian-Hong Pan

> > [1] https://bugzilla.kernel.org/show_bug.cgi?id=214991
> >
> > Tested-by: Jian-Hong Pan <jhp@endlessos.org>
>
> Thanks!
> Maxime
Maxime Ripard Nov. 26, 2021, 3:45 p.m. UTC | #4
On Fri, Nov 19, 2021 at 06:24:34PM +0800, Jian-Hong Pan wrote:
> Maxime Ripard <maxime@cerno.tech> wrote on Thu, Nov 18, 2021 at 6:40 PM:
> >
> > On Thu, Nov 18, 2021 at 02:42:58PM +0800, Jian-Hong Pan wrote:
> > > I tested the v2 patches based on latest mainline kernel with RPi 4B.
> > > System can boot up into desktop environment.
> >
> > So the thing that was broken initially isn't anymore?
> 
> No.  It does not appear anymore.
> 
> > > Although it still hit the bug [1], which might be under debugging, I
> > > think these patches LGTM.
> >
> > The vblank timeout stuff is a symptom of various different bugs. Can you
> > share your setup, your config.txt, and what you're doing to trigger it?
> 
> I get the RPi boot firmware files from raspberrypi's repository at tag
> 1.20211029 [1]
> And, make system boots:
> RPi firmware -> U-Boot -> Linux kernel (aarch64) with corresponding DTB
> 
> The config.txt only has:
> enable_uart=1
> arm_64bit=1
> kernel=uboot.bin
> 
> This bug can be reproduced with es2gears command easily.  May need to
> wait it running a while.
> 
> Mesa: 21.2.2
> libdrm: 2.4.107
> xserver/wayland: 1.20.11  Using wayland
> 
> This bug happens with direct boot path as well:
> RPi firmware -> Linux kernel (aarch64) with corresponding DTB
> 
> I tried with Endless OS and Ubuntu's RPi 4B images.
> An easy setup is using Ubuntu 21.10 RPi 4B image [2].  Then, replace
> the kernel & device tree blob and edit the config.txt.

Does it still appear if you raise the core clock in the config.txt file
using: core_freq_min=600 ?

Thanks!
Maxime
Jian-Hong Pan Nov. 29, 2021, 8:31 a.m. UTC | #5
Maxime Ripard <maxime@cerno.tech> wrote on Fri, Nov 26, 2021 at 11:45 PM:
>
> On Fri, Nov 19, 2021 at 06:24:34PM +0800, Jian-Hong Pan wrote:
> > Maxime Ripard <maxime@cerno.tech> wrote on Thu, Nov 18, 2021 at 6:40 PM:
> > >
> > > On Thu, Nov 18, 2021 at 02:42:58PM +0800, Jian-Hong Pan wrote:
> > > > I tested the v2 patches based on latest mainline kernel with RPi 4B.
> > > > System can boot up into desktop environment.
> > >
> > > So the thing that was broken initially isn't anymore?
> >
> > No.  It does not appear anymore.
> >
> > > > Although it still hit the bug [1], which might be under debugging, I
> > > > think these patches LGTM.
> > >
> > > The vblank timeout stuff is a symptom of various different bugs. Can you
> > > share your setup, your config.txt, and what you're doing to trigger it?
> >
> > I get the RPi boot firmware files from raspberrypi's repository at tag
> > 1.20211029 [1]
> > And, make system boots:
> > RPi firmware -> U-Boot -> Linux kernel (aarch64) with corresponding DTB
> >
> > The config.txt only has:
> > enable_uart=1
> > arm_64bit=1
> > kernel=uboot.bin
> >
> > This bug can be reproduced with es2gears command easily.  May need to
> > wait it running a while.
> >
> > Mesa: 21.2.2
> > libdrm: 2.4.107
> > xserver/wayland: 1.20.11  Using wayland
> >
> > This bug happens with direct boot path as well:
> > RPi firmware -> Linux kernel (aarch64) with corresponding DTB
> >
> > I tried with Endless OS and Ubuntu's RPi 4B images.
> > An easy setup is using Ubuntu 21.10 RPi 4B image [2].  Then, replace
> > the kernel & device tree blob and edit the config.txt.
>
> Does it still appear if you raise the core clock in the config.txt file
> using: core_freq_min=600 ?

It still appears when I raise the core clock in the config.txt file:
core_freq_min=600
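
To rule out a typo in the file, the setting can be checked
programmatically. The following is a minimal sketch, assuming config.txt
holds plain key=value lines: it ignores "#" comments and "[section]"
filter lines, which is a simplification of the real firmware parser. The
function name is hypothetical; the settings are the ones listed in this
thread.

```python
def parse_config_txt(text):
    """Parse simple key=value lines; comments and [section] lines are skipped."""
    settings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments (assumed '#' syntax)
        if not line or line.startswith("[") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


# The config.txt contents described in this thread, with the raised core clock.
CONFIG = """\
enable_uart=1
arm_64bit=1
kernel=uboot.bin
core_freq_min=600
"""

settings = parse_config_txt(CONFIG)
print(settings.get("core_freq_min"))
```

This only confirms what is written in the file, not the clock rate the
firmware actually applies at runtime.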

Jian-Hong Pan
Maxime Ripard Nov. 29, 2021, 2:35 p.m. UTC | #6
On Wed, 17 Nov 2021 10:45:21 +0100, Maxime Ripard wrote:
> The conversion to DRM commit helpers (f3c420fe19f8, "drm/vc4: kms: Convert to
> atomic helpers") introduced a number of issues in corner cases, most of them
> showing themselves in the form of either a vblank timeout or use-after-free
> error.
> 
> These patches should fix most of them, some of them still being debugged.
> 
> [...]

Applied to drm/drm-misc (drm-misc-fixes).

Thanks!
Maxime
Maxime Ripard Dec. 3, 2021, 2:03 p.m. UTC | #7
On Mon, Nov 29, 2021 at 04:31:57PM +0800, Jian-Hong Pan wrote:
> Maxime Ripard <maxime@cerno.tech> wrote on Fri, Nov 26, 2021 at 11:45 PM:
> >
> > On Fri, Nov 19, 2021 at 06:24:34PM +0800, Jian-Hong Pan wrote:
> > > Maxime Ripard <maxime@cerno.tech> wrote on Thu, Nov 18, 2021 at 6:40 PM:
> > > >
> > > > On Thu, Nov 18, 2021 at 02:42:58PM +0800, Jian-Hong Pan wrote:
> > > > > I tested the v2 patches based on latest mainline kernel with RPi 4B.
> > > > > System can boot up into desktop environment.
> > > >
> > > > So the thing that was broken initially isn't anymore?
> > >
> > > No.  It does not appear anymore.
> > >
> > > > > Although it still hit the bug [1], which might be under debugging, I
> > > > > think these patches LGTM.
> > > >
> > > > The vblank timeout stuff is a symptom of various different bugs. Can you
> > > > share your setup, your config.txt, and what you're doing to trigger it?
> > >
> > > I get the RPi boot firmware files from raspberrypi's repository at tag
> > > 1.20211029 [1]
> > > And, make system boots:
> > > RPi firmware -> U-Boot -> Linux kernel (aarch64) with corresponding DTB
> > >
> > > The config.txt only has:
> > > enable_uart=1
> > > arm_64bit=1
> > > kernel=uboot.bin
> > >
> > > This bug can be reproduced with es2gears command easily.  May need to
> > > wait it running a while.
> > >
> > > Mesa: 21.2.2
> > > libdrm: 2.4.107
> > > xserver/wayland: 1.20.11  Using wayland
> > >
> > > This bug happens with direct boot path as well:
> > > RPi firmware -> Linux kernel (aarch64) with corresponding DTB
> > >
> > > I tried with Endless OS and Ubuntu's RPi 4B images.
> > > An easy setup is using Ubuntu 21.10 RPi 4B image [2].  Then, replace
> > > the kernel & device tree blob and edit the config.txt.
> >
> > Does it still appear if you raise the core clock in the config.txt file
> > using: core_freq_min=600 ?
> 
> It still appears when I raise the core clock in the config.txt file:
> core_freq_min=600

That's weird, we had some issues like that already but could never
pinpoint exactly what was going on.

Is testing the official raspberrypi kernel an option for you? If so,
trying the same workload with fkms would certainly help.

Maxime
Jian-Hong Pan Dec. 7, 2021, 10:11 a.m. UTC | #8
Maxime Ripard <maxime@cerno.tech> wrote on Fri, Dec 3, 2021 at 10:03 PM:
>
> On Mon, Nov 29, 2021 at 04:31:57PM +0800, Jian-Hong Pan wrote:
> > Maxime Ripard <maxime@cerno.tech> wrote on Fri, Nov 26, 2021 at 11:45 PM:
> > >
> > > On Fri, Nov 19, 2021 at 06:24:34PM +0800, Jian-Hong Pan wrote:
> > > > Maxime Ripard <maxime@cerno.tech> wrote on Thu, Nov 18, 2021 at 6:40 PM:
> > > > >
> > > > > On Thu, Nov 18, 2021 at 02:42:58PM +0800, Jian-Hong Pan wrote:
> > > > > > I tested the v2 patches based on latest mainline kernel with RPi 4B.
> > > > > > System can boot up into desktop environment.
> > > > >
> > > > > So the thing that was broken initially isn't anymore?
> > > >
> > > > No.  It does not appear anymore.
> > > >
> > > > > > Although it still hit the bug [1], which might be under debugging, I
> > > > > > think these patches LGTM.
> > > > >
> > > > > The vblank timeout stuff is a symptom of various different bugs. Can you
> > > > > share your setup, your config.txt, and what you're doing to trigger it?
> > > >
> > > > I get the RPi boot firmware files from raspberrypi's repository at tag
> > > > 1.20211029 [1]
> > > > And, make system boots:
> > > > RPi firmware -> U-Boot -> Linux kernel (aarch64) with corresponding DTB
> > > >
> > > > The config.txt only has:
> > > > enable_uart=1
> > > > arm_64bit=1
> > > > kernel=uboot.bin
> > > >
> > > > This bug can be reproduced with es2gears command easily.  May need to
> > > > wait it running a while.
> > > >
> > > > Mesa: 21.2.2
> > > > libdrm: 2.4.107
> > > > xserver/wayland: 1.20.11  Using wayland
> > > >
> > > > This bug happens with direct boot path as well:
> > > > RPi firmware -> Linux kernel (aarch64) with corresponding DTB
> > > >
> > > > I tried with Endless OS and Ubuntu's RPi 4B images.
> > > > An easy setup is using Ubuntu 21.10 RPi 4B image [2].  Then, replace
> > > > the kernel & device tree blob and edit the config.txt.
> > >
> > > Does it still appear if you raise the core clock in the config.txt file
> > > using: core_freq_min=600 ?
> >
> > It still appears when I raise the core clock in the config.txt file:
> > core_freq_min=600
>
> That's weird, we had some issues like that already but could never
> pinpoint exactly what was going on.
>
> Is testing the official raspberrypi kernel an option for you? If so,
> trying the same workload with fkms would certainly help

I tested the raspberrypi kernel on the rpi-5.16.y branch at commit
bcb52df6df52 ("xhci: add a quirk to work around a suspected cache bug
on VLI controllers").  I also enabled fkms, so vc4 and v3d are
loaded.  With that setup, the "flip_done timed out" error does not
appear, unlike with the mainline kernel.

Jian-Hong Pan