diff mbox

imxdrm issue on SABRE Lite

Message ID 20170212001546.GR27312@n2100.armlinux.org.uk (mailing list archive)
State New, archived
Headers show

Commit Message

Russell King (Oracle) Feb. 12, 2017, 12:15 a.m. UTC
On Sat, Feb 11, 2017 at 09:09:34PM +0000, Dan MacDonald wrote:
> [   43.997066] [drm:drm_atomic_helper_commit_cleanup_done] *ERROR*
> [CRTC:24:crtc-0] flip_done timed out
> [   55.517063] [drm:drm_atomic_helper_commit_cleanup_done] *ERROR*
> [CRTC:24:crtc-0] flip_done timed out

This seems to lay the foundation for the kernel to Oops itself later.
The problem seems to be this:

drm_atomic_helper_commit(state->dev, state, false)
- drm_atomic_helper_setup_commit(state, false)
  - foreach crtc in state
    - commit->event = kzalloc()
    - crtc_state->event = commit->event
    - crtc_state->event->base.completion = &commit->flip_done
...
- commit_tail(state)
  - funcs->atomic_commit_tail(state)
...
    - drm_atomic_helper_commit_planes(dev, state,
                                        DRM_PLANE_COMMIT_ACTIVE_ONLY |
                                        DRM_PLANE_COMMIT_NO_DISABLE_AFTER_MODES$      - foreach active crtc in state
        - funcs->atomic_begin(crtc, old_crtc_state)
          - ipu_crtc_atomic_begin()
            - drm_crtc_vblank_on()
            - if crtc->state->event
              - drm_crtc_arm_vblank_event(crtc, crtc->state->event)
              - crtc->state->event = NULL

At this point, the "commit->flip_done" completion is queued with the
event onto the vblank list.
...
  - drm_atomic_helper_commit_cleanup_done(state)
    - foreach crtc in state
      - try_wait_for_completion(&commit->hw_done)
      - wait_for_completion_timeout(&commit->flip_done, 10sec)

This is where we get the timeout message.

  - drm_atomic_state_free(state)

This "clears" the commit state (calling drm_crtc_commit_put() on it)
which has the effect of kfree()'ing the structure containing the
flip_done, but which is still on the vblank list.

The next time we try to set a mode, the result is that a call to
drm_crtc_vblank_off() causes all queued events to be sent, including
the now kfree()'d flip_done completion, resulting in the reported
kernel oops.

It seems others are also suffering similar issues when the flip_done
completion times out with other drivers:

https://lkml.org/lkml/2016/12/1/171
https://bugs.freedesktop.org/show_bug.cgi?id=96781
https://lists.opensuse.org/opensuse-bugs/2016-10/msg03011.html
https://patchwork.kernel.org/patch/9280223/ (which is me...)

This is likely the same, although the timeout line was not captured:
https://bugzilla.redhat.com/show_bug.cgi?id=1415180
https://bodhi.fedoraproject.org/updates/kernel-4.8.7-200.fc24

So, can we please avoid killing the kernel when the hardware doesn't
quite behave as we want it to?

Right now, when we oops the kernel, we're leaking all the memory
associated with the atomic modeset, so if we stop oopsing the kernel
but still leak the memory, surely that would be an improvement?
Maybe something like the untested patch at the bottom of this mail?

It would give the opportunity to poke about on a failed system to
work out what happened and maybe why the hardware misbehaved.

The real answer is for the hardware to behave, but we can't always
have our cake.

Note - I can't reproduce Dan's problem here on 4.10-rc7 as I suspect
it needs multiple CRTCs/outputs running in the IPU to trigger it,
which Sabre lite has.  I'll try enabling the (disconnected) LVDS
output tomorrow (I have Fabio's LVDS patch knocking about), but I
suspect those with a deeper knowledge of the IPU need to investigate
what's going on.

> [  214.765689] imx-ipuv3 2400000.ipu: DC stop timeout after 50 ms
> [  214.825688] Unable to handle kernel NULL pointer dereference at
> virtual address 00000000
> [  214.833783] pgd = ed1b8000
> [  214.836491] [00000000] *pgd=4c974831
> [  214.840084] Internal error: Oops: 17 [#1] SMP ARM
> [  214.844789] Modules linked in: mousedev snd_soc_sgtl5000
> snd_soc_fsl_ssi snd_soc_imx_sgtl5000 imx_pcm_fiq imx_pcm_dma
> snd_soc_fsl_asrc snd_soc_fsl_asoc_card snd_soc_core dw_hdmi_ahb_audio
> snd_pcm_dmaengine caam_jr imx_ipuv3_crtc snd_ac97_codec coda
> v4l2_mem2mem videobuf2_dma_contig ac97_bus imx_ipu_v3
> snd_soc_imx_audmux snd_pcm videobuf2_vmalloc videobuf2_memops
> videobuf2_v4l2 videobuf2_core dw_hdmi_imx caam imx2_wdt ofpart spi_imx
> evdev dw_hdmi etnaviv imx_ldb pwm_imx snd_timer parallel_display
> uio_pdrv_genirq uio imxdrm sch_fq_codel ip_tables x_tables
> [  214.894338] CPU: 2 PID: 316 Comm: Xorg Tainted: G        W
> 4.9.8-1-ARCH #1
> [  214.901735] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
> [  214.908264] task: ed2c4d00 task.stack: ed2a6000
> [  214.912803] PC is at __wake_up_common+0x1c/0x80
> [  214.917337] LR is at __wake_up_locked+0x14/0x1c
> [  214.921871] pc : [<c0186568>]    lr : [<c018662c>]    psr: a0070093
> [  214.921871] sp : ed2a7c68  ip : 00000000  fp : c0fa2a70
> [  214.933348] r10: c0f37384  r9 : 00000003  r8 : 00000000
> [  214.938574] r7 : 00000000  r6 : edbf3410  r5 : edbf3408  r4 : edbf340c
> [  214.945101] r3 : 00000000  r2 : 00000000  r1 : 00000000  r0 : edbf340c
> [  214.951630] Flags: NzCv  IRQs off  FIQs on  Mode SVC_32  ISA ARM
> Segment none
> [  214.958854] Control: 10c5387d  Table: 3d1b804a  DAC: 00000051
> [  214.964600] Process Xorg (pid: 316, stack limit = 0xed2a6220)
> [  214.970347] Stack: (0xed2a7c68 to 0xed2a8000)
> [  214.974708] 7c60:                   00000000 edbf340c edbf3408
> a0070093 edb46114 edb46000
> [  214.982889] 7c80: 0000039f c0f37384 c0fa2a70 c018662c 00000000
> c0fa1144 bf21c4fc c018704c
> [  214.991069] 7ca0: edbf3380 edbf3380 00000000 c06ccac0 edfa1400
> edbf3380 00000000 c06d023c
> [  214.999250] 7cc0: 0000039f 00000000 edab2a00 80070013 edb4611c
> c12040cc 00000000 00000000
> [  215.007430] 7ce0: 00000000 00040908 edb3f410 ed412c40 00000000
> 00000000 ed26a800 edb47c18
> [  215.015610] 7d00: c0f37384 c0fa1144 bf21c4fc c06c3e3c edc67780
> c06c1c24 edb46000 edb46000
> [  215.023791] 7d20: bf01862c ed412c40 edb46000 00000000 edb46000
> ed410100 ed412240 ed412240
> [  215.031971] 7d40: c0c0c0c0 bf0182bc ed412c40 bf018ca4 00000000
> c06c40f4 ed412c40 00000000
> [  215.040152] 7d60: 00000000 c06c41ac ed412c40 00000000 ed2a7dd0
> edb47c18 ed410100 c06c46dc
> [  215.048332] 7d80: eda36180 edb47c18 00000001 c12040cc ed410100
> c06d5f8c ed2a7e4c edb46000
> [  215.056513] 7da0: 00000001 c12040cc ed410100 c06d74c4 edaa5980
> edb46248 edb47c18 eda3618c
> [  215.064694] 7dc0: eda36180 ed412240 ed410100 c12040cc eda36180
> edb47c18 ed410100 00000000
> [  215.072874] 7de0: 00000000 ed412240 00000001 00040908 ed2a7e70
> 00000051 00000068 c12040cc
> [  215.081055] 7e00: c0cbf0a0 00000068 edf66800 ed2a7e4c c06864a2
> c06ce704 0000e201 00000001
> [  215.089236] 7e20: c0fa22f0 00040908 f50417fa ed2a7e4c be9d1910
> 000000a2 edb46000 00000068
> [  215.097415] 7e40: 00000062 c06d7020 c1208504 008a1210 00000000
> 00000001 00000018 00000047
> [  215.105596] 7e60: 00000000 00000000 00000000 00000001 00007530
> 03480320 03a00378 01e00000
> [  215.113775] 7e80: 01f001ed 0000020d 00000000 00000000 00000000
> 00000000 00000000 00000000
> [  215.121955] 7ea0: 00000000 00000000 00000000 00000000 00000000
> c0b0a268 00000000 00040908
> [  215.130137] 7ec0: ed2a7edc ed2a7fb0 00000008 00040908 00000000
> c12040cc be9d1910 edd8cbc8
> [  215.138317] 7ee0: edcc9b40 be9d1910 0000000b 00000000 00000000
> c0288638 004f10a0 c0101308
> [  215.146497] 7f00: 00000000 eff6fe48 eff6fe24 eff6fe00 000112da
> c02bb9bc 00000008 c01b4a08
> [  215.154677] 7f20: 00040908 c12040cc 00000000 ed29b1b8 00000003
> be9d1910 be9d1808 ffffe000
> [  215.162857] 7f40: 00000051 ed2a6100 00000100 00000000 ed2a6000
> c0102fa4 be9d1678 00040908
> [  215.171038] 7f60: c12040cc 00000000 edcc9b41 edcc9b40 c06864a2
> be9d1910 0000000b 00000000
> [  215.179218] 7f80: 00000000 c0288f78 b6fc3c90 be9d1910 c06864a2
> 00000036 c0107e24 ed2a6000
> [  215.187398] 7fa0: 00000000 c0107c60 b6fc3c90 be9d1910 0000000b
> c06864a2 be9d1910 00000001
> [  215.195579] 7fc0: b6fc3c90 be9d1910 c06864a2 00000036 008a1210
> 00000047 00000018 00000000
> [  215.203759] 7fe0: b6ddf088 be9d18f4 b6dc7ad4 b6b11adc 40070010
> 0000000b 3fffd861 3fffdc61
> [  215.211946] [<c0186568>] (__wake_up_common) from [<c018662c>]
> (__wake_up_locked+0x14/0x1c)
> [  215.220216] [<c018662c>] (__wake_up_locked) from [<c018704c>]
> (complete_all+0x34/0x44)
> [  215.228141] [<c018704c>] (complete_all) from [<c06ccac0>]
> (drm_send_event_locked+0x28/0x11c)
> [  215.236588] [<c06ccac0>] (drm_send_event_locked) from [<c06d023c>]
> (drm_vblank_off+0x1b0/0x21c)
> [  215.245296] [<c06d023c>] (drm_vblank_off) from [<c06c3e3c>]
> (drm_atomic_helper_commit_modeset_disables+0x1dc/0x3fc)
> [  215.255750] [<c06c3e3c>]
> (drm_atomic_helper_commit_modeset_disables) from [<bf0182bc>]
> (imx_drm_atomic_commit_tail+0x18/0x58 [imxdrm])
> [  215.267843] [<bf0182bc>] (imx_drm_atomic_commit_tail [imxdrm]) from
> [<c06c40f4>] (commit_tail+0x40/0x5c)
> [  215.277327] [<c06c40f4>] (commit_tail) from [<c06c41ac>]
> (drm_atomic_helper_commit+0x94/0xd8)
> [  215.285858] [<c06c41ac>] (drm_atomic_helper_commit) from
> [<c06c46dc>] (drm_atomic_helper_set_config+0x78/0x9c)
> [  215.295865] [<c06c46dc>] (drm_atomic_helper_set_config) from
> [<c06d5f8c>] (drm_mode_set_config_internal+0x58/0xdc)
> [  215.306219] [<c06d5f8c>] (drm_mode_set_config_internal) from
> [<c06d74c4>] (drm_mode_setcrtc+0x4a4/0x550)
> [  215.315703] [<c06d74c4>] (drm_mode_setcrtc) from [<c06ce704>]
> (drm_ioctl+0x214/0x44c)
> [  215.323540] [<c06ce704>] (drm_ioctl) from [<c0288638>]
> (do_vfs_ioctl+0xac/0x980)
> [  215.330940] [<c0288638>] (do_vfs_ioctl) from [<c0288f78>]
> (SyS_ioctl+0x6c/0x7c)
> [  215.338258] [<c0288f78>] (SyS_ioctl) from [<c0107c60>]
> (ret_fast_syscall+0x0/0x3c)
> [  215.345832] Code: e5b61004 e1a08003 e59d7028 e1560001 (e5913000)
> [  215.351929] ---[ end trace e8a77aa320be7e56 ]---

 drivers/gpu/drm/drm_atomic_helper.c | 15 +++++++++------
 include/drm/drm_atomic_helper.h     |  2 +-
 2 files changed, 10 insertions(+), 7 deletions(-)

Comments

Thierry Reding Feb. 13, 2017, 8:05 a.m. UTC | #1
On Sun, Feb 12, 2017 at 12:15:46AM +0000, Russell King - ARM Linux wrote:
> On Sat, Feb 11, 2017 at 09:09:34PM +0000, Dan MacDonald wrote:
> > [   43.997066] [drm:drm_atomic_helper_commit_cleanup_done] *ERROR*
> > [CRTC:24:crtc-0] flip_done timed out
> > [   55.517063] [drm:drm_atomic_helper_commit_cleanup_done] *ERROR*
> > [CRTC:24:crtc-0] flip_done timed out
> 
> This seems to lay the foundation for the kernel to Oops itself later.
> The problem seems to be this:
> 
> drm_atomic_helper_commit(state->dev, state, false)
> - drm_atomic_helper_setup_commit(state, false)
>   - foreach crtc in state
>     - commit->event = kzalloc()
>     - crtc_state->event = commit->event
>     - crtc_state->event->base.completion = &commit->flip_done
> ...
> - commit_tail(state)
>   - funcs->atomic_commit_tail(state)
> ...
>     - drm_atomic_helper_commit_planes(dev, state,
>                                         DRM_PLANE_COMMIT_ACTIVE_ONLY |
>                                         DRM_PLANE_COMMIT_NO_DISABLE_AFTER_MODES$      - foreach active crtc in state
>         - funcs->atomic_begin(crtc, old_crtc_state)
>           - ipu_crtc_atomic_begin()
>             - drm_crtc_vblank_on()
>             - if crtc->state->event
>               - drm_crtc_arm_vblank_event(crtc, crtc->state->event)
>               - crtc->state->event = NULL
> 
> At this point, the "commit->flip_done" completion is queued with the
> event onto the vblank list.
> ...
>   - drm_atomic_helper_commit_cleanup_done(state)
>     - foreach crtc in state
>       - try_wait_for_completion(&commit->hw_done)
>       - wait_for_completion_timeout(&commit->flip_done, 10sec)
> 
> This is where we get the timeout message.
> 
>   - drm_atomic_state_free(state)
> 
> This "clears" the commit state (calling drm_crtc_commit_put() on it)
> which has the effect of kfree()'ing the structure containing the
> flip_done, but which is still on the vblank list.
> 
> The next time we try to set a mode, the result is that a call to
> drm_crtc_vblank_off() causes all queued events to be sent, including
> the now kfree()'d flip_done completion, resulting in the reported
> kernel oops.
> 
> It seems others are also suffering similar issues when the flip_done
> completion times out with other drivers:
> 
> https://lkml.org/lkml/2016/12/1/171
> https://bugs.freedesktop.org/show_bug.cgi?id=96781
> https://lists.opensuse.org/opensuse-bugs/2016-10/msg03011.html
> https://patchwork.kernel.org/patch/9280223/ (which is me...)
> 
> This is likely the same, although the timeout line was not captured:
> https://bugzilla.redhat.com/show_bug.cgi?id=1415180
> https://bodhi.fedoraproject.org/updates/kernel-4.8.7-200.fc24
> 
> So, can we please avoid killing the kernel when the hardware doesn't
> quite behave as we want it to?
> 
> Right now, when we oops the kernel, we're leaking all the memory
> associated with the atomic modeset, so if we stop oopsing the kernel
> but still leak the memory, surely that would be an improvement?
> Maybe something like the untested patch at the bottom of this mail?
> 
> It would give the opportunity to poke about on a failed system to
> work out what happened and maybe why the hardware misbehaved.
> 
> The real answer is for the hardware to behave, but we can't always
> have our cake.
> 
> Note - I can't reproduce Dan's problem here on 4.10-rc7 as I suspect
> it needs multiple CRTCs/outputs running in the IPU to trigger it,
> which Sabre lite has.  I'll try enabling the (disconnected) LVDS
> output tomorrow (I have Fabio's LVDS patch knocking about), but I
> suspect those with a deeper knowledge of the IPU need to investigate
> what's going on.
> 
> > [  214.765689] imx-ipuv3 2400000.ipu: DC stop timeout after 50 ms
> > [  214.825688] Unable to handle kernel NULL pointer dereference at
> > virtual address 00000000
> > [  214.833783] pgd = ed1b8000
> > [  214.836491] [00000000] *pgd=4c974831
> > [  214.840084] Internal error: Oops: 17 [#1] SMP ARM
> > [  214.844789] Modules linked in: mousedev snd_soc_sgtl5000
> > snd_soc_fsl_ssi snd_soc_imx_sgtl5000 imx_pcm_fiq imx_pcm_dma
> > snd_soc_fsl_asrc snd_soc_fsl_asoc_card snd_soc_core dw_hdmi_ahb_audio
> > snd_pcm_dmaengine caam_jr imx_ipuv3_crtc snd_ac97_codec coda
> > v4l2_mem2mem videobuf2_dma_contig ac97_bus imx_ipu_v3
> > snd_soc_imx_audmux snd_pcm videobuf2_vmalloc videobuf2_memops
> > videobuf2_v4l2 videobuf2_core dw_hdmi_imx caam imx2_wdt ofpart spi_imx
> > evdev dw_hdmi etnaviv imx_ldb pwm_imx snd_timer parallel_display
> > uio_pdrv_genirq uio imxdrm sch_fq_codel ip_tables x_tables
> > [  214.894338] CPU: 2 PID: 316 Comm: Xorg Tainted: G        W
> > 4.9.8-1-ARCH #1
> > [  214.901735] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
> > [  214.908264] task: ed2c4d00 task.stack: ed2a6000
> > [  214.912803] PC is at __wake_up_common+0x1c/0x80
> > [  214.917337] LR is at __wake_up_locked+0x14/0x1c
> > [  214.921871] pc : [<c0186568>]    lr : [<c018662c>]    psr: a0070093
> > [  214.921871] sp : ed2a7c68  ip : 00000000  fp : c0fa2a70
> > [  214.933348] r10: c0f37384  r9 : 00000003  r8 : 00000000
> > [  214.938574] r7 : 00000000  r6 : edbf3410  r5 : edbf3408  r4 : edbf340c
> > [  214.945101] r3 : 00000000  r2 : 00000000  r1 : 00000000  r0 : edbf340c
> > [  214.951630] Flags: NzCv  IRQs off  FIQs on  Mode SVC_32  ISA ARM
> > Segment none
> > [  214.958854] Control: 10c5387d  Table: 3d1b804a  DAC: 00000051
> > [  214.964600] Process Xorg (pid: 316, stack limit = 0xed2a6220)
> > [  214.970347] Stack: (0xed2a7c68 to 0xed2a8000)
> > [  214.974708] 7c60:                   00000000 edbf340c edbf3408
> > a0070093 edb46114 edb46000
> > [  214.982889] 7c80: 0000039f c0f37384 c0fa2a70 c018662c 00000000
> > c0fa1144 bf21c4fc c018704c
> > [  214.991069] 7ca0: edbf3380 edbf3380 00000000 c06ccac0 edfa1400
> > edbf3380 00000000 c06d023c
> > [  214.999250] 7cc0: 0000039f 00000000 edab2a00 80070013 edb4611c
> > c12040cc 00000000 00000000
> > [  215.007430] 7ce0: 00000000 00040908 edb3f410 ed412c40 00000000
> > 00000000 ed26a800 edb47c18
> > [  215.015610] 7d00: c0f37384 c0fa1144 bf21c4fc c06c3e3c edc67780
> > c06c1c24 edb46000 edb46000
> > [  215.023791] 7d20: bf01862c ed412c40 edb46000 00000000 edb46000
> > ed410100 ed412240 ed412240
> > [  215.031971] 7d40: c0c0c0c0 bf0182bc ed412c40 bf018ca4 00000000
> > c06c40f4 ed412c40 00000000
> > [  215.040152] 7d60: 00000000 c06c41ac ed412c40 00000000 ed2a7dd0
> > edb47c18 ed410100 c06c46dc
> > [  215.048332] 7d80: eda36180 edb47c18 00000001 c12040cc ed410100
> > c06d5f8c ed2a7e4c edb46000
> > [  215.056513] 7da0: 00000001 c12040cc ed410100 c06d74c4 edaa5980
> > edb46248 edb47c18 eda3618c
> > [  215.064694] 7dc0: eda36180 ed412240 ed410100 c12040cc eda36180
> > edb47c18 ed410100 00000000
> > [  215.072874] 7de0: 00000000 ed412240 00000001 00040908 ed2a7e70
> > 00000051 00000068 c12040cc
> > [  215.081055] 7e00: c0cbf0a0 00000068 edf66800 ed2a7e4c c06864a2
> > c06ce704 0000e201 00000001
> > [  215.089236] 7e20: c0fa22f0 00040908 f50417fa ed2a7e4c be9d1910
> > 000000a2 edb46000 00000068
> > [  215.097415] 7e40: 00000062 c06d7020 c1208504 008a1210 00000000
> > 00000001 00000018 00000047
> > [  215.105596] 7e60: 00000000 00000000 00000000 00000001 00007530
> > 03480320 03a00378 01e00000
> > [  215.113775] 7e80: 01f001ed 0000020d 00000000 00000000 00000000
> > 00000000 00000000 00000000
> > [  215.121955] 7ea0: 00000000 00000000 00000000 00000000 00000000
> > c0b0a268 00000000 00040908
> > [  215.130137] 7ec0: ed2a7edc ed2a7fb0 00000008 00040908 00000000
> > c12040cc be9d1910 edd8cbc8
> > [  215.138317] 7ee0: edcc9b40 be9d1910 0000000b 00000000 00000000
> > c0288638 004f10a0 c0101308
> > [  215.146497] 7f00: 00000000 eff6fe48 eff6fe24 eff6fe00 000112da
> > c02bb9bc 00000008 c01b4a08
> > [  215.154677] 7f20: 00040908 c12040cc 00000000 ed29b1b8 00000003
> > be9d1910 be9d1808 ffffe000
> > [  215.162857] 7f40: 00000051 ed2a6100 00000100 00000000 ed2a6000
> > c0102fa4 be9d1678 00040908
> > [  215.171038] 7f60: c12040cc 00000000 edcc9b41 edcc9b40 c06864a2
> > be9d1910 0000000b 00000000
> > [  215.179218] 7f80: 00000000 c0288f78 b6fc3c90 be9d1910 c06864a2
> > 00000036 c0107e24 ed2a6000
> > [  215.187398] 7fa0: 00000000 c0107c60 b6fc3c90 be9d1910 0000000b
> > c06864a2 be9d1910 00000001
> > [  215.195579] 7fc0: b6fc3c90 be9d1910 c06864a2 00000036 008a1210
> > 00000047 00000018 00000000
> > [  215.203759] 7fe0: b6ddf088 be9d18f4 b6dc7ad4 b6b11adc 40070010
> > 0000000b 3fffd861 3fffdc61
> > [  215.211946] [<c0186568>] (__wake_up_common) from [<c018662c>]
> > (__wake_up_locked+0x14/0x1c)
> > [  215.220216] [<c018662c>] (__wake_up_locked) from [<c018704c>]
> > (complete_all+0x34/0x44)
> > [  215.228141] [<c018704c>] (complete_all) from [<c06ccac0>]
> > (drm_send_event_locked+0x28/0x11c)
> > [  215.236588] [<c06ccac0>] (drm_send_event_locked) from [<c06d023c>]
> > (drm_vblank_off+0x1b0/0x21c)
> > [  215.245296] [<c06d023c>] (drm_vblank_off) from [<c06c3e3c>]
> > (drm_atomic_helper_commit_modeset_disables+0x1dc/0x3fc)
> > [  215.255750] [<c06c3e3c>]
> > (drm_atomic_helper_commit_modeset_disables) from [<bf0182bc>]
> > (imx_drm_atomic_commit_tail+0x18/0x58 [imxdrm])
> > [  215.267843] [<bf0182bc>] (imx_drm_atomic_commit_tail [imxdrm]) from
> > [<c06c40f4>] (commit_tail+0x40/0x5c)
> > [  215.277327] [<c06c40f4>] (commit_tail) from [<c06c41ac>]
> > (drm_atomic_helper_commit+0x94/0xd8)
> > [  215.285858] [<c06c41ac>] (drm_atomic_helper_commit) from
> > [<c06c46dc>] (drm_atomic_helper_set_config+0x78/0x9c)
> > [  215.295865] [<c06c46dc>] (drm_atomic_helper_set_config) from
> > [<c06d5f8c>] (drm_mode_set_config_internal+0x58/0xdc)
> > [  215.306219] [<c06d5f8c>] (drm_mode_set_config_internal) from
> > [<c06d74c4>] (drm_mode_setcrtc+0x4a4/0x550)
> > [  215.315703] [<c06d74c4>] (drm_mode_setcrtc) from [<c06ce704>]
> > (drm_ioctl+0x214/0x44c)
> > [  215.323540] [<c06ce704>] (drm_ioctl) from [<c0288638>]
> > (do_vfs_ioctl+0xac/0x980)
> > [  215.330940] [<c0288638>] (do_vfs_ioctl) from [<c0288f78>]
> > (SyS_ioctl+0x6c/0x7c)
> > [  215.338258] [<c0288f78>] (SyS_ioctl) from [<c0107c60>]
> > (ret_fast_syscall+0x0/0x3c)
> > [  215.345832] Code: e5b61004 e1a08003 e59d7028 e1560001 (e5913000)
> > [  215.351929] ---[ end trace e8a77aa320be7e56 ]---
> 
>  drivers/gpu/drm/drm_atomic_helper.c | 15 +++++++++------
>  include/drm/drm_atomic_helper.h     |  2 +-
>  2 files changed, 10 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/gpu/drm/drm_atomic_helper.c b/drivers/gpu/drm/drm_atomic_helper.c
> index 21f992605541..46668d071d6a 100644
> --- a/drivers/gpu/drm/drm_atomic_helper.c
> +++ b/drivers/gpu/drm/drm_atomic_helper.c
> @@ -1201,9 +1201,8 @@ static void commit_tail(struct drm_atomic_state *state)
>  	else
>  		drm_atomic_helper_commit_tail(state);
>  
> -	drm_atomic_helper_commit_cleanup_done(state);
> -
> -	drm_atomic_state_free(state);
> +	if (drm_atomic_helper_commit_cleanup_done(state) == 0)
> +		drm_atomic_state_free(state);

Chris (Cc'ed) added reference counting to atomic state for v4.10, maybe
that already fixes the issue?

Thierry

>  }
>  
>  static void commit_work(struct work_struct *work)
> @@ -1591,12 +1590,12 @@ EXPORT_SYMBOL(drm_atomic_helper_commit_hw_done);
>   * This is part of the atomic helper support for nonblocking commits, see
>   * drm_atomic_helper_setup_commit() for an overview.
>   */
> -void drm_atomic_helper_commit_cleanup_done(struct drm_atomic_state *state)
> +int drm_atomic_helper_commit_cleanup_done(struct drm_atomic_state *state)
>  {
>  	struct drm_crtc *crtc;
>  	struct drm_crtc_state *crtc_state;
>  	struct drm_crtc_commit *commit;
> -	int i;
> +	int i, failed = 0;
>  	long ret;
>  
>  	for_each_crtc_in_state(state, crtc, crtc_state, i) {
> @@ -1621,15 +1620,19 @@ void drm_atomic_helper_commit_cleanup_done(struct drm_atomic_state *state)
>  		 * not hold a reference of its own. */
>  		ret = wait_for_completion_timeout(&commit->flip_done,
>  						  10*HZ);
> -		if (ret == 0)
> -			DRM_ERROR("[CRTC:%d:%s] flip_done timed out\n",
> +		if (ret == 0) {
> +			DRM_ERROR("[CRTC:%d:%s] flip_done timed out, memory leaked\n",
>  				  crtc->base.id, crtc->name);
> +			failed = -ETIMEDOUT;
> +		}
>  
>  		spin_lock(&crtc->commit_lock);
>  del_commit:
>  		list_del(&commit->commit_entry);
>  		spin_unlock(&crtc->commit_lock);
>  	}
> +
> +	return failed;
>  }
>  EXPORT_SYMBOL(drm_atomic_helper_commit_cleanup_done);
>  
> diff --git a/include/drm/drm_atomic_helper.h b/include/drm/drm_atomic_helper.h
> index 7ff92b09fd9c..ee3d642c1feb 100644
> --- a/include/drm/drm_atomic_helper.h
> +++ b/include/drm/drm_atomic_helper.h
> @@ -88,7 +88,7 @@ int drm_atomic_helper_setup_commit(struct drm_atomic_state *state,
>  				   bool nonblock);
>  void drm_atomic_helper_wait_for_dependencies(struct drm_atomic_state *state);
>  void drm_atomic_helper_commit_hw_done(struct drm_atomic_state *state);
> -void drm_atomic_helper_commit_cleanup_done(struct drm_atomic_state *state);
> +int drm_atomic_helper_commit_cleanup_done(struct drm_atomic_state *state);
>  
>  /* implementations for legacy interfaces */
>  int drm_atomic_helper_update_plane(struct drm_plane *plane,
> 
> 
> -- 
> RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
> FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
> according to speedtest.net.
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel
Chris Wilson Feb. 13, 2017, 8:55 a.m. UTC | #2
On Mon, Feb 13, 2017 at 09:05:33AM +0100, Thierry Reding wrote:
> On Sun, Feb 12, 2017 at 12:15:46AM +0000, Russell King - ARM Linux wrote:
> > On Sat, Feb 11, 2017 at 09:09:34PM +0000, Dan MacDonald wrote:
> > diff --git a/drivers/gpu/drm/drm_atomic_helper.c b/drivers/gpu/drm/drm_atomic_helper.c
> > index 21f992605541..46668d071d6a 100644
> > --- a/drivers/gpu/drm/drm_atomic_helper.c
> > +++ b/drivers/gpu/drm/drm_atomic_helper.c
> > @@ -1201,9 +1201,8 @@ static void commit_tail(struct drm_atomic_state *state)
> >  	else
> >  		drm_atomic_helper_commit_tail(state);
> >  
> > -	drm_atomic_helper_commit_cleanup_done(state);
> > -
> > -	drm_atomic_state_free(state);
> > +	if (drm_atomic_helper_commit_cleanup_done(state) == 0)
> > +		drm_atomic_state_free(state);
> 
> Chris (Cc'ed) added reference counting to atomic state for v4.10, maybe
> that already fixes the issue?

I'm not confident it will, as there is not an independent ref on the
state for the phases, and so a forced timeout still leaves a dangling
pointer. The above chunk goes the opposite way and leaks the state to
avoid the invalid deref, what we need is a ref around its existence on
the dependency queue if that is outside the lifetime of the commit.
-Chris
Russell King (Oracle) Feb. 13, 2017, 9:20 a.m. UTC | #3
On Mon, Feb 13, 2017 at 09:05:33AM +0100, Thierry Reding wrote:
> On Sun, Feb 12, 2017 at 12:15:46AM +0000, Russell King - ARM Linux wrote:
> > diff --git a/drivers/gpu/drm/drm_atomic_helper.c b/drivers/gpu/drm/drm_atomic_helper.c
> > index 21f992605541..46668d071d6a 100644
> > --- a/drivers/gpu/drm/drm_atomic_helper.c
> > +++ b/drivers/gpu/drm/drm_atomic_helper.c
> > @@ -1201,9 +1201,8 @@ static void commit_tail(struct drm_atomic_state *state)
> >  	else
> >  		drm_atomic_helper_commit_tail(state);
> >  
> > -	drm_atomic_helper_commit_cleanup_done(state);
> > -
> > -	drm_atomic_state_free(state);
> > +	if (drm_atomic_helper_commit_cleanup_done(state) == 0)
> > +		drm_atomic_state_free(state);
> 
> Chris (Cc'ed) added reference counting to atomic state for v4.10, maybe
> that already fixes the issue?

No.  It's not the atomic state that's referenced, it's only a completion
within the drm_crtc_commit structure, which is completely separate from
the atomic state.  Moreover, the event code has no knowledge of commits,
so it can't "put" a reference count on it.  See:

void drm_send_event_locked(struct drm_device *dev, struct drm_pending_event *e)
{
        assert_spin_locked(&dev->event_lock);

        if (e->completion) {
                /* ->completion might disappear as soon as it signalled. */
                complete_all(e->completion);
                e->completion = NULL;
        }

vs the setup of the event done in drm_atomic_helper_setup_commit():

                if (!crtc_state->event) {
                        commit->event = kzalloc(sizeof(*commit->event),
                                                GFP_KERNEL);
                        if (!commit->event)
                                return -ENOMEM;

                        crtc_state->event = commit->event;
                }

                crtc_state->event->base.completion = &commit->flip_done;

"commit" gets freed before drm_send_event_locked() is called (hence
the timeout message) and when drm_send_event_locked() is eventually
called via drm_vblank_off(), this causes a use-after-free bug.
diff mbox

Patch

diff --git a/drivers/gpu/drm/drm_atomic_helper.c b/drivers/gpu/drm/drm_atomic_helper.c
index 21f992605541..46668d071d6a 100644
--- a/drivers/gpu/drm/drm_atomic_helper.c
+++ b/drivers/gpu/drm/drm_atomic_helper.c
@@ -1201,9 +1201,8 @@  static void commit_tail(struct drm_atomic_state *state)
 	else
 		drm_atomic_helper_commit_tail(state);
 
-	drm_atomic_helper_commit_cleanup_done(state);
-
-	drm_atomic_state_free(state);
+	if (drm_atomic_helper_commit_cleanup_done(state) == 0)
+		drm_atomic_state_free(state);
 }
 
 static void commit_work(struct work_struct *work)
@@ -1591,12 +1590,12 @@  EXPORT_SYMBOL(drm_atomic_helper_commit_hw_done);
  * This is part of the atomic helper support for nonblocking commits, see
  * drm_atomic_helper_setup_commit() for an overview.
  */
-void drm_atomic_helper_commit_cleanup_done(struct drm_atomic_state *state)
+int drm_atomic_helper_commit_cleanup_done(struct drm_atomic_state *state)
 {
 	struct drm_crtc *crtc;
 	struct drm_crtc_state *crtc_state;
 	struct drm_crtc_commit *commit;
-	int i;
+	int i, failed = 0;
 	long ret;
 
 	for_each_crtc_in_state(state, crtc, crtc_state, i) {
@@ -1621,15 +1620,19 @@  void drm_atomic_helper_commit_cleanup_done(struct drm_atomic_state *state)
 		 * not hold a reference of its own. */
 		ret = wait_for_completion_timeout(&commit->flip_done,
 						  10*HZ);
-		if (ret == 0)
-			DRM_ERROR("[CRTC:%d:%s] flip_done timed out\n",
+		if (ret == 0) {
+			DRM_ERROR("[CRTC:%d:%s] flip_done timed out, memory leaked\n",
 				  crtc->base.id, crtc->name);
+			failed = -ETIMEDOUT;
+		}
 
 		spin_lock(&crtc->commit_lock);
 del_commit:
 		list_del(&commit->commit_entry);
 		spin_unlock(&crtc->commit_lock);
 	}
+
+	return failed;
 }
 EXPORT_SYMBOL(drm_atomic_helper_commit_cleanup_done);
 
diff --git a/include/drm/drm_atomic_helper.h b/include/drm/drm_atomic_helper.h
index 7ff92b09fd9c..ee3d642c1feb 100644
--- a/include/drm/drm_atomic_helper.h
+++ b/include/drm/drm_atomic_helper.h
@@ -88,7 +88,7 @@  int drm_atomic_helper_setup_commit(struct drm_atomic_state *state,
 				   bool nonblock);
 void drm_atomic_helper_wait_for_dependencies(struct drm_atomic_state *state);
 void drm_atomic_helper_commit_hw_done(struct drm_atomic_state *state);
-void drm_atomic_helper_commit_cleanup_done(struct drm_atomic_state *state);
+int drm_atomic_helper_commit_cleanup_done(struct drm_atomic_state *state);
 
 /* implementations for legacy interfaces */
 int drm_atomic_helper_update_plane(struct drm_plane *plane,