Message ID | 20170212001546.GR27312@n2100.armlinux.org.uk (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Sun, Feb 12, 2017 at 12:15:46AM +0000, Russell King - ARM Linux wrote: > On Sat, Feb 11, 2017 at 09:09:34PM +0000, Dan MacDonald wrote: > > [ 43.997066] [drm:drm_atomic_helper_commit_cleanup_done] *ERROR* > > [CRTC:24:crtc-0] flip_done timed out > > [ 55.517063] [drm:drm_atomic_helper_commit_cleanup_done] *ERROR* > > [CRTC:24:crtc-0] flip_done timed out > > This seems to lay the foundation for the kernel to Oops itself later. > The problem seems to be this: > > drm_atomic_helper_commit(state->dev, state, false) > - drm_atomic_helper_setup_commit(state, false) > - foreach crtc in state > - commit->event = kzalloc() > - crtc_state->event = commit->event > - crtc_state->event->base.completion = &commit->flip_done > ... > - commit_tail(state) > - funcs->atomic_commit_tail(state) > ... > - drm_atomic_helper_commit_planes(dev, state, > DRM_PLANE_COMMIT_ACTIVE_ONLY | > DRM_PLANE_COMMIT_NO_DISABLE_AFTER_MODES$ - foreach active crtc in state > - funcs->atomic_begin(crtc, old_crtc_state) > - ipu_crtc_atomic_begin() > - drm_crtc_vblank_on() > - if crtc->state->event > - drm_crtc_arm_vblank_event(crtc, crtc->state->event) > - crtc->state->event = NULL > > At this point, the "commit->flip_done" completion is queued with the > event onto the vblank list. > ... > - drm_atomic_helper_commit_cleanup_done(state) > - foreach crtc in state > - try_wait_for_completion(&commit->hw_done) > - wait_for_completion_timeout(&commit->flip_done, 10sec) > > This is where we get the timeout message. > > - drm_atomic_state_free(state) > > This "clears" the commit state (calling drm_crtc_commit_put() on it) > which has the effect of kfree()'ing the structure containing the > flip_done, but which is still on the vblank list. > > The next time we try to set a mode, the result is that a call to > drm_crtc_vblank_off() causes all queued events to be sent, including > the now kfree()'d flip_done completion, resulting in the reported > kernel oops. > > It seems others are also suffering similar issues when the flip_done > completion times out with other drivers: > > https://lkml.org/lkml/2016/12/1/171 > https://bugs.freedesktop.org/show_bug.cgi?id=96781 > https://lists.opensuse.org/opensuse-bugs/2016-10/msg03011.html > https://patchwork.kernel.org/patch/9280223/ (which is me...) > > This is likely the same, although the timeout line was not captured: > https://bugzilla.redhat.com/show_bug.cgi?id=1415180 > https://bodhi.fedoraproject.org/updates/kernel-4.8.7-200.fc24 > > So, can we please avoid killing the kernel when the hardware doesn't > quite behave as we want it to? > > Right now, when we oops the kernel, we're leaking all the memory > associated with the atomic modeset, so if we stop oopsing the kernel > but still leak the memory, surely that would be an improvement? > Maybe something like the untested patch at the bottom of this mail? > > It would give the opportunity to poke about on a failed system to > work out what happened and maybe why the hardware misbehaved. > > The real answer is for the hardware to behave, but we can't always > have our cake. > > Note - I can't reproduce Dan's problem here on 4.10-rc7 as I suspect > it needs multiple CRTCs/outputs running in the IPU to trigger it, > which Sabre lite has. I'll try enabling the (disconnected) LVDS > output tomorrow (I have Fabio's LVDS patch knocking about), but I > suspect those with a deeper knowledge of the IPU need to investigate > what's going on. > > > [ 214.765689] imx-ipuv3 2400000.ipu: DC stop timeout after 50 ms > > [ 214.825688] Unable to handle kernel NULL pointer dereference at > > virtual address 00000000 > > [ 214.833783] pgd = ed1b8000 > > [ 214.836491] [00000000] *pgd=4c974831 > > [ 214.840084] Internal error: Oops: 17 [#1] SMP ARM > > [ 214.844789] Modules linked in: mousedev snd_soc_sgtl5000 > > snd_soc_fsl_ssi snd_soc_imx_sgtl5000 imx_pcm_fiq imx_pcm_dma > > snd_soc_fsl_asrc snd_soc_fsl_asoc_card snd_soc_core dw_hdmi_ahb_audio > > snd_pcm_dmaengine caam_jr imx_ipuv3_crtc snd_ac97_codec coda > > v4l2_mem2mem videobuf2_dma_contig ac97_bus imx_ipu_v3 > > snd_soc_imx_audmux snd_pcm videobuf2_vmalloc videobuf2_memops > > videobuf2_v4l2 videobuf2_core dw_hdmi_imx caam imx2_wdt ofpart spi_imx > > evdev dw_hdmi etnaviv imx_ldb pwm_imx snd_timer parallel_display > > uio_pdrv_genirq uio imxdrm sch_fq_codel ip_tables x_tables > > [ 214.894338] CPU: 2 PID: 316 Comm: Xorg Tainted: G W > > 4.9.8-1-ARCH #1 > > [ 214.901735] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree) > > [ 214.908264] task: ed2c4d00 task.stack: ed2a6000 > > [ 214.912803] PC is at __wake_up_common+0x1c/0x80 > > [ 214.917337] LR is at __wake_up_locked+0x14/0x1c > > [ 214.921871] pc : [<c0186568>] lr : [<c018662c>] psr: a0070093 > > [ 214.921871] sp : ed2a7c68 ip : 00000000 fp : c0fa2a70 > > [ 214.933348] r10: c0f37384 r9 : 00000003 r8 : 00000000 > > [ 214.938574] r7 : 00000000 r6 : edbf3410 r5 : edbf3408 r4 : edbf340c > > [ 214.945101] r3 : 00000000 r2 : 00000000 r1 : 00000000 r0 : edbf340c > > [ 214.951630] Flags: NzCv IRQs off FIQs on Mode SVC_32 ISA ARM > > Segment none > > [ 214.958854] Control: 10c5387d Table: 3d1b804a DAC: 00000051 > > [ 214.964600] Process Xorg (pid: 316, stack limit = 0xed2a6220) > > [ 214.970347] Stack: (0xed2a7c68 to 0xed2a8000) > > [ 214.974708] 7c60: 00000000 edbf340c edbf3408 > > a0070093 edb46114 edb46000 > > [ 214.982889] 7c80: 0000039f c0f37384 c0fa2a70 c018662c 00000000 > > c0fa1144 bf21c4fc c018704c > > [ 214.991069] 7ca0: edbf3380 edbf3380 00000000 c06ccac0 edfa1400 > > edbf3380 00000000 c06d023c > > [ 214.999250] 7cc0: 0000039f 00000000 edab2a00 80070013 edb4611c > > c12040cc 00000000 00000000 > > [ 215.007430] 7ce0: 00000000 00040908 edb3f410 ed412c40 00000000 > > 00000000 ed26a800 edb47c18 > > [ 215.015610] 7d00: c0f37384 c0fa1144 bf21c4fc c06c3e3c edc67780 > > c06c1c24 edb46000 edb46000 > > [ 215.023791] 7d20: bf01862c ed412c40 edb46000 00000000 edb46000 > > ed410100 ed412240 ed412240 > > [ 215.031971] 7d40: c0c0c0c0 bf0182bc ed412c40 bf018ca4 00000000 > > c06c40f4 ed412c40 00000000 > > [ 215.040152] 7d60: 00000000 c06c41ac ed412c40 00000000 ed2a7dd0 > > edb47c18 ed410100 c06c46dc > > [ 215.048332] 7d80: eda36180 edb47c18 00000001 c12040cc ed410100 > > c06d5f8c ed2a7e4c edb46000 > > [ 215.056513] 7da0: 00000001 c12040cc ed410100 c06d74c4 edaa5980 > > edb46248 edb47c18 eda3618c > > [ 215.064694] 7dc0: eda36180 ed412240 ed410100 c12040cc eda36180 > > edb47c18 ed410100 00000000 > > [ 215.072874] 7de0: 00000000 ed412240 00000001 00040908 ed2a7e70 > > 00000051 00000068 c12040cc > > [ 215.081055] 7e00: c0cbf0a0 00000068 edf66800 ed2a7e4c c06864a2 > > c06ce704 0000e201 00000001 > > [ 215.089236] 7e20: c0fa22f0 00040908 f50417fa ed2a7e4c be9d1910 > > 000000a2 edb46000 00000068 > > [ 215.097415] 7e40: 00000062 c06d7020 c1208504 008a1210 00000000 > > 00000001 00000018 00000047 > > [ 215.105596] 7e60: 00000000 00000000 00000000 00000001 00007530 > > 03480320 03a00378 01e00000 > > [ 215.113775] 7e80: 01f001ed 0000020d 00000000 00000000 00000000 > > 00000000 00000000 00000000 > > [ 215.121955] 7ea0: 00000000 00000000 00000000 00000000 00000000 > > c0b0a268 00000000 00040908 > > [ 215.130137] 7ec0: ed2a7edc ed2a7fb0 00000008 00040908 00000000 > > c12040cc be9d1910 edd8cbc8 > > [ 215.138317] 7ee0: edcc9b40 be9d1910 0000000b 00000000 00000000 > > c0288638 004f10a0 c0101308 > > [ 215.146497] 7f00: 00000000 eff6fe48 eff6fe24 eff6fe00 000112da > > c02bb9bc 00000008 c01b4a08 > > [ 215.154677] 7f20: 00040908 c12040cc 00000000 ed29b1b8 00000003 > > be9d1910 be9d1808 ffffe000 > > [ 215.162857] 7f40: 00000051 ed2a6100 00000100 00000000 ed2a6000 > > c0102fa4 be9d1678 00040908 > > [ 215.171038] 7f60: c12040cc 00000000 edcc9b41 edcc9b40 c06864a2 > > be9d1910 0000000b 00000000 > > [ 215.179218] 7f80: 00000000 c0288f78 b6fc3c90 be9d1910 c06864a2 > > 00000036 c0107e24 ed2a6000 > > [ 215.187398] 7fa0: 00000000 c0107c60 b6fc3c90 be9d1910 0000000b > > c06864a2 be9d1910 00000001 > > [ 215.195579] 7fc0: b6fc3c90 be9d1910 c06864a2 00000036 008a1210 > > 00000047 00000018 00000000 > > [ 215.203759] 7fe0: b6ddf088 be9d18f4 b6dc7ad4 b6b11adc 40070010 > > 0000000b 3fffd861 3fffdc61 > > [ 215.211946] [<c0186568>] (__wake_up_common) from [<c018662c>] > > (__wake_up_locked+0x14/0x1c) > > [ 215.220216] [<c018662c>] (__wake_up_locked) from [<c018704c>] > > (complete_all+0x34/0x44) > > [ 215.228141] [<c018704c>] (complete_all) from [<c06ccac0>] > > (drm_send_event_locked+0x28/0x11c) > > [ 215.236588] [<c06ccac0>] (drm_send_event_locked) from [<c06d023c>] > > (drm_vblank_off+0x1b0/0x21c) > > [ 215.245296] [<c06d023c>] (drm_vblank_off) from [<c06c3e3c>] > > (drm_atomic_helper_commit_modeset_disables+0x1dc/0x3fc) > > [ 215.255750] [<c06c3e3c>] > > (drm_atomic_helper_commit_modeset_disables) from [<bf0182bc>] > > (imx_drm_atomic_commit_tail+0x18/0x58 [imxdrm]) > > [ 215.267843] [<bf0182bc>] (imx_drm_atomic_commit_tail [imxdrm]) from > > [<c06c40f4>] (commit_tail+0x40/0x5c) > > [ 215.277327] [<c06c40f4>] (commit_tail) from [<c06c41ac>] > > (drm_atomic_helper_commit+0x94/0xd8) > > [ 215.285858] [<c06c41ac>] (drm_atomic_helper_commit) from > > [<c06c46dc>] (drm_atomic_helper_set_config+0x78/0x9c) > > [ 215.295865] [<c06c46dc>] (drm_atomic_helper_set_config) from > > [<c06d5f8c>] (drm_mode_set_config_internal+0x58/0xdc) > > [ 215.306219] [<c06d5f8c>] (drm_mode_set_config_internal) from > > [<c06d74c4>] (drm_mode_setcrtc+0x4a4/0x550) > > [ 215.315703] [<c06d74c4>] (drm_mode_setcrtc) from [<c06ce704>] > > (drm_ioctl+0x214/0x44c) > > [ 215.323540] [<c06ce704>] (drm_ioctl) from [<c0288638>] > > (do_vfs_ioctl+0xac/0x980) > > [ 215.330940] [<c0288638>] (do_vfs_ioctl) from [<c0288f78>] > > (SyS_ioctl+0x6c/0x7c) > > [ 215.338258] [<c0288f78>] (SyS_ioctl) from [<c0107c60>] > > (ret_fast_syscall+0x0/0x3c) > > [ 215.345832] Code: e5b61004 e1a08003 e59d7028 e1560001 (e5913000) > > [ 215.351929] ---[ end trace e8a77aa320be7e56 ]--- > > drivers/gpu/drm/drm_atomic_helper.c | 15 +++++++++------ > include/drm/drm_atomic_helper.h | 2 +- > 2 files changed, 10 insertions(+), 7 deletions(-) > > diff --git a/drivers/gpu/drm/drm_atomic_helper.c b/drivers/gpu/drm/drm_atomic_helper.c > index 21f992605541..46668d071d6a 100644 > --- a/drivers/gpu/drm/drm_atomic_helper.c > +++ b/drivers/gpu/drm/drm_atomic_helper.c > @@ -1201,9 +1201,8 @@ static void commit_tail(struct drm_atomic_state *state) > else > drm_atomic_helper_commit_tail(state); > > - drm_atomic_helper_commit_cleanup_done(state); > - > - drm_atomic_state_free(state); > + if (drm_atomic_helper_commit_cleanup_done(state) == 0) > + drm_atomic_state_free(state); Chris (Cc'ed) added reference counting to atomic state for v4.10, maybe that already fixes the issue? Thierry > } > > static void commit_work(struct work_struct *work) > @@ -1591,12 +1590,12 @@ EXPORT_SYMBOL(drm_atomic_helper_commit_hw_done); > * This is part of the atomic helper support for nonblocking commits, see > * drm_atomic_helper_setup_commit() for an overview. > */ > -void drm_atomic_helper_commit_cleanup_done(struct drm_atomic_state *state) > +int drm_atomic_helper_commit_cleanup_done(struct drm_atomic_state *state) > { > struct drm_crtc *crtc; > struct drm_crtc_state *crtc_state; > struct drm_crtc_commit *commit; > - int i; > + int i, failed = 0; > long ret; > > for_each_crtc_in_state(state, crtc, crtc_state, i) { > @@ -1621,15 +1620,19 @@ void drm_atomic_helper_commit_cleanup_done(struct drm_atomic_state *state) > * not hold a reference of its own. */ > ret = wait_for_completion_timeout(&commit->flip_done, > 10*HZ); > - if (ret == 0) > - DRM_ERROR("[CRTC:%d:%s] flip_done timed out\n", > + if (ret == 0) { > + DRM_ERROR("[CRTC:%d:%s] flip_done timed out, memory leaked\n", > crtc->base.id, crtc->name); > + failed = -ETIMEDOUT; > + } > > spin_lock(&crtc->commit_lock); > del_commit: > list_del(&commit->commit_entry); > spin_unlock(&crtc->commit_lock); > } > + > + return failed; > } > EXPORT_SYMBOL(drm_atomic_helper_commit_cleanup_done); > > diff --git a/include/drm/drm_atomic_helper.h b/include/drm/drm_atomic_helper.h > index 7ff92b09fd9c..ee3d642c1feb 100644 > --- a/include/drm/drm_atomic_helper.h > +++ b/include/drm/drm_atomic_helper.h > @@ -88,7 +88,7 @@ int drm_atomic_helper_setup_commit(struct drm_atomic_state *state, > bool nonblock); > void drm_atomic_helper_wait_for_dependencies(struct drm_atomic_state *state); > void drm_atomic_helper_commit_hw_done(struct drm_atomic_state *state); > -void drm_atomic_helper_commit_cleanup_done(struct drm_atomic_state *state); > +int drm_atomic_helper_commit_cleanup_done(struct drm_atomic_state *state); > > /* implementations for legacy interfaces */ > int drm_atomic_helper_update_plane(struct drm_plane *plane, > > > -- > RMK's Patch system: http://www.armlinux.org.uk/developer/patches/ > FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up > according to speedtest.net. > _______________________________________________ > dri-devel mailing list > dri-devel@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/dri-devel
On Mon, Feb 13, 2017 at 09:05:33AM +0100, Thierry Reding wrote: > On Sun, Feb 12, 2017 at 12:15:46AM +0000, Russell King - ARM Linux wrote: > > On Sat, Feb 11, 2017 at 09:09:34PM +0000, Dan MacDonald wrote: > > diff --git a/drivers/gpu/drm/drm_atomic_helper.c b/drivers/gpu/drm/drm_atomic_helper.c > > index 21f992605541..46668d071d6a 100644 > > --- a/drivers/gpu/drm/drm_atomic_helper.c > > +++ b/drivers/gpu/drm/drm_atomic_helper.c > > @@ -1201,9 +1201,8 @@ static void commit_tail(struct drm_atomic_state *state) > > else > > drm_atomic_helper_commit_tail(state); > > > > - drm_atomic_helper_commit_cleanup_done(state); > > - > > - drm_atomic_state_free(state); > > + if (drm_atomic_helper_commit_cleanup_done(state) == 0) > > + drm_atomic_state_free(state); > > Chris (Cc'ed) added reference counting to atomic state for v4.10, maybe > that already fixes the issue? I'm not confident it will, as there is not an independent ref on the state for the phases, and so a forced timeout still leaves a dangling pointer. The above chunk goes the opposite way and leaks the state to avoid the invalid deref, what we need is a ref around its existence on the dependency queue if that is outside the lifetime of the commit. -Chris
On Mon, Feb 13, 2017 at 09:05:33AM +0100, Thierry Reding wrote: > On Sun, Feb 12, 2017 at 12:15:46AM +0000, Russell King - ARM Linux wrote: > > diff --git a/drivers/gpu/drm/drm_atomic_helper.c b/drivers/gpu/drm/drm_atomic_helper.c > > index 21f992605541..46668d071d6a 100644 > > --- a/drivers/gpu/drm/drm_atomic_helper.c > > +++ b/drivers/gpu/drm/drm_atomic_helper.c > > @@ -1201,9 +1201,8 @@ static void commit_tail(struct drm_atomic_state *state) > > else > > drm_atomic_helper_commit_tail(state); > > > > - drm_atomic_helper_commit_cleanup_done(state); > > - > > - drm_atomic_state_free(state); > > + if (drm_atomic_helper_commit_cleanup_done(state) == 0) > > + drm_atomic_state_free(state); > > Chris (Cc'ed) added reference counting to atomic state for v4.10, maybe > that already fixes the issue? No. It's not the atomic state that's referenced, it's only a completion within the drm_crtc_commit structure, which is completely separate from the atomic state. Moreover, the event code has no knowledge of commits, so it can't "put" a reference count on it. See: void drm_send_event_locked(struct drm_device *dev, struct drm_pending_event *e) { assert_spin_locked(&dev->event_lock); if (e->completion) { /* ->completion might disappear as soon as it signalled. */ complete_all(e->completion); e->completion = NULL; } vs the setup of the event done in drm_atomic_helper_setup_commit(): if (!crtc_state->event) { commit->event = kzalloc(sizeof(*commit->event), GFP_KERNEL); if (!commit->event) return -ENOMEM; crtc_state->event = commit->event; } crtc_state->event->base.completion = &commit->flip_done; "commit" gets freed before drm_send_event_locked() is called (hence the timeout message) and when drm_send_event_locked() is eventually called via drm_vblank_off(), this causes a use-after-free bug.
diff --git a/drivers/gpu/drm/drm_atomic_helper.c b/drivers/gpu/drm/drm_atomic_helper.c index 21f992605541..46668d071d6a 100644 --- a/drivers/gpu/drm/drm_atomic_helper.c +++ b/drivers/gpu/drm/drm_atomic_helper.c @@ -1201,9 +1201,8 @@ static void commit_tail(struct drm_atomic_state *state) else drm_atomic_helper_commit_tail(state); - drm_atomic_helper_commit_cleanup_done(state); - - drm_atomic_state_free(state); + if (drm_atomic_helper_commit_cleanup_done(state) == 0) + drm_atomic_state_free(state); } static void commit_work(struct work_struct *work) @@ -1591,12 +1590,12 @@ EXPORT_SYMBOL(drm_atomic_helper_commit_hw_done); * This is part of the atomic helper support for nonblocking commits, see * drm_atomic_helper_setup_commit() for an overview. */ -void drm_atomic_helper_commit_cleanup_done(struct drm_atomic_state *state) +int drm_atomic_helper_commit_cleanup_done(struct drm_atomic_state *state) { struct drm_crtc *crtc; struct drm_crtc_state *crtc_state; struct drm_crtc_commit *commit; - int i; + int i, failed = 0; long ret; for_each_crtc_in_state(state, crtc, crtc_state, i) { @@ -1621,15 +1620,19 @@ void drm_atomic_helper_commit_cleanup_done(struct drm_atomic_state *state) * not hold a reference of its own. */ ret = wait_for_completion_timeout(&commit->flip_done, 10*HZ); - if (ret == 0) - DRM_ERROR("[CRTC:%d:%s] flip_done timed out\n", + if (ret == 0) { + DRM_ERROR("[CRTC:%d:%s] flip_done timed out, memory leaked\n", crtc->base.id, crtc->name); + failed = -ETIMEDOUT; + } spin_lock(&crtc->commit_lock); del_commit: list_del(&commit->commit_entry); spin_unlock(&crtc->commit_lock); } + + return failed; } EXPORT_SYMBOL(drm_atomic_helper_commit_cleanup_done); diff --git a/include/drm/drm_atomic_helper.h b/include/drm/drm_atomic_helper.h index 7ff92b09fd9c..ee3d642c1feb 100644 --- a/include/drm/drm_atomic_helper.h +++ b/include/drm/drm_atomic_helper.h @@ -88,7 +88,7 @@ int drm_atomic_helper_setup_commit(struct drm_atomic_state *state, bool nonblock); void drm_atomic_helper_wait_for_dependencies(struct drm_atomic_state *state); void drm_atomic_helper_commit_hw_done(struct drm_atomic_state *state); -void drm_atomic_helper_commit_cleanup_done(struct drm_atomic_state *state); +int drm_atomic_helper_commit_cleanup_done(struct drm_atomic_state *state); /* implementations for legacy interfaces */ int drm_atomic_helper_update_plane(struct drm_plane *plane,