Patchwork [v3] drm/exynos: fix kernel panic issue at drm releasing

Submitter Inki Dae
Date Jan. 8, 2016, 8:46 a.m.
Message ID <1452242760-20298-2-git-send-email-inki.dae@samsung.com>
Permalink /patch/7983351/
State New

Comments

Inki Dae - Jan. 8, 2016, 8:46 a.m.
This patch fixes a kernel panic that occurs when the DRM device
is closed while modetest is running.

The issue can be reproduced easily by repeatedly launching modetest
with page flipping enabled.

The cause is that the send_vblank_event function can access an invalid
drm_file object when a page flip finishes, if drm_release has already
removed the drm_file object while a page flip event that was already
committed to hardware is still pending.

So this patch cancels the pending page flip event in the preclose
callback, which is called at the start of the drm_release function.

Changelog v2:
- free vblank event objects belonging to the request process,
  increment event space and decrease pending_update when cancelling
  the event

Changelog v3:
- initialize only device specific things. Each page flip event object
  is created by DRM core so DRM core should release the object including
  incrementing event space.

Signed-off-by: Inki Dae <inki.dae@samsung.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
---
 drivers/gpu/drm/exynos/exynos_drm_crtc.c | 15 +++++++++++++++
 drivers/gpu/drm/exynos/exynos_drm_crtc.h |  3 +++
 drivers/gpu/drm/exynos/exynos_drm_drv.c  |  5 +++++
 3 files changed, 23 insertions(+)
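
To make the failure mode described above concrete, here is a minimal userspace sketch of the lifetime problem. It is not the driver code: every model_* name is a hypothetical stand-in (model_file for drm_file, model_event for the pending vblank event, model_crtc for the exynos CRTC), and there is no locking. It only shows why detaching the pending event before the file goes away keeps the flip-completion path from touching a stale pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for drm_file: the per-open-file object. */
struct model_file {
	int events_delivered;
};

/* Hypothetical stand-in for the pending vblank event; it points back at its file. */
struct model_event {
	struct model_file *file;
};

/* Hypothetical stand-in for the CRTC: at most one flip event in flight. */
struct model_crtc {
	struct model_event *event;
	int pending_update;
};

/* Queue a page flip on behalf of `file`. Returns 0 on success, -1 on failure. */
int model_page_flip(struct model_crtc *crtc, struct model_file *file)
{
	struct model_event *e;

	assert(crtc != NULL && file != NULL);
	e = malloc(sizeof(*e));
	if (!e)
		return -1;
	e->file = file;
	crtc->event = e;
	crtc->pending_update++;
	return 0;
}

/*
 * Models the cancel done at preclose time: detach the event so the
 * completion path can no longer reach the file. The detached event is
 * returned, so the caller decides who frees it.
 */
struct model_event *model_cancel_page_flip(struct model_crtc *crtc)
{
	struct model_event *e;

	assert(crtc != NULL);
	e = crtc->event;
	if (e) {
		crtc->event = NULL;
		crtc->pending_update--;
	}
	return e;
}

/*
 * Models flip completion. Returns 1 if an event was delivered, 0 if none
 * was pending. The dereference of e->file is the access that would hit
 * freed memory if the file were released without a prior cancel.
 */
int model_finish_page_flip(struct model_crtc *crtc)
{
	struct model_event *e;

	assert(crtc != NULL);
	e = crtc->event;
	if (!e)
		return 0;
	e->file->events_delivered++;
	crtc->event = NULL;
	crtc->pending_update--;
	free(e);
	return 1;
}
```

In this model, calling model_cancel_page_flip() before freeing the file makes a later model_finish_page_flip() a no-op. Skipping the cancel would leave crtc->event pointing at an event whose file pointer is dangling.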
Daniel Vetter - Jan. 11, 2016, 6:32 p.m.
On Fri, Jan 8, 2016 at 9:46 AM, Inki Dae <inki.dae@samsung.com> wrote:
> This patch fixes a kernel panic that occurs when the DRM device
> is closed while modetest is running.
>
> The issue can be reproduced easily by repeatedly launching modetest
> with page flipping enabled.
>
> The cause is that the send_vblank_event function can access an invalid
> drm_file object when a page flip finishes, if drm_release has already
> removed the drm_file object while a page flip event that was already
> committed to hardware is still pending.
>
> So this patch cancels the pending page flip event in the preclose
> callback, which is called at the start of the drm_release function.
>
> Changelog v2:
> - free vblank event objects belonging to the request process,
>   increment event space and decrease pending_update when cancelling
>   the event
>
> Changelog v3:
> - initialize only device specific things. Each page flip event object
>   is created by DRM core so DRM core should release the object including
>   incrementing event space.
>
> Signed-off-by: Inki Dae <inki.dae@samsung.com>
> Reviewed-by: Daniel Stone <daniels@collabora.com>

Note that after my drm_event cleanup series has landed (aiming for
early 4.6) you can/should revert this again. But it's too big as a
bugfix this late in 4.5. Ack/tested-by from exynos on that series
would be great.
-Daniel
Daniel Stone - Jan. 11, 2016, 7 p.m.
Hi Inki,

On 8 January 2016 at 08:46, Inki Dae <inki.dae@samsung.com> wrote:
> Changelog v3:
> - initialize only device specific things. Each page flip event object
>   is created by DRM core so DRM core should release the object including
>   incrementing event space.

I'm a bit confused here; we no longer call event->base.destroy(),
because you say that the DRM core should release it. But how does the
DRM core know to release the event? From the core point of view, the
event disappears into the driver, and it is no longer tracked.

As Daniel says though, later versions handle all this in the core in a
much more clean way, so we can remove these from the drivers then.

Cheers,
Daniel
Inki Dae - Jan. 12, 2016, 6:25 a.m.
Hi Daniel,

On 12 January 2016 at 04:00, Daniel Stone wrote:
> Hi Inki,
> 
> On 8 January 2016 at 08:46, Inki Dae <inki.dae@samsung.com> wrote:
>> Changelog v3:
>> - initialize only device specific things. Each page flip event object
>>   is created by DRM core so DRM core should release the object including
>>   incrementing event space.
> 
> I'm a bit confused here; we no longer call event->base.destroy(),
> because you say that the DRM core should release it. But how does the
> DRM core know to release the event? From the core point of view, the
> event disappears into the driver, and it is no longer tracked.

DRM core would need something to track the events. Basically, I think
whoever created an object should also destroy it.

> 
> As Daniel says though, later versions handle all this in the core in a
> much more clean way, so we can remove these from the drivers then.

So I think it's not reasonable for a specific driver to destroy an object
created by the core, even though that leaves a memory leak. However, the
memory leak is more critical than having temporary code.
Ok, I will merge this patch with more comments saying that the object
will be destroyed by the core later.

Thanks,
Inki Dae

Daniel Stone - Jan. 12, 2016, 9:55 a.m.
Hi Inki,

On 12 January 2016 at 06:25, Inki Dae <inki.dae@samsung.com> wrote:
> On 12 January 2016 at 04:00, Daniel Stone wrote:
>> On 8 January 2016 at 08:46, Inki Dae <inki.dae@samsung.com> wrote:
>>> Changelog v3:
>>> - initialize only device specific things. Each page flip event object
>>>   is created by DRM core so DRM core should release the object including
>>>   incrementing event space.
>>
>> I'm a bit confused here; we no longer call event->base.destroy(),
>> because you say that the DRM core should release it. But how does the
>> DRM core know to release the event? From the core point of view, the
>> event disappears into the driver, and it is no longer tracked.
>
> DRM core would need something to track the events. Basically, I think
> whoever created an object should also destroy it.

You're right, but this doesn't exist until Daniel Vetter's rather
larger patchset which is still pending merge.

>> As Daniel says though, later versions handle all this in the core in a
>> much more clean way, so we can remove these from the drivers then.
>
> So I think it's not reasonable for a specific driver to destroy an object
> created by the core, even though that leaves a memory leak. However, the
> memory leak is more critical than having temporary code.
> Ok, I will merge this patch with more comments saying that the object
> will be destroyed by the core later.

Also, by stealing the event out of crtc_state->event and moving it to
exynos_crtc->event, you can argue that we have quite explicitly removed
the responsibility from the core. ;)

Thanks for handling this!

Cheers,
Daniel
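
The ownership question discussed in this thread can be sketched in plain C. This is only an illustration under stated assumptions, not the exynos or DRM core implementation: all sketch_* names are hypothetical, event_space is modelled as a simple byte counter, and matching the event against the closing file follows the wording of the v2 changelog rather than the v3 diff. The point is that once the event pointer has been moved out of the state object, whoever holds it is the only party able to free it and hand the reserved event space back.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-file bookkeeping: bytes still available for queued events. */
struct sketch_file {
	size_t event_space;
};

/* Hypothetical event: remembers its owning file and how much space it reserved. */
struct sketch_event {
	struct sketch_file *file;
	size_t size;
};

/* Hypothetical stand-in for crtc_state: where the core parks the event. */
struct sketch_crtc_state {
	struct sketch_event *event;
};

/* Hypothetical stand-in for the driver CRTC: where the driver keeps the event. */
struct sketch_crtc {
	struct sketch_event *event;
	int pending_update;
};

/* "Core" side: reserve event space and create the event. NULL on failure. */
struct sketch_event *sketch_event_alloc(struct sketch_file *file, size_t size)
{
	struct sketch_event *e;

	assert(file != NULL);
	if (file->event_space < size)
		return NULL;
	e = malloc(sizeof(*e));
	if (!e)
		return NULL;
	file->event_space -= size;
	e->file = file;
	e->size = size;
	return e;
}

/* "Driver" side: move the event out of the state. The state forgets it. */
void sketch_steal_event(struct sketch_crtc *crtc, struct sketch_crtc_state *state)
{
	assert(crtc != NULL && state != NULL);
	crtc->event = state->event;
	state->event = NULL;
	if (crtc->event)
		crtc->pending_update++;
}

/*
 * Cancel the pending event if it belongs to the closing file, return its
 * reserved space and free it. Returns 1 if an event was released, else 0.
 */
int sketch_cancel_and_release(struct sketch_crtc *crtc,
			      const struct sketch_file *closing)
{
	struct sketch_event *e;

	assert(crtc != NULL);
	e = crtc->event;
	if (!e || e->file != closing)
		return 0;
	crtc->event = NULL;
	crtc->pending_update--;
	e->file->event_space += e->size;
	free(e);
	return 1;
}
```

If sketch_cancel_and_release() only cleared crtc->event without the free and the space accounting, nothing else in this model would still hold the pointer, which is the leak raised in the review.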

Patch

diff --git a/drivers/gpu/drm/exynos/exynos_drm_crtc.c b/drivers/gpu/drm/exynos/exynos_drm_crtc.c
index cd94981..8b503a0 100644
--- a/drivers/gpu/drm/exynos/exynos_drm_crtc.c
+++ b/drivers/gpu/drm/exynos/exynos_drm_crtc.c
@@ -226,3 +226,18 @@  void exynos_drm_crtc_te_handler(struct drm_crtc *crtc)
 	if (exynos_crtc->ops->te_handler)
 		exynos_crtc->ops->te_handler(exynos_crtc);
 }
+
+void exynos_drm_crtc_cancel_page_flip(struct drm_crtc *crtc)
+{
+	struct exynos_drm_crtc *exynos_crtc = to_exynos_crtc(crtc);
+	struct drm_pending_vblank_event *e;
+	unsigned long flags;
+
+	spin_lock_irqsave(&crtc->dev->event_lock, flags);
+	e = exynos_crtc->event;
+	if (e) {
+		exynos_crtc->event = NULL;
+		atomic_dec(&exynos_crtc->pending_update);
+	}
+	spin_unlock_irqrestore(&crtc->dev->event_lock, flags);
+}
diff --git a/drivers/gpu/drm/exynos/exynos_drm_crtc.h b/drivers/gpu/drm/exynos/exynos_drm_crtc.h
index 6a581a8..b4def6e 100644
--- a/drivers/gpu/drm/exynos/exynos_drm_crtc.h
+++ b/drivers/gpu/drm/exynos/exynos_drm_crtc.h
@@ -40,4 +40,7 @@  int exynos_drm_crtc_get_pipe_from_type(struct drm_device *drm_dev,
  */
 void exynos_drm_crtc_te_handler(struct drm_crtc *crtc);
 
+/* This function cancels a page flip request. */
+void exynos_drm_crtc_cancel_page_flip(struct drm_crtc *crtc);
+
 #endif
diff --git a/drivers/gpu/drm/exynos/exynos_drm_drv.c b/drivers/gpu/drm/exynos/exynos_drm_drv.c
index 9756797a..57c0e7d 100644
--- a/drivers/gpu/drm/exynos/exynos_drm_drv.c
+++ b/drivers/gpu/drm/exynos/exynos_drm_drv.c
@@ -330,7 +330,12 @@  err_file_priv_free:
 static void exynos_drm_preclose(struct drm_device *dev,
 					struct drm_file *file)
 {
+	struct drm_crtc *crtc;
+
 	exynos_drm_subdrv_close(dev, file);
+
+	list_for_each_entry(crtc, &dev->mode_config.crtc_list, head)
+		exynos_drm_crtc_cancel_page_flip(crtc);
 }
 
 static void exynos_drm_postclose(struct drm_device *dev, struct drm_file *file)