[4.2.0-rc1-00201-g59c3cb5] Regression: kernel NULL pointer dereference

Message ID 55A36FA7.7010707@linux.intel.com
State New

Commit Message

Maarten Lankhorst July 13, 2015, 7:58 a.m. UTC
On 13-07-15 at 09:42, Jörg Otte wrote:
> 2015-07-13 9:23 GMT+02:00 Maarten Lankhorst <maarten.lankhorst@linux.intel.com>:
>> On 13-07-15 at 08:22, Daniel Vetter wrote:
>>> On Sun, Jul 12, 2015 at 09:52:51AM -0700, Linus Torvalds wrote:
>>>> On Sun, Jul 12, 2015 at 1:03 AM, Jörg Otte <jrg.otte@gmail.com> wrote:
>>>>> BUG: unable to handle kernel NULL pointer dereference at 0000000000000009
>>>>> IP: [<ffffffffbd3447bb>] 0xffffffffbd3447bb
>>>> Ugh. Please enable KALLSYMS to get sane symbols.
>>>>
>>>> But yes, "crtc_state->base.active" is at offset 9 from "crtc_state",
>>>> so it's pretty clearly just that change from
>>>>
>>>> -       if (intel_crtc->active) {
>>>> +       if (crtc_state->base.active) {
>>>>
>>>> and "crtc_state" is NULL.
>>>>
>>>> And the code very much knows that crtc_state can be NULL, since it's
>>>> initialized with
>>>>
>>>>         crtc_state = state->base.state ?
>>>>                 intel_atomic_get_crtc_state(state->base.state,
>>>>                 intel_crtc) : NULL;
>>>>
>>>> Tssk. Daniel? Should I just revert that commit dec4f799d0a4
>>>> ("drm/i915: Use crtc_state->active in primary check_plane func") for
>>>> now, or is there a better fix? Like just checking crtc_state for NULL?
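
For anyone following the offset argument above: the faulting address 0x9 is simply NULL plus the byte offset of the field being read. A minimal sketch in C, assuming a made-up layout (the toy_* structs are hypothetical stand-ins, not the real drm/i915 definitions) in which one pointer and one bool precede the flag:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins, NOT the real drm/i915 layouts. The leading
 * pointer and the first flag are chosen so that 'active' lands at
 * byte 9 on a typical 64-bit ABI. */
struct toy_base_state {
	void *crtc;   /* bytes 0..7 with 8-byte pointers */
	bool enable;  /* byte 8 */
	bool active;  /* byte 9 */
};

struct toy_crtc_state {
	struct toy_base_state base; /* first member, so it sits at offset 0 */
};

/* Address the CPU would try to read for ptr->base.active, computed
 * arithmetically so that ptr itself is never dereferenced. */
static uintptr_t active_field_address(const struct toy_crtc_state *ptr)
{
	return (uintptr_t)ptr
		+ offsetof(struct toy_crtc_state, base)
		+ offsetof(struct toy_base_state, active);
}
```

With 8-byte pointers and a 1-byte bool, active_field_address(NULL) comes out as 9, which is the kind of small, non-zero fault address that points at a member access through a NULL struct pointer.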
>>> Indeed embarrassing. I've missed that we still have 1 caller left that's
>>> using the transitional helpers, and those don't fill out
>>> plane_state->state backpointers to the global atomic update since there is
>>> no global atomic update for transitional helpers. Below diff should fix
>>> this - we need to preferentially check crtc_state->active and if that's
>>> not set intel_crtc->active should yield the right result for the one
>>> remaining caller (it's in the crtc_disable paths).
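
One way to read the fallback described above, as a hedged sketch only: use the atomic state's flag when an atomic state exists, otherwise fall back to the legacy per-crtc flag. The toy_* types and the helper name are hypothetical and do not reproduce the diff Daniel refers to.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the atomic crtc state and the
 * legacy crtc object; the real structures carry far more than one flag. */
struct toy_crtc_state {
	bool active;
};

struct toy_crtc {
	bool active;
};

/* Prefer the atomic state when the caller supplied one. Callers on the
 * transitional-helper path pass NULL, in which case the legacy flag is
 * used instead of dereferencing a NULL pointer. */
static bool toy_crtc_is_active(const struct toy_crtc_state *crtc_state,
			       const struct toy_crtc *crtc)
{
	return crtc_state ? crtc_state->active : crtc->active;
}
```

The point of the ternary is that the NULL case is handled at the single place where the flag is read, so no caller has to know which path it is on.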
>>>
>>> For cheap excuses why i915 is so crap in 4.2: Thanks to a hipshot decision
>>> to transition to a different QA team ("we'll do this in 1 week without
>>> upfront planning") I essentially don't have proper QA support for 1-2
>>> months by now. The other trouble in this area specifically is that this
>>> code is already completely changed in -next again, so any testing done on
>>> integration trees (like -next or drm-intel-nightly) won't test any patches
>>> for 4.2.
>>> -Daniel
>>>
>>> Oh and Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> in case you
>>> decide to apply this right away.
>>>
>> Well your version has the benefit of compiling without errors. :-)
>>
>> Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
> Just noticed another problem:
> On each resume I get the following error:
> ------------[ cut here ]------------
> WARNING: CPU: 2 PID: 2663 at
> /data/kernel/linux/drivers/gpu/drm/i915/intel_display.c:6319
> 0xffffffff9a33d5e9()
> WARN_ON(!crtc->state->enable)
> CPU: 2 PID: 2663 Comm: kworker/u8:80 Not tainted 4.2.0-rc2 #15
> Hardware name: FUJITSU LIFEBOOK AH532/FJNBB1C, BIOS Version 1.09 05/22/2012
> Workqueue: events_unbound 0xffffffff9a055750
> 0000000000000000 ffffffff9a98ea28 ffffffff9a6d84d2 0000000000000000
> ffffffff9a03c416 ffff88020951c4e0 0000000000000000 0000000000000000
> ffff8802141cb800 ffff88021630c000 ffffffff9a03c4d5 ffffffff9a9c3664
> Call Trace:
> [<ffffffff9a6d84d2>] ? 0xffffffff9a6d84d2
> [<ffffffff9a03c416>] ? 0xffffffff9a03c416
> [<ffffffff9a03c4d5>] ? 0xffffffff9a03c4d5
> [<ffffffff9a33d5e9>] ? 0xffffffff9a33d5e9
> [<ffffffff9a343ac3>] ? 0xffffffff9a343ac3
> [<ffffffff9a34444a>] ? 0xffffffff9a34444a
> [<ffffffff9a345518>] ? 0xffffffff9a345518
> [<ffffffff9a3246f0>] ? 0xffffffff9a3246f0
> [<ffffffff9a2e1ce8>] ? 0xffffffff9a2e1ce8
> [<ffffffff9a236170>] ? 0xffffffff9a236170
> [<ffffffff9a38b28d>] ? 0xffffffff9a38b28d
> [<ffffffff9a38b784>] ? 0xffffffff9a38b784
> [<ffffffff9a38baa4>] ? 0xffffffff9a38baa4
> [<ffffffff9a05577d>] ? 0xffffffff9a05577d
> [<ffffffff9a04dc47>] ? 0xffffffff9a04dc47
> [<ffffffff9a04dfab>] ? 0xffffffff9a04dfab
> [<ffffffff9a04dea0>] ? 0xffffffff9a04dea0
> [<ffffffff9a05331c>] ? 0xffffffff9a05331c
> [<ffffffff9a053260>] ? 0xffffffff9a053260
> [<ffffffff9a6dfa0f>] ? 0xffffffff9a6dfa0f
> [<ffffffff9a053260>] ? 0xffffffff9a053260
> ---[ end trace 1b6d28ee34071679 ]---
>
> Nevertheless, resume works, so it doesn't hurt me.
>
>
> BTW: I also get up to 40..50 (!) compile warnings like:
> i915/i915_drv.h: In function 'i915_debugfs_connector_add':
> i915/i915_drv.h:3119:53: warning: no return statement in function
> returning non-void [-Wreturn-type]
>
> which may cause as-yet undiscovered trouble.
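
For context on that warning class: -Wreturn-type typically fires on a stub in a header that is declared to return a value but has an empty body. A minimal sketch, assuming that is the situation here; the toy_* names are hypothetical and this is not the actual i915_drv.h code.

```c
/* Hypothetical connector type, standing in for the real one. */
struct toy_connector {
	int id;
};

/* Stub for a build where the feature is compiled out. Writing the body
 * as an empty "{}" would trigger -Wreturn-type, and a caller that uses
 * the result would then hit undefined behaviour. Returning 0 keeps the
 * stub a harmless no-op that reports success. */
static inline int toy_debugfs_connector_add(struct toy_connector *connector)
{
	(void)connector; /* unused in the stubbed-out build */
	return 0;
}
```

A caller that checks the return value then sees success in the stubbed-out build instead of an indeterminate value.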
>
> Thanks, Jörg
kallsyms please!

Looks like intel_crtc_disable is being called for a mode change on an already disabled crtc; that code is gone in 4.3 because of the atomic rework.

Does something like below work?

Comments

Jörg Otte July 13, 2015, 8:50 a.m. UTC | #1
2015-07-13 9:58 GMT+02:00 Maarten Lankhorst <maarten.lankhorst@linux.intel.com>:
> On 13-07-15 at 09:42, Jörg Otte wrote:
>> 2015-07-13 9:23 GMT+02:00 Maarten Lankhorst <maarten.lankhorst@linux.intel.com>:
>>> On 13-07-15 at 08:22, Daniel Vetter wrote:
>>>> On Sun, Jul 12, 2015 at 09:52:51AM -0700, Linus Torvalds wrote:
>>>>> On Sun, Jul 12, 2015 at 1:03 AM, Jörg Otte <jrg.otte@gmail.com> wrote:
>>>>>> BUG: unable to handle kernel NULL pointer dereference at 0000000000000009
>>>>>> IP: [<ffffffffbd3447bb>] 0xffffffffbd3447bb
>>>>> Ugh. Please enable KALLSYMS to get sane symbols.
>>>>>
>>>>> But yes, "crtc_state->base.active" is at offset 9 from "crtc_state",
>>>>> so it's pretty clearly just that change from
>>>>>
>>>>> -       if (intel_crtc->active) {
>>>>> +       if (crtc_state->base.active) {
>>>>>
>>>>> and "crtc_state" is NULL.
>>>>>
>>>>> And the code very much knows that crtc_state can be NULL, since it's
>>>>> initialized with
>>>>>
>>>>>         crtc_state = state->base.state ?
>>>>>                 intel_atomic_get_crtc_state(state->base.state,
>>>>>                 intel_crtc) : NULL;
>>>>>
>>>>> Tssk. Daniel? Should I just revert that commit dec4f799d0a4
>>>>> ("drm/i915: Use crtc_state->active in primary check_plane func") for
>>>>> now, or is there a better fix? Like just checking crtc_state for NULL?
>>>> Indeed embarrassing. I've missed that we still have 1 caller left that's
>>>> using the transitional helpers, and those don't fill out
>>>> plane_state->state backpointers to the global atomic update since there is
>>>> no global atomic update for transitional helpers. Below diff should fix
>>>> this - we need to preferentially check crtc_state->active and if that's
>>>> not set intel_crtc->active should yield the right result for the one
>>>> remaining caller (it's in the crtc_disable paths).
>>>>
>>>> For cheap excuses why i915 is so crap in 4.2: Thanks to a hipshot decision
>>>> to transition to a different QA team ("we'll do this in 1 week without
>>>> upfront planning") I essentially don't have proper QA support for 1-2
>>>> months by now. The other trouble in this area specifically is that this
>>>> code is already completely changed in -next again, so any testing done on
>>>> integration trees (like -next or drm-intel-nightly) won't test any patches
>>>> for 4.2.
>>>> -Daniel
>>>>
>>>> Oh and Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> in case you
>>>> decide to apply this right away.
>>>>
>>> Well your version has the benefit of compiling without errors. :-)
>>>
>>> Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
>> Just noticed another problem:
>> On each resume I get the following error:
>> ------------[ cut here ]------------
>> WARNING: CPU: 2 PID: 2663 at
>> /data/kernel/linux/drivers/gpu/drm/i915/intel_display.c:6319
>> 0xffffffff9a33d5e9()
>> WARN_ON(!crtc->state->enable)
>> CPU: 2 PID: 2663 Comm: kworker/u8:80 Not tainted 4.2.0-rc2 #15
>> Hardware name: FUJITSU LIFEBOOK AH532/FJNBB1C, BIOS Version 1.09 05/22/2012
>> Workqueue: events_unbound 0xffffffff9a055750
>> 0000000000000000 ffffffff9a98ea28 ffffffff9a6d84d2 0000000000000000
>> ffffffff9a03c416 ffff88020951c4e0 0000000000000000 0000000000000000
>> ffff8802141cb800 ffff88021630c000 ffffffff9a03c4d5 ffffffff9a9c3664
>> Call Trace:
>> [<ffffffff9a6d84d2>] ? 0xffffffff9a6d84d2
>> [<ffffffff9a03c416>] ? 0xffffffff9a03c416
>> [<ffffffff9a03c4d5>] ? 0xffffffff9a03c4d5
>> [<ffffffff9a33d5e9>] ? 0xffffffff9a33d5e9
>> [<ffffffff9a343ac3>] ? 0xffffffff9a343ac3
>> [<ffffffff9a34444a>] ? 0xffffffff9a34444a
>> [<ffffffff9a345518>] ? 0xffffffff9a345518
>> [<ffffffff9a3246f0>] ? 0xffffffff9a3246f0
>> [<ffffffff9a2e1ce8>] ? 0xffffffff9a2e1ce8
>> [<ffffffff9a236170>] ? 0xffffffff9a236170
>> [<ffffffff9a38b28d>] ? 0xffffffff9a38b28d
>> [<ffffffff9a38b784>] ? 0xffffffff9a38b784
>> [<ffffffff9a38baa4>] ? 0xffffffff9a38baa4
>> [<ffffffff9a05577d>] ? 0xffffffff9a05577d
>> [<ffffffff9a04dc47>] ? 0xffffffff9a04dc47
>> [<ffffffff9a04dfab>] ? 0xffffffff9a04dfab
>> [<ffffffff9a04dea0>] ? 0xffffffff9a04dea0
>> [<ffffffff9a05331c>] ? 0xffffffff9a05331c
>> [<ffffffff9a053260>] ? 0xffffffff9a053260
>> [<ffffffff9a6dfa0f>] ? 0xffffffff9a6dfa0f
>> [<ffffffff9a053260>] ? 0xffffffff9a053260
>> ---[ end trace 1b6d28ee34071679 ]---
>>
>> Nevertheless, resume works, so it doesn't hurt me.
>>
>>
>> BTW: I also get up to 40..50 (!) compile warnings like:
>> i915/i915_drv.h: In function 'i915_debugfs_connector_add':
>> i915/i915_drv.h:3119:53: warning: no return statement in function
>> returning non-void [-Wreturn-type]
>>
>> which may cause as-yet undiscovered trouble.
>>
>> Thanks, Jörg
> kallsyms please!
>
> Looks like intel_crtc_disable is being called for a mode change on an already disabled crtc; that code is gone in 4.3 because of the atomic rework.
>
> Does something like below work?
>
> diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
> index ba9321998a41..725d2b727704 100644
> --- a/drivers/gpu/drm/i915/intel_display.c
> +++ b/drivers/gpu/drm/i915/intel_display.c
> @@ -6315,9 +6315,6 @@ static void intel_crtc_disable(struct drm_crtc *crtc)
>         struct drm_connector *connector;
>         struct drm_i915_private *dev_priv = dev->dev_private;
>
> -       /* crtc should still be enabled when we disable it. */
> -       WARN_ON(!crtc->state->enable);
> -
>         intel_crtc_disable_planes(crtc);
>         dev_priv->display.crtc_disable(crtc);
>         dev_priv->display.off(crtc);
> @@ -12591,7 +12588,8 @@ static int __intel_set_mode(struct drm_crtc *modeset_crtc,
>                         continue;
>
>                 if (!crtc_state->enable) {
> -                       intel_crtc_disable(crtc);
> +                       if (crtc->state->enable)
> +                               intel_crtc_disable(crtc);
>                 } else if (crtc->state->enable) {
>                         intel_crtc_disable_planes(crtc);
>                         dev_priv->display.crtc_disable(crtc);
>
The patch works for me.

Thanks, Jörg

Patch

diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index ba9321998a41..725d2b727704 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -6315,9 +6315,6 @@  static void intel_crtc_disable(struct drm_crtc *crtc)
 	struct drm_connector *connector;
 	struct drm_i915_private *dev_priv = dev->dev_private;
 
-	/* crtc should still be enabled when we disable it. */
-	WARN_ON(!crtc->state->enable);
-
 	intel_crtc_disable_planes(crtc);
 	dev_priv->display.crtc_disable(crtc);
 	dev_priv->display.off(crtc);
@@ -12591,7 +12588,8 @@  static int __intel_set_mode(struct drm_crtc *modeset_crtc,
 			continue;
 
 		if (!crtc_state->enable) {
-			intel_crtc_disable(crtc);
+			if (crtc->state->enable)
+				intel_crtc_disable(crtc);
 		} else if (crtc->state->enable) {
 			intel_crtc_disable_planes(crtc);
 			dev_priv->display.crtc_disable(crtc);
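
The effect of the second hunk can be modelled outside the kernel. This is a toy sketch under stated assumptions: the toy_* types and functions are hypothetical, and only the "requested state is disabled" branch of the modeset loop is modelled.

```c
#include <stdbool.h>

/* Hypothetical model of a crtc: its current enable flag plus a counter
 * recording how often the disable path actually ran. */
struct toy_crtc {
	bool enabled;
	int disable_calls;
};

/* Stand-in for the disable path; in this model it must only run on a
 * crtc that is currently enabled. */
static void toy_crtc_disable(struct toy_crtc *crtc)
{
	crtc->disable_calls++;
	crtc->enabled = false;
}

/* Models the guarded branch: when the new state asks for "disabled",
 * the disable path is skipped if the crtc is already off. */
static void toy_apply_enable(struct toy_crtc *crtc, bool want_enable)
{
	if (!want_enable) {
		if (crtc->enabled)
			toy_crtc_disable(crtc);
	}
}
```

Requesting "disabled" twice in a row runs the disable path only once in this model, which is why the WARN_ON in the first hunk is no longer needed once the caller carries the guard.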