[4.2.0-rc1-00201-g59c3cb5] Regression: kernel NULL pointer dereference

Message ID 20150713062222.GG3736@phenom.ffwll.local
State New

Commit Message

Daniel Vetter July 13, 2015, 6:22 a.m. UTC
On Sun, Jul 12, 2015 at 09:52:51AM -0700, Linus Torvalds wrote:
> On Sun, Jul 12, 2015 at 1:03 AM, Jörg Otte <jrg.otte@gmail.com> wrote:
> >
> > BUG: unable to handle kernel NULL pointer dereference at 0000000000000009
> > IP: [<ffffffffbd3447bb>] 0xffffffffbd3447bb
> 
> Ugh. Please enable KALLSYMS to get sane symbols.
> 
> But yes, "crtc_state->base.active" is at offset 9 from "crtc_state",
> so it's pretty clearly just that change from
> 
> -       if (intel_crtc->active) {
> +       if (crtc_state->base.active) {
> 
> and "crtc_state" is NULL.
> 
> And the code very much knows that crtc_state can be NULL, since it's
> initialized with
> 
>         crtc_state = state->base.state ?
>                 intel_atomic_get_crtc_state(state->base.state,
> intel_crtc) : NULL;
> 
> Tssk. Daniel? Should I just revert that commit dec4f799d0a4
> ("drm/i915: Use crtc_state->active in primary check_plane func") for
> now, or is there a better fix? Like just checking crtc_state for NULL?
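
To make the "offset 9" reasoning concrete, here is a minimal sketch of how a faulting address of 0x9 follows from a NULL base pointer. The struct names and layout are hypothetical stand-ins chosen so that "active" lands at offset 9; they are not the real drm/i915 definitions, and demo_active_address() is an illustrative helper only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; NOT the real drm/i915 definitions. */
struct demo_base_state {
	uint64_t crtc;   /* stands in for an 8-byte pointer, offset 0 */
	bool enable;     /* offset 8 */
	bool active;     /* offset 9 */
};

struct demo_crtc_state {
	struct demo_base_state base;  /* first member, so it starts at offset 0 */
	int other;
};

/* Compile-time check of the layout assumption this sketch relies on. */
static_assert(offsetof(struct demo_crtc_state, base.active) == 9,
	      "active is assumed to sit at offset 9");

/*
 * Address the CPU would touch when evaluating ptr->base.active:
 * the pointer value plus the member offset.
 */
uintptr_t demo_active_address(uintptr_t crtc_state_ptr)
{
	return crtc_state_ptr + offsetof(struct demo_crtc_state, base.active);
}
```

With a NULL pointer (value 0) the computed address is just the member offset, which is why an oops at a small non-zero address usually points at a member access through a NULL struct pointer.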

Indeed embarrassing. I missed that we still have one caller left that's
using the transitional helpers, and those don't fill out the
plane_state->state backpointer to the global atomic update, since there is
no global atomic update for transitional helpers. The diff below should fix
this: we need to preferentially check crtc_state->active, and if crtc_state
is NULL, intel_crtc->active should yield the right result for the one
remaining caller (it's in the crtc_disable paths).
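
The fallback described above can be sketched in isolation as follows. This is a simplified illustration under stated assumptions, not the driver code: the demo_* types and demo_primary_plane_crtc_active() are hypothetical, and the only thing carried over from the patch is the "use crtc_state if present, else the legacy flag" shape of the check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the atomic state and the legacy CRTC object. */
struct demo_crtc_state {
	bool active;   /* state computed by a global atomic update */
};

struct demo_crtc {
	bool active;   /* legacy flag, always available */
};

/*
 * Prefer the atomic state when a global atomic update supplied one.
 * Transitional-helper callers pass NULL, so fall back to the legacy flag.
 */
bool demo_primary_plane_crtc_active(const struct demo_crtc_state *crtc_state,
				    const struct demo_crtc *crtc)
{
	assert(crtc != NULL);   /* the legacy object must always exist */
	return crtc_state ? crtc_state->active : crtc->active;
}
```

The ternary keeps the atomic path authoritative whenever it exists, so the legacy flag only matters for the one remaining non-atomic caller.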

For cheap excuses why i915 is so crap in 4.2: Thanks to a hipshot decision
to transition to a different QA team ("we'll do this in 1 week without
upfront planning") I have essentially been without proper QA support for
1-2 months now. The other trouble in this area specifically is that this
code has already been completely changed in -next again, so any testing
done on integration trees (like -next or drm-intel-nightly) won't test any
patches for 4.2.
-Daniel

Oh and Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> in case you
decide to apply this right away.
---

Comments

Maarten Lankhorst July 13, 2015, 7:23 a.m. UTC | #1
On 13-07-15 at 08:22, Daniel Vetter wrote:
> Oh and Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> in case you
> decide to apply this right away.
>
Well your version has the benefit of compiling without errors. :-)

Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>

Jörg Otte July 13, 2015, 7:42 a.m. UTC | #2
2015-07-13 9:23 GMT+02:00 Maarten Lankhorst <maarten.lankhorst@linux.intel.com>:
> Well your version has the benefit of compiling without errors. :-)
>
> Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>

Just noticed another problem:
On each resume I get the following error:
------------[ cut here ]------------
WARNING: CPU: 2 PID: 2663 at /data/kernel/linux/drivers/gpu/drm/i915/intel_display.c:6319 0xffffffff9a33d5e9()
WARN_ON(!crtc->state->enable)
CPU: 2 PID: 2663 Comm: kworker/u8:80 Not tainted 4.2.0-rc2 #15
Hardware name: FUJITSU LIFEBOOK AH532/FJNBB1C, BIOS Version 1.09 05/22/2012
Workqueue: events_unbound 0xffffffff9a055750
0000000000000000 ffffffff9a98ea28 ffffffff9a6d84d2 0000000000000000
ffffffff9a03c416 ffff88020951c4e0 0000000000000000 0000000000000000
ffff8802141cb800 ffff88021630c000 ffffffff9a03c4d5 ffffffff9a9c3664
Call Trace:
[<ffffffff9a6d84d2>] ? 0xffffffff9a6d84d2
[<ffffffff9a03c416>] ? 0xffffffff9a03c416
[<ffffffff9a03c4d5>] ? 0xffffffff9a03c4d5
[<ffffffff9a33d5e9>] ? 0xffffffff9a33d5e9
[<ffffffff9a343ac3>] ? 0xffffffff9a343ac3
[<ffffffff9a34444a>] ? 0xffffffff9a34444a
[<ffffffff9a345518>] ? 0xffffffff9a345518
[<ffffffff9a3246f0>] ? 0xffffffff9a3246f0
[<ffffffff9a2e1ce8>] ? 0xffffffff9a2e1ce8
[<ffffffff9a236170>] ? 0xffffffff9a236170
[<ffffffff9a38b28d>] ? 0xffffffff9a38b28d
[<ffffffff9a38b784>] ? 0xffffffff9a38b784
[<ffffffff9a38baa4>] ? 0xffffffff9a38baa4
[<ffffffff9a05577d>] ? 0xffffffff9a05577d
[<ffffffff9a04dc47>] ? 0xffffffff9a04dc47
[<ffffffff9a04dfab>] ? 0xffffffff9a04dfab
[<ffffffff9a04dea0>] ? 0xffffffff9a04dea0
[<ffffffff9a05331c>] ? 0xffffffff9a05331c
[<ffffffff9a053260>] ? 0xffffffff9a053260
[<ffffffff9a6dfa0f>] ? 0xffffffff9a6dfa0f
[<ffffffff9a053260>] ? 0xffffffff9a053260
---[ end trace 1b6d28ee34071679 ]---

Nevertheless, resume works, so it doesn't hurt me.


BTW: I also get up to 40-50 (!) compile warnings like:
i915/i915_drv.h: In function 'i915_debugfs_connector_add':
i915/i915_drv.h:3119:53: warning: no return statement in function returning non-void [-Wreturn-type]

which may cause as-yet-undiscovered trouble.
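
For reference, a -Wreturn-type warning like the one above typically comes from a non-void function whose body falls off the end, for example an empty inline stub used when a feature is compiled out. The sketch below shows the usual shape of a warning-free stub. It is an assumption-laden illustration: DEMO_DEBUG_FS, struct demo_connector and the demo_* functions are hypothetical names, and it does not claim to reproduce the actual i915_drv.h code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a connector object. */
struct demo_connector {
	int id;
};

#ifdef DEMO_DEBUG_FS
/* Real implementation lives elsewhere when the feature is enabled. */
int demo_debugfs_connector_add(struct demo_connector *connector);
#else
/*
 * Feature compiled out: the stub must still return a value.
 * An empty body "{}" here is what triggers -Wreturn-type, and a
 * caller that uses the result would then read an undefined value.
 */
static inline int demo_debugfs_connector_add(struct demo_connector *connector)
{
	(void)connector;   /* unused in the stub */
	return 0;          /* report success */
}
#endif

/* Example caller that relies on the stub's return value. */
int demo_connector_register(struct demo_connector *connector)
{
	assert(connector != NULL);
	return demo_debugfs_connector_add(connector);
}
```

Because callers commonly check such return values, a stub without a return statement is more than cosmetic noise, which matches the concern raised above.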

Thanks, Jörg

Patch

diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index ba9321998a41..85ac6d85dc39 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -13276,7 +13276,7 @@  intel_check_primary_plane(struct drm_plane *plane,
 	if (ret)
 		return ret;
 
-	if (crtc_state->base.active) {
+	if (crtc_state ? crtc_state->base.active : intel_crtc->active) {
 		struct intel_plane_state *old_state =
 			to_intel_plane_state(plane->state);