Message ID | 20130217133814.GK5813@phenom.ffwll.local (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Sun, Feb 17, 2013 at 2:38 PM, Daniel Vetter <daniel.vetter@ffwll.ch> wrote: > 2. The new i915 force restore code is indeed missing a safety check > compared to the old crtc helper based one: > > diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c > index 6eb3882..095094c 100644 > --- a/drivers/gpu/drm/i915/intel_display.c > +++ b/drivers/gpu/drm/i915/intel_display.c > @@ -9153,7 +9153,13 @@ void intel_modeset_setup_hw_state(struct drm_device *dev, > > if (force_restore) { > for_each_pipe(pipe) { > - intel_crtc_restore_mode(dev_priv->pipe_to_crtc_mapping[pipe]); > + struct drm_crtc * crtc = > + dev_priv->pipe_to_crtc_mapping[pipe]; > + > + if (!crtc->enabled) > + continue; > + > + intel_crtc_restore_mode(crtc); > } > > i915_redisable_vga(dev); > > The issue is that we save a pointer to the fb (since those are refcounted) > but copy the mode into the crtc struct (since modes are not refcounted). > So on restore the mode will always be non-NULL, which is wrong if the crtc > is off (and so also has a NULL fb). > > The problem I have with that patch is figuring out how this ever worked. I > think I need more coffee ;-) Ok, coffee seems to be working now. I think the above diff shouldn't change anything, since we already have a crtc->enabled check in intel_modeset_affected_pipes in intel_display.c. Still would be good if you can prove this one way or the other. For those wondering why this check is buried this deeply: We're in the middle of a massive rework of our modeset code, moving from one-crtc-at-a-time to global modeset. We need that to implement some fancy features like fastboot or better handling of global resource constraints (shared clocks, bw limits, ...). In the new world we set up the desired state in staging pointers/data structures. Then the modeset code diffs that with the current state and computes the best way to do the transition. But since we're still converting code some pieces pass in the new state explicitly, but lower levels then ignore some pieces when not required to reach the desired state. The new lid restore code relies on that by updating the tracked hw state from the real hw one and restoring the last desired state (which is still around from the last modeset call). Cheers, Daniel
On Sun, Feb 17, 2013 at 5:31 PM, Hugh Dickins <hughd@google.com> wrote: > On Sun, 17 Feb 2013, Daniel Vetter wrote: >> On Sun, Feb 17, 2013 at 3:21 AM, Hugh Dickins <hughd@google.com> wrote: >> > On Sat, 16 Feb 2013, Hugh Dickins wrote: >> >> On Sat, 16 Feb 2013, Linus Torvalds wrote: >> >> > >> >> > I think it's worth it to give them a heads-up already. So I've cc'd >> >> > the main suspects here.. >> >> >> >> Okay, thanks. >> >> >> >> > >> >> > Daniel, Dave - any comments about a NULL fb in >> >> > intel_choose_pipe_bpp_dither() during either suspend or resume? Some >> >> > googling shows this: >> >> > >> >> > https://bugzilla.redhat.com/show_bug.cgi?id=895123 >> >> >> >> Great, yes, I'm sure that's the same (though it says "suspend" >> >> and I say "resume"). >> >> >> >> > >> >> > which sounds remarkably similar, and is also during a suspend attempt >> >> > (but apparently Satish got a full oops out).. Some timing race with a >> >> > worker entry? >> > >> > Comparing Satish's backtrace and drivers/gpu/drm history, it's clear that >> > the oops comes from Daniel's 3.8-rc2 45e2b5f640b3 "drm/i915: force restore >> > on lid open", whose force_restore case now passes down crtc->base.fb. But >> > I wouldn't have a clue why that's usually non-NULL but occasionally NULL: >> > your timing race with a worker entry, perhaps. >> > >> > And 45e2b5f640b3 contains a fine history of going back and forth, so I >> > wouldn't want to play further with it out of ignorance - though tempted >> > to replace the "if (force_restore) {" by an interim safe-seeming >> > compromise of "if (force_restore && crtc->base.fb) {". > > My suggestion there in the line above was stupidly wrong :( > >> >> Two things to try while I try to grow a clue on what exactly is going on: > > Thank you. > > By the way, I hope you've looked back through this thread, and realize > that Dave and I both had ThinkPad T4?0s suspend/resume display problems, > but they've gone off in different directions: so a lot of the discussion > comes from Dave having CONFIG_PROVE_RCU_DELAY, and has nothing to do with > what we now know to be my oops in i915/intel_display.c. Oh, I haven't read the earlier parts of the thread, but agree that it's a completely different bug: Different chipset (this matters massively for gpus usually), Dave's issue happens on -rc1 (which doesn't contain the offending lid_notifier patch yet) and seems to die someplace completely else than your box. >> 1. Related to new ACPI sleep states we've recently made the lid_notifier >> locking more sound, maybe that helps: >> >> http://cgit.freedesktop.org/~danvet/drm-intel/commit/?h=drm-intel-next-queued&id=b8efb17b3d687695b81485f606fc4e6c35a50f9a > > Looks like it may be relevant, but I'll ignore it for now: > preferring first to test the more minimal safety you suggest below. > >> >> 2. The new i915 force restore code is indeed missing a safety check >> compared to the old crtc helper based one: >> >> diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c >> index 6eb3882..095094c 100644 >> --- a/drivers/gpu/drm/i915/intel_display.c >> +++ b/drivers/gpu/drm/i915/intel_display.c >> @@ -9153,7 +9153,13 @@ void intel_modeset_setup_hw_state(struct drm_device *dev, >> >> if (force_restore) { >> for_each_pipe(pipe) { >> - intel_crtc_restore_mode(dev_priv->pipe_to_crtc_mapping[pipe]); >> + struct drm_crtc * crtc = >> + dev_priv->pipe_to_crtc_mapping[pipe]; >> + >> + if (!crtc->enabled) >> + continue; >> + >> + intel_crtc_restore_mode(crtc); >> } >> >> i915_redisable_vga(dev); > > I see your followup, where you observe that intel_modeset_affected_pipes() > should already have made this check; but you do say it would still be good > to prove one way or the other, so I'll run from now with the patch below. > > Note that we haven't got to worrying about what will be in 3.9 yet > (linux-next tells me to expect hangcheck timer problems): it's Linus's > current git for 3.8 final that we're working on at present. Right, patch was on top of -next, but there shouldn't be any (functional) differences in this area compared to 3.8. The first part of the big rework landed in 3.7 and contained the crtc->enabled check from day one. For the hangcheck issue in -next, that might be a new one. If you catch it again, can you please grab the i915_error_state from debugfs and throw it over to me? That should be enough for basic analysis. > And since quick resumes have always worked for me, it's only occasionally > that a long suspend (e.g. overnight) fails for me in this way, so I won't > be able to report success for several days - but failure may come sooner. > > And, it being very tiresome to debug when it does fail, I have inserted > WARN_ONs and more safety: here's what I'm running with now. Yep, that should be interesting once it catches something. For more debug noise can you set drm.debug=0xe (should work even when setting it in /sys/modules/drm/parameters at runtime). That spills tons of stuff into dmesg which will come handy in reconstructing how things fell apart. Presuming your machines survives a bad resume and you can grab dmesg, ofc. Thanks, Daniel > --- 3.8-rc7/drivers/gpu/drm/i915/intel_display.c 2013-01-17 20:06:11.384002362 -0800 > +++ linux/drivers/gpu/drm/i915/intel_display.c 2013-02-17 07:50:28.012075150 -0800 > @@ -4156,7 +4156,9 @@ static bool intel_choose_pipe_bpp_dither > * also stays within the max display bpc discovered above. > */ > > - switch (fb->depth) { > + if (WARN_ON(!fb)) > + bpc = 8; > + else switch (fb->depth) { > case 8: > bpc = 8; /* since we go through a colormap */ > break; > @@ -9302,6 +9304,10 @@ void intel_modeset_setup_hw_state(struct > if (force_restore) { > for_each_pipe(pipe) { > crtc = to_intel_crtc(dev_priv->pipe_to_crtc_mapping[pipe]); > + if (WARN_ON(!crtc->base.enabled)) > + continue; > + if (WARN_ON(!crtc->base.fb)) > + continue; > intel_set_mode(&crtc->base, &crtc->base.mode, > crtc->base.x, crtc->base.y, crtc->base.fb); > }
diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c index 6eb3882..095094c 100644 --- a/drivers/gpu/drm/i915/intel_display.c +++ b/drivers/gpu/drm/i915/intel_display.c @@ -9153,7 +9153,13 @@ void intel_modeset_setup_hw_state(struct drm_device *dev, if (force_restore) { for_each_pipe(pipe) { - intel_crtc_restore_mode(dev_priv->pipe_to_crtc_mapping[pipe]); + struct drm_crtc * crtc = + dev_priv->pipe_to_crtc_mapping[pipe]; + + if (!crtc->enabled) + continue; + + intel_crtc_restore_mode(crtc); } i915_redisable_vga(dev);