
drm/i915: Discard previous atomic state on resume if connectors change

Message ID 1462284220-14930-1-git-send-email-cpaul@redhat.com (mailing list archive)
State New, archived

Commit Message

cpaul@redhat.com May 3, 2016, 2:03 p.m. UTC
If an MST device is disconnected while the machine is suspended, the
number of connectors will change as well after we call
intel_dp_mst_resume(). This means that any previous atomic state we had
before suspending is no longer valid, since it'll still be pointing to
missing connectors. We need to check for this before committing the
state, otherwise we'll hit a kernel panic on resume whenever any MST
display was disconnected before we started resuming:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
IP: [<ffffffffa01588ef>] drm_atomic_helper_check_modeset+0x29f/0xb40 [drm_kms_helper]
Call Trace:
 [<ffffffffa02354f4>] intel_atomic_check+0x34/0x1180 [i915]
 [<ffffffff810e6c3f>] ? mark_held_locks+0x6f/0xa0
 [<ffffffff810e6d99>] ? trace_hardirqs_on_caller+0x129/0x1b0
 [<ffffffffa00ff1d2>] drm_atomic_check_only+0x192/0x620 [drm]
 [<ffffffff813ee001>] ? pci_pm_thaw+0x21/0x90
 [<ffffffffa00ff677>] drm_atomic_commit+0x17/0x60 [drm]
 [<ffffffffa023e0ad>] intel_display_resume+0xbd/0x160 [i915]
 [<ffffffff813ee070>] ? pci_pm_thaw+0x90/0x90
 [<ffffffffa01b60d8>] i915_drm_resume+0xd8/0x160 [i915]
 [<ffffffffa01b6185>] i915_pm_resume+0x25/0x30 [i915]
 [<ffffffff813ee0d4>] pci_pm_resume+0x64/0xa0
 [<ffffffff814d9ea0>] dpm_run_callback+0x90/0x190
 [<ffffffff814da455>] device_resume+0xd5/0x1f0
 [<ffffffff814da58d>] async_resume+0x1d/0x50
 [<ffffffff810b6718>] async_run_entry_fn+0x48/0x150
 [<ffffffff810acc19>] process_one_work+0x1e9/0x5c0
 [<ffffffff810acb96>] ? process_one_work+0x166/0x5c0
 [<ffffffff810ad038>] worker_thread+0x48/0x4e0
 [<ffffffff810acff0>] ? process_one_work+0x5c0/0x5c0
 [<ffffffff810b3794>] kthread+0xe4/0x100
 [<ffffffff81742672>] ret_from_fork+0x22/0x50
 [<ffffffff810b36b0>] ? kthread_create_on_node+0x200/0x200

Cc: stable@vger.kernel.org
Signed-off-by: Lyude <cpaul@redhat.com>
---
 drivers/gpu/drm/i915/intel_display.c | 11 +++++++++++
 1 file changed, 11 insertions(+)
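
The check described in the commit message can be modelled in user space. The following is a minimal sketch, not kernel code: the `model_*` types and functions are hypothetical stand-ins for the saved atomic state and the device's mode config, and `free()` plays the role of `drm_atomic_state_free()`. The NULL guard on the saved state is an addition of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the atomic state saved at suspend time. */
struct model_state {
	int num_connector;	/* connector count captured at suspend */
};

/* Hypothetical stand-in for the device's current mode config. */
struct model_device {
	int num_connector;	/* connector count after MST resume */
};

/* Allocate a saved state that remembers how many connectors existed. */
static struct model_state *model_state_alloc(int num_connector)
{
	struct model_state *state;

	assert(num_connector >= 0);
	state = malloc(sizeof(*state));
	if (state)
		state->num_connector = num_connector;
	return state;
}

/*
 * Return the state that is safe to restore, or NULL if the saved state
 * had to be discarded because the connector count changed while the
 * machine was suspended.
 */
static struct model_state *
model_validate_restore_state(struct model_state *state,
			     const struct model_device *dev)
{
	if (state && state->num_connector != dev->num_connector) {
		free(state);	/* models drm_atomic_state_free() */
		return NULL;
	}
	return state;
}
```

A caller would skip the restore commit when the function returns NULL, which is the behaviour the patch aims for when `state` is set to NULL.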

Comments

Daniel Vetter May 3, 2016, 2:29 p.m. UTC | #1
On Tue, May 03, 2016 at 10:03:40AM -0400, Lyude wrote:
> If an MST device is disconnected while the machine is suspended, the
> number of connectors will change as well after we call
> intel_dp_mst_resume(). This means that any previous atomic state we had
> before suspending is no longer valid, since it'll still be pointing to
> missing connectors. We need to check for this before committing the
> state, otherwise we'll hit a kernel panic on resume whenever any MST
> display was disconnected before we started resuming:
> 
> BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
> IP: [<ffffffffa01588ef>] drm_atomic_helper_check_modeset+0x29f/0xb40 [drm_kms_helper]
> Call Trace:
>  [<ffffffffa02354f4>] intel_atomic_check+0x34/0x1180 [i915]
>  [<ffffffff810e6c3f>] ? mark_held_locks+0x6f/0xa0
>  [<ffffffff810e6d99>] ? trace_hardirqs_on_caller+0x129/0x1b0
>  [<ffffffffa00ff1d2>] drm_atomic_check_only+0x192/0x620 [drm]
>  [<ffffffff813ee001>] ? pci_pm_thaw+0x21/0x90
>  [<ffffffffa00ff677>] drm_atomic_commit+0x17/0x60 [drm]
>  [<ffffffffa023e0ad>] intel_display_resume+0xbd/0x160 [i915]
>  [<ffffffff813ee070>] ? pci_pm_thaw+0x90/0x90
>  [<ffffffffa01b60d8>] i915_drm_resume+0xd8/0x160 [i915]
>  [<ffffffffa01b6185>] i915_pm_resume+0x25/0x30 [i915]
>  [<ffffffff813ee0d4>] pci_pm_resume+0x64/0xa0
>  [<ffffffff814d9ea0>] dpm_run_callback+0x90/0x190
>  [<ffffffff814da455>] device_resume+0xd5/0x1f0
>  [<ffffffff814da58d>] async_resume+0x1d/0x50
>  [<ffffffff810b6718>] async_run_entry_fn+0x48/0x150
>  [<ffffffff810acc19>] process_one_work+0x1e9/0x5c0
>  [<ffffffff810acb96>] ? process_one_work+0x166/0x5c0
>  [<ffffffff810ad038>] worker_thread+0x48/0x4e0
>  [<ffffffff810acff0>] ? process_one_work+0x5c0/0x5c0
>  [<ffffffff810b3794>] kthread+0xe4/0x100
>  [<ffffffff81742672>] ret_from_fork+0x22/0x50
>  [<ffffffff810b36b0>] ? kthread_create_on_node+0x200/0x200
> 
> Cc: stable@vger.kernel.org
> Signed-off-by: Lyude <cpaul@redhat.com>

This should be addressed by the connector refcounting fixes Dave Airlie
has for 4.7 (not all merged yet though). Can you please retest with those?
-Daniel

> ---
>  drivers/gpu/drm/i915/intel_display.c | 11 +++++++++++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
> index 6e0d828..252c06c 100644
> --- a/drivers/gpu/drm/i915/intel_display.c
> +++ b/drivers/gpu/drm/i915/intel_display.c
> @@ -15945,6 +15945,17 @@ void intel_display_resume(struct drm_device *dev)
>  	dev_priv->modeset_restore_state = NULL;
>  
>  	/*
> +	 * With MST, the number of connectors can change between suspend and
> +	 * resume, which means that the state we want to restore might now be
> +	 * impossible to use since it'll be pointing to non-existent
> +	 * connectors.
> +	 */
> +	if (state->num_connector != dev->mode_config.num_connector) {
> +		drm_atomic_state_free(state);
> +		state = NULL;
> +	}
> +
> +	/*
>  	 * This is a cludge because with real atomic modeset mode_config.mutex
>  	 * won't be taken. Unfortunately some probed state like
>  	 * audio_codec_enable is still protected by mode_config.mutex, so lock
> -- 
> 2.5.5
> 
cpaul@redhat.com May 3, 2016, 3:08 p.m. UTC | #2
Yeah, airlied said the same thing. This patch is intended only for 4.6, since
the refcounting patch isn't very likely to land there.
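
For comparison, the general idea behind connector refcounting can be illustrated with a toy user-space model. This is a sketch under stated assumptions, not the 4.7 patches discussed in this thread: every `toy_*` name is hypothetical. It only shows why a saved state that holds its own reference cannot end up pointing at a freed connector.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical refcounted connector object. */
struct toy_connector {
	int refcount;
	int id;
};

/* Number of connectors currently allocated; used to observe frees. */
static int toy_live_connectors;

static struct toy_connector *toy_connector_create(int id)
{
	struct toy_connector *c = malloc(sizeof(*c));

	assert(c);
	c->refcount = 1;	/* the creator (the "driver") holds one reference */
	c->id = id;
	toy_live_connectors++;
	return c;
}

static void toy_connector_get(struct toy_connector *c)
{
	c->refcount++;
}

/* Drop one reference; free the connector when the last one goes away. */
static void toy_connector_put(struct toy_connector *c)
{
	assert(c->refcount > 0);
	if (--c->refcount == 0) {
		toy_live_connectors--;
		free(c);
	}
}

/* Hypothetical saved state that pins the connector it refers to. */
struct toy_state {
	struct toy_connector *connector;
};

static void toy_state_save(struct toy_state *s, struct toy_connector *c)
{
	toy_connector_get(c);
	s->connector = c;
}

static void toy_state_release(struct toy_state *s)
{
	if (s->connector) {
		toy_connector_put(s->connector);
		s->connector = NULL;
	}
}
```

In this model, the driver dropping its reference on unplug leaves the connector alive until the saved state is released, so the state never holds a dangling pointer.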


Patch

diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index 6e0d828..252c06c 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -15945,6 +15945,17 @@  void intel_display_resume(struct drm_device *dev)
 	dev_priv->modeset_restore_state = NULL;
 
 	/*
+	 * With MST, the number of connectors can change between suspend and
+	 * resume, which means that the state we want to restore might now be
+	 * impossible to use since it'll be pointing to non-existent
+	 * connectors.
+	 */
+	if (state->num_connector != dev->mode_config.num_connector) {
+		drm_atomic_state_free(state);
+		state = NULL;
+	}
+
+	/*
 	 * This is a cludge because with real atomic modeset mode_config.mutex
 	 * won't be taken. Unfortunately some probed state like
 	 * audio_codec_enable is still protected by mode_config.mutex, so lock