[v2] drm/i915: Discard previous atomic state on resume if connectors change

Message ID 1462570796-22500-1-git-send-email-cpaul@redhat.com (mailing list archive)
State New, archived

Commit Message

cpaul@redhat.com May 6, 2016, 9:39 p.m. UTC
If an MST device is disconnected while the machine is suspended, the
number of connectors will change as well after we call
intel_dp_mst_resume(). This means that any previous atomic state we had
before suspending is no longer valid, since it'll still be pointing to
missing connectors. We need to check for this before committing the
state, otherwise we'll kernel panic on resume if any MST display was
disconnected before we started resuming:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
IP: [<ffffffffa01588ef>] drm_atomic_helper_check_modeset+0x29f/0xb40 [drm_kms_helper]
Call Trace:
 [<ffffffffa02354f4>] intel_atomic_check+0x34/0x1180 [i915]
 [<ffffffff810e6c3f>] ? mark_held_locks+0x6f/0xa0
 [<ffffffff810e6d99>] ? trace_hardirqs_on_caller+0x129/0x1b0
 [<ffffffffa00ff1d2>] drm_atomic_check_only+0x192/0x620 [drm]
 [<ffffffff813ee001>] ? pci_pm_thaw+0x21/0x90
 [<ffffffffa00ff677>] drm_atomic_commit+0x17/0x60 [drm]
 [<ffffffffa023e0ad>] intel_display_resume+0xbd/0x160 [i915]
 [<ffffffff813ee070>] ? pci_pm_thaw+0x90/0x90
 [<ffffffffa01b60d8>] i915_drm_resume+0xd8/0x160 [i915]
 [<ffffffffa01b6185>] i915_pm_resume+0x25/0x30 [i915]
 [<ffffffff813ee0d4>] pci_pm_resume+0x64/0xa0
 [<ffffffff814d9ea0>] dpm_run_callback+0x90/0x190
 [<ffffffff814da455>] device_resume+0xd5/0x1f0
 [<ffffffff814da58d>] async_resume+0x1d/0x50
 [<ffffffff810b6718>] async_run_entry_fn+0x48/0x150
 [<ffffffff810acc19>] process_one_work+0x1e9/0x5c0
 [<ffffffff810acb96>] ? process_one_work+0x166/0x5c0
 [<ffffffff810ad038>] worker_thread+0x48/0x4e0
 [<ffffffff810acff0>] ? process_one_work+0x5c0/0x5c0
 [<ffffffff810b3794>] kthread+0xe4/0x100
 [<ffffffff81742672>] ret_from_fork+0x22/0x50
 [<ffffffff810b36b0>] ? kthread_create_on_node+0x200/0x200

Changes since v1:
  - Move drm_atomic_state_free() call down so we're holding the
    appropriate locks when destroying the atomic state

Cc: stable@vger.kernel.org
Signed-off-by: Lyude <cpaul@redhat.com>
---
 drivers/gpu/drm/i915/intel_display.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

Comments

Daniel Vetter May 9, 2016, 6:42 a.m. UTC | #1
On Fri, May 06, 2016 at 05:39:56PM -0400, Lyude wrote:
> If an MST device is disconnected while the machine is suspended, the
> number of connectors will change as well after we call
> intel_dp_mst_resume(). This means that any previous atomic state we had
> before suspending is no longer valid, since it'll still be pointing to
> missing connectors. We need to check for this before committing the
> state, otherwise we'll kernel panic on resume if any MST display was
> disconnected before we started resuming:
> 
> BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
> IP: [<ffffffffa01588ef>] drm_atomic_helper_check_modeset+0x29f/0xb40 [drm_kms_helper]
> Call Trace:
>  [<ffffffffa02354f4>] intel_atomic_check+0x34/0x1180 [i915]
>  [<ffffffff810e6c3f>] ? mark_held_locks+0x6f/0xa0
>  [<ffffffff810e6d99>] ? trace_hardirqs_on_caller+0x129/0x1b0
>  [<ffffffffa00ff1d2>] drm_atomic_check_only+0x192/0x620 [drm]
>  [<ffffffff813ee001>] ? pci_pm_thaw+0x21/0x90
>  [<ffffffffa00ff677>] drm_atomic_commit+0x17/0x60 [drm]
>  [<ffffffffa023e0ad>] intel_display_resume+0xbd/0x160 [i915]
>  [<ffffffff813ee070>] ? pci_pm_thaw+0x90/0x90
>  [<ffffffffa01b60d8>] i915_drm_resume+0xd8/0x160 [i915]
>  [<ffffffffa01b6185>] i915_pm_resume+0x25/0x30 [i915]
>  [<ffffffff813ee0d4>] pci_pm_resume+0x64/0xa0
>  [<ffffffff814d9ea0>] dpm_run_callback+0x90/0x190
>  [<ffffffff814da455>] device_resume+0xd5/0x1f0
>  [<ffffffff814da58d>] async_resume+0x1d/0x50
>  [<ffffffff810b6718>] async_run_entry_fn+0x48/0x150
>  [<ffffffff810acc19>] process_one_work+0x1e9/0x5c0
>  [<ffffffff810acb96>] ? process_one_work+0x166/0x5c0
>  [<ffffffff810ad038>] worker_thread+0x48/0x4e0
>  [<ffffffff810acff0>] ? process_one_work+0x5c0/0x5c0
>  [<ffffffff810b3794>] kthread+0xe4/0x100
>  [<ffffffff81742672>] ret_from_fork+0x22/0x50
>  [<ffffffff810b36b0>] ? kthread_create_on_node+0x200/0x200
> 
> Changes since v1:
>   - Move drm_atomic_state_free() call down so we're holding the
>     appropriate locks when destroying the atomic state
> 
> Cc: stable@vger.kernel.org
> Signed-off-by: Lyude <cpaul@redhat.com>

Is this still an issue on -nightly with the connector refcounting fixed?
-Daniel

cpaul@redhat.com May 9, 2016, 2:07 p.m. UTC | #2
Yep, Dave's patches fix the issue on their own so this is only going to be
needed for 4.6.

Daniel Vetter May 9, 2016, 2:53 p.m. UTC | #3
On Mon, May 09, 2016 at 10:07:08AM -0400, Lyude Paul wrote:
> Yep, Dave's patches fix the issue on their own so this is only going to be
> needed for 4.6.

Ok, so not the first one that only needs to be applied to stable kernels.
For that to work you need to explain that this is the most minimal fix for
stable kernels, and add a reference to the final commit in upstream that
fixes the issue for real. Otherwise Greg KH won't take it. It still needs
my ack.

Probably best to have an entire dp mst series for all of those, but maybe
ping me first on irc so I can check them quickly before you hit send.
-Daniel

cpaul@redhat.com May 9, 2016, 3:31 p.m. UTC | #4
no problem, I'll let you know when they're ready

Patch

diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index 0f29ef6..d68efc7 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -15934,6 +15934,18 @@  void intel_display_resume(struct drm_device *dev)
 retry:
 	ret = drm_modeset_lock_all_ctx(dev, &ctx);
 
+	/*
+	 * With MST, the number of connectors can change between suspend and
+	 * resume, which means that the state we want to restore might now be
+	 * impossible to use since it'll be pointing to non-existent
+	 * connectors.
+	 */
+	if (ret == 0 &&
+	    state->num_connector != dev->mode_config.num_connector) {
+		drm_atomic_state_free(state);
+		state = NULL;
+	}
+
 	if (ret == 0 && !setup) {
 		setup = true;
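
The check added by this patch can be modelled outside the kernel. The following is a minimal userspace sketch, not the driver code: `struct model_state`, `struct model_device` and `model_validate_saved_state()` are hypothetical stand-ins for the DRM structures and for the inline check in intel_display_resume(), and `free()` stands in for drm_atomic_state_free(). The NULL guard on `state` is an addition of this sketch and is not part of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the saved atomic state: only the field the check reads. */
struct model_state {
	int num_connector;	/* connector count captured at suspend time */
};

/* Hypothetical stand-in for dev->mode_config. */
struct model_device {
	int num_connector;	/* connector count after MST resume */
};

/*
 * Return the state that should be restored, or NULL if it was discarded.
 *
 * As in the patch, the counts are only compared once locking succeeded
 * (lock_ret == 0). If they differ, the saved state refers to connectors
 * that may no longer exist, so it is freed and dropped rather than
 * committed.
 */
static struct model_state *
model_validate_saved_state(struct model_state *state,
			   const struct model_device *dev, int lock_ret)
{
	assert(dev != NULL);

	/* The "state != NULL" guard is an assumption of this sketch. */
	if (lock_ret == 0 && state != NULL &&
	    state->num_connector != dev->num_connector) {
		free(state);	/* stands in for drm_atomic_state_free() */
		return NULL;
	}

	return state;
}
```

In this model a caller treats a NULL return as "nothing to restore", which is how the patch signals a discarded state by setting `state = NULL`.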