diff mbox series

[RESEND,v2,1/2] drm/dp/mst: Reprobe EDID for MST ports on resume

Message ID 20181024021925.27026-2-juston.li@intel.com (mailing list archive)
State New, archived
Headers show
Series Check MST topology change on resume | expand

Commit Message

Juston Li Oct. 24, 2018, 2:19 a.m. UTC
From: Lyude <cpaul@redhat.com>

As observed with the latest ThinkPad docks, we unfortunately can't rely
on docks keeping us updated with hotplug events that happened while we
were suspended. On top of that, even if the number of connectors remains
the same between suspend and resume it's still not safe to assume that
there were no hotplugs, since a different monitor might have been
plugged into a port another monitor previously occupied. As such, we
need to go through all of the MST ports and check whether or not their
EDIDs have changed.

In addition to that, we also now return -EINVAL from
drm_dp_mst_topology_mgr_resume to indicate to callers that they need to
reset the MST connection, and that they can't rely on any other method
of reprobing.

Cc: stable@vger.kernel.org
Signed-off-by: Lyude <cpaul@redhat.com>
Signed-off-by: Juston Li <juston.li@intel.com>
---
 drivers/gpu/drm/drm_dp_mst_topology.c | 94 ++++++++++++++++++++++++++-
 1 file changed, 93 insertions(+), 1 deletion(-)

Comments

Lyude Paul Oct. 24, 2018, 10:09 p.m. UTC | #1
Thought this was going to be an easy review until I realized that there's
multiple problems in nouveau this would cause issues with, even if we didn't
pay attention to the -EINVAL that gets returned. The suspend/resume order in
nouveau needs some fixing up to prevent this patch from causing timeouts, and
then also there's a hidden surprise

[  972.294437] ============================================
[  972.295225] WARNING: possible recursive locking detected
[  972.296042] 4.19.0-rc8mst-reprobe+ #2 Not tainted
[  972.296842] --------------------------------------------
[  972.297614] zsh/6840 is trying to acquire lock:
[  972.298441] 0000000072e521f7 (&dev->mode_config.mutex){+.+.}, at:
drm_dp_mst_edids_changed+0xc0/0x180 [drm_kms_helper]
[  972.299299] 
               but task is already holding lock:
[  972.300991] 0000000072e521f7 (&dev->mode_config.mutex){+.+.}, at:
status_store+0x34/0x180 [drm]
[  972.301875] 
               other info that might help us debug this:
[  972.303563]  Possible unsafe locking scenario:

[  972.305315]        CPU0
[  972.306182]        ----
[  972.307061]   lock(&dev->mode_config.mutex);
[  972.307943]   lock(&dev->mode_config.mutex);
[  972.308819] 
                *** DEADLOCK ***

[  972.311446]  May be due to missing lock nesting notation

[  972.313234] 6 locks held by zsh/6840:
[  972.314135]  #0: 0000000092b5ba9d (sb_writers#4){.+.+}, at:
vfs_write+0x13e/0x1a0
[  972.315049]  #1: 00000000e628e6e9 (&of->mutex){+.+.}, at:
kernfs_fop_write+0xbd/0x1a0
[  972.315980]  #2: 000000000fb65e6c (kn->count#240){.+.+}, at:
kernfs_fop_write+0xc5/0x1a0
[  972.316928]  #3: 0000000072e521f7 (&dev->mode_config.mutex){+.+.}, at:
status_store+0x34/0x180 [drm]
[  972.317892]  #4: 00000000344224c2 (crtc_ww_class_acquire){+.+.}, at:
drm_helper_probe_single_connector_modes+0x66/0x6c0 [drm_kms_helper]
[  972.318863]  #5: 00000000d65352e2 (crtc_ww_class_mutex){+.+.}, at:
drm_modeset_lock+0x44/0x110 [drm]
[  972.319860] 
               stack backtrace:
[  972.321800] CPU: 5 PID: 6840 Comm: zsh Kdump: loaded Not tainted 4.19.0-
rc8mst-reprobe+ #2
[  972.322772] Hardware name: LENOVO 20EQZ1J7US/20EQZ1J7US, BIOS N1EET80W
(1.53 ) 09/14/2018
[  972.323792] Call Trace:
[  972.324821]  dump_stack+0x85/0xc0
[  972.325832]  __lock_acquire.cold.59+0x158/0x227
[  972.326887]  ? retint_kernel+0x10/0x10
[  972.327918]  lock_acquire+0x9e/0x170
[  972.328948]  ? drm_dp_mst_edids_changed+0xc0/0x180 [drm_kms_helper]
[  972.329986]  ? drm_dp_mst_edids_changed+0xc0/0x180 [drm_kms_helper]
[  972.331066]  __mutex_lock+0x68/0x900
[  972.332094]  ? drm_dp_mst_edids_changed+0xc0/0x180 [drm_kms_helper]
[  972.333146]  ? __mutex_unlock_slowpath+0x4b/0x2b0
[  972.334221]  ? drm_dp_mst_edids_changed+0xc0/0x180 [drm_kms_helper]
[  972.335255]  drm_dp_mst_edids_changed+0xc0/0x180 [drm_kms_helper]
[  972.336318]  drm_dp_mst_topology_mgr_resume+0x104/0x190 [drm_kms_helper]
[  972.337397]  nv50_display_init+0x73/0xd0 [nouveau]
[  972.338483]  nouveau_display_init+0x8e/0xe0 [nouveau]
[  972.339547]  nouveau_display_resume+0x39/0x250 [nouveau]
[  972.340634]  ? pci_restore_standard_config+0x40/0x40
[  972.341762]  nouveau_do_resume+0x79/0x150 [nouveau]
[  972.342850]  nouveau_pmops_runtime_resume+0x88/0x150 [nouveau]
[  972.343915]  pci_pm_runtime_resume+0x78/0xb0
[  972.345004]  __rpm_callback+0x75/0x1b0
[  972.346084]  ? pci_restore_standard_config+0x40/0x40
[  972.347148]  rpm_callback+0x1f/0x70
[  972.348256]  ? pci_restore_standard_config+0x40/0x40
[  972.349351]  rpm_resume+0x5d4/0x810
[  972.350446]  ? drm_modeset_lock+0x44/0x110 [drm]
[  972.351573]  __pm_runtime_resume+0x47/0x70
[  972.352737]  nouveau_connector_detect+0x373/0x470 [nouveau]
[  972.353841]  ? _cond_resched+0x15/0x30
[  972.354945]  ? ww_mutex_lock+0x30/0x60
[  972.356044]  ? drm_modeset_lock+0x44/0x110 [drm]
[  972.357146]  drm_helper_probe_single_connector_modes+0xd9/0x6c0
[drm_kms_helper]
[  972.358274]  ? printk+0x58/0x6f
[  972.359400]  status_store+0xb2/0x180 [drm]
[  972.360519]  kernfs_fop_write+0xf0/0x1a0
[  972.361711]  __vfs_write+0x36/0x180
[  972.362842]  ? rcu_read_lock_sched_held+0x55/0x60
[  972.363974]  ? rcu_sync_lockdep_assert+0x2e/0x60
[  972.365103]  ? __sb_start_write+0x115/0x170
[  972.366241]  ? vfs_write+0x13e/0x1a0
[  972.367365]  vfs_write+0xa5/0x1a0
[  972.368486]  ksys_write+0x52/0xc0
[  972.369596]  do_syscall_64+0x60/0x1a0
[  972.370782]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[  972.371890] RIP: 0033:0x7fc93e169ef4
[  972.373012] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00
00 66 90 48 8d 05 f1 3a 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d
00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[  972.374214] RSP: 002b:00007ffcd7ad1918 EFLAGS: 00000246 ORIG_RAX:
0000000000000001
[  972.375421] RAX: ffffffffffffffda RBX: 0000000000000007 RCX:
00007fc93e169ef4
[  972.376628] RDX: 0000000000000007 RSI: 000055aefe5c30c0 RDI:
0000000000000001
[  972.377872] RBP: 000055aefe5c30c0 R08: 00007fc93e43a8c0 R09:
00007fc93f6cdb80
[  972.379082] R10: 000000000000000a R11: 0000000000000246 R12:
00007fc93e439760
[  972.380311] R13: 0000000000000007 R14: 00007fc93e434760 R15:
0000000000000007

I must not have noticed that back when I wrote these patches! Whoops.

Since there's going to be quite a number of changes I need to make to this I'm
just going to make the changes myself! I'll make sure to Cc you with the
respin

On Tue, 2018-10-23 at 19:19 -0700, Juston Li wrote:
> From: Lyude <cpaul@redhat.com>
> 
> As observed with the latest ThinkPad docks, we unfortunately can't rely
> on docks keeping us updated with hotplug events that happened while we
> were suspended. On top of that, even if the number of connectors remains
> the same between suspend and resume it's still not safe to assume that
> there were no hotplugs, since a different monitor might have been
> plugged into a port another monitor previously occupied. As such, we
> need to go through all of the MST ports and check whether or not their
> EDIDs have changed.
> 
> In addition to that, we also now return -EINVAL from
> drm_dp_mst_topology_mgr_resume to indicate to callers that they need to
> reset the MST connection, and that they can't rely on any other method
> of reprobing.
> 
> Cc: stable@vger.kernel.org
> Signed-off-by: Lyude <cpaul@redhat.com>
> Signed-off-by: Juston Li <juston.li@intel.com>
> ---
>  drivers/gpu/drm/drm_dp_mst_topology.c | 94 ++++++++++++++++++++++++++-
>  1 file changed, 93 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c
> b/drivers/gpu/drm/drm_dp_mst_topology.c
> index 5ff1d79b86c4..88abebe52021 100644
> --- a/drivers/gpu/drm/drm_dp_mst_topology.c
> +++ b/drivers/gpu/drm/drm_dp_mst_topology.c
> @@ -29,6 +29,7 @@
>  #include <linux/i2c.h>
>  #include <drm/drm_dp_mst_helper.h>
>  #include <drm/drmP.h>
> +#include <drm/drm_edid.h>
>  
>  #include <drm/drm_fixed.h>
>  #include <drm/drm_atomic.h>
> @@ -2201,6 +2202,64 @@ void drm_dp_mst_topology_mgr_suspend(struct
> drm_dp_mst_topology_mgr *mgr)
>  }
>  EXPORT_SYMBOL(drm_dp_mst_topology_mgr_suspend);
>  
> +static bool drm_dp_mst_edids_changed(struct drm_dp_mst_topology_mgr *mgr,
> +				     struct drm_dp_mst_port *port)
> +{
> +	struct drm_device *dev;
> +	struct drm_connector *connector;
> +	struct drm_dp_mst_port *dport;
> +	struct drm_dp_mst_branch *mstb;
> +	struct edid *current_edid, *cached_edid;
> +	bool ret = false;
> +
> +	port = drm_dp_get_validated_port_ref(mgr, port);
> +	if (!port)
> +		return false;
> +
> +	mstb = drm_dp_get_validated_mstb_ref(mgr, port->mstb);
> +	if (mstb) {
> +		list_for_each_entry(dport, &port->mstb->ports, next) {
> +			ret = drm_dp_mst_edids_changed(mgr, dport);
> +			if (ret)
> +				break;
> +		}
> +
> +		drm_dp_put_mst_branch_device(mstb);
> +		if (ret)
> +			goto out;
> +	}
> +
> +	connector = port->connector;
> +	if (!connector || !port->aux.ddc.algo)
> +		goto out;
> +
> +	dev = connector->dev;
> +	mutex_lock(&dev->mode_config.mutex);
> +
> +	current_edid = drm_get_edid(connector, &port->aux.ddc);
> +	if (connector->edid_blob_ptr)
> +		cached_edid = (void *)connector->edid_blob_ptr->data;
> +	else
> +		return false;
> +
> +	if ((current_edid && cached_edid && memcmp(current_edid, cached_edid,
> +						   sizeof(struct edid)) != 0)
> ||
> +	    (!current_edid && cached_edid) || (current_edid && !cached_edid))
> {
> +		ret = true;
> +		DRM_DEBUG_KMS("EDID on %s changed, reprobing connectors\n",
> +			      connector->name);
> +	}
> +
> +	mutex_unlock(&dev->mode_config.mutex);
> +
> +	kfree(current_edid);
> +
> +out:
> +	drm_dp_put_port(port);
> +
> +	return ret;
> +}
> +
>  /**
>   * drm_dp_mst_topology_mgr_resume() - resume the MST manager
>   * @mgr: manager to resume
> @@ -2210,9 +2269,15 @@ EXPORT_SYMBOL(drm_dp_mst_topology_mgr_suspend);
>   *
>   * if the device fails this returns -1, and the driver should do
>   * a full MST reprobe, in case we were undocked.
> + *
> + * if the device can no longer be trusted, this returns -EINVAL
> + * and the driver should unconditionally disconnect and reconnect
> + * the dock.
>   */
>  int drm_dp_mst_topology_mgr_resume(struct drm_dp_mst_topology_mgr *mgr)
>  {
> +	struct drm_dp_mst_branch *mstb;
> +	struct drm_dp_mst_port *port;
>  	int ret = 0;
>  
>  	mutex_lock(&mgr->lock);
> @@ -2246,8 +2311,35 @@ int drm_dp_mst_topology_mgr_resume(struct
> drm_dp_mst_topology_mgr *mgr)
>  		drm_dp_check_mstb_guid(mgr->mst_primary, guid);
>  
>  		ret = 0;
> -	} else
> +
> +		/*
> +		 * Some hubs also forget to notify us of hotplugs that
> happened
> +		 * while we were in suspend, so we need to verify that the
> edid
> +		 * hasn't changed for any of the connectors. If it has been,
> +		 * we unfortunately can't rely on the dock updating us with
> +		 * hotplug events, so indicate we need a full reconnect.
> +		 */
> +
> +		/* MST's I2C helpers can't be used while holding this lock */
> +		mutex_unlock(&mgr->lock);
> +
> +		mstb = drm_dp_get_validated_mstb_ref(mgr, mgr->mst_primary);
> +		if (mstb) {
> +			list_for_each_entry(port, &mstb->ports, next) {
> +				if (drm_dp_mst_edids_changed(mgr, port)) {
> +					ret = -EINVAL;
> +					break;
> +				}
> +			}
> +
> +			drm_dp_put_mst_branch_device(mstb);
> +		}
> +	} else {
>  		ret = -1;
> +		mutex_unlock(&mgr->lock);
> +	}
> +
> +	return ret;
>  
>  out_unlock:
>  	mutex_unlock(&mgr->lock);
Juston Li Oct. 24, 2018, 10:45 p.m. UTC | #2
On Wed, 2018-10-24 at 18:09 -0400, Lyude Paul wrote:
> Since there's going to be quite a number of changes I need to make to
> this I'm
> just going to make the changes myself! I'll make sure to Cc you with
> the
> respin

Sounds good, thanks for picking it up!

Juston
Lyude Paul Oct. 24, 2018, 11:02 p.m. UTC | #3
Gah, the more I think about this the more I realize this was never the correct
approach to begin with. I wrote this patch a long time ago when I wasn't
nearly as experienced, so that's not terribly surprising.

So: the thing is this isn't actually a problem that's specific to MST. Pretty
much all of the different connector types have this issue, since any GPU is
going to miss hotplugs if the entire system is in S3. So it doesn't make much
sense to have this be in MST helpers; we should just reprobe all of the
connectors on the system.

Honestly: I think it might just be a better idea to just send a sysfs hotplug
when we wake up from suspend/resume, so that userspace just rechecks all of
the connectors anyway. Nouveau already does this, jfyi.

On Wed, 2018-10-24 at 18:09 -0400, Lyude Paul wrote:
> Thought this was going to be an easy review until I realized that there's
> multiple problems in nouveau this would cause issues with, even if we didn't
> pay attention to the -EINVAL that gets returned. The suspend/resume order in
> nouveau needs some fixing up to prevent this patch from causing timeouts,
> and
> then also there's a hidden surprise
> 
> [  972.294437] ============================================
> [  972.295225] WARNING: possible recursive locking detected
> [  972.296042] 4.19.0-rc8mst-reprobe+ #2 Not tainted
> [  972.296842] --------------------------------------------
> [  972.297614] zsh/6840 is trying to acquire lock:
> [  972.298441] 0000000072e521f7 (&dev->mode_config.mutex){+.+.}, at:
> drm_dp_mst_edids_changed+0xc0/0x180 [drm_kms_helper]
> [  972.299299] 
>                but task is already holding lock:
> [  972.300991] 0000000072e521f7 (&dev->mode_config.mutex){+.+.}, at:
> status_store+0x34/0x180 [drm]
> [  972.301875] 
>                other info that might help us debug this:
> [  972.303563]  Possible unsafe locking scenario:
> 
> [  972.305315]        CPU0
> [  972.306182]        ----
> [  972.307061]   lock(&dev->mode_config.mutex);
> [  972.307943]   lock(&dev->mode_config.mutex);
> [  972.308819] 
>                 *** DEADLOCK ***
> 
> [  972.311446]  May be due to missing lock nesting notation
> 
> [  972.313234] 6 locks held by zsh/6840:
> [  972.314135]  #0: 0000000092b5ba9d (sb_writers#4){.+.+}, at:
> vfs_write+0x13e/0x1a0
> [  972.315049]  #1: 00000000e628e6e9 (&of->mutex){+.+.}, at:
> kernfs_fop_write+0xbd/0x1a0
> [  972.315980]  #2: 000000000fb65e6c (kn->count#240){.+.+}, at:
> kernfs_fop_write+0xc5/0x1a0
> [  972.316928]  #3: 0000000072e521f7 (&dev->mode_config.mutex){+.+.}, at:
> status_store+0x34/0x180 [drm]
> [  972.317892]  #4: 00000000344224c2 (crtc_ww_class_acquire){+.+.}, at:
> drm_helper_probe_single_connector_modes+0x66/0x6c0 [drm_kms_helper]
> [  972.318863]  #5: 00000000d65352e2 (crtc_ww_class_mutex){+.+.}, at:
> drm_modeset_lock+0x44/0x110 [drm]
> [  972.319860] 
>                stack backtrace:
> [  972.321800] CPU: 5 PID: 6840 Comm: zsh Kdump: loaded Not tainted 4.19.0-
> rc8mst-reprobe+ #2
> [  972.322772] Hardware name: LENOVO 20EQZ1J7US/20EQZ1J7US, BIOS N1EET80W
> (1.53 ) 09/14/2018
> [  972.323792] Call Trace:
> [  972.324821]  dump_stack+0x85/0xc0
> [  972.325832]  __lock_acquire.cold.59+0x158/0x227
> [  972.326887]  ? retint_kernel+0x10/0x10
> [  972.327918]  lock_acquire+0x9e/0x170
> [  972.328948]  ? drm_dp_mst_edids_changed+0xc0/0x180 [drm_kms_helper]
> [  972.329986]  ? drm_dp_mst_edids_changed+0xc0/0x180 [drm_kms_helper]
> [  972.331066]  __mutex_lock+0x68/0x900
> [  972.332094]  ? drm_dp_mst_edids_changed+0xc0/0x180 [drm_kms_helper]
> [  972.333146]  ? __mutex_unlock_slowpath+0x4b/0x2b0
> [  972.334221]  ? drm_dp_mst_edids_changed+0xc0/0x180 [drm_kms_helper]
> [  972.335255]  drm_dp_mst_edids_changed+0xc0/0x180 [drm_kms_helper]
> [  972.336318]  drm_dp_mst_topology_mgr_resume+0x104/0x190 [drm_kms_helper]
> [  972.337397]  nv50_display_init+0x73/0xd0 [nouveau]
> [  972.338483]  nouveau_display_init+0x8e/0xe0 [nouveau]
> [  972.339547]  nouveau_display_resume+0x39/0x250 [nouveau]
> [  972.340634]  ? pci_restore_standard_config+0x40/0x40
> [  972.341762]  nouveau_do_resume+0x79/0x150 [nouveau]
> [  972.342850]  nouveau_pmops_runtime_resume+0x88/0x150 [nouveau]
> [  972.343915]  pci_pm_runtime_resume+0x78/0xb0
> [  972.345004]  __rpm_callback+0x75/0x1b0
> [  972.346084]  ? pci_restore_standard_config+0x40/0x40
> [  972.347148]  rpm_callback+0x1f/0x70
> [  972.348256]  ? pci_restore_standard_config+0x40/0x40
> [  972.349351]  rpm_resume+0x5d4/0x810
> [  972.350446]  ? drm_modeset_lock+0x44/0x110 [drm]
> [  972.351573]  __pm_runtime_resume+0x47/0x70
> [  972.352737]  nouveau_connector_detect+0x373/0x470 [nouveau]
> [  972.353841]  ? _cond_resched+0x15/0x30
> [  972.354945]  ? ww_mutex_lock+0x30/0x60
> [  972.356044]  ? drm_modeset_lock+0x44/0x110 [drm]
> [  972.357146]  drm_helper_probe_single_connector_modes+0xd9/0x6c0
> [drm_kms_helper]
> [  972.358274]  ? printk+0x58/0x6f
> [  972.359400]  status_store+0xb2/0x180 [drm]
> [  972.360519]  kernfs_fop_write+0xf0/0x1a0
> [  972.361711]  __vfs_write+0x36/0x180
> [  972.362842]  ? rcu_read_lock_sched_held+0x55/0x60
> [  972.363974]  ? rcu_sync_lockdep_assert+0x2e/0x60
> [  972.365103]  ? __sb_start_write+0x115/0x170
> [  972.366241]  ? vfs_write+0x13e/0x1a0
> [  972.367365]  vfs_write+0xa5/0x1a0
> [  972.368486]  ksys_write+0x52/0xc0
> [  972.369596]  do_syscall_64+0x60/0x1a0
> [  972.370782]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> [  972.371890] RIP: 0033:0x7fc93e169ef4
> [  972.373012] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00
> 00
> 00 66 90 48 8d 05 f1 3a 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d
> 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
> [  972.374214] RSP: 002b:00007ffcd7ad1918 EFLAGS: 00000246 ORIG_RAX:
> 0000000000000001
> [  972.375421] RAX: ffffffffffffffda RBX: 0000000000000007 RCX:
> 00007fc93e169ef4
> [  972.376628] RDX: 0000000000000007 RSI: 000055aefe5c30c0 RDI:
> 0000000000000001
> [  972.377872] RBP: 000055aefe5c30c0 R08: 00007fc93e43a8c0 R09:
> 00007fc93f6cdb80
> [  972.379082] R10: 000000000000000a R11: 0000000000000246 R12:
> 00007fc93e439760
> [  972.380311] R13: 0000000000000007 R14: 00007fc93e434760 R15:
> 0000000000000007
> 
> I must not have noticed that back when I wrote these patches! Whoops.
> 
> Since there's going to be quite a number of changes I need to make to this
> I'm
> just going to make the changes myself! I'll make sure to Cc you with the
> respin
> 
> On Tue, 2018-10-23 at 19:19 -0700, Juston Li wrote:
> > From: Lyude <cpaul@redhat.com>
> > 
> > As observed with the latest ThinkPad docks, we unfortunately can't rely
> > on docks keeping us updated with hotplug events that happened while we
> > were suspended. On top of that, even if the number of connectors remains
> > the same between suspend and resume it's still not safe to assume that
> > there were no hotplugs, since a different monitor might have been
> > plugged into a port another monitor previously occupied. As such, we
> > need to go through all of the MST ports and check whether or not their
> > EDIDs have changed.
> > 
> > In addition to that, we also now return -EINVAL from
> > drm_dp_mst_topology_mgr_resume to indicate to callers that they need to
> > reset the MST connection, and that they can't rely on any other method
> > of reprobing.
> > 
> > Cc: stable@vger.kernel.org
> > Signed-off-by: Lyude <cpaul@redhat.com>
> > Signed-off-by: Juston Li <juston.li@intel.com>
> > ---
> >  drivers/gpu/drm/drm_dp_mst_topology.c | 94 ++++++++++++++++++++++++++-
> >  1 file changed, 93 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c
> > b/drivers/gpu/drm/drm_dp_mst_topology.c
> > index 5ff1d79b86c4..88abebe52021 100644
> > --- a/drivers/gpu/drm/drm_dp_mst_topology.c
> > +++ b/drivers/gpu/drm/drm_dp_mst_topology.c
> > @@ -29,6 +29,7 @@
> >  #include <linux/i2c.h>
> >  #include <drm/drm_dp_mst_helper.h>
> >  #include <drm/drmP.h>
> > +#include <drm/drm_edid.h>
> >  
> >  #include <drm/drm_fixed.h>
> >  #include <drm/drm_atomic.h>
> > @@ -2201,6 +2202,64 @@ void drm_dp_mst_topology_mgr_suspend(struct
> > drm_dp_mst_topology_mgr *mgr)
> >  }
> >  EXPORT_SYMBOL(drm_dp_mst_topology_mgr_suspend);
> >  
> > +static bool drm_dp_mst_edids_changed(struct drm_dp_mst_topology_mgr *mgr,
> > +				     struct drm_dp_mst_port *port)
> > +{
> > +	struct drm_device *dev;
> > +	struct drm_connector *connector;
> > +	struct drm_dp_mst_port *dport;
> > +	struct drm_dp_mst_branch *mstb;
> > +	struct edid *current_edid, *cached_edid;
> > +	bool ret = false;
> > +
> > +	port = drm_dp_get_validated_port_ref(mgr, port);
> > +	if (!port)
> > +		return false;
> > +
> > +	mstb = drm_dp_get_validated_mstb_ref(mgr, port->mstb);
> > +	if (mstb) {
> > +		list_for_each_entry(dport, &port->mstb->ports, next) {
> > +			ret = drm_dp_mst_edids_changed(mgr, dport);
> > +			if (ret)
> > +				break;
> > +		}
> > +
> > +		drm_dp_put_mst_branch_device(mstb);
> > +		if (ret)
> > +			goto out;
> > +	}
> > +
> > +	connector = port->connector;
> > +	if (!connector || !port->aux.ddc.algo)
> > +		goto out;
> > +
> > +	dev = connector->dev;
> > +	mutex_lock(&dev->mode_config.mutex);
> > +
> > +	current_edid = drm_get_edid(connector, &port->aux.ddc);
> > +	if (connector->edid_blob_ptr)
> > +		cached_edid = (void *)connector->edid_blob_ptr->data;
> > +	else
> > +		return false;
> > +
> > +	if ((current_edid && cached_edid && memcmp(current_edid, cached_edid,
> > +						   sizeof(struct edid)) != 0)
> > > > 
> > 
> > +	    (!current_edid && cached_edid) || (current_edid && !cached_edid))
> > {
> > +		ret = true;
> > +		DRM_DEBUG_KMS("EDID on %s changed, reprobing connectors\n",
> > +			      connector->name);
> > +	}
> > +
> > +	mutex_unlock(&dev->mode_config.mutex);
> > +
> > +	kfree(current_edid);
> > +
> > +out:
> > +	drm_dp_put_port(port);
> > +
> > +	return ret;
> > +}
> > +
> >  /**
> >   * drm_dp_mst_topology_mgr_resume() - resume the MST manager
> >   * @mgr: manager to resume
> > @@ -2210,9 +2269,15 @@ EXPORT_SYMBOL(drm_dp_mst_topology_mgr_suspend);
> >   *
> >   * if the device fails this returns -1, and the driver should do
> >   * a full MST reprobe, in case we were undocked.
> > + *
> > + * if the device can no longer be trusted, this returns -EINVAL
> > + * and the driver should unconditionally disconnect and reconnect
> > + * the dock.
> >   */
> >  int drm_dp_mst_topology_mgr_resume(struct drm_dp_mst_topology_mgr *mgr)
> >  {
> > +	struct drm_dp_mst_branch *mstb;
> > +	struct drm_dp_mst_port *port;
> >  	int ret = 0;
> >  
> >  	mutex_lock(&mgr->lock);
> > @@ -2246,8 +2311,35 @@ int drm_dp_mst_topology_mgr_resume(struct
> > drm_dp_mst_topology_mgr *mgr)
> >  		drm_dp_check_mstb_guid(mgr->mst_primary, guid);
> >  
> >  		ret = 0;
> > -	} else
> > +
> > +		/*
> > +		 * Some hubs also forget to notify us of hotplugs that
> > happened
> > +		 * while we were in suspend, so we need to verify that the
> > edid
> > +		 * hasn't changed for any of the connectors. If it has been,
> > +		 * we unfortunately can't rely on the dock updating us with
> > +		 * hotplug events, so indicate we need a full reconnect.
> > +		 */
> > +
> > +		/* MST's I2C helpers can't be used while holding this lock */
> > +		mutex_unlock(&mgr->lock);
> > +
> > +		mstb = drm_dp_get_validated_mstb_ref(mgr, mgr->mst_primary);
> > +		if (mstb) {
> > +			list_for_each_entry(port, &mstb->ports, next) {
> > +				if (drm_dp_mst_edids_changed(mgr, port)) {
> > +					ret = -EINVAL;
> > +					break;
> > +				}
> > +			}
> > +
> > +			drm_dp_put_mst_branch_device(mstb);
> > +		}
> > +	} else {
> >  		ret = -1;
> > +		mutex_unlock(&mgr->lock);
> > +	}
> > +
> > +	return ret;
> >  
> >  out_unlock:
> >  	mutex_unlock(&mgr->lock);
Lyude Paul Nov. 9, 2018, 12:43 a.m. UTC | #4
Are you still looking into this at all? Even if it's not the right approach
I'm still interested in seeing this get fixed as well/discussing what I said
before

On Wed, 2018-10-24 at 22:45 +0000, Li, Juston wrote:
> On Wed, 2018-10-24 at 18:09 -0400, Lyude Paul wrote:
> > Since there's going to be quite a number of changes I need to make to
> > this I'm
> > just going to make the changes myself! I'll make sure to Cc you with
> > the
> > respin
> 
> Sounds good, thanks for picking it up!
> 
> Juston
>
Juston Li Nov. 26, 2018, 9:53 p.m. UTC | #5
Sorry for the late reply, email got caught in a bad filter =/

I haven't been looking into it lately but we are still interested in
getting this fixed so I can start looking into this again.

In your last email you mentioned a sysfs hotplug could be the way to
go?

Thanks
Juston

On Thu, 2018-11-08 at 19:43 -0500, Lyude Paul wrote:
> Are you still looking into this at all? Even if it's not the right
> approach
> I'm still interested in seeing this get fixed as well/discussing what
> I said
> before
> 
> On Wed, 2018-10-24 at 22:45 +0000, Li, Juston wrote:
> > On Wed, 2018-10-24 at 18:09 -0400, Lyude Paul wrote:
> > > Since there's going to be quite a number of changes I need to
> > > make to
> > > this I'm
> > > just going to make the changes myself! I'll make sure to Cc you
> > > with
> > > the
> > > respin
> > 
> > Sounds good, thanks for picking it up!
> > 
> > Juston
> >
Lyude Paul Nov. 26, 2018, 10:35 p.m. UTC | #6
On Mon, 2018-11-26 at 21:53 +0000, Li, Juston wrote:
> Sorry for the late reply, email got caught in a bad filter =/
> 
> I haven't been looking into it lately but we are still interested in
> getting this fixed so I can start looking into this again.
> 
> In your last email you mentioned a sysfs hotplug could be the way to
> go?
I had thought so; but I did a little more investigation and it looks like I
wasn't totally wrong when I first wrote this patch; we actually do need to
destroy the previous topology when it's not actually there and prevent the
driver from trying to set it up again. Something that apparently a simple
sysfs hotplug won't do.

I think what we might be able to do is instead of comparing against the
connector's EDID, we should try repurposing drm_dp_mst_port->cached_edid so
that it holds a copy of each mst connector's EDID instead of just logical
MSTBs, then compare against those EDIDs instead of the ones we assigned to the
drm_connector to avoid the lockdep issues that would occur from trying to grab
dev->mode_config.mutex from the suspend/resume codepaths, and decide from
where whether or not to tell the driver to cut off the topology.

Feel free to poke me about this on irc btw, might be quicker to discuss it
there
> 
> Thanks
> Juston
> 
> On Thu, 2018-11-08 at 19:43 -0500, Lyude Paul wrote:
> > Are you still looking into this at all? Even if it's not the right
> > approach
> > I'm still interested in seeing this get fixed as well/discussing what
> > I said
> > before
> > 
> > On Wed, 2018-10-24 at 22:45 +0000, Li, Juston wrote:
> > > On Wed, 2018-10-24 at 18:09 -0400, Lyude Paul wrote:
> > > > Since there's going to be quite a number of changes I need to
> > > > make to
> > > > this I'm
> > > > just going to make the changes myself! I'll make sure to Cc you
> > > > with
> > > > the
> > > > respin
> > > 
> > > Sounds good, thanks for picking it up!
> > > 
> > > Juston
> > >
Sasha Levin June 14, 2019, 9:56 p.m. UTC | #7
Hi,

[This is an automated email]

This commit has been processed because it contains a -stable tag.
The stable tag indicates that it's relevant for the following trees: all

The bot has tested the following trees: v5.1.9, v4.19.50, v4.14.125, v4.9.181, v4.4.181.

v5.1.9: Build failed! Errors:
    drivers/gpu/drm/drm_dp_mst_topology.c:2672:9: error: implicit declaration of function ‘drm_dp_get_validated_port_ref’; did you mean ‘drm_mode_validate_driver’? [-Werror=implicit-function-declaration]
    drivers/gpu/drm/drm_dp_mst_topology.c:2676:9: error: implicit declaration of function ‘drm_dp_get_validated_mstb_ref’; did you mean ‘drm_mode_validate_size’? [-Werror=implicit-function-declaration]
    drivers/gpu/drm/drm_dp_mst_topology.c:2684:3: error: implicit declaration of function ‘drm_dp_put_mst_branch_device’; did you mean ‘drm_dp_get_mst_branch_device’? [-Werror=implicit-function-declaration]
    drivers/gpu/drm/drm_dp_mst_topology.c:2715:2: error: implicit declaration of function ‘drm_dp_put_port’; did you mean ‘drm_dp_get_port’? [-Werror=implicit-function-declaration]

v4.19.50: Build OK!
v4.14.125: Build OK!
v4.9.181: Build OK!
v4.4.181: Build OK!

How should we proceed with this patch?

--
Thanks,
Sasha
Lyude Paul June 14, 2019, 10:10 p.m. UTC | #8
uh, Sasha not sure if you're a bot or not but this patch isn't even upstream

On Fri, 2019-06-14 at 21:56 +0000, Sasha Levin wrote:
> Hi,
> 
> [This is an automated email]
> 
> This commit has been processed because it contains a -stable tag.
> The stable tag indicates that it's relevant for the following trees: all
> 
> The bot has tested the following trees: v5.1.9, v4.19.50, v4.14.125,
> v4.9.181, v4.4.181.
> 
> v5.1.9: Build failed! Errors:
>     drivers/gpu/drm/drm_dp_mst_topology.c:2672:9: error: implicit
> declaration of function ‘drm_dp_get_validated_port_ref’; did you mean
> ‘drm_mode_validate_driver’? [-Werror=implicit-function-declaration]
>     drivers/gpu/drm/drm_dp_mst_topology.c:2676:9: error: implicit
> declaration of function ‘drm_dp_get_validated_mstb_ref’; did you mean
> ‘drm_mode_validate_size’? [-Werror=implicit-function-declaration]
>     drivers/gpu/drm/drm_dp_mst_topology.c:2684:3: error: implicit
> declaration of function ‘drm_dp_put_mst_branch_device’; did you mean
> ‘drm_dp_get_mst_branch_device’? [-Werror=implicit-function-declaration]
>     drivers/gpu/drm/drm_dp_mst_topology.c:2715:2: error: implicit
> declaration of function ‘drm_dp_put_port’; did you mean ‘drm_dp_get_port’?
> [-Werror=implicit-function-declaration]
> 
> v4.19.50: Build OK!
> v4.14.125: Build OK!
> v4.9.181: Build OK!
> v4.4.181: Build OK!
> 
> How should we proceed with this patch?
> 
> --
> Thanks,
> Sasha
Sasha Levin June 15, 2019, 10:38 p.m. UTC | #9
On Fri, Jun 14, 2019 at 06:10:20PM -0400, Lyude Paul wrote:
>uh, Sasha not sure if you're a bot or not but this patch isn't even upstream

This is indeed a bot, and that's indeed it's purpose :)

We tend to get less responses if we wait for weeks for a patch to get
upstream and queued for -stable because most often folks have moved on
and forgotten all about it.

This bot looks for commits tagged for stable, and runs basic sanity
checks on them while they're still being discussed on the mailing list.
This way we get more responses with instructions on how to deal with the
patch.

Either way, it won't be actually queued for stable before it's upstream.

--
Thanks,
Sasha
diff mbox series

Patch

diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c b/drivers/gpu/drm/drm_dp_mst_topology.c
index 5ff1d79b86c4..88abebe52021 100644
--- a/drivers/gpu/drm/drm_dp_mst_topology.c
+++ b/drivers/gpu/drm/drm_dp_mst_topology.c
@@ -29,6 +29,7 @@ 
 #include <linux/i2c.h>
 #include <drm/drm_dp_mst_helper.h>
 #include <drm/drmP.h>
+#include <drm/drm_edid.h>
 
 #include <drm/drm_fixed.h>
 #include <drm/drm_atomic.h>
@@ -2201,6 +2202,64 @@  void drm_dp_mst_topology_mgr_suspend(struct drm_dp_mst_topology_mgr *mgr)
 }
 EXPORT_SYMBOL(drm_dp_mst_topology_mgr_suspend);
 
+static bool drm_dp_mst_edids_changed(struct drm_dp_mst_topology_mgr *mgr,
+				     struct drm_dp_mst_port *port)
+{
+	struct drm_device *dev;
+	struct drm_connector *connector;
+	struct drm_dp_mst_port *dport;
+	struct drm_dp_mst_branch *mstb;
+	struct edid *current_edid, *cached_edid;
+	bool ret = false;
+
+	port = drm_dp_get_validated_port_ref(mgr, port);
+	if (!port)
+		return false;
+
+	mstb = drm_dp_get_validated_mstb_ref(mgr, port->mstb);
+	if (mstb) {
+		list_for_each_entry(dport, &port->mstb->ports, next) {
+			ret = drm_dp_mst_edids_changed(mgr, dport);
+			if (ret)
+				break;
+		}
+
+		drm_dp_put_mst_branch_device(mstb);
+		if (ret)
+			goto out;
+	}
+
+	connector = port->connector;
+	if (!connector || !port->aux.ddc.algo)
+		goto out;
+
+	dev = connector->dev;
+	mutex_lock(&dev->mode_config.mutex);
+
+	current_edid = drm_get_edid(connector, &port->aux.ddc);
+	if (connector->edid_blob_ptr)
+		cached_edid = (void *)connector->edid_blob_ptr->data;
+	else
+		return false;
+
+	if ((current_edid && cached_edid && memcmp(current_edid, cached_edid,
+						   sizeof(struct edid)) != 0) ||
+	    (!current_edid && cached_edid) || (current_edid && !cached_edid)) {
+		ret = true;
+		DRM_DEBUG_KMS("EDID on %s changed, reprobing connectors\n",
+			      connector->name);
+	}
+
+	mutex_unlock(&dev->mode_config.mutex);
+
+	kfree(current_edid);
+
+out:
+	drm_dp_put_port(port);
+
+	return ret;
+}
+
 /**
  * drm_dp_mst_topology_mgr_resume() - resume the MST manager
  * @mgr: manager to resume
@@ -2210,9 +2269,15 @@  EXPORT_SYMBOL(drm_dp_mst_topology_mgr_suspend);
  *
  * if the device fails this returns -1, and the driver should do
  * a full MST reprobe, in case we were undocked.
+ *
+ * if the device can no longer be trusted, this returns -EINVAL
+ * and the driver should unconditionally disconnect and reconnect
+ * the dock.
  */
 int drm_dp_mst_topology_mgr_resume(struct drm_dp_mst_topology_mgr *mgr)
 {
+	struct drm_dp_mst_branch *mstb;
+	struct drm_dp_mst_port *port;
 	int ret = 0;
 
 	mutex_lock(&mgr->lock);
@@ -2246,8 +2311,35 @@  int drm_dp_mst_topology_mgr_resume(struct drm_dp_mst_topology_mgr *mgr)
 		drm_dp_check_mstb_guid(mgr->mst_primary, guid);
 
 		ret = 0;
-	} else
+
+		/*
+		 * Some hubs also forget to notify us of hotplugs that happened
+		 * while we were in suspend, so we need to verify that the edid
+		 * hasn't changed for any of the connectors. If it has been,
+		 * we unfortunately can't rely on the dock updating us with
+		 * hotplug events, so indicate we need a full reconnect.
+		 */
+
+		/* MST's I2C helpers can't be used while holding this lock */
+		mutex_unlock(&mgr->lock);
+
+		mstb = drm_dp_get_validated_mstb_ref(mgr, mgr->mst_primary);
+		if (mstb) {
+			list_for_each_entry(port, &mstb->ports, next) {
+				if (drm_dp_mst_edids_changed(mgr, port)) {
+					ret = -EINVAL;
+					break;
+				}
+			}
+
+			drm_dp_put_mst_branch_device(mstb);
+		}
+	} else {
 		ret = -1;
+		mutex_unlock(&mgr->lock);
+	}
+
+	return ret;
 
 out_unlock:
 	mutex_unlock(&mgr->lock);