diff mbox series

drm/i915/tgl: Set drm_crtc_state.active=false for all added disconnected CRTCs sharing MST stream.

Message ID 20201020074555.24315-1-khaled.almahallawy@intel.com (mailing list archive)
State New, archived
Headers show
Series drm/i915/tgl: Set drm_crtc_state.active=false for all added disconnected CRTCs sharing MST stream. | expand

Commit Message

Almahallawy, Khaled Oct. 20, 2020, 7:45 a.m. UTC
This patch avoids failing atomic commits sent by user space by making sure CRTC/Connector added to drm_atomic_state by the driver are in valid state.

When disconnecting MST hub with two or more connected displays. The user space sends IOCTL for each MST pipe to disable.
drm_atomic_state object sent from user space contains only the state of the crtc/pipe intended to disable.
In TGL, intel_dp_mst_atomic_master_trans_check will add all other CRTC and connectors that share the MST stream to drm_atomic_state:

drm_atomic_commit
   drm_atomic_helper_check_modeset
       update_connector_routing
       intel_dp_mst_atomic_check = funcs->atomic_check(connector, state);
       	   intel_dp_mst_atomic_master_trans_check
		intel_atomic_get_digital_connector_state
			drm_atomic_get_connector_state   <-- Add all Connectors
			    drm_atomic_get_crtc_state <-- Add all CRTCs
       update_connector_routing <-- Check added Connector/CRTCs - Will fail

However the added crtc/connector pair will be in invalid state (enabled state for a removed connector)
triggering this condition in drm_atomic_helper.c/update_connector_routing:

	if (!state->duplicated && drm_connector_is_unregistered(connector) &&
	    crtc_state->active) {
		DRM_DEBUG_ATOMIC("[CONNECTOR:%d:%s] is not registered\n",
				 connector->base.id, connector->name);
		return -EINVAL;
	}

Which will cause the drm_atomic_commit/IOCTL for disabling one of MST stream pipes (Main MST) to fail.

The problem happens when a user space (as Chrome) doesn’t retry a falling commit, leaving a disconnected MST pipe still ON,
which will result in failing reconnect of MST hub or even worse leaving TC PHY in a connected state while the MST Hub is disconnected.

Tested on Ubuntu(drm-tip) and Chrome(kernel-next 5.9 rc7)

Signed-off-by: Khaled Almahallawy <khaled.almahallawy@intel.com>
---
 drivers/gpu/drm/i915/display/intel_dp_mst.c | 3 +++
 1 file changed, 3 insertions(+)

Comments

Ville Syrjälä Oct. 20, 2020, 12:41 p.m. UTC | #1
On Tue, Oct 20, 2020 at 12:45:55AM -0700, Khaled Almahallawy wrote:
> This patch avoids failing atomic commits sent by user space by making sure CRTC/Connector added to drm_atomic_state by the driver are in valid state.
> 
> When disconnecting MST hub with two or more connected displays. The user space sends IOCTL for each MST pipe to disable.
> drm_atomic_state object sent from user space contains only the state of the crtc/pipe intended to disable.
> In TGL, intel_dp_mst_atomic_master_trans_check will add all other CRTC and connectors that share the MST stream to drm_atomic_state:
> 
> drm_atomic_commit
>    drm_atomic_helper_check_modeset
>        update_connector_routing
>        intel_dp_mst_atomic_check = funcs->atomic_check(connector, state);
>        	   intel_dp_mst_atomic_master_trans_check
> 		intel_atomic_get_digital_connector_state
> 			drm_atomic_get_connector_state   <-- Add all Connectors
> 			    drm_atomic_get_crtc_state <-- Add all CRTCs
>        update_connector_routing <-- Check added Connector/CRTCs - Will fail
> 
> However the added crtc/connector pair will be in invalid state (enabled state for a removed connector)
> triggering this condition in drm_atomic_helper.c/update_connector_routing:
> 
> 	if (!state->duplicated && drm_connector_is_unregistered(connector) &&
> 	    crtc_state->active) {
> 		DRM_DEBUG_ATOMIC("[CONNECTOR:%d:%s] is not registered\n",
> 				 connector->base.id, connector->name);
> 		return -EINVAL;
> 	}

Yeah, I think that "reject modeset on unregistered connectors" idea is
a bit broken given how the uapi has worked in the past. Cc:ing danvet
and lyude who IIRC were involved with that.

Hmm. Maybe we could add the other stuff to the state only after the
connector .atomic_check() stuff has been done? I don't quite remember
why we decided to do it here. José do you recall the details?

> 
> Which will cause the drm_atomic_commit/IOCTL for disabling one of MST stream pipes (Main MST) to fail.
> 
> The problem happens when a user space (as Chrome) doesn’t retry a falling commit, leaving a disconnected MST pipe still ON,
> which will result in failing reconnect of MST hub or even worse leaving TC PHY in a connected state while the MST Hub is disconnected.
> 
> Tested on Ubuntu(drm-tip) and Chrome(kernel-next 5.9 rc7)
> 
> Signed-off-by: Khaled Almahallawy <khaled.almahallawy@intel.com>
> ---
>  drivers/gpu/drm/i915/display/intel_dp_mst.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/display/intel_dp_mst.c b/drivers/gpu/drm/i915/display/intel_dp_mst.c
> index e948aacbd4ab..1ede980876ed 100644
> --- a/drivers/gpu/drm/i915/display/intel_dp_mst.c
> +++ b/drivers/gpu/drm/i915/display/intel_dp_mst.c
> @@ -265,6 +265,9 @@ intel_dp_mst_atomic_master_trans_check(struct intel_connector *connector,
>  			return ret;
>  		}
>  		crtc_state->uapi.mode_changed = true;
> +
> +		if (connector_iter->base.status == connector_status_disconnected)
> +			crtc_state->uapi.active = false;

That will make the state userspace last set inconsistent with what's
really going on. Which means suddenly page flips/vblank waits and
whatnot will start to fail.

Also that wil directly mutate the prop visible to user space, which
is not how these things are supposed to work. I think if we did do
something like this we should maybe have some kind of internal
flag for it.

>  	}
>  	drm_connector_list_iter_end(&connector_list_iter);
>  
> -- 
> 2.25.1
Souza, Jose Oct. 20, 2020, 11:25 p.m. UTC | #2
On Tue, 2020-10-20 at 15:41 +0300, Ville Syrjälä wrote:
> On Tue, Oct 20, 2020 at 12:45:55AM -0700, Khaled Almahallawy wrote:
> > This patch avoids failing atomic commits sent by user space by making sure CRTC/Connector added to drm_atomic_state by the driver are in valid state.
> > 
> > When disconnecting MST hub with two or more connected displays. The user space sends IOCTL for each MST pipe to disable.
> > drm_atomic_state object sent from user space contains only the state of the crtc/pipe intended to disable.
> > In TGL, intel_dp_mst_atomic_master_trans_check will add all other CRTC and connectors that share the MST stream to drm_atomic_state:
> > 
> > drm_atomic_commit
> >    drm_atomic_helper_check_modeset
> >        update_connector_routing
> >        intel_dp_mst_atomic_check = funcs->atomic_check(connector, state);
> >        	   intel_dp_mst_atomic_master_trans_check
> > 		intel_atomic_get_digital_connector_state
> > 			drm_atomic_get_connector_state   <-- Add all Connectors
> > 			    drm_atomic_get_crtc_state <-- Add all CRTCs
> >        update_connector_routing <-- Check added Connector/CRTCs - Will fail
> > 
> > However the added crtc/connector pair will be in invalid state (enabled state for a removed connector)
> > triggering this condition in drm_atomic_helper.c/update_connector_routing:
> > 
> > 	if (!state->duplicated && drm_connector_is_unregistered(connector) &&
> > 	    crtc_state->active) {
> > 		DRM_DEBUG_ATOMIC("[CONNECTOR:%d:%s] is not registered\n",
> > 				 connector->base.id, connector->name);
> > 		return -EINVAL;
> > 	}
> 
> Yeah, I think that "reject modeset on unregistered connectors" idea is
> a bit broken given how the uapi has worked in the past. Cc:ing danvet
> and lyude who IIRC were involved with that.
> 
> Hmm. Maybe we could add the other stuff to the state only after the
> connector .atomic_check() stuff has been done? I don't quite remember
> why we decided to do it here. José do you recall the details?

Because the connector check function runs twice in drm_atomic_helper_check_modeset(), in the first iteration it will add all connectors that share the
same MST stream to state, the second one will make sure all other checks passed in all connectors of the MST stream.

To me looks like the Chrome userspace is not doing the right thing, it is sending asynchronous atomic commits with conflicting state between each
commit.
If it had a pool that dispatch one atomic state at time waiting for completion before dispatch the next one it would not be a issue.

> 
> > 
> > Which will cause the drm_atomic_commit/IOCTL for disabling one of MST stream pipes (Main MST) to fail.
> > 
> > The problem happens when a user space (as Chrome) doesn’t retry a falling commit, leaving a disconnected MST pipe still ON,
> > which will result in failing reconnect of MST hub or even worse leaving TC PHY in a connected state while the MST Hub is disconnected.
> > 
> > Tested on Ubuntu(drm-tip) and Chrome(kernel-next 5.9 rc7)
> > 
> > Signed-off-by: Khaled Almahallawy <khaled.almahallawy@intel.com>
> > ---
> >  drivers/gpu/drm/i915/display/intel_dp_mst.c | 3 +++
> >  1 file changed, 3 insertions(+)
> > 
> > diff --git a/drivers/gpu/drm/i915/display/intel_dp_mst.c b/drivers/gpu/drm/i915/display/intel_dp_mst.c
> > index e948aacbd4ab..1ede980876ed 100644
> > --- a/drivers/gpu/drm/i915/display/intel_dp_mst.c
> > +++ b/drivers/gpu/drm/i915/display/intel_dp_mst.c
> > @@ -265,6 +265,9 @@ intel_dp_mst_atomic_master_trans_check(struct intel_connector *connector,
> >  			return ret;
> >  		}
> >  		crtc_state->uapi.mode_changed = true;
> > +
> > +		if (connector_iter->base.status == connector_status_disconnected)
> > +			crtc_state->uapi.active = false;
> 
> That will make the state userspace last set inconsistent with what's
> really going on. Which means suddenly page flips/vblank waits and
> whatnot will start to fail.
> 
> Also that wil directly mutate the prop visible to user space, which
> is not how these things are supposed to work. I think if we did do
> something like this we should maybe have some kind of internal
> flag for it.
> 
> >  	}
> >  	drm_connector_list_iter_end(&connector_list_iter);
> >  
> > 
> > 
> > 
> > -- 
> > 2.25.1
>
Souza, Jose Oct. 21, 2020, 12:29 a.m. UTC | #3
On Tue, 2020-10-20 at 16:25 -0700, José Roberto de Souza wrote:
> On Tue, 2020-10-20 at 15:41 +0300, Ville Syrjälä wrote:
> > On Tue, Oct 20, 2020 at 12:45:55AM -0700, Khaled Almahallawy wrote:
> > > This patch avoids failing atomic commits sent by user space by making sure CRTC/Connector added to drm_atomic_state by the driver are in valid state.
> > > 
> > > When disconnecting MST hub with two or more connected displays. The user space sends IOCTL for each MST pipe to disable.
> > > drm_atomic_state object sent from user space contains only the state of the crtc/pipe intended to disable.
> > > In TGL, intel_dp_mst_atomic_master_trans_check will add all other CRTC and connectors that share the MST stream to drm_atomic_state:
> > > 
> > > drm_atomic_commit
> > >    drm_atomic_helper_check_modeset
> > >        update_connector_routing
> > >        intel_dp_mst_atomic_check = funcs->atomic_check(connector, state);
> > >        	   intel_dp_mst_atomic_master_trans_check
> > > 		intel_atomic_get_digital_connector_state
> > > 			drm_atomic_get_connector_state   <-- Add all Connectors
> > > 			    drm_atomic_get_crtc_state <-- Add all CRTCs
> > >        update_connector_routing <-- Check added Connector/CRTCs - Will fail
> > > 
> > > However the added crtc/connector pair will be in invalid state (enabled state for a removed connector)
> > > triggering this condition in drm_atomic_helper.c/update_connector_routing:
> > > 
> > > 	if (!state->duplicated && drm_connector_is_unregistered(connector) &&
> > > 	    crtc_state->active) {
> > > 		DRM_DEBUG_ATOMIC("[CONNECTOR:%d:%s] is not registered\n",
> > > 				 connector->base.id, connector->name);
> > > 		return -EINVAL;
> > > 	}
> > 
> > Yeah, I think that "reject modeset on unregistered connectors" idea is
> > a bit broken given how the uapi has worked in the past. Cc:ing danvet
> > and lyude who IIRC were involved with that.
> > 
> > Hmm. Maybe we could add the other stuff to the state only after the
> > connector .atomic_check() stuff has been done? I don't quite remember
> > why we decided to do it here. José do you recall the details?
> 
> Because the connector check function runs twice in drm_atomic_helper_check_modeset(), in the first iteration it will add all connectors that share the
> same MST stream to state, the second one will make sure all other checks passed in all connectors of the MST stream.
> 
> To me looks like the Chrome userspace is not doing the right thing, it is sending asynchronous atomic commits with conflicting state between each
> commit.

Oh it do not have information about other connectors so not conflicting but anyways userspace should retry atomic commits anyways.

> If it had a pool that dispatch one atomic state at time waiting for completion before dispatch the next one it would not be a issue.
> 
> > 
> > > 
> > > Which will cause the drm_atomic_commit/IOCTL for disabling one of MST stream pipes (Main MST) to fail.
> > > 
> > > The problem happens when a user space (as Chrome) doesn’t retry a falling commit, leaving a disconnected MST pipe still ON,
> > > which will result in failing reconnect of MST hub or even worse leaving TC PHY in a connected state while the MST Hub is disconnected.
> > > 
> > > Tested on Ubuntu(drm-tip) and Chrome(kernel-next 5.9 rc7)
> > > 
> > > Signed-off-by: Khaled Almahallawy <khaled.almahallawy@intel.com>
> > > ---
> > >  drivers/gpu/drm/i915/display/intel_dp_mst.c | 3 +++
> > >  1 file changed, 3 insertions(+)
> > > 
> > > diff --git a/drivers/gpu/drm/i915/display/intel_dp_mst.c b/drivers/gpu/drm/i915/display/intel_dp_mst.c
> > > index e948aacbd4ab..1ede980876ed 100644
> > > --- a/drivers/gpu/drm/i915/display/intel_dp_mst.c
> > > +++ b/drivers/gpu/drm/i915/display/intel_dp_mst.c
> > > @@ -265,6 +265,9 @@ intel_dp_mst_atomic_master_trans_check(struct intel_connector *connector,
> > >  			return ret;
> > >  		}
> > >  		crtc_state->uapi.mode_changed = true;
> > > +
> > > +		if (connector_iter->base.status == connector_status_disconnected)
> > > +			crtc_state->uapi.active = false;
> > 
> > That will make the state userspace last set inconsistent with what's
> > really going on. Which means suddenly page flips/vblank waits and
> > whatnot will start to fail.
> > 
> > Also that wil directly mutate the prop visible to user space, which
> > is not how these things are supposed to work. I think if we did do
> > something like this we should maybe have some kind of internal
> > flag for it.
> > 
> > >  	}
> > >  	drm_connector_list_iter_end(&connector_list_iter);
> > >  
> > > 
> > > 
> > > 
> > > -- 
> > > 2.25.1
> > 
>
Almahallawy, Khaled Oct. 21, 2020, 12:53 a.m. UTC | #4
On Wed, 2020-10-21 at 00:29 +0000, Souza, Jose wrote:
> On Tue, 2020-10-20 at 16:25 -0700, José Roberto de Souza wrote:
> > On Tue, 2020-10-20 at 15:41 +0300, Ville Syrjälä wrote:
> > > On Tue, Oct 20, 2020 at 12:45:55AM -0700, Khaled Almahallawy
> > > wrote:
> > > > This patch avoids failing atomic commits sent by user space by
> > > > making sure CRTC/Connector added to drm_atomic_state by the
> > > > driver are in valid state.
> > > > 
> > > > When disconnecting MST hub with two or more connected displays.
> > > > The user space sends IOCTL for each MST pipe to disable.
> > > > drm_atomic_state object sent from user space contains only the
> > > > state of the crtc/pipe intended to disable.
> > > > In TGL, intel_dp_mst_atomic_master_trans_check will add all
> > > > other CRTC and connectors that share the MST stream to
> > > > drm_atomic_state:
> > > > 
> > > > drm_atomic_commit
> > > >    drm_atomic_helper_check_modeset
> > > >        update_connector_routing
> > > >        intel_dp_mst_atomic_check = funcs-
> > > > >atomic_check(connector, state);
> > > >           intel_dp_mst_atomic_master_trans_check
> > > > intel_atomic_get_digital_connector_state
> > > > drm_atomic_get_connector_state   <-- Add all Connectors
> > > >     drm_atomic_get_crtc_state <-- Add all CRTCs
> > > >        update_connector_routing <-- Check added Connector/CRTCs
> > > > - Will fail
> > > > 
> > > > However the added crtc/connector pair will be in invalid state
> > > > (enabled state for a removed connector)
> > > > triggering this condition in
> > > > drm_atomic_helper.c/update_connector_routing:
> > > > 
> > > > if (!state->duplicated &&
> > > > drm_connector_is_unregistered(connector) &&
> > > >     crtc_state->active) {
> > > > DRM_DEBUG_ATOMIC("[CONNECTOR:%d:%s] is not registered\n",
> > > >  connector->base.id, connector->name);
> > > > return -EINVAL;
> > > > }
> > > 
> > > Yeah, I think that "reject modeset on unregistered connectors"
> > > idea is
> > > a bit broken given how the uapi has worked in the past. Cc:ing
> > > danvet
> > > and lyude who IIRC were involved with that.
> > > 

Actually by removing this condition everything works fine as well.
However we were worried about touching this part and screwing other
stuff :-)

> > > Hmm. Maybe we could add the other stuff to the state only after
> > > the
> > > connector .atomic_check() stuff has been done? I don't quite
> > > remember
> > > why we decided to do it here. José do you recall the details?
> > 
> > Because the connector check function runs twice in
> > drm_atomic_helper_check_modeset(), in the first iteration it will
> > add all connectors that share the
> > same MST stream to state, the second one will make sure all other
> > checks passed in all connectors of the MST stream.
> > 
> > To me looks like the Chrome userspace is not doing the right thing,
> > it is sending asynchronous atomic commits with conflicting state
> > between each
> > commit.
> 
> Oh it do not have information about other connectors so not
> conflicting but anyways userspace should retry atomic commits
> anyways.
> 

CC: Mark and Gill for Chrome userspace input

Yes Chrome is sending the atomic commits with correct states one after
another. Matter of fact, the failure we observe in Chrome is also
happening in Ubuntu. 

We checked that by printing the drm_atomic_state sent by user space
before atomic check:

+++ b/drivers/gpu/drm/drm_atomic.c
@@ -1365,9 +1365,12 @@ int drm_atomic_commit(struct drm_atomic_state
*state)
 	struct drm_mode_config *config = &state->dev->mode_config;
 	int ret;
 
+	drm_atomic_print_state(state);
 	ret = drm_atomic_check_only(state);


Using MST with two displays and eDP connected. First MST on Pipe B and
Second MST on Pipe C. 

Chrome send atomic commit to disable Pipe B first as below:

[   60.406455] [drm:drm_atomic_print_state] checking 00000000c0e0cc6c
[   60.406455] i915 0000:00:02.0: [drm] crtc[152]: pipe B
[   60.406456] i915 0000:00:02.0: [drm] 	enable=0
[   60.406457] i915 0000:00:02.0: [drm] 	active=0
[   60.406458] i915 0000:00:02.0: [drm] 	self_refresh_active=0
[   60.406459] i915 0000:00:02.0: [drm] 	planes_changed=0
[   60.406460] i915 0000:00:02.0: [drm] 	mode_changed=0
[   60.406461] i915 0000:00:02.0: [drm] 	active_changed=0
[   60.406462] i915 0000:00:02.0: [drm] 	connectors_changed=0
[   60.406463] i915 0000:00:02.0: [drm] 	color_mgmt_changed=0
[   60.406464] i915 0000:00:02.0: [drm] 	plane_mask=100
[   60.406465] i915 0000:00:02.0: [drm] 	connector_mask=0
[   60.406466] i915 0000:00:02.0: [drm] 	encoder_mask=2000
[   60.406467] i915 0000:00:02.0: [drm] 	mode: "": 0 0 0 0 0 0 0
0 0 0 0x0 0x0
[   60.406468] i915 0000:00:02.0: [drm] connector[353]: DP-6
[   60.406469] i915 0000:00:02.0: [drm] 	crtc=(null)
[   60.406470] i915 0000:00:02.0: [drm] 	self_refresh_aware=0
[   60.406471] [drm:drm_atomic_check_only] checking 00000000c0e0cc6c
[   60.406473] [drm:drm_atomic_helper_check_modeset] [CRTC:152:pipe B]
mode changed
[   60.406474] [drm:drm_atomic_helper_check_modeset] [CRTC:152:pipe B]
enable changed
[   60.406474] [drm:drm_atomic_helper_check_modeset] [CRTC:152:pipe B]
active changed
...
[   60.407674] [drm:drm_atomic_check_only] atomic driver check for
00000000c0e0cc6c failed: -22


Even in Ubuntu we get similar atomic commit with similar state for Pipe
B:

[  301.136961] i915 0000:00:02.0: [drm] crtc[152]: pipe B
[  301.136964] i915 0000:00:02.0: [drm] 	enable=1
[  301.136966] i915 0000:00:02.0: [drm] 	active=0
[  301.136969] i915 0000:00:02.0: [drm] 	self_refresh_active=0
[  301.136995] i915 0000:00:02.0: [drm] 	planes_changed=0
[  301.136998] i915 0000:00:02.0: [drm] 	mode_changed=0
[  301.137001] i915 0000:00:02.0: [drm] 	active_changed=0
[  301.137004] i915 0000:00:02.0: [drm] 	connectors_changed=0
[  301.137008] i915 0000:00:02.0: [drm] 	color_mgmt_changed=0
[  301.137011] i915 0000:00:02.0: [drm] 	plane_mask=8100
[  301.137014] i915 0000:00:02.0: [drm] 	connector_mask=40
[  301.137017] i915 0000:00:02.0: [drm] 	encoder_mask=4000
[  301.137021] i915 0000:00:02.0: [drm] 	mode: "": 60 148500
1920 2008 2052 2200 1080 1084 1089 1125 0x0 0x5
[  301.137025] i915 0000:00:02.0: [drm] connector[327]: DP-5
[  301.137028] i915 0000:00:02.0: [drm] 	crtc=pipe B
[  301.137031] i915 0000:00:02.0: [drm] 	self_refresh_aware=0
[  301.137035] [drm:drm_atomic_check_only] checking 00000000238dbb90
[  301.137041] [drm:drm_atomic_helper_check_modeset] [CRTC:152:pipe B]
active changed
...
[  301.179635] [drm:drm_atomic_check_only] atomic driver check for
00000000238dbb90 failed: -22


The problem now, is that i915 added CRTC/Connector for pipe C to
drm_atomic_state (not the userspace) which resulted in having a
CRTC/Connector for Pipe C in a bad state (crtc_state->active = true &&
Connector is disconnected). This triggered the failure by taking the
above condition in update_connector_routing.

Chrome will not retry sending atomic commit to disable Pipe B anymore.

However, Ubuntu keeps retrying sending atomic_commits for disabling
Pipe B and each time it will fail on the same condition (crtc_state-
>active = true and Disconnected connector for pipe C). 
Eventually Ubuntu will send atomic commit for Pipe C that succeed, then
it send one more time atomic_commit for Pipe B and at this time
crtc_state->active = false for Pipe C and the above condition in
update_connctor_routing will not be taken and commit for Pipe B succeed
and pipe B will get disabled. 
This also explains why modest disable for MST takes a little bit longer
now in TGL. 

[  301.285786] [drm:drm_atomic_print_state] checking 000000002c7f205f
[  301.285789] i915 0000:00:02.0: [drm] crtc[213]: pipe C
[  301.285790] i915 0000:00:02.0: [drm] 	enable=1
[  301.285791] i915 0000:00:02.0: [drm] 	active=0
[  301.285793] i915 0000:00:02.0: [drm] 	self_refresh_active=0
[  301.285794] i915 0000:00:02.0: [drm] 	planes_changed=0
[  301.285796] i915 0000:00:02.0: [drm] 	mode_changed=0
[  301.285797] i915 0000:00:02.0: [drm] 	active_changed=0
[  301.285799] i915 0000:00:02.0: [drm] 	connectors_changed=0
[  301.285800] i915 0000:00:02.0: [drm] 	color_mgmt_changed=0
[  301.285802] i915 0000:00:02.0: [drm] 	plane_mask=10000
[  301.285803] i915 0000:00:02.0: [drm] 	connector_mask=80
[  301.285805] i915 0000:00:02.0: [drm] 	encoder_mask=8000
[  301.285807] i915 0000:00:02.0: [drm] 	mode: "": 60 533250
3840 3888 3920 4000 2160 2163 2168 2222 0x0 0x9
[  301.285808] i915 0000:00:02.0: [drm] connector[333]: DP-6
[  301.285809] i915 0000:00:02.0: [drm] 	crtc=pipe C
[  301.285811] i915 0000:00:02.0: [drm] 	self_refresh_aware=0
[  301.285813] [drm:drm_atomic_check_only] checking 000000002c7f205f
[  301.285816] [drm:drm_atomic_helper_check_modeset] [CRTC:213:pipe C]
active changed
 301.285958] [drm:drm_atomic_helper_check_modeset.cold] Updating
routing for [CONNECTOR:333:DP-6]


[  302.080580] [drm:drm_atomic_print_state] checking 000000009f6d6341
[  302.080582] i915 0000:00:02.0: [drm] plane[92]: plane 1B
[  302.080584] i915 0000:00:02.0: [drm] 	crtc=(null)
[  302.080585] i915 0000:00:02.0: [drm] 	fb=0
[  302.080587] i915 0000:00:02.0: [drm] 	crtc-pos=1920x1080+0+0
[  302.080589] i915 0000:00:02.0: [drm] 	src-
pos=1920.000000x1080.000000+1920.000000+0.000000
[  302.080590] i915 0000:00:02.0: [drm] 	rotation=1
[  302.080592] i915 0000:00:02.0: [drm] 	normalized-zpos=0
[  302.080593] i915 0000:00:02.0: [drm] 	color-encoding=ITU-R
BT.709 YCbCr
[  302.080595] i915 0000:00:02.0: [drm] 	color-range=YCbCr
limited range
[  302.080597] i915 0000:00:02.0: [drm] crtc[152]: pipe B
[  302.080598] i915 0000:00:02.0: [drm] 	enable=0
[  302.080600] i915 0000:00:02.0: [drm] 	active=0
[  302.080601] i915 0000:00:02.0: [drm] 	self_refresh_active=0
[  302.080602] i915 0000:00:02.0: [drm] 	planes_changed=0
[  302.080604] i915 0000:00:02.0: [drm] 	mode_changed=0
[  302.080605] i915 0000:00:02.0: [drm] 	active_changed=0
[  302.080607] i915 0000:00:02.0: [drm] 	connectors_changed=0
[  302.080608] i915 0000:00:02.0: [drm] 	color_mgmt_changed=0
[  302.080609] i915 0000:00:02.0: [drm] 	plane_mask=0
[  302.080611] i915 0000:00:02.0: [drm] 	connector_mask=0
[  302.080612] i915 0000:00:02.0: [drm] 	encoder_mask=4000
[  302.080614] i915 0000:00:02.0: [drm] 	mode: "": 0 0 0 0 0 0 0
0 0 0 0x0 0x0
[  302.080616] i915 0000:00:02.0: [drm] connector[327]: DP-5
[  302.080617] i915 0000:00:02.0: [drm] 	crtc=(null)
[  302.080619] i915 0000:00:02.0: [drm] 	self_refresh_aware=0
[  302.080620] [drm:drm_atomic_check_only] checking 000000009f6d6341
[  302.080623] [drm:drm_atomic_helper_check_modeset] [CRTC:152:pipe B]
mode changed
[  302.080625] [drm:drm_atomic_helper_check_modeset] [CRTC:152:pipe B]
enable changed
[  302.080627] [drm:drm_atomic_helper_check_modeset] [CRTC:152:pipe B]
active changed

[  302.080826] [drm:drm_atomic_helper_check_modeset.cold] Updating
routing for [CONNECTOR:327:DP-5]
[  302.080841] [drm:drm_atomic_get_connector_state] Added
[CONNECTOR:333:DP-6] 00000000aa611aad state to 000000009f6d6341
[  302.080852] [drm:drm_atomic_get_crtc_state] Added [CRTC:213:pipe C]
0000000019b57d0d state to 000000009f6d6341
[  302.080884] [drm:drm_atomic_helper_check_modeset.cold] Updating
routing for [CONNECTOR:333:DP-6]
[  302.093785] [drm:drm_atomic_helper_check_modeset.cold]
[CONNECTOR:333:DP-6] keeps [ENCODER:313:DP-MST C], now on
[CRTC:213:pipe C]



> > If it had a pool that dispatch one atomic state at time waiting for
> > completion before dispatch the next one it would not be a issue.
> > 
> > > > Which will cause the drm_atomic_commit/IOCTL for disabling one
> > > > of MST stream pipes (Main MST) to fail.
> > > > 
> > > > The problem happens when a user space (as Chrome) doesn’t retry
> > > > a falling commit, leaving a disconnected MST pipe still ON,
> > > > which will result in failing reconnect of MST hub or even worse
> > > > leaving TC PHY in a connected state while the MST Hub is
> > > > disconnected.
> > > > 
> > > > Tested on Ubuntu(drm-tip) and Chrome(kernel-next 5.9 rc7)
> > > > 
> > > > Signed-off-by: Khaled Almahallawy <khaled.almahallawy@intel.com
> > > > >
> > > > ---
> > > >  drivers/gpu/drm/i915/display/intel_dp_mst.c | 3 +++
> > > >  1 file changed, 3 insertions(+)
> > > > 
> > > > diff --git a/drivers/gpu/drm/i915/display/intel_dp_mst.c
> > > > b/drivers/gpu/drm/i915/display/intel_dp_mst.c
> > > > index e948aacbd4ab..1ede980876ed 100644
> > > > --- a/drivers/gpu/drm/i915/display/intel_dp_mst.c
> > > > +++ b/drivers/gpu/drm/i915/display/intel_dp_mst.c
> > > > @@ -265,6 +265,9 @@
> > > > intel_dp_mst_atomic_master_trans_check(struct intel_connector
> > > > *connector,
> > > >  return ret;
> > > >  }
> > > >  crtc_state->uapi.mode_changed = true;
> > > > +
> > > > +if (connector_iter->base.status ==
> > > > connector_status_disconnected)
> > > > +crtc_state->uapi.active = false;
> > > 
> > > That will make the state userspace last set inconsistent with
> > > what's
> > > really going on. Which means suddenly page flips/vblank waits and
> > > whatnot will start to fail.
> > > 
> > > Also that wil directly mutate the prop visible to user space,
> > > which
> > > is not how these things are supposed to work. I think if we did
> > > do
> > > something like this we should maybe have some kind of internal
> > > flag for it.
> > > 
> > > >  }
> > > >  drm_connector_list_iter_end(&connector_list_iter);
> > > > 
> > > > 
> > > > 
> > > > 
> > > > --
> > > > 2.25.1
Ville Syrjälä Oct. 21, 2020, 1:26 p.m. UTC | #5
On Tue, Oct 20, 2020 at 11:25:53PM +0000, Souza, Jose wrote:
> On Tue, 2020-10-20 at 15:41 +0300, Ville Syrjälä wrote:
> > On Tue, Oct 20, 2020 at 12:45:55AM -0700, Khaled Almahallawy wrote:
> > > This patch avoids failing atomic commits sent by user space by making sure CRTC/Connector added to drm_atomic_state by the driver are in valid state.
> > > 
> > > When disconnecting MST hub with two or more connected displays. The user space sends IOCTL for each MST pipe to disable.
> > > drm_atomic_state object sent from user space contains only the state of the crtc/pipe intended to disable.
> > > In TGL, intel_dp_mst_atomic_master_trans_check will add all other CRTC and connectors that share the MST stream to drm_atomic_state:
> > > 
> > > drm_atomic_commit
> > >    drm_atomic_helper_check_modeset
> > >        update_connector_routing
> > >        intel_dp_mst_atomic_check = funcs->atomic_check(connector, state);
> > >        	   intel_dp_mst_atomic_master_trans_check
> > > 		intel_atomic_get_digital_connector_state
> > > 			drm_atomic_get_connector_state   <-- Add all Connectors
> > > 			    drm_atomic_get_crtc_state <-- Add all CRTCs
> > >        update_connector_routing <-- Check added Connector/CRTCs - Will fail
> > > 
> > > However the added crtc/connector pair will be in invalid state (enabled state for a removed connector)
> > > triggering this condition in drm_atomic_helper.c/update_connector_routing:
> > > 
> > > 	if (!state->duplicated && drm_connector_is_unregistered(connector) &&
> > > 	    crtc_state->active) {
> > > 		DRM_DEBUG_ATOMIC("[CONNECTOR:%d:%s] is not registered\n",
> > > 				 connector->base.id, connector->name);
> > > 		return -EINVAL;
> > > 	}
> > 
> > Yeah, I think that "reject modeset on unregistered connectors" idea is
> > a bit broken given how the uapi has worked in the past. Cc:ing danvet
> > and lyude who IIRC were involved with that.
> > 
> > Hmm. Maybe we could add the other stuff to the state only after the
> > connector .atomic_check() stuff has been done? I don't quite remember
> > why we decided to do it here. José do you recall the details?
> 
> Because the connector check function runs twice in drm_atomic_helper_check_modeset(), in the first iteration it will add all connectors that share the
> same MST stream to state, the second one will make sure all other checks passed in all connectors of the MST stream.
> 
> To me looks like the Chrome userspace is not doing the right thing, it is sending asynchronous atomic commits with conflicting state between each
> commit.
> If it had a pool that dispatch one atomic state at time waiting for completion before dispatch the next one it would not be a issue.

Yeah, with atomic userspace could avoid this potentially. Though it
may be racy depending on whether it has noticed all the MST connectors
disappearing yet or not. Either way it's still an issue for legacy
uapi.
Mark Yacoub Oct. 21, 2020, 3:25 p.m. UTC | #6
On Tue, Oct 20, 2020 at 8:54 PM Almahallawy, Khaled
<khaled.almahallawy@intel.com> wrote:
>
> On Wed, 2020-10-21 at 00:29 +0000, Souza, Jose wrote:
> > On Tue, 2020-10-20 at 16:25 -0700, José Roberto de Souza wrote:
> > > On Tue, 2020-10-20 at 15:41 +0300, Ville Syrjälä wrote:
> > > > On Tue, Oct 20, 2020 at 12:45:55AM -0700, Khaled Almahallawy
> > > > wrote:
> > > > > This patch avoids failing atomic commits sent by user space by
> > > > > making sure CRTC/Connector added to drm_atomic_state by the
> > > > > driver are in valid state.
> > > > >
> > > > > When disconnecting MST hub with two or more connected displays.
> > > > > The user space sends IOCTL for each MST pipe to disable.
> > > > > drm_atomic_state object sent from user space contains only the
> > > > > state of the crtc/pipe intended to disable.
> > > > > In TGL, intel_dp_mst_atomic_master_trans_check will add all
> > > > > other CRTC and connectors that share the MST stream to
> > > > > drm_atomic_state:
> > > > >
> > > > > drm_atomic_commit
> > > > >    drm_atomic_helper_check_modeset
> > > > >        update_connector_routing
> > > > >        intel_dp_mst_atomic_check = funcs-
> > > > > >atomic_check(connector, state);
> > > > >           intel_dp_mst_atomic_master_trans_check
> > > > > intel_atomic_get_digital_connector_state
> > > > > drm_atomic_get_connector_state   <-- Add all Connectors
> > > > >     drm_atomic_get_crtc_state <-- Add all CRTCs
> > > > >        update_connector_routing <-- Check added Connector/CRTCs
> > > > > - Will fail
> > > > >
> > > > > However the added crtc/connector pair will be in invalid state
> > > > > (enabled state for a removed connector)
> > > > > triggering this condition in
> > > > > drm_atomic_helper.c/update_connector_routing:
> > > > >
> > > > > if (!state->duplicated &&
> > > > > drm_connector_is_unregistered(connector) &&
> > > > >     crtc_state->active) {
> > > > > DRM_DEBUG_ATOMIC("[CONNECTOR:%d:%s] is not registered\n",
> > > > >  connector->base.id, connector->name);
> > > > > return -EINVAL;
> > > > > }
> > > >
> > > > Yeah, I think that "reject modeset on unregistered connectors"
> > > > idea is
> > > > a bit broken given how the uapi has worked in the past. Cc:ing
> > > > danvet
> > > > and lyude who IIRC were involved with that.
> > > >
>
> Actually by removing this condition everything works fine as well.
> However we were worried about touching this part and screwing other
> stuff :-)
>
> > > > Hmm. Maybe we could add the other stuff to the state only after
> > > > the
> > > > connector .atomic_check() stuff has been done? I don't quite
> > > > remember
> > > > why we decided to do it here. José do you recall the details?
> > >
> > > Because the connector check function runs twice in
> > > drm_atomic_helper_check_modeset(), in the first iteration it will
> > > add all connectors that share the
> > > same MST stream to state, the second one will make sure all other
> > > checks passed in all connectors of the MST stream.
> > >
> > > To me looks like the Chrome userspace is not doing the right thing,
> > > it is sending asynchronous atomic commits with conflicting state
> > > between each
> > > commit.
> >
> > Oh it do not have information about other connectors so not
> > conflicting but anyways userspace should retry atomic commits
> > anyways.
> >
>
> CC: Mark and Gill for Chrome userspace input
>
> Yes Chrome is sending the atomic commits with correct states one after
> another. Matter of fact, the failure we observe in Chrome is also
> happening in Ubuntu.
>
> We checked that by printing the drm_atomic_state sent by user space
> before atomic check:
>
> +++ b/drivers/gpu/drm/drm_atomic.c
> @@ -1365,9 +1365,12 @@ int drm_atomic_commit(struct drm_atomic_state
> *state)
>         struct drm_mode_config *config = &state->dev->mode_config;
>         int ret;
>
> +       drm_atomic_print_state(state);
>         ret = drm_atomic_check_only(state);
>
>
> Using MST with two displays and eDP connected. First MST on Pipe B and
> Second MST on Pipe C.
>
> Chrome send atomic commit to disable Pipe B first as below:
>
> [   60.406455] [drm:drm_atomic_print_state] checking 00000000c0e0cc6c
> [   60.406455] i915 0000:00:02.0: [drm] crtc[152]: pipe B
> [   60.406456] i915 0000:00:02.0: [drm]         enable=0
> [   60.406457] i915 0000:00:02.0: [drm]         active=0
> [   60.406458] i915 0000:00:02.0: [drm]         self_refresh_active=0
> [   60.406459] i915 0000:00:02.0: [drm]         planes_changed=0
> [   60.406460] i915 0000:00:02.0: [drm]         mode_changed=0
> [   60.406461] i915 0000:00:02.0: [drm]         active_changed=0
> [   60.406462] i915 0000:00:02.0: [drm]         connectors_changed=0
> [   60.406463] i915 0000:00:02.0: [drm]         color_mgmt_changed=0
> [   60.406464] i915 0000:00:02.0: [drm]         plane_mask=100
> [   60.406465] i915 0000:00:02.0: [drm]         connector_mask=0
> [   60.406466] i915 0000:00:02.0: [drm]         encoder_mask=2000
> [   60.406467] i915 0000:00:02.0: [drm]         mode: "": 0 0 0 0 0 0 0
> 0 0 0 0x0 0x0
> [   60.406468] i915 0000:00:02.0: [drm] connector[353]: DP-6
> [   60.406469] i915 0000:00:02.0: [drm]         crtc=(null)
> [   60.406470] i915 0000:00:02.0: [drm]         self_refresh_aware=0
> [   60.406471] [drm:drm_atomic_check_only] checking 00000000c0e0cc6c
> [   60.406473] [drm:drm_atomic_helper_check_modeset] [CRTC:152:pipe B]
> mode changed
> [   60.406474] [drm:drm_atomic_helper_check_modeset] [CRTC:152:pipe B]
> enable changed
> [   60.406474] [drm:drm_atomic_helper_check_modeset] [CRTC:152:pipe B]
> active changed
> ...
> [   60.407674] [drm:drm_atomic_check_only] atomic driver check for
> 00000000c0e0cc6c failed: -22
>
>
> Even in Ubuntu we get similar atomic commit with similar state for Pipe
> B:
>
> [  301.136961] i915 0000:00:02.0: [drm] crtc[152]: pipe B
> [  301.136964] i915 0000:00:02.0: [drm]         enable=1
> [  301.136966] i915 0000:00:02.0: [drm]         active=0
> [  301.136969] i915 0000:00:02.0: [drm]         self_refresh_active=0
> [  301.136995] i915 0000:00:02.0: [drm]         planes_changed=0
> [  301.136998] i915 0000:00:02.0: [drm]         mode_changed=0
> [  301.137001] i915 0000:00:02.0: [drm]         active_changed=0
> [  301.137004] i915 0000:00:02.0: [drm]         connectors_changed=0
> [  301.137008] i915 0000:00:02.0: [drm]         color_mgmt_changed=0
> [  301.137011] i915 0000:00:02.0: [drm]         plane_mask=8100
> [  301.137014] i915 0000:00:02.0: [drm]         connector_mask=40
> [  301.137017] i915 0000:00:02.0: [drm]         encoder_mask=4000
> [  301.137021] i915 0000:00:02.0: [drm]         mode: "": 60 148500
> 1920 2008 2052 2200 1080 1084 1089 1125 0x0 0x5
> [  301.137025] i915 0000:00:02.0: [drm] connector[327]: DP-5
> [  301.137028] i915 0000:00:02.0: [drm]         crtc=pipe B
> [  301.137031] i915 0000:00:02.0: [drm]         self_refresh_aware=0
> [  301.137035] [drm:drm_atomic_check_only] checking 00000000238dbb90
> [  301.137041] [drm:drm_atomic_helper_check_modeset] [CRTC:152:pipe B]
> active changed
> ...
> [  301.179635] [drm:drm_atomic_check_only] atomic driver check for
> 00000000238dbb90 failed: -22
>
>
> The problem now, is that i915 added CRTC/Connector for pipe C to
> drm_atomic_state (not the userspace) which resulted in having a
> CRTC/Connector for Pipe C in a bad state (crtc_state->active = true &&
> Connector is disconnected). This triggered the failure by taking the
> above condition in update_connector_routing.
>
> Chrome will not retry sending atomic commit to disable Pipe B anymore.

IMO, I don't think Chrome should keep on retrying, given the nature of
disabling (unlike enabling, you just wanna shut off the display, there
are no "different" configurations you wanna disable with).

With your analysis, I think you pointed out where we should be
looking. IIUC, Pipe C is not really in "bad" state, it's more in a
"transient" state: connector physically disconnected but it's still
active because Chrome hasn't disabled it yet. Is that right?
If so, this seems like it will be a common case with MST: a number of
connectors are connected -> you unplug their common cable -> the
userspace disables them one by one.
Because of MST, we might need to handle them in a special way. 2 ideas
come to mind: either ignore connectors in "transient state" that are
irrelevant to the current pipe [in our example, if you know you're on
MST with Pipe C, ignore its transient state while you're disabling
Pipe B], or,
queue all transient disable requests until you get all displays then
deactivate them together [in our example, when Pipe B wants to be
disabled, add it "i want to disable these"_queue because C is still
transient, and once you get a request for Pipe C, deactivate them all
together] (more complex i think and kinda risky)

> However, Ubuntu keeps retrying sending atomic_commits for disabling
> Pipe B and each time it will fail on the same condition (crtc_state-
> >active = true and Disconnected connector for pipe C).
> Eventually Ubuntu will send atomic commit for Pipe C that succeed, then
> it send one more time atomic_commit for Pipe B and at this time
> crtc_state->active = false for Pipe C and the above condition in
> update_connctor_routing will not be taken and commit for Pipe B succeed
> and pipe B will get disabled.
> This also explains why modest disable for MST takes a little bit longer
> now in TGL.
>
> [  301.285786] [drm:drm_atomic_print_state] checking 000000002c7f205f
> [  301.285789] i915 0000:00:02.0: [drm] crtc[213]: pipe C
> [  301.285790] i915 0000:00:02.0: [drm]         enable=1
> [  301.285791] i915 0000:00:02.0: [drm]         active=0
> [  301.285793] i915 0000:00:02.0: [drm]         self_refresh_active=0
> [  301.285794] i915 0000:00:02.0: [drm]         planes_changed=0
> [  301.285796] i915 0000:00:02.0: [drm]         mode_changed=0
> [  301.285797] i915 0000:00:02.0: [drm]         active_changed=0
> [  301.285799] i915 0000:00:02.0: [drm]         connectors_changed=0
> [  301.285800] i915 0000:00:02.0: [drm]         color_mgmt_changed=0
> [  301.285802] i915 0000:00:02.0: [drm]         plane_mask=10000
> [  301.285803] i915 0000:00:02.0: [drm]         connector_mask=80
> [  301.285805] i915 0000:00:02.0: [drm]         encoder_mask=8000
> [  301.285807] i915 0000:00:02.0: [drm]         mode: "": 60 533250
> 3840 3888 3920 4000 2160 2163 2168 2222 0x0 0x9
> [  301.285808] i915 0000:00:02.0: [drm] connector[333]: DP-6
> [  301.285809] i915 0000:00:02.0: [drm]         crtc=pipe C
> [  301.285811] i915 0000:00:02.0: [drm]         self_refresh_aware=0
> [  301.285813] [drm:drm_atomic_check_only] checking 000000002c7f205f
> [  301.285816] [drm:drm_atomic_helper_check_modeset] [CRTC:213:pipe C]
> active changed
>  301.285958] [drm:drm_atomic_helper_check_modeset.cold] Updating
> routing for [CONNECTOR:333:DP-6]
>
>
> [  302.080580] [drm:drm_atomic_print_state] checking 000000009f6d6341
> [  302.080582] i915 0000:00:02.0: [drm] plane[92]: plane 1B
> [  302.080584] i915 0000:00:02.0: [drm]         crtc=(null)
> [  302.080585] i915 0000:00:02.0: [drm]         fb=0
> [  302.080587] i915 0000:00:02.0: [drm]         crtc-pos=1920x1080+0+0
> [  302.080589] i915 0000:00:02.0: [drm]         src-
> pos=1920.000000x1080.000000+1920.000000+0.000000
> [  302.080590] i915 0000:00:02.0: [drm]         rotation=1
> [  302.080592] i915 0000:00:02.0: [drm]         normalized-zpos=0
> [  302.080593] i915 0000:00:02.0: [drm]         color-encoding=ITU-R
> BT.709 YCbCr
> [  302.080595] i915 0000:00:02.0: [drm]         color-range=YCbCr
> limited range
> [  302.080597] i915 0000:00:02.0: [drm] crtc[152]: pipe B
> [  302.080598] i915 0000:00:02.0: [drm]         enable=0
> [  302.080600] i915 0000:00:02.0: [drm]         active=0
> [  302.080601] i915 0000:00:02.0: [drm]         self_refresh_active=0
> [  302.080602] i915 0000:00:02.0: [drm]         planes_changed=0
> [  302.080604] i915 0000:00:02.0: [drm]         mode_changed=0
> [  302.080605] i915 0000:00:02.0: [drm]         active_changed=0
> [  302.080607] i915 0000:00:02.0: [drm]         connectors_changed=0
> [  302.080608] i915 0000:00:02.0: [drm]         color_mgmt_changed=0
> [  302.080609] i915 0000:00:02.0: [drm]         plane_mask=0
> [  302.080611] i915 0000:00:02.0: [drm]         connector_mask=0
> [  302.080612] i915 0000:00:02.0: [drm]         encoder_mask=4000
> [  302.080614] i915 0000:00:02.0: [drm]         mode: "": 0 0 0 0 0 0 0
> 0 0 0 0x0 0x0
> [  302.080616] i915 0000:00:02.0: [drm] connector[327]: DP-5
> [  302.080617] i915 0000:00:02.0: [drm]         crtc=(null)
> [  302.080619] i915 0000:00:02.0: [drm]         self_refresh_aware=0
> [  302.080620] [drm:drm_atomic_check_only] checking 000000009f6d6341
> [  302.080623] [drm:drm_atomic_helper_check_modeset] [CRTC:152:pipe B]
> mode changed
> [  302.080625] [drm:drm_atomic_helper_check_modeset] [CRTC:152:pipe B]
> enable changed
> [  302.080627] [drm:drm_atomic_helper_check_modeset] [CRTC:152:pipe B]
> active changed
>
> [  302.080826] [drm:drm_atomic_helper_check_modeset.cold] Updating
> routing for [CONNECTOR:327:DP-5]
> [  302.080841] [drm:drm_atomic_get_connector_state] Added
> [CONNECTOR:333:DP-6] 00000000aa611aad state to 000000009f6d6341
> [  302.080852] [drm:drm_atomic_get_crtc_state] Added [CRTC:213:pipe C]
> 0000000019b57d0d state to 000000009f6d6341
> [  302.080884] [drm:drm_atomic_helper_check_modeset.cold] Updating
> routing for [CONNECTOR:333:DP-6]
> [  302.093785] [drm:drm_atomic_helper_check_modeset.cold]
> [CONNECTOR:333:DP-6] keeps [ENCODER:313:DP-MST C], now on
> [CRTC:213:pipe C]
>
>
>
> > > If it had a pool that dispatch one atomic state at time waiting for
> > > completion before dispatch the next one it would not be a issue.
> > >
> > > > > Which will cause the drm_atomic_commit/IOCTL for disabling one
> > > > > of MST stream pipes (Main MST) to fail.
> > > > >
> > > > > The problem happens when a user space (as Chrome) doesn’t retry
> > > > > a falling commit, leaving a disconnected MST pipe still ON,
> > > > > which will result in failing reconnect of MST hub or even worse
> > > > > leaving TC PHY in a connected state while the MST Hub is
> > > > > disconnected.
> > > > >
> > > > > Tested on Ubuntu(drm-tip) and Chrome(kernel-next 5.9 rc7)
> > > > >
> > > > > Signed-off-by: Khaled Almahallawy <khaled.almahallawy@intel.com
> > > > > >
> > > > > ---
> > > > >  drivers/gpu/drm/i915/display/intel_dp_mst.c | 3 +++
> > > > >  1 file changed, 3 insertions(+)
> > > > >
> > > > > diff --git a/drivers/gpu/drm/i915/display/intel_dp_mst.c
> > > > > b/drivers/gpu/drm/i915/display/intel_dp_mst.c
> > > > > index e948aacbd4ab..1ede980876ed 100644
> > > > > --- a/drivers/gpu/drm/i915/display/intel_dp_mst.c
> > > > > +++ b/drivers/gpu/drm/i915/display/intel_dp_mst.c
> > > > > @@ -265,6 +265,9 @@
> > > > > intel_dp_mst_atomic_master_trans_check(struct intel_connector
> > > > > *connector,
> > > > >  return ret;
> > > > >  }
> > > > >  crtc_state->uapi.mode_changed = true;
> > > > > +
> > > > > +if (connector_iter->base.status ==
> > > > > connector_status_disconnected)
> > > > > +crtc_state->uapi.active = false;
> > > >
> > > > That will make the state userspace last set inconsistent with
> > > > what's
> > > > really going on. Which means suddenly page flips/vblank waits and
> > > > whatnot will start to fail.
> > > >
> > > > Also that wil directly mutate the prop visible to user space,
> > > > which
> > > > is not how these things are supposed to work. I think if we did
> > > > do
> > > > something like this we should maybe have some kind of internal
> > > > flag for it.
> > > >

Agree with Ville, this is explicitly setting fields for the purpose of
getting something working, I'd be worried of the inconsistencies it
could introduce down the line.

> > > > >  }
> > > > >  drm_connector_list_iter_end(&connector_list_iter);
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > 2.25.1
Lyude Paul Oct. 21, 2020, 9:25 p.m. UTC | #7
On Wed, 2020-10-21 at 16:26 +0300, Ville Syrjälä wrote:
> On Tue, Oct 20, 2020 at 11:25:53PM +0000, Souza, Jose wrote:
> > On Tue, 2020-10-20 at 15:41 +0300, Ville Syrjälä wrote:
> > > On Tue, Oct 20, 2020 at 12:45:55AM -0700, Khaled Almahallawy wrote:
> > > > This patch avoids failing atomic commits sent by user space by making
> > > > sure CRTC/Connector added to drm_atomic_state by the driver are in valid
> > > > state.
> > > > 
> > > > When disconnecting MST hub with two or more connected displays. The user
> > > > space sends IOCTL for each MST pipe to disable.
> > > > drm_atomic_state object sent from user space contains only the state of
> > > > the crtc/pipe intended to disable.
> > > > In TGL, intel_dp_mst_atomic_master_trans_check will add all other CRTC
> > > > and connectors that share the MST stream to drm_atomic_state:
> > > > 
> > > > drm_atomic_commit
> > > >    drm_atomic_helper_check_modeset
> > > >        update_connector_routing
> > > >        intel_dp_mst_atomic_check = funcs-
> > > > >atomic_check(connector, state);
> > > >        	   intel_dp_mst_atomic_master_trans_chec
> > > > k
> > > > 		intel_atomic_get_digital_connector_state
> > > > 			drm_atomic_get_connector_state   <-- Add all
> > > > Connectors
> > > > 			    drm_atomic_get_crtc_state <-- Add all CRTCs
> > > >        update_connector_routing <-- Check added
> > > > Connector/CRTCs - Will fail
> > > > 
> > > > However the added crtc/connector pair will be in invalid state (enabled
> > > > state for a removed connector)
> > > > triggering this condition in
> > > > drm_atomic_helper.c/update_connector_routing:
> > > > 
> > > > 	if (!state->duplicated &&
> > > > drm_connector_is_unregistered(connector) &&
> > > > 	    crtc_state->active) {
> > > > 		DRM_DEBUG_ATOMIC("[CONNECTOR:%d:%s] is not
> > > > registered\n",
> > > > 				 connector->base.id, connector->name);
> > > > 		return -EINVAL;
> > > > 	}
> > > 
> > > Yeah, I think that "reject modeset on unregistered connectors" idea is
> > > a bit broken given how the uapi has worked in the past. Cc:ing danvet
> > > and lyude who IIRC were involved with that.
> > > 
> > > Hmm. Maybe we could add the other stuff to the state only after the
> > > connector .atomic_check() stuff has been done? I don't quite remember
> > > why we decided to do it here. José do you recall the details?
> > 
> > Because the connector check function runs twice in
> > drm_atomic_helper_check_modeset(), in the first iteration it will add all
> > connectors that share the
> > same MST stream to state, the second one will make sure all other checks
> > passed in all connectors of the MST stream.
> > 
> > To me looks like the Chrome userspace is not doing the right thing, it is
> > sending asynchronous atomic commits with conflicting state between each
> > commit.
> > If it had a pool that dispatch one atomic state at time waiting for
> > completion before dispatch the next one it would not be a issue.
> 
> Yeah, with atomic userspace could avoid this potentially. Though it
> may be racy depending on whether it has noticed all the MST connectors
> disappearing yet or not. Either way it's still an issue for legacy
> uapi.

Sigh-I had hoped that we would have hooked this up such that we'd avoid this (as
I've already had to fix some issues this caused with legacy modesetting) but I
guess not. Have you guys considered trying to use the connector epochs whenever
you receive a hotplug event to differentiate between removed ('stale')
connectors and other connectors? tbh, if you can't find a connector with the
same mst path and epoch you last had as your stale connector then it's safe to
just assume it's gone.

Also - I'm totally open to better ideas for handling this or making it more
obvious when a connector has been removed, most of the reason for adding these
checks was to try our best (as this is impossible to fully guarantee) to avoid
situations where a host tried to enable an MST display that no longer existed
and put the hardware into a weird state. At least if I remember correctly, it's
been a while.

>
Ville Syrjälä Oct. 22, 2020, 11:07 a.m. UTC | #8
On Wed, Oct 21, 2020 at 05:25:40PM -0400, Lyude Paul wrote:
> On Wed, 2020-10-21 at 16:26 +0300, Ville Syrjälä wrote:
> > On Tue, Oct 20, 2020 at 11:25:53PM +0000, Souza, Jose wrote:
> > > On Tue, 2020-10-20 at 15:41 +0300, Ville Syrjälä wrote:
> > > > On Tue, Oct 20, 2020 at 12:45:55AM -0700, Khaled Almahallawy wrote:
> > > > > This patch avoids failing atomic commits sent by user space by making
> > > > > sure CRTC/Connector added to drm_atomic_state by the driver are in valid
> > > > > state.
> > > > > 
> > > > > When disconnecting MST hub with two or more connected displays. The user
> > > > > space sends IOCTL for each MST pipe to disable.
> > > > > drm_atomic_state object sent from user space contains only the state of
> > > > > the crtc/pipe intended to disable.
> > > > > In TGL, intel_dp_mst_atomic_master_trans_check will add all other CRTC
> > > > > and connectors that share the MST stream to drm_atomic_state:
> > > > > 
> > > > > drm_atomic_commit
> > > > >    drm_atomic_helper_check_modeset
> > > > >        update_connector_routing
> > > > >        intel_dp_mst_atomic_check = funcs-
> > > > > >atomic_check(connector, state);
> > > > >        	   intel_dp_mst_atomic_master_trans_chec
> > > > > k
> > > > > 		intel_atomic_get_digital_connector_state
> > > > > 			drm_atomic_get_connector_state   <-- Add all
> > > > > Connectors
> > > > > 			    drm_atomic_get_crtc_state <-- Add all CRTCs
> > > > >        update_connector_routing <-- Check added
> > > > > Connector/CRTCs - Will fail
> > > > > 
> > > > > However the added crtc/connector pair will be in invalid state (enabled
> > > > > state for a removed connector)
> > > > > triggering this condition in
> > > > > drm_atomic_helper.c/update_connector_routing:
> > > > > 
> > > > > 	if (!state->duplicated &&
> > > > > drm_connector_is_unregistered(connector) &&
> > > > > 	    crtc_state->active) {
> > > > > 		DRM_DEBUG_ATOMIC("[CONNECTOR:%d:%s] is not
> > > > > registered\n",
> > > > > 				 connector->base.id, connector->name);
> > > > > 		return -EINVAL;
> > > > > 	}
> > > > 
> > > > Yeah, I think that "reject modeset on unregistered connectors" idea is
> > > > a bit broken given how the uapi has worked in the past. Cc:ing danvet
> > > > and lyude who IIRC were involved with that.
> > > > 
> > > > Hmm. Maybe we could add the other stuff to the state only after the
> > > > connector .atomic_check() stuff has been done? I don't quite remember
> > > > why we decided to do it here. José do you recall the details?
> > > 
> > > Because the connector check function runs twice in
> > > drm_atomic_helper_check_modeset(), in the first iteration it will add all
> > > connectors that share the
> > > same MST stream to state, the second one will make sure all other checks
> > > passed in all connectors of the MST stream.
> > > 
> > > To me looks like the Chrome userspace is not doing the right thing, it is
> > > sending asynchronous atomic commits with conflicting state between each
> > > commit.
> > > If it had a pool that dispatch one atomic state at time waiting for
> > > completion before dispatch the next one it would not be a issue.
> > 
> > Yeah, with atomic userspace could avoid this potentially. Though it
> > may be racy depending on whether it has noticed all the MST connectors
> > disappearing yet or not. Either way it's still an issue for legacy
> > uapi.
> 
> Sigh-I had hoped that we would have hooked this up such that we'd avoid this (as
> I've already had to fix some issues this caused with legacy modesetting) but I
> guess not. Have you guys considered trying to use the connector epochs whenever
> you receive a hotplug event to differentiate between removed ('stale')
> connectors and other connectors? tbh, if you can't find a connector with the
> same mst path and epoch you last had as your stale connector then it's safe to
> just assume it's gone.
> 
> Also - I'm totally open to better ideas for handling this or making it more
> obvious when a connector has been removed, most of the reason for adding these
> checks was to try our best (as this is impossible to fully guarantee) to avoid
> situations where a host tried to enable an MST display that no longer existed
> and put the hardware into a weird state. At least if I remember correctly, it's
> been a while.

It's all racy anyway is it not? Because of that I'm pretty firmly in
the "just plow ahead blindly" camp.
diff mbox series

Patch

diff --git a/drivers/gpu/drm/i915/display/intel_dp_mst.c b/drivers/gpu/drm/i915/display/intel_dp_mst.c
index e948aacbd4ab..1ede980876ed 100644
--- a/drivers/gpu/drm/i915/display/intel_dp_mst.c
+++ b/drivers/gpu/drm/i915/display/intel_dp_mst.c
@@ -265,6 +265,9 @@  intel_dp_mst_atomic_master_trans_check(struct intel_connector *connector,
 			return ret;
 		}
 		crtc_state->uapi.mode_changed = true;
+
+		if (connector_iter->base.status == connector_status_disconnected)
+			crtc_state->uapi.active = false;
 	}
 	drm_connector_list_iter_end(&connector_list_iter);