diff mbox series

drm: Allow modeset on unregisted connectors unconditionally

Message ID 20190520174109.12732-1-imre.deak@intel.com (mailing list archive)
State New, archived
Headers show
Series drm: Allow modeset on unregisted connectors unconditionally | expand

Commit Message

Imre Deak May 20, 2019, 5:41 p.m. UTC
We allowed modesetting an unregistered connector only in the case the
mode is getting disabled on the connector.

The reason for this check was the lack of proper refcounting for the
backing memory objects. That problem has been solved meanwhile so there
is no reason any more to reject the modesetting in general. The check
for that also makes driver internal modesets more cumbersome where we
need to add exemptions for the cases where we do need to allow the
modeset even for unregistered connectors. One such case is the
restoration of the mode during resume.

Simplify things by removing the unneeded check. I can't see how
modesetting an unregistered connector can cause any problem and the race
(described in the code comment) can anyway result in such a modeset (if
the connector is unregistered right after the check).

Cc: Lyude Paul <lyude@redhat.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Imre Deak <imre.deak@intel.com>
---
 drivers/gpu/drm/drm_atomic_helper.c | 29 ++---------------------------
 1 file changed, 2 insertions(+), 27 deletions(-)

Comments

Daniel Vetter May 20, 2019, 6:37 p.m. UTC | #1
On Mon, May 20, 2019 at 08:41:09PM +0300, Imre Deak wrote:
> We allowed modesetting an unregistered connector only in the case the
> mode is getting disabled on the connector.
> 
> The reason for this check was the lack of proper refcounting for the
> backing memory objects. That problem has been solved meanwhile so there
> is no reason any more to reject the modesetting in general.

I'm not parsing this at all ... maybe references to the commits that fix
this? Or do you mean the refcounting work for all the things hanging of
connectors, including the entire mst tree?

> The check
> for that also makes driver internal modesets more cumbersome where we
> need to add exemptions for the cases where we do need to allow the
> modeset even for unregistered connectors. One such case is the
> restoration of the mode during resume.

Yeah this one actually makes sense to me. We could still keep this check
here, but for the atomic ioctl only when called from userspace. But iirc
Lyude also said she has some plans here, so no idea whether that all fits.

> Simplify things by removing the unneeded check. I can't see how
> modesetting an unregistered connector can cause any problem and the race
> (described in the code comment) can anyway result in such a modeset (if
> the connector is unregistered right after the check).

Not saying we don't need this, but there's fairly enormous amounts of
history behind all this stuff, and lots of discussions. Would be good to
at least reference those, so we have a good story for when this then all
goes wrong again.
-Daniel

> 
> Cc: Lyude Paul <lyude@redhat.com>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
> Signed-off-by: Imre Deak <imre.deak@intel.com>
> ---
>  drivers/gpu/drm/drm_atomic_helper.c | 29 ++---------------------------
>  1 file changed, 2 insertions(+), 27 deletions(-)
> 
> diff --git a/drivers/gpu/drm/drm_atomic_helper.c b/drivers/gpu/drm/drm_atomic_helper.c
> index 2e0cb4246cbd..e94e69483498 100644
> --- a/drivers/gpu/drm/drm_atomic_helper.c
> +++ b/drivers/gpu/drm/drm_atomic_helper.c
> @@ -319,33 +319,6 @@ update_connector_routing(struct drm_atomic_state *state,
>  		return 0;
>  	}
>  
> -	crtc_state = drm_atomic_get_new_crtc_state(state,
> -						   new_connector_state->crtc);
> -	/*
> -	 * For compatibility with legacy users, we want to make sure that
> -	 * we allow DPMS On->Off modesets on unregistered connectors. Modesets
> -	 * which would result in anything else must be considered invalid, to
> -	 * avoid turning on new displays on dead connectors.
> -	 *
> -	 * Since the connector can be unregistered at any point during an
> -	 * atomic check or commit, this is racy. But that's OK: all we care
> -	 * about is ensuring that userspace can't do anything but shut off the
> -	 * display on a connector that was destroyed after it's been notified,
> -	 * not before.
> -	 *
> -	 * Additionally, we also want to ignore connector registration when
> -	 * we're trying to restore an atomic state during system resume since
> -	 * there's a chance the connector may have been destroyed during the
> -	 * process, but it's better to ignore that then cause
> -	 * drm_atomic_helper_resume() to fail.
> -	 */
> -	if (!state->duplicated && drm_connector_is_unregistered(connector) &&
> -	    crtc_state->active) {
> -		DRM_DEBUG_ATOMIC("[CONNECTOR:%d:%s] is not registered\n",
> -				 connector->base.id, connector->name);
> -		return -EINVAL;
> -	}
> -
>  	funcs = connector->helper_private;
>  
>  	if (funcs->atomic_best_encoder)
> @@ -390,6 +363,8 @@ update_connector_routing(struct drm_atomic_state *state,
>  
>  	set_best_encoder(state, new_connector_state, new_encoder);
>  
> +	crtc_state = drm_atomic_get_new_crtc_state(state,
> +						   new_connector_state->crtc);
>  	crtc_state->connectors_changed = true;
>  
>  	DRM_DEBUG_ATOMIC("[CONNECTOR:%d:%s] using [ENCODER:%d:%s] on [CRTC:%d:%s]\n",
> -- 
> 2.17.1
>
Imre Deak May 20, 2019, 7:09 p.m. UTC | #2
On Mon, May 20, 2019 at 08:37:46PM +0200, Daniel Vetter wrote:
> On Mon, May 20, 2019 at 08:41:09PM +0300, Imre Deak wrote:
> > We allowed modesetting an unregistered connector only in the case the
> > mode is getting disabled on the connector.
> > 
> > The reason for this check was the lack of proper refcounting for the
> > backing memory objects. That problem has been solved meanwhile so there
> > is no reason any more to reject the modesetting in general.
> 
> I'm not parsing this at all ... maybe references to the commits that fix
> this? Or do you mean the refcounting work for all the things hanging of
> connectors, including the entire mst tree?

Yes the check was added to solve the issue related to the removal of MST
connectors that could happen asynchronously wrt. a modeset referring to
that MST connector.  That could happen since the MST core doesn't hold
any locks (for instance the connection_mutex) during removing an MST
connector that would prevent doing a modeset at the same time.

Adding the refcounting for such MST connectors (via the
drm_connector_get()/drm_connector_put()) got rid of the above problem.

> 
> > The check
> > for that also makes driver internal modesets more cumbersome where we
> > need to add exemptions for the cases where we do need to allow the
> > modeset even for unregistered connectors. One such case is the
> > restoration of the mode during resume.
> 
> Yeah this one actually makes sense to me. We could still keep this check
> here, but for the atomic ioctl only when called from userspace. But iirc
> Lyude also said she has some plans here, so no idea whether that all fits.
> 
> > Simplify things by removing the unneeded check. I can't see how
> > modesetting an unregistered connector can cause any problem and the race
> > (described in the code comment) can anyway result in such a modeset (if
> > the connector is unregistered right after the check).
> 
> Not saying we don't need this, but there's fairly enormous amounts of
> history behind all this stuff, and lots of discussions. Would be good to
> at least reference those, so we have a good story for when this then all
> goes wrong again.

I still don't see why this check is needed. There is no justification
for it - besides the original reason for it as discussed above about the
refcounting problem, which is solved now - so I think we should remove
it, instead of just making it a special case for the user space modeset.

As I wrote a user space modeset can end up anyway doing a modeset on an
unregistered connector when the unregistering - by MST core - happens just
right after the check.

> -Daniel
> 
> > 
> > Cc: Lyude Paul <lyude@redhat.com>
> > Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> > Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > Signed-off-by: Imre Deak <imre.deak@intel.com>
> > ---
> >  drivers/gpu/drm/drm_atomic_helper.c | 29 ++---------------------------
> >  1 file changed, 2 insertions(+), 27 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/drm_atomic_helper.c b/drivers/gpu/drm/drm_atomic_helper.c
> > index 2e0cb4246cbd..e94e69483498 100644
> > --- a/drivers/gpu/drm/drm_atomic_helper.c
> > +++ b/drivers/gpu/drm/drm_atomic_helper.c
> > @@ -319,33 +319,6 @@ update_connector_routing(struct drm_atomic_state *state,
> >  		return 0;
> >  	}
> >  
> > -	crtc_state = drm_atomic_get_new_crtc_state(state,
> > -						   new_connector_state->crtc);
> > -	/*
> > -	 * For compatibility with legacy users, we want to make sure that
> > -	 * we allow DPMS On->Off modesets on unregistered connectors. Modesets
> > -	 * which would result in anything else must be considered invalid, to
> > -	 * avoid turning on new displays on dead connectors.
> > -	 *
> > -	 * Since the connector can be unregistered at any point during an
> > -	 * atomic check or commit, this is racy. But that's OK: all we care
> > -	 * about is ensuring that userspace can't do anything but shut off the
> > -	 * display on a connector that was destroyed after it's been notified,
> > -	 * not before.
> > -	 *
> > -	 * Additionally, we also want to ignore connector registration when
> > -	 * we're trying to restore an atomic state during system resume since
> > -	 * there's a chance the connector may have been destroyed during the
> > -	 * process, but it's better to ignore that then cause
> > -	 * drm_atomic_helper_resume() to fail.
> > -	 */
> > -	if (!state->duplicated && drm_connector_is_unregistered(connector) &&
> > -	    crtc_state->active) {
> > -		DRM_DEBUG_ATOMIC("[CONNECTOR:%d:%s] is not registered\n",
> > -				 connector->base.id, connector->name);
> > -		return -EINVAL;
> > -	}
> > -
> >  	funcs = connector->helper_private;
> >  
> >  	if (funcs->atomic_best_encoder)
> > @@ -390,6 +363,8 @@ update_connector_routing(struct drm_atomic_state *state,
> >  
> >  	set_best_encoder(state, new_connector_state, new_encoder);
> >  
> > +	crtc_state = drm_atomic_get_new_crtc_state(state,
> > +						   new_connector_state->crtc);
> >  	crtc_state->connectors_changed = true;
> >  
> >  	DRM_DEBUG_ATOMIC("[CONNECTOR:%d:%s] using [ENCODER:%d:%s] on [CRTC:%d:%s]\n",
> > -- 
> > 2.17.1
> > 
> 
> -- 
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch
Daniel Vetter May 20, 2019, 7:23 p.m. UTC | #3
On Mon, May 20, 2019 at 10:09:24PM +0300, Imre Deak wrote:
> On Mon, May 20, 2019 at 08:37:46PM +0200, Daniel Vetter wrote:
> > On Mon, May 20, 2019 at 08:41:09PM +0300, Imre Deak wrote:
> > > We allowed modesetting an unregistered connector only in the case the
> > > mode is getting disabled on the connector.
> > > 
> > > The reason for this check was the lack of proper refcounting for the
> > > backing memory objects. That problem has been solved meanwhile so there
> > > is no reason any more to reject the modesetting in general.
> > 
> > I'm not parsing this at all ... maybe references to the commits that fix
> > this? Or do you mean the refcounting work for all the things hanging of
> > connectors, including the entire mst tree?
> 
> Yes the check was added to solve the issue related to the removal of MST
> connectors that could happen asynchronously wrt. a modeset referring to
> that MST connector.  That could happen since the MST core doesn't hold
> any locks (for instance the connection_mutex) during removing an MST
> connector that would prevent doing a modeset at the same time.
> 
> Adding the refcounting for such MST connectors (via the
> drm_connector_get()/drm_connector_put()) got rid of the above problem.

We added the check way after that stuff landed. Before all the connector
reworking connectors were forcefully disabled by the kernel. The idea
behind this check is to make sure that that userspace notices a connector
is gone (only thing that's not allowed is enabling it, you can keep
pageflipping). I think we've always had behaviour like ever since mst (all
userspace has some "oops mst connector probably gone" failure catching
around modesets).

So no idea what all blows up if we stop catching userspace this way.

Now very much possible I'm getting all this wrong again or missing
something, this stuff is often way over my head. But I'm really vary of
breaking userspace here. E.g. just the drm_connector_get/put lifetime
changes results in some userspace breaking if you unplug/replug fast
enough, because then it doesn't notice the connector change anymore. I
couldn't figure out a way to paper over that regression without
reintroduce the rather broken and oops-prone old connector lifetime
management.

> > > The check
> > > for that also makes driver internal modesets more cumbersome where we
> > > need to add exemptions for the cases where we do need to allow the
> > > modeset even for unregistered connectors. One such case is the
> > > restoration of the mode during resume.
> > 
> > Yeah this one actually makes sense to me. We could still keep this check
> > here, but for the atomic ioctl only when called from userspace. But iirc
> > Lyude also said she has some plans here, so no idea whether that all fits.
> > 
> > > Simplify things by removing the unneeded check. I can't see how
> > > modesetting an unregistered connector can cause any problem and the race
> > > (described in the code comment) can anyway result in such a modeset (if
> > > the connector is unregistered right after the check).
> > 
> > Not saying we don't need this, but there's fairly enormous amounts of
> > history behind all this stuff, and lots of discussions. Would be good to
> > at least reference those, so we have a good story for when this then all
> > goes wrong again.
> 
> I still don't see why this check is needed. There is no justification
> for it - besides the original reason for it as discussed above about the
> refcounting problem, which is solved now - so I think we should remove
> it, instead of just making it a special case for the user space modeset.
> 
> As I wrote a user space modeset can end up anyway doing a modeset on an
> unregistered connector when the unregistering - by MST core - happens just
> right after the check.

Yup. Always been like that.
-Daniel

> 
> > -Daniel
> > 
> > > 
> > > Cc: Lyude Paul <lyude@redhat.com>
> > > Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> > > Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > > Signed-off-by: Imre Deak <imre.deak@intel.com>
> > > ---
> > >  drivers/gpu/drm/drm_atomic_helper.c | 29 ++---------------------------
> > >  1 file changed, 2 insertions(+), 27 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/drm_atomic_helper.c b/drivers/gpu/drm/drm_atomic_helper.c
> > > index 2e0cb4246cbd..e94e69483498 100644
> > > --- a/drivers/gpu/drm/drm_atomic_helper.c
> > > +++ b/drivers/gpu/drm/drm_atomic_helper.c
> > > @@ -319,33 +319,6 @@ update_connector_routing(struct drm_atomic_state *state,
> > >  		return 0;
> > >  	}
> > >  
> > > -	crtc_state = drm_atomic_get_new_crtc_state(state,
> > > -						   new_connector_state->crtc);
> > > -	/*
> > > -	 * For compatibility with legacy users, we want to make sure that
> > > -	 * we allow DPMS On->Off modesets on unregistered connectors. Modesets
> > > -	 * which would result in anything else must be considered invalid, to
> > > -	 * avoid turning on new displays on dead connectors.
> > > -	 *
> > > -	 * Since the connector can be unregistered at any point during an
> > > -	 * atomic check or commit, this is racy. But that's OK: all we care
> > > -	 * about is ensuring that userspace can't do anything but shut off the
> > > -	 * display on a connector that was destroyed after it's been notified,
> > > -	 * not before.
> > > -	 *
> > > -	 * Additionally, we also want to ignore connector registration when
> > > -	 * we're trying to restore an atomic state during system resume since
> > > -	 * there's a chance the connector may have been destroyed during the
> > > -	 * process, but it's better to ignore that then cause
> > > -	 * drm_atomic_helper_resume() to fail.
> > > -	 */
> > > -	if (!state->duplicated && drm_connector_is_unregistered(connector) &&
> > > -	    crtc_state->active) {
> > > -		DRM_DEBUG_ATOMIC("[CONNECTOR:%d:%s] is not registered\n",
> > > -				 connector->base.id, connector->name);
> > > -		return -EINVAL;
> > > -	}
> > > -
> > >  	funcs = connector->helper_private;
> > >  
> > >  	if (funcs->atomic_best_encoder)
> > > @@ -390,6 +363,8 @@ update_connector_routing(struct drm_atomic_state *state,
> > >  
> > >  	set_best_encoder(state, new_connector_state, new_encoder);
> > >  
> > > +	crtc_state = drm_atomic_get_new_crtc_state(state,
> > > +						   new_connector_state->crtc);
> > >  	crtc_state->connectors_changed = true;
> > >  
> > >  	DRM_DEBUG_ATOMIC("[CONNECTOR:%d:%s] using [ENCODER:%d:%s] on [CRTC:%d:%s]\n",
> > > -- 
> > > 2.17.1
> > > 
> > 
> > -- 
> > Daniel Vetter
> > Software Engineer, Intel Corporation
> > http://blog.ffwll.ch
Imre Deak May 20, 2019, 8:06 p.m. UTC | #4
On Mon, May 20, 2019 at 09:23:00PM +0200, Daniel Vetter wrote:
> On Mon, May 20, 2019 at 10:09:24PM +0300, Imre Deak wrote:
> > On Mon, May 20, 2019 at 08:37:46PM +0200, Daniel Vetter wrote:
> > > On Mon, May 20, 2019 at 08:41:09PM +0300, Imre Deak wrote:
> > > > We allowed modesetting an unregistered connector only in the case the
> > > > mode is getting disabled on the connector.
> > > > 
> > > > The reason for this check was the lack of proper refcounting for the
> > > > backing memory objects. That problem has been solved meanwhile so there
> > > > is no reason any more to reject the modesetting in general.
> > > 
> > > I'm not parsing this at all ... maybe references to the commits that fix
> > > this? Or do you mean the refcounting work for all the things hanging of
> > > connectors, including the entire mst tree?
> > 
> > Yes the check was added to solve the issue related to the removal of MST
> > connectors that could happen asynchronously wrt. a modeset referring to
> > that MST connector.  That could happen since the MST core doesn't hold
> > any locks (for instance the connection_mutex) during removing an MST
> > connector that would prevent doing a modeset at the same time.
> > 
> > Adding the refcounting for such MST connectors (via the
> > drm_connector_get()/drm_connector_put()) got rid of the above problem.
> 
> We added the check way after that stuff landed. Before all the connector
> reworking connectors were forcefully disabled by the kernel. The idea
> behind this check is to make sure that that userspace notices a connector
> is gone (only thing that's not allowed is enabling it, you can keep
> pageflipping). I think we've always had behaviour like ever since mst (all
> userspace has some "oops mst connector probably gone" failure catching
> around modesets).

Right, pageflipping works.

> So no idea what all blows up if we stop catching userspace this way.
> 
> Now very much possible I'm getting all this wrong again or missing
> something, this stuff is often way over my head. But I'm really vary of
> breaking userspace here. E.g. just the drm_connector_get/put lifetime
> changes results in some userspace breaking if you unplug/replug fast
> enough, because then it doesn't notice the connector change anymore. I
> couldn't figure out a way to paper over that regression without
> reintroduce the rather broken and oops-prone old connector lifetime
> management.

Yes, but what is the the actual description of the failing scenario? I
can't see how anything can go wrong without this check. The resume time
restoration modeset may have to act in the same way on an old connector.

I don't understand how userspace would not notice the connector change.
It will get a hotplug uevent in response to which it would have to do a
detect which returns to it the updated information about the new MST
connector tree.

> > > > The check
> > > > for that also makes driver internal modesets more cumbersome where we
> > > > need to add exemptions for the cases where we do need to allow the
> > > > modeset even for unregistered connectors. One such case is the
> > > > restoration of the mode during resume.
> > > 
> > > Yeah this one actually makes sense to me. We could still keep this check
> > > here, but for the atomic ioctl only when called from userspace. But iirc
> > > Lyude also said she has some plans here, so no idea whether that all fits.
> > > 
> > > > Simplify things by removing the unneeded check. I can't see how
> > > > modesetting an unregistered connector can cause any problem and the race
> > > > (described in the code comment) can anyway result in such a modeset (if
> > > > the connector is unregistered right after the check).
> > > 
> > > Not saying we don't need this, but there's fairly enormous amounts of
> > > history behind all this stuff, and lots of discussions. Would be good to
> > > at least reference those, so we have a good story for when this then all
> > > goes wrong again.
> > 
> > I still don't see why this check is needed. There is no justification
> > for it - besides the original reason for it as discussed above about the
> > refcounting problem, which is solved now - so I think we should remove
> > it, instead of just making it a special case for the user space modeset.
> > 
> > As I wrote a user space modeset can end up anyway doing a modeset on an
> > unregistered connector when the unregistering - by MST core - happens just
> > right after the check.
> 
> Yup. Always been like that.
> -Daniel
> 
> > 
> > > -Daniel
> > > 
> > > > 
> > > > Cc: Lyude Paul <lyude@redhat.com>
> > > > Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> > > > Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > > > Signed-off-by: Imre Deak <imre.deak@intel.com>
> > > > ---
> > > >  drivers/gpu/drm/drm_atomic_helper.c | 29 ++---------------------------
> > > >  1 file changed, 2 insertions(+), 27 deletions(-)
> > > > 
> > > > diff --git a/drivers/gpu/drm/drm_atomic_helper.c b/drivers/gpu/drm/drm_atomic_helper.c
> > > > index 2e0cb4246cbd..e94e69483498 100644
> > > > --- a/drivers/gpu/drm/drm_atomic_helper.c
> > > > +++ b/drivers/gpu/drm/drm_atomic_helper.c
> > > > @@ -319,33 +319,6 @@ update_connector_routing(struct drm_atomic_state *state,
> > > >  		return 0;
> > > >  	}
> > > >  
> > > > -	crtc_state = drm_atomic_get_new_crtc_state(state,
> > > > -						   new_connector_state->crtc);
> > > > -	/*
> > > > -	 * For compatibility with legacy users, we want to make sure that
> > > > -	 * we allow DPMS On->Off modesets on unregistered connectors. Modesets
> > > > -	 * which would result in anything else must be considered invalid, to
> > > > -	 * avoid turning on new displays on dead connectors.
> > > > -	 *
> > > > -	 * Since the connector can be unregistered at any point during an
> > > > -	 * atomic check or commit, this is racy. But that's OK: all we care
> > > > -	 * about is ensuring that userspace can't do anything but shut off the
> > > > -	 * display on a connector that was destroyed after it's been notified,
> > > > -	 * not before.
> > > > -	 *
> > > > -	 * Additionally, we also want to ignore connector registration when
> > > > -	 * we're trying to restore an atomic state during system resume since
> > > > -	 * there's a chance the connector may have been destroyed during the
> > > > -	 * process, but it's better to ignore that then cause
> > > > -	 * drm_atomic_helper_resume() to fail.
> > > > -	 */
> > > > -	if (!state->duplicated && drm_connector_is_unregistered(connector) &&
> > > > -	    crtc_state->active) {
> > > > -		DRM_DEBUG_ATOMIC("[CONNECTOR:%d:%s] is not registered\n",
> > > > -				 connector->base.id, connector->name);
> > > > -		return -EINVAL;
> > > > -	}
> > > > -
> > > >  	funcs = connector->helper_private;
> > > >  
> > > >  	if (funcs->atomic_best_encoder)
> > > > @@ -390,6 +363,8 @@ update_connector_routing(struct drm_atomic_state *state,
> > > >  
> > > >  	set_best_encoder(state, new_connector_state, new_encoder);
> > > >  
> > > > +	crtc_state = drm_atomic_get_new_crtc_state(state,
> > > > +						   new_connector_state->crtc);
> > > >  	crtc_state->connectors_changed = true;
> > > >  
> > > >  	DRM_DEBUG_ATOMIC("[CONNECTOR:%d:%s] using [ENCODER:%d:%s] on [CRTC:%d:%s]\n",
> > > > -- 
> > > > 2.17.1
> > > > 
> > > 
> > > -- 
> > > Daniel Vetter
> > > Software Engineer, Intel Corporation
> > > http://blog.ffwll.ch
> 
> -- 
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch
Daniel Vetter May 20, 2019, 8:15 p.m. UTC | #5
On Mon, May 20, 2019 at 10:06 PM Imre Deak <imre.deak@intel.com> wrote:
> On Mon, May 20, 2019 at 09:23:00PM +0200, Daniel Vetter wrote:
> > On Mon, May 20, 2019 at 10:09:24PM +0300, Imre Deak wrote:
> > > On Mon, May 20, 2019 at 08:37:46PM +0200, Daniel Vetter wrote:
> > > > On Mon, May 20, 2019 at 08:41:09PM +0300, Imre Deak wrote:
> > > > > We allowed modesetting an unregistered connector only in the case the
> > > > > mode is getting disabled on the connector.
> > > > >
> > > > > The reason for this check was the lack of proper refcounting for the
> > > > > backing memory objects. That problem has been solved meanwhile so there
> > > > > is no reason any more to reject the modesetting in general.
> > > >
> > > > I'm not parsing this at all ... maybe references to the commits that fix
> > > > this? Or do you mean the refcounting work for all the things hanging of
> > > > connectors, including the entire mst tree?
> > >
> > > Yes the check was added to solve the issue related to the removal of MST
> > > connectors that could happen asynchronously wrt. a modeset referring to
> > > that MST connector.  That could happen since the MST core doesn't hold
> > > any locks (for instance the connection_mutex) during removing an MST
> > > connector that would prevent doing a modeset at the same time.
> > >
> > > Adding the refcounting for such MST connectors (via the
> > > drm_connector_get()/drm_connector_put()) got rid of the above problem.
> >
> > We added the check way after that stuff landed. Before all the connector
> > reworking connectors were forcefully disabled by the kernel. The idea
> > behind this check is to make sure that that userspace notices a connector
> > is gone (only thing that's not allowed is enabling it, you can keep
> > pageflipping). I think we've always had behaviour like ever since mst (all
> > userspace has some "oops mst connector probably gone" failure catching
> > around modesets).
>
> Right, pageflipping works.
>
> > So no idea what all blows up if we stop catching userspace this way.
> >
> > Now very much possible I'm getting all this wrong again or missing
> > something, this stuff is often way over my head. But I'm really vary of
> > breaking userspace here. E.g. just the drm_connector_get/put lifetime
> > changes results in some userspace breaking if you unplug/replug fast
> > enough, because then it doesn't notice the connector change anymore. I
> > couldn't figure out a way to paper over that regression without
> > reintroduce the rather broken and oops-prone old connector lifetime
> > management.
>
> Yes, but what is the the actual description of the failing scenario? I
> can't see how anything can go wrong without this check. The resume time
> restoration modeset may have to act in the same way on an old connector.

That's what the !state->duplicated thing is meant to check for btw.

> I don't understand how userspace would not notice the connector change.
> It will get a hotplug uevent in response to which it would have to do a
> detect which returns to it the updated information about the new MST
> connector tree.

uevent handling can take positively forever, at which point the new
connector could already be plugged in, and then userspace makes a mess
aliasing the two since the path property matches. Just needs a wobbly
cable. This is the issue with the "fixed" lifetime management, and I'm
wary of breaking more stuff.

Other way round: What do we gain if userspace is allowed to turn on a
connector again which doesn't exist and will not ever show any pixels?
I'm not seeing a benefit here in allowing that. And history says we
change something around mst handling, it'll break something somewhere.
-Daniel

>
> > > > > The check
> > > > > for that also makes driver internal modesets more cumbersome where we
> > > > > need to add exemptions for the cases where we do need to allow the
> > > > > modeset even for unregistered connectors. One such case is the
> > > > > restoration of the mode during resume.
> > > >
> > > > Yeah this one actually makes sense to me. We could still keep this check
> > > > here, but for the atomic ioctl only when called from userspace. But iirc
> > > > Lyude also said she has some plans here, so no idea whether that all fits.
> > > >
> > > > > Simplify things by removing the unneeded check. I can't see how
> > > > > modesetting an unregistered connector can cause any problem and the race
> > > > > (described in the code comment) can anyway result in such a modeset (if
> > > > > the connector is unregistered right after the check).
> > > >
> > > > Not saying we don't need this, but there's fairly enormous amounts of
> > > > history behind all this stuff, and lots of discussions. Would be good to
> > > > at least reference those, so we have a good story for when this then all
> > > > goes wrong again.
> > >
> > > I still don't see why this check is needed. There is no justification
> > > for it - besides the original reason for it as discussed above about the
> > > refcounting problem, which is solved now - so I think we should remove
> > > it, instead of just making it a special case for the user space modeset.
> > >
> > > As I wrote a user space modeset can end up anyway doing a modeset on an
> > > unregistered connector when the unregistering - by MST core - happens just
> > > right after the check.
> >
> > Yup. Always been like that.
> > -Daniel
> >
> > >
> > > > -Daniel
> > > >
> > > > >
> > > > > Cc: Lyude Paul <lyude@redhat.com>
> > > > > Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> > > > > Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > > > > Signed-off-by: Imre Deak <imre.deak@intel.com>
> > > > > ---
> > > > >  drivers/gpu/drm/drm_atomic_helper.c | 29 ++---------------------------
> > > > >  1 file changed, 2 insertions(+), 27 deletions(-)
> > > > >
> > > > > diff --git a/drivers/gpu/drm/drm_atomic_helper.c b/drivers/gpu/drm/drm_atomic_helper.c
> > > > > index 2e0cb4246cbd..e94e69483498 100644
> > > > > --- a/drivers/gpu/drm/drm_atomic_helper.c
> > > > > +++ b/drivers/gpu/drm/drm_atomic_helper.c
> > > > > @@ -319,33 +319,6 @@ update_connector_routing(struct drm_atomic_state *state,
> > > > >                 return 0;
> > > > >         }
> > > > >
> > > > > -       crtc_state = drm_atomic_get_new_crtc_state(state,
> > > > > -                                                  new_connector_state->crtc);
> > > > > -       /*
> > > > > -        * For compatibility with legacy users, we want to make sure that
> > > > > -        * we allow DPMS On->Off modesets on unregistered connectors. Modesets
> > > > > -        * which would result in anything else must be considered invalid, to
> > > > > -        * avoid turning on new displays on dead connectors.
> > > > > -        *
> > > > > -        * Since the connector can be unregistered at any point during an
> > > > > -        * atomic check or commit, this is racy. But that's OK: all we care
> > > > > -        * about is ensuring that userspace can't do anything but shut off the
> > > > > -        * display on a connector that was destroyed after it's been notified,
> > > > > -        * not before.
> > > > > -        *
> > > > > -        * Additionally, we also want to ignore connector registration when
> > > > > -        * we're trying to restore an atomic state during system resume since
> > > > > -        * there's a chance the connector may have been destroyed during the
> > > > > -        * process, but it's better to ignore that then cause
> > > > > -        * drm_atomic_helper_resume() to fail.
> > > > > -        */
> > > > > -       if (!state->duplicated && drm_connector_is_unregistered(connector) &&
> > > > > -           crtc_state->active) {
> > > > > -               DRM_DEBUG_ATOMIC("[CONNECTOR:%d:%s] is not registered\n",
> > > > > -                                connector->base.id, connector->name);
> > > > > -               return -EINVAL;
> > > > > -       }
> > > > > -
> > > > >         funcs = connector->helper_private;
> > > > >
> > > > >         if (funcs->atomic_best_encoder)
> > > > > @@ -390,6 +363,8 @@ update_connector_routing(struct drm_atomic_state *state,
> > > > >
> > > > >         set_best_encoder(state, new_connector_state, new_encoder);
> > > > >
> > > > > +       crtc_state = drm_atomic_get_new_crtc_state(state,
> > > > > +                                                  new_connector_state->crtc);
> > > > >         crtc_state->connectors_changed = true;
> > > > >
> > > > >         DRM_DEBUG_ATOMIC("[CONNECTOR:%d:%s] using [ENCODER:%d:%s] on [CRTC:%d:%s]\n",
> > > > > --
> > > > > 2.17.1
> > > > >
> > > >
> > > > --
> > > > Daniel Vetter
> > > > Software Engineer, Intel Corporation
> > > > http://blog.ffwll.ch
> >
> > --
> > Daniel Vetter
> > Software Engineer, Intel Corporation
> > http://blog.ffwll.ch
Imre Deak May 20, 2019, 8:50 p.m. UTC | #6
On Mon, May 20, 2019 at 10:15:20PM +0200, Daniel Vetter wrote:
> On Mon, May 20, 2019 at 10:06 PM Imre Deak <imre.deak@intel.com> wrote:
> > On Mon, May 20, 2019 at 09:23:00PM +0200, Daniel Vetter wrote:
> > > On Mon, May 20, 2019 at 10:09:24PM +0300, Imre Deak wrote:
> > > > On Mon, May 20, 2019 at 08:37:46PM +0200, Daniel Vetter wrote:
> > > > > On Mon, May 20, 2019 at 08:41:09PM +0300, Imre Deak wrote:
> > > > > > We allowed modesetting an unregistered connector only in the case the
> > > > > > mode is getting disabled on the connector.
> > > > > >
> > > > > > The reason for this check was the lack of proper refcounting for the
> > > > > > backing memory objects. That problem has been solved meanwhile so there
> > > > > > is no reason any more to reject the modesetting in general.
> > > > >
> > > > > I'm not parsing this at all ... maybe references to the commits that fix
> > > > > this? Or do you mean the refcounting work for all the things hanging of
> > > > > connectors, including the entire mst tree?
> > > >
> > > > Yes the check was added to solve the issue related to the removal of MST
> > > > connectors that could happen asynchronously wrt. a modeset referring to
> > > > that MST connector.  That could happen since the MST core doesn't hold
> > > > any locks (for instance the connection_mutex) during removing an MST
> > > > connector that would prevent doing a modeset at the same time.
> > > >
> > > > Adding the refcounting for such MST connectors (via the
> > > > drm_connector_get()/drm_connector_put()) got rid of the above problem.
> > >
> > > We added the check way after that stuff landed. Before all the connector
> > > reworking connectors were forcefully disabled by the kernel. The idea
> > > behind this check is to make sure that that userspace notices a connector
> > > is gone (only thing that's not allowed is enabling it, you can keep
> > > pageflipping). I think we've always had behaviour like ever since mst (all
> > > userspace has some "oops mst connector probably gone" failure catching
> > > around modesets).
> >
> > Right, pageflipping works.
> >
> > > So no idea what all blows up if we stop catching userspace this way.
> > >
> > > Now very much possible I'm getting all this wrong again or missing
> > > something, this stuff is often way over my head. But I'm really vary of
> > > breaking userspace here. E.g. just the drm_connector_get/put lifetime
> > > changes results in some userspace breaking if you unplug/replug fast
> > > enough, because then it doesn't notice the connector change anymore. I
> > > couldn't figure out a way to paper over that regression without
> > > reintroduce the rather broken and oops-prone old connector lifetime
> > > management.
> >
> > Yes, but what is the the actual description of the failing scenario? I
> > can't see how anything can go wrong without this check. The resume time
> > restoration modeset may have to act in the same way on an old connector.
> 
> That's what the !state->duplicated thing is meant to check for btw.
> 
> > I don't understand how userspace would not notice the connector change.
> > It will get a hotplug uevent in response to which it would have to do a
> > detect which returns to it the updated information about the new MST
> > connector tree.
> 
> uevent handling can take positively forever, at which point the new
> connector could already be plugged in, and then userspace makes a mess
> aliasing the two since the path property matches. Just needs a wobbly
> cable. This is the issue with the "fixed" lifetime management, and I'm
> wary of breaking more stuff.

Yea, processing can be delayed arbitrarily, the corresponding
GETRESOURCES/GETCONNECTOR should still return to the userspace the
correct information to do right thing, even if the path property
matches: the connector ID for the old and new connector will be
different and the GETCONNECTOR for the old ID will return properly that
this old connector is already disconnected (see intel_dp_mst_detect()).
So I don't see how userspace could mess up things. Obviously if user
space is buggy, then well, it is just buggy and then that user space bug
would need to be fixed instead of papering over such problems in the
kernel.

> Other way round: What do we gain if userspace is allowed to turn on a
> connector again which doesn't exist and will not ever show any pixels?
> I'm not seeing a benefit here in allowing that. And history says we
> change something around mst handling, it'll break something somewhere.

Let's then describe the actual reason for this check, that description
is missing.

We should remove this check to simplify things if there isn't an actual
need for it. Userspace can do already a modeset on connectors in general
that are not in a connected state. So this special casing - for MST
connectors only - is a bad idea if there is no reason for special casing
it. If it's there to paper over some user space bug that user space bug
should be fixed instead of adding this special casing which adds
complexity to the driver.

The only justification for a check in the driver if something could blow
up in the kernel without that check. Nothing can blow up in the kernel
without this check. Every other failure scenario should be handled/fixed
in userspace.

> -Daniel
> 
> >
> > > > > > The check
> > > > > > for that also makes driver internal modesets more cumbersome where we
> > > > > > need to add exemptions for the cases where we do need to allow the
> > > > > > modeset even for unregistered connectors. One such case is the
> > > > > > restoration of the mode during resume.
> > > > >
> > > > > Yeah this one actually makes sense to me. We could still keep this check
> > > > > here, but for the atomic ioctl only when called from userspace. But iirc
> > > > > Lyude also said she has some plans here, so no idea whether that all fits.
> > > > >
> > > > > > Simplify things by removing the unneeded check. I can't see how
> > > > > > modesetting an unregistered connector can cause any problem and the race
> > > > > > (described in the code comment) can anyway result in such a modeset (if
> > > > > > the connector is unregistered right after the check).
> > > > >
> > > > > Not saying we don't need this, but there's fairly enormous amounts of
> > > > > history behind all this stuff, and lots of discussions. Would be good to
> > > > > at least reference those, so we have a good story for when this then all
> > > > > goes wrong again.
> > > >
> > > > I still don't see why this check is needed. There is no justification
> > > > for it - besides the original reason for it as discussed above about the
> > > > refcounting problem, which is solved now - so I think we should remove
> > > > it, instead of just making it a special case for the user space modeset.
> > > >
> > > > As I wrote a user space modeset can end up anyway doing a modeset on an
> > > > unregistered connector when the unregistering - by MST core - happens just
> > > > right after the check.
> > >
> > > Yup. Always been like that.
> > > -Daniel
> > >
> > > >
> > > > > -Daniel
> > > > >
> > > > > >
> > > > > > Cc: Lyude Paul <lyude@redhat.com>
> > > > > > Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> > > > > > Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > > > > > Signed-off-by: Imre Deak <imre.deak@intel.com>
> > > > > > ---
> > > > > >  drivers/gpu/drm/drm_atomic_helper.c | 29 ++---------------------------
> > > > > >  1 file changed, 2 insertions(+), 27 deletions(-)
> > > > > >
> > > > > > diff --git a/drivers/gpu/drm/drm_atomic_helper.c b/drivers/gpu/drm/drm_atomic_helper.c
> > > > > > index 2e0cb4246cbd..e94e69483498 100644
> > > > > > --- a/drivers/gpu/drm/drm_atomic_helper.c
> > > > > > +++ b/drivers/gpu/drm/drm_atomic_helper.c
> > > > > > @@ -319,33 +319,6 @@ update_connector_routing(struct drm_atomic_state *state,
> > > > > >                 return 0;
> > > > > >         }
> > > > > >
> > > > > > -       crtc_state = drm_atomic_get_new_crtc_state(state,
> > > > > > -                                                  new_connector_state->crtc);
> > > > > > -       /*
> > > > > > -        * For compatibility with legacy users, we want to make sure that
> > > > > > -        * we allow DPMS On->Off modesets on unregistered connectors. Modesets
> > > > > > -        * which would result in anything else must be considered invalid, to
> > > > > > -        * avoid turning on new displays on dead connectors.
> > > > > > -        *
> > > > > > -        * Since the connector can be unregistered at any point during an
> > > > > > -        * atomic check or commit, this is racy. But that's OK: all we care
> > > > > > -        * about is ensuring that userspace can't do anything but shut off the
> > > > > > -        * display on a connector that was destroyed after it's been notified,
> > > > > > -        * not before.
> > > > > > -        *
> > > > > > -        * Additionally, we also want to ignore connector registration when
> > > > > > -        * we're trying to restore an atomic state during system resume since
> > > > > > -        * there's a chance the connector may have been destroyed during the
> > > > > > -        * process, but it's better to ignore that then cause
> > > > > > -        * drm_atomic_helper_resume() to fail.
> > > > > > -        */
> > > > > > -       if (!state->duplicated && drm_connector_is_unregistered(connector) &&
> > > > > > -           crtc_state->active) {
> > > > > > -               DRM_DEBUG_ATOMIC("[CONNECTOR:%d:%s] is not registered\n",
> > > > > > -                                connector->base.id, connector->name);
> > > > > > -               return -EINVAL;
> > > > > > -       }
> > > > > > -
> > > > > >         funcs = connector->helper_private;
> > > > > >
> > > > > >         if (funcs->atomic_best_encoder)
> > > > > > @@ -390,6 +363,8 @@ update_connector_routing(struct drm_atomic_state *state,
> > > > > >
> > > > > >         set_best_encoder(state, new_connector_state, new_encoder);
> > > > > >
> > > > > > +       crtc_state = drm_atomic_get_new_crtc_state(state,
> > > > > > +                                                  new_connector_state->crtc);
> > > > > >         crtc_state->connectors_changed = true;
> > > > > >
> > > > > >         DRM_DEBUG_ATOMIC("[CONNECTOR:%d:%s] using [ENCODER:%d:%s] on [CRTC:%d:%s]\n",
> > > > > > --
> > > > > > 2.17.1
> > > > > >
> > > > >
> > > > > --
> > > > > Daniel Vetter
> > > > > Software Engineer, Intel Corporation
> > > > > http://blog.ffwll.ch
> > >
> > > --
> > > Daniel Vetter
> > > Software Engineer, Intel Corporation
> > > http://blog.ffwll.ch
> 
> 
> 
> -- 
> Daniel Vetter
> Software Engineer, Intel Corporation
> +41 (0) 79 365 57 48 - http://blog.ffwll.ch
Daniel Vetter May 20, 2019, 8:59 p.m. UTC | #7
On Mon, May 20, 2019 at 10:51 PM Imre Deak <imre.deak@intel.com> wrote:
>
> On Mon, May 20, 2019 at 10:15:20PM +0200, Daniel Vetter wrote:
> > On Mon, May 20, 2019 at 10:06 PM Imre Deak <imre.deak@intel.com> wrote:
> > > On Mon, May 20, 2019 at 09:23:00PM +0200, Daniel Vetter wrote:
> > > > On Mon, May 20, 2019 at 10:09:24PM +0300, Imre Deak wrote:
> > > > > On Mon, May 20, 2019 at 08:37:46PM +0200, Daniel Vetter wrote:
> > > > > > On Mon, May 20, 2019 at 08:41:09PM +0300, Imre Deak wrote:
> > > > > > > We allowed modesetting an unregistered connector only in the case the
> > > > > > > mode is getting disabled on the connector.
> > > > > > >
> > > > > > > The reason for this check was the lack of proper refcounting for the
> > > > > > > backing memory objects. That problem has been solved meanwhile so there
> > > > > > > is no reason any more to reject the modesetting in general.
> > > > > >
> > > > > > I'm not parsing this at all ... maybe references to the commits that fix
> > > > > > this? Or do you mean the refcounting work for all the things hanging of
> > > > > > connectors, including the entire mst tree?
> > > > >
> > > > > Yes the check was added to solve the issue related to the removal of MST
> > > > > connectors that could happen asynchronously wrt. a modeset referring to
> > > > > that MST connector.  That could happen since the MST core doesn't hold
> > > > > any locks (for instance the connection_mutex) during removing an MST
> > > > > connector that would prevent doing a modeset at the same time.
> > > > >
> > > > > Adding the refcounting for such MST connectors (via the
> > > > > drm_connector_get()/drm_connector_put()) got rid of the above problem.
> > > >
> > > > We added the check way after that stuff landed. Before all the connector
> > > > reworking connectors were forcefully disabled by the kernel. The idea
> > > > behind this check is to make sure that that userspace notices a connector
> > > > is gone (only thing that's not allowed is enabling it, you can keep
> > > > pageflipping). I think we've always had behaviour like ever since mst (all
> > > > userspace has some "oops mst connector probably gone" failure catching
> > > > around modesets).
> > >
> > > Right, pageflipping works.
> > >
> > > > So no idea what all blows up if we stop catching userspace this way.
> > > >
> > > > Now very much possible I'm getting all this wrong again or missing
> > > > something, this stuff is often way over my head. But I'm really vary of
> > > > breaking userspace here. E.g. just the drm_connector_get/put lifetime
> > > > changes results in some userspace breaking if you unplug/replug fast
> > > > enough, because then it doesn't notice the connector change anymore. I
> > > > couldn't figure out a way to paper over that regression without
> > > > reintroduce the rather broken and oops-prone old connector lifetime
> > > > management.
> > >
> > > Yes, but what is the the actual description of the failing scenario? I
> > > can't see how anything can go wrong without this check. The resume time
> > > restoration modeset may have to act in the same way on an old connector.
> >
> > That's what the !state->duplicated thing is meant to check for btw.
> >
> > > I don't understand how userspace would not notice the connector change.
> > > It will get a hotplug uevent in response to which it would have to do a
> > > detect which returns to it the updated information about the new MST
> > > connector tree.
> >
> > uevent handling can take positively forever, at which point the new
> > connector could already be plugged in, and then userspace makes a mess
> > aliasing the two since the path property matches. Just needs a wobbly
> > cable. This is the issue with the "fixed" lifetime management, and I'm
> > wary of breaking more stuff.
>
> Yea, processing can be delayed arbitrarily, the corresponding
> GETRESOURCES/GETCONNECTOR should still return to the userspace the
> correct information to do right thing, even if the path property
> matches: the connector ID for the old and new connector will be
> different and the GETCONNECTOR for the old ID will return properly that
> this old connector is already disconnected (see intel_dp_mst_detect()).
> So I don't see how userspace could mess up things. Obviously if user
> space is buggy, then well, it is just buggy and then that user space bug
> would need to be fixed instead of papering over such problems in the
> kernel.

Uh that's not how this "kernel change caused a regression" thing
works. Dave just dropped a rant about this and i915 recently ...

> > Other way round: What do we gain if userspace is allowed to turn on a
> > connector again which doesn't exist and will not ever show any pixels?
> > I'm not seeing a benefit here in allowing that. And history says we
> > change something around mst handling, it'll break something somewhere.
>
> Let's then describe the actual reason for this check, that description
> is missing.
>
> We should remove this check to simplify things if there isn't an actual
> need for it. Userspace can do already a modeset on connectors in general
> that are not in a connected state. So this special casing - for MST
> connectors only - is a bad idea if there is no reason for special casing
> it. If it's there to paper over some user space bug that user space bug
> should be fixed instead of adding this special casing which adds
> complexity to the driver.
>
> The only justification for a check in the driver if something could blow
> up in the kernel without that check. Nothing can blow up in the kernel
> without this check. Every other failure scenario should be handled/fixed
> in userspace.

So I thought you need this because it's a pain on resume, but that
should be handled with the state->duplicated check. I guess I still
don't understand why you want to nuke this check, it seemed like it's
getting in the way somewhere.

If it's not getting in the way then imo let's not touch stuff, I'm too
much burned child in this area.
-Daniel

> > -Daniel
> >
> > >
> > > > > > > The check
> > > > > > > for that also makes driver internal modesets more cumbersome where we
> > > > > > > need to add exemptions for the cases where we do need to allow the
> > > > > > > modeset even for unregistered connectors. One such case is the
> > > > > > > restoration of the mode during resume.
> > > > > >
> > > > > > Yeah this one actually makes sense to me. We could still keep this check
> > > > > > here, but for the atomic ioctl only when called from userspace. But iirc
> > > > > > Lyude also said she has some plans here, so no idea whether that all fits.
> > > > > >
> > > > > > > Simplify things by removing the unneeded check. I can't see how
> > > > > > > modesetting an unregistered connector can cause any problem and the race
> > > > > > > (described in the code comment) can anyway result in such a modeset (if
> > > > > > > the connector is unregistered right after the check).
> > > > > >
> > > > > > Not saying we don't need this, but there's fairly enormous amounts of
> > > > > > history behind all this stuff, and lots of discussions. Would be good to
> > > > > > at least reference those, so we have a good story for when this then all
> > > > > > goes wrong again.
> > > > >
> > > > > I still don't see why this check is needed. There is no justification
> > > > > for it - besides the original reason for it as discussed above about the
> > > > > refcounting problem, which is solved now - so I think we should remove
> > > > > it, instead of just making it a special case for the user space modeset.
> > > > >
> > > > > As I wrote a user space modeset can end up anyway doing a modeset on an
> > > > > unregistered connector when the unregistering - by MST core - happens just
> > > > > right after the check.
> > > >
> > > > Yup. Always been like that.
> > > > -Daniel
> > > >
> > > > >
> > > > > > -Daniel
> > > > > >
> > > > > > >
> > > > > > > Cc: Lyude Paul <lyude@redhat.com>
> > > > > > > Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> > > > > > > Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > > > > > > Signed-off-by: Imre Deak <imre.deak@intel.com>
> > > > > > > ---
> > > > > > >  drivers/gpu/drm/drm_atomic_helper.c | 29 ++---------------------------
> > > > > > >  1 file changed, 2 insertions(+), 27 deletions(-)
> > > > > > >
> > > > > > > diff --git a/drivers/gpu/drm/drm_atomic_helper.c b/drivers/gpu/drm/drm_atomic_helper.c
> > > > > > > index 2e0cb4246cbd..e94e69483498 100644
> > > > > > > --- a/drivers/gpu/drm/drm_atomic_helper.c
> > > > > > > +++ b/drivers/gpu/drm/drm_atomic_helper.c
> > > > > > > @@ -319,33 +319,6 @@ update_connector_routing(struct drm_atomic_state *state,
> > > > > > >                 return 0;
> > > > > > >         }
> > > > > > >
> > > > > > > -       crtc_state = drm_atomic_get_new_crtc_state(state,
> > > > > > > -                                                  new_connector_state->crtc);
> > > > > > > -       /*
> > > > > > > -        * For compatibility with legacy users, we want to make sure that
> > > > > > > -        * we allow DPMS On->Off modesets on unregistered connectors. Modesets
> > > > > > > -        * which would result in anything else must be considered invalid, to
> > > > > > > -        * avoid turning on new displays on dead connectors.
> > > > > > > -        *
> > > > > > > -        * Since the connector can be unregistered at any point during an
> > > > > > > -        * atomic check or commit, this is racy. But that's OK: all we care
> > > > > > > -        * about is ensuring that userspace can't do anything but shut off the
> > > > > > > -        * display on a connector that was destroyed after it's been notified,
> > > > > > > -        * not before.
> > > > > > > -        *
> > > > > > > -        * Additionally, we also want to ignore connector registration when
> > > > > > > -        * we're trying to restore an atomic state during system resume since
> > > > > > > -        * there's a chance the connector may have been destroyed during the
> > > > > > > -        * process, but it's better to ignore that then cause
> > > > > > > -        * drm_atomic_helper_resume() to fail.
> > > > > > > -        */
> > > > > > > -       if (!state->duplicated && drm_connector_is_unregistered(connector) &&
> > > > > > > -           crtc_state->active) {
> > > > > > > -               DRM_DEBUG_ATOMIC("[CONNECTOR:%d:%s] is not registered\n",
> > > > > > > -                                connector->base.id, connector->name);
> > > > > > > -               return -EINVAL;
> > > > > > > -       }
> > > > > > > -
> > > > > > >         funcs = connector->helper_private;
> > > > > > >
> > > > > > >         if (funcs->atomic_best_encoder)
> > > > > > > @@ -390,6 +363,8 @@ update_connector_routing(struct drm_atomic_state *state,
> > > > > > >
> > > > > > >         set_best_encoder(state, new_connector_state, new_encoder);
> > > > > > >
> > > > > > > +       crtc_state = drm_atomic_get_new_crtc_state(state,
> > > > > > > +                                                  new_connector_state->crtc);
> > > > > > >         crtc_state->connectors_changed = true;
> > > > > > >
> > > > > > >         DRM_DEBUG_ATOMIC("[CONNECTOR:%d:%s] using [ENCODER:%d:%s] on [CRTC:%d:%s]\n",
> > > > > > > --
> > > > > > > 2.17.1
> > > > > > >
> > > > > >
> > > > > > --
> > > > > > Daniel Vetter
> > > > > > Software Engineer, Intel Corporation
> > > > > > http://blog.ffwll.ch
> > > >
> > > > --
> > > > Daniel Vetter
> > > > Software Engineer, Intel Corporation
> > > > http://blog.ffwll.ch
> >
> >
> >
> > --
> > Daniel Vetter
> > Software Engineer, Intel Corporation
> > +41 (0) 79 365 57 48 - http://blog.ffwll.ch
Imre Deak May 20, 2019, 9:16 p.m. UTC | #8
On Mon, May 20, 2019 at 10:59:35PM +0200, Daniel Vetter wrote:
> On Mon, May 20, 2019 at 10:51 PM Imre Deak <imre.deak@intel.com> wrote:
> >
> > On Mon, May 20, 2019 at 10:15:20PM +0200, Daniel Vetter wrote:
> > > On Mon, May 20, 2019 at 10:06 PM Imre Deak <imre.deak@intel.com> wrote:
> > > > On Mon, May 20, 2019 at 09:23:00PM +0200, Daniel Vetter wrote:
> > > > > On Mon, May 20, 2019 at 10:09:24PM +0300, Imre Deak wrote:
> > > > > > On Mon, May 20, 2019 at 08:37:46PM +0200, Daniel Vetter wrote:
> > > > > > > On Mon, May 20, 2019 at 08:41:09PM +0300, Imre Deak wrote:
> > > > > > > > We allowed modesetting an unregistered connector only in the case the
> > > > > > > > mode is getting disabled on the connector.
> > > > > > > >
> > > > > > > > The reason for this check was the lack of proper refcounting for the
> > > > > > > > backing memory objects. That problem has been solved meanwhile so there
> > > > > > > > is no reason any more to reject the modesetting in general.
> > > > > > >
> > > > > > > I'm not parsing this at all ... maybe references to the commits that fix
> > > > > > > this? Or do you mean the refcounting work for all the things hanging of
> > > > > > > connectors, including the entire mst tree?
> > > > > >
> > > > > > Yes the check was added to solve the issue related to the removal of MST
> > > > > > connectors that could happen asynchronously wrt. a modeset referring to
> > > > > > that MST connector.  That could happen since the MST core doesn't hold
> > > > > > any locks (for instance the connection_mutex) during removing an MST
> > > > > > connector that would prevent doing a modeset at the same time.
> > > > > >
> > > > > > Adding the refcounting for such MST connectors (via the
> > > > > > drm_connector_get()/drm_connector_put()) got rid of the above problem.
> > > > >
> > > > > We added the check way after that stuff landed. Before all the connector
> > > > > reworking connectors were forcefully disabled by the kernel. The idea
> > > > > behind this check is to make sure that that userspace notices a connector
> > > > > is gone (only thing that's not allowed is enabling it, you can keep
> > > > > pageflipping). I think we've always had behaviour like ever since mst (all
> > > > > userspace has some "oops mst connector probably gone" failure catching
> > > > > around modesets).
> > > >
> > > > Right, pageflipping works.
> > > >
> > > > > So no idea what all blows up if we stop catching userspace this way.
> > > > >
> > > > > Now very much possible I'm getting all this wrong again or missing
> > > > > something, this stuff is often way over my head. But I'm really vary of
> > > > > breaking userspace here. E.g. just the drm_connector_get/put lifetime
> > > > > changes results in some userspace breaking if you unplug/replug fast
> > > > > enough, because then it doesn't notice the connector change anymore. I
> > > > > couldn't figure out a way to paper over that regression without
> > > > > reintroduce the rather broken and oops-prone old connector lifetime
> > > > > management.
> > > >
> > > > Yes, but what is the the actual description of the failing scenario? I
> > > > can't see how anything can go wrong without this check. The resume time
> > > > restoration modeset may have to act in the same way on an old connector.
> > >
> > > That's what the !state->duplicated thing is meant to check for btw.
> > >
> > > > I don't understand how userspace would not notice the connector change.
> > > > It will get a hotplug uevent in response to which it would have to do a
> > > > detect which returns to it the updated information about the new MST
> > > > connector tree.
> > >
> > > uevent handling can take positively forever, at which point the new
> > > connector could already be plugged in, and then userspace makes a mess
> > > aliasing the two since the path property matches. Just needs a wobbly
> > > cable. This is the issue with the "fixed" lifetime management, and I'm
> > > wary of breaking more stuff.
> >
> > Yea, processing can be delayed arbitrarily, the corresponding
> > GETRESOURCES/GETCONNECTOR should still return to the userspace the
> > correct information to do right thing, even if the path property
> > matches: the connector ID for the old and new connector will be
> > different and the GETCONNECTOR for the old ID will return properly that
> > this old connector is already disconnected (see intel_dp_mst_detect()).
> > So I don't see how userspace could mess up things. Obviously if user
> > space is buggy, then well, it is just buggy and then that user space bug
> > would need to be fixed instead of papering over such problems in the
> > kernel.
> 
> Uh that's not how this "kernel change caused a regression" thing
> works. Dave just dropped a rant about this and i915 recently ...

I still don't see what would depend on the current behavior instead of
acting on the hotplug uevents properly. 

> > > Other way round: What do we gain if userspace is allowed to turn on a
> > > connector again which doesn't exist and will not ever show any pixels?
> > > I'm not seeing a benefit here in allowing that. And history says we
> > > change something around mst handling, it'll break something somewhere.
> >
> > Let's then describe the actual reason for this check, that description
> > is missing.
> >
> > We should remove this check to simplify things if there isn't an actual
> > need for it. Userspace can do already a modeset on connectors in general
> > that are not in a connected state. So this special casing - for MST
> > connectors only - is a bad idea if there is no reason for special casing
> > it. If it's there to paper over some user space bug that user space bug
> > should be fixed instead of adding this special casing which adds
> > complexity to the driver.
> >
> > The only justification for a check in the driver if something could blow
> > up in the kernel without that check. Nothing can blow up in the kernel
> > without this check. Every other failure scenario should be handled/fixed
> > in userspace.
> 
> So I thought you need this because it's a pain on resume, but that
> should be handled with the state->duplicated check. I guess I still
> don't understand why you want to nuke this check, it seemed like it's
> getting in the way somewhere.
> 
> If it's not getting in the way then imo let's not touch stuff, I'm too
> much burned child in this area.

It's a pain to maintain such papering overs in the kernel, that nobody
knows what purpose they serve. What I'm asking here is to preserve this
check here only if there is a purpose for it along with a precise
description of its purpose. If noone can give an explanation for it -
or how it can lead to a regression - then yes, let's remove it to
simplify things.

> -Daniel
> 
> > > -Daniel
> > >
> > > >
> > > > > > > > The check
> > > > > > > > for that also makes driver internal modesets more cumbersome where we
> > > > > > > > need to add exemptions for the cases where we do need to allow the
> > > > > > > > modeset even for unregistered connectors. One such case is the
> > > > > > > > restoration of the mode during resume.
> > > > > > >
> > > > > > > Yeah this one actually makes sense to me. We could still keep this check
> > > > > > > here, but for the atomic ioctl only when called from userspace. But iirc
> > > > > > > Lyude also said she has some plans here, so no idea whether that all fits.
> > > > > > >
> > > > > > > > Simplify things by removing the unneeded check. I can't see how
> > > > > > > > modesetting an unregistered connector can cause any problem and the race
> > > > > > > > (described in the code comment) can anyway result in such a modeset (if
> > > > > > > > the connector is unregistered right after the check).
> > > > > > >
> > > > > > > Not saying we don't need this, but there's fairly enormous amounts of
> > > > > > > history behind all this stuff, and lots of discussions. Would be good to
> > > > > > > at least reference those, so we have a good story for when this then all
> > > > > > > goes wrong again.
> > > > > >
> > > > > > I still don't see why this check is needed. There is no justification
> > > > > > for it - besides the original reason for it as discussed above about the
> > > > > > refcounting problem, which is solved now - so I think we should remove
> > > > > > it, instead of just making it a special case for the user space modeset.
> > > > > >
> > > > > > As I wrote a user space modeset can end up anyway doing a modeset on an
> > > > > > unregistered connector when the unregistering - by MST core - happens just
> > > > > > right after the check.
> > > > >
> > > > > Yup. Always been like that.
> > > > > -Daniel
> > > > >
> > > > > >
> > > > > > > -Daniel
> > > > > > >
> > > > > > > >
> > > > > > > > Cc: Lyude Paul <lyude@redhat.com>
> > > > > > > > Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> > > > > > > > Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > > > > > > > Signed-off-by: Imre Deak <imre.deak@intel.com>
> > > > > > > > ---
> > > > > > > >  drivers/gpu/drm/drm_atomic_helper.c | 29 ++---------------------------
> > > > > > > >  1 file changed, 2 insertions(+), 27 deletions(-)
> > > > > > > >
> > > > > > > > diff --git a/drivers/gpu/drm/drm_atomic_helper.c b/drivers/gpu/drm/drm_atomic_helper.c
> > > > > > > > index 2e0cb4246cbd..e94e69483498 100644
> > > > > > > > --- a/drivers/gpu/drm/drm_atomic_helper.c
> > > > > > > > +++ b/drivers/gpu/drm/drm_atomic_helper.c
> > > > > > > > @@ -319,33 +319,6 @@ update_connector_routing(struct drm_atomic_state *state,
> > > > > > > >                 return 0;
> > > > > > > >         }
> > > > > > > >
> > > > > > > > -       crtc_state = drm_atomic_get_new_crtc_state(state,
> > > > > > > > -                                                  new_connector_state->crtc);
> > > > > > > > -       /*
> > > > > > > > -        * For compatibility with legacy users, we want to make sure that
> > > > > > > > -        * we allow DPMS On->Off modesets on unregistered connectors. Modesets
> > > > > > > > -        * which would result in anything else must be considered invalid, to
> > > > > > > > -        * avoid turning on new displays on dead connectors.
> > > > > > > > -        *
> > > > > > > > -        * Since the connector can be unregistered at any point during an
> > > > > > > > -        * atomic check or commit, this is racy. But that's OK: all we care
> > > > > > > > -        * about is ensuring that userspace can't do anything but shut off the
> > > > > > > > -        * display on a connector that was destroyed after it's been notified,
> > > > > > > > -        * not before.
> > > > > > > > -        *
> > > > > > > > -        * Additionally, we also want to ignore connector registration when
> > > > > > > > -        * we're trying to restore an atomic state during system resume since
> > > > > > > > -        * there's a chance the connector may have been destroyed during the
> > > > > > > > -        * process, but it's better to ignore that then cause
> > > > > > > > -        * drm_atomic_helper_resume() to fail.
> > > > > > > > -        */
> > > > > > > > -       if (!state->duplicated && drm_connector_is_unregistered(connector) &&
> > > > > > > > -           crtc_state->active) {
> > > > > > > > -               DRM_DEBUG_ATOMIC("[CONNECTOR:%d:%s] is not registered\n",
> > > > > > > > -                                connector->base.id, connector->name);
> > > > > > > > -               return -EINVAL;
> > > > > > > > -       }
> > > > > > > > -
> > > > > > > >         funcs = connector->helper_private;
> > > > > > > >
> > > > > > > >         if (funcs->atomic_best_encoder)
> > > > > > > > @@ -390,6 +363,8 @@ update_connector_routing(struct drm_atomic_state *state,
> > > > > > > >
> > > > > > > >         set_best_encoder(state, new_connector_state, new_encoder);
> > > > > > > >
> > > > > > > > +       crtc_state = drm_atomic_get_new_crtc_state(state,
> > > > > > > > +                                                  new_connector_state->crtc);
> > > > > > > >         crtc_state->connectors_changed = true;
> > > > > > > >
> > > > > > > >         DRM_DEBUG_ATOMIC("[CONNECTOR:%d:%s] using [ENCODER:%d:%s] on [CRTC:%d:%s]\n",
> > > > > > > > --
> > > > > > > > 2.17.1
> > > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > Daniel Vetter
> > > > > > > Software Engineer, Intel Corporation
> > > > > > > http://blog.ffwll.ch
> > > > >
> > > > > --
> > > > > Daniel Vetter
> > > > > Software Engineer, Intel Corporation
> > > > > http://blog.ffwll.ch
> > >
> > >
> > >
> > > --
> > > Daniel Vetter
> > > Software Engineer, Intel Corporation
> > > +41 (0) 79 365 57 48 - http://blog.ffwll.ch
> 
> 
> 
> -- 
> Daniel Vetter
> Software Engineer, Intel Corporation
> +41 (0) 79 365 57 48 - http://blog.ffwll.ch
Imre Deak May 21, 2019, 12:01 a.m. UTC | #9
On Mon, May 20, 2019 at 08:41:09PM +0300, Imre Deak wrote:
> We allowed modesetting an unregistered connector only in the case the
> mode is getting disabled on the connector.
> 
> The reason for this check was the lack of proper refcounting for the
> backing memory objects. That problem has been solved meanwhile so there
> is no reason any more to reject the modesetting in general. The check
> for that also makes driver internal modesets more cumbersome where we
> need to add exemptions for the cases where we do need to allow the
> modeset even for unregistered connectors. One such case is the
> restoration of the mode during resume.
> 
> Simplify things by removing the unneeded check. I can't see how
> modesetting an unregistered connector can cause any problem and the race
> (described in the code comment) can anyway result in such a modeset (if
> the connector is unregistered right after the check).
> 
> Cc: Lyude Paul <lyude@redhat.com>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
> Signed-off-by: Imre Deak <imre.deak@intel.com>

Lyude, could you test this change against the DDX fail scenario that the
check was added for originally (based on our IRC discussion)?

Thanks,
Imre

> ---
>  drivers/gpu/drm/drm_atomic_helper.c | 29 ++---------------------------
>  1 file changed, 2 insertions(+), 27 deletions(-)
> 
> diff --git a/drivers/gpu/drm/drm_atomic_helper.c b/drivers/gpu/drm/drm_atomic_helper.c
> index 2e0cb4246cbd..e94e69483498 100644
> --- a/drivers/gpu/drm/drm_atomic_helper.c
> +++ b/drivers/gpu/drm/drm_atomic_helper.c
> @@ -319,33 +319,6 @@ update_connector_routing(struct drm_atomic_state *state,
>  		return 0;
>  	}
>  
> -	crtc_state = drm_atomic_get_new_crtc_state(state,
> -						   new_connector_state->crtc);
> -	/*
> -	 * For compatibility with legacy users, we want to make sure that
> -	 * we allow DPMS On->Off modesets on unregistered connectors. Modesets
> -	 * which would result in anything else must be considered invalid, to
> -	 * avoid turning on new displays on dead connectors.
> -	 *
> -	 * Since the connector can be unregistered at any point during an
> -	 * atomic check or commit, this is racy. But that's OK: all we care
> -	 * about is ensuring that userspace can't do anything but shut off the
> -	 * display on a connector that was destroyed after it's been notified,
> -	 * not before.
> -	 *
> -	 * Additionally, we also want to ignore connector registration when
> -	 * we're trying to restore an atomic state during system resume since
> -	 * there's a chance the connector may have been destroyed during the
> -	 * process, but it's better to ignore that then cause
> -	 * drm_atomic_helper_resume() to fail.
> -	 */
> -	if (!state->duplicated && drm_connector_is_unregistered(connector) &&
> -	    crtc_state->active) {
> -		DRM_DEBUG_ATOMIC("[CONNECTOR:%d:%s] is not registered\n",
> -				 connector->base.id, connector->name);
> -		return -EINVAL;
> -	}
> -
>  	funcs = connector->helper_private;
>  
>  	if (funcs->atomic_best_encoder)
> @@ -390,6 +363,8 @@ update_connector_routing(struct drm_atomic_state *state,
>  
>  	set_best_encoder(state, new_connector_state, new_encoder);
>  
> +	crtc_state = drm_atomic_get_new_crtc_state(state,
> +						   new_connector_state->crtc);
>  	crtc_state->connectors_changed = true;
>  
>  	DRM_DEBUG_ATOMIC("[CONNECTOR:%d:%s] using [ENCODER:%d:%s] on [CRTC:%d:%s]\n",
> -- 
> 2.17.1
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
diff mbox series

Patch

diff --git a/drivers/gpu/drm/drm_atomic_helper.c b/drivers/gpu/drm/drm_atomic_helper.c
index 2e0cb4246cbd..e94e69483498 100644
--- a/drivers/gpu/drm/drm_atomic_helper.c
+++ b/drivers/gpu/drm/drm_atomic_helper.c
@@ -319,33 +319,6 @@  update_connector_routing(struct drm_atomic_state *state,
 		return 0;
 	}
 
-	crtc_state = drm_atomic_get_new_crtc_state(state,
-						   new_connector_state->crtc);
-	/*
-	 * For compatibility with legacy users, we want to make sure that
-	 * we allow DPMS On->Off modesets on unregistered connectors. Modesets
-	 * which would result in anything else must be considered invalid, to
-	 * avoid turning on new displays on dead connectors.
-	 *
-	 * Since the connector can be unregistered at any point during an
-	 * atomic check or commit, this is racy. But that's OK: all we care
-	 * about is ensuring that userspace can't do anything but shut off the
-	 * display on a connector that was destroyed after it's been notified,
-	 * not before.
-	 *
-	 * Additionally, we also want to ignore connector registration when
-	 * we're trying to restore an atomic state during system resume since
-	 * there's a chance the connector may have been destroyed during the
-	 * process, but it's better to ignore that then cause
-	 * drm_atomic_helper_resume() to fail.
-	 */
-	if (!state->duplicated && drm_connector_is_unregistered(connector) &&
-	    crtc_state->active) {
-		DRM_DEBUG_ATOMIC("[CONNECTOR:%d:%s] is not registered\n",
-				 connector->base.id, connector->name);
-		return -EINVAL;
-	}
-
 	funcs = connector->helper_private;
 
 	if (funcs->atomic_best_encoder)
@@ -390,6 +363,8 @@  update_connector_routing(struct drm_atomic_state *state,
 
 	set_best_encoder(state, new_connector_state, new_encoder);
 
+	crtc_state = drm_atomic_get_new_crtc_state(state,
+						   new_connector_state->crtc);
 	crtc_state->connectors_changed = true;
 
 	DRM_DEBUG_ATOMIC("[CONNECTOR:%d:%s] using [ENCODER:%d:%s] on [CRTC:%d:%s]\n",