diff mbox series

drm/dp_mst: Skip validating ports during destruction, just ref

Message ID 20181113224613.28809-1-lyude@redhat.com (mailing list archive)
State New, archived
Headers show
Series drm/dp_mst: Skip validating ports during destruction, just ref | expand

Commit Message

Lyude Paul Nov. 13, 2018, 10:46 p.m. UTC
Jerry Zuo pointed out a rather obscure hotplugging issue that it seems I
accidentally introduced into DRM two years ago.

Pretend we have a topology like this:

|- DP-1: mst_primary
   |- DP-4: active display
   |- DP-5: disconnected
   |- DP-6: active hub
      |- DP-7: active display
      |- DP-8: disconnected
      |- DP-9: disconnected

If we unplug DP-6, the topology starting at DP-7 will be destroyed but
it's payloads will live on in DP-1's VCPI allocations and thus require
removal. However, this removal currently fails because
drm_dp_update_payload_part1() will (rightly so) try to validate the port
before accessing it, fail then abort. If we keep going, eventually we
run the MST hub out of bandwidth and all new allocations will start to
fail (or in my case; all new displays just start flickering a ton).

We could just teach drm_dp_update_payload_part1() not to drop the port
ref in this case, but then we also need to teach
drm_dp_destroy_payload_step1() to do the same thing, then hope no one
ever adds anything to the that requires a validated port reference in
drm_dp_destroy_connector_work(). Kind of sketchy.

So let's go with a more clever solution: any port that
drm_dp_destroy_connector_work() interacts with is guaranteed to still
exist in memory until we say so. While said port might not be valid we
don't really care: that's the whole reason we're destroying it in the
first place! So, teach drm_dp_get_validated_port_ref() to use the all
mighty current_work() function to avoid attempting to validate ports
from the context of mgr->destroy_connector_work. I can't see any
situation where this wouldn't be safe, and this avoids having to play
whack-a-mole in the future of trying to work around port validation.

Signed-off-by: Lyude Paul <lyude@redhat.com>
Fixes: 263efde31f97 ("drm/dp/mst: Get validated port ref in drm_dp_update_payload_part1()")
Reported-by: Jerry Zuo <Jerry.Zuo@amd.com>
Cc: Jerry Zuo <Jerry.Zuo@amd.com>
Cc: Harry Wentland <Harry.Wentland@amd.com>
Cc: <stable@vger.kernel.org> # v4.6+
---
 drivers/gpu/drm/drm_dp_mst_topology.c | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

Comments

Sasha Levin Nov. 15, 2018, 7:06 a.m. UTC | #1
Hi,

[This is an automated email]

This commit has been processed because it contains a "Fixes:" tag,
fixing commit: 263efde31f97 drm/dp/mst: Get validated port ref in drm_dp_update_payload_part1().

The bot has tested the following trees: v4.19.2, v4.18.19, v4.14.81, v4.9.137, v4.4.163, v3.18.125.

v4.19.2: Build OK!
v4.18.19: Build OK!
v4.14.81: Build OK!
v4.9.137: Build OK!
v4.4.163: Build OK!
v3.18.125: Build failed! Errors:
    drivers/gpu/drm/drm_dp_mst_topology.c:944:6: error: implicit declaration of function ‘current_work’; did you mean ‘current_umask’? [-Werror=implicit-function-declaration]
    drivers/gpu/drm/drm_dp_mst_topology.c:944:28: error: ‘struct drm_dp_mst_topology_mgr’ has no member named ‘destroy_connector_work’


--
Thanks,
Sasha
Dave Airlie Nov. 21, 2018, 8:08 p.m. UTC | #2
On Wed, 14 Nov 2018 at 08:47, Lyude Paul <lyude@redhat.com> wrote:
>
> Jerry Zuo pointed out a rather obscure hotplugging issue that it seems I
> accidentally introduced into DRM two years ago.
>
> Pretend we have a topology like this:
>
> |- DP-1: mst_primary
>    |- DP-4: active display
>    |- DP-5: disconnected
>    |- DP-6: active hub
>       |- DP-7: active display
>       |- DP-8: disconnected
>       |- DP-9: disconnected
>
> If we unplug DP-6, the topology starting at DP-7 will be destroyed but
> it's payloads will live on in DP-1's VCPI allocations and thus require
> removal. However, this removal currently fails because
> drm_dp_update_payload_part1() will (rightly so) try to validate the port
> before accessing it, fail then abort. If we keep going, eventually we
> run the MST hub out of bandwidth and all new allocations will start to
> fail (or in my case; all new displays just start flickering a ton).
>
> We could just teach drm_dp_update_payload_part1() not to drop the port
> ref in this case, but then we also need to teach
> drm_dp_destroy_payload_step1() to do the same thing, then hope no one
> ever adds anything to the that requires a validated port reference in
> drm_dp_destroy_connector_work(). Kind of sketchy.
>
> So let's go with a more clever solution: any port that
> drm_dp_destroy_connector_work() interacts with is guaranteed to still
> exist in memory until we say so. While said port might not be valid we
> don't really care: that's the whole reason we're destroying it in the
> first place! So, teach drm_dp_get_validated_port_ref() to use the all
> mighty current_work() function to avoid attempting to validate ports
> from the context of mgr->destroy_connector_work. I can't see any
> situation where this wouldn't be safe, and this avoids having to play
> whack-a-mole in the future of trying to work around port validation.
>
> Signed-off-by: Lyude Paul <lyude@redhat.com>
> Fixes: 263efde31f97 ("drm/dp/mst: Get validated port ref in drm_dp_update_payload_part1()")
> Reported-by: Jerry Zuo <Jerry.Zuo@amd.com>
> Cc: Jerry Zuo <Jerry.Zuo@amd.com>
> Cc: Harry Wentland <Harry.Wentland@amd.com>
> Cc: <stable@vger.kernel.org> # v4.6+

Much as I constantly feel we are missing some better way to do all of
this, this seems like a reasonable fix.

Reviewed-by: Dave Airlie <airlied@redhat.com>

> ---
>  drivers/gpu/drm/drm_dp_mst_topology.c | 15 +++++++++++++--
>  1 file changed, 13 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c b/drivers/gpu/drm/drm_dp_mst_topology.c
> index 529414556962..08978ad72f33 100644
> --- a/drivers/gpu/drm/drm_dp_mst_topology.c
> +++ b/drivers/gpu/drm/drm_dp_mst_topology.c
> @@ -1023,9 +1023,20 @@ static struct drm_dp_mst_port *drm_dp_mst_get_port_ref_locked(struct drm_dp_mst_
>  static struct drm_dp_mst_port *drm_dp_get_validated_port_ref(struct drm_dp_mst_topology_mgr *mgr, struct drm_dp_mst_port *port)
>  {
>         struct drm_dp_mst_port *rport = NULL;
> +
>         mutex_lock(&mgr->lock);
> -       if (mgr->mst_primary)
> -               rport = drm_dp_mst_get_port_ref_locked(mgr->mst_primary, port);
> +       /*
> +        * Port may or may not be 'valid' but we don't care about that when
> +        * destroying the port and we are guaranteed that the port pointer
> +        * will be valid until we've finished
> +        */
> +       if (current_work() == &mgr->destroy_connector_work) {
> +               kref_get(&port->kref);
> +               rport = port;
> +       } else if (mgr->mst_primary) {
> +               rport = drm_dp_mst_get_port_ref_locked(mgr->mst_primary,
> +                                                      port);
> +       }
>         mutex_unlock(&mgr->lock);
>         return rport;
>  }
> --
> 2.19.1
>
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel
Lyude Paul Nov. 21, 2018, 8:24 p.m. UTC | #3
Pushed to drm-misc-fixes, thanks!

On Thu, 2018-11-22 at 06:08 +1000, Dave Airlie wrote:
> On Wed, 14 Nov 2018 at 08:47, Lyude Paul <lyude@redhat.com> wrote:
> > Jerry Zuo pointed out a rather obscure hotplugging issue that it seems I
> > accidentally introduced into DRM two years ago.
> > 
> > Pretend we have a topology like this:
> > 
> > > - DP-1: mst_primary
> >    |- DP-4: active display
> >    |- DP-5: disconnected
> >    |- DP-6: active hub
> >       |- DP-7: active display
> >       |- DP-8: disconnected
> >       |- DP-9: disconnected
> > 
> > If we unplug DP-6, the topology starting at DP-7 will be destroyed but
> > it's payloads will live on in DP-1's VCPI allocations and thus require
> > removal. However, this removal currently fails because
> > drm_dp_update_payload_part1() will (rightly so) try to validate the port
> > before accessing it, fail then abort. If we keep going, eventually we
> > run the MST hub out of bandwidth and all new allocations will start to
> > fail (or in my case; all new displays just start flickering a ton).
> > 
> > We could just teach drm_dp_update_payload_part1() not to drop the port
> > ref in this case, but then we also need to teach
> > drm_dp_destroy_payload_step1() to do the same thing, then hope no one
> > ever adds anything to the that requires a validated port reference in
> > drm_dp_destroy_connector_work(). Kind of sketchy.
> > 
> > So let's go with a more clever solution: any port that
> > drm_dp_destroy_connector_work() interacts with is guaranteed to still
> > exist in memory until we say so. While said port might not be valid we
> > don't really care: that's the whole reason we're destroying it in the
> > first place! So, teach drm_dp_get_validated_port_ref() to use the all
> > mighty current_work() function to avoid attempting to validate ports
> > from the context of mgr->destroy_connector_work. I can't see any
> > situation where this wouldn't be safe, and this avoids having to play
> > whack-a-mole in the future of trying to work around port validation.
> > 
> > Signed-off-by: Lyude Paul <lyude@redhat.com>
> > Fixes: 263efde31f97 ("drm/dp/mst: Get validated port ref in
> > drm_dp_update_payload_part1()")
> > Reported-by: Jerry Zuo <Jerry.Zuo@amd.com>
> > Cc: Jerry Zuo <Jerry.Zuo@amd.com>
> > Cc: Harry Wentland <Harry.Wentland@amd.com>
> > Cc: <stable@vger.kernel.org> # v4.6+
> 
> Much as I constantly feel we are missing some better way to do all of
> this, this seems like a reasonable fix.
> 
> Reviewed-by: Dave Airlie <airlied@redhat.com>
> 
> > ---
> >  drivers/gpu/drm/drm_dp_mst_topology.c | 15 +++++++++++++--
> >  1 file changed, 13 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c
> > b/drivers/gpu/drm/drm_dp_mst_topology.c
> > index 529414556962..08978ad72f33 100644
> > --- a/drivers/gpu/drm/drm_dp_mst_topology.c
> > +++ b/drivers/gpu/drm/drm_dp_mst_topology.c
> > @@ -1023,9 +1023,20 @@ static struct drm_dp_mst_port
> > *drm_dp_mst_get_port_ref_locked(struct drm_dp_mst_
> >  static struct drm_dp_mst_port *drm_dp_get_validated_port_ref(struct
> > drm_dp_mst_topology_mgr *mgr, struct drm_dp_mst_port *port)
> >  {
> >         struct drm_dp_mst_port *rport = NULL;
> > +
> >         mutex_lock(&mgr->lock);
> > -       if (mgr->mst_primary)
> > -               rport = drm_dp_mst_get_port_ref_locked(mgr->mst_primary,
> > port);
> > +       /*
> > +        * Port may or may not be 'valid' but we don't care about that
> > when
> > +        * destroying the port and we are guaranteed that the port pointer
> > +        * will be valid until we've finished
> > +        */
> > +       if (current_work() == &mgr->destroy_connector_work) {
> > +               kref_get(&port->kref);
> > +               rport = port;
> > +       } else if (mgr->mst_primary) {
> > +               rport = drm_dp_mst_get_port_ref_locked(mgr->mst_primary,
> > +                                                      port);
> > +       }
> >         mutex_unlock(&mgr->lock);
> >         return rport;
> >  }
> > --
> > 2.19.1
> > 
> > _______________________________________________
> > dri-devel mailing list
> > dri-devel@lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/dri-devel
Daniel Vetter Nov. 22, 2018, 8:34 a.m. UTC | #4
On Tue, Nov 13, 2018 at 11:47 PM Lyude Paul <lyude@redhat.com> wrote:
>
> Jerry Zuo pointed out a rather obscure hotplugging issue that it seems I
> accidentally introduced into DRM two years ago.
>
> Pretend we have a topology like this:
>
> |- DP-1: mst_primary
>    |- DP-4: active display
>    |- DP-5: disconnected
>    |- DP-6: active hub
>       |- DP-7: active display
>       |- DP-8: disconnected
>       |- DP-9: disconnected
>
> If we unplug DP-6, the topology starting at DP-7 will be destroyed but
> it's payloads will live on in DP-1's VCPI allocations and thus require
> removal. However, this removal currently fails because
> drm_dp_update_payload_part1() will (rightly so) try to validate the port
> before accessing it, fail then abort. If we keep going, eventually we
> run the MST hub out of bandwidth and all new allocations will start to
> fail (or in my case; all new displays just start flickering a ton).
>
> We could just teach drm_dp_update_payload_part1() not to drop the port
> ref in this case, but then we also need to teach
> drm_dp_destroy_payload_step1() to do the same thing, then hope no one
> ever adds anything to the that requires a validated port reference in
> drm_dp_destroy_connector_work(). Kind of sketchy.
>
> So let's go with a more clever solution: any port that
> drm_dp_destroy_connector_work() interacts with is guaranteed to still
> exist in memory until we say so. While said port might not be valid we
> don't really care: that's the whole reason we're destroying it in the
> first place! So, teach drm_dp_get_validated_port_ref() to use the all
> mighty current_work() function to avoid attempting to validate ports
> from the context of mgr->destroy_connector_work. I can't see any
> situation where this wouldn't be safe, and this avoids having to play
> whack-a-mole in the future of trying to work around port validation.
>
> Signed-off-by: Lyude Paul <lyude@redhat.com>
> Fixes: 263efde31f97 ("drm/dp/mst: Get validated port ref in drm_dp_update_payload_part1()")
> Reported-by: Jerry Zuo <Jerry.Zuo@amd.com>
> Cc: Jerry Zuo <Jerry.Zuo@amd.com>
> Cc: Harry Wentland <Harry.Wentland@amd.com>
> Cc: <stable@vger.kernel.org> # v4.6+

Hm, sounds very similar to the bug I pointed out in your "make vcpi
alloc more atomic" series. Will this all interact nicely with the
solution we've worked out there (where we need to delay at least the
kfree(port) until the last vcpi allocation is released?
-Daniel

> ---
>  drivers/gpu/drm/drm_dp_mst_topology.c | 15 +++++++++++++--
>  1 file changed, 13 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c b/drivers/gpu/drm/drm_dp_mst_topology.c
> index 529414556962..08978ad72f33 100644
> --- a/drivers/gpu/drm/drm_dp_mst_topology.c
> +++ b/drivers/gpu/drm/drm_dp_mst_topology.c
> @@ -1023,9 +1023,20 @@ static struct drm_dp_mst_port *drm_dp_mst_get_port_ref_locked(struct drm_dp_mst_
>  static struct drm_dp_mst_port *drm_dp_get_validated_port_ref(struct drm_dp_mst_topology_mgr *mgr, struct drm_dp_mst_port *port)
>  {
>         struct drm_dp_mst_port *rport = NULL;
> +
>         mutex_lock(&mgr->lock);
> -       if (mgr->mst_primary)
> -               rport = drm_dp_mst_get_port_ref_locked(mgr->mst_primary, port);
> +       /*
> +        * Port may or may not be 'valid' but we don't care about that when
> +        * destroying the port and we are guaranteed that the port pointer
> +        * will be valid until we've finished
> +        */
> +       if (current_work() == &mgr->destroy_connector_work) {
> +               kref_get(&port->kref);
> +               rport = port;
> +       } else if (mgr->mst_primary) {
> +               rport = drm_dp_mst_get_port_ref_locked(mgr->mst_primary,
> +                                                      port);
> +       }
>         mutex_unlock(&mgr->lock);
>         return rport;
>  }
> --
> 2.19.1
>
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel
Lyude Paul Nov. 26, 2018, 8:37 p.m. UTC | #5
On Thu, 2018-11-22 at 09:34 +0100, Daniel Vetter wrote:
> On Tue, Nov 13, 2018 at 11:47 PM Lyude Paul <lyude@redhat.com> wrote:
> > Jerry Zuo pointed out a rather obscure hotplugging issue that it seems I
> > accidentally introduced into DRM two years ago.
> > 
> > Pretend we have a topology like this:
> > 
> > > - DP-1: mst_primary
> >    |- DP-4: active display
> >    |- DP-5: disconnected
> >    |- DP-6: active hub
> >       |- DP-7: active display
> >       |- DP-8: disconnected
> >       |- DP-9: disconnected
> > 
> > If we unplug DP-6, the topology starting at DP-7 will be destroyed but
> > it's payloads will live on in DP-1's VCPI allocations and thus require
> > removal. However, this removal currently fails because
> > drm_dp_update_payload_part1() will (rightly so) try to validate the port
> > before accessing it, fail then abort. If we keep going, eventually we
> > run the MST hub out of bandwidth and all new allocations will start to
> > fail (or in my case; all new displays just start flickering a ton).
> > 
> > We could just teach drm_dp_update_payload_part1() not to drop the port
> > ref in this case, but then we also need to teach
> > drm_dp_destroy_payload_step1() to do the same thing, then hope no one
> > ever adds anything to the that requires a validated port reference in
> > drm_dp_destroy_connector_work(). Kind of sketchy.
> > 
> > So let's go with a more clever solution: any port that
> > drm_dp_destroy_connector_work() interacts with is guaranteed to still
> > exist in memory until we say so. While said port might not be valid we
> > don't really care: that's the whole reason we're destroying it in the
> > first place! So, teach drm_dp_get_validated_port_ref() to use the all
> > mighty current_work() function to avoid attempting to validate ports
> > from the context of mgr->destroy_connector_work. I can't see any
> > situation where this wouldn't be safe, and this avoids having to play
> > whack-a-mole in the future of trying to work around port validation.
> > 
> > Signed-off-by: Lyude Paul <lyude@redhat.com>
> > Fixes: 263efde31f97 ("drm/dp/mst: Get validated port ref in
> > drm_dp_update_payload_part1()")
> > Reported-by: Jerry Zuo <Jerry.Zuo@amd.com>
> > Cc: Jerry Zuo <Jerry.Zuo@amd.com>
> > Cc: Harry Wentland <Harry.Wentland@amd.com>
> > Cc: <stable@vger.kernel.org> # v4.6+
> 
> Hm, sounds very similar to the bug I pointed out in your "make vcpi
> alloc more atomic" series. Will this all interact nicely with the
> solution we've worked out there (where we need to delay at least the
> kfree(port) until the last vcpi allocation is released?
Yes, it seems to work fine in my tests

> -Daniel
> 
> > ---
> >  drivers/gpu/drm/drm_dp_mst_topology.c | 15 +++++++++++++--
> >  1 file changed, 13 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c
> > b/drivers/gpu/drm/drm_dp_mst_topology.c
> > index 529414556962..08978ad72f33 100644
> > --- a/drivers/gpu/drm/drm_dp_mst_topology.c
> > +++ b/drivers/gpu/drm/drm_dp_mst_topology.c
> > @@ -1023,9 +1023,20 @@ static struct drm_dp_mst_port
> > *drm_dp_mst_get_port_ref_locked(struct drm_dp_mst_
> >  static struct drm_dp_mst_port *drm_dp_get_validated_port_ref(struct
> > drm_dp_mst_topology_mgr *mgr, struct drm_dp_mst_port *port)
> >  {
> >         struct drm_dp_mst_port *rport = NULL;
> > +
> >         mutex_lock(&mgr->lock);
> > -       if (mgr->mst_primary)
> > -               rport = drm_dp_mst_get_port_ref_locked(mgr->mst_primary,
> > port);
> > +       /*
> > +        * Port may or may not be 'valid' but we don't care about that
> > when
> > +        * destroying the port and we are guaranteed that the port pointer
> > +        * will be valid until we've finished
> > +        */
> > +       if (current_work() == &mgr->destroy_connector_work) {
> > +               kref_get(&port->kref);
> > +               rport = port;
> > +       } else if (mgr->mst_primary) {
> > +               rport = drm_dp_mst_get_port_ref_locked(mgr->mst_primary,
> > +                                                      port);
> > +       }
> >         mutex_unlock(&mgr->lock);
> >         return rport;
> >  }
> > --
> > 2.19.1
> > 
> > _______________________________________________
> > dri-devel mailing list
> > dri-devel@lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/dri-devel
> 
>
diff mbox series

Patch

diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c b/drivers/gpu/drm/drm_dp_mst_topology.c
index 529414556962..08978ad72f33 100644
--- a/drivers/gpu/drm/drm_dp_mst_topology.c
+++ b/drivers/gpu/drm/drm_dp_mst_topology.c
@@ -1023,9 +1023,20 @@  static struct drm_dp_mst_port *drm_dp_mst_get_port_ref_locked(struct drm_dp_mst_
 static struct drm_dp_mst_port *drm_dp_get_validated_port_ref(struct drm_dp_mst_topology_mgr *mgr, struct drm_dp_mst_port *port)
 {
 	struct drm_dp_mst_port *rport = NULL;
+
 	mutex_lock(&mgr->lock);
-	if (mgr->mst_primary)
-		rport = drm_dp_mst_get_port_ref_locked(mgr->mst_primary, port);
+	/*
+	 * Port may or may not be 'valid' but we don't care about that when
+	 * destroying the port and we are guaranteed that the port pointer
+	 * will be valid until we've finished
+	 */
+	if (current_work() == &mgr->destroy_connector_work) {
+		kref_get(&port->kref);
+		rport = port;
+	} else if (mgr->mst_primary) {
+		rport = drm_dp_mst_get_port_ref_locked(mgr->mst_primary,
+						       port);
+	}
 	mutex_unlock(&mgr->lock);
 	return rport;
 }