Message ID | 20181113224613.28809-1-lyude@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | drm/dp_mst: Skip validating ports during destruction, just ref | expand |
Hi, [This is an automated email] This commit has been processed because it contains a "Fixes:" tag, fixing commit: 263efde31f97 drm/dp/mst: Get validated port ref in drm_dp_update_payload_part1(). The bot has tested the following trees: v4.19.2, v4.18.19, v4.14.81, v4.9.137, v4.4.163, v3.18.125. v4.19.2: Build OK! v4.18.19: Build OK! v4.14.81: Build OK! v4.9.137: Build OK! v4.4.163: Build OK! v3.18.125: Build failed! Errors: drivers/gpu/drm/drm_dp_mst_topology.c:944:6: error: implicit declaration of function ‘current_work’; did you mean ‘current_umask’? [-Werror=implicit-function-declaration] drivers/gpu/drm/drm_dp_mst_topology.c:944:28: error: ‘struct drm_dp_mst_topology_mgr’ has no member named ‘destroy_connector_work’ -- Thanks, Sasha
On Wed, 14 Nov 2018 at 08:47, Lyude Paul <lyude@redhat.com> wrote: > > Jerry Zuo pointed out a rather obscure hotplugging issue that it seems I > accidentally introduced into DRM two years ago. > > Pretend we have a topology like this: > > |- DP-1: mst_primary > |- DP-4: active display > |- DP-5: disconnected > |- DP-6: active hub > |- DP-7: active display > |- DP-8: disconnected > |- DP-9: disconnected > > If we unplug DP-6, the topology starting at DP-7 will be destroyed but > it's payloads will live on in DP-1's VCPI allocations and thus require > removal. However, this removal currently fails because > drm_dp_update_payload_part1() will (rightly so) try to validate the port > before accessing it, fail then abort. If we keep going, eventually we > run the MST hub out of bandwidth and all new allocations will start to > fail (or in my case; all new displays just start flickering a ton). > > We could just teach drm_dp_update_payload_part1() not to drop the port > ref in this case, but then we also need to teach > drm_dp_destroy_payload_step1() to do the same thing, then hope no one > ever adds anything to the that requires a validated port reference in > drm_dp_destroy_connector_work(). Kind of sketchy. > > So let's go with a more clever solution: any port that > drm_dp_destroy_connector_work() interacts with is guaranteed to still > exist in memory until we say so. While said port might not be valid we > don't really care: that's the whole reason we're destroying it in the > first place! So, teach drm_dp_get_validated_port_ref() to use the all > mighty current_work() function to avoid attempting to validate ports > from the context of mgr->destroy_connector_work. I can't see any > situation where this wouldn't be safe, and this avoids having to play > whack-a-mole in the future of trying to work around port validation. > > Signed-off-by: Lyude Paul <lyude@redhat.com> > Fixes: 263efde31f97 ("drm/dp/mst: Get validated port ref in drm_dp_update_payload_part1()") > Reported-by: Jerry Zuo <Jerry.Zuo@amd.com> > Cc: Jerry Zuo <Jerry.Zuo@amd.com> > Cc: Harry Wentland <Harry.Wentland@amd.com> > Cc: <stable@vger.kernel.org> # v4.6+ Much as I constantly feel we are missing some better way to do all of this, this seems like a reasonable fix. Reviewed-by: Dave Airlie <airlied@redhat.com> > --- > drivers/gpu/drm/drm_dp_mst_topology.c | 15 +++++++++++++-- > 1 file changed, 13 insertions(+), 2 deletions(-) > > diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c b/drivers/gpu/drm/drm_dp_mst_topology.c > index 529414556962..08978ad72f33 100644 > --- a/drivers/gpu/drm/drm_dp_mst_topology.c > +++ b/drivers/gpu/drm/drm_dp_mst_topology.c > @@ -1023,9 +1023,20 @@ static struct drm_dp_mst_port *drm_dp_mst_get_port_ref_locked(struct drm_dp_mst_ > static struct drm_dp_mst_port *drm_dp_get_validated_port_ref(struct drm_dp_mst_topology_mgr *mgr, struct drm_dp_mst_port *port) > { > struct drm_dp_mst_port *rport = NULL; > + > mutex_lock(&mgr->lock); > - if (mgr->mst_primary) > - rport = drm_dp_mst_get_port_ref_locked(mgr->mst_primary, port); > + /* > + * Port may or may not be 'valid' but we don't care about that when > + * destroying the port and we are guaranteed that the port pointer > + * will be valid until we've finished > + */ > + if (current_work() == &mgr->destroy_connector_work) { > + kref_get(&port->kref); > + rport = port; > + } else if (mgr->mst_primary) { > + rport = drm_dp_mst_get_port_ref_locked(mgr->mst_primary, > + port); > + } > mutex_unlock(&mgr->lock); > return rport; > } > -- > 2.19.1 > > _______________________________________________ > dri-devel mailing list > dri-devel@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/dri-devel
Pushed to drm-misc-fixes, thanks! On Thu, 2018-11-22 at 06:08 +1000, Dave Airlie wrote: > On Wed, 14 Nov 2018 at 08:47, Lyude Paul <lyude@redhat.com> wrote: > > Jerry Zuo pointed out a rather obscure hotplugging issue that it seems I > > accidentally introduced into DRM two years ago. > > > > Pretend we have a topology like this: > > > > > - DP-1: mst_primary > > |- DP-4: active display > > |- DP-5: disconnected > > |- DP-6: active hub > > |- DP-7: active display > > |- DP-8: disconnected > > |- DP-9: disconnected > > > > If we unplug DP-6, the topology starting at DP-7 will be destroyed but > > it's payloads will live on in DP-1's VCPI allocations and thus require > > removal. However, this removal currently fails because > > drm_dp_update_payload_part1() will (rightly so) try to validate the port > > before accessing it, fail then abort. If we keep going, eventually we > > run the MST hub out of bandwidth and all new allocations will start to > > fail (or in my case; all new displays just start flickering a ton). > > > > We could just teach drm_dp_update_payload_part1() not to drop the port > > ref in this case, but then we also need to teach > > drm_dp_destroy_payload_step1() to do the same thing, then hope no one > > ever adds anything to the that requires a validated port reference in > > drm_dp_destroy_connector_work(). Kind of sketchy. > > > > So let's go with a more clever solution: any port that > > drm_dp_destroy_connector_work() interacts with is guaranteed to still > > exist in memory until we say so. While said port might not be valid we > > don't really care: that's the whole reason we're destroying it in the > > first place! So, teach drm_dp_get_validated_port_ref() to use the all > > mighty current_work() function to avoid attempting to validate ports > > from the context of mgr->destroy_connector_work. I can't see any > > situation where this wouldn't be safe, and this avoids having to play > > whack-a-mole in the future of trying to work around port validation. > > > > Signed-off-by: Lyude Paul <lyude@redhat.com> > > Fixes: 263efde31f97 ("drm/dp/mst: Get validated port ref in > > drm_dp_update_payload_part1()") > > Reported-by: Jerry Zuo <Jerry.Zuo@amd.com> > > Cc: Jerry Zuo <Jerry.Zuo@amd.com> > > Cc: Harry Wentland <Harry.Wentland@amd.com> > > Cc: <stable@vger.kernel.org> # v4.6+ > > Much as I constantly feel we are missing some better way to do all of > this, this seems like a reasonable fix. > > Reviewed-by: Dave Airlie <airlied@redhat.com> > > > --- > > drivers/gpu/drm/drm_dp_mst_topology.c | 15 +++++++++++++-- > > 1 file changed, 13 insertions(+), 2 deletions(-) > > > > diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c > > b/drivers/gpu/drm/drm_dp_mst_topology.c > > index 529414556962..08978ad72f33 100644 > > --- a/drivers/gpu/drm/drm_dp_mst_topology.c > > +++ b/drivers/gpu/drm/drm_dp_mst_topology.c > > @@ -1023,9 +1023,20 @@ static struct drm_dp_mst_port > > *drm_dp_mst_get_port_ref_locked(struct drm_dp_mst_ > > static struct drm_dp_mst_port *drm_dp_get_validated_port_ref(struct > > drm_dp_mst_topology_mgr *mgr, struct drm_dp_mst_port *port) > > { > > struct drm_dp_mst_port *rport = NULL; > > + > > mutex_lock(&mgr->lock); > > - if (mgr->mst_primary) > > - rport = drm_dp_mst_get_port_ref_locked(mgr->mst_primary, > > port); > > + /* > > + * Port may or may not be 'valid' but we don't care about that > > when > > + * destroying the port and we are guaranteed that the port pointer > > + * will be valid until we've finished > > + */ > > + if (current_work() == &mgr->destroy_connector_work) { > > + kref_get(&port->kref); > > + rport = port; > > + } else if (mgr->mst_primary) { > > + rport = drm_dp_mst_get_port_ref_locked(mgr->mst_primary, > > + port); > > + } > > mutex_unlock(&mgr->lock); > > return rport; > > } > > -- > > 2.19.1 > > > > _______________________________________________ > > dri-devel mailing list > > dri-devel@lists.freedesktop.org > > https://lists.freedesktop.org/mailman/listinfo/dri-devel
On Tue, Nov 13, 2018 at 11:47 PM Lyude Paul <lyude@redhat.com> wrote: > > Jerry Zuo pointed out a rather obscure hotplugging issue that it seems I > accidentally introduced into DRM two years ago. > > Pretend we have a topology like this: > > |- DP-1: mst_primary > |- DP-4: active display > |- DP-5: disconnected > |- DP-6: active hub > |- DP-7: active display > |- DP-8: disconnected > |- DP-9: disconnected > > If we unplug DP-6, the topology starting at DP-7 will be destroyed but > it's payloads will live on in DP-1's VCPI allocations and thus require > removal. However, this removal currently fails because > drm_dp_update_payload_part1() will (rightly so) try to validate the port > before accessing it, fail then abort. If we keep going, eventually we > run the MST hub out of bandwidth and all new allocations will start to > fail (or in my case; all new displays just start flickering a ton). > > We could just teach drm_dp_update_payload_part1() not to drop the port > ref in this case, but then we also need to teach > drm_dp_destroy_payload_step1() to do the same thing, then hope no one > ever adds anything to the that requires a validated port reference in > drm_dp_destroy_connector_work(). Kind of sketchy. > > So let's go with a more clever solution: any port that > drm_dp_destroy_connector_work() interacts with is guaranteed to still > exist in memory until we say so. While said port might not be valid we > don't really care: that's the whole reason we're destroying it in the > first place! So, teach drm_dp_get_validated_port_ref() to use the all > mighty current_work() function to avoid attempting to validate ports > from the context of mgr->destroy_connector_work. I can't see any > situation where this wouldn't be safe, and this avoids having to play > whack-a-mole in the future of trying to work around port validation. > > Signed-off-by: Lyude Paul <lyude@redhat.com> > Fixes: 263efde31f97 ("drm/dp/mst: Get validated port ref in drm_dp_update_payload_part1()") > Reported-by: Jerry Zuo <Jerry.Zuo@amd.com> > Cc: Jerry Zuo <Jerry.Zuo@amd.com> > Cc: Harry Wentland <Harry.Wentland@amd.com> > Cc: <stable@vger.kernel.org> # v4.6+ Hm, sounds very similar to the bug I pointed out in your "make vcpi alloc more atomic" series. Will this all interact nicely with the solution we've worked out there (where we need to delay at least the kfree(port) until the last vcpi allocation is released? -Daniel > --- > drivers/gpu/drm/drm_dp_mst_topology.c | 15 +++++++++++++-- > 1 file changed, 13 insertions(+), 2 deletions(-) > > diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c b/drivers/gpu/drm/drm_dp_mst_topology.c > index 529414556962..08978ad72f33 100644 > --- a/drivers/gpu/drm/drm_dp_mst_topology.c > +++ b/drivers/gpu/drm/drm_dp_mst_topology.c > @@ -1023,9 +1023,20 @@ static struct drm_dp_mst_port *drm_dp_mst_get_port_ref_locked(struct drm_dp_mst_ > static struct drm_dp_mst_port *drm_dp_get_validated_port_ref(struct drm_dp_mst_topology_mgr *mgr, struct drm_dp_mst_port *port) > { > struct drm_dp_mst_port *rport = NULL; > + > mutex_lock(&mgr->lock); > - if (mgr->mst_primary) > - rport = drm_dp_mst_get_port_ref_locked(mgr->mst_primary, port); > + /* > + * Port may or may not be 'valid' but we don't care about that when > + * destroying the port and we are guaranteed that the port pointer > + * will be valid until we've finished > + */ > + if (current_work() == &mgr->destroy_connector_work) { > + kref_get(&port->kref); > + rport = port; > + } else if (mgr->mst_primary) { > + rport = drm_dp_mst_get_port_ref_locked(mgr->mst_primary, > + port); > + } > mutex_unlock(&mgr->lock); > return rport; > } > -- > 2.19.1 > > _______________________________________________ > dri-devel mailing list > dri-devel@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/dri-devel
On Thu, 2018-11-22 at 09:34 +0100, Daniel Vetter wrote: > On Tue, Nov 13, 2018 at 11:47 PM Lyude Paul <lyude@redhat.com> wrote: > > Jerry Zuo pointed out a rather obscure hotplugging issue that it seems I > > accidentally introduced into DRM two years ago. > > > > Pretend we have a topology like this: > > > > > - DP-1: mst_primary > > |- DP-4: active display > > |- DP-5: disconnected > > |- DP-6: active hub > > |- DP-7: active display > > |- DP-8: disconnected > > |- DP-9: disconnected > > > > If we unplug DP-6, the topology starting at DP-7 will be destroyed but > > it's payloads will live on in DP-1's VCPI allocations and thus require > > removal. However, this removal currently fails because > > drm_dp_update_payload_part1() will (rightly so) try to validate the port > > before accessing it, fail then abort. If we keep going, eventually we > > run the MST hub out of bandwidth and all new allocations will start to > > fail (or in my case; all new displays just start flickering a ton). > > > > We could just teach drm_dp_update_payload_part1() not to drop the port > > ref in this case, but then we also need to teach > > drm_dp_destroy_payload_step1() to do the same thing, then hope no one > > ever adds anything to the that requires a validated port reference in > > drm_dp_destroy_connector_work(). Kind of sketchy. > > > > So let's go with a more clever solution: any port that > > drm_dp_destroy_connector_work() interacts with is guaranteed to still > > exist in memory until we say so. While said port might not be valid we > > don't really care: that's the whole reason we're destroying it in the > > first place! So, teach drm_dp_get_validated_port_ref() to use the all > > mighty current_work() function to avoid attempting to validate ports > > from the context of mgr->destroy_connector_work. I can't see any > > situation where this wouldn't be safe, and this avoids having to play > > whack-a-mole in the future of trying to work around port validation. > > > > Signed-off-by: Lyude Paul <lyude@redhat.com> > > Fixes: 263efde31f97 ("drm/dp/mst: Get validated port ref in > > drm_dp_update_payload_part1()") > > Reported-by: Jerry Zuo <Jerry.Zuo@amd.com> > > Cc: Jerry Zuo <Jerry.Zuo@amd.com> > > Cc: Harry Wentland <Harry.Wentland@amd.com> > > Cc: <stable@vger.kernel.org> # v4.6+ > > Hm, sounds very similar to the bug I pointed out in your "make vcpi > alloc more atomic" series. Will this all interact nicely with the > solution we've worked out there (where we need to delay at least the > kfree(port) until the last vcpi allocation is released? Yes, it seems to work fine in my tests > -Daniel > > > --- > > drivers/gpu/drm/drm_dp_mst_topology.c | 15 +++++++++++++-- > > 1 file changed, 13 insertions(+), 2 deletions(-) > > > > diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c > > b/drivers/gpu/drm/drm_dp_mst_topology.c > > index 529414556962..08978ad72f33 100644 > > --- a/drivers/gpu/drm/drm_dp_mst_topology.c > > +++ b/drivers/gpu/drm/drm_dp_mst_topology.c > > @@ -1023,9 +1023,20 @@ static struct drm_dp_mst_port > > *drm_dp_mst_get_port_ref_locked(struct drm_dp_mst_ > > static struct drm_dp_mst_port *drm_dp_get_validated_port_ref(struct > > drm_dp_mst_topology_mgr *mgr, struct drm_dp_mst_port *port) > > { > > struct drm_dp_mst_port *rport = NULL; > > + > > mutex_lock(&mgr->lock); > > - if (mgr->mst_primary) > > - rport = drm_dp_mst_get_port_ref_locked(mgr->mst_primary, > > port); > > + /* > > + * Port may or may not be 'valid' but we don't care about that > > when > > + * destroying the port and we are guaranteed that the port pointer > > + * will be valid until we've finished > > + */ > > + if (current_work() == &mgr->destroy_connector_work) { > > + kref_get(&port->kref); > > + rport = port; > > + } else if (mgr->mst_primary) { > > + rport = drm_dp_mst_get_port_ref_locked(mgr->mst_primary, > > + port); > > + } > > mutex_unlock(&mgr->lock); > > return rport; > > } > > -- > > 2.19.1 > > > > _______________________________________________ > > dri-devel mailing list > > dri-devel@lists.freedesktop.org > > https://lists.freedesktop.org/mailman/listinfo/dri-devel > >
diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c b/drivers/gpu/drm/drm_dp_mst_topology.c index 529414556962..08978ad72f33 100644 --- a/drivers/gpu/drm/drm_dp_mst_topology.c +++ b/drivers/gpu/drm/drm_dp_mst_topology.c @@ -1023,9 +1023,20 @@ static struct drm_dp_mst_port *drm_dp_mst_get_port_ref_locked(struct drm_dp_mst_ static struct drm_dp_mst_port *drm_dp_get_validated_port_ref(struct drm_dp_mst_topology_mgr *mgr, struct drm_dp_mst_port *port) { struct drm_dp_mst_port *rport = NULL; + mutex_lock(&mgr->lock); - if (mgr->mst_primary) - rport = drm_dp_mst_get_port_ref_locked(mgr->mst_primary, port); + /* + * Port may or may not be 'valid' but we don't care about that when + * destroying the port and we are guaranteed that the port pointer + * will be valid until we've finished + */ + if (current_work() == &mgr->destroy_connector_work) { + kref_get(&port->kref); + rport = port; + } else if (mgr->mst_primary) { + rport = drm_dp_mst_get_port_ref_locked(mgr->mst_primary, + port); + } mutex_unlock(&mgr->lock); return rport; }
Jerry Zuo pointed out a rather obscure hotplugging issue that it seems I accidentally introduced into DRM two years ago. Pretend we have a topology like this: |- DP-1: mst_primary |- DP-4: active display |- DP-5: disconnected |- DP-6: active hub |- DP-7: active display |- DP-8: disconnected |- DP-9: disconnected If we unplug DP-6, the topology starting at DP-7 will be destroyed but it's payloads will live on in DP-1's VCPI allocations and thus require removal. However, this removal currently fails because drm_dp_update_payload_part1() will (rightly so) try to validate the port before accessing it, fail then abort. If we keep going, eventually we run the MST hub out of bandwidth and all new allocations will start to fail (or in my case; all new displays just start flickering a ton). We could just teach drm_dp_update_payload_part1() not to drop the port ref in this case, but then we also need to teach drm_dp_destroy_payload_step1() to do the same thing, then hope no one ever adds anything to the that requires a validated port reference in drm_dp_destroy_connector_work(). Kind of sketchy. So let's go with a more clever solution: any port that drm_dp_destroy_connector_work() interacts with is guaranteed to still exist in memory until we say so. While said port might not be valid we don't really care: that's the whole reason we're destroying it in the first place! So, teach drm_dp_get_validated_port_ref() to use the all mighty current_work() function to avoid attempting to validate ports from the context of mgr->destroy_connector_work. I can't see any situation where this wouldn't be safe, and this avoids having to play whack-a-mole in the future of trying to work around port validation. Signed-off-by: Lyude Paul <lyude@redhat.com> Fixes: 263efde31f97 ("drm/dp/mst: Get validated port ref in drm_dp_update_payload_part1()") Reported-by: Jerry Zuo <Jerry.Zuo@amd.com> Cc: Jerry Zuo <Jerry.Zuo@amd.com> Cc: Harry Wentland <Harry.Wentland@amd.com> Cc: <stable@vger.kernel.org> # v4.6+ --- drivers/gpu/drm/drm_dp_mst_topology.c | 15 +++++++++++++-- 1 file changed, 13 insertions(+), 2 deletions(-)