Message ID | 20170718151627.29641-2-paul.kocialkowski@linux.intel.com (mailing list archive) |
---|---|
State | New, archived |
Quoting Paul Kocialkowski (2017-07-18 16:16:26)
> It may occur that a hotplug uevent is detected at resume, even though it
> does not indicate that an actual hotplug happened. This is the case when
> link training fails on any other connector.
>
> There is currently no way to distinguish what connector caused a hotplug
> uevent, nor what the reason for that uevent really is. This makes it
> impossible to find out whether the test actually passed or not.

And you may get more than one and then this skips even though the test
passed. Looks like the patch is overcompensating. What you can do is
repeat the test a few times, and then look at all the different errors
you get. If the connector remains (no mst disappearance) once it goes
bad, it should remain bad and so not generate any new uevent. Or you
only repeat the test whilst link_status[old] != link_status[new].

-Chris
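For illustration, the second suggestion (repeat only while the link status changes across a cycle) could be sketched roughly as follows. This is a self-contained model, not IGT code: `cycle_fn`, `run_until_stable`, `MAX_PORTS`, and the `fake_cycle` simulation are hypothetical names introduced here. In the real test, the callback would presumably snapshot each connector's "link-status" property before and after one suspend/resume cycle.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_PORTS 8 /* hypothetical upper bound on connectors */

/* Hypothetical callback: runs one suspend/resume cycle and fills both snapshots. */
typedef void (*cycle_fn)(bool *old_failed, bool *new_failed, size_t count, void *ctx);

/*
 * Repeat the cycle while the link status differs before and after it,
 * up to max_runs times. Returns the number of cycles performed.
 */
int run_until_stable(cycle_fn cycle, void *ctx, size_t count, int max_runs)
{
	bool old_failed[MAX_PORTS];
	bool new_failed[MAX_PORTS];
	int runs = 0;

	assert(count <= MAX_PORTS);

	while (runs < max_runs) {
		cycle(old_failed, new_failed, count, ctx);
		runs++;

		/* link_status[old] == link_status[new]: nothing changed, stop. */
		if (memcmp(old_failed, new_failed, count * sizeof(bool)) == 0)
			break;
	}

	return runs;
}

/* Simulation only: every link is good until cycle `bad_from`, then stays bad. */
struct fake_state {
	int cycles_done;
	int bad_from;
};

void fake_cycle(bool *old_failed, bool *new_failed, size_t count, void *ctx)
{
	struct fake_state *s = ctx;

	for (size_t p = 0; p < count; p++) {
		old_failed[p] = s->cycles_done > s->bad_from;
		new_failed[p] = s->cycles_done >= s->bad_from;
	}
	s->cycles_done++;
}
```

In this model, a link that goes bad during the first cycle and then stays bad makes the loop stop after the second cycle. Each extra iteration stands for a full suspend/resume, so the cap on `max_runs` matters in practice.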
On Tue, 2017-07-18 at 22:21 +0100, Chris Wilson wrote:
> Quoting Paul Kocialkowski (2017-07-18 16:16:26)
> > It may occur that a hotplug uevent is detected at resume, even though it
> > does not indicate that an actual hotplug happened. This is the case when
> > link training fails on any other connector.
> >
> > There is currently no way to distinguish what connector caused a hotplug
> > uevent, nor what the reason for that uevent really is. This makes it
> > impossible to find out whether the test actually passed or not.
>
> And you may get more than one and then this skips even though the test
> passed. Looks like the patch is overcompensating. What you can do is
> repeat the test a few times, and then look at all the different errors
> you get. If the connector remains (no mst disappearance) once it goes
> bad, it should remain bad and so not generate any new uevent. Or you
> only repeat the test whilst link_status[old] != link_status[new].

I am not sure it is really desirable to repeat the test until we are
fairly certain it succeeds. This involves suspend/resume, which is
already long enough as it is.

Also, a uevent will be generated every time link training fails,
regardless of whether it was already failing before (I just tested that
to make sure). In my case, it's due to a DP-VGA bridge that will
consistently fail link training in the first seconds after resume.

So this is actually even worse than I thought, because there is no way
to find out that this is why a uevent was generated if the link status
was already bad before.

So I don't see how we can manage with the information currently at our
disposal.

My main point here is that we need more information about what's going
on than simply "HOTPLUG=1". These patches demonstrate that working
around the lack of information is a pain for testing purposes and can
only lead to semi-working hackish workarounds.

Do you agree that this is what the problem really is?
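To make the limitation concrete: a kernel uevent payload is a sequence of NUL-separated KEY=VALUE strings, and the only hotplug-specific key discussed here is HOTPLUG. The sketch below looks up a key in such a buffer. `uevent_get` is a hypothetical helper written for this illustration (it is not part of IGT or libudev), and the sample payload in the usage note is illustrative rather than a captured event.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Look up `key` in a buffer of NUL-separated "KEY=VALUE" entries.
 * Returns a pointer to the NUL-terminated value, or NULL if the key
 * is absent. An unterminated trailing entry is ignored.
 */
const char *uevent_get(const char *buf, size_t len, const char *key)
{
	size_t key_len;
	size_t pos = 0;

	assert(buf != NULL && key != NULL);
	key_len = strlen(key);

	while (pos < len) {
		const char *entry = buf + pos;
		const char *end = memchr(entry, '\0', len - pos);
		size_t entry_len;

		if (end == NULL)
			break;
		entry_len = (size_t)(end - entry);

		if (entry_len > key_len && entry[key_len] == '=' &&
		    strncmp(entry, key, key_len) == 0)
			return entry + key_len + 1;

		/* Skip this entry and its terminating NUL. */
		pos += entry_len + 1;
	}

	return NULL;
}
```

Given a payload such as `"ACTION=change\0SUBSYSTEM=drm\0HOTPLUG=1"`, looking up `HOTPLUG` yields `"1"`, while any key that would identify the connector or the reason for the event (for instance a hypothetical `CONNECTOR` key) is simply absent, so the lookup returns NULL.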
On Wed, 2017-07-19 at 11:31 +0300, Paul Kocialkowski wrote:
> On Tue, 2017-07-18 at 22:21 +0100, Chris Wilson wrote:
> > Quoting Paul Kocialkowski (2017-07-18 16:16:26)
> > > It may occur that a hotplug uevent is detected at resume, even
> > > though it does not indicate that an actual hotplug happened. This
> > > is the case when link training fails on any other connector.
> > >
> > > There is currently no way to distinguish what connector caused a
> > > hotplug uevent, nor what the reason for that uevent really is.
> > > This makes it impossible to find out whether the test actually
> > > passed or not.
> >
> > And you may get more than one and then this skips even though the
> > test passed. Looks like the patch is overcompensating. What you can
> > do is repeat the test a few times, and then look at all the
> > different errors you get. If the connector remains (no mst
> > disappearance) once it goes bad, it should remain bad and so not
> > generate any new uevent. Or you only repeat the test whilst
> > link_status[old] != link_status[new].
>
> I am not sure it is really desirable to repeat the test until we are
> fairly certain it succeeds. This involves suspend/resume, which is
> already long enough as it is.
>
> Also, a uevent will be generated every time link training fails,
> regardless of whether it was already failing before (I just tested
> that to make sure). In my case, it's due to a DP-VGA bridge that will
> consistently fail link training in the first seconds after resume.
>
> So this is actually even worse than I thought, because there is no
> way to find out that this is why a uevent was generated if the link
> status was already bad before.
>
> So I don't see how we can manage with the information currently at
> our disposal.
>
> My main point here is that we need more information about what's
> going on than simply "HOTPLUG=1". These patches demonstrate that
> working around the lack of information is a pain for testing purposes
> and can only lead to semi-working hackish workarounds.
>
> Do you agree that this is what the problem really is?

Yes, I agree we need more debugging information for when hotplugs fail.
That said, the fact that i915 unconditionally sends hotplugs on resume
(this appears to be a hack that was added to avoid missing hotplug
events between suspend and resume) is really what's causing this
problem specifically. We really need the debugging stuff Martin and I
suggested for the kernel, and also more DRM helpers to actually do EDID
checks and that sort of stuff, so that we don't have to deal with dirty
hacks like this :\.
diff --git a/tests/chamelium.c b/tests/chamelium.c
index e26f0557..8af33aaa 100644
--- a/tests/chamelium.c
+++ b/tests/chamelium.c
@@ -87,6 +87,31 @@ get_precalculated_crc(struct chamelium_port *port, int w, int h)
 }
 
 static void
+get_connectors_link_status_failed(data_t *data, bool *link_status_failed)
+{
+	drmModeConnector *connector;
+	uint64_t link_status;
+	drmModePropertyPtr prop;
+	int p;
+
+	for (p = 0; p < data->port_count; p++) {
+		connector = chamelium_port_get_connector(data->chamelium,
+							 data->ports[p], false);
+
+		igt_assert(kmstest_get_property(data->drm_fd,
+						connector->connector_id,
+						DRM_MODE_OBJECT_CONNECTOR,
+						"link-status", NULL,
+						&link_status, &prop));
+
+		link_status_failed[p] = link_status == DRM_MODE_LINK_STATUS_BAD;
+
+		drmModeFreeProperty(prop);
+		drmModeFreeConnector(connector);
+	}
+}
+
+static void
 require_connector_present(data_t *data, unsigned int type)
 {
 	int i;
@@ -310,6 +335,8 @@ test_suspend_resume_edid_change(data_t *data, struct chamelium_port *port,
 				int alt_edid_id)
 {
 	struct udev_monitor *mon = igt_watch_hotplug();
+	bool link_status_failed[2][data->port_count];
+	int p;
 
 	reset_state(data, port);
 
@@ -326,8 +353,16 @@ test_suspend_resume_edid_change(data_t *data, struct chamelium_port *port,
 	 */
 	chamelium_port_set_edid(data->chamelium, port, alt_edid_id);
 
+	get_connectors_link_status_failed(data, link_status_failed[0]);
+
 	igt_system_suspend_autoresume(state, test);
+
 	igt_assert(igt_hotplug_detected(mon, HOTPLUG_TIMEOUT));
+
+	get_connectors_link_status_failed(data, link_status_failed[1]);
+
+	for (p = 0; p < data->port_count; p++)
+		igt_skip_on(!link_status_failed[0][p] && link_status_failed[1][p]);
 }
 
 static igt_output_t *
It may occur that a hotplug uevent is detected at resume, even though it
does not indicate that an actual hotplug happened. This is the case when
link training fails on any other connector.

There is currently no way to distinguish what connector caused a hotplug
uevent, nor what the reason for that uevent really is. This makes it
impossible to find out whether the test actually passed or not.

To circumvent this problem, the link status of each connector is
collected before suspend and after resume, and the two are compared: the
test is skipped if a link was good before suspend and turned bad after
resume.

This only concerns the EDID change test, where we cannot check the
connector state (which is not supposed to have changed). The actual
hotplug tests should be safe, since they check each connector's state
after receiving the uevent.

The situation described here happens with DP-VGA bridges that fail link
training after resume, as they need some more time to respond on their
AUX channel.

Signed-off-by: Paul Kocialkowski <paul.kocialkowski@linux.intel.com>

---
 tests/chamelium.c | 35 +++++++++++++++++++++++++++++++++++
 1 file changed, 35 insertions(+)
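The skip condition described in the commit message can be modelled in isolation, without any DRM or Chamelium dependency. The sketch below is only an illustration: `link_went_bad` is a hypothetical helper name, and the two arrays stand in for the per-connector snapshots that the patch stores in `link_status_failed[0]` and `link_status_failed[1]`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Returns true when at least one connector had a good link before
 * suspend and a bad link after resume. In the patch, this is the
 * condition under which the test is skipped.
 */
bool link_went_bad(const bool *failed_before, const bool *failed_after,
		   size_t count)
{
	assert(failed_before != NULL && failed_after != NULL);

	for (size_t p = 0; p < count; p++) {
		/* good -> bad transition on connector p */
		if (!failed_before[p] && failed_after[p])
			return true;
	}

	return false;
}
```

Note that a connector whose link was already bad before suspend and is still bad after resume does not trigger the condition, even though, as pointed out in the discussion, a failed link training on it still generates a uevent.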