[igt] lib/kms: Force a full reprobe if we find a bad link

Message ID 20170526114846.8869-1-chris@chris-wilson.co.uk (mailing list archive)
State New, archived

Commit Message

Chris Wilson May 26, 2017, 11:48 a.m. UTC
If we do a shallow probe of the connector and it reports that the link
previously failed (link-status != GOOD), force a full probe of the connector to
give the kernel a chance to validate the mode list.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 lib/igt_kms.c | 29 +++++++++++++++++++++++++++++
 1 file changed, 29 insertions(+)
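
The decision described in the commit message can be sketched in isolation.
This is a minimal sketch, not the patch itself: the values 0 (GOOD) and 1 (BAD)
mirror DRM_MODE_LINK_STATUS_GOOD and DRM_MODE_LINK_STATUS_BAD from the kernel
uapi header, while struct fake_prop, struct fake_connector, link_status_good()
and needs_reprobe() are hypothetical stand-ins so that the example builds
without libdrm.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Values mirror the kernel uapi definitions of the link-status enum. */
#define LINK_STATUS_GOOD 0
#define LINK_STATUS_BAD  1

/* Hypothetical stand-in for one connector property (name plus value). */
struct fake_prop {
	const char *name;
	uint64_t value;
};

/* Hypothetical stand-in for the property list of a drmModeConnector. */
struct fake_connector {
	int count_props;
	const struct fake_prop *props;
};

/* True if link-status is GOOD, or if the connector has no such property. */
static bool link_status_good(const struct fake_connector *c)
{
	for (int i = 0; i < c->count_props; i++) {
		if (strcmp(c->props[i].name, "link-status") == 0)
			return c->props[i].value == LINK_STATUS_GOOD;
	}
	return true;
}

/*
 * A second, full probe is only worthwhile when the first probe was shallow
 * and it reported a bad link; a full probe has already revalidated the modes.
 */
static bool needs_reprobe(bool did_full_probe, const struct fake_connector *c)
{
	return !did_full_probe && !link_status_good(c);
}
```

A connector without the property is treated as good, which matches the
fallthrough "return true" in the patch below.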

Comments

Chris Wilson May 26, 2017, 11:54 a.m. UTC | #1
On Fri, May 26, 2017 at 12:48:46PM +0100, Chris Wilson wrote:
> If we do a shallow probe of the connector and it reports the link failed
> previous (link-status != GOOD), force a full probe of the connector to
> give the kernel a chance to validate the mode list.

Do we also need to do a SetCrtc on top?
-Chris
Martin Peres May 31, 2017, 10:40 a.m. UTC | #2
On 26/05/17 14:48, Chris Wilson wrote:
> If we do a shallow probe of the connector and it reports the link failed
> previous (link-status != GOOD), force a full probe of the connector to
> give the kernel a chance to validate the mode list.

Sounds good, but will this make the tests SKIP if no modes are available?
Chris Wilson May 31, 2017, 12:42 p.m. UTC | #3
On Wed, May 31, 2017 at 01:40:00PM +0300, Martin Peres wrote:
> On 26/05/17 14:48, Chris Wilson wrote:
> >If we do a shallow probe of the connector and it reports the link failed
> >previous (link-status != GOOD), force a full probe of the connector to
> >give the kernel a chance to validate the mode list.
> 
> Sounds good, but will this make the tests SKIP if no modes are available?

I'm actually not sure what will happen if the mode is removed. I think
the tests are just using the first mode in the list? At the moment I
hope just to stop turning a single failure into many, it is still a bug
that the link training failed and was not recovered. Alternatively, we
can ask why isn't the kernel taking the corrective action when presented
with a new setcrtc?

I'm not sure what the correct approach here should be, just what is the
contract the kernel is expecting of userspace? Should that contract
apply to new clients unaware of the earlier error?
-Chris
Martin Peres May 31, 2017, 1:44 p.m. UTC | #4
On 31/05/17 15:42, Chris Wilson wrote:
> On Wed, May 31, 2017 at 01:40:00PM +0300, Martin Peres wrote:
>> On 26/05/17 14:48, Chris Wilson wrote:
>>> If we do a shallow probe of the connector and it reports the link failed
>>> previous (link-status != GOOD), force a full probe of the connector to
>>> give the kernel a chance to validate the mode list.
>>
>> Sounds good, but will this make the tests SKIP if no modes are available?
> 
> I'm actually not sure what will happen if the mode is removed. I think
> the tests are just using the first mode in the list? At the moment I
> hope just to stop turning a single failure into many, it is still a bug
> that the link training failed and was not recovered. Alternatively, we
> can ask why isn't the kernel taking the corrective action when presented
> with a new setcrtc?

No, this is not a kernel bug, it is a failure that the userspace has to 
handle because the kernel can't do shit about this.

> 
> I'm not sure what the correct approach here should be, just what is the
> contract the kernel is expecting of userspace? Should that contract
> apply to new clients unaware of the earlier error?

Right, IGT assumes that if a mode is already set, it can be set again. 
However, this assumption has been broken when the link-status patches 
landed.

On a hotplug event, IGT should do a full reprobe, select one mode from 
the list and use it. If no modes can be set and the test is trying to 
set one, then the test should just SKIP.
Chris Wilson May 31, 2017, 1:55 p.m. UTC | #5
On Wed, May 31, 2017 at 04:44:41PM +0300, Martin Peres wrote:
> On 31/05/17 15:42, Chris Wilson wrote:
> >On Wed, May 31, 2017 at 01:40:00PM +0300, Martin Peres wrote:
> >>On 26/05/17 14:48, Chris Wilson wrote:
> >>>If we do a shallow probe of the connector and it reports the link failed
> >>>previous (link-status != GOOD), force a full probe of the connector to
> >>>give the kernel a chance to validate the mode list.
> >>
> >>Sounds good, but will this make the tests SKIP if no modes are available?
> >
> >I'm actually not sure what will happen if the mode is removed. I think
> >the tests are just using the first mode in the list? At the moment I
> >hope just to stop turning a single failure into many, it is still a bug
> >that the link training failed and was not recovered. Alternatively, we
> >can ask why isn't the kernel taking the corrective action when presented
> >with a new setcrtc?
> 
> No, this is not a kernel bug, it is a failure that the userspace has
> to handle because the kernel can't do shit about this.

Have you demonstrated that the kernel is beyond reproach when it failed
the link training? Nothing changed in the connection and it works most
of the time, so why did the kernel accept the failure? Even if we
temporarily force a change of modes, that is poor UX, and I see no reason
why it should not have been prevented in the first place.

> >I'm not sure what the correct approach here should be, just what is the
> >contract the kernel is expecting of userspace? Should that contract
> >apply to new clients unaware of the earlier error?
> 
> Right, IGT assumes that if a mode is already set, it can be set
> again. However, this assumption has been broken when the link-status
> patches landed.
> 
> On a hotplug event, IGT should do a full reprobe, select one mode
> from the list and use it. If no modes can be set and the test is
> trying to set one, then the test should just SKIP.

There is no hotplug event when a new client starts, so how is igt meant
to even know that it was supposed to pick up the pieces for the kernel?
-Chris
Martin Peres May 31, 2017, 2:45 p.m. UTC | #6
On 31/05/17 16:55, Chris Wilson wrote:
> On Wed, May 31, 2017 at 04:44:41PM +0300, Martin Peres wrote:
>> On 31/05/17 15:42, Chris Wilson wrote:
>>> On Wed, May 31, 2017 at 01:40:00PM +0300, Martin Peres wrote:
>>>> On 26/05/17 14:48, Chris Wilson wrote:
>>>>> If we do a shallow probe of the connector and it reports the link failed
>>>>> previous (link-status != GOOD), force a full probe of the connector to
>>>>> give the kernel a chance to validate the mode list.
>>>>
>>>> Sounds good, but will this make the tests SKIP if no modes are available?
>>>
>>> I'm actually not sure what will happen if the mode is removed. I think
>>> the tests are just using the first mode in the list? At the moment I
>>> hope just to stop turning a single failure into many, it is still a bug
>>> that the link training failed and was not recovered. Alternatively, we
>>> can ask why isn't the kernel taking the corrective action when presented
>>> with a new setcrtc?
>>
>> No, this is not a kernel bug, it is a failure that the userspace has
>> to handle because the kernel can't do shit about this.
> 
> Have you demonstrated that the kernel is beyond reproach when it failed
> the link training? Nothing changed in the connection and it works most
> of the time, so why did the kernel accept the failure. Even if we
> temporarily force a change of modes that is poor UX that I see no reason
> why it should not have been prevented in the first place.

Sorry, this is not what I meant. What I meant is that the kernel is 
allowed to have this behaviour.

I agree though that in the case of the skl bug, it is quite likely that 
the kernel is doing something dodgy, but this is another bug. IGT should 
learn to cope with modes disappearing.

> 
>>> I'm not sure what the correct approach here should be, just what is the
>>> contract the kernel is expecting of userspace? Should that contract
>>> apply to new clients unaware of the earlier error?
>>
>> Right, IGT assumes that if a mode is already set, it can be set
>> again. However, this assumption has been broken when the link-status
>> patches landed.
>>
>> On a hotplug event, IGT should do a full reprobe, select one mode
>> from the list and use it. If no modes can be set and the test is
>> trying to set one, then the test should just SKIP.
> 
> There is no hotplug event when a new client starts so how is igt meant
> to even know that it was supposed to pick up the pieces for the kernel.

Yes.
Martin Peres June 7, 2017, 11:13 a.m. UTC | #7
On 31/05/17 17:45, Martin Peres wrote:
> On 31/05/17 16:55, Chris Wilson wrote:
>> On Wed, May 31, 2017 at 04:44:41PM +0300, Martin Peres wrote:
>>> On 31/05/17 15:42, Chris Wilson wrote:
>>>> On Wed, May 31, 2017 at 01:40:00PM +0300, Martin Peres wrote:
>>>>> On 26/05/17 14:48, Chris Wilson wrote:
>>>>>> If we do a shallow probe of the connector and it reports the link failed
>>>>>> previous (link-status != GOOD), force a full probe of the connector to
>>>>>> give the kernel a chance to validate the mode list.
>>>>>
>>>>> Sounds good, but will this make the tests SKIP if no modes are available?
>>>>
>>>> I'm actually not sure what will happen if the mode is removed. I think
>>>> the tests are just using the first mode in the list? At the moment I
>>>> hope just to stop turning a single failure into many, it is still a bug
>>>> that the link training failed and was not recovered. Alternatively, we
>>>> can ask why isn't the kernel taking the corrective action when presented
>>>> with a new setcrtc?
>>>
>>> No, this is not a kernel bug, it is a failure that the userspace has
>>> to handle because the kernel can't do shit about this.
>>
>> Have you demonstrated that the kernel is beyond reproach when it failed
>> the link training? Nothing changed in the connection and it works most
>> of the time, so why did the kernel accept the failure. Even if we
>> temporarily force a change of modes that is poor UX that I see no reason
>> why it should not have been prevented in the first place.
> 
> Sorry, this is not what I meant. What I meant is that the kernel is 
> allowed to have this behaviour.
> 
> I agree though that in the case of the skl bug, it is quite likely that 
> the kernel is doing something dodgy, but this is another bug. IGT should 
> learn to cope with modes disappearing.
> 
>>
>>>> I'm not sure what the correct approach here should be, just what is the
>>>> contract the kernel is expecting of userspace? Should that contract
>>>> apply to new clients unaware of the earlier error?
>>>
>>> Right, IGT assumes that if a mode is already set, it can be set
>>> again. However, this assumption has been broken when the link-status
>>> patches landed.
>>>
>>> On a hotplug event, IGT should do a full reprobe, select one mode
>>> from the list and use it. If no modes can be set and the test is
>>> trying to set one, then the test should just SKIP.
>>
>> There is no hotplug event when a new client starts so how is igt meant
>> to even know that it was supposed to pick up the pieces for the kernel.
> 
> Yes.

How about this: When the modeset call fails, check if the link-status is 
BAD. If not, return a FAIL. If so, force a full re-probe, pick the 
highest available mode and try again. Do this until a mode applies. If 
no modes are left, just SKIP the test altogether.

Does this sound reasonable?

Martin
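
The loop proposed above could look roughly like the following. This is a
minimal sketch under stated assumptions, not IGT code: struct connector_ops,
set_mode_with_retry(), struct fake_sink and the fake_* functions are all
hypothetical. It also assumes the mode list comes back sorted highest first
and that a full probe after a link failure prunes the mode that failed;
MAX_ATTEMPTS is an added safety bound that nobody in the thread asked for.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum outcome { OUTCOME_OK, OUTCOME_FAIL, OUTCOME_SKIP };

struct mode { int hdisplay, vdisplay; };

/* Hypothetical hooks; a real helper would wrap the libdrm calls instead. */
struct connector_ops {
	bool (*set_mode)(void *ctx, const struct mode *m);
	bool (*link_is_bad)(void *ctx);
	/* Full reprobe; returns the mode count, list sorted highest first. */
	size_t (*reprobe)(void *ctx, const struct mode **modes);
};

#define MAX_ATTEMPTS 8 /* assumption: safety bound, not from the thread */

static enum outcome set_mode_with_retry(const struct connector_ops *ops,
					void *ctx, const struct mode *wanted,
					struct mode *applied)
{
	const struct mode *candidate = wanted;

	for (int attempt = 0; attempt < MAX_ATTEMPTS; attempt++) {
		const struct mode *modes;
		size_t count;

		if (ops->set_mode(ctx, candidate)) {
			*applied = *candidate;
			return OUTCOME_OK;
		}
		if (!ops->link_is_bad(ctx))
			return OUTCOME_FAIL; /* failure unrelated to the link */

		count = ops->reprobe(ctx, &modes);
		if (count == 0)
			return OUTCOME_SKIP; /* no modes left to try */
		candidate = &modes[0];
	}
	return OUTCOME_FAIL; /* assumption: give up once the bound is hit */
}

/* Simulated sink so that the sketch runs without hardware. */
struct fake_sink {
	struct mode modes[4];
	size_t count;
	int max_ok_hdisplay; /* widest mode the simulated link can carry */
	bool reject_all;     /* simulate a failure unrelated to the link */
	bool link_bad;
};

static bool fake_set_mode(void *ctx, const struct mode *m)
{
	struct fake_sink *s = ctx;

	if (s->reject_all)
		return false;
	s->link_bad = m->hdisplay > s->max_ok_hdisplay;
	return !s->link_bad;
}

static bool fake_link_is_bad(void *ctx)
{
	return ((struct fake_sink *)ctx)->link_bad;
}

/* Assumption: a full probe after a link failure prunes the top mode. */
static size_t fake_reprobe(void *ctx, const struct mode **modes)
{
	struct fake_sink *s = ctx;

	if (s->link_bad && s->count > 0) {
		s->count--;
		memmove(&s->modes[0], &s->modes[1],
			s->count * sizeof(s->modes[0]));
	}
	*modes = s->modes;
	return s->count;
}

static const struct connector_ops fake_ops = {
	.set_mode = fake_set_mode,
	.link_is_bad = fake_link_is_bad,
	.reprobe = fake_reprobe,
};
```

The helper reports the applied mode through a separate out-parameter, so the
caller can tell when it ended up on a different mode from the one it asked for.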
Chris Wilson June 7, 2017, 11:33 a.m. UTC | #8
Quoting Martin Peres (2017-06-07 12:13:24)
> How about this: When the modeset call fails, check if the link-status is 
> BAD. If not, return a FAIL. If so, force a full re-probe, pick the 
> highest available mode and try again. Do this until a mode applies. If 
> no modes are left, just SKIP the test altogether.
> 
> Does this sound reasonable?

The problem here is that we need to loop back to the test for it to
decide on the next mode. In most cases we don't care, but igt_kms.c
doesn't know this. But if e.g. we have a CRC computed for one size, it
needs to be swapped out for the new mode.
-Chris
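
One way to picture the loop back to the test: the library reports which mode
was actually applied, and the test decides whether its reference CRC is still
usable. This is a minimal sketch with hypothetical names (struct crc_test,
crc_test_mode_applied(), crc_test_set_reference()); nothing here is existing
igt_kms.c API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct mode { int hdisplay, vdisplay; };

/* Hypothetical per-test state: the reference CRC and the mode it belongs to. */
struct crc_test {
	struct mode ref_mode;
	uint32_t ref_crc;
	bool ref_valid;
};

static bool mode_equal(const struct mode *a, const struct mode *b)
{
	return a->hdisplay == b->hdisplay && a->vdisplay == b->vdisplay;
}

/*
 * Called by the test once it knows which mode was applied.
 * Returns true if the reference CRC has to be captured again,
 * either because there is none yet or because the mode changed.
 */
static bool crc_test_mode_applied(struct crc_test *t, const struct mode *applied)
{
	if (t->ref_valid && mode_equal(&t->ref_mode, applied))
		return false;

	t->ref_mode = *applied;
	t->ref_valid = false;
	return true;
}

/* Records a freshly captured reference CRC for the current ref_mode. */
static void crc_test_set_reference(struct crc_test *t, uint32_t crc)
{
	t->ref_crc = crc;
	t->ref_valid = true;
}
```

Tests that do not depend on the mode size would simply ignore the return value.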
Martin Peres June 7, 2017, 11:58 a.m. UTC | #9
On 07/06/17 14:33, Chris Wilson wrote:
> Quoting Martin Peres (2017-06-07 12:13:24)
>> How about this: When the modeset call fails, check if the link-status is
>> BAD. If not, return a FAIL. If so, force a full re-probe, pick the
>> highest available mode and try again. Do this until a mode applies. If
>> no modes are left, just SKIP the test altogether.
>>
>> Does this sound reasonable?
> 
> The problem here is that we need to loop back to the test for it to
> decide on the next mode. In most cases we don't care, but igt_kms.c
> doesn't know this. But if e.g. we have a CRC computed for one size, it
> needs to be swapped out for the new mode.

Oh dear, isn't life fun?
Marta Lofstedt June 7, 2017, 12:26 p.m. UTC | #10
Martin, the kms_flip test already skips when we have entered the "no modes available" state.
I talked with Petri a bit about this and we sort of agree that IGT should only skip tests on an "expected" lack of HW/SW requirements. IGT should not skip on bad states that have been created by the test itself or other tests.

/Marta

> -----Original Message-----
> From: Martin Peres [mailto:martin.peres@linux.intel.com]
> Sent: Wednesday, June 7, 2017 2:59 PM
> To: Chris Wilson <chris@chris-wilson.co.uk>; intel-gfx@lists.freedesktop.org;
> Lofstedt, Marta <marta.lofstedt@intel.com>
> Subject: Re: [Intel-gfx] [PATCH igt] lib/kms: Force a full reprobe if we find a
> bad link
>
> On 07/06/17 14:33, Chris Wilson wrote:
> > Quoting Martin Peres (2017-06-07 12:13:24)
> >> How about this: When the modeset call fails, check if the link-status
> >> is BAD. If not, return a FAIL. If so, force a full re-probe, pick the
> >> highest available mode and try again. Do this until a mode applies.
> >> If no modes are left, just SKIP the test altogether.
> >>
> >> Does this sound reasonable?
> >
> > The problem here is that we need to loop back to the test for it to
> > decide on the next mode. In most cases we don't care, but igt_kms.c
> > doesn't know this. But if e.g. we have a CRC computed for one size, it
> > needs to be swapped out for the new mode.
>
> Oh dear, isn't life fun?

Patch

diff --git a/lib/igt_kms.c b/lib/igt_kms.c
index f7758458..5f2adbdb 100644
--- a/lib/igt_kms.c
+++ b/lib/igt_kms.c
@@ -852,6 +852,26 @@  _kmstest_connector_config_find_encoder(int drm_fd, drmModeConnector *connector,
 	return NULL;
 }
 
+static bool connector_check_link_status(int fd, drmModeConnector *connector)
+{
+	for (int i = 0; i < connector->count_props; i++) {
+		struct drm_mode_get_property prop;
+
+		prop.prop_id = connector->props[i];
+		prop.count_values = 0;
+		prop.count_enum_blobs = 0;
+		if (drmIoctl(fd, DRM_IOCTL_MODE_GETPROPERTY, &prop))
+			continue;
+
+		if (strcmp(prop.name, "link-status"))
+			continue;
+
+		return connector->prop_values[i] == 0;
+	}
+
+	return true;
+}
+
 /**
  * _kmstest_connector_config:
  * @drm_fd: DRM fd
@@ -894,6 +914,15 @@  static bool _kmstest_connector_config(int drm_fd, uint32_t connector_id,
 		goto err3;
 	}
 
+	if (!probe && !connector_check_link_status(drm_fd, connector)) {
+		drmModeFreeConnector(connector);
+		connector = drmModeGetConnector(drm_fd, connector_id);
+		if (!connector)
+			goto err2;
+
+		igt_assert(connector->connector_id == connector_id);
+	}
+
 	/*
 	 * Find given CRTC if crtc_id != 0 or else the first CRTC not in use.
 	 * In both cases find the first compatible encoder and skip the CRTC