drm: Reduce EDID warnings from DRM_ERROR to DRM_NOTE

Message ID: 20170210195913.9878-1-chris@chris-wilson.co.uk
State: New, archived
From: Chris Wilson <chris@chris-wilson.co.uk>
To: dri-devel@lists.freedesktop.org
Cc: intel-gfx@lists.freedesktop.org
Subject: [PATCH] drm: Reduce EDID warnings from DRM_ERROR to DRM_NOTE
Date: Fri, 10 Feb 2017 19:59:13 +0000

Chris Wilson Feb. 10, 2017, 7:59 p.m. UTC

The warnings from parsing the EDID are not driver errors, but the
"normal but significant" conditions from the external device. As such,
they do not need the ferocity of an *ERROR*, but can use the less harsh
DRM_NOTE instead.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/drm_edid.c | 15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)

Thierry Reding Feb. 13, 2017, 7:41 a.m. UTC | #1

On Fri, Feb 10, 2017 at 07:59:13PM +0000, Chris Wilson wrote:
> The warnings from parsing the EDID are not driver errors, but the
> "normal but significant" conditions from the external device. As such,
> they do not need the ferocity of an *ERROR*, but can use the less harsh
> DRM_NOTE instead.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>  drivers/gpu/drm/drm_edid.c | 15 ++++++++-------
>  1 file changed, 8 insertions(+), 7 deletions(-)

The below are all conditions that happen when the EDID is bad. I'm not
sure that really qualifies as "normal".

From a quick look through the code we don't always trigger an error from
the below failure paths at higher levels, so decreasing the level here
has the potential to let this kind of exceptional condition go
unnoticed.

Thierry

Chris Wilson Feb. 13, 2017, 8:59 a.m. UTC | #2

On Mon, Feb 13, 2017 at 08:41:10AM +0100, Thierry Reding wrote:
> The below are all conditions that happen when the EDID is bad. I'm not
> sure that really qualifies as "normal".

Often it is - a bad EDID on the monitor will always be bad. The
challenge is distinguishing that from silent data corruption during the
read - a reported read failure is trivial.
 
> From a quick look through the code we don't always trigger an error from
> the below failure paths at higher levels, so decreasing the level here
> has the potential to let this kind of exceptional condition go
> unnoticed.

The messages are not gone, they are higher than the default loglevel,
but now below the level at which they are printed to a terminal. The
bad EDID is either expected or recoverable, and definitely not fatal
so I don't think an *ERROR* is justified.
-Chris

Sean Paul Feb. 13, 2017, 5:17 p.m. UTC | #3

On Mon, Feb 13, 2017 at 3:59 AM, Chris Wilson <chris@chris-wilson.co.uk> wrote:
> The messages are not gone, they are higher than the default loglevel,
> but now below the level at which they are printed to a terminal. The
> bad EDID is either expected or recoverable, and definitely not fatal
> so I don't think an *ERROR* is justified.

I tend to agree.

The description for the KERN_NOTICE level is "normal but significant
condition". I might argue that the presence of these EDID messages
represents a normal *or* significant condition (depending on why the
EDID is bad), but I don't think it's unreasonable to expect people to
check their logs if the display/mode is not working properly.

Sean



> -Chris
>
> --
> Chris Wilson, Intel Open Source Technology Centre

Daniel Vetter Feb. 14, 2017, 9:36 p.m. UTC | #4

On Mon, Feb 13, 2017 at 12:17:27PM -0500, Sean Paul wrote:
> I tend to agree.
> 
> The description for the KERN_NOTICE level is "normal but significant
> condition". I might argue that the presence of these EDID messages
> represents a normal *or* significant condition (depending on why the
> EDID is bad), but I don't think it's unreasonable to expect people to
> check their logs if the display/mode is not working properly.

So for cases where we know that there is shit hw out there (specifically
KVM switches that mangle the CEA block without adjusting the EDID) we
already tune down the error to debug level. So in principle I totally agree
with tuning down anything that happens because it's outside of our control
to info or debug, but do we still need this patch after the CEA one has
landed? Our CI at least seems happy ...

Cheers, Daniel

Chris Wilson Feb. 14, 2017, 9:43 p.m. UTC | #5

On Tue, Feb 14, 2017 at 10:36:09PM +0100, Daniel Vetter wrote:
> So for cases where we know that there is shit hw out there (specifically
> KVM switches that mangle the CEA block without adjusting the EDID) we
> already tune down the error to debug level. So in principle I totally agree
> with tuning down anything that happens because it's outside of our control
> to info or debug, but do we still need this patch after the CEA one has
> landed? Our CI at least seems happy ...

Yes. The one machine with a dodgy EDID also happens to have a dodgy
BIOS. This reduces the number of consistent errors to 1, but since an
unrelated error still remains, CI doesn't detect the improvement.
https://intel-gfx-ci.01.org/CI/CI_DRM_2198/fi-skl-6700k/igt@drv_module_reload@basic-reload.html
-Chris

Daniel Vetter Feb. 14, 2017, 10:23 p.m. UTC | #6

On Tue, Feb 14, 2017 at 09:43:45PM +0000, Chris Wilson wrote:
> Yes. The one machine with a dodgy EDID also happens to have a dodgy
> BIOS. This reduces the number of consistent errors to 1, but since an
> unrelated error still remains, CI doesn't detect the improvement.
> https://intel-gfx-ci.01.org/CI/CI_DRM_2198/fi-skl-6700k/igt@drv_module_reload@basic-reload.html

Ok, count me convinced; I pushed the patch to drm-misc-next.
-Daniel
