Message ID | 20170210195913.9878-1-chris@chris-wilson.co.uk (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Fri, Feb 10, 2017 at 07:59:13PM +0000, Chris Wilson wrote: > The warnings from parsing the EDID are not driver errors, but the > "normal but significant" conditions from the external device. As such, > they do not need the ferocity of an *ERROR*, but can use the less harsh > DRM_NOTE instead. > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > --- > drivers/gpu/drm/drm_edid.c | 15 ++++++++------- > 1 file changed, 8 insertions(+), 7 deletions(-) The below are all conditions that happen when the EDID is bad. I'm not sure that really qualifies as "normal". From a quick look through the code we don't always trigger an error from the below failure paths at higher levels, so decreasing the level here has the potential to let this kind of exceptional condition go unnoticed. Thierry
On Mon, Feb 13, 2017 at 08:41:10AM +0100, Thierry Reding wrote: > On Fri, Feb 10, 2017 at 07:59:13PM +0000, Chris Wilson wrote: > > The warnings from parsing the EDID are not driver errors, but the > > "normal but significant" conditions from the external device. As such, > > they do not need the ferocity of an *ERROR*, but can use the less harsh > > DRM_NOTE instead. > > > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > > --- > > drivers/gpu/drm/drm_edid.c | 15 ++++++++------- > > 1 file changed, 8 insertions(+), 7 deletions(-) > > The below are all conditions that happen when the EDID is bad. I'm not > sure that really qualifies as "normal". Often it is - a bad EDID on the monitor will always be bad. The challenge is distinguishing that from silent data corruption during the read - a reported read failure are trivial. > From a quick look through the code we don't always trigger an error from > the below failure paths at higher levels, so decreasing the level here > has the potential to let this kind of exceptional condition go > unnoticed. The messages are not gone, they are higher than the default loglevel, but now below the level at which they are printed to a terminal. The bad EDID is either expected or recoverable, and definitely not fatal so I don't think an *ERROR* is justified. -Chris
On Mon, Feb 13, 2017 at 3:59 AM, Chris Wilson <chris@chris-wilson.co.uk> wrote: > On Mon, Feb 13, 2017 at 08:41:10AM +0100, Thierry Reding wrote: >> On Fri, Feb 10, 2017 at 07:59:13PM +0000, Chris Wilson wrote: >> > The warnings from parsing the EDID are not driver errors, but the >> > "normal but significant" conditions from the external device. As such, >> > they do not need the ferocity of an *ERROR*, but can use the less harsh >> > DRM_NOTE instead. >> > >> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> >> > --- >> > drivers/gpu/drm/drm_edid.c | 15 ++++++++------- >> > 1 file changed, 8 insertions(+), 7 deletions(-) >> >> The below are all conditions that happen when the EDID is bad. I'm not >> sure that really qualifies as "normal". > > Often it is - a bad EDID on the monitor will always be bad. The > challenge is distinguishing that from silent data corruption during the > read - a reported read failure are trivial. > >> From a quick look through the code we don't always trigger an error from >> the below failure paths at higher levels, so decreasing the level here >> has the potential to let this kind of exceptional condition go >> unnoticed. > > The messages are not gone, they are higher than the default loglevel, > but now below the level at which they are printed to a terminal. The > bad EDID is either expected or recoverable, and definitely not fatal > so I don't think an *ERROR* is justified. I tend to agree. The description for the KERN_NOTICE level is "normal but significant condition". I might argue that the presence of these EDID messages represents a normal *or* significant condition (depending on why the EDID is bad), but I don't think it's unreasonable to expect people to check their logs if the display/mode is not working properly. Sean > -Chris > > -- > Chris Wilson, Intel Open Source Technology Centre > _______________________________________________ > dri-devel mailing list > dri-devel@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/dri-devel
On Mon, Feb 13, 2017 at 12:17:27PM -0500, Sean Paul wrote: > On Mon, Feb 13, 2017 at 3:59 AM, Chris Wilson <chris@chris-wilson.co.uk> wrote: > > On Mon, Feb 13, 2017 at 08:41:10AM +0100, Thierry Reding wrote: > >> On Fri, Feb 10, 2017 at 07:59:13PM +0000, Chris Wilson wrote: > >> > The warnings from parsing the EDID are not driver errors, but the > >> > "normal but significant" conditions from the external device. As such, > >> > they do not need the ferocity of an *ERROR*, but can use the less harsh > >> > DRM_NOTE instead. > >> > > >> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > >> > --- > >> > drivers/gpu/drm/drm_edid.c | 15 ++++++++------- > >> > 1 file changed, 8 insertions(+), 7 deletions(-) > >> > >> The below are all conditions that happen when the EDID is bad. I'm not > >> sure that really qualifies as "normal". > > > > Often it is - a bad EDID on the monitor will always be bad. The > > challenge is distinguishing that from silent data corruption during the > > read - a reported read failure are trivial. > > > >> From a quick look through the code we don't always trigger an error from > >> the below failure paths at higher levels, so decreasing the level here > >> has the potential to let this kind of exceptional condition go > >> unnoticed. > > > > The messages are not gone, they are higher than the default loglevel, > > but now below the level at which they are printed to a terminal. The > > bad EDID is either expected or recoverable, and definitely not fatal > > so I don't think an *ERROR* is justified. > > I tend to agree. > > The description for the KERN_NOTICE level is "normal but significant > condition". I might argue that the presence of these EDID messages > represents a normal *or* significant condition (depending on why the > EDID is bad), but I don't think it's unreasonable to expect people to > check their logs if the display/mode is not working properly. So for cases where we know that there is shit hw out there (specifically kvm switches that mangle the cea block without adjusting the edid) we already tune down the error to debug level. So in principle totally agree with tuning down anything that happens because it's outside of our control to info or debug, but do we still need this patch after the cea one has landed? Our CI at least seems happy ... Cheers, Daniel
On Tue, Feb 14, 2017 at 10:36:09PM +0100, Daniel Vetter wrote: > On Mon, Feb 13, 2017 at 12:17:27PM -0500, Sean Paul wrote: > > On Mon, Feb 13, 2017 at 3:59 AM, Chris Wilson <chris@chris-wilson.co.uk> wrote: > > > On Mon, Feb 13, 2017 at 08:41:10AM +0100, Thierry Reding wrote: > > >> On Fri, Feb 10, 2017 at 07:59:13PM +0000, Chris Wilson wrote: > > >> > The warnings from parsing the EDID are not driver errors, but the > > >> > "normal but significant" conditions from the external device. As such, > > >> > they do not need the ferocity of an *ERROR*, but can use the less harsh > > >> > DRM_NOTE instead. > > >> > > > >> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > > >> > --- > > >> > drivers/gpu/drm/drm_edid.c | 15 ++++++++------- > > >> > 1 file changed, 8 insertions(+), 7 deletions(-) > > >> > > >> The below are all conditions that happen when the EDID is bad. I'm not > > >> sure that really qualifies as "normal". > > > > > > Often it is - a bad EDID on the monitor will always be bad. The > > > challenge is distinguishing that from silent data corruption during the > > > read - a reported read failure are trivial. > > > > > >> From a quick look through the code we don't always trigger an error from > > >> the below failure paths at higher levels, so decreasing the level here > > >> has the potential to let this kind of exceptional condition go > > >> unnoticed. > > > > > > The messages are not gone, they are higher than the default loglevel, > > > but now below the level at which they are printed to a terminal. The > > > bad EDID is either expected or recoverable, and definitely not fatal > > > so I don't think an *ERROR* is justified. > > > > I tend to agree. > > > > The description for the KERN_NOTICE level is "normal but significant > > condition". I might argue that the presence of these EDID messages > > represents a normal *or* significant condition (depending on why the > > EDID is bad), but I don't think it's unreasonable to expect people to > > check their logs if the display/mode is not working properly. > > So for cases where we know that there is shit hw out there (specifically > kvm switches that mangle the cea block without adjusting the edid) we > already tune down the error to debug level. So in principle totally agree > with tuning down anything that happens because it's outside of our control > to info or debug, but do we still need this patch after the cea one has > landed? Our CI at least seems happy ... Yes. The one machine with a dodgy EDID also happens to have a dodgy BIOS. This reduces the number of consistent errors to 1, but since an unrelated error still remains, CI doesn't detect the improvement. https://intel-gfx-ci.01.org/CI/CI_DRM_2198/fi-skl-6700k/igt@drv_module_reload@basic-reload.html -Chris
On Tue, Feb 14, 2017 at 09:43:45PM +0000, Chris Wilson wrote: > On Tue, Feb 14, 2017 at 10:36:09PM +0100, Daniel Vetter wrote: > > On Mon, Feb 13, 2017 at 12:17:27PM -0500, Sean Paul wrote: > > > On Mon, Feb 13, 2017 at 3:59 AM, Chris Wilson <chris@chris-wilson.co.uk> wrote: > > > > On Mon, Feb 13, 2017 at 08:41:10AM +0100, Thierry Reding wrote: > > > >> On Fri, Feb 10, 2017 at 07:59:13PM +0000, Chris Wilson wrote: > > > >> > The warnings from parsing the EDID are not driver errors, but the > > > >> > "normal but significant" conditions from the external device. As such, > > > >> > they do not need the ferocity of an *ERROR*, but can use the less harsh > > > >> > DRM_NOTE instead. > > > >> > > > > >> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > > > >> > --- > > > >> > drivers/gpu/drm/drm_edid.c | 15 ++++++++------- > > > >> > 1 file changed, 8 insertions(+), 7 deletions(-) > > > >> > > > >> The below are all conditions that happen when the EDID is bad. I'm not > > > >> sure that really qualifies as "normal". > > > > > > > > Often it is - a bad EDID on the monitor will always be bad. The > > > > challenge is distinguishing that from silent data corruption during the > > > > read - a reported read failure are trivial. > > > > > > > >> From a quick look through the code we don't always trigger an error from > > > >> the below failure paths at higher levels, so decreasing the level here > > > >> has the potential to let this kind of exceptional condition go > > > >> unnoticed. > > > > > > > > The messages are not gone, they are higher than the default loglevel, > > > > but now below the level at which they are printed to a terminal. The > > > > bad EDID is either expected or recoverable, and definitely not fatal > > > > so I don't think an *ERROR* is justified. > > > > > > I tend to agree. > > > > > > The description for the KERN_NOTICE level is "normal but significant > > > condition". I might argue that the presence of these EDID messages > > > represents a normal *or* significant condition (depending on why the > > > EDID is bad), but I don't think it's unreasonable to expect people to > > > check their logs if the display/mode is not working properly. > > > > So for cases where we know that there is shit hw out there (specifically > > kvm switches that mangle the cea block without adjusting the edid) we > > already tune down the error to debug level. So in principle totally agree > > with tuning down anything that happens because it's outside of our control > > to info or debug, but do we still need this patch after the cea one has > > landed? Our CI at least seems happy ... > > Yes. The one machine with a dodgy EDID also happens to have a dodgy > BIOS. This reduces the number of consistent errors to 1, but since an > unrelated error still remains, CI doesn't detect the improvement. > https://intel-gfx-ci.01.org/CI/CI_DRM_2198/fi-skl-6700k/igt@drv_module_reload@basic-reload.html Ok, count my convinced, I pushed the patch to drm-misc-next. -Daniel
diff --git a/drivers/gpu/drm/drm_edid.c b/drivers/gpu/drm/drm_edid.c index 5a3b34a88ac3..24e7b282f16c 100644 --- a/drivers/gpu/drm/drm_edid.c +++ b/drivers/gpu/drm/drm_edid.c @@ -1140,7 +1140,7 @@ bool drm_edid_block_valid(u8 *raw_edid, int block, bool print_bad_edid, DRM_DEBUG("Assuming a KVM switch modified the CEA block but left the original checksum\n"); } else { if (print_bad_edid) - DRM_ERROR("EDID checksum is invalid, remainder is %d\n", csum); + DRM_NOTE("EDID checksum is invalid, remainder is %d\n", csum); goto bad; } @@ -1150,7 +1150,7 @@ bool drm_edid_block_valid(u8 *raw_edid, int block, bool print_bad_edid, switch (raw_edid[0]) { case 0: /* base */ if (edid->version != 1) { - DRM_ERROR("EDID has major version %d, instead of 1\n", edid->version); + DRM_NOTE("EDID has major version %d, instead of 1\n", edid->version); goto bad; } @@ -1167,11 +1167,12 @@ bool drm_edid_block_valid(u8 *raw_edid, int block, bool print_bad_edid, bad: if (print_bad_edid) { if (drm_edid_is_zero(raw_edid, EDID_LENGTH)) { - printk(KERN_ERR "EDID block is all zeroes\n"); + printk(KERN_NOTICE "EDID block is all zeroes\n"); } else { - printk(KERN_ERR "Raw EDID:\n"); - print_hex_dump(KERN_ERR, " \t", DUMP_PREFIX_NONE, 16, 1, - raw_edid, EDID_LENGTH, false); + printk(KERN_NOTICE "Raw EDID:\n"); + print_hex_dump(KERN_NOTICE, + " \t", DUMP_PREFIX_NONE, 16, 1, + raw_edid, EDID_LENGTH, false); } } return false; @@ -4002,7 +4003,7 @@ static int validate_displayid(u8 *displayid, int length, int idx) csum += displayid[i]; } if (csum) { - DRM_ERROR("DisplayID checksum invalid, remainder is %d\n", csum); + DRM_NOTE("DisplayID checksum invalid, remainder is %d\n", csum); return -EINVAL; } return 0;
The warnings from parsing the EDID are not driver errors, but the "normal but significant" conditions from the external device. As such, they do not need the ferocity of an *ERROR*, but can use the less harsh DRM_NOTE instead. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> --- drivers/gpu/drm/drm_edid.c | 15 ++++++++------- 1 file changed, 8 insertions(+), 7 deletions(-)