diff mbox series

drm/edid: Fix crash with zero/invalid EDID

Message ID 20211004092100.1.Ic90a5ebd44c75db963112be167a03cc96f9fb249@changeid (mailing list archive)
State New, archived
Headers show
Series drm/edid: Fix crash with zero/invalid EDID | expand

Commit Message

Doug Anderson Oct. 4, 2021, 4:21 p.m. UTC
In the commit bac9c2948224 ("drm/edid: Break out reading block 0 of
the EDID") I broke out reading the base block of the EDID to its own
function. Unfortunately, when I did that I messed up the handling when
drm_edid_is_zero() indicated that we had an EDID that was all 0x00 or
when we went through 4 loops and didn't get a valid EDID. Specifically
I needed to pass the broken EDID to connector_bad_edid() but now I was
passing an error-pointer.

Let's re-jigger things so we can pass the bad EDID in properly.

Fixes: bac9c2948224 ("drm/edid: Break out reading block 0 of the EDID")
Reported-by: kernel test robot <oliver.sang@intel.com>
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
---

 drivers/gpu/drm/drm_edid.c | 27 +++++++++++----------------
 1 file changed, 11 insertions(+), 16 deletions(-)

Comments

Geert Uytterhoeven Oct. 4, 2021, 5:14 p.m. UTC | #1
Hi Douglas,

On Mon, Oct 4, 2021 at 6:22 PM Douglas Anderson <dianders@chromium.org> wrote:
> In the commit bac9c2948224 ("drm/edid: Break out reading block 0 of
> the EDID") I broke out reading the base block of the EDID to its own
> function. Unfortunately, when I did that I messed up the handling when
> drm_edid_is_zero() indicated that we had an EDID that was all 0x00 or
> when we went through 4 loops and didn't get a valid EDID. Specifically
> I needed to pass the broken EDID to connector_bad_edid() but now I was
> passing an error-pointer.
>
> Let's re-jigger things so we can pass the bad EDID in properly.
>
> Fixes: bac9c2948224 ("drm/edid: Break out reading block 0 of the EDID")
> Reported-by: kernel test robot <oliver.sang@intel.com>
> Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
> Signed-off-by: Douglas Anderson <dianders@chromium.org>

The crash is was seeing is gone, so
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>

Gr{oetje,eeting}s,

                        Geert
Ville Syrjälä Oct. 4, 2021, 7:44 p.m. UTC | #2
On Mon, Oct 04, 2021 at 09:21:27AM -0700, Douglas Anderson wrote:
> In the commit bac9c2948224 ("drm/edid: Break out reading block 0 of
> the EDID") I broke out reading the base block of the EDID to its own
> function. Unfortunately, when I did that I messed up the handling when
> drm_edid_is_zero() indicated that we had an EDID that was all 0x00 or
> when we went through 4 loops and didn't get a valid EDID. Specifically
> I needed to pass the broken EDID to connector_bad_edid() but now I was
> passing an error-pointer.
> 
> Let's re-jigger things so we can pass the bad EDID in properly.
> 
> Fixes: bac9c2948224 ("drm/edid: Break out reading block 0 of the EDID")
> Reported-by: kernel test robot <oliver.sang@intel.com>
> Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
> Signed-off-by: Douglas Anderson <dianders@chromium.org>
> ---
> 
>  drivers/gpu/drm/drm_edid.c | 27 +++++++++++----------------
>  1 file changed, 11 insertions(+), 16 deletions(-)
> 
> diff --git a/drivers/gpu/drm/drm_edid.c b/drivers/gpu/drm/drm_edid.c
> index 9b19eee0e1b4..9c9463ec5465 100644
> --- a/drivers/gpu/drm/drm_edid.c
> +++ b/drivers/gpu/drm/drm_edid.c
> @@ -1911,13 +1911,15 @@ int drm_add_override_edid_modes(struct drm_connector *connector)
>  }
>  EXPORT_SYMBOL(drm_add_override_edid_modes);
>  
> -static struct edid *drm_do_get_edid_base_block(
> +static struct edid *drm_do_get_edid_base_block(struct drm_connector *connector,
>  	int (*get_edid_block)(void *data, u8 *buf, unsigned int block,
>  			      size_t len),
> -	void *data, bool *edid_corrupt, int *null_edid_counter)
> +	void *data)
>  {
> -	int i;
> +	int *null_edid_counter = connector ? &connector->null_edid_counter : NULL;
> +	bool *edid_corrupt = connector ? &connector->edid_corrupt : NULL;
>  	void *edid;
> +	int i;
>  
>  	edid = kmalloc(EDID_LENGTH, GFP_KERNEL);
>  	if (edid == NULL)
> @@ -1941,9 +1943,8 @@ static struct edid *drm_do_get_edid_base_block(
>  	return edid;
>  
>  carp:
> -	kfree(edid);
> -	return ERR_PTR(-EINVAL);
> -
> +	if (connector)
> +		connector_bad_edid(connector, edid, 1);

BTW I believe connector_bad_edid() itself is broken since
commit e11f5bd8228f ("drm: Add support for DP 1.4 Compliance edid
corruption test"). Before we've even allocated the memory for the
extension blocks that code now assumes edid[0x7e] is to be 100%
trusted and goes and calculates the checksum on a block based on
that. So that's likely going to be pointing somewhere beyond the base
block into memory we've not even allocated. So anyone who wanted
could craft a bogus EDID and maybe get something interesting to
happen.

Would be good if someone could fix that while at it. Or just revert
the offending commit if there is no simple solution immediately in
sight.

The fact that we're parsing entirely untrustworthy crap in the kernel
always worries me. Either we need super careful review of all relevant
code, and/or we need to think about moving the parser out of the kernel.
I was considering playing around with the usermode helper stuff. IIRC
there is a way to embed the userspace binary into the kernel and just
fire it up when needed. But so far it's been the usual -ENOTIME for
me...
Doug Anderson Oct. 5, 2021, 12:40 a.m. UTC | #3
Hi,

On Mon, Oct 4, 2021 at 10:14 AM Geert Uytterhoeven <geert@linux-m68k.org> wrote:
>
> Hi Douglas,
>
> On Mon, Oct 4, 2021 at 6:22 PM Douglas Anderson <dianders@chromium.org> wrote:
> > In the commit bac9c2948224 ("drm/edid: Break out reading block 0 of
> > the EDID") I broke out reading the base block of the EDID to its own
> > function. Unfortunately, when I did that I messed up the handling when
> > drm_edid_is_zero() indicated that we had an EDID that was all 0x00 or
> > when we went through 4 loops and didn't get a valid EDID. Specifically
> > I needed to pass the broken EDID to connector_bad_edid() but now I was
> > passing an error-pointer.
> >
> > Let's re-jigger things so we can pass the bad EDID in properly.
> >
> > Fixes: bac9c2948224 ("drm/edid: Break out reading block 0 of the EDID")
> > Reported-by: kernel test robot <oliver.sang@intel.com>
> > Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
> > Signed-off-by: Douglas Anderson <dianders@chromium.org>
>
> The crash is was seeing is gone, so
> Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>

Thanks for testing! I'll plan to apply tomorrow morning (California
time) to balance between giving folks a chance to yell at me for my
patch and the urgency of fixing the breakage.

-Doug
Zuo, Jerry Oct. 5, 2021, 1:33 p.m. UTC | #4
[AMD Official Use Only]

Hi Ville:

     Yhea, it is pretty old change from two years ago, and it is no long valid anymore. Please simply drop it.

Regards,
Jerry

> -----Original Message-----
> From: Ville Syrjälä <ville.syrjala@linux.intel.com>
> Sent: October 4, 2021 3:45 PM
> To: Douglas Anderson <dianders@chromium.org>
> Cc: dri-devel@lists.freedesktop.org; geert@linux-m68k.org;
> oliver.sang@intel.com; Daniel Vetter <daniel@ffwll.ch>; David Airlie
> <airlied@linux.ie>; Jani Nikula <jani.nikula@intel.com>; Linus Walleij
> <linus.walleij@linaro.org>; Maarten Lankhorst
> <maarten.lankhorst@linux.intel.com>; Maxime Ripard
> <mripard@kernel.org>; Sam Ravnborg <sam@ravnborg.org>; Thomas
> Zimmermann <tzimmermann@suse.de>; linux-kernel@vger.kernel.org; Zuo,
> Jerry <Jerry.Zuo@amd.com>; Wentland, Harry <Harry.Wentland@amd.com>;
> Siqueira, Rodrigo <Rodrigo.Siqueira@amd.com>
> Subject: Re: [PATCH] drm/edid: Fix crash with zero/invalid EDID
>
> On Mon, Oct 04, 2021 at 09:21:27AM -0700, Douglas Anderson wrote:
> > In the commit bac9c2948224 ("drm/edid: Break out reading block 0 of
> > the EDID") I broke out reading the base block of the EDID to its own
> > function. Unfortunately, when I did that I messed up the handling when
> > drm_edid_is_zero() indicated that we had an EDID that was all 0x00 or
> > when we went through 4 loops and didn't get a valid EDID. Specifically
> > I needed to pass the broken EDID to connector_bad_edid() but now I was
> > passing an error-pointer.
> >
> > Let's re-jigger things so we can pass the bad EDID in properly.
> >
> > Fixes: bac9c2948224 ("drm/edid: Break out reading block 0 of the
> > EDID")
> > Reported-by: kernel test robot <oliver.sang@intel.com>
> > Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
> > Signed-off-by: Douglas Anderson <dianders@chromium.org>
> > ---
> >
> >  drivers/gpu/drm/drm_edid.c | 27 +++++++++++----------------
> >  1 file changed, 11 insertions(+), 16 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/drm_edid.c b/drivers/gpu/drm/drm_edid.c
> > index 9b19eee0e1b4..9c9463ec5465 100644
> > --- a/drivers/gpu/drm/drm_edid.c
> > +++ b/drivers/gpu/drm/drm_edid.c
> > @@ -1911,13 +1911,15 @@ int drm_add_override_edid_modes(struct
> > drm_connector *connector)  }
> > EXPORT_SYMBOL(drm_add_override_edid_modes);
> >
> > -static struct edid *drm_do_get_edid_base_block(
> > +static struct edid *drm_do_get_edid_base_block(struct drm_connector
> > +*connector,
> >     int (*get_edid_block)(void *data, u8 *buf, unsigned int block,
> >                           size_t len),
> > -   void *data, bool *edid_corrupt, int *null_edid_counter)
> > +   void *data)
> >  {
> > -   int i;
> > +   int *null_edid_counter = connector ? &connector-
> >null_edid_counter : NULL;
> > +   bool *edid_corrupt = connector ? &connector->edid_corrupt : NULL;
> >     void *edid;
> > +   int i;
> >
> >     edid = kmalloc(EDID_LENGTH, GFP_KERNEL);
> >     if (edid == NULL)
> > @@ -1941,9 +1943,8 @@ static struct edid
> *drm_do_get_edid_base_block(
> >     return edid;
> >
> >  carp:
> > -   kfree(edid);
> > -   return ERR_PTR(-EINVAL);
> > -
> > +   if (connector)
> > +           connector_bad_edid(connector, edid, 1);
>
> BTW I believe connector_bad_edid() itself is broken since commit
> e11f5bd8228f ("drm: Add support for DP 1.4 Compliance edid corruption
> test"). Before we've even allocated the memory for the extension blocks
> that code now assumes edid[0x7e] is to be 100% trusted and goes and
> calculates the checksum on a block based on that. So that's likely going to be
> pointing somewhere beyond the base block into memory we've not even
> allocated. So anyone who wanted could craft a bogus EDID and maybe get
> something interesting to happen.
>
> Would be good if someone could fix that while at it. Or just revert the
> offending commit if there is no simple solution immediately in sight.
>
> The fact that we're parsing entirely untrustworthy crap in the kernel always
> worries me. Either we need super careful review of all relevant code, and/or
> we need to think about moving the parser out of the kernel.
> I was considering playing around with the usermode helper stuff. IIRC there
> is a way to embed the userspace binary into the kernel and just fire it up
> when needed. But so far it's been the usual -ENOTIME for me...
>
> --
> Ville Syrjälä
> Intel
Doug Anderson Oct. 5, 2021, 1:45 p.m. UTC | #5
Hi,

On Mon, Oct 4, 2021 at 5:40 PM Doug Anderson <dianders@chromium.org> wrote:
>
> Hi,
>
> On Mon, Oct 4, 2021 at 10:14 AM Geert Uytterhoeven <geert@linux-m68k.org> wrote:
> >
> > Hi Douglas,
> >
> > On Mon, Oct 4, 2021 at 6:22 PM Douglas Anderson <dianders@chromium.org> wrote:
> > > In the commit bac9c2948224 ("drm/edid: Break out reading block 0 of
> > > the EDID") I broke out reading the base block of the EDID to its own
> > > function. Unfortunately, when I did that I messed up the handling when
> > > drm_edid_is_zero() indicated that we had an EDID that was all 0x00 or
> > > when we went through 4 loops and didn't get a valid EDID. Specifically
> > > I needed to pass the broken EDID to connector_bad_edid() but now I was
> > > passing an error-pointer.
> > >
> > > Let's re-jigger things so we can pass the bad EDID in properly.
> > >
> > > Fixes: bac9c2948224 ("drm/edid: Break out reading block 0 of the EDID")
> > > Reported-by: kernel test robot <oliver.sang@intel.com>
> > > Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
> > > Signed-off-by: Douglas Anderson <dianders@chromium.org>
> >
> > The crash is was seeing is gone, so
> > Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
>
> Thanks for testing! I'll plan to apply tomorrow morning (California
> time) to balance between giving folks a chance to yell at me for my
> patch and the urgency of fixing the breakage.

Ah, doh! I can't push until I can get a review tag from someone. As
soon as I see one then I'll give it a push.

-Doug
Doug Anderson Oct. 5, 2021, 3:13 p.m. UTC | #6
Hi,

On Tue, Oct 5, 2021 at 6:33 AM Zuo, Jerry <Jerry.Zuo@amd.com> wrote:
>
> > BTW I believe connector_bad_edid() itself is broken since commit
> > e11f5bd8228f ("drm: Add support for DP 1.4 Compliance edid corruption
> > test"). Before we've even allocated the memory for the extension blocks
> > that code now assumes edid[0x7e] is to be 100% trusted and goes and
> > calculates the checksum on a block based on that. So that's likely going to be
> > pointing somewhere beyond the base block into memory we've not even
> > allocated. So anyone who wanted could craft a bogus EDID and maybe get
> > something interesting to happen.
> >
> > Would be good if someone could fix that while at it. Or just revert the
> > offending commit if there is no simple solution immediately in sight.
> >
> > The fact that we're parsing entirely untrustworthy crap in the kernel always
> > worries me. Either we need super careful review of all relevant code, and/or
> > we need to think about moving the parser out of the kernel.
> > I was considering playing around with the usermode helper stuff. IIRC there
> > is a way to embed the userspace binary into the kernel and just fire it up
> > when needed. But so far it's been the usual -ENOTIME for me...
> >
> [AMD Official Use Only]
>
> Hi Ville:
>
>      Yhea, it is pretty old change from two years ago, and it is no long valid anymore. Please simply drop it.
>
> Regards,
> Jerry

I've cut out other bits from this email and changed the subject line
since I think this is an issue unrelated to the one my original patch
was fixing.

I don't actually know a ton about DP compliance testing, but I
attempted to try to be helpful and revert commit e11f5bd8228f ("drm:
Add support for DP 1.4 Compliance edid corruption test"). It wasn't
too hard to deal with the conflicts in the revert itself, but then
things didn't compile because there are two places that use
`real_edid_checksum` and that goes away if I revert the patch.

I've made an attempt to fix the problem by just adding a bounds check.
Perhaps you can see if that looks good to you:

https://lore.kernel.org/r/20211005081022.1.Ib059f9c23c2611cb5a9d760e7d0a700c1295928d@changeid

-Doug
Zuo, Jerry Oct. 5, 2021, 3:25 p.m. UTC | #7
[AMD Official Use Only]

> -----Original Message-----
> From: Doug Anderson <dianders@chromium.org>
> Sent: October 5, 2021 11:14 AM
> To: Zuo, Jerry <Jerry.Zuo@amd.com>
> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>; dri-
> devel@lists.freedesktop.org; geert@linux-m68k.org; oliver.sang@intel.com;
> Daniel Vetter <daniel@ffwll.ch>; David Airlie <airlied@linux.ie>; Jani Nikula
> <jani.nikula@intel.com>; Linus Walleij <linus.walleij@linaro.org>; Maarten
> Lankhorst <maarten.lankhorst@linux.intel.com>; Maxime Ripard
> <mripard@kernel.org>; Sam Ravnborg <sam@ravnborg.org>; Thomas
> Zimmermann <tzimmermann@suse.de>; linux-kernel@vger.kernel.org;
> Wentland, Harry <Harry.Wentland@amd.com>; Siqueira, Rodrigo
> <Rodrigo.Siqueira@amd.com>; Kuogee Hsieh <khsieh@codeaurora.org>
> Subject: Re: connector_bad_edid() is broken (was: Re: [PATCH] drm/edid:
> Fix crash with zero/invalid EDID)
>
> Hi,
>
> On Tue, Oct 5, 2021 at 6:33 AM Zuo, Jerry <Jerry.Zuo@amd.com> wrote:
> >
> > > BTW I believe connector_bad_edid() itself is broken since commit
> > > e11f5bd8228f ("drm: Add support for DP 1.4 Compliance edid
> > > corruption test"). Before we've even allocated the memory for the
> > > extension blocks that code now assumes edid[0x7e] is to be 100%
> > > trusted and goes and calculates the checksum on a block based on
> > > that. So that's likely going to be pointing somewhere beyond the
> > > base block into memory we've not even allocated. So anyone who
> > > wanted could craft a bogus EDID and maybe get something interesting to
> happen.
> > >
> > > Would be good if someone could fix that while at it. Or just revert
> > > the offending commit if there is no simple solution immediately in sight.
> > >
> > > The fact that we're parsing entirely untrustworthy crap in the
> > > kernel always worries me. Either we need super careful review of all
> > > relevant code, and/or we need to think about moving the parser out of
> the kernel.
> > > I was considering playing around with the usermode helper stuff.
> > > IIRC there is a way to embed the userspace binary into the kernel
> > > and just fire it up when needed. But so far it's been the usual -ENOTIME
> for me...
> > >
> > [AMD Official Use Only]
> >
> > Hi Ville:
> >
> >      Yhea, it is pretty old change from two years ago, and it is no long valid
> anymore. Please simply drop it.
> >
> > Regards,
> > Jerry
>
> I've cut out other bits from this email and changed the subject line since I
> think this is an issue unrelated to the one my original patch was fixing.
>
> I don't actually know a ton about DP compliance testing, but I attempted to
> try to be helpful and revert commit e11f5bd8228f ("drm:
> Add support for DP 1.4 Compliance edid corruption test"). It wasn't too hard
> to deal with the conflicts in the revert itself, but then things didn't compile
> because there are two places that use `real_edid_checksum` and that goes
> away if I revert the patch.
>
> I've made an attempt to fix the problem by just adding a bounds check.
> Perhaps you can see if that looks good to you:
>
> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flore.
> kernel.org%2Fr%2F20211005081022.1.Ib059f9c23c2611cb5a9d760e7d0a700c1
> 295928d%40changeid&amp;data=04%7C01%7CJerry.Zuo%40amd.com%7C90
> b948659454400cedd308d98812c339%7C3dd8961fe4884e608e11a82d994e183d
> %7C0%7C0%7C637690436453163864%7CUnknown%7CTWFpbGZsb3d8eyJWIj
> oiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1
> 000&amp;sdata=OtSngWlYyDc1NbNSgAeALqN3nF%2Bnw08nJ068cpAKZJk%3
> D&amp;reserved=0
>
> -Doug

The patch used for DP1.4 compliance edid corruption test. Let me double check if edid corruption test could be passed without the patch.
Ville Syrjälä Oct. 5, 2021, 4:43 p.m. UTC | #8
On Mon, Oct 04, 2021 at 09:21:27AM -0700, Douglas Anderson wrote:
> In the commit bac9c2948224 ("drm/edid: Break out reading block 0 of
> the EDID") I broke out reading the base block of the EDID to its own
> function. Unfortunately, when I did that I messed up the handling when
> drm_edid_is_zero() indicated that we had an EDID that was all 0x00 or
> when we went through 4 loops and didn't get a valid EDID. Specifically
> I needed to pass the broken EDID to connector_bad_edid() but now I was
> passing an error-pointer.
> 
> Let's re-jigger things so we can pass the bad EDID in properly.
> 
> Fixes: bac9c2948224 ("drm/edid: Break out reading block 0 of the EDID")
> Reported-by: kernel test robot <oliver.sang@intel.com>
> Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
> Signed-off-by: Douglas Anderson <dianders@chromium.org>

A bit of historical fallout zone this part of the code. So
not the easiest thing to read in general. But looks like what
you have here should work.

Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>

> ---
> 
>  drivers/gpu/drm/drm_edid.c | 27 +++++++++++----------------
>  1 file changed, 11 insertions(+), 16 deletions(-)
> 
> diff --git a/drivers/gpu/drm/drm_edid.c b/drivers/gpu/drm/drm_edid.c
> index 9b19eee0e1b4..9c9463ec5465 100644
> --- a/drivers/gpu/drm/drm_edid.c
> +++ b/drivers/gpu/drm/drm_edid.c
> @@ -1911,13 +1911,15 @@ int drm_add_override_edid_modes(struct drm_connector *connector)
>  }
>  EXPORT_SYMBOL(drm_add_override_edid_modes);
>  
> -static struct edid *drm_do_get_edid_base_block(
> +static struct edid *drm_do_get_edid_base_block(struct drm_connector *connector,
>  	int (*get_edid_block)(void *data, u8 *buf, unsigned int block,
>  			      size_t len),
> -	void *data, bool *edid_corrupt, int *null_edid_counter)
> +	void *data)
>  {
> -	int i;
> +	int *null_edid_counter = connector ? &connector->null_edid_counter : NULL;
> +	bool *edid_corrupt = connector ? &connector->edid_corrupt : NULL;
>  	void *edid;
> +	int i;
>  
>  	edid = kmalloc(EDID_LENGTH, GFP_KERNEL);
>  	if (edid == NULL)
> @@ -1941,9 +1943,8 @@ static struct edid *drm_do_get_edid_base_block(
>  	return edid;
>  
>  carp:
> -	kfree(edid);
> -	return ERR_PTR(-EINVAL);
> -
> +	if (connector)
> +		connector_bad_edid(connector, edid, 1);
>  out:
>  	kfree(edid);
>  	return NULL;
> @@ -1982,14 +1983,9 @@ struct edid *drm_do_get_edid(struct drm_connector *connector,
>  	if (override)
>  		return override;
>  
> -	edid = (u8 *)drm_do_get_edid_base_block(get_edid_block, data,
> -						&connector->edid_corrupt,
> -						&connector->null_edid_counter);
> -	if (IS_ERR_OR_NULL(edid)) {
> -		if (IS_ERR(edid))
> -			connector_bad_edid(connector, edid, 1);
> +	edid = (u8 *)drm_do_get_edid_base_block(connector, get_edid_block, data);
> +	if (!edid)
>  		return NULL;
> -	}
>  
>  	/* if there's no extensions or no connector, we're done */
>  	valid_extensions = edid[0x7e];
> @@ -2142,14 +2138,13 @@ u32 drm_edid_get_panel_id(struct i2c_adapter *adapter)
>  	struct edid *edid;
>  	u32 panel_id;
>  
> -	edid = drm_do_get_edid_base_block(drm_do_probe_ddc_edid, adapter,
> -					  NULL, NULL);
> +	edid = drm_do_get_edid_base_block(NULL, drm_do_probe_ddc_edid, adapter);
>  
>  	/*
>  	 * There are no manufacturer IDs of 0, so if there is a problem reading
>  	 * the EDID then we'll just return 0.
>  	 */
> -	if (IS_ERR_OR_NULL(edid))
> +	if (!edid)
>  		return 0;
>  
>  	panel_id = edid_extract_panel_id(edid);
> -- 
> 2.33.0.800.g4c38ced690-goog
Harry Wentland Oct. 5, 2021, 6:03 p.m. UTC | #9
On 2021-10-05 11:25, Zuo, Jerry wrote:
> [AMD Official Use Only]
> 
>> -----Original Message-----
>> From: Doug Anderson <dianders@chromium.org>
>> Sent: October 5, 2021 11:14 AM
>> To: Zuo, Jerry <Jerry.Zuo@amd.com>
>> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>; dri-
>> devel@lists.freedesktop.org; geert@linux-m68k.org; oliver.sang@intel.com;
>> Daniel Vetter <daniel@ffwll.ch>; David Airlie <airlied@linux.ie>; Jani Nikula
>> <jani.nikula@intel.com>; Linus Walleij <linus.walleij@linaro.org>; Maarten
>> Lankhorst <maarten.lankhorst@linux.intel.com>; Maxime Ripard
>> <mripard@kernel.org>; Sam Ravnborg <sam@ravnborg.org>; Thomas
>> Zimmermann <tzimmermann@suse.de>; linux-kernel@vger.kernel.org;
>> Wentland, Harry <Harry.Wentland@amd.com>; Siqueira, Rodrigo
>> <Rodrigo.Siqueira@amd.com>; Kuogee Hsieh <khsieh@codeaurora.org>
>> Subject: Re: connector_bad_edid() is broken (was: Re: [PATCH] drm/edid:
>> Fix crash with zero/invalid EDID)
>>
>> Hi,
>>
>> On Tue, Oct 5, 2021 at 6:33 AM Zuo, Jerry <Jerry.Zuo@amd.com> wrote:
>>>
>>>> BTW I believe connector_bad_edid() itself is broken since commit
>>>> e11f5bd8228f ("drm: Add support for DP 1.4 Compliance edid
>>>> corruption test"). Before we've even allocated the memory for the
>>>> extension blocks that code now assumes edid[0x7e] is to be 100%
>>>> trusted and goes and calculates the checksum on a block based on
>>>> that. So that's likely going to be pointing somewhere beyond the
>>>> base block into memory we've not even allocated. So anyone who
>>>> wanted could craft a bogus EDID and maybe get something interesting to
>> happen.
>>>>
>>>> Would be good if someone could fix that while at it. Or just revert
>>>> the offending commit if there is no simple solution immediately in sight.
>>>>
>>>> The fact that we're parsing entirely untrustworthy crap in the
>>>> kernel always worries me. Either we need super careful review of all
>>>> relevant code, and/or we need to think about moving the parser out of
>> the kernel.
>>>> I was considering playing around with the usermode helper stuff.
>>>> IIRC there is a way to embed the userspace binary into the kernel
>>>> and just fire it up when needed. But so far it's been the usual -ENOTIME
>> for me...
>>>>
>>> [AMD Official Use Only]
>>>
>>> Hi Ville:
>>>
>>>      Yhea, it is pretty old change from two years ago, and it is no long valid
>> anymore. Please simply drop it.
>>>
>>> Regards,
>>> Jerry
>>
>> I've cut out other bits from this email and changed the subject line since I
>> think this is an issue unrelated to the one my original patch was fixing.
>>
>> I don't actually know a ton about DP compliance testing, but I attempted to
>> try to be helpful and revert commit e11f5bd8228f ("drm:
>> Add support for DP 1.4 Compliance edid corruption test"). It wasn't too hard
>> to deal with the conflicts in the revert itself, but then things didn't compile
>> because there are two places that use `real_edid_checksum` and that goes
>> away if I revert the patch.
>>
>> I've made an attempt to fix the problem by just adding a bounds check.
>> Perhaps you can see if that looks good to you:
>>
>> https://lore.
 kernel.org%2Fr%2F20211005081022.1.Ib059f9c23c2611cb5a9d760e7d0a700c1
>> 295928d%40changeid&amp;data=04%7C01%7CJerry.Zuo%40amd.com%7C90
>> b948659454400cedd308d98812c339%7C3dd8961fe4884e608e11a82d994e183d
>> %7C0%7C0%7C637690436453163864%7CUnknown%7CTWFpbGZsb3d8eyJWIj
>> oiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1
>> 000&amp;sdata=OtSngWlYyDc1NbNSgAeALqN3nF%2Bnw08nJ068cpAKZJk%3
>> D&amp;reserved=0
>>
>> -Doug
> 
> The patch used for DP1.4 compliance edid corruption test. Let me double check if edid corruption test could be passed without the patch.
> 

Can you try the CTS test with Doug's fix?

https://patchwork.freedesktop.org/patch/457537/

Harry
Doug Anderson Oct. 6, 2021, 2:20 a.m. UTC | #10
Hi,

On Tue, Oct 5, 2021 at 9:43 AM Ville Syrjälä
<ville.syrjala@linux.intel.com> wrote:
>
> On Mon, Oct 04, 2021 at 09:21:27AM -0700, Douglas Anderson wrote:
> > In the commit bac9c2948224 ("drm/edid: Break out reading block 0 of
> > the EDID") I broke out reading the base block of the EDID to its own
> > function. Unfortunately, when I did that I messed up the handling when
> > drm_edid_is_zero() indicated that we had an EDID that was all 0x00 or
> > when we went through 4 loops and didn't get a valid EDID. Specifically
> > I needed to pass the broken EDID to connector_bad_edid() but now I was
> > passing an error-pointer.
> >
> > Let's re-jigger things so we can pass the bad EDID in properly.
> >
> > Fixes: bac9c2948224 ("drm/edid: Break out reading block 0 of the EDID")
> > Reported-by: kernel test robot <oliver.sang@intel.com>
> > Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
> > Signed-off-by: Douglas Anderson <dianders@chromium.org>
>
> A bit of historical fallout zone this part of the code. So
> not the easiest thing to read in general. But looks like what
> you have here should work.
>
> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>

Thanks! Pushed to drm-misc/drm-misc-next:

e7bd95a7ed4e drm/edid: Fix crash with zero/invalid EDID
Zuo, Jerry Oct. 6, 2021, 12:05 p.m. UTC | #11
[AMD Official Use Only]

> -----Original Message-----
> From: Wentland, Harry <Harry.Wentland@amd.com>
> Sent: October 5, 2021 2:04 PM
> To: Zuo, Jerry <Jerry.Zuo@amd.com>; Doug Anderson
> <dianders@chromium.org>
> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>; dri-
> devel@lists.freedesktop.org; geert@linux-m68k.org; oliver.sang@intel.com;
> Daniel Vetter <daniel@ffwll.ch>; David Airlie <airlied@linux.ie>; Jani Nikula
> <jani.nikula@intel.com>; Linus Walleij <linus.walleij@linaro.org>; Maarten
> Lankhorst <maarten.lankhorst@linux.intel.com>; Maxime Ripard
> <mripard@kernel.org>; Sam Ravnborg <sam@ravnborg.org>; Thomas
> Zimmermann <tzimmermann@suse.de>; linux-kernel@vger.kernel.org;
> Siqueira, Rodrigo <Rodrigo.Siqueira@amd.com>; Kuogee Hsieh
> <khsieh@codeaurora.org>
> Subject: Re: connector_bad_edid() is broken (was: Re: [PATCH] drm/edid:
> Fix crash with zero/invalid EDID)
>
>
>
> On 2021-10-05 11:25, Zuo, Jerry wrote:
> > [AMD Official Use Only]
> >
> >> -----Original Message-----
> >> From: Doug Anderson <dianders@chromium.org>
> >> Sent: October 5, 2021 11:14 AM
> >> To: Zuo, Jerry <Jerry.Zuo@amd.com>
> >> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>; dri-
> >> devel@lists.freedesktop.org; geert@linux-m68k.org;
> >> oliver.sang@intel.com; Daniel Vetter <daniel@ffwll.ch>; David Airlie
> >> <airlied@linux.ie>; Jani Nikula <jani.nikula@intel.com>; Linus
> >> Walleij <linus.walleij@linaro.org>; Maarten Lankhorst
> >> <maarten.lankhorst@linux.intel.com>; Maxime Ripard
> >> <mripard@kernel.org>; Sam Ravnborg <sam@ravnborg.org>; Thomas
> >> Zimmermann <tzimmermann@suse.de>; linux-kernel@vger.kernel.org;
> >> Wentland, Harry <Harry.Wentland@amd.com>; Siqueira, Rodrigo
> >> <Rodrigo.Siqueira@amd.com>; Kuogee Hsieh <khsieh@codeaurora.org>
> >> Subject: Re: connector_bad_edid() is broken (was: Re: [PATCH] drm/edid:
> >> Fix crash with zero/invalid EDID)
> >>
> >> Hi,
> >>
> >> On Tue, Oct 5, 2021 at 6:33 AM Zuo, Jerry <Jerry.Zuo@amd.com> wrote:
> >>>
> >>>> BTW I believe connector_bad_edid() itself is broken since commit
> >>>> e11f5bd8228f ("drm: Add support for DP 1.4 Compliance edid
> >>>> corruption test"). Before we've even allocated the memory for the
> >>>> extension blocks that code now assumes edid[0x7e] is to be 100%
> >>>> trusted and goes and calculates the checksum on a block based on
> >>>> that. So that's likely going to be pointing somewhere beyond the
> >>>> base block into memory we've not even allocated. So anyone who
> >>>> wanted could craft a bogus EDID and maybe get something interesting
> >>>> to
> >> happen.
> >>>>
> >>>> Would be good if someone could fix that while at it. Or just revert
> >>>> the offending commit if there is no simple solution immediately in sight.
> >>>>
> >>>> The fact that we're parsing entirely untrustworthy crap in the
> >>>> kernel always worries me. Either we need super careful review of
> >>>> all relevant code, and/or we need to think about moving the parser
> >>>> out of
> >> the kernel.
> >>>> I was considering playing around with the usermode helper stuff.
> >>>> IIRC there is a way to embed the userspace binary into the kernel
> >>>> and just fire it up when needed. But so far it's been the usual
> >>>> -ENOTIME
> >> for me...
> >>>>
> >>> [AMD Official Use Only]
> >>>
> >>> Hi Ville:
> >>>
> >>>      Yhea, it is pretty old change from two years ago, and it is no
> >>> long valid
> >> anymore. Please simply drop it.
> >>>
> >>> Regards,
> >>> Jerry
> >>
> >> I've cut out other bits from this email and changed the subject line
> >> since I think this is an issue unrelated to the one my original patch was
> fixing.
> >>
> >> I don't actually know a ton about DP compliance testing, but I
> >> attempted to try to be helpful and revert commit e11f5bd8228f ("drm:
> >> Add support for DP 1.4 Compliance edid corruption test"). It wasn't
> >> too hard to deal with the conflicts in the revert itself, but then
> >> things didn't compile because there are two places that use
> >> `real_edid_checksum` and that goes away if I revert the patch.
> >>
> >> I've made an attempt to fix the problem by just adding a bounds check.
> >> Perhaps you can see if that looks good to you:
> >>
> >> https://lore.
>  kernel.org%2Fr%2F20211005081022.1.Ib059f9c23c2611cb5a9d760e7d0a700c1
> >>
> 295928d%40changeid&amp;data=04%7C01%7CJerry.Zuo%40amd.com%7C90
> >>
> b948659454400cedd308d98812c339%7C3dd8961fe4884e608e11a82d994e183d
> >> %7C0%7C0%7C637690436453163864%7CUnknown%7CTWFpbGZsb3d8eyJ
> WIj
> >>
> oiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1
> >>
> 000&amp;sdata=OtSngWlYyDc1NbNSgAeALqN3nF%2Bnw08nJ068cpAKZJk%3
> >> D&amp;reserved=0
> >>
> >> -Doug
> >
> > The patch used for DP1.4 compliance edid corruption test. Let me double
> check if edid corruption test could be passed without the patch.
> >
>
> Can you try the CTS test with Doug's fix?
>
> https://patchwork.freedesktop.org/patch/457537/
>
> Harry

Yes, I'll give a try on that.
diff mbox series

Patch

diff --git a/drivers/gpu/drm/drm_edid.c b/drivers/gpu/drm/drm_edid.c
index 9b19eee0e1b4..9c9463ec5465 100644
--- a/drivers/gpu/drm/drm_edid.c
+++ b/drivers/gpu/drm/drm_edid.c
@@ -1911,13 +1911,15 @@  int drm_add_override_edid_modes(struct drm_connector *connector)
 }
 EXPORT_SYMBOL(drm_add_override_edid_modes);
 
-static struct edid *drm_do_get_edid_base_block(
+static struct edid *drm_do_get_edid_base_block(struct drm_connector *connector,
 	int (*get_edid_block)(void *data, u8 *buf, unsigned int block,
 			      size_t len),
-	void *data, bool *edid_corrupt, int *null_edid_counter)
+	void *data)
 {
-	int i;
+	int *null_edid_counter = connector ? &connector->null_edid_counter : NULL;
+	bool *edid_corrupt = connector ? &connector->edid_corrupt : NULL;
 	void *edid;
+	int i;
 
 	edid = kmalloc(EDID_LENGTH, GFP_KERNEL);
 	if (edid == NULL)
@@ -1941,9 +1943,8 @@  static struct edid *drm_do_get_edid_base_block(
 	return edid;
 
 carp:
-	kfree(edid);
-	return ERR_PTR(-EINVAL);
-
+	if (connector)
+		connector_bad_edid(connector, edid, 1);
 out:
 	kfree(edid);
 	return NULL;
@@ -1982,14 +1983,9 @@  struct edid *drm_do_get_edid(struct drm_connector *connector,
 	if (override)
 		return override;
 
-	edid = (u8 *)drm_do_get_edid_base_block(get_edid_block, data,
-						&connector->edid_corrupt,
-						&connector->null_edid_counter);
-	if (IS_ERR_OR_NULL(edid)) {
-		if (IS_ERR(edid))
-			connector_bad_edid(connector, edid, 1);
+	edid = (u8 *)drm_do_get_edid_base_block(connector, get_edid_block, data);
+	if (!edid)
 		return NULL;
-	}
 
 	/* if there's no extensions or no connector, we're done */
 	valid_extensions = edid[0x7e];
@@ -2142,14 +2138,13 @@  u32 drm_edid_get_panel_id(struct i2c_adapter *adapter)
 	struct edid *edid;
 	u32 panel_id;
 
-	edid = drm_do_get_edid_base_block(drm_do_probe_ddc_edid, adapter,
-					  NULL, NULL);
+	edid = drm_do_get_edid_base_block(NULL, drm_do_probe_ddc_edid, adapter);
 
 	/*
 	 * There are no manufacturer IDs of 0, so if there is a problem reading
 	 * the EDID then we'll just return 0.
 	 */
-	if (IS_ERR_OR_NULL(edid))
+	if (!edid)
 		return 0;
 
 	panel_id = edid_extract_panel_id(edid);