diff mbox series

[1/3] drm/msm: Fix speed-bin support not to access outside valid memory

Message ID 20210226162521.1.Ib5ae69a80704c3a2992100b9b5bac1a6cc470249@changeid (mailing list archive)
State New, archived
Headers show
Series nvmem: Bring a tiny bit of sanity to reading 16/32/64 bits from nvmem | expand

Commit Message

Douglas Anderson Feb. 27, 2021, 12:26 a.m. UTC
When running the latest kernel on an sc7180 with KASAN I got this
splat:
  BUG: KASAN: slab-out-of-bounds in a6xx_gpu_init+0x618/0x644
  Read of size 4 at addr ffffff8088f36100 by task kworker/7:1/58
  CPU: 7 PID: 58 Comm: kworker/7:1 Not tainted 5.11.0+ #3
  Hardware name: Google Lazor (rev1 - 2) with LTE (DT)
  Workqueue: events deferred_probe_work_func
  Call trace:
   dump_backtrace+0x0/0x3a8
   show_stack+0x24/0x30
   dump_stack+0x174/0x1e0
   print_address_description+0x70/0x2e4
   kasan_report+0x178/0x1bc
   __asan_report_load4_noabort+0x44/0x50
   a6xx_gpu_init+0x618/0x644
   adreno_bind+0x26c/0x438

This is because the speed bin is defined like this:
  gpu_speed_bin: gpu_speed_bin@1d2 {
    reg = <0x1d2 0x2>;
    bits = <5 8>;
  };

As you can see the "length" is 2 bytes. That means that the nvmem
subsystem allocates only 2 bytes. The GPU code, however, was casting
the pointer allocated by nvmem to a (u32 *) and dereferencing. That's
not so good.

Let's fix this to just use the nvmem_cell_read_u16() accessor function
which simplifies things and also gets rid of the splat.

Let's also put an explicit conversion from little endian in place just
to make things clear. The nvmem subsystem today is assuming little
endian and this makes it clear. Specifically, the way the above sc7180
cell is interpreted:

NVMEM:
 +--------+--------+--------+--------+--------+
 | ...... | 0x1d3  | 0x1d2  | ...... | 0x000  |
 +--------+--------+--------+--------+--------+
              ^       ^
             msb     lsb

You can see that the least significant data is at the lower address
which is little endian.

NOTE: someone who is truly paying attention might wonder about me
picking the "u16" version of this accessor instead of the "u8" (since
the value is 8 bits big) or the u32 version (just for fun). At the
moment you need to pick the accessor that exactly matches the length
the cell was specified as in the device tree. Hopefully future
patches to the nvmem subsystem will fix this.

Fixes: fe7952c629da ("drm/msm: Add speed-bin support to a618 gpu")
Signed-off-by: Douglas Anderson <dianders@chromium.org>
---

 drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 31 +++++++--------------------
 1 file changed, 8 insertions(+), 23 deletions(-)

Comments

Srinivas Kandagatla March 5, 2021, 10:28 a.m. UTC | #1
On 27/02/2021 00:26, Douglas Anderson wrote:
> When running the latest kernel on an sc7180 with KASAN I got this
> splat:
>    BUG: KASAN: slab-out-of-bounds in a6xx_gpu_init+0x618/0x644
>    Read of size 4 at addr ffffff8088f36100 by task kworker/7:1/58
>    CPU: 7 PID: 58 Comm: kworker/7:1 Not tainted 5.11.0+ #3
>    Hardware name: Google Lazor (rev1 - 2) with LTE (DT)
>    Workqueue: events deferred_probe_work_func
>    Call trace:
>     dump_backtrace+0x0/0x3a8
>     show_stack+0x24/0x30
>     dump_stack+0x174/0x1e0
>     print_address_description+0x70/0x2e4
>     kasan_report+0x178/0x1bc
>     __asan_report_load4_noabort+0x44/0x50
>     a6xx_gpu_init+0x618/0x644
>     adreno_bind+0x26c/0x438
> 
> This is because the speed bin is defined like this:
>    gpu_speed_bin: gpu_speed_bin@1d2 {
>      reg = <0x1d2 0x2>;
>      bits = <5 8>;
>    };
> 
> As you can see the "length" is 2 bytes. That means that the nvmem
> subsystem allocates only 2 bytes. The GPU code, however, was casting
> the pointer allocated by nvmem to a (u32 *) and dereferencing. That's
> not so good.
> 
> Let's fix this to just use the nvmem_cell_read_u16() accessor function
> which simplifies things and also gets rid of the splat.
> 
> Let's also put an explicit conversion from little endian in place just
> to make things clear. The nvmem subsystem today is assuming little
> endian and this makes it clear. Specifically, the way the above sc7180
> cell is interpreted:
> 
> NVMEM:
>   +--------+--------+--------+--------+--------+
>   | ...... | 0x1d3  | 0x1d2  | ...... | 0x000  |
>   +--------+--------+--------+--------+--------+
>                ^       ^
>               msb     lsb
> 
> You can see that the least significant data is at the lower address
> which is little endian.
> 
> NOTE: someone who is truly paying attention might wonder about me
> picking the "u16" version of this accessor instead of the "u8" (since
> the value is 8 bits big) or the u32 version (just for fun). At the
> moment you need to pick the accessor that exactly matches the length
> the cell was specified as in the device tree. Hopefully future
> patches to the nvmem subsystem will fix this.
> 
> Fixes: fe7952c629da ("drm/msm: Add speed-bin support to a618 gpu")
> Signed-off-by: Douglas Anderson <dianders@chromium.org>
> ---
> 
>   drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 31 +++++++--------------------
>   1 file changed, 8 insertions(+), 23 deletions(-)
> 
> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> index ba8e9d3cf0fe..0e2024defd79 100644
> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> @@ -1350,35 +1350,20 @@ static int a6xx_set_supported_hw(struct device *dev, struct a6xx_gpu *a6xx_gpu,
>   		u32 revn)
>   {
>   	struct opp_table *opp_table;
> -	struct nvmem_cell *cell;
>   	u32 supp_hw = UINT_MAX;
> -	void *buf;
> -
> -	cell = nvmem_cell_get(dev, "speed_bin");
> -	/*
> -	 * -ENOENT means that the platform doesn't support speedbin which is
> -	 * fine
> -	 */
> -	if (PTR_ERR(cell) == -ENOENT)
> -		return 0;
> -	else if (IS_ERR(cell)) {
> -		DRM_DEV_ERROR(dev,
> -				"failed to read speed-bin. Some OPPs may not be supported by hardware");
> -		goto done;
> -	}
> +	u16 speedbin;
> +	int ret;
>   
> -	buf = nvmem_cell_read(cell, NULL);

I think the issue here is not passing len pointer which should return 
how many bytes the cell is!

Then from there we can decide to do le16_to_cpu or le32_to_cpu or not!
This will also future proof the code to handle speed_bins of different 
sizes!

--srini

> -	if (IS_ERR(buf)) {
> -		nvmem_cell_put(cell);

> +	ret = nvmem_cell_read_u16(dev, "speed_bin", &speedbin);
> +	if (ret) {
>   		DRM_DEV_ERROR(dev,
> -				"failed to read speed-bin. Some OPPs may not be supported by hardware");
> +			      "failed to read speed-bin (%d). Some OPPs may not be supported by hardware",
> +			      ret);
>   		goto done;
>   	}
> +	speedbin = le16_to_cpu(speedbin);
>   
> -	supp_hw = fuse_to_supp_hw(dev, revn, *((u32 *) buf));
> -
> -	kfree(buf);
> -	nvmem_cell_put(cell);
> +	supp_hw = fuse_to_supp_hw(dev, revn, speedbin);
>   
>   done:
>   	opp_table = dev_pm_opp_set_supported_hw(dev, &supp_hw, 1);
>
Douglas Anderson March 5, 2021, 2:45 p.m. UTC | #2
Hi,

On Fri, Mar 5, 2021 at 2:28 AM Srinivas Kandagatla
<srinivas.kandagatla@linaro.org> wrote:
>
>
>
> On 27/02/2021 00:26, Douglas Anderson wrote:
> > When running the latest kernel on an sc7180 with KASAN I got this
> > splat:
> >    BUG: KASAN: slab-out-of-bounds in a6xx_gpu_init+0x618/0x644
> >    Read of size 4 at addr ffffff8088f36100 by task kworker/7:1/58
> >    CPU: 7 PID: 58 Comm: kworker/7:1 Not tainted 5.11.0+ #3
> >    Hardware name: Google Lazor (rev1 - 2) with LTE (DT)
> >    Workqueue: events deferred_probe_work_func
> >    Call trace:
> >     dump_backtrace+0x0/0x3a8
> >     show_stack+0x24/0x30
> >     dump_stack+0x174/0x1e0
> >     print_address_description+0x70/0x2e4
> >     kasan_report+0x178/0x1bc
> >     __asan_report_load4_noabort+0x44/0x50
> >     a6xx_gpu_init+0x618/0x644
> >     adreno_bind+0x26c/0x438
> >
> > This is because the speed bin is defined like this:
> >    gpu_speed_bin: gpu_speed_bin@1d2 {
> >      reg = <0x1d2 0x2>;
> >      bits = <5 8>;
> >    };
> >
> > As you can see the "length" is 2 bytes. That means that the nvmem
> > subsystem allocates only 2 bytes. The GPU code, however, was casting
> > the pointer allocated by nvmem to a (u32 *) and dereferencing. That's
> > not so good.
> >
> > Let's fix this to just use the nvmem_cell_read_u16() accessor function
> > which simplifies things and also gets rid of the splat.
> >
> > Let's also put an explicit conversion from little endian in place just
> > to make things clear. The nvmem subsystem today is assuming little
> > endian and this makes it clear. Specifically, the way the above sc7180
> > cell is interpreted:
> >
> > NVMEM:
> >   +--------+--------+--------+--------+--------+
> >   | ...... | 0x1d3  | 0x1d2  | ...... | 0x000  |
> >   +--------+--------+--------+--------+--------+
> >                ^       ^
> >               msb     lsb
> >
> > You can see that the least significant data is at the lower address
> > which is little endian.
> >
> > NOTE: someone who is truly paying attention might wonder about me
> > picking the "u16" version of this accessor instead of the "u8" (since
> > the value is 8 bits big) or the u32 version (just for fun). At the
> > moment you need to pick the accessor that exactly matches the length
> > the cell was specified as in the device tree. Hopefully future
> > patches to the nvmem subsystem will fix this.
> >
> > Fixes: fe7952c629da ("drm/msm: Add speed-bin support to a618 gpu")
> > Signed-off-by: Douglas Anderson <dianders@chromium.org>
> > ---
> >
> >   drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 31 +++++++--------------------
> >   1 file changed, 8 insertions(+), 23 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> > index ba8e9d3cf0fe..0e2024defd79 100644
> > --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> > +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> > @@ -1350,35 +1350,20 @@ static int a6xx_set_supported_hw(struct device *dev, struct a6xx_gpu *a6xx_gpu,
> >               u32 revn)
> >   {
> >       struct opp_table *opp_table;
> > -     struct nvmem_cell *cell;
> >       u32 supp_hw = UINT_MAX;
> > -     void *buf;
> > -
> > -     cell = nvmem_cell_get(dev, "speed_bin");
> > -     /*
> > -      * -ENOENT means that the platform doesn't support speedbin which is
> > -      * fine
> > -      */
> > -     if (PTR_ERR(cell) == -ENOENT)
> > -             return 0;
> > -     else if (IS_ERR(cell)) {
> > -             DRM_DEV_ERROR(dev,
> > -                             "failed to read speed-bin. Some OPPs may not be supported by hardware");
> > -             goto done;
> > -     }
> > +     u16 speedbin;
> > +     int ret;
> >
> > -     buf = nvmem_cell_read(cell, NULL);
>
> I think the issue here is not passing len pointer which should return
> how many bytes the cell is!
>
> Then from there we can decide to do le16_to_cpu or le32_to_cpu or not!
> This will also future proof the code to handle speed_bins of different
> sizes!

I think what you're saying is that you want to copy/paste this code
(or something similar) everywhere that accesses an nvmem cell.  Is
that correct?  ...or maybe you can suggest some smaller / shorter code
that I'm missing?

---

{
  struct nvmem_cell *cell;
  ssize_t len;
  char *ret;
  int i;

  *data = 0;

  cell = nvmem_cell_get(dev, cname);
  if (IS_ERR(cell)) {
    if (PTR_ERR(cell) != -EPROBE_DEFER)
      dev_err(dev, "undefined cell %s\n", cname);
    return PTR_ERR(cell);
  }

  ret = nvmem_cell_read(cell, &len);
  nvmem_cell_put(cell);
  if (IS_ERR(ret)) {
    dev_err(dev, "can't read cell %s\n", cname);
    return PTR_ERR(ret);
  }

  for (i = 0; i < len; i++)
    *data |= ret[i] << (8 * i);

  kfree(ret);
  dev_dbg(dev, "efuse read(%s) = %x, bytes %zd\n", cname, *data, len);

  return 0;
}

---

The above code is from cpr_read_efuse() in "cpr.c".  I mentioned in
the cover letter that I thought about doing this and decided it wasn't
a great idea.  There should be _some_ function in the nvmem core that
says: there's an integer that's 32-bits or less stored in nvmem.
Please read it for me.  If you don't think we can use one of the
existing functions for that, would you be opposed to me creating a new
one?

---

In any case, while we discuss what we should do long term, I still
hope that Rob can merge this patch since it fixes the bug.

-Doug
Srinivas Kandagatla March 5, 2021, 4:07 p.m. UTC | #3
On 05/03/2021 14:45, Doug Anderson wrote:
> Hi,
> 
> On Fri, Mar 5, 2021 at 2:28 AM Srinivas Kandagatla
> <srinivas.kandagatla@linaro.org> wrote:
>>
>>
>>
>> On 27/02/2021 00:26, Douglas Anderson wrote:
>>> When running the latest kernel on an sc7180 with KASAN I got this
>>> splat:
>>>     BUG: KASAN: slab-out-of-bounds in a6xx_gpu_init+0x618/0x644
>>>     Read of size 4 at addr ffffff8088f36100 by task kworker/7:1/58
>>>     CPU: 7 PID: 58 Comm: kworker/7:1 Not tainted 5.11.0+ #3
>>>     Hardware name: Google Lazor (rev1 - 2) with LTE (DT)
>>>     Workqueue: events deferred_probe_work_func
>>>     Call trace:
>>>      dump_backtrace+0x0/0x3a8
>>>      show_stack+0x24/0x30
>>>      dump_stack+0x174/0x1e0
>>>      print_address_description+0x70/0x2e4
>>>      kasan_report+0x178/0x1bc
>>>      __asan_report_load4_noabort+0x44/0x50
>>>      a6xx_gpu_init+0x618/0x644
>>>      adreno_bind+0x26c/0x438
>>>
>>> This is because the speed bin is defined like this:
>>>     gpu_speed_bin: gpu_speed_bin@1d2 {
>>>       reg = <0x1d2 0x2>;
>>>       bits = <5 8>;
>>>     };
>>>
>>> As you can see the "length" is 2 bytes. That means that the nvmem
>>> subsystem allocates only 2 bytes. The GPU code, however, was casting
>>> the pointer allocated by nvmem to a (u32 *) and dereferencing. That's
>>> not so good.
>>>
>>> Let's fix this to just use the nvmem_cell_read_u16() accessor function
>>> which simplifies things and also gets rid of the splat.
>>>
>>> Let's also put an explicit conversion from little endian in place just
>>> to make things clear. The nvmem subsystem today is assuming little
>>> endian and this makes it clear. Specifically, the way the above sc7180
>>> cell is interpreted:
>>>
>>> NVMEM:
>>>    +--------+--------+--------+--------+--------+
>>>    | ...... | 0x1d3  | 0x1d2  | ...... | 0x000  |
>>>    +--------+--------+--------+--------+--------+
>>>                 ^       ^
>>>                msb     lsb
>>>
>>> You can see that the least significant data is at the lower address
>>> which is little endian.
>>>
>>> NOTE: someone who is truly paying attention might wonder about me
>>> picking the "u16" version of this accessor instead of the "u8" (since
>>> the value is 8 bits big) or the u32 version (just for fun). At the
>>> moment you need to pick the accessor that exactly matches the length
>>> the cell was specified as in the device tree. Hopefully future
>>> patches to the nvmem subsystem will fix this.
>>>
>>> Fixes: fe7952c629da ("drm/msm: Add speed-bin support to a618 gpu")
>>> Signed-off-by: Douglas Anderson <dianders@chromium.org>
>>> ---
>>>
>>>    drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 31 +++++++--------------------
>>>    1 file changed, 8 insertions(+), 23 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
>>> index ba8e9d3cf0fe..0e2024defd79 100644
>>> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
>>> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
>>> @@ -1350,35 +1350,20 @@ static int a6xx_set_supported_hw(struct device *dev, struct a6xx_gpu *a6xx_gpu,
>>>                u32 revn)
>>>    {
>>>        struct opp_table *opp_table;
>>> -     struct nvmem_cell *cell;
>>>        u32 supp_hw = UINT_MAX;
>>> -     void *buf;
>>> -
>>> -     cell = nvmem_cell_get(dev, "speed_bin");
>>> -     /*
>>> -      * -ENOENT means that the platform doesn't support speedbin which is
>>> -      * fine
>>> -      */
>>> -     if (PTR_ERR(cell) == -ENOENT)
>>> -             return 0;
>>> -     else if (IS_ERR(cell)) {
>>> -             DRM_DEV_ERROR(dev,
>>> -                             "failed to read speed-bin. Some OPPs may not be supported by hardware");
>>> -             goto done;
>>> -     }
>>> +     u16 speedbin;
>>> +     int ret;
>>>
>>> -     buf = nvmem_cell_read(cell, NULL);
>>
>> I think the issue here is not passing len pointer which should return
>> how many bytes the cell is!
>>
>> Then from there we can decide to do le16_to_cpu or le32_to_cpu or not!
>> This will also future proof the code to handle speed_bins of different
>> sizes!
> 
> I think what you're saying is that you want to copy/paste this code
> (or something similar) everywhere that accesses an nvmem cell.  Is
> that correct?  ...or maybe you can suggest some smaller / shorter code
> that I'm missing?
> 

It depends what the consumer is doing! If it is already aware of what 
size of data its expecting then you can use nvmem_cell_read_u8/16/32/64 
variants, however it wants to do bit more with the data then 
nvmem_cell_read() should give more flexibility!

> ---
> 
> {
>    struct nvmem_cell *cell;
>    ssize_t len;
>    char *ret;
>    int i;
> 
>    *data = 0;
> 
>    cell = nvmem_cell_get(dev, cname);
>    if (IS_ERR(cell)) {
>      if (PTR_ERR(cell) != -EPROBE_DEFER)
>        dev_err(dev, "undefined cell %s\n", cname);
>      return PTR_ERR(cell);
>    }
> 
>    ret = nvmem_cell_read(cell, &len);
>    nvmem_cell_put(cell);
>    if (IS_ERR(ret)) {
>      dev_err(dev, "can't read cell %s\n", cname);
>      return PTR_ERR(ret);
>    }
> 
>    for (i = 0; i < len; i++)
>      *data |= ret[i] << (8 * i);
> 
>    kfree(ret);
>    dev_dbg(dev, "efuse read(%s) = %x, bytes %zd\n", cname, *data, len);
> 
>    return 0;
> }
> 
> ---
> 
> The above code is from cpr_read_efuse() in "cpr.c".  I mentioned in
> the cover letter that I thought about doing this and decided it wasn't
> a great idea.  There should be _some_ function in the nvmem core that
> says: there's an integer that's 32-bits or less stored in nvmem.

There is no such helper function other than using the above snippet to do.

> Please read it for me.  If you don't think we can use one of the
> existing functions for that, would you be opposed to me creating a new
> one?

I have no issue in adding a new helper function in nvmem to allow such 
thing!

> 
> ---
> 
> In any case, while we discuss what we should do long term, I still
> hope that Rob can merge this patch since it fixes the bug.

Yes, I agree this definitely fixes the mentioned bug!
> 
> -Doug
>
diff mbox series

Patch

diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
index ba8e9d3cf0fe..0e2024defd79 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
@@ -1350,35 +1350,20 @@  static int a6xx_set_supported_hw(struct device *dev, struct a6xx_gpu *a6xx_gpu,
 		u32 revn)
 {
 	struct opp_table *opp_table;
-	struct nvmem_cell *cell;
 	u32 supp_hw = UINT_MAX;
-	void *buf;
-
-	cell = nvmem_cell_get(dev, "speed_bin");
-	/*
-	 * -ENOENT means that the platform doesn't support speedbin which is
-	 * fine
-	 */
-	if (PTR_ERR(cell) == -ENOENT)
-		return 0;
-	else if (IS_ERR(cell)) {
-		DRM_DEV_ERROR(dev,
-				"failed to read speed-bin. Some OPPs may not be supported by hardware");
-		goto done;
-	}
+	u16 speedbin;
+	int ret;
 
-	buf = nvmem_cell_read(cell, NULL);
-	if (IS_ERR(buf)) {
-		nvmem_cell_put(cell);
+	ret = nvmem_cell_read_u16(dev, "speed_bin", &speedbin);
+	if (ret) {
 		DRM_DEV_ERROR(dev,
-				"failed to read speed-bin. Some OPPs may not be supported by hardware");
+			      "failed to read speed-bin (%d). Some OPPs may not be supported by hardware",
+			      ret);
 		goto done;
 	}
+	speedbin = le16_to_cpu(speedbin);
 
-	supp_hw = fuse_to_supp_hw(dev, revn, *((u32 *) buf));
-
-	kfree(buf);
-	nvmem_cell_put(cell);
+	supp_hw = fuse_to_supp_hw(dev, revn, speedbin);
 
 done:
 	opp_table = dev_pm_opp_set_supported_hw(dev, &supp_hw, 1);