diff mbox series

[1/3] drm/panfrost: Simplify lock_region calculation

Message ID 20210820213117.13050-2-alyssa.rosenzweig@collabora.com (mailing list archive)
State New, archived
Headers show
Series drm/panfrost: Bug fixes for lock_region | expand

Commit Message

Alyssa Rosenzweig Aug. 20, 2021, 9:31 p.m. UTC
In lock_region, simplify the calculation of the region_width parameter.
This field is the size, but encoded as log2(ceil(size)) - 1.
log2(ceil(size)) may be computed directly as fls(size - 1). However, we
want to use the 64-bit versions as the amount to lock can exceed
32-bits.

This avoids undefined behaviour when locking all memory (size ~0),
caught by UBSAN.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reported-and-tested-by: Chris Morgan <macromorgan@hotmail.com>
Cc: <stable@vger.kernel.org>
---
 drivers/gpu/drm/panfrost/panfrost_mmu.c | 19 +++++--------------
 1 file changed, 5 insertions(+), 14 deletions(-)

Comments

Steven Price Aug. 23, 2021, 9:40 a.m. UTC | #1
On 20/08/2021 22:31, Alyssa Rosenzweig wrote:
> In lock_region, simplify the calculation of the region_width parameter.
> This field is the size, but encoded as log2(ceil(size)) - 1.
> log2(ceil(size)) may be computed directly as fls(size - 1). However, we
> want to use the 64-bit versions as the amount to lock can exceed
> 32-bits.
> 
> This avoids undefined behaviour when locking all memory (size ~0),
> caught by UBSAN.

It might have been useful to mention what it is that UBSAN specifically
picked up (it took me a while to spot) - but anyway I think there's a
bigger issue with it being completely wrong when size == ~0 (see below).

> Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
> Reported-and-tested-by: Chris Morgan <macromorgan@hotmail.com>
> Cc: <stable@vger.kernel.org>

However, I've confirmed this returns the same values and is certainly
more simple, so:

Reviewed-by: Steven Price <steven.price@arm.com>

> ---
>  drivers/gpu/drm/panfrost/panfrost_mmu.c | 19 +++++--------------
>  1 file changed, 5 insertions(+), 14 deletions(-)
> 
> diff --git a/drivers/gpu/drm/panfrost/panfrost_mmu.c b/drivers/gpu/drm/panfrost/panfrost_mmu.c
> index 0da5b3100ab1..f6e02d0392f4 100644
> --- a/drivers/gpu/drm/panfrost/panfrost_mmu.c
> +++ b/drivers/gpu/drm/panfrost/panfrost_mmu.c
> @@ -62,21 +62,12 @@ static void lock_region(struct panfrost_device *pfdev, u32 as_nr,
>  {
>  	u8 region_width;
>  	u64 region = iova & PAGE_MASK;
> -	/*
> -	 * fls returns:
> -	 * 1 .. 32
> -	 *
> -	 * 10 + fls(num_pages)
> -	 * results in the range (11 .. 42)
> -	 */
> -
> -	size = round_up(size, PAGE_SIZE);

This seems to be the first issue - ~0 will be 'rounded up' to 0.

>  
> -	region_width = 10 + fls(size >> PAGE_SHIFT);

fls(0) == 0, so region_width == 10.

> -	if ((size >> PAGE_SHIFT) != (1ul << (region_width - 11))) {

Presumably here's where UBSAN objects - we're shifting by a negative
value, which even it it happens to works means the lock region is tiny
and certainly not what was intended! It might well be worth a:

Fixes: f3ba91228e8e ("drm/panfrost: Add initial panfrost driver")

Note for anyone following along at (working-from-) home: although this
code was cargo culted from kbase - kbase is fine because it takes a pfn
and doesn't do the round_up() stage.

Which also exposes the second bug (fixed in patch 2): a size_t isn't big
enough on 32 bit platforms (all Midgard/Bifrost GPUs have a VA size
bigger than 32 bits). Again kbase gets away with a u32 because it's a pfn.

There is potentially a third bug which kbase only recently attempted to
fix. The lock address is effectively rounded down by the hardware (the
bottom bits are ignored). So if you have mask=(1<<region_width)-1 but
(iova & mask) != ((iova + size) & mask) then you are potentially failing
to lock the end of the intended region. kbase has added some code to
handle this:

> 	/* Round up if some memory pages spill into the next region. */
> 	region_frame_number_start = pfn >> (lockaddr_size_log2 - PAGE_SHIFT);
> 	region_frame_number_end =
> 	    (pfn + num_pages - 1) >> (lockaddr_size_log2 - PAGE_SHIFT);
> 
> 	if (region_frame_number_start < region_frame_number_end)
> 		lockaddr_size_log2 += 1;

I guess we should too?

Steve

> -		/* not pow2, so must go up to the next pow2 */
> -		region_width += 1;
> -	}
> +	/* The size is encoded as ceil(log2) minus(1), which may be calculated
> +	 * with fls. The size must be clamped to hardware bounds.
> +	 */
> +	size = max_t(u64, size, PAGE_SIZE);
> +	region_width = fls64(size - 1) - 1;
>  	region |= region_width;
>  
>  	/* Lock the region that needs to be updated */
>
Alyssa Rosenzweig Aug. 23, 2021, 9:09 p.m. UTC | #2
> > In lock_region, simplify the calculation of the region_width parameter.
> > This field is the size, but encoded as log2(ceil(size)) - 1.
> > log2(ceil(size)) may be computed directly as fls(size - 1). However, we
> > want to use the 64-bit versions as the amount to lock can exceed
> > 32-bits.
> > 
> > This avoids undefined behaviour when locking all memory (size ~0),
> > caught by UBSAN.
> 
> It might have been useful to mention what it is that UBSAN specifically
> picked up (it took me a while to spot) - but anyway I think there's a
> bigger issue with it being completely wrong when size == ~0 (see below).

Indeed. I've updated the commit message in v2 to explain what goes
wrong (your analysis was spot on, but a mailing list message is more
ephermal than a commit message). I'll send out v2 tomorrow assuming
nobody objects to v1 in the mean time.

Thanks for the review.

> There is potentially a third bug which kbase only recently attempted to
> fix. The lock address is effectively rounded down by the hardware (the
> bottom bits are ignored). So if you have mask=(1<<region_width)-1 but
> (iova & mask) != ((iova + size) & mask) then you are potentially failing
> to lock the end of the intended region. kbase has added some code to
> handle this:
> 
> > 	/* Round up if some memory pages spill into the next region. */
> > 	region_frame_number_start = pfn >> (lockaddr_size_log2 - PAGE_SHIFT);
> > 	region_frame_number_end =
> > 	    (pfn + num_pages - 1) >> (lockaddr_size_log2 - PAGE_SHIFT);
> > 
> > 	if (region_frame_number_start < region_frame_number_end)
> > 		lockaddr_size_log2 += 1;
> 
> I guess we should too?

Oh, I missed this one. Guess we have 4 bugs with this code instead of
just 3, yikes. How could such a short function be so deeply and horribly
broken? 
Steven Price Aug. 23, 2021, 9:54 p.m. UTC | #3
On 23 August 2021 22:09:08 BST, Alyssa Rosenzweig <alyssa@collabora.com> wrote:
>> > In lock_region, simplify the calculation of the region_width parameter.
>> > This field is the size, but encoded as log2(ceil(size)) - 1.
>> > log2(ceil(size)) may be computed directly as fls(size - 1). However, we
>> > want to use the 64-bit versions as the amount to lock can exceed
>> > 32-bits.
>> > 
>> > This avoids undefined behaviour when locking all memory (size ~0),
>> > caught by UBSAN.
>> 
>> It might have been useful to mention what it is that UBSAN specifically
>> picked up (it took me a while to spot) - but anyway I think there's a
>> bigger issue with it being completely wrong when size == ~0 (see below).
>
>Indeed. I've updated the commit message in v2 to explain what goes
>wrong (your analysis was spot on, but a mailing list message is more
>ephermal than a commit message). I'll send out v2 tomorrow assuming
>nobody objects to v1 in the mean time.
>
>Thanks for the review.
>
>> There is potentially a third bug which kbase only recently attempted to
>> fix. The lock address is effectively rounded down by the hardware (the
>> bottom bits are ignored). So if you have mask=(1<<region_width)-1 but
>> (iova & mask) != ((iova + size) & mask) then you are potentially failing
>> to lock the end of the intended region. kbase has added some code to
>> handle this:
>> 
>> > 	/* Round up if some memory pages spill into the next region. */
>> > 	region_frame_number_start = pfn >> (lockaddr_size_log2 - PAGE_SHIFT);
>> > 	region_frame_number_end =
>> > 	    (pfn + num_pages - 1) >> (lockaddr_size_log2 - PAGE_SHIFT);
>> > 
>> > 	if (region_frame_number_start < region_frame_number_end)
>> > 		lockaddr_size_log2 += 1;
>> 
>> I guess we should too?
>
>Oh, I missed this one. Guess we have 4 bugs with this code instead of
>just 3, yikes. How could such a short function be so deeply and horribly
>broken? ����
>
>Should I add a fourth patch to the series to fix this?

Yes please!

Thanks,
Steve
diff mbox series

Patch

diff --git a/drivers/gpu/drm/panfrost/panfrost_mmu.c b/drivers/gpu/drm/panfrost/panfrost_mmu.c
index 0da5b3100ab1..f6e02d0392f4 100644
--- a/drivers/gpu/drm/panfrost/panfrost_mmu.c
+++ b/drivers/gpu/drm/panfrost/panfrost_mmu.c
@@ -62,21 +62,12 @@  static void lock_region(struct panfrost_device *pfdev, u32 as_nr,
 {
 	u8 region_width;
 	u64 region = iova & PAGE_MASK;
-	/*
-	 * fls returns:
-	 * 1 .. 32
-	 *
-	 * 10 + fls(num_pages)
-	 * results in the range (11 .. 42)
-	 */
-
-	size = round_up(size, PAGE_SIZE);
 
-	region_width = 10 + fls(size >> PAGE_SHIFT);
-	if ((size >> PAGE_SHIFT) != (1ul << (region_width - 11))) {
-		/* not pow2, so must go up to the next pow2 */
-		region_width += 1;
-	}
+	/* The size is encoded as ceil(log2) minus(1), which may be calculated
+	 * with fls. The size must be clamped to hardware bounds.
+	 */
+	size = max_t(u64, size, PAGE_SIZE);
+	region_width = fls64(size - 1) - 1;
 	region |= region_width;
 
 	/* Lock the region that needs to be updated */