
dma-buf: Move BUG_ON from _add_shared_fence to _add_shared_inplace

Message ID 20180626143147.14296-1-michel@daenzer.net (mailing list archive)
State New, archived

Commit Message

Michel Dänzer June 26, 2018, 2:31 p.m. UTC
From: Michel Dänzer <michel.daenzer@amd.com>

Fixes the BUG_ON spuriously triggering under the following
circumstances:

* ttm_eu_reserve_buffers processes a list containing multiple BOs using
  the same reservation object, so it calls
  reservation_object_reserve_shared with that reservation object once
  for each such BO.
* In reservation_object_reserve_shared, old->shared_count ==
  old->shared_max - 1, so obj->staged is freed in preparation for an
  in-place update.
* ttm_eu_fence_buffer_objects calls reservation_object_add_shared_fence
  once for each of the BOs above, always with the same fence.
* The first call adds the fence in the remaining free slot, after which
  old->shared_count == old->shared_max.

In the next call to reservation_object_add_shared_fence, the BUG_ON
triggers. However, nothing bad would happen in
reservation_object_add_shared_inplace, since the fence is already in the
reservation object.

Prevent this by moving the BUG_ON to where an overflow would actually
happen (e.g. if a buggy caller didn't call
reservation_object_reserve_shared before).

Cc: stable@vger.kernel.org
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
---
 drivers/dma-buf/reservation.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

Comments

Chris Wilson June 27, 2018, 11:50 a.m. UTC | #1
Quoting Michel Dänzer (2018-06-26 15:31:47)
> From: Michel Dänzer <michel.daenzer@amd.com>
> 
> Fixes the BUG_ON spuriously triggering under the following
> circumstances:
> 
> * ttm_eu_reserve_buffers processes a list containing multiple BOs using
>   the same reservation object, so it calls
>   reservation_object_reserve_shared with that reservation object once
>   for each such BO.
> * In reservation_object_reserve_shared, old->shared_count ==
>   old->shared_max - 1, so obj->staged is freed in preparation of an
>   in-place update.
> * ttm_eu_fence_buffer_objects calls reservation_object_add_shared_fence
>   once for each of the BOs above, always with the same fence.
> * The first call adds the fence in the remaining free slot, after which
>   old->shared_count == old->shared_max.
> 
> In the next call to reservation_object_add_shared_fence, the BUG_ON
> triggers. However, nothing bad would happen in
> reservation_object_add_shared_inplace, since the fence is already in the
> reservation object.
> 
> Prevent this by moving the BUG_ON to where an overflow would actually
> happen (e.g. if a buggy caller didn't call
> reservation_object_reserve_shared before).
> 
> Cc: stable@vger.kernel.org
> Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>

I've convinced myself (or rather have not found a valid argument
against) that being able to call reserve_shared + add_shared multiple
times for the same fence is an intended part of the reservation_object API.

I'd double check with Christian though.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>

>  drivers/dma-buf/reservation.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/dma-buf/reservation.c b/drivers/dma-buf/reservation.c
> index 314eb1071cce..532545b9488e 100644
> --- a/drivers/dma-buf/reservation.c
> +++ b/drivers/dma-buf/reservation.c
> @@ -141,6 +141,7 @@ reservation_object_add_shared_inplace(struct reservation_object *obj,
>         if (signaled) {
>                 RCU_INIT_POINTER(fobj->shared[signaled_idx], fence);
>         } else {
> +               BUG_ON(fobj->shared_count >= fobj->shared_max);

Personally I would just let kasan detect this and throw away the BUG_ON
or at least move it behind some DMABUF_BUG_ON().
-Chris
Michel Dänzer June 27, 2018, 4:04 p.m. UTC | #2
On 2018-06-27 01:50 PM, Chris Wilson wrote:
> Quoting Michel Dänzer (2018-06-26 15:31:47)
>> From: Michel Dänzer <michel.daenzer@amd.com>
>>
>> Fixes the BUG_ON spuriously triggering under the following
>> circumstances:
>>
>> * ttm_eu_reserve_buffers processes a list containing multiple BOs using
>>   the same reservation object, so it calls
>>   reservation_object_reserve_shared with that reservation object once
>>   for each such BO.
>> * In reservation_object_reserve_shared, old->shared_count ==
>>   old->shared_max - 1, so obj->staged is freed in preparation of an
>>   in-place update.
>> * ttm_eu_fence_buffer_objects calls reservation_object_add_shared_fence
>>   once for each of the BOs above, always with the same fence.
>> * The first call adds the fence in the remaining free slot, after which
>>   old->shared_count == old->shared_max.
>>
>> In the next call to reservation_object_add_shared_fence, the BUG_ON
>> triggers. However, nothing bad would happen in
>> reservation_object_add_shared_inplace, since the fence is already in the
>> reservation object.
>>
>> Prevent this by moving the BUG_ON to where an overflow would actually
>> happen (e.g. if a buggy caller didn't call
>> reservation_object_reserve_shared before).
>>
>> Cc: stable@vger.kernel.org
>> Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
> 
> I've convinced myself (or rather have not found a valid argument
> against) that being able to call reserve_shared + add_shared multiple
> times for the same fence is an intended part of the reservation_object API.
> 
> I'd double check with Christian though.

Right, I'm interested in Christian's feedback.


> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>

Thanks!


>>  drivers/dma-buf/reservation.c | 6 +++---
>>  1 file changed, 3 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/dma-buf/reservation.c b/drivers/dma-buf/reservation.c
>> index 314eb1071cce..532545b9488e 100644
>> --- a/drivers/dma-buf/reservation.c
>> +++ b/drivers/dma-buf/reservation.c
>> @@ -141,6 +141,7 @@ reservation_object_add_shared_inplace(struct reservation_object *obj,
>>         if (signaled) {
>>                 RCU_INIT_POINTER(fobj->shared[signaled_idx], fence);
>>         } else {
>> +               BUG_ON(fobj->shared_count >= fobj->shared_max);
> 
> Personally I would just let kasan detect this and throw away the BUG_ON
> or at least move it behind some DMABUF_BUG_ON().

Hmm. Normally, I'm not a fan of BUG(_ON) either. But in this case, it's
clear that the caller is buggy, and proceeding to write beyond the end
of the array could have far-reaching consequences. I'll leave moving it
behind a DMABUF_BUG_ON() to somebody else.
Christian König July 4, 2018, 8:31 a.m. UTC | #3
Am 26.06.2018 um 16:31 schrieb Michel Dänzer:
> From: Michel Dänzer <michel.daenzer@amd.com>
>
> Fixes the BUG_ON spuriously triggering under the following
> circumstances:
>
> * ttm_eu_reserve_buffers processes a list containing multiple BOs using
>    the same reservation object, so it calls
>    reservation_object_reserve_shared with that reservation object once
>    for each such BO.
> * In reservation_object_reserve_shared, old->shared_count ==
>    old->shared_max - 1, so obj->staged is freed in preparation of an
>    in-place update.
> * ttm_eu_fence_buffer_objects calls reservation_object_add_shared_fence
>    once for each of the BOs above, always with the same fence.
> * The first call adds the fence in the remaining free slot, after which
>    old->shared_count == old->shared_max.

Well, the explanation here is not correct. For multiple BOs using the 
same reservation object we won't call 
reservation_object_add_shared_fence() multiple times because we move 
those to the duplicates list in ttm_eu_reserve_buffers().

But this bug can still happen because we call 
reservation_object_add_shared_fence() manually with fences for the same 
context in a couple of places.

One prominent case which comes to my mind is the VM BOs during
updates. Another possibility is VRAM BOs which need to be cleared.

>
> In the next call to reservation_object_add_shared_fence, the BUG_ON
> triggers. However, nothing bad would happen in
> reservation_object_add_shared_inplace, since the fence is already in the
> reservation object.
>
> Prevent this by moving the BUG_ON to where an overflow would actually
> happen (e.g. if a buggy caller didn't call
> reservation_object_reserve_shared before).
>
> Cc: stable@vger.kernel.org
> Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>

Reviewed-by: Christian König <christian.koenig@amd.com>

Regards,
Christian.

> ---
>   drivers/dma-buf/reservation.c | 6 +++---
>   1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/dma-buf/reservation.c b/drivers/dma-buf/reservation.c
> index 314eb1071cce..532545b9488e 100644
> --- a/drivers/dma-buf/reservation.c
> +++ b/drivers/dma-buf/reservation.c
> @@ -141,6 +141,7 @@ reservation_object_add_shared_inplace(struct reservation_object *obj,
>   	if (signaled) {
>   		RCU_INIT_POINTER(fobj->shared[signaled_idx], fence);
>   	} else {
> +		BUG_ON(fobj->shared_count >= fobj->shared_max);
>   		RCU_INIT_POINTER(fobj->shared[fobj->shared_count], fence);
>   		fobj->shared_count++;
>   	}
> @@ -230,10 +231,9 @@ void reservation_object_add_shared_fence(struct reservation_object *obj,
>   	old = reservation_object_get_list(obj);
>   	obj->staged = NULL;
>   
> -	if (!fobj) {
> -		BUG_ON(old->shared_count >= old->shared_max);
> +	if (!fobj)
>   		reservation_object_add_shared_inplace(obj, old, fence);
> -	} else
> +	else
>   		reservation_object_add_shared_replace(obj, old, fobj, fence);
>   }
>   EXPORT_SYMBOL(reservation_object_add_shared_fence);
Michel Dänzer July 4, 2018, 9:09 a.m. UTC | #4
On 2018-07-04 10:31 AM, Christian König wrote:
> Am 26.06.2018 um 16:31 schrieb Michel Dänzer:
>> From: Michel Dänzer <michel.daenzer@amd.com>
>>
>> Fixes the BUG_ON spuriously triggering under the following
>> circumstances:
>>
>> * ttm_eu_reserve_buffers processes a list containing multiple BOs using
>>    the same reservation object, so it calls
>>    reservation_object_reserve_shared with that reservation object once
>>    for each such BO.
>> * In reservation_object_reserve_shared, old->shared_count ==
>>    old->shared_max - 1, so obj->staged is freed in preparation of an
>>    in-place update.
>> * ttm_eu_fence_buffer_objects calls reservation_object_add_shared_fence
>>    once for each of the BOs above, always with the same fence.
>> * The first call adds the fence in the remaining free slot, after which
>>    old->shared_count == old->shared_max.
> 
> Well, the explanation here is not correct. For multiple BOs using the
> same reservation object we won't call
> reservation_object_add_shared_fence() multiple times because we move
> those to the duplicates list in ttm_eu_reserve_buffers().
> 
> But this bug can still happen because we call
> reservation_object_add_shared_fence() manually with fences for the same
> context in a couple of places.
> 
> One prominent case which comes to my mind are for the VM BOs during
> updates. Another possibility are VRAM BOs which need to be cleared.

Thanks. How about the following:

* ttm_eu_reserve_buffers calls reservation_object_reserve_shared.
* In reservation_object_reserve_shared, shared_count == shared_max - 1,
  so obj->staged is freed in preparation for an in-place update.
* ttm_eu_fence_buffer_objects calls reservation_object_add_shared_fence,
  after which shared_count == shared_max.
* The amdgpu driver also calls reservation_object_add_shared_fence for
  the same reservation object, and the BUG_ON triggers.

However, nothing bad would happen in
reservation_object_add_shared_inplace, since all fences use the same
context and can therefore only occupy a single slot.

Prevent this by moving the BUG_ON to where an overflow would actually
happen (e.g. if a buggy caller didn't call
reservation_object_reserve_shared before).


Also, I'll add a reference to https://bugs.freedesktop.org/106418 in v2,
as I suspect this fix is necessary under the circumstances described
there as well.
Christian König July 4, 2018, 9:30 a.m. UTC | #5
Am 04.07.2018 um 11:09 schrieb Michel Dänzer:
> On 2018-07-04 10:31 AM, Christian König wrote:
>> Am 26.06.2018 um 16:31 schrieb Michel Dänzer:
>>> From: Michel Dänzer <michel.daenzer@amd.com>
>>>
>>> Fixes the BUG_ON spuriously triggering under the following
>>> circumstances:
>>>
>>> * ttm_eu_reserve_buffers processes a list containing multiple BOs using
>>>     the same reservation object, so it calls
>>>     reservation_object_reserve_shared with that reservation object once
>>>     for each such BO.
>>> * In reservation_object_reserve_shared, old->shared_count ==
>>>     old->shared_max - 1, so obj->staged is freed in preparation of an
>>>     in-place update.
>>> * ttm_eu_fence_buffer_objects calls reservation_object_add_shared_fence
>>>     once for each of the BOs above, always with the same fence.
>>> * The first call adds the fence in the remaining free slot, after which
>>>     old->shared_count == old->shared_max.
>> Well, the explanation here is not correct. For multiple BOs using the
>> same reservation object we won't call
>> reservation_object_add_shared_fence() multiple times because we move
>> those to the duplicates list in ttm_eu_reserve_buffers().
>>
>> But this bug can still happen because we call
>> reservation_object_add_shared_fence() manually with fences for the same
>> context in a couple of places.
>>
>> One prominent case which comes to my mind are for the VM BOs during
>> updates. Another possibility are VRAM BOs which need to be cleared.
> Thanks. How about the following:
>
> * ttm_eu_reserve_buffers calls reservation_object_reserve_shared.
> * In reservation_object_reserve_shared, shared_count == shared_max - 1,
>    so obj->staged is freed in preparation of an in-place update.
> * ttm_eu_fence_buffer_objects calls reservation_object_add_shared_fence,
>    after which shared_count == shared_max.
> * The amdgpu driver also calls reservation_object_add_shared_fence for
>    the same reservation object, and the BUG_ON triggers.

I would rather completely drop the reference to the ttm_eu_* functions,
because those wrappers are unrelated to the problem.

Instead let's just note something like the following:

* reservation_object_reserve_shared is called with shared_count ==
  shared_max - 1, so obj->staged is freed in preparation for an
  in-place update.

* Now reservation_object_add_shared_fence is called with the first
  fence, after which shared_count == shared_max.

* After that, reservation_object_add_shared_fence can be called with
  follow-up fences from the same context, but since shared_count ==
  shared_max we would run into this BUG_ON.

> However, nothing bad would happen in
> reservation_object_add_shared_inplace, since all fences use the same
> context, so they can only occupy a single slot.
>
> Prevent this by moving the BUG_ON to where an overflow would actually
> happen (e.g. if a buggy caller didn't call
> reservation_object_reserve_shared before).
>
>
> Also, I'll add a reference to https://bugs.freedesktop.org/106418 in v2,
> as I suspect this fix is necessary under the circumstances described
> there as well.

The rest sounds good to me.

Regards,
Christian.

Patch

diff --git a/drivers/dma-buf/reservation.c b/drivers/dma-buf/reservation.c
index 314eb1071cce..532545b9488e 100644
--- a/drivers/dma-buf/reservation.c
+++ b/drivers/dma-buf/reservation.c
@@ -141,6 +141,7 @@  reservation_object_add_shared_inplace(struct reservation_object *obj,
 	if (signaled) {
 		RCU_INIT_POINTER(fobj->shared[signaled_idx], fence);
 	} else {
+		BUG_ON(fobj->shared_count >= fobj->shared_max);
 		RCU_INIT_POINTER(fobj->shared[fobj->shared_count], fence);
 		fobj->shared_count++;
 	}
@@ -230,10 +231,9 @@  void reservation_object_add_shared_fence(struct reservation_object *obj,
 	old = reservation_object_get_list(obj);
 	obj->staged = NULL;
 
-	if (!fobj) {
-		BUG_ON(old->shared_count >= old->shared_max);
+	if (!fobj)
 		reservation_object_add_shared_inplace(obj, old, fence);
-	} else
+	else
 		reservation_object_add_shared_replace(obj, old, fobj, fence);
 }
 EXPORT_SYMBOL(reservation_object_add_shared_fence);