diff mbox

drm/amdgpu: Attach exclusive fence to prime exported bo's. (v3)

Message ID 1478486854-30715-1-git-send-email-mario.kleiner.de@gmail.com (mailing list archive)
State New, archived
Headers show

Commit Message

Mario Kleiner Nov. 7, 2016, 2:47 a.m. UTC
External clients which import our bo's wait only
for exclusive dmabuf-fences, not on shared ones,
so attach fences on exported buffers as exclusive
ones, if the buffers get imported by an external
client.

See discussion in thread:
https://lists.freedesktop.org/archives/dri-devel/2016-October/122370.html

Tested on Intel iGPU + AMD Tonga dGPU as DRI3/Present
Prime render offload, and with the Tonga standalone as
primary gpu.

v2: Add a wait for all shared fences before prime export,
    as suggested by Christian Koenig.

v3: - Mark buffer prime_exported in amdgpu_gem_prime_pin,
    so we only use the exclusive fence when exporting a
    bo to external clients like a separate iGPU, but not
    when exporting/importing from/to ourselves as part of
    regular DRI3 fd passing.

    - Propagate failure of reservation_object_wait_rcu back
    to caller.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95472
(v1) Tested-by: Mike Lothian <mike@fireburn.co.uk>
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Michel Dänzer <michel.daenzer@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h         |  1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c   | 14 ++++++++++++++
 3 files changed, 16 insertions(+), 1 deletion(-)

Comments

Michel Dänzer Nov. 7, 2016, 3:29 a.m. UTC | #1
On 07/11/16 11:47 AM, Mario Kleiner wrote:
> External clients which import our bo's wait only
> for exclusive dmabuf-fences, not on shared ones,
> so attach fences on exported buffers as exclusive
> ones, if the buffers get imported by an external
> client.
> 
> See discussion in thread:
> https://lists.freedesktop.org/archives/dri-devel/2016-October/122370.html
> 
> Tested on Intel iGPU + AMD Tonga dGPU as DRI3/Present
> Prime render offload, and with the Tonga standalone as
> primary gpu.
> 
> v2: Add a wait for all shared fences before prime export,
>     as suggested by Christian Koenig.
> 
> v3: - Mark buffer prime_exported in amdgpu_gem_prime_pin,
>     so we only use the exclusive fence when exporting a
>     bo to external clients like a separate iGPU, but not
>     when exporting/importing from/to ourselves as part of
>     regular DRI3 fd passing.
> 
>     - Propagate failure of reservation_object_wait_rcu back
>     to caller.
> 
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95472
> (v1) Tested-by: Mike Lothian <mike@fireburn.co.uk>

FWIW, I think the (v1) is usually added at the end of the line, not at
the beginning.

[...]

> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
> index 7700dc2..c4494d0 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
> @@ -81,6 +81,20 @@ int amdgpu_gem_prime_pin(struct drm_gem_object *obj)
>  {
>  	struct amdgpu_bo *bo = gem_to_amdgpu_bo(obj);
>  	int ret = 0;
> +	long lret;
> +
> +	/*
> +	 * Wait for all shared fences to complete before we switch to future
> +	 * use of exclusive fence on this prime_exported bo.
> +	 */
> +	lret = reservation_object_wait_timeout_rcu(bo->tbo.resv, true, false,
> +						   MAX_SCHEDULE_TIMEOUT);
> +	if (unlikely(lret < 0)) {
> +		DRM_DEBUG_PRIME("Fence wait failed: %li\n", lret);
> +		return lret;
> +	}
> +
> +	bo->prime_exported = true;

We should probably clear bo->prime_exported in amdgpu_gem_prime_unpin.

Also, I think we should set bo->prime_exported (prime_shared?) in
amdgpu_gem_prime_import_sg_table as well, so that we'll also set
exclusive fences on BOs imported from other GPUs.
Christian König Nov. 7, 2016, 7:55 a.m. UTC | #2
Am 07.11.2016 um 04:29 schrieb Michel Dänzer:
> On 07/11/16 11:47 AM, Mario Kleiner wrote:
>> External clients which import our bo's wait only
>> for exclusive dmabuf-fences, not on shared ones,
>> so attach fences on exported buffers as exclusive
>> ones, if the buffers get imported by an external
>> client.
>>
>> See discussion in thread:
>> https://lists.freedesktop.org/archives/dri-devel/2016-October/122370.html
>>
>> Tested on Intel iGPU + AMD Tonga dGPU as DRI3/Present
>> Prime render offload, and with the Tonga standalone as
>> primary gpu.
>>
>> v2: Add a wait for all shared fences before prime export,
>>      as suggested by Christian Koenig.
>>
>> v3: - Mark buffer prime_exported in amdgpu_gem_prime_pin,
>>      so we only use the exclusive fence when exporting a
>>      bo to external clients like a separate iGPU, but not
>>      when exporting/importing from/to ourselves as part of
>>      regular DRI3 fd passing.
>>
>>      - Propagate failure of reservation_object_wait_rcu back
>>      to caller.
>>
>> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95472
>> (v1) Tested-by: Mike Lothian <mike@fireburn.co.uk>
> FWIW, I think the (v1) is usually added at the end of the line, not at
> the beginning.
>
> [...]
>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
>> index 7700dc2..c4494d0 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
>> @@ -81,6 +81,20 @@ int amdgpu_gem_prime_pin(struct drm_gem_object *obj)
>>   {
>>   	struct amdgpu_bo *bo = gem_to_amdgpu_bo(obj);
>>   	int ret = 0;
>> +	long lret;
>> +
>> +	/*
>> +	 * Wait for all shared fences to complete before we switch to future
>> +	 * use of exclusive fence on this prime_exported bo.
>> +	 */
>> +	lret = reservation_object_wait_timeout_rcu(bo->tbo.resv, true, false,
>> +						   MAX_SCHEDULE_TIMEOUT);
>> +	if (unlikely(lret < 0)) {
>> +		DRM_DEBUG_PRIME("Fence wait failed: %li\n", lret);
>> +		return lret;
>> +	}
>> +
>> +	bo->prime_exported = true;
> We should probably clear bo->prime_exported in amdgpu_gem_prime_unpin.
>
> Also, I think we should set bo->prime_exported (prime_shared?) in
> amdgpu_gem_prime_import_sg_table as well, so that we'll also set
> exclusive fences on BOs imported from other GPUs.
Yes, really good idea.

Additional to that are we sure that amdgpu_gem_prime_pin() is only 
called once for each GEM object? Could in theory be that somebody 
exports a BO to another GFX device as well as a V4L device for example.

In this case we would need a counter, but I need to double check that as 
well.

In general I would protected the flag/counter by the BO being reserved 
to avoid any races. In other word move those lines a bit down after the 
amdgpu_bo_reserve().

Regards,
Christian.
Mario Kleiner Nov. 7, 2016, 7:20 p.m. UTC | #3
On 11/07/2016 08:55 AM, Christian König wrote:
> Am 07.11.2016 um 04:29 schrieb Michel Dänzer:
>> On 07/11/16 11:47 AM, Mario Kleiner wrote:
>>> External clients which import our bo's wait only
>>> for exclusive dmabuf-fences, not on shared ones,
>>> so attach fences on exported buffers as exclusive
>>> ones, if the buffers get imported by an external
>>> client.
>>>
>>> See discussion in thread:
>>> https://lists.freedesktop.org/archives/dri-devel/2016-October/122370.html
>>>
>>>
>>> Tested on Intel iGPU + AMD Tonga dGPU as DRI3/Present
>>> Prime render offload, and with the Tonga standalone as
>>> primary gpu.
>>>
>>> v2: Add a wait for all shared fences before prime export,
>>>      as suggested by Christian Koenig.
>>>
>>> v3: - Mark buffer prime_exported in amdgpu_gem_prime_pin,
>>>      so we only use the exclusive fence when exporting a
>>>      bo to external clients like a separate iGPU, but not
>>>      when exporting/importing from/to ourselves as part of
>>>      regular DRI3 fd passing.
>>>
>>>      - Propagate failure of reservation_object_wait_rcu back
>>>      to caller.
>>>
>>> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95472
>>> (v1) Tested-by: Mike Lothian <mike@fireburn.co.uk>
>> FWIW, I think the (v1) is usually added at the end of the line, not at
>> the beginning.

Ok.

>>
>> [...]
>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
>>> index 7700dc2..c4494d0 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
>>> @@ -81,6 +81,20 @@ int amdgpu_gem_prime_pin(struct drm_gem_object *obj)
>>>   {
>>>       struct amdgpu_bo *bo = gem_to_amdgpu_bo(obj);
>>>       int ret = 0;
>>> +    long lret;
>>> +
>>> +    /*
>>> +     * Wait for all shared fences to complete before we switch to
>>> future
>>> +     * use of exclusive fence on this prime_exported bo.
>>> +     */
>>> +    lret = reservation_object_wait_timeout_rcu(bo->tbo.resv, true,
>>> false,
>>> +                           MAX_SCHEDULE_TIMEOUT);
>>> +    if (unlikely(lret < 0)) {
>>> +        DRM_DEBUG_PRIME("Fence wait failed: %li\n", lret);
>>> +        return lret;
>>> +    }
>>> +
>>> +    bo->prime_exported = true;
>> We should probably clear bo->prime_exported in amdgpu_gem_prime_unpin.
>>
>> Also, I think we should set bo->prime_exported (prime_shared?) in
>> amdgpu_gem_prime_import_sg_table as well, so that we'll also set
>> exclusive fences on BOs imported from other GPUs.
> Yes, really good idea.
>

Ok. I guess we don't need to wait for fences there before setting the 
flag/counter, as we can't have done anything yet with the bo just 
created from the imported sg_table?

> Additional to that are we sure that amdgpu_gem_prime_pin() is only
> called once for each GEM object? Could in theory be that somebody
> exports a BO to another GFX device as well as a V4L device for example.
>
> In this case we would need a counter, but I need to double check that as
> well.
>

If we want to clear the prime_exported flag in amdgpu_gem_prime_unpin() 
as an optimization if all clients detach from the buffer, then we need a 
prime_shared_counter instead of a prime_exported flag afaics. The dmabuf 
api allows multiple clients to import and attach to an exported dmabuf, 
each one calling dma_buf_attach which will call amdgpu_gem_prime_pin.


> In general I would protected the flag/counter by the BO being reserved
> to avoid any races. In other word move those lines a bit down after the
> amdgpu_bo_reserve().

Ok.

-mario

>
> Regards,
> Christian.
Christian König Nov. 8, 2016, 10:13 a.m. UTC | #4
Am 07.11.2016 um 20:20 schrieb Mario Kleiner:
> On 11/07/2016 08:55 AM, Christian König wrote:
>> Am 07.11.2016 um 04:29 schrieb Michel Dänzer:
>> Also, I think we should set bo->prime_exported (prime_shared?) in
>> amdgpu_gem_prime_import_sg_table as well, so that we'll also set
>> exclusive fences on BOs imported from other GPUs.
>> Yes, really good idea.
>>
>
> Ok. I guess we don't need to wait for fences there before setting the 
> flag/counter, as we can't have done anything yet with the bo just 
> created from the imported sg_table?

Correct, waiting at this point shouldn't be necessary.

Regards,
Christian.
diff mbox

Patch

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 039b57e..a337d56 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -459,6 +459,7 @@  struct amdgpu_bo {
 	u64				metadata_flags;
 	void				*metadata;
 	u32				metadata_size;
+	bool				prime_exported;
 	/* list of all virtual address to which this bo
 	 * is associated to
 	 */
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c
index 651115d..51c6f60 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c
@@ -132,7 +132,7 @@  static int amdgpu_bo_list_set(struct amdgpu_device *adev,
 		entry->priority = min(info[i].bo_priority,
 				      AMDGPU_BO_LIST_MAX_PRIORITY);
 		entry->tv.bo = &entry->robj->tbo;
-		entry->tv.shared = true;
+		entry->tv.shared = !entry->robj->prime_exported;
 
 		if (entry->robj->prefered_domains == AMDGPU_GEM_DOMAIN_GDS)
 			gds_obj = entry->robj;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
index 7700dc2..c4494d0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
@@ -81,6 +81,20 @@  int amdgpu_gem_prime_pin(struct drm_gem_object *obj)
 {
 	struct amdgpu_bo *bo = gem_to_amdgpu_bo(obj);
 	int ret = 0;
+	long lret;
+
+	/*
+	 * Wait for all shared fences to complete before we switch to future
+	 * use of exclusive fence on this prime_exported bo.
+	 */
+	lret = reservation_object_wait_timeout_rcu(bo->tbo.resv, true, false,
+						   MAX_SCHEDULE_TIMEOUT);
+	if (unlikely(lret < 0)) {
+		DRM_DEBUG_PRIME("Fence wait failed: %li\n", lret);
+		return lret;
+	}
+
+	bo->prime_exported = true;
 
 	ret = amdgpu_bo_reserve(bo, false);
 	if (unlikely(ret != 0))