[10/11] drm/scheduler: Don't store self-dependencies

Message ID: 20210624140025.438303-11-daniel.vetter@ffwll.ch (mailing list archive)
State: New, archived
Series: drm/scheduler dependency tracking

Commit Message

Daniel Vetter June 24, 2021, 2 p.m. UTC
This is essentially part of drm_sched_dependency_optimized(), which
only amdgpu seems to make use of. Use it a bit more.

Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Luben Tuikov <luben.tuikov@amd.com>
Cc: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Jack Zhang <Jack.Zhang1@amd.com>
---
 drivers/gpu/drm/scheduler/sched_main.c | 7 +++++++
 1 file changed, 7 insertions(+)
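
For reference, the fence_context and fence_context + 1 values checked by
this patch are the two fence contexts every scheduler entity allocates for
the scheduled and finished fences of its own jobs. Roughly (paraphrasing
the existing scheduler code, so treat this as a sketch, not a verbatim
quote):

/* drm_sched_entity_init(): two consecutive contexts per entity */
entity->fence_context = dma_fence_context_alloc(2);

/* drm_sched_fence setup: the scheduled fence uses fence_context,
 * the finished fence uses fence_context + 1 */
dma_fence_init(&fence->scheduled, &drm_sched_fence_ops_scheduled,
               &fence->lock, entity->fence_context, seq);
dma_fence_init(&fence->finished, &drm_sched_fence_ops_finished,
               &fence->lock, entity->fence_context + 1, seq);

So any fence with one of those two contexts was emitted by this entity
earlier and is already ordered before the new job.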

Comments

Lucas Stach June 24, 2021, 2:42 p.m. UTC | #1
On Thursday, 24.06.2021 at 16:00 +0200, Daniel Vetter wrote:
> This is essentially part of drm_sched_dependency_optimized(), which
> only amdgpu seems to make use of. Use it a bit more.
> 
> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>

Reviewed-by: Lucas Stach <l.stach@pengutronix.de>

> Cc: "Christian König" <christian.koenig@amd.com>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> Cc: Luben Tuikov <luben.tuikov@amd.com>
> Cc: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
> Cc: Alex Deucher <alexander.deucher@amd.com>
> Cc: Jack Zhang <Jack.Zhang1@amd.com>
> ---
>  drivers/gpu/drm/scheduler/sched_main.c | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> index 370c336d383f..c31d7cf7df74 100644
> --- a/drivers/gpu/drm/scheduler/sched_main.c
> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> @@ -649,6 +649,13 @@ int drm_sched_job_await_fence(struct drm_sched_job *job,
>  	if (!fence)
>  		return 0;
>  
> +	/* if it's a fence from us it's guaranteed to be earlier */
> +	if (fence->context == job->entity->fence_context ||
> +	    fence->context == job->entity->fence_context + 1) {
> +		dma_fence_put(fence);
> +		return 0;
> +	}
> +
>  	/* Deduplicate if we already depend on a fence from the same context.
>  	 * This lets the size of the array of deps scale with the number of
>  	 * engines involved, rather than the number of BOs.
Christian König June 24, 2021, 5:03 p.m. UTC | #2
On 24.06.21 at 16:00, Daniel Vetter wrote:
> This is essentially part of drm_sched_dependency_optimized(), which
> only amdgpu seems to make use of. Use it a bit more.
>
> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> Cc: "Christian König" <christian.koenig@amd.com>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> Cc: Luben Tuikov <luben.tuikov@amd.com>
> Cc: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
> Cc: Alex Deucher <alexander.deucher@amd.com>
> Cc: Jack Zhang <Jack.Zhang1@amd.com>
> ---
>   drivers/gpu/drm/scheduler/sched_main.c | 7 +++++++
>   1 file changed, 7 insertions(+)
>
> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> index 370c336d383f..c31d7cf7df74 100644
> --- a/drivers/gpu/drm/scheduler/sched_main.c
> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> @@ -649,6 +649,13 @@ int drm_sched_job_await_fence(struct drm_sched_job *job,
>   	if (!fence)
>   		return 0;
>   
> +	/* if it's a fence from us it's guaranteed to be earlier */
> +	if (fence->context == job->entity->fence_context ||
> +	    fence->context == job->entity->fence_context + 1) {
> +		dma_fence_put(fence);
> +		return 0;
> +	}
> +

Well NAK. That would break Vulkan.

The problem is that Vulkan can insert dependencies between jobs which 
run on the same queue.

So we need to track those as well, and if the previous job for the same 
queue/scheduler is not yet finished, a pipeline synchronization needs to 
be inserted.

That's one of the reasons we weren't able to unify the dependency 
handling yet.
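
For illustration only (the helper names below are made up, this is not the
actual amdgpu code), the per-dependency decision this implies on the driver
side looks roughly like this:

/* Hypothetical sketch: decide whether the command stream needs an
 * explicit pipeline sync for a dependency, once the scheduler no
 * longer waits on fences from the same entity for us. */
static bool need_pipeline_sync(struct my_job *job, struct dma_fence *dep)
{
        /* Dependency from another scheduler: the scheduler core waits
         * for it before pushing the job, so no flush is needed. */
        if (!dep_is_from_same_sched(job, dep))  /* made-up helper */
                return false;

        /* Same queue and already signaled: nothing left to sync on. */
        if (dma_fence_is_signaled(dep))
                return false;

        /* Same queue, still pending: the hardware pipeline has to be
         * flushed between the two jobs. */
        return true;
}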

Christian.

>   	/* Deduplicate if we already depend on a fence from the same context.
>   	 * This lets the size of the array of deps scale with the number of
>   	 * engines involved, rather than the number of BOs.
Daniel Vetter June 24, 2021, 5:29 p.m. UTC | #3
On Thu, Jun 24, 2021 at 07:03:10PM +0200, Christian König wrote:
> Am 24.06.21 um 16:00 schrieb Daniel Vetter:
> > This is essentially part of drm_sched_dependency_optimized(), which
> > only amdgpu seems to make use of. Use it a bit more.
> > 
> > Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> > Cc: "Christian König" <christian.koenig@amd.com>
> > Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> > Cc: Luben Tuikov <luben.tuikov@amd.com>
> > Cc: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
> > Cc: Alex Deucher <alexander.deucher@amd.com>
> > Cc: Jack Zhang <Jack.Zhang1@amd.com>
> > ---
> >   drivers/gpu/drm/scheduler/sched_main.c | 7 +++++++
> >   1 file changed, 7 insertions(+)
> > 
> > diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> > index 370c336d383f..c31d7cf7df74 100644
> > --- a/drivers/gpu/drm/scheduler/sched_main.c
> > +++ b/drivers/gpu/drm/scheduler/sched_main.c
> > @@ -649,6 +649,13 @@ int drm_sched_job_await_fence(struct drm_sched_job *job,
> >   	if (!fence)
> >   		return 0;
> > +	/* if it's a fence from us it's guaranteed to be earlier */
> > +	if (fence->context == job->entity->fence_context ||
> > +	    fence->context == job->entity->fence_context + 1) {
> > +		dma_fence_put(fence);
> > +		return 0;
> > +	}
> > +
> 
> Well NAK. That would break Vulkan.
> 
> The problem is that Vulkan can insert dependencies between jobs which run on
> the same queue.
> 
> So we need to track those as well and if the previous job for the same
> queue/scheduler is not yet finished a pipeline synchronization needs to be
> inserted.
> 
> That's one of the reasons we wasn't able to unify the dependency handling
> yet.

That sounds like an extremely amdgpu-specific constraint? You're also the
only one who keeps track of whether the previous job we've scheduled has
finished already (I guess they can get pipelined and you don't flush by
default), so you insert fences.

I guess we can add an await_fence_no_dedup or so for amdgpu, but I'm not
sure why we have to inflict this design constraint on all other drivers?
At least I'm not seeing anything in lima, panfrost, v3d or etnaviv that
would break with this, and i915 will also be perfectly fine.
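
A rough, untested sketch of such a helper (the xarray field name is
assumed from this series, the internals are hand-waved):

int drm_sched_job_await_fence_no_dedup(struct drm_sched_job *job,
                                       struct dma_fence *fence)
{
        u32 id;

        if (!fence)
                return 0;

        /* Same book-keeping as drm_sched_job_await_fence(), just without
         * dropping fences from our own entity or deduplicating against
         * fences from the same context. */
        return xa_alloc(&job->dependencies, &id, fence, xa_limit_32b,
                        GFP_KERNEL);
}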

Also note: I'm not using this for amdgpu, exactly because there are a few
funny things going on.

Finally: You _really_ need explicit dependency handling for vulkan in your
uapi, instead of the kernel second-guessing what userspace might be doing.
That's really not how vulkan is designed to work :-)

Cheers, Daniel


> Christian.
> 
> >   	/* Deduplicate if we already depend on a fence from the same context.
> >   	 * This lets the size of the array of deps scale with the number of
> >   	 * engines involved, rather than the number of BOs.
>
Christian König June 24, 2021, 5:38 p.m. UTC | #4
On 24.06.21 at 19:29, Daniel Vetter wrote:
> On Thu, Jun 24, 2021 at 07:03:10PM +0200, Christian König wrote:
>> Am 24.06.21 um 16:00 schrieb Daniel Vetter:
>>> This is essentially part of drm_sched_dependency_optimized(), which
>>> only amdgpu seems to make use of. Use it a bit more.
>>>
>>> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
>>> Cc: "Christian König" <christian.koenig@amd.com>
>>> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
>>> Cc: Luben Tuikov <luben.tuikov@amd.com>
>>> Cc: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
>>> Cc: Alex Deucher <alexander.deucher@amd.com>
>>> Cc: Jack Zhang <Jack.Zhang1@amd.com>
>>> ---
>>>    drivers/gpu/drm/scheduler/sched_main.c | 7 +++++++
>>>    1 file changed, 7 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
>>> index 370c336d383f..c31d7cf7df74 100644
>>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>>> @@ -649,6 +649,13 @@ int drm_sched_job_await_fence(struct drm_sched_job *job,
>>>    	if (!fence)
>>>    		return 0;
>>> +	/* if it's a fence from us it's guaranteed to be earlier */
>>> +	if (fence->context == job->entity->fence_context ||
>>> +	    fence->context == job->entity->fence_context + 1) {
>>> +		dma_fence_put(fence);
>>> +		return 0;
>>> +	}
>>> +
>> Well NAK. That would break Vulkan.
>>
>> The problem is that Vulkan can insert dependencies between jobs which run on
>> the same queue.
>>
>> So we need to track those as well and if the previous job for the same
>> queue/scheduler is not yet finished a pipeline synchronization needs to be
>> inserted.
>>
>> That's one of the reasons we wasn't able to unify the dependency handling
>> yet.
> That sounds like an extremely amdgpu specific constraint?

Yeah, that's totally hardware specific.

It's just that I don't know how else we could track that without having 
the same separation as in amdgpu between implicit and explicit fences. 
And as far as I understand it that's exactly what you want to avoid.

As I said, this turned out to be really awkward.

> You're also the
> only one who keeps track of whether the previous job we've scheduled has
> finished already (I guess they can get pipelined and you don't flush by
> default), so you insert fences.

Yes, exactly that.

> I guess we can add a await_fence_no_dedup or so for amdgpu, but I'm not
> sure why we have to inflict this design constraint on all other drivers?
> At least I'm not seeing anything in lima, panfrost, v3d or entaviv that
> would break with this, and i915 will also be perfectly fine.
>
> Also note: I'm not using this for amdgpu, exactly because there's a few
> funny things going on.

Yeah, exactly the reason why we never unified this.

Regards,
Christian.

> Finally: You _really_ need explicit dependency handling for vulkan in your
> uapi, instead of the kernel second-guessing what userspace might be doing.
> That's really not how vulkan is designed to work :-)

>
> Cheers, Daniel
>
>
>> Christian.
>>
>>>    	/* Deduplicate if we already depend on a fence from the same context.
>>>    	 * This lets the size of the array of deps scale with the number of
>>>    	 * engines involved, rather than the number of BOs.
Daniel Vetter June 24, 2021, 5:43 p.m. UTC | #5
On Thu, Jun 24, 2021 at 7:38 PM Christian König
<christian.koenig@amd.com> wrote:
>
> Am 24.06.21 um 19:29 schrieb Daniel Vetter:
> > On Thu, Jun 24, 2021 at 07:03:10PM +0200, Christian König wrote:
> >> Am 24.06.21 um 16:00 schrieb Daniel Vetter:
> >>> This is essentially part of drm_sched_dependency_optimized(), which
> >>> only amdgpu seems to make use of. Use it a bit more.
> >>>
> >>> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> >>> Cc: "Christian König" <christian.koenig@amd.com>
> >>> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> >>> Cc: Luben Tuikov <luben.tuikov@amd.com>
> >>> Cc: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
> >>> Cc: Alex Deucher <alexander.deucher@amd.com>
> >>> Cc: Jack Zhang <Jack.Zhang1@amd.com>
> >>> ---
> >>>    drivers/gpu/drm/scheduler/sched_main.c | 7 +++++++
> >>>    1 file changed, 7 insertions(+)
> >>>
> >>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> >>> index 370c336d383f..c31d7cf7df74 100644
> >>> --- a/drivers/gpu/drm/scheduler/sched_main.c
> >>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> >>> @@ -649,6 +649,13 @@ int drm_sched_job_await_fence(struct drm_sched_job *job,
> >>>     if (!fence)
> >>>             return 0;
> >>> +   /* if it's a fence from us it's guaranteed to be earlier */
> >>> +   if (fence->context == job->entity->fence_context ||
> >>> +       fence->context == job->entity->fence_context + 1) {
> >>> +           dma_fence_put(fence);
> >>> +           return 0;
> >>> +   }
> >>> +
> >> Well NAK. That would break Vulkan.

I'm assuming your reply means the NAK is retracted and was just the
usual "this doesn't perfectly fit for amdgpu" reflex?

> >> The problem is that Vulkan can insert dependencies between jobs which run on
> >> the same queue.
> >>
> >> So we need to track those as well and if the previous job for the same
> >> queue/scheduler is not yet finished a pipeline synchronization needs to be
> >> inserted.
> >>
> >> That's one of the reasons we wasn't able to unify the dependency handling
> >> yet.
> > That sounds like an extremely amdgpu specific constraint?
>
> Yeah, that's totally hardware specific.
>
> It's just that I don't know how else we could track that without having
> the same separation as in amdgpu between implicit and explicit fences.
> And as far as I understand it that's exactly what you want to avoid.
>
> As I said this turned out to be really awkward.
>
> > You're also the
> > only one who keeps track of whether the previous job we've scheduled has
> > finished already (I guess they can get pipelined and you don't flush by
> > default), so you insert fences.
>
> Yes, exactly that.
>
> > I guess we can add a await_fence_no_dedup or so for amdgpu, but I'm not
> > sure why we have to inflict this design constraint on all other drivers?
> > At least I'm not seeing anything in lima, panfrost, v3d or entaviv that
> > would break with this, and i915 will also be perfectly fine.
> >
> > Also note: I'm not using this for amdgpu, exactly because there's a few
> > funny things going on.
>
> Yeah, exactly the reason why we never unified this.

Yeah, there are clear limits to this, because you also can't use the
await_implicit helper, since you have to keep filtering for the owner or
the current amdgpu uapi gets horribly slow. I think the benefit would
just be that we could share the data structure and the book-keeping,
but aside from that you'd need your own integration in amdgpu.

One idea I just had was whether we could use the tag bits the xarray has
for the amdgpu purposes. Like we could do a
drm_sched_job_await_fence_tagged, where you supply additional
information (like the "this might be relevant for the vm_flush" and
things like that). Afaiui xarray tags are very fast to enumerate on if
you're looking for specific tags, but I might be wrong. Ideally this
would avoid the need for the duplicated amdgpu_job->sched.
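
A very rough sketch of that idea (xarray calls these "marks", XA_MARK_0
through XA_MARK_2, so only a few bits are available; the tag choice and
the job->dependencies field are assumptions here):

int drm_sched_job_await_fence_tagged(struct drm_sched_job *job,
                                     struct dma_fence *fence,
                                     xa_mark_t tag)
{
        u32 id;
        int ret;

        ret = xa_alloc(&job->dependencies, &id, fence, xa_limit_32b,
                       GFP_KERNEL);
        if (ret)
                return ret;

        /* Remember what kind of dependency this is, e.g. "might need a
         * vm flush", so the driver can cheaply enumerate just those. */
        xa_set_mark(&job->dependencies, id, tag);
        return 0;
}

/* Driver side: enumerating only the tagged dependencies is cheap. */
static void check_tagged_deps(struct drm_sched_job *job)
{
        struct dma_fence *fence;
        unsigned long idx;

        xa_for_each_marked(&job->dependencies, idx, fence, XA_MARK_0) {
                /* decide whether a pipeline sync / vm flush is needed */
        }
}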

Cheers, Daniel


> Regards,
> Christian.
>
> > Finally: You _really_ need explicit dependency handling for vulkan in your
> > uapi, instead of the kernel second-guessing what userspace might be doing.
> > That's really not how vulkan is designed to work :-)
>
> >
> > Cheers, Daniel
> >
> >
> >> Christian.
> >>
> >>>     /* Deduplicate if we already depend on a fence from the same context.
> >>>      * This lets the size of the array of deps scale with the number of
> >>>      * engines involved, rather than the number of BOs.
>
Christian König June 24, 2021, 5:56 p.m. UTC | #6
On 24.06.21 at 19:43, Daniel Vetter wrote:
> On Thu, Jun 24, 2021 at 7:38 PM Christian König
> <christian.koenig@amd.com> wrote:
>> Am 24.06.21 um 19:29 schrieb Daniel Vetter:
>>> On Thu, Jun 24, 2021 at 07:03:10PM +0200, Christian König wrote:
>>>> Am 24.06.21 um 16:00 schrieb Daniel Vetter:
>>>>> This is essentially part of drm_sched_dependency_optimized(), which
>>>>> only amdgpu seems to make use of. Use it a bit more.
>>>>>
>>>>> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
>>>>> Cc: "Christian König" <christian.koenig@amd.com>
>>>>> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
>>>>> Cc: Luben Tuikov <luben.tuikov@amd.com>
>>>>> Cc: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
>>>>> Cc: Alex Deucher <alexander.deucher@amd.com>
>>>>> Cc: Jack Zhang <Jack.Zhang1@amd.com>
>>>>> ---
>>>>>     drivers/gpu/drm/scheduler/sched_main.c | 7 +++++++
>>>>>     1 file changed, 7 insertions(+)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
>>>>> index 370c336d383f..c31d7cf7df74 100644
>>>>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>>>>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>>>>> @@ -649,6 +649,13 @@ int drm_sched_job_await_fence(struct drm_sched_job *job,
>>>>>      if (!fence)
>>>>>              return 0;
>>>>> +   /* if it's a fence from us it's guaranteed to be earlier */
>>>>> +   if (fence->context == job->entity->fence_context ||
>>>>> +       fence->context == job->entity->fence_context + 1) {
>>>>> +           dma_fence_put(fence);
>>>>> +           return 0;
>>>>> +   }
>>>>> +
>>>> Well NAK. That would break Vulkan.
> I'm assuming your reply means the NAK is retracted and was just the
> usual "this doesn't perfectly fit for amdgpu" reflex?

Well, rather "NAK, you haven't considered that special handling in amdgpu 
and if you really want to unify this you need that as well."

>
>>>> The problem is that Vulkan can insert dependencies between jobs which run on
>>>> the same queue.
>>>>
>>>> So we need to track those as well and if the previous job for the same
>>>> queue/scheduler is not yet finished a pipeline synchronization needs to be
>>>> inserted.
>>>>
>>>> That's one of the reasons we wasn't able to unify the dependency handling
>>>> yet.
>>> That sounds like an extremely amdgpu specific constraint?
>> Yeah, that's totally hardware specific.
>>
>> It's just that I don't know how else we could track that without having
>> the same separation as in amdgpu between implicit and explicit fences.
>> And as far as I understand it that's exactly what you want to avoid.
>>
>> As I said this turned out to be really awkward.
>>
>>> You're also the
>>> only one who keeps track of whether the previous job we've scheduled has
>>> finished already (I guess they can get pipelined and you don't flush by
>>> default), so you insert fences.
>> Yes, exactly that.
>>
>>> I guess we can add a await_fence_no_dedup or so for amdgpu, but I'm not
>>> sure why we have to inflict this design constraint on all other drivers?
>>> At least I'm not seeing anything in lima, panfrost, v3d or entaviv that
>>> would break with this, and i915 will also be perfectly fine.
>>>
>>> Also note: I'm not using this for amdgpu, exactly because there's a few
>>> funny things going on.
>> Yeah, exactly the reason why we never unified this.
> Yeah there's clear limits to this, because you also can't use the
> await_implicit helper, because you have to keep filtering for owner or
> the current amdgpu uapi goes horribly slow. I think the benefit would
> be just that we could share the datastructure and the book-keeping,
> but aside from that you'd need your own integration in amdgpu.

Yeah, but that is trivial. The _add_dependency() function (or whatever we 
end up calling it) needs to be exported anyway for adding fences from 
syncfile and syncobj.

Or do you also want to unify the handling for those?

> One idea I just had was whether we could use the tag bits xarray has
> for the amdgpu purposed. Like we could do a
> drm_sched_job_await_fence_tagged, where you supply additional
> information (like the "this might be relevant for the vm_flush" and
> things like that). Afaiui xarray tags are very fast to enumerate on if
> you're looking for specific tags, but I might be wrong. Ideally this
> would avoid the need for the duplicated amdgpu_job->sched.

That could work.

Essentially we just need the information from the scheduler about which 
fence was the last one to be dependency-optimized.

In other words, when you push jobs like these to the same scheduler:

J1
J2 -> depends on J1
J3 -> depends on whatever, but not on J2

the hardware needs to insert a flush between J1 and J2, but not between 
J2 and J3.

This makes roughly a 19% performance difference for some OpenGL games, and 
incorrect rendering for Vulkan, if you get it wrong one way or the other.
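
Purely illustrative (all names below are made up), one way that
information could be handed from the dedup path to the driver backend:

/* In the scheduler's await/dedup path, instead of just dropping the
 * fence, remember it on the job (hypothetical field): */
if (fence->context == job->entity->fence_context ||
    fence->context == job->entity->fence_context + 1) {
        dma_fence_put(job->last_optimized_dep);
        job->last_optimized_dep = fence;        /* job now owns the ref */
        return 0;
}

/* In the driver's run_job/emit path: */
if (job->last_optimized_dep &&
    !dma_fence_is_signaled(job->last_optimized_dep))
        emit_pipeline_sync(ring);       /* hypothetical driver hook */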

Regards,
Christian.


>
> Cheers, Daniel
>
>
>> Regards,
>> Christian.
>>
>>> Finally: You _really_ need explicit dependency handling for vulkan in your
>>> uapi, instead of the kernel second-guessing what userspace might be doing.
>>> That's really not how vulkan is designed to work :-)
>>> Cheers, Daniel
>>>
>>>
>>>> Christian.
>>>>
>>>>>      /* Deduplicate if we already depend on a fence from the same context.
>>>>>       * This lets the size of the array of deps scale with the number of
>>>>>       * engines involved, rather than the number of BOs.
>
Daniel Vetter June 24, 2021, 6:21 p.m. UTC | #7
On Thu, Jun 24, 2021 at 7:56 PM Christian König
<christian.koenig@amd.com> wrote:
>
> Am 24.06.21 um 19:43 schrieb Daniel Vetter:
> > On Thu, Jun 24, 2021 at 7:38 PM Christian König
> > <christian.koenig@amd.com> wrote:
> >> Am 24.06.21 um 19:29 schrieb Daniel Vetter:
> >>> On Thu, Jun 24, 2021 at 07:03:10PM +0200, Christian König wrote:
> >>>> Am 24.06.21 um 16:00 schrieb Daniel Vetter:
> >>>>> This is essentially part of drm_sched_dependency_optimized(), which
> >>>>> only amdgpu seems to make use of. Use it a bit more.
> >>>>>
> >>>>> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> >>>>> Cc: "Christian König" <christian.koenig@amd.com>
> >>>>> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> >>>>> Cc: Luben Tuikov <luben.tuikov@amd.com>
> >>>>> Cc: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
> >>>>> Cc: Alex Deucher <alexander.deucher@amd.com>
> >>>>> Cc: Jack Zhang <Jack.Zhang1@amd.com>
> >>>>> ---
> >>>>>     drivers/gpu/drm/scheduler/sched_main.c | 7 +++++++
> >>>>>     1 file changed, 7 insertions(+)
> >>>>>
> >>>>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> >>>>> index 370c336d383f..c31d7cf7df74 100644
> >>>>> --- a/drivers/gpu/drm/scheduler/sched_main.c
> >>>>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> >>>>> @@ -649,6 +649,13 @@ int drm_sched_job_await_fence(struct drm_sched_job *job,
> >>>>>      if (!fence)
> >>>>>              return 0;
> >>>>> +   /* if it's a fence from us it's guaranteed to be earlier */
> >>>>> +   if (fence->context == job->entity->fence_context ||
> >>>>> +       fence->context == job->entity->fence_context + 1) {
> >>>>> +           dma_fence_put(fence);
> >>>>> +           return 0;
> >>>>> +   }
> >>>>> +
> >>>> Well NAK. That would break Vulkan.
> > I'm assuming your reply means the NAK is retracted and was just the
> > usual "this doesn't perfectly fit for amdgpu" reflex?
>
> Well rather "NAK, you haven't considered that special handling in amdgpu
> and if you really want to unify this you need that as well."
>
> >
> >>>> The problem is that Vulkan can insert dependencies between jobs which run on
> >>>> the same queue.
> >>>>
> >>>> So we need to track those as well and if the previous job for the same
> >>>> queue/scheduler is not yet finished a pipeline synchronization needs to be
> >>>> inserted.
> >>>>
> >>>> That's one of the reasons we wasn't able to unify the dependency handling
> >>>> yet.
> >>> That sounds like an extremely amdgpu specific constraint?
> >> Yeah, that's totally hardware specific.
> >>
> >> It's just that I don't know how else we could track that without having
> >> the same separation as in amdgpu between implicit and explicit fences.
> >> And as far as I understand it that's exactly what you want to avoid.
> >>
> >> As I said this turned out to be really awkward.
> >>
> >>> You're also the
> >>> only one who keeps track of whether the previous job we've scheduled has
> >>> finished already (I guess they can get pipelined and you don't flush by
> >>> default), so you insert fences.
> >> Yes, exactly that.
> >>
> >>> I guess we can add a await_fence_no_dedup or so for amdgpu, but I'm not
> >>> sure why we have to inflict this design constraint on all other drivers?
> >>> At least I'm not seeing anything in lima, panfrost, v3d or entaviv that
> >>> would break with this, and i915 will also be perfectly fine.
> >>>
> >>> Also note: I'm not using this for amdgpu, exactly because there's a few
> >>> funny things going on.
> >> Yeah, exactly the reason why we never unified this.
> > Yeah there's clear limits to this, because you also can't use the
> > await_implicit helper, because you have to keep filtering for owner or
> > the current amdgpu uapi goes horribly slow. I think the benefit would
> > be just that we could share the datastructure and the book-keeping,
> > but aside from that you'd need your own integration in amdgpu.
>
> Yeah, but that is trivial. The _add_dependency() function (or however we
> want to call it) needs to be exported anyway for adding fences from
> syncfile and syncobj.
>
> Or do you also want to unify the handling for those?

I guess we could add some convenience wrapper that pulls in a
sync_file or syncobj automatically. But there's not that much code
involved there, and it's also not tricky. Also, drivers might need to
add dependencies for whatever else anyway. The await_implicit is a bit
different, because that defines how implicit sync is supposed to work.
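
A sketch of the kind of convenience wrappers being discussed (the
drm_sched_job_await_* names are made up; the lookup helpers are the
existing sync_file/syncobj ones, and drm_sched_job_await_fence() takes
over the fence reference):

int drm_sched_job_await_sync_file(struct drm_sched_job *job, int fd)
{
        struct dma_fence *fence = sync_file_get_fence(fd);

        if (!fence)
                return -EINVAL;

        return drm_sched_job_await_fence(job, fence);
}

int drm_sched_job_await_syncobj(struct drm_sched_job *job,
                                struct drm_file *file_priv,
                                u32 handle, u64 point)
{
        struct dma_fence *fence;
        int ret;

        ret = drm_syncobj_find_fence(file_priv, handle, point, 0, &fence);
        if (ret)
                return ret;

        return drm_sched_job_await_fence(job, fence);
}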

I guess the bikeshed then boils down to which one is the simple
await_fence() function: the one that filters for the same timeline, or the
one that doesn't. I'd make the non-filtering one the special case so
that amdgpu sticks out a bit more - out of 6 drivers with schedulers
(i915 included) it seems to be the special one.

> > One idea I just had was whether we could use the tag bits xarray has
> > for the amdgpu purposed. Like we could do a
> > drm_sched_job_await_fence_tagged, where you supply additional
> > information (like the "this might be relevant for the vm_flush" and
> > things like that). Afaiui xarray tags are very fast to enumerate on if
> > you're looking for specific tags, but I might be wrong. Ideally this
> > would avoid the need for the duplicated amdgpu_job->sched.
>
> That could work.
>
> Essentially we just need the information from the scheduler which is the
> last fence which was dependency optimized.
>
> In other words when you push jobs like those to the same scheduler
>
> J1
> J2 -> depends on J1.
> J3 -> depends on whatever, but not j2
>
> The hardware needs to insert a flush between J2 and J1, but not between
> j3 and j2.
>
> This makes roughly 19% performance difference for some OpenGL games and
> incorrect rendering for Vulkan if you mess it up either way or the other.

Yeah, that's massive. On i915 the cost of "too many pipeline stalls", even
within batches, is a lot less, so we never bothered with this at all.
-Daniel

>
> Regards,
> Christian.
>
>
> >
> > Cheers, Daniel
> >
> >
> >> Regards,
> >> Christian.
> >>
> >>> Finally: You _really_ need explicit dependency handling for vulkan in your
> >>> uapi, instead of the kernel second-guessing what userspace might be doing.
> >>> That's really not how vulkan is designed to work :-)
> >>> Cheers, Daniel
> >>>
> >>>
> >>>> Christian.
> >>>>
> >>>>>      /* Deduplicate if we already depend on a fence from the same context.
> >>>>>       * This lets the size of the array of deps scale with the number of
> >>>>>       * engines involved, rather than the number of BOs.
> >
>

Patch

diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index 370c336d383f..c31d7cf7df74 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -649,6 +649,13 @@  int drm_sched_job_await_fence(struct drm_sched_job *job,
 	if (!fence)
 		return 0;
 
+	/* if it's a fence from us it's guaranteed to be earlier */
+	if (fence->context == job->entity->fence_context ||
+	    fence->context == job->entity->fence_context + 1) {
+		dma_fence_put(fence);
+		return 0;
+	}
+
 	/* Deduplicate if we already depend on a fence from the same context.
 	 * This lets the size of the array of deps scale with the number of
 	 * engines involved, rather than the number of BOs.