diff mbox series

[37/45] drm/ttm: add a helper to allocate a temp tt for copies.

Message ID 20200924051845.397177-38-airlied@gmail.com (mailing list archive)
State New, archived
Headers show
Series TTM move refactoring | expand

Commit Message

Dave Airlie Sept. 24, 2020, 5:18 a.m. UTC
From: Dave Airlie <airlied@redhat.com>

All the accel moves do the same pattern here, provide a helper

Signed-off-by: Dave Airlie <airlied@redhat.com>
---
 drivers/gpu/drm/ttm/ttm_bo.c    | 28 ++++++++++++++++++++++++++++
 include/drm/ttm/ttm_bo_driver.h |  5 +++++
 2 files changed, 33 insertions(+)

Comments

Christian König Sept. 24, 2020, 12:42 p.m. UTC | #1
Am 24.09.20 um 07:18 schrieb Dave Airlie:
> From: Dave Airlie <airlied@redhat.com>
>
> All the accel moves do the same pattern here, provide a helper

And exactly that pattern I want to get away from.

See what happens if we (for example) have a VRAM -> SYSTEM move is the 
following:

1. TTM allocates a new ttm_resource object in the SYSTEM domain.
2. We call the driver to move from VRAM to SYSTEM.
3. Driver finds that it can't do this and calls TTM  to allocate GTT.
4. Since we are maybe out of GTT TTM evicts a different BO from GTT to 
SYSTEM and call driver again.

This is a horrible ping/pong between driver/TTM/driver/TTM/driver and we 
should stop that immediately.

My suggestion is that we rewrite how drivers call the ttm_bo_validate() 
function so that we can guarantee that this never happens.

What do you think?

Thanks,
Christian.

>
> Signed-off-by: Dave Airlie <airlied@redhat.com>
> ---
>   drivers/gpu/drm/ttm/ttm_bo.c    | 28 ++++++++++++++++++++++++++++
>   include/drm/ttm/ttm_bo_driver.h |  5 +++++
>   2 files changed, 33 insertions(+)
>
> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
> index eb76002aa53d..358d1580dc16 100644
> --- a/drivers/gpu/drm/ttm/ttm_bo.c
> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
> @@ -1541,3 +1541,31 @@ void ttm_bo_tt_destroy(struct ttm_buffer_object *bo)
>   	ttm_tt_destroy(bo->bdev, bo->ttm);
>   	bo->ttm = NULL;
>   }
> +
> +int ttm_bo_create_tt_tmp(struct ttm_buffer_object *bo,
> +			 struct ttm_operation_ctx *ctx,
> +			 struct ttm_resource *new_mem,
> +			 struct ttm_resource *new_temp)
> +{
> +	struct ttm_place placement_memtype = {
> +		.fpfn = 0,
> +		.lpfn = 0,
> +		.mem_type = TTM_PL_TT,
> +		.flags = TTM_PL_MASK_CACHING
> +	};
> +	struct ttm_placement placement;
> +	int ret;
> +
> +	placement.num_placement = placement.num_busy_placement = 1;
> +	placement.placement = placement.busy_placement = &placement_memtype;
> +
> +	*new_temp = *new_mem;
> +	new_temp->mm_node = NULL;
> +
> +	ret = ttm_bo_mem_space(bo, &placement, new_temp, ctx);
> +	if (ret)
> +		return ret;
> +
> +	return 0;
> +}
> +EXPORT_SYMBOL(ttm_bo_create_tt_tmp);
> diff --git a/include/drm/ttm/ttm_bo_driver.h b/include/drm/ttm/ttm_bo_driver.h
> index 4a63fec24e90..a7507dfaa89d 100644
> --- a/include/drm/ttm/ttm_bo_driver.h
> +++ b/include/drm/ttm/ttm_bo_driver.h
> @@ -558,6 +558,11 @@ void ttm_mem_io_free(struct ttm_bo_device *bdev,
>   int ttm_bo_move_to_new_tt_mem(struct ttm_buffer_object *bo,
>   			      struct ttm_operation_ctx *ctx,
>   			      struct ttm_resource *new_mem);
> +
> +int ttm_bo_create_tt_tmp(struct ttm_buffer_object *bo,
> +			 struct ttm_operation_ctx *ctx,
> +			 struct ttm_resource *new_mem,
> +			 struct ttm_resource *new_temp);
>   /**
>    * ttm_bo_move_memcpy
>    *
Dave Airlie Sept. 24, 2020, 11:14 p.m. UTC | #2
On Thu, 24 Sep 2020 at 22:42, Christian König <christian.koenig@amd.com> wrote:
>
> Am 24.09.20 um 07:18 schrieb Dave Airlie:
> > From: Dave Airlie <airlied@redhat.com>
> >
> > All the accel moves do the same pattern here, provide a helper
>
> And exactly that pattern I want to get away from.

Currently this is just refactoring out the helper code in each driver, but I see
since it calls bo_mem_space we are probably moving a bit in the wrong direction.

> See what happens if we (for example) have a VRAM -> SYSTEM move is the
> following:
>
> 1. TTM allocates a new ttm_resource object in the SYSTEM domain.
> 2. We call the driver to move from VRAM to SYSTEM.
> 3. Driver finds that it can't do this and calls TTM  to allocate GTT.
> 4. Since we are maybe out of GTT TTM evicts a different BO from GTT to
> SYSTEM and call driver again.
>
> This is a horrible ping/pong between driver/TTM/driver/TTM/driver and we
> should stop that immediately.
>
> My suggestion is that we rewrite how drivers call the ttm_bo_validate()
> function so that we can guarantee that this never happens.
>
> What do you think?

I think that is likely the next step I'd like to take after this
refactor, it's a lot bigger, and I'm not sure how it will look yet.

Do we envision the driver calling validate in a loop but when it can't
find space it tells the driver and the driver does eviction and
recalls validate?

Dave.
Christian König Sept. 25, 2020, 7:39 a.m. UTC | #3
Am 25.09.20 um 01:14 schrieb Dave Airlie:
> On Thu, 24 Sep 2020 at 22:42, Christian König <christian.koenig@amd.com> wrote:
>> Am 24.09.20 um 07:18 schrieb Dave Airlie:
>>> From: Dave Airlie <airlied@redhat.com>
>>>
>>> All the accel moves do the same pattern here, provide a helper
>> And exactly that pattern I want to get away from.
> Currently this is just refactoring out the helper code in each driver, but I see
> since it calls bo_mem_space we are probably moving a bit in the wrong direction.

Exactly that's why I'm noting this.

>
>> See what happens if we (for example) have a VRAM -> SYSTEM move is the
>> following:
>>
>> 1. TTM allocates a new ttm_resource object in the SYSTEM domain.
>> 2. We call the driver to move from VRAM to SYSTEM.
>> 3. Driver finds that it can't do this and calls TTM  to allocate GTT.
>> 4. Since we are maybe out of GTT TTM evicts a different BO from GTT to
>> SYSTEM and call driver again.
>>
>> This is a horrible ping/pong between driver/TTM/driver/TTM/driver and we
>> should stop that immediately.
>>
>> My suggestion is that we rewrite how drivers call the ttm_bo_validate()
>> function so that we can guarantee that this never happens.
>>
>> What do you think?
> I think that is likely the next step I'd like to take after this
> refactor, it's a lot bigger, and I'm not sure how it will look yet.

Agree, yes. I have some ideas in mind for that, but not fully baked either.

> Do we envision the driver calling validate in a loop but when it can't
> find space it tells the driver and the driver does eviction and
> recalls validate?

Not in a loop, but more like in a chain.

My plan is something like this:
Instead of having "normal" and "busy" placement we have a flag in the 
context if evictions are allowed or not.
The call to ttm_bo_validate are then replaced with two calls, first 
without evictions and if that didn't worked one with evictions.

Then the normal validate sequence should look like this:
1. If a BO is in the SYSTEM (or SWAP domain) we validate it to GTT first 
with evictions=true.
2. If a BO should be in VRAM we then validate it to VRAM. If evictions 
are only allowed if the GEM flags say that GTT is not desired.

For special BOs, like amdgpus GDS, GWS and OA domain or VMWGFX special 
domains that will obviously look a bit different.

Christian.

>
> Dave.
Daniel Vetter Sept. 25, 2020, 8:16 a.m. UTC | #4
On Fri, Sep 25, 2020 at 9:39 AM Christian König
<christian.koenig@amd.com> wrote:
>
> Am 25.09.20 um 01:14 schrieb Dave Airlie:
> > On Thu, 24 Sep 2020 at 22:42, Christian König <christian.koenig@amd.com> wrote:
> >> Am 24.09.20 um 07:18 schrieb Dave Airlie:
> >>> From: Dave Airlie <airlied@redhat.com>
> >>>
> >>> All the accel moves do the same pattern here, provide a helper
> >> And exactly that pattern I want to get away from.
> > Currently this is just refactoring out the helper code in each driver, but I see
> > since it calls bo_mem_space we are probably moving a bit in the wrong direction.
>
> Exactly that's why I'm noting this.
>
> >
> >> See what happens if we (for example) have a VRAM -> SYSTEM move is the
> >> following:
> >>
> >> 1. TTM allocates a new ttm_resource object in the SYSTEM domain.
> >> 2. We call the driver to move from VRAM to SYSTEM.
> >> 3. Driver finds that it can't do this and calls TTM  to allocate GTT.
> >> 4. Since we are maybe out of GTT TTM evicts a different BO from GTT to
> >> SYSTEM and call driver again.
> >>
> >> This is a horrible ping/pong between driver/TTM/driver/TTM/driver and we
> >> should stop that immediately.
> >>
> >> My suggestion is that we rewrite how drivers call the ttm_bo_validate()
> >> function so that we can guarantee that this never happens.
> >>
> >> What do you think?
> > I think that is likely the next step I'd like to take after this
> > refactor, it's a lot bigger, and I'm not sure how it will look yet.
>
> Agree, yes. I have some ideas in mind for that, but not fully baked either.
>
> > Do we envision the driver calling validate in a loop but when it can't
> > find space it tells the driver and the driver does eviction and
> > recalls validate?
>
> Not in a loop, but more like in a chain.
>
> My plan is something like this:
> Instead of having "normal" and "busy" placement we have a flag in the
> context if evictions are allowed or not.
> The call to ttm_bo_validate are then replaced with two calls, first
> without evictions and if that didn't worked one with evictions.
>
> Then the normal validate sequence should look like this:
> 1. If a BO is in the SYSTEM (or SWAP domain) we validate it to GTT first
> with evictions=true.
> 2. If a BO should be in VRAM we then validate it to VRAM. If evictions
> are only allowed if the GEM flags say that GTT is not desired.

That solves the trouble when you move a bo into vram as part of
validate. But I'm not seeing how this solves the "need gtt mapping to
move something out of vram" problem.

Or should we instead move the entire eviction logic out from ttm into
drivers, building it up from helpers? Then drivers which need gtt for
moving stuff out of vram can do that right away. Also, this would
allow us to implement very fancy eviction algorithms like all the
nonsense we're doing in i915 for gtt handling on gen2/3 (but I really
hope that never ever becomes a thing again in future gpus, so this is
maybe more a what-if kind of thing). Not sure how that would look
like, maybe a special validate function which takes a ttm_resource the
driver already found (through evicting stuff or whatever) and then ttm
just does the move and book-keeping and everything. And drivers would
at first only call validate without allowing any eviction. Ofc anyone
without special needs could use the standard eviction function that
validate already has.
-Daniel

> For special BOs, like amdgpus GDS, GWS and OA domain or VMWGFX special
> domains that will obviously look a bit different.
>
> Christian.
>
> >
> > Dave.
>
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel
Daniel Vetter Sept. 25, 2020, 8:18 a.m. UTC | #5
On Fri, Sep 25, 2020 at 10:16 AM Daniel Vetter <daniel@ffwll.ch> wrote:
>
> On Fri, Sep 25, 2020 at 9:39 AM Christian König
> <christian.koenig@amd.com> wrote:
> >
> > Am 25.09.20 um 01:14 schrieb Dave Airlie:
> > > On Thu, 24 Sep 2020 at 22:42, Christian König <christian.koenig@amd.com> wrote:
> > >> Am 24.09.20 um 07:18 schrieb Dave Airlie:
> > >>> From: Dave Airlie <airlied@redhat.com>
> > >>>
> > >>> All the accel moves do the same pattern here, provide a helper
> > >> And exactly that pattern I want to get away from.
> > > Currently this is just refactoring out the helper code in each driver, but I see
> > > since it calls bo_mem_space we are probably moving a bit in the wrong direction.
> >
> > Exactly that's why I'm noting this.
> >
> > >
> > >> See what happens if we (for example) have a VRAM -> SYSTEM move is the
> > >> following:
> > >>
> > >> 1. TTM allocates a new ttm_resource object in the SYSTEM domain.
> > >> 2. We call the driver to move from VRAM to SYSTEM.
> > >> 3. Driver finds that it can't do this and calls TTM  to allocate GTT.
> > >> 4. Since we are maybe out of GTT TTM evicts a different BO from GTT to
> > >> SYSTEM and call driver again.
> > >>
> > >> This is a horrible ping/pong between driver/TTM/driver/TTM/driver and we
> > >> should stop that immediately.
> > >>
> > >> My suggestion is that we rewrite how drivers call the ttm_bo_validate()
> > >> function so that we can guarantee that this never happens.
> > >>
> > >> What do you think?
> > > I think that is likely the next step I'd like to take after this
> > > refactor, it's a lot bigger, and I'm not sure how it will look yet.
> >
> > Agree, yes. I have some ideas in mind for that, but not fully baked either.
> >
> > > Do we envision the driver calling validate in a loop but when it can't
> > > find space it tells the driver and the driver does eviction and
> > > recalls validate?
> >
> > Not in a loop, but more like in a chain.
> >
> > My plan is something like this:
> > Instead of having "normal" and "busy" placement we have a flag in the
> > context if evictions are allowed or not.
> > The call to ttm_bo_validate are then replaced with two calls, first
> > without evictions and if that didn't worked one with evictions.
> >
> > Then the normal validate sequence should look like this:
> > 1. If a BO is in the SYSTEM (or SWAP domain) we validate it to GTT first
> > with evictions=true.
> > 2. If a BO should be in VRAM we then validate it to VRAM. If evictions
> > are only allowed if the GEM flags say that GTT is not desired.
>
> That solves the trouble when you move a bo into vram as part of
> validate. But I'm not seeing how this solves the "need gtt mapping to
> move something out of vram" problem.
>
> Or should we instead move the entire eviction logic out from ttm into
> drivers, building it up from helpers? Then drivers which need gtt for
> moving stuff out of vram can do that right away. Also, this would
> allow us to implement very fancy eviction algorithms like all the
> nonsense we're doing in i915 for gtt handling on gen2/3 (but I really
> hope that never ever becomes a thing again in future gpus, so this is
> maybe more a what-if kind of thing). Not sure how that would look
> like, maybe a special validate function which takes a ttm_resource the
> driver already found (through evicting stuff or whatever) and then ttm
> just does the move and book-keeping and everything. And drivers would
> at first only call validate without allowing any eviction. Ofc anyone
> without special needs could use the standard eviction function that
> validate already has.

Spinning this a bit more, we could have different default eviction
functions with this, e.g. so all the drivers that need gtt mapping for
moving stuff around can share that code, but with specific&flat
control flow instead of lots of ping-ping. And drivers that don't need
gtt mapping (like i915, we just need dma_map_sg which we assume works
always, or something from the ttm dma page pool, which really always
works) can then use something simpler that's completely flat.
-Daniel

>
> > For special BOs, like amdgpus GDS, GWS and OA domain or VMWGFX special
> > domains that will obviously look a bit different.
> >
> > Christian.
> >
> > >
> > > Dave.
> >
> > _______________________________________________
> > dri-devel mailing list
> > dri-devel@lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/dri-devel
>
>
>
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch
Christian König Sept. 25, 2020, 9:34 a.m. UTC | #6
Am 25.09.20 um 10:18 schrieb Daniel Vetter:
> On Fri, Sep 25, 2020 at 10:16 AM Daniel Vetter <daniel@ffwll.ch> wrote:
>> On Fri, Sep 25, 2020 at 9:39 AM Christian König
>> <christian.koenig@amd.com> wrote:
>>> Am 25.09.20 um 01:14 schrieb Dave Airlie:
>>>> On Thu, 24 Sep 2020 at 22:42, Christian König <christian.koenig@amd.com> wrote:
>>>>> Am 24.09.20 um 07:18 schrieb Dave Airlie:
>>>>>> From: Dave Airlie <airlied@redhat.com>
>>>>>>
>>>>>> All the accel moves do the same pattern here, provide a helper
>>>>> And exactly that pattern I want to get away from.
>>>> Currently this is just refactoring out the helper code in each driver, but I see
>>>> since it calls bo_mem_space we are probably moving a bit in the wrong direction.
>>> Exactly that's why I'm noting this.
>>>
>>>>> See what happens if we (for example) have a VRAM -> SYSTEM move is the
>>>>> following:
>>>>>
>>>>> 1. TTM allocates a new ttm_resource object in the SYSTEM domain.
>>>>> 2. We call the driver to move from VRAM to SYSTEM.
>>>>> 3. Driver finds that it can't do this and calls TTM  to allocate GTT.
>>>>> 4. Since we are maybe out of GTT TTM evicts a different BO from GTT to
>>>>> SYSTEM and call driver again.
>>>>>
>>>>> This is a horrible ping/pong between driver/TTM/driver/TTM/driver and we
>>>>> should stop that immediately.
>>>>>
>>>>> My suggestion is that we rewrite how drivers call the ttm_bo_validate()
>>>>> function so that we can guarantee that this never happens.
>>>>>
>>>>> What do you think?
>>>> I think that is likely the next step I'd like to take after this
>>>> refactor, it's a lot bigger, and I'm not sure how it will look yet.
>>> Agree, yes. I have some ideas in mind for that, but not fully baked either.
>>>
>>>> Do we envision the driver calling validate in a loop but when it can't
>>>> find space it tells the driver and the driver does eviction and
>>>> recalls validate?
>>> Not in a loop, but more like in a chain.
>>>
>>> My plan is something like this:
>>> Instead of having "normal" and "busy" placement we have a flag in the
>>> context if evictions are allowed or not.
>>> The call to ttm_bo_validate are then replaced with two calls, first
>>> without evictions and if that didn't worked one with evictions.
>>>
>>> Then the normal validate sequence should look like this:
>>> 1. If a BO is in the SYSTEM (or SWAP domain) we validate it to GTT first
>>> with evictions=true.
>>> 2. If a BO should be in VRAM we then validate it to VRAM. If evictions
>>> are only allowed if the GEM flags say that GTT is not desired.
>> That solves the trouble when you move a bo into vram as part of
>> validate. But I'm not seeing how this solves the "need gtt mapping to
>> move something out of vram" problem.

Eviction is not a problem because the driver gets asked where to put an 
evicted BO and then TTM does all the moving.

>>
>> Or should we instead move the entire eviction logic out from ttm into
>> drivers, building it up from helpers?

I've been playing with that thought for a while as well, but then 
decided against it.

The main problem I see is that we sometimes need to evict things from 
other drivers.

E.g. when we overcommitted system memory and move things to swap for 
example.

>> Then drivers which need gtt for
>> moving stuff out of vram can do that right away. Also, this would
>> allow us to implement very fancy eviction algorithms like all the
>> nonsense we're doing in i915 for gtt handling on gen2/3 (but I really
>> hope that never ever becomes a thing again in future gpus, so this is
>> maybe more a what-if kind of thing). Not sure how that would look
>> like, maybe a special validate function which takes a ttm_resource the
>> driver already found (through evicting stuff or whatever) and then ttm
>> just does the move and book-keeping and everything. And drivers would
>> at first only call validate without allowing any eviction. Ofc anyone
>> without special needs could use the standard eviction function that
>> validate already has.
> Spinning this a bit more, we could have different default eviction
> functions with this, e.g. so all the drivers that need gtt mapping for
> moving stuff around can share that code, but with specific&flat
> control flow instead of lots of ping-ping. And drivers that don't need
> gtt mapping (like i915, we just need dma_map_sg which we assume works
> always, or something from the ttm dma page pool, which really always
> works) can then use something simpler that's completely flat.

Ok you need to explain a bit more what exactly the problem with the GTT 
eviction is here :)

Christian.

> -Daniel
>
>>> For special BOs, like amdgpus GDS, GWS and OA domain or VMWGFX special
>>> domains that will obviously look a bit different.
>>>
>>> Christian.
>>>
>>>> Dave.
>>> _______________________________________________
>>> dri-devel mailing list
>>> dri-devel@lists.freedesktop.org
>>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.freedesktop.org%2Fmailman%2Flistinfo%2Fdri-devel&amp;data=02%7C01%7Cchristian.koenig%40amd.com%7C66b5c2bfed8541cd986208d8612b8e1d%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637366187002213152&amp;sdata=OPdXMe2rG1Uy%2BZD%2BjYKHfkLc9xVAQ1AL23QbEu3drnE%3D&amp;reserved=0
>>
>>
>> --
>> Daniel Vetter
>> Software Engineer, Intel Corporation
>> https://nam11.safelinks.protection.outlook.com/?url=http%3A%2F%2Fblog.ffwll.ch%2F&amp;data=02%7C01%7Cchristian.koenig%40amd.com%7C66b5c2bfed8541cd986208d8612b8e1d%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637366187002223136&amp;sdata=7OMV01n5pzZq%2F1C9tdrB81E0KsuXXaGk0raFAyQXiYg%3D&amp;reserved=0
>
>
Daniel Vetter Sept. 25, 2020, 1:17 p.m. UTC | #7
On Fri, Sep 25, 2020 at 11:34 AM Christian König
<christian.koenig@amd.com> wrote:
>
> Am 25.09.20 um 10:18 schrieb Daniel Vetter:
> > On Fri, Sep 25, 2020 at 10:16 AM Daniel Vetter <daniel@ffwll.ch> wrote:
> >> On Fri, Sep 25, 2020 at 9:39 AM Christian König
> >> <christian.koenig@amd.com> wrote:
> >>> Am 25.09.20 um 01:14 schrieb Dave Airlie:
> >>>> On Thu, 24 Sep 2020 at 22:42, Christian König <christian.koenig@amd.com> wrote:
> >>>>> Am 24.09.20 um 07:18 schrieb Dave Airlie:
> >>>>>> From: Dave Airlie <airlied@redhat.com>
> >>>>>>
> >>>>>> All the accel moves do the same pattern here, provide a helper
> >>>>> And exactly that pattern I want to get away from.
> >>>> Currently this is just refactoring out the helper code in each driver, but I see
> >>>> since it calls bo_mem_space we are probably moving a bit in the wrong direction.
> >>> Exactly that's why I'm noting this.
> >>>
> >>>>> See what happens if we (for example) have a VRAM -> SYSTEM move is the
> >>>>> following:
> >>>>>
> >>>>> 1. TTM allocates a new ttm_resource object in the SYSTEM domain.
> >>>>> 2. We call the driver to move from VRAM to SYSTEM.
> >>>>> 3. Driver finds that it can't do this and calls TTM  to allocate GTT.
> >>>>> 4. Since we are maybe out of GTT TTM evicts a different BO from GTT to
> >>>>> SYSTEM and call driver again.
> >>>>>
> >>>>> This is a horrible ping/pong between driver/TTM/driver/TTM/driver and we
> >>>>> should stop that immediately.
> >>>>>
> >>>>> My suggestion is that we rewrite how drivers call the ttm_bo_validate()
> >>>>> function so that we can guarantee that this never happens.
> >>>>>
> >>>>> What do you think?
> >>>> I think that is likely the next step I'd like to take after this
> >>>> refactor, it's a lot bigger, and I'm not sure how it will look yet.
> >>> Agree, yes. I have some ideas in mind for that, but not fully baked either.
> >>>
> >>>> Do we envision the driver calling validate in a loop but when it can't
> >>>> find space it tells the driver and the driver does eviction and
> >>>> recalls validate?
> >>> Not in a loop, but more like in a chain.
> >>>
> >>> My plan is something like this:
> >>> Instead of having "normal" and "busy" placement we have a flag in the
> >>> context if evictions are allowed or not.
> >>> The call to ttm_bo_validate are then replaced with two calls, first
> >>> without evictions and if that didn't worked one with evictions.
> >>>
> >>> Then the normal validate sequence should look like this:
> >>> 1. If a BO is in the SYSTEM (or SWAP domain) we validate it to GTT first
> >>> with evictions=true.
> >>> 2. If a BO should be in VRAM we then validate it to VRAM. If evictions
> >>> are only allowed if the GEM flags say that GTT is not desired.
> >> That solves the trouble when you move a bo into vram as part of
> >> validate. But I'm not seeing how this solves the "need gtt mapping to
> >> move something out of vram" problem.
>
> Eviction is not a problem because the driver gets asked where to put an
> evicted BO and then TTM does all the moving.

Hm I guess then I don't quite get where you see the ping-pong
happening, I thought that only happens for evicting stuff. But hey not
much actual working experience with ttm over here, I'm just reading
:-) I thought the issue is that ttm wants to evict from $something to
SYSTEM, and to do that the driver first needs to set a GTT mapping for
the SYSTEM ttm_resource allocation, so that it can use the
blitter/sdma engine or whatever to move the data over. But for
swap-in/validation I'm confused how you can end up with the "wrong"
placement, that feels like a driver bug.

How exactly can you get into a situation with validation where ttm
gives you SYSTEM, but not GTT and the driver has to fix that up? I'm
not really following I think, I guess there's something obvious I'm
missing.

> >> Or should we instead move the entire eviction logic out from ttm into
> >> drivers, building it up from helpers?
>
> I've been playing with that thought for a while as well, but then
> decided against it.
>
> The main problem I see is that we sometimes need to evict things from
> other drivers.
>
> E.g. when we overcommitted system memory and move things to swap for
> example.

Hm yeah ttm has that limit to avoid stepping into the shrinker,
directly calling into another driver to keep within the limit while
ignoring that there's other memory users and caches out there still
feels wrong, it's kinda a parallel world vs shrinker callbacks. And
there's nothing stopping you from doing the SYSTEM->SWAP movement from
a shrinker callback with the locking rules we've established around
dma_resv (just needs to be a trylock).

So feels a bit backwards if we design ttm eviction around this part of it ...

> >> Then drivers which need gtt for
> >> moving stuff out of vram can do that right away. Also, this would
> >> allow us to implement very fancy eviction algorithms like all the
> >> nonsense we're doing in i915 for gtt handling on gen2/3 (but I really
> >> hope that never ever becomes a thing again in future gpus, so this is
> >> maybe more a what-if kind of thing). Not sure how that would look
> >> like, maybe a special validate function which takes a ttm_resource the
> >> driver already found (through evicting stuff or whatever) and then ttm
> >> just does the move and book-keeping and everything. And drivers would
> >> at first only call validate without allowing any eviction. Ofc anyone
> >> without special needs could use the standard eviction function that
> >> validate already has.
> > Spinning this a bit more, we could have different default eviction
> > functions with this, e.g. so all the drivers that need gtt mapping for
> > moving stuff around can share that code, but with specific&flat
> > control flow instead of lots of ping-ping. And drivers that don't need
> > gtt mapping (like i915, we just need dma_map_sg which we assume works
> > always, or something from the ttm dma page pool, which really always
> > works) can then use something simpler that's completely flat.
>
> Ok you need to explain a bit more what exactly the problem with the GTT
> eviction is here :)

So the full set of limitations are
- range limits
- power-of-two alignement of start
- some other (smaller) power of two alignment for size (lol)
- "color", i.e. different caching modes needs at least one page of
empty space in-between

Stuffing all that into a generic eviction logic is imo silly. On top
of that we have the eviction collector where we scan the entire thing
until we've built up a sufficiently big hole, then evict just these
buffers. If we don't do this then pretty much any big buffer with
constraints results in the entire GTT getting evicted. Again something
that's only worth it if you have ridiculous placement constraints like
these old intel chips. gen2/3 in i915.ko is maybe a bit extreme, but
having the driver in control of the eviction code feels like a much
better design than ttm inflicting a one-size-fits-all on everyone. Ofc
with defaults and building blocks and all that.
-Daniel

> Christian.
>
> > -Daniel
> >
> >>> For special BOs, like amdgpus GDS, GWS and OA domain or VMWGFX special
> >>> domains that will obviously look a bit different.
> >>>
> >>> Christian.
> >>>
> >>>> Dave.
> >>> _______________________________________________
> >>> dri-devel mailing list
> >>> dri-devel@lists.freedesktop.org
> >>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.freedesktop.org%2Fmailman%2Flistinfo%2Fdri-devel&amp;data=02%7C01%7Cchristian.koenig%40amd.com%7C66b5c2bfed8541cd986208d8612b8e1d%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637366187002213152&amp;sdata=OPdXMe2rG1Uy%2BZD%2BjYKHfkLc9xVAQ1AL23QbEu3drnE%3D&amp;reserved=0
> >>
> >>
> >> --
> >> Daniel Vetter
> >> Software Engineer, Intel Corporation
> >> https://nam11.safelinks.protection.outlook.com/?url=http%3A%2F%2Fblog.ffwll.ch%2F&amp;data=02%7C01%7Cchristian.koenig%40amd.com%7C66b5c2bfed8541cd986208d8612b8e1d%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637366187002223136&amp;sdata=7OMV01n5pzZq%2F1C9tdrB81E0KsuXXaGk0raFAyQXiYg%3D&amp;reserved=0
> >
> >
>
Christian König Sept. 25, 2020, 1:40 p.m. UTC | #8
Am 25.09.20 um 15:17 schrieb Daniel Vetter:
> [SNIP]
>> Eviction is not a problem because the driver gets asked where to put an
>> evicted BO and then TTM does all the moving.
> Hm I guess then I don't quite get where you see the ping-pong
> happening, I thought that only happens for evicting stuff.

No, eviction is the nice case. One step after another.

E.g. from VRAM to GTT, then from GTT to SYSTEM and then eventually 
swapped out.

> But hey not much actual working experience with ttm over here, I'm just reading
> :-) I thought the issue is that ttm wants to evict from $something to
> SYSTEM, and to do that the driver first needs to set a GTT mapping for
> the SYSTEM ttm_resource allocation, so that it can use the
> blitter/sdma engine or whatever to move the data over. But for
> swap-in/validation I'm confused how you can end up with the "wrong"
> placement, that feels like a driver bug.

The problem happens in the other direction and always directly triggered 
by the driver.

> How exactly can you get into a situation with validation where ttm
> gives you SYSTEM, but not GTT and the driver has to fix that up? I'm
> not really following I think, I guess there's something obvious I'm
> missing.

For example a buffer which was evicted to SYSTEM needs to be moved back 
directly to VRAM.

Or we want to evacuate all buffers from VRAM to SYSTEM because of 
hibernation.

etc....

>>>> Or should we instead move the entire eviction logic out from ttm into
>>>> drivers, building it up from helpers?
>> I've been playing with that thought for a while as well, but then
>> decided against it.
>>
>> The main problem I see is that we sometimes need to evict things from
>> other drivers.
>>
>> E.g. when we overcommitted system memory and move things to swap for
>> example.
> Hm yeah ttm has that limit to avoid stepping into the shrinker,
> directly calling into another driver to keep within the limit while
> ignoring that there's other memory users and caches out there still
> feels wrong, it's kinda a parallel world vs shrinker callbacks. And
> there's nothing stopping you from doing the SYSTEM->SWAP movement from
> a shrinker callback with the locking rules we've established around
> dma_resv (just needs to be a trylock).
>
> So feels a bit backwards if we design ttm eviction around this part of it ...

Ok, that's a good point. Need to think about that a bit more then.

>>>> Then drivers which need gtt for
>>>> moving stuff out of vram can do that right away. Also, this would
>>>> allow us to implement very fancy eviction algorithms like all the
>>>> nonsense we're doing in i915 for gtt handling on gen2/3 (but I really
>>>> hope that never ever becomes a thing again in future gpus, so this is
>>>> maybe more a what-if kind of thing). Not sure how that would look
>>>> like, maybe a special validate function which takes a ttm_resource the
>>>> driver already found (through evicting stuff or whatever) and then ttm
>>>> just does the move and book-keeping and everything. And drivers would
>>>> at first only call validate without allowing any eviction. Ofc anyone
>>>> without special needs could use the standard eviction function that
>>>> validate already has.
>>> Spinning this a bit more, we could have different default eviction
>>> functions with this, e.g. so all the drivers that need gtt mapping for
>>> moving stuff around can share that code, but with specific&flat
>>> control flow instead of lots of ping-ping. And drivers that don't need
>>> gtt mapping (like i915, we just need dma_map_sg which we assume works
>>> always, or something from the ttm dma page pool, which really always
>>> works) can then use something simpler that's completely flat.
>> Ok you need to explain a bit more what exactly the problem with the GTT
>> eviction is here :)
> So the full set of limitations are
> - range limits
> - power-of-two alignement of start
> - some other (smaller) power of two alignment for size (lol)
> - "color", i.e. different caching modes needs at least one page of
> empty space in-between
>
> Stuffing all that into a generic eviction logic is imo silly. On top
> of that we have the eviction collector where we scan the entire thing
> until we've built up a sufficiently big hole, then evict just these
> buffers. If we don't do this then pretty much any big buffer with
> constraints results in the entire GTT getting evicted. Again something
> that's only worth it if you have ridiculous placement constraints like
> these old intel chips. gen2/3 in i915.ko is maybe a bit extreme, but
> having the driver in control of the eviction code feels like a much
> better design than ttm inflicting a one-size-fits-all on everyone. Ofc
> with defaults and building blocks and all that.

Yeah, that makes a lot of sense.

Christian.

> -Daniel
>
Daniel Vetter Sept. 25, 2020, 1:56 p.m. UTC | #9
On Fri, Sep 25, 2020 at 3:40 PM Christian König
<christian.koenig@amd.com> wrote:
>
> Am 25.09.20 um 15:17 schrieb Daniel Vetter:
> > [SNIP]
> >> Eviction is not a problem because the driver gets asked where to put an
> >> evicted BO and then TTM does all the moving.
> > Hm I guess then I don't quite get where you see the ping-pong
> > happening, I thought that only happens for evicting stuff.
>
> No, eviction is the nice case. One step after another.
>
> E.g. from VRAM to GTT, then from GTT to SYSTEM and then eventually
> swapped out.
>
> > But hey not much actual working experience with ttm over here, I'm just reading
> > :-) I thought the issue is that ttm wants to evict from $something to
> > SYSTEM, and to do that the driver first needs to set a GTT mapping for
> > the SYSTEM ttm_resource allocation, so that it can use the
> > blitter/sdma engine or whatever to move the data over. But for
> > swap-in/validation I'm confused how you can end up with the "wrong"
> > placement, that feels like a driver bug.
>
> The problem happens in the other direction and always directly triggered
> by the driver.
>
> > How exactly can you get into a situation with validation where ttm
> > gives you SYSTEM, but not GTT and the driver has to fix that up? I'm
> > not really following I think, I guess there's something obvious I'm
> > missing.
>
> For example a buffer which was evicted to SYSTEM needs to be moved back
> directly to VRAM.
>
> Or we want to evacuate all buffers from VRAM to SYSTEM because of
> hibernation.
>
> etc....

Ok, I think I get it. Eviction always goes one step only in the chain
(but maybe multiple times as part of an overall eu validate for all
the buffers). But swap in can go the entire length in one go. So yeah
demidlayering eviction isn't quite the right thing here, since it's
not the problem.
-Daniel

>
> >>>> Or should we instead move the entire eviction logic out from ttm into
> >>>> drivers, building it up from helpers?
> >> I've been playing with that thought for a while as well, but then
> >> decided against it.
> >>
> >> The main problem I see is that we sometimes need to evict things from
> >> other drivers.
> >>
> >> E.g. when we overcommitted system memory and move things to swap for
> >> example.
> > Hm yeah ttm has that limit to avoid stepping into the shrinker,
> > directly calling into another driver to keep within the limit while
> > ignoring that there's other memory users and caches out there still
> > feels wrong, it's kinda a parallel world vs shrinker callbacks. And
> > there's nothing stopping you from doing the SYSTEM->SWAP movement from
> > a shrinker callback with the locking rules we've established around
> > dma_resv (just needs to be a trylock).
> >
> > So feels a bit backwards if we design ttm eviction around this part of it ...
>
> Ok, that's a good point. Need to think about that a bit more then.
>
> >>>> Then drivers which need gtt for
> >>>> moving stuff out of vram can do that right away. Also, this would
> >>>> allow us to implement very fancy eviction algorithms like all the
> >>>> nonsense we're doing in i915 for gtt handling on gen2/3 (but I really
> >>>> hope that never ever becomes a thing again in future gpus, so this is
> >>>> maybe more a what-if kind of thing). Not sure how that would look
> >>>> like, maybe a special validate function which takes a ttm_resource the
> >>>> driver already found (through evicting stuff or whatever) and then ttm
> >>>> just does the move and book-keeping and everything. And drivers would
> >>>> at first only call validate without allowing any eviction. Ofc anyone
> >>>> without special needs could use the standard eviction function that
> >>>> validate already has.
> >>> Spinning this a bit more, we could have different default eviction
> >>> functions with this, e.g. so all the drivers that need gtt mapping for
> >>> moving stuff around can share that code, but with specific&flat
> >>> control flow instead of lots of ping-ping. And drivers that don't need
> >>> gtt mapping (like i915, we just need dma_map_sg which we assume works
> >>> always, or something from the ttm dma page pool, which really always
> >>> works) can then use something simpler that's completely flat.
> >> Ok you need to explain a bit more what exactly the problem with the GTT
> >> eviction is here :)
> > So the full set of limitations are
> > - range limits
> > - power-of-two alignement of start
> > - some other (smaller) power of two alignment for size (lol)
> > - "color", i.e. different caching modes needs at least one page of
> > empty space in-between
> >
> > Stuffing all that into a generic eviction logic is imo silly. On top
> > of that we have the eviction collector where we scan the entire thing
> > until we've built up a sufficiently big hole, then evict just these
> > buffers. If we don't do this then pretty much any big buffer with
> > constraints results in the entire GTT getting evicted. Again something
> > that's only worth it if you have ridiculous placement constraints like
> > these old intel chips. gen2/3 in i915.ko is maybe a bit extreme, but
> > having the driver in control of the eviction code feels like a much
> > better design than ttm inflicting a one-size-fits-all on everyone. Ofc
> > with defaults and building blocks and all that.
>
> Yeah, that makes a lot of sense.
>
> Christian.
>
> > -Daniel
> >
>
diff mbox series

Patch

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index eb76002aa53d..358d1580dc16 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -1541,3 +1541,31 @@  void ttm_bo_tt_destroy(struct ttm_buffer_object *bo)
 	ttm_tt_destroy(bo->bdev, bo->ttm);
 	bo->ttm = NULL;
 }
+
+int ttm_bo_create_tt_tmp(struct ttm_buffer_object *bo,
+			 struct ttm_operation_ctx *ctx,
+			 struct ttm_resource *new_mem,
+			 struct ttm_resource *new_temp)
+{
+	struct ttm_place placement_memtype = {
+		.fpfn = 0,
+		.lpfn = 0,
+		.mem_type = TTM_PL_TT,
+		.flags = TTM_PL_MASK_CACHING
+	};
+	struct ttm_placement placement;
+	int ret;
+
+	placement.num_placement = placement.num_busy_placement = 1;
+	placement.placement = placement.busy_placement = &placement_memtype;
+
+	*new_temp = *new_mem;
+	new_temp->mm_node = NULL;
+
+	ret = ttm_bo_mem_space(bo, &placement, new_temp, ctx);
+	if (ret)
+		return ret;
+
+	return 0;
+}
+EXPORT_SYMBOL(ttm_bo_create_tt_tmp);
diff --git a/include/drm/ttm/ttm_bo_driver.h b/include/drm/ttm/ttm_bo_driver.h
index 4a63fec24e90..a7507dfaa89d 100644
--- a/include/drm/ttm/ttm_bo_driver.h
+++ b/include/drm/ttm/ttm_bo_driver.h
@@ -558,6 +558,11 @@  void ttm_mem_io_free(struct ttm_bo_device *bdev,
 int ttm_bo_move_to_new_tt_mem(struct ttm_buffer_object *bo,
 			      struct ttm_operation_ctx *ctx,
 			      struct ttm_resource *new_mem);
+
+int ttm_bo_create_tt_tmp(struct ttm_buffer_object *bo,
+			 struct ttm_operation_ctx *ctx,
+			 struct ttm_resource *new_mem,
+			 struct ttm_resource *new_temp);
 /**
  * ttm_bo_move_memcpy
  *