mbox series

[RFC,v2,0/2] Discussion around eviction improvements

Message ID 20240516121822.19036-1-tursulin@igalia.com (mailing list archive)
Headers show
Series Discussion around eviction improvements | expand

Message

Tvrtko Ursulin May 16, 2024, 12:18 p.m. UTC
From: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>

Reduced re-spin of my previous series after Christian corrected a few
misconceptions that I had. So lets see if what remains makes sense or is still
misguided.

To summarise, the series address the following two issues:

 * Migration rate limiting does not work, at least not for the common case
   where userspace configures VRAM+GTT. It thinks it can stop migration attempts
   by playing with bo->allowed_domains vs bo->preferred domains but, both from
   the code, and from empirical experiments, I see that not working at all. When
   both masks are identical fiddling with them achieves nothing. Even when they
   are not identical allowed has a fallback GTT placement which means that when
   over the migration budget ttm_bo_validate with bo->allowed_domains can cause
   migration from GTT to VRAM.

 * Driver thinks it will be re-validating evicted buffers on the next submission
   but it does not for the very common case of VRAM+GTT because it only checks
   if current placement is *none* of the preferred placements.

These two patches appear to have a positive result for a memory intensive game
like Assassin's Creed Valhalla. On an APU like Steam Deck the game has a working
set around 5 GiB, while the VRAM is configured to 1 GiB. Correctly respecting
the migration budget appears to keep buffer blits at bay and improves the
minimum frame rate, ie. makes things smoother.

From the game's built-in benchmark, average of three runs each:

						FPS
		migrated KiB	min	avg	max	min-1%	min-0.1%
  because	   20784781	10.00  37.00   89.67    22.00    12.33
  patched	    4227688	13.67  37.00   81.33    23.33    15.00

Disclaimers that I have is that more runs would be needed to be more confident
about the results. And more games. And APU versus discrete.

Cc: Christian König <christian.koenig@amd.com>
Cc: Friedrich Vock <friedrich.vock@gmx.de>

Tvrtko Ursulin (2):
  drm/amdgpu: Re-validate evicted buffers
  drm/amdgpu: Actually respect buffer migration budget

 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 112 +++++++++++++++++++------
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c |  21 ++++-
 2 files changed, 103 insertions(+), 30 deletions(-)

Comments

Alex Deucher May 16, 2024, 7:21 p.m. UTC | #1
On Thu, May 16, 2024 at 8:18 AM Tvrtko Ursulin <tursulin@igalia.com> wrote:
>
> From: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
>
> Reduced re-spin of my previous series after Christian corrected a few
> misconceptions that I had. So lets see if what remains makes sense or is still
> misguided.
>
> To summarise, the series address the following two issues:
>
>  * Migration rate limiting does not work, at least not for the common case
>    where userspace configures VRAM+GTT. It thinks it can stop migration attempts
>    by playing with bo->allowed_domains vs bo->preferred domains but, both from
>    the code, and from empirical experiments, I see that not working at all. When
>    both masks are identical fiddling with them achieves nothing. Even when they
>    are not identical allowed has a fallback GTT placement which means that when
>    over the migration budget ttm_bo_validate with bo->allowed_domains can cause
>    migration from GTT to VRAM.
>
>  * Driver thinks it will be re-validating evicted buffers on the next submission
>    but it does not for the very common case of VRAM+GTT because it only checks
>    if current placement is *none* of the preferred placements.

For APUs at least, we should never migrate because GTT and VRAM are
both system memory so are effectively equal performance-wise.  Maybe
this regressed when Christian reworked ttm to better handle migrating
buffers back to VRAM after suspend on dGPUs?

Alex

>
> These two patches appear to have a positive result for a memory intensive game
> like Assassin's Creed Valhalla. On an APU like Steam Deck the game has a working
> set around 5 GiB, while the VRAM is configured to 1 GiB. Correctly respecting
> the migration budget appears to keep buffer blits at bay and improves the
> minimum frame rate, ie. makes things smoother.
>
> From the game's built-in benchmark, average of three runs each:
>
>                                                 FPS
>                 migrated KiB    min     avg     max     min-1%  min-0.1%
>   because          20784781     10.00  37.00   89.67    22.00    12.33
>   patched           4227688     13.67  37.00   81.33    23.33    15.00
>
> Disclaimers that I have is that more runs would be needed to be more confident
> about the results. And more games. And APU versus discrete.
>
> Cc: Christian König <christian.koenig@amd.com>
> Cc: Friedrich Vock <friedrich.vock@gmx.de>
>
> Tvrtko Ursulin (2):
>   drm/amdgpu: Re-validate evicted buffers
>   drm/amdgpu: Actually respect buffer migration budget
>
>  drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 112 +++++++++++++++++++------
>  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c |  21 ++++-
>  2 files changed, 103 insertions(+), 30 deletions(-)
>
> --
> 2.44.0
>
Tvrtko Ursulin May 17, 2024, 7:41 a.m. UTC | #2
On 16/05/2024 20:21, Alex Deucher wrote:
> On Thu, May 16, 2024 at 8:18 AM Tvrtko Ursulin <tursulin@igalia.com> wrote:
>>
>> From: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
>>
>> Reduced re-spin of my previous series after Christian corrected a few
>> misconceptions that I had. So lets see if what remains makes sense or is still
>> misguided.
>>
>> To summarise, the series address the following two issues:
>>
>>   * Migration rate limiting does not work, at least not for the common case
>>     where userspace configures VRAM+GTT. It thinks it can stop migration attempts
>>     by playing with bo->allowed_domains vs bo->preferred domains but, both from
>>     the code, and from empirical experiments, I see that not working at all. When
>>     both masks are identical fiddling with them achieves nothing. Even when they
>>     are not identical allowed has a fallback GTT placement which means that when
>>     over the migration budget ttm_bo_validate with bo->allowed_domains can cause
>>     migration from GTT to VRAM.
>>
>>   * Driver thinks it will be re-validating evicted buffers on the next submission
>>     but it does not for the very common case of VRAM+GTT because it only checks
>>     if current placement is *none* of the preferred placements.
> 
> For APUs at least, we should never migrate because GTT and VRAM are
> both system memory so are effectively equal performance-wise.  Maybe

I was curious about this but thought there could be a reason why VRAM 
carve-out is a fix small-ish size. It cannot be made 1:1 with RAM or 
some other solution?

> this regressed when Christian reworked ttm to better handle migrating
> buffers back to VRAM after suspend on dGPUs?

I will leave this to Christian to answer but for what this series is 
concerned I'd say it is orthogonal to that.

Here we have two fixes not limited to APU use cases, just so it happens 
fixing the migration throttling improves things there too. And that even 
despite the first patch which triggering *more* migration attempts. 
Because the second patch then correctly curbs them.

First patch should help with transient overcommit on discrete, allowing 
things get back into VRAM as soon as there is space.

Second patch tries to makes migration throttling work as intended.

Volunteers for testing on discrete? :)

>>
>> These two patches appear to have a positive result for a memory intensive game
>> like Assassin's Creed Valhalla. On an APU like Steam Deck the game has a working
>> set around 5 GiB, while the VRAM is configured to 1 GiB. Correctly respecting
>> the migration budget appears to keep buffer blits at bay and improves the
>> minimum frame rate, ie. makes things smoother.
>>
>>  From the game's built-in benchmark, average of three runs each:
>>
>>                                                  FPS
>>                  migrated KiB    min     avg     max     min-1%  min-0.1%
>>    because          20784781     10.00  37.00   89.67    22.00    12.33
>>    patched           4227688     13.67  37.00   81.33    23.33    15.00

Hmm! s/because/before/ here obviously!

Regards,

Tvrtko

>> Disclaimers that I have is that more runs would be needed to be more confident
>> about the results. And more games. And APU versus discrete.
>>
>> Cc: Christian König <christian.koenig@amd.com>
>> Cc: Friedrich Vock <friedrich.vock@gmx.de>
>>
>> Tvrtko Ursulin (2):
>>    drm/amdgpu: Re-validate evicted buffers
>>    drm/amdgpu: Actually respect buffer migration budget
>>
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 112 +++++++++++++++++++------
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c |  21 ++++-
>>   2 files changed, 103 insertions(+), 30 deletions(-)
>>
>> --
>> 2.44.0
>>
Alex Deucher May 17, 2024, 1:43 p.m. UTC | #3
On Fri, May 17, 2024 at 3:41 AM Tvrtko Ursulin
<tvrtko.ursulin@igalia.com> wrote:
>
>
> On 16/05/2024 20:21, Alex Deucher wrote:
> > On Thu, May 16, 2024 at 8:18 AM Tvrtko Ursulin <tursulin@igalia.com> wrote:
> >>
> >> From: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
> >>
> >> Reduced re-spin of my previous series after Christian corrected a few
> >> misconceptions that I had. So lets see if what remains makes sense or is still
> >> misguided.
> >>
> >> To summarise, the series address the following two issues:
> >>
> >>   * Migration rate limiting does not work, at least not for the common case
> >>     where userspace configures VRAM+GTT. It thinks it can stop migration attempts
> >>     by playing with bo->allowed_domains vs bo->preferred domains but, both from
> >>     the code, and from empirical experiments, I see that not working at all. When
> >>     both masks are identical fiddling with them achieves nothing. Even when they
> >>     are not identical allowed has a fallback GTT placement which means that when
> >>     over the migration budget ttm_bo_validate with bo->allowed_domains can cause
> >>     migration from GTT to VRAM.
> >>
> >>   * Driver thinks it will be re-validating evicted buffers on the next submission
> >>     but it does not for the very common case of VRAM+GTT because it only checks
> >>     if current placement is *none* of the preferred placements.
> >
> > For APUs at least, we should never migrate because GTT and VRAM are
> > both system memory so are effectively equal performance-wise.  Maybe
>
> I was curious about this but thought there could be a reason why VRAM
> carve-out is a fix small-ish size. It cannot be made 1:1 with RAM or
> some other solution?

It's really only needed for displays at boot up prior to the OS
loading or for displays on older APU that did not support
scatter/gather display.  The problem with increasing the carveout size
is that that prevents the memory from being available to the rest of
the system.

Alex

>
> > this regressed when Christian reworked ttm to better handle migrating
> > buffers back to VRAM after suspend on dGPUs?
>
> I will leave this to Christian to answer but for what this series is
> concerned I'd say it is orthogonal to that.
>
> Here we have two fixes not limited to APU use cases, just so it happens
> fixing the migration throttling improves things there too. And that even
> despite the first patch which triggering *more* migration attempts.
> Because the second patch then correctly curbs them.
>
> First patch should help with transient overcommit on discrete, allowing
> things get back into VRAM as soon as there is space.
>
> Second patch tries to makes migration throttling work as intended.
>
> Volunteers for testing on discrete? :)
>
> >>
> >> These two patches appear to have a positive result for a memory intensive game
> >> like Assassin's Creed Valhalla. On an APU like Steam Deck the game has a working
> >> set around 5 GiB, while the VRAM is configured to 1 GiB. Correctly respecting
> >> the migration budget appears to keep buffer blits at bay and improves the
> >> minimum frame rate, ie. makes things smoother.
> >>
> >>  From the game's built-in benchmark, average of three runs each:
> >>
> >>                                                  FPS
> >>                  migrated KiB    min     avg     max     min-1%  min-0.1%
> >>    because          20784781     10.00  37.00   89.67    22.00    12.33
> >>    patched           4227688     13.67  37.00   81.33    23.33    15.00
>
> Hmm! s/because/before/ here obviously!
>
> Regards,
>
> Tvrtko
>
> >> Disclaimers that I have is that more runs would be needed to be more confident
> >> about the results. And more games. And APU versus discrete.
> >>
> >> Cc: Christian König <christian.koenig@amd.com>
> >> Cc: Friedrich Vock <friedrich.vock@gmx.de>
> >>
> >> Tvrtko Ursulin (2):
> >>    drm/amdgpu: Re-validate evicted buffers
> >>    drm/amdgpu: Actually respect buffer migration budget
> >>
> >>   drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 112 +++++++++++++++++++------
> >>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c |  21 ++++-
> >>   2 files changed, 103 insertions(+), 30 deletions(-)
> >>
> >> --
> >> 2.44.0
> >>
Tvrtko Ursulin May 30, 2024, 10:17 a.m. UTC | #4
Hi,

On 16/05/2024 13:18, Tvrtko Ursulin wrote:
> From: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
> 
> Reduced re-spin of my previous series after Christian corrected a few
> misconceptions that I had. So lets see if what remains makes sense or is still
> misguided.
> 
> To summarise, the series address the following two issues:
> 
>   * Migration rate limiting does not work, at least not for the common case
>     where userspace configures VRAM+GTT. It thinks it can stop migration attempts
>     by playing with bo->allowed_domains vs bo->preferred domains but, both from
>     the code, and from empirical experiments, I see that not working at all. When
>     both masks are identical fiddling with them achieves nothing. Even when they
>     are not identical allowed has a fallback GTT placement which means that when
>     over the migration budget ttm_bo_validate with bo->allowed_domains can cause
>     migration from GTT to VRAM.
> 
>   * Driver thinks it will be re-validating evicted buffers on the next submission
>     but it does not for the very common case of VRAM+GTT because it only checks
>     if current placement is *none* of the preferred placements.
> 
> These two patches appear to have a positive result for a memory intensive game
> like Assassin's Creed Valhalla. On an APU like Steam Deck the game has a working
> set around 5 GiB, while the VRAM is configured to 1 GiB. Correctly respecting
> the migration budget appears to keep buffer blits at bay and improves the
> minimum frame rate, ie. makes things smoother.
> 
>>From the game's built-in benchmark, average of three runs each:
> 
> 						FPS
> 		migrated KiB	min	avg	max	min-1%	min-0.1%
>    because	   20784781	10.00  37.00   89.67    22.00    12.33
>    patched	    4227688	13.67  37.00   81.33    23.33    15.00

Any feedback on this series?

As described above, neither migration rate limiting or re-validation of 
evicted buffers seems to work as expected currently.

Regards,

Tvrtko

> Disclaimers that I have is that more runs would be needed to be more confident
> about the results. And more games. And APU versus discrete.
> 
> Cc: Christian König <christian.koenig@amd.com>
> Cc: Friedrich Vock <friedrich.vock@gmx.de>
> 
> Tvrtko Ursulin (2):
>    drm/amdgpu: Re-validate evicted buffers
>    drm/amdgpu: Actually respect buffer migration budget
> 
>   drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 112 +++++++++++++++++++------
>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c |  21 ++++-
>   2 files changed, 103 insertions(+), 30 deletions(-)
>
Christian König May 31, 2024, 9:12 a.m. UTC | #5
Am 16.05.24 um 21:21 schrieb Alex Deucher:
> On Thu, May 16, 2024 at 8:18 AM Tvrtko Ursulin <tursulin@igalia.com> wrote:
>> From: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
>>
>> Reduced re-spin of my previous series after Christian corrected a few
>> misconceptions that I had. So lets see if what remains makes sense or is still
>> misguided.
>>
>> To summarise, the series address the following two issues:
>>
>>   * Migration rate limiting does not work, at least not for the common case
>>     where userspace configures VRAM+GTT. It thinks it can stop migration attempts
>>     by playing with bo->allowed_domains vs bo->preferred domains but, both from
>>     the code, and from empirical experiments, I see that not working at all. When
>>     both masks are identical fiddling with them achieves nothing. Even when they
>>     are not identical allowed has a fallback GTT placement which means that when
>>     over the migration budget ttm_bo_validate with bo->allowed_domains can cause
>>     migration from GTT to VRAM.
>>
>>   * Driver thinks it will be re-validating evicted buffers on the next submission
>>     but it does not for the very common case of VRAM+GTT because it only checks
>>     if current placement is *none* of the preferred placements.
> For APUs at least, we should never migrate because GTT and VRAM are
> both system memory so are effectively equal performance-wise.  Maybe
> this regressed when Christian reworked ttm to better handle migrating
> buffers back to VRAM after suspend on dGPUs?

Yeah, that's quite likely. I'm already looking into this.

Regards,
Christian.

>
> Alex
>
>> These two patches appear to have a positive result for a memory intensive game
>> like Assassin's Creed Valhalla. On an APU like Steam Deck the game has a working
>> set around 5 GiB, while the VRAM is configured to 1 GiB. Correctly respecting
>> the migration budget appears to keep buffer blits at bay and improves the
>> minimum frame rate, ie. makes things smoother.
>>
>>  From the game's built-in benchmark, average of three runs each:
>>
>>                                                  FPS
>>                  migrated KiB    min     avg     max     min-1%  min-0.1%
>>    because          20784781     10.00  37.00   89.67    22.00    12.33
>>    patched           4227688     13.67  37.00   81.33    23.33    15.00
>>
>> Disclaimers that I have is that more runs would be needed to be more confident
>> about the results. And more games. And APU versus discrete.
>>
>> Cc: Christian König <christian.koenig@amd.com>
>> Cc: Friedrich Vock <friedrich.vock@gmx.de>
>>
>> Tvrtko Ursulin (2):
>>    drm/amdgpu: Re-validate evicted buffers
>>    drm/amdgpu: Actually respect buffer migration budget
>>
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 112 +++++++++++++++++++------
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c |  21 ++++-
>>   2 files changed, 103 insertions(+), 30 deletions(-)
>>
>> --
>> 2.44.0
>>