[3/3] drm/i915/gtt: ignore min_page_size for paging structures

Message ID 20210623112637.266855-3-matthew.auld@intel.com (mailing list archive)
State New, archived
Series [1/3] drm/i915/ttm: consider all placements for the page alignment

Commit Message

Matthew Auld June 23, 2021, 11:26 a.m. UTC
The min_page_size is only needed for pages inserted into the GTT, and
for our paging structures we only need at most 4K bytes, so simply
ignore the min_page_size restrictions here, otherwise we might see some
severe overallocation on some devices.

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/i915/gt/intel_gtt.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
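
The overallocation described in the commit message comes down to rounding arithmetic. The sketch below is illustrative only and is not driver code: `round_up_pow2()` and `overallocation()` are hypothetical helpers, and the 64K min_page_size is an assumed value for "some devices", not one stated in the patch.

```c
#include <assert.h>
#include <stdint.h>

#define SZ_4K  (4ull * 1024)
#define SZ_64K (64ull * 1024)

/* Round size up to the next multiple of a power-of-two page size. */
static uint64_t round_up_pow2(uint64_t size, uint64_t page_size)
{
	/* The mask trick below only works for non-zero powers of two. */
	assert(page_size && (page_size & (page_size - 1)) == 0);
	return (size + page_size - 1) & ~(page_size - 1);
}

/* Bytes allocated beyond what the caller asked for. */
static uint64_t overallocation(uint64_t size, uint64_t page_size)
{
	return round_up_pow2(size, page_size) - size;
}
```

Under these assumptions, a 4K paging structure allocated with a 64K page size occupies 64K, so `overallocation(SZ_4K, SZ_64K)` is 61440 bytes, while passing the structure's own size as the page size wastes nothing.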

Comments

Thomas Hellström June 23, 2021, 11:51 a.m. UTC | #1
On 6/23/21 1:26 PM, Matthew Auld wrote:
> The min_page_size is only needed for pages inserted into the GTT, and
> for our paging structures we only need at most 4K bytes, so simply
> ignore the min_page_size restrictions here, otherwise we might see some
> severe overallocation on some devices.
>
> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> ---
>   drivers/gpu/drm/i915/gt/intel_gtt.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c b/drivers/gpu/drm/i915/gt/intel_gtt.c
> index 084ea65d59c0..61e8a8c25374 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gtt.c
> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
> @@ -16,7 +16,7 @@ struct drm_i915_gem_object *alloc_pt_lmem(struct i915_address_space *vm, int sz)
>   {
>   	struct drm_i915_gem_object *obj;
>   
> -	obj = i915_gem_object_create_lmem(vm->i915, sz, 0);
> +	obj = __i915_gem_object_create_lmem_with_ps(vm->i915, sz, sz, 0);
>   	/*
>   	 * Ensure all paging structures for this vm share the same dma-resv
>   	 * object underneath, with the idea that one object_lock() will lock

I think for this one the new gt migration code might break, because 
there we insert even PT pages into the GTT, so it might need a special 
interface? Ram is looking at supporting larger GPU PTE sizes with that code..

/Thomas
Matthew Auld June 23, 2021, 12:25 p.m. UTC | #2
On 23/06/2021 12:51, Thomas Hellström wrote:
> 
> On 6/23/21 1:26 PM, Matthew Auld wrote:
>> The min_page_size is only needed for pages inserted into the GTT, and
>> for our paging structures we only need at most 4K bytes, so simply
>> ignore the min_page_size restrictions here, otherwise we might see some
>> severe overallocation on some devices.
>>
>> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
>> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
>> ---
>>   drivers/gpu/drm/i915/gt/intel_gtt.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c 
>> b/drivers/gpu/drm/i915/gt/intel_gtt.c
>> index 084ea65d59c0..61e8a8c25374 100644
>> --- a/drivers/gpu/drm/i915/gt/intel_gtt.c
>> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
>> @@ -16,7 +16,7 @@ struct drm_i915_gem_object *alloc_pt_lmem(struct 
>> i915_address_space *vm, int sz)
>>   {
>>       struct drm_i915_gem_object *obj;
>> -    obj = i915_gem_object_create_lmem(vm->i915, sz, 0);
>> +    obj = __i915_gem_object_create_lmem_with_ps(vm->i915, sz, sz, 0);
>>       /*
>>        * Ensure all paging structures for this vm share the same dma-resv
>>        * object underneath, with the idea that one object_lock() will 
>> lock
> 
> I think for this one the new gt migration code might break, because 
> there we insert even PT pages into the GTT, so it might need a special 
> interface? Ram is looking at supporting larger GPU PTE sizes with that 
> code..

For DG1 at least we don't need this. But yeah we can always just pass 
along the page size when allocating the stash I guess, if we need 
something special for migration?

But when we need to support huge PTEs for stuff other than DG1, then 
it's still a pile of work I assume, since we still need all the special 
PTE insertion routines specifically for insert_pte() which will differ 
wildly between generations, also each has quite different restrictions 
wrt min physical alignment of lmem, whether you can mix 64K/4K PTEs in 
the same 2M va range, whether 4K PTEs are even supported for lmem etc.

Not sure if it's simpler to go with mapping all of lmem upfront with the 
flat-ppGTT? Maybe that sidesteps some of these issues? At least for the 
physical alignment of paging structures that would no longer be a concern.

> 
> /Thomas
> 
> 
>
Thomas Hellström June 23, 2021, 12:44 p.m. UTC | #3
On Wed, 2021-06-23 at 13:25 +0100, Matthew Auld wrote:
> On 23/06/2021 12:51, Thomas Hellström wrote:
> > 
> > On 6/23/21 1:26 PM, Matthew Auld wrote:
> > > The min_page_size is only needed for pages inserted into the GTT,
> > > and
> > > for our paging structures we only need at most 4K bytes, so
> > > simply
> > > ignore the min_page_size restrictions here, otherwise we might
> > > see some
> > > severe overallocation on some devices.
> > > 
> > > Signed-off-by: Matthew Auld <matthew.auld@intel.com>
> > > Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> > > ---
> > >   drivers/gpu/drm/i915/gt/intel_gtt.c | 2 +-
> > >   1 file changed, 1 insertion(+), 1 deletion(-)
> > > 
> > > diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c 
> > > b/drivers/gpu/drm/i915/gt/intel_gtt.c
> > > index 084ea65d59c0..61e8a8c25374 100644
> > > --- a/drivers/gpu/drm/i915/gt/intel_gtt.c
> > > +++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
> > > @@ -16,7 +16,7 @@ struct drm_i915_gem_object
> > > *alloc_pt_lmem(struct 
> > > i915_address_space *vm, int sz)
> > >   {
> > >       struct drm_i915_gem_object *obj;
> > > -    obj = i915_gem_object_create_lmem(vm->i915, sz, 0);
> > > +    obj = __i915_gem_object_create_lmem_with_ps(vm->i915, sz,
> > > sz, 0);
> > >       /*
> > >        * Ensure all paging structures for this vm share the same
> > > dma-resv
> > >        * object underneath, with the idea that one object_lock()
> > > will 
> > > lock
> > 
> > I think for this one the new gt migration code might break, because
> > there we insert even PT pages into the GTT, so it might need a
> > special 
> > interface? Ram is looking at supporting larger GPU PTE sizes with
> > that 
> > code..
> 
> For DG1 at least we don't need this. But yeah we can always just pass
> along the page size when allocating the stash I guess, if we need 
> something special for migration?
> 
> But when we need to support huge PTEs for stuff other than DG1, then 
> it's still a pile of work I assume, since we still need all the
> special 
> PTE insertion routines specifically for insert_pte() which will
> differ 
> wildly between generations, also each has quite different
> restrictions 
> wrt min physical alignment of lmem, whether you can mix 64K/4K PTEs
> in 
> the same 2M va range, whether 4K PTEs are even supported for lmem
> etc.
> 
> Not sure if it's simpler to go with mapping all of lmem upfront with
> the 
> flat-ppGTT? Maybe that sidesteps some of these issues? At least for
> the 
> physical alignment of paging structures that would no longer be a
> concern.

Yes, that might be the simplest way forward.

/Thomas




> 
> > 
> > /Thomas
> > 
> > 
> >
Thomas Hellström June 23, 2021, 1:32 p.m. UTC | #4
On 6/23/21 1:26 PM, Matthew Auld wrote:
> The min_page_size is only needed for pages inserted into the GTT, and
> for our paging structures we only need at most 4K bytes, so simply
> ignore the min_page_size restrictions here, otherwise we might see some
> severe overallocation on some devices.
>
> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> ---
>   drivers/gpu/drm/i915/gt/intel_gtt.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c b/drivers/gpu/drm/i915/gt/intel_gtt.c
> index 084ea65d59c0..61e8a8c25374 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gtt.c
> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
> @@ -16,7 +16,7 @@ struct drm_i915_gem_object *alloc_pt_lmem(struct i915_address_space *vm, int sz)
>   {
>   	struct drm_i915_gem_object *obj;
>   
> -	obj = i915_gem_object_create_lmem(vm->i915, sz, 0);
> +	obj = __i915_gem_object_create_lmem_with_ps(vm->i915, sz, sz, 0);

So is this memory always required to be size aligned? or should it say 
sz, PAGE_SIZE?

/Thomas
Matthew Auld June 23, 2021, 1:38 p.m. UTC | #5
On 23/06/2021 14:32, Thomas Hellström wrote:
> 
> On 6/23/21 1:26 PM, Matthew Auld wrote:
>> The min_page_size is only needed for pages inserted into the GTT, and
>> for our paging structures we only need at most 4K bytes, so simply
>> ignore the min_page_size restrictions here, otherwise we might see some
>> severe overallocation on some devices.
>>
>> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
>> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
>> ---
>>   drivers/gpu/drm/i915/gt/intel_gtt.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c 
>> b/drivers/gpu/drm/i915/gt/intel_gtt.c
>> index 084ea65d59c0..61e8a8c25374 100644
>> --- a/drivers/gpu/drm/i915/gt/intel_gtt.c
>> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
>> @@ -16,7 +16,7 @@ struct drm_i915_gem_object *alloc_pt_lmem(struct 
>> i915_address_space *vm, int sz)
>>   {
>>       struct drm_i915_gem_object *obj;
>> -    obj = i915_gem_object_create_lmem(vm->i915, sz, 0);
>> +    obj = __i915_gem_object_create_lmem_with_ps(vm->i915, sz, sz, 0);
> 
> So is this memory always required to be size aligned? or should it say 
> sz, PAGE_SIZE?

The scratch page also hits this path, which is another can of worms. In 
terms of size it might need to be 64K (with proper physical alignment), 
which is why we can't force 4K here and instead need to use the 
passed-in size, so that the returned page has the same alignment.
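
To make the alignment point concrete, here is a toy sketch, not the i915 allocator: `toy_alloc()` is a hypothetical bump allocator that places each allocation at the next multiple of the page size it is given, and `is_aligned_pow2()` checks the result. It only illustrates why a 64K scratch page allocated with a forced 4K page size may end up without 64K physical alignment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if offset is a multiple of the power-of-two alignment. */
static bool is_aligned_pow2(uint64_t offset, uint64_t align)
{
	assert(align && (align & (align - 1)) == 0);
	return (offset & (align - 1)) == 0;
}

/*
 * Hypothetical bump allocator: the start offset and the consumed
 * length are both rounded up to page_size (a power of two).
 */
static uint64_t toy_alloc(uint64_t *cursor, uint64_t size, uint64_t page_size)
{
	uint64_t mask = page_size - 1;
	uint64_t start;

	assert(page_size && (page_size & mask) == 0);
	start = (*cursor + mask) & ~mask;
	*cursor = start + ((size + mask) & ~mask);
	return start;
}
```

In this model, after one 4K allocation the cursor sits at offset 4096. A following 64K request with a 4K page size starts at 4096, which is not 64K aligned, whereas the same request with the page size equal to its size starts at 65536.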

> 
> /Thomas
> 
>
Thomas Hellström June 23, 2021, 1:39 p.m. UTC | #6
On 6/23/21 3:38 PM, Matthew Auld wrote:
> On 23/06/2021 14:32, Thomas Hellström wrote:
>>
>> On 6/23/21 1:26 PM, Matthew Auld wrote:
>>> The min_page_size is only needed for pages inserted into the GTT, and
>>> for our paging structures we only need at most 4K bytes, so simply
>>> ignore the min_page_size restrictions here, otherwise we might see some
>>> severe overallocation on some devices.
>>>
>>> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
>>> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
>>> ---
>>>   drivers/gpu/drm/i915/gt/intel_gtt.c | 2 +-
>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c 
>>> b/drivers/gpu/drm/i915/gt/intel_gtt.c
>>> index 084ea65d59c0..61e8a8c25374 100644
>>> --- a/drivers/gpu/drm/i915/gt/intel_gtt.c
>>> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
>>> @@ -16,7 +16,7 @@ struct drm_i915_gem_object *alloc_pt_lmem(struct 
>>> i915_address_space *vm, int sz)
>>>   {
>>>       struct drm_i915_gem_object *obj;
>>> -    obj = i915_gem_object_create_lmem(vm->i915, sz, 0);
>>> +    obj = __i915_gem_object_create_lmem_with_ps(vm->i915, sz, sz, 0);
>>
>> So is this memory always required to be size aligned? or should it 
>> say sz, PAGE_SIZE?
>
> The scratch page also hits this path, which is another can of worms. 
> In terms of size it might need to be 64K (with proper physical 
> alignment), which is why we can't force 4K here and instead need to 
> use the passed-in size, so that the returned page has the same 
> alignment.

OK. Perhaps a comment to explain that?

Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>



>
>>
>> /Thomas
>>
>>
Patch

diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c b/drivers/gpu/drm/i915/gt/intel_gtt.c
index 084ea65d59c0..61e8a8c25374 100644
--- a/drivers/gpu/drm/i915/gt/intel_gtt.c
+++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
@@ -16,7 +16,7 @@  struct drm_i915_gem_object *alloc_pt_lmem(struct i915_address_space *vm, int sz)
 {
 	struct drm_i915_gem_object *obj;
 
-	obj = i915_gem_object_create_lmem(vm->i915, sz, 0);
+	obj = __i915_gem_object_create_lmem_with_ps(vm->i915, sz, sz, 0);
 	/*
 	 * Ensure all paging structures for this vm share the same dma-resv
 	 * object underneath, with the idea that one object_lock() will lock