gen8_ppgtt: Use correct huge page manager for MTL

Message ID	20230217191813.1104342-1-jonathan.cavitt@intel.com (mailing list archive)
State	New, archived
Headers	Return-Path: <intel-gfx-bounces@lists.freedesktop.org> From: Jonathan Cavitt <jonathan.cavitt@intel.com> To: intel-gfx@lists.freedesktop.org Date: Fri, 17 Feb 2023 11:18:13 -0800 Message-Id: <20230217191813.1104342-1-jonathan.cavitt@intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Subject: [Intel-gfx] [PATCH] gen8_ppgtt: Use correct huge page manager for MTL Precedence: list Cc: jonathan.cavitt@intel.com, matthew.auld@intel.com Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" <intel-gfx-bounces@lists.freedesktop.org>
Series	gen8_ppgtt: Use correct huge page manager for MTL

Cavitt, Jonathan Feb. 17, 2023, 7:18 p.m. UTC

MTL currently uses gen8_ppgtt_insert_huge when managing huge pages.  This is because
MTL reports as not supporting 64K pages, or more accurately, the system that reports
whether a platform has 64K pages reports false for MTL.  This is only half correct,
as the 64K page support reporting system only cares about 64K page support for LMEM,
which MTL doesn't have.

MTL should be using xehpsdv_ppgtt_insert_huge.  However, simply changing over to
using that manager doesn't resolve the issue because MTL is expecting the virtual
address space for the page table to be flushed after initialization, so we must also
add a flush statement there.

Signed-off-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
---
 drivers/gpu/drm/i915/gt/gen8_ppgtt.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Matthew Auld Feb. 21, 2023, 4:06 p.m. UTC | #1

On 17/02/2023 19:18, Jonathan Cavitt wrote:
> MTL currently uses gen8_ppgtt_insert_huge when managing huge pages.  This is because
> MTL reports as not supporting 64K pages, or more accurately, the system that reports
> whether a platform has 64K pages reports false for MTL.  This is only half correct,
> as the 64K page support reporting system only cares about 64K page support for LMEM,
> which MTL doesn't have.
> 
> MTL should be using xehpsdv_ppgtt_insert_huge.  However, simply changing over to
> using that manager doesn't resolve the issue because MTL is expecting the virtual
> address space for the page table to be flushed after initialization, so we must also
> add a flush statement there.
> 
> Signed-off-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>

Although it looks like the hugepage mock tests are failing with this. I 
assume the mock device just uses some "max" gen version or so, which now 
triggers this path. Any ideas for that?

> ---
>   drivers/gpu/drm/i915/gt/gen8_ppgtt.c | 3 ++-
>   1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
> index 4daaa6f55668..9c571185395f 100644
> --- a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
> +++ b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
> @@ -570,6 +570,7 @@ xehpsdv_ppgtt_insert_huge(struct i915_address_space *vm,
>   			}
>   		} while (rem >= page_size && index < max);
>   
> +		drm_clflush_virt_range(vaddr, PAGE_SIZE);
>   		vma_res->page_sizes_gtt |= page_size;
>   	} while (iter->sg && sg_dma_len(iter->sg));
>   }
> @@ -707,7 +708,7 @@ static void gen8_ppgtt_insert(struct i915_address_space *vm,
>   	struct sgt_dma iter = sgt_dma(vma_res);
>   
>   	if (vma_res->bi.page_sizes.sg > I915_GTT_PAGE_SIZE) {
> -		if (HAS_64K_PAGES(vm->i915))
> +		if (GRAPHICS_VER_FULL(vm->i915) >= IP_VER(12, 50))
>   			xehpsdv_ppgtt_insert_huge(vm, vma_res, &iter, cache_level, flags);
>   		else
>   			gen8_ppgtt_insert_huge(vm, vma_res, &iter, cache_level, flags);

Matthew Auld Feb. 21, 2023, 4:07 p.m. UTC | #2

On 21/02/2023 16:06, Matthew Auld wrote:
> On 17/02/2023 19:18, Jonathan Cavitt wrote:
>> MTL currently uses gen8_ppgtt_insert_huge when managing huge pages.  
>> This is because
>> MTL reports as not supporting 64K pages, or more accurately, the 
>> system that reports
>> whether a platform has 64K pages reports false for MTL.  This is only 
>> half correct,
>> as the 64K page support reporting system only cares about 64K page 
>> support for LMEM,
>> which MTL doesn't have.
>>
>> MTL should be using xehpsdv_ppgtt_insert_huge.  However, simply 
>> changing over to
>> using that manager doesn't resolve the issue because MTL is expecting 
>> the virtual
>> address space for the page table to be flushed after initialization, 
>> so we must also
>> add a flush statement there.
>>
>> Signed-off-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
> Reviewed-by: Matthew Auld <matthew.auld@intel.com>

Also make sure we use the "drm/i915:" prefix for the commit title.

> 
> Although it looks like the hugepage mock tests are failing with this. I 
> assume the mock device just uses some "max" gen version or so, which now 
> triggers this path. Any ideas for that?
> 
>> ---
>>   drivers/gpu/drm/i915/gt/gen8_ppgtt.c | 3 ++-
>>   1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c 
>> b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
>> index 4daaa6f55668..9c571185395f 100644
>> --- a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
>> +++ b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
>> @@ -570,6 +570,7 @@ xehpsdv_ppgtt_insert_huge(struct 
>> i915_address_space *vm,
>>               }
>>           } while (rem >= page_size && index < max);
>> +        drm_clflush_virt_range(vaddr, PAGE_SIZE);
>>           vma_res->page_sizes_gtt |= page_size;
>>       } while (iter->sg && sg_dma_len(iter->sg));
>>   }
>> @@ -707,7 +708,7 @@ static void gen8_ppgtt_insert(struct 
>> i915_address_space *vm,
>>       struct sgt_dma iter = sgt_dma(vma_res);
>>       if (vma_res->bi.page_sizes.sg > I915_GTT_PAGE_SIZE) {
>> -        if (HAS_64K_PAGES(vm->i915))
>> +        if (GRAPHICS_VER_FULL(vm->i915) >= IP_VER(12, 50))
>>               xehpsdv_ppgtt_insert_huge(vm, vma_res, &iter, 
>> cache_level, flags);
>>           else
>>               gen8_ppgtt_insert_huge(vm, vma_res, &iter, cache_level, 
>> flags);

Cavitt, Jonathan Feb. 21, 2023, 4:28 p.m. UTC | #3

-----Original Message-----
From: Auld, Matthew <matthew.auld@intel.com> 
Sent: Tuesday, February 21, 2023 8:06 AM
To: Cavitt, Jonathan <jonathan.cavitt@intel.com>; intel-gfx@lists.freedesktop.org
Cc: Dutt, Sudeep <sudeep.dutt@intel.com>; Siddiqui, Ayaz A <ayaz.siddiqui@intel.com>
Subject: Re: [PATCH] gen8_ppgtt: Use correct huge page manager for MTL
> 
> On 17/02/2023 19:18, Jonathan Cavitt wrote:
> > MTL currently uses gen8_ppgtt_insert_huge when managing huge pages.  This is because
> > MTL reports as not supporting 64K pages, or more accurately, the system that reports
> > whether a platform has 64K pages reports false for MTL.  This is only half correct,
> > as the 64K page support reporting system only cares about 64K page support for LMEM,
> > which MTL doesn't have.
> > 
> > MTL should be using xehpsdv_ppgtt_insert_huge.  However, simply changing over to
> > using that manager doesn't resolve the issue because MTL is expecting the virtual
> > address space for the page table to be flushed after initialization, so we must also
> > add a flush statement there.
> > 
> > Signed-off-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
> Reviewed-by: Matthew Auld <matthew.auld@intel.com>
> 
> Although it looks like the hugepage mock tests are failing with this. I 
> assume the mock device just uses some "max" gen version or so, which now 
> triggers this path. Any ideas for that?

With this patch applied, multiple calls to the hugepages live selftest result in a kernel panic.
If the mock tests are run immediately after the live ones, that would explain this behavior.
I was informed when this was initially debugged that the error was a known IOMMU issue
rather than some novel regression, though it's hard to tell if that was just hopeful optimism
or not at this point.
-Jonathan Cavitt

> 
> > ---
> >   drivers/gpu/drm/i915/gt/gen8_ppgtt.c | 3 ++-
> >   1 file changed, 2 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
> > index 4daaa6f55668..9c571185395f 100644
> > --- a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
> > +++ b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
> > @@ -570,6 +570,7 @@ xehpsdv_ppgtt_insert_huge(struct i915_address_space *vm,
> >   			}
> >   		} while (rem >= page_size && index < max);
> >   
> > +		drm_clflush_virt_range(vaddr, PAGE_SIZE);
> >   		vma_res->page_sizes_gtt |= page_size;
> >   	} while (iter->sg && sg_dma_len(iter->sg));
> >   }
> > @@ -707,7 +708,7 @@ static void gen8_ppgtt_insert(struct i915_address_space *vm,
> >   	struct sgt_dma iter = sgt_dma(vma_res);
> >   
> >   	if (vma_res->bi.page_sizes.sg > I915_GTT_PAGE_SIZE) {
> > -		if (HAS_64K_PAGES(vm->i915))
> > +		if (GRAPHICS_VER_FULL(vm->i915) >= IP_VER(12, 50))
> >   			xehpsdv_ppgtt_insert_huge(vm, vma_res, &iter, cache_level, flags);
> >   		else
> >   			gen8_ppgtt_insert_huge(vm, vma_res, &iter, cache_level, flags);
>

Matthew Auld Feb. 21, 2023, 4:32 p.m. UTC | #4

On 21/02/2023 16:28, Cavitt, Jonathan wrote:
> -----Original Message-----
> From: Auld, Matthew <matthew.auld@intel.com>
> Sent: Tuesday, February 21, 2023 8:06 AM
> To: Cavitt, Jonathan <jonathan.cavitt@intel.com>; intel-gfx@lists.freedesktop.org
> Cc: Dutt, Sudeep <sudeep.dutt@intel.com>; Siddiqui, Ayaz A <ayaz.siddiqui@intel.com>
> Subject: Re: [PATCH] gen8_ppgtt: Use correct huge page manager for MTL
>>
>> On 17/02/2023 19:18, Jonathan Cavitt wrote:
>>> MTL currently uses gen8_ppgtt_insert_huge when managing huge pages.  This is because
>>> MTL reports as not supporting 64K pages, or more accurately, the system that reports
>>> whether a platform has 64K pages reports false for MTL.  This is only half correct,
>>> as the 64K page support reporting system only cares about 64K page support for LMEM,
>>> which MTL doesn't have.
>>>
>>> MTL should be using xehpsdv_ppgtt_insert_huge.  However, simply changing over to
>>> using that manager doesn't resolve the issue because MTL is expecting the virtual
>>> address space for the page table to be flushed after initialization, so we must also
>>> add a flush statement there.
>>>
>>> Signed-off-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
>> Reviewed-by: Matthew Auld <matthew.auld@intel.com>
>>
>> Although it looks like the hugepage mock tests are failing with this. I
>> assume the mock device just uses some "max" gen version or so, which now
>> triggers this path. Any ideas for that?
> 
> With this patch applied, multiple calls to the hugepages live selftest result in a kernel panic.
> If the mock tests are run immediately after the live ones, that would explain this behavior.
> I was informed when this was initially debugged that the error was a known IOMMU issue
> rather than some novel regression, though it's hard to tell if that was just hopeful optimism
> or not at this point.

In the test results we now get:

<6> [183.420316] i915: Running i915_gem_huge_page_mock_selftests/igt_mock_exhaust_device_supported_pages
<6> [183.436978] i915: Running i915_gem_huge_page_mock_selftests/igt_mock_memory_region_huge_pages
<6> [183.445777] i915: Running i915_gem_huge_page_mock_selftests/igt_mock_ppgtt_misaligned_dma
<6> [183.904531] i915: Running i915_gem_huge_page_mock_selftests/igt_mock_ppgtt_huge_fill
<3> [183.912658] gtt=69632, expected=4096, size=69632, single=yes
<3> [183.912784] i915/i915_gem_huge_page_mock_selftests: igt_mock_ppgtt_huge_fill failed with error -22

I didn't look any deeper than that though. Note that this is just a
mock/fake device. I don't think it's IOMMU related.

> -Jonathan Cavitt
> 
>>
>>> ---
>>>    drivers/gpu/drm/i915/gt/gen8_ppgtt.c | 3 ++-
>>>    1 file changed, 2 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
>>> index 4daaa6f55668..9c571185395f 100644
>>> --- a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
>>> +++ b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
>>> @@ -570,6 +570,7 @@ xehpsdv_ppgtt_insert_huge(struct i915_address_space *vm,
>>>    			}
>>>    		} while (rem >= page_size && index < max);
>>>    
>>> +		drm_clflush_virt_range(vaddr, PAGE_SIZE);
>>>    		vma_res->page_sizes_gtt |= page_size;
>>>    	} while (iter->sg && sg_dma_len(iter->sg));
>>>    }
>>> @@ -707,7 +708,7 @@ static void gen8_ppgtt_insert(struct i915_address_space *vm,
>>>    	struct sgt_dma iter = sgt_dma(vma_res);
>>>    
>>>    	if (vma_res->bi.page_sizes.sg > I915_GTT_PAGE_SIZE) {
>>> -		if (HAS_64K_PAGES(vm->i915))
>>> +		if (GRAPHICS_VER_FULL(vm->i915) >= IP_VER(12, 50))
>>>    			xehpsdv_ppgtt_insert_huge(vm, vma_res, &iter, cache_level, flags);
>>>    		else
>>>    			gen8_ppgtt_insert_huge(vm, vma_res, &iter, cache_level, flags);
>>

Cavitt, Jonathan Feb. 21, 2023, 5:14 p.m. UTC | #5

-----Original Message-----
From: Auld, Matthew <matthew.auld@intel.com> 
Sent: Tuesday, February 21, 2023 8:33 AM
To: Cavitt, Jonathan <jonathan.cavitt@intel.com>
Cc: Dutt, Sudeep <sudeep.dutt@intel.com>; Siddiqui, Ayaz A <ayaz.siddiqui@intel.com>; intel-gfx@lists.freedesktop.org
Subject: Re: [PATCH] gen8_ppgtt: Use correct huge page manager for MTL
> 
> On 21/02/2023 16:28, Cavitt, Jonathan wrote:
> > -----Original Message-----
> > From: Auld, Matthew <matthew.auld@intel.com>
> > Sent: Tuesday, February 21, 2023 8:06 AM
> > To: Cavitt, Jonathan <jonathan.cavitt@intel.com>; intel-gfx@lists.freedesktop.org
> > Cc: Dutt, Sudeep <sudeep.dutt@intel.com>; Siddiqui, Ayaz A <ayaz.siddiqui@intel.com>
> > Subject: Re: [PATCH] gen8_ppgtt: Use correct huge page manager for MTL
> >>
> >> On 17/02/2023 19:18, Jonathan Cavitt wrote:
> >>> MTL currently uses gen8_ppgtt_insert_huge when managing huge pages.  This is because
> >>> MTL reports as not supporting 64K pages, or more accurately, the system that reports
> >>> whether a platform has 64K pages reports false for MTL.  This is only half correct,
> >>> as the 64K page support reporting system only cares about 64K page support for LMEM,
> >>> which MTL doesn't have.
> >>>
> >>> MTL should be using xehpsdv_ppgtt_insert_huge.  However, simply changing over to
> >>> using that manager doesn't resolve the issue because MTL is expecting the virtual
> >>> address space for the page table to be flushed after initialization, so we must also
> >>> add a flush statement there.
> >>>
> >>> Signed-off-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
> >> Reviewed-by: Matthew Auld <matthew.auld@intel.com>
> >>
> >> Although it looks like the hugepage mock tests are failing with this. I
> >> assume the mock device just uses some "max" gen version or so, which now
> >> triggers this path. Any ideas for that?
> > 
> > With this patch applied, multiple calls to the hugepages live selftest result in a kernel panic.
> > If the mock tests are run immediately after the live ones, that would explain this behavior.
> > I was informed when this was initially debugged that the error was a known IOMMU issue
> > rather than some novel regression, though it's hard to tell if that was just hopeful optimism
> > or not at this point.
> 
> In the test results we now get:
> 
> <6> [183.420316] i915: Running
> i915_gem_huge_page_mock_selftests/igt_mock_exhaust_device_supported_pages
> <6> [183.436978] i915: Running 
> i915_gem_huge_page_mock_selftests/igt_mock_memory_region_huge_pages
> <6> [183.445777] i915: Running 
> i915_gem_huge_page_mock_selftests/igt_mock_ppgtt_misaligned_dma
> <6> [183.904531] i915: Running 
> i915_gem_huge_page_mock_selftests/igt_mock_ppgtt_huge_fill
> <3> [183.912658] gtt=69632, expected=4096, size=69632, single=yes
> <3> [183.912784] i915/i915_gem_huge_page_mock_selftests: 
> igt_mock_ppgtt_huge_fill failed with error -22

                if (expected_gtt & I915_GTT_PAGE_SIZE_4K)
                        expected_gtt &= ~I915_GTT_PAGE_SIZE_64K;

I don't know why we're doing that to expected_gtt, but that seems to be the cause of the
problem in this case.
-Jonathan Cavitt

> 
> I didn't look any deeper than that though. Note that this is just a
> mock/fake device. I don't think it's IOMMU related.
> 
> > -Jonathan Cavitt
> > 
> >>
> >>> ---
> >>>    drivers/gpu/drm/i915/gt/gen8_ppgtt.c | 3 ++-
> >>>    1 file changed, 2 insertions(+), 1 deletion(-)
> >>>
> >>> diff --git a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
> >>> index 4daaa6f55668..9c571185395f 100644
> >>> --- a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
> >>> +++ b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
> >>> @@ -570,6 +570,7 @@ xehpsdv_ppgtt_insert_huge(struct i915_address_space *vm,
> >>>    			}
> >>>    		} while (rem >= page_size && index < max);
> >>>    
> >>> +		drm_clflush_virt_range(vaddr, PAGE_SIZE);
> >>>    		vma_res->page_sizes_gtt |= page_size;
> >>>    	} while (iter->sg && sg_dma_len(iter->sg));
> >>>    }
> >>> @@ -707,7 +708,7 @@ static void gen8_ppgtt_insert(struct i915_address_space *vm,
> >>>    	struct sgt_dma iter = sgt_dma(vma_res);
> >>>    
> >>>    	if (vma_res->bi.page_sizes.sg > I915_GTT_PAGE_SIZE) {
> >>> -		if (HAS_64K_PAGES(vm->i915))
> >>> +		if (GRAPHICS_VER_FULL(vm->i915) >= IP_VER(12, 50))
> >>>    			xehpsdv_ppgtt_insert_huge(vm, vma_res, &iter, cache_level, flags);
> >>>    		else
> >>>    			gen8_ppgtt_insert_huge(vm, vma_res, &iter, cache_level, flags);
> >>
>

Matthew Auld Feb. 21, 2023, 5:46 p.m. UTC | #6

On 21/02/2023 17:14, Cavitt, Jonathan wrote:
> -----Original Message-----
> From: Auld, Matthew <matthew.auld@intel.com>
> Sent: Tuesday, February 21, 2023 8:33 AM
> To: Cavitt, Jonathan <jonathan.cavitt@intel.com>
> Cc: Dutt, Sudeep <sudeep.dutt@intel.com>; Siddiqui, Ayaz A <ayaz.siddiqui@intel.com>; intel-gfx@lists.freedesktop.org
> Subject: Re: [PATCH] gen8_ppgtt: Use correct huge page manager for MTL
>>
>> On 21/02/2023 16:28, Cavitt, Jonathan wrote:
>>> -----Original Message-----
>>> From: Auld, Matthew <matthew.auld@intel.com>
>>> Sent: Tuesday, February 21, 2023 8:06 AM
>>> To: Cavitt, Jonathan <jonathan.cavitt@intel.com>; intel-gfx@lists.freedesktop.org
>>> Cc: Dutt, Sudeep <sudeep.dutt@intel.com>; Siddiqui, Ayaz A <ayaz.siddiqui@intel.com>
>>> Subject: Re: [PATCH] gen8_ppgtt: Use correct huge page manager for MTL
>>>>
>>>> On 17/02/2023 19:18, Jonathan Cavitt wrote:
>>>>> MTL currently uses gen8_ppgtt_insert_huge when managing huge pages.  This is because
>>>>> MTL reports as not supporting 64K pages, or more accurately, the system that reports
>>>>> whether a platform has 64K pages reports false for MTL.  This is only half correct,
>>>>> as the 64K page support reporting system only cares about 64K page support for LMEM,
>>>>> which MTL doesn't have.
>>>>>
>>>>> MTL should be using xehpsdv_ppgtt_insert_huge.  However, simply changing over to
>>>>> using that manager doesn't resolve the issue because MTL is expecting the virtual
>>>>> address space for the page table to be flushed after initialization, so we must also
>>>>> add a flush statement there.
>>>>>
>>>>> Signed-off-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
>>>> Reviewed-by: Matthew Auld <matthew.auld@intel.com>
>>>>
>>>> Although it looks like the hugepage mock tests are failing with this. I
>>>> assume the mock device just uses some "max" gen version or so, which now
>>>> triggers this path. Any ideas for that?
>>>
>>> With this patch applied, multiple calls to the hugepages live selftest result in a kernel panic.
>>> If the mock tests are run immediately after the live ones, that would explain this behavior.
>>> I was informed when this was initially debugged that the error was a known IOMMU issue
>>> rather than some novel regression, though it's hard to tell if that was just hopeful optimism
>>> or not at this point.
>>
>> In the test results we now get:
>>
>> <6> [183.420316] i915: Running
>> i915_gem_huge_page_mock_selftests/igt_mock_exhaust_device_supported_pages
>> <6> [183.436978] i915: Running
>> i915_gem_huge_page_mock_selftests/igt_mock_memory_region_huge_pages
>> <6> [183.445777] i915: Running
>> i915_gem_huge_page_mock_selftests/igt_mock_ppgtt_misaligned_dma
>> <6> [183.904531] i915: Running
>> i915_gem_huge_page_mock_selftests/igt_mock_ppgtt_huge_fill
>> <3> [183.912658] gtt=69632, expected=4096, size=69632, single=yes
>> <3> [183.912784] i915/i915_gem_huge_page_mock_selftests:
>> igt_mock_ppgtt_huge_fill failed with error -22
> 
>                  if (expected_gtt & I915_GTT_PAGE_SIZE_4K)
>                          expected_gtt &= ~I915_GTT_PAGE_SIZE_64K;
> 
> I don't know why we're doing that to expected_gtt, but that seems to be the cause of the
> problem in this case.

I think it's due to the older huge page model, where 64K requires the 
entire page-table to all use 64K pages underneath (pde level hint), so 
if we see 4K in there somewhere then we don't expect to get back 64K 
GTT. But on newer HW we now have a pte level hint, so I think the
above can just be removed with this patch, since that's what the mock 
device now uses.

> -Jonathan Cavitt
> 
>>
>> I didn't look any deeper than that though. Note that this is just a
>> mock/fake device. I don't think it's IOMMU related.
>>
>>> -Jonathan Cavitt
>>>
>>>>
>>>>> ---
>>>>>     drivers/gpu/drm/i915/gt/gen8_ppgtt.c | 3 ++-
>>>>>     1 file changed, 2 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
>>>>> index 4daaa6f55668..9c571185395f 100644
>>>>> --- a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
>>>>> +++ b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
>>>>> @@ -570,6 +570,7 @@ xehpsdv_ppgtt_insert_huge(struct i915_address_space *vm,
>>>>>     			}
>>>>>     		} while (rem >= page_size && index < max);
>>>>>     
>>>>> +		drm_clflush_virt_range(vaddr, PAGE_SIZE);
>>>>>     		vma_res->page_sizes_gtt |= page_size;
>>>>>     	} while (iter->sg && sg_dma_len(iter->sg));
>>>>>     }
>>>>> @@ -707,7 +708,7 @@ static void gen8_ppgtt_insert(struct i915_address_space *vm,
>>>>>     	struct sgt_dma iter = sgt_dma(vma_res);
>>>>>     
>>>>>     	if (vma_res->bi.page_sizes.sg > I915_GTT_PAGE_SIZE) {
>>>>> -		if (HAS_64K_PAGES(vm->i915))
>>>>> +		if (GRAPHICS_VER_FULL(vm->i915) >= IP_VER(12, 50))
>>>>>     			xehpsdv_ppgtt_insert_huge(vm, vma_res, &iter, cache_level, flags);
>>>>>     		else
>>>>>     			gen8_ppgtt_insert_huge(vm, vma_res, &iter, cache_level, flags);
>>>>
>>

Cavitt, Jonathan Feb. 21, 2023, 6:34 p.m. UTC | #7

-----Original Message-----
From: Auld, Matthew <matthew.auld@intel.com> 
Sent: Tuesday, February 21, 2023 9:46 AM
To: Cavitt, Jonathan <jonathan.cavitt@intel.com>
Cc: Dutt, Sudeep <sudeep.dutt@intel.com>; Siddiqui, Ayaz A <ayaz.siddiqui@intel.com>; intel-gfx@lists.freedesktop.org
Subject: Re: [PATCH] gen8_ppgtt: Use correct huge page manager for MTL
> 
> On 21/02/2023 17:14, Cavitt, Jonathan wrote:
> > -----Original Message-----
> > From: Auld, Matthew <matthew.auld@intel.com>
> > Sent: Tuesday, February 21, 2023 8:33 AM
> > To: Cavitt, Jonathan <jonathan.cavitt@intel.com>
> > Cc: Dutt, Sudeep <sudeep.dutt@intel.com>; Siddiqui, Ayaz A <ayaz.siddiqui@intel.com>; intel-gfx@lists.freedesktop.org
> > Subject: Re: [PATCH] gen8_ppgtt: Use correct huge page manager for MTL
> >>
> >> On 21/02/2023 16:28, Cavitt, Jonathan wrote:
> >>> -----Original Message-----
> >>> From: Auld, Matthew <matthew.auld@intel.com>
> >>> Sent: Tuesday, February 21, 2023 8:06 AM
> >>> To: Cavitt, Jonathan <jonathan.cavitt@intel.com>; intel-gfx@lists.freedesktop.org
> >>> Cc: Dutt, Sudeep <sudeep.dutt@intel.com>; Siddiqui, Ayaz A <ayaz.siddiqui@intel.com>
> >>> Subject: Re: [PATCH] gen8_ppgtt: Use correct huge page manager for MTL
> >>>>
> >>>> On 17/02/2023 19:18, Jonathan Cavitt wrote:
> >>>>> MTL currently uses gen8_ppgtt_insert_huge when managing huge pages.  This is because
> >>>>> MTL reports as not supporting 64K pages, or more accurately, the system that reports
> >>>>> whether a platform has 64K pages reports false for MTL.  This is only half correct,
> >>>>> as the 64K page support reporting system only cares about 64K page support for LMEM,
> >>>>> which MTL doesn't have.
> >>>>>
> >>>>> MTL should be using xehpsdv_ppgtt_insert_huge.  However, simply changing over to
> >>>>> using that manager doesn't resolve the issue because MTL is expecting the virtual
> >>>>> address space for the page table to be flushed after initialization, so we must also
> >>>>> add a flush statement there.
> >>>>>
> >>>>> Signed-off-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
> >>>> Reviewed-by: Matthew Auld <matthew.auld@intel.com>
> >>>>
> >>>> Although it looks like the hugepage mock tests are failing with this. I
> >>>> assume the mock device just uses some "max" gen version or so, which now
> >>>> triggers this path. Any ideas for that?
> >>>
> >>> With this patch applied, multiple calls to the hugepages live selftest result in a kernel panic.
> >>> If the mock tests are run immediately after the live ones, that would explain this behavior.
> >>> I was informed when this was initially debugged that the error was a known IOMMU issue
> >>> rather than some novel regression, though it's hard to tell if that was just hopeful optimism
> >>> or not at this point.
> >>
> >> In the test results we now get:
> >>
> >> <6> [183.420316] i915: Running
> >> i915_gem_huge_page_mock_selftests/igt_mock_exhaust_device_supported_pages
> >> <6> [183.436978] i915: Running
> >> i915_gem_huge_page_mock_selftests/igt_mock_memory_region_huge_pages
> >> <6> [183.445777] i915: Running
> >> i915_gem_huge_page_mock_selftests/igt_mock_ppgtt_misaligned_dma
> >> <6> [183.904531] i915: Running
> >> i915_gem_huge_page_mock_selftests/igt_mock_ppgtt_huge_fill
> >> <3> [183.912658] gtt=69632, expected=4096, size=69632, single=yes
> >> <3> [183.912784] i915/i915_gem_huge_page_mock_selftests:
> >> igt_mock_ppgtt_huge_fill failed with error -22
> > 
> >                  if (expected_gtt & I915_GTT_PAGE_SIZE_4K)
> >                          expected_gtt &= ~I915_GTT_PAGE_SIZE_64K;
> > 
> > I don't know why we're doing that to expected_gtt, but that seems to be the cause of the
> > problem in this case.
> 
> I think it's due to the older huge page model, where 64K requires the 
> entire page-table to all use 64K pages underneath (pde level hint), so 
> if we see 4K in there somewhere then we don't expect to get back 64K 
> GTT. But on newer HW we now have a pte level hint, so I think the
> above can just be removed with this patch, since that's what the mock 
> device now uses.

Seems right.  I guess that would be... what?  Is it:
A. Platform specific?  I.E. we need a generation check in the selftest to proceed, such as the following:

                 if (expected_gtt & I915_GTT_PAGE_SIZE_4K && GRAPHICS_VER(i915) >= 12)

B. Systems specific?  I.E. we have a special check for this functionality such as:

                 if (expected_gtt & I915_GTT_PAGE_SIZE_4K && has_pte_level_hint(i915))

C. The new norm.  I.E. we can just remove this line from the test and everything will work out fine.

-Jonathan Cavitt

> 
> > -Jonathan Cavitt
> > 
> >>
> >> I didn't look any deeper than that though. Note that this is just a
> >> mock/fake device. I don't think it's IOMMU related.
> >>
> >>> -Jonathan Cavitt
> >>>
> >>>>
> >>>>> ---
> >>>>>     drivers/gpu/drm/i915/gt/gen8_ppgtt.c | 3 ++-
> >>>>>     1 file changed, 2 insertions(+), 1 deletion(-)
> >>>>>
> >>>>> diff --git a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
> >>>>> index 4daaa6f55668..9c571185395f 100644
> >>>>> --- a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
> >>>>> +++ b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
> >>>>> @@ -570,6 +570,7 @@ xehpsdv_ppgtt_insert_huge(struct i915_address_space *vm,
> >>>>>     			}
> >>>>>     		} while (rem >= page_size && index < max);
> >>>>>     
> >>>>> +		drm_clflush_virt_range(vaddr, PAGE_SIZE);
> >>>>>     		vma_res->page_sizes_gtt |= page_size;
> >>>>>     	} while (iter->sg && sg_dma_len(iter->sg));
> >>>>>     }
> >>>>> @@ -707,7 +708,7 @@ static void gen8_ppgtt_insert(struct i915_address_space *vm,
> >>>>>     	struct sgt_dma iter = sgt_dma(vma_res);
> >>>>>     
> >>>>>     	if (vma_res->bi.page_sizes.sg > I915_GTT_PAGE_SIZE) {
> >>>>> -		if (HAS_64K_PAGES(vm->i915))
> >>>>> +		if (GRAPHICS_VER_FULL(vm->i915) >= IP_VER(12, 50))
> >>>>>     			xehpsdv_ppgtt_insert_huge(vm, vma_res, &iter, cache_level, flags);
> >>>>>     		else
> >>>>>     			gen8_ppgtt_insert_huge(vm, vma_res, &iter, cache_level, flags);
> >>>>
> >>
>

Matthew Auld Feb. 22, 2023, 9:35 a.m. UTC | #8

On 21/02/2023 18:34, Cavitt, Jonathan wrote:
> -----Original Message-----
> From: Auld, Matthew <matthew.auld@intel.com>
> Sent: Tuesday, February 21, 2023 9:46 AM
> To: Cavitt, Jonathan <jonathan.cavitt@intel.com>
> Cc: Dutt, Sudeep <sudeep.dutt@intel.com>; Siddiqui, Ayaz A <ayaz.siddiqui@intel.com>; intel-gfx@lists.freedesktop.org
> Subject: Re: [PATCH] gen8_ppgtt: Use correct huge page manager for MTL
>>
>> On 21/02/2023 17:14, Cavitt, Jonathan wrote:
>>> -----Original Message-----
>>> From: Auld, Matthew <matthew.auld@intel.com>
>>> Sent: Tuesday, February 21, 2023 8:33 AM
>>> To: Cavitt, Jonathan <jonathan.cavitt@intel.com>
>>> Cc: Dutt, Sudeep <sudeep.dutt@intel.com>; Siddiqui, Ayaz A <ayaz.siddiqui@intel.com>; intel-gfx@lists.freedesktop.org
>>> Subject: Re: [PATCH] gen8_ppgtt: Use correct huge page manager for MTL
>>>>
>>>> On 21/02/2023 16:28, Cavitt, Jonathan wrote:
>>>>> -----Original Message-----
>>>>> From: Auld, Matthew <matthew.auld@intel.com>
>>>>> Sent: Tuesday, February 21, 2023 8:06 AM
>>>>> To: Cavitt, Jonathan <jonathan.cavitt@intel.com>; intel-gfx@lists.freedesktop.org
>>>>> Cc: Dutt, Sudeep <sudeep.dutt@intel.com>; Siddiqui, Ayaz A <ayaz.siddiqui@intel.com>
>>>>> Subject: Re: [PATCH] gen8_ppgtt: Use correct huge page manager for MTL
>>>>>>
>>>>>> On 17/02/2023 19:18, Jonathan Cavitt wrote:
>>>>>>> MTL currently uses gen8_ppgtt_insert_huge when managing huge pages.  This is because
>>>>>>> MTL reports as not supporting 64K pages, or more accurately, the system that reports
>>>>>>> whether a platform has 64K pages reports false for MTL.  This is only half correct,
>>>>>>> as the 64K page support reporting system only cares about 64K page support for LMEM,
>>>>>>> which MTL doesn't have.
>>>>>>>
>>>>>>> MTL should be using xehpsdv_ppgtt_insert_huge.  However, simply changing over to
>>>>>>> using that manager doesn't resolve the issue because MTL is expecting the virtual
>>>>>>> address space for the page table to be flushed after initialization, so we must also
>>>>>>> add a flush statement there.
>>>>>>>
>>>>>>> Signed-off-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
>>>>>> Reviewed-by: Matthew Auld <matthew.auld@intel.com>
>>>>>>
>>>>>> Although it looks like the hugepage mock tests are failing with this. I
>>>>>> assume the mock device just uses some "max" gen version or so, which now
>>>>>> triggers this path. Any ideas for that?
>>>>>
>>>>> With this patch applied, multiple calls to the hugepages live selftest result in a kernel panic.
>>>>> If the mock tests are run immediately after the live ones, that would explain this behavior.
>>>>> I was informed when this was initially debugged that the error was a known IOMMU issue
>>>>> rather than some novel regression, though it's hard to tell if that was just hopeful optimism
>>>>> or not at this point.
>>>>
>>>> In the test results we now get:
>>>>
>>>> <6> [183.420316] i915: Running
>>>> i915_gem_huge_page_mock_selftests/igt_mock_exhaust_device_supported_pages
>>>> <6> [183.436978] i915: Running
>>>> i915_gem_huge_page_mock_selftests/igt_mock_memory_region_huge_pages
>>>> <6> [183.445777] i915: Running
>>>> i915_gem_huge_page_mock_selftests/igt_mock_ppgtt_misaligned_dma
>>>> <6> [183.904531] i915: Running
>>>> i915_gem_huge_page_mock_selftests/igt_mock_ppgtt_huge_fill
>>>> <3> [183.912658] gtt=69632, expected=4096, size=69632, single=yes
>>>> <3> [183.912784] i915/i915_gem_huge_page_mock_selftests:
>>>> igt_mock_ppgtt_huge_fill failed with error -22
>>>
>>>                   if (expected_gtt & I915_GTT_PAGE_SIZE_4K)
>>>                           expected_gtt &= ~I915_GTT_PAGE_SIZE_64K;
>>>
>>> I don't know why we're doing that to expected_gtt, but that seems to be the cause of the
>>> problem in this case.
>>
>> I think it's due to the older huge page model, where 64K requires the
>> entire page-table to all use 64K pages underneath (pde level hint), so
>> if we see 4K in there somewhere then we don't expect to get back 64K
>> GTT. But on newer HW we now have a pte level hint, so I think the
>> above can just be removed with this patch, since that's what the mock
>> device now uses.
> 
> Seems right.  I guess that would be... what?  Is it:
> A. Platform specific?  I.E. we need a generation check in the selftest to proceed, such as the following:
> 
>                   if (expected_gtt & I915_GTT_PAGE_SIZE_4K && GRAPHICS_VER(i915) >= 12)
> 
> B. Systems specific?  I.E. we have a special check for this functionality such as:
> 
>                   if (expected_gtt & I915_GTT_PAGE_SIZE_4K && has_pte_level_hint(i915))
> 
> C. The new norm.  I.E. we can just remove this line from the test and everything will work out fine.

The mock device will always use the max graphics version:

RUNTIME_INFO(i915)->graphics.ip.ver = -1;

So I think option C.

> 
> -Jonathan Cavitt
> 
>>
>>> -Jonathan Cavitt
>>>
>>>>
>>>> I didn't look any deeper than that though. Note that this is just a
>>>> mock/fake device. I don't think it's IOMMU related.
>>>>
>>>>> -Jonathan Cavitt
>>>>>
>>>>>>
>>>>>>> ---
>>>>>>>      drivers/gpu/drm/i915/gt/gen8_ppgtt.c | 3 ++-
>>>>>>>      1 file changed, 2 insertions(+), 1 deletion(-)
>>>>>>>
>>>>>>> diff --git a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
>>>>>>> index 4daaa6f55668..9c571185395f 100644
>>>>>>> --- a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
>>>>>>> +++ b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
>>>>>>> @@ -570,6 +570,7 @@ xehpsdv_ppgtt_insert_huge(struct i915_address_space *vm,
>>>>>>>      			}
>>>>>>>      		} while (rem >= page_size && index < max);
>>>>>>>      
>>>>>>> +		drm_clflush_virt_range(vaddr, PAGE_SIZE);
>>>>>>>      		vma_res->page_sizes_gtt |= page_size;
>>>>>>>      	} while (iter->sg && sg_dma_len(iter->sg));
>>>>>>>      }
>>>>>>> @@ -707,7 +708,7 @@ static void gen8_ppgtt_insert(struct i915_address_space *vm,
>>>>>>>      	struct sgt_dma iter = sgt_dma(vma_res);
>>>>>>>      
>>>>>>>      	if (vma_res->bi.page_sizes.sg > I915_GTT_PAGE_SIZE) {
>>>>>>> -		if (HAS_64K_PAGES(vm->i915))
>>>>>>> +		if (GRAPHICS_VER_FULL(vm->i915) >= IP_VER(12, 50))
>>>>>>>      			xehpsdv_ppgtt_insert_huge(vm, vma_res, &iter, cache_level, flags);
>>>>>>>      		else
>>>>>>>      			gen8_ppgtt_insert_huge(vm, vma_res, &iter, cache_level, flags);
>>>>>>
>>>>
>>
