drm/i915/mtl: Increase guard pages when vt-d is enabled

Message ID	20231102160644.1279801-1-radhakrishna.sripada@intel.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <intel-gfx-bounces@lists.freedesktop.org> From: Radhakrishna Sripada <radhakrishna.sripada@intel.com> To: intel-gfx@lists.freedesktop.org Date: Thu, 2 Nov 2023 09:06:44 -0700 Message-Id: <20231102160644.1279801-1-radhakrishna.sripada@intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Subject: [Intel-gfx] [PATCH] drm/i915/mtl: Increase guard pages when vt-d is enabled Precedence: list Cc: Chris Wilson <chris.p.wilson@linux.intel.com> Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" <intel-gfx-bounces@lists.freedesktop.org>
Series	drm/i915/mtl: Increase guard pages when vt-d is enabled \| expand drm/i915/mtl: Increase guard pages when vt-d is enabled

Sripada, Radhakrishna Nov. 2, 2023, 4:06 p.m. UTC

Experiments were conducted with different multipliers to VTD_GUARD macro
with multiplier of 185 we were observing occasional pipe faults when
running kms_cursor_legacy --run-subtest single-bo

There could possibly be an underlying issue that is being investigated, for
now bump the guard pages for MTL.

Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2017
Cc: Gustavo Sousa <gustavo.sousa@intel.com>
Cc: Chris Wilson <chris.p.wilson@linux.intel.com>
Signed-off-by: Radhakrishna Sripada <radhakrishna.sripada@intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_domain.c | 3 +++
 1 file changed, 3 insertions(+)

Gustavo Sousa Nov. 2, 2023, 4:35 p.m. UTC | #1

Quoting Radhakrishna Sripada (2023-11-02 13:06:44-03:00)
>Experiments were conducted with different multipliers to VTD_GUARD macro
>with multiplier of 185 we were observing occasional pipe faults when
>running kms_cursor_legacy --run-subtest single-bo
>
>There could possibly be an underlying issue that is being investigated, for
>now bump the guard pages for MTL.
>
>Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2017
>Cc: Gustavo Sousa <gustavo.sousa@intel.com>
>Cc: Chris Wilson <chris.p.wilson@linux.intel.com>
>Signed-off-by: Radhakrishna Sripada <radhakrishna.sripada@intel.com>
>---
> drivers/gpu/drm/i915/gem/i915_gem_domain.c | 3 +++
> 1 file changed, 3 insertions(+)
>
>diff --git a/drivers/gpu/drm/i915/gem/i915_gem_domain.c b/drivers/gpu/drm/i915/gem/i915_gem_domain.c
>index 3770828f2eaf..b65f84c6bb3f 100644
>--- a/drivers/gpu/drm/i915/gem/i915_gem_domain.c
>+++ b/drivers/gpu/drm/i915/gem/i915_gem_domain.c
>@@ -456,6 +456,9 @@ i915_gem_object_pin_to_display_plane(struct drm_i915_gem_object *obj,
>         if (intel_scanout_needs_vtd_wa(i915)) {
>                 unsigned int guard = VTD_GUARD;
> 

I remember trying increasing the guard, but with a much smaller multiplier. So
it turns out that using a much higher value did the "trick".

I would add a FIXME comment here to remind us that this is a hack.

With the FIXME in place,

Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com>

>+                if (IS_METEORLAKE(i915))
>+                        guard *= 200;
>+
>                 if (i915_gem_object_is_tiled(obj))
>                         guard = max(guard,
>                                     i915_gem_object_get_tile_row_size(obj));
>-- 
>2.34.1
>

Andrzej Hajda Nov. 2, 2023, 4:58 p.m. UTC | #2

On 02.11.2023 17:06, Radhakrishna Sripada wrote:
> Experiments were conducted with different multipliers to VTD_GUARD macro
> with multiplier of 185 we were observing occasional pipe faults when
> running kms_cursor_legacy --run-subtest single-bo
> 
> There could possibly be an underlying issue that is being investigated, for
> now bump the guard pages for MTL.
> 
> Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2017
> Cc: Gustavo Sousa <gustavo.sousa@intel.com>
> Cc: Chris Wilson <chris.p.wilson@linux.intel.com>
> Signed-off-by: Radhakrishna Sripada <radhakrishna.sripada@intel.com>
> ---
>   drivers/gpu/drm/i915/gem/i915_gem_domain.c | 3 +++
>   1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_domain.c b/drivers/gpu/drm/i915/gem/i915_gem_domain.c
> index 3770828f2eaf..b65f84c6bb3f 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_domain.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_domain.c
> @@ -456,6 +456,9 @@ i915_gem_object_pin_to_display_plane(struct drm_i915_gem_object *obj,
>   	if (intel_scanout_needs_vtd_wa(i915)) {
>   		unsigned int guard = VTD_GUARD;
>   
> +		if (IS_METEORLAKE(i915))
> +			guard *= 200;
> +

200 * VTD_GUARD = 200 * 168 * 4K = 131MB

Looks insanely high, 131MB for padding, if this is before and after it 
becomes even 262MB of wasted address per plane. Just signalling, I do 
not know if this actually hurts.

Regards
Andrzej



>   		if (i915_gem_object_is_tiled(obj))
>   			guard = max(guard,
>   				    i915_gem_object_get_tile_row_size(obj));

Tvrtko Ursulin Nov. 2, 2023, 5:41 p.m. UTC | #3

On 02/11/2023 16:58, Andrzej Hajda wrote:
> On 02.11.2023 17:06, Radhakrishna Sripada wrote:
>> Experiments were conducted with different multipliers to VTD_GUARD macro
>> with multiplier of 185 we were observing occasional pipe faults when
>> running kms_cursor_legacy --run-subtest single-bo
>>
>> There could possibly be an underlying issue that is being 
>> investigated, for
>> now bump the guard pages for MTL.
>>
>> Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2017
>> Cc: Gustavo Sousa <gustavo.sousa@intel.com>
>> Cc: Chris Wilson <chris.p.wilson@linux.intel.com>
>> Signed-off-by: Radhakrishna Sripada <radhakrishna.sripada@intel.com>
>> ---
>>   drivers/gpu/drm/i915/gem/i915_gem_domain.c | 3 +++
>>   1 file changed, 3 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_domain.c 
>> b/drivers/gpu/drm/i915/gem/i915_gem_domain.c
>> index 3770828f2eaf..b65f84c6bb3f 100644
>> --- a/drivers/gpu/drm/i915/gem/i915_gem_domain.c
>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_domain.c
>> @@ -456,6 +456,9 @@ i915_gem_object_pin_to_display_plane(struct 
>> drm_i915_gem_object *obj,
>>       if (intel_scanout_needs_vtd_wa(i915)) {
>>           unsigned int guard = VTD_GUARD;
>> +        if (IS_METEORLAKE(i915))
>> +            guard *= 200;
>> +
> 
> 200 * VTD_GUARD = 200 * 168 * 4K = 131MB
> 
> Looks insanely high, 131MB for padding, if this is before and after it 
> becomes even 262MB of wasted address per plane. Just signalling, I do 
> not know if this actually hurts.

Yeah this feels crazy. There must be some other explanation which is 
getting hidden by the crazy amount of padding so I'd rather we figured 
it out.

With 262MiB per fb how many fit in GGTT before eviction hits? N screens 
with double/triple buffering?

Regards,

Tvrtko

P.S. Where did the 185 from the commit message come from?

Gustavo Sousa Nov. 2, 2023, 6:01 p.m. UTC | #4

Quoting Gustavo Sousa (2023-11-02 13:35:53-03:00)
>Quoting Radhakrishna Sripada (2023-11-02 13:06:44-03:00)
>>Experiments were conducted with different multipliers to VTD_GUARD macro
>>with multiplier of 185 we were observing occasional pipe faults when
>>running kms_cursor_legacy --run-subtest single-bo
>>
>>There could possibly be an underlying issue that is being investigated, for
>>now bump the guard pages for MTL.
>>
>>Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2017
>>Cc: Gustavo Sousa <gustavo.sousa@intel.com>
>>Cc: Chris Wilson <chris.p.wilson@linux.intel.com>
>>Signed-off-by: Radhakrishna Sripada <radhakrishna.sripada@intel.com>
>>---
>> drivers/gpu/drm/i915/gem/i915_gem_domain.c | 3 +++
>> 1 file changed, 3 insertions(+)
>>
>>diff --git a/drivers/gpu/drm/i915/gem/i915_gem_domain.c b/drivers/gpu/drm/i915/gem/i915_gem_domain.c
>>index 3770828f2eaf..b65f84c6bb3f 100644
>>--- a/drivers/gpu/drm/i915/gem/i915_gem_domain.c
>>+++ b/drivers/gpu/drm/i915/gem/i915_gem_domain.c
>>@@ -456,6 +456,9 @@ i915_gem_object_pin_to_display_plane(struct drm_i915_gem_object *obj,
>>         if (intel_scanout_needs_vtd_wa(i915)) {
>>                 unsigned int guard = VTD_GUARD;
>> 
>
>I remember trying increasing the guard, but with a much smaller multiplier. So
>it turns out that using a much higher value did the "trick".
>
>I would add a FIXME comment here to remind us that this is a hack.
>
>With the FIXME in place,
>
>Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com>

Oops. I was too hasty on providing an r-b and did not really pay attention to
the resulting size of the padding and its implications as highlighted by the
other replies here. My bad, sorry about that. Please dismiss the r-b.

--
Gustavo Sousa

>
>>+                if (IS_METEORLAKE(i915))
>>+                        guard *= 200;
>>+
>>                 if (i915_gem_object_is_tiled(obj))
>>                         guard = max(guard,
>>                                     i915_gem_object_get_tile_row_size(obj));
>>-- 
>>2.34.1
>>

Sripada, Radhakrishna Nov. 2, 2023, 10:14 p.m. UTC | #5

Hi Tvrtko,

> -----Original Message-----
> From: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
> Sent: Thursday, November 2, 2023 10:41 AM
> To: Hajda, Andrzej <andrzej.hajda@intel.com>; Sripada, Radhakrishna
> <radhakrishna.sripada@intel.com>; intel-gfx@lists.freedesktop.org
> Cc: Chris Wilson <chris.p.wilson@linux.intel.com>
> Subject: Re: [Intel-gfx] [PATCH] drm/i915/mtl: Increase guard pages when vt-d is
> enabled
> 
> 
> On 02/11/2023 16:58, Andrzej Hajda wrote:
> > On 02.11.2023 17:06, Radhakrishna Sripada wrote:
> >> Experiments were conducted with different multipliers to VTD_GUARD macro
> >> with multiplier of 185 we were observing occasional pipe faults when
> >> running kms_cursor_legacy --run-subtest single-bo
> >>
> >> There could possibly be an underlying issue that is being
> >> investigated, for
> >> now bump the guard pages for MTL.
> >>
> >> Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2017
> >> Cc: Gustavo Sousa <gustavo.sousa@intel.com>
> >> Cc: Chris Wilson <chris.p.wilson@linux.intel.com>
> >> Signed-off-by: Radhakrishna Sripada <radhakrishna.sripada@intel.com>
> >> ---
> >>   drivers/gpu/drm/i915/gem/i915_gem_domain.c | 3 +++
> >>   1 file changed, 3 insertions(+)
> >>
> >> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_domain.c
> >> b/drivers/gpu/drm/i915/gem/i915_gem_domain.c
> >> index 3770828f2eaf..b65f84c6bb3f 100644
> >> --- a/drivers/gpu/drm/i915/gem/i915_gem_domain.c
> >> +++ b/drivers/gpu/drm/i915/gem/i915_gem_domain.c
> >> @@ -456,6 +456,9 @@ i915_gem_object_pin_to_display_plane(struct
> >> drm_i915_gem_object *obj,
> >>       if (intel_scanout_needs_vtd_wa(i915)) {
> >>           unsigned int guard = VTD_GUARD;
> >> +        if (IS_METEORLAKE(i915))
> >> +            guard *= 200;
> >> +
> >
> > 200 * VTD_GUARD = 200 * 168 * 4K = 131MB
> >
> > Looks insanely high, 131MB for padding, if this is before and after it
> > becomes even 262MB of wasted address per plane. Just signalling, I do
> > not know if this actually hurts.
> 
> Yeah this feels crazy. There must be some other explanation which is
> getting hidden by the crazy amount of padding so I'd rather we figured
> it out.
> 
> With 262MiB per fb how many fit in GGTT before eviction hits? N screens
> with double/triple buffering?

I believe with this method we will have to limit the no of frame buffers in the system. One alternative
that worked is to do a proper clear range for the ggtt instead of doing a nop. Although it adds marginal
time during suspend/resume/boot it does not add restrictions to the no of fb's that can be used.
 
> 
> Regards,
> 
> Tvrtko
> 
> P.S. Where did the 185 from the commit message come from?
185 came from experiment to increase the guard size. It is not a standard number.

Regards,
Radhakrishna(RK) Sripada

Tvrtko Ursulin Nov. 3, 2023, 8:30 a.m. UTC | #6

On 02/11/2023 22:14, Sripada, Radhakrishna wrote:
> Hi Tvrtko,
> 
>> -----Original Message-----
>> From: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
>> Sent: Thursday, November 2, 2023 10:41 AM
>> To: Hajda, Andrzej <andrzej.hajda@intel.com>; Sripada, Radhakrishna
>> <radhakrishna.sripada@intel.com>; intel-gfx@lists.freedesktop.org
>> Cc: Chris Wilson <chris.p.wilson@linux.intel.com>
>> Subject: Re: [Intel-gfx] [PATCH] drm/i915/mtl: Increase guard pages when vt-d is
>> enabled
>>
>>
>> On 02/11/2023 16:58, Andrzej Hajda wrote:
>>> On 02.11.2023 17:06, Radhakrishna Sripada wrote:
>>>> Experiments were conducted with different multipliers to VTD_GUARD macro
>>>> with multiplier of 185 we were observing occasional pipe faults when
>>>> running kms_cursor_legacy --run-subtest single-bo
>>>>
>>>> There could possibly be an underlying issue that is being
>>>> investigated, for
>>>> now bump the guard pages for MTL.
>>>>
>>>> Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2017
>>>> Cc: Gustavo Sousa <gustavo.sousa@intel.com>
>>>> Cc: Chris Wilson <chris.p.wilson@linux.intel.com>
>>>> Signed-off-by: Radhakrishna Sripada <radhakrishna.sripada@intel.com>
>>>> ---
>>>>    drivers/gpu/drm/i915/gem/i915_gem_domain.c | 3 +++
>>>>    1 file changed, 3 insertions(+)
>>>>
>>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_domain.c
>>>> b/drivers/gpu/drm/i915/gem/i915_gem_domain.c
>>>> index 3770828f2eaf..b65f84c6bb3f 100644
>>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_domain.c
>>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_domain.c
>>>> @@ -456,6 +456,9 @@ i915_gem_object_pin_to_display_plane(struct
>>>> drm_i915_gem_object *obj,
>>>>        if (intel_scanout_needs_vtd_wa(i915)) {
>>>>            unsigned int guard = VTD_GUARD;
>>>> +        if (IS_METEORLAKE(i915))
>>>> +            guard *= 200;
>>>> +
>>>
>>> 200 * VTD_GUARD = 200 * 168 * 4K = 131MB
>>>
>>> Looks insanely high, 131MB for padding, if this is before and after it
>>> becomes even 262MB of wasted address per plane. Just signalling, I do
>>> not know if this actually hurts.
>>
>> Yeah this feels crazy. There must be some other explanation which is
>> getting hidden by the crazy amount of padding so I'd rather we figured
>> it out.
>>
>> With 262MiB per fb how many fit in GGTT before eviction hits? N screens
>> with double/triple buffering?
> 
> I believe with this method we will have to limit the no of frame buffers in the system. One alternative
> that worked is to do a proper clear range for the ggtt instead of doing a nop. Although it adds marginal
> time during suspend/resume/boot it does not add restrictions to the no of fb's that can be used.

And if we remember the guard pages replaced clearing to scratch, to 
improve suspend resume times, exactly for improving user experience. :(

Luckily there is time to fix this properly on MTL one way or the other. 
Is it just kms_cursor_legacy --run-subtest single-bo that is affected?

Regards,

Tvrtko


>>
>> Regards,
>>
>> Tvrtko
>>
>> P.S. Where did the 185 from the commit message come from?
> 185 came from experiment to increase the guard size. It is not a standard number.
> 
> Regards,
> Radhakrishna(RK) Sripada

Rodrigo Vivi Nov. 3, 2023, 3:35 p.m. UTC | #7

On Thu, Nov 02, 2023 at 01:35:53PM -0300, Gustavo Sousa wrote:
> Quoting Radhakrishna Sripada (2023-11-02 13:06:44-03:00)
> >Experiments were conducted with different multipliers to VTD_GUARD macro
> >with multiplier of 185 we were observing occasional pipe faults when
> >running kms_cursor_legacy --run-subtest single-bo
> >
> >There could possibly be an underlying issue that is being investigated, for
> >now bump the guard pages for MTL.
> >
> >Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2017
> >Cc: Gustavo Sousa <gustavo.sousa@intel.com>
> >Cc: Chris Wilson <chris.p.wilson@linux.intel.com>
> >Signed-off-by: Radhakrishna Sripada <radhakrishna.sripada@intel.com>
> >---
> > drivers/gpu/drm/i915/gem/i915_gem_domain.c | 3 +++
> > 1 file changed, 3 insertions(+)
> >
> >diff --git a/drivers/gpu/drm/i915/gem/i915_gem_domain.c b/drivers/gpu/drm/i915/gem/i915_gem_domain.c
> >index 3770828f2eaf..b65f84c6bb3f 100644
> >--- a/drivers/gpu/drm/i915/gem/i915_gem_domain.c
> >+++ b/drivers/gpu/drm/i915/gem/i915_gem_domain.c
> >@@ -456,6 +456,9 @@ i915_gem_object_pin_to_display_plane(struct drm_i915_gem_object *obj,
> >         if (intel_scanout_needs_vtd_wa(i915)) {
> >                 unsigned int guard = VTD_GUARD;
> > 
> 
> I remember trying increasing the guard, but with a much smaller multiplier. So
> it turns out that using a much higher value did the "trick".

a much smaller multiplier could mess with the flags range?
it is really hard to understand what of that 'flags' is really those 12 flags
or what is this 'guard' and where that ends up...

> 
> I would add a FIXME comment here to remind us that this is a hack.
> 
> With the FIXME in place,
> 
> Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com>
> 
> >+                if (IS_METEORLAKE(i915))
> >+                        guard *= 200;
> >+
> >                 if (i915_gem_object_is_tiled(obj))
> >                         guard = max(guard,
> >                                     i915_gem_object_get_tile_row_size(obj));
> >-- 
> >2.34.1
> >

Sripada, Radhakrishna Nov. 3, 2023, 3:53 p.m. UTC | #8

Hi Tvrtko,

> -----Original Message-----
> From: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
> Sent: Friday, November 3, 2023 1:30 AM
> To: Sripada, Radhakrishna <radhakrishna.sripada@intel.com>; Hajda, Andrzej
> <andrzej.hajda@intel.com>; intel-gfx@lists.freedesktop.org
> Cc: Chris Wilson <chris.p.wilson@linux.intel.com>
> Subject: Re: [Intel-gfx] [PATCH] drm/i915/mtl: Increase guard pages when vt-d is
> enabled
> 
> 
> On 02/11/2023 22:14, Sripada, Radhakrishna wrote:
> > Hi Tvrtko,
> >
> >> -----Original Message-----
> >> From: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
> >> Sent: Thursday, November 2, 2023 10:41 AM
> >> To: Hajda, Andrzej <andrzej.hajda@intel.com>; Sripada, Radhakrishna
> >> <radhakrishna.sripada@intel.com>; intel-gfx@lists.freedesktop.org
> >> Cc: Chris Wilson <chris.p.wilson@linux.intel.com>
> >> Subject: Re: [Intel-gfx] [PATCH] drm/i915/mtl: Increase guard pages when vt-d
> is
> >> enabled
> >>
> >>
> >> On 02/11/2023 16:58, Andrzej Hajda wrote:
> >>> On 02.11.2023 17:06, Radhakrishna Sripada wrote:
> >>>> Experiments were conducted with different multipliers to VTD_GUARD
> macro
> >>>> with multiplier of 185 we were observing occasional pipe faults when
> >>>> running kms_cursor_legacy --run-subtest single-bo
> >>>>
> >>>> There could possibly be an underlying issue that is being
> >>>> investigated, for
> >>>> now bump the guard pages for MTL.
> >>>>
> >>>> Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2017
> >>>> Cc: Gustavo Sousa <gustavo.sousa@intel.com>
> >>>> Cc: Chris Wilson <chris.p.wilson@linux.intel.com>
> >>>> Signed-off-by: Radhakrishna Sripada <radhakrishna.sripada@intel.com>
> >>>> ---
> >>>>    drivers/gpu/drm/i915/gem/i915_gem_domain.c | 3 +++
> >>>>    1 file changed, 3 insertions(+)
> >>>>
> >>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_domain.c
> >>>> b/drivers/gpu/drm/i915/gem/i915_gem_domain.c
> >>>> index 3770828f2eaf..b65f84c6bb3f 100644
> >>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_domain.c
> >>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_domain.c
> >>>> @@ -456,6 +456,9 @@ i915_gem_object_pin_to_display_plane(struct
> >>>> drm_i915_gem_object *obj,
> >>>>        if (intel_scanout_needs_vtd_wa(i915)) {
> >>>>            unsigned int guard = VTD_GUARD;
> >>>> +        if (IS_METEORLAKE(i915))
> >>>> +            guard *= 200;
> >>>> +
> >>>
> >>> 200 * VTD_GUARD = 200 * 168 * 4K = 131MB
> >>>
> >>> Looks insanely high, 131MB for padding, if this is before and after it
> >>> becomes even 262MB of wasted address per plane. Just signalling, I do
> >>> not know if this actually hurts.
> >>
> >> Yeah this feels crazy. There must be some other explanation which is
> >> getting hidden by the crazy amount of padding so I'd rather we figured
> >> it out.
> >>
> >> With 262MiB per fb how many fit in GGTT before eviction hits? N screens
> >> with double/triple buffering?
> >
> > I believe with this method we will have to limit the no of frame buffers in the
> system. One alternative
> > that worked is to do a proper clear range for the ggtt instead of doing a nop.
> Although it adds marginal
> > time during suspend/resume/boot it does not add restrictions to the no of fb's
> that can be used.
> 
> And if we remember the guard pages replaced clearing to scratch, to
> improve suspend resume times, exactly for improving user experience. :(
> 
> Luckily there is time to fix this properly on MTL one way or the other.
> Is it just kms_cursor_legacy --run-subtest single-bo that is affected?

I am trying to dump the page table entries at the time of failure for bot the fame buffer and if required
For the guard pages. Will see if I get some info from there.

Yes the test kms_cursor_legacy is used to reliably reproduce. Looking at public CI, I also see pipe errors
being reported with varying occurrences while running kms_cursor_crc, kms_pipe_crc_basic,
and kms_plane_scaling. More details on the occurrence can be found here [1].

Thanks,
RK

1. http://gfx-ci.igk.intel.com/cibuglog-ng/results/knownfailures?query_key=d9c3297dd17dda35a6c638eb96b3139bd1a6633c

> 
> Regards,
> 
> Tvrtko
> 
> 
> >>
> >> Regards,
> >>
> >> Tvrtko
> >>
> >> P.S. Where did the 185 from the commit message come from?
> > 185 came from experiment to increase the guard size. It is not a standard
> number.
> >
> > Regards,
> > Radhakrishna(RK) Sripada

Sripada, Radhakrishna Nov. 3, 2023, 5:38 p.m. UTC | #9

Hi Rodrigo,

> -----Original Message-----
> From: Vivi, Rodrigo <rodrigo.vivi@intel.com>
> Sent: Friday, November 3, 2023 8:35 AM
> To: Sousa, Gustavo <gustavo.sousa@intel.com>
> Cc: Sripada, Radhakrishna <radhakrishna.sripada@intel.com>; intel-
> gfx@lists.freedesktop.org; Chris Wilson <chris.p.wilson@linux.intel.com>
> Subject: Re: [Intel-gfx] [PATCH] drm/i915/mtl: Increase guard pages when vt-d is
> enabled
> 
> On Thu, Nov 02, 2023 at 01:35:53PM -0300, Gustavo Sousa wrote:
> > Quoting Radhakrishna Sripada (2023-11-02 13:06:44-03:00)
> > >Experiments were conducted with different multipliers to VTD_GUARD macro
> > >with multiplier of 185 we were observing occasional pipe faults when
> > >running kms_cursor_legacy --run-subtest single-bo
> > >
> > >There could possibly be an underlying issue that is being investigated, for
> > >now bump the guard pages for MTL.
> > >
> > >Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2017
> > >Cc: Gustavo Sousa <gustavo.sousa@intel.com>
> > >Cc: Chris Wilson <chris.p.wilson@linux.intel.com>
> > >Signed-off-by: Radhakrishna Sripada <radhakrishna.sripada@intel.com>
> > >---
> > > drivers/gpu/drm/i915/gem/i915_gem_domain.c | 3 +++
> > > 1 file changed, 3 insertions(+)
> > >
> > >diff --git a/drivers/gpu/drm/i915/gem/i915_gem_domain.c
> b/drivers/gpu/drm/i915/gem/i915_gem_domain.c
> > >index 3770828f2eaf..b65f84c6bb3f 100644
> > >--- a/drivers/gpu/drm/i915/gem/i915_gem_domain.c
> > >+++ b/drivers/gpu/drm/i915/gem/i915_gem_domain.c
> > >@@ -456,6 +456,9 @@ i915_gem_object_pin_to_display_plane(struct
> drm_i915_gem_object *obj,
> > >         if (intel_scanout_needs_vtd_wa(i915)) {
> > >                 unsigned int guard = VTD_GUARD;
> > >
> >
> > I remember trying increasing the guard, but with a much smaller multiplier. So
> > it turns out that using a much higher value did the "trick".
> 
> a much smaller multiplier could mess with the flags range?
> it is really hard to understand what of that 'flags' is really those 12 flags
> or what is this 'guard' and where that ends up...
Based on my glance, if the multiplier fits in 32 bits then it should work. But the
problem here pointed out by Tvrtko is that we are adding awful lot(262 mb per fb) of padding
in the gurad pages clobbering up the ggtt address pace. Enough(10 to 20) fb's created we
will fall into the realm of evictions.

Regards,
Radhakrishna(RK) Sripada

> 
> >
> > I would add a FIXME comment here to remind us that this is a hack.
> >
> > With the FIXME in place,
> >
> > Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com>
> >
> > >+                if (IS_METEORLAKE(i915))
> > >+                        guard *= 200;
> > >+
> > >                 if (i915_gem_object_is_tiled(obj))
> > >                         guard = max(guard,
> > >                                     i915_gem_object_get_tile_row_size(obj));
> > >--
> > >2.34.1
> > >

Andrzej Hajda Nov. 3, 2023, 9:18 p.m. UTC | #10

On 03.11.2023 16:53, Sripada, Radhakrishna wrote:
> Hi Tvrtko,
>
>> -----Original Message-----
>> From: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
>> Sent: Friday, November 3, 2023 1:30 AM
>> To: Sripada, Radhakrishna <radhakrishna.sripada@intel.com>; Hajda, Andrzej
>> <andrzej.hajda@intel.com>; intel-gfx@lists.freedesktop.org
>> Cc: Chris Wilson <chris.p.wilson@linux.intel.com>
>> Subject: Re: [Intel-gfx] [PATCH] drm/i915/mtl: Increase guard pages when vt-d is
>> enabled
>>
>>
>> On 02/11/2023 22:14, Sripada, Radhakrishna wrote:
>>> Hi Tvrtko,
>>>
>>>> -----Original Message-----
>>>> From: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
>>>> Sent: Thursday, November 2, 2023 10:41 AM
>>>> To: Hajda, Andrzej <andrzej.hajda@intel.com>; Sripada, Radhakrishna
>>>> <radhakrishna.sripada@intel.com>; intel-gfx@lists.freedesktop.org
>>>> Cc: Chris Wilson <chris.p.wilson@linux.intel.com>
>>>> Subject: Re: [Intel-gfx] [PATCH] drm/i915/mtl: Increase guard pages when vt-d
>> is
>>>> enabled
>>>>
>>>>
>>>> On 02/11/2023 16:58, Andrzej Hajda wrote:
>>>>> On 02.11.2023 17:06, Radhakrishna Sripada wrote:
>>>>>> Experiments were conducted with different multipliers to VTD_GUARD
>> macro
>>>>>> with multiplier of 185 we were observing occasional pipe faults when
>>>>>> running kms_cursor_legacy --run-subtest single-bo
>>>>>>
>>>>>> There could possibly be an underlying issue that is being
>>>>>> investigated, for
>>>>>> now bump the guard pages for MTL.
>>>>>>
>>>>>> Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2017
>>>>>> Cc: Gustavo Sousa <gustavo.sousa@intel.com>
>>>>>> Cc: Chris Wilson <chris.p.wilson@linux.intel.com>
>>>>>> Signed-off-by: Radhakrishna Sripada <radhakrishna.sripada@intel.com>
>>>>>> ---
>>>>>>     drivers/gpu/drm/i915/gem/i915_gem_domain.c | 3 +++
>>>>>>     1 file changed, 3 insertions(+)
>>>>>>
>>>>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_domain.c
>>>>>> b/drivers/gpu/drm/i915/gem/i915_gem_domain.c
>>>>>> index 3770828f2eaf..b65f84c6bb3f 100644
>>>>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_domain.c
>>>>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_domain.c
>>>>>> @@ -456,6 +456,9 @@ i915_gem_object_pin_to_display_plane(struct
>>>>>> drm_i915_gem_object *obj,
>>>>>>         if (intel_scanout_needs_vtd_wa(i915)) {
>>>>>>             unsigned int guard = VTD_GUARD;
>>>>>> +        if (IS_METEORLAKE(i915))
>>>>>> +            guard *= 200;
>>>>>> +
>>>>> 200 * VTD_GUARD = 200 * 168 * 4K = 131MB
>>>>>
>>>>> Looks insanely high, 131MB for padding, if this is before and after it
>>>>> becomes even 262MB of wasted address per plane. Just signalling, I do
>>>>> not know if this actually hurts.
>>>> Yeah this feels crazy. There must be some other explanation which is
>>>> getting hidden by the crazy amount of padding so I'd rather we figured
>>>> it out.
>>>>
>>>> With 262MiB per fb how many fit in GGTT before eviction hits? N screens
>>>> with double/triple buffering?
>>> I believe with this method we will have to limit the no of frame buffers in the
>> system. One alternative
>>> that worked is to do a proper clear range for the ggtt instead of doing a nop.
>> Although it adds marginal
>>> time during suspend/resume/boot it does not add restrictions to the no of fb's
>> that can be used.
>>
>> And if we remember the guard pages replaced clearing to scratch, to
>> improve suspend resume times, exactly for improving user experience. :(
>>
>> Luckily there is time to fix this properly on MTL one way or the other.
>> Is it just kms_cursor_legacy --run-subtest single-bo that is affected?
> I am trying to dump the page table entries at the time of failure for bot the fame buffer and if required
> For the guard pages. Will see if I get some info from there.
>
> Yes the test kms_cursor_legacy is used to reliably reproduce. Looking at public CI, I also see pipe errors
> being reported with varying occurrences while running kms_cursor_crc, kms_pipe_crc_basic,
> and kms_plane_scaling. More details on the occurrence can be found here [1].
>
> Thanks,
> RK
>
> 1. http://gfx-ci.igk.intel.com/cibuglog-ng/results/knownfailures?query_key=d9c3297dd17dda35a6c638eb96b3139bd1a6633c

Could you check if [1] helps?

[1]: https://patchwork.freedesktop.org/series/125926/

Regards
Andrzej

>> Regards,
>>
>> Tvrtko
>>
>>
>>>> Regards,
>>>>
>>>> Tvrtko
>>>>
>>>> P.S. Where did the 185 from the commit message come from?
>>> 185 came from experiment to increase the guard size. It is not a standard
>> number.
>>> Regards,
>>> Radhakrishna(RK) Sripada

Sripada, Radhakrishna Nov. 3, 2023, 10:23 p.m. UTC | #11

Hi Andrzej,

The patch mentioned below does not help with the issue.

Thanks,
RK

> -----Original Message-----
> From: Hajda, Andrzej <andrzej.hajda@intel.com>
> Sent: Friday, November 3, 2023 2:18 PM
> To: Sripada, Radhakrishna <radhakrishna.sripada@intel.com>; Tvrtko Ursulin
> <tvrtko.ursulin@linux.intel.com>; intel-gfx@lists.freedesktop.org
> Cc: Chris Wilson <chris.p.wilson@linux.intel.com>; Vivi, Rodrigo
> <rodrigo.vivi@intel.com>
> Subject: Re: [Intel-gfx] [PATCH] drm/i915/mtl: Increase guard pages when vt-d is
> enabled
> 
> 
> 
> On 03.11.2023 16:53, Sripada, Radhakrishna wrote:
> > Hi Tvrtko,
> >
> >> -----Original Message-----
> >> From: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
> >> Sent: Friday, November 3, 2023 1:30 AM
> >> To: Sripada, Radhakrishna <radhakrishna.sripada@intel.com>; Hajda, Andrzej
> >> <andrzej.hajda@intel.com>; intel-gfx@lists.freedesktop.org
> >> Cc: Chris Wilson <chris.p.wilson@linux.intel.com>
> >> Subject: Re: [Intel-gfx] [PATCH] drm/i915/mtl: Increase guard pages when vt-d
> is
> >> enabled
> >>
> >>
> >> On 02/11/2023 22:14, Sripada, Radhakrishna wrote:
> >>> Hi Tvrtko,
> >>>
> >>>> -----Original Message-----
> >>>> From: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
> >>>> Sent: Thursday, November 2, 2023 10:41 AM
> >>>> To: Hajda, Andrzej <andrzej.hajda@intel.com>; Sripada, Radhakrishna
> >>>> <radhakrishna.sripada@intel.com>; intel-gfx@lists.freedesktop.org
> >>>> Cc: Chris Wilson <chris.p.wilson@linux.intel.com>
> >>>> Subject: Re: [Intel-gfx] [PATCH] drm/i915/mtl: Increase guard pages when
> vt-d
> >> is
> >>>> enabled
> >>>>
> >>>>
> >>>> On 02/11/2023 16:58, Andrzej Hajda wrote:
> >>>>> On 02.11.2023 17:06, Radhakrishna Sripada wrote:
> >>>>>> Experiments were conducted with different multipliers to VTD_GUARD
> >> macro
> >>>>>> with multiplier of 185 we were observing occasional pipe faults when
> >>>>>> running kms_cursor_legacy --run-subtest single-bo
> >>>>>>
> >>>>>> There could possibly be an underlying issue that is being
> >>>>>> investigated, for
> >>>>>> now bump the guard pages for MTL.
> >>>>>>
> >>>>>> Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2017
> >>>>>> Cc: Gustavo Sousa <gustavo.sousa@intel.com>
> >>>>>> Cc: Chris Wilson <chris.p.wilson@linux.intel.com>
> >>>>>> Signed-off-by: Radhakrishna Sripada <radhakrishna.sripada@intel.com>
> >>>>>> ---
> >>>>>>     drivers/gpu/drm/i915/gem/i915_gem_domain.c | 3 +++
> >>>>>>     1 file changed, 3 insertions(+)
> >>>>>>
> >>>>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_domain.c
> >>>>>> b/drivers/gpu/drm/i915/gem/i915_gem_domain.c
> >>>>>> index 3770828f2eaf..b65f84c6bb3f 100644
> >>>>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_domain.c
> >>>>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_domain.c
> >>>>>> @@ -456,6 +456,9 @@ i915_gem_object_pin_to_display_plane(struct
> >>>>>> drm_i915_gem_object *obj,
> >>>>>>         if (intel_scanout_needs_vtd_wa(i915)) {
> >>>>>>             unsigned int guard = VTD_GUARD;
> >>>>>> +        if (IS_METEORLAKE(i915))
> >>>>>> +            guard *= 200;
> >>>>>> +
> >>>>> 200 * VTD_GUARD = 200 * 168 * 4K = 131MB
> >>>>>
> >>>>> Looks insanely high, 131MB for padding, if this is before and after it
> >>>>> becomes even 262MB of wasted address per plane. Just signalling, I do
> >>>>> not know if this actually hurts.
> >>>> Yeah this feels crazy. There must be some other explanation which is
> >>>> getting hidden by the crazy amount of padding so I'd rather we figured
> >>>> it out.
> >>>>
> >>>> With 262MiB per fb how many fit in GGTT before eviction hits? N screens
> >>>> with double/triple buffering?
> >>> I believe with this method we will have to limit the no of frame buffers in the
> >> system. One alternative
> >>> that worked is to do a proper clear range for the ggtt instead of doing a nop.
> >> Although it adds marginal
> >>> time during suspend/resume/boot it does not add restrictions to the no of
> fb's
> >> that can be used.
> >>
> >> And if we remember the guard pages replaced clearing to scratch, to
> >> improve suspend resume times, exactly for improving user experience. :(
> >>
> >> Luckily there is time to fix this properly on MTL one way or the other.
> >> Is it just kms_cursor_legacy --run-subtest single-bo that is affected?
> > I am trying to dump the page table entries at the time of failure for bot the fame
> buffer and if required
> > For the guard pages. Will see if I get some info from there.
> >
> > Yes the test kms_cursor_legacy is used to reliably reproduce. Looking at public
> CI, I also see pipe errors
> > being reported with varying occurrences while running kms_cursor_crc,
> kms_pipe_crc_basic,
> > and kms_plane_scaling. More details on the occurrence can be found here [1].
> >
> > Thanks,
> > RK
> >
> > 1. http://gfx-ci.igk.intel.com/cibuglog-
> ng/results/knownfailures?query_key=d9c3297dd17dda35a6c638eb96b3139bd1a
> 6633c
> 
> Could you check if [1] helps?
> 
> [1]: https://patchwork.freedesktop.org/series/125926/
> 
> Regards
> Andrzej
> 
> >> Regards,
> >>
> >> Tvrtko
> >>
> >>
> >>>> Regards,
> >>>>
> >>>> Tvrtko
> >>>>
> >>>> P.S. Where did the 185 from the commit message come from?
> >>> 185 came from experiment to increase the guard size. It is not a standard
> >> number.
> >>> Regards,
> >>> Radhakrishna(RK) Sripada

drm/i915/mtl: Increase guard pages when vt-d is enabled

Commit Message

Comments

Patch