Message ID | 20231102160644.1279801-1-radhakrishna.sripada@intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | drm/i915/mtl: Increase guard pages when vt-d is enabled | expand |
Quoting Radhakrishna Sripada (2023-11-02 13:06:44-03:00) >Experiments were conducted with different multipliers to VTD_GUARD macro >with multiplier of 185 we were observing occasional pipe faults when >running kms_cursor_legacy --run-subtest single-bo > >There could possibly be an underlying issue that is being investigated, for >now bump the guard pages for MTL. > >Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2017 >Cc: Gustavo Sousa <gustavo.sousa@intel.com> >Cc: Chris Wilson <chris.p.wilson@linux.intel.com> >Signed-off-by: Radhakrishna Sripada <radhakrishna.sripada@intel.com> >--- > drivers/gpu/drm/i915/gem/i915_gem_domain.c | 3 +++ > 1 file changed, 3 insertions(+) > >diff --git a/drivers/gpu/drm/i915/gem/i915_gem_domain.c b/drivers/gpu/drm/i915/gem/i915_gem_domain.c >index 3770828f2eaf..b65f84c6bb3f 100644 >--- a/drivers/gpu/drm/i915/gem/i915_gem_domain.c >+++ b/drivers/gpu/drm/i915/gem/i915_gem_domain.c >@@ -456,6 +456,9 @@ i915_gem_object_pin_to_display_plane(struct drm_i915_gem_object *obj, > if (intel_scanout_needs_vtd_wa(i915)) { > unsigned int guard = VTD_GUARD; > I remember trying increasing the guard, but with a much smaller multiplier. So it turns out that using a much higher value did the "trick". I would add a FIXME comment here to remind us that this is a hack. With the FIXME in place, Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com> >+ if (IS_METEORLAKE(i915)) >+ guard *= 200; >+ > if (i915_gem_object_is_tiled(obj)) > guard = max(guard, > i915_gem_object_get_tile_row_size(obj)); >-- >2.34.1 >
On 02.11.2023 17:06, Radhakrishna Sripada wrote: > Experiments were conducted with different multipliers to VTD_GUARD macro > with multiplier of 185 we were observing occasional pipe faults when > running kms_cursor_legacy --run-subtest single-bo > > There could possibly be an underlying issue that is being investigated, for > now bump the guard pages for MTL. > > Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2017 > Cc: Gustavo Sousa <gustavo.sousa@intel.com> > Cc: Chris Wilson <chris.p.wilson@linux.intel.com> > Signed-off-by: Radhakrishna Sripada <radhakrishna.sripada@intel.com> > --- > drivers/gpu/drm/i915/gem/i915_gem_domain.c | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_domain.c b/drivers/gpu/drm/i915/gem/i915_gem_domain.c > index 3770828f2eaf..b65f84c6bb3f 100644 > --- a/drivers/gpu/drm/i915/gem/i915_gem_domain.c > +++ b/drivers/gpu/drm/i915/gem/i915_gem_domain.c > @@ -456,6 +456,9 @@ i915_gem_object_pin_to_display_plane(struct drm_i915_gem_object *obj, > if (intel_scanout_needs_vtd_wa(i915)) { > unsigned int guard = VTD_GUARD; > > + if (IS_METEORLAKE(i915)) > + guard *= 200; > + 200 * VTD_GUARD = 200 * 168 * 4K = 131MB Looks insanely high, 131MB for padding, if this is before and after it becomes even 262MB of wasted address per plane. Just signalling, I do not know if this actually hurts. Regards Andrzej > if (i915_gem_object_is_tiled(obj)) > guard = max(guard, > i915_gem_object_get_tile_row_size(obj));
On 02/11/2023 16:58, Andrzej Hajda wrote: > On 02.11.2023 17:06, Radhakrishna Sripada wrote: >> Experiments were conducted with different multipliers to VTD_GUARD macro >> with multiplier of 185 we were observing occasional pipe faults when >> running kms_cursor_legacy --run-subtest single-bo >> >> There could possibly be an underlying issue that is being >> investigated, for >> now bump the guard pages for MTL. >> >> Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2017 >> Cc: Gustavo Sousa <gustavo.sousa@intel.com> >> Cc: Chris Wilson <chris.p.wilson@linux.intel.com> >> Signed-off-by: Radhakrishna Sripada <radhakrishna.sripada@intel.com> >> --- >> drivers/gpu/drm/i915/gem/i915_gem_domain.c | 3 +++ >> 1 file changed, 3 insertions(+) >> >> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_domain.c >> b/drivers/gpu/drm/i915/gem/i915_gem_domain.c >> index 3770828f2eaf..b65f84c6bb3f 100644 >> --- a/drivers/gpu/drm/i915/gem/i915_gem_domain.c >> +++ b/drivers/gpu/drm/i915/gem/i915_gem_domain.c >> @@ -456,6 +456,9 @@ i915_gem_object_pin_to_display_plane(struct >> drm_i915_gem_object *obj, >> if (intel_scanout_needs_vtd_wa(i915)) { >> unsigned int guard = VTD_GUARD; >> + if (IS_METEORLAKE(i915)) >> + guard *= 200; >> + > > 200 * VTD_GUARD = 200 * 168 * 4K = 131MB > > Looks insanely high, 131MB for padding, if this is before and after it > becomes even 262MB of wasted address per plane. Just signalling, I do > not know if this actually hurts. Yeah this feels crazy. There must be some other explanation which is getting hidden by the crazy amount of padding so I'd rather we figured it out. With 262MiB per fb how many fit in GGTT before eviction hits? N screens with double/triple buffering? Regards, Tvrtko P.S. Where did the 185 from the commit message come from?
Quoting Gustavo Sousa (2023-11-02 13:35:53-03:00) >Quoting Radhakrishna Sripada (2023-11-02 13:06:44-03:00) >>Experiments were conducted with different multipliers to VTD_GUARD macro >>with multiplier of 185 we were observing occasional pipe faults when >>running kms_cursor_legacy --run-subtest single-bo >> >>There could possibly be an underlying issue that is being investigated, for >>now bump the guard pages for MTL. >> >>Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2017 >>Cc: Gustavo Sousa <gustavo.sousa@intel.com> >>Cc: Chris Wilson <chris.p.wilson@linux.intel.com> >>Signed-off-by: Radhakrishna Sripada <radhakrishna.sripada@intel.com> >>--- >> drivers/gpu/drm/i915/gem/i915_gem_domain.c | 3 +++ >> 1 file changed, 3 insertions(+) >> >>diff --git a/drivers/gpu/drm/i915/gem/i915_gem_domain.c b/drivers/gpu/drm/i915/gem/i915_gem_domain.c >>index 3770828f2eaf..b65f84c6bb3f 100644 >>--- a/drivers/gpu/drm/i915/gem/i915_gem_domain.c >>+++ b/drivers/gpu/drm/i915/gem/i915_gem_domain.c >>@@ -456,6 +456,9 @@ i915_gem_object_pin_to_display_plane(struct drm_i915_gem_object *obj, >> if (intel_scanout_needs_vtd_wa(i915)) { >> unsigned int guard = VTD_GUARD; >> > >I remember trying increasing the guard, but with a much smaller multiplier. So >it turns out that using a much higher value did the "trick". > >I would add a FIXME comment here to remind us that this is a hack. > >With the FIXME in place, > >Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com> Oops. I was too hasty on providing an r-b and did not really pay attention to the resulting size of the padding and its implications as highlighted by the other replies here. My bad, sorry about that. Please dismiss the r-b. -- Gustavo Sousa > >>+ if (IS_METEORLAKE(i915)) >>+ guard *= 200; >>+ >> if (i915_gem_object_is_tiled(obj)) >> guard = max(guard, >> i915_gem_object_get_tile_row_size(obj)); >>-- >>2.34.1 >>
Hi Tvrtko, > -----Original Message----- > From: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> > Sent: Thursday, November 2, 2023 10:41 AM > To: Hajda, Andrzej <andrzej.hajda@intel.com>; Sripada, Radhakrishna > <radhakrishna.sripada@intel.com>; intel-gfx@lists.freedesktop.org > Cc: Chris Wilson <chris.p.wilson@linux.intel.com> > Subject: Re: [Intel-gfx] [PATCH] drm/i915/mtl: Increase guard pages when vt-d is > enabled > > > On 02/11/2023 16:58, Andrzej Hajda wrote: > > On 02.11.2023 17:06, Radhakrishna Sripada wrote: > >> Experiments were conducted with different multipliers to VTD_GUARD macro > >> with multiplier of 185 we were observing occasional pipe faults when > >> running kms_cursor_legacy --run-subtest single-bo > >> > >> There could possibly be an underlying issue that is being > >> investigated, for > >> now bump the guard pages for MTL. > >> > >> Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2017 > >> Cc: Gustavo Sousa <gustavo.sousa@intel.com> > >> Cc: Chris Wilson <chris.p.wilson@linux.intel.com> > >> Signed-off-by: Radhakrishna Sripada <radhakrishna.sripada@intel.com> > >> --- > >> drivers/gpu/drm/i915/gem/i915_gem_domain.c | 3 +++ > >> 1 file changed, 3 insertions(+) > >> > >> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_domain.c > >> b/drivers/gpu/drm/i915/gem/i915_gem_domain.c > >> index 3770828f2eaf..b65f84c6bb3f 100644 > >> --- a/drivers/gpu/drm/i915/gem/i915_gem_domain.c > >> +++ b/drivers/gpu/drm/i915/gem/i915_gem_domain.c > >> @@ -456,6 +456,9 @@ i915_gem_object_pin_to_display_plane(struct > >> drm_i915_gem_object *obj, > >> if (intel_scanout_needs_vtd_wa(i915)) { > >> unsigned int guard = VTD_GUARD; > >> + if (IS_METEORLAKE(i915)) > >> + guard *= 200; > >> + > > > > 200 * VTD_GUARD = 200 * 168 * 4K = 131MB > > > > Looks insanely high, 131MB for padding, if this is before and after it > > becomes even 262MB of wasted address per plane. Just signalling, I do > > not know if this actually hurts. > > Yeah this feels crazy. There must be some other explanation which is > getting hidden by the crazy amount of padding so I'd rather we figured > it out. > > With 262MiB per fb how many fit in GGTT before eviction hits? N screens > with double/triple buffering? I believe with this method we will have to limit the no of frame buffers in the system. One alternative that worked is to do a proper clear range for the ggtt instead of doing a nop. Although it adds marginal time during suspend/resume/boot it does not add restrictions to the no of fb's that can be used. > > Regards, > > Tvrtko > > P.S. Where did the 185 from the commit message come from? 185 came from experiment to increase the guard size. It is not a standard number. Regards, Radhakrishna(RK) Sripada
On 02/11/2023 22:14, Sripada, Radhakrishna wrote: > Hi Tvrtko, > >> -----Original Message----- >> From: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> >> Sent: Thursday, November 2, 2023 10:41 AM >> To: Hajda, Andrzej <andrzej.hajda@intel.com>; Sripada, Radhakrishna >> <radhakrishna.sripada@intel.com>; intel-gfx@lists.freedesktop.org >> Cc: Chris Wilson <chris.p.wilson@linux.intel.com> >> Subject: Re: [Intel-gfx] [PATCH] drm/i915/mtl: Increase guard pages when vt-d is >> enabled >> >> >> On 02/11/2023 16:58, Andrzej Hajda wrote: >>> On 02.11.2023 17:06, Radhakrishna Sripada wrote: >>>> Experiments were conducted with different multipliers to VTD_GUARD macro >>>> with multiplier of 185 we were observing occasional pipe faults when >>>> running kms_cursor_legacy --run-subtest single-bo >>>> >>>> There could possibly be an underlying issue that is being >>>> investigated, for >>>> now bump the guard pages for MTL. >>>> >>>> Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2017 >>>> Cc: Gustavo Sousa <gustavo.sousa@intel.com> >>>> Cc: Chris Wilson <chris.p.wilson@linux.intel.com> >>>> Signed-off-by: Radhakrishna Sripada <radhakrishna.sripada@intel.com> >>>> --- >>>> drivers/gpu/drm/i915/gem/i915_gem_domain.c | 3 +++ >>>> 1 file changed, 3 insertions(+) >>>> >>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_domain.c >>>> b/drivers/gpu/drm/i915/gem/i915_gem_domain.c >>>> index 3770828f2eaf..b65f84c6bb3f 100644 >>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_domain.c >>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_domain.c >>>> @@ -456,6 +456,9 @@ i915_gem_object_pin_to_display_plane(struct >>>> drm_i915_gem_object *obj, >>>> if (intel_scanout_needs_vtd_wa(i915)) { >>>> unsigned int guard = VTD_GUARD; >>>> + if (IS_METEORLAKE(i915)) >>>> + guard *= 200; >>>> + >>> >>> 200 * VTD_GUARD = 200 * 168 * 4K = 131MB >>> >>> Looks insanely high, 131MB for padding, if this is before and after it >>> becomes even 262MB of wasted address per plane. Just signalling, I do >>> not know if this actually hurts. >> >> Yeah this feels crazy. There must be some other explanation which is >> getting hidden by the crazy amount of padding so I'd rather we figured >> it out. >> >> With 262MiB per fb how many fit in GGTT before eviction hits? N screens >> with double/triple buffering? > > I believe with this method we will have to limit the no of frame buffers in the system. One alternative > that worked is to do a proper clear range for the ggtt instead of doing a nop. Although it adds marginal > time during suspend/resume/boot it does not add restrictions to the no of fb's that can be used. And if we remember the guard pages replaced clearing to scratch, to improve suspend resume times, exactly for improving user experience. :( Luckily there is time to fix this properly on MTL one way or the other. Is it just kms_cursor_legacy --run-subtest single-bo that is affected? Regards, Tvrtko >> >> Regards, >> >> Tvrtko >> >> P.S. Where did the 185 from the commit message come from? > 185 came from experiment to increase the guard size. It is not a standard number. > > Regards, > Radhakrishna(RK) Sripada
On Thu, Nov 02, 2023 at 01:35:53PM -0300, Gustavo Sousa wrote: > Quoting Radhakrishna Sripada (2023-11-02 13:06:44-03:00) > >Experiments were conducted with different multipliers to VTD_GUARD macro > >with multiplier of 185 we were observing occasional pipe faults when > >running kms_cursor_legacy --run-subtest single-bo > > > >There could possibly be an underlying issue that is being investigated, for > >now bump the guard pages for MTL. > > > >Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2017 > >Cc: Gustavo Sousa <gustavo.sousa@intel.com> > >Cc: Chris Wilson <chris.p.wilson@linux.intel.com> > >Signed-off-by: Radhakrishna Sripada <radhakrishna.sripada@intel.com> > >--- > > drivers/gpu/drm/i915/gem/i915_gem_domain.c | 3 +++ > > 1 file changed, 3 insertions(+) > > > >diff --git a/drivers/gpu/drm/i915/gem/i915_gem_domain.c b/drivers/gpu/drm/i915/gem/i915_gem_domain.c > >index 3770828f2eaf..b65f84c6bb3f 100644 > >--- a/drivers/gpu/drm/i915/gem/i915_gem_domain.c > >+++ b/drivers/gpu/drm/i915/gem/i915_gem_domain.c > >@@ -456,6 +456,9 @@ i915_gem_object_pin_to_display_plane(struct drm_i915_gem_object *obj, > > if (intel_scanout_needs_vtd_wa(i915)) { > > unsigned int guard = VTD_GUARD; > > > > I remember trying increasing the guard, but with a much smaller multiplier. So > it turns out that using a much higher value did the "trick". a much smaller multiplier could mess with the flags range? it is really hard to understand what of that 'flags' is really those 12 flags or what is this 'guard' and where that ends up... > > I would add a FIXME comment here to remind us that this is a hack. > > With the FIXME in place, > > Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com> > > >+ if (IS_METEORLAKE(i915)) > >+ guard *= 200; > >+ > > if (i915_gem_object_is_tiled(obj)) > > guard = max(guard, > > i915_gem_object_get_tile_row_size(obj)); > >-- > >2.34.1 > >
Hi Tvrtko, > -----Original Message----- > From: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> > Sent: Friday, November 3, 2023 1:30 AM > To: Sripada, Radhakrishna <radhakrishna.sripada@intel.com>; Hajda, Andrzej > <andrzej.hajda@intel.com>; intel-gfx@lists.freedesktop.org > Cc: Chris Wilson <chris.p.wilson@linux.intel.com> > Subject: Re: [Intel-gfx] [PATCH] drm/i915/mtl: Increase guard pages when vt-d is > enabled > > > On 02/11/2023 22:14, Sripada, Radhakrishna wrote: > > Hi Tvrtko, > > > >> -----Original Message----- > >> From: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> > >> Sent: Thursday, November 2, 2023 10:41 AM > >> To: Hajda, Andrzej <andrzej.hajda@intel.com>; Sripada, Radhakrishna > >> <radhakrishna.sripada@intel.com>; intel-gfx@lists.freedesktop.org > >> Cc: Chris Wilson <chris.p.wilson@linux.intel.com> > >> Subject: Re: [Intel-gfx] [PATCH] drm/i915/mtl: Increase guard pages when vt-d > is > >> enabled > >> > >> > >> On 02/11/2023 16:58, Andrzej Hajda wrote: > >>> On 02.11.2023 17:06, Radhakrishna Sripada wrote: > >>>> Experiments were conducted with different multipliers to VTD_GUARD > macro > >>>> with multiplier of 185 we were observing occasional pipe faults when > >>>> running kms_cursor_legacy --run-subtest single-bo > >>>> > >>>> There could possibly be an underlying issue that is being > >>>> investigated, for > >>>> now bump the guard pages for MTL. > >>>> > >>>> Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2017 > >>>> Cc: Gustavo Sousa <gustavo.sousa@intel.com> > >>>> Cc: Chris Wilson <chris.p.wilson@linux.intel.com> > >>>> Signed-off-by: Radhakrishna Sripada <radhakrishna.sripada@intel.com> > >>>> --- > >>>> drivers/gpu/drm/i915/gem/i915_gem_domain.c | 3 +++ > >>>> 1 file changed, 3 insertions(+) > >>>> > >>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_domain.c > >>>> b/drivers/gpu/drm/i915/gem/i915_gem_domain.c > >>>> index 3770828f2eaf..b65f84c6bb3f 100644 > >>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_domain.c > >>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_domain.c > >>>> @@ -456,6 +456,9 @@ i915_gem_object_pin_to_display_plane(struct > >>>> drm_i915_gem_object *obj, > >>>> if (intel_scanout_needs_vtd_wa(i915)) { > >>>> unsigned int guard = VTD_GUARD; > >>>> + if (IS_METEORLAKE(i915)) > >>>> + guard *= 200; > >>>> + > >>> > >>> 200 * VTD_GUARD = 200 * 168 * 4K = 131MB > >>> > >>> Looks insanely high, 131MB for padding, if this is before and after it > >>> becomes even 262MB of wasted address per plane. Just signalling, I do > >>> not know if this actually hurts. > >> > >> Yeah this feels crazy. There must be some other explanation which is > >> getting hidden by the crazy amount of padding so I'd rather we figured > >> it out. > >> > >> With 262MiB per fb how many fit in GGTT before eviction hits? N screens > >> with double/triple buffering? > > > > I believe with this method we will have to limit the no of frame buffers in the > system. One alternative > > that worked is to do a proper clear range for the ggtt instead of doing a nop. > Although it adds marginal > > time during suspend/resume/boot it does not add restrictions to the no of fb's > that can be used. > > And if we remember the guard pages replaced clearing to scratch, to > improve suspend resume times, exactly for improving user experience. :( > > Luckily there is time to fix this properly on MTL one way or the other. > Is it just kms_cursor_legacy --run-subtest single-bo that is affected? I am trying to dump the page table entries at the time of failure for bot the fame buffer and if required For the guard pages. Will see if I get some info from there. Yes the test kms_cursor_legacy is used to reliably reproduce. Looking at public CI, I also see pipe errors being reported with varying occurrences while running kms_cursor_crc, kms_pipe_crc_basic, and kms_plane_scaling. More details on the occurrence can be found here [1]. Thanks, RK 1. http://gfx-ci.igk.intel.com/cibuglog-ng/results/knownfailures?query_key=d9c3297dd17dda35a6c638eb96b3139bd1a6633c > > Regards, > > Tvrtko > > > >> > >> Regards, > >> > >> Tvrtko > >> > >> P.S. Where did the 185 from the commit message come from? > > 185 came from experiment to increase the guard size. It is not a standard > number. > > > > Regards, > > Radhakrishna(RK) Sripada
Hi Rodrigo, > -----Original Message----- > From: Vivi, Rodrigo <rodrigo.vivi@intel.com> > Sent: Friday, November 3, 2023 8:35 AM > To: Sousa, Gustavo <gustavo.sousa@intel.com> > Cc: Sripada, Radhakrishna <radhakrishna.sripada@intel.com>; intel- > gfx@lists.freedesktop.org; Chris Wilson <chris.p.wilson@linux.intel.com> > Subject: Re: [Intel-gfx] [PATCH] drm/i915/mtl: Increase guard pages when vt-d is > enabled > > On Thu, Nov 02, 2023 at 01:35:53PM -0300, Gustavo Sousa wrote: > > Quoting Radhakrishna Sripada (2023-11-02 13:06:44-03:00) > > >Experiments were conducted with different multipliers to VTD_GUARD macro > > >with multiplier of 185 we were observing occasional pipe faults when > > >running kms_cursor_legacy --run-subtest single-bo > > > > > >There could possibly be an underlying issue that is being investigated, for > > >now bump the guard pages for MTL. > > > > > >Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2017 > > >Cc: Gustavo Sousa <gustavo.sousa@intel.com> > > >Cc: Chris Wilson <chris.p.wilson@linux.intel.com> > > >Signed-off-by: Radhakrishna Sripada <radhakrishna.sripada@intel.com> > > >--- > > > drivers/gpu/drm/i915/gem/i915_gem_domain.c | 3 +++ > > > 1 file changed, 3 insertions(+) > > > > > >diff --git a/drivers/gpu/drm/i915/gem/i915_gem_domain.c > b/drivers/gpu/drm/i915/gem/i915_gem_domain.c > > >index 3770828f2eaf..b65f84c6bb3f 100644 > > >--- a/drivers/gpu/drm/i915/gem/i915_gem_domain.c > > >+++ b/drivers/gpu/drm/i915/gem/i915_gem_domain.c > > >@@ -456,6 +456,9 @@ i915_gem_object_pin_to_display_plane(struct > drm_i915_gem_object *obj, > > > if (intel_scanout_needs_vtd_wa(i915)) { > > > unsigned int guard = VTD_GUARD; > > > > > > > I remember trying increasing the guard, but with a much smaller multiplier. So > > it turns out that using a much higher value did the "trick". > > a much smaller multiplier could mess with the flags range? > it is really hard to understand what of that 'flags' is really those 12 flags > or what is this 'guard' and where that ends up... Based on my glance, if the multiplier fits in 32 bits then it should work. But the problem here pointed out by Tvrtko is that we are adding awful lot(262 mb per fb) of padding in the gurad pages clobbering up the ggtt address pace. Enough(10 to 20) fb's created we will fall into the realm of evictions. Regards, Radhakrishna(RK) Sripada > > > > > I would add a FIXME comment here to remind us that this is a hack. > > > > With the FIXME in place, > > > > Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com> > > > > >+ if (IS_METEORLAKE(i915)) > > >+ guard *= 200; > > >+ > > > if (i915_gem_object_is_tiled(obj)) > > > guard = max(guard, > > > i915_gem_object_get_tile_row_size(obj)); > > >-- > > >2.34.1 > > >
On 03.11.2023 16:53, Sripada, Radhakrishna wrote: > Hi Tvrtko, > >> -----Original Message----- >> From: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> >> Sent: Friday, November 3, 2023 1:30 AM >> To: Sripada, Radhakrishna <radhakrishna.sripada@intel.com>; Hajda, Andrzej >> <andrzej.hajda@intel.com>; intel-gfx@lists.freedesktop.org >> Cc: Chris Wilson <chris.p.wilson@linux.intel.com> >> Subject: Re: [Intel-gfx] [PATCH] drm/i915/mtl: Increase guard pages when vt-d is >> enabled >> >> >> On 02/11/2023 22:14, Sripada, Radhakrishna wrote: >>> Hi Tvrtko, >>> >>>> -----Original Message----- >>>> From: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> >>>> Sent: Thursday, November 2, 2023 10:41 AM >>>> To: Hajda, Andrzej <andrzej.hajda@intel.com>; Sripada, Radhakrishna >>>> <radhakrishna.sripada@intel.com>; intel-gfx@lists.freedesktop.org >>>> Cc: Chris Wilson <chris.p.wilson@linux.intel.com> >>>> Subject: Re: [Intel-gfx] [PATCH] drm/i915/mtl: Increase guard pages when vt-d >> is >>>> enabled >>>> >>>> >>>> On 02/11/2023 16:58, Andrzej Hajda wrote: >>>>> On 02.11.2023 17:06, Radhakrishna Sripada wrote: >>>>>> Experiments were conducted with different multipliers to VTD_GUARD >> macro >>>>>> with multiplier of 185 we were observing occasional pipe faults when >>>>>> running kms_cursor_legacy --run-subtest single-bo >>>>>> >>>>>> There could possibly be an underlying issue that is being >>>>>> investigated, for >>>>>> now bump the guard pages for MTL. >>>>>> >>>>>> Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2017 >>>>>> Cc: Gustavo Sousa <gustavo.sousa@intel.com> >>>>>> Cc: Chris Wilson <chris.p.wilson@linux.intel.com> >>>>>> Signed-off-by: Radhakrishna Sripada <radhakrishna.sripada@intel.com> >>>>>> --- >>>>>> drivers/gpu/drm/i915/gem/i915_gem_domain.c | 3 +++ >>>>>> 1 file changed, 3 insertions(+) >>>>>> >>>>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_domain.c >>>>>> b/drivers/gpu/drm/i915/gem/i915_gem_domain.c >>>>>> index 3770828f2eaf..b65f84c6bb3f 100644 >>>>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_domain.c >>>>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_domain.c >>>>>> @@ -456,6 +456,9 @@ i915_gem_object_pin_to_display_plane(struct >>>>>> drm_i915_gem_object *obj, >>>>>> if (intel_scanout_needs_vtd_wa(i915)) { >>>>>> unsigned int guard = VTD_GUARD; >>>>>> + if (IS_METEORLAKE(i915)) >>>>>> + guard *= 200; >>>>>> + >>>>> 200 * VTD_GUARD = 200 * 168 * 4K = 131MB >>>>> >>>>> Looks insanely high, 131MB for padding, if this is before and after it >>>>> becomes even 262MB of wasted address per plane. Just signalling, I do >>>>> not know if this actually hurts. >>>> Yeah this feels crazy. There must be some other explanation which is >>>> getting hidden by the crazy amount of padding so I'd rather we figured >>>> it out. >>>> >>>> With 262MiB per fb how many fit in GGTT before eviction hits? N screens >>>> with double/triple buffering? >>> I believe with this method we will have to limit the no of frame buffers in the >> system. One alternative >>> that worked is to do a proper clear range for the ggtt instead of doing a nop. >> Although it adds marginal >>> time during suspend/resume/boot it does not add restrictions to the no of fb's >> that can be used. >> >> And if we remember the guard pages replaced clearing to scratch, to >> improve suspend resume times, exactly for improving user experience. :( >> >> Luckily there is time to fix this properly on MTL one way or the other. >> Is it just kms_cursor_legacy --run-subtest single-bo that is affected? > I am trying to dump the page table entries at the time of failure for bot the fame buffer and if required > For the guard pages. Will see if I get some info from there. > > Yes the test kms_cursor_legacy is used to reliably reproduce. Looking at public CI, I also see pipe errors > being reported with varying occurrences while running kms_cursor_crc, kms_pipe_crc_basic, > and kms_plane_scaling. More details on the occurrence can be found here [1]. > > Thanks, > RK > > 1. http://gfx-ci.igk.intel.com/cibuglog-ng/results/knownfailures?query_key=d9c3297dd17dda35a6c638eb96b3139bd1a6633c Could you check if [1] helps? [1]: https://patchwork.freedesktop.org/series/125926/ Regards Andrzej >> Regards, >> >> Tvrtko >> >> >>>> Regards, >>>> >>>> Tvrtko >>>> >>>> P.S. Where did the 185 from the commit message come from? >>> 185 came from experiment to increase the guard size. It is not a standard >> number. >>> Regards, >>> Radhakrishna(RK) Sripada
Hi Andrzej, The patch mentioned below does not help with the issue. Thanks, RK > -----Original Message----- > From: Hajda, Andrzej <andrzej.hajda@intel.com> > Sent: Friday, November 3, 2023 2:18 PM > To: Sripada, Radhakrishna <radhakrishna.sripada@intel.com>; Tvrtko Ursulin > <tvrtko.ursulin@linux.intel.com>; intel-gfx@lists.freedesktop.org > Cc: Chris Wilson <chris.p.wilson@linux.intel.com>; Vivi, Rodrigo > <rodrigo.vivi@intel.com> > Subject: Re: [Intel-gfx] [PATCH] drm/i915/mtl: Increase guard pages when vt-d is > enabled > > > > On 03.11.2023 16:53, Sripada, Radhakrishna wrote: > > Hi Tvrtko, > > > >> -----Original Message----- > >> From: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> > >> Sent: Friday, November 3, 2023 1:30 AM > >> To: Sripada, Radhakrishna <radhakrishna.sripada@intel.com>; Hajda, Andrzej > >> <andrzej.hajda@intel.com>; intel-gfx@lists.freedesktop.org > >> Cc: Chris Wilson <chris.p.wilson@linux.intel.com> > >> Subject: Re: [Intel-gfx] [PATCH] drm/i915/mtl: Increase guard pages when vt-d > is > >> enabled > >> > >> > >> On 02/11/2023 22:14, Sripada, Radhakrishna wrote: > >>> Hi Tvrtko, > >>> > >>>> -----Original Message----- > >>>> From: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> > >>>> Sent: Thursday, November 2, 2023 10:41 AM > >>>> To: Hajda, Andrzej <andrzej.hajda@intel.com>; Sripada, Radhakrishna > >>>> <radhakrishna.sripada@intel.com>; intel-gfx@lists.freedesktop.org > >>>> Cc: Chris Wilson <chris.p.wilson@linux.intel.com> > >>>> Subject: Re: [Intel-gfx] [PATCH] drm/i915/mtl: Increase guard pages when > vt-d > >> is > >>>> enabled > >>>> > >>>> > >>>> On 02/11/2023 16:58, Andrzej Hajda wrote: > >>>>> On 02.11.2023 17:06, Radhakrishna Sripada wrote: > >>>>>> Experiments were conducted with different multipliers to VTD_GUARD > >> macro > >>>>>> with multiplier of 185 we were observing occasional pipe faults when > >>>>>> running kms_cursor_legacy --run-subtest single-bo > >>>>>> > >>>>>> There could possibly be an underlying issue that is being > >>>>>> investigated, for > >>>>>> now bump the guard pages for MTL. > >>>>>> > >>>>>> Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2017 > >>>>>> Cc: Gustavo Sousa <gustavo.sousa@intel.com> > >>>>>> Cc: Chris Wilson <chris.p.wilson@linux.intel.com> > >>>>>> Signed-off-by: Radhakrishna Sripada <radhakrishna.sripada@intel.com> > >>>>>> --- > >>>>>> drivers/gpu/drm/i915/gem/i915_gem_domain.c | 3 +++ > >>>>>> 1 file changed, 3 insertions(+) > >>>>>> > >>>>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_domain.c > >>>>>> b/drivers/gpu/drm/i915/gem/i915_gem_domain.c > >>>>>> index 3770828f2eaf..b65f84c6bb3f 100644 > >>>>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_domain.c > >>>>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_domain.c > >>>>>> @@ -456,6 +456,9 @@ i915_gem_object_pin_to_display_plane(struct > >>>>>> drm_i915_gem_object *obj, > >>>>>> if (intel_scanout_needs_vtd_wa(i915)) { > >>>>>> unsigned int guard = VTD_GUARD; > >>>>>> + if (IS_METEORLAKE(i915)) > >>>>>> + guard *= 200; > >>>>>> + > >>>>> 200 * VTD_GUARD = 200 * 168 * 4K = 131MB > >>>>> > >>>>> Looks insanely high, 131MB for padding, if this is before and after it > >>>>> becomes even 262MB of wasted address per plane. Just signalling, I do > >>>>> not know if this actually hurts. > >>>> Yeah this feels crazy. There must be some other explanation which is > >>>> getting hidden by the crazy amount of padding so I'd rather we figured > >>>> it out. > >>>> > >>>> With 262MiB per fb how many fit in GGTT before eviction hits? N screens > >>>> with double/triple buffering? > >>> I believe with this method we will have to limit the no of frame buffers in the > >> system. One alternative > >>> that worked is to do a proper clear range for the ggtt instead of doing a nop. > >> Although it adds marginal > >>> time during suspend/resume/boot it does not add restrictions to the no of > fb's > >> that can be used. > >> > >> And if we remember the guard pages replaced clearing to scratch, to > >> improve suspend resume times, exactly for improving user experience. :( > >> > >> Luckily there is time to fix this properly on MTL one way or the other. > >> Is it just kms_cursor_legacy --run-subtest single-bo that is affected? > > I am trying to dump the page table entries at the time of failure for bot the fame > buffer and if required > > For the guard pages. Will see if I get some info from there. > > > > Yes the test kms_cursor_legacy is used to reliably reproduce. Looking at public > CI, I also see pipe errors > > being reported with varying occurrences while running kms_cursor_crc, > kms_pipe_crc_basic, > > and kms_plane_scaling. More details on the occurrence can be found here [1]. > > > > Thanks, > > RK > > > > 1. http://gfx-ci.igk.intel.com/cibuglog- > ng/results/knownfailures?query_key=d9c3297dd17dda35a6c638eb96b3139bd1a > 6633c > > Could you check if [1] helps? > > [1]: https://patchwork.freedesktop.org/series/125926/ > > Regards > Andrzej > > >> Regards, > >> > >> Tvrtko > >> > >> > >>>> Regards, > >>>> > >>>> Tvrtko > >>>> > >>>> P.S. Where did the 185 from the commit message come from? > >>> 185 came from experiment to increase the guard size. It is not a standard > >> number. > >>> Regards, > >>> Radhakrishna(RK) Sripada
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_domain.c b/drivers/gpu/drm/i915/gem/i915_gem_domain.c index 3770828f2eaf..b65f84c6bb3f 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_domain.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_domain.c @@ -456,6 +456,9 @@ i915_gem_object_pin_to_display_plane(struct drm_i915_gem_object *obj, if (intel_scanout_needs_vtd_wa(i915)) { unsigned int guard = VTD_GUARD; + if (IS_METEORLAKE(i915)) + guard *= 200; + if (i915_gem_object_is_tiled(obj)) guard = max(guard, i915_gem_object_get_tile_row_size(obj));
Experiments were conducted with different multipliers to VTD_GUARD macro with multiplier of 185 we were observing occasional pipe faults when running kms_cursor_legacy --run-subtest single-bo There could possibly be an underlying issue that is being investigated, for now bump the guard pages for MTL. Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2017 Cc: Gustavo Sousa <gustavo.sousa@intel.com> Cc: Chris Wilson <chris.p.wilson@linux.intel.com> Signed-off-by: Radhakrishna Sripada <radhakrishna.sripada@intel.com> --- drivers/gpu/drm/i915/gem/i915_gem_domain.c | 3 +++ 1 file changed, 3 insertions(+)