drm/i915/perf: Use GTT when saving/restoring engine GPR
diff mbox series

Message ID 20200709195837.4285-1-umesh.nerlige.ramappa@intel.com
State New
Headers show
Series
  • drm/i915/perf: Use GTT when saving/restoring engine GPR
Related show

Commit Message

Umesh Nerlige Ramappa July 9, 2020, 7:58 p.m. UTC
MI_STORE_REGISTER_MEM and MI_LOAD_REGISTER_MEM need to know which
translation to use when saving restoring the engine general purpose
registers to and from the GT scratch. Since GT scratch is mapped to
ggtt, we need to set an additional bit in the command to use GTT.

Fixes: daed3e44396d17 ("drm/i915/perf: implement active wait for noa configurations")
Suggested-by: Prathap Kumar Valsan <prathap.kumar.valsan@intel.com>
Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
---
 drivers/gpu/drm/i915/i915_perf.c | 1 +
 1 file changed, 1 insertion(+)

Comments

Chris Wilson July 9, 2020, 8:37 p.m. UTC | #1
Quoting Umesh Nerlige Ramappa (2020-07-09 20:58:37)
> MI_STORE_REGISTER_MEM and MI_LOAD_REGISTER_MEM need to know which
> translation to use when saving restoring the engine general purpose
> registers to and from the GT scratch. Since GT scratch is mapped to
> ggtt, we need to set an additional bit in the command to use GTT.
> 
> Fixes: daed3e44396d17 ("drm/i915/perf: implement active wait for noa configurations")
> Suggested-by: Prathap Kumar Valsan <prathap.kumar.valsan@intel.com>
> Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_perf.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
> index de69d430b1ed..c6f6370283cf 100644
> --- a/drivers/gpu/drm/i915/i915_perf.c
> +++ b/drivers/gpu/drm/i915/i915_perf.c
> @@ -1592,6 +1592,7 @@ static u32 *save_restore_register(struct i915_perf_stream *stream, u32 *cs,
>         u32 d;
>  
>         cmd = save ? MI_STORE_REGISTER_MEM : MI_LOAD_REGISTER_MEM;
> +       cmd |= MI_SRM_LRM_GLOBAL_GTT;

Indeed.

For bonus points, please write a selftest to verify that the user's GPR
are restored. Something as simple as live_noa_delay, but instead of
measuring the time; submit a request to write telltales into the GPR,
submit a request to run noa_wait; then readback the telltales from a
final request.

Now since the noa_wait is being run from the GGTT, the address space
selector should be reading from the GGTT. So I am actually curious as to
whether we have a bug or not.
-Chris
Lionel Landwerlin July 9, 2020, 8:40 p.m. UTC | #2
On 09/07/2020 22:58, Umesh Nerlige Ramappa wrote:
> MI_STORE_REGISTER_MEM and MI_LOAD_REGISTER_MEM need to know which
> translation to use when saving restoring the engine general purpose
> registers to and from the GT scratch. Since GT scratch is mapped to
> ggtt, we need to set an additional bit in the command to use GTT.
>
> Fixes: daed3e44396d17 ("drm/i915/perf: implement active wait for noa configurations")
> Suggested-by: Prathap Kumar Valsan <prathap.kumar.valsan@intel.com>
> Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>


Thanks a lot!

> ---
>   drivers/gpu/drm/i915/i915_perf.c | 1 +
>   1 file changed, 1 insertion(+)
>
> diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
> index de69d430b1ed..c6f6370283cf 100644
> --- a/drivers/gpu/drm/i915/i915_perf.c
> +++ b/drivers/gpu/drm/i915/i915_perf.c
> @@ -1592,6 +1592,7 @@ static u32 *save_restore_register(struct i915_perf_stream *stream, u32 *cs,
>   	u32 d;
>   
>   	cmd = save ? MI_STORE_REGISTER_MEM : MI_LOAD_REGISTER_MEM;
> +	cmd |= MI_SRM_LRM_GLOBAL_GTT;
>   	if (INTEL_GEN(stream->perf->i915) >= 8)
>   		cmd++;
>
Lionel Landwerlin July 9, 2020, 8:42 p.m. UTC | #3
On 09/07/2020 23:37, Chris Wilson wrote:
> Quoting Umesh Nerlige Ramappa (2020-07-09 20:58:37)
>> MI_STORE_REGISTER_MEM and MI_LOAD_REGISTER_MEM need to know which
>> translation to use when saving restoring the engine general purpose
>> registers to and from the GT scratch. Since GT scratch is mapped to
>> ggtt, we need to set an additional bit in the command to use GTT.
>>
>> Fixes: daed3e44396d17 ("drm/i915/perf: implement active wait for noa configurations")
>> Suggested-by: Prathap Kumar Valsan <prathap.kumar.valsan@intel.com>
>> Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
>> ---
>>   drivers/gpu/drm/i915/i915_perf.c | 1 +
>>   1 file changed, 1 insertion(+)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
>> index de69d430b1ed..c6f6370283cf 100644
>> --- a/drivers/gpu/drm/i915/i915_perf.c
>> +++ b/drivers/gpu/drm/i915/i915_perf.c
>> @@ -1592,6 +1592,7 @@ static u32 *save_restore_register(struct i915_perf_stream *stream, u32 *cs,
>>          u32 d;
>>   
>>          cmd = save ? MI_STORE_REGISTER_MEM : MI_LOAD_REGISTER_MEM;
>> +       cmd |= MI_SRM_LRM_GLOBAL_GTT;
> Indeed.
>
> For bonus points, please write a selftest to verify that the user's GPR
> are restored. Something as simple as live_noa_delay, but instead of
> measuring the time; submit a request to write telltales into the GPR,
> submit a request to run noa_wait; then readback the telltales from a
> final request.
>
> Now since the noa_wait is being run from the GGTT, the address space
> selector should be reading from the GGTT. So I am actually curious as to
> whether we have a bug or not.
> -Chris

I'm not super competent on the PPGTT setup, but I assumed this worked 
because we wrote into the ppgtt scratch page.

Potentially trashing an app VM space if it executes a reconfiguration.


-Lionel
Chris Wilson July 9, 2020, 8:46 p.m. UTC | #4
Quoting Lionel Landwerlin (2020-07-09 21:42:39)
> On 09/07/2020 23:37, Chris Wilson wrote:
> > Quoting Umesh Nerlige Ramappa (2020-07-09 20:58:37)
> >> MI_STORE_REGISTER_MEM and MI_LOAD_REGISTER_MEM need to know which
> >> translation to use when saving restoring the engine general purpose
> >> registers to and from the GT scratch. Since GT scratch is mapped to
> >> ggtt, we need to set an additional bit in the command to use GTT.
> >>
> >> Fixes: daed3e44396d17 ("drm/i915/perf: implement active wait for noa configurations")
> >> Suggested-by: Prathap Kumar Valsan <prathap.kumar.valsan@intel.com>
> >> Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
> >> ---
> >>   drivers/gpu/drm/i915/i915_perf.c | 1 +
> >>   1 file changed, 1 insertion(+)
> >>
> >> diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
> >> index de69d430b1ed..c6f6370283cf 100644
> >> --- a/drivers/gpu/drm/i915/i915_perf.c
> >> +++ b/drivers/gpu/drm/i915/i915_perf.c
> >> @@ -1592,6 +1592,7 @@ static u32 *save_restore_register(struct i915_perf_stream *stream, u32 *cs,
> >>          u32 d;
> >>   
> >>          cmd = save ? MI_STORE_REGISTER_MEM : MI_LOAD_REGISTER_MEM;
> >> +       cmd |= MI_SRM_LRM_GLOBAL_GTT;
> > Indeed.
> >
> > For bonus points, please write a selftest to verify that the user's GPR
> > are restored. Something as simple as live_noa_delay, but instead of
> > measuring the time; submit a request to write telltales into the GPR,
> > submit a request to run noa_wait; then readback the telltales from a
> > final request.
> >
> > Now since the noa_wait is being run from the GGTT, the address space
> > selector should be reading from the GGTT. So I am actually curious as to
> > whether we have a bug or not.
> > -Chris
> 
> I'm not super competent on the PPGTT setup, but I assumed this worked 
> because we wrote into the ppgtt scratch page.

It's not a STORE_INDEX, it's writing to an absolute address based on the
address space selector which is a combination of the batch address space
and the command bits. Certainly the batch itself is read from the GGTT,
but I can't off hand remember the rules for the various MI (whether they
could even access ppGTT when launched from GGTT).
-Chris
Lionel Landwerlin July 9, 2020, 8:50 p.m. UTC | #5
On 09/07/2020 23:46, Chris Wilson wrote:
> Quoting Lionel Landwerlin (2020-07-09 21:42:39)
>> On 09/07/2020 23:37, Chris Wilson wrote:
>>> Quoting Umesh Nerlige Ramappa (2020-07-09 20:58:37)
>>>> MI_STORE_REGISTER_MEM and MI_LOAD_REGISTER_MEM need to know which
>>>> translation to use when saving restoring the engine general purpose
>>>> registers to and from the GT scratch. Since GT scratch is mapped to
>>>> ggtt, we need to set an additional bit in the command to use GTT.
>>>>
>>>> Fixes: daed3e44396d17 ("drm/i915/perf: implement active wait for noa configurations")
>>>> Suggested-by: Prathap Kumar Valsan <prathap.kumar.valsan@intel.com>
>>>> Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
>>>> ---
>>>>    drivers/gpu/drm/i915/i915_perf.c | 1 +
>>>>    1 file changed, 1 insertion(+)
>>>>
>>>> diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
>>>> index de69d430b1ed..c6f6370283cf 100644
>>>> --- a/drivers/gpu/drm/i915/i915_perf.c
>>>> +++ b/drivers/gpu/drm/i915/i915_perf.c
>>>> @@ -1592,6 +1592,7 @@ static u32 *save_restore_register(struct i915_perf_stream *stream, u32 *cs,
>>>>           u32 d;
>>>>    
>>>>           cmd = save ? MI_STORE_REGISTER_MEM : MI_LOAD_REGISTER_MEM;
>>>> +       cmd |= MI_SRM_LRM_GLOBAL_GTT;
>>> Indeed.
>>>
>>> For bonus points, please write a selftest to verify that the user's GPR
>>> are restored. Something as simple as live_noa_delay, but instead of
>>> measuring the time; submit a request to write telltales into the GPR,
>>> submit a request to run noa_wait; then readback the telltales from a
>>> final request.
>>>
>>> Now since the noa_wait is being run from the GGTT, the address space
>>> selector should be reading from the GGTT. So I am actually curious as to
>>> whether we have a bug or not.
>>> -Chris
>> I'm not super competent on the PPGTT setup, but I assumed this worked
>> because we wrote into the ppgtt scratch page.
> It's not a STORE_INDEX, it's writing to an absolute address based on the
> address space selector which is a combination of the batch address space
> and the command bits. Certainly the batch itself is read from the GGTT,
> but I can't off hand remember the rules for the various MI (whether they
> could even access ppGTT when launched from GGTT).
> -Chris

My understanding was that from a GGTT batch you could read/write into PPGTT.

But not the other way around (obvisously).

I thought the kernel mapped a page throughout the entire PPGTT space to 
prevent pagefaults. Is that not the case?


-Lionel
Chris Wilson July 9, 2020, 9 p.m. UTC | #6
Quoting Lionel Landwerlin (2020-07-09 21:50:30)
> On 09/07/2020 23:46, Chris Wilson wrote:
> > Quoting Lionel Landwerlin (2020-07-09 21:42:39)
> >> On 09/07/2020 23:37, Chris Wilson wrote:
> >>> Quoting Umesh Nerlige Ramappa (2020-07-09 20:58:37)
> >>>> MI_STORE_REGISTER_MEM and MI_LOAD_REGISTER_MEM need to know which
> >>>> translation to use when saving restoring the engine general purpose
> >>>> registers to and from the GT scratch. Since GT scratch is mapped to
> >>>> ggtt, we need to set an additional bit in the command to use GTT.
> >>>>
> >>>> Fixes: daed3e44396d17 ("drm/i915/perf: implement active wait for noa configurations")
> >>>> Suggested-by: Prathap Kumar Valsan <prathap.kumar.valsan@intel.com>
> >>>> Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
> >>>> ---
> >>>>    drivers/gpu/drm/i915/i915_perf.c | 1 +
> >>>>    1 file changed, 1 insertion(+)
> >>>>
> >>>> diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
> >>>> index de69d430b1ed..c6f6370283cf 100644
> >>>> --- a/drivers/gpu/drm/i915/i915_perf.c
> >>>> +++ b/drivers/gpu/drm/i915/i915_perf.c
> >>>> @@ -1592,6 +1592,7 @@ static u32 *save_restore_register(struct i915_perf_stream *stream, u32 *cs,
> >>>>           u32 d;
> >>>>    
> >>>>           cmd = save ? MI_STORE_REGISTER_MEM : MI_LOAD_REGISTER_MEM;
> >>>> +       cmd |= MI_SRM_LRM_GLOBAL_GTT;
> >>> Indeed.
> >>>
> >>> For bonus points, please write a selftest to verify that the user's GPR
> >>> are restored. Something as simple as live_noa_delay, but instead of
> >>> measuring the time; submit a request to write telltales into the GPR,
> >>> submit a request to run noa_wait; then readback the telltales from a
> >>> final request.
> >>>
> >>> Now since the noa_wait is being run from the GGTT, the address space
> >>> selector should be reading from the GGTT. So I am actually curious as to
> >>> whether we have a bug or not.
> >>> -Chris
> >> I'm not super competent on the PPGTT setup, but I assumed this worked
> >> because we wrote into the ppgtt scratch page.
> > It's not a STORE_INDEX, it's writing to an absolute address based on the
> > address space selector which is a combination of the batch address space
> > and the command bits. Certainly the batch itself is read from the GGTT,
> > but I can't off hand remember the rules for the various MI (whether they
> > could even access ppGTT when launched from GGTT).
> > -Chris
> 
> My understanding was that from a GGTT batch you could read/write into PPGTT.
> 
> But not the other way around (obvisously).
> 
> I thought the kernel mapped a page throughout the entire PPGTT space to 
> prevent pagefaults. Is that not the case?

Yes, a readonly page, where supported.

Ah, now I understand you meant *that* scratch page as opposed to a
page we allocated for the purpose of saving per-context state like the
gt->scratch page.
-Chris
Chris Wilson July 10, 2020, 9:43 a.m. UTC | #7
Quoting Lionel Landwerlin (2020-07-09 21:40:50)
> On 09/07/2020 22:58, Umesh Nerlige Ramappa wrote:
> > MI_STORE_REGISTER_MEM and MI_LOAD_REGISTER_MEM need to know which
> > translation to use when saving restoring the engine general purpose
> > registers to and from the GT scratch. Since GT scratch is mapped to
> > ggtt, we need to set an additional bit in the command to use GTT.
> >
> > Fixes: daed3e44396d17 ("drm/i915/perf: implement active wait for noa configurations")
> > Suggested-by: Prathap Kumar Valsan <prathap.kumar.valsan@intel.com>
> > Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
> 
> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
> 
> 
> Thanks a lot!

Pushed, thanks for the fix.
-Chris

Patch
diff mbox series

diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index de69d430b1ed..c6f6370283cf 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -1592,6 +1592,7 @@  static u32 *save_restore_register(struct i915_perf_stream *stream, u32 *cs,
 	u32 d;
 
 	cmd = save ? MI_STORE_REGISTER_MEM : MI_LOAD_REGISTER_MEM;
+	cmd |= MI_SRM_LRM_GLOBAL_GTT;
 	if (INTEL_GEN(stream->perf->i915) >= 8)
 		cmd++;