diff mbox series

[v4] drm/i915/selftest/gsc: Ensure GSC Proxy init completes before selftests

Message ID 20230711220204.2085513-1-alan.previn.teres.alexis@intel.com (mailing list archive)
State New, archived
Headers show
Series [v4] drm/i915/selftest/gsc: Ensure GSC Proxy init completes before selftests | expand

Commit Message

Alan Previn July 11, 2023, 10:02 p.m. UTC
On MTL, if the GSC Proxy init flows haven't completed, submissions to the
GSC engine will fail. Those init flows are dependent on the mei's
gsc_proxy component that is loaded in parallel with i915 and a
worker that could potentially start after i915 driver init is done.

That said, all subsytems that access the GSC engine today does check
for such init flow completion before using the GSC engine. However,
selftests currently don't wait on anything before starting.

To fix this, add a waiter function at the start of __run_selftests
that waits for gsc-proxy init flows to complete.

Difference from prior versions:
   v4: - Remove generalized waiters function table framework (Tvrtko).
       - Remove mention of CI-framework-timeout from comments (Tvrtko).
   v3: - Rebase to latest drm-tip.
   v2: - Based on internal testing, increase the timeout for gsc-proxy
         specific case to 8 seconds.

Signed-off-by: Alan Previn <alan.previn.teres.alexis@intel.com>
---
 .../gpu/drm/i915/selftests/i915_selftest.c    | 25 +++++++++++++++++++
 1 file changed, 25 insertions(+)


base-commit: 01c4678ab6c623c621a1dea438133e39711291d4

Comments

Tvrtko Ursulin July 12, 2023, 9:19 a.m. UTC | #1
On 11/07/2023 23:02, Alan Previn wrote:
> On MTL, if the GSC Proxy init flows haven't completed, submissions to the
> GSC engine will fail. Those init flows are dependent on the mei's
> gsc_proxy component that is loaded in parallel with i915 and a
> worker that could potentially start after i915 driver init is done.
> 
> That said, all subsytems that access the GSC engine today does check
> for such init flow completion before using the GSC engine. However,
> selftests currently don't wait on anything before starting.
> 
> To fix this, add a waiter function at the start of __run_selftests
> that waits for gsc-proxy init flows to complete.
> 
> Difference from prior versions:
>     v4: - Remove generalized waiters function table framework (Tvrtko).
>         - Remove mention of CI-framework-timeout from comments (Tvrtko).
>     v3: - Rebase to latest drm-tip.
>     v2: - Based on internal testing, increase the timeout for gsc-proxy
>           specific case to 8 seconds.
> 
> Signed-off-by: Alan Previn <alan.previn.teres.alexis@intel.com>
> ---
>   .../gpu/drm/i915/selftests/i915_selftest.c    | 25 +++++++++++++++++++
>   1 file changed, 25 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/selftests/i915_selftest.c b/drivers/gpu/drm/i915/selftests/i915_selftest.c
> index 39da0fb0d6d2..bbfaaaeef505 100644
> --- a/drivers/gpu/drm/i915/selftests/i915_selftest.c
> +++ b/drivers/gpu/drm/i915/selftests/i915_selftest.c
> @@ -24,6 +24,8 @@
>   #include <linux/random.h>
>   
>   #include "gt/intel_gt_pm.h"
> +#include "gt/uc/intel_gsc_fw.h"
> +
>   #include "i915_driver.h"
>   #include "i915_drv.h"
>   #include "i915_selftest.h"
> @@ -127,6 +129,26 @@ static void set_default_test_all(struct selftest *st, unsigned int count)
>   		st[i].enabled = true;
>   }
>   
> +static void
> +__wait_gsc_proxy_completed(struct drm_i915_private *i915)
> +{
> +	bool need_to_wait = (IS_ENABLED(CONFIG_INTEL_MEI_GSC_PROXY) &&
> +			     i915->media_gt &&
> +			     HAS_ENGINE(i915->media_gt, GSC0) &&
> +			     intel_uc_fw_is_loadable(&i915->media_gt->uc.gsc.fw));
> +	/*
> +	 * The gsc proxy component depends on the kernel component driver load ordering
> +	 * and in corner cases (the first time after an IFWI flash), init-completion
> +	 * firmware flows take longer.
> +	 */
> +	unsigned long timeout_ms = 8000;
> +
> +	if (need_to_wait &&
> +	    (wait_for(intel_gsc_uc_fw_proxy_init_done(&i915->media_gt->uc.gsc, true),
> +	    timeout_ms)))
> +		pr_info(DRIVER_NAME "Timed out waiting for gsc_proxy_completion!\n");

Would it make sense to error out here? Or at least upgrade to pr_warn or 
something?

I didn't quite understand the points Daniele raised about engine loops 
and resets - in my mind GSC engine is this special thing exercised for 
highly specialized operations and not touched in random for_each_engine 
loop tests, but I also did not really look so might be totally wrong.

In any case, v4 reads clear - no confusing comments and not 
over-engineered so is acceptable to me.

Regards,

Tvrtko

P.S. Maybe the check *could* be moved to i915_live_selftests, where hw 
dependencies conceptually fit better, and maybe i915_perf_selftests 
would need it too then (?), but it is up to you.

Maybe even in the array selftests/i915_live_selftests.h if we could add 
a facility to make unskippable tests and add this one after the sanity 
check. Which would then achieve the same generalized thing you had in 
the previous version without needing to add a new array/loop.

> +}
> +
>   static int __run_selftests(const char *name,
>   			   struct selftest *st,
>   			   unsigned int count,
> @@ -134,6 +156,9 @@ static int __run_selftests(const char *name,
>   {
>   	int err = 0;
>   
> +	if (data)
> +		__wait_gsc_proxy_completed(data);
> +
>   	while (!i915_selftest.random_seed)
>   		i915_selftest.random_seed = get_random_u32();
>   
> 
> base-commit: 01c4678ab6c623c621a1dea438133e39711291d4
Alan Previn July 12, 2023, 5:49 p.m. UTC | #2
On Wed, 2023-07-12 at 10:19 +0100, Tvrtko Ursulin wrote:
> On 11/07/2023 23:02, Alan Previn wrote:
> > On MTL, if the GSC Proxy init flows haven't completed, submissions to the
> > GSC engine will fail. Those init flows are dependent on the mei's
> > gsc_proxy component that is loaded in parallel with i915 and a
> > worker that could potentially start after i915 driver init is done.
> > 
> > That said, all subsytems that access the GSC engine today does check
> > for such init flow completion before using the GSC engine. However,
> > selftests currently don't wait on anything before starting.
> > 
> > 
> > 
alan:snip

> > +	/*
> > +	 * The gsc proxy component depends on the kernel component driver load ordering
> > +	 * and in corner cases (the first time after an IFWI flash), init-completion
> > +	 * firmware flows take longer.
> > +	 */
> > +	unsigned long timeout_ms = 8000;
> > +
> > +	if (need_to_wait &&
> > +	    (wait_for(intel_gsc_uc_fw_proxy_init_done(&i915->media_gt->uc.gsc, true),
> > +	    timeout_ms)))
> > +		pr_info(DRIVER_NAME "Timed out waiting for gsc_proxy_completion!\n");
> 
> Would it make sense to error out here? Or at least upgrade to pr_warn or 
> something?
alan: agree on pr_warn (especially since need_for_wait being true and we tried for 8 secs - this should never happen).

> 
> I didn't quite understand the points Daniele raised about engine loops 
> and resets - in my mind GSC engine is this special thing exercised for 
> highly specialized operations and not touched in random for_each_engine 
> loop tests, but I also did not really look so might be totally wrong.

alan: after consulting with Daniele further, the concern in the case of
having gsc-proxy-init mid-execution while other selttests
are running (when thinking of tests that have nothing to do with GSC
but has indirect effect like memory-pressure, engine-resets,
etc) is that the proxy-init will bail-out print an error but
that would result in CI reporting a false-negative. We can't skip
that error since CONFIG_INTEL_MEI_GSC_PROXY tells us that the kernel
owner wants GSC-PROXY working.

> 
> In any case, v4 reads clear - no confusing comments and not 
> over-engineered so is acceptable to me.
> 
alan: thanks Tvrtko.


> P.S. Maybe the check *could* be moved to i915_live_selftests, where hw 
> dependencies conceptually fit better, and maybe i915_perf_selftests 
> would need it too then (?), but it is up to you.
alan: i can do this quickly as a rev5 (after a bit of manual check if perf needs it)

> 
> Maybe even in the array selftests/i915_live_selftests.h if we could add 
> a facility to make unskippable tests and add this one after the sanity 
> check. Which would then achieve the same generalized thing you had in 
> the previous version without needing to add a new array/loop.
alan: i rather not attempt this as part of the current patch but will
consider it as a separate patch if you are okay with that?
Tvrtko Ursulin July 13, 2023, 7:40 a.m. UTC | #3
On 12/07/2023 18:49, Teres Alexis, Alan Previn wrote:
> On Wed, 2023-07-12 at 10:19 +0100, Tvrtko Ursulin wrote:
>> On 11/07/2023 23:02, Alan Previn wrote:
>>> On MTL, if the GSC Proxy init flows haven't completed, submissions to the
>>> GSC engine will fail. Those init flows are dependent on the mei's
>>> gsc_proxy component that is loaded in parallel with i915 and a
>>> worker that could potentially start after i915 driver init is done.
>>>
>>> That said, all subsytems that access the GSC engine today does check
>>> for such init flow completion before using the GSC engine. However,
>>> selftests currently don't wait on anything before starting.
>>>
>>>
>>>
> alan:snip
> 
>>> +	/*
>>> +	 * The gsc proxy component depends on the kernel component driver load ordering
>>> +	 * and in corner cases (the first time after an IFWI flash), init-completion
>>> +	 * firmware flows take longer.
>>> +	 */
>>> +	unsigned long timeout_ms = 8000;
>>> +
>>> +	if (need_to_wait &&
>>> +	    (wait_for(intel_gsc_uc_fw_proxy_init_done(&i915->media_gt->uc.gsc, true),
>>> +	    timeout_ms)))
>>> +		pr_info(DRIVER_NAME "Timed out waiting for gsc_proxy_completion!\n");
>>
>> Would it make sense to error out here? Or at least upgrade to pr_warn or
>> something?
> alan: agree on pr_warn (especially since need_for_wait being true and we tried for 8 secs - this should never happen).
> 
>>
>> I didn't quite understand the points Daniele raised about engine loops
>> and resets - in my mind GSC engine is this special thing exercised for
>> highly specialized operations and not touched in random for_each_engine
>> loop tests, but I also did not really look so might be totally wrong.
> 
> alan: after consulting with Daniele further, the concern in the case of
> having gsc-proxy-init mid-execution while other selttests
> are running (when thinking of tests that have nothing to do with GSC
> but has indirect effect like memory-pressure, engine-resets,
> etc) is that the proxy-init will bail-out print an error but
> that would result in CI reporting a false-negative. We can't skip
> that error since CONFIG_INTEL_MEI_GSC_PROXY tells us that the kernel
> owner wants GSC-PROXY working.
> 
>>
>> In any case, v4 reads clear - no confusing comments and not
>> over-engineered so is acceptable to me.
>>
> alan: thanks Tvrtko.
> 
> 
>> P.S. Maybe the check *could* be moved to i915_live_selftests, where hw
>> dependencies conceptually fit better, and maybe i915_perf_selftests
>> would need it too then (?), but it is up to you.
> alan: i can do this quickly as a rev5 (after a bit of manual check if perf needs it)
> 
>>
>> Maybe even in the array selftests/i915_live_selftests.h if we could add
>> a facility to make unskippable tests and add this one after the sanity
>> check. Which would then achieve the same generalized thing you had in
>> the previous version without needing to add a new array/loop.
> alan: i rather not attempt this as part of the current patch but will
> consider it as a separate patch if you are okay with that?

Yes that is fine.

Regards,

Tvrtko
diff mbox series

Patch

diff --git a/drivers/gpu/drm/i915/selftests/i915_selftest.c b/drivers/gpu/drm/i915/selftests/i915_selftest.c
index 39da0fb0d6d2..bbfaaaeef505 100644
--- a/drivers/gpu/drm/i915/selftests/i915_selftest.c
+++ b/drivers/gpu/drm/i915/selftests/i915_selftest.c
@@ -24,6 +24,8 @@ 
 #include <linux/random.h>
 
 #include "gt/intel_gt_pm.h"
+#include "gt/uc/intel_gsc_fw.h"
+
 #include "i915_driver.h"
 #include "i915_drv.h"
 #include "i915_selftest.h"
@@ -127,6 +129,26 @@  static void set_default_test_all(struct selftest *st, unsigned int count)
 		st[i].enabled = true;
 }
 
+static void
+__wait_gsc_proxy_completed(struct drm_i915_private *i915)
+{
+	bool need_to_wait = (IS_ENABLED(CONFIG_INTEL_MEI_GSC_PROXY) &&
+			     i915->media_gt &&
+			     HAS_ENGINE(i915->media_gt, GSC0) &&
+			     intel_uc_fw_is_loadable(&i915->media_gt->uc.gsc.fw));
+	/*
+	 * The gsc proxy component depends on the kernel component driver load ordering
+	 * and in corner cases (the first time after an IFWI flash), init-completion
+	 * firmware flows take longer.
+	 */
+	unsigned long timeout_ms = 8000;
+
+	if (need_to_wait &&
+	    (wait_for(intel_gsc_uc_fw_proxy_init_done(&i915->media_gt->uc.gsc, true),
+	    timeout_ms)))
+		pr_info(DRIVER_NAME "Timed out waiting for gsc_proxy_completion!\n");
+}
+
 static int __run_selftests(const char *name,
 			   struct selftest *st,
 			   unsigned int count,
@@ -134,6 +156,9 @@  static int __run_selftests(const char *name,
 {
 	int err = 0;
 
+	if (data)
+		__wait_gsc_proxy_completed(data);
+
 	while (!i915_selftest.random_seed)
 		i915_selftest.random_seed = get_random_u32();