diff mbox series

drm/i914/guc: Fix resume on platforms w/o GuC submission but enabled

Message ID 20191024162958.11839-1-don.hiatt@intel.com (mailing list archive)
State New, archived
Series drm/i914/guc: Fix resume on platforms w/o GuC submission but enabled

Commit Message

Hiatt, Don Oct. 24, 2019, 4:29 p.m. UTC
From: Don Hiatt <don.hiatt@intel.com>

Check to see if GuC submission is enabled before requesting the
EXIT_S_STATE action.

On some platforms (e.g. KBL) that do not support GuC submission, but
where the user has enabled GuC communication (e.g. for HuC
authentication), calling the GuC EXIT_S_STATE action results in the
loss of the ability to enter RC6. Guard against this by only requesting
the GuC action on platforms that support GuC submission.

I've verified that intel_guc_resume() only gets called when the driver
is loaded with enable_guc={1,2,3}; in all other cases (no args,
enable_guc={0,-1}) intel_guc_resume() is not called.

Signed-off-by: Don Hiatt <don.hiatt@intel.com>
---
 drivers/gpu/drm/i915/gt/uc/intel_guc.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)
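
The load options listed in the commit message correspond to the i915
enable_guc module parameter, which upstream documents as a bitmask
(1 = GuC submission, 2 = HuC load, -1 = auto, 0 = disabled). The sketch
below only illustrates how those values decode. The helper names and the
platform_default argument are hypothetical and are not taken from the
driver.

```c
#include <assert.h>
#include <stdbool.h>

/* Bit meanings as documented for the upstream enable_guc parameter. */
#define ENABLE_GUC_SUBMISSION (1 << 0)
#define ENABLE_GUC_LOAD_HUC   (1 << 1)

/*
 * Hypothetical helper: resolve -1 ("auto") to a per-platform default.
 * The real driver picks the default internally; here the caller passes it.
 */
static int resolve_enable_guc(int param, int platform_default)
{
	assert(platform_default >= 0);
	return param < 0 ? platform_default : param;
}

/* True when the user asked for GuC submission (bit 0). */
static bool wants_guc_submission(int mask)
{
	return (mask & ENABLE_GUC_SUBMISSION) != 0;
}

/* True when the user asked for HuC loading/authentication (bit 1). */
static bool wants_huc_load(int mask)
{
	return (mask & ENABLE_GUC_LOAD_HUC) != 0;
}

/* The GuC is brought up whenever either bit is set. */
static bool guc_in_use(int mask)
{
	return mask != 0;
}
```

With enable_guc=2 the GuC is in use for HuC authentication while
submission is not requested, which is the configuration the commit
message describes for KBL.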

Comments

Summers, Stuart Oct. 24, 2019, 6:27 p.m. UTC | #1
On Thu, 2019-10-24 at 09:29 -0700, don.hiatt@intel.com wrote:
> From: Don Hiatt <don.hiatt@intel.com>
> 
> Check to see if GuC submission is enabled before requesting the
> EXIT_S_STATE action.
> 
> On some platforms (e.g. KBL) that do not support GuC submission, but
> the user enabled the GuC communication (e.g for HuC authentication)
> calling the GuC EXIT_S_STATE action results in lose of ability to
> enter RC6. Guard against this by only requesting the GuC action on
> platforms that support GuC submission.
> 
> I've verfied that intel_guc_resume() only gets called when driver
> is loaded with: guc_enable={1,2,3}, all other cases (no args,
> guc_enable={0,-1} the intel_guc_resume() is not called.
> 
> Signed-off-by: Don Hiatt <don.hiatt@intel.com>
> ---
>  drivers/gpu/drm/i915/gt/uc/intel_guc.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.c
> b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
> index 37f7bcbf7dac..33318ed135c0 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
> @@ -565,7 +565,10 @@ int intel_guc_resume(struct intel_guc *guc)
>  		GUC_POWER_D0,
>  	};
>  
> -	return intel_guc_send(guc, action, ARRAY_SIZE(action));
> +	if (guc->submission_supported)

Hey Don,

I might be missing something here, but glancing over the code for
submission_supported, it looks like this relies on the availability of
the firmware for the intended platform. Looking at the GuC table for
KBL, I do see this present (using KBL per your commit above). So
wouldn't this return true here if enable_guc is set to 1 or 3?

Thanks,
Stuart

> +		return intel_guc_send(guc, action, ARRAY_SIZE(action));
> +
> +	return 0;
>  }
>  
>  /**
Hiatt, Don Oct. 24, 2019, 8:10 p.m. UTC | #2
> On Thu, 2019-10-24 at 09:29 -0700, don.hiatt@intel.com wrote:
> > From: Don Hiatt <don.hiatt@intel.com>
> >
> > Check to see if GuC submission is enabled before requesting the
> > EXIT_S_STATE action.
> >
> > On some platforms (e.g. KBL) that do not support GuC submission, but
> > the user enabled the GuC communication (e.g for HuC authentication)
> > calling the GuC EXIT_S_STATE action results in lose of ability to
> > enter RC6. Guard against this by only requesting the GuC action on
> > platforms that support GuC submission.
> >
> > I've verfied that intel_guc_resume() only gets called when driver
> > is loaded with: guc_enable={1,2,3}, all other cases (no args,
> > guc_enable={0,-1} the intel_guc_resume() is not called.
> >
> > Signed-off-by: Don Hiatt <don.hiatt@intel.com>
> > ---
> >  drivers/gpu/drm/i915/gt/uc/intel_guc.c | 5 ++++-
> >  1 file changed, 4 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.c
> > b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
> > index 37f7bcbf7dac..33318ed135c0 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.c
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
> > @@ -565,7 +565,10 @@ int intel_guc_resume(struct intel_guc *guc)
> >  		GUC_POWER_D0,
> >  	};
> >
> > -	return intel_guc_send(guc, action, ARRAY_SIZE(action));
> > +	if (guc->submission_supported)
> 
> Hey Don,
> 
> I might be missing something here, but glancing over the code for
> submission_supported, it looks like this relies on the availability of
> the firmware for the intended platform. Looking at the GuC table for
> KBL, I do see this present (using KBL per your commit above). So
> wouldn't this return true here if enable_guc is set to 1 or 3?
> 
> Thanks,
> Stuart

Hi Stuart,

KBL does not support GuC submission, just HuC authentication. I've instrumented
the code and verified that guc->submission_supported is always false when enable_guc
is set on KBL.

Thanks,

don

> 
> > +		return intel_guc_send(guc, action, ARRAY_SIZE(action));
> > +
> > +	return 0;
> >  }
> >
> >  /**
Daniele Ceraolo Spurio Oct. 28, 2019, 4:44 p.m. UTC | #3
On 10/24/19 9:29 AM, don.hiatt@intel.com wrote:
> From: Don Hiatt <don.hiatt@intel.com>
> 
> Check to see if GuC submission is enabled before requesting the
> EXIT_S_STATE action.
> 

You're only skipping the resume, but does it make any sense to do the 
suspend action if we're not going to call the resume one? Does guc do 
anything in the suspend action that we still require? I thought it only 
saved the submission status, which we don't care about if guc submission 
is disabled.

Daniele

> On some platforms (e.g. KBL) that do not support GuC submission, but
> the user enabled the GuC communication (e.g for HuC authentication)
> calling the GuC EXIT_S_STATE action results in lose of ability to
> enter RC6. Guard against this by only requesting the GuC action on
> platforms that support GuC submission.
> 
> I've verfied that intel_guc_resume() only gets called when driver
> is loaded with: guc_enable={1,2,3}, all other cases (no args,
> guc_enable={0,-1} the intel_guc_resume() is not called.
> 
> Signed-off-by: Don Hiatt <don.hiatt@intel.com>
> ---
>   drivers/gpu/drm/i915/gt/uc/intel_guc.c | 5 ++++-
>   1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.c b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
> index 37f7bcbf7dac..33318ed135c0 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
> @@ -565,7 +565,10 @@ int intel_guc_resume(struct intel_guc *guc)
>   		GUC_POWER_D0,
>   	};
>   
> -	return intel_guc_send(guc, action, ARRAY_SIZE(action));
> +	if (guc->submission_supported)
> +		return intel_guc_send(guc, action, ARRAY_SIZE(action));
> +
> +	return 0;
>   }
>   
>   /**
>
Hiatt, Don Oct. 28, 2019, 6:17 p.m. UTC | #4
> From: Ceraolo Spurio, Daniele <daniele.ceraolospurio@intel.com>
> Sent: Monday, October 28, 2019 9:44 AM
> To: Hiatt, Don <don.hiatt@intel.com>; intel-gfx@lists.freedesktop.org
> Subject: Re: [Intel-gfx] [PATCH] drm/i914/guc: Fix resume on platforms w/o GuC
> submission but enabled
> 
> 
> 
> On 10/24/19 9:29 AM, don.hiatt@intel.com wrote:
> > From: Don Hiatt <don.hiatt@intel.com>
> >
> > Check to see if GuC submission is enabled before requesting the
> > EXIT_S_STATE action.
> >
> 
> You're only skipping the resume, but does it make any sense to do the
> suspend action if we're not going to call the resume one? Does guc do
> anything in the suspend action that we still require? I thought it only
> saved the submission status, which we don't care about if guc submission
> is disabled.
> 
> Daniele
> 

Hi Daniele,

I tried skipping the suspend altogether, but then the HuC gets timeouts
waiting for the GuC to acknowledge the authentication request, which leads to a
wedged GPU. ☹

BTW, I made a typo in the patch, should be 'drm/i915' not '914', I'll fix that
up.

Thanks,

don


> > On some platforms (e.g. KBL) that do not support GuC submission, but
> > the user enabled the GuC communication (e.g for HuC authentication)
> > calling the GuC EXIT_S_STATE action results in lose of ability to
> > enter RC6. Guard against this by only requesting the GuC action on
> > platforms that support GuC submission.
> >
> > I've verfied that intel_guc_resume() only gets called when driver
> > is loaded with: guc_enable={1,2,3}, all other cases (no args,
> > guc_enable={0,-1} the intel_guc_resume() is not called.
> >
> > Signed-off-by: Don Hiatt <don.hiatt@intel.com>
> > ---
> >   drivers/gpu/drm/i915/gt/uc/intel_guc.c | 5 ++++-
> >   1 file changed, 4 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.c
> b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
> > index 37f7bcbf7dac..33318ed135c0 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.c
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
> > @@ -565,7 +565,10 @@ int intel_guc_resume(struct intel_guc *guc)
> >   		GUC_POWER_D0,
> >   	};
> >
> > -	return intel_guc_send(guc, action, ARRAY_SIZE(action));
> > +	if (guc->submission_supported)
> > +		return intel_guc_send(guc, action, ARRAY_SIZE(action));
> > +
> > +	return 0;
> >   }
> >
> >   /**
> >
Daniele Ceraolo Spurio Oct. 28, 2019, 6:30 p.m. UTC | #5
On 10/28/19 11:17 AM, Hiatt, Don wrote:
>> From: Ceraolo Spurio, Daniele <daniele.ceraolospurio@intel.com>
>> Sent: Monday, October 28, 2019 9:44 AM
>> To: Hiatt, Don <don.hiatt@intel.com>; intel-gfx@lists.freedesktop.org
>> Subject: Re: [Intel-gfx] [PATCH] drm/i914/guc: Fix resume on platforms w/o GuC
>> submission but enabled
>>
>>
>>
>> On 10/24/19 9:29 AM, don.hiatt@intel.com wrote:
>>> From: Don Hiatt <don.hiatt@intel.com>
>>>
>>> Check to see if GuC submission is enabled before requesting the
>>> EXIT_S_STATE action.
>>>
>>
>> You're only skipping the resume, but does it make any sense to do the
>> suspend action if we're not going to call the resume one? Does guc do
>> anything in the suspend action that we still require? I thought it only
>> saved the submission status, which we don't care about if guc submission
>> is disabled.
>>
>> Daniele
>>
> 
> Hi Daniele,
> 
> I tried skipping the suspend all together but then the HuC gets timeouts
> waiting for the GuC to acknowledge the authentication request which leads to a
> wedged GPU. ☹
> 

Do we know why? If we skip the suspend/resume H2G and reload the blobs 
after resetting the HW it should look like a clean boot from the HW 
perspective, so the fact that HuC auth times out feels weird and might 
hide other issues. I asked one of the guc devs and he also thinks this 
is not expected behavior. Can you dig a bit more?

Thanks,
Daniele

> BTW, I made a typo in the patch, should be 'drm/i915' not '914', I'll fix that
> up.
> 
> Thanks,
> 
> don
> 
> 
>>> On some platforms (e.g. KBL) that do not support GuC submission, but
>>> the user enabled the GuC communication (e.g for HuC authentication)
>>> calling the GuC EXIT_S_STATE action results in lose of ability to
>>> enter RC6. Guard against this by only requesting the GuC action on
>>> platforms that support GuC submission.
>>>
>>> I've verfied that intel_guc_resume() only gets called when driver
>>> is loaded with: guc_enable={1,2,3}, all other cases (no args,
>>> guc_enable={0,-1} the intel_guc_resume() is not called.
>>>
>>> Signed-off-by: Don Hiatt <don.hiatt@intel.com>
>>> ---
>>>    drivers/gpu/drm/i915/gt/uc/intel_guc.c | 5 ++++-
>>>    1 file changed, 4 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.c
>> b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
>>> index 37f7bcbf7dac..33318ed135c0 100644
>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.c
>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
>>> @@ -565,7 +565,10 @@ int intel_guc_resume(struct intel_guc *guc)
>>>    		GUC_POWER_D0,
>>>    	};
>>>
>>> -	return intel_guc_send(guc, action, ARRAY_SIZE(action));
>>> +	if (guc->submission_supported)
>>> +		return intel_guc_send(guc, action, ARRAY_SIZE(action));
>>> +
>>> +	return 0;
>>>    }
>>>
>>>    /**
>>>
Hiatt, Don Oct. 28, 2019, 8:04 p.m. UTC | #6
> From: Ceraolo Spurio, Daniele <daniele.ceraolospurio@intel.com>
> Sent: Monday, October 28, 2019 11:30 AM
> To: Hiatt, Don <don.hiatt@intel.com>; intel-gfx@lists.freedesktop.org
> Subject: Re: [Intel-gfx] [PATCH] drm/i914/guc: Fix resume on platforms w/o GuC
> submission but enabled
> 
> 
> 
> On 10/28/19 11:17 AM, Hiatt, Don wrote:
> >> From: Ceraolo Spurio, Daniele <daniele.ceraolospurio@intel.com>
> >> Sent: Monday, October 28, 2019 9:44 AM
> >> To: Hiatt, Don <don.hiatt@intel.com>; intel-gfx@lists.freedesktop.org
> >> Subject: Re: [Intel-gfx] [PATCH] drm/i914/guc: Fix resume on platforms w/o
> GuC
> >> submission but enabled
> >>
> >>
> >>
> >> On 10/24/19 9:29 AM, don.hiatt@intel.com wrote:
> >>> From: Don Hiatt <don.hiatt@intel.com>
> >>>
> >>> Check to see if GuC submission is enabled before requesting the
> >>> EXIT_S_STATE action.
> >>>
> >>
> >> You're only skipping the resume, but does it make any sense to do the
> >> suspend action if we're not going to call the resume one? Does guc do
> >> anything in the suspend action that we still require? I thought it only
> >> saved the submission status, which we don't care about if guc submission
> >> is disabled.
> >>
> >> Daniele
> >>
> >
> > Hi Daniele,
> >
> > I tried skipping the suspend all together but then the HuC gets timeouts
> > waiting for the GuC to acknowledge the authentication request which leads to
> a
> > wedged GPU. ☹
> >
> 
> Do we know why? if we skip the suspend/resume H2G and reload the blobs
> after resetting the HW it should look like a clean boot from the HW
> perspective, so the fact that HuC auth times out feels weird and might
> hide other issues. I asked one of the guc devs and he also thinks this
> is not expected behavior. Can you dig a bit more?
> 
> Thanks,
> Daniele
> 

No idea why but I'll do some digging and see what I find.

Thanks!

don

> > BTW, I made a typo in the patch, should be 'drm/i915' not '914', I'll fix that
> > up.
> >
> > Thanks,
> >
> > don
> >
> >
> >>> On some platforms (e.g. KBL) that do not support GuC submission, but
> >>> the user enabled the GuC communication (e.g for HuC authentication)
> >>> calling the GuC EXIT_S_STATE action results in lose of ability to
> >>> enter RC6. Guard against this by only requesting the GuC action on
> >>> platforms that support GuC submission.
> >>>
> >>> I've verfied that intel_guc_resume() only gets called when driver
> >>> is loaded with: guc_enable={1,2,3}, all other cases (no args,
> >>> guc_enable={0,-1} the intel_guc_resume() is not called.
> >>>
> >>> Signed-off-by: Don Hiatt <don.hiatt@intel.com>
> >>> ---
> >>>    drivers/gpu/drm/i915/gt/uc/intel_guc.c | 5 ++++-
> >>>    1 file changed, 4 insertions(+), 1 deletion(-)
> >>>
> >>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.c
> >> b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
> >>> index 37f7bcbf7dac..33318ed135c0 100644
> >>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.c
> >>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
> >>> @@ -565,7 +565,10 @@ int intel_guc_resume(struct intel_guc *guc)
> >>>    		GUC_POWER_D0,
> >>>    	};
> >>>
> >>> -	return intel_guc_send(guc, action, ARRAY_SIZE(action));
> >>> +	if (guc->submission_supported)
> >>> +		return intel_guc_send(guc, action, ARRAY_SIZE(action));
> >>> +
> >>> +	return 0;
> >>>    }
> >>>
> >>>    /**
> >>>
Hiatt, Don Oct. 28, 2019, 9:25 p.m. UTC | #7
> > From: Ceraolo Spurio, Daniele <daniele.ceraolospurio@intel.com>
> > Sent: Monday, October 28, 2019 11:30 AM
> > To: Hiatt, Don <don.hiatt@intel.com>; intel-gfx@lists.freedesktop.org
> > Subject: Re: [Intel-gfx] [PATCH] drm/i914/guc: Fix resume on platforms w/o
> GuC
> > submission but enabled
> >
> >
> >
> > On 10/28/19 11:17 AM, Hiatt, Don wrote:
> > >> From: Ceraolo Spurio, Daniele <daniele.ceraolospurio@intel.com>
> > >> Sent: Monday, October 28, 2019 9:44 AM
> > >> To: Hiatt, Don <don.hiatt@intel.com>; intel-gfx@lists.freedesktop.org
> > >> Subject: Re: [Intel-gfx] [PATCH] drm/i914/guc: Fix resume on platforms w/o
> > GuC
> > >> submission but enabled
> > >>
> > >>
> > >>
> > >> On 10/24/19 9:29 AM, don.hiatt@intel.com wrote:
> > >>> From: Don Hiatt <don.hiatt@intel.com>
> > >>>
> > >>> Check to see if GuC submission is enabled before requesting the
> > >>> EXIT_S_STATE action.
> > >>>
> > >>
> > >> You're only skipping the resume, but does it make any sense to do the
> > >> suspend action if we're not going to call the resume one? Does guc do
> > >> anything in the suspend action that we still require? I thought it only
> > >> saved the submission status, which we don't care about if guc submission
> > >> is disabled.
> > >>
> > >> Daniele
> > >>
> > >
> > > Hi Daniele,
> > >
> > > I tried skipping the suspend all together but then the HuC gets timeouts
> > > waiting for the GuC to acknowledge the authentication request which leads
> to
> > a
> > > wedged GPU. ☹
> > >
> >
> > Do we know why? if we skip the suspend/resume H2G and reload the blobs
> > after resetting the HW it should look like a clean boot from the HW
> > perspective, so the fact that HuC auth times out feels weird and might
> > hide other issues. I asked one of the guc devs and he also thinks this
> > is not expected behavior. Can you dig a bit more?
> >
> > Thanks,
> > Daniele
> >
> 
> No idea why but I'll do some digging and see what I find.
> 
> Thanks!
> 
> don
> 
Hi Daniele,

I was a little overzealous in my removal of suspend/resume. We still need to go
through the steps of enabling/disabling GuC communication on suspend/resume, but
just not send the GuC action. My first attempt was not handling the GuC communication
properly, which is why I was seeing the HuC authentication timeouts.

I'm submitting a new patch -- with the proper 'drm/i915' -- and will CC you.

Thanks!

don
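
The approach described in this reply (keep toggling GuC communication
across suspend/resume, but skip the GuC action when submission is not
supported) could be sketched roughly as below. This is a standalone mock
and not the follow-up patch: struct mock_guc, mock_send_action(),
mock_suspend() and mock_resume() are hypothetical names, and the ordering
of the communication toggle relative to the action is an assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for driver state; not the real struct intel_guc. */
struct mock_guc {
	bool submission_supported;
	bool comm_enabled;
	int actions_sent;	/* counts host-to-GuC actions issued */
};

/* Stands in for sending a GuC action; needs communication to be up. */
static int mock_send_action(struct mock_guc *guc)
{
	assert(guc->comm_enabled);
	guc->actions_sent++;
	return 0;
}

/* Suspend: optionally notify the GuC, then always disable communication. */
static int mock_suspend(struct mock_guc *guc)
{
	int ret = 0;

	if (guc->submission_supported)
		ret = mock_send_action(guc);

	guc->comm_enabled = false;
	return ret;
}

/* Resume: always re-enable communication, send the action only if needed. */
static int mock_resume(struct mock_guc *guc)
{
	guc->comm_enabled = true;

	if (!guc->submission_supported)
		return 0;

	return mock_send_action(guc);
}
```

In this mock, a device without submission support goes through a full
suspend/resume cycle with communication restored and no actions sent,
while a device with submission support sends one action on each side.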


> > > BTW, I made a typo in the patch, should be 'drm/i915' not '914', I'll fix that
> > > up.
> > >
> > > Thanks,
> > >
> > > don
> > >
> > >
> > >>> On some platforms (e.g. KBL) that do not support GuC submission, but
> > >>> the user enabled the GuC communication (e.g for HuC authentication)
> > >>> calling the GuC EXIT_S_STATE action results in lose of ability to
> > >>> enter RC6. Guard against this by only requesting the GuC action on
> > >>> platforms that support GuC submission.
> > >>>
> > >>> I've verfied that intel_guc_resume() only gets called when driver
> > >>> is loaded with: guc_enable={1,2,3}, all other cases (no args,
> > >>> guc_enable={0,-1} the intel_guc_resume() is not called.
> > >>>
> > >>> Signed-off-by: Don Hiatt <don.hiatt@intel.com>
> > >>> ---
> > >>>    drivers/gpu/drm/i915/gt/uc/intel_guc.c | 5 ++++-
> > >>>    1 file changed, 4 insertions(+), 1 deletion(-)
> > >>>
> > >>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.c
> > >> b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
> > >>> index 37f7bcbf7dac..33318ed135c0 100644
> > >>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.c
> > >>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
> > >>> @@ -565,7 +565,10 @@ int intel_guc_resume(struct intel_guc *guc)
> > >>>    		GUC_POWER_D0,
> > >>>    	};
> > >>>
> > >>> -	return intel_guc_send(guc, action, ARRAY_SIZE(action));
> > >>> +	if (guc->submission_supported)
> > >>> +		return intel_guc_send(guc, action, ARRAY_SIZE(action));
> > >>> +
> > >>> +	return 0;
> > >>>    }
> > >>>
> > >>>    /**
> > >>>

Patch

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.c b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
index 37f7bcbf7dac..33318ed135c0 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
@@ -565,7 +565,10 @@  int intel_guc_resume(struct intel_guc *guc)
 		GUC_POWER_D0,
 	};
 
-	return intel_guc_send(guc, action, ARRAY_SIZE(action));
+	if (guc->submission_supported)
+		return intel_guc_send(guc, action, ARRAY_SIZE(action));
+
+	return 0;
 }
 
 /**