[2/5] drm/i915: Notify user about outdated dmc firmware

Message ID 1442589429-27813-2-git-send-email-mika.kuoppala@intel.com (mailing list archive)
State New, archived

Commit Message

Mika Kuoppala Sept. 18, 2015, 3:17 p.m. UTC
If csr/dmc firmware is known to be outdated, notify
user.

Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
---
 drivers/gpu/drm/i915/intel_csr.c | 9 +++++++++
 1 file changed, 9 insertions(+)

Comments

Jani Nikula Sept. 21, 2015, 7:30 a.m. UTC | #1
On Fri, 18 Sep 2015, Mika Kuoppala <mika.kuoppala@linux.intel.com> wrote:
> If csr/dmc firmware is known to be outdated, notify
> user.

What would break if we requested a firmware version that works? Or we've
made it so that we only request the major version because there's not
supposed to be changes like this between minor versions...?

BR,
Jani.



Mika Kuoppala Sept. 21, 2015, 8:30 a.m. UTC | #2
Jani Nikula <jani.nikula@linux.intel.com> writes:

> On Fri, 18 Sep 2015, Mika Kuoppala <mika.kuoppala@linux.intel.com> wrote:
>> If csr/dmc firmware is known to be outdated, notify
>> user.
>
> What would break if we requested a firmware version that works? Or we've
> made it so that we only request the major version because there's not
> supposed to be changes like this between minor versions...?
>

I guess the question is more about what we should do
if there is only outdated (known bad) firmware available.

Refuse to load it and limp onwards, or return an error code
on driver init?

The latter would force firmware and version to be mandatory and the
version to be tightly coupled to the kernel driver version.

-Mika
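
The options raised here (keep loading with a notice, as this patch does; refuse to load and carry on; or fail driver init) can be sketched as a small policy function. This is only an illustration: the enum and function names below are hypothetical and are not part of i915.

```c
#include <stdbool.h>

/* Hypothetical policy names, used only for this illustration. */
enum fw_policy {
	FW_POLICY_WARN,		/* load anyway, only log a notice */
	FW_POLICY_SKIP,		/* do not load, driver limps onwards */
	FW_POLICY_FAIL_INIT,	/* do not load, driver init returns an error */
};

enum fw_action {
	FW_ACTION_LOAD,
	FW_ACTION_SKIP,
	FW_ACTION_ABORT_INIT,
};

/* Decide what to do with a firmware image, given whether it is outdated. */
static enum fw_action fw_decide(bool outdated, enum fw_policy policy)
{
	if (!outdated)
		return FW_ACTION_LOAD;

	switch (policy) {
	case FW_POLICY_SKIP:
		return FW_ACTION_SKIP;
	case FW_POLICY_FAIL_INIT:
		return FW_ACTION_ABORT_INIT;
	case FW_POLICY_WARN:
	default:
		return FW_ACTION_LOAD;
	}
}
```

Only FW_POLICY_FAIL_INIT makes the firmware mandatory; the other two keep the driver usable when only an old image is installed.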

Animesh Manna Oct. 8, 2015, 9:41 a.m. UTC | #3
On 9/21/2015 2:00 PM, Mika Kuoppala wrote:
> Jani Nikula <jani.nikula@linux.intel.com> writes:
>
>> On Fri, 18 Sep 2015, Mika Kuoppala <mika.kuoppala@linux.intel.com> wrote:
>>> If csr/dmc firmware is known to be outdated, notify
>>> user.
>> What would break if we requested a firmware version that works? Or we've
>> made it so that we only request the major version because there's not
>> supposed to be changes like this between minor versions...?
>>
> I guess the question is more of a what should we do
> if there is only outdated (known bad) firmware available.
>
> Refuse to load and limp onwards, or return with error code
> on driver init.
>
> Latter would force firmware and version to be mandatory and the
> version to be tightly coupled to kernel driver version.

A softlink is used to point at the recommended dmc firmware, and the same information is published through 01.org for firmware users.
Imo, we should not have this kind of hack in code, since it will change over time; it is the responsibility of the repo owner to link the correct recommended firmware for a new kernel update.

-Animesh

Mika Kuoppala Oct. 8, 2015, 12:23 p.m. UTC | #4
Animesh Manna <animesh.manna@intel.com> writes:

> On 9/21/2015 2:00 PM, Mika Kuoppala wrote:
>> Jani Nikula <jani.nikula@linux.intel.com> writes:
>>
>>> On Fri, 18 Sep 2015, Mika Kuoppala <mika.kuoppala@linux.intel.com> wrote:
>>>> If csr/dmc firmware is known to be outdated, notify
>>>> user.
>>> What would break if we requested a firmware version that works? Or we've
>>> made it so that we only request the major version because there's not
>>> supposed to be changes like this between minor versions...?
>>>
>> I guess the question is more of a what should we do
>> if there is only outdated (known bad) firmware available.
>>
>> Refuse to load and limp onwards, or return with error code
>> on driver init.
>>
>> Latter would force firmware and version to be mandatory and the
>> version to be tightly coupled to kernel driver version.
>
> A softlink is used to use recommended firmware for dmc and the same information is published through 01.org for the firmware user.
> Imo, we should not have this kind of hack in code which will change over time and this is responsibility of repo-owner to link correct recommended firmware for new kernel update.
>

On machines that had 1.19 symlinked in the filesystem, execlist submission
sometimes broke due to an interrupt delivery problem. Reaching the conclusion
that it was the csr firmware, before 1.21 was out, took quite an amount of work.

I bet there are still machines with only 1.19, and we get to
wade through error states trying to connect the dots.

The dmc/csr firmware is part of our driver functionality. Apparently
it is very tightly coupled to our driver functionality, as it can
break things outside of its own domain.

And currently it is a loosely coupled black box to our driver,
through the symlink, accepting any version that happens to be in the
customer's filesystem.

So we recommend the latest on the website and end up in a situation
where the user gets whatever happens to be in the filesystem. Even a known
broken version? And we will keep debugging the problems caused by a
broken version? I don't want any more dimensions in our triaging
space, such as the distribution/firmware version dimension.

The symlink also means that bisectability is very close to worthless on these
kinds of bugs, both on our machines and on customers'. We have a
loosely coupled, black box entity affecting our driver depending
on the customer's filesystem. The symlink threw that valuable tool out, and
we gained what?

So we are left with triaging, which is true detective work as there are
no traces of firmware versions or loading successes/failures in
logs/error states.

From where I stand, the version blacklist is not a hack. It is a cure.

-Mika
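
A version blacklist of the kind argued for above might look like the following sketch. The struct, table and function names are hypothetical; the single 1.19 entry reflects only the version this thread reports as problematic, and a real table would have to be maintained by the driver authors.

```c
#include <stdbool.h>
#include <stddef.h>

struct fw_version {
	unsigned int major;
	unsigned int minor;
};

/*
 * Illustrative table only: 1.19 is the version the thread associates
 * with the interrupt delivery problem.
 */
static const struct fw_version fw_known_bad[] = {
	{ 1, 19 },
};

/* Return true if the given version appears in the known-bad table. */
static bool fw_is_known_bad(unsigned int major, unsigned int minor)
{
	size_t i;

	for (i = 0; i < sizeof(fw_known_bad) / sizeof(fw_known_bad[0]); i++) {
		if (fw_known_bad[i].major == major &&
		    fw_known_bad[i].minor == minor)
			return true;
	}
	return false;
}
```

Unlike a minimum-version check, an explicit table only rejects versions that have actually been diagnosed as bad, at the cost of needing an update for every new finding.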

Animesh Manna Oct. 8, 2015, 2:45 p.m. UTC | #5
On 10/8/2015 5:53 PM, Mika Kuoppala wrote:
> Animesh Manna <animesh.manna@intel.com> writes:
>
>> On 9/21/2015 2:00 PM, Mika Kuoppala wrote:
>>> Jani Nikula <jani.nikula@linux.intel.com> writes:
>>>
>>>> On Fri, 18 Sep 2015, Mika Kuoppala <mika.kuoppala@linux.intel.com> wrote:
>>>>> If csr/dmc firmware is known to be outdated, notify
>>>>> user.
>>>> What would break if we requested a firmware version that works? Or we've
>>>> made it so that we only request the major version because there's not
>>>> supposed to be changes like this between minor versions...?
>>>>
>>> I guess the question is more of a what should we do
>>> if there is only outdated (known bad) firmware available.
>>>
>>> Refuse to load and limp onwards, or return with error code
>>> on driver init.
>>>
>>> Latter would force firmware and version to be mandatory and the
>>> version to be tightly coupled to kernel driver version.
>> A softlink is used to use recommended firmware for dmc and the same information is published through 01.org for the firmware user.
>> Imo, we should not have this kind of hack in code which will change over time and this is responsibility of repo-owner to link correct recommended firmware for new kernel update.
>>
> On machines that had 1.19 symlinked, in filesystem, execlist submission
> sometimes broke due to interrupt delivery problem. To reach a conclusion
> that it was csr firmware, before 1.21 was out, took quite amount of work.
>
> I bet there are still machines with 1.19 only, and we get to
> wade through error states trying to connect the dots.
>
> The dmc/csr firmware is part of our driver functionality. Apparently
> it is very tightly coupled to our driver functionality as it can
> break things outside of its own domain.
>
> And currently it is loosely coupled black box with our driver,
> through symlink, accepting any version that happens to be in customers filesystem.
>
> So we recommend latest in website and end up in a situation
> that user gets what happens to be in filesystem. Even a known
> broken version? And we will keep debugging these problems caused by
> broken version? I don't want any more dimensions in our triaging
> space, the distribution/firmware version dimension.
>
> Symlink also means that bisectability is very close to worthless on these
> kind of bugs. Both in our machines and also on customers. We have
> loosely coupled, black box entity, affecting our driver depending
> on customers filesystem. Symlink threw that valuable tool out, and
> we gained what?
>
> So we are left with triaging. Which is true detective work as there are
> no traces of firmware versions nor loading success/fails on
> logs/error states.
>
>  From where I look at, the version blacklist is not a hack. It is a cure.

I completely understand your concern. We discussed this a lot while deciding on the firmware naming
convention, and finally decided to use a symlink.

If we really want to tightly couple firmware and driver, then imo requesting the exact firmware name
would be the better option.

I also saw your subsequent patch, which does not load the firmware if it is older than 1.21:
http://lists.freedesktop.org/archives/intel-gfx/2015-September/076422.html
I am curious whether the gpu-hang issue is present in every version older than 1.21.

-Animesh


Dave Gordon Oct. 13, 2015, 12:30 p.m. UTC | #6
On 08/10/15 15:45, Animesh Manna wrote:
>
> On 10/8/2015 5:53 PM, Mika Kuoppala wrote:
>> Animesh Manna <animesh.manna@intel.com> writes:
>>
>>> On 9/21/2015 2:00 PM, Mika Kuoppala wrote:
>>>> Jani Nikula <jani.nikula@linux.intel.com> writes:
>>>>
>>>>> On Fri, 18 Sep 2015, Mika Kuoppala <mika.kuoppala@linux.intel.com>
>>>>> wrote:
>>>>>> If csr/dmc firmware is known to be outdated, notify
>>>>>> user.
>>>>> What would break if we requested a firmware version that works? Or
>>>>> we've
>>>>> made it so that we only request the major version because there's not
>>>>> supposed to be changes like this between minor versions...?
>>>>>
>>>> I guess the question is more of a what should we do
>>>> if there is only outdated (known bad) firmware available.
>>>>
>>>> Refuse to load and limp onwards, or return with error code
>>>> on driver init.
>>>>
>>>> Latter would force firmware and version to be mandatory and the
>>>> version to be tightly coupled to kernel driver version.
>>> A softlink is used to use recommended firmware for dmc and the same
>>> information is published through 01.org for the firmware user.
>>> Imo, we should not have this kind of hack in code which will change
>>> over time and this is responsibility of repo-owner to link correct
>>> recommended firmware for new kernel update.
>>>
>> On machines that had 1.19 symlinked, in filesystem, execlist submission
>> sometimes broke due to interrupt delivery problem. To reach a conclusion
>> that it was csr firmware, before 1.21 was out, took quite amount of work.
>>
>> I bet there are still machines with 1.19 only, and we get to
>> wade through error states trying to connect the dots.
>>
>> The dmc/csr firmware is part of our driver functionality. Apparently
>> it is very tightly coupled to our driver functionality as it can
>> break things outside of its own domain.
>>
>> And currently it is loosely coupled black box with our driver,
>> through symlink, accepting any version that happens to be in customers
>> filesystem.
>>
>> So we recommend latest in website and end up in a situation
>> that user gets what happens to be in filesystem. Even a known
>> broken version? And we will keep debugging these problems caused by
>> broken version? I don't want any more dimensions in our triaging
>> space, the distribution/firmware version dimension.
>>
>> Symlink also means that bisectability is very close to worthless on these
>> kind of bugs. Both in our machines and also on customers. We have
>> loosely coupled, black box entity, affecting our driver depending
>> on customers filesystem. Symlink threw that valuable tool out, and
>> we gained what?
>>
>> So we are left with triaging. Which is true detective work as there are
>> no traces of firmware versions nor loading success/fails on
>> logs/error states.
>>
>>  From where I look at, the version blacklist is not a hack. It is a cure.
>
> I completely understand your concern and we discussed a lot on same
> during firmware naming
> convention and finally decided to have symlink.
>
> If we really want to tightly couple firmware and driver then imo putting
> exact firmware name
> will be better option.
>
> Next I saw your subsequent patch where you are not loading the firmware
> if it is older than 1.21.
> http://lists.freedesktop.org/archives/intel-gfx/2015-September/076422.html
> Curious to know the gpu-hang issue present for any version less than 1.21.
>
> -Animesh

The GuC loader always had this sort of functionality, so the driver can 
be built to know that anything older than a specific minor version is bogus.

The proposed unified loader therefore tested (=major, >=minor) criteria 
for each of the various chunks of uC device firmware being loaded.

.Dave.
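
The (=major, >=minor) criterion mentioned above can be written as a one-line predicate. This is a hypothetical helper for illustration, not code taken from the GuC loader or the proposed unified loader.

```c
#include <stdbool.h>

/*
 * Accept a firmware image only if its major version matches exactly and
 * its minor version is at least the minimum the driver was built for.
 */
static bool fw_version_acceptable(unsigned int major, unsigned int minor,
				  unsigned int want_major,
				  unsigned int min_minor)
{
	return major == want_major && minor >= min_minor;
}
```

With want_major = 1 and min_minor = 21, version 1.19 is rejected, 1.21 and 1.23 are accepted, and any 2.x image is rejected because a major bump is treated as incompatible.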

Patch

diff --git a/drivers/gpu/drm/i915/intel_csr.c b/drivers/gpu/drm/i915/intel_csr.c
index 58edc3f..73807c3 100644
--- a/drivers/gpu/drm/i915/intel_csr.c
+++ b/drivers/gpu/drm/i915/intel_csr.c
@@ -45,6 +45,9 @@ 
 
 MODULE_FIRMWARE(I915_CSR_SKL);
 
+#define RECOMMENDED_FW_MAJOR		1
+#define RECOMMENDED_FW_MINOR		21
+
 /*
 * SKL CSR registers for DC5 and DC6
 */
@@ -387,6 +390,12 @@ static void finish_csr_load(const struct firmware *fw, void *context)
 
 	DRM_DEBUG_KMS("Finished loading %s v%u.%u\n", dev_priv->csr.fw_path,
 		      csr->dmc_ver_major, csr->dmc_ver_minor);
+
+	if (csr->dmc_ver_major < RECOMMENDED_FW_MAJOR ||
+	    csr->dmc_ver_minor < RECOMMENDED_FW_MINOR)
+		DRM_INFO("Outdated dmc firmware found, please upgrade to %u.%u or newer\n",
+			 RECOMMENDED_FW_MAJOR, RECOMMENDED_FW_MINOR);
+
 out:
 	if (fw_loaded)
 		intel_runtime_pm_put(dev_priv);
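
One observation on the condition in the hunk above: because the two comparisons are joined with ||, an image with a higher major and a low minor number (say, a hypothetical 2.0) would also be reported as outdated, since 0 < 21. The sketch below contrasts the posted test with an ordered comparison in which the minor number only matters when the majors are equal. The function names are hypothetical, and the thread does not settle whether such a version could ever reach this path.

```c
#include <stdbool.h>

#define RECOMMENDED_FW_MAJOR	1
#define RECOMMENDED_FW_MINOR	21

/* The test as written in the patch. */
static bool fw_outdated_as_posted(unsigned int major, unsigned int minor)
{
	return major < RECOMMENDED_FW_MAJOR || minor < RECOMMENDED_FW_MINOR;
}

/* Ordered comparison: major decides first, minor only breaks ties. */
static bool fw_outdated_ordered(unsigned int major, unsigned int minor)
{
	if (major != RECOMMENDED_FW_MAJOR)
		return major < RECOMMENDED_FW_MAJOR;
	return minor < RECOMMENDED_FW_MINOR;
}
```

Both functions agree on 1.19 (outdated) and 1.21 (fine); they differ only for versions whose major number is above the recommended one.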