diff mbox series

[thermal] adding check if the thermal firmware is running

Message ID 20240516200555.33982-1-trintaeoitogc@gmail.com (mailing list archive)
State Rejected
Delegated to: Johannes Berg
Headers show
Series [thermal] adding check if the thermal firmware is running | expand

Commit Message

Guilherme Giacomo Simoes May 16, 2024, 8:05 p.m. UTC
in the dmesg is showing the message "failed to read out thermal zone"
as if the temperature read is failed by don't find the thermal zone.

After researching and debugging, I see that this specific error is
occurrence because the thermal try read the temperature when is started,
but the firmware is not running yet.

For more legibiliti i create the NOTLOAD error code in the errno-base.h,
and in the iwl_mvm_tzone_get_temp() on tt.c i check if firmware is
running and I set the NOTLOAD code for ret variable and goto out.

After this, in the update_temperature() in the thermal_code.c i received
the return of thermal_zone_get_temp() and check if return is NOTLOAD,
because if it is, I print the warning message "firmware yet not load"
and return for caller

The thermal_core.c i think that is generic for any thermal drivers and
not only used for tt.c of course.
But if this ipotetic driver not check if firmware is running before read
the temperature, the thermal_code.c is work as a before this change.

After this change, in my computer I compile and install kernel in /boot
and in my dmesg the message "failed to read out thermal zone" is not
show any more. In your place the warning messafe "Firmware yet not
load" is showing.

I would like to thank you in advance for any contribution, suggestion
or criticism of my patch suggestion.
---
 drivers/net/wireless/intel/iwlwifi/mvm/tt.c | 10 ++++++++--
 drivers/thermal/thermal_core.c              | 10 +++++++---
 include/uapi/asm-generic/errno-base.h       |  1 +
 3 files changed, 16 insertions(+), 5 deletions(-)

Comments

Wysocki, Rafael J May 17, 2024, 10:20 a.m. UTC | #1
On 5/16/2024 10:05 PM, Guilherme Giacomo Simoes wrote:
> in the dmesg is showing the message "failed to read out thermal zone"
> as if the temperature read is failed by don't find the thermal zone.
>
> After researching and debugging, I see that this specific error is
> occurrence because the thermal try read the temperature when is started,
> but the firmware is not running yet.
>
> For more legibiliti i create the NOTLOAD error code in the errno-base.h,
> and in the iwl_mvm_tzone_get_temp() on tt.c i check if firmware is
> running and I set the NOTLOAD code for ret variable and goto out.
>
> After this, in the update_temperature() in the thermal_code.c i received
> the return of thermal_zone_get_temp() and check if return is NOTLOAD,
> because if it is, I print the warning message "firmware yet not load"
> and return for caller
>
> The thermal_core.c i think that is generic for any thermal drivers and
> not only used for tt.c of course.
> But if this ipotetic driver not check if firmware is running before read
> the temperature, the thermal_code.c is work as a before this change.
>
> After this change, in my computer I compile and install kernel in /boot
> and in my dmesg the message "failed to read out thermal zone" is not
> show any more. In your place the warning messafe "Firmware yet not
> load" is showing.
>
> I would like to thank you in advance for any contribution, suggestion
> or criticism of my patch suggestion.
> ---
>   drivers/net/wireless/intel/iwlwifi/mvm/tt.c | 10 ++++++++--
>   drivers/thermal/thermal_core.c              | 10 +++++++---
>   include/uapi/asm-generic/errno-base.h       |  1 +
>   3 files changed, 16 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/tt.c b/drivers/net/wireless/intel/iwlwifi/mvm/tt.c
> index 8083c4b2ab6b..dd5725db06d2 100644
> --- a/drivers/net/wireless/intel/iwlwifi/mvm/tt.c
> +++ b/drivers/net/wireless/intel/iwlwifi/mvm/tt.c
> @@ -620,8 +620,14 @@ static int iwl_mvm_tzone_get_temp(struct thermal_zone_device *device,
>   
>   	mutex_lock(&mvm->mutex);
>   
> -	if (!iwl_mvm_firmware_running(mvm) ||
> -	    mvm->fwrt.cur_fw_img != IWL_UCODE_REGULAR) {
> +	const int res = iwl_mvm_firmware_running(mvm);
> +
> +	if (!res) {
> +		ret = -NOTLOAD;
> +		goto out;
> +	}
> +
> +	if (mvm->fwrt.cur_fw_img != IWL_UCODE_REGULAR) {
>   		ret = -ENODATA;
>   		goto out;
>   	}
> diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
> index 34a31bc72023..4116d312d4a1 100644
> --- a/drivers/thermal/thermal_core.c
> +++ b/drivers/thermal/thermal_core.c
> @@ -414,10 +414,14 @@ static void handle_thermal_trip(struct thermal_zone_device *tz,
>   
>   static void update_temperature(struct thermal_zone_device *tz)
>   {
> -	int temp, ret;
> -
> -	ret = __thermal_zone_get_temp(tz, &temp);
> +	int temp;
> +	int ret = __thermal_zone_get_temp(tz, &temp);
>   	if (ret) {
> +		if (ret == -NOTLOAD) {
> +			pr_warn("Firmware yet not load");
> +			return;
> +		}
> +

The thermal core doesn't need to be modified for this.

Please print the new message from the driver and you may as well return 
-EAGAIN from it in all cases when the issue is expected to be 
intermittent to prevent the core from printing the (existing) warning 
message.

Thanks!


>   		if (ret != -EAGAIN)
>   			dev_warn(&tz->device,
>   				 "failed to read out thermal zone (%d)\n",
> diff --git a/include/uapi/asm-generic/errno-base.h b/include/uapi/asm-generic/errno-base.h
> index 9653140bff92..8b92c41f7993 100644
> --- a/include/uapi/asm-generic/errno-base.h
> +++ b/include/uapi/asm-generic/errno-base.h
> @@ -36,5 +36,6 @@
>   #define	EPIPE		32	/* Broken pipe */
>   #define	EDOM		33	/* Math argument out of domain of func */
>   #define	ERANGE		34	/* Math result not representable */
> +#define	NOTLOAD		35	/* Firmware yet not load */
>   
>   #endif
Guilherme Giacomo Simoes May 17, 2024, 2:17 p.m. UTC | #2
Okay, this makes sense.
I will send a new patch with your suggestions.

Thanks


Em sex., 17 de mai. de 2024 às 07:20, Wysocki, Rafael J
<rafael.j.wysocki@intel.com> escreveu:
>
>
> On 5/16/2024 10:05 PM, Guilherme Giacomo Simoes wrote:
> > in the dmesg is showing the message "failed to read out thermal zone"
> > as if the temperature read is failed by don't find the thermal zone.
> >
> > After researching and debugging, I see that this specific error is
> > occurrence because the thermal try read the temperature when is started,
> > but the firmware is not running yet.
> >
> > For more legibiliti i create the NOTLOAD error code in the errno-base.h,
> > and in the iwl_mvm_tzone_get_temp() on tt.c i check if firmware is
> > running and I set the NOTLOAD code for ret variable and goto out.
> >
> > After this, in the update_temperature() in the thermal_code.c i received
> > the return of thermal_zone_get_temp() and check if return is NOTLOAD,
> > because if it is, I print the warning message "firmware yet not load"
> > and return for caller
> >
> > The thermal_core.c i think that is generic for any thermal drivers and
> > not only used for tt.c of course.
> > But if this ipotetic driver not check if firmware is running before read
> > the temperature, the thermal_code.c is work as a before this change.
> >
> > After this change, in my computer I compile and install kernel in /boot
> > and in my dmesg the message "failed to read out thermal zone" is not
> > show any more. In your place the warning messafe "Firmware yet not
> > load" is showing.
> >
> > I would like to thank you in advance for any contribution, suggestion
> > or criticism of my patch suggestion.
> > ---
> >   drivers/net/wireless/intel/iwlwifi/mvm/tt.c | 10 ++++++++--
> >   drivers/thermal/thermal_core.c              | 10 +++++++---
> >   include/uapi/asm-generic/errno-base.h       |  1 +
> >   3 files changed, 16 insertions(+), 5 deletions(-)
> >
> > diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/tt.c b/drivers/net/wireless/intel/iwlwifi/mvm/tt.c
> > index 8083c4b2ab6b..dd5725db06d2 100644
> > --- a/drivers/net/wireless/intel/iwlwifi/mvm/tt.c
> > +++ b/drivers/net/wireless/intel/iwlwifi/mvm/tt.c
> > @@ -620,8 +620,14 @@ static int iwl_mvm_tzone_get_temp(struct thermal_zone_device *device,
> >
> >       mutex_lock(&mvm->mutex);
> >
> > -     if (!iwl_mvm_firmware_running(mvm) ||
> > -         mvm->fwrt.cur_fw_img != IWL_UCODE_REGULAR) {
> > +     const int res = iwl_mvm_firmware_running(mvm);
> > +
> > +     if (!res) {
> > +             ret = -NOTLOAD;
> > +             goto out;
> > +     }
> > +
> > +     if (mvm->fwrt.cur_fw_img != IWL_UCODE_REGULAR) {
> >               ret = -ENODATA;
> >               goto out;
> >       }
> > diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
> > index 34a31bc72023..4116d312d4a1 100644
> > --- a/drivers/thermal/thermal_core.c
> > +++ b/drivers/thermal/thermal_core.c
> > @@ -414,10 +414,14 @@ static void handle_thermal_trip(struct thermal_zone_device *tz,
> >
> >   static void update_temperature(struct thermal_zone_device *tz)
> >   {
> > -     int temp, ret;
> > -
> > -     ret = __thermal_zone_get_temp(tz, &temp);
> > +     int temp;
> > +     int ret = __thermal_zone_get_temp(tz, &temp);
> >       if (ret) {
> > +             if (ret == -NOTLOAD) {
> > +                     pr_warn("Firmware yet not load");
> > +                     return;
> > +             }
> > +
>
> The thermal core doesn't need to be modified for this.
>
> Please print the new message from the driver and you may as well return
> -EAGAIN from it in all cases when the issue is expected to be
> intermittent to prevent the core from printing the (existing) warning
> message.
>
> Thanks!
>
>
> >               if (ret != -EAGAIN)
> >                       dev_warn(&tz->device,
> >                                "failed to read out thermal zone (%d)\n",
> > diff --git a/include/uapi/asm-generic/errno-base.h b/include/uapi/asm-generic/errno-base.h
> > index 9653140bff92..8b92c41f7993 100644
> > --- a/include/uapi/asm-generic/errno-base.h
> > +++ b/include/uapi/asm-generic/errno-base.h
> > @@ -36,5 +36,6 @@
> >   #define     EPIPE           32      /* Broken pipe */
> >   #define     EDOM            33      /* Math argument out of domain of func */
> >   #define     ERANGE          34      /* Math result not representable */
> > +#define      NOTLOAD         35      /* Firmware yet not load */
> >
> >   #endif
diff mbox series

Patch

diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/tt.c b/drivers/net/wireless/intel/iwlwifi/mvm/tt.c
index 8083c4b2ab6b..dd5725db06d2 100644
--- a/drivers/net/wireless/intel/iwlwifi/mvm/tt.c
+++ b/drivers/net/wireless/intel/iwlwifi/mvm/tt.c
@@ -620,8 +620,14 @@  static int iwl_mvm_tzone_get_temp(struct thermal_zone_device *device,
 
 	mutex_lock(&mvm->mutex);
 
-	if (!iwl_mvm_firmware_running(mvm) ||
-	    mvm->fwrt.cur_fw_img != IWL_UCODE_REGULAR) {
+	const int res = iwl_mvm_firmware_running(mvm);
+
+	if (!res) {
+		ret = -NOTLOAD;
+		goto out;
+	}
+
+	if (mvm->fwrt.cur_fw_img != IWL_UCODE_REGULAR) {
 		ret = -ENODATA;
 		goto out;
 	}
diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
index 34a31bc72023..4116d312d4a1 100644
--- a/drivers/thermal/thermal_core.c
+++ b/drivers/thermal/thermal_core.c
@@ -414,10 +414,14 @@  static void handle_thermal_trip(struct thermal_zone_device *tz,
 
 static void update_temperature(struct thermal_zone_device *tz)
 {
-	int temp, ret;
-
-	ret = __thermal_zone_get_temp(tz, &temp);
+	int temp;
+	int ret = __thermal_zone_get_temp(tz, &temp);
 	if (ret) {
+		if (ret == -NOTLOAD) {
+			pr_warn("Firmware yet not load");
+			return;
+		}
+
 		if (ret != -EAGAIN)
 			dev_warn(&tz->device,
 				 "failed to read out thermal zone (%d)\n",
diff --git a/include/uapi/asm-generic/errno-base.h b/include/uapi/asm-generic/errno-base.h
index 9653140bff92..8b92c41f7993 100644
--- a/include/uapi/asm-generic/errno-base.h
+++ b/include/uapi/asm-generic/errno-base.h
@@ -36,5 +36,6 @@ 
 #define	EPIPE		32	/* Broken pipe */
 #define	EDOM		33	/* Math argument out of domain of func */
 #define	ERANGE		34	/* Math result not representable */
+#define	NOTLOAD		35	/* Firmware yet not load */
 
 #endif