diff mbox series

thermal: int340x: fix tcc offset on resume

Message ID 20210908161632.15520-1-atenart@kernel.org (mailing list archive)
State Superseded, archived
Headers show
Series thermal: int340x: fix tcc offset on resume | expand

Commit Message

Antoine Tenart Sept. 8, 2021, 4:16 p.m. UTC
After upgrading to Linux 5.13.3 I started noticing my laptop would
shutdown due to overheat (when it should not). It turned out this is due
to commit fe6a6de6692e ("thermal/drivers/int340x/processor_thermal: Fix tcc setting").

What happens is this drivers uses a global variable to keep track of the
tcc offset (tcc_offset_save) and uses it on resume. The issue is this
variable is initialized to 0, but is only set in
tcc_offset_degree_celsius_store, i.e. when the tcc offset is explicitly
set by userspace. If that does not happen, the resume path will set the
offset to 0 (in my case the h/w default being 3, the offset would become
too low after a suspend/resume cycle).

The issue did not arise before commit fe6a6de6692e, as the function
setting the offset would return if the offset was 0. This is no longer
the case (rightfully).

Fix this by keeping track of the offset in a suspend op, so the right
value gets restored after a resume if it was not set by userspace.

The logic to restore the offset after a resume was there long before
commit fe6a6de6692e, but as a value of 0 was considered invalid I'm
referencing the commit that made the issue possible in the Fixes tag
instead.

Fixes: fe6a6de6692e ("thermal/drivers/int340x/processor_thermal: Fix tcc setting")
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Antoine Tenart <atenart@kernel.org>
---

Hi all,

Tbh I'm not familiar with this code and subsystem, I wrote this as it
does fixes the issue and helps to start a discussion. If the fix looks
good to you, great, but there might be better ways of fixing this.

It did work on my laptop, but other tests are welcomed!

One note: I chose not to return an error in suspend, as I'm not sure the
resume would be called if that happens. If that's not the case, I can
propagate the issue. Also setting tcc_offset_save to -1 might not be
needed, if ->suspend is guaranteed to be called before ->resume.

Thanks!
Antoine

 .../intel/int340x_thermal/int3401_thermal.c   |  8 +++-
 .../processor_thermal_device.c                | 39 ++++++++++++++-----
 .../processor_thermal_device.h                |  1 +
 .../processor_thermal_device_pci.c            | 18 ++++++++-
 .../processor_thermal_device_pci_legacy.c     |  8 +++-
 5 files changed, 62 insertions(+), 12 deletions(-)

Comments

srinivas pandruvada Sept. 9, 2021, 5:47 a.m. UTC | #1
On Wed, 2021-09-08 at 18:16 +0200, Antoine Tenart wrote:
> After upgrading to Linux 5.13.3 I started noticing my laptop would
> shutdown due to overheat (when it should not). It turned out this is
> due
> to commit fe6a6de6692e ("thermal/drivers/int340x/processor_thermal: Fix
> tcc setting").
> 
> What happens is this drivers uses a global variable to keep track of
> the
> tcc offset (tcc_offset_save) and uses it on resume. The issue is this
> variable is initialized to 0, but is only set in
> tcc_offset_degree_celsius_store, i.e. when the tcc offset is explicitly
> set by userspace. If that does not happen, the resume path will set the
> offset to 0 (in my case the h/w default being 3, the offset would
> become
> too low after a suspend/resume cycle).
> 
> The issue did not arise before commit fe6a6de6692e, as the function
> setting the offset would return if the offset was 0. This is no longer
> the case (rightfully).
> 
> Fix this by keeping track of the offset in a suspend op, so the right
> value gets restored after a resume if it was not set by userspace.
> 
> The logic to restore the offset after a resume was there long before
> commit fe6a6de6692e, but as a value of 0 was considered invalid I'm
> referencing the commit that made the issue possible in the Fixes tag
> instead.
> 
> Fixes: fe6a6de6692e ("thermal/drivers/int340x/processor_thermal: Fix
> tcc setting")
> Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: stable@vger.kernel.org # 5.12+

> Signed-off-by: Antoine Tenart <atenart@kernel.org>
 Reviewed-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Tested-by: Srinivas Pandruvada <srinivas.pI andruvada@linux.intel.com>

> ---
> 
Thanks Tenart for identifying and fixing.

We have to mark for stable. But It mayn't cleanly apply. So we have to
submit for 5.12+ kernels.

Hi Rui, Daniel,

Please consider this patch 5.15-rc.

Thanks,
Srinivas



> Hi all,
> 
> Tbh I'm not familiar with this code and subsystem, I wrote this as it
> does fixes the issue and helps to start a discussion. If the fix looks
> good to you, great, but there might be better ways of fixing this.
> 
> It did work on my laptop, but other tests are welcomed!
> 
> One note: I chose not to return an error in suspend, as I'm not sure
> the
> resume would be called if that happens. If that's not the case, I can
> propagate the issue. Also setting tcc_offset_save to -1 might not be
> needed, if ->suspend is guaranteed to be called before ->resume.
> 
> Thanks!
> Antoine
> 
>  .../intel/int340x_thermal/int3401_thermal.c   |  8 +++-
>  .../processor_thermal_device.c                | 39 ++++++++++++++-----
>  .../processor_thermal_device.h                |  1 +
>  .../processor_thermal_device_pci.c            | 18 ++++++++-
>  .../processor_thermal_device_pci_legacy.c     |  8 +++-
>  5 files changed, 62 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/thermal/intel/int340x_thermal/int3401_thermal.c
> b/drivers/thermal/intel/int340x_thermal/int3401_thermal.c
> index acebc8ba94e2..217786fba185 100644
> --- a/drivers/thermal/intel/int340x_thermal/int3401_thermal.c
> +++ b/drivers/thermal/intel/int340x_thermal/int3401_thermal.c
> @@ -44,15 +44,21 @@ static int int3401_remove(struct platform_device
> *pdev)
>  }
>  
>  #ifdef CONFIG_PM_SLEEP
> +static int int3401_thermal_suspend(struct device *dev)
> +{
> +       return proc_thermal_suspend(dev);
> +}
>  static int int3401_thermal_resume(struct device *dev)
>  {
>         return proc_thermal_resume(dev);
>  }
>  #else
> +#define int3401_thermal_suspend NULL
>  #define int3401_thermal_resume NULL
>  #endif
>  
> -static SIMPLE_DEV_PM_OPS(int3401_proc_thermal_pm, NULL,
> int3401_thermal_resume);
> +static SIMPLE_DEV_PM_OPS(int3401_proc_thermal_pm,
> int3401_thermal_suspend,
> +                        int3401_thermal_resume);
>  
>  static struct platform_driver int3401_driver = {
>         .probe = int3401_add,
> diff --git
> a/drivers/thermal/intel/int340x_thermal/processor_thermal_device.c
> b/drivers/thermal/intel/int340x_thermal/processor_thermal_device.c
> index 0f0038af2ad4..a8d98f1bd6c6 100644
> --- a/drivers/thermal/intel/int340x_thermal/processor_thermal_device.c
> +++ b/drivers/thermal/intel/int340x_thermal/processor_thermal_device.c
> @@ -68,8 +68,7 @@ static const struct attribute_group
> power_limit_attribute_group = {
>         .name = "power_limits"
>  };
>  
> -static ssize_t tcc_offset_degree_celsius_show(struct device *dev,
> -                              struct device_attribute *attr, char
> *buf)
> +static int tcc_get_offset(void)
>  {
>         u64 val;
>         int err;
> @@ -78,8 +77,20 @@ static ssize_t tcc_offset_degree_celsius_show(struct
> device *dev,
>         if (err)
>                 return err;
>  
> -       val = (val >> 24) & 0x3f;
> -       return sprintf(buf, "%d\n", (int)val);
> +       return (val >> 24) & 0x3f;
> +}
> +
> +static ssize_t tcc_offset_degree_celsius_show(struct device *dev,
> +                                             struct device_attribute
> *attr,
> +                                             char *buf)
> +{
> +       int tcc;
> +
> +       tcc = tcc_get_offset();
> +       if (tcc < 0)
> +               return tcc;
> +
> +       return sprintf(buf, "%d\n", tcc);
>  }
>  
>  static int tcc_offset_update(unsigned int tcc)
> @@ -107,8 +118,6 @@ static int tcc_offset_update(unsigned int tcc)
>         return 0;
>  }
>  
> -static unsigned int tcc_offset_save;
> -
>  static ssize_t tcc_offset_degree_celsius_store(struct device *dev,
>                                 struct device_attribute *attr, const
> char *buf,
>                                 size_t count)
> @@ -131,8 +140,6 @@ static ssize_t
> tcc_offset_degree_celsius_store(struct device *dev,
>         if (err)
>                 return err;
>  
> -       tcc_offset_save = tcc;
> -
>         return count;
>  }
>  
> @@ -345,6 +352,18 @@ void proc_thermal_remove(struct
> proc_thermal_device *proc_priv)
>  }
>  EXPORT_SYMBOL_GPL(proc_thermal_remove);
>  
> +static int tcc_offset_save = -1;
> +
> +int proc_thermal_suspend(struct device *dev)
> +{
> +       tcc_offset_save = tcc_get_offset();
> +       if (tcc_offset_save < 0)
> +               dev_warn(dev, "failed to save offset (%d)\n",
> tcc_offset_save);
> +
> +       return 0;
> +}
> +EXPORT_SYMBOL_GPL(proc_thermal_suspend);
> +
>  int proc_thermal_resume(struct device *dev)
>  {
>         struct proc_thermal_device *proc_dev;
> @@ -352,7 +371,9 @@ int proc_thermal_resume(struct device *dev)
>         proc_dev = dev_get_drvdata(dev);
>         proc_thermal_read_ppcc(proc_dev);
>  
> -       tcc_offset_update(tcc_offset_save);
> +       /* Do not update if saving failed */
> +       if (tcc_offset_save >= 0)
> +               tcc_offset_update(tcc_offset_save);
>  
>         return 0;
>  }
> diff --git
> a/drivers/thermal/intel/int340x_thermal/processor_thermal_device.h
> b/drivers/thermal/intel/int340x_thermal/processor_thermal_device.h
> index 5a1cfe4864f1..c1d8de6dc3d1 100644
> --- a/drivers/thermal/intel/int340x_thermal/processor_thermal_device.h
> +++ b/drivers/thermal/intel/int340x_thermal/processor_thermal_device.h
> @@ -83,6 +83,7 @@ void proc_thermal_mbox_remove(struct pci_dev *pdev);
>  int processor_thermal_send_mbox_cmd(struct pci_dev *pdev, u16 cmd_id,
> u32 cmd_data, u32 *cmd_resp);
>  int proc_thermal_add(struct device *dev, struct proc_thermal_device
> *priv);
>  void proc_thermal_remove(struct proc_thermal_device *proc_priv);
> +int proc_thermal_suspend(struct device *dev);
>  int proc_thermal_resume(struct device *dev);
>  int proc_thermal_mmio_add(struct pci_dev *pdev,
>                           struct proc_thermal_device *proc_priv,
> diff --git
> a/drivers/thermal/intel/int340x_thermal/processor_thermal_device_pci.c
> b/drivers/thermal/intel/int340x_thermal/processor_thermal_device_pci.c
> index 11dd2e825f4f..b4bcd3fe9eb2 100644
> ---
> a/drivers/thermal/intel/int340x_thermal/processor_thermal_device_pci.c
> +++
> b/drivers/thermal/intel/int340x_thermal/processor_thermal_device_pci.c
> @@ -314,6 +314,20 @@ static void proc_thermal_pci_remove(struct pci_dev
> *pdev)
>  }
>  
>  #ifdef CONFIG_PM_SLEEP
> +static int proc_thermal_pci_suspend(struct device *dev)
> +{
> +       struct pci_dev *pdev = to_pci_dev(dev);
> +       struct proc_thermal_device *proc_priv;
> +       struct proc_thermal_pci *pci_info;
> +
> +       proc_priv = pci_get_drvdata(pdev);
> +       pci_info = proc_priv->priv_data;
> +
> +       if (!pci_info->no_legacy)
> +               return proc_thermal_suspend(dev);
> +
> +       return 0;
> +}
>  static int proc_thermal_pci_resume(struct device *dev)
>  {
>         struct pci_dev *pdev = to_pci_dev(dev);
> @@ -335,10 +349,12 @@ static int proc_thermal_pci_resume(struct device
> *dev)
>         return 0;
>  }
>  #else
> +#define proc_thermal_pci_suspend NULL
>  #define proc_thermal_pci_resume NULL
>  #endif
>  
> -static SIMPLE_DEV_PM_OPS(proc_thermal_pci_pm, NULL,
> proc_thermal_pci_resume);
> +static SIMPLE_DEV_PM_OPS(proc_thermal_pci_pm,
> proc_thermal_pci_suspend,
> +                        proc_thermal_pci_resume);
>  
>  static const struct pci_device_id proc_thermal_pci_ids[] = {
>         { PCI_DEVICE_DATA(INTEL, ADL_THERMAL, PROC_THERMAL_FEATURE_RAPL
> | PROC_THERMAL_FEATURE_FIVR | PROC_THERMAL_FEATURE_DVFS |
> PROC_THERMAL_FEATURE_MBOX) },
> diff --git
> a/drivers/thermal/intel/int340x_thermal/processor_thermal_device_pci_le
> gacy.c
> b/drivers/thermal/intel/int340x_thermal/processor_thermal_device_pci_le
> gacy.c
> index f5fc1791b11e..4571a1a53b84 100644
> ---
> a/drivers/thermal/intel/int340x_thermal/processor_thermal_device_pci_le
> gacy.c
> +++
> b/drivers/thermal/intel/int340x_thermal/processor_thermal_device_pci_le
> gacy.c
> @@ -107,15 +107,21 @@ static void proc_thermal_pci_remove(struct
> pci_dev *pdev)
>  }
>  
>  #ifdef CONFIG_PM_SLEEP
> +static int proc_thermal_pci_suspend(struct device *dev)
> +{
> +       return proc_thermal_suspend(dev);
> +}
>  static int proc_thermal_pci_resume(struct device *dev)
>  {
>         return proc_thermal_resume(dev);
>  }
>  #else
> +#define proc_thermal_pci_suspend NULL
>  #define proc_thermal_pci_resume NULL
>  #endif
>  
> -static SIMPLE_DEV_PM_OPS(proc_thermal_pci_pm, NULL,
> proc_thermal_pci_resume);
> +static SIMPLE_DEV_PM_OPS(proc_thermal_pci_pm,
> proc_thermal_pci_suspend,
> +                        proc_thermal_pci_resume);
>  
>  static const struct pci_device_id proc_thermal_pci_ids[] = {
>         { PCI_DEVICE_DATA(INTEL, BDW_THERMAL, 0) },
Zhang Rui Sept. 9, 2021, 6:33 a.m. UTC | #2
On Wed, 2021-09-08 at 18:16 +0200, Antoine Tenart wrote:
> After upgrading to Linux 5.13.3 I started noticing my laptop would
> shutdown due to overheat (when it should not). It turned out this is
> due
> to commit fe6a6de6692e ("thermal/drivers/int340x/processor_thermal:
> Fix tcc setting").
> 
> What happens is this drivers uses a global variable to keep track of
> the
> tcc offset (tcc_offset_save) and uses it on resume. The issue is this
> variable is initialized to 0, but is only set in
> tcc_offset_degree_celsius_store, i.e. when the tcc offset is
> explicitly
> set by userspace. If that does not happen, the resume path will set
> the
> offset to 0 (in my case the h/w default being 3, the offset would
> become
> too low after a suspend/resume cycle).
> 
> The issue did not arise before commit fe6a6de6692e, as the function
> setting the offset would return if the offset was 0. This is no
> longer
> the case (rightfully).
> 
> Fix this by keeping track of the offset in a suspend op, so the right
> value gets restored after a resume if it was not set by userspace.
> 
> The logic to restore the offset after a resume was there long before
> commit fe6a6de6692e, but as a value of 0 was considered invalid I'm
> referencing the commit that made the issue possible in the Fixes tag
> instead.
> 
> Fixes: fe6a6de6692e ("thermal/drivers/int340x/processor_thermal: Fix
> tcc setting")
> Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
> Signed-off-by: Antoine Tenart <atenart@kernel.org>

Acked-by: Zhang Rui <rui.zhang@intel.com>

thanks,
rui
> ---
> 
> Hi all,
> 
> Tbh I'm not familiar with this code and subsystem, I wrote this as it
> does fixes the issue and helps to start a discussion. If the fix
> looks
> good to you, great, but there might be better ways of fixing this.
> 
> It did work on my laptop, but other tests are welcomed!
> 
> One note: I chose not to return an error in suspend, as I'm not sure
> the
> resume would be called if that happens. If that's not the case, I can
> propagate the issue. Also setting tcc_offset_save to -1 might not be
> needed, if ->suspend is guaranteed to be called before ->resume.
> 
> Thanks!
> Antoine
> 
>  .../intel/int340x_thermal/int3401_thermal.c   |  8 +++-
>  .../processor_thermal_device.c                | 39 ++++++++++++++---
> --
>  .../processor_thermal_device.h                |  1 +
>  .../processor_thermal_device_pci.c            | 18 ++++++++-
>  .../processor_thermal_device_pci_legacy.c     |  8 +++-
>  5 files changed, 62 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/thermal/intel/int340x_thermal/int3401_thermal.c
> b/drivers/thermal/intel/int340x_thermal/int3401_thermal.c
> index acebc8ba94e2..217786fba185 100644
> --- a/drivers/thermal/intel/int340x_thermal/int3401_thermal.c
> +++ b/drivers/thermal/intel/int340x_thermal/int3401_thermal.c
> @@ -44,15 +44,21 @@ static int int3401_remove(struct platform_device
> *pdev)
>  }
>  
>  #ifdef CONFIG_PM_SLEEP
> +static int int3401_thermal_suspend(struct device *dev)
> +{
> +	return proc_thermal_suspend(dev);
> +}
>  static int int3401_thermal_resume(struct device *dev)
>  {
>  	return proc_thermal_resume(dev);
>  }
>  #else
> +#define int3401_thermal_suspend NULL
>  #define int3401_thermal_resume NULL
>  #endif
>  
> -static SIMPLE_DEV_PM_OPS(int3401_proc_thermal_pm, NULL,
> int3401_thermal_resume);
> +static SIMPLE_DEV_PM_OPS(int3401_proc_thermal_pm,
> int3401_thermal_suspend,
> +			 int3401_thermal_resume);
>  
>  static struct platform_driver int3401_driver = {
>  	.probe = int3401_add,
> diff --git
> a/drivers/thermal/intel/int340x_thermal/processor_thermal_device.c
> b/drivers/thermal/intel/int340x_thermal/processor_thermal_device.c
> index 0f0038af2ad4..a8d98f1bd6c6 100644
> ---
> a/drivers/thermal/intel/int340x_thermal/processor_thermal_device.c
> +++
> b/drivers/thermal/intel/int340x_thermal/processor_thermal_device.c
> @@ -68,8 +68,7 @@ static const struct attribute_group
> power_limit_attribute_group = {
>  	.name = "power_limits"
>  };
>  
> -static ssize_t tcc_offset_degree_celsius_show(struct device *dev,
> -			       struct device_attribute *attr, char
> *buf)
> +static int tcc_get_offset(void)
>  {
>  	u64 val;
>  	int err;
> @@ -78,8 +77,20 @@ static ssize_t
> tcc_offset_degree_celsius_show(struct device *dev,
>  	if (err)
>  		return err;
>  
> -	val = (val >> 24) & 0x3f;
> -	return sprintf(buf, "%d\n", (int)val);
> +	return (val >> 24) & 0x3f;
> +}
> +
> +static ssize_t tcc_offset_degree_celsius_show(struct device *dev,
> +					      struct device_attribute
> *attr,
> +					      char *buf)
> +{
> +	int tcc;
> +
> +	tcc = tcc_get_offset();
> +	if (tcc < 0)
> +		return tcc;
> +
> +	return sprintf(buf, "%d\n", tcc);
>  }
>  
>  static int tcc_offset_update(unsigned int tcc)
> @@ -107,8 +118,6 @@ static int tcc_offset_update(unsigned int tcc)
>  	return 0;
>  }
>  
> -static unsigned int tcc_offset_save;
> -
>  static ssize_t tcc_offset_degree_celsius_store(struct device *dev,
>  				struct device_attribute *attr, const
> char *buf,
>  				size_t count)
> @@ -131,8 +140,6 @@ static ssize_t
> tcc_offset_degree_celsius_store(struct device *dev,
>  	if (err)
>  		return err;
>  
> -	tcc_offset_save = tcc;
> -
>  	return count;
>  }
>  
> @@ -345,6 +352,18 @@ void proc_thermal_remove(struct
> proc_thermal_device *proc_priv)
>  }
>  EXPORT_SYMBOL_GPL(proc_thermal_remove);
>  
> +static int tcc_offset_save = -1;
> +
> +int proc_thermal_suspend(struct device *dev)
> +{
> +	tcc_offset_save = tcc_get_offset();
> +	if (tcc_offset_save < 0)
> +		dev_warn(dev, "failed to save offset (%d)\n",
> tcc_offset_save);
> +
> +	return 0;
> +}
> +EXPORT_SYMBOL_GPL(proc_thermal_suspend);
> +
>  int proc_thermal_resume(struct device *dev)
>  {
>  	struct proc_thermal_device *proc_dev;
> @@ -352,7 +371,9 @@ int proc_thermal_resume(struct device *dev)
>  	proc_dev = dev_get_drvdata(dev);
>  	proc_thermal_read_ppcc(proc_dev);
>  
> -	tcc_offset_update(tcc_offset_save);
> +	/* Do not update if saving failed */
> +	if (tcc_offset_save >= 0)
> +		tcc_offset_update(tcc_offset_save);
>  
>  	return 0;
>  }
> diff --git
> a/drivers/thermal/intel/int340x_thermal/processor_thermal_device.h
> b/drivers/thermal/intel/int340x_thermal/processor_thermal_device.h
> index 5a1cfe4864f1..c1d8de6dc3d1 100644
> ---
> a/drivers/thermal/intel/int340x_thermal/processor_thermal_device.h
> +++
> b/drivers/thermal/intel/int340x_thermal/processor_thermal_device.h
> @@ -83,6 +83,7 @@ void proc_thermal_mbox_remove(struct pci_dev
> *pdev);
>  int processor_thermal_send_mbox_cmd(struct pci_dev *pdev, u16
> cmd_id, u32 cmd_data, u32 *cmd_resp);
>  int proc_thermal_add(struct device *dev, struct proc_thermal_device
> *priv);
>  void proc_thermal_remove(struct proc_thermal_device *proc_priv);
> +int proc_thermal_suspend(struct device *dev);
>  int proc_thermal_resume(struct device *dev);
>  int proc_thermal_mmio_add(struct pci_dev *pdev,
>  			  struct proc_thermal_device *proc_priv,
> diff --git
> a/drivers/thermal/intel/int340x_thermal/processor_thermal_device_pci.
> c
> b/drivers/thermal/intel/int340x_thermal/processor_thermal_device_pci.
> c
> index 11dd2e825f4f..b4bcd3fe9eb2 100644
> ---
> a/drivers/thermal/intel/int340x_thermal/processor_thermal_device_pci.
> c
> +++
> b/drivers/thermal/intel/int340x_thermal/processor_thermal_device_pci.
> c
> @@ -314,6 +314,20 @@ static void proc_thermal_pci_remove(struct
> pci_dev *pdev)
>  }
>  
>  #ifdef CONFIG_PM_SLEEP
> +static int proc_thermal_pci_suspend(struct device *dev)
> +{
> +	struct pci_dev *pdev = to_pci_dev(dev);
> +	struct proc_thermal_device *proc_priv;
> +	struct proc_thermal_pci *pci_info;
> +
> +	proc_priv = pci_get_drvdata(pdev);
> +	pci_info = proc_priv->priv_data;
> +
> +	if (!pci_info->no_legacy)
> +		return proc_thermal_suspend(dev);
> +
> +	return 0;
> +}
>  static int proc_thermal_pci_resume(struct device *dev)
>  {
>  	struct pci_dev *pdev = to_pci_dev(dev);
> @@ -335,10 +349,12 @@ static int proc_thermal_pci_resume(struct
> device *dev)
>  	return 0;
>  }
>  #else
> +#define proc_thermal_pci_suspend NULL
>  #define proc_thermal_pci_resume NULL
>  #endif
>  
> -static SIMPLE_DEV_PM_OPS(proc_thermal_pci_pm, NULL,
> proc_thermal_pci_resume);
> +static SIMPLE_DEV_PM_OPS(proc_thermal_pci_pm,
> proc_thermal_pci_suspend,
> +			 proc_thermal_pci_resume);
>  
>  static const struct pci_device_id proc_thermal_pci_ids[] = {
>  	{ PCI_DEVICE_DATA(INTEL, ADL_THERMAL, PROC_THERMAL_FEATURE_RAPL
> | PROC_THERMAL_FEATURE_FIVR | PROC_THERMAL_FEATURE_DVFS |
> PROC_THERMAL_FEATURE_MBOX) },
> diff --git
> a/drivers/thermal/intel/int340x_thermal/processor_thermal_device_pci_
> legacy.c
> b/drivers/thermal/intel/int340x_thermal/processor_thermal_device_pci_
> legacy.c
> index f5fc1791b11e..4571a1a53b84 100644
> ---
> a/drivers/thermal/intel/int340x_thermal/processor_thermal_device_pci_
> legacy.c
> +++
> b/drivers/thermal/intel/int340x_thermal/processor_thermal_device_pci_
> legacy.c
> @@ -107,15 +107,21 @@ static void proc_thermal_pci_remove(struct
> pci_dev *pdev)
>  }
>  
>  #ifdef CONFIG_PM_SLEEP
> +static int proc_thermal_pci_suspend(struct device *dev)
> +{
> +	return proc_thermal_suspend(dev);
> +}
>  static int proc_thermal_pci_resume(struct device *dev)
>  {
>  	return proc_thermal_resume(dev);
>  }
>  #else
> +#define proc_thermal_pci_suspend NULL
>  #define proc_thermal_pci_resume NULL
>  #endif
>  
> -static SIMPLE_DEV_PM_OPS(proc_thermal_pci_pm, NULL,
> proc_thermal_pci_resume);
> +static SIMPLE_DEV_PM_OPS(proc_thermal_pci_pm,
> proc_thermal_pci_suspend,
> +			 proc_thermal_pci_resume);
>  
>  static const struct pci_device_id proc_thermal_pci_ids[] = {
>  	{ PCI_DEVICE_DATA(INTEL, BDW_THERMAL, 0) },
Antoine Tenart Sept. 9, 2021, 7:09 a.m. UTC | #3
Quoting Srinivas Pandruvada (2021-09-09 07:47:23)
> 
> We have to mark for stable. But It mayn't cleanly apply. So we have to
> submit for 5.12+ kernels.

If this can't be easily applied on stable kernels another approach could
be to apply a simple fix (similar to the previous logic) for older
kernels[1] and applying this patch only for 5.15+.

[1] Something like:

  -static unsigned int tcc_offset_save;
  +static unsigned int tcc_offset_save = -1;

   int proc_thermal_resume(struct device *dev)
   {
          struct proc_thermal_device *proc_dev;
  @@ -352,7 +371,9 @@ int proc_thermal_resume(struct device *dev)
          proc_dev = dev_get_drvdata(dev);
          proc_thermal_read_ppcc(proc_dev);

  -       tcc_offset_update(tcc_offset_save);
  +       if (tcc_offset_save >= 0)
  +               tcc_offset_update(tcc_offset_save);

          return 0;
   }

Let me know if you prefer this approach.

Thanks,
Antoine
srinivas pandruvada Sept. 9, 2021, 7:30 a.m. UTC | #4
On Thu, 2021-09-09 at 09:09 +0200, Antoine Tenart wrote:
> Quoting Srinivas Pandruvada (2021-09-09 07:47:23)
> > 
> > We have to mark for stable. But It mayn't cleanly apply. So we have
> > to
> > submit for 5.12+ kernels.
> 
> If this can't be easily applied on stable kernels another approach
> could
> be to apply a simple fix (similar to the previous logic) for older
> kernels[1] and applying this patch only for 5.15+.
> 
> [1] Something like:
> 
>   -static unsigned int tcc_offset_save;
>   +static unsigned int tcc_offset_save = -1;
> 
>    int proc_thermal_resume(struct device *dev)
>    {
>           struct proc_thermal_device *proc_dev;
>   @@ -352,7 +371,9 @@ int proc_thermal_resume(struct device *dev)
>           proc_dev = dev_get_drvdata(dev);
>           proc_thermal_read_ppcc(proc_dev);
> 
>   -       tcc_offset_update(tcc_offset_save);
>   +       if (tcc_offset_save >= 0)
>   +               tcc_offset_update(tcc_offset_save);
> 
>           return 0;

This is a better approach. First submit this change with simple fix and
mark for stable and the other fix for 5.15 kernel.

Thanks,
Srinivas

>    }
> 
> Let me know if you prefer this approach.
> 
> Thanks,
> Antoine
Antoine Tenart Sept. 9, 2021, 7:39 a.m. UTC | #5
Quoting Srinivas Pandruvada (2021-09-09 09:30:17)
> On Thu, 2021-09-09 at 09:09 +0200, Antoine Tenart wrote:
> > Quoting Srinivas Pandruvada (2021-09-09 07:47:23)
> > > 
> > > We have to mark for stable. But It mayn't cleanly apply. So we have
> > > to
> > > submit for 5.12+ kernels.
> > 
> > If this can't be easily applied on stable kernels another approach
> > could
> > be to apply a simple fix (similar to the previous logic) for older
> > kernels[1] and applying this patch only for 5.15+.
> > 
> > [1] Something like:
> > 
> >   -static unsigned int tcc_offset_save;
> >   +static unsigned int tcc_offset_save = -1;
> > 
> >    int proc_thermal_resume(struct device *dev)
> >    {
> >           struct proc_thermal_device *proc_dev;
> >   @@ -352,7 +371,9 @@ int proc_thermal_resume(struct device *dev)
> >           proc_dev = dev_get_drvdata(dev);
> >           proc_thermal_read_ppcc(proc_dev);
> > 
> >   -       tcc_offset_update(tcc_offset_save);
> >   +       if (tcc_offset_save >= 0)
> >   +               tcc_offset_update(tcc_offset_save);
> > 
> >           return 0;
> 
> This is a better approach. First submit this change with simple fix and
> mark for stable and the other fix for 5.15 kernel.

All right, will do.

Antoine
diff mbox series

Patch

diff --git a/drivers/thermal/intel/int340x_thermal/int3401_thermal.c b/drivers/thermal/intel/int340x_thermal/int3401_thermal.c
index acebc8ba94e2..217786fba185 100644
--- a/drivers/thermal/intel/int340x_thermal/int3401_thermal.c
+++ b/drivers/thermal/intel/int340x_thermal/int3401_thermal.c
@@ -44,15 +44,21 @@  static int int3401_remove(struct platform_device *pdev)
 }
 
 #ifdef CONFIG_PM_SLEEP
+static int int3401_thermal_suspend(struct device *dev)
+{
+	return proc_thermal_suspend(dev);
+}
 static int int3401_thermal_resume(struct device *dev)
 {
 	return proc_thermal_resume(dev);
 }
 #else
+#define int3401_thermal_suspend NULL
 #define int3401_thermal_resume NULL
 #endif
 
-static SIMPLE_DEV_PM_OPS(int3401_proc_thermal_pm, NULL, int3401_thermal_resume);
+static SIMPLE_DEV_PM_OPS(int3401_proc_thermal_pm, int3401_thermal_suspend,
+			 int3401_thermal_resume);
 
 static struct platform_driver int3401_driver = {
 	.probe = int3401_add,
diff --git a/drivers/thermal/intel/int340x_thermal/processor_thermal_device.c b/drivers/thermal/intel/int340x_thermal/processor_thermal_device.c
index 0f0038af2ad4..a8d98f1bd6c6 100644
--- a/drivers/thermal/intel/int340x_thermal/processor_thermal_device.c
+++ b/drivers/thermal/intel/int340x_thermal/processor_thermal_device.c
@@ -68,8 +68,7 @@  static const struct attribute_group power_limit_attribute_group = {
 	.name = "power_limits"
 };
 
-static ssize_t tcc_offset_degree_celsius_show(struct device *dev,
-			       struct device_attribute *attr, char *buf)
+static int tcc_get_offset(void)
 {
 	u64 val;
 	int err;
@@ -78,8 +77,20 @@  static ssize_t tcc_offset_degree_celsius_show(struct device *dev,
 	if (err)
 		return err;
 
-	val = (val >> 24) & 0x3f;
-	return sprintf(buf, "%d\n", (int)val);
+	return (val >> 24) & 0x3f;
+}
+
+static ssize_t tcc_offset_degree_celsius_show(struct device *dev,
+					      struct device_attribute *attr,
+					      char *buf)
+{
+	int tcc;
+
+	tcc = tcc_get_offset();
+	if (tcc < 0)
+		return tcc;
+
+	return sprintf(buf, "%d\n", tcc);
 }
 
 static int tcc_offset_update(unsigned int tcc)
@@ -107,8 +118,6 @@  static int tcc_offset_update(unsigned int tcc)
 	return 0;
 }
 
-static unsigned int tcc_offset_save;
-
 static ssize_t tcc_offset_degree_celsius_store(struct device *dev,
 				struct device_attribute *attr, const char *buf,
 				size_t count)
@@ -131,8 +140,6 @@  static ssize_t tcc_offset_degree_celsius_store(struct device *dev,
 	if (err)
 		return err;
 
-	tcc_offset_save = tcc;
-
 	return count;
 }
 
@@ -345,6 +352,18 @@  void proc_thermal_remove(struct proc_thermal_device *proc_priv)
 }
 EXPORT_SYMBOL_GPL(proc_thermal_remove);
 
+static int tcc_offset_save = -1;
+
+int proc_thermal_suspend(struct device *dev)
+{
+	tcc_offset_save = tcc_get_offset();
+	if (tcc_offset_save < 0)
+		dev_warn(dev, "failed to save offset (%d)\n", tcc_offset_save);
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(proc_thermal_suspend);
+
 int proc_thermal_resume(struct device *dev)
 {
 	struct proc_thermal_device *proc_dev;
@@ -352,7 +371,9 @@  int proc_thermal_resume(struct device *dev)
 	proc_dev = dev_get_drvdata(dev);
 	proc_thermal_read_ppcc(proc_dev);
 
-	tcc_offset_update(tcc_offset_save);
+	/* Do not update if saving failed */
+	if (tcc_offset_save >= 0)
+		tcc_offset_update(tcc_offset_save);
 
 	return 0;
 }
diff --git a/drivers/thermal/intel/int340x_thermal/processor_thermal_device.h b/drivers/thermal/intel/int340x_thermal/processor_thermal_device.h
index 5a1cfe4864f1..c1d8de6dc3d1 100644
--- a/drivers/thermal/intel/int340x_thermal/processor_thermal_device.h
+++ b/drivers/thermal/intel/int340x_thermal/processor_thermal_device.h
@@ -83,6 +83,7 @@  void proc_thermal_mbox_remove(struct pci_dev *pdev);
 int processor_thermal_send_mbox_cmd(struct pci_dev *pdev, u16 cmd_id, u32 cmd_data, u32 *cmd_resp);
 int proc_thermal_add(struct device *dev, struct proc_thermal_device *priv);
 void proc_thermal_remove(struct proc_thermal_device *proc_priv);
+int proc_thermal_suspend(struct device *dev);
 int proc_thermal_resume(struct device *dev);
 int proc_thermal_mmio_add(struct pci_dev *pdev,
 			  struct proc_thermal_device *proc_priv,
diff --git a/drivers/thermal/intel/int340x_thermal/processor_thermal_device_pci.c b/drivers/thermal/intel/int340x_thermal/processor_thermal_device_pci.c
index 11dd2e825f4f..b4bcd3fe9eb2 100644
--- a/drivers/thermal/intel/int340x_thermal/processor_thermal_device_pci.c
+++ b/drivers/thermal/intel/int340x_thermal/processor_thermal_device_pci.c
@@ -314,6 +314,20 @@  static void proc_thermal_pci_remove(struct pci_dev *pdev)
 }
 
 #ifdef CONFIG_PM_SLEEP
+static int proc_thermal_pci_suspend(struct device *dev)
+{
+	struct pci_dev *pdev = to_pci_dev(dev);
+	struct proc_thermal_device *proc_priv;
+	struct proc_thermal_pci *pci_info;
+
+	proc_priv = pci_get_drvdata(pdev);
+	pci_info = proc_priv->priv_data;
+
+	if (!pci_info->no_legacy)
+		return proc_thermal_suspend(dev);
+
+	return 0;
+}
 static int proc_thermal_pci_resume(struct device *dev)
 {
 	struct pci_dev *pdev = to_pci_dev(dev);
@@ -335,10 +349,12 @@  static int proc_thermal_pci_resume(struct device *dev)
 	return 0;
 }
 #else
+#define proc_thermal_pci_suspend NULL
 #define proc_thermal_pci_resume NULL
 #endif
 
-static SIMPLE_DEV_PM_OPS(proc_thermal_pci_pm, NULL, proc_thermal_pci_resume);
+static SIMPLE_DEV_PM_OPS(proc_thermal_pci_pm, proc_thermal_pci_suspend,
+			 proc_thermal_pci_resume);
 
 static const struct pci_device_id proc_thermal_pci_ids[] = {
 	{ PCI_DEVICE_DATA(INTEL, ADL_THERMAL, PROC_THERMAL_FEATURE_RAPL | PROC_THERMAL_FEATURE_FIVR | PROC_THERMAL_FEATURE_DVFS | PROC_THERMAL_FEATURE_MBOX) },
diff --git a/drivers/thermal/intel/int340x_thermal/processor_thermal_device_pci_legacy.c b/drivers/thermal/intel/int340x_thermal/processor_thermal_device_pci_legacy.c
index f5fc1791b11e..4571a1a53b84 100644
--- a/drivers/thermal/intel/int340x_thermal/processor_thermal_device_pci_legacy.c
+++ b/drivers/thermal/intel/int340x_thermal/processor_thermal_device_pci_legacy.c
@@ -107,15 +107,21 @@  static void proc_thermal_pci_remove(struct pci_dev *pdev)
 }
 
 #ifdef CONFIG_PM_SLEEP
+static int proc_thermal_pci_suspend(struct device *dev)
+{
+	return proc_thermal_suspend(dev);
+}
 static int proc_thermal_pci_resume(struct device *dev)
 {
 	return proc_thermal_resume(dev);
 }
 #else
+#define proc_thermal_pci_suspend NULL
 #define proc_thermal_pci_resume NULL
 #endif
 
-static SIMPLE_DEV_PM_OPS(proc_thermal_pci_pm, NULL, proc_thermal_pci_resume);
+static SIMPLE_DEV_PM_OPS(proc_thermal_pci_pm, proc_thermal_pci_suspend,
+			 proc_thermal_pci_resume);
 
 static const struct pci_device_id proc_thermal_pci_ids[] = {
 	{ PCI_DEVICE_DATA(INTEL, BDW_THERMAL, 0) },