Message ID | 20240110030704.11305-1-ricardo.neri-calderon@linux.intel.com (mailing list archive) |
---|---|
State | Mainlined, archived |
Headers | show |
Series | [v3] thermal: intel: hfi: Add syscore callbacks | expand |
On Wed, Jan 10, 2024 at 4:05 AM Ricardo Neri <ricardo.neri-calderon@linux.intel.com> wrote: > > The kernel allocates a memory buffer and provides its location to the > hardware, which uses it to update the HFI table. This allocation occurs > during boot and remains constant throughout runtime. > > When resuming from hibernation, the restore kernel allocates a second > memory buffer and reprograms the HFI hardware with the new location as part > of a normal boot. The location of the second memory buffer may differ from > the one allocated by the image kernel. When the restore kernel transfers > control to the image kernel, the second buffer becomes invalid, potentially > leading to memory corruption if the hardware writes to it (the hardware > continues using the buffer from the restore kernel). > > It is also possible that the hardware forgets the address of the memory > buffer when resuming from "deep" suspend. Memory corruption may also occur > in such a scenario. > > To prevent the described memory corruption, disable HFI when preparing to > suspend or hibernate. Enable it when resuming. > > Add syscore callbacks to handle the package of the boot CPU (packages of > non-boot CPUs are handled via CPU offline). Syscore ops always run on the > boot CPU. Additionally, HFI only needs to be disabled during "deep" suspend > and hibernation. Syscore ops only run in these cases. > > Cc: Chen Yu <yu.c.chen@intel.com> > Cc: Len Brown <len.brown@intel.com> > Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> > Cc: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com> > Cc: Zhang Rui <rui.zhang@intel.com> > Cc: Zhao Liu <zhao1.liu@linux.intel.com> > Cc: linux-pm@vger.kernel.org > Cc: stable@vger.kernel.org # 6.1 > Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com> > --- > This is v3 of the patch [4/4] of [1]. Rafael has already accepted the > first three patches. Thanks! > > I tested this patch on the testing branch of Rafael on Jan 9th. 2024. I > tested it on a Meteor Lake system. It completed 1000 hibernation-resume > cycles. > > [1]. https://lore.kernel.org/all/20240103041459.11113-1-ricardo.neri-calderon@linux.intel.com/ > > --- > Changes since v2: > * Switched to syscore ops instead of a suspend notifier as it is a better > fit: we only need to handle the boot CPU, disabling HFI is only needed > for "deep" suspend and hibernation. (Rafael) > * Removed incorrect statement about the suspend flow in the commit > message. (Rafael) > > Changes since v1: > * Moved registration of the suspend notifier towards the end of > intel_hfi_init(). (Stan) > * Renamed hfi_do_pm_[enable|disable]() to hfi_do_[enable|disable](). Stan > will use these functions outside the suspend notifier. (Stan) > * Added locking to calls to hfi_[enable|disable]() from the suspend > notifier. (Rafael) > --- > drivers/thermal/intel/intel_hfi.c | 28 ++++++++++++++++++++++++++++ > 1 file changed, 28 insertions(+) > > diff --git a/drivers/thermal/intel/intel_hfi.c b/drivers/thermal/intel/intel_hfi.c > index 22445403b520..768cd5f9eb32 100644 > --- a/drivers/thermal/intel/intel_hfi.c > +++ b/drivers/thermal/intel/intel_hfi.c > @@ -35,7 +35,9 @@ > #include <linux/processor.h> > #include <linux/slab.h> > #include <linux/spinlock.h> > +#include <linux/suspend.h> > #include <linux/string.h> > +#include <linux/syscore_ops.h> > #include <linux/topology.h> > #include <linux/workqueue.h> > > @@ -571,6 +573,30 @@ static __init int hfi_parse_features(void) > return 0; > } > > +static void hfi_do_enable(void) > +{ > + /* If we are here, we are on the boot CPU */ > + struct hfi_cpu_info *info = &per_cpu(hfi_cpu_info, 0); > + struct hfi_instance *hfi_instance = info->hfi_instance; > + > + /* No locking needed. There is no concurrency with CPU online. */ > + hfi_set_hw_table(hfi_instance); > + hfi_enable(); > +} > + > +static int hfi_do_disable(void) > +{ > + /* No locking needed. There is no concurrency with CPU offline. */ > + hfi_disable(); > + > + return 0; > +} > + > +static struct syscore_ops hfi_pm_ops = { > + .resume = hfi_do_enable, > + .suspend = hfi_do_disable, > +}; > + > void __init intel_hfi_init(void) > { > struct hfi_instance *hfi_instance; > @@ -602,6 +628,8 @@ void __init intel_hfi_init(void) > if (!hfi_updates_wq) > goto err_nomem; > > + register_syscore_ops(&hfi_pm_ops); > + > return; > > err_nomem: > -- Applied as 6.8-rc1 material, with edited subject and changelog, and I've reworded a comment in hfi_do_enable() to speak about the code. Thanks!
diff --git a/drivers/thermal/intel/intel_hfi.c b/drivers/thermal/intel/intel_hfi.c index 22445403b520..768cd5f9eb32 100644 --- a/drivers/thermal/intel/intel_hfi.c +++ b/drivers/thermal/intel/intel_hfi.c @@ -35,7 +35,9 @@ #include <linux/processor.h> #include <linux/slab.h> #include <linux/spinlock.h> +#include <linux/suspend.h> #include <linux/string.h> +#include <linux/syscore_ops.h> #include <linux/topology.h> #include <linux/workqueue.h> @@ -571,6 +573,30 @@ static __init int hfi_parse_features(void) return 0; } +static void hfi_do_enable(void) +{ + /* If we are here, we are on the boot CPU */ + struct hfi_cpu_info *info = &per_cpu(hfi_cpu_info, 0); + struct hfi_instance *hfi_instance = info->hfi_instance; + + /* No locking needed. There is no concurrency with CPU online. */ + hfi_set_hw_table(hfi_instance); + hfi_enable(); +} + +static int hfi_do_disable(void) +{ + /* No locking needed. There is no concurrency with CPU offline. */ + hfi_disable(); + + return 0; +} + +static struct syscore_ops hfi_pm_ops = { + .resume = hfi_do_enable, + .suspend = hfi_do_disable, +}; + void __init intel_hfi_init(void) { struct hfi_instance *hfi_instance; @@ -602,6 +628,8 @@ void __init intel_hfi_init(void) if (!hfi_updates_wq) goto err_nomem; + register_syscore_ops(&hfi_pm_ops); + return; err_nomem:
The kernel allocates a memory buffer and provides its location to the hardware, which uses it to update the HFI table. This allocation occurs during boot and remains constant throughout runtime. When resuming from hibernation, the restore kernel allocates a second memory buffer and reprograms the HFI hardware with the new location as part of a normal boot. The location of the second memory buffer may differ from the one allocated by the image kernel. When the restore kernel transfers control to the image kernel, the second buffer becomes invalid, potentially leading to memory corruption if the hardware writes to it (the hardware continues using the buffer from the restore kernel). It is also possible that the hardware forgets the address of the memory buffer when resuming from "deep" suspend. Memory corruption may also occur in such a scenario. To prevent the described memory corruption, disable HFI when preparing to suspend or hibernate. Enable it when resuming. Add syscore callbacks to handle the package of the boot CPU (packages of non-boot CPUs are handled via CPU offline). Syscore ops always run on the boot CPU. Additionally, HFI only needs to be disabled during "deep" suspend and hibernation. Syscore ops only run in these cases. Cc: Chen Yu <yu.c.chen@intel.com> Cc: Len Brown <len.brown@intel.com> Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Cc: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com> Cc: Zhang Rui <rui.zhang@intel.com> Cc: Zhao Liu <zhao1.liu@linux.intel.com> Cc: linux-pm@vger.kernel.org Cc: stable@vger.kernel.org # 6.1 Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com> --- This is v3 of the patch [4/4] of [1]. Rafael has already accepted the first three patches. Thanks! I tested this patch on the testing branch of Rafael on Jan 9th. 2024. I tested it on a Meteor Lake system. It completed 1000 hibernation-resume cycles. [1]. https://lore.kernel.org/all/20240103041459.11113-1-ricardo.neri-calderon@linux.intel.com/ --- Changes since v2: * Switched to syscore ops instead of a suspend notifier as it is a better fit: we only need to handle the boot CPU, disabling HFI is only needed for "deep" suspend and hibernation. (Rafael) * Removed incorrect statement about the suspend flow in the commit message. (Rafael) Changes since v1: * Moved registration of the suspend notifier towards the end of intel_hfi_init(). (Stan) * Renamed hfi_do_pm_[enable|disable]() to hfi_do_[enable|disable](). Stan will use these functions outside the suspend notifier. (Stan) * Added locking to calls to hfi_[enable|disable]() from the suspend notifier. (Rafael) --- drivers/thermal/intel/intel_hfi.c | 28 ++++++++++++++++++++++++++++ 1 file changed, 28 insertions(+)