Message ID | 164918880566.2022733.9710638662231385597.stgit@dwillia2-desk3.amr.corp.intel.com |
---|---|
State | Superseded |
Headers | show |
Series | [v2] PM: CXL: Disable suspend | expand |
On Tue, Apr 5, 2022 at 10:00 PM Dan Williams <dan.j.williams@intel.com> wrote: > > The CXL specification claims S3 support at a hardware level, but at a > system software level there are some missing pieces. Section 9.4 (CXL > 2.0) rightly claims that "CXL mem adapters may need aux power to retain > memory context across S3", but there is no enumeration mechanism for the > OS to determine if a given adapter has that support. Moreover the save > state and resume image for the system may inadvertantly end up in a CXL > device that needs to be restored before the save state is recoverable. > I.e. a circular dependency that is not resolvable without a third party > save-area. > > Arrange for the cxl_mem driver to fail S3 attempts. This still nominaly > allows for suspend, but requires unbinding all CXL memory devices before > the suspend to ensure the typical DRAM flow is taken. The cxl_mem unbind > flow is intended to also tear down all CXL memory regions associated > with a given cxl_memdev. > > It is reasonable to assume that any device participating in a System RAM > range published in the EFI memory map is covered by aux power and > save-area outside the device itself. So this restriction can be > minimized in the future once pre-existing region enumeration support > arrives, and perhaps a spec update to clarify if the EFI memory map is > sufficent for determining the range of devices managed by > platform-firmware for S3 support. > > Per Rafael, if the CXL configuration prevents suspend then it should > fail early before tasks are frozen, and mem_sleep should stop showing > 'mem' as an option [1]. Effectively CXL augments the platform suspend > ->valid() op since, for example, the ACPI ops are not aware of the CXL / > PCI dependencies. Given the split role of platform firmware vs OS > provisioned CXL memory it is up to the cxl_mem driver to determine if > the CXL configuration has elements that platform firmware may not be > prepared to restore. > > Link: https://lore.kernel.org/r/CAJZ5v0hGVN_=3iU8OLpHY3Ak35T5+JcBM-qs8SbojKrpd0VXsA@mail.gmail.com [1] > Cc: "Rafael J. Wysocki" <rafael@kernel.org> > Cc: Pavel Machek <pavel@ucw.cz> > Cc: Len Brown <len.brown@intel.com> > Signed-off-by: Dan Williams <dan.j.williams@intel.com> The majority of the changes are in the CXL code, so Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> or please let me know if you want me to take this. Thanks! > --- > drivers/Makefile | 2 +- > drivers/cxl/Kconfig | 4 ++++ > drivers/cxl/Makefile | 2 +- > drivers/cxl/core/Makefile | 1 + > drivers/cxl/core/memdev.c | 1 - > drivers/cxl/core/suspend.c | 23 +++++++++++++++++++++++ > drivers/cxl/cxlmem.h | 11 +++++++++++ > drivers/cxl/mem.c | 22 +++++++++++++++++++++- > include/linux/pm.h | 9 +++++++++ > kernel/power/hibernate.c | 2 +- > kernel/power/main.c | 5 ++++- > kernel/power/suspend.c | 3 ++- > 12 files changed, 78 insertions(+), 7 deletions(-) > create mode 100644 drivers/cxl/core/suspend.c > > diff --git a/drivers/Makefile b/drivers/Makefile > index 020780b6b4d2..f735c4955143 100644 > --- a/drivers/Makefile > +++ b/drivers/Makefile > @@ -72,9 +72,9 @@ obj-$(CONFIG_PARPORT) += parport/ > obj-y += base/ block/ misc/ mfd/ nfc/ > obj-$(CONFIG_LIBNVDIMM) += nvdimm/ > obj-$(CONFIG_DAX) += dax/ > -obj-$(CONFIG_CXL_BUS) += cxl/ > obj-$(CONFIG_DMA_SHARED_BUFFER) += dma-buf/ > obj-$(CONFIG_NUBUS) += nubus/ > +obj-y += cxl/ > obj-y += macintosh/ > obj-y += scsi/ > obj-y += nvme/ > diff --git a/drivers/cxl/Kconfig b/drivers/cxl/Kconfig > index b88ab956bb7c..f64e3984689f 100644 > --- a/drivers/cxl/Kconfig > +++ b/drivers/cxl/Kconfig > @@ -98,4 +98,8 @@ config CXL_PORT > default CXL_BUS > tristate > > +config CXL_SUSPEND > + def_bool y > + depends on SUSPEND && CXL_MEM > + > endif > diff --git a/drivers/cxl/Makefile b/drivers/cxl/Makefile > index ce267ef11d93..a78270794150 100644 > --- a/drivers/cxl/Makefile > +++ b/drivers/cxl/Makefile > @@ -1,5 +1,5 @@ > # SPDX-License-Identifier: GPL-2.0 > -obj-$(CONFIG_CXL_BUS) += core/ > +obj-y += core/ > obj-$(CONFIG_CXL_PCI) += cxl_pci.o > obj-$(CONFIG_CXL_MEM) += cxl_mem.o > obj-$(CONFIG_CXL_ACPI) += cxl_acpi.o > diff --git a/drivers/cxl/core/Makefile b/drivers/cxl/core/Makefile > index 6d37cd78b151..9d35085d25af 100644 > --- a/drivers/cxl/core/Makefile > +++ b/drivers/cxl/core/Makefile > @@ -1,5 +1,6 @@ > # SPDX-License-Identifier: GPL-2.0 > obj-$(CONFIG_CXL_BUS) += cxl_core.o > +obj-$(CONFIG_CXL_SUSPEND) += suspend.o > > ccflags-y += -I$(srctree)/drivers/cxl > cxl_core-y := port.o > diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c > index 1f76b28f9826..efe4d2e9bfef 100644 > --- a/drivers/cxl/core/memdev.c > +++ b/drivers/cxl/core/memdev.c > @@ -251,7 +251,6 @@ static struct cxl_memdev *cxl_memdev_alloc(struct cxl_dev_state *cxlds, > dev->bus = &cxl_bus_type; > dev->devt = MKDEV(cxl_mem_major, cxlmd->id); > dev->type = &cxl_memdev_type; > - device_set_pm_not_required(dev); > INIT_WORK(&cxlmd->detach_work, detach_memdev); > > cdev = &cxlmd->cdev; > diff --git a/drivers/cxl/core/suspend.c b/drivers/cxl/core/suspend.c > new file mode 100644 > index 000000000000..88bdbe30a1df > --- /dev/null > +++ b/drivers/cxl/core/suspend.c > @@ -0,0 +1,23 @@ > +// SPDX-License-Identifier: GPL-2.0-only > +/* Copyright(c) 2022 Intel Corporation. All rights reserved. */ > +#include <linux/atomic.h> > +#include <linux/export.h> > + > +static atomic_t mem_active; > + > +bool cxl_mem_active(void) > +{ > + return atomic_read(&mem_active) != 0; > +} > + > +void cxl_mem_active_inc(void) > +{ > + atomic_inc(&mem_active); > +} > +EXPORT_SYMBOL_NS_GPL(cxl_mem_active_inc, CXL); > + > +void cxl_mem_active_dec(void) > +{ > + atomic_dec(&mem_active); > +} > +EXPORT_SYMBOL_NS_GPL(cxl_mem_active_dec, CXL); > diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h > index 243dd86a8b46..7235d2f976e5 100644 > --- a/drivers/cxl/cxlmem.h > +++ b/drivers/cxl/cxlmem.h > @@ -353,6 +353,17 @@ int cxl_mem_create_range_info(struct cxl_dev_state *cxlds); > struct cxl_dev_state *cxl_dev_state_create(struct device *dev); > void set_exclusive_cxl_commands(struct cxl_dev_state *cxlds, unsigned long *cmds); > void clear_exclusive_cxl_commands(struct cxl_dev_state *cxlds, unsigned long *cmds); > +#ifdef CONFIG_CXL_SUSPEND > +void cxl_mem_active_inc(void); > +void cxl_mem_active_dec(void); > +#else > +static inline void cxl_mem_active_inc(void) > +{ > +} > +static inline void cxl_mem_active_dec(void) > +{ > +} > +#endif > > struct cxl_hdm { > struct cxl_component_regs regs; > diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c > index 49a4b1c47299..0576d2d3df07 100644 > --- a/drivers/cxl/mem.c > +++ b/drivers/cxl/mem.c > @@ -129,6 +129,11 @@ __mock bool cxl_dvsec_decode_init(struct cxl_dev_state *cxlds) > return do_hdm_init; > } > > +static void enable_suspend(void *data) > +{ > + cxl_mem_active_dec(); > +} > + > static int cxl_mem_probe(struct device *dev) > { > struct cxl_memdev *cxlmd = to_cxl_memdev(dev); > @@ -207,7 +212,22 @@ static int cxl_mem_probe(struct device *dev) > out: > cxl_device_unlock(&parent_port->dev); > put_device(&parent_port->dev); > - return rc; > + > + /* > + * The kernel may be operating out of CXL memory on this device, > + * there is no spec defined way to determine whether this device > + * preserves contents over suspend, and there is no simple way > + * to arrange for the suspend image to avoid CXL memory which > + * would setup a circular dependency between PCI resume and save > + * state restoration. > + * > + * TODO: support suspend when all the regions this device is > + * hosting are locked and covered by the system address map, > + * i.e. platform firmware owns restoring the HDM configuration > + * that it locked. > + */ > + cxl_mem_active_inc(); > + return devm_add_action_or_reset(dev, enable_suspend, NULL); > } > > static struct cxl_driver cxl_mem_driver = { > diff --git a/include/linux/pm.h b/include/linux/pm.h > index e65b3ab28377..7911c4c9a7be 100644 > --- a/include/linux/pm.h > +++ b/include/linux/pm.h > @@ -36,6 +36,15 @@ static inline void pm_vt_switch_unregister(struct device *dev) > } > #endif /* CONFIG_VT_CONSOLE_SLEEP */ > > +#ifdef CONFIG_CXL_SUSPEND > +bool cxl_mem_active(void); > +#else > +static inline bool cxl_mem_active(void) > +{ > + return false; > +} > +#endif > + > /* > * Device power management > */ > diff --git a/kernel/power/hibernate.c b/kernel/power/hibernate.c > index 938d5c78b421..20a66bf9f465 100644 > --- a/kernel/power/hibernate.c > +++ b/kernel/power/hibernate.c > @@ -83,7 +83,7 @@ bool hibernation_available(void) > { > return nohibernate == 0 && > !security_locked_down(LOCKDOWN_HIBERNATION) && > - !secretmem_active(); > + !secretmem_active() && !cxl_mem_active(); > } > > /** > diff --git a/kernel/power/main.c b/kernel/power/main.c > index 7e646079fbeb..3e6be1c33e0b 100644 > --- a/kernel/power/main.c > +++ b/kernel/power/main.c > @@ -127,7 +127,9 @@ static ssize_t mem_sleep_show(struct kobject *kobj, struct kobj_attribute *attr, > char *s = buf; > suspend_state_t i; > > - for (i = PM_SUSPEND_MIN; i < PM_SUSPEND_MAX; i++) > + for (i = PM_SUSPEND_MIN; i < PM_SUSPEND_MAX; i++) { > + if (i >= PM_SUSPEND_MEM && cxl_mem_active()) > + continue; > if (mem_sleep_states[i]) { > const char *label = mem_sleep_states[i]; > > @@ -136,6 +138,7 @@ static ssize_t mem_sleep_show(struct kobject *kobj, struct kobj_attribute *attr, > else > s += sprintf(s, "%s ", label); > } > + } > > /* Convert the last space to a newline if needed. */ > if (s != buf) > diff --git a/kernel/power/suspend.c b/kernel/power/suspend.c > index 6fcdee7e87a5..827075944d28 100644 > --- a/kernel/power/suspend.c > +++ b/kernel/power/suspend.c > @@ -236,7 +236,8 @@ EXPORT_SYMBOL_GPL(suspend_valid_only_mem); > > static bool sleep_state_supported(suspend_state_t state) > { > - return state == PM_SUSPEND_TO_IDLE || valid_state(state); > + return state == PM_SUSPEND_TO_IDLE || > + (valid_state(state) && !cxl_mem_active()); > } > > static int platform_suspend_prepare(suspend_state_t state) >
On Tue, Apr 12, 2022 at 10:02 AM Rafael J. Wysocki <rafael@kernel.org> wrote: > > On Tue, Apr 5, 2022 at 10:00 PM Dan Williams <dan.j.williams@intel.com> wrote: > > > > The CXL specification claims S3 support at a hardware level, but at a > > system software level there are some missing pieces. Section 9.4 (CXL > > 2.0) rightly claims that "CXL mem adapters may need aux power to retain > > memory context across S3", but there is no enumeration mechanism for the > > OS to determine if a given adapter has that support. Moreover the save > > state and resume image for the system may inadvertantly end up in a CXL > > device that needs to be restored before the save state is recoverable. > > I.e. a circular dependency that is not resolvable without a third party > > save-area. > > > > Arrange for the cxl_mem driver to fail S3 attempts. This still nominaly > > allows for suspend, but requires unbinding all CXL memory devices before > > the suspend to ensure the typical DRAM flow is taken. The cxl_mem unbind > > flow is intended to also tear down all CXL memory regions associated > > with a given cxl_memdev. > > > > It is reasonable to assume that any device participating in a System RAM > > range published in the EFI memory map is covered by aux power and > > save-area outside the device itself. So this restriction can be > > minimized in the future once pre-existing region enumeration support > > arrives, and perhaps a spec update to clarify if the EFI memory map is > > sufficent for determining the range of devices managed by > > platform-firmware for S3 support. > > > > Per Rafael, if the CXL configuration prevents suspend then it should > > fail early before tasks are frozen, and mem_sleep should stop showing > > 'mem' as an option [1]. Effectively CXL augments the platform suspend > > ->valid() op since, for example, the ACPI ops are not aware of the CXL / > > PCI dependencies. Given the split role of platform firmware vs OS > > provisioned CXL memory it is up to the cxl_mem driver to determine if > > the CXL configuration has elements that platform firmware may not be > > prepared to restore. > > > > Link: https://lore.kernel.org/r/CAJZ5v0hGVN_=3iU8OLpHY3Ak35T5+JcBM-qs8SbojKrpd0VXsA@mail.gmail.com [1] > > Cc: "Rafael J. Wysocki" <rafael@kernel.org> > > Cc: Pavel Machek <pavel@ucw.cz> > > Cc: Len Brown <len.brown@intel.com> > > Signed-off-by: Dan Williams <dan.j.williams@intel.com> > > The majority of the changes are in the CXL code, so > > Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > > or please let me know if you want me to take this. Thanks Rafael, I'll do one more update to return the call to device_set_pm_not_required() since with this new scheme the CXL dev_pm_ops are not used. I'll take this through cxl.git. > > Thanks! > > > --- > > drivers/Makefile | 2 +- > > drivers/cxl/Kconfig | 4 ++++ > > drivers/cxl/Makefile | 2 +- > > drivers/cxl/core/Makefile | 1 + > > drivers/cxl/core/memdev.c | 1 - > > drivers/cxl/core/suspend.c | 23 +++++++++++++++++++++++ > > drivers/cxl/cxlmem.h | 11 +++++++++++ > > drivers/cxl/mem.c | 22 +++++++++++++++++++++- > > include/linux/pm.h | 9 +++++++++ > > kernel/power/hibernate.c | 2 +- > > kernel/power/main.c | 5 ++++- > > kernel/power/suspend.c | 3 ++- > > 12 files changed, 78 insertions(+), 7 deletions(-) > > create mode 100644 drivers/cxl/core/suspend.c > > > > diff --git a/drivers/Makefile b/drivers/Makefile > > index 020780b6b4d2..f735c4955143 100644 > > --- a/drivers/Makefile > > +++ b/drivers/Makefile > > @@ -72,9 +72,9 @@ obj-$(CONFIG_PARPORT) += parport/ > > obj-y += base/ block/ misc/ mfd/ nfc/ > > obj-$(CONFIG_LIBNVDIMM) += nvdimm/ > > obj-$(CONFIG_DAX) += dax/ > > -obj-$(CONFIG_CXL_BUS) += cxl/ > > obj-$(CONFIG_DMA_SHARED_BUFFER) += dma-buf/ > > obj-$(CONFIG_NUBUS) += nubus/ > > +obj-y += cxl/ > > obj-y += macintosh/ > > obj-y += scsi/ > > obj-y += nvme/ > > diff --git a/drivers/cxl/Kconfig b/drivers/cxl/Kconfig > > index b88ab956bb7c..f64e3984689f 100644 > > --- a/drivers/cxl/Kconfig > > +++ b/drivers/cxl/Kconfig > > @@ -98,4 +98,8 @@ config CXL_PORT > > default CXL_BUS > > tristate > > > > +config CXL_SUSPEND > > + def_bool y > > + depends on SUSPEND && CXL_MEM > > + > > endif > > diff --git a/drivers/cxl/Makefile b/drivers/cxl/Makefile > > index ce267ef11d93..a78270794150 100644 > > --- a/drivers/cxl/Makefile > > +++ b/drivers/cxl/Makefile > > @@ -1,5 +1,5 @@ > > # SPDX-License-Identifier: GPL-2.0 > > -obj-$(CONFIG_CXL_BUS) += core/ > > +obj-y += core/ > > obj-$(CONFIG_CXL_PCI) += cxl_pci.o > > obj-$(CONFIG_CXL_MEM) += cxl_mem.o > > obj-$(CONFIG_CXL_ACPI) += cxl_acpi.o > > diff --git a/drivers/cxl/core/Makefile b/drivers/cxl/core/Makefile > > index 6d37cd78b151..9d35085d25af 100644 > > --- a/drivers/cxl/core/Makefile > > +++ b/drivers/cxl/core/Makefile > > @@ -1,5 +1,6 @@ > > # SPDX-License-Identifier: GPL-2.0 > > obj-$(CONFIG_CXL_BUS) += cxl_core.o > > +obj-$(CONFIG_CXL_SUSPEND) += suspend.o > > > > ccflags-y += -I$(srctree)/drivers/cxl > > cxl_core-y := port.o > > diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c > > index 1f76b28f9826..efe4d2e9bfef 100644 > > --- a/drivers/cxl/core/memdev.c > > +++ b/drivers/cxl/core/memdev.c > > @@ -251,7 +251,6 @@ static struct cxl_memdev *cxl_memdev_alloc(struct cxl_dev_state *cxlds, > > dev->bus = &cxl_bus_type; > > dev->devt = MKDEV(cxl_mem_major, cxlmd->id); > > dev->type = &cxl_memdev_type; > > - device_set_pm_not_required(dev); > > INIT_WORK(&cxlmd->detach_work, detach_memdev); > > > > cdev = &cxlmd->cdev; > > diff --git a/drivers/cxl/core/suspend.c b/drivers/cxl/core/suspend.c > > new file mode 100644 > > index 000000000000..88bdbe30a1df > > --- /dev/null > > +++ b/drivers/cxl/core/suspend.c > > @@ -0,0 +1,23 @@ > > +// SPDX-License-Identifier: GPL-2.0-only > > +/* Copyright(c) 2022 Intel Corporation. All rights reserved. */ > > +#include <linux/atomic.h> > > +#include <linux/export.h> > > + > > +static atomic_t mem_active; > > + > > +bool cxl_mem_active(void) > > +{ > > + return atomic_read(&mem_active) != 0; > > +} > > + > > +void cxl_mem_active_inc(void) > > +{ > > + atomic_inc(&mem_active); > > +} > > +EXPORT_SYMBOL_NS_GPL(cxl_mem_active_inc, CXL); > > + > > +void cxl_mem_active_dec(void) > > +{ > > + atomic_dec(&mem_active); > > +} > > +EXPORT_SYMBOL_NS_GPL(cxl_mem_active_dec, CXL); > > diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h > > index 243dd86a8b46..7235d2f976e5 100644 > > --- a/drivers/cxl/cxlmem.h > > +++ b/drivers/cxl/cxlmem.h > > @@ -353,6 +353,17 @@ int cxl_mem_create_range_info(struct cxl_dev_state *cxlds); > > struct cxl_dev_state *cxl_dev_state_create(struct device *dev); > > void set_exclusive_cxl_commands(struct cxl_dev_state *cxlds, unsigned long *cmds); > > void clear_exclusive_cxl_commands(struct cxl_dev_state *cxlds, unsigned long *cmds); > > +#ifdef CONFIG_CXL_SUSPEND > > +void cxl_mem_active_inc(void); > > +void cxl_mem_active_dec(void); > > +#else > > +static inline void cxl_mem_active_inc(void) > > +{ > > +} > > +static inline void cxl_mem_active_dec(void) > > +{ > > +} > > +#endif > > > > struct cxl_hdm { > > struct cxl_component_regs regs; > > diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c > > index 49a4b1c47299..0576d2d3df07 100644 > > --- a/drivers/cxl/mem.c > > +++ b/drivers/cxl/mem.c > > @@ -129,6 +129,11 @@ __mock bool cxl_dvsec_decode_init(struct cxl_dev_state *cxlds) > > return do_hdm_init; > > } > > > > +static void enable_suspend(void *data) > > +{ > > + cxl_mem_active_dec(); > > +} > > + > > static int cxl_mem_probe(struct device *dev) > > { > > struct cxl_memdev *cxlmd = to_cxl_memdev(dev); > > @@ -207,7 +212,22 @@ static int cxl_mem_probe(struct device *dev) > > out: > > cxl_device_unlock(&parent_port->dev); > > put_device(&parent_port->dev); > > - return rc; > > + > > + /* > > + * The kernel may be operating out of CXL memory on this device, > > + * there is no spec defined way to determine whether this device > > + * preserves contents over suspend, and there is no simple way > > + * to arrange for the suspend image to avoid CXL memory which > > + * would setup a circular dependency between PCI resume and save > > + * state restoration. > > + * > > + * TODO: support suspend when all the regions this device is > > + * hosting are locked and covered by the system address map, > > + * i.e. platform firmware owns restoring the HDM configuration > > + * that it locked. > > + */ > > + cxl_mem_active_inc(); > > + return devm_add_action_or_reset(dev, enable_suspend, NULL); > > } > > > > static struct cxl_driver cxl_mem_driver = { > > diff --git a/include/linux/pm.h b/include/linux/pm.h > > index e65b3ab28377..7911c4c9a7be 100644 > > --- a/include/linux/pm.h > > +++ b/include/linux/pm.h > > @@ -36,6 +36,15 @@ static inline void pm_vt_switch_unregister(struct device *dev) > > } > > #endif /* CONFIG_VT_CONSOLE_SLEEP */ > > > > +#ifdef CONFIG_CXL_SUSPEND > > +bool cxl_mem_active(void); > > +#else > > +static inline bool cxl_mem_active(void) > > +{ > > + return false; > > +} > > +#endif > > + > > /* > > * Device power management > > */ > > diff --git a/kernel/power/hibernate.c b/kernel/power/hibernate.c > > index 938d5c78b421..20a66bf9f465 100644 > > --- a/kernel/power/hibernate.c > > +++ b/kernel/power/hibernate.c > > @@ -83,7 +83,7 @@ bool hibernation_available(void) > > { > > return nohibernate == 0 && > > !security_locked_down(LOCKDOWN_HIBERNATION) && > > - !secretmem_active(); > > + !secretmem_active() && !cxl_mem_active(); > > } > > > > /** > > diff --git a/kernel/power/main.c b/kernel/power/main.c > > index 7e646079fbeb..3e6be1c33e0b 100644 > > --- a/kernel/power/main.c > > +++ b/kernel/power/main.c > > @@ -127,7 +127,9 @@ static ssize_t mem_sleep_show(struct kobject *kobj, struct kobj_attribute *attr, > > char *s = buf; > > suspend_state_t i; > > > > - for (i = PM_SUSPEND_MIN; i < PM_SUSPEND_MAX; i++) > > + for (i = PM_SUSPEND_MIN; i < PM_SUSPEND_MAX; i++) { > > + if (i >= PM_SUSPEND_MEM && cxl_mem_active()) > > + continue; > > if (mem_sleep_states[i]) { > > const char *label = mem_sleep_states[i]; > > > > @@ -136,6 +138,7 @@ static ssize_t mem_sleep_show(struct kobject *kobj, struct kobj_attribute *attr, > > else > > s += sprintf(s, "%s ", label); > > } > > + } > > > > /* Convert the last space to a newline if needed. */ > > if (s != buf) > > diff --git a/kernel/power/suspend.c b/kernel/power/suspend.c > > index 6fcdee7e87a5..827075944d28 100644 > > --- a/kernel/power/suspend.c > > +++ b/kernel/power/suspend.c > > @@ -236,7 +236,8 @@ EXPORT_SYMBOL_GPL(suspend_valid_only_mem); > > > > static bool sleep_state_supported(suspend_state_t state) > > { > > - return state == PM_SUSPEND_TO_IDLE || valid_state(state); > > + return state == PM_SUSPEND_TO_IDLE || > > + (valid_state(state) && !cxl_mem_active()); > > } > > > > static int platform_suspend_prepare(suspend_state_t state) > >
diff --git a/drivers/Makefile b/drivers/Makefile index 020780b6b4d2..f735c4955143 100644 --- a/drivers/Makefile +++ b/drivers/Makefile @@ -72,9 +72,9 @@ obj-$(CONFIG_PARPORT) += parport/ obj-y += base/ block/ misc/ mfd/ nfc/ obj-$(CONFIG_LIBNVDIMM) += nvdimm/ obj-$(CONFIG_DAX) += dax/ -obj-$(CONFIG_CXL_BUS) += cxl/ obj-$(CONFIG_DMA_SHARED_BUFFER) += dma-buf/ obj-$(CONFIG_NUBUS) += nubus/ +obj-y += cxl/ obj-y += macintosh/ obj-y += scsi/ obj-y += nvme/ diff --git a/drivers/cxl/Kconfig b/drivers/cxl/Kconfig index b88ab956bb7c..f64e3984689f 100644 --- a/drivers/cxl/Kconfig +++ b/drivers/cxl/Kconfig @@ -98,4 +98,8 @@ config CXL_PORT default CXL_BUS tristate +config CXL_SUSPEND + def_bool y + depends on SUSPEND && CXL_MEM + endif diff --git a/drivers/cxl/Makefile b/drivers/cxl/Makefile index ce267ef11d93..a78270794150 100644 --- a/drivers/cxl/Makefile +++ b/drivers/cxl/Makefile @@ -1,5 +1,5 @@ # SPDX-License-Identifier: GPL-2.0 -obj-$(CONFIG_CXL_BUS) += core/ +obj-y += core/ obj-$(CONFIG_CXL_PCI) += cxl_pci.o obj-$(CONFIG_CXL_MEM) += cxl_mem.o obj-$(CONFIG_CXL_ACPI) += cxl_acpi.o diff --git a/drivers/cxl/core/Makefile b/drivers/cxl/core/Makefile index 6d37cd78b151..9d35085d25af 100644 --- a/drivers/cxl/core/Makefile +++ b/drivers/cxl/core/Makefile @@ -1,5 +1,6 @@ # SPDX-License-Identifier: GPL-2.0 obj-$(CONFIG_CXL_BUS) += cxl_core.o +obj-$(CONFIG_CXL_SUSPEND) += suspend.o ccflags-y += -I$(srctree)/drivers/cxl cxl_core-y := port.o diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c index 1f76b28f9826..efe4d2e9bfef 100644 --- a/drivers/cxl/core/memdev.c +++ b/drivers/cxl/core/memdev.c @@ -251,7 +251,6 @@ static struct cxl_memdev *cxl_memdev_alloc(struct cxl_dev_state *cxlds, dev->bus = &cxl_bus_type; dev->devt = MKDEV(cxl_mem_major, cxlmd->id); dev->type = &cxl_memdev_type; - device_set_pm_not_required(dev); INIT_WORK(&cxlmd->detach_work, detach_memdev); cdev = &cxlmd->cdev; diff --git a/drivers/cxl/core/suspend.c b/drivers/cxl/core/suspend.c new file mode 100644 index 000000000000..88bdbe30a1df --- /dev/null +++ b/drivers/cxl/core/suspend.c @@ -0,0 +1,23 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* Copyright(c) 2022 Intel Corporation. All rights reserved. */ +#include <linux/atomic.h> +#include <linux/export.h> + +static atomic_t mem_active; + +bool cxl_mem_active(void) +{ + return atomic_read(&mem_active) != 0; +} + +void cxl_mem_active_inc(void) +{ + atomic_inc(&mem_active); +} +EXPORT_SYMBOL_NS_GPL(cxl_mem_active_inc, CXL); + +void cxl_mem_active_dec(void) +{ + atomic_dec(&mem_active); +} +EXPORT_SYMBOL_NS_GPL(cxl_mem_active_dec, CXL); diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h index 243dd86a8b46..7235d2f976e5 100644 --- a/drivers/cxl/cxlmem.h +++ b/drivers/cxl/cxlmem.h @@ -353,6 +353,17 @@ int cxl_mem_create_range_info(struct cxl_dev_state *cxlds); struct cxl_dev_state *cxl_dev_state_create(struct device *dev); void set_exclusive_cxl_commands(struct cxl_dev_state *cxlds, unsigned long *cmds); void clear_exclusive_cxl_commands(struct cxl_dev_state *cxlds, unsigned long *cmds); +#ifdef CONFIG_CXL_SUSPEND +void cxl_mem_active_inc(void); +void cxl_mem_active_dec(void); +#else +static inline void cxl_mem_active_inc(void) +{ +} +static inline void cxl_mem_active_dec(void) +{ +} +#endif struct cxl_hdm { struct cxl_component_regs regs; diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c index 49a4b1c47299..0576d2d3df07 100644 --- a/drivers/cxl/mem.c +++ b/drivers/cxl/mem.c @@ -129,6 +129,11 @@ __mock bool cxl_dvsec_decode_init(struct cxl_dev_state *cxlds) return do_hdm_init; } +static void enable_suspend(void *data) +{ + cxl_mem_active_dec(); +} + static int cxl_mem_probe(struct device *dev) { struct cxl_memdev *cxlmd = to_cxl_memdev(dev); @@ -207,7 +212,22 @@ static int cxl_mem_probe(struct device *dev) out: cxl_device_unlock(&parent_port->dev); put_device(&parent_port->dev); - return rc; + + /* + * The kernel may be operating out of CXL memory on this device, + * there is no spec defined way to determine whether this device + * preserves contents over suspend, and there is no simple way + * to arrange for the suspend image to avoid CXL memory which + * would setup a circular dependency between PCI resume and save + * state restoration. + * + * TODO: support suspend when all the regions this device is + * hosting are locked and covered by the system address map, + * i.e. platform firmware owns restoring the HDM configuration + * that it locked. + */ + cxl_mem_active_inc(); + return devm_add_action_or_reset(dev, enable_suspend, NULL); } static struct cxl_driver cxl_mem_driver = { diff --git a/include/linux/pm.h b/include/linux/pm.h index e65b3ab28377..7911c4c9a7be 100644 --- a/include/linux/pm.h +++ b/include/linux/pm.h @@ -36,6 +36,15 @@ static inline void pm_vt_switch_unregister(struct device *dev) } #endif /* CONFIG_VT_CONSOLE_SLEEP */ +#ifdef CONFIG_CXL_SUSPEND +bool cxl_mem_active(void); +#else +static inline bool cxl_mem_active(void) +{ + return false; +} +#endif + /* * Device power management */ diff --git a/kernel/power/hibernate.c b/kernel/power/hibernate.c index 938d5c78b421..20a66bf9f465 100644 --- a/kernel/power/hibernate.c +++ b/kernel/power/hibernate.c @@ -83,7 +83,7 @@ bool hibernation_available(void) { return nohibernate == 0 && !security_locked_down(LOCKDOWN_HIBERNATION) && - !secretmem_active(); + !secretmem_active() && !cxl_mem_active(); } /** diff --git a/kernel/power/main.c b/kernel/power/main.c index 7e646079fbeb..3e6be1c33e0b 100644 --- a/kernel/power/main.c +++ b/kernel/power/main.c @@ -127,7 +127,9 @@ static ssize_t mem_sleep_show(struct kobject *kobj, struct kobj_attribute *attr, char *s = buf; suspend_state_t i; - for (i = PM_SUSPEND_MIN; i < PM_SUSPEND_MAX; i++) + for (i = PM_SUSPEND_MIN; i < PM_SUSPEND_MAX; i++) { + if (i >= PM_SUSPEND_MEM && cxl_mem_active()) + continue; if (mem_sleep_states[i]) { const char *label = mem_sleep_states[i]; @@ -136,6 +138,7 @@ static ssize_t mem_sleep_show(struct kobject *kobj, struct kobj_attribute *attr, else s += sprintf(s, "%s ", label); } + } /* Convert the last space to a newline if needed. */ if (s != buf) diff --git a/kernel/power/suspend.c b/kernel/power/suspend.c index 6fcdee7e87a5..827075944d28 100644 --- a/kernel/power/suspend.c +++ b/kernel/power/suspend.c @@ -236,7 +236,8 @@ EXPORT_SYMBOL_GPL(suspend_valid_only_mem); static bool sleep_state_supported(suspend_state_t state) { - return state == PM_SUSPEND_TO_IDLE || valid_state(state); + return state == PM_SUSPEND_TO_IDLE || + (valid_state(state) && !cxl_mem_active()); } static int platform_suspend_prepare(suspend_state_t state)
The CXL specification claims S3 support at a hardware level, but at a system software level there are some missing pieces. Section 9.4 (CXL 2.0) rightly claims that "CXL mem adapters may need aux power to retain memory context across S3", but there is no enumeration mechanism for the OS to determine if a given adapter has that support. Moreover the save state and resume image for the system may inadvertantly end up in a CXL device that needs to be restored before the save state is recoverable. I.e. a circular dependency that is not resolvable without a third party save-area. Arrange for the cxl_mem driver to fail S3 attempts. This still nominaly allows for suspend, but requires unbinding all CXL memory devices before the suspend to ensure the typical DRAM flow is taken. The cxl_mem unbind flow is intended to also tear down all CXL memory regions associated with a given cxl_memdev. It is reasonable to assume that any device participating in a System RAM range published in the EFI memory map is covered by aux power and save-area outside the device itself. So this restriction can be minimized in the future once pre-existing region enumeration support arrives, and perhaps a spec update to clarify if the EFI memory map is sufficent for determining the range of devices managed by platform-firmware for S3 support. Per Rafael, if the CXL configuration prevents suspend then it should fail early before tasks are frozen, and mem_sleep should stop showing 'mem' as an option [1]. Effectively CXL augments the platform suspend ->valid() op since, for example, the ACPI ops are not aware of the CXL / PCI dependencies. Given the split role of platform firmware vs OS provisioned CXL memory it is up to the cxl_mem driver to determine if the CXL configuration has elements that platform firmware may not be prepared to restore. Link: https://lore.kernel.org/r/CAJZ5v0hGVN_=3iU8OLpHY3Ak35T5+JcBM-qs8SbojKrpd0VXsA@mail.gmail.com [1] Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Pavel Machek <pavel@ucw.cz> Cc: Len Brown <len.brown@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> --- drivers/Makefile | 2 +- drivers/cxl/Kconfig | 4 ++++ drivers/cxl/Makefile | 2 +- drivers/cxl/core/Makefile | 1 + drivers/cxl/core/memdev.c | 1 - drivers/cxl/core/suspend.c | 23 +++++++++++++++++++++++ drivers/cxl/cxlmem.h | 11 +++++++++++ drivers/cxl/mem.c | 22 +++++++++++++++++++++- include/linux/pm.h | 9 +++++++++ kernel/power/hibernate.c | 2 +- kernel/power/main.c | 5 ++++- kernel/power/suspend.c | 3 ++- 12 files changed, 78 insertions(+), 7 deletions(-) create mode 100644 drivers/cxl/core/suspend.c