Message ID | 20250305180225.1226-4-shiju.jose@huawei.com (mailing list archive) |
---|---|
State | New |
Series | ACPI: Add support for ACPI RAS2 feature table |
On Wed, 5 Mar 2025 18:02:24 +0000 <shiju.jose@huawei.com> wrote:

> From: Shiju Jose <shiju.jose@huawei.com>
>
> Memory ACPI RAS2 auxiliary driver binds to the auxiliary device
> add by the ACPI RAS2 table parser.
>
> Driver uses a PCC subspace for communicating with the ACPI compliant
> platform.
>
> Device with ACPI RAS2 scrub feature registers with EDAC device driver,
> which retrieves the scrub descriptor from EDAC scrub and exposes
> the scrub control attributes for RAS2 scrub instance to userspace in
> /sys/bus/edac/devices/acpi_ras_mem0/scrubX/.
>
> Co-developed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Tested-by: Daniel Ferguson <danielf@os.amperecomputing.com>
> Signed-off-by: Shiju Jose <shiju.jose@huawei.com>

> diff --git a/Documentation/edac/scrub.rst b/Documentation/edac/scrub.rst
> index daab929cdba1..fc8dcbd13f91 100644
> --- a/Documentation/edac/scrub.rst
> +++ b/Documentation/edac/scrub.rst
> @@ -264,3 +264,76 @@ Sysfs files are documented in

...

> +1.2.4. Program background scrubbing for RAS2 device to repeat in every 21600 seconds (quarter of a day).

Wrap to 80 chars. I think that is fine for titles in Sphinx.

> +
> +# echo 21600 > /sys/bus/edac/devices/acpi_ras_mem0/scrub0/current_cycle_duration
> +
> +1.2.5. Start 'background scrubbing'.
> +
> +# echo 1 > /sys/bus/edac/devices/acpi_ras_mem0/scrub0/enable_background

> diff --git a/drivers/ras/acpi_ras2.c b/drivers/ras/acpi_ras2.c
> new file mode 100644
> index 000000000000..2f9317aa7b81
> --- /dev/null
> +++ b/drivers/ras/acpi_ras2.c
> @@ -0,0 +1,391 @@

> +struct acpi_ras2_ps_shared_mem {
> +	struct acpi_ras2_shmem common;
> +	struct acpi_ras2_patrol_scrub_param params;
> +};

...

> +static int ras2_update_patrol_scrub_params_cache(struct ras2_mem_ctx *ras2_ctx)
> +{
> +	struct acpi_ras2_ps_shared_mem __iomem *ps_sm =
> +		(void *)ras2_ctx->comm_addr;

Would a container_of() be better here, given the type cast is doing that
with the assumption of it being the first element of ps_shared_mem?
Same in other places, so maybe a macro.

> +	int ret;
> +
> +	ps_sm->common.set_caps[0] = RAS2_SUPPORT_HW_PARTOL_SCRUB;
> +	ps_sm->params.cmd = RAS2_GET_PATROL_PARAMETERS;

...

> +
> +static int ras2_hw_scrub_set_enabled_bg(struct device *dev, void *drv_data, bool enable)
> +{
> +	struct ras2_mem_ctx *ras2_ctx = drv_data;
> +	struct acpi_ras2_ps_shared_mem __iomem *ps_sm =
> +		(void *)ras2_ctx->comm_addr;

As above, maybe container_of() is appropriate, as we have a definition of
what we are casting to that has the thing we are casting from as its
first element.

> +	bool running;
> +	int ret;
> +

...

> +
> +static int ras2_probe(struct auxiliary_device *auxdev,
> +		      const struct auxiliary_device_id *id)
> +{
> +	struct ras2_mem_ctx *ras2_ctx = container_of(auxdev, struct ras2_mem_ctx, adev);
> +	struct edac_dev_feature ras_features[RAS2_DEV_NUM_RAS_FEATURES];

Given we only have 1 RAS2 feature, I'd be tempted to leave making this
flexible for some future series that adds a second one. So maybe just
have a single feature rather than an array of 1.

> +	char scrub_name[RAS2_SCRUB_NAME_LEN];
> +	int num_ras_features = 0;

With the change below this isn't needed.

> +	int ret;
> +
> +	if (!ras2_is_patrol_scrub_support(ras2_ctx))
> +		return -EOPNOTSUPP;
> +
> +	ret = ras2_update_patrol_scrub_params_cache(ras2_ctx);
> +	if (ret)
> +		return ret;
> +
> +	snprintf(scrub_name, sizeof(scrub_name), "acpi_ras_mem%d",
> +		 ras2_ctx->id);
> +
> +	ras_features[num_ras_features].ft_type = RAS_FEAT_SCRUB;
> +	ras_features[num_ras_features].instance = ras2_ctx->instance;
> +	ras_features[num_ras_features].scrub_ops = &ras2_scrub_ops;
> +	ras_features[num_ras_features].ctx = ras2_ctx;
> +	num_ras_features++;

As above, can also just assume this is 1 because it always is.

> +
> +	return edac_dev_register(&auxdev->dev, scrub_name, NULL,
> +				 num_ras_features, ras_features);

Here, pass in &ras_feature after making it not be an array.

> +}
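The container_of() suggestion in the review above can be tried outside the kernel. What follows is a minimal userspace sketch under stated assumptions, not the driver's code: the structure layouts are simplified stand-ins for the ACPI definitions, `container_of` is re-implemented locally with `offsetof`, and the helper name `to_ps_shared_mem()` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-in for the kernel's container_of(); userspace illustration only. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Simplified stand-ins for the ACPI structures; the field lists are illustrative. */
struct shmem_common {
	uint64_t set_caps[2];
};

struct patrol_scrub_param {
	uint16_t cmd;
};

struct ps_shared_mem {
	struct shmem_common common;
	struct patrol_scrub_param params;
};

/*
 * Hypothetical helper: recover the enclosing structure from a pointer to its
 * 'common' member, instead of a bare (void *) cast that silently depends on
 * 'common' being the first member.
 */
static inline struct ps_shared_mem *to_ps_shared_mem(struct shmem_common *common)
{
	assert(common != NULL);
	return container_of(common, struct ps_shared_mem, common);
}
```

With a helper like this the member offset is computed by the compiler, so the conversion would keep working if `common` were ever moved away from the start of the structure. Whether a macro or an inline function fits the driver better is left open here.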
> +static int ras2_hw_scrub_read_size(struct device *dev, void *drv_data, u64 *size)
> +{
> +	struct ras2_mem_ctx *ras2_ctx = drv_data;
> +	int ret;
> +
> +	if (ras2_ctx->bg_scrub)
> +		return -EBUSY;
> +
> +	ret = ras2_update_patrol_scrub_params_cache(ras2_ctx);
> +	if (ret)
> +		return ret;
> +
> +	*size = ras2_ctx->size;
> +
> +	return 0;
> +}

Calling ras2_update_patrol_scrub_params_cache() here is problematic.

Imagine:
	echo 0x1000 > size
	cat size
	echo 0x2000000000 > addr

What happens here? The scrub range is not what you expect it to be.
Once you cat size, you reset the size from what you initially set it to.
I don't think that is what anyone will expect. It certainly caused us to
stumble while testing.

Regards,
~Daniel
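The sequence described above can be reproduced with a small userspace model. This is a sketch under assumptions, not the driver: `struct fake_fw`, `struct fake_ctx` and the `model_*` functions are hypothetical names, and the model assumes that idle firmware reports an Actual Address Range of zero.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the firmware side: the actual address range it reports. */
struct fake_fw {
	uint64_t actl_base;
	uint64_t actl_size;
};

/* Hypothetical model of the cached values kept in the driver context. */
struct fake_ctx {
	struct fake_fw *fw;
	uint64_t base;
	uint64_t size;
	bool bg_scrub;
};

/* Mirrors the cache refresh in the patch: the cached range is overwritten. */
static void model_update_cache(struct fake_ctx *ctx)
{
	assert(ctx && ctx->fw);
	if (!ctx->bg_scrub) {
		ctx->base = ctx->fw->actl_base;
		ctx->size = ctx->fw->actl_size;
	}
}

/* "echo <size> > size": only the cache changes, nothing reaches firmware. */
static void model_write_size(struct fake_ctx *ctx, uint64_t size)
{
	assert(ctx);
	ctx->size = size;
}

/* "cat size" as posted: refreshes the cache first, losing a pending size. */
static uint64_t model_read_size(struct fake_ctx *ctx)
{
	model_update_cache(ctx);
	return ctx->size;
}

/* "echo <addr> > addr": starts a scrub using whatever size is cached. */
static int model_write_addr(struct fake_ctx *ctx, uint64_t base)
{
	assert(ctx && ctx->fw);
	if (!base || !ctx->size)
		return -ERANGE;
	ctx->fw->actl_base = base;
	ctx->fw->actl_size = ctx->size;
	return 0;
}
```

In this model, writing 0x1000 to size, reading size back, and then writing an address fails with -ERANGE, because the read replaced the pending size with the firmware's zero. If the firmware instead reported a stale non-zero range, the scrub would start with that stale size.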
>-----Original Message-----
>From: Daniel Ferguson <danielf@os.amperecomputing.com>
>Sent: 07 March 2025 21:52
>To: Shiju Jose <shiju.jose@huawei.com>; linux-edac@vger.kernel.org;
>linux-acpi@vger.kernel.org; bp@alien8.de; tony.luck@intel.com;
>rafael@kernel.org; lenb@kernel.org; mchehab@kernel.org;
>leo.duran@amd.com; Yazen.Ghannam@amd.com
>Cc: linux-cxl@vger.kernel.org; dan.j.williams@intel.com; dave@stgolabs.net;
>Jonathan Cameron <jonathan.cameron@huawei.com>; dave.jiang@intel.com;
>alison.schofield@intel.com; vishal.l.verma@intel.com; ira.weiny@intel.com;
>david@redhat.com; Vilas.Sridharan@amd.com; linux-mm@kvack.org;
>linux-kernel@vger.kernel.org; rientjes@google.com; jiaqiyan@google.com;
>Jon.Grimm@amd.com; dave.hansen@linux.intel.com;
>naoya.horiguchi@nec.com; james.morse@arm.com; jthoughton@google.com;
>somasundaram.a@hpe.com; erdemaktas@google.com; pgonda@google.com;
>duenwen@google.com; gthelen@google.com;
>wschwartz@amperecomputing.com; dferguson@amperecomputing.com;
>wbs@os.amperecomputing.com; nifan.cxl@gmail.com; tanxiaofei
><tanxiaofei@huawei.com>; Zengtao (B) <prime.zeng@hisilicon.com>; Roberto
>Sassu <roberto.sassu@huawei.com>; kangkang.shen@futurewei.com;
>wanghuiqiang <wanghuiqiang@huawei.com>; Linuxarm
><linuxarm@huawei.com>
>Subject: Re: [PATCH v2 3/3] ras: mem: Add memory ACPI RAS2 driver
>
>> +static int ras2_hw_scrub_read_size(struct device *dev, void *drv_data, u64 *size)
>> +{
>> +	struct ras2_mem_ctx *ras2_ctx = drv_data;
>> +	int ret;
>> +
>> +	if (ras2_ctx->bg_scrub)
>> +		return -EBUSY;
>> +
>> +	ret = ras2_update_patrol_scrub_params_cache(ras2_ctx);
>> +	if (ret)
>> +		return ret;
>> +
>> +	*size = ras2_ctx->size;
>> +
>> +	return 0;
>> +}
>
>Calling ras2_update_patrol_scrub_params_cache here is problematic.
>
>Imagine:
>	echo 0x1000 > size
>	cat size
>	echo 0x2000000000 > addr
>
>What happens here? What happens is the scrub range is not what you expect it
>to be. Once you cat size, you reset the size from what you initially set it to.
>I don't think that is what anyone will expect. It certainly caused us to stumble
>while testing.

This is expected behavior. The extra call was added here when the
on-demand scrub operation was changed to start on a write to the 'addr'
attribute, instead of through the previous separate 'enable_on_demand'
attribute, according to Borislav's suggestion in v13.

Please see the following comment in the ras2_hw_scrub_read_addr() function:
"Userspace will get the status of the demand scrubbing through the address
range read from the firmware. When the demand scrubbing is finished
firmware must reset actual address range to 0. Otherwise userspace assumes
demand scrubbing is in progress."

Here the sysfs attributes 'addr' and 'size' read the field Actual Address
Range of Table 5.87: Parameter Block Structure for PATROL_SCRUB, which is
written by the firmware.

In my opinion, reading back the address range size set in sysfs before
actually writing the address range to the firmware and starting the
on-demand scrub operation doesn't hold much significance?

>
>Regards,
>~Daniel

Thanks,
Shiju
>-----Original Message-----
>From: Shiju Jose
>Sent: 10 March 2025 11:12
>Subject: RE: [PATCH v2 3/3] ras: mem: Add memory ACPI RAS2 driver
>
>>-----Original Message-----
>>From: Daniel Ferguson <danielf@os.amperecomputing.com>
>>Sent: 07 March 2025 21:52
>>Subject: Re: [PATCH v2 3/3] ras: mem: Add memory ACPI RAS2 driver
>>
>>> +static int ras2_hw_scrub_read_size(struct device *dev, void *drv_data, u64 *size)
>>> +{
>>> +	struct ras2_mem_ctx *ras2_ctx = drv_data;
>>> +	int ret;
>>> +
>>> +	if (ras2_ctx->bg_scrub)
>>> +		return -EBUSY;
>>> +
>>> +	ret = ras2_update_patrol_scrub_params_cache(ras2_ctx);
>>> +	if (ret)
>>> +		return ret;
>>> +
>>> +	*size = ras2_ctx->size;
>>> +
>>> +	return 0;
>>> +}
>>
>>Calling ras2_update_patrol_scrub_params_cache here is problematic.
>>
>>Imagine:
>>	echo 0x1000 > size
>>	cat size
>>	echo 0x2000000000 > addr
>>
>>What happens here? What happens is the scrub range is not what you
>>expect it to be. Once you cat size, you reset the size from what you
>>initially set it to.
>>I don't think that is what anyone will expect. It certainly caused us
>>to stumble while testing.
>
>This is an expected behavior and this extra call was added here when changed
>using attribute 'addr' to start the on-demand scrub operation instead of
>previous separate attribute 'enable_on_demand' to start the on-demand scrub
>operation, according to Borislav's suggestion in v13.
>
>Please see the following comment in the ras2_hw_scrub_read_addr() function,
>"Userspace will get the status of the demand scrubbing through the address
>range read from the firmware. When the demand scrubbing is finished
>firmware must reset actual address range to 0. Otherwise userspace assumes
>demand scrubbing is in progress."
>
>Here sysfs attributes 'addr' and 'size' is reading the field: Actual Address Range
>of Table 5.87: Parameter Block Structure for PATROL_SCRUB, written by the
>firmware.
>
>In my opinion, reading back the address range size set in the sysfs before
>actually writing the address range to the firmware and starting the on-demand
>scrub operation doesn't hold much significance?

After further discussion, I will add a fix for this case so that reading
'size' returns the value the user set in sysfs until the scrubbing is
started.

Thanks,
Shiju
>>>> +static int ras2_hw_scrub_read_size(struct device *dev, void *drv_data, u64 *size)
>>>> +{
>>>> +	struct ras2_mem_ctx *ras2_ctx = drv_data;
>>>> +	int ret;
>>>> +
>>>> +	if (ras2_ctx->bg_scrub)
>>>> +		return -EBUSY;
>>>> +
>>>> +	ret = ras2_update_patrol_scrub_params_cache(ras2_ctx);
>>>> +	if (ret)
>>>> +		return ret;
>>>> +
>>>> +	*size = ras2_ctx->size;
>>>> +
>>>> +	return 0;
>>>> +}
>>>
>>> Calling ras2_update_patrol_scrub_params_cache here is problematic.
>>>
>>> Imagine:
>>>	echo 0x1000 > size
>>>	cat size
>>>	echo 0x2000000000 > addr
>>>
>>> What happens here? What happens is the scrub range is not what you
>>> expect it to be. Once you cat size, you reset the size from what you
>>> initially set it to.
>>> I don't think that is what anyone will expect. It certainly caused us
>>> to stumble while testing.
>>
>> This is an expected behavior and this extra call was added here when changed
>> using attribute 'addr' to start the on-demand scrub operation instead of
>> previous separate attribute 'enable_on_demand' to start the on-demand scrub
>> operation, according to Borislav's suggestion in v13.
>>
>> Please see the following comment in the ras2_hw_scrub_read_addr() function,
>> "Userspace will get the status of the demand scrubbing through the address
>> range read from the firmware. When the demand scrubbing is finished
>> firmware must reset actual address range to 0. Otherwise userspace assumes
>> demand scrubbing is in progress."

Why not just use Bit[0] in the Flags register of the Parameter Block
Structure for PATROL_SCRUB? Having firmware reset the actual address
range seems like extra complexity for something we already have a
facility for.

>>
>> Here sysfs attributes 'addr' and 'size' is reading the field: Actual Address Range
>> of Table 5.87: Parameter Block Structure for PATROL_SCRUB, written by the
>> firmware.
>>
>> In my opinion, reading back the address range size set in the sysfs before
>> actually writing the address range to the firmware and starting the on-demand
>> scrub operation doesn't hold much significance?
>
> After further discussion, I will add a fix for this case to return the 'size' which
> the user set in the sysfs until the scrubbing is started.

I think fixing this will make the interface less confusing, but I also
agree that it doesn't hold much significance technically.

Regards,
Daniel

>
> Thanks,
> Shiju
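The two ideas that close this exchange, returning the user-set 'size' until a scrub has been started and reading progress from Bit[0] of the Flags field, can be combined in a short userspace model. This is only one possible shape and the follow-up patch may look different: `struct fake_fw`, `struct fake_ctx`, the `req_size` and `od_started` fields, and the `sketch_*` functions are all hypothetical, and the fake firmware sets its running flag immediately on start.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Mirrors RAS2_PS_FLAG_SCRUB_RUNNING (BIT(0)) from the patch. */
#define PS_FLAG_SCRUB_RUNNING (1u << 0)

/* Hypothetical stand-in for the firmware-owned parameter block. */
struct fake_fw {
	uint32_t flags;
	uint64_t actl_base;
	uint64_t actl_size;
};

/* Hypothetical driver context; 'req_size' and 'od_started' are not in the patch. */
struct fake_ctx {
	struct fake_fw *fw;
	uint64_t req_size;	/* size written via sysfs, not yet sent to firmware */
	bool od_started;	/* true once a write to 'addr' has started a scrub */
};

/* "echo <size> > size": remember the request only. */
static int sketch_write_size(struct fake_ctx *ctx, uint64_t size)
{
	assert(ctx);
	if (!size)
		return -EINVAL;
	ctx->req_size = size;
	return 0;
}

/* "cat size": report the pending request until a scrub has been started. */
static uint64_t sketch_read_size(const struct fake_ctx *ctx)
{
	assert(ctx && ctx->fw);
	return ctx->od_started ? ctx->fw->actl_size : ctx->req_size;
}

/* "echo <addr> > addr": hand the range to the fake firmware and start. */
static int sketch_write_addr(struct fake_ctx *ctx, uint64_t base)
{
	assert(ctx && ctx->fw);
	if (!base || !ctx->req_size)
		return -ERANGE;
	ctx->fw->actl_base = base;
	ctx->fw->actl_size = ctx->req_size;
	ctx->fw->flags |= PS_FLAG_SCRUB_RUNNING;
	ctx->od_started = true;
	return 0;
}

/* Progress taken from the running flag rather than from a zeroed range. */
static bool sketch_scrub_running(const struct fake_ctx *ctx)
{
	assert(ctx && ctx->fw);
	return (ctx->fw->flags & PS_FLAG_SCRUB_RUNNING) != 0;
}
```

In this model a read of size after a write of size returns the written value, and completion is detected when the firmware clears the running flag, so the actual address range does not have to be reset to zero to signal that the scrub has finished.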
diff --git a/Documentation/edac/scrub.rst b/Documentation/edac/scrub.rst
index daab929cdba1..fc8dcbd13f91 100644
--- a/Documentation/edac/scrub.rst
+++ b/Documentation/edac/scrub.rst
@@ -264,3 +264,76 @@ Sysfs files are documented in
 `Documentation/ABI/testing/sysfs-edac-scrub`
 
 `Documentation/ABI/testing/sysfs-edac-ecs`
+
+Examples
+--------
+
+The usage takes the form shown in these examples:
+
+1. ACPI RAS2
+
+1.1 On demand scrubbing for a specific memory region.
+
+1.1.1. Query what is device default/current scrub cycle setting.
+
+       Applicable to both on-demand and background scrubbing.
+
+# cat /sys/bus/edac/devices/acpi_ras_mem0/scrub0/current_cycle_duration
+
+36000
+
+1.1.2 Query the range of device supported scrub cycle for a memory region.
+
+# cat /sys/bus/edac/devices/acpi_ras_mem0/scrub0/min_cycle_duration
+
+3600
+
+# cat /sys/bus/edac/devices/acpi_ras_mem0/scrub0/max_cycle_duration
+
+86400
+
+1.1.3. Program scrubbing for the memory region in RAS2 device to repeat every 43200 seconds (half a day).
+
+# echo 43200 > /sys/bus/edac/devices/acpi_ras_mem0/scrub0/current_cycle_duration
+
+1.1.4. Program address and size of the memory region to scrub
+
+Readback 'addr', non-zero - demand scrub is in progress, zero - scrub is finished.
+
+# cat /sys/bus/edac/devices/acpi_ras_mem0/scrub0/addr
+
+0
+
+Write 'size' of the memory region to scrub.
+
+# echo 0x300000 > /sys/bus/edac/devices/acpi_ras_mem0/scrub0/size
+
+Write 'addr' starts demand scrubbing, please make sure other attributes are set prior to that.
+
+# echo 0x200000 > /sys/bus/edac/devices/acpi_ras_mem0/scrub0/addr
+
+Readback 'addr', non-zero - demand scrub is in progress, zero - scrub is finished.
+
+# cat /sys/bus/edac/devices/acpi_ras_mem0/scrub0/addr
+
+0x200000
+
+# cat /sys/bus/edac/devices/acpi_ras_mem0/scrub0/addr
+
+0
+
+1.2 Background scrubbing the entire memory
+
+1.2.3 Query the status of background scrubbing.
+
+# cat /sys/bus/edac/devices/acpi_ras_mem0/scrub0/enable_background
+
+0
+
+1.2.4. Program background scrubbing for RAS2 device to repeat in every 21600 seconds (quarter of a day).
+
+# echo 21600 > /sys/bus/edac/devices/acpi_ras_mem0/scrub0/current_cycle_duration
+
+1.2.5. Start 'background scrubbing'.
+
+# echo 1 > /sys/bus/edac/devices/acpi_ras_mem0/scrub0/enable_background
diff --git a/drivers/ras/Kconfig b/drivers/ras/Kconfig
index fc4f4bb94a4c..a88002f1f462 100644
--- a/drivers/ras/Kconfig
+++ b/drivers/ras/Kconfig
@@ -46,4 +46,15 @@ config RAS_FMPM
 	  Memory will be retired during boot time and run time depending on
 	  platform-specific policies.
 
+config MEM_ACPI_RAS2
+	tristate "Memory ACPI RAS2 driver"
+	depends on ACPI_RAS2
+	depends on EDAC
+	depends on EDAC_SCRUB
+	help
+	  The driver binds to the platform device added by the ACPI RAS2
+	  table parser. Use a PCC channel subspace for communicating with
+	  the ACPI compliant platform to provide control of memory scrub
+	  parameters to the user via the EDAC scrub.
+
 endif
diff --git a/drivers/ras/Makefile b/drivers/ras/Makefile
index 11f95d59d397..a0e6e903d6b0 100644
--- a/drivers/ras/Makefile
+++ b/drivers/ras/Makefile
@@ -2,6 +2,7 @@
 obj-$(CONFIG_RAS)			+= ras.o
 obj-$(CONFIG_DEBUG_FS)			+= debugfs.o
 obj-$(CONFIG_RAS_CEC)			+= cec.o
+obj-$(CONFIG_MEM_ACPI_RAS2)		+= acpi_ras2.o
 
 obj-$(CONFIG_RAS_FMPM)			+= amd/fmpm.o
 obj-y					+= amd/atl/
diff --git a/drivers/ras/acpi_ras2.c b/drivers/ras/acpi_ras2.c
new file mode 100644
index 000000000000..2f9317aa7b81
--- /dev/null
+++ b/drivers/ras/acpi_ras2.c
@@ -0,0 +1,391 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * ACPI RAS2 memory driver
+ *
+ * Copyright (c) 2024-2025 HiSilicon Limited.
+ *
+ */
+
+#define pr_fmt(fmt)	"ACPI RAS2 MEMORY: " fmt
+
+#include <linux/bitfield.h>
+#include <linux/edac.h>
+#include <linux/platform_device.h>
+#include <acpi/ras2.h>
+
+#define RAS2_DEV_NUM_RAS_FEATURES	1
+
+#define RAS2_SUPPORT_HW_PARTOL_SCRUB	BIT(0)
+#define RAS2_TYPE_PATROL_SCRUB		0x0000
+
+#define RAS2_GET_PATROL_PARAMETERS	0x01
+#define RAS2_START_PATROL_SCRUBBER	0x02
+#define RAS2_STOP_PATROL_SCRUBBER	0x03
+
+/*
+ * RAS2 patrol scrub
+ */
+#define RAS2_PS_SC_HRS_IN_MASK		GENMASK(15, 8)
+#define RAS2_PS_EN_BACKGROUND		BIT(0)
+#define RAS2_PS_SC_HRS_OUT_MASK		GENMASK(7, 0)
+#define RAS2_PS_MIN_SC_HRS_OUT_MASK	GENMASK(15, 8)
+#define RAS2_PS_MAX_SC_HRS_OUT_MASK	GENMASK(23, 16)
+#define RAS2_PS_FLAG_SCRUB_RUNNING	BIT(0)
+
+#define RAS2_SCRUB_NAME_LEN		128
+#define RAS2_HOUR_IN_SECS		3600
+
+struct acpi_ras2_ps_shared_mem {
+	struct acpi_ras2_shmem common;
+	struct acpi_ras2_patrol_scrub_param params;
+};
+
+static int ras2_is_patrol_scrub_support(struct ras2_mem_ctx *ras2_ctx)
+{
+	struct acpi_ras2_shmem __iomem *common = (void *)ras2_ctx->comm_addr;
+
+	guard(mutex)(&ras2_ctx->lock);
+	common->set_caps[0] = 0;
+
+	return common->features[0] & RAS2_SUPPORT_HW_PARTOL_SCRUB;
+}
+
+static int ras2_update_patrol_scrub_params_cache(struct ras2_mem_ctx *ras2_ctx)
+{
+	struct acpi_ras2_ps_shared_mem __iomem *ps_sm =
+		(void *)ras2_ctx->comm_addr;
+	int ret;
+
+	ps_sm->common.set_caps[0] = RAS2_SUPPORT_HW_PARTOL_SCRUB;
+	ps_sm->params.cmd = RAS2_GET_PATROL_PARAMETERS;
+
+	ret = ras2_send_pcc_cmd(ras2_ctx, RAS2_PCC_CMD_EXEC);
+	if (ret) {
+		dev_err(ras2_ctx->dev, "failed to read parameters\n");
+		return ret;
+	}
+
+	ras2_ctx->min_scrub_cycle = FIELD_GET(RAS2_PS_MIN_SC_HRS_OUT_MASK,
+					      ps_sm->params.scrub_params_out);
+	ras2_ctx->max_scrub_cycle = FIELD_GET(RAS2_PS_MAX_SC_HRS_OUT_MASK,
+					      ps_sm->params.scrub_params_out);
+	if (!ras2_ctx->bg_scrub) {
+		ras2_ctx->base = ps_sm->params.actl_addr_range[0];
+		ras2_ctx->size = ps_sm->params.actl_addr_range[1];
+	}
+
+	ras2_ctx->scrub_cycle_hrs = FIELD_GET(RAS2_PS_SC_HRS_OUT_MASK,
+					      ps_sm->params.scrub_params_out);
+
+	return 0;
+}
+
+/* Context - lock must be held */
+static int ras2_get_patrol_scrub_running(struct ras2_mem_ctx *ras2_ctx,
+					 bool *running)
+{
+	struct acpi_ras2_ps_shared_mem __iomem *ps_sm =
+		(void *)ras2_ctx->comm_addr;
+	int ret;
+
+	ps_sm->common.set_caps[0] = RAS2_SUPPORT_HW_PARTOL_SCRUB;
+	ps_sm->params.cmd = RAS2_GET_PATROL_PARAMETERS;
+
+	ret = ras2_send_pcc_cmd(ras2_ctx, RAS2_PCC_CMD_EXEC);
+	if (ret) {
+		dev_err(ras2_ctx->dev, "failed to read parameters\n");
+		return ret;
+	}
+
+	*running = ps_sm->params.flags & RAS2_PS_FLAG_SCRUB_RUNNING;
+
+	return 0;
+}
+
+static int ras2_hw_scrub_read_min_scrub_cycle(struct device *dev, void *drv_data,
+					      u32 *min)
+{
+	struct ras2_mem_ctx *ras2_ctx = drv_data;
+
+	*min = ras2_ctx->min_scrub_cycle * RAS2_HOUR_IN_SECS;
+
+	return 0;
+}
+
+static int ras2_hw_scrub_read_max_scrub_cycle(struct device *dev, void *drv_data,
+					      u32 *max)
+{
+	struct ras2_mem_ctx *ras2_ctx = drv_data;
+
+	*max = ras2_ctx->max_scrub_cycle * RAS2_HOUR_IN_SECS;
+
+	return 0;
+}
+
+static int ras2_hw_scrub_cycle_read(struct device *dev, void *drv_data,
+				    u32 *scrub_cycle_secs)
+{
+	struct ras2_mem_ctx *ras2_ctx = drv_data;
+
+	*scrub_cycle_secs = ras2_ctx->scrub_cycle_hrs * RAS2_HOUR_IN_SECS;
+
+	return 0;
+}
+
+static int ras2_hw_scrub_cycle_write(struct device *dev, void *drv_data,
+				     u32 scrub_cycle_secs)
+{
+	u8 scrub_cycle_hrs = scrub_cycle_secs / RAS2_HOUR_IN_SECS;
+	struct ras2_mem_ctx *ras2_ctx = drv_data;
+	bool running;
+	int ret;
+
+	guard(mutex)(&ras2_ctx->lock);
+	ret = ras2_get_patrol_scrub_running(ras2_ctx, &running);
+	if (ret)
+		return ret;
+
+	if (running)
+		return -EBUSY;
+
+	if (scrub_cycle_hrs < ras2_ctx->min_scrub_cycle ||
+	    scrub_cycle_hrs > ras2_ctx->max_scrub_cycle)
+		return -EINVAL;
+
+	ras2_ctx->scrub_cycle_hrs = scrub_cycle_hrs;
+
+	return 0;
+}
+
+static int ras2_hw_scrub_read_addr(struct device *dev, void *drv_data, u64 *base)
+{
+	struct ras2_mem_ctx *ras2_ctx = drv_data;
+	int ret;
+
+	/*
+	 * When BG scrubbing is enabled the actual address range is not valid.
+	 * Return -EBUSY now unless find out a method to retrieve actual full PA range.
+	 */
+	if (ras2_ctx->bg_scrub)
+		return -EBUSY;
+
+	/*
+	 * When demand scrubbing is finished firmware must reset actual
+	 * address range to 0. Otherwise userspace assumes demand scrubbing
+	 * is in progress.
+	 */
+	ret = ras2_update_patrol_scrub_params_cache(ras2_ctx);
+	if (ret)
+		return ret;
+
+	*base = ras2_ctx->base;
+
+	return 0;
+}
+
+static int ras2_hw_scrub_read_size(struct device *dev, void *drv_data, u64 *size)
+{
+	struct ras2_mem_ctx *ras2_ctx = drv_data;
+	int ret;
+
+	if (ras2_ctx->bg_scrub)
+		return -EBUSY;
+
+	ret = ras2_update_patrol_scrub_params_cache(ras2_ctx);
+	if (ret)
+		return ret;
+
+	*size = ras2_ctx->size;
+
+	return 0;
+}
+
+static int ras2_hw_scrub_write_addr(struct device *dev, void *drv_data, u64 base)
+{
+	struct ras2_mem_ctx *ras2_ctx = drv_data;
+	struct acpi_ras2_ps_shared_mem __iomem *ps_sm =
+		(void *)ras2_ctx->comm_addr;
+	bool running;
+	int ret;
+
+	guard(mutex)(&ras2_ctx->lock);
+	ps_sm->common.set_caps[0] = RAS2_SUPPORT_HW_PARTOL_SCRUB;
+	if (ras2_ctx->bg_scrub)
+		return -EBUSY;
+
+	if (!base || !ras2_ctx->size) {
+		dev_warn(ras2_ctx->dev,
+			 "%s: Invalid address range, base=0x%llx "
+			 "size=0x%llx\n", __func__,
+			 base, ras2_ctx->size);
+		return -ERANGE;
+	}
+
+	ret = ras2_get_patrol_scrub_running(ras2_ctx, &running);
+	if (ret)
+		return ret;
+
+	if (running)
+		return -EBUSY;
+
+	ps_sm->params.scrub_params_in &= ~RAS2_PS_SC_HRS_IN_MASK;
+	ps_sm->params.scrub_params_in |= FIELD_PREP(RAS2_PS_SC_HRS_IN_MASK,
+						    ras2_ctx->scrub_cycle_hrs);
+	ps_sm->params.req_addr_range[0] = base;
+	ps_sm->params.req_addr_range[1] = ras2_ctx->size;
+	ps_sm->params.scrub_params_in &= ~RAS2_PS_EN_BACKGROUND;
+	ps_sm->params.cmd = RAS2_START_PATROL_SCRUBBER;
+
+	ret = ras2_send_pcc_cmd(ras2_ctx, RAS2_PCC_CMD_EXEC);
+	if (ret) {
+		dev_err(ras2_ctx->dev, "Failed to start demand scrubbing\n");
+		return ret;
+	}
+
+	return ras2_update_patrol_scrub_params_cache(ras2_ctx);
+}
+
+static int ras2_hw_scrub_write_size(struct device *dev, void *drv_data, u64 size)
+{
+	struct ras2_mem_ctx *ras2_ctx = drv_data;
+	bool running;
+	int ret;
+
+	guard(mutex)(&ras2_ctx->lock);
+	ret = ras2_get_patrol_scrub_running(ras2_ctx, &running);
+	if (ret)
+		return ret;
+
+	if (running)
+		return -EBUSY;
+
+	if (!size) {
+		dev_warn(dev, "%s: Invalid address range size=0x%llx\n",
+			 __func__, size);
+		return -EINVAL;
+	}
+
+	ras2_ctx->size = size;
+
+	return 0;
+}
+
+static int ras2_hw_scrub_set_enabled_bg(struct device *dev, void *drv_data, bool enable)
+{
+	struct ras2_mem_ctx *ras2_ctx = drv_data;
+	struct acpi_ras2_ps_shared_mem __iomem *ps_sm =
+		(void *)ras2_ctx->comm_addr;
+	bool running;
+	int ret;
+
+	guard(mutex)(&ras2_ctx->lock);
+	ps_sm->common.set_caps[0] = RAS2_SUPPORT_HW_PARTOL_SCRUB;
+	ret = ras2_get_patrol_scrub_running(ras2_ctx, &running);
+	if (ret)
+		return ret;
+
+	if (enable) {
+		if (ras2_ctx->bg_scrub || running)
+			return -EBUSY;
+		ps_sm->params.req_addr_range[0] = 0;
+		ps_sm->params.req_addr_range[1] = 0;
+		ps_sm->params.scrub_params_in &= ~RAS2_PS_SC_HRS_IN_MASK;
+		ps_sm->params.scrub_params_in |= FIELD_PREP(RAS2_PS_SC_HRS_IN_MASK,
+							    ras2_ctx->scrub_cycle_hrs);
+		ps_sm->params.cmd = RAS2_START_PATROL_SCRUBBER;
+	} else {
+		if (!ras2_ctx->bg_scrub)
+			return -EPERM;
+		ps_sm->params.cmd = RAS2_STOP_PATROL_SCRUBBER;
+	}
+
+	ps_sm->params.scrub_params_in &= ~RAS2_PS_EN_BACKGROUND;
+	ps_sm->params.scrub_params_in |= FIELD_PREP(RAS2_PS_EN_BACKGROUND,
+						    enable);
+	ret = ras2_send_pcc_cmd(ras2_ctx, RAS2_PCC_CMD_EXEC);
+	if (ret) {
+		dev_err(ras2_ctx->dev, "Failed to %s background scrubbing\n",
+			str_enable_disable(enable));
+		return ret;
+	}
+
+	if (enable) {
+		ras2_ctx->bg_scrub = true;
+		/* Update the cache to account for rounding of supplied parameters and similar */
+		ret = ras2_update_patrol_scrub_params_cache(ras2_ctx);
+	} else {
+		ret = ras2_update_patrol_scrub_params_cache(ras2_ctx);
+		ras2_ctx->bg_scrub = false;
+	}
+
+	return ret;
+}
+
+static int ras2_hw_scrub_get_enabled_bg(struct device *dev, void *drv_data, bool *enabled)
+{
+	struct ras2_mem_ctx *ras2_ctx = drv_data;
+
+	*enabled = ras2_ctx->bg_scrub;
+
+	return 0;
+}
+
+static const struct edac_scrub_ops ras2_scrub_ops = {
+	.read_addr = ras2_hw_scrub_read_addr,
+	.read_size = ras2_hw_scrub_read_size,
+	.write_addr = ras2_hw_scrub_write_addr,
+	.write_size = ras2_hw_scrub_write_size,
+	.get_enabled_bg = ras2_hw_scrub_get_enabled_bg,
+	.set_enabled_bg = ras2_hw_scrub_set_enabled_bg,
+	.get_min_cycle = ras2_hw_scrub_read_min_scrub_cycle,
+	.get_max_cycle = ras2_hw_scrub_read_max_scrub_cycle,
+	.get_cycle_duration = ras2_hw_scrub_cycle_read,
+	.set_cycle_duration = ras2_hw_scrub_cycle_write,
+};
+
+static int ras2_probe(struct auxiliary_device *auxdev,
+		      const struct auxiliary_device_id *id)
+{
+	struct ras2_mem_ctx *ras2_ctx = container_of(auxdev, struct ras2_mem_ctx, adev);
+	struct edac_dev_feature ras_features[RAS2_DEV_NUM_RAS_FEATURES];
+	char scrub_name[RAS2_SCRUB_NAME_LEN];
+	int num_ras_features = 0;
+	int ret;
+
+	if (!ras2_is_patrol_scrub_support(ras2_ctx))
+		return -EOPNOTSUPP;
+
+	ret = ras2_update_patrol_scrub_params_cache(ras2_ctx);
+	if (ret)
+		return ret;
+
+	snprintf(scrub_name, sizeof(scrub_name), "acpi_ras_mem%d",
+		 ras2_ctx->id);
+
+	ras_features[num_ras_features].ft_type = RAS_FEAT_SCRUB;
+	ras_features[num_ras_features].instance = ras2_ctx->instance;
+	ras_features[num_ras_features].scrub_ops = &ras2_scrub_ops;
+	ras_features[num_ras_features].ctx = ras2_ctx;
+	num_ras_features++;
+
+	return edac_dev_register(&auxdev->dev, scrub_name, NULL,
+				 num_ras_features, ras_features);
+}
+
+static const struct auxiliary_device_id ras2_mem_dev_id_table[] = {
+	{ .name = RAS2_AUX_DEV_NAME "." RAS2_MEM_DEV_ID_NAME, },
+	{ }
+};
+
+MODULE_DEVICE_TABLE(auxiliary, ras2_mem_dev_id_table);
+
+static struct auxiliary_driver ras2_mem_driver = {
+	.name = RAS2_MEM_DEV_ID_NAME,
+	.probe = ras2_probe,
+	.id_table = ras2_mem_dev_id_table,
+};
+module_auxiliary_driver(ras2_mem_driver);
+
+MODULE_IMPORT_NS("ACPI_RAS2");
+MODULE_DESCRIPTION("ACPI RAS2 memory driver");
+MODULE_LICENSE("GPL");
diff --git a/include/acpi/ras2.h b/include/acpi/ras2.h
index 5b27c1f30096..c9a6b63745dc 100644
--- a/include/acpi/ras2.h
+++ b/include/acpi/ras2.h
@@ -31,7 +31,13 @@ struct ras2_mem_ctx {
 	struct device *dev;
 	struct acpi_ras2_shmem __iomem *comm_addr;
 	void *pcc_subspace;
+	u64 base, size;
 	int id;
+	u8 instance;
+	u8 scrub_cycle_hrs;
+	u8 min_scrub_cycle;
+	u8 max_scrub_cycle;
+	bool bg_scrub;
 };
 
 #ifdef CONFIG_ACPI_RAS2