From patchwork Fri Oct 25 17:13:44 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Shiju Jose X-Patchwork-Id: 13851171 Received: from frasgout.his.huawei.com (frasgout.his.huawei.com [185.176.79.56]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id F1D782036E4; Fri, 25 Oct 2024 17:14:32 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=185.176.79.56 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1729876476; cv=none; b=po0KUCH9q/VrTcOVuZvOtm57NF4zgaHDfMpnHLolKkPJSM8Kd2vc94slp+prh8FsJnD5d8taq2uCpBZIuidYhhi+EeiMvqOjCCtZ6kAjmt9DtFJnBG8KKQTLA7659qR36fYXwB6uFDRENH3V6+NABdCxbN7xNFrH/hXUNvNF7nA= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1729876476; c=relaxed/simple; bh=UgOuqCaacK7H6FX4OQA2hXdQboRpa0JQZ4LqJn1zeSs=; h=From:To:CC:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=oCP5O6cbeQ9XsGwn3AU6QmfXxjeZ2eO+32wzgdNikbtJh+AzUBdjbw+baKvBE8ooz8aWZgVG0zUa7LX7+8FeTArGXtGBEdm37/JQGJFNw/K+RcWB8jDx7GRvJq9WBOjPHTGMME68fJS8EGwu3eZNFQ543kMtYAMR1+wuFUZlcD0= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=huawei.com; spf=pass smtp.mailfrom=huawei.com; arc=none smtp.client-ip=185.176.79.56 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=huawei.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=huawei.com Received: from mail.maildlp.com (unknown [172.18.186.231]) by frasgout.his.huawei.com (SkyGuard) with ESMTP id 4XZq8h4MNQz6K6wH; Sat, 26 Oct 2024 01:12:20 +0800 (CST) Received: from frapeml500007.china.huawei.com (unknown [7.182.85.172]) by mail.maildlp.com (Postfix) with ESMTPS id 935DF140B38; Sat, 26 Oct 2024 01:14:30 +0800 (CST) Received: from P_UKIT01-A7bmah.china.huawei.com (10.48.151.104) by frapeml500007.china.huawei.com (7.182.85.172) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.1.2507.39; Fri, 25 Oct 2024 19:14:28 +0200 From: To: , , , , CC: , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , Subject: [PATCH v14 03/14] EDAC: Add ECS control feature Date: Fri, 25 Oct 2024 18:13:44 +0100 Message-ID: <20241025171356.1377-4-shiju.jose@huawei.com> X-Mailer: git-send-email 2.43.0.windows.1 In-Reply-To: <20241025171356.1377-1-shiju.jose@huawei.com> References: <20241025171356.1377-1-shiju.jose@huawei.com> Precedence: bulk X-Mailing-List: linux-cxl@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-ClientProxiedBy: lhrpeml500002.china.huawei.com (7.191.160.78) To frapeml500007.china.huawei.com (7.182.85.172) From: Shiju Jose Add EDAC ECS (Error Check Scrub) control to manage a memory device's ECS feature. The Error Check Scrub (ECS) is a feature defined in JEDEC DDR5 SDRAM Specification (JESD79-5) and allows the DRAM to internally read, correct single-bit errors, and write back corrected data bits to the DRAM array while providing transparency to error counts. The DDR5 device contains number of memory media FRUs per device. The DDR5 ECS feature and thus the ECS control driver supports configuring the ECS parameters per FRU. Memory devices support the ECS feature register with the EDAC device driver, which retrieves the ECS descriptor from the EDAC ECS driver. This driver exposes sysfs ECS control attributes to userspace via /sys/bus/edac/devices//ecs_fruX/. The common sysfs ECS control interface abstracts the control of an arbitrary ECS functionality to a common set of functions. Support for the ECS feature is added separately because the control attributes of the DDR5 ECS feature differ from those of the scrub feature. The sysfs ECS attribute nodes are only present if the client driver has implemented the corresponding attribute callback function and passed the necessary operations to the EDAC RAS feature driver during registration. Co-developed-by: Jonathan Cameron Signed-off-by: Jonathan Cameron Signed-off-by: Shiju Jose --- Documentation/ABI/testing/sysfs-edac-ecs | 76 +++++++ drivers/edac/Makefile | 2 +- drivers/edac/ecs.c | 240 +++++++++++++++++++++++ drivers/edac/edac_device.c | 15 ++ include/linux/edac.h | 51 ++++- 5 files changed, 381 insertions(+), 3 deletions(-) create mode 100644 Documentation/ABI/testing/sysfs-edac-ecs create mode 100755 drivers/edac/ecs.c diff --git a/Documentation/ABI/testing/sysfs-edac-ecs b/Documentation/ABI/testing/sysfs-edac-ecs new file mode 100644 index 000000000000..b94cc09f9222 --- /dev/null +++ b/Documentation/ABI/testing/sysfs-edac-ecs @@ -0,0 +1,76 @@ +What: /sys/bus/edac/devices//ecs_fruX +Date: Jan 2025 +KernelVersion: 6.13 +Contact: linux-edac@vger.kernel.org +Description: + The sysfs EDAC bus devices //ecs_fruX subdirectory + belongs to the memory media ECS (Error Check Scrub) control + feature, where directory corresponds to a device + registered with the EDAC device driver for the ECS feature. + /ecs_fruX belongs to the media FRUs (Field Replaceable Unit) + under the memory device. + The sysfs ECS attr nodes are only present if the client + driver has implemented the corresponding attr callback + function and passed in ops to the EDAC RAS feature driver + during registration. + +What: /sys/bus/edac/devices//ecs_fruX/log_entry_type +Date: Jan 2025 +KernelVersion: 6.13 +Contact: linux-edac@vger.kernel.org +Description: + (RW) The log entry type of how the DDR5 ECS log is reported. + 00b - per DRAM. + 01b - per memory media FRU. + +What: /sys/bus/edac/devices//ecs_fruX/log_entry_type_per_dram +Date: Jan 2025 +KernelVersion: 6.13 +Contact: linux-edac@vger.kernel.org +Description: + (RO) True if current log entry type is per DRAM. + +What: /sys/bus/edac/devices//ecs_fruX/log_entry_type_per_memory_media +Date: Jan 2025 +KernelVersion: 6.13 +Contact: linux-edac@vger.kernel.org +Description: + (RO) True if current log entry type is per memory media FRU. + +What: /sys/bus/edac/devices//ecs_fruX/mode +Date: Jan 2025 +KernelVersion: 6.13 +Contact: linux-edac@vger.kernel.org +Description: + (RW) The mode of how the DDR5 ECS counts the errors. + 0 - ECS counts rows with errors. + 1 - ECS counts codewords with errors. + +What: /sys/bus/edac/devices//ecs_fruX/mode_counts_rows +Date: Jan 2025 +KernelVersion: 6.13 +Contact: linux-edac@vger.kernel.org +Description: + (RO) True if current mode is ECS counts rows with errors. + +What: /sys/bus/edac/devices//ecs_fruX/mode_counts_codewords +Date: Jan 2025 +KernelVersion: 6.13 +Contact: linux-edac@vger.kernel.org +Description: + (RO) True if current mode is ECS counts codewords with errors. + +What: /sys/bus/edac/devices//ecs_fruX/reset +Date: Jan 2025 +KernelVersion: 6.13 +Contact: linux-edac@vger.kernel.org +Description: + (WO) ECS reset ECC counter. + 1 - reset ECC counter to the default value. + +What: /sys/bus/edac/devices//ecs_fruX/threshold +Date: Jan 2025 +KernelVersion: 6.13 +Contact: linux-edac@vger.kernel.org +Description: + (RW) ECS threshold count per gigabits of memory cells. diff --git a/drivers/edac/Makefile b/drivers/edac/Makefile index 188501e676c7..b24c2c112d9c 100644 --- a/drivers/edac/Makefile +++ b/drivers/edac/Makefile @@ -10,7 +10,7 @@ obj-$(CONFIG_EDAC) := edac_core.o edac_core-y := edac_mc.o edac_device.o edac_mc_sysfs.o edac_core-y += edac_module.o edac_device_sysfs.o wq.o -edac_core-y += scrub.o +edac_core-y += scrub.o ecs.o edac_core-$(CONFIG_EDAC_DEBUG) += debugfs.o diff --git a/drivers/edac/ecs.c b/drivers/edac/ecs.c new file mode 100755 index 000000000000..a2b64d7bf6b6 --- /dev/null +++ b/drivers/edac/ecs.c @@ -0,0 +1,240 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Generic ECS driver in order to support control the on die + * error check scrub (e.g. DDR5 ECS). The common sysfs ECS + * interface abstracts the control of an arbitrary ECS + * functionality to a common set of functions. + * + * Copyright (c) 2024 HiSilicon Limited. + */ + +#define pr_fmt(fmt) "EDAC ECS: " fmt + +#include + +#define EDAC_ECS_FRU_NAME "ecs_fru" + +enum edac_ecs_attributes { + ECS_LOG_ENTRY_TYPE, + ECS_LOG_ENTRY_TYPE_PER_DRAM, + ECS_LOG_ENTRY_TYPE_PER_MEMORY_MEDIA, + ECS_MODE, + ECS_MODE_COUNTS_ROWS, + ECS_MODE_COUNTS_CODEWORDS, + ECS_RESET, + ECS_THRESHOLD, + ECS_MAX_ATTRS +}; + +struct edac_ecs_dev_attr { + struct device_attribute dev_attr; + int fru_id; +}; + +struct edac_ecs_fru_context { + char name[EDAC_FEAT_NAME_LEN]; + struct edac_ecs_dev_attr ecs_dev_attr[ECS_MAX_ATTRS]; + struct attribute *ecs_attrs[ECS_MAX_ATTRS + 1]; + struct attribute_group group; +}; + +struct edac_ecs_context { + u16 num_media_frus; + struct edac_ecs_fru_context *fru_ctxs; +}; + +#define TO_ECS_DEV_ATTR(_dev_attr) \ + container_of(_dev_attr, struct edac_ecs_dev_attr, dev_attr) + +#define EDAC_ECS_ATTR_SHOW(attrib, cb, type, format) \ +static ssize_t attrib##_show(struct device *ras_feat_dev, \ + struct device_attribute *attr, char *buf) \ +{ \ + struct edac_ecs_dev_attr *ecs_dev_attr = TO_ECS_DEV_ATTR(attr); \ + struct edac_dev_feat_ctx *ctx = dev_get_drvdata(ras_feat_dev); \ + const struct edac_ecs_ops *ops = ctx->ecs.ecs_ops; \ + type data; \ + int ret; \ + \ + ret = ops->cb(ras_feat_dev->parent, ctx->ecs.private, \ + ecs_dev_attr->fru_id, &data); \ + if (ret) \ + return ret; \ + \ + return sysfs_emit(buf, format, data); \ +} + +EDAC_ECS_ATTR_SHOW(log_entry_type, get_log_entry_type, u32, "%u\n") +EDAC_ECS_ATTR_SHOW(log_entry_type_per_dram, get_log_entry_type_per_dram, u32, "%u\n") +EDAC_ECS_ATTR_SHOW(log_entry_type_per_memory_media, get_log_entry_type_per_memory_media, + u32, "%u\n") +EDAC_ECS_ATTR_SHOW(mode, get_mode, u32, "%u\n") +EDAC_ECS_ATTR_SHOW(mode_counts_rows, get_mode_counts_rows, u32, "%u\n") +EDAC_ECS_ATTR_SHOW(mode_counts_codewords, get_mode_counts_codewords, u32, "%u\n") +EDAC_ECS_ATTR_SHOW(threshold, get_threshold, u32, "%u\n") + +#define EDAC_ECS_ATTR_STORE(attrib, cb, type, conv_func) \ +static ssize_t attrib##_store(struct device *ras_feat_dev, \ + struct device_attribute *attr, \ + const char *buf, size_t len) \ +{ \ + struct edac_ecs_dev_attr *ecs_dev_attr = TO_ECS_DEV_ATTR(attr); \ + struct edac_dev_feat_ctx *ctx = dev_get_drvdata(ras_feat_dev); \ + const struct edac_ecs_ops *ops = ctx->ecs.ecs_ops; \ + type data; \ + int ret; \ + \ + ret = conv_func(buf, 0, &data); \ + if (ret < 0) \ + return ret; \ + \ + ret = ops->cb(ras_feat_dev->parent, ctx->ecs.private, \ + ecs_dev_attr->fru_id, data); \ + if (ret) \ + return ret; \ + \ + return len; \ +} + +EDAC_ECS_ATTR_STORE(log_entry_type, set_log_entry_type, unsigned long, kstrtoul) +EDAC_ECS_ATTR_STORE(mode, set_mode, unsigned long, kstrtoul) +EDAC_ECS_ATTR_STORE(reset, reset, unsigned long, kstrtoul) +EDAC_ECS_ATTR_STORE(threshold, set_threshold, unsigned long, kstrtoul) + +static umode_t ecs_attr_visible(struct kobject *kobj, struct attribute *a, int attr_id) +{ + struct device *ras_feat_dev = kobj_to_dev(kobj); + struct edac_dev_feat_ctx *ctx = dev_get_drvdata(ras_feat_dev); + const struct edac_ecs_ops *ops = ctx->ecs.ecs_ops; + + switch (attr_id) { + case ECS_LOG_ENTRY_TYPE: + if (ops->get_log_entry_type) { + if (ops->set_log_entry_type) + return a->mode; + else + return 0444; + } + break; + case ECS_LOG_ENTRY_TYPE_PER_DRAM: + if (ops->get_log_entry_type_per_dram) + return a->mode; + break; + case ECS_LOG_ENTRY_TYPE_PER_MEMORY_MEDIA: + if (ops->get_log_entry_type_per_memory_media) + return a->mode; + break; + case ECS_MODE: + if (ops->get_mode) { + if (ops->set_mode) + return a->mode; + else + return 0444; + } + break; + case ECS_MODE_COUNTS_ROWS: + if (ops->get_mode_counts_rows) + return a->mode; + break; + case ECS_MODE_COUNTS_CODEWORDS: + if (ops->get_mode_counts_codewords) + return a->mode; + break; + case ECS_RESET: + if (ops->reset) + return a->mode; + break; + case ECS_THRESHOLD: + if (ops->get_threshold) { + if (ops->set_threshold) + return a->mode; + else + return 0444; + } + break; + default: + break; + } + + return 0; +} + +#define EDAC_ECS_ATTR_RO(_name, _fru_id) \ + ((struct edac_ecs_dev_attr) { .dev_attr = __ATTR_RO(_name), \ + .fru_id = _fru_id }) + +#define EDAC_ECS_ATTR_WO(_name, _fru_id) \ + ((struct edac_ecs_dev_attr) { .dev_attr = __ATTR_WO(_name), \ + .fru_id = _fru_id }) + +#define EDAC_ECS_ATTR_RW(_name, _fru_id) \ + ((struct edac_ecs_dev_attr) { .dev_attr = __ATTR_RW(_name), \ + .fru_id = _fru_id }) + +static int ecs_create_desc(struct device *ecs_dev, + const struct attribute_group **attr_groups, u16 num_media_frus) +{ + struct edac_ecs_context *ecs_ctx; + u32 fru; + + ecs_ctx = devm_kzalloc(ecs_dev, sizeof(*ecs_ctx), GFP_KERNEL); + if (!ecs_ctx) + return -ENOMEM; + + ecs_ctx->num_media_frus = num_media_frus; + ecs_ctx->fru_ctxs = devm_kcalloc(ecs_dev, num_media_frus, + sizeof(*ecs_ctx->fru_ctxs), + GFP_KERNEL); + if (!ecs_ctx->fru_ctxs) + return -ENOMEM; + + for (fru = 0; fru < num_media_frus; fru++) { + struct edac_ecs_fru_context *fru_ctx = &ecs_ctx->fru_ctxs[fru]; + struct attribute_group *group = &fru_ctx->group; + int i; + + fru_ctx->ecs_dev_attr[ECS_LOG_ENTRY_TYPE] = EDAC_ECS_ATTR_RW(log_entry_type, fru); + fru_ctx->ecs_dev_attr[ECS_LOG_ENTRY_TYPE_PER_DRAM] = + EDAC_ECS_ATTR_RO(log_entry_type_per_dram, fru); + fru_ctx->ecs_dev_attr[ECS_LOG_ENTRY_TYPE_PER_MEMORY_MEDIA] = + EDAC_ECS_ATTR_RO(log_entry_type_per_memory_media, fru); + fru_ctx->ecs_dev_attr[ECS_MODE] = EDAC_ECS_ATTR_RW(mode, fru); + fru_ctx->ecs_dev_attr[ECS_MODE_COUNTS_ROWS] = + EDAC_ECS_ATTR_RO(mode_counts_rows, fru); + fru_ctx->ecs_dev_attr[ECS_MODE_COUNTS_CODEWORDS] = + EDAC_ECS_ATTR_RO(mode_counts_codewords, fru); + fru_ctx->ecs_dev_attr[ECS_RESET] = EDAC_ECS_ATTR_WO(reset, fru); + fru_ctx->ecs_dev_attr[ECS_THRESHOLD] = EDAC_ECS_ATTR_RW(threshold, fru); + for (i = 0; i < ECS_MAX_ATTRS; i++) + fru_ctx->ecs_attrs[i] = &fru_ctx->ecs_dev_attr[i].dev_attr.attr; + + sprintf(fru_ctx->name, "%s%d", EDAC_ECS_FRU_NAME, fru); + group->name = fru_ctx->name; + group->attrs = fru_ctx->ecs_attrs; + group->is_visible = ecs_attr_visible; + + attr_groups[fru] = group; + } + + return 0; +} + +/** + * edac_ecs_get_desc - get EDAC ECS descriptors + * @ecs_dev: client device, supports ECS feature + * @attr_groups: pointer to attribute group container + * @num_media_frus: number of media FRUs in the device + * + * Return: + * * %0 - Success. + * * %-EINVAL - Invalid parameters passed. + * * %-ENOMEM - Dynamic memory allocation failed. + */ +int edac_ecs_get_desc(struct device *ecs_dev, + const struct attribute_group **attr_groups, u16 num_media_frus) +{ + if (!ecs_dev || !attr_groups || !num_media_frus) + return -EINVAL; + + return ecs_create_desc(ecs_dev, attr_groups, num_media_frus); +} diff --git a/drivers/edac/edac_device.c b/drivers/edac/edac_device.c index 91552271b34a..5fc3ec7f25eb 100644 --- a/drivers/edac/edac_device.c +++ b/drivers/edac/edac_device.c @@ -626,6 +626,9 @@ int edac_dev_register(struct device *parent, char *name, attr_gcnt++; scrub_cnt++; break; + case RAS_FEAT_ECS: + attr_gcnt += ras_features[feat].ecs_info.num_media_frus; + break; default: return -EINVAL; } @@ -667,6 +670,18 @@ int edac_dev_register(struct device *parent, char *name, scrub_inst++; attr_gcnt++; break; + case RAS_FEAT_ECS: + if (!ras_features->ecs_ops) + goto data_mem_free; + dev_data = &ctx->ecs; + dev_data->ecs_ops = ras_features->ecs_ops; + dev_data->private = ras_features->ctx; + ret = edac_ecs_get_desc(parent, &ras_attr_groups[attr_gcnt], + ras_features->ecs_info.num_media_frus); + if (ret) + goto data_mem_free; + attr_gcnt += ras_features->ecs_info.num_media_frus; + break; default: ret = -EINVAL; goto data_mem_free; diff --git a/include/linux/edac.h b/include/linux/edac.h index 3620a09c0476..077d3c252e99 100644 --- a/include/linux/edac.h +++ b/include/linux/edac.h @@ -669,6 +669,7 @@ static inline struct dimm_info *edac_get_dimm(struct mem_ctl_info *mci, /* RAS feature type */ enum edac_dev_feat { RAS_FEAT_SCRUB, + RAS_FEAT_ECS, RAS_FEAT_MAX }; @@ -702,9 +703,50 @@ int edac_scrub_get_desc(struct device *scrub_dev, const struct attribute_group **attr_groups, u8 instance); +/** + * struct edac_ecs_ops - ECS device operations (all elements optional) + * @get_log_entry_type: read the log entry type value. + * @set_log_entry_type: set the log entry type value. + * @get_log_entry_type_per_dram: read the log entry type per dram value. + * @get_log_entry_type_memory_media: read the log entry type per memory media value. + * @get_mode: read the mode value. + * @set_mode: set the mode value. + * @get_mode_counts_rows: read the mode counts rows value. + * @get_mode_counts_codewords: read the mode counts codewords value. + * @reset: reset the ECS counter. + * @get_threshold: read the threshold count per gigabits of memory cells. + * @set_threshold: set the threshold count per gigabits of memory cells. + */ +struct edac_ecs_ops { + int (*get_log_entry_type)(struct device *dev, void *drv_data, int fru_id, u32 *val); + int (*set_log_entry_type)(struct device *dev, void *drv_data, int fru_id, u32 val); + int (*get_log_entry_type_per_dram)(struct device *dev, void *drv_data, + int fru_id, u32 *val); + int (*get_log_entry_type_per_memory_media)(struct device *dev, void *drv_data, + int fru_id, u32 *val); + int (*get_mode)(struct device *dev, void *drv_data, int fru_id, u32 *val); + int (*set_mode)(struct device *dev, void *drv_data, int fru_id, u32 val); + int (*get_mode_counts_rows)(struct device *dev, void *drv_data, int fru_id, u32 *val); + int (*get_mode_counts_codewords)(struct device *dev, void *drv_data, int fru_id, u32 *val); + int (*reset)(struct device *dev, void *drv_data, int fru_id, u32 val); + int (*get_threshold)(struct device *dev, void *drv_data, int fru_id, u32 *threshold); + int (*set_threshold)(struct device *dev, void *drv_data, int fru_id, u32 threshold); +}; + +struct edac_ecs_ex_info { + u16 num_media_frus; +}; + +int edac_ecs_get_desc(struct device *ecs_dev, + const struct attribute_group **attr_groups, + u16 num_media_frus); + /* EDAC device feature information structure */ struct edac_dev_data { - const struct edac_scrub_ops *scrub_ops; + union { + const struct edac_scrub_ops *scrub_ops; + const struct edac_ecs_ops *ecs_ops; + }; u8 instance; void *private; }; @@ -713,13 +755,18 @@ struct edac_dev_feat_ctx { struct device dev; void *private; struct edac_dev_data *scrub; + struct edac_dev_data ecs; }; struct edac_dev_feature { enum edac_dev_feat ft_type; u8 instance; - const struct edac_scrub_ops *scrub_ops; + union { + const struct edac_scrub_ops *scrub_ops; + const struct edac_ecs_ops *ecs_ops; + }; void *ctx; + struct edac_ecs_ex_info ecs_info; }; int edac_dev_register(struct device *parent, char *dev_name,