diff mbox series

RAS/CEC: Move non-debug attributes out of debugfs

Message ID 20230323202158.37937-1-kyle.meyer@hpe.com (mailing list archive)
State New, archived
Headers show
Series RAS/CEC: Move non-debug attributes out of debugfs | expand

Commit Message

Kyle Meyer March 23, 2023, 8:22 p.m. UTC
From: Kyle Meyer <kyle.meyer@hpe.com>

When kernel lockdown is in effect, use of debugfs is not permitted. Move
decay_interval and action_threshold out of debugfs, from debugfs/ras/cec
to sysfs/system/devices/machinecheck/cec.

Reviewed-by: Dimitri Sivanich <dimitri.sivanich@hpe.com>
Reviewed-by: Steve Wahl <steve.wahl@hpe.com>
Signed-off-by: Kyle Meyer <kyle.meyer@hpe.com>
---
 arch/x86/include/asm/mce.h     |   1 +
 arch/x86/kernel/cpu/mce/core.c |   3 +-
 arch/x86/ras/Kconfig           |   4 +-
 drivers/ras/cec.c              | 141 ++++++++++++++++++++++-----------
 4 files changed, 101 insertions(+), 48 deletions(-)

Comments

Dave Hansen March 23, 2023, 8:29 p.m. UTC | #1
On 3/23/23 13:22, kyle-meyer wrote:
> From: Kyle Meyer <kyle.meyer@hpe.com>
> 
> When kernel lockdown is in effect, use of debugfs is not permitted. Move
> decay_interval and action_threshold out of debugfs, from debugfs/ras/cec
> to sysfs/system/devices/machinecheck/cec.

You forgot to attach the patch that you wrote that updates
Documentation/ABI/. ;)

Also, why *should* these be part of the stable sysfs ABI?  What app is
using them?  Why does it need them?  Why these two and only these two?
What's left in debugfs?
Kyle Meyer March 23, 2023, 9:52 p.m. UTC | #2
> You forgot to attach the patch that you wrote that updates
> Documentation/ABI/. ;)

Thanks, I will include the documentation with a second version of the
patch.

> Also, why *should* these be part of the stable sysfs ABI?  What app is
> using them?  Why does it need them?

We have system scripts that adjust decay_interval and action_threshold.
They can't access those attributes when lockdown is enabled. If there is a
more appropriate place for those attributes, please let me know.

> Why these two and only these two? What's left in debugfs?

The other attributes (pfn and array) are used to test CEC. They are only
created when RAS_CEC_DEBUG is enabled. 

Thanks,
Kyle Meyer
Dave Hansen March 23, 2023, 10:01 p.m. UTC | #3
On 3/23/23 14:52, Meyer, Kyle wrote:
>> Also, why *should* these be part of the stable sysfs ABI?  What app is
>> using them?  Why does it need them?
> We have system scripts that adjust decay_interval and action_threshold.
> They can't access those attributes when lockdown is enabled. If there is a
> more appropriate place for those attributes, please let me know.

Thanks for the info.  That helps a bit.  But, I'd also appreciate if you
could expand on this a little more.  What "system scripts" are these?
Who is using them?  What are they trying to accomplish?

We can try to find the best home for these attributes with that info in
hand, if it's not sysfs.

>> Why these two and only these two? What's left in debugfs?
> The other attributes (pfn and array) are used to test CEC. They are only
> created when RAS_CEC_DEBUG is enabled.

Oh, that's good info too.  Can you please include that in some form in
your new changelog?
Borislav Petkov March 24, 2023, 12:49 a.m. UTC | #4
On Thu, Mar 23, 2023 at 03:22:01PM -0500, kyle-meyer wrote:
> From: Kyle Meyer <kyle.meyer@hpe.com>
> 
> When kernel lockdown is in effect, use of debugfs is not permitted. Move
> decay_interval and action_threshold out of debugfs, from debugfs/ras/cec
> to sysfs/system/devices/machinecheck/cec.

All those knobs are in debugfs because we wanted to discuss the proper
interface design first and only then cast them in stone. I guess that
has not happened yet.

What you're doing is certainly not what we had in mind so just because
some lockdown policy says so, is not good enough.

> diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
> index 2eec60f50057..1a3eaa501ae4 100644
> --- a/arch/x86/kernel/cpu/mce/core.c
> +++ b/arch/x86/kernel/cpu/mce/core.c
> @@ -2376,10 +2376,11 @@ static void mce_enable_ce(void *all)
>  		__mcheck_cpu_init_timer();
>  }
>  
> -static struct bus_type mce_subsys = {
> +struct bus_type mce_subsys = {
>  	.name		= "machinecheck",
>  	.dev_name	= "machinecheck",
>  };
> +EXPORT_SYMBOL_GPL(mce_subsys);

Nope, this is not going to happen.

Besides, that error collector is used on x86 but it is generic enough so
that it can be used by other arches. So if anything, it should not be
exposed in the x86's "machinecheck" hierarchy but somewhere generic.

And until that proper interface has been hammered out, you can just as
well disable it in your lockdown configs.

Thx.
Yazen Ghannam March 24, 2023, 8:50 p.m. UTC | #5
On Fri, Mar 24, 2023 at 01:49:19AM +0100, Borislav Petkov wrote:
...
> 
> Besides, that error collector is used on x86 but it is generic enough so
> that it can be used by other arches. So if anything, it should not be
> exposed in the x86's "machinecheck" hierarchy but somewhere generic.
>

How about "/sys/ras" to collect global RAS interfaces? Maybe it can link to
other directories like "/sys/devices/*" for hardware things like MCA, and
"/sys/kernel/*" for kernel things like CEC.

Thanks,
Yazen
Borislav Petkov March 27, 2023, 10:07 p.m. UTC | #6
On Fri, Mar 24, 2023 at 04:50:31PM -0400, Yazen Ghannam wrote:
> How about "/sys/ras" to collect global RAS interfaces?

Looking a bit at

Documentation/admin-guide/sysfs-rules.rst

it does sound like /sys/<subsystem> could be a good place. Might wanna
run it by Greg, though, first.

> Maybe it can link to other directories like "/sys/devices/*" for
> hardware things like MCA,

Those are at

/sys/devices/system/machinecheck/

> and "/sys/kernel/*" for kernel things like CEC.

Or /sys/drivers

Remember there's also

/sys/devices/system/edac/

so a lot of RAS junk spread all over the place.
diff mbox series

Patch

diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index 9646ed6e8c0b..c5d358499899 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -206,6 +206,7 @@  static inline void enable_copy_mc_fragile(void)
 struct cper_ia_proc_ctx;
 
 #ifdef CONFIG_X86_MCE
+extern struct bus_type mce_subsys;
 int mcheck_init(void);
 void mcheck_cpu_init(struct cpuinfo_x86 *c);
 void mcheck_cpu_clear(struct cpuinfo_x86 *c);
diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 2eec60f50057..1a3eaa501ae4 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -2376,10 +2376,11 @@  static void mce_enable_ce(void *all)
 		__mcheck_cpu_init_timer();
 }
 
-static struct bus_type mce_subsys = {
+struct bus_type mce_subsys = {
 	.name		= "machinecheck",
 	.dev_name	= "machinecheck",
 };
+EXPORT_SYMBOL_GPL(mce_subsys);
 
 DEFINE_PER_CPU(struct device *, mce_device);
 
diff --git a/arch/x86/ras/Kconfig b/arch/x86/ras/Kconfig
index 7488c715427e..5f5f6f9a5f3c 100644
--- a/arch/x86/ras/Kconfig
+++ b/arch/x86/ras/Kconfig
@@ -1,7 +1,7 @@ 
 # SPDX-License-Identifier: GPL-2.0
 config RAS_CEC
 	bool "Correctable Errors Collector"
-	depends on X86_MCE && MEMORY_FAILURE && DEBUG_FS
+	depends on X86_MCE && MEMORY_FAILURE
 	help
 	  This is a small cache which collects correctable memory errors per 4K
 	  page PFN and counts their repeated occurrence. Once the counter for a
@@ -15,7 +15,7 @@  config RAS_CEC
 config RAS_CEC_DEBUG
 	bool "CEC debugging machinery"
 	default n
-	depends on RAS_CEC
+	depends on RAS_CEC && DEBUG_FS
 	help
 	  Add extra files to (debugfs)/ras/cec to test the correctable error
 	  collector feature. "pfn" is a writable file that allows user to
diff --git a/drivers/ras/cec.c b/drivers/ras/cec.c
index 321af498ee11..b45b4e90b8de 100644
--- a/drivers/ras/cec.c
+++ b/drivers/ras/cec.c
@@ -7,6 +7,7 @@ 
 #include <linux/ras.h>
 #include <linux/kernel.h>
 #include <linux/workqueue.h>
+#include <linux/device.h>
 
 #include <asm/mce.h>
 
@@ -394,53 +395,96 @@  static int cec_add_elem(u64 pfn)
 	return ret;
 }
 
-static int u64_get(void *data, u64 *val)
-{
-	*val = *(u64 *)data;
+static struct kobject *cec_kobj;
 
-	return 0;
+static ssize_t decay_interval_show(struct kobject *kobj,
+				   struct kobj_attribute *attr, char *buf)
+{
+	return sysfs_emit(buf, "%llu\n", decay_interval);
 }
 
-static int pfn_set(void *data, u64 val)
+static ssize_t decay_interval_store(struct kobject *kobj,
+				    struct kobj_attribute *attr,
+				    const char *buf, size_t count)
 {
-	*(u64 *)data = val;
+	unsigned long long res;
+	int ret;
 
-	cec_add_elem(val);
+	ret = kstrtoull(buf, 10, &res);
+	if (ret)
+		return ret;
 
-	return 0;
+	if (res < CEC_DECAY_MIN_INTERVAL)
+		return -EINVAL;
+
+	if (res > CEC_DECAY_MAX_INTERVAL)
+		return -EINVAL;
+
+	decay_interval = res;
+
+	cec_mod_work(decay_interval);
+
+	return count;
 }
 
-DEFINE_DEBUGFS_ATTRIBUTE(pfn_ops, u64_get, pfn_set, "0x%llx\n");
+static ssize_t action_threshold_show(struct kobject *kobj,
+				     struct kobj_attribute *attr, char *buf)
+{
+	return sysfs_emit(buf, "%llu\n", action_threshold);
+}
 
-static int decay_interval_set(void *data, u64 val)
+static ssize_t action_threshold_store(struct kobject *kobj,
+				      struct kobj_attribute *attr,
+				      const char *buf, size_t count)
 {
-	if (val < CEC_DECAY_MIN_INTERVAL)
-		return -EINVAL;
+	unsigned long long res;
+	int ret;
 
-	if (val > CEC_DECAY_MAX_INTERVAL)
-		return -EINVAL;
+	ret = kstrtoull(buf, 10, &res);
+	if (ret)
+		return ret;
 
-	*(u64 *)data   = val;
-	decay_interval = val;
+	if (res > COUNT_MASK)
+		res = COUNT_MASK;
 
-	cec_mod_work(decay_interval);
+	action_threshold = res;
+
+	return count;
+}
+
+static struct kobj_attribute decay_interval_attr =
+	__ATTR_RW_MODE(decay_interval, 0600);
+
+static struct kobj_attribute action_threshold_attr =
+	__ATTR_RW_MODE(action_threshold, 0600);
+
+static struct attribute *cec_attrs[] = {
+	&decay_interval_attr.attr,
+	&action_threshold_attr.attr,
+	NULL
+};
+
+static const struct attribute_group cec_attr_group = {
+	.attrs = cec_attrs
+};
+
+static int u64_get(void *data, u64 *val)
+{
+	*val = *(u64 *)data;
 
 	return 0;
 }
-DEFINE_DEBUGFS_ATTRIBUTE(decay_interval_ops, u64_get, decay_interval_set, "%lld\n");
 
-static int action_threshold_set(void *data, u64 val)
+static int pfn_set(void *data, u64 val)
 {
 	*(u64 *)data = val;
 
-	if (val > COUNT_MASK)
-		val = COUNT_MASK;
-
-	action_threshold = val;
+	cec_add_elem(val);
 
 	return 0;
 }
-DEFINE_DEBUGFS_ATTRIBUTE(action_threshold_ops, u64_get, action_threshold_set, "%lld\n");
+
+DEFINE_DEBUGFS_ATTRIBUTE(pfn_ops, u64_get, pfn_set, "0x%llx\n");
 
 static const char * const bins[] = { "00", "01", "10", "11" };
 
@@ -480,7 +524,7 @@  DEFINE_SHOW_ATTRIBUTE(array);
 
 static int __init create_debugfs_nodes(void)
 {
-	struct dentry *d, *pfn, *decay, *count, *array;
+	struct dentry *d, *pfn, *array;
 
 	d = debugfs_create_dir("cec", ras_debugfs_dir);
 	if (!d) {
@@ -488,23 +532,6 @@  static int __init create_debugfs_nodes(void)
 		return -1;
 	}
 
-	decay = debugfs_create_file("decay_interval", S_IRUSR | S_IWUSR, d,
-				    &decay_interval, &decay_interval_ops);
-	if (!decay) {
-		pr_warn("Error creating decay_interval debugfs node!\n");
-		goto err;
-	}
-
-	count = debugfs_create_file("action_threshold", S_IRUSR | S_IWUSR, d,
-				    &action_threshold, &action_threshold_ops);
-	if (!count) {
-		pr_warn("Error creating action_threshold debugfs node!\n");
-		goto err;
-	}
-
-	if (!IS_ENABLED(CONFIG_RAS_CEC_DEBUG))
-		return 0;
-
 	pfn = debugfs_create_file("pfn", S_IRUSR | S_IWUSR, d, &dfs_pfn, &pfn_ops);
 	if (!pfn) {
 		pr_warn("Error creating pfn debugfs node!\n");
@@ -553,6 +580,8 @@  static struct notifier_block cec_nb = {
 
 static int __init cec_init(void)
 {
+	int ret;
+
 	if (ce_arr.disabled)
 		return -ENODEV;
 
@@ -570,9 +599,22 @@  static int __init cec_init(void)
 		return -ENOMEM;
 	}
 
-	if (create_debugfs_nodes()) {
-		free_page((unsigned long)ce_arr.array);
-		return -ENOMEM;
+	cec_kobj = kobject_create_and_add("cec", &mce_subsys.dev_root->kobj);
+	if (!cec_kobj) {
+		pr_err("Error creating CEC kobject!\n");
+		ret = -ENOMEM;
+		goto err_kobject;
+	}
+
+	ret = sysfs_create_group(cec_kobj, &cec_attr_group);
+	if (ret) {
+		pr_err("Error creating CEC attribute group!\n");
+		goto err_group;
+	}
+
+	if (IS_ENABLED(CONFIG_RAS_CEC_DEBUG) && create_debugfs_nodes()) {
+		ret = -ENOMEM;
+		goto err_debug;
 	}
 
 	INIT_DELAYED_WORK(&cec_work, cec_work_fn);
@@ -582,6 +624,15 @@  static int __init cec_init(void)
 
 	pr_info("Correctable Errors collector initialized.\n");
 	return 0;
+
+err_debug:
+	sysfs_remove_group(cec_kobj, &cec_attr_group);
+err_group:
+	kobject_put(cec_kobj);
+err_kobject:
+	free_page((unsigned long)ce_arr.array);
+
+	return ret;
 }
 late_initcall(cec_init);