diff mbox series

[ndctl,v2,2/2] cxl: add inject-poison & clear-poison commands to cxl tool

Message ID 20230517032311.19923-3-junhyeok.im@samsung.com
State New, archived
Headers show
Series Support for inject and clear poison | expand

Commit Message

Junhyeok Im May 17, 2023, 3:23 a.m. UTC
Add new commands to cli tool, to inject and clear poison into dpa(-a) on the
memory device(memdev), and add man page documentation for the commands
(inject-poison, clear-poison).

Since the validity verification of the dpa would be done in
'cxl_validate_poison_dpa' of CXL driver, no additional logic is added
in this patch.

Also since it is expected no use case of injecting / clearing poison into the
same address for multiple devices, this command targets only one memdev,
like 'write-labels' command.

 usage: cxl {inject,clear}-poison <memdev> -a <dpa> [<options>]

    -v, --verbose         turn on debug
    -S, --serial          use serial numbers to id memdevs
    -a, --address <dpa>   DPA to inject or clear poison

Link to corresponding kernel patch:
- inject poison: https://lore.kernel.org/linux-cxl/241c64115e6bd2effed9c7a20b08b3908dd7be8f.1681874357.git.alison.schofield@intel.com/
- clear poison: https://lore.kernel.org/linux-cxl/8682c30ec24bd9c45af5feccb04b02be51e58c0a.1681874357.git.alison.schofield@intel.com/

Signed-off-by: Junhyeok Im <junhyeok.im@samsung.com>
---
 Documentation/cxl/cxl-clear-poison.txt  | 41 ++++++++++++
 Documentation/cxl/cxl-inject-poison.txt | 42 ++++++++++++
 Documentation/cxl/meson.build           |  2 +
 cxl/builtin.h                           |  2 +
 cxl/cxl.c                               |  2 +
 cxl/memdev.c                            | 85 ++++++++++++++++++++++++-
 6 files changed, 171 insertions(+), 3 deletions(-)
 create mode 100644 Documentation/cxl/cxl-clear-poison.txt
 create mode 100644 Documentation/cxl/cxl-inject-poison.txt

Comments

Alison Schofield May 19, 2023, 2:34 a.m. UTC | #1
On Wed, May 17, 2023 at 12:23:11PM +0900, Junhyeok Im wrote:
> Add new commands to cli tool, to inject and clear poison into dpa(-a) on the
> memory device(memdev), and add man page documentation for the commands
> (inject-poison, clear-poison).
> 
> Since the validity verification of the dpa would be done in
> 'cxl_validate_poison_dpa' of CXL driver, no additional logic is added
> in this patch.

Hi Junhyeok,

Thanks for syncing this up with the driver changes.

How about supporting the end user with checks on that DPA cmdline parameter?

As you stated, the driver validate the DPA. It does emit meaningful messages
in a debug kernel:
dev_dbg(cxlds->dev, "device has no dpa resource\n");
dev_dbg(cxlds->dev, "dpa:0x%llx not in resource:%pR\n",
dev_dbg(cxlds->dev, "dpa:0x%llx is not 64-byte aligned\n"
The driver returns the same, -EINVAL response, to all those errors.

In addition to being more user friendly, stopping the bad input,
as soon as it can be detected, is just good practice.

a bit more below...

> 
> Also since it is expected no use case of injecting / clearing poison into the
> same address for multiple devices, this command targets only one memdev,
> like 'write-labels' command.
> 
>  usage: cxl {inject,clear}-poison <memdev> -a <dpa> [<options>]
> 
>     -v, --verbose         turn on debug
>     -S, --serial          use serial numbers to id memdevs
>     -a, --address <dpa>   DPA to inject or clear poison

What do you think about adding a length option and supporting the inject
and clear of a range of addresses?

That's all for now,
Alison

--snip
Junhyeok Im May 23, 2023, 7:57 a.m. UTC | #2
On Thu, May 18, 2023 at 07:34:13PM -0700, Alison Schofield wrote:
> On Wed, May 17, 2023 at 12:23:11PM +0900, Junhyeok Im wrote:
> > Add new commands to cli tool, to inject and clear poison into dpa(-a) on the
> > memory device(memdev), and add man page documentation for the commands
> > (inject-poison, clear-poison).
> > 
> > Since the validity verification of the dpa would be done in
> > 'cxl_validate_poison_dpa' of CXL driver, no additional logic is added
> > in this patch.
> 
> Hi Junhyeok,
> 
> Thanks for syncing this up with the driver changes.
> 
> How about supporting the end user with checks on that DPA cmdline parameter?
> 
> As you stated, the driver validate the DPA. It does emit meaningful messages
> in a debug kernel:
> dev_dbg(cxlds->dev, "device has no dpa resource\n");
> dev_dbg(cxlds->dev, "dpa:0x%llx not in resource:%pR\n",
> dev_dbg(cxlds->dev, "dpa:0x%llx is not 64-byte aligned\n"
> The driver returns the same, -EINVAL response, to all those errors.
> 
> In addition to being more user friendly, stopping the bad input,
> as soon as it can be detected, is just good practice.
> 

Thanks for the comments. 
I will include your feedback in the next (v3) patch, probably contain a 
'validate_poison_dpa' static function in memdev.c

> a bit more below...
> 
> > 
> > Also since it is expected no use case of injecting / clearing poison into the
> > same address for multiple devices, this command targets only one memdev,
> > like 'write-labels' command.
> > 
> >  usage: cxl {inject,clear}-poison <memdev> -a <dpa> [<options>]
> > 
> >     -v, --verbose         turn on debug
> >     -S, --serial          use serial numbers to id memdevs
> >     -a, --address <dpa>   DPA to inject or clear poison
> 
> What do you think about adding a length option and supporting the inject
> and clear of a range of addresses?
> 
> That's all for now,
> Alison
> 

I think it is a necessary and usable option.
Regarding the specific action with 'length' option, I would like to sync
up with what you have in your mind.
Let's see the following examples, while '-l' means 'length' in bytes from
the dpa specified by '-a'.
  1) $ cxl inject-poison mem0 -a 0x1000 
  2) $ cxl inject-poison mem0 -a 0x1000 -l 0
  3) $ cxl inject-poison mem0 -a 0x1000 -l 0x40
  4) $ cxl inject-poison mem0 -a 0x1000 -l 0x80
In my humble opinion, 
  for the case 1), poison injected to 0x1000
  for the case 2), invalid length
  for the case 3), poison injected to 0x1000, like 1)
  for the case 4), poison injected to 0x1000 and 0x1040.
May I ask what you think about above?

Really thanks.
- Junhyeok Im.
> --snip
diff mbox series

Patch

diff --git a/Documentation/cxl/cxl-clear-poison.txt b/Documentation/cxl/cxl-clear-poison.txt
new file mode 100644
index 0000000..eabad00
--- /dev/null
+++ b/Documentation/cxl/cxl-clear-poison.txt
@@ -0,0 +1,41 @@ 
+// SPDX-License-Identifier: GPL-2.0
+
+cxl-clear-poison(1)
+====================
+
+NAME
+----
+cxl-clear-poison - send clear poison command to specified CXL memdev
+                   targeting given DPA.
+
+SYNOPSIS
+--------
+[verse]
+'cxl clear-poison <memdev> -a <dpa> [<options>]'
+
+DESCRIPTION
+-----------
+Clear poison from the <dpa> of the <memdev>, and remove the address from the
+device's Poison List. Note that it is not an error to clear poison from an
+address that does not have poison set.
+
+OPTIONS
+-------
+<memory device>::
+        A 'memX' device name, or a memdev id number. Restrict the operation to
+        the specified memdev.
+
+-a::
+--address=::
+        Physical address of a CXL memdev to inject poison into.
+
+-S::
+--serial=::
+        Specify CXL memory device serial number(s) to filter the listing
+
+include::verbose-option.txt[]
+
+
+SEE ALSO
+--------
+CXL-3.0 8.2.9.8.4.3
diff --git a/Documentation/cxl/cxl-inject-poison.txt b/Documentation/cxl/cxl-inject-poison.txt
new file mode 100644
index 0000000..5a0fac2
--- /dev/null
+++ b/Documentation/cxl/cxl-inject-poison.txt
@@ -0,0 +1,42 @@ 
+// SPDX-License-Identifier: GPL-2.0
+
+cxl-inject-poison(1)
+====================
+
+NAME
+----
+cxl-inject-poison - send inject poison command to specified CXL memdev
+                    targeting given DPA.
+
+SYNOPSIS
+--------
+[verse]
+'cxl inject-poison <memdev> -a <dpa> [<options>]'
+
+DESCRIPTION
+-----------
+Add the <dpa> to the <memdev> poison list and the error source shall be set
+to an injected error. In addition, the device shall add an appropriate poison
+creation event to its internal Informational Event Log, update the Event Status
+register, and if configured, interrupt the host.
+
+OPTIONS
+-------
+<memory device>::
+        A 'memX' device name, or a memdev id number. Restrict the operation to
+        the specified memdev.
+
+-a::
+--address=::
+        Physical address of a CXL memdev to inject poison into.
+
+-S::
+--serial=::
+        Specify CXL memory device serial number(s) to filter the listing
+
+include::verbose-option.txt[]
+
+
+SEE ALSO
+--------
+CXL-3.0 8.2.9.8.4.2
diff --git a/Documentation/cxl/meson.build b/Documentation/cxl/meson.build
index a6d77ab..c0067ab 100644
--- a/Documentation/cxl/meson.build
+++ b/Documentation/cxl/meson.build
@@ -46,6 +46,8 @@  cxl_manpages = [
   'cxl-enable-region.txt',
   'cxl-destroy-region.txt',
   'cxl-monitor.txt',
+  'cxl-inject-poison.txt',
+  'cxl-clear-poison.txt',
 ]
 
 foreach man : cxl_manpages
diff --git a/cxl/builtin.h b/cxl/builtin.h
index 9baa43b..60b081c 100644
--- a/cxl/builtin.h
+++ b/cxl/builtin.h
@@ -22,6 +22,8 @@  int cmd_create_region(int argc, const char **argv, struct cxl_ctx *ctx);
 int cmd_enable_region(int argc, const char **argv, struct cxl_ctx *ctx);
 int cmd_disable_region(int argc, const char **argv, struct cxl_ctx *ctx);
 int cmd_destroy_region(int argc, const char **argv, struct cxl_ctx *ctx);
+int cmd_inject_poison(int argc, const char **argv, struct cxl_ctx *ctx);
+int cmd_clear_poison(int argc, const char **argv, struct cxl_ctx *ctx);
 #ifdef ENABLE_LIBTRACEFS
 int cmd_monitor(int argc, const char **argv, struct cxl_ctx *ctx);
 #else
diff --git a/cxl/cxl.c b/cxl/cxl.c
index 3be7026..359e619 100644
--- a/cxl/cxl.c
+++ b/cxl/cxl.c
@@ -77,6 +77,8 @@  static struct cmd_struct commands[] = {
 	{ "disable-region", .c_fn = cmd_disable_region },
 	{ "destroy-region", .c_fn = cmd_destroy_region },
 	{ "monitor", .c_fn = cmd_monitor },
+	{ "inject-poison", .c_fn = cmd_inject_poison },
+	{ "clear-poison", .c_fn = cmd_clear_poison },
 };
 
 int main(int argc, const char **argv)
diff --git a/cxl/memdev.c b/cxl/memdev.c
index 0b3ad02..d1fed3c 100644
--- a/cxl/memdev.c
+++ b/cxl/memdev.c
@@ -34,6 +34,7 @@  static struct parameters {
 	const char *type;
 	const char *size;
 	const char *decoder_filter;
+	const char *poison_address;
 } param;
 
 static struct log_ctx ml;
@@ -85,6 +86,10 @@  OPT_STRING('t', "type", &param.type, "type",                   \
 OPT_BOOLEAN('f', "force", &param.force,                        \
 	    "Attempt 'expected to fail' operations")
 
+#define POISON_OPTIONS()                                       \
+OPT_STRING('a', "address", &param.poison_address, "dpa",       \
+	   "DPA to inject or clear poison")
+
 static const struct option read_options[] = {
 	BASE_OPTIONS(),
 	LABEL_OPTIONS(),
@@ -135,6 +140,12 @@  static const struct option free_dpa_options[] = {
 	OPT_END(),
 };
 
+static const struct option poison_options[] = {
+	BASE_OPTIONS(),
+	POISON_OPTIONS(),
+	OPT_END(),
+};
+
 enum reserve_dpa_mode {
 	DPA_ALLOC,
 	DPA_FREE,
@@ -351,6 +362,42 @@  static int action_free_dpa(struct cxl_memdev *memdev,
 	return __reserve_dpa(memdev, DPA_FREE, actx);
 }
 
+static int action_inject_poison(struct cxl_memdev *memdev,
+				struct action_context *actx)
+{
+	int rc;
+
+	if (!param.poison_address) {
+		log_err(&ml, "%s: set dpa to inject poison.\n",
+			cxl_memdev_get_devname(memdev));
+		return -EINVAL;
+	}
+	rc = cxl_memdev_inject_poison(memdev, param.poison_address);
+	if (rc < 0) {
+		log_err(&ml, "%s: inject poison failed: %s\n",
+			cxl_memdev_get_devname(memdev), strerror(-rc));
+	}
+	return rc;
+}
+
+static int action_clear_poison(struct cxl_memdev *memdev,
+			       struct action_context *actx)
+{
+	int rc;
+
+	if (!param.poison_address) {
+		log_err(&ml, "%s: set dpa to clear poison.\n",
+			cxl_memdev_get_devname(memdev));
+		return -EINVAL;
+	}
+	rc = cxl_memdev_clear_poison(memdev, param.poison_address);
+	if (rc < 0) {
+		log_err(&ml, "%s: clear poison failed: %s\n",
+			cxl_memdev_get_devname(memdev), strerror(-rc));
+	}
+	return rc;
+}
+
 static int action_disable(struct cxl_memdev *memdev, struct action_context *actx)
 {
 	if (!cxl_memdev_is_enabled(memdev))
@@ -755,7 +802,9 @@  static int memdev_action(int argc, const char **argv, struct cxl_ctx *ctx,
 				continue;
 			found = true;
 
-			if (action == action_write) {
+			if ((action == action_write) ||
+			    (action == action_inject_poison) ||
+			    (action == action_clear_poison)) {
 				single = memdev;
 				rc = 0;
 			} else
@@ -771,9 +820,17 @@  static int memdev_action(int argc, const char **argv, struct cxl_ctx *ctx,
 	}
 	rc = err;
 
-	if (action == action_write) {
+	if ((action == action_write) || (action == action_inject_poison) ||
+	    (action == action_clear_poison)) {
 		if (count > 1) {
-			error("write-labels only supports writing a single memdev\n");
+			if (action == action_write) {
+				error("write-labels only supports writing "
+				      "a single memdev\n");
+			} else {
+				error("poison command only supports injecting "
+				      "or clearing poison into a single "
+				      "memdev\n");
+			}
 			usage_with_options(u, options);
 			return -EINVAL;
 		} else if (single) {
@@ -893,3 +950,25 @@  int cmd_free_dpa(int argc, const char **argv, struct cxl_ctx *ctx)
 
 	return count >= 0 ? 0 : EXIT_FAILURE;
 }
+
+int cmd_inject_poison(int argc, const char **argv, struct cxl_ctx *ctx)
+{
+	int count = memdev_action(
+		argc, argv, ctx, action_inject_poison, poison_options,
+		"cxl inject-poison <memdev> -a <dpa> [<options>]");
+	log_info(&ml, "inject-poison %d mem%s\n", count >= 0 ? count : 0,
+		 count > 1 ? "s" : "");
+
+	return count >= 0 ? 0 : EXIT_FAILURE;
+}
+
+int cmd_clear_poison(int argc, const char **argv, struct cxl_ctx *ctx)
+{
+	int count = memdev_action(
+		argc, argv, ctx, action_clear_poison, poison_options,
+		"cxl clear-poison <memdev> -a <dpa> [<options>]");
+	log_info(&ml, "clear-poison %d mem%s\n", count >= 0 ? count : 0,
+		 count > 1 ? "s" : "");
+
+	return count >= 0 ? 0 : EXIT_FAILURE;
+}