Message ID | 20250411085218.120312-1-lizhijian@fujitsu.com |
---|---|
State | New |
Headers | show |
Series | [ndctl] cxl/region: Prevent destroying region out of order | expand |
On Fri, Apr 11, 2025 at 04:52:18PM +0800, Li Zhijian wrote: > The destruction operation lacks atomicity, which may result in partial > completion of the destruction process when attempting to destroy an out > of order region. > > Previously, a out of order destroy will return: > $ sudo cxl destroy-region region1 -f > cxl region: destroy_region: decoder4.0: set_dpa_size failed: Device or resource busy > cxl region: decoder_region_action: region1: failed: Device or resource busy > cxl region: region_action: one or more failures, last failure: Device or resource busy > cxl region: cmd_destroy_region: destroyed 0 region > > Then, the components under this root decoder enter a stale state, the user > cannot create region any more under this root decoder. > What's the complete setup here? I'm thinking of the test case we can add to cxl unit tests for this. I notice the -f, force. What happens without force? > The kernel has already ensured that region/decoder committed in order, > so just ensure to destroy the last region is enough. > > Signed-off-by: Li Zhijian <lizhijian@fujitsu.com> > --- > cxl/lib/libcxl.c | 7 +++++++ > cxl/lib/libcxl.sym | 1 + > cxl/libcxl.h | 1 + > cxl/region.c | 11 ++++++++++- > 4 files changed, 19 insertions(+), 1 deletion(-) > > diff --git a/cxl/lib/libcxl.c b/cxl/lib/libcxl.c > index 63aa4ef3acdc..c13bb9377237 100644 > --- a/cxl/lib/libcxl.c > +++ b/cxl/lib/libcxl.c > @@ -706,6 +706,13 @@ CXL_EXPORT struct cxl_region *cxl_region_get_first(struct cxl_decoder *decoder) > return list_top(&decoder->regions, struct cxl_region, list); > } > > +CXL_EXPORT struct cxl_region *cxl_region_get_last(struct cxl_decoder *decoder) > +{ > + cxl_regions_init(decoder); > + > + return list_tail(&decoder->regions, struct cxl_region, list); > +} > + > CXL_EXPORT struct cxl_region *cxl_region_get_next(struct cxl_region *region) > { > struct cxl_decoder *decoder = region->decoder; > diff --git a/cxl/lib/libcxl.sym b/cxl/lib/libcxl.sym > index 0c155a40ad47..3ce408b9cc55 100644 > --- a/cxl/lib/libcxl.sym > +++ b/cxl/lib/libcxl.sym > @@ -180,6 +180,7 @@ global: > cxl_decoder_set_dpa_size; > cxl_decoder_set_mode; > cxl_region_get_first; > + cxl_region_get_last; > cxl_region_get_next; > cxl_region_decode_is_committed; > cxl_region_is_enabled; > diff --git a/cxl/libcxl.h b/cxl/libcxl.h > index 0a5fd0e13cc2..9b5657203eec 100644 > --- a/cxl/libcxl.h > +++ b/cxl/libcxl.h > @@ -305,6 +305,7 @@ int cxl_memdev_is_enabled(struct cxl_memdev *memdev); > > struct cxl_region; > struct cxl_region *cxl_region_get_first(struct cxl_decoder *decoder); > +struct cxl_region *cxl_region_get_last(struct cxl_decoder *decoder); > struct cxl_region *cxl_region_get_next(struct cxl_region *region); > int cxl_region_decode_is_committed(struct cxl_region *region); > int cxl_region_is_enabled(struct cxl_region *region); > diff --git a/cxl/region.c b/cxl/region.c > index 96aa5931d228..2eb6f5416433 100644 > --- a/cxl/region.c > +++ b/cxl/region.c > @@ -835,7 +835,16 @@ static int destroy_region(struct cxl_region *region) > { > const char *devname = cxl_region_get_devname(region); > unsigned int ways, i; > - int rc; > + int rc = 0, , did = cxl_region_get_id(region); > + struct cxl_decoder *decoder = cxl_region_get_decoder(region); > + struct cxl_region *last_region = cxl_region_get_last(decoder); > + int last_id = cxl_region_get_id(last_region); > + > + if (last_id > did) { > + log_err(&rl, "Unable to destroy region out of order, " > + "please destroy region%d first.\n", last_id); not so sure we want to make that suggestion and again, not sure user will understand that 'out of order' means in this message. Is that explained in cxl destroy-region man page? > + return -EINVAL; > + } > > /* First, unbind/disable the region if needed */ > if (cxl_region_is_enabled(region)) { > -- > 2.47.0 >
Alison Schofield wrote: > On Fri, Apr 11, 2025 at 04:52:18PM +0800, Li Zhijian wrote: > > The destruction operation lacks atomicity, which may result in partial > > completion of the destruction process when attempting to destroy an out > > of order region. > > > > Previously, a out of order destroy will return: > > $ sudo cxl destroy-region region1 -f > > cxl region: destroy_region: decoder4.0: set_dpa_size failed: Device or resource busy > > cxl region: decoder_region_action: region1: failed: Device or resource busy > > cxl region: region_action: one or more failures, last failure: Device or resource busy > > cxl region: cmd_destroy_region: destroyed 0 region > > > > Then, the components under this root decoder enter a stale state, the user > > cannot create region any more under this root decoder. Right, but the user in this case passed --force, they had better be prepared for all of the bad consequences. > What's the complete setup here? I'm thinking of the test case we can add to > cxl unit tests for this. > > I notice the -f, force. What happens without force? That's documented in the destroy-region man page. If the region is enabled the --force disables it. Recall that the safe way to shutdown a CXL region is: daxctl offline-memory $daxdev #only if this succeeds proceed to next step cxl disable-region $region cxl destroy-region $region In comparison "cxl destroy-region -f" ignores "daxctl offline-memory" failures! What that means in practice is that pinned pages prevent the memory space from being reclaimed. It is highly unadvisable to use --force in production, but it is a quick shorthand for testing. > > The kernel has already ensured that region/decoder committed in order, > > so just ensure to destroy the last region is enough. > > > > Signed-off-by: Li Zhijian <lizhijian@fujitsu.com> > > --- > > cxl/lib/libcxl.c | 7 +++++++ > > cxl/lib/libcxl.sym | 1 + > > cxl/libcxl.h | 1 + > > cxl/region.c | 11 ++++++++++- > > 4 files changed, 19 insertions(+), 1 deletion(-) > > > > diff --git a/cxl/lib/libcxl.c b/cxl/lib/libcxl.c > > index 63aa4ef3acdc..c13bb9377237 100644 > > --- a/cxl/lib/libcxl.c > > +++ b/cxl/lib/libcxl.c > > @@ -706,6 +706,13 @@ CXL_EXPORT struct cxl_region *cxl_region_get_first(struct cxl_decoder *decoder) > > return list_top(&decoder->regions, struct cxl_region, list); > > } > > > > +CXL_EXPORT struct cxl_region *cxl_region_get_last(struct cxl_decoder *decoder) > > +{ > > + cxl_regions_init(decoder); > > + > > + return list_tail(&decoder->regions, struct cxl_region, list); > > +} > > + > > CXL_EXPORT struct cxl_region *cxl_region_get_next(struct cxl_region *region) > > { > > struct cxl_decoder *decoder = region->decoder; > > diff --git a/cxl/lib/libcxl.sym b/cxl/lib/libcxl.sym > > index 0c155a40ad47..3ce408b9cc55 100644 > > --- a/cxl/lib/libcxl.sym > > +++ b/cxl/lib/libcxl.sym > > @@ -180,6 +180,7 @@ global: > > cxl_decoder_set_dpa_size; > > cxl_decoder_set_mode; > > cxl_region_get_first; > > + cxl_region_get_last; > > cxl_region_get_next; > > cxl_region_decode_is_committed; > > cxl_region_is_enabled; > > diff --git a/cxl/libcxl.h b/cxl/libcxl.h > > index 0a5fd0e13cc2..9b5657203eec 100644 > > --- a/cxl/libcxl.h > > +++ b/cxl/libcxl.h > > @@ -305,6 +305,7 @@ int cxl_memdev_is_enabled(struct cxl_memdev *memdev); > > > > struct cxl_region; > > struct cxl_region *cxl_region_get_first(struct cxl_decoder *decoder); > > +struct cxl_region *cxl_region_get_last(struct cxl_decoder *decoder); > > struct cxl_region *cxl_region_get_next(struct cxl_region *region); > > int cxl_region_decode_is_committed(struct cxl_region *region); > > int cxl_region_is_enabled(struct cxl_region *region); > > diff --git a/cxl/region.c b/cxl/region.c > > index 96aa5931d228..2eb6f5416433 100644 > > --- a/cxl/region.c > > +++ b/cxl/region.c > > @@ -835,7 +835,16 @@ static int destroy_region(struct cxl_region *region) > > { > > const char *devname = cxl_region_get_devname(region); > > unsigned int ways, i; > > - int rc; > > + int rc = 0, , did = cxl_region_get_id(region); > > + struct cxl_decoder *decoder = cxl_region_get_decoder(region); > > + struct cxl_region *last_region = cxl_region_get_last(decoder); > > + int last_id = cxl_region_get_id(last_region); > > + > > + if (last_id > did) { > > + log_err(&rl, "Unable to destroy region out of order, " > > + "please destroy region%d first.\n", last_id); > > not so sure we want to make that suggestion and again, not sure > user will understand that 'out of order' means in this message. > Is that explained in cxl destroy-region man page? Agree, this case needs some more documentation and the error message can be more precise, something like: "regions must be destroyed in reverse physical address order, otherwise the kernel pins port decoders active" > > + return -EINVAL; My problem with this is that it causes --force to fail, which is supposed to be the "I know what I am doing am prepared for the consequences" option. So at a minimum the old behavior needs to be maintained with --force, but you can add a new check for "destroy-region" without --force to fail with a new message. The kernel *is* tolerant of out-of-order region destruction, it waits for all regions to be destroyed before resetting mid-level port decoders.
> -----Original Message----- > From: Alison Schofield <alison.schofield@intel.com> > Sent: Saturday, April 12, 2025 12:44 AM > To: Li, Zhijian/ζ ζΊε <lizhijian@fujitsu.com> > Cc: linux-cxl@vger.kernel.org; Dave Jiang <dave.jiang@intel.com>; Vishal Verma > <vishal.l.verma@intel.com>; Ira Weiny <ira.weiny@intel.com>; Dan Williams > <dan.j.williams@intel.com> > Subject: Re: [ndctl PATCH] cxl/region: Prevent destroying region out of order > > On Fri, Apr 11, 2025 at 04:52:18PM +0800, Li Zhijian wrote: > > The destruction operation lacks atomicity, which may result in partial > > completion of the destruction process when attempting to destroy an > > out of order region. > > > > Previously, a out of order destroy will return: > > $ sudo cxl destroy-region region1 -f > > cxl region: destroy_region: decoder4.0: set_dpa_size failed: Device or > > resource busy cxl region: decoder_region_action: region1: failed: > > Device or resource busy cxl region: region_action: one or more > > failures, last failure: Device or resource busy cxl region: > > cmd_destroy_region: destroyed 0 region > > > > Then, the components under this root decoder enter a stale state, the > > user cannot create region any more under this root decoder. > > > > What's the complete setup here? I'm thinking of the test case we can add to cxl > unit tests for this. I emulated the following CXL topology with QEMU 1CFMW - 1HB -1Device(4GiB) Then create the region with -s to specify size 1G, repeat the same command below to create 2 regions A full processes are listed as following. > > I notice the -f, force. What happens without force? My bad, because my dax device will not automatically online, for simplicity, I often speicfies -f to destory after create the region. I have confirmed without -f is same(disable then destory without -f) [root@rdma-server ~]# cxl create-region -t ram -w 1 -d decoder0.0 -m mem0 -g 4096 -s 1G { "region":"region1", "resource":"0xa90000000", "size":"1024.00 MiB (1073.74 MB)", "type":"ram", "interleave_ways":1, "interleave_granularity":4096, "decode_state":"commit", "mappings":[ { "position":0, "memdev":"mem0", "decoder":"decoder2.0" } ], "qos_class_mismatch":true } cxl region: cmd_create_region: created 1 region [root@rdma-server ~]# cxl create-region -t ram -w 1 -d decoder0.0 -m mem0 -g 4096 -s 1G { "region":"region2", "resource":"0xad0000000", "size":"1024.00 MiB (1073.74 MB)", "type":"ram", "interleave_ways":1, "interleave_granularity":4096, "decode_state":"commit", "mappings":[ { "position":0, "memdev":"mem0", "decoder":"decoder2.1" } ], "qos_class_mismatch":true } cxl region: cmd_create_region: created 1 region [root@rdma-server ~]# [root@rdma-server ~]# cxl list [ { "memdevs":[ { "memdev":"mem0", "ram_size":4294967296, "serial":305419896, "host":"0000:36:00.0", "firmware_version":"BWFW VERSION 00" } ] }, { "regions":[ { "region":"region1", "resource":45365592064, "size":1073741824, "type":"ram", "interleave_ways":1, "interleave_granularity":4096, "decode_state":"commit", "qos_class_mismatch":true }, { "region":"region2", "resource":46439333888, "size":1073741824, "type":"ram", "interleave_ways":1, "interleave_granularity":4096, "decode_state":"commit", "qos_class_mismatch":true } ] } ] [root@rdma-server ~]# cxl disable-region 2 cxl region: cmd_disable_region: disabled 1 region [root@rdma-server ~]# cxl disable-region 1 cxl region: cmd_disable_region: disabled 1 region [root@rdma-server ~]# cxl destroy-region 1 cxl region: destroy_region: decoder2.0: set_dpa_size failed: Device or resource busy cxl region: decoder_region_action: region1: failed: Device or resource busy cxl region: region_action: one or more failures, last failure: Device or resource busy cxl region: cmd_destroy_region: destroyed 0 regions In the meantime, we will see some out-of-order reset dmesg [ 262.400243][ T843] cxl_port port1: decoder1.0: out of order reset, expected decoder1.1 [ 262.433064][ T843] cxl_port endpoint2: decoder2.0: out of order reset, expected decoder2.1 [ 262.489548][ T843] cxl decoder2.0: expected decoder2.1 Although the drive has some places to tolerate the out of order *reset*, but the dpa will not. It will return EBUSY when an out of order dpa free. see cxl_dpa_free() Thanks Zhijian > > > The kernel has already ensured that region/decoder committed in order, > > so just ensure to destroy the last region is enough. > > > > Signed-off-by: Li Zhijian <lizhijian@fujitsu.com> > > --- > > cxl/lib/libcxl.c | 7 +++++++ > > cxl/lib/libcxl.sym | 1 + > > cxl/libcxl.h | 1 + > > cxl/region.c | 11 ++++++++++- > > 4 files changed, 19 insertions(+), 1 deletion(-) > > > > diff --git a/cxl/lib/libcxl.c b/cxl/lib/libcxl.c index > > 63aa4ef3acdc..c13bb9377237 100644 > > --- a/cxl/lib/libcxl.c > > +++ b/cxl/lib/libcxl.c > > @@ -706,6 +706,13 @@ CXL_EXPORT struct cxl_region > *cxl_region_get_first(struct cxl_decoder *decoder) > > return list_top(&decoder->regions, struct cxl_region, list); } > > > > +CXL_EXPORT struct cxl_region *cxl_region_get_last(struct cxl_decoder > > +*decoder) { > > + cxl_regions_init(decoder); > > + > > + return list_tail(&decoder->regions, struct cxl_region, list); } > > + > > CXL_EXPORT struct cxl_region *cxl_region_get_next(struct cxl_region > > *region) { > > struct cxl_decoder *decoder = region->decoder; diff --git > > a/cxl/lib/libcxl.sym b/cxl/lib/libcxl.sym index > > 0c155a40ad47..3ce408b9cc55 100644 > > --- a/cxl/lib/libcxl.sym > > +++ b/cxl/lib/libcxl.sym > > @@ -180,6 +180,7 @@ global: > > cxl_decoder_set_dpa_size; > > cxl_decoder_set_mode; > > cxl_region_get_first; > > + cxl_region_get_last; > > cxl_region_get_next; > > cxl_region_decode_is_committed; > > cxl_region_is_enabled; > > diff --git a/cxl/libcxl.h b/cxl/libcxl.h index > > 0a5fd0e13cc2..9b5657203eec 100644 > > --- a/cxl/libcxl.h > > +++ b/cxl/libcxl.h > > @@ -305,6 +305,7 @@ int cxl_memdev_is_enabled(struct cxl_memdev > > *memdev); > > > > struct cxl_region; > > struct cxl_region *cxl_region_get_first(struct cxl_decoder *decoder); > > +struct cxl_region *cxl_region_get_last(struct cxl_decoder *decoder); > > struct cxl_region *cxl_region_get_next(struct cxl_region *region); > > int cxl_region_decode_is_committed(struct cxl_region *region); int > > cxl_region_is_enabled(struct cxl_region *region); diff --git > > a/cxl/region.c b/cxl/region.c index 96aa5931d228..2eb6f5416433 100644 > > --- a/cxl/region.c > > +++ b/cxl/region.c > > @@ -835,7 +835,16 @@ static int destroy_region(struct cxl_region > > *region) { > > const char *devname = cxl_region_get_devname(region); > > unsigned int ways, i; > > - int rc; > > + int rc = 0, , did = cxl_region_get_id(region); > > + struct cxl_decoder *decoder = cxl_region_get_decoder(region); > > + struct cxl_region *last_region = cxl_region_get_last(decoder); > > + int last_id = cxl_region_get_id(last_region); > > + > > + if (last_id > did) { > > + log_err(&rl, "Unable to destroy region out of order, " > > + "please destroy region%d first.\n", last_id); > > not so sure we want to make that suggestion and again, not sure user will > understand that 'out of order' means in this message. > Is that explained in cxl destroy-region man page? > > > > + return -EINVAL; > > + } > > > > /* First, unbind/disable the region if needed */ > > if (cxl_region_is_enabled(region)) { > > -- > > 2.47.0 > >
> -----Original Message----- > From: Dan Williams <dan.j.williams@intel.com> > Sent: Saturday, April 12, 2025 1:57 AM > > > > not so sure we want to make that suggestion and again, not sure user > > will understand that 'out of order' means in this message. > > Is that explained in cxl destroy-region man page? > > Agree, this case needs some more documentation and the error message can be > more precise, something like: "regions must be destroyed in reverse physical > address order, otherwise the kernel pins port decoders active" Make sense. It should also include the suggested region number to make it easier to use. How about this: Unable to destroy region out of order, please destroy region%d first. Regions must be destroyed in reverse physical address order, otherwise the kernel pins port decoders active > > > > + return -EINVAL; > > My problem with this is that it causes --force to fail, which is supposed to be the "I > know what I am doing am prepared for the consequences" option. So at a > minimum the old behavior needs to be maintained with --force, but you can add a > new check for "destroy-region" without --force to fail with a new message. > > The kernel *is* tolerant of out-of-order region destruction, it waits for all regions > to be destroyed before resetting mid-level port decoders. I overlooked this before, it indeed behaviors like this. Before this patch, it requires the user have enough knowledge in how destroy-region works Thanks Zhijian
diff --git a/cxl/lib/libcxl.c b/cxl/lib/libcxl.c index 63aa4ef3acdc..c13bb9377237 100644 --- a/cxl/lib/libcxl.c +++ b/cxl/lib/libcxl.c @@ -706,6 +706,13 @@ CXL_EXPORT struct cxl_region *cxl_region_get_first(struct cxl_decoder *decoder) return list_top(&decoder->regions, struct cxl_region, list); } +CXL_EXPORT struct cxl_region *cxl_region_get_last(struct cxl_decoder *decoder) +{ + cxl_regions_init(decoder); + + return list_tail(&decoder->regions, struct cxl_region, list); +} + CXL_EXPORT struct cxl_region *cxl_region_get_next(struct cxl_region *region) { struct cxl_decoder *decoder = region->decoder; diff --git a/cxl/lib/libcxl.sym b/cxl/lib/libcxl.sym index 0c155a40ad47..3ce408b9cc55 100644 --- a/cxl/lib/libcxl.sym +++ b/cxl/lib/libcxl.sym @@ -180,6 +180,7 @@ global: cxl_decoder_set_dpa_size; cxl_decoder_set_mode; cxl_region_get_first; + cxl_region_get_last; cxl_region_get_next; cxl_region_decode_is_committed; cxl_region_is_enabled; diff --git a/cxl/libcxl.h b/cxl/libcxl.h index 0a5fd0e13cc2..9b5657203eec 100644 --- a/cxl/libcxl.h +++ b/cxl/libcxl.h @@ -305,6 +305,7 @@ int cxl_memdev_is_enabled(struct cxl_memdev *memdev); struct cxl_region; struct cxl_region *cxl_region_get_first(struct cxl_decoder *decoder); +struct cxl_region *cxl_region_get_last(struct cxl_decoder *decoder); struct cxl_region *cxl_region_get_next(struct cxl_region *region); int cxl_region_decode_is_committed(struct cxl_region *region); int cxl_region_is_enabled(struct cxl_region *region); diff --git a/cxl/region.c b/cxl/region.c index 96aa5931d228..2eb6f5416433 100644 --- a/cxl/region.c +++ b/cxl/region.c @@ -835,7 +835,16 @@ static int destroy_region(struct cxl_region *region) { const char *devname = cxl_region_get_devname(region); unsigned int ways, i; - int rc; + int rc = 0, , did = cxl_region_get_id(region); + struct cxl_decoder *decoder = cxl_region_get_decoder(region); + struct cxl_region *last_region = cxl_region_get_last(decoder); + int last_id = cxl_region_get_id(last_region); + + if (last_id > did) { + log_err(&rl, "Unable to destroy region out of order, " + "please destroy region%d first.\n", last_id); + return -EINVAL; + } /* First, unbind/disable the region if needed */ if (cxl_region_is_enabled(region)) {
The destruction operation lacks atomicity, which may result in partial completion of the destruction process when attempting to destroy an out of order region. Previously, a out of order destroy will return: $ sudo cxl destroy-region region1 -f cxl region: destroy_region: decoder4.0: set_dpa_size failed: Device or resource busy cxl region: decoder_region_action: region1: failed: Device or resource busy cxl region: region_action: one or more failures, last failure: Device or resource busy cxl region: cmd_destroy_region: destroyed 0 region Then, the components under this root decoder enter a stale state, the user cannot create region any more under this root decoder. The kernel has already ensured that region/decoder committed in order, so just ensure to destroy the last region is enough. Signed-off-by: Li Zhijian <lizhijian@fujitsu.com> --- cxl/lib/libcxl.c | 7 +++++++ cxl/lib/libcxl.sym | 1 + cxl/libcxl.h | 1 + cxl/region.c | 11 ++++++++++- 4 files changed, 19 insertions(+), 1 deletion(-)