diff mbox series

[NDCTL,2/2] cxl: Add check for regions before disabling memdev

Message ID 169645731012.624805.15404457479294344934.stgit@djiang5-mobl3 (mailing list archive)
State Superseded
Delegated to: Vishal Verma
Headers show
Series [NDCTL,1/2] cxl: Save the number of decoders committed to a port | expand

Commit Message

Dave Jiang Oct. 4, 2023, 10:08 p.m. UTC
Add a check for memdev disable to see if there are active regions present
before disabling the device. This is necessary now regions are present to
fulfill the TODO that was left there. The best way to determine if a
region is active is to see if there are decoders enabled for the mem
device. This is also best effort as the state is only a snapshot the
kernel provides and is not atomic WRT the memdev disable operation. The
expectation is the admin issuing the command has full control of the mem
device and there are no other agents also attempt to control the device.

Signed-off-by: Dave Jiang <dave.jiang@intel.com>
---
 cxl/memdev.c |   14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

Comments

Cao, Quanquan/曹 全全 Nov. 28, 2023, 6:35 a.m. UTC | #1
> Add a check for memdev disable to see if there are active regions present
> before disabling the device. This is necessary now regions are present to
> fulfill the TODO that was left there. The best way to determine if a
> region is active is to see if there are decoders enabled for the mem
> device. This is also best effort as the state is only a snapshot the
> kernel provides and is not atomic WRT the memdev disable operation. The
> expectation is the admin issuing the command has full control of the mem
> device and there are no other agents also attempt to control the device.
> 
> Signed-off-by: Dave Jiang <dave.jiang@intel.com>
> ---
>   cxl/memdev.c |   14 ++++++++++++--
>   1 file changed, 12 insertions(+), 2 deletions(-)
> 
> diff --git a/cxl/memdev.c b/cxl/memdev.c
> index f6a2d3f1fdca..314bac082719 100644
> --- a/cxl/memdev.c
> +++ b/cxl/memdev.c
> @@ -373,11 +373,21 @@ static int action_free_dpa(struct cxl_memdev *memdev,
>   
>   static int action_disable(struct cxl_memdev *memdev, struct action_context *actx)
>   {
> +	struct cxl_endpoint *ep;
> +	struct cxl_port *port;
> +
>   	if (!cxl_memdev_is_enabled(memdev))
>   		return 0;
>   
> -	if (!param.force) {
> -		/* TODO: actually detect rather than assume active */
> +	ep = cxl_memdev_get_endpoint(memdev);
> +	if (!ep)
> +		return -ENODEV;
> +
> +	port = cxl_endpoint_get_port(ep);
> +	if (!port)
> +		return -ENODEV;
> +
> +	if (cxl_port_decoders_committed(port) && !param.force) {
>   		log_err(&ml, "%s is part of an active region\n",
>   			cxl_memdev_get_devname(memdev));
>   		return -EBUSY;
> 
> 
Hi Dave,

Based on my understanding of the logic in the "disable_region" and 
"destroy_region" code, in the code logic of 'disable-region -f,' after 
the check, it proceeds with the offline operation. In the code logic of 
'destroy-region -f,' after the check, it performs a disable operation on 
the region. For the 'disable-memdev -f' operation, after completing the 
check, is it also necessary to perform corresponding operations on the 
region(such as disabling region/destroying region) before disabling memdev?
Vishal Verma Nov. 28, 2023, 6:13 p.m. UTC | #2
On Tue, 2023-11-28 at 14:35 +0800, Cao, Quanquan/曹 全全 wrote:
> 
> > Add a check for memdev disable to see if there are active regions present
> > before disabling the device. This is necessary now regions are present to
> > fulfill the TODO that was left there. The best way to determine if a
> > region is active is to see if there are decoders enabled for the mem
> > device. This is also best effort as the state is only a snapshot the
> > kernel provides and is not atomic WRT the memdev disable operation. The
> > expectation is the admin issuing the command has full control of the mem
> > device and there are no other agents also attempt to control the device.
> > 
> > Signed-off-by: Dave Jiang <dave.jiang@intel.com>
> > ---
> >   cxl/memdev.c |   14 ++++++++++++--
> >   1 file changed, 12 insertions(+), 2 deletions(-)
> > 
> > diff --git a/cxl/memdev.c b/cxl/memdev.c
> > index f6a2d3f1fdca..314bac082719 100644
> > --- a/cxl/memdev.c
> > +++ b/cxl/memdev.c
> > @@ -373,11 +373,21 @@ static int action_free_dpa(struct cxl_memdev *memdev,
> >   
> >   static int action_disable(struct cxl_memdev *memdev, struct action_context *actx)
> >   {
> > +       struct cxl_endpoint *ep;
> > +       struct cxl_port *port;
> > +
> >         if (!cxl_memdev_is_enabled(memdev))
> >                 return 0;
> >   
> > -       if (!param.force) {
> > -               /* TODO: actually detect rather than assume active */
> > +       ep = cxl_memdev_get_endpoint(memdev);
> > +       if (!ep)
> > +               return -ENODEV;
> > +
> > +       port = cxl_endpoint_get_port(ep);
> > +       if (!port)
> > +               return -ENODEV;
> > +
> > +       if (cxl_port_decoders_committed(port) && !param.force) {
> >                 log_err(&ml, "%s is part of an active region\n",
> >                         cxl_memdev_get_devname(memdev));
> >                 return -EBUSY;
> > 
> > 
> Hi Dave,
> 
> Based on my understanding of the logic in the "disable_region" and 
> "destroy_region" code, in the code logic of 'disable-region -f,' after 
> the check, it proceeds with the offline operation. In the code logic of 
> 'destroy-region -f,' after the check, it performs a disable operation on 
> the region. For the 'disable-memdev -f' operation, after completing the 
> check, is it also necessary to perform corresponding operations on the 
> region(such as disabling region/destroying region) before disabling memdev?
> 
I think it is better for a disable-memdev to just perform the check and
complain, instead of trying to unwind a _lot_ of things - i.e. offline
memory, disable region, and then disable memdev. If there is an error
in one of the intermediate steps, it can make it hard to trace down
what happened, and what state the system is in.

I do think that the current -f behavior of just unbind the driver
without the safety checks should come with a big loud warning in the
man page at least about what -f will do, and how it will potentially
break existing regions.
Alison Schofield Nov. 28, 2023, 6:31 p.m. UTC | #3
On Wed, Oct 04, 2023 at 03:08:30PM -0700, Dave Jiang wrote:
> Add a check for memdev disable to see if there are active regions present
> before disabling the device. This is necessary now regions are present to
> fulfill the TODO that was left there. The best way to determine if a
> region is active is to see if there are decoders enabled for the mem
> device. This is also best effort as the state is only a snapshot the
> kernel provides and is not atomic WRT the memdev disable operation. The
> expectation is the admin issuing the command has full control of the mem
> device and there are no other agents also attempt to control the device.
> 

snip

> +
> +	if (cxl_port_decoders_committed(port) && !param.force) {
>  		log_err(&ml, "%s is part of an active region\n",
>  			cxl_memdev_get_devname(memdev));
>  		return -EBUSY;

Can you emit the message either way, that is, even if it is forced,
let the user know what they just trampled on.

How easy is it to add the region names in the message?

> 
> 
>
Dave Jiang Nov. 28, 2023, 8:23 p.m. UTC | #4
On 11/28/23 11:31, Alison Schofield wrote:
> On Wed, Oct 04, 2023 at 03:08:30PM -0700, Dave Jiang wrote:
>> Add a check for memdev disable to see if there are active regions present
>> before disabling the device. This is necessary now regions are present to
>> fulfill the TODO that was left there. The best way to determine if a
>> region is active is to see if there are decoders enabled for the mem
>> device. This is also best effort as the state is only a snapshot the
>> kernel provides and is not atomic WRT the memdev disable operation. The
>> expectation is the admin issuing the command has full control of the mem
>> device and there are no other agents also attempt to control the device.
>>
> 
> snip
> 
>> +
>> +	if (cxl_port_decoders_committed(port) && !param.force) {
>>  		log_err(&ml, "%s is part of an active region\n",
>>  			cxl_memdev_get_devname(memdev));
>>  		return -EBUSY;
> 
> Can you emit the message either way, that is, even if it is forced,
> let the user know what they just trampled on.

Ok I'll change that.

> 
> How easy is it to add the region names in the message?

I don't think we can get to a region easily. The way we currently check is to see if there are active decoders and not region existence.  

> 
>>
>>
>>
diff mbox series

Patch

diff --git a/cxl/memdev.c b/cxl/memdev.c
index f6a2d3f1fdca..314bac082719 100644
--- a/cxl/memdev.c
+++ b/cxl/memdev.c
@@ -373,11 +373,21 @@  static int action_free_dpa(struct cxl_memdev *memdev,
 
 static int action_disable(struct cxl_memdev *memdev, struct action_context *actx)
 {
+	struct cxl_endpoint *ep;
+	struct cxl_port *port;
+
 	if (!cxl_memdev_is_enabled(memdev))
 		return 0;
 
-	if (!param.force) {
-		/* TODO: actually detect rather than assume active */
+	ep = cxl_memdev_get_endpoint(memdev);
+	if (!ep)
+		return -ENODEV;
+
+	port = cxl_endpoint_get_port(ep);
+	if (!port)
+		return -ENODEV;
+
+	if (cxl_port_decoders_committed(port) && !param.force) {
 		log_err(&ml, "%s is part of an active region\n",
 			cxl_memdev_get_devname(memdev));
 		return -EBUSY;