mbox series

[v2,00/14] remoteproc: Add support for detaching from rproc

Message ID 20201030195713.1366341-1-mathieu.poirier@linaro.org (mailing list archive)
Headers show
Series remoteproc: Add support for detaching from rproc | expand

Message

Mathieu Poirier Oct. 30, 2020, 7:56 p.m. UTC
Following the work done here [1], this set provides support for the
remoteproc core to release resources associated with a remote processor
without having to switch it off. That way a platform driver can be removed
or the applcation processor power cycled while the remote processor is
still operating.

The only thing that changes in this revision are the last two patches where
device tree bindings to control how to handle attached remote processors have
been added.  More specifically two bindings are being proposed:

"autonomous_on_core_reboot": When rproc_cdev_release() or rproc_del() are called
and the remote processor has been attached to, it will be detached from (rather
than turned off) if "autonomous_on_core_reboot" is specified in the DT.

"autonomous_on_remote_crash": When a remote processor that has been attached to
crashes, it will be detached from if "autonomous_on_remote_crash" is specified
in the DT. It is _not_ used in this set and presented to show how I intend to 
organise things. 

I spent a fair amount of time coming up with the name for the bindings and would
welcome other ideas.  I will write a proper yaml file and CC the linux-kernel
mailing list once we have an agreement on the naming convention.

Applies cleanly on v5.10-rc1

Thanks,
Mathieu

[1]. https://lkml.org/lkml/2020/7/14/1600

Mathieu Poirier (14):
  remoteproc: Re-check state in rproc_shutdown()
  remoteproc: Remove useless check in rproc_del()
  remoteproc: Add new RPROC_ATTACHED state
  remoteproc: Properly represent the attached state
  remoteproc: Properly deal with a kernel panic when attached
  remoteproc: Add new detach() remoteproc operation
  remoteproc: Introduce function __rproc_detach()
  remoteproc: Introduce function rproc_detach()
  remoteproc: Rename function rproc_actuate()
  remoteproc: Add return value to function rproc_shutdown()
  remoteproc: Properly deal with a stop request when attached
  remoteproc: Properly deal with detach request
  remoteproc: Add automation flags
  remoteproc: Refactor rproc delete and cdev release path

 drivers/remoteproc/remoteproc_cdev.c  |  24 +++-
 drivers/remoteproc/remoteproc_core.c  | 183 +++++++++++++++++++++-----
 drivers/remoteproc/remoteproc_sysfs.c |  17 ++-
 include/linux/remoteproc.h            |  19 ++-
 4 files changed, 199 insertions(+), 44 deletions(-)

Comments

Arnaud POULIQUEN Nov. 12, 2020, 6:20 p.m. UTC | #1
Hi Mathieu,


On 10/30/20 8:56 PM, Mathieu Poirier wrote:
> Following the work done here [1], this set provides support for the
> remoteproc core to release resources associated with a remote processor
> without having to switch it off. That way a platform driver can be removed
> or the applcation processor power cycled while the remote processor is
> still operating.

I performed some non-regression tests on firmware attachment + few tests on
detach. I don't see any major problem introduced by this patchset (except the
minor problem I reported in patch 11/14:remoteproc: correctly process a stop
request when attached).

My concern is that without the bindings, we still have a problem on the recovery.
If a crash occurs while attached to a remote processor, the remote framework is
in an unexpected state, which requires a system reset to recover it.

To reproduce the issue, simply generate the crash :
 cat 1 >/sys/kernel/debug/remoteproc/remoteproc0/crash

At the end of the mail, I attached a temporary patch to apply on top of this
series, waiting for the bindings management. The patch shutdowns the attached
remote processor instead of trying to recover it.

I wonder if we should fix this for version 4.10 based on the current
implementation (if the patch window is not closed)...
Please tell me what would be the best strategy. If it's not too late, I can
prepare and send a patch tomorrow for v5.10.

Regards
Arnaud
> 
> The only thing that changes in this revision are the last two patches where
> device tree bindings to control how to handle attached remote processors have
> been added.  More specifically two bindings are being proposed:
> 
> "autonomous_on_core_reboot": When rproc_cdev_release() or rproc_del() are called
> and the remote processor has been attached to, it will be detached from (rather
> than turned off) if "autonomous_on_core_reboot" is specified in the DT.
> 
> "autonomous_on_remote_crash": When a remote processor that has been attached to
> crashes, it will be detached from if "autonomous_on_remote_crash" is specified
> in the DT. It is _not_ used in this set and presented to show how I intend to 
> organise things. 


> 
> I spent a fair amount of time coming up with the name for the bindings and would
> welcome other ideas.  I will write a proper yaml file and CC the linux-kernel
> mailing list once we have an agreement on the naming convention.
> 
> Applies cleanly on v5.10-rc1
> 
> Thanks,
> Mathieu
> 
> [1]. https://lkml.org/lkml/2020/7/14/1600
> 
> Mathieu Poirier (14):
>   remoteproc: Re-check state in rproc_shutdown()
>   remoteproc: Remove useless check in rproc_del()
>   remoteproc: Add new RPROC_ATTACHED state
>   remoteproc: Properly represent the attached state
>   remoteproc: Properly deal with a kernel panic when attached
>   remoteproc: Add new detach() remoteproc operation
>   remoteproc: Introduce function __rproc_detach()
>   remoteproc: Introduce function rproc_detach()
>   remoteproc: Rename function rproc_actuate()
>   remoteproc: Add return value to function rproc_shutdown()
>   remoteproc: Properly deal with a stop request when attached
>   remoteproc: Properly deal with detach request
>   remoteproc: Add automation flags
>   remoteproc: Refactor rproc delete and cdev release path
> 
>  drivers/remoteproc/remoteproc_cdev.c  |  24 +++-
>  drivers/remoteproc/remoteproc_core.c  | 183 +++++++++++++++++++++-----
>  drivers/remoteproc/remoteproc_sysfs.c |  17 ++-
>  include/linux/remoteproc.h            |  19 ++-
>  4 files changed, 199 insertions(+), 44 deletions(-)
> 

From a04866cf0add0020b65e9ab80d62d44290a1d695 Mon Sep 17 00:00:00 2001
From: Arnaud Pouliquen <arnaud.pouliquen@st.com>
Date: Thu, 12 Nov 2020 18:25:52 +0100
Subject: [PATCH] remoteproc: core fix unexpected state on crash for attached
 firmware

The recovery falls in an unexpected state when attached to a remote
processor.
As no firmware to load is found, the remote processor has
just been stopped but associated resources are not free.
As consequence rproc->power is remaining at 1 and it s no more
possible to recover the remote processor.
This patch shutdown the attached remote processor instead of trying
to recover it.

Signed-off-by: Arnaud Pouliquen <arnaud.pouliquen@st.com>
---
 drivers/remoteproc/remoteproc_core.c | 17 +++++++++++++----
 1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/drivers/remoteproc/remoteproc_core.c
b/drivers/remoteproc/remoteproc_core.c
index a0611d494758..a38209dd782c 100644
--- a/drivers/remoteproc/remoteproc_core.c
+++ b/drivers/remoteproc/remoteproc_core.c
@@ -1739,6 +1739,8 @@ static void rproc_crash_handler_work(struct work_struct *work)
 {
 	struct rproc *rproc = container_of(work, struct rproc, crash_handler);
 	struct device *dev = &rproc->dev;
+	unsigned int rproc_attached = false;
+

 	dev_dbg(dev, "enter %s\n", __func__);

@@ -1750,15 +1752,21 @@ static void rproc_crash_handler_work(struct work_struct
*work)
 		return;
 	}

+	if (rproc->state == RPROC_ATTACHED)
+		rproc_attached = true;
+
 	rproc->state = RPROC_CRASHED;
 	dev_err(dev, "handling crash #%u in %s\n", ++rproc->crash_cnt,
 		rproc->name);

 	mutex_unlock(&rproc->lock);

-	if (!rproc->recovery_disabled)
-		rproc_trigger_recovery(rproc);
-
+	if (!rproc->recovery_disabled) {
+		if (!rproc_attached)
+			rproc_trigger_recovery(rproc);
+		else
+			rproc_shutdown(rproc);
+	}
 	pm_relax(rproc->dev.parent);
 }

@@ -1862,7 +1870,8 @@ int rproc_shutdown(struct rproc *rproc)
 		return ret;
 	}

-	if (rproc->state != RPROC_RUNNING && rproc->state != RPROC_ATTACHED) {
+	if (rproc->state != RPROC_RUNNING && rproc->state != RPROC_ATTACHED &&
+	    rproc->state != RPROC_CRASHED) {
 		ret = -EPERM;
 		goto out;
 	}
Mathieu Poirier Nov. 12, 2020, 6:41 p.m. UTC | #2
Good day,

On Thu, Nov 12, 2020 at 07:20:49PM +0100, Arnaud POULIQUEN wrote:
> Hi Mathieu,
> 
> 
> On 10/30/20 8:56 PM, Mathieu Poirier wrote:
> > Following the work done here [1], this set provides support for the
> > remoteproc core to release resources associated with a remote processor
> > without having to switch it off. That way a platform driver can be removed
> > or the applcation processor power cycled while the remote processor is
> > still operating.
> 
> I performed some non-regression tests on firmware attachment + few tests on
> detach. I don't see any major problem introduced by this patchset (except the
> minor problem I reported in patch 11/14:remoteproc: correctly process a stop
> request when attached).
> 

Your assessment is correct - I will fix that.

> My concern is that without the bindings, we still have a problem on the recovery.

At this time (and while this feature is being worked on) we have problems with
shutdown and recovery.  The up side is that 1) we know about them and 2) we are
actively working on closing the gab.  As such I am weary of introducting
temporary measures visible to users that will have to be maintained in the
future.  I'd rather simply acknowledge that something is broken.

I will take into account your comments on the bindings (which I haven't looked
at yet).  From there I expect to spin off another revision early to middle of
next week.  If we get this part in for the v5.11 cycle then only crash scenarios
will be left to address.

Thanks for the feedback,
Mathieu

> If a crash occurs while attached to a remote processor, the remote framework is
> in an unexpected state, which requires a system reset to recover it.
> 
> To reproduce the issue, simply generate the crash :
>  cat 1 >/sys/kernel/debug/remoteproc/remoteproc0/crash
> 
> At the end of the mail, I attached a temporary patch to apply on top of this
> series, waiting for the bindings management. The patch shutdowns the attached
> remote processor instead of trying to recover it.
> 
> I wonder if we should fix this for version 4.10 based on the current
> implementation (if the patch window is not closed)...
> Please tell me what would be the best strategy. If it's not too late, I can
> prepare and send a patch tomorrow for v5.10.
> 
> Regards
> Arnaud
> > 
> > The only thing that changes in this revision are the last two patches where
> > device tree bindings to control how to handle attached remote processors have
> > been added.  More specifically two bindings are being proposed:
> > 
> > "autonomous_on_core_reboot": When rproc_cdev_release() or rproc_del() are called
> > and the remote processor has been attached to, it will be detached from (rather
> > than turned off) if "autonomous_on_core_reboot" is specified in the DT.
> > 
> > "autonomous_on_remote_crash": When a remote processor that has been attached to
> > crashes, it will be detached from if "autonomous_on_remote_crash" is specified
> > in the DT. It is _not_ used in this set and presented to show how I intend to 
> > organise things. 
> 
> 
> > 
> > I spent a fair amount of time coming up with the name for the bindings and would
> > welcome other ideas.  I will write a proper yaml file and CC the linux-kernel
> > mailing list once we have an agreement on the naming convention.
> > 
> > Applies cleanly on v5.10-rc1
> > 
> > Thanks,
> > Mathieu
> > 
> > [1]. https://lkml.org/lkml/2020/7/14/1600
> > 
> > Mathieu Poirier (14):
> >   remoteproc: Re-check state in rproc_shutdown()
> >   remoteproc: Remove useless check in rproc_del()
> >   remoteproc: Add new RPROC_ATTACHED state
> >   remoteproc: Properly represent the attached state
> >   remoteproc: Properly deal with a kernel panic when attached
> >   remoteproc: Add new detach() remoteproc operation
> >   remoteproc: Introduce function __rproc_detach()
> >   remoteproc: Introduce function rproc_detach()
> >   remoteproc: Rename function rproc_actuate()
> >   remoteproc: Add return value to function rproc_shutdown()
> >   remoteproc: Properly deal with a stop request when attached
> >   remoteproc: Properly deal with detach request
> >   remoteproc: Add automation flags
> >   remoteproc: Refactor rproc delete and cdev release path
> > 
> >  drivers/remoteproc/remoteproc_cdev.c  |  24 +++-
> >  drivers/remoteproc/remoteproc_core.c  | 183 +++++++++++++++++++++-----
> >  drivers/remoteproc/remoteproc_sysfs.c |  17 ++-
> >  include/linux/remoteproc.h            |  19 ++-
> >  4 files changed, 199 insertions(+), 44 deletions(-)
> > 
> 
> From a04866cf0add0020b65e9ab80d62d44290a1d695 Mon Sep 17 00:00:00 2001
> From: Arnaud Pouliquen <arnaud.pouliquen@st.com>
> Date: Thu, 12 Nov 2020 18:25:52 +0100
> Subject: [PATCH] remoteproc: core fix unexpected state on crash for attached
>  firmware
> 
> The recovery falls in an unexpected state when attached to a remote
> processor.
> As no firmware to load is found, the remote processor has
> just been stopped but associated resources are not free.
> As consequence rproc->power is remaining at 1 and it s no more
> possible to recover the remote processor.
> This patch shutdown the attached remote processor instead of trying
> to recover it.
> 
> Signed-off-by: Arnaud Pouliquen <arnaud.pouliquen@st.com>
> ---
>  drivers/remoteproc/remoteproc_core.c | 17 +++++++++++++----
>  1 file changed, 13 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/remoteproc/remoteproc_core.c
> b/drivers/remoteproc/remoteproc_core.c
> index a0611d494758..a38209dd782c 100644
> --- a/drivers/remoteproc/remoteproc_core.c
> +++ b/drivers/remoteproc/remoteproc_core.c
> @@ -1739,6 +1739,8 @@ static void rproc_crash_handler_work(struct work_struct *work)
>  {
>  	struct rproc *rproc = container_of(work, struct rproc, crash_handler);
>  	struct device *dev = &rproc->dev;
> +	unsigned int rproc_attached = false;
> +
> 
>  	dev_dbg(dev, "enter %s\n", __func__);
> 
> @@ -1750,15 +1752,21 @@ static void rproc_crash_handler_work(struct work_struct
> *work)
>  		return;
>  	}
> 
> +	if (rproc->state == RPROC_ATTACHED)
> +		rproc_attached = true;
> +
>  	rproc->state = RPROC_CRASHED;
>  	dev_err(dev, "handling crash #%u in %s\n", ++rproc->crash_cnt,
>  		rproc->name);
> 
>  	mutex_unlock(&rproc->lock);
> 
> -	if (!rproc->recovery_disabled)
> -		rproc_trigger_recovery(rproc);
> -
> +	if (!rproc->recovery_disabled) {
> +		if (!rproc_attached)
> +			rproc_trigger_recovery(rproc);
> +		else
> +			rproc_shutdown(rproc);
> +	}
>  	pm_relax(rproc->dev.parent);
>  }
> 
> @@ -1862,7 +1870,8 @@ int rproc_shutdown(struct rproc *rproc)
>  		return ret;
>  	}
> 
> -	if (rproc->state != RPROC_RUNNING && rproc->state != RPROC_ATTACHED) {
> +	if (rproc->state != RPROC_RUNNING && rproc->state != RPROC_ATTACHED &&
> +	    rproc->state != RPROC_CRASHED) {
>  		ret = -EPERM;
>  		goto out;
>  	}
> -- 
> 2.17.1
> 
> 
>