diff mbox

remoteproc: qcom_q6v5: don't auto boot remote processor

Message ID 20180524192141.20323-1-ramon.fried@gmail.com (mailing list archive)
State New, archived
Headers show

Commit Message

Ramon Fried May 24, 2018, 7:21 p.m. UTC
Sometimes that rmtfs userspace module is not brought
up fast enough and the modem crashes.
disabling automated boot in the driver and triggering
the boot from user-space sovles the problem.

Signed-off-by: Ramon Fried <ramon.fried@gmail.com>
---
 drivers/remoteproc/qcom_q6v5_pil.c | 2 ++
 1 file changed, 2 insertions(+)

Comments

Bjorn Andersson May 29, 2018, 4:20 a.m. UTC | #1
On Thu 24 May 12:21 PDT 2018, Ramon Fried wrote:

> Sometimes that rmtfs userspace module is not brought
> up fast enough and the modem crashes.
> disabling automated boot in the driver and triggering
> the boot from user-space sovles the problem.
> 
> Signed-off-by: Ramon Fried <ramon.fried@gmail.com>

Thanks for your patch Ramon. While this nudges the behavior to make
things work slightly better I think we need to describe the explicit
dependency between the mss firmware and the existence of rmtfs.

As our remoteprocs are essentially always-on I would prefer that they
start "automatically" and not through use of the sysfs interface.

But we're at the point where this is a real problem on 410, 820 and 845,
so we have to come up with some way to tie these pieces together. If
your patch suits that solution I will happily take it.

Regards,
Bjorn

> ---
>  drivers/remoteproc/qcom_q6v5_pil.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/remoteproc/qcom_q6v5_pil.c b/drivers/remoteproc/qcom_q6v5_pil.c
> index cbbafdcaaecb..719ee96445b3 100644
> --- a/drivers/remoteproc/qcom_q6v5_pil.c
> +++ b/drivers/remoteproc/qcom_q6v5_pil.c
> @@ -1133,6 +1133,8 @@ static int q6v5_probe(struct platform_device *pdev)
>  		return -ENOMEM;
>  	}
>  
> +	rproc->auto_boot = false;
> +
>  	qproc = (struct q6v5 *)rproc->priv;
>  	qproc->dev = &pdev->dev;
>  	qproc->rproc = rproc;
> -- 
> 2.17.0
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-remoteproc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ramon Fried May 29, 2018, 11:31 a.m. UTC | #2
On Tue, May 29, 2018 at 7:20 AM, Bjorn Andersson
<bjorn.andersson@linaro.org> wrote:
> On Thu 24 May 12:21 PDT 2018, Ramon Fried wrote:
>
>> Sometimes that rmtfs userspace module is not brought
>> up fast enough and the modem crashes.
>> disabling automated boot in the driver and triggering
>> the boot from user-space sovles the problem.
>>
>> Signed-off-by: Ramon Fried <ramon.fried@gmail.com>
>
> Thanks for your patch Ramon. While this nudges the behavior to make
> things work slightly better I think we need to describe the explicit
> dependency between the mss firmware and the existence of rmtfs.
>
> As our remoteprocs are essentially always-on I would prefer that they
> start "automatically" and not through use of the sysfs interface.
>
> But we're at the point where this is a real problem on 410, 820 and 845,
> so we have to come up with some way to tie these pieces together. If
> your patch suits that solution I will happily take it.
Yes. it was tested on 410, and it did the trick.
>
> Regards,
> Bjorn
>
>> ---
>>  drivers/remoteproc/qcom_q6v5_pil.c | 2 ++
>>  1 file changed, 2 insertions(+)
>>
>> diff --git a/drivers/remoteproc/qcom_q6v5_pil.c b/drivers/remoteproc/qcom_q6v5_pil.c
>> index cbbafdcaaecb..719ee96445b3 100644
>> --- a/drivers/remoteproc/qcom_q6v5_pil.c
>> +++ b/drivers/remoteproc/qcom_q6v5_pil.c
>> @@ -1133,6 +1133,8 @@ static int q6v5_probe(struct platform_device *pdev)
>>               return -ENOMEM;
>>       }
>>
>> +     rproc->auto_boot = false;
>> +
>>       qproc = (struct q6v5 *)rproc->priv;
>>       qproc->dev = &pdev->dev;
>>       qproc->rproc = rproc;
>> --
>> 2.17.0
>>
--
To unsubscribe from this list: send the line "unsubscribe linux-remoteproc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Sibi Sankar Jan. 18, 2019, 7:04 a.m. UTC | #3
On 2018-05-29 09:50, Bjorn Andersson wrote:
> On Thu 24 May 12:21 PDT 2018, Ramon Fried wrote:
> 
>> Sometimes that rmtfs userspace module is not brought
>> up fast enough and the modem crashes.
>> disabling automated boot in the driver and triggering
>> the boot from user-space sovles the problem.
>> 
>> Signed-off-by: Ramon Fried <ramon.fried@gmail.com>
> 
> Thanks for your patch Ramon. While this nudges the behavior to make
> things work slightly better I think we need to describe the explicit
> dependency between the mss firmware and the existence of rmtfs.
> 
> As our remoteprocs are essentially always-on I would prefer that they
> start "automatically" and not through use of the sysfs interface.
> 
> But we're at the point where this is a real problem on 410, 820 and 
> 845,
> so we have to come up with some way to tie these pieces together. If
> your patch suits that solution I will happily take it.
> 
> Regards,
> Bjorn

After experimenting with in kernel solutions for
three revisions and observing problems on graceful
shutdown usecase, switching to controlling the
remoteproc mss through rmtfs seems to solve all
the known issues.

https://patchwork.kernel.org/patch/10662395/

we should probably get this merged in, now that
we are planning to start/stop mss through
rmtfs.


Acked-by: Sibi Sankar <sibis@codeaurora.org>

> 
>> ---
>>  drivers/remoteproc/qcom_q6v5_pil.c | 2 ++
>>  1 file changed, 2 insertions(+)
>> 
>> diff --git a/drivers/remoteproc/qcom_q6v5_pil.c 
>> b/drivers/remoteproc/qcom_q6v5_pil.c
>> index cbbafdcaaecb..719ee96445b3 100644
>> --- a/drivers/remoteproc/qcom_q6v5_pil.c
>> +++ b/drivers/remoteproc/qcom_q6v5_pil.c
>> @@ -1133,6 +1133,8 @@ static int q6v5_probe(struct platform_device 
>> *pdev)
>>  		return -ENOMEM;
>>  	}
>> 
>> +	rproc->auto_boot = false;
>> +
>>  	qproc = (struct q6v5 *)rproc->priv;
>>  	qproc->dev = &pdev->dev;
>>  	qproc->rproc = rproc;
>> --
>> 2.17.0
>>
Brian Norris Jan. 18, 2019, 6:35 p.m. UTC | #4
On Thu, Jan 17, 2019 at 11:04 PM Sibi Sankar <sibis@codeaurora.org> wrote:
> On 2018-05-29 09:50, Bjorn Andersson wrote:
> > On Thu 24 May 12:21 PDT 2018, Ramon Fried wrote:

Whoa, bringing up a 7-month old patch? Nice.

> >> Sometimes that rmtfs userspace module is not brought
> >> up fast enough and the modem crashes.
> >> disabling automated boot in the driver and triggering
> >> the boot from user-space sovles the problem.
> >>
> >> Signed-off-by: Ramon Fried <ramon.fried@gmail.com>
> >
> > Thanks for your patch Ramon. While this nudges the behavior to make
> > things work slightly better I think we need to describe the explicit
> > dependency between the mss firmware and the existence of rmtfs.
> >
> > As our remoteprocs are essentially always-on I would prefer that they
> > start "automatically" and not through use of the sysfs interface.
> >
> > But we're at the point where this is a real problem on 410, 820 and
> > 845,
> > so we have to come up with some way to tie these pieces together. If
> > your patch suits that solution I will happily take it.
> >
> > Regards,
> > Bjorn
>
> After experimenting with in kernel solutions for
> three revisions and observing problems on graceful
> shutdown usecase,

What exactly were the problems again? e.g., what were the deficiencies
with having the remoteproc device listen for the REMOTEFS_QMI_SVC_ID
service again? Sorry, but I sort of dropped off on reviewing that
stuff, and now I see this. I'd mildly prefer something that is
actually automatic, but if I'm missing some aspects, I'd like to hear
that. (And, I'd like to see them explained in the commit message, if
this is ever to be merged.)

> switching to controlling the
> remoteproc mss through rmtfs seems to solve all
> the known issues.

How so? It explicitly does NOT help at all if RMTFS crashes.
Because...who's going to stop the modem in that case? (It works if you
automatically respawn a new RMTFS daemon, to toggle the modem. But
that's kind of cheating, and you can do that anyway, even without this
patch.) On the contrary, your patch *would* resolve that, since the
modem would notice when the RMTFS server goes away, and it would stop
itself.

> https://patchwork.kernel.org/patch/10662395/
>
> we should probably get this merged in, now that
> we are planning to start/stop mss through
> rmtfs.

Sorry, who's planning to stop mss through rmtfs? Did I miss something?

Brian
Sibi Sankar Jan. 18, 2019, 7:46 p.m. UTC | #5
On 2019-01-19 00:05, Brian Norris wrote:
> On Thu, Jan 17, 2019 at 11:04 PM Sibi Sankar <sibis@codeaurora.org> 
> wrote:
>> On 2018-05-29 09:50, Bjorn Andersson wrote:
>> > On Thu 24 May 12:21 PDT 2018, Ramon Fried wrote:
> 
> Whoa, bringing up a 7-month old patch? Nice.
> 
>> >> Sometimes that rmtfs userspace module is not brought
>> >> up fast enough and the modem crashes.
>> >> disabling automated boot in the driver and triggering
>> >> the boot from user-space sovles the problem.
>> >>
>> >> Signed-off-by: Ramon Fried <ramon.fried@gmail.com>
>> >
>> > Thanks for your patch Ramon. While this nudges the behavior to make
>> > things work slightly better I think we need to describe the explicit
>> > dependency between the mss firmware and the existence of rmtfs.
>> >
>> > As our remoteprocs are essentially always-on I would prefer that they
>> > start "automatically" and not through use of the sysfs interface.
>> >
>> > But we're at the point where this is a real problem on 410, 820 and
>> > 845,
>> > so we have to come up with some way to tie these pieces together. If
>> > your patch suits that solution I will happily take it.
>> >
>> > Regards,
>> > Bjorn
>> 
>> After experimenting with in kernel solutions for
>> three revisions and observing problems on graceful
>> shutdown usecase,
> 
> What exactly were the problems again? e.g., what were the deficiencies
> with having the remoteproc device listen for the REMOTEFS_QMI_SVC_ID
> service again? Sorry, but I sort of dropped off on reviewing that
> stuff, and now I see this. I'd mildly prefer something that is
> actually automatic, but if I'm missing some aspects, I'd like to hear
> that. (And, I'd like to see them explained in the commit message, if
> this is ever to be merged.)

bringing down the modem after the RMTFS server
goes down leaves the modem in limbo (It has a few
pending rmtfs transactions that cannot go through)
which results in sysmon graceful shutdown failing.
And we have to do a modem force-stop to proceed
which we want to avoid in graceful shutdown cases.
This is overcome by starting rproc mss from rmtfs
after REMOTEFS_QMI service is up and stopping
rproc mss from rmtfs on SIGKILL/SIGINT and other
program error signals before bringing down the
RMTFS_QMI service i.e before exiting the rmtfs
server loop.

> 
>> switching to controlling the
>> remoteproc mss through rmtfs seems to solve all
>> the known issues.
> 
> How so? It explicitly does NOT help at all if RMTFS crashes.
> Because...who's going to stop the modem in that case? (It works if you
> automatically respawn a new RMTFS daemon, to toggle the modem. But
> that's kind of cheating, and you can do that anyway, even without this
> patch.) On the contrary, your patch *would* resolve that, since the
> modem would notice when the RMTFS server goes away, and it would stop
> itself.

yeah we would want to mimic what the kernel
patch did with the exception of stopping modem
before bringing down the rmtfs server (not toggle
rproc state but start on rmtfs service up and stop
before rmtfs server exit). So in that case we would
not want the modem to auto-boot.

> 
>> https://patchwork.kernel.org/patch/10662395/
>> 
>> we should probably get this merged in, now that
>> we are planning to start/stop mss through
>> rmtfs.
> 
> Sorry, who's planning to stop mss through rmtfs? Did I miss something?

I have a working patch which I'll soon send
upstream for review, after it clears the internal
reviews/processes

> 
> Brian
Brian Norris Jan. 18, 2019, 9:04 p.m. UTC | #6
Hi Sibi,

On Fri, Jan 18, 2019 at 11:46 AM Sibi Sankar <sibis@codeaurora.org> wrote:
> On 2019-01-19 00:05, Brian Norris wrote:
> > On Thu, Jan 17, 2019 at 11:04 PM Sibi Sankar <sibis@codeaurora.org>
> >> After experimenting with in kernel solutions for
> >> three revisions and observing problems on graceful
> >> shutdown usecase,
> >
> > What exactly were the problems again? e.g., what were the deficiencies
> > with having the remoteproc device listen for the REMOTEFS_QMI_SVC_ID
> > service again? Sorry, but I sort of dropped off on reviewing that
> > stuff, and now I see this. I'd mildly prefer something that is
> > actually automatic, but if I'm missing some aspects, I'd like to hear
> > that. (And, I'd like to see them explained in the commit message, if
> > this is ever to be merged.)
>
> bringing down the modem after the RMTFS server
> goes down leaves the modem in limbo (It has a few
> pending rmtfs transactions that cannot go through)
> which results in sysmon graceful shutdown failing.

Makes sense. Probably should be described in a re-send of this patch,
if we're going with that.

> And we have to do a modem force-stop to proceed
> which we want to avoid in graceful shutdown cases.

Shouldn't you do the "force-stop" in the kernel too? e.g., if rmtfs
daemon dies without doing a properly-timed stop, then we should still
force a stop in the kernel, no? Basically, why not do both mechanisms:
REMOTEFS_QMI_SVC-activated start/stop in the kernel, and manual stop
(and start? this likely might still be redundant) in the daemon?

> This is overcome by starting rproc mss from rmtfs
> after REMOTEFS_QMI service is up and stopping
> rproc mss from rmtfs on SIGKILL/SIGINT and other
> program error signals before bringing down the
> RMTFS_QMI service i.e before exiting the rmtfs
> server loop.

That's still not really a sure-fire way of doing things. For one, you
can't catch SIGKILL. Similarly, you can't really clean up from OOM,
segfault, etc. So it would still be helpful to hook into the QMI
service notifications in the kernel, I think.

> >> switching to controlling the
> >> remoteproc mss through rmtfs seems to solve all
> >> the known issues.
> >
> > How so? It explicitly does NOT help at all if RMTFS crashes.
> > Because...who's going to stop the modem in that case? (It works if you
> > automatically respawn a new RMTFS daemon, to toggle the modem. But
> > that's kind of cheating, and you can do that anyway, even without this
> > patch.) On the contrary, your patch *would* resolve that, since the
> > modem would notice when the RMTFS server goes away, and it would stop
> > itself.
>
> yeah we would want to mimic what the kernel
> patch did with the exception of stopping modem
> before bringing down the rmtfs server (not toggle
> rproc state but start on rmtfs service up and stop
> before rmtfs server exit). So in that case we would
> not want the modem to auto-boot.

See above. You can't really mimic what the kernel patch was doing
completely. You can try (which is probably a good idea), but you
should still have some fallback in the kernel, I expect.

> >> https://patchwork.kernel.org/patch/10662395/
> >>
> >> we should probably get this merged in, now that
> >> we are planning to start/stop mss through
> >> rmtfs.
> >
> > Sorry, who's planning to stop mss through rmtfs? Did I miss something?
>
> I have a working patch which I'll soon send
> upstream for review, after it clears the internal
> reviews/processes

OK.

Brian
Sibi Sankar Jan. 19, 2019, 4:17 a.m. UTC | #7
Hi Brian,
Thanks for the review

On 2019-01-19 02:34, Brian Norris wrote:
> Hi Sibi,
> 
> On Fri, Jan 18, 2019 at 11:46 AM Sibi Sankar <sibis@codeaurora.org> 
> wrote:
>> On 2019-01-19 00:05, Brian Norris wrote:
>> > On Thu, Jan 17, 2019 at 11:04 PM Sibi Sankar <sibis@codeaurora.org>
>> >> After experimenting with in kernel solutions for
>> >> three revisions and observing problems on graceful
>> >> shutdown usecase,
>> >
>> > What exactly were the problems again? e.g., what were the deficiencies
>> > with having the remoteproc device listen for the REMOTEFS_QMI_SVC_ID
>> > service again? Sorry, but I sort of dropped off on reviewing that
>> > stuff, and now I see this. I'd mildly prefer something that is
>> > actually automatic, but if I'm missing some aspects, I'd like to hear
>> > that. (And, I'd like to see them explained in the commit message, if
>> > this is ever to be merged.)
>> 
>> bringing down the modem after the RMTFS server
>> goes down leaves the modem in limbo (It has a few
>> pending rmtfs transactions that cannot go through)
>> which results in sysmon graceful shutdown failing.
> 
> Makes sense. Probably should be described in a re-send of this patch,
> if we're going with that.
> 
>> And we have to do a modem force-stop to proceed
>> which we want to avoid in graceful shutdown cases.
> 
> Shouldn't you do the "force-stop" in the kernel too? e.g., if rmtfs
> daemon dies without doing a properly-timed stop, then we should still
> force a stop in the kernel, no? Basically, why not do both mechanisms:
> REMOTEFS_QMI_SVC-activated start/stop in the kernel, and manual stop
> (and start? this likely might still be redundant) in the daemon?

yeah we already have that implemented in the kernel
i.e first we try graceful shutdown followed by
force-stop which will just timeout if graceful
shutdown was already completed. We would just call
stop from rmtfs without worrying about the underlying
details.

> 
>> This is overcome by starting rproc mss from rmtfs
>> after REMOTEFS_QMI service is up and stopping
>> rproc mss from rmtfs on SIGKILL/SIGINT and other
>> program error signals before bringing down the
>> RMTFS_QMI service i.e before exiting the rmtfs
>> server loop.
> 
> That's still not really a sure-fire way of doing things. For one, you
> can't catch SIGKILL. Similarly, you can't really clean up from OOM,
> segfault, etc. So it would still be helpful to hook into the QMI
> service notifications in the kernel, I think.

yes we would want a fail safe in the kernel
as well. Sorry I meant SIGTERM instead of
SIGKILL earlier

> 
>> >> switching to controlling the
>> >> remoteproc mss through rmtfs seems to solve all
>> >> the known issues.
>> >
>> > How so? It explicitly does NOT help at all if RMTFS crashes.
>> > Because...who's going to stop the modem in that case? (It works if you
>> > automatically respawn a new RMTFS daemon, to toggle the modem. But
>> > that's kind of cheating, and you can do that anyway, even without this
>> > patch.) On the contrary, your patch *would* resolve that, since the
>> > modem would notice when the RMTFS server goes away, and it would stop
>> > itself.
>> 
>> yeah we would want to mimic what the kernel
>> patch did with the exception of stopping modem
>> before bringing down the rmtfs server (not toggle
>> rproc state but start on rmtfs service up and stop
>> before rmtfs server exit). So in that case we would
>> not want the modem to auto-boot.
> 
> See above. You can't really mimic what the kernel patch was doing
> completely. You can try (which is probably a good idea), but you
> should still have some fallback in the kernel, I expect.

yeah

> 
>> >> https://patchwork.kernel.org/patch/10662395/
>> >>
>> >> we should probably get this merged in, now that
>> >> we are planning to start/stop mss through
>> >> rmtfs.
>> >
>> > Sorry, who's planning to stop mss through rmtfs? Did I miss something?
>> 
>> I have a working patch which I'll soon send
>> upstream for review, after it clears the internal
>> reviews/processes
> 
> OK.
> 
> Brian
Bjorn Andersson Jan. 30, 2019, 9:07 p.m. UTC | #8
On Thu 24 May 12:21 PDT 2018, Ramon Fried wrote:

> Sometimes that rmtfs userspace module is not brought
> up fast enough and the modem crashes.
> disabling automated boot in the driver and triggering
> the boot from user-space sovles the problem.
> 
> Signed-off-by: Ramon Fried <ramon.fried@gmail.com>

While I don't think we have fully settled on the mechanism yet, it's
clear that we don't want this to auto boot using the existing mechanism.

So I picked this patch, with Sibi's ack.

Thanks,
Bjorn

> ---
>  drivers/remoteproc/qcom_q6v5_pil.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/remoteproc/qcom_q6v5_pil.c b/drivers/remoteproc/qcom_q6v5_pil.c
> index cbbafdcaaecb..719ee96445b3 100644
> --- a/drivers/remoteproc/qcom_q6v5_pil.c
> +++ b/drivers/remoteproc/qcom_q6v5_pil.c
> @@ -1133,6 +1133,8 @@ static int q6v5_probe(struct platform_device *pdev)
>  		return -ENOMEM;
>  	}
>  
> +	rproc->auto_boot = false;
> +
>  	qproc = (struct q6v5 *)rproc->priv;
>  	qproc->dev = &pdev->dev;
>  	qproc->rproc = rproc;
> -- 
> 2.17.0
>
diff mbox

Patch

diff --git a/drivers/remoteproc/qcom_q6v5_pil.c b/drivers/remoteproc/qcom_q6v5_pil.c
index cbbafdcaaecb..719ee96445b3 100644
--- a/drivers/remoteproc/qcom_q6v5_pil.c
+++ b/drivers/remoteproc/qcom_q6v5_pil.c
@@ -1133,6 +1133,8 @@  static int q6v5_probe(struct platform_device *pdev)
 		return -ENOMEM;
 	}
 
+	rproc->auto_boot = false;
+
 	qproc = (struct q6v5 *)rproc->priv;
 	qproc->dev = &pdev->dev;
 	qproc->rproc = rproc;