diff mbox series

[1/2] remoteproc: qcom: q6v5: Update running state before requesting stop

Message ID 20200602163257.26978-1-sibis@codeaurora.org (mailing list archive)
State New, archived
Headers show
Series [1/2] remoteproc: qcom: q6v5: Update running state before requesting stop | expand

Commit Message

Sibi Sankar June 2, 2020, 4:32 p.m. UTC
Sometimes the stop triggers a watchdog rather than a stop-ack. Update
the running state to false on requesting stop to skip the watchdog
instead.

Error Logs:
$ echo stop > /sys/class/remoteproc/remoteproc0/state
ipa 1e40000.ipa: received modem stopping event
remoteproc-modem: watchdog received: sys_m_smsm_mpss.c:291:APPS force stop
qcom-q6v5-mss 4080000.remoteproc-modem: port failed halt
ipa 1e40000.ipa: received modem offline event
remoteproc0: stopped remote processor 4080000.remoteproc-modem

Fixes: 3b415c8fb263 ("remoteproc: q6v5: Extract common resource handling")
Cc: stable@vger.kernel.org
Signed-off-by: Sibi Sankar <sibis@codeaurora.org>
---
 drivers/remoteproc/qcom_q6v5.c | 2 ++
 1 file changed, 2 insertions(+)

Comments

Evan Green June 2, 2020, 5:44 p.m. UTC | #1
On Tue, Jun 2, 2020 at 9:33 AM Sibi Sankar <sibis@codeaurora.org> wrote:
>
> Sometimes the stop triggers a watchdog rather than a stop-ack. Update
> the running state to false on requesting stop to skip the watchdog
> instead.
>
> Error Logs:
> $ echo stop > /sys/class/remoteproc/remoteproc0/state
> ipa 1e40000.ipa: received modem stopping event
> remoteproc-modem: watchdog received: sys_m_smsm_mpss.c:291:APPS force stop
> qcom-q6v5-mss 4080000.remoteproc-modem: port failed halt
> ipa 1e40000.ipa: received modem offline event
> remoteproc0: stopped remote processor 4080000.remoteproc-modem
>
> Fixes: 3b415c8fb263 ("remoteproc: q6v5: Extract common resource handling")
> Cc: stable@vger.kernel.org
> Signed-off-by: Sibi Sankar <sibis@codeaurora.org>
> ---

Are you sure you want to tolerate this behavior from MSS? This is a
graceful shutdown, modem shouldn't have a problem completing the
proper handshake. If they do, isn't that a bug on the modem side?

I just worry this will mask real issues that happen during graceful shutdown.
-Evan
Sibi Sankar June 3, 2020, 5:29 a.m. UTC | #2
Evan,
Thanks for taking time to review
the series.

On 2020-06-02 23:14, Evan Green wrote:
> On Tue, Jun 2, 2020 at 9:33 AM Sibi Sankar <sibis@codeaurora.org> 
> wrote:
>> 
>> Sometimes the stop triggers a watchdog rather than a stop-ack. Update
>> the running state to false on requesting stop to skip the watchdog
>> instead.
>> 
>> Error Logs:
>> $ echo stop > /sys/class/remoteproc/remoteproc0/state
>> ipa 1e40000.ipa: received modem stopping event
>> remoteproc-modem: watchdog received: sys_m_smsm_mpss.c:291:APPS force 
>> stop
>> qcom-q6v5-mss 4080000.remoteproc-modem: port failed halt
>> ipa 1e40000.ipa: received modem offline event
>> remoteproc0: stopped remote processor 4080000.remoteproc-modem
>> 
>> Fixes: 3b415c8fb263 ("remoteproc: q6v5: Extract common resource 
>> handling")
>> Cc: stable@vger.kernel.org
>> Signed-off-by: Sibi Sankar <sibis@codeaurora.org>
>> ---
> 
> Are you sure you want to tolerate this behavior from MSS? This is a
> graceful shutdown, modem shouldn't have a problem completing the
> proper handshake. If they do, isn't that a bug on the modem side?

The graceful shutdown is achieved
though sysmon (enabled using
CONFIG_QCOM_SYSMON). When sysmon is
enabled we get a shutdown-ack when we
try to stop the modem, post which
request stop is a basically a nop.
Request stop is done to force stop
the modem during failure cases (like
rmtfs is not running and so on) and
we do want to mask the wdog that we get
during this scenario ( The locking
already prevents the servicing of the
wdog during shutdown, the check just
prevents the scheduling of crash handler
and err messages associated with it).
Also this check was always present and
was missed during common q6v5 resource
helper migration, hence the unused
running state in mss driver.

> 
> I just worry this will mask real issues that happen during graceful 
> shutdown.
> -Evan
Evan Green June 3, 2020, 10:33 p.m. UTC | #3
On Tue, Jun 2, 2020 at 10:29 PM Sibi Sankar <sibis@codeaurora.org> wrote:
>
> Evan,
> Thanks for taking time to review
> the series.
>
> On 2020-06-02 23:14, Evan Green wrote:
> > On Tue, Jun 2, 2020 at 9:33 AM Sibi Sankar <sibis@codeaurora.org>
> > wrote:
> >>
> >> Sometimes the stop triggers a watchdog rather than a stop-ack. Update
> >> the running state to false on requesting stop to skip the watchdog
> >> instead.
> >>
> >> Error Logs:
> >> $ echo stop > /sys/class/remoteproc/remoteproc0/state
> >> ipa 1e40000.ipa: received modem stopping event
> >> remoteproc-modem: watchdog received: sys_m_smsm_mpss.c:291:APPS force
> >> stop
> >> qcom-q6v5-mss 4080000.remoteproc-modem: port failed halt
> >> ipa 1e40000.ipa: received modem offline event
> >> remoteproc0: stopped remote processor 4080000.remoteproc-modem
> >>
> >> Fixes: 3b415c8fb263 ("remoteproc: q6v5: Extract common resource
> >> handling")
> >> Cc: stable@vger.kernel.org
> >> Signed-off-by: Sibi Sankar <sibis@codeaurora.org>
> >> ---
> >
> > Are you sure you want to tolerate this behavior from MSS? This is a
> > graceful shutdown, modem shouldn't have a problem completing the
> > proper handshake. If they do, isn't that a bug on the modem side?
>
> The graceful shutdown is achieved
> though sysmon (enabled using
> CONFIG_QCOM_SYSMON). When sysmon is
> enabled we get a shutdown-ack when we
> try to stop the modem, post which
> request stop is a basically a nop.
> Request stop is done to force stop
> the modem during failure cases (like
> rmtfs is not running and so on) and
> we do want to mask the wdog that we get
> during this scenario ( The locking
> already prevents the servicing of the
> wdog during shutdown, the check just
> prevents the scheduling of crash handler
> and err messages associated with it).
> Also this check was always present and
> was missed during common q6v5 resource
> helper migration, hence the unused
> running state in mss driver.

So you're saying that the intention of the ->running check already in
q6v5_wdog_interrupt() was to allow either the stop-ack or wdog
interrupt to complete the stop. This patch just fixes a regression
introduced during the refactor.
This patch seems ok to me then. It still sort of seems like a bug that
the modem responds arbitrarily in one of two ways, even to a "harsh"
shutdown request.

I wasn't aware of QCOM_SYSMON. Reading it now, It seems like kind of a
lot... do I really need all this? Can I get by with just remoteproc
stops?
Anyway, for this patch:

Reviewed-by: Evan Green <evgreen@chromium.org>
Sibi Sankar June 4, 2020, 6:50 p.m. UTC | #4
On 2020-06-04 04:03, Evan Green wrote:
> On Tue, Jun 2, 2020 at 10:29 PM Sibi Sankar <sibis@codeaurora.org> 
> wrote:
>> 
>> Evan,
>> Thanks for taking time to review
>> the series.
>> 
>> On 2020-06-02 23:14, Evan Green wrote:
>> > On Tue, Jun 2, 2020 at 9:33 AM Sibi Sankar <sibis@codeaurora.org>
>> > wrote:
>> >>
>> >> Sometimes the stop triggers a watchdog rather than a stop-ack. Update
>> >> the running state to false on requesting stop to skip the watchdog
>> >> instead.
>> >>
>> >> Error Logs:
>> >> $ echo stop > /sys/class/remoteproc/remoteproc0/state
>> >> ipa 1e40000.ipa: received modem stopping event
>> >> remoteproc-modem: watchdog received: sys_m_smsm_mpss.c:291:APPS force
>> >> stop
>> >> qcom-q6v5-mss 4080000.remoteproc-modem: port failed halt
>> >> ipa 1e40000.ipa: received modem offline event
>> >> remoteproc0: stopped remote processor 4080000.remoteproc-modem
>> >>
>> >> Fixes: 3b415c8fb263 ("remoteproc: q6v5: Extract common resource
>> >> handling")
>> >> Cc: stable@vger.kernel.org
>> >> Signed-off-by: Sibi Sankar <sibis@codeaurora.org>
>> >> ---
>> >
>> > Are you sure you want to tolerate this behavior from MSS? This is a
>> > graceful shutdown, modem shouldn't have a problem completing the
>> > proper handshake. If they do, isn't that a bug on the modem side?
>> 
>> The graceful shutdown is achieved
>> though sysmon (enabled using
>> CONFIG_QCOM_SYSMON). When sysmon is
>> enabled we get a shutdown-ack when we
>> try to stop the modem, post which
>> request stop is a basically a nop.
>> Request stop is done to force stop
>> the modem during failure cases (like
>> rmtfs is not running and so on) and
>> we do want to mask the wdog that we get
>> during this scenario ( The locking
>> already prevents the servicing of the
>> wdog during shutdown, the check just
>> prevents the scheduling of crash handler
>> and err messages associated with it).
>> Also this check was always present and
>> was missed during common q6v5 resource
>> helper migration, hence the unused
>> running state in mss driver.
> 
> So you're saying that the intention of the ->running check already in
> q6v5_wdog_interrupt() was to allow either the stop-ack or wdog
> interrupt to complete the stop. This patch just fixes a regression
> introduced during the refactor.
> This patch seems ok to me then. It still sort of seems like a bug that
> the modem responds arbitrarily in one of two ways, even to a "harsh"
> shutdown request.
> 
> I wasn't aware of QCOM_SYSMON. Reading it now, It seems like kind of a

TL;DR
Sysmon when enabled adds a lookup
for qmi service 43 (Subsystem
control service). When we shutdown
the modem, we send a SSCTL_SHUTDOWN_REQ
to the service and the modem responds
with a shutdown-ack interrupt. If you
have rmtfs running with -v turned on
you can notice pending efs transactions
being completed followed by a bye I guess.

> lot... do I really need all this? Can I get by with just remoteproc
> stops?
> Anyway, for this patch:
> 
> Reviewed-by: Evan Green <evgreen@chromium.org>

Thanks for the review!
Bjorn Andersson July 28, 2020, 6:19 a.m. UTC | #5
On Wed 03 Jun 15:33 PDT 2020, Evan Green wrote:

> On Tue, Jun 2, 2020 at 10:29 PM Sibi Sankar <sibis@codeaurora.org> wrote:
> >
> > Evan,
> > Thanks for taking time to review
> > the series.
> >
> > On 2020-06-02 23:14, Evan Green wrote:
> > > On Tue, Jun 2, 2020 at 9:33 AM Sibi Sankar <sibis@codeaurora.org>
> > > wrote:
> > >>
> > >> Sometimes the stop triggers a watchdog rather than a stop-ack. Update
> > >> the running state to false on requesting stop to skip the watchdog
> > >> instead.
> > >>
> > >> Error Logs:
> > >> $ echo stop > /sys/class/remoteproc/remoteproc0/state
> > >> ipa 1e40000.ipa: received modem stopping event
> > >> remoteproc-modem: watchdog received: sys_m_smsm_mpss.c:291:APPS force
> > >> stop
> > >> qcom-q6v5-mss 4080000.remoteproc-modem: port failed halt
> > >> ipa 1e40000.ipa: received modem offline event
> > >> remoteproc0: stopped remote processor 4080000.remoteproc-modem
> > >>
> > >> Fixes: 3b415c8fb263 ("remoteproc: q6v5: Extract common resource
> > >> handling")
> > >> Cc: stable@vger.kernel.org
> > >> Signed-off-by: Sibi Sankar <sibis@codeaurora.org>
> > >> ---
> > >
> > > Are you sure you want to tolerate this behavior from MSS? This is a
> > > graceful shutdown, modem shouldn't have a problem completing the
> > > proper handshake. If they do, isn't that a bug on the modem side?
> >
> > The graceful shutdown is achieved
> > though sysmon (enabled using
> > CONFIG_QCOM_SYSMON). When sysmon is
> > enabled we get a shutdown-ack when we
> > try to stop the modem, post which
> > request stop is a basically a nop.
> > Request stop is done to force stop
> > the modem during failure cases (like
> > rmtfs is not running and so on) and
> > we do want to mask the wdog that we get
> > during this scenario ( The locking
> > already prevents the servicing of the
> > wdog during shutdown, the check just
> > prevents the scheduling of crash handler
> > and err messages associated with it).
> > Also this check was always present and
> > was missed during common q6v5 resource
> > helper migration, hence the unused
> > running state in mss driver.
> 
> So you're saying that the intention of the ->running check already in
> q6v5_wdog_interrupt() was to allow either the stop-ack or wdog
> interrupt to complete the stop. This patch just fixes a regression
> introduced during the refactor.
> This patch seems ok to me then. It still sort of seems like a bug that
> the modem responds arbitrarily in one of two ways, even to a "harsh"
> shutdown request.
> 

I think the patch properly fixes this regression, but I share your
concern about the fact that we omit an entire category of
shutdown-related crashes from being reported.

The problem I presume with the current behavior is that the shutdown
will race against the crash handler - which might boot the remoteproc up
again while the shutdown is progressing.

So I'm okay with this fix for the immediate problem, but think it would
be nice if we could report the issue appropriately and then finalize the
shutdown.

> I wasn't aware of QCOM_SYSMON. Reading it now, It seems like kind of a
> lot... do I really need all this? Can I get by with just remoteproc
> stops?

It used to be that we set one bit in shared memory and sent an
interrupt and the modem would set another bit and interrupt us back when
it was done shutting down...

> Anyway, for this patch:
> 
> Reviewed-by: Evan Green <evgreen@chromium.org>

Thanks,
Bjorn
diff mbox series

Patch

diff --git a/drivers/remoteproc/qcom_q6v5.c b/drivers/remoteproc/qcom_q6v5.c
index 111a442c993c4..fd6fd36268d93 100644
--- a/drivers/remoteproc/qcom_q6v5.c
+++ b/drivers/remoteproc/qcom_q6v5.c
@@ -153,6 +153,8 @@  int qcom_q6v5_request_stop(struct qcom_q6v5 *q6v5)
 {
 	int ret;
 
+	q6v5->running = false;
+
 	qcom_smem_state_update_bits(q6v5->state,
 				    BIT(q6v5->stop_bit), BIT(q6v5->stop_bit));