wireless: ath10k: Return early in ath10k_qmi_event_server_exit() to avoid hard crash on reboot

Message ID 20200602052533.15048-1-john.stultz@linaro.org
State New

Commit Message

John Stultz June 2, 2020, 5:25 a.m. UTC
Ever since 5.7-rc1, if we call
ath10k_qmi_remove_msa_permission(), the db845c hard crashes on
reboot, resulting in the device getting stuck in the usb crash
debug mode and not coming back up without a hard power off.

This hack avoids the issue by returning early in
ath10k_qmi_event_server_exit().

A better solution is very much desired!

Feedback and suggestions welcome!

Cc: Rakesh Pillai <pillair@qti.qualcomm.com>
Cc: Govind Singh <govinds@codeaurora.org>
Cc: Bjorn Andersson <bjorn.andersson@linaro.org>
Cc: Niklas Cassel <niklas.cassel@linaro.org>
Cc: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Cc: Amit Pundir <amit.pundir@linaro.org>
Cc: Brian Norris <briannorris@chromium.org>
Cc: Kalle Valo <kvalo@codeaurora.org>
Cc: ath10k@lists.infradead.org
Reported-by: Amit Pundir <amit.pundir@linaro.org>
Signed-off-by: John Stultz <john.stultz@linaro.org>
---
 drivers/net/wireless/ath/ath10k/qmi.c | 5 +++++
 1 file changed, 5 insertions(+)

Comments

Brian Norris June 2, 2020, 7:16 p.m. UTC | #1
+ Sibi

On Mon, Jun 1, 2020 at 10:25 PM John Stultz <john.stultz@linaro.org> wrote:
>
> Ever since 5.7-rc1, if we call
> ath10k_qmi_remove_msa_permission(), the db845c hard crashes on
> reboot, resulting in the device getting stuck in the usb crash
> debug mode and not coming back up without a hard power off.
>
> This hack avoids the issue by returning early in
> ath10k_qmi_event_server_exit().
>
> A better solution is very much desired!

Any chance you can bisect what caused this? There are a lot of
non-ath10k pieces involved in this stuff.

Brian
John Stultz June 2, 2020, 7:40 p.m. UTC | #2
On Tue, Jun 2, 2020 at 12:16 PM Brian Norris <briannorris@chromium.org> wrote:
>
> + Sibi
>
> On Mon, Jun 1, 2020 at 10:25 PM John Stultz <john.stultz@linaro.org> wrote:
> >
> > Ever since 5.7-rc1, if we call
> > ath10k_qmi_remove_msa_permission(), the db845c hard crashes on
> > reboot, resulting in the device getting stuck in the usb crash
> > debug mode and not coming back up without a hard power off.
> >
> > This hack avoids the issue by returning early in
> > ath10k_qmi_event_server_exit().
> >
> > A better solution is very much desired!
>
> Any chance you can bisect what caused this? There are a lot of
> non-ath10k pieces involved in this stuff.

Amit spent some time chasing it down to the in-kernel qrtr-ns
work, and reported it here:
  https://lists.infradead.org/pipermail/ath10k/2020-April/014970.html

But that discussion seemingly stalled out, so I came up with this hack
to work around it for us.

thanks
-john
Brian Norris June 2, 2020, 8:04 p.m. UTC | #3
On Tue, Jun 2, 2020 at 12:40 PM John Stultz <john.stultz@linaro.org> wrote:
> On Tue, Jun 2, 2020 at 12:16 PM Brian Norris <briannorris@chromium.org> wrote:
> > On Mon, Jun 1, 2020 at 10:25 PM John Stultz <john.stultz@linaro.org> wrote:
> > >
> > > Ever since 5.7-rc1, if we call
> > > ath10k_qmi_remove_msa_permission(), the db845c hard crashes on
> > > reboot, resulting in the device getting stuck in the usb crash
> > > > debug mode and not coming back up without a hard power off.
> > >
> > > This hack avoids the issue by returning early in
> > > ath10k_qmi_event_server_exit().
> > >
> > > A better solution is very much desired!
> >
> > Any chance you can bisect what caused this? There are a lot of
> > non-ath10k pieces involved in this stuff.
>
> Amit had spent some work on chasing it down to the in kernel qrtr-ns
> work, and reported it here:
>   https://lists.infradead.org/pipermail/ath10k/2020-April/014970.html
>
> But that discussion seemingly stalled out, so I came up with this hack
> to workaround it for us.

If I'm reading it right, then that means we should revert this stuff
from v5.7-rc1:

0c2204a4ad71 net: qrtr: Migrate nameservice to kernel from userspace

At least, until people can resolve the tail end of that thread. New
features (ath11k, etc.) are not a reason to break existing features
(ath10k/wcn3990).

Brian
Manivannan Sadhasivam June 3, 2020, 12:27 a.m. UTC | #4
On Tue, Jun 02, 2020 at 01:04:26PM -0700, Brian Norris wrote:
> On Tue, Jun 2, 2020 at 12:40 PM John Stultz <john.stultz@linaro.org> wrote:
> > On Tue, Jun 2, 2020 at 12:16 PM Brian Norris <briannorris@chromium.org> wrote:
> > > On Mon, Jun 1, 2020 at 10:25 PM John Stultz <john.stultz@linaro.org> wrote:
> > > >
> > > > Ever since 5.7-rc1, if we call
> > > > ath10k_qmi_remove_msa_permission(), the db845c hard crashes on
> > > > reboot, resulting in the device getting stuck in the usb crash
> > > > > debug mode and not coming back up without a hard power off.
> > > >
> > > > This hack avoids the issue by returning early in
> > > > ath10k_qmi_event_server_exit().
> > > >
> > > > A better solution is very much desired!
> > >
> > > Any chance you can bisect what caused this? There are a lot of
> > > non-ath10k pieces involved in this stuff.
> >
> > Amit had spent some work on chasing it down to the in kernel qrtr-ns
> > work, and reported it here:
> >   https://lists.infradead.org/pipermail/ath10k/2020-April/014970.html
> >
> > But that discussion seemingly stalled out, so I came up with this hack
> > to workaround it for us.
> 
> If I'm reading it right, then that means we should revert this stuff
> from v5.7-rc1:
> 
> 0c2204a4ad71 net: qrtr: Migrate nameservice to kernel from userspace
> 
> At least, until people can resolve the tail end of that thread. New
> features (ath11k, etc.) are not a reason to break existing features
> (ath10k/wcn3990).

I don't agree with this. If you read through the replies to the bug report,
it is clear that NS migration uncovered a corner case or even a bug. So we
should try to fix that indeed.

Govind: Did you get a chance to work on fixing this issue?

Thanks,
Mani

> 
> Brian
Govind Singh June 3, 2020, 10:07 a.m. UTC | #5
Hi Mani,

On 2020-06-03 05:57, Manivannan Sadhasivam wrote:
> On Tue, Jun 02, 2020 at 01:04:26PM -0700, Brian Norris wrote:
>> On Tue, Jun 2, 2020 at 12:40 PM John Stultz <john.stultz@linaro.org> 
>> wrote:
>> > On Tue, Jun 2, 2020 at 12:16 PM Brian Norris <briannorris@chromium.org> wrote:
>> > > On Mon, Jun 1, 2020 at 10:25 PM John Stultz <john.stultz@linaro.org> wrote:
>> > > >
>> > > > Ever since 5.7-rc1, if we call
>> > > > ath10k_qmi_remove_msa_permission(), the db845c hard crashes on
>> > > > reboot, resulting in the device getting stuck in the usb crash
>> > > > debug mode and not coming back up without a hard power off.
>> > > >
>> > > > This hack avoids the issue by returning early in
>> > > > ath10k_qmi_event_server_exit().
>> > > >
>> > > > A better solution is very much desired!
>> > >
>> > > Any chance you can bisect what caused this? There are a lot of
>> > > non-ath10k pieces involved in this stuff.
>> >
>> > Amit had spent some work on chasing it down to the in kernel qrtr-ns
>> > work, and reported it here:
>> >   https://lists.infradead.org/pipermail/ath10k/2020-April/014970.html
>> >
>> > But that discussion seemingly stalled out, so I came up with this hack
>> > to workaround it for us.
>> 
>> If I'm reading it right, then that means we should revert this stuff
>> from v5.7-rc1:
>> 
>> 0c2204a4ad71 net: qrtr: Migrate nameservice to kernel from userspace
>> 
>> At least, until people can resolve the tail end of that thread. New
>> features (ath11k, etc.) are not a reason to break existing features
>> (ath10k/wcn3990).
> 
> I don't agree with this. If you read through the replies to the bug 
> report,
> it is clear that NS migration uncovered a corner case or even a bug. So 
> we
> should try to fix that indeed.
> 
> Govind: Did you get chance to work on fixing this issue?
> 

I have done basic testing by moving the MSA map/unmap from the QMI service
callbacks to the init/de-init path.
I will send a patch for review.
The reason for the del_server needs to be investigated from the rproc side.
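
To illustrate the idea of tying the MSA mapping to the init/de-init path
rather than to the service callbacks, here is a minimal standalone sketch.
It is not the actual ath10k patch: every demo_* name is hypothetical, and
the struct only stands in for driver state. The point it tries to show is
that a del_server event arriving during reboot no longer triggers an unmap.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for driver state; NOT the real struct ath10k_qmi. */
struct demo_qmi {
	bool msa_mapped;  /* true while the MSA region is shared with the modem */
	bool server_up;   /* true while the remote QMI service is announced */
	int map_calls;    /* number of real map operations performed */
	int unmap_calls;  /* number of real unmap operations performed */
};

/* Idempotent map: does nothing if the region is already mapped. */
void demo_map_msa(struct demo_qmi *qmi)
{
	if (qmi->msa_mapped)
		return;
	qmi->msa_mapped = true;
	qmi->map_calls++;
}

/* Idempotent unmap: does nothing if the region is not mapped. */
void demo_unmap_msa(struct demo_qmi *qmi)
{
	if (!qmi->msa_mapped)
		return;
	qmi->msa_mapped = false;
	qmi->unmap_calls++;
}

/* Init/de-init path: owns the MSA mapping for the driver's lifetime. */
void demo_init(struct demo_qmi *qmi)
{
	*qmi = (struct demo_qmi){0};
	demo_map_msa(qmi);
}

void demo_deinit(struct demo_qmi *qmi)
{
	demo_unmap_msa(qmi);
}

/* Service callbacks: only track service state, never touch the mapping. */
void demo_server_arrive(struct demo_qmi *qmi)
{
	qmi->server_up = true;
}

void demo_server_exit(struct demo_qmi *qmi)
{
	qmi->server_up = false;
}

/* Walk through a reboot-like sequence; returns 0 if the checks hold. */
int demo_run_reboot_sequence(void)
{
	struct demo_qmi qmi;

	demo_init(&qmi);
	demo_server_arrive(&qmi);
	demo_server_exit(&qmi);       /* del_server seen during shutdown */
	assert(qmi.msa_mapped);       /* mapping survives the service exit */
	demo_deinit(&qmi);
	assert(!qmi.msa_mapped);
	return 0;
}
```

In this sketch the unmap happens exactly once, from demo_deinit(), no matter
how many times the service comes and goes in between.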

> Thanks,
> Mani
> 
>> 
>> Brian

Thanks,
Govind
Sibi Sankar June 4, 2020, 6:17 p.m. UTC | #6
On 2020-06-03 15:37, govinds@codeaurora.org wrote:
> Hi Mani,
> 
> On 2020-06-03 05:57, Manivannan Sadhasivam wrote:
>> On Tue, Jun 02, 2020 at 01:04:26PM -0700, Brian Norris wrote:
>>> On Tue, Jun 2, 2020 at 12:40 PM John Stultz <john.stultz@linaro.org> 
>>> wrote:
>>> > On Tue, Jun 2, 2020 at 12:16 PM Brian Norris <briannorris@chromium.org> wrote:
>>> > > On Mon, Jun 1, 2020 at 10:25 PM John Stultz <john.stultz@linaro.org> wrote:
>>> > > >
>>> > > > Ever since 5.7-rc1, if we call
>>> > > > ath10k_qmi_remove_msa_permission(), the db845c hard crashes on
>>> > > > reboot, resulting in the device getting stuck in the usb crash
>>> > > > debug mode and not coming back up without a hard power off.
>>> > > >
>>> > > > This hack avoids the issue by returning early in
>>> > > > ath10k_qmi_event_server_exit().
>>> > > >
>>> > > > A better solution is very much desired!
>>> > >
>>> > > Any chance you can bisect what caused this? There are a lot of
>>> > > non-ath10k pieces involved in this stuff.
>>> >
>>> > Amit had spent some work on chasing it down to the in kernel qrtr-ns
>>> > work, and reported it here:
>>> >   https://lists.infradead.org/pipermail/ath10k/2020-April/014970.html
>>> >
>>> > But that discussion seemingly stalled out, so I came up with this hack
>>> > to workaround it for us.
>>> 
>>> If I'm reading it right, then that means we should revert this stuff
>>> from v5.7-rc1:
>>> 
>>> 0c2204a4ad71 net: qrtr: Migrate nameservice to kernel from userspace
>>> 
>>> At least, until people can resolve the tail end of that thread. New
>>> features (ath11k, etc.) are not a reason to break existing features
>>> (ath10k/wcn3990).
>> 
>> I don't agree with this. If you read through the replies to the bug 
>> report,
>> it is clear that NS migration uncovered a corner case or even a bug. 
>> So we
>> should try to fix that indeed.
>> 
>> Govind: Did you get chance to work on fixing this issue?
>> 
> 
> I have done basic testing by moving msa map/unmap from qmi service
> callbacks to init/de-init path.
> I will send patch for review.
> Reason for del_server needs to investigated from rproc side.

Govind,
On receiving SIGTERM, rmtfs would try
to perform a graceful shutdown of the
modem; that should be the source of
the del_server.

> 
>> Thanks,
>> Mani
>> 
>>> 
>>> Brian
> 
> Thanks,
> Govind
Kalle Valo June 8, 2020, 11:17 a.m. UTC | #7
John Stultz <john.stultz@linaro.org> writes:

> Ever since 5.7-rc1, if we call
> ath10k_qmi_remove_msa_permission(), the db845c hard crashes on
> reboot, resulting in the device getting stuck in the usb crash
> debug mode and not coming back up without a hard power off.
>
> This hack avoids the issue by returning early in
> ath10k_qmi_event_server_exit().
>
> A better solution is very much desired!
>
> Feedback and suggestions welcome!
>
> Cc: Rakesh Pillai <pillair@qti.qualcomm.com>
> Cc: Govind Singh <govinds@codeaurora.org>
> Cc: Bjorn Andersson <bjorn.andersson@linaro.org>
> Cc: Niklas Cassel <niklas.cassel@linaro.org>
> Cc: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
> Cc: Amit Pundir <amit.pundir@linaro.org>
> Cc: Brian Norris <briannorris@chromium.org>
> Cc: Kalle Valo <kvalo@codeaurora.org>
> Cc: ath10k@lists.infradead.org
> Reported-by: Amit Pundir <amit.pundir@linaro.org>
> Signed-off-by: John Stultz <john.stultz@linaro.org>

Just so you know: as you didn't CC linux-wireless it's not on patchwork
and hence not on my radar. But hopefully we find a better solution to
fix this.
Kalle Valo June 8, 2020, 11:37 a.m. UTC | #8
Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> writes:

> On Tue, Jun 02, 2020 at 01:04:26PM -0700, Brian Norris wrote:
>> On Tue, Jun 2, 2020 at 12:40 PM John Stultz <john.stultz@linaro.org> wrote:
>> > On Tue, Jun 2, 2020 at 12:16 PM Brian Norris <briannorris@chromium.org> wrote:
>> > > On Mon, Jun 1, 2020 at 10:25 PM John Stultz <john.stultz@linaro.org> wrote:
>> > > >
>> > > > Ever since 5.7-rc1, if we call
>> > > > ath10k_qmi_remove_msa_permission(), the db845c hard crashes on
>> > > > reboot, resulting in the device getting stuck in the usb crash
>> > > > debug mode and not coming back up without a hard power off.
>> > > >
>> > > > This hack avoids the issue by returning early in
>> > > > ath10k_qmi_event_server_exit().
>> > > >
>> > > > A better solution is very much desired!
>> > >
>> > > Any chance you can bisect what caused this? There are a lot of
>> > > non-ath10k pieces involved in this stuff.
>> >
>> > Amit had spent some work on chasing it down to the in kernel qrtr-ns
>> > work, and reported it here:
>> >   https://lists.infradead.org/pipermail/ath10k/2020-April/014970.html
>> >
>> > But that discussion seemingly stalled out, so I came up with this hack
>> > to workaround it for us.
>> 
>> If I'm reading it right, then that means we should revert this stuff
>> from v5.7-rc1:
>> 
>> 0c2204a4ad71 net: qrtr: Migrate nameservice to kernel from userspace
>> 
>> At least, until people can resolve the tail end of that thread. New
>> features (ath11k, etc.) are not a reason to break existing features
>> (ath10k/wcn3990).
>
> I don't agree with this. If you read through the replies to the bug report,
> it is clear that NS migration uncovered a corner case or even a bug. So we
> should try to fix that indeed.

I'm with Mani, we should try to fix ath10k instead. Hopefully we can
find a fix soon.

Forcing QCA6390 users to use the userspace qrtr-ns would be a bad user
experience; I would really like to avoid that.

Patch

diff --git a/drivers/net/wireless/ath/ath10k/qmi.c b/drivers/net/wireless/ath/ath10k/qmi.c
index 85dce43c5439..ab38562ce1cb 100644
--- a/drivers/net/wireless/ath/ath10k/qmi.c
+++ b/drivers/net/wireless/ath/ath10k/qmi.c
@@ -854,6 +854,11 @@  static void ath10k_qmi_event_server_exit(struct ath10k_qmi *qmi)
 	struct ath10k *ar = qmi->ar;
 	struct ath10k_snoc *ar_snoc = ath10k_snoc_priv(ar);
 
+	/*
+	 * HACK: Calling ath10k_qmi_remove_msa_permission causes
+	 * hardware to hard crash on reboot
+	 */
+	return;
 	ath10k_qmi_remove_msa_permission(qmi);
 	ath10k_core_free_board_files(ar);
 	if (!test_bit(ATH10K_SNOC_FLAG_UNREGISTERING, &ar_snoc->flags))