mbox series

[v3,0/3] Enable low power mode when WLAN is not active

Message ID 20221121110359.4652-1-quic_mpubbise@quicinc.com (mailing list archive)
Headers show
Series Enable low power mode when WLAN is not active | expand

Message

Manikanta Pubbisetty Nov. 21, 2022, 11:03 a.m. UTC
Currently, WLAN chip is powered once during driver probe and is kept
ON (powered) always even when WLAN is not active; keeping the chip
powered ON all the time will consume extra power which is not
desirable for battery operated devices. Same is the case with non-WoW
suspend, chip will not be put into low power mode when the system is
suspended resulting in higher battery drain.

Send QMI MODE OFF command to firmware during WiFi OFF to put device
into low power mode.

Tested-on: WCN6750 hw1.0 AHB WLAN.MSL.1.0.1-00887-QCAMSLSWPLZ-1
Tested-on: WCN6855 hw2.0 PCI WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3.6510.16

Manikanta Pubbisetty (3):
  ath11k: Fix double free issue during SRNG deinit
  ath11k: Move hardware initialization logic to start()
  ath11k: Enable low power mode when WLAN is not active
---
V3:
 - Removed patch "ath11k: Fix failed to parse regulatory event print" as it is not needed anymore
 - Fixed a potential deadlock scenario reported by lockdep around ab->core_lock with V2 changes
 - Fixed other minor issues that were found during code review
 - Spelling corrections in the commit messages

V2:
 - "Enable low power mode when WLAN is not active" has been enabled only for WCN6750
   as the device shutdown and turn-on changes are not same for all chipsets in ath11k.
   A future patch will be sent to enable the logic for other devices.

 - Rebased on ToT

 drivers/net/wireless/ath/ath11k/ahb.c  |  45 ++++
 drivers/net/wireless/ath/ath11k/core.c | 294 ++++++++++++++++++-------
 drivers/net/wireless/ath/ath11k/core.h |  10 +-
 drivers/net/wireless/ath/ath11k/hal.c  |   1 +
 drivers/net/wireless/ath/ath11k/hif.h  |  11 +
 drivers/net/wireless/ath/ath11k/mac.c  |  33 +--
 drivers/net/wireless/ath/ath11k/pci.c  |  26 +++
 drivers/net/wireless/ath/ath11k/qmi.c  |   3 +-
 8 files changed, 311 insertions(+), 112 deletions(-)


base-commit: ef907406320c7af8b6a9e0b140a11ebb3a0fa371

Comments

Kalle Valo Nov. 21, 2022, 4:17 p.m. UTC | #1
Manikanta Pubbisetty <quic_mpubbise@quicinc.com> writes:

> Currently, WLAN chip is powered once during driver probe and is kept
> ON (powered) always even when WLAN is not active; keeping the chip
> powered ON all the time will consume extra power which is not
> desirable for battery operated devices. Same is the case with non-WoW
> suspend, chip will not be put into low power mode when the system is
> suspended resulting in higher battery drain.
>
> Send QMI MODE OFF command to firmware during WiFi OFF to put device
> into low power mode.
>
> Tested-on: WCN6750 hw1.0 AHB WLAN.MSL.1.0.1-00887-QCAMSLSWPLZ-1
> Tested-on: WCN6855 hw2.0 PCI WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3.6510.16
>
> Manikanta Pubbisetty (3):
>   ath11k: Fix double free issue during SRNG deinit
>   ath11k: Move hardware initialization logic to start()
>   ath11k: Enable low power mode when WLAN is not active

Nowadays we use "wifi:" prefix, but I can add it.
Kalle Valo Nov. 22, 2022, 9:11 a.m. UTC | #2
Manikanta Pubbisetty <quic_mpubbise@quicinc.com> writes:

> Currently, WLAN chip is powered once during driver probe and is kept
> ON (powered) always even when WLAN is not active; keeping the chip
> powered ON all the time will consume extra power which is not
> desirable for battery operated devices. Same is the case with non-WoW
> suspend, chip will not be put into low power mode when the system is
> suspended resulting in higher battery drain.
>
> Send QMI MODE OFF command to firmware during WiFi OFF to put device
> into low power mode.
>
> Tested-on: WCN6750 hw1.0 AHB WLAN.MSL.1.0.1-00887-QCAMSLSWPLZ-1
> Tested-on: WCN6855 hw2.0 PCI WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3.6510.16
>
> Manikanta Pubbisetty (3):
>   ath11k: Fix double free issue during SRNG deinit
>   ath11k: Move hardware initialization logic to start()
>   ath11k: Enable low power mode when WLAN is not active
> ---
> V3:
>  - Removed patch "ath11k: Fix failed to parse regulatory event print" as it is not needed anymore
>  - Fixed a potential deadlock scenario reported by lockdep around ab->core_lock with V2 changes
>  - Fixed other minor issues that were found during code review
>  - Spelling corrections in the commit messages

I still see a crash, immediately after the first rmmod:

Nov 22 11:05:47 nuc2  [  139.378719] rmmod ath11k_pci
Nov 22 11:05:48 nuc2  [  139.892395] general protection fault, probably for non-canonical address 0xdffffc000000003e: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC KASAN
Nov 22 11:05:48 nuc2  [  139.892453] KASAN: null-ptr-deref in range [0x00000000000001f0-0x00000000000001f7]

Really odd that you don't see it. Unfortunately not able to debug this
further right now.

This is with:

wcn6855 hw2.0 WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3.6510.9
Kalle Valo Nov. 23, 2022, 4:05 p.m. UTC | #3
Kalle Valo <kvalo@kernel.org> writes:

> Manikanta Pubbisetty <quic_mpubbise@quicinc.com> writes:
>
>> Currently, WLAN chip is powered once during driver probe and is kept
>> ON (powered) always even when WLAN is not active; keeping the chip
>> powered ON all the time will consume extra power which is not
>> desirable for battery operated devices. Same is the case with non-WoW
>> suspend, chip will not be put into low power mode when the system is
>> suspended resulting in higher battery drain.
>>
>> Send QMI MODE OFF command to firmware during WiFi OFF to put device
>> into low power mode.
>>
>> Tested-on: WCN6750 hw1.0 AHB WLAN.MSL.1.0.1-00887-QCAMSLSWPLZ-1
>> Tested-on: WCN6855 hw2.0 PCI WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3.6510.16
>>
>> Manikanta Pubbisetty (3):
>>   ath11k: Fix double free issue during SRNG deinit
>>   ath11k: Move hardware initialization logic to start()
>>   ath11k: Enable low power mode when WLAN is not active
>> ---
>> V3:
>>  - Removed patch "ath11k: Fix failed to parse regulatory event print" as it is not needed anymore
>>  - Fixed a potential deadlock scenario reported by lockdep around ab->core_lock with V2 changes
>>  - Fixed other minor issues that were found during code review
>>  - Spelling corrections in the commit messages
>
> I still see a crash, immediately after the first rmmod:
>
> Nov 22 11:05:47 nuc2  [  139.378719] rmmod ath11k_pci
> Nov 22 11:05:48 nuc2 [ 139.892395] general protection fault, probably
> for non-canonical address 0xdffffc000000003e: 0000 [#1] PREEMPT SMP
> DEBUG_PAGEALLOC KASAN
> Nov 22 11:05:48 nuc2 [ 139.892453] KASAN: null-ptr-deref in range
> [0x00000000000001f0-0x00000000000001f7]
>
> Really odd that you don't see it. Unfortunately not able to debug this
> further right now.
>
> This is with:
>
> wcn6855 hw2.0 WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3.6510.9

A bit more information how I see the crash. So first I have all modules
loaded:

$ lsmod
Module                  Size  Used by
ath11k_pci             57344  0
ath11k               2015232  1 ath11k_pci
mac80211             3284992  1 ath11k
libarc4                16384  1 mac80211
cfg80211             2494464  2 ath11k,mac80211
qmi_helpers            57344  1 ath11k
qrtr_mhi               20480  0
mhi                   217088  2 ath11k_pci,qrtr_mhi
qrtr                   98304  5 qrtr_mhi
nvme                  122880  3
nvme_core             299008  5 nvme
$

Then I just remove ath11k_pci module and boom:

$ sudo rmmod ath11k_pci

[  153.658409] general protection fault, probably for non-canonical address 0xdffffc000000003e: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC KASAN

This happens every time, there doesn't seem to be any randomness on the
behaviour.
Manikanta Pubbisetty Nov. 25, 2022, 4:29 a.m. UTC | #4
On 11/23/2022 9:35 PM, Kalle Valo wrote:
> Kalle Valo <kvalo@kernel.org> writes:
> 
>> Manikanta Pubbisetty <quic_mpubbise@quicinc.com> writes:
>>
>>> Currently, WLAN chip is powered once during driver probe and is kept
>>> ON (powered) always even when WLAN is not active; keeping the chip
>>> powered ON all the time will consume extra power which is not
>>> desirable for battery operated devices. Same is the case with non-WoW
>>> suspend, chip will not be put into low power mode when the system is
>>> suspended resulting in higher battery drain.
>>>
>>> Send QMI MODE OFF command to firmware during WiFi OFF to put device
>>> into low power mode.
>>>
>>> Tested-on: WCN6750 hw1.0 AHB WLAN.MSL.1.0.1-00887-QCAMSLSWPLZ-1
>>> Tested-on: WCN6855 hw2.0 PCI WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3.6510.16
>>>
>>> Manikanta Pubbisetty (3):
>>>    ath11k: Fix double free issue during SRNG deinit
>>>    ath11k: Move hardware initialization logic to start()
>>>    ath11k: Enable low power mode when WLAN is not active
>>> ---
>>> V3:
>>>   - Removed patch "ath11k: Fix failed to parse regulatory event print" as it is not needed anymore
>>>   - Fixed a potential deadlock scenario reported by lockdep around ab->core_lock with V2 changes
>>>   - Fixed other minor issues that were found during code review
>>>   - Spelling corrections in the commit messages
>>
>> I still see a crash, immediately after the first rmmod:
>>
>> Nov 22 11:05:47 nuc2  [  139.378719] rmmod ath11k_pci
>> Nov 22 11:05:48 nuc2 [ 139.892395] general protection fault, probably
>> for non-canonical address 0xdffffc000000003e: 0000 [#1] PREEMPT SMP
>> DEBUG_PAGEALLOC KASAN
>> Nov 22 11:05:48 nuc2 [ 139.892453] KASAN: null-ptr-deref in range
>> [0x00000000000001f0-0x00000000000001f7]
>>
>> Really odd that you don't see it. Unfortunately not able to debug this
>> further right now.
>>
>> This is with:
>>
>> wcn6855 hw2.0 WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3.6510.9
> 
> A bit more information how I see the crash. So first I have all modules
> loaded:
> 
> $ lsmod
> Module                  Size  Used by
> ath11k_pci             57344  0
> ath11k               2015232  1 ath11k_pci
> mac80211             3284992  1 ath11k
> libarc4                16384  1 mac80211
> cfg80211             2494464  2 ath11k,mac80211
> qmi_helpers            57344  1 ath11k
> qrtr_mhi               20480  0
> mhi                   217088  2 ath11k_pci,qrtr_mhi
> qrtr                   98304  5 qrtr_mhi
> nvme                  122880  3
> nvme_core             299008  5 nvme
> $
> 
> Then I just remove ath11k_pci module and boom:
> 
> $ sudo rmmod ath11k_pci
> 
> [  153.658409] general protection fault, probably for non-canonical address 0xdffffc000000003e: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC KASAN
> 
> This happens every time, there doesn't seem to be any randomness on the
> behaviour.
> 

Thanks for the help Kalle, this is exactly what I was doing in my tests. 
Unfortunately, I'm not able to reproduce the problem. I have also tried 
with the exact firmware that you have pointed out. Let me see if I'm 
missing anything.

Thanks,
Manikanta
Kalle Valo Dec. 8, 2022, 7:54 a.m. UTC | #5
Manikanta Pubbisetty <quic_mpubbise@quicinc.com> writes:

> On 11/23/2022 9:35 PM, Kalle Valo wrote:
>
>> Kalle Valo <kvalo@kernel.org> writes:
>>
>>> Manikanta Pubbisetty <quic_mpubbise@quicinc.com> writes:
>>>
>>>> Currently, WLAN chip is powered once during driver probe and is kept
>>>> ON (powered) always even when WLAN is not active; keeping the chip
>>>> powered ON all the time will consume extra power which is not
>>>> desirable for battery operated devices. Same is the case with non-WoW
>>>> suspend, chip will not be put into low power mode when the system is
>>>> suspended resulting in higher battery drain.
>>>>
>>>> Send QMI MODE OFF command to firmware during WiFi OFF to put device
>>>> into low power mode.
>>>>
>>>> Tested-on: WCN6750 hw1.0 AHB WLAN.MSL.1.0.1-00887-QCAMSLSWPLZ-1
>>>> Tested-on: WCN6855 hw2.0 PCI WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3.6510.16
>>>>
>>>> Manikanta Pubbisetty (3):
>>>>    ath11k: Fix double free issue during SRNG deinit
>>>>    ath11k: Move hardware initialization logic to start()
>>>>    ath11k: Enable low power mode when WLAN is not active
>>>> ---
>>>> V3:
>>>>   - Removed patch "ath11k: Fix failed to parse regulatory event print" as it is not needed anymore
>>>>   - Fixed a potential deadlock scenario reported by lockdep around ab->core_lock with V2 changes
>>>>   - Fixed other minor issues that were found during code review
>>>>   - Spelling corrections in the commit messages
>>>
>>> I still see a crash, immediately after the first rmmod:
>>>
>>> Nov 22 11:05:47 nuc2  [  139.378719] rmmod ath11k_pci
>>> Nov 22 11:05:48 nuc2 [ 139.892395] general protection fault, probably
>>> for non-canonical address 0xdffffc000000003e: 0000 [#1] PREEMPT SMP
>>> DEBUG_PAGEALLOC KASAN
>>> Nov 22 11:05:48 nuc2 [ 139.892453] KASAN: null-ptr-deref in range
>>> [0x00000000000001f0-0x00000000000001f7]
>>>
>>> Really odd that you don't see it. Unfortunately not able to debug this
>>> further right now.
>>>
>>> This is with:
>>>
>>> wcn6855 hw2.0 WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3.6510.9
>>
>> A bit more information how I see the crash. So first I have all modules
>> loaded:
>>
>> $ lsmod
>> Module                  Size  Used by
>> ath11k_pci             57344  0
>> ath11k               2015232  1 ath11k_pci
>> mac80211             3284992  1 ath11k
>> libarc4                16384  1 mac80211
>> cfg80211             2494464  2 ath11k,mac80211
>> qmi_helpers            57344  1 ath11k
>> qrtr_mhi               20480  0
>> mhi                   217088  2 ath11k_pci,qrtr_mhi
>> qrtr                   98304  5 qrtr_mhi
>> nvme                  122880  3
>> nvme_core             299008  5 nvme
>> $
>>
>> Then I just remove ath11k_pci module and boom:
>>
>> $ sudo rmmod ath11k_pci
>>
>> [ 153.658409] general protection fault, probably for non-canonical
>> address 0xdffffc000000003e: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
>> KASAN
>>
>> This happens every time, there doesn't seem to be any randomness on the
>> behaviour.
>>
>
> Thanks for the help Kalle, this is exactly what I was doing in my
> tests. Unfortunately, I'm not able to reproduce the problem. I have
> also tried with the exact firmware that you have pointed out. Let me
> see if I'm missing anything.

I tested this more and patch 3 seems to be the one causing the crash. I
didn't see this when patch 1-2 were applied.

The crash happens in ath11k_dp_process_rxdma_err() in this line:

	srng = &ab->hal.srng_list[err_ring->ring_id];

ab looks sane to me (0xffff88814c960000) but err_ring is set to 0x200.
Does this help?
Manikanta Pubbisetty Dec. 8, 2022, 11:20 a.m. UTC | #6
On 12/8/2022 1:24 PM, Kalle Valo wrote:
> Manikanta Pubbisetty <quic_mpubbise@quicinc.com> writes:
> 
>> On 11/23/2022 9:35 PM, Kalle Valo wrote:
>>
>>> Kalle Valo <kvalo@kernel.org> writes:
>>>
>>>> Manikanta Pubbisetty <quic_mpubbise@quicinc.com> writes:
>>>>
>>>>> Currently, WLAN chip is powered once during driver probe and is kept
>>>>> ON (powered) always even when WLAN is not active; keeping the chip
>>>>> powered ON all the time will consume extra power which is not
>>>>> desirable for battery operated devices. Same is the case with non-WoW
>>>>> suspend, chip will not be put into low power mode when the system is
>>>>> suspended resulting in higher battery drain.
>>>>>
>>>>> Send QMI MODE OFF command to firmware during WiFi OFF to put device
>>>>> into low power mode.
>>>>>
>>>>> Tested-on: WCN6750 hw1.0 AHB WLAN.MSL.1.0.1-00887-QCAMSLSWPLZ-1
>>>>> Tested-on: WCN6855 hw2.0 PCI WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3.6510.16
>>>>>
>>>>> Manikanta Pubbisetty (3):
>>>>>     ath11k: Fix double free issue during SRNG deinit
>>>>>     ath11k: Move hardware initialization logic to start()
>>>>>     ath11k: Enable low power mode when WLAN is not active
>>>>> ---
>>>>> V3:
>>>>>    - Removed patch "ath11k: Fix failed to parse regulatory event print" as it is not needed anymore
>>>>>    - Fixed a potential deadlock scenario reported by lockdep around ab->core_lock with V2 changes
>>>>>    - Fixed other minor issues that were found during code review
>>>>>    - Spelling corrections in the commit messages
>>>>
>>>> I still see a crash, immediately after the first rmmod:
>>>>
>>>> Nov 22 11:05:47 nuc2  [  139.378719] rmmod ath11k_pci
>>>> Nov 22 11:05:48 nuc2 [ 139.892395] general protection fault, probably
>>>> for non-canonical address 0xdffffc000000003e: 0000 [#1] PREEMPT SMP
>>>> DEBUG_PAGEALLOC KASAN
>>>> Nov 22 11:05:48 nuc2 [ 139.892453] KASAN: null-ptr-deref in range
>>>> [0x00000000000001f0-0x00000000000001f7]
>>>>
>>>> Really odd that you don't see it. Unfortunately not able to debug this
>>>> further right now.
>>>>
>>>> This is with:
>>>>
>>>> wcn6855 hw2.0 WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3.6510.9
>>>
>>> A bit more information how I see the crash. So first I have all modules
>>> loaded:
>>>
>>> $ lsmod
>>> Module                  Size  Used by
>>> ath11k_pci             57344  0
>>> ath11k               2015232  1 ath11k_pci
>>> mac80211             3284992  1 ath11k
>>> libarc4                16384  1 mac80211
>>> cfg80211             2494464  2 ath11k,mac80211
>>> qmi_helpers            57344  1 ath11k
>>> qrtr_mhi               20480  0
>>> mhi                   217088  2 ath11k_pci,qrtr_mhi
>>> qrtr                   98304  5 qrtr_mhi
>>> nvme                  122880  3
>>> nvme_core             299008  5 nvme
>>> $
>>>
>>> Then I just remove ath11k_pci module and boom:
>>>
>>> $ sudo rmmod ath11k_pci
>>>
>>> [ 153.658409] general protection fault, probably for non-canonical
>>> address 0xdffffc000000003e: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
>>> KASAN
>>>
>>> This happens every time, there doesn't seem to be any randomness on the
>>> behaviour.
>>>
>>
>> Thanks for the help Kalle, this is exactly what I was doing in my
>> tests. Unfortunately, I'm not able to reproduce the problem. I have
>> also tried with the exact firmware that you have pointed out. Let me
>> see if I'm missing anything.
> 
> I tested this more and patch 3 seems to be the one causing the crash. I
> didn't see this when patch 1-2 were applied.
> 
> The crash happens in ath11k_dp_process_rxdma_err() in this line:
> 
> 	srng = &ab->hal.srng_list[err_ring->ring_id];
> 
> ab looks sane to me (0xffff88814c960000) but err_ring is set to 0x200.
> Does this help?
> 

Thanks for your time and analysis on the bug.

 From the callstack() that you have shared earlier, it looks like 
ath11k_dp_process_rxdma_err() is called from dp_service_srngs which is a
napi poll callback.

To me it looks like the napi handler of the driver is getting called
after the srng resources have been de-initialized during rmmod.

I have closed examined the changes in patch 3 (I'm still doing it). so 
far, I could not find anything that could trigger this kind of a 
problem. Since NAPI is disabled upon rmmod, NAPI poll should not be called.

It's quite not clear if I'm missing something.

Thanks,
Manikanta