diff mbox series

[v4] Revert "ath10k: fix DMA related firmware crashes on multiple devices"

Message ID 1578976521-6104-1-git-send-email-zhichen@codeaurora.org (mailing list archive)
State Accepted
Commit a1769bb68a850508a492e3674ab1e5e479b11254
Delegated to: Kalle Valo

Commit Message

Zhi Chen Jan. 14, 2020, 4:35 a.m. UTC
This reverts commit 76d164f582150fd0259ec0fcbc485470bcd8033e.
A PCIe hang was observed on multiple platforms. The issue was reproduced
when the DUT was configured as an AP and associated with 50+ STAs.

For QCA9984/QCA9888, the DMA_BURST_SIZE register controls the AXI burst size
of read/write accesses to host memory:
0 - No split; the raw read/write transfer size from the MAC is put on the bus
    as the burst length
1 - Split at 256-byte boundaries
2,3 - Reserved

With a PCIe protocol analyzer, we can see a DMA read crossing a 4KB boundary
when the issue happens. This violates the PCIe spec and causes PCIe to hang.
So revert the default value from 0 back to 1.
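
The boundary arithmetic behind this can be illustrated with a minimal sketch.
It is only an illustration under stated assumptions, not driver or firmware
code: the helper names (crosses_boundary, first_chunk_len,
count_split_chunks) and both constants are hypothetical, and the way the
hardware actually splits a burst is assumed to be a cut at every 256-byte
multiple.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical constants for illustration only. */
#define PCIE_4K_BOUNDARY 4096u /* a single request must not cross this */
#define SPLIT_BOUNDARY    256u /* assumed split point for DMA_BURST_SIZE = 1 */

/* True if [addr, addr + len) touches more than one aligned window. */
static bool crosses_boundary(uint64_t addr, uint32_t len, uint32_t boundary)
{
	assert(boundary != 0 && (boundary & (boundary - 1)) == 0);
	if (len == 0)
		return false;
	return (addr / boundary) != ((addr + len - 1) / boundary);
}

/* Length of the first piece when the transfer is cut at the next multiple. */
static uint32_t first_chunk_len(uint64_t addr, uint32_t len, uint32_t boundary)
{
	uint32_t room;

	assert(boundary != 0 && (boundary & (boundary - 1)) == 0);
	room = boundary - (uint32_t)(addr % boundary);
	return len < room ? len : room;
}

/* Number of pieces after cutting the transfer at every boundary multiple. */
static size_t count_split_chunks(uint64_t addr, uint32_t len, uint32_t boundary)
{
	size_t n = 0;

	while (len > 0) {
		uint32_t piece = first_chunk_len(addr, len, boundary);

		addr += piece;
		len -= piece;
		n++;
	}
	return n;
}
```

Because 4096 is a multiple of 256, no piece produced by a 256-byte split can
cross a 4KB boundary. An unsplit 512-byte read starting at 0x0F80 does cross
one.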

Tested:  IPQ8064 + QCA9984 with firmware 10.4-3.10-00047
         QCS404 + QCA9984 with firmware 10.4-3.9.0.2--00044
         Synaptics AS370 + QCA9888  with firmware 10.4-3.9.0.2--00040

Signed-off-by: Zhi Chen <zhichen@codeaurora.org>
---
v2: restored 10.2 register configuration
v3: modified commit message
v4: resolved conflicts 
---
 drivers/net/wireless/ath/ath10k/hw.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Kalle Valo Jan. 26, 2020, 10:34 a.m. UTC | #1
Zhi Chen <zhichen@codeaurora.org> wrote:

> This reverts commit 76d164f582150fd0259ec0fcbc485470bcd8033e.
> A PCIe hang was observed on multiple platforms. The issue was reproduced
> when the DUT was configured as an AP and associated with 50+ STAs.
> 
> For QCA9984/QCA9888, the DMA_BURST_SIZE register controls the AXI burst size
> of read/write accesses to host memory:
> 0 - No split; the raw read/write transfer size from the MAC is put on the bus
>     as the burst length
> 1 - Split at 256-byte boundaries
> 2,3 - Reserved
> 
> With a PCIe protocol analyzer, we can see a DMA read crossing a 4KB boundary
> when the issue happens. This violates the PCIe spec and causes PCIe to hang.
> So revert the default value from 0 back to 1.
> 
> Tested:  IPQ8064 + QCA9984 with firmware 10.4-3.10-00047
>          QCS404 + QCA9984 with firmware 10.4-3.9.0.2--00044
>          Synaptics AS370 + QCA9888  with firmware 10.4-3.9.0.2--00040
> 
> Signed-off-by: Zhi Chen <zhichen@codeaurora.org>
> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>

Patch applied to ath-next branch of ath.git, thanks.

a1769bb68a85 Revert "ath10k: fix DMA related firmware crashes on multiple devices"

Ben Greear Sept. 8, 2020, 5:48 p.m. UTC | #2
Hello,

Just FYI:  I added this patch to my ath10k-ct driver, and a user reported it causes
regressions on his particular 9888 system when using ath10k-ct wave-2 firmware:

[   21.204868] ath10k_pci 0000:00:00.0: qca9888 hw2.0 target 0x01000000 chip_id 0x00000000 sub 0000:0000
[   21.214437] ath10k_pci 0000:00:00.0: kconfig debug 0 debugfs 1 tracing 0 dfs 1 testmode 0
[   21.233298] ath10k_pci 0000:00:00.0: firmware ver 10.4b-ct-9888-tH-13-8c5b2baa2 api 5 features mfp,peer-flow-ctrl,txstatus-noack,wmi-10.x-CT,ratemask-CT,regdump-CT,txrate-CT,flush-all-CT,pingpong-CT,ch-regs-CT,nop-CT,htt-mgt-CT,set-special-CT,no-bmiss-CT,tx-rc-CT,cust-stats-CT,txrate2-CT,beacon-cb-CT,wmi-block-ack-CT,wmi-bcn-rc-CT crc32 a00b5f36
[   21.596684] ath10k_pci 0000:00:00.0: board_file api 2 bmi_id 0:20 crc32 5bb32c02
[   23.546156] ath10k_pci 0000:00:00.0: unsupported HTC service id: 1536

I'll revert this for the 9888 chipset (at least) in my driver; you may need to do something similar.

https://github.com/greearb/ath10k-ct/issues/153

Thanks,
Ben

Zhi Chen Sept. 9, 2020, 2:04 a.m. UTC | #3
Hi Ben,
   Thanks for the information. The DMA issue is host-related: we never
hit it on x86 platforms, and it was only seen in stress cases with
50+ STAs (repeated association and disassociation). What host platform
are you using, and how was the issue reproduced?

Thanks,
Zhi

Ben Greear Sept. 9, 2020, 4:02 a.m. UTC | #4
Please see this bug report, and feel free to ask the reporter for more details if you
don't find everything you need there.  Seems a basic ping test reproduces packet loss
in their case...

  https://github.com/greearb/ath10k-ct/issues/153

I don't actually have the platform in question.

Thanks,
Ben

Ben Greear Sept. 15, 2020, 8 p.m. UTC | #5
Hello Zhi,

Do you know of any way to detect in the driver which platforms need your patch and which ones
break with it?  Otherwise, we're stuck with external config (which is what I have added
so far as a workaround).
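
A minimal sketch of what such an externally configured override could look
like follows. This is an assumption for illustration, not the ath10k-ct
workaround itself: select_dma_burst_size() and the DMA_BURST_* names are
hypothetical, and how the override value reaches the driver is left open.
Values 2 and 3 are treated as invalid because the commit message lists them
as reserved.

```c
#include <assert.h>

/* Hypothetical names for the two documented DMA_BURST_SIZE settings. */
#define DMA_BURST_NO_SPLIT        0   /* raw transfer size used as burst length */
#define DMA_BURST_SPLIT_256       1   /* split at 256-byte boundaries */
#define DMA_BURST_OVERRIDE_UNSET (-1) /* no external configuration given */

/*
 * Pick the burst-size setting: a valid external override wins, anything
 * else (unset, or the reserved values 2 and 3) falls back to the default.
 */
static int select_dma_burst_size(int default_value, int override)
{
	assert(default_value == DMA_BURST_NO_SPLIT ||
	       default_value == DMA_BURST_SPLIT_256);

	if (override == DMA_BURST_NO_SPLIT || override == DMA_BURST_SPLIT_256)
		return override;

	return default_value;
}
```

With a default of 1, a platform that regresses could be configured with an
override of 0, while an unset override leaves the default in place.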

Thanks,
Ben


Patch

diff --git a/drivers/net/wireless/ath/ath10k/hw.h b/drivers/net/wireless/ath/ath10k/hw.h
index 21b7a2a..775fd62 100644
--- a/drivers/net/wireless/ath/ath10k/hw.h
+++ b/drivers/net/wireless/ath/ath10k/hw.h
@@ -816,7 +816,7 @@  ath10k_is_rssi_enable(struct ath10k_hw_params *hw,
 
 #define TARGET_10_4_TX_DBG_LOG_SIZE		1024
 #define TARGET_10_4_NUM_WDS_ENTRIES		32
-#define TARGET_10_4_DMA_BURST_SIZE		0
+#define TARGET_10_4_DMA_BURST_SIZE		1
 #define TARGET_10_4_MAC_AGGR_DELIM		0
 #define TARGET_10_4_RX_SKIP_DEFRAG_TIMEOUT_DUP_DETECTION_CHECK 1
 #define TARGET_10_4_VOW_CONFIG			0