diff mbox series

wifi: ath11k: fix boot failure with one MSI vector

Message ID 20230601033840.2997-1-quic_bqiang@quicinc.com (mailing list archive)
State Changes Requested
Delegated to: Kalle Valo
Headers show
Series wifi: ath11k: fix boot failure with one MSI vector | expand

Commit Message

Baochen Qiang June 1, 2023, 3:38 a.m. UTC
Commit 5b32b6dd96633 ("ath11k: Remove core PCI references from
PCI common code") breaks with one MSI vector because it moves
affinity setting after IRQ request, see below log:

[ 1417.278835] ath11k_pci 0000:02:00.0: failed to receive control response completion, polling..
[ 1418.302829] ath11k_pci 0000:02:00.0: Service connect timeout
[ 1418.302833] ath11k_pci 0000:02:00.0: failed to connect to HTT: -110
[ 1418.303669] ath11k_pci 0000:02:00.0: failed to start core: -110

The detail is, if do affinity request after IRQ activated,
which is done in request_irq(), kernel caches that request and
returns success directly. Later when a subsequent MHI interrupt is
fired, kernel will do the real affinity setting work, as a result,
changs the MSI vector. However at that time host has configured
old vector to hardware, so host never receives CE or DP interrupts.

Fix it by setting affinity before registering MHI controller
where host is, for the first time, doing IRQ request.

Tested-on: WCN6855 hw2.0 PCI WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3

Fixes: 5b32b6dd9663 ("ath11k: Remove core PCI references from PCI common code")
Signed-off-by: Baochen Qiang <quic_bqiang@quicinc.com>
---
 drivers/net/wireless/ath/ath11k/pci.c | 24 ++++++++++++------------
 1 file changed, 12 insertions(+), 12 deletions(-)


base-commit: 4ab7f08db5310ded48a5c1f3ec3f2e177ba6b1c2

Comments

Kalle Valo June 9, 2023, 12:26 p.m. UTC | #1
Baochen Qiang <quic_bqiang@quicinc.com> writes:

> Commit 5b32b6dd96633 ("ath11k: Remove core PCI references from
> PCI common code") breaks with one MSI vector because it moves
> affinity setting after IRQ request, see below log:
>
> [ 1417.278835] ath11k_pci 0000:02:00.0: failed to receive control response completion, polling..
> [ 1418.302829] ath11k_pci 0000:02:00.0: Service connect timeout
> [ 1418.302833] ath11k_pci 0000:02:00.0: failed to connect to HTT: -110
> [ 1418.303669] ath11k_pci 0000:02:00.0: failed to start core: -110
>
> The detail is, if do affinity request after IRQ activated,
> which is done in request_irq(), kernel caches that request and
> returns success directly. Later when a subsequent MHI interrupt is
> fired, kernel will do the real affinity setting work, as a result,
> changs the MSI vector. However at that time host has configured
> old vector to hardware, so host never receives CE or DP interrupts.
>
> Fix it by setting affinity before registering MHI controller
> where host is, for the first time, doing IRQ request.
>
> Tested-on: WCN6855 hw2.0 PCI WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3
>
> Fixes: 5b32b6dd9663 ("ath11k: Remove core PCI references from PCI common code")
> Signed-off-by: Baochen Qiang <quic_bqiang@quicinc.com>

I'm worried if this breaks WCN6750 support. Manikanta, would able to
test this patch on WCN6750 and let us know if breaks anything?
Baochen Qiang Aug. 7, 2023, 7:14 a.m. UTC | #2
On 6/9/2023 8:26 PM, Kalle Valo wrote:
> Baochen Qiang <quic_bqiang@quicinc.com> writes:
>
>> Commit 5b32b6dd96633 ("ath11k: Remove core PCI references from
>> PCI common code") breaks with one MSI vector because it moves
>> affinity setting after IRQ request, see below log:
>>
>> [ 1417.278835] ath11k_pci 0000:02:00.0: failed to receive control response completion, polling..
>> [ 1418.302829] ath11k_pci 0000:02:00.0: Service connect timeout
>> [ 1418.302833] ath11k_pci 0000:02:00.0: failed to connect to HTT: -110
>> [ 1418.303669] ath11k_pci 0000:02:00.0: failed to start core: -110
>>
>> The detail is, if do affinity request after IRQ activated,
>> which is done in request_irq(), kernel caches that request and
>> returns success directly. Later when a subsequent MHI interrupt is
>> fired, kernel will do the real affinity setting work, as a result,
>> changs the MSI vector. However at that time host has configured
>> old vector to hardware, so host never receives CE or DP interrupts.
>>
>> Fix it by setting affinity before registering MHI controller
>> where host is, for the first time, doing IRQ request.
>>
>> Tested-on: WCN6855 hw2.0 PCI WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3
>>
>> Fixes: 5b32b6dd9663 ("ath11k: Remove core PCI references from PCI common code")
>> Signed-off-by: Baochen Qiang <quic_bqiang@quicinc.com>
> I'm worried if this breaks WCN6750 support. Manikanta, would able to
> test this patch on WCN6750 and let us know if breaks anything?
Hi Manikanta, could you help test this patch on WCN6750?
Manikanta Pubbisetty Aug. 7, 2023, 7:23 a.m. UTC | #3
On 8/7/2023 12:44 PM, Baochen Qiang wrote:
> 
> On 6/9/2023 8:26 PM, Kalle Valo wrote:
>> Baochen Qiang <quic_bqiang@quicinc.com> writes:
>>
>>> Commit 5b32b6dd96633 ("ath11k: Remove core PCI references from
>>> PCI common code") breaks with one MSI vector because it moves
>>> affinity setting after IRQ request, see below log:
>>>
>>> [ 1417.278835] ath11k_pci 0000:02:00.0: failed to receive control 
>>> response completion, polling..
>>> [ 1418.302829] ath11k_pci 0000:02:00.0: Service connect timeout
>>> [ 1418.302833] ath11k_pci 0000:02:00.0: failed to connect to HTT: -110
>>> [ 1418.303669] ath11k_pci 0000:02:00.0: failed to start core: -110
>>>
>>> The detail is, if do affinity request after IRQ activated,
>>> which is done in request_irq(), kernel caches that request and
>>> returns success directly. Later when a subsequent MHI interrupt is
>>> fired, kernel will do the real affinity setting work, as a result,
>>> changs the MSI vector. However at that time host has configured
>>> old vector to hardware, so host never receives CE or DP interrupts.
>>>
>>> Fix it by setting affinity before registering MHI controller
>>> where host is, for the first time, doing IRQ request.
>>>
>>> Tested-on: WCN6855 hw2.0 PCI 
>>> WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3
>>>
>>> Fixes: 5b32b6dd9663 ("ath11k: Remove core PCI references from PCI 
>>> common code")
>>> Signed-off-by: Baochen Qiang <quic_bqiang@quicinc.com>
>> I'm worried if this breaks WCN6750 support. Manikanta, would able to
>> test this patch on WCN6750 and let us know if breaks anything?
> Hi Manikanta, could you help test this patch on WCN6750?

Sure, we will test and update this here.

Thanks,
Manikanta
Manikanta Pubbisetty Aug. 7, 2023, 10:16 a.m. UTC | #4
On 8/7/2023 12:53 PM, Manikanta Pubbisetty wrote:
> On 8/7/2023 12:44 PM, Baochen Qiang wrote:
>>
>> On 6/9/2023 8:26 PM, Kalle Valo wrote:
>>> Baochen Qiang <quic_bqiang@quicinc.com> writes:
>>>
>>>> Commit 5b32b6dd96633 ("ath11k: Remove core PCI references from
>>>> PCI common code") breaks with one MSI vector because it moves
>>>> affinity setting after IRQ request, see below log:
>>>>
>>>> [ 1417.278835] ath11k_pci 0000:02:00.0: failed to receive control 
>>>> response completion, polling..
>>>> [ 1418.302829] ath11k_pci 0000:02:00.0: Service connect timeout
>>>> [ 1418.302833] ath11k_pci 0000:02:00.0: failed to connect to HTT: -110
>>>> [ 1418.303669] ath11k_pci 0000:02:00.0: failed to start core: -110
>>>>
>>>> The detail is, if do affinity request after IRQ activated,
>>>> which is done in request_irq(), kernel caches that request and
>>>> returns success directly. Later when a subsequent MHI interrupt is
>>>> fired, kernel will do the real affinity setting work, as a result,
>>>> changs the MSI vector. However at that time host has configured
>>>> old vector to hardware, so host never receives CE or DP interrupts.
>>>>
>>>> Fix it by setting affinity before registering MHI controller
>>>> where host is, for the first time, doing IRQ request.
>>>>
>>>> Tested-on: WCN6855 hw2.0 PCI 
>>>> WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3
>>>>
>>>> Fixes: 5b32b6dd9663 ("ath11k: Remove core PCI references from PCI 
>>>> common code")
>>>> Signed-off-by: Baochen Qiang <quic_bqiang@quicinc.com>
>>> I'm worried if this breaks WCN6750 support. Manikanta, would able to
>>> test this patch on WCN6750 and let us know if breaks anything?
>> Hi Manikanta, could you help test this patch on WCN6750?
> 
> Sure, we will test and update this here.
> 

No impact on WCN6750.

Tested-by: Manikanta Pubbisetty <quic_mpubbise@quicinc.com>


Thanks,
Manikanta
Baochen Qiang Aug. 7, 2023, 10:29 a.m. UTC | #5
On 8/7/2023 6:16 PM, Manikanta Pubbisetty wrote:
> On 8/7/2023 12:53 PM, Manikanta Pubbisetty wrote:
>> On 8/7/2023 12:44 PM, Baochen Qiang wrote:
>>>
>>> On 6/9/2023 8:26 PM, Kalle Valo wrote:
>>>> Baochen Qiang <quic_bqiang@quicinc.com> writes:
>>>>
>>>>> Commit 5b32b6dd96633 ("ath11k: Remove core PCI references from
>>>>> PCI common code") breaks with one MSI vector because it moves
>>>>> affinity setting after IRQ request, see below log:
>>>>>
>>>>> [ 1417.278835] ath11k_pci 0000:02:00.0: failed to receive control 
>>>>> response completion, polling..
>>>>> [ 1418.302829] ath11k_pci 0000:02:00.0: Service connect timeout
>>>>> [ 1418.302833] ath11k_pci 0000:02:00.0: failed to connect to HTT: 
>>>>> -110
>>>>> [ 1418.303669] ath11k_pci 0000:02:00.0: failed to start core: -110
>>>>>
>>>>> The detail is, if do affinity request after IRQ activated,
>>>>> which is done in request_irq(), kernel caches that request and
>>>>> returns success directly. Later when a subsequent MHI interrupt is
>>>>> fired, kernel will do the real affinity setting work, as a result,
>>>>> changs the MSI vector. However at that time host has configured
>>>>> old vector to hardware, so host never receives CE or DP interrupts.
>>>>>
>>>>> Fix it by setting affinity before registering MHI controller
>>>>> where host is, for the first time, doing IRQ request.
>>>>>
>>>>> Tested-on: WCN6855 hw2.0 PCI 
>>>>> WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3
>>>>>
>>>>> Fixes: 5b32b6dd9663 ("ath11k: Remove core PCI references from PCI 
>>>>> common code")
>>>>> Signed-off-by: Baochen Qiang <quic_bqiang@quicinc.com>
>>>> I'm worried if this breaks WCN6750 support. Manikanta, would able to
>>>> test this patch on WCN6750 and let us know if breaks anything?
>>> Hi Manikanta, could you help test this patch on WCN6750?
>>
>> Sure, we will test and update this here.
>>
>
> No impact on WCN6750.
>
> Tested-by: Manikanta Pubbisetty <quic_mpubbise@quicinc.com>
>
>
> Thanks,
> Manikanta

Thanks Mani.

Hi Kalle, any more comments?
Jeff Johnson Aug. 7, 2023, 3:29 p.m. UTC | #6
On 8/7/2023 3:16 AM, Manikanta Pubbisetty wrote:
> No impact on WCN6750.
> 
> Tested-by: Manikanta Pubbisetty <quic_mpubbise@quicinc.com>

Can you provide a Tested-on: flag that Kalle can add to the commit text?

> 
> 
> Thanks,
> Manikanta
Manikanta Pubbisetty Aug. 8, 2023, 5:07 a.m. UTC | #7
On 8/7/2023 8:59 PM, Jeff Johnson wrote:
> On 8/7/2023 3:16 AM, Manikanta Pubbisetty wrote:
>> No impact on WCN6750.
>>
>> Tested-by: Manikanta Pubbisetty <quic_mpubbise@quicinc.com>
> 
> Can you provide a Tested-on: flag that Kalle can add to the commit text?
> 

Tested-on: WCN6750 hw1.0 AHB WLAN.MSL.1.0.1-01160-QCAMSLSWPLZ-1

Thanks,
Manikanta
Kalle Valo Aug. 22, 2023, 12:50 p.m. UTC | #8
Manikanta Pubbisetty <quic_mpubbise@quicinc.com> writes:

> On 8/7/2023 8:59 PM, Jeff Johnson wrote:
>> On 8/7/2023 3:16 AM, Manikanta Pubbisetty wrote:
>>> No impact on WCN6750.
>>>
>>> Tested-by: Manikanta Pubbisetty <quic_mpubbise@quicinc.com>
>> Can you provide a Tested-on: flag that Kalle can add to the commit
>> text?
>> 
>
> Tested-on: WCN6750 hw1.0 AHB WLAN.MSL.1.0.1-01160-QCAMSLSWPLZ-1

Excellent, thanks. The more Tested-on tags I see on the patch the more
confidence I have for the patch.
Jeff Johnson Aug. 30, 2023, 10:12 p.m. UTC | #9
On 8/22/2023 5:50 AM, Kalle Valo wrote:
> Manikanta Pubbisetty <quic_mpubbise@quicinc.com> writes:
> 
>> On 8/7/2023 8:59 PM, Jeff Johnson wrote:
>>> On 8/7/2023 3:16 AM, Manikanta Pubbisetty wrote:
>>>> No impact on WCN6750.
>>>>
>>>> Tested-by: Manikanta Pubbisetty <quic_mpubbise@quicinc.com>
>>> Can you provide a Tested-on: flag that Kalle can add to the commit
>>> text?
>>>
>>
>> Tested-on: WCN6750 hw1.0 AHB WLAN.MSL.1.0.1-01160-QCAMSLSWPLZ-1
> 
> Excellent, thanks. The more Tested-on tags I see on the patch the more
> confidence I have for the patch.
> 

This fixes my spurious bootup failure :)

Tested-on: WCN6855 hw2.1 PCI 
WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3.6510.23
Kalle Valo Sept. 6, 2023, 3:12 p.m. UTC | #10
Baochen Qiang <quic_bqiang@quicinc.com> wrote:

> Commit 5b32b6dd96633 ("ath11k: Remove core PCI references from
> PCI common code") breaks with one MSI vector because it moves
> affinity setting after IRQ request, see below log:
> 
> [ 1417.278835] ath11k_pci 0000:02:00.0: failed to receive control response completion, polling..
> [ 1418.302829] ath11k_pci 0000:02:00.0: Service connect timeout
> [ 1418.302833] ath11k_pci 0000:02:00.0: failed to connect to HTT: -110
> [ 1418.303669] ath11k_pci 0000:02:00.0: failed to start core: -110
> 
> The detail is, if do affinity request after IRQ activated,
> which is done in request_irq(), kernel caches that request and
> returns success directly. Later when a subsequent MHI interrupt is
> fired, kernel will do the real affinity setting work, as a result,
> changs the MSI vector. However at that time host has configured
> old vector to hardware, so host never receives CE or DP interrupts.
> 
> Fix it by setting affinity before registering MHI controller
> where host is, for the first time, doing IRQ request.
> 
> Tested-on: WCN6855 hw2.0 PCI WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3
> 
> Fixes: 5b32b6dd9663 ("ath11k: Remove core PCI references from PCI common code")
> Signed-off-by: Baochen Qiang <quic_bqiang@quicinc.com>
> Tested-by: Manikanta Pubbisetty <quic_mpubbise@quicinc.com>

Fails to apply, please rebase.

error: patch failed: drivers/net/wireless/ath/ath11k/pci.c:888
error: drivers/net/wireless/ath/ath11k/pci.c: patch does not apply
stg import: Diff does not apply cleanly

Patch set to Changes Requested.
diff mbox series

Patch

diff --git a/drivers/net/wireless/ath/ath11k/pci.c b/drivers/net/wireless/ath/ath11k/pci.c
index 7b33731a50ee..a67576226380 100644
--- a/drivers/net/wireless/ath/ath11k/pci.c
+++ b/drivers/net/wireless/ath/ath11k/pci.c
@@ -850,10 +850,16 @@  static int ath11k_pci_probe(struct pci_dev *pdev,
 	if (ret)
 		goto err_pci_disable_msi;
 
+	ret = ath11k_pci_set_irq_affinity_hint(ab_pci, cpumask_of(0));
+	if (ret) {
+		ath11k_err(ab, "failed to set irq affinity %d\n", ret);
+		goto err_pci_disable_msi;
+	}
+
 	ret = ath11k_mhi_register(ab_pci);
 	if (ret) {
 		ath11k_err(ab, "failed to register mhi: %d\n", ret);
-		goto err_pci_disable_msi;
+		goto err_irq_affinity_cleanup;
 	}
 
 	ret = ath11k_hal_srng_init(ab);
@@ -874,12 +880,6 @@  static int ath11k_pci_probe(struct pci_dev *pdev,
 		goto err_ce_free;
 	}
 
-	ret = ath11k_pci_set_irq_affinity_hint(ab_pci, cpumask_of(0));
-	if (ret) {
-		ath11k_err(ab, "failed to set irq affinity %d\n", ret);
-		goto err_free_irq;
-	}
-
 	/* kernel may allocate a dummy vector before request_irq and
 	 * then allocate a real vector when request_irq is called.
 	 * So get msi_data here again to avoid spurious interrupt
@@ -888,19 +888,16 @@  static int ath11k_pci_probe(struct pci_dev *pdev,
 	ret = ath11k_pci_config_msi_data(ab_pci);
 	if (ret) {
 		ath11k_err(ab, "failed to config msi_data: %d\n", ret);
-		goto err_irq_affinity_cleanup;
+		goto err_free_irq;
 	}
 
 	ret = ath11k_core_init(ab);
 	if (ret) {
 		ath11k_err(ab, "failed to init core: %d\n", ret);
-		goto err_irq_affinity_cleanup;
+		goto err_free_irq;
 	}
 	return 0;
 
-err_irq_affinity_cleanup:
-	ath11k_pci_set_irq_affinity_hint(ab_pci, NULL);
-
 err_free_irq:
 	ath11k_pcic_free_irq(ab);
 
@@ -913,6 +910,9 @@  static int ath11k_pci_probe(struct pci_dev *pdev,
 err_mhi_unregister:
 	ath11k_mhi_unregister(ab_pci);
 
+err_irq_affinity_cleanup:
+	ath11k_pci_set_irq_affinity_hint(ab_pci, NULL);
+
 err_pci_disable_msi:
 	ath11k_pci_free_msi(ab_pci);