diff mbox series

[v4,3/4] ath11k: add support for device recovery for QCA6390

Message ID 20211116041522.23529-4-quic_wgong@quicinc.com (mailing list archive)
State Changes Requested
Delegated to: Kalle Valo
Headers show
Series ath11k: add feature for device recovery | expand

Commit Message

Wen Gong Nov. 16, 2021, 4:15 a.m. UTC
Currently ath11k has device recovery logic, it is introduced by this
patch "ath11k: Add support for subsystem recovery" which is upstream
by https://git.kernel.org/pub/scm/linux/kernel/git/kvalo/ath.git/commit/?h=ath11k-bringup&id=3a7b4838b6f6f234239f263ef3dc02e612a083ad.

The patch is for AHB devices such as IPQ8074, it has remote proc module
which is used to download the firmware and boots the processor which
firmware is running on. If firmware crashed, remote proc module will
detect it and download and boot firmware again. Below command will
trigger a firmware crash, and then user can test feature of device
recovery.

Test command:
echo assert > /sys/kernel/debug/ath11k/qca6390\ hw2.0/simulate_fw_crash

Unfortunately, QCA6390 is PCIe bus, it does not have the remote proc
module, it use mhi module to communicate between firmware and ath11k.
So ath11k does not support device recovery for QCA6390 currently.

This patch is to add the extra logic which is different for QCA6390.
When firmware crashed, MHI_CB_EE_RDDM event will be indicate by
firmware and then ath11k_mhi_op_status_cb which is the callback of
mhi_controller will receive the MHI_CB_EE_RDDM event, then ath11k
will start to do recovery process, ath11k_core_reset() calls
ath11k_hif_power_down()/ath11k_hif_power_up(), then the mhi/ath11k
will start to download and boot firmware. There are some logic to
avoid deadloop recovery and two simultaneous recovery operations.
And because it has muti-radios for the soc, so it add some logic
in ath11k_mac_op_reconfig_complete() to make sure all radios has
reconfig complete and then complete the device recovery.

Also it add workqueue_aux, because ab->workqueue is used when receive
ATH11K_QMI_EVENT_FW_READY in recovery process(queue_work(ab->workqueue,
&ab->restart_work)), and ath11k_core_reset will wait for max
ATH11K_RESET_TIMEOUT_HZ for the previous restart_work finished, if
ath11k_core_reset also queued in ab->workqueue, then it will delay
restart_work of previous recovery and lead previous recovery fail.

ath11k recovery success for QCA6390 after apply this patch.

Tested-on: QCA6390 hw2.0 PCI WLAN.HST.1.0.1-01740-QCAHSTSWPLZ_V2_TO_X86-1
Tested-on: WCN6855 hw2.0 PCI WLAN.HSP.1.1-01720.1-QCAHSPSWPL_V1_V2_SILICONZ_LITE-1

Signed-off-by: Wen Gong <quic_wgong@quicinc.com>
---
 drivers/net/wireless/ath/ath11k/core.c | 67 ++++++++++++++++++++++++++
 drivers/net/wireless/ath/ath11k/core.h | 13 +++++
 drivers/net/wireless/ath/ath11k/mac.c  | 18 +++++++
 drivers/net/wireless/ath/ath11k/mhi.c  | 33 +++++++++++++
 drivers/net/wireless/ath/ath11k/pci.c  |  3 ++
 5 files changed, 134 insertions(+)

Comments

Sven Eckelmann Nov. 16, 2021, 8:15 a.m. UTC | #1
On Tuesday, 16 November 2021 05:15:21 CET Wen Gong wrote:
> Currently ath11k has device recovery logic, it is introduced by this
> patch "ath11k: Add support for subsystem recovery" which is upstream
> by https://git.kernel.org/pub/scm/linux/kernel/git/kvalo/ath.git/commit/?h=ath11k-bringup&id=3a7b4838b6f6f234239f263ef3dc02e612a083ad.

https://www.kernel.org/doc/html/v5.15/process/submitting-patches.html#describe-your-changes

Please search for "If you want to refer to a specific commit"
And this commit you referenced is definitely not the upstream commit.
It was only part of Kalle's repository. The code was only upstreamed
with commit d5c65159f289 ("ath11k: driver for Qualcomm IEEE 802.11ax
devices").


Btw. another thing which I see again and again in these patches:

> --- a/drivers/net/wireless/ath/ath11k/core.h
> +++ b/drivers/net/wireless/ath/ath11k/core.h
> @@ -39,6 +39,10 @@
[...]
> +       bool is_reset;

https://www.kernel.org/doc/html/v5.15/process/coding-style.html#using-bool

"If a structure has many true/false values, consider consolidating them into a 
bitfield with 1 bit members, or using an appropriate fixed width type, such as 
u8."

There are more verbose mails from Linus Torvalds where he points out the 
problems of bools in structs. See for example struct ath11k_skb_rxcb. At
the moment, it looks like this:

    struct ath11k_skb_rxcb {
    	dma_addr_t paddr;
    	bool is_first_msdu;
    	bool is_last_msdu;
    	bool is_continuation;
    	bool is_mcbc;
    	bool is_eapol_tkip;
    	struct hal_rx_desc *rx_desc;
    	u8 err_rel_src;
    	u8 err_code;
    	u8 mac_id;
    	u8 unmapped;
    	u8 is_frag;
    	u8 tid;
    	u16 peer_id;
    	u16 seq_no;
    	struct napi_struct *napi;
    };

Compiled for IPQ60xx, it would end up as:

    $ pahole ./drivers/net/wireless/ath/ath11k/ath11k.o -C ath11k_skb_rxcb
    struct ath11k_skb_rxcb {
            dma_addr_t                 paddr;                /*     0     8 */
            bool                       is_first_msdu;        /*     8     1 */
            bool                       is_last_msdu;         /*     9     1 */
            bool                       is_continuation;      /*    10     1 */
            bool                       is_mcbc;              /*    11     1 */
            bool                       is_eapol_tkip;        /*    12     1 */
    
            /* XXX 3 bytes hole, try to pack */
    
            struct hal_rx_desc *       rx_desc;              /*    16     8 */
            u8                         err_rel_src;          /*    24     1 */
            u8                         err_code;             /*    25     1 */
            u8                         mac_id;               /*    26     1 */
            u8                         unmapped;             /*    27     1 */
            u8                         is_frag;              /*    28     1 */
            u8                         tid;                  /*    29     1 */
            u16                        peer_id;              /*    30     2 */
            u16                        seq_no;               /*    32     2 */
    
            /* XXX 6 bytes hole, try to pack */
    
            struct napi_struct *       napi;                 /*    40     8 */
    
            /* size: 48, cachelines: 1, members: 16 */
            /* sum members: 39, holes: 2, sum holes: 9 */
            /* last cacheline: 48 bytes */
    };

After changing it to u8 ....:1 and reorganizing the structure:

    $ pahole drivers/net/wireless/ath/ath11k/ath11k.o -C ath11k_skb_rxcb -R
    struct ath11k_skb_rxcb {
            dma_addr_t                 paddr;                /*     0     8 */
            struct hal_rx_desc *       rx_desc;              /*     8     8 */
            struct napi_struct *       napi;                 /*    16     8 */
            u8                         err_rel_src;          /*    24     1 */
            u8                         err_code;             /*    25     1 */
            u8                         mac_id;               /*    26     1 */
            u8                         unmapped;             /*    27     1 */
            u8                         is_frag;              /*    28     1 */
            u8                         tid;                  /*    29     1 */
            u8                         is_first_msdu:1;      /*    30: 0  1 */
            u8                         is_last_msdu:1;       /*    30: 1  1 */
            u8                         is_continuation:1;    /*    30: 2  1 */
            u8                         is_mcbc:1;            /*    30: 3  1 */
            u8                         is_eapol_tkip:1;      /*    30: 4  1 */
    
            /* XXX 3 bits hole, try to pack */
            /* XXX 1 byte hole, try to pack */
    
            u16                        peer_id;              /*    32     2 */
            u16                        seq_no;               /*    34     2 */
    
            /* size: 40, cachelines: 1, members: 16 */
            /* sum members: 34, holes: 1, sum holes: 1 */
            /* sum bitfield members: 5 bits, bit holes: 1, sum bit holes: 3 bits */
            /* padding: 4 */
            /* last cacheline: 40 bytes */
    };



Or ath11k_bus_params:

    $ pahole ./drivers/net/wireless/ath/ath11k/ath11k.o -C ath11k_bus_params
    struct ath11k_bus_params {
            bool                       mhi_support;          /*     0     1 */
            bool                       m3_fw_support;        /*     1     1 */
            bool                       fixed_bdf_addr;       /*     2     1 */
            bool                       fixed_mem_region;     /*     3     1 */
            bool                       static_window_map;    /*     4     1 */
    
            /* size: 5, cachelines: 1, members: 5 */
            /* last cacheline: 5 bytes */
    };

After changing it to an "u8 ....:1":

    $ pahole ./drivers/net/wireless/ath/ath11k/ath11k.o -C ath11k_bus_params
    struct ath11k_bus_params {
            u8                         mhi_support:1;        /*     0: 0  1 */
            u8                         m3_fw_support:1;      /*     0: 1  1 */
            u8                         fixed_bdf_addr:1;     /*     0: 2  1 */
            u8                         fixed_mem_region:1;   /*     0: 3  1 */
            u8                         static_window_map:1;  /*     0: 4  1 */
    
            /* size: 1, cachelines: 1, members: 5 */
            /* bit_padding: 3 bits */
            /* last cacheline: 1 bytes */
    };


There are even better examples. ath11k_hw_params will for example take 
currently 200 bytes. With a little reordering and adjusting of bools,
it can easily be reduced to 152 bytes. But other structures might
have more impact because they are used more often.


Kind regards,
	Sven
Kalle Valo Nov. 17, 2021, 8:12 a.m. UTC | #2
Sven Eckelmann <sven@narfation.org> writes:

> On Tuesday, 16 November 2021 05:15:21 CET Wen Gong wrote:
>> Currently ath11k has device recovery logic, it is introduced by this
>> patch "ath11k: Add support for subsystem recovery" which is upstream
>> by
>> https://git.kernel.org/pub/scm/linux/kernel/git/kvalo/ath.git/commit/?h=ath11k-bringup&id=3a7b4838b6f6f234239f263ef3dc02e612a083ad.
>
> https://www.kernel.org/doc/html/v5.15/process/submitting-patches.html#describe-your-changes
>
> Please search for "If you want to refer to a specific commit"
> And this commit you referenced is definitely not the upstream commit.
> It was only part of Kalle's repository. The code was only upstreamed
> with commit d5c65159f289 ("ath11k: driver for Qualcomm IEEE 802.11ax
> devices").
>
>
> Btw. another thing which I see again and again in these patches:
>
>> --- a/drivers/net/wireless/ath/ath11k/core.h
>> +++ b/drivers/net/wireless/ath/ath11k/core.h
>> @@ -39,6 +39,10 @@
> [...]
>> +       bool is_reset;
>
> https://www.kernel.org/doc/html/v5.15/process/coding-style.html#using-bool
>
> "If a structure has many true/false values, consider consolidating them into a 
> bitfield with 1 bit members, or using an appropriate fixed width type, such as 
> u8."
>
> There are more verbose mails from Linus Torvalds where he points out the 
> problems of bools in structs. See for example struct ath11k_skb_rxcb. At
> the moment, it looks like this:
>
>     struct ath11k_skb_rxcb {
>     	dma_addr_t paddr;
>     	bool is_first_msdu;
>     	bool is_last_msdu;
>     	bool is_continuation;
>     	bool is_mcbc;
>     	bool is_eapol_tkip;
>     	struct hal_rx_desc *rx_desc;
>     	u8 err_rel_src;
>     	u8 err_code;
>     	u8 mac_id;
>     	u8 unmapped;
>     	u8 is_frag;
>     	u8 tid;
>     	u16 peer_id;
>     	u16 seq_no;
>     	struct napi_struct *napi;
>     };
>
> Compiled for IPQ60xx, it would end up as:
>
>     $ pahole ./drivers/net/wireless/ath/ath11k/ath11k.o -C ath11k_skb_rxcb
>     struct ath11k_skb_rxcb {
>             dma_addr_t                 paddr;                /*     0     8 */
>             bool                       is_first_msdu;        /*     8     1 */
>             bool                       is_last_msdu;         /*     9     1 */
>             bool                       is_continuation;      /*    10     1 */
>             bool                       is_mcbc;              /*    11     1 */
>             bool                       is_eapol_tkip;        /*    12     1 */
>     
>             /* XXX 3 bytes hole, try to pack */
>     
>             struct hal_rx_desc *       rx_desc;              /*    16     8 */
>             u8                         err_rel_src;          /*    24     1 */
>             u8                         err_code;             /*    25     1 */
>             u8                         mac_id;               /*    26     1 */
>             u8                         unmapped;             /*    27     1 */
>             u8                         is_frag;              /*    28     1 */
>             u8                         tid;                  /*    29     1 */
>             u16                        peer_id;              /*    30     2 */
>             u16                        seq_no;               /*    32     2 */
>     
>             /* XXX 6 bytes hole, try to pack */
>     
>             struct napi_struct *       napi;                 /*    40     8 */
>     
>             /* size: 48, cachelines: 1, members: 16 */
>             /* sum members: 39, holes: 2, sum holes: 9 */
>             /* last cacheline: 48 bytes */
>     };
>
> After changing it to u8 ....:1 and reorganizing the structure:
>
>     $ pahole drivers/net/wireless/ath/ath11k/ath11k.o -C ath11k_skb_rxcb -R
>     struct ath11k_skb_rxcb {
>             dma_addr_t                 paddr;                /*     0     8 */
>             struct hal_rx_desc *       rx_desc;              /*     8     8 */
>             struct napi_struct *       napi;                 /*    16     8 */
>             u8                         err_rel_src;          /*    24     1 */
>             u8                         err_code;             /*    25     1 */
>             u8                         mac_id;               /*    26     1 */
>             u8                         unmapped;             /*    27     1 */
>             u8                         is_frag;              /*    28     1 */
>             u8                         tid;                  /*    29     1 */
>             u8                         is_first_msdu:1;      /*    30: 0  1 */
>             u8                         is_last_msdu:1;       /*    30: 1  1 */
>             u8                         is_continuation:1;    /*    30: 2  1 */
>             u8                         is_mcbc:1;            /*    30: 3  1 */
>             u8                         is_eapol_tkip:1;      /*    30: 4  1 */
>     
>             /* XXX 3 bits hole, try to pack */
>             /* XXX 1 byte hole, try to pack */
>     
>             u16                        peer_id;              /*    32     2 */
>             u16                        seq_no;               /*    34     2 */
>     
>             /* size: 40, cachelines: 1, members: 16 */
>             /* sum members: 34, holes: 1, sum holes: 1 */
>             /* sum bitfield members: 5 bits, bit holes: 1, sum bit holes: 3 bits */
>             /* padding: 4 */
>             /* last cacheline: 40 bytes */
>     };
>
>
>
> Or ath11k_bus_params:
>
>     $ pahole ./drivers/net/wireless/ath/ath11k/ath11k.o -C ath11k_bus_params
>     struct ath11k_bus_params {
>             bool                       mhi_support;          /*     0     1 */
>             bool                       m3_fw_support;        /*     1     1 */
>             bool                       fixed_bdf_addr;       /*     2     1 */
>             bool                       fixed_mem_region;     /*     3     1 */
>             bool                       static_window_map;    /*     4     1 */
>     
>             /* size: 5, cachelines: 1, members: 5 */
>             /* last cacheline: 5 bytes */
>     };
>
> After changing it to an "u8 ....:1":
>
>     $ pahole ./drivers/net/wireless/ath/ath11k/ath11k.o -C ath11k_bus_params
>     struct ath11k_bus_params {
>             u8                         mhi_support:1;        /*     0: 0  1 */
>             u8                         m3_fw_support:1;      /*     0: 1  1 */
>             u8                         fixed_bdf_addr:1;     /*     0: 2  1 */
>             u8                         fixed_mem_region:1;   /*     0: 3  1 */
>             u8                         static_window_map:1;  /*     0: 4  1 */
>     
>             /* size: 1, cachelines: 1, members: 5 */
>             /* bit_padding: 3 bits */
>             /* last cacheline: 1 bytes */
>     };
>
>
> There are even better examples. ath11k_hw_params will for example take 
> currently 200 bytes. With a little reordering and adjusting of bools,
> it can easily be reduced to 152 bytes. But other structures might
> have more impact because they are used more often.

Yeah, I have been worried about this as well and we should fix this. But
instead of u8 I would prefer to use bool like mt76 uses:

struct mt76_queue_entry {
	union {
		void *buf;
		struct sk_buff *skb;
	};
	union {
		struct mt76_txwi_cache *txwi;
		struct urb *urb;
		int buf_sz;
	};
	u32 dma_addr[2];
	u16 dma_len[2];
	u16 wcid;
	bool skip_buf0:1;
	bool skip_buf1:1;
	bool done:1;
};

I didn't even know using bool is legal until I saw it in mt76.
Sven Eckelmann Nov. 17, 2021, 8:46 a.m. UTC | #3
On Wednesday, 17 November 2021 09:12:55 CET Kalle Valo wrote:
> > https://www.kernel.org/doc/html/v5.15/process/coding-style.html#using-bool
[...]
> 
> Yeah, I have been worried about this as well and we should fix this. But
> instead of u8 I would prefer to use bool like mt76 uses:
[...]
> I didn't even know using bool is legal until I saw it in mt76.

Interesting, I was also not aware of it. And it also seems to have some 
interesting implications when assigning values to it (example 4):

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>
    
    struct test {
    	uint8_t u:1;
    	uint8_t u2:1;
    	bool b:1;
    	bool b2:1;
    };
    
    int main(void)
    {
    	struct test x;
    
    	x.u = false;
    	x.b = false;
    	printf("u %u b %u\n", x.u, x.b);
    
    	x.u = true;
    	x.b = true;
    	printf("u %u b %u\n", x.u, x.b);
    
    	x.u = 0;
    	x.b = 0;
    	printf("u %u b %u\n", x.u, x.b);
    
    	x.u = 8;
    	x.b = 8;
    	printf("u %u b %u\n", x.u, x.b);
    
    	return 0;
    }


Result:

    u 0 b 0
    u 1 b 1
    u 0 b 0
    u 0 b 1


The last example is basically the reason we see stuff like

    boolean_like_value = !!(some_retrieved_value);

when using unsigned bitfields instead of bool (bitfields).


And the memory layout (on x86-64):

    $ pahole test.o                                    
    struct test {
            uint8_t                    u:1;                  /*     0: 0  1 */
            uint8_t                    u2:1;                 /*     0: 1  1 */
            _Bool                      b:1;                  /*     0: 2  1 */
            _Bool                      b2:1;                 /*     0: 3  1 */
    
            /* size: 1, cachelines: 1, members: 4 */
            /* bit_padding: 4 bits */
            /* last cacheline: 1 bytes */
    };


To my surprise, it was already mentioned in one of the discussions [1].
Was there anything in the discussion which I might have missed and 
is a good reason to not use "bool ...:1" in structs?

Kind regards,
	Sven

[1] https://lore.kernel.org/all/CA+55aFwVZk1OfB9T2v014PTAKFhtVan_Zj2dOjnCy3x6E4UJfA@mail.gmail.com/T/#u
Kalle Valo Nov. 19, 2021, 3:17 p.m. UTC | #4
Sven Eckelmann <sven@narfation.org> writes:

> On Wednesday, 17 November 2021 09:12:55 CET Kalle Valo wrote:
>> > https://www.kernel.org/doc/html/v5.15/process/coding-style.html#using-bool
> [...]
>> 
>> Yeah, I have been worried about this as well and we should fix this. But
>> instead of u8 I would prefer to use bool like mt76 uses:
> [...]
>> I didn't even know using bool is legal until I saw it in mt76.
>
> Interesting, I was also not aware of it. And it also seems to have some 
> interesting implications when assigning values to it (example 4):
>
>     #include <stdbool.h>
>     #include <stdint.h>
>     #include <stdio.h>
>     
>     struct test {
>     	uint8_t u:1;
>     	uint8_t u2:1;
>     	bool b:1;
>     	bool b2:1;
>     };
>     
>     int main(void)
>     {
>     	struct test x;
>     
>     	x.u = false;
>     	x.b = false;
>     	printf("u %u b %u\n", x.u, x.b);
>     
>     	x.u = true;
>     	x.b = true;
>     	printf("u %u b %u\n", x.u, x.b);
>     
>     	x.u = 0;
>     	x.b = 0;
>     	printf("u %u b %u\n", x.u, x.b);
>     
>     	x.u = 8;
>     	x.b = 8;
>     	printf("u %u b %u\n", x.u, x.b);
>     
>     	return 0;
>     }
>
>
> Result:
>
>     u 0 b 0
>     u 1 b 1
>     u 0 b 0
>     u 0 b 1
>
>
> The last example is basically the reason we see stuff like
>
>     boolean_like_value = !!(some_retrieved_value);
>
> when using unsigned bitfields instead of bool (bitfields).
>
>
> And the memory layout (on x86-64):
>
>     $ pahole test.o                                    
>     struct test {
>             uint8_t                    u:1;                  /*     0: 0  1 */
>             uint8_t                    u2:1;                 /*     0: 1  1 */
>             _Bool                      b:1;                  /*     0: 2  1 */
>             _Bool                      b2:1;                 /*     0: 3  1 */
>     
>             /* size: 1, cachelines: 1, members: 4 */
>             /* bit_padding: 4 bits */
>             /* last cacheline: 1 bytes */
>     };
>
>
> To my surprise, it was already mentioned in one of the discussions [1].
> Was there anything in the discussion which I might have missed and 
> is a good reason to not use "bool ...:1" in structs?

No responses so must be safe to use ;) Changing the subject to gain more
visibility. But IMHO we should just switch using boolean bitfields as I
would expect mt76 to have noticed any problems by now.
diff mbox series

Patch

diff --git a/drivers/net/wireless/ath/ath11k/core.c b/drivers/net/wireless/ath/ath11k/core.c
index 969bf1a590d9..be788ec08200 100644
--- a/drivers/net/wireless/ath/ath11k/core.c
+++ b/drivers/net/wireless/ath/ath11k/core.c
@@ -1043,6 +1043,65 @@  static void ath11k_core_restart(struct work_struct *work)
 	complete(&ab->driver_recovery);
 }
 
+static void ath11k_core_reset(struct work_struct *work)
+{
+	struct ath11k_base *ab = container_of(work, struct ath11k_base, reset_work);
+	int reset_count, fail_cont_count;
+	long time_left;
+
+	if (!(test_bit(ATH11K_FLAG_REGISTERED, &ab->dev_flags))) {
+		ath11k_warn(ab, "ignore reset dev flags 0x%lx\n", ab->dev_flags);
+		return;
+	}
+
+	/* Sometimes the recovery will fail and then the next all recovery fail,
+	 * this is to avoid infinite recovery since it can not recovery success.
+	 */
+	fail_cont_count = atomic_read(&ab->fail_cont_count);
+
+	if (fail_cont_count >= ATH11K_RESET_MAX_FAIL_COUNT_FINAL)
+		return;
+
+	if (fail_cont_count >= ATH11K_RESET_MAX_FAIL_COUNT_FIRST &&
+	    time_before(jiffies, ab->reset_fail_timeout))
+		return;
+
+	reset_count = atomic_inc_return(&ab->reset_count);
+
+	if (reset_count > 1) {
+		/* Sometimes it happened another reset worker before the previous one
+		 * completed, then the second reset worker will destroy the previous one,
+		 * thus below is to avoid that.
+		 */
+		ath11k_warn(ab, "already reseting count %d\n", reset_count);
+
+		reinit_completion(&ab->reset_complete);
+		time_left = wait_for_completion_timeout(&ab->reset_complete,
+							ATH11K_RESET_TIMEOUT_HZ);
+
+		if (time_left) {
+			ath11k_dbg(ab, ATH11K_DBG_BOOT, "to skip reset\n");
+			atomic_dec(&ab->reset_count);
+			return;
+		}
+
+		ab->reset_fail_timeout = jiffies + ATH11K_RESET_FAIL_TIMEOUT_HZ;
+		/* Record the continuous recovery fail count when recovery failed*/
+		fail_cont_count = atomic_inc_return(&ab->fail_cont_count);
+	}
+
+	ath11k_dbg(ab, ATH11K_DBG_BOOT, "reset starting\n");
+
+	ab->is_reset = true;
+	atomic_set(&ab->recovery_count, 0);
+
+	ath11k_hif_power_down(ab);
+	ath11k_qmi_free_resource(ab);
+	ath11k_hif_power_up(ab);
+
+	ath11k_dbg(ab, ATH11K_DBG_BOOT, "reset started\n");
+}
+
 static int ath11k_init_hw_params(struct ath11k_base *ab)
 {
 	const struct ath11k_hw_params *hw_params = NULL;
@@ -1132,14 +1191,20 @@  struct ath11k_base *ath11k_core_alloc(struct device *dev, size_t priv_size,
 	if (!ab->workqueue)
 		goto err_sc_free;
 
+	ab->workqueue_aux = create_singlethread_workqueue("ath11k_aux_wq");
+	if (!ab->workqueue_aux)
+		goto err_free_wq;
+
 	mutex_init(&ab->core_lock);
 	spin_lock_init(&ab->base_lock);
+	init_completion(&ab->reset_complete);
 
 	INIT_LIST_HEAD(&ab->peers);
 	init_waitqueue_head(&ab->peer_mapping_wq);
 	init_waitqueue_head(&ab->wmi_ab.tx_credits_wq);
 	init_waitqueue_head(&ab->qmi.cold_boot_waitq);
 	INIT_WORK(&ab->restart_work, ath11k_core_restart);
+	INIT_WORK(&ab->reset_work, ath11k_core_reset);
 	timer_setup(&ab->rx_replenish_retry, ath11k_ce_rx_replenish_retry, 0);
 	init_completion(&ab->htc_suspend);
 	init_completion(&ab->wow.wakeup_completed);
@@ -1150,6 +1215,8 @@  struct ath11k_base *ath11k_core_alloc(struct device *dev, size_t priv_size,
 
 	return ab;
 
+err_free_wq:
+	destroy_workqueue(ab->workqueue);
 err_sc_free:
 	kfree(ab);
 	return NULL;
diff --git a/drivers/net/wireless/ath/ath11k/core.h b/drivers/net/wireless/ath/ath11k/core.h
index 018fb2385f2a..9a9f8f24d407 100644
--- a/drivers/net/wireless/ath/ath11k/core.h
+++ b/drivers/net/wireless/ath/ath11k/core.h
@@ -39,6 +39,10 @@ 
 extern unsigned int ath11k_frame_mode;
 
 #define ATH11K_MON_TIMER_INTERVAL  10
+#define ATH11K_RESET_TIMEOUT_HZ (20 * HZ)
+#define ATH11K_RESET_MAX_FAIL_COUNT_FIRST 3
+#define ATH11K_RESET_MAX_FAIL_COUNT_FINAL 5
+#define ATH11K_RESET_FAIL_TIMEOUT_HZ (20 * HZ)
 
 enum ath11k_supported_bw {
 	ATH11K_BW_20	= 0,
@@ -734,6 +738,15 @@  struct ath11k_base {
 	struct completion driver_recovery;
 	struct workqueue_struct *workqueue;
 	struct work_struct restart_work;
+	struct workqueue_struct *workqueue_aux;
+	struct work_struct reset_work;
+	atomic_t reset_count;
+	atomic_t recovery_count;
+	bool is_reset;
+	struct completion reset_complete;
+	/* continuous recovery fail count */
+	atomic_t fail_cont_count;
+	unsigned long reset_fail_timeout;
 	struct {
 		/* protected by data_lock */
 		u32 fw_crash_counter;
diff --git a/drivers/net/wireless/ath/ath11k/mac.c b/drivers/net/wireless/ath/ath11k/mac.c
index eb52332dbe3f..b0a2f257f328 100644
--- a/drivers/net/wireless/ath/ath11k/mac.c
+++ b/drivers/net/wireless/ath/ath11k/mac.c
@@ -6032,6 +6032,8 @@  ath11k_mac_op_reconfig_complete(struct ieee80211_hw *hw,
 				enum ieee80211_reconfig_type reconfig_type)
 {
 	struct ath11k *ar = hw->priv;
+	struct ath11k_base *ab = ar->ab;
+	int recovery_count;
 
 	if (reconfig_type != IEEE80211_RECONFIG_TYPE_RESTART)
 		return;
@@ -6043,6 +6045,22 @@  ath11k_mac_op_reconfig_complete(struct ieee80211_hw *hw,
 			    ar->pdev->pdev_id);
 		ar->state = ATH11K_STATE_ON;
 		ieee80211_wake_queues(ar->hw);
+
+		if (ab->is_reset) {
+			recovery_count = atomic_inc_return(&ab->recovery_count);
+			ath11k_dbg(ab, ATH11K_DBG_BOOT,
+				   "recovery count %d\n", recovery_count);
+			/* When there are multiple radios in an SOC,
+			 * the recovery has to be done for each radio
+			 */
+			if (recovery_count == ab->num_radios) {
+				atomic_dec(&ab->reset_count);
+				complete(&ab->reset_complete);
+				ab->is_reset = false;
+				atomic_set(&ab->fail_cont_count, 0);
+				ath11k_dbg(ab, ATH11K_DBG_BOOT, "reset success\n");
+			}
+		}
 	}
 
 	mutex_unlock(&ar->conf_mutex);
diff --git a/drivers/net/wireless/ath/ath11k/mhi.c b/drivers/net/wireless/ath/ath11k/mhi.c
index 75cc2d80fde8..aea21ea2ca47 100644
--- a/drivers/net/wireless/ath/ath11k/mhi.c
+++ b/drivers/net/wireless/ath/ath11k/mhi.c
@@ -281,15 +281,48 @@  static void ath11k_mhi_op_runtime_put(struct mhi_controller *mhi_cntrl)
 {
 }
 
+static char *ath11k_mhi_op_callback_to_str(enum mhi_callback reason)
+{
+	switch (reason) {
+	case MHI_CB_IDLE:
+		return "MHI_CB_IDLE";
+	case MHI_CB_PENDING_DATA:
+		return "MHI_CB_PENDING_DATA";
+	case MHI_CB_LPM_ENTER:
+		return "MHI_CB_LPM_ENTER";
+	case MHI_CB_LPM_EXIT:
+		return "MHI_CB_LPM_EXIT";
+	case MHI_CB_EE_RDDM:
+		return "MHI_CB_EE_RDDM";
+	case MHI_CB_EE_MISSION_MODE:
+		return "MHI_CB_EE_MISSION_MODE";
+	case MHI_CB_SYS_ERROR:
+		return "MHI_CB_SYS_ERROR";
+	case MHI_CB_FATAL_ERROR:
+		return "MHI_CB_FATAL_ERROR";
+	case MHI_CB_BW_REQ:
+		return "MHI_CB_BW_REQ";
+	default:
+		return "UNKNOWN";
+	}
+};
+
 static void ath11k_mhi_op_status_cb(struct mhi_controller *mhi_cntrl,
 				    enum mhi_callback cb)
 {
 	struct ath11k_base *ab = dev_get_drvdata(mhi_cntrl->cntrl_dev);
 
+	ath11k_dbg(ab, ATH11K_DBG_BOOT, "mhi notify status reason %s\n",
+		   ath11k_mhi_op_callback_to_str(cb));
+
 	switch (cb) {
 	case MHI_CB_SYS_ERROR:
 		ath11k_warn(ab, "firmware crashed: MHI_CB_SYS_ERROR\n");
 		break;
+	case MHI_CB_EE_RDDM:
+		if (!(test_bit(ATH11K_FLAG_UNREGISTERING, &ab->dev_flags)))
+			queue_work(ab->workqueue_aux, &ab->reset_work);
+		break;
 	default:
 		break;
 	}
diff --git a/drivers/net/wireless/ath/ath11k/pci.c b/drivers/net/wireless/ath/ath11k/pci.c
index 646ad79f309c..1f8e3837cdfb 100644
--- a/drivers/net/wireless/ath/ath11k/pci.c
+++ b/drivers/net/wireless/ath/ath11k/pci.c
@@ -1346,6 +1346,7 @@  static void ath11k_pci_remove(struct pci_dev *pdev)
 
 	set_bit(ATH11K_FLAG_UNREGISTERING, &ab->dev_flags);
 
+	cancel_work_sync(&ab->reset_work);
 	ath11k_core_deinit(ab);
 
 qmi_fail:
@@ -1357,6 +1358,8 @@  static void ath11k_pci_remove(struct pci_dev *pdev)
 
 	ath11k_hal_srng_deinit(ab);
 	ath11k_ce_free_pipes(ab);
+
+	destroy_workqueue(ab->workqueue_aux);
 	ath11k_core_free(ab);
 }