
mwifiex: cancel pcie/sdio work in remove/shutdown handler

Message ID: 1513164473-13827-1-git-send-email-huxm@marvell.com
State: Accepted
Commit: b713bbf1471b56b572ce26bd02b81a85c2b007f4
Delegated to: Kalle Valo

Commit Message

Xinming Hu Dec. 13, 2017, 11:27 a.m. UTC
The last command, used to shut down the firmware, might time out
and trigger a firmware dump in the asynchronous pcie/sdio work.

The remove/shutdown handler then continues to free the core data
structures (private/adapter), which the pcie/sdio work might still
dereference, eventually crashing the kernel.

Synchronously cancelling the pcie/sdio work could fix the above
corner case. This way, a timeout of the last command can
be handled properly.

Signed-off-by: Xinming Hu <huxm@marvell.com>
---
 drivers/net/wireless/marvell/mwifiex/pcie.c | 2 ++
 drivers/net/wireless/marvell/mwifiex/sdio.c | 2 ++
 2 files changed, 4 insertions(+)
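
The ordering the commit message relies on (wait for the asynchronous
work before freeing what it dereferences) can be illustrated with a
minimal userspace sketch. This is not the driver code: it assumes
POSIX threads, and struct card, fw_dump_work(), queue_fw_dump(),
sync_work(), remove_card() and demo_remove() are hypothetical
stand-ins. pthread_join() only models the "wait for running work"
half of cancel_work_sync(); the kernel call also cancels work that
is still pending.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the driver's card structure. */
struct card {
	pthread_t work;     /* plays the role of card->work */
	bool work_queued;   /* true while the work may still run */
	int dump_count;     /* state touched by the async work */
};

/* Async work: dereferences the card, so the card must still exist. */
static void *fw_dump_work(void *arg)
{
	struct card *card = arg;

	card->dump_count++;
	return NULL;
}

/* Roughly what a command timeout does: kick off the async work. */
static int queue_fw_dump(struct card *card)
{
	if (pthread_create(&card->work, NULL, fw_dump_work, card) != 0)
		return -1;
	card->work_queued = true;
	return 0;
}

/* Wait until the work can no longer touch the card. */
static void sync_work(struct card *card)
{
	if (card->work_queued) {
		pthread_join(card->work, NULL);
		card->work_queued = false;
	}
}

/* Remove path: synchronize first, free afterwards. */
static int remove_card(struct card *card)
{
	int dumps;

	sync_work(card);
	assert(!card->work_queued);
	dumps = card->dump_count;
	free(card);
	return dumps;
}

/* Returns the number of dumps that completed before the free, or -1. */
static int demo_remove(void)
{
	struct card *card = calloc(1, sizeof(*card));

	if (!card)
		return -1;
	if (queue_fw_dump(card) != 0) {
		free(card);
		return -1;
	}
	return remove_card(card);
}
```

Calling free() before sync_work() in remove_card() would recreate the
use-after-free described above, because fw_dump_work() could still be
running against freed memory.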

Comments

Kalle Valo Jan. 8, 2018, 5:38 p.m. UTC | #1
Xinming Hu <huxm@marvell.com> wrote:

> The last command used to shutdown firmware might be timeout,
> and trigger firmware dump in asynchronous pcie/sdio work.
> 
> The remove/shutdown handler will continue free core data
> structure private/adapter, which might be dereferenced in
> pcie/sdio work, finally crash the kernel.
> 
> Sync and Cancel pcie/sdio work, could be a fix for above
> cornel case. In this way, the last command timeout could
> be handled properly.
> 
> Signed-off-by: Xinming Hu <huxm@marvell.com>

Patch applied to wireless-drivers-next.git, thanks.

b713bbf1471b mwifiex: cancel pcie/sdio work in remove/shutdown handler
Brian Norris Jan. 8, 2018, 6:11 p.m. UTC | #2
Hi,

On Wed, Dec 13, 2017 at 07:27:53PM +0800, Xinming Hu wrote:
> The last command used to shutdown firmware might be timeout,
> and trigger firmware dump in asynchronous pcie/sdio work.
> 
> The remove/shutdown handler will continue free core data
> structure private/adapter, which might be dereferenced in
> pcie/sdio work, finally crash the kernel.
> 
> Sync and Cancel pcie/sdio work, could be a fix for above
> cornel case. In this way, the last command timeout could

s/cornel/corner/

> be handled properly.
> 
> Signed-off-by: Xinming Hu <huxm@marvell.com>
> ---
>  drivers/net/wireless/marvell/mwifiex/pcie.c | 2 ++
>  drivers/net/wireless/marvell/mwifiex/sdio.c | 2 ++
>  2 files changed, 4 insertions(+)
> 
> diff --git a/drivers/net/wireless/marvell/mwifiex/pcie.c b/drivers/net/wireless/marvell/mwifiex/pcie.c
> index f666cb2..23209c5 100644
> --- a/drivers/net/wireless/marvell/mwifiex/pcie.c
> +++ b/drivers/net/wireless/marvell/mwifiex/pcie.c
> @@ -310,6 +310,8 @@ static void mwifiex_pcie_remove(struct pci_dev *pdev)
>  		mwifiex_init_shutdown_fw(priv, MWIFIEX_FUNC_SHUTDOWN);
>  	}
>  
> +	cancel_work_sync(&card->work);
> +

Just FYI, this "fix" is not a real fix. It will likely paper over some
of your bugs (where, e.g., the FW shutdown command times out in the
previous couple of lines), but this highlights the fact that there are
other races that could trigger the same behavior. You're not fixing
those.

For example, what if somebody initiates a scan or other nl80211 command
between the above line and mwifiex_remove_card()? That command could
potentially time out too.

The proper fix would be to institute some kind of mutual exclusion
(locking) between mwifiex_shutdown_sw() and mwifiex_remove_card(), so
that they can't occur at the same time.

Unfortunately, I only paid attention to this after Kalle already applied
this patch. Personally, I'd prefer this patch not get applied, since
it's a bad solution to an obvious problem, which instead leaves a subtle
problem that perhaps no one will bother fixing.

Brian

>  	mwifiex_remove_card(adapter);
>  }
>  
> diff --git a/drivers/net/wireless/marvell/mwifiex/sdio.c b/drivers/net/wireless/marvell/mwifiex/sdio.c
> index a828801..2488587 100644
> --- a/drivers/net/wireless/marvell/mwifiex/sdio.c
> +++ b/drivers/net/wireless/marvell/mwifiex/sdio.c
> @@ -399,6 +399,8 @@ static int mwifiex_check_winner_status(struct mwifiex_adapter *adapter)
>  		mwifiex_init_shutdown_fw(priv, MWIFIEX_FUNC_SHUTDOWN);
>  	}
>  
> +	cancel_work_sync(&card->work);
> +
>  	mwifiex_remove_card(adapter);
>  }
>  
> -- 
> 1.9.1
>
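
The mutual exclusion suggested in this review could look roughly like
the following userspace sketch. It is only an illustration under
stated assumptions: POSIX threads stand in for kernel contexts, and
struct adapter, its lock and removed fields, shutdown_sw(),
remove_card() and demo_race() are hypothetical names, not the mwifiex
implementation. In the real driver the lock would have to live in an
object that outlives the adapter, since mwifiex_remove_card() frees it.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical adapter state shared by both teardown paths. */
struct adapter {
	pthread_mutex_t lock;   /* serializes shutdown_sw/remove_card */
	bool removed;           /* set once the card has been removed */
	int shutdowns_done;     /* shutdowns that ran on a live adapter */
	int shutdowns_skipped;  /* shutdowns that saw removed == true */
};

/* Stand-in for the software shutdown path. */
static void shutdown_sw(struct adapter *a)
{
	pthread_mutex_lock(&a->lock);
	if (a->removed)
		a->shutdowns_skipped++;  /* too late: touch nothing else */
	else
		a->shutdowns_done++;     /* safe: remove cannot run now */
	pthread_mutex_unlock(&a->lock);
}

/* Stand-in for the remove path. */
static void remove_card(struct adapter *a)
{
	pthread_mutex_lock(&a->lock);
	a->removed = true;
	pthread_mutex_unlock(&a->lock);
}

static void *shutdown_thread(void *arg)
{
	shutdown_sw(arg);
	return NULL;
}

static void *remove_thread(void *arg)
{
	remove_card(arg);
	return NULL;
}

/*
 * Runs both paths concurrently. Returns done + skipped, which is 1
 * whichever thread wins the lock, or -1 if a thread cannot be created.
 */
static int demo_race(void)
{
	struct adapter a = { .removed = false };
	pthread_t t1, t2;
	int total;

	pthread_mutex_init(&a.lock, NULL);
	if (pthread_create(&t1, NULL, shutdown_thread, &a) != 0) {
		pthread_mutex_destroy(&a.lock);
		return -1;
	}
	if (pthread_create(&t2, NULL, remove_thread, &a) != 0) {
		pthread_join(t1, NULL);
		pthread_mutex_destroy(&a.lock);
		return -1;
	}
	pthread_join(t1, NULL);
	pthread_join(t2, NULL);
	total = a.shutdowns_done + a.shutdowns_skipped;
	assert(a.removed);
	pthread_mutex_destroy(&a.lock);
	return total;
}
```

Whichever path takes the lock first, the shutdown either completes
fully before removal or observes removed and backs off; the two never
interleave.
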
Kalle Valo Jan. 9, 2018, 7:39 a.m. UTC | #3
Brian Norris <briannorris@chromium.org> writes:

>> --- a/drivers/net/wireless/marvell/mwifiex/pcie.c
>> +++ b/drivers/net/wireless/marvell/mwifiex/pcie.c
>> @@ -310,6 +310,8 @@ static void mwifiex_pcie_remove(struct pci_dev *pdev)
>>  		mwifiex_init_shutdown_fw(priv, MWIFIEX_FUNC_SHUTDOWN);
>>  	}
>>  
>> +	cancel_work_sync(&card->work);
>> +
>
> Just FYI, this "fix" is not a real fix. It will likely paper over some
> of your bugs (where, e.g., the FW shutdown command times out in the
> previous couple of lines), but this highlights the fact that there are
> other races that could trigger the same behavior. You're not fixing
> those.
>
> For example, what if somebody initiates a scan or other nl80211 command
> between the above line and mwifiex_remove_card()? That command could
> potentially time out too.
>
> The proper fix would be to institute some kind of mutual exclusion
> (locking) between mwifiex_shutdown_sw() and mwifiex_remove_card(), so
> that they can't occur at the same time.
>
> Unfortunately, I only paid attention to this after Kalle already applied
> this patch. Personally, I'd prefer this patch not get applied, since
> it's a bad solution to an obvious problem, which instead leaves a subtle
> problem that perhaps no one will bother fixing.

I can revert it, that's not a problem. Can I use the text below as
explanation for the revert?

----------------------------------------------------------------------
Brian Norris <briannorris@chromium.org> says:

Just FYI, this "fix" is not a real fix. It will likely paper over some
of your bugs (where, e.g., the FW shutdown command times out in the
previous couple of lines), but this highlights the fact that there are
other races that could trigger the same behavior. You're not fixing
those.

For example, what if somebody initiates a scan or other nl80211 command
between the above line and mwifiex_remove_card()? That command could
potentially time out too.

The proper fix would be to institute some kind of mutual exclusion
(locking) between mwifiex_shutdown_sw() and mwifiex_remove_card(), so
that they can't occur at the same time.

----------------------------------------------------------------------

Patch

diff --git a/drivers/net/wireless/marvell/mwifiex/pcie.c b/drivers/net/wireless/marvell/mwifiex/pcie.c
index f666cb2..23209c5 100644
--- a/drivers/net/wireless/marvell/mwifiex/pcie.c
+++ b/drivers/net/wireless/marvell/mwifiex/pcie.c
@@ -310,6 +310,8 @@  static void mwifiex_pcie_remove(struct pci_dev *pdev)
 		mwifiex_init_shutdown_fw(priv, MWIFIEX_FUNC_SHUTDOWN);
 	}
 
+	cancel_work_sync(&card->work);
+
 	mwifiex_remove_card(adapter);
 }
 
diff --git a/drivers/net/wireless/marvell/mwifiex/sdio.c b/drivers/net/wireless/marvell/mwifiex/sdio.c
index a828801..2488587 100644
--- a/drivers/net/wireless/marvell/mwifiex/sdio.c
+++ b/drivers/net/wireless/marvell/mwifiex/sdio.c
@@ -399,6 +399,8 @@  static int mwifiex_check_winner_status(struct mwifiex_adapter *adapter)
 		mwifiex_init_shutdown_fw(priv, MWIFIEX_FUNC_SHUTDOWN);
 	}
 
+	cancel_work_sync(&card->work);
+
 	mwifiex_remove_card(adapter);
 }