Message ID | 1487942964-3193-1-git-send-email-akarwar@marvell.com (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Kalle Valo |
Hi Amit,

You managed to CC several other Google folks, but not me, the one who
actually reviewed most of this! You might consider including me in the
future :)

On Fri, Feb 24, 2017 at 06:59:24PM +0530, Amitkumar Karwar wrote:
> We observed a SHUTDOWN command timeout during reboot stress test due
> to a corner case firmware bug. It leads to use-after-free on adapter
> structure pointer and crash.
>
> We already have a cancel_work_sync() call in teardown thread. This
> issue is fixed by having this call just before mwifiex_remove_card().
> At this point no further work will be scheduled.
>
> Signed-off-by: Amitkumar Karwar <akarwar@marvell.com>
> Signed-off-by: Cathy Luo <cluo@marvell.com>

I'm testing this artificially by testing things like this concurrently:

  rmmod mwifiex_pcie &
  cat /sys/kernel/debug/mwifiex/mlan0/device_dump

I'm using a 4.4-based kernel (plus quite a few backports) at the moment
and I'm having problems (I can retest on upstream if really needed), and
pretty sure this patch is buggy.

> ---
>  drivers/net/wireless/marvell/mwifiex/pcie.c | 3 +--
>  drivers/net/wireless/marvell/mwifiex/sdio.c | 3 +--
>  2 files changed, 2 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/net/wireless/marvell/mwifiex/pcie.c b/drivers/net/wireless/marvell/mwifiex/pcie.c
> index a0d9180..f31c5ea 100644
> --- a/drivers/net/wireless/marvell/mwifiex/pcie.c
> +++ b/drivers/net/wireless/marvell/mwifiex/pcie.c
> @@ -294,8 +294,6 @@ static void mwifiex_pcie_remove(struct pci_dev *pdev)
>  	if (!adapter || !adapter->priv_num)
>  		return;
>
> -	cancel_work_sync(&card->work);
> -
>  	reg = card->pcie.reg;
>  	if (reg)
>  		ret = mwifiex_read_reg(adapter, reg->fw_status, &fw_status);
> @@ -312,6 +310,7 @@ static void mwifiex_pcie_remove(struct pci_dev *pdev)
>  		mwifiex_init_shutdown_fw(priv, MWIFIEX_FUNC_SHUTDOWN);
>  	}
>
> +	cancel_work_sync(&card->work);

I don't think we want to move the cancellation to be this far; see the
mwifiex_init_shutdown_fw() above! If I add a msleep(3000) below this,
then run:

  rmmod mwifiex_pcie & sleep 0.5; cat /sys/kernel/debug/mwifiex/mlan0/device_dump

I can trigger an abort in mwifiex_pcie_rdwr_firmware(). The problem is
that you still allow a command timeout + firmware dump worker to still
race with the shutdown -- in this case, I think it's
mwifiex_init_shutdown_fw() that's disabling the device.

I think the real solution is to, somewhere before we shutdown the
firmware, *really* prevent any further work to be scheduled to
&card->work. Maybe that means adding another flag so that the worker
will just abort quickly in that case? So it's something like:

	card->worker_flags |= DONT_RUN_ANY_MORE;
	cancel_work_sync(&card->work);

	... (this can be done either above the FIRMWARE_READY_PCIE
	check, or else you need to write a different version for
	FIRMWARE_READY_PCIE vs. !FIRMWARE_READY_PCIE) ... but definitely
	before mwifiex_init_shutdown_fw() ) ...

And in mwifiex_pcie_work():

	if (card->worker_flags & DONT_RUN_ANY_MORE)
		return;

IOW, NAK to this patch.

Brian

>  	mwifiex_remove_card(adapter);
>  }
>
> diff --git a/drivers/net/wireless/marvell/mwifiex/sdio.c b/drivers/net/wireless/marvell/mwifiex/sdio.c
> index a4b356d..9534b47 100644
> --- a/drivers/net/wireless/marvell/mwifiex/sdio.c
> +++ b/drivers/net/wireless/marvell/mwifiex/sdio.c
> @@ -387,8 +387,6 @@ static int mwifiex_check_winner_status(struct mwifiex_adapter *adapter)
>  	if (!adapter || !adapter->priv_num)
>  		return;
>
> -	cancel_work_sync(&card->work);
> -
>  	mwifiex_dbg(adapter, INFO, "info: SDIO func num=%d\n", func->num);
>
>  	ret = mwifiex_sdio_read_fw_status(adapter, &firmware_stat);
> @@ -400,6 +398,7 @@ static int mwifiex_check_winner_status(struct mwifiex_adapter *adapter)
>  		mwifiex_init_shutdown_fw(priv, MWIFIEX_FUNC_SHUTDOWN);
>  	}
>
> +	cancel_work_sync(&card->work);
>  	mwifiex_remove_card(adapter);
>  }
>
> --
> 1.9.1
>
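
The ordering argument in the review above can be checked with a small single-threaded userspace model. This is only a sketch, not driver code: `struct card_model`, the `model_*` helpers and `simulate_remove()` are hypothetical names invented for illustration, and the bit value chosen for `DONT_RUN_ANY_MORE` is an assumption. The model treats the shutdown command as always timing out and queueing a firmware dump, which is the corner case described in the thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Flag name taken from the review; the bit value is an assumption. */
#define DONT_RUN_ANY_MORE (1UL << 0)

/* Toy stand-in for the driver's card state; not the real structure. */
struct card_model {
	unsigned long worker_flags;
	bool work_pending;	/* models a queued &card->work item */
	bool device_enabled;	/* false once the firmware is shut down */
	int bad_accesses;	/* worker touched a disabled device */
};

/* Models mwifiex_pcie_work(): bail out early once teardown has begun. */
static void model_worker(struct card_model *c)
{
	c->work_pending = false;
	if (c->worker_flags & DONT_RUN_ANY_MORE)
		return;
	if (!c->device_enabled)
		c->bad_accesses++;	/* stands in for the reported abort */
}

/* Models cancel_work_sync(): drop whatever is queued right now. */
static void model_cancel_work_sync(struct card_model *c)
{
	c->work_pending = false;
}

/*
 * Models a SHUTDOWN command that times out: the timeout path queues a
 * firmware dump, and the device ends up disabled.
 */
static void model_shutdown_fw(struct card_model *c)
{
	c->work_pending = true;
	c->device_enabled = false;
}

/* Runs one teardown; returns how many bad device accesses occurred. */
int simulate_remove(bool flag_first)
{
	struct card_model c = { .device_enabled = true };

	if (flag_first) {
		/* Ordering proposed in the review: flag, then cancel. */
		c.worker_flags |= DONT_RUN_ANY_MORE;
		model_cancel_work_sync(&c);
	}

	model_shutdown_fw(&c);

	/* Race window: queued work may run before any later cancel. */
	if (c.work_pending)
		model_worker(&c);

	if (!flag_first)
		model_cancel_work_sync(&c);	/* ordering in the patch */

	assert(!c.work_pending);
	return c.bad_accesses;
}
```

Calling `simulate_remove(false)` models the ordering in the patch (cancel only after the shutdown), and `simulate_remove(true)` models setting the flag and cancelling before the shutdown. How the flag would be set and tested atomically in the real driver is not specified in the thread.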
On Tue, Mar 14, 2017 at 11:33 AM, Brian Norris <briannorris@chromium.org> wrote:
>
> Hi Amit,
>
> You managed to CC several other Google folks, but not me, the one who
> actually reviewed most of this! You might consider including me in the
> future :)

Oops, in fact you did CC me... my bad. But I usually use my
@chromium.org for this stuff.
> From: Brian Norris [mailto:briannorris@chromium.org]
> Sent: Wednesday, March 15, 2017 12:03 AM
> To: Amitkumar Karwar
> Cc: linux-wireless@vger.kernel.org; Cathy Luo; Nishant Sarmukadam;
> rajatja@google.com; dmitry.torokhov@gmail.com
> Subject: [EXT] Re: [PATCH] mwifiex: fix kernel crash after shutdown
> command timeout
>
> On Fri, Feb 24, 2017 at 06:59:24PM +0530, Amitkumar Karwar wrote:
> > We observed a SHUTDOWN command timeout during reboot stress test due
> > to a corner case firmware bug. It leads to use-after-free on adapter
> > structure pointer and crash.
> >
> > We already have a cancel_work_sync() call in teardown thread. This
> > issue is fixed by having this call just before mwifiex_remove_card().
> > At this point no further work will be scheduled.
> >
> > Signed-off-by: Amitkumar Karwar <akarwar@marvell.com>
> > Signed-off-by: Cathy Luo <cluo@marvell.com>
>
> I'm testing this artificially by testing things like this concurrently:
>
>   rmmod mwifiex_pcie &
>   cat /sys/kernel/debug/mwifiex/mlan0/device_dump
>
> I'm using a 4.4-based kernel (plus quite a few backports) at the moment
> and I'm having problems (I can retest on upstream if really needed),
> and pretty sure this patch is buggy.
>
> > ---
> >  drivers/net/wireless/marvell/mwifiex/pcie.c | 3 +--
> >  drivers/net/wireless/marvell/mwifiex/sdio.c | 3 +--
> >  2 files changed, 2 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/net/wireless/marvell/mwifiex/pcie.c b/drivers/net/wireless/marvell/mwifiex/pcie.c
> > index a0d9180..f31c5ea 100644
> > --- a/drivers/net/wireless/marvell/mwifiex/pcie.c
> > +++ b/drivers/net/wireless/marvell/mwifiex/pcie.c
> > @@ -294,8 +294,6 @@ static void mwifiex_pcie_remove(struct pci_dev *pdev)
> >  	if (!adapter || !adapter->priv_num)
> >  		return;
> >
> > -	cancel_work_sync(&card->work);
> > -
> >  	reg = card->pcie.reg;
> >  	if (reg)
> >  		ret = mwifiex_read_reg(adapter, reg->fw_status, &fw_status);
> > @@ -312,6 +310,7 @@ static void mwifiex_pcie_remove(struct pci_dev *pdev)
> >  		mwifiex_init_shutdown_fw(priv, MWIFIEX_FUNC_SHUTDOWN);
> >  	}
> >
> > +	cancel_work_sync(&card->work);
>
> I don't think we want to move the cancellation to be this far; see the
> mwifiex_init_shutdown_fw() above! If I add a msleep(3000) below this,
> then run:
>
>   rmmod mwifiex_pcie & sleep 0.5; cat /sys/kernel/debug/mwifiex/mlan0/device_dump
>
> I can trigger an abort in mwifiex_pcie_rdwr_firmware(). The problem is
> that you still allow a command timeout + firmware dump worker to still
> race with the shutdown -- in this case, I think it's
> mwifiex_init_shutdown_fw() that's disabling the device.
>
> I think the real solution is to, somewhere before we shutdown the
> firmware, *really* prevent any further work to be scheduled to
> &card->work. Maybe that means adding another flag so that the worker
> will just abort quickly in that case? So it's something like:
>
> 	card->worker_flags |= DONT_RUN_ANY_MORE;
> 	cancel_work_sync(&card->work);
>
> 	... (this can be done either above the FIRMWARE_READY_PCIE
> 	check, or else you need to write a different version for
> 	FIRMWARE_READY_PCIE vs. !FIRMWARE_READY_PCIE) ... but definitely
> 	before mwifiex_init_shutdown_fw() ) ...
>
> And in mwifiex_pcie_work():
>
> 	if (card->worker_flags & DONT_RUN_ANY_MORE)
> 		return;
>

Thanks for the review. You are right. This can be cleanly fixed with an
extra worker flag (DONT_RUN_ANY_MORE). I will submit an updated version
with this approach.

Regards,
Amitkumar
diff --git a/drivers/net/wireless/marvell/mwifiex/pcie.c b/drivers/net/wireless/marvell/mwifiex/pcie.c
index a0d9180..f31c5ea 100644
--- a/drivers/net/wireless/marvell/mwifiex/pcie.c
+++ b/drivers/net/wireless/marvell/mwifiex/pcie.c
@@ -294,8 +294,6 @@ static void mwifiex_pcie_remove(struct pci_dev *pdev)
 	if (!adapter || !adapter->priv_num)
 		return;
 
-	cancel_work_sync(&card->work);
-
 	reg = card->pcie.reg;
 	if (reg)
 		ret = mwifiex_read_reg(adapter, reg->fw_status, &fw_status);
@@ -312,6 +310,7 @@ static void mwifiex_pcie_remove(struct pci_dev *pdev)
 		mwifiex_init_shutdown_fw(priv, MWIFIEX_FUNC_SHUTDOWN);
 	}
 
+	cancel_work_sync(&card->work);
 	mwifiex_remove_card(adapter);
 }
 
diff --git a/drivers/net/wireless/marvell/mwifiex/sdio.c b/drivers/net/wireless/marvell/mwifiex/sdio.c
index a4b356d..9534b47 100644
--- a/drivers/net/wireless/marvell/mwifiex/sdio.c
+++ b/drivers/net/wireless/marvell/mwifiex/sdio.c
@@ -387,8 +387,6 @@ static int mwifiex_check_winner_status(struct mwifiex_adapter *adapter)
 	if (!adapter || !adapter->priv_num)
 		return;
 
-	cancel_work_sync(&card->work);
-
 	mwifiex_dbg(adapter, INFO, "info: SDIO func num=%d\n", func->num);
 
 	ret = mwifiex_sdio_read_fw_status(adapter, &firmware_stat);
@@ -400,6 +398,7 @@ static int mwifiex_check_winner_status(struct mwifiex_adapter *adapter)
 		mwifiex_init_shutdown_fw(priv, MWIFIEX_FUNC_SHUTDOWN);
 	}
 
+	cancel_work_sync(&card->work);
 	mwifiex_remove_card(adapter);
 }
 