wifi: mwifiex: Fix two buggy list traversals

Message ID ff796ca4b4f5610bc2d4a479b8cafbb595c7b3a1.1722362534.git.calvin@wbinvd.org
State Changes Requested
Delegated to: Kalle Valo

Commit Message

Calvin Owens July 30, 2024, 6:05 p.m. UTC
Both of these list traversals use list_for_each_entry_safe(), yet drop
the lock protecting the list during the traversal.

Because the _safe() iterator stores a pointer to the next list node
locally so the current node can be deleted, dropping the lock this way
means the next "cached" list_head might be freed by another caller,
leading the iterator to dereference pointers in freed memory after
reacquiring the lock.

Fix by moving to-be-deleted objects to an on-stack list before actually
deleting them, so the lock can be held for the entire traversal.
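
The pattern described above can be sketched in userspace C. This is only
an illustration under stated assumptions: a pthread mutex stands in for
rx_reorder_tbl_lock, and struct node, struct entry, table_add(),
table_count() and table_del_by_key() are hypothetical names that do not
exist in mwifiex.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for the kernel's struct list_head. */
struct node { struct node *prev, *next; };

static void node_del(struct node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
}

static void node_add_tail(struct node *n, struct node *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

struct entry { struct node list; int key; };
#define entry_of(p) ((struct entry *)((char *)(p) - offsetof(struct entry, list)))

static struct node table = { &table, &table };	/* empty circular list */
static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;

static void table_add(int key)
{
	struct entry *e = malloc(sizeof(*e));

	assert(e != NULL);
	e->key = key;
	pthread_mutex_lock(&table_lock);
	node_add_tail(&e->list, &table);
	pthread_mutex_unlock(&table_lock);
}

static int table_count(void)
{
	struct node *pos;
	int n = 0;

	pthread_mutex_lock(&table_lock);
	for (pos = table.next; pos != &table; pos = pos->next)
		n++;
	pthread_mutex_unlock(&table_lock);
	return n;
}

/* Remove every entry matching key; returns how many were freed. */
static int table_del_by_key(int key)
{
	struct node doomed = { &doomed, &doomed };	/* private on-stack list */
	struct node *pos, *next;
	int freed = 0;

	pthread_mutex_lock(&table_lock);
	for (pos = table.next; pos != &table; pos = next) {
		next = pos->next;	/* stays valid: the lock is never dropped */
		if (entry_of(pos)->key == key) {
			node_del(pos);
			node_add_tail(pos, &doomed);
		}
	}
	pthread_mutex_unlock(&table_lock);

	/* Nothing else can reach "doomed", so teardown needs no lock. */
	for (pos = doomed.next; pos != &doomed; pos = next) {
		next = pos->next;
		free(entry_of(pos));
		freed++;
	}
	return freed;
}
```

Unlike this sketch, the patch tears entries down through
mwifiex_del_rx_reorder_entry(), which takes the lock again for each
entry on the on-stack list.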

This is a bit ugly, because mwifiex_del_rx_reorder_entry() will still
take the rx_reorder_tbl_lock to delete the item from the two on-stack
lists introduced in this patch. But that is just ugly, not wrong, and
the function has other callers... making the locking conditional seems
strictly uglier.

I discovered this bug while studying the new "nxpwifi" driver, which was
sent to the mailing list about a month ago:

https://lore.kernel.org/lkml/20240621075208.513497-1-yu-hao.lin@nxp.com/

...but it turns out the new 11n_rxreorder.c in nxpwifi is essentially
exactly identical to mwifiex, save for s/mwifiex/nxpwifi/, so I wanted
to pass along a bugfix for the original driver as well.

I only have an IW612, so this patch was only tested on "nxpwifi".

Signed-off-by: Calvin Owens <calvin@wbinvd.org>
---
 .../wireless/marvell/mwifiex/11n_rxreorder.c  | 26 +++++++++----------
 1 file changed, 12 insertions(+), 14 deletions(-)

Comments

Brian Norris July 31, 2024, 8:09 p.m. UTC | #1
On Tue, Jul 30, 2024 at 11:05:30AM -0700, Calvin Owens wrote:
> Both of these list traversals use list_for_each_entry_safe(), yet drop
> the lock protecting the list during the traversal.
> 
> Because the _safe() iterator stores a pointer to the next list node
> locally so the current node can be deleted, dropping the lock this way
> means the next "cached" list_head might be freed by another caller,
> leading the iterator to dereference pointers in freed memory after
> reacquiring the lock.

There are lots of unclear and/or unsound locking patterns in this
driver. You've probably identified one, although I don't think you've
solved 100% of it.

Here's another: is it valid for mwifiex_11n_rx_reorder_pkt() ->
mwifiex_11n_get_rx_reorder_tbl() to retrieve a 'tbl' pointer (without
removing it from the list), and then continue to operate on that without
holding any locks? (I think the answer is "no".)
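
One conventional way to close that kind of window is to do the lookup
and the update inside a single critical section, so the pointer never
escapes the lock. A minimal userspace sketch, assuming a pthread mutex
in place of rx_reorder_tbl_lock; struct entry, win_start and the
function names are hypothetical, and this is not a proposed change to
the driver:

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical table entry: "tid" mirrors the lookup key, "win_start" is made up. */
struct entry { struct entry *next; int tid; int win_start; };

static struct entry *head;
static pthread_mutex_t tbl_lock = PTHREAD_MUTEX_INITIALIZER;

/* Caller must hold tbl_lock; the result is only valid while it stays held. */
static struct entry *find_locked(int tid)
{
	struct entry *e;

	for (e = head; e; e = e->next)
		if (e->tid == tid)
			return e;
	return NULL;
}

static void entry_add(int tid, int win_start)
{
	struct entry *e = malloc(sizeof(*e));

	assert(e != NULL);
	e->tid = tid;
	e->win_start = win_start;
	pthread_mutex_lock(&tbl_lock);
	e->next = head;
	head = e;
	pthread_mutex_unlock(&tbl_lock);
}

/*
 * Look up and modify in one critical section, so the pointer never
 * escapes the lock. Returns 0 on success, -1 if no entry matches.
 */
static int entry_set_win_start(int tid, int win_start)
{
	struct entry *e;
	int ret = -1;

	pthread_mutex_lock(&tbl_lock);
	e = find_locked(tid);
	if (e) {
		e->win_start = win_start;
		ret = 0;
	}
	pthread_mutex_unlock(&tbl_lock);
	return ret;
}
```

When the work is too long to do under the lock, the usual alternative is
to take a reference on the entry before unlocking, which this sketch
does not attempt.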

Side note: you might also refer to this old thread:
https://lore.kernel.org/all/CAD=FV=VuxFtDdcMndLNzVYDoid8N3jP46j0sOFXG1D4CzX0=Zw@mail.gmail.com/
I don't think Marvell ever fully resolved all the issues there.

> Fix by moving to-be-deleted objects to an on-stack list before actually
> deleting them, so the lock can be held for the entire traversal.
> 
> This is a bit ugly, because mwifiex_del_rx_reorder_entry() will still
> take the rx_reorder_tbl_lock to delete the item from the two on-stack
> lists introduced in this patch. But that is just ugly, not wrong, and
> the function has other callers... making the locking conditional seems
> strictly uglier.

I noticed this "ugliness", but I agree with your reasoning -- it's as
good as we can do here for now.

> I discovered this bug while studying the new "nxpwifi" driver, which was
> sent to the mailing list about a month ago:
> 
> https://lore.kernel.org/lkml/20240621075208.513497-1-yu-hao.lin@nxp.com/
> 
> ...but it turns out the new 11n_rxreorder.c in nxpwifi is essentially
> exactly identical to mwifiex, save for s/mwifiex/nxpwifi/, so I wanted
> to pass along a bugfix for the original driver as well.

That's another can of worms. mwifiex is horrible, and so if you were
asking me, I'd reject any attempt at copy/paste/modify that doesn't make
significant efforts to refactor and improve -- for instance, better
documentation about what all the locks mean, and clarity such that
readers can be confident that the code is doing the right thing. For
example, I think this mwifiex comment is a lie:

	/* spin lock for rx_reorder_tbl_ptr queue */
	spinlock_t rx_reorder_tbl_lock;

I believe it's supposed to protect the elements within the list too --
but it doesn't do a good job of that.

But that's a side track...

> I only have an IW612, so this patch was only tested on "nxpwifi".

I don't think we can accept an untested patch here. If you're lucky,
maybe I or someone else on CC can test for you though.

> Signed-off-by: Calvin Owens <calvin@wbinvd.org>
> ---
>  .../wireless/marvell/mwifiex/11n_rxreorder.c  | 26 +++++++++----------
>  1 file changed, 12 insertions(+), 14 deletions(-)

I think the patch looks good enough, but I won't ack it without testing.
And while you're at it, I'd recommend some further auditing, per the
above.

Brian

Calvin Owens Aug. 5, 2024, 9:33 p.m. UTC | #2
On Wednesday 07/31 at 13:09 -0700, Brian Norris wrote:
> On Tue, Jul 30, 2024 at 11:05:30AM -0700, Calvin Owens wrote:
> > Both of these list traversals use list_for_each_entry_safe(), yet drop
> > the lock protecting the list during the traversal.
> > 
> > Because the _safe() iterator stores a pointer to the next list node
> > locally so the current node can be deleted, dropping the lock this way
> > means the next "cached" list_head might be freed by another caller,
> > leading the iterator to dereference pointers in freed memory after
> > reacquiring the lock.
> 
> There are lots of unclear and/or unsound locking patterns in this
> driver. You've probably identified one, although I don't think you've
> solved 100% of it.
> 
> Here's another: is it valid for mwifiex_11n_rx_reorder_pkt() ->
> mwifiex_11n_get_rx_reorder_tbl() to retrieve a 'tbl' pointer (without
> removing it from the list), and then continue to operate on that without
> holding any locks? (I think the answer is "no".)
> 
> Side note: you might also refer to this old thread:
> https://lore.kernel.org/all/CAD=FV=VuxFtDdcMndLNzVYDoid8N3jP46j0sOFXG1D4CzX0=Zw@mail.gmail.com/
> I don't think Marvell ever fully resolved all the issues there.

That's all helpful, thank you.

> > Fix by moving to-be-deleted objects to an on-stack list before actually
> > deleting them, so the lock can be held for the entire traversal.
> > 
> > This is a bit ugly, because mwifiex_del_rx_reorder_entry() will still
> > take the rx_reorder_tbl_lock to delete the item from the two on-stack
> > lists introduced in this patch. But that is just ugly, not wrong, and
> > the function has other callers... making the locking conditional seems
> > strictly uglier.
> 
> I noticed this "ugliness", but I agree with your reasoning -- it's as
> good as we can do here for now.
> 
> > I discovered this bug while studying the new "nxpwifi" driver, which was
> > sent to the mailing list about a month ago:
> > 
> > https://lore.kernel.org/lkml/20240621075208.513497-1-yu-hao.lin@nxp.com/
> > 
> > ...but it turns out the new 11n_rxreorder.c in nxpwifi is essentially
> > exactly identical to mwifiex, save for s/mwifiex/nxpwifi/, so I wanted
> > to pass along a bugfix for the original driver as well.
> 
> That's another can of worms. mwifiex is horrible, and so if you were
> asking me, I'd reject any attempt at copy/paste/modify that doesn't make
> significant efforts to refactor and improve -- for instance, better
> documentation about what all the locks mean, and clarity such that
> readers can be confident that the code is doing the right thing. For
> example, I think this mwifiex comment is a lie:
> 
> 	/* spin lock for rx_reorder_tbl_ptr queue */
> 	spinlock_t rx_reorder_tbl_lock;
> 
> I believe it's supposed to protect the elements within the list too --
> but it doesn't do a good job of that.
> 
> But that's a side track...
> 
> > I only have an IW612, so this patch was only tested on "nxpwifi".
> 
> I don't think we can accept an untested patch here. If you're lucky,
> maybe I or someone else on CC can test for you though.
>
> > Signed-off-by: Calvin Owens <calvin@wbinvd.org>
> > ---
> >  .../wireless/marvell/mwifiex/11n_rxreorder.c  | 26 +++++++++----------
> >  1 file changed, 12 insertions(+), 14 deletions(-)
> 
> I think the patch looks good enough, but I won't ack it without testing.
> And while you're at it, I'd recommend some further auditing, per the
> above.

Understood. I was honestly a bit hesitant to send this in the first
place without some sort of reproducer, so I'll sit on the patch until
I'm able to find one.

Thanks,
Calvin

> Brian

Patch

diff --git a/drivers/net/wireless/marvell/mwifiex/11n_rxreorder.c b/drivers/net/wireless/marvell/mwifiex/11n_rxreorder.c
index 10690e82358b..fbaecfd32429 100644
--- a/drivers/net/wireless/marvell/mwifiex/11n_rxreorder.c
+++ b/drivers/net/wireless/marvell/mwifiex/11n_rxreorder.c
@@ -249,20 +249,20 @@  mwifiex_11n_get_rx_reorder_tbl(struct mwifiex_private *priv, int tid, u8 *ta)
 void mwifiex_11n_del_rx_reorder_tbl_by_ta(struct mwifiex_private *priv, u8 *ta)
 {
 	struct mwifiex_rx_reorder_tbl *tbl, *tmp;
+	LIST_HEAD(tmplist);
 
 	if (!ta)
 		return;
 
 	spin_lock_bh(&priv->rx_reorder_tbl_lock);
-	list_for_each_entry_safe(tbl, tmp, &priv->rx_reorder_tbl_ptr, list) {
-		if (!memcmp(tbl->ta, ta, ETH_ALEN)) {
-			spin_unlock_bh(&priv->rx_reorder_tbl_lock);
-			mwifiex_del_rx_reorder_entry(priv, tbl);
-			spin_lock_bh(&priv->rx_reorder_tbl_lock);
-		}
-	}
+	list_for_each_entry_safe(tbl, tmp, &priv->rx_reorder_tbl_ptr, list)
+		if (!memcmp(tbl->ta, ta, ETH_ALEN))
+			list_move_tail(&tbl->list, &tmplist);
 	spin_unlock_bh(&priv->rx_reorder_tbl_lock);
 
+	list_for_each_entry_safe(tbl, tmp, &tmplist, list)
+		mwifiex_del_rx_reorder_entry(priv, tbl);
+
 	return;
 }
 
@@ -785,17 +785,15 @@  void mwifiex_11n_ba_stream_timeout(struct mwifiex_private *priv,
 void mwifiex_11n_cleanup_reorder_tbl(struct mwifiex_private *priv)
 {
 	struct mwifiex_rx_reorder_tbl *del_tbl_ptr, *tmp_node;
+	LIST_HEAD(tmplist);
 
 	spin_lock_bh(&priv->rx_reorder_tbl_lock);
-	list_for_each_entry_safe(del_tbl_ptr, tmp_node,
-				 &priv->rx_reorder_tbl_ptr, list) {
-		spin_unlock_bh(&priv->rx_reorder_tbl_lock);
-		mwifiex_del_rx_reorder_entry(priv, del_tbl_ptr);
-		spin_lock_bh(&priv->rx_reorder_tbl_lock);
-	}
-	INIT_LIST_HEAD(&priv->rx_reorder_tbl_ptr);
+	list_splice_tail_init(&priv->rx_reorder_tbl_ptr, &tmplist);
 	spin_unlock_bh(&priv->rx_reorder_tbl_lock);
 
+	list_for_each_entry_safe(del_tbl_ptr, tmp_node, &tmplist, list)
+		mwifiex_del_rx_reorder_entry(priv, del_tbl_ptr);
+
 	mwifiex_reset_11n_rx_seq_num(priv);
 }