Message ID | ff796ca4b4f5610bc2d4a479b8cafbb595c7b3a1.1722362534.git.calvin@wbinvd.org (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Kalle Valo |
Series | wifi: mwifiex: Fix two buggy list traversals |
On Tue, Jul 30, 2024 at 11:05:30AM -0700, Calvin Owens wrote:
> Both of these list traversals use list_for_each_entry_safe(), yet drop
> the lock protecting the list during the traversal.
>
> Because the _safe() iterator stores a pointer to the next list node
> locally so the current node can be deleted, dropping the lock this way
> means the next "cached" list_head might be freed by another caller,
> leading the iterator to dereference pointers in freed memory after
> reacquiring the lock.

There are lots of unclear and/or unsound locking patterns in this
driver. You've probably identified one, although I don't think you've
solved 100% of it.

Here's another: is it valid for mwifiex_11n_rx_reorder_pkt() ->
mwifiex_11n_get_rx_reorder_tbl() to retrieve a 'tbl' pointer (without
removing it from the list), and then continue to operate on that without
holding any locks? (I think the answer is "no".)

Side note: you might also refer to this old thread:
https://lore.kernel.org/all/CAD=FV=VuxFtDdcMndLNzVYDoid8N3jP46j0sOFXG1D4CzX0=Zw@mail.gmail.com/
I don't think Marvell ever fully resolved all the issues there.

> Fix by moving to-be-deleted objects to an on-stack list before actually
> deleting them, so the lock can be held for the entire traversal.
>
> This is a bit ugly, because mwifiex_del_rx_reorder_entry() will still
> take the rx_reorder_tbl_lock to delete the item from the two on-stack
> lists introduced in this patch. But that is just ugly, not wrong, and
> the function has other callers... making the locking conditional seems
> strictly uglier.

I noticed this "ugliness", but I agree with your reasoning -- it's as
good as we can do here for now.

> I discovered this bug while studying the new "nxpwifi" driver, which was
> sent to the mailing list about a month ago:
>
> https://lore.kernel.org/lkml/20240621075208.513497-1-yu-hao.lin@nxp.com/
>
> ...but it turns out the new 11n_rxreorder.c in nxpwifi is essentially
> exactly identical to mwifiex, save for s/mwifiex/nxpwifi/, so I wanted
> to pass along a bugfix for the original driver as well.

That's another can of worms. mwifiex is horrible, and so if you were
asking me, I'd reject any attempt at copy/paste/modify that doesn't make
significant efforts to refactor and improve -- for instance, better
documentation about what all the locks mean, and clarity such that
readers can be confident that the code is doing the right thing. For
example, I think this mwifiex comment is a lie:

    /* spin lock for rx_reorder_tbl_ptr queue */
    spinlock_t rx_reorder_tbl_lock;

I believe it's supposed to protect the elements within the list too --
but it doesn't do a good job of that.

But that's a side track...

> I only have an IW612, so this patch was only tested on "nxpwifi".

I don't think we can accept an untested patch here. If you're lucky,
maybe I or someone else on CC can test for you though.

> Signed-off-by: Calvin Owens <calvin@wbinvd.org>
> ---
>  .../wireless/marvell/mwifiex/11n_rxreorder.c | 26 +++++++++----------
>  1 file changed, 12 insertions(+), 14 deletions(-)

I think the patch looks good enough, but I won't ack it without testing.
And while you're at it, I'd recommend some further auditing, per the
above.

Brian
On Wednesday 07/31 at 13:09 -0700, Brian Norris wrote:
> On Tue, Jul 30, 2024 at 11:05:30AM -0700, Calvin Owens wrote:
> > Both of these list traversals use list_for_each_entry_safe(), yet drop
> > the lock protecting the list during the traversal.
> >
> > Because the _safe() iterator stores a pointer to the next list node
> > locally so the current node can be deleted, dropping the lock this way
> > means the next "cached" list_head might be freed by another caller,
> > leading the iterator to dereference pointers in freed memory after
> > reacquiring the lock.
>
> There are lots of unclear and/or unsound locking patterns in this
> driver. You've probably identified one, although I don't think you've
> solved 100% of it.
>
> Here's another: is it valid for mwifiex_11n_rx_reorder_pkt() ->
> mwifiex_11n_get_rx_reorder_tbl() to retrieve a 'tbl' pointer (without
> removing it from the list), and then continue to operate on that without
> holding any locks? (I think the answer is "no".)
>
> Side note: you might also refer to this old thread:
> https://lore.kernel.org/all/CAD=FV=VuxFtDdcMndLNzVYDoid8N3jP46j0sOFXG1D4CzX0=Zw@mail.gmail.com/
> I don't think Marvell ever fully resolved all the issues there.

That's all helpful, thank you.

> > Fix by moving to-be-deleted objects to an on-stack list before actually
> > deleting them, so the lock can be held for the entire traversal.
> >
> > This is a bit ugly, because mwifiex_del_rx_reorder_entry() will still
> > take the rx_reorder_tbl_lock to delete the item from the two on-stack
> > lists introduced in this patch. But that is just ugly, not wrong, and
> > the function has other callers... making the locking conditional seems
> > strictly uglier.
>
> I noticed this "ugliness", but I agree with your reasoning -- it's as
> good as we can do here for now.
>
> > I discovered this bug while studying the new "nxpwifi" driver, which was
> > sent to the mailing list about a month ago:
> >
> > https://lore.kernel.org/lkml/20240621075208.513497-1-yu-hao.lin@nxp.com/
> >
> > ...but it turns out the new 11n_rxreorder.c in nxpwifi is essentially
> > exactly identical to mwifiex, save for s/mwifiex/nxpwifi/, so I wanted
> > to pass along a bugfix for the original driver as well.
>
> That's another can of worms. mwifiex is horrible, and so if you were
> asking me, I'd reject any attempt at copy/paste/modify that doesn't make
> significant efforts to refactor and improve -- for instance, better
> documentation about what all the locks mean, and clarity such that
> readers can be confident that the code is doing the right thing. For
> example, I think this mwifiex comment is a lie:
>
>     /* spin lock for rx_reorder_tbl_ptr queue */
>     spinlock_t rx_reorder_tbl_lock;
>
> I believe it's supposed to protect the elements within the list too --
> but it doesn't do a good job of that.
>
> But that's a side track...
>
> > I only have an IW612, so this patch was only tested on "nxpwifi".
>
> I don't think we can accept an untested patch here. If you're lucky,
> maybe I or someone else on CC can test for you though.
>
> > Signed-off-by: Calvin Owens <calvin@wbinvd.org>
> > ---
> >  .../wireless/marvell/mwifiex/11n_rxreorder.c | 26 +++++++++----------
> >  1 file changed, 12 insertions(+), 14 deletions(-)
>
> I think the patch looks good enough, but I won't ack it without testing.
> And while you're at it, I'd recommend some further auditing, per the
> above.

Understood. I was honestly a bit hesitant to send this in the first
place without some sort of reproducer, I'll sit on the patch until I'm
able to find one.

Thanks,
Calvin

> Brian
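Brian's question about mwifiex_11n_rx_reorder_pkt() using a 'tbl' pointer after the lookup returns, with no lock held, can be made concrete with a small userspace sketch. Everything below is hypothetical and for illustration only: `struct reorder_entry`, `entries_lock`, `find_entry_locked()`, `set_start_win()` and `get_start_win()` are invented names, and a pthread mutex stands in for the driver's spinlock. It shows one conservative way to avoid the pattern (do the lookup and the access in a single critical section); it is not the driver's code and not a proposed fix.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a reorder-table entry; not the driver's struct. */
struct reorder_entry {
	struct reorder_entry *next;
	int tid;
	int start_win;
};

static pthread_mutex_t entries_lock = PTHREAD_MUTEX_INITIALIZER;
static struct reorder_entry *entries;	/* protected by entries_lock */

/*
 * Caller must hold entries_lock. The returned pointer is only safe to
 * use while the lock is still held, because another thread could unlink
 * and free the entry as soon as the lock is dropped.
 */
static struct reorder_entry *find_entry_locked(int tid)
{
	struct reorder_entry *e;

	for (e = entries; e; e = e->next)
		if (e->tid == tid)
			return e;
	return NULL;
}

/* Link a caller-owned entry at the head of the list. */
void add_entry(struct reorder_entry *e)
{
	pthread_mutex_lock(&entries_lock);
	e->next = entries;
	entries = e;
	pthread_mutex_unlock(&entries_lock);
}

/* Look up and modify the entry inside one critical section. */
int set_start_win(int tid, int start_win)
{
	struct reorder_entry *e;
	int ret = -1;

	pthread_mutex_lock(&entries_lock);
	e = find_entry_locked(tid);
	if (e) {
		e->start_win = start_win;
		ret = 0;
	}
	pthread_mutex_unlock(&entries_lock);
	return ret;
}

/* Copy the value out under the lock instead of handing back a pointer. */
int get_start_win(int tid, int *out)
{
	struct reorder_entry *e;
	int ret = -1;

	pthread_mutex_lock(&entries_lock);
	e = find_entry_locked(tid);
	if (e) {
		*out = e->start_win;
		ret = 0;
	}
	pthread_mutex_unlock(&entries_lock);
	return ret;
}
```

The design point is that no pointer to a list element escapes the locked region. Whether the driver could hold rx_reorder_tbl_lock across everything mwifiex_11n_rx_reorder_pkt() does with 'tbl' is not settled in this thread.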
diff --git a/drivers/net/wireless/marvell/mwifiex/11n_rxreorder.c b/drivers/net/wireless/marvell/mwifiex/11n_rxreorder.c
index 10690e82358b..fbaecfd32429 100644
--- a/drivers/net/wireless/marvell/mwifiex/11n_rxreorder.c
+++ b/drivers/net/wireless/marvell/mwifiex/11n_rxreorder.c
@@ -249,20 +249,20 @@ mwifiex_11n_get_rx_reorder_tbl(struct mwifiex_private *priv, int tid, u8 *ta)
 void mwifiex_11n_del_rx_reorder_tbl_by_ta(struct mwifiex_private *priv, u8 *ta)
 {
 	struct mwifiex_rx_reorder_tbl *tbl, *tmp;
+	LIST_HEAD(tmplist);
 
 	if (!ta)
 		return;
 
 	spin_lock_bh(&priv->rx_reorder_tbl_lock);
-	list_for_each_entry_safe(tbl, tmp, &priv->rx_reorder_tbl_ptr, list) {
-		if (!memcmp(tbl->ta, ta, ETH_ALEN)) {
-			spin_unlock_bh(&priv->rx_reorder_tbl_lock);
-			mwifiex_del_rx_reorder_entry(priv, tbl);
-			spin_lock_bh(&priv->rx_reorder_tbl_lock);
-		}
-	}
+	list_for_each_entry_safe(tbl, tmp, &priv->rx_reorder_tbl_ptr, list)
+		if (!memcmp(tbl->ta, ta, ETH_ALEN))
+			list_move_tail(&tbl->list, &tmplist);
 	spin_unlock_bh(&priv->rx_reorder_tbl_lock);
 
+	list_for_each_entry_safe(tbl, tmp, &tmplist, list)
+		mwifiex_del_rx_reorder_entry(priv, tbl);
+
 	return;
 }
 
@@ -785,17 +785,15 @@ void mwifiex_11n_ba_stream_timeout(struct mwifiex_private *priv,
 void mwifiex_11n_cleanup_reorder_tbl(struct mwifiex_private *priv)
 {
 	struct mwifiex_rx_reorder_tbl *del_tbl_ptr, *tmp_node;
+	LIST_HEAD(tmplist);
 
 	spin_lock_bh(&priv->rx_reorder_tbl_lock);
-	list_for_each_entry_safe(del_tbl_ptr, tmp_node,
-				 &priv->rx_reorder_tbl_ptr, list) {
-		spin_unlock_bh(&priv->rx_reorder_tbl_lock);
-		mwifiex_del_rx_reorder_entry(priv, del_tbl_ptr);
-		spin_lock_bh(&priv->rx_reorder_tbl_lock);
-	}
-	INIT_LIST_HEAD(&priv->rx_reorder_tbl_ptr);
+	list_splice_tail_init(&priv->rx_reorder_tbl_ptr, &tmplist);
 	spin_unlock_bh(&priv->rx_reorder_tbl_lock);
 
+	list_for_each_entry_safe(del_tbl_ptr, tmp_node, &tmplist, list)
+		mwifiex_del_rx_reorder_entry(priv, del_tbl_ptr);
+
 	mwifiex_reset_11n_rx_seq_num(priv);
 }
Both of these list traversals use list_for_each_entry_safe(), yet drop
the lock protecting the list during the traversal.

Because the _safe() iterator stores a pointer to the next list node
locally so the current node can be deleted, dropping the lock this way
means the next "cached" list_head might be freed by another caller,
leading the iterator to dereference pointers in freed memory after
reacquiring the lock.

Fix by moving to-be-deleted objects to an on-stack list before actually
deleting them, so the lock can be held for the entire traversal.

This is a bit ugly, because mwifiex_del_rx_reorder_entry() will still
take the rx_reorder_tbl_lock to delete the item from the two on-stack
lists introduced in this patch. But that is just ugly, not wrong, and
the function has other callers... making the locking conditional seems
strictly uglier.

I discovered this bug while studying the new "nxpwifi" driver, which was
sent to the mailing list about a month ago:

https://lore.kernel.org/lkml/20240621075208.513497-1-yu-hao.lin@nxp.com/

...but it turns out the new 11n_rxreorder.c in nxpwifi is essentially
exactly identical to mwifiex, save for s/mwifiex/nxpwifi/, so I wanted
to pass along a bugfix for the original driver as well.

I only have an IW612, so this patch was only tested on "nxpwifi".

Signed-off-by: Calvin Owens <calvin@wbinvd.org>
---
 .../wireless/marvell/mwifiex/11n_rxreorder.c | 26 +++++++++----------
 1 file changed, 12 insertions(+), 14 deletions(-)
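
The two-phase shape described in the commit message (collect under the lock, delete after unlocking) can be sketched in plain userspace C. This is only an illustration under stated assumptions: `struct node`, `tbl_lock`, `tbl_add()`, `tbl_count()` and `delete_matching()` are hypothetical names, the hand-rolled doubly linked list stands in for the kernel's list_head helpers, and a pthread mutex stands in for spin_lock_bh(). Unlike mwifiex_del_rx_reorder_entry(), the second phase here simply frees the node and does not retake the lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical circular doubly linked list node (userspace stand-in). */
struct node {
	struct node *prev, *next;
	int key;
};

static pthread_mutex_t tbl_lock = PTHREAD_MUTEX_INITIALIZER;
static struct node tbl_head = { &tbl_head, &tbl_head, 0 };	/* protected by tbl_lock */

static void node_unlink(struct node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
}

static void node_add_tail(struct node *n, struct node *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/* Append a new entry to the shared list; returns 0 on success. */
int tbl_add(int key)
{
	struct node *n = malloc(sizeof(*n));

	if (!n)
		return -1;
	n->key = key;
	pthread_mutex_lock(&tbl_lock);
	node_add_tail(n, &tbl_head);
	pthread_mutex_unlock(&tbl_lock);
	return 0;
}

/* Count the entries currently on the shared list. */
int tbl_count(void)
{
	struct node *cur;
	int count = 0;

	pthread_mutex_lock(&tbl_lock);
	for (cur = tbl_head.next; cur != &tbl_head; cur = cur->next)
		count++;
	pthread_mutex_unlock(&tbl_lock);
	return count;
}

/* Remove every entry with the given key; returns how many were freed. */
int delete_matching(int key)
{
	struct node doomed = { &doomed, &doomed, 0 };	/* on-stack private list */
	struct node *cur, *next;
	int freed = 0;

	/* Phase 1: the lock is held for the whole walk, so the cached
	 * 'next' pointer cannot be freed behind the iterator's back. */
	pthread_mutex_lock(&tbl_lock);
	for (cur = tbl_head.next; cur != &tbl_head; cur = next) {
		next = cur->next;
		if (cur->key == key) {
			node_unlink(cur);
			node_add_tail(cur, &doomed);
		}
	}
	pthread_mutex_unlock(&tbl_lock);

	/* Phase 2: the private list is reachable only from this stack
	 * frame, so its nodes can be torn down without the lock. */
	for (cur = doomed.next; cur != &doomed; cur = next) {
		next = cur->next;
		free(cur);
		freed++;
	}
	return freed;
}
```

For example, after tbl_add(1), tbl_add(2), tbl_add(1) and tbl_add(3), a call to delete_matching(1) frees two nodes and leaves two on the shared list.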