diff mbox

mwifiex: fix nested rtnl locking on BG_SCAN_STOPPED

Message ID 1524633572-5588-1-git-send-email-gbhat@marvell.com (mailing list archive)
State Changes Requested
Delegated to: Kalle Valo
Headers show

Commit Message

Ganapathi Bhat April 25, 2018, 5:19 a.m. UTC
Whenever sched_scan(BG_SCAN) is in progress and driver downloads
any command, firmware will send an event BG_SCAN_STOPPED. On
recieving this, driver calls cfg80211_sched_scan_stopped. This
function in turn will try to acquire rtnl_lock. But if the
rtnl_lock was already held(while sending the command above), this
will result in nested rtnl locking. To fix this driver must call
rtnl version of the API if rtnl_is_locked().

Signed-off-by: Cathy Luo <cluo@marvell.com>
Signed-off-by: Ganapathi Bhat <gbhat@marvell.com>
---
 drivers/net/wireless/marvell/mwifiex/sta_event.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

Comments

Kalle Valo April 25, 2018, 7:16 a.m. UTC | #1
Ganapathi Bhat <gbhat@marvell.com> writes:

> Whenever sched_scan(BG_SCAN) is in progress and driver downloads
> any command, firmware will send an event BG_SCAN_STOPPED. On
> recieving this, driver calls cfg80211_sched_scan_stopped. This
> function in turn will try to acquire rtnl_lock. But if the
> rtnl_lock was already held(while sending the command above), this
> will result in nested rtnl locking. To fix this driver must call
> rtnl version of the API if rtnl_is_locked().
>
> Signed-off-by: Cathy Luo <cluo@marvell.com>
> Signed-off-by: Ganapathi Bhat <gbhat@marvell.com>

Which one is the author? If Cathy is the author, you should add a From
header to indicate that.

> --- a/drivers/net/wireless/marvell/mwifiex/sta_event.c
> +++ b/drivers/net/wireless/marvell/mwifiex/sta_event.c
> @@ -848,7 +848,10 @@ int mwifiex_process_sta_event(struct mwifiex_private *priv)
>  
>  	case EVENT_BG_SCAN_STOPPED:
>  		dev_dbg(adapter->dev, "event: BGS_STOPPED\n");
> -		cfg80211_sched_scan_stopped(priv->wdev.wiphy, 0);
> +		if (rtnl_is_locked())
> +			cfg80211_sched_scan_stopped_rtnl(priv->wdev.wiphy, 0);
> +		else
> +			cfg80211_sched_scan_stopped(priv->wdev.wiphy, 0);

IMHO checking if a lock is taking is rather ugly and an indication
there's a problem with the locking. Instead making an ugly workaround
like this I would rather investigate who is holding the rtnl and solve
that.
Ganapathi Bhat April 25, 2018, 8:06 a.m. UTC | #2
Hi Kalle,

> ----------------------------------------------------------------------
> Ganapathi Bhat <gbhat@marvell.com> writes:
>
> > Whenever sched_scan(BG_SCAN) is in progress and driver downloads any
> > command, firmware will send an event BG_SCAN_STOPPED. On recieving
> > this, driver calls cfg80211_sched_scan_stopped. This function in turn
> > will try to acquire rtnl_lock. But if the rtnl_lock was already
> > held(while sending the command above), this will result in nested rtnl
> > locking. To fix this driver must call rtnl version of the API if
> > rtnl_is_locked().
> >
> > Signed-off-by: Cathy Luo <cluo@marvell.com>
> > Signed-off-by: Ganapathi Bhat <gbhat@marvell.com>
>
> Which one is the author? If Cathy is the author, you should add a From
> header to indicate that.
Sorry for the miss. Myself was the author. Will send v2 with corrections once we address the below comment.
>
> > --- a/drivers/net/wireless/marvell/mwifiex/sta_event.c
> > +++ b/drivers/net/wireless/marvell/mwifiex/sta_event.c
> > @@ -848,7 +848,10 @@ int mwifiex_process_sta_event(struct
> > mwifiex_private *priv)
> >
> >     case EVENT_BG_SCAN_STOPPED:
> >             dev_dbg(adapter->dev, "event: BGS_STOPPED\n");
> > -           cfg80211_sched_scan_stopped(priv->wdev.wiphy, 0);
> > +           if (rtnl_is_locked())
> > +                   cfg80211_sched_scan_stopped_rtnl(priv-
> >wdev.wiphy, 0);
> > +           else
> > +                   cfg80211_sched_scan_stopped(priv->wdev.wiphy,
> 0);
>
> IMHO checking if a lock is taking is rather ugly and an indication there's a
> problem with the locking. Instead making an ugly workaround like this I
> would rather investigate who is holding the rtnl and solve that.
There can be applications trying to access driver(via cfg80211), holding rtnl_lock.
For example(in our case):
1. "iw dev"  was called, when BG_SCAN was active.
2. NL80211_CMD_GET_INTERFACE requires rtnl_lock to be hold(specified in internal_flags)
3. cfg80211 will  hold rtnl_lock before calling "get_tx_power"(in pre_doit).
4. mwifiex will download RF_TX_PWR command to firmware
5. firmware will send BG_SCAN_STOPPED event(since BG_SCAN was active).
6. mwifiex will call "cfg80211_sched_scan_stopped" causing nested rtnl locking.

Please share your comments further.
>
> --
> Kalle Valo
Ganapathi Bhat June 27, 2018, 3 p.m. UTC | #3
Hi Kalle,

> > >
> > >   case EVENT_BG_SCAN_STOPPED:
> > >           dev_dbg(adapter->dev, "event: BGS_STOPPED\n");
> > > -         cfg80211_sched_scan_stopped(priv->wdev.wiphy, 0);
> > > +         if (rtnl_is_locked())
> > > +                 cfg80211_sched_scan_stopped_rtnl(priv-
> > >wdev.wiphy, 0);
> > > +         else
> > > +                 cfg80211_sched_scan_stopped(priv->wdev.wiphy,
> > 0);
> >
> > IMHO checking if a lock is taking is rather ugly and an indication there's a
> > problem with the locking. Instead making an ugly workaround like this I
> > would rather investigate who is holding the rtnl and solve that.
> There can be applications trying to access driver(via cfg80211), holding
> rtnl_lock.
> For example(in our case):
> 1. "iw dev"  was called, when BG_SCAN was active.
> 2. NL80211_CMD_GET_INTERFACE requires rtnl_lock to be hold(specified in
> internal_flags)
> 3. cfg80211 will  hold rtnl_lock before calling "get_tx_power"(in pre_doit).
> 4. mwifiex will download RF_TX_PWR command to firmware
> 5. firmware will send BG_SCAN_STOPPED event(since BG_SCAN was active).
> 6. mwifiex will call "cfg80211_sched_scan_stopped" causing nested rtnl
> locking.
>
> Please share your comments further.
Let us know your thoughts on above sequence.

Regards,
Ganapathi
Kalle Valo June 27, 2018, 3:22 p.m. UTC | #4
Ganapathi Bhat <gbhat@marvell.com> writes:

> Hi Kalle,
>
>> > >
>> > >   case EVENT_BG_SCAN_STOPPED:
>> > >           dev_dbg(adapter->dev, "event: BGS_STOPPED\n");
>> > > -         cfg80211_sched_scan_stopped(priv->wdev.wiphy, 0);
>> > > +         if (rtnl_is_locked())
>> > > +                 cfg80211_sched_scan_stopped_rtnl(priv-
>> > >wdev.wiphy, 0);
>> > > +         else
>> > > +                 cfg80211_sched_scan_stopped(priv->wdev.wiphy,
>> > 0);
>> >
>> > IMHO checking if a lock is taking is rather ugly and an indication there's a
>> > problem with the locking. Instead making an ugly workaround like this I
>> > would rather investigate who is holding the rtnl and solve that.
>> There can be applications trying to access driver(via cfg80211), holding
>> rtnl_lock.
>> For example(in our case):
>> 1. "iw dev"  was called, when BG_SCAN was active.
>> 2. NL80211_CMD_GET_INTERFACE requires rtnl_lock to be hold(specified in
>> internal_flags)
>> 3. cfg80211 will  hold rtnl_lock before calling "get_tx_power"(in pre_doit).
>> 4. mwifiex will download RF_TX_PWR command to firmware
>> 5. firmware will send BG_SCAN_STOPPED event(since BG_SCAN was active).
>> 6. mwifiex will call "cfg80211_sched_scan_stopped" causing nested rtnl
>> locking.
>>
>> Please share your comments further.
>
> Let us know your thoughts on above sequence.

Sorry, I don't have time to help with this. Hopefully someone else can.
Brian Norris Dec. 8, 2018, 12:58 a.m. UTC | #5
Hi Ganapathi,

For some reason, I'm daft enough to reply to this ancient thread. It
appears as if you probably have not resolved this issue yet though, so
I figured I could lend some advice.

On Wed, Apr 25, 2018 at 1:06 AM Ganapathi Bhat <gbhat@marvell.com> wrote:
> > Ganapathi Bhat <gbhat@marvell.com> writes:
> > > --- a/drivers/net/wireless/marvell/mwifiex/sta_event.c
> > > +++ b/drivers/net/wireless/marvell/mwifiex/sta_event.c
> > > @@ -848,7 +848,10 @@ int mwifiex_process_sta_event(struct
> > > mwifiex_private *priv)
> > >
> > >     case EVENT_BG_SCAN_STOPPED:
> > >             dev_dbg(adapter->dev, "event: BGS_STOPPED\n");
> > > -           cfg80211_sched_scan_stopped(priv->wdev.wiphy, 0);
> > > +           if (rtnl_is_locked())
> > > +                   cfg80211_sched_scan_stopped_rtnl(priv-
> > >wdev.wiphy, 0);
> > > +           else
> > > +                   cfg80211_sched_scan_stopped(priv->wdev.wiphy,
> > 0);
> >
> > IMHO checking if a lock is taking is rather ugly and an indication there's a
> > problem with the locking. Instead making an ugly workaround like this I
> > would rather investigate who is holding the rtnl and solve that.

Agreed, this is not good. You are now bound to hit ASSERT_RTNL()
warnings occasionally, since you might see rtnl locked here, but a
split second later, when you're running in
__cfg80211_stop_sched_scan(), it could be released by some other
thread.

> There can be applications trying to access driver(via cfg80211), holding rtnl_lock.
> For example(in our case):
> 1. "iw dev"  was called, when BG_SCAN was active.
> 2. NL80211_CMD_GET_INTERFACE requires rtnl_lock to be hold(specified in internal_flags)
> 3. cfg80211 will  hold rtnl_lock before calling "get_tx_power"(in pre_doit).
> 4. mwifiex will download RF_TX_PWR command to firmware
> 5. firmware will send BG_SCAN_STOPPED event(since BG_SCAN was active).
> 6. mwifiex will call "cfg80211_sched_scan_stopped" causing nested rtnl locking.

IIUC, it's not exactly a nested lock, but a lock inversion issue.

#1
NL80211_CMD_GET_INTERFACE thread is holding rtnl lock and later
waiting on one of its HostCmd_CMD_* to complete

In the meantime:
#2
a EVENT_BG_SCAN_STOPPED is queued, and it grabs the rtnl lock

Because events are served on the same thread as commands, you get #1
waiting on #2, which is waiting on the lock held in #1. i.e., ABBA.

The way to resolve this is to either move event processing to a
different thread than command processing (that looks it would be very
difficult to do correctly, given the ossified structure of your
driver; but that might be a wise move in the long term)...
...or else maybe you could defer this specific piece of work to its
own thread. That might require yet another workqueue?

Anyway, the key point is that you really don't want to hold rtnl_lock
in your main workqueue, or in any other way that might block the rest
of the operation of your driver.

Brian
Ganapathi Bhat Dec. 10, 2018, 3:27 a.m. UTC | #6
Hi Brian,
> 
> IIUC, it's not exactly a nested lock, but a lock inversion issue.
> 
> #1
> NL80211_CMD_GET_INTERFACE thread is holding rtnl lock and later waiting
> on one of its HostCmd_CMD_* to complete
> 
> In the meantime:
> #2
> a EVENT_BG_SCAN_STOPPED is queued, and it grabs the rtnl lock
> 
> Because events are served on the same thread as commands, you get #1
> waiting on #2, which is waiting on the lock held in #1. i.e., ABBA.
> 
> The way to resolve this is to either move event processing to a different
> thread than command processing (that looks it would be very difficult to do
> correctly, given the ossified structure of your driver; but that might be a wise
> move in the long term)...
> ...or else maybe you could defer this specific piece of work to its own thread.
> That might require yet another workqueue?
Yes, with current design we should be doing this. We will try to get this change.
> 
> Anyway, the key point is that you really don't want to hold rtnl_lock in your
> main workqueue, or in any other way that might block the rest of the
> operation of your driver.
Ok. Thanks for clarifying the scenario. I missed out the fact that same thread is blocked by the lock.
> 
> Brian
diff mbox

Patch

diff --git a/drivers/net/wireless/marvell/mwifiex/sta_event.c b/drivers/net/wireless/marvell/mwifiex/sta_event.c
index 93dfb76..03ef625c 100644
--- a/drivers/net/wireless/marvell/mwifiex/sta_event.c
+++ b/drivers/net/wireless/marvell/mwifiex/sta_event.c
@@ -848,7 +848,10 @@  int mwifiex_process_sta_event(struct mwifiex_private *priv)
 
 	case EVENT_BG_SCAN_STOPPED:
 		dev_dbg(adapter->dev, "event: BGS_STOPPED\n");
-		cfg80211_sched_scan_stopped(priv->wdev.wiphy, 0);
+		if (rtnl_is_locked())
+			cfg80211_sched_scan_stopped_rtnl(priv->wdev.wiphy, 0);
+		else
+			cfg80211_sched_scan_stopped(priv->wdev.wiphy, 0);
 		if (priv->sched_scanning)
 			priv->sched_scanning = false;
 		break;