diff mbox series

mac80211: fix race in ieee80211_register_hw()

Message ID 1586175677-3061-1-git-send-email-sumit.garg@linaro.org (mailing list archive)
State Superseded
Delegated to: Johannes Berg
Headers show
Series mac80211: fix race in ieee80211_register_hw() | expand

Commit Message

Sumit Garg April 6, 2020, 12:21 p.m. UTC
A race condition leading to a kernel crash is observed during invocation
of ieee80211_register_hw() on a dragonboard410c device having wcn36xx
driver built as a loadable module along with a wifi manager in user-space
waiting for a wifi device (wlanX) to be active.

Sequence diagram for a particular kernel crash scenario:

    user-space  ieee80211_register_hw()  RX IRQ
    +++++++++++++++++++++++++++++++++++++++++++++
       |                    |             |
       |<---wlan0---wiphy_register()      |
       |----start wlan0---->|             |
       |                    |<---IRQ---(RX packet)
       |              Kernel crash        |
       |              due to unallocated  |
       |              workqueue.          |
       |                    |             |
       |       alloc_ordered_workqueue()  |
       |                    |             |
       |              Misc wiphy init.    |
       |                    |             |
       |            ieee80211_if_add()    |
       |                    |             |

As evident from above sequence diagram, this race condition isn't specific
to a particular wifi driver but rather the initialization sequence in
ieee80211_register_hw() needs to be fixed. So re-order the initialization
sequence and the updated sequence diagram would look like:

    user-space  ieee80211_register_hw()  RX IRQ
    +++++++++++++++++++++++++++++++++++++++++++++
       |                    |             |
       |       alloc_ordered_workqueue()  |
       |                    |             |
       |              Misc wiphy init.    |
       |                    |             |
       |<---wlan0---wiphy_register()      |
       |----start wlan0---->|             |
       |                    |<---IRQ---(RX packet)
       |                    |             |
       |            ieee80211_if_add()    |
       |                    |             |

Cc: <stable@vger.kernel.org>
Signed-off-by: Sumit Garg <sumit.garg@linaro.org>
---
 net/mac80211/main.c | 22 +++++++++++++---------
 1 file changed, 13 insertions(+), 9 deletions(-)

Comments

Johannes Berg April 6, 2020, 12:44 p.m. UTC | #1
On Mon, 2020-04-06 at 17:51 +0530, Sumit Garg wrote:
> A race condition leading to a kernel crash is observed during invocation
> of ieee80211_register_hw() on a dragonboard410c device having wcn36xx
> driver built as a loadable module along with a wifi manager in user-space
> waiting for a wifi device (wlanX) to be active.
> 
> Sequence diagram for a particular kernel crash scenario:
> 
>     user-space  ieee80211_register_hw()  RX IRQ
>     +++++++++++++++++++++++++++++++++++++++++++++
>        |                    |             |
>        |<---wlan0---wiphy_register()      |
>        |----start wlan0---->|             |
>        |                    |<---IRQ---(RX packet)
>        |              Kernel crash        |
>        |              due to unallocated  |
>        |              workqueue.          |
>        |                    |             |
>        |       alloc_ordered_workqueue()  |
>        |                    |             |
>        |              Misc wiphy init.    |
>        |                    |             |
>        |            ieee80211_if_add()    |
>        |                    |             |
> 
> As evident from above sequence diagram, this race condition isn't specific
> to a particular wifi driver but rather the initialization sequence in
> ieee80211_register_hw() needs to be fixed. 

Indeed, oops.

> So re-order the initialization
> sequence and the updated sequence diagram would look like:
> 
>     user-space  ieee80211_register_hw()  RX IRQ
>     +++++++++++++++++++++++++++++++++++++++++++++
>        |                    |             |
>        |       alloc_ordered_workqueue()  |
>        |                    |             |
>        |              Misc wiphy init.    |
>        |                    |             |
>        |<---wlan0---wiphy_register()      |
>        |----start wlan0---->|             |
>        |                    |<---IRQ---(RX packet)
>        |                    |             |
>        |            ieee80211_if_add()    |
>        |                    |             |

Makes sense.

> @@ -1254,6 +1250,14 @@ int ieee80211_register_hw(struct ieee80211_hw *hw)
>  		local->sband_allocated |= BIT(band);
>  	}
>  
> +	rtnl_unlock();
> +
> +	result = wiphy_register(local->hw.wiphy);
> +	if (result < 0)
> +		goto fail_wiphy_register;
> +
> +	rtnl_lock();

I'm a bit worried about this unlock/relock here though.

I think we only need the RTNL for the call to
ieee80211_init_rate_ctrl_alg() and then later ieee80211_if_add(), so
perhaps we can move that a little closer?

All the stuff between is really just setting up local stuff, so doesn't
really need to worry?

johannes
Kalle Valo April 6, 2020, 12:44 p.m. UTC | #2
Sumit Garg <sumit.garg@linaro.org> writes:

> A race condition leading to a kernel crash is observed during invocation
> of ieee80211_register_hw() on a dragonboard410c device having wcn36xx
> driver built as a loadable module along with a wifi manager in user-space
> waiting for a wifi device (wlanX) to be active.
>
> Sequence diagram for a particular kernel crash scenario:
>
>     user-space  ieee80211_register_hw()  RX IRQ
>     +++++++++++++++++++++++++++++++++++++++++++++
>        |                    |             |
>        |<---wlan0---wiphy_register()      |
>        |----start wlan0---->|             |
>        |                    |<---IRQ---(RX packet)
>        |              Kernel crash        |
>        |              due to unallocated  |
>        |              workqueue.          |
>        |                    |             |
>        |       alloc_ordered_workqueue()  |
>        |                    |             |
>        |              Misc wiphy init.    |
>        |                    |             |
>        |            ieee80211_if_add()    |
>        |                    |             |
>
> As evident from above sequence diagram, this race condition isn't specific
> to a particular wifi driver but rather the initialization sequence in
> ieee80211_register_hw() needs to be fixed. So re-order the initialization
> sequence and the updated sequence diagram would look like:
>
>     user-space  ieee80211_register_hw()  RX IRQ
>     +++++++++++++++++++++++++++++++++++++++++++++
>        |                    |             |
>        |       alloc_ordered_workqueue()  |
>        |                    |             |
>        |              Misc wiphy init.    |
>        |                    |             |
>        |<---wlan0---wiphy_register()      |
>        |----start wlan0---->|             |
>        |                    |<---IRQ---(RX packet)
>        |                    |             |
>        |            ieee80211_if_add()    |
>        |                    |             |
>
> Cc: <stable@vger.kernel.org>
> Signed-off-by: Sumit Garg <sumit.garg@linaro.org>

I have understood that no frames should be received until mac80211 calls
struct ieee80211_ops::start:

 * @start: Called before the first netdevice attached to the hardware
 *         is enabled. This should turn on the hardware and must turn on
 *         frame reception (for possibly enabled monitor interfaces.)
   
So I would claim that this is a bug in wcn36xx.
Johannes Berg April 6, 2020, 12:47 p.m. UTC | #3
On Mon, 2020-04-06 at 15:44 +0300, Kalle Valo wrote:
> 
> >     user-space  ieee80211_register_hw()  RX IRQ
> >     +++++++++++++++++++++++++++++++++++++++++++++
> >        |                    |             |
> >        |<---wlan0---wiphy_register()      |
> >        |----start wlan0---->|             |
> >        |                    |<---IRQ---(RX packet)
> >        |              Kernel crash        |
> >        |              due to unallocated  |
> >        |              workqueue.          |

[snip]

> I have understood that no frames should be received until mac80211 calls
> struct ieee80211_ops::start:
> 
>  * @start: Called before the first netdevice attached to the hardware
>  *         is enabled. This should turn on the hardware and must turn on
>  *         frame reception (for possibly enabled monitor interfaces.)

True, but I think he's saying that you can actually add and configure an
interface as soon as the wiphy is registered?

The "wlan0" is kinda wrong there, should be "phy0" I guess, and then
interface added from iw?

johannes
Kalle Valo April 6, 2020, 12:52 p.m. UTC | #4
Johannes Berg <johannes@sipsolutions.net> writes:

> On Mon, 2020-04-06 at 15:44 +0300, Kalle Valo wrote:
>> 
>> >     user-space  ieee80211_register_hw()  RX IRQ
>> >     +++++++++++++++++++++++++++++++++++++++++++++
>> >        |                    |             |
>> >        |<---wlan0---wiphy_register()      |
>> >        |----start wlan0---->|             |
>> >        |                    |<---IRQ---(RX packet)
>> >        |              Kernel crash        |
>> >        |              due to unallocated  |
>> >        |              workqueue.          |
>
> [snip]
>
>> I have understood that no frames should be received until mac80211 calls
>> struct ieee80211_ops::start:
>> 
>>  * @start: Called before the first netdevice attached to the hardware
>>  *         is enabled. This should turn on the hardware and must turn on
>>  *         frame reception (for possibly enabled monitor interfaces.)
>
> True, but I think he's saying that you can actually add and configure an
> interface as soon as the wiphy is registered?

With '<---IRQ---(RX packet)' I assumed wcn36xx is delivering a frame to
mac80211 using ieee80211_rx(), but of course I'm just guessing here.
Johannes Berg April 6, 2020, 12:53 p.m. UTC | #5
On Mon, 2020-04-06 at 15:52 +0300, Kalle Valo wrote:
> Johannes Berg <johannes@sipsolutions.net> writes:
> 
> > On Mon, 2020-04-06 at 15:44 +0300, Kalle Valo wrote:
> > > >     user-space  ieee80211_register_hw()  RX IRQ
> > > >     +++++++++++++++++++++++++++++++++++++++++++++
> > > >        |                    |             |
> > > >        |<---wlan0---wiphy_register()      |
> > > >        |----start wlan0---->|             |
> > > >        |                    |<---IRQ---(RX packet)
> > > >        |              Kernel crash        |
> > > >        |              due to unallocated  |
> > > >        |              workqueue.          |
> > 
> > [snip]
> > 
> > > I have understood that no frames should be received until mac80211 calls
> > > struct ieee80211_ops::start:
> > > 
> > >  * @start: Called before the first netdevice attached to the hardware
> > >  *         is enabled. This should turn on the hardware and must turn on
> > >  *         frame reception (for possibly enabled monitor interfaces.)
> > 
> > True, but I think he's saying that you can actually add and configure an
> > interface as soon as the wiphy is registered?
> 
> With '<---IRQ---(RX packet)' I assumed wcn36xx is delivering a frame to
> mac80211 using ieee80211_rx(), but of course I'm just guessing here.

Yeah, but that could be legitimate?

johannes
Sumit Garg April 6, 2020, 1 p.m. UTC | #6
On Mon, 6 Apr 2020 at 18:14, Johannes Berg <johannes@sipsolutions.net> wrote:
>
> On Mon, 2020-04-06 at 17:51 +0530, Sumit Garg wrote:
> > A race condition leading to a kernel crash is observed during invocation
> > of ieee80211_register_hw() on a dragonboard410c device having wcn36xx
> > driver built as a loadable module along with a wifi manager in user-space
> > waiting for a wifi device (wlanX) to be active.
> >
> > Sequence diagram for a particular kernel crash scenario:
> >
> >     user-space  ieee80211_register_hw()  RX IRQ
> >     +++++++++++++++++++++++++++++++++++++++++++++
> >        |                    |             |
> >        |<---wlan0---wiphy_register()      |
> >        |----start wlan0---->|             |
> >        |                    |<---IRQ---(RX packet)
> >        |              Kernel crash        |
> >        |              due to unallocated  |
> >        |              workqueue.          |
> >        |                    |             |
> >        |       alloc_ordered_workqueue()  |
> >        |                    |             |
> >        |              Misc wiphy init.    |
> >        |                    |             |
> >        |            ieee80211_if_add()    |
> >        |                    |             |
> >
> > As evident from above sequence diagram, this race condition isn't specific
> > to a particular wifi driver but rather the initialization sequence in
> > ieee80211_register_hw() needs to be fixed.
>
> Indeed, oops.
>
> > So re-order the initialization
> > sequence and the updated sequence diagram would look like:
> >
> >     user-space  ieee80211_register_hw()  RX IRQ
> >     +++++++++++++++++++++++++++++++++++++++++++++
> >        |                    |             |
> >        |       alloc_ordered_workqueue()  |
> >        |                    |             |
> >        |              Misc wiphy init.    |
> >        |                    |             |
> >        |<---wlan0---wiphy_register()      |
> >        |----start wlan0---->|             |
> >        |                    |<---IRQ---(RX packet)
> >        |                    |             |
> >        |            ieee80211_if_add()    |
> >        |                    |             |
>
> Makes sense.
>
> > @@ -1254,6 +1250,14 @@ int ieee80211_register_hw(struct ieee80211_hw *hw)
> >               local->sband_allocated |= BIT(band);
> >       }
> >
> > +     rtnl_unlock();
> > +
> > +     result = wiphy_register(local->hw.wiphy);
> > +     if (result < 0)
> > +             goto fail_wiphy_register;
> > +
> > +     rtnl_lock();
>
> I'm a bit worried about this unlock/relock here though.
>
> I think we only need the RTNL for the call to
> ieee80211_init_rate_ctrl_alg() and then later ieee80211_if_add(), so
> perhaps we can move that a little closer?
>

Sure, will move rtnl_unlock() to just after call to
ieee80211_init_rate_ctrl_alg().

> All the stuff between is really just setting up local stuff, so doesn't
> really need to worry?
>

Okay.

-Sumit

> johannes
>
>
Kalle Valo April 6, 2020, 1:04 p.m. UTC | #7
Johannes Berg <johannes@sipsolutions.net> writes:

> On Mon, 2020-04-06 at 15:52 +0300, Kalle Valo wrote:
>> Johannes Berg <johannes@sipsolutions.net> writes:
>> 
>> > On Mon, 2020-04-06 at 15:44 +0300, Kalle Valo wrote:
>> > > >     user-space  ieee80211_register_hw()  RX IRQ
>> > > >     +++++++++++++++++++++++++++++++++++++++++++++
>> > > >        |                    |             |
>> > > >        |<---wlan0---wiphy_register()      |
>> > > >        |----start wlan0---->|             |
>> > > >        |                    |<---IRQ---(RX packet)
>> > > >        |              Kernel crash        |
>> > > >        |              due to unallocated  |
>> > > >        |              workqueue.          |
>> > 
>> > [snip]
>> > 
>> > > I have understood that no frames should be received until mac80211 calls
>> > > struct ieee80211_ops::start:
>> > > 
>> > >  * @start: Called before the first netdevice attached to the hardware
>> > >  *         is enabled. This should turn on the hardware and must turn on
>> > >  *         frame reception (for possibly enabled monitor interfaces.)
>> > 
>> > True, but I think he's saying that you can actually add and configure an
>> > interface as soon as the wiphy is registered?
>> 
>> With '<---IRQ---(RX packet)' I assumed wcn36xx is delivering a frame to
>> mac80211 using ieee80211_rx(), but of course I'm just guessing here.
>
> Yeah, but that could be legitimate?

Ah, I misunderstood then. The way I have understood is that no rx frames
should be delivered (= calling ieee80211_rx()_ before start() is called,
but if that's not the case please ignore me :)
Sumit Garg April 6, 2020, 1:06 p.m. UTC | #8
On Mon, 6 Apr 2020 at 18:17, Johannes Berg <johannes@sipsolutions.net> wrote:
>
> On Mon, 2020-04-06 at 15:44 +0300, Kalle Valo wrote:
> >
> > >     user-space  ieee80211_register_hw()  RX IRQ
> > >     +++++++++++++++++++++++++++++++++++++++++++++
> > >        |                    |             |
> > >        |<---wlan0---wiphy_register()      |
> > >        |----start wlan0---->|             |
> > >        |                    |<---IRQ---(RX packet)
> > >        |              Kernel crash        |
> > >        |              due to unallocated  |
> > >        |              workqueue.          |
>
> [snip]
>
> > I have understood that no frames should be received until mac80211 calls
> > struct ieee80211_ops::start:
> >
> >  * @start: Called before the first netdevice attached to the hardware
> >  *         is enabled. This should turn on the hardware and must turn on
> >  *         frame reception (for possibly enabled monitor interfaces.)
>
> True, but I think he's saying that you can actually add and configure an
> interface as soon as the wiphy is registered?

Indeed, it's a call to "struct ieee80211_ops::start" just after wiphy
is registered that causes the frame to be received leading to RX IRQ.

>
> The "wlan0" is kinda wrong there, should be "phy0" I guess, and then
> interface added from iw?

Okay, will update the sequence diagram.

-Sumit

>
> johannes
>
Johannes Berg April 6, 2020, 1:07 p.m. UTC | #9
On Mon, 2020-04-06 at 16:04 +0300, Kalle Valo wrote:
> Johannes Berg <johannes@sipsolutions.net> writes:
> 
> > On Mon, 2020-04-06 at 15:52 +0300, Kalle Valo wrote:
> > > Johannes Berg <johannes@sipsolutions.net> writes:
> > > 
> > > > On Mon, 2020-04-06 at 15:44 +0300, Kalle Valo wrote:
> > > > > >     user-space  ieee80211_register_hw()  RX IRQ
> > > > > >     +++++++++++++++++++++++++++++++++++++++++++++
> > > > > >        |                    |             |
> > > > > >        |<---wlan0---wiphy_register()      |
> > > > > >        |----start wlan0---->|             |
> > > > > >        |                    |<---IRQ---(RX packet)
> > > > > >        |              Kernel crash        |
> > > > > >        |              due to unallocated  |
> > > > > >        |              workqueue.          |
> > > > 
> > > > [snip]
> > > > 
> > > > > I have understood that no frames should be received until mac80211 calls
> > > > > struct ieee80211_ops::start:
> > > > > 
> > > > >  * @start: Called before the first netdevice attached to the hardware
> > > > >  *         is enabled. This should turn on the hardware and must turn on
> > > > >  *         frame reception (for possibly enabled monitor interfaces.)
> > > > 
> > > > True, but I think he's saying that you can actually add and configure an
> > > > interface as soon as the wiphy is registered?
> > > 
> > > With '<---IRQ---(RX packet)' I assumed wcn36xx is delivering a frame to
> > > mac80211 using ieee80211_rx(), but of course I'm just guessing here.
> > 
> > Yeah, but that could be legitimate?
> 
> Ah, I misunderstood then. The way I have understood is that no rx frames
> should be delivered (= calling ieee80211_rx()_ before start() is called,
> but if that's not the case please ignore me :)

No no, that _is_ the case. But I think the "start wlan0" could end up
calling it?

johannes
Sumit Garg April 6, 2020, 1:21 p.m. UTC | #10
On Mon, 6 Apr 2020 at 18:38, Johannes Berg <johannes@sipsolutions.net> wrote:
>
> On Mon, 2020-04-06 at 16:04 +0300, Kalle Valo wrote:
> > Johannes Berg <johannes@sipsolutions.net> writes:
> >
> > > On Mon, 2020-04-06 at 15:52 +0300, Kalle Valo wrote:
> > > > Johannes Berg <johannes@sipsolutions.net> writes:
> > > >
> > > > > On Mon, 2020-04-06 at 15:44 +0300, Kalle Valo wrote:
> > > > > > >     user-space  ieee80211_register_hw()  RX IRQ
> > > > > > >     +++++++++++++++++++++++++++++++++++++++++++++
> > > > > > >        |                    |             |
> > > > > > >        |<---wlan0---wiphy_register()      |
> > > > > > >        |----start wlan0---->|             |
> > > > > > >        |                    |<---IRQ---(RX packet)
> > > > > > >        |              Kernel crash        |
> > > > > > >        |              due to unallocated  |
> > > > > > >        |              workqueue.          |
> > > > >
> > > > > [snip]
> > > > >
> > > > > > I have understood that no frames should be received until mac80211 calls
> > > > > > struct ieee80211_ops::start:
> > > > > >
> > > > > >  * @start: Called before the first netdevice attached to the hardware
> > > > > >  *         is enabled. This should turn on the hardware and must turn on
> > > > > >  *         frame reception (for possibly enabled monitor interfaces.)
> > > > >
> > > > > True, but I think he's saying that you can actually add and configure an
> > > > > interface as soon as the wiphy is registered?
> > > >
> > > > With '<---IRQ---(RX packet)' I assumed wcn36xx is delivering a frame to
> > > > mac80211 using ieee80211_rx(), but of course I'm just guessing here.
> > >
> > > Yeah, but that could be legitimate?
> >
> > Ah, I misunderstood then. The way I have understood is that no rx frames
> > should be delivered (= calling ieee80211_rx()_ before start() is called,
> > but if that's not the case please ignore me :)
>
> No no, that _is_ the case. But I think the "start wlan0" could end up
> calling it?
>

Sorry if I wasn't clear enough via the sequence diagram. It's a common
RX packet that arrives via ieee80211_tasklet_handler() which is
enabled via call to "struct ieee80211_ops::start" api.

-Sumit

> johannes
>
Kalle Valo April 6, 2020, 1:27 p.m. UTC | #11
Sumit Garg <sumit.garg@linaro.org> writes:

> On Mon, 6 Apr 2020 at 18:38, Johannes Berg <johannes@sipsolutions.net> wrote:
>>
>> On Mon, 2020-04-06 at 16:04 +0300, Kalle Valo wrote:
>> > Johannes Berg <johannes@sipsolutions.net> writes:
>> >
>> > > On Mon, 2020-04-06 at 15:52 +0300, Kalle Valo wrote:
>> > > > Johannes Berg <johannes@sipsolutions.net> writes:
>> > > >
>> > > > > On Mon, 2020-04-06 at 15:44 +0300, Kalle Valo wrote:
>> > > > > > >     user-space  ieee80211_register_hw()  RX IRQ
>> > > > > > >     +++++++++++++++++++++++++++++++++++++++++++++
>> > > > > > >        |                    |             |
>> > > > > > >        |<---wlan0---wiphy_register()      |
>> > > > > > >        |----start wlan0---->|             |
>> > > > > > >        |                    |<---IRQ---(RX packet)
>> > > > > > >        |              Kernel crash        |
>> > > > > > >        |              due to unallocated  |
>> > > > > > >        |              workqueue.          |
>> > > > >
>> > > > > [snip]
>> > > > >
>> > > > > > I have understood that no frames should be received until mac80211 calls
>> > > > > > struct ieee80211_ops::start:
>> > > > > >
>> > > > > >  * @start: Called before the first netdevice attached to the hardware
>> > > > > >  *         is enabled. This should turn on the hardware and must turn on
>> > > > > >  *         frame reception (for possibly enabled monitor interfaces.)
>> > > > >
>> > > > > True, but I think he's saying that you can actually add and configure an
>> > > > > interface as soon as the wiphy is registered?
>> > > >
>> > > > With '<---IRQ---(RX packet)' I assumed wcn36xx is delivering a frame to
>> > > > mac80211 using ieee80211_rx(), but of course I'm just guessing here.
>> > >
>> > > Yeah, but that could be legitimate?
>> >
>> > Ah, I misunderstood then. The way I have understood is that no rx frames
>> > should be delivered (= calling ieee80211_rx()_ before start() is called,
>> > but if that's not the case please ignore me :)
>>
>> No no, that _is_ the case. But I think the "start wlan0" could end up
>> calling it?
>>
>
> Sorry if I wasn't clear enough via the sequence diagram. It's a common
> RX packet that arrives via ieee80211_tasklet_handler() which is
> enabled via call to "struct ieee80211_ops::start" api.

Ah sorry, I didn't realise that. So wcn36xx is not to be blamed then,
thanks for the clarification.
Krishna Chaitanya April 6, 2020, 1:55 p.m. UTC | #12
On Mon, Apr 6, 2020 at 6:57 PM Kalle Valo <kvalo@codeaurora.org> wrote:
>
> Sumit Garg <sumit.garg@linaro.org> writes:
>
> > On Mon, 6 Apr 2020 at 18:38, Johannes Berg <johannes@sipsolutions.net> wrote:
> >>
> >> On Mon, 2020-04-06 at 16:04 +0300, Kalle Valo wrote:
> >> > Johannes Berg <johannes@sipsolutions.net> writes:
> >> >
> >> > > On Mon, 2020-04-06 at 15:52 +0300, Kalle Valo wrote:
> >> > > > Johannes Berg <johannes@sipsolutions.net> writes:
> >> > > >
> >> > > > > On Mon, 2020-04-06 at 15:44 +0300, Kalle Valo wrote:
> >> > > > > > >     user-space  ieee80211_register_hw()  RX IRQ
> >> > > > > > >     +++++++++++++++++++++++++++++++++++++++++++++
> >> > > > > > >        |                    |             |
> >> > > > > > >        |<---wlan0---wiphy_register()      |
> >> > > > > > >        |----start wlan0---->|             |
> >> > > > > > >        |                    |<---IRQ---(RX packet)
> >> > > > > > >        |              Kernel crash        |
> >> > > > > > >        |              due to unallocated  |
> >> > > > > > >        |              workqueue.          |
> >> > > > >
> >> > > > > [snip]
> >> > > > >
> >> > > > > > I have understood that no frames should be received until mac80211 calls
> >> > > > > > struct ieee80211_ops::start:
> >> > > > > >
> >> > > > > >  * @start: Called before the first netdevice attached to the hardware
> >> > > > > >  *         is enabled. This should turn on the hardware and must turn on
> >> > > > > >  *         frame reception (for possibly enabled monitor interfaces.)
> >> > > > >
> >> > > > > True, but I think he's saying that you can actually add and configure an
> >> > > > > interface as soon as the wiphy is registered?
> >> > > >
> >> > > > With '<---IRQ---(RX packet)' I assumed wcn36xx is delivering a frame to
> >> > > > mac80211 using ieee80211_rx(), but of course I'm just guessing here.
> >> > >
> >> > > Yeah, but that could be legitimate?
> >> >
> >> > Ah, I misunderstood then. The way I have understood is that no rx frames
> >> > should be delivered (= calling ieee80211_rx()_ before start() is called,
> >> > but if that's not the case please ignore me :)
> >>
> >> No no, that _is_ the case. But I think the "start wlan0" could end up
> >> calling it?
> >>
> >
> > Sorry if I wasn't clear enough via the sequence diagram. It's a common
> > RX packet that arrives via ieee80211_tasklet_handler() which is
> > enabled via call to "struct ieee80211_ops::start" api.
>
> Ah sorry, I didn't realise that. So wcn36xx is not to be blamed then,
> thanks for the clarification.
>
> --
> https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches
I am still confused, without ieee80211_if_add how can the userspace
bring up the interface?
(there by calling drv_start())?
Johannes Berg April 6, 2020, 2:01 p.m. UTC | #13
On Mon, 2020-04-06 at 19:25 +0530, Krishna Chaitanya wrote:
> On Mon, Apr 6, 2020 at 6:57 PM Kalle Valo <kvalo@codeaurora.org> wrote:
> > Sumit Garg <sumit.garg@linaro.org> writes:
> > 
> > > On Mon, 6 Apr 2020 at 18:38, Johannes Berg <johannes@sipsolutions.net> wrote:
> > > > On Mon, 2020-04-06 at 16:04 +0300, Kalle Valo wrote:
> > > > > Johannes Berg <johannes@sipsolutions.net> writes:
> > > > > 
> > > > > > On Mon, 2020-04-06 at 15:52 +0300, Kalle Valo wrote:
> > > > > > > Johannes Berg <johannes@sipsolutions.net> writes:
> > > > > > > 
> > > > > > > > On Mon, 2020-04-06 at 15:44 +0300, Kalle Valo wrote:
> > > > > > > > > >     user-space  ieee80211_register_hw()  RX IRQ
> > > > > > > > > >     +++++++++++++++++++++++++++++++++++++++++++++
> > > > > > > > > >        |                    |             |
> > > > > > > > > >        |<---wlan0---wiphy_register()      |
> > > > > > > > > >        |----start wlan0---->|             |
> > > > > > > > > >        |                    |<---IRQ---(RX packet)
> > > > > > > > > >        |              Kernel crash        |
> > > > > > > > > >        |              due to unallocated  |
> > > > > > > > > >        |              workqueue.          |
> > > > > > > > 
> > > > > > > > [snip]
> > > > > > > > 
> > > > > > > > > I have understood that no frames should be received until mac80211 calls
> > > > > > > > > struct ieee80211_ops::start:
> > > > > > > > > 
> > > > > > > > >  * @start: Called before the first netdevice attached to the hardware
> > > > > > > > >  *         is enabled. This should turn on the hardware and must turn on
> > > > > > > > >  *         frame reception (for possibly enabled monitor interfaces.)
> > > > > > > > 
> > > > > > > > True, but I think he's saying that you can actually add and configure an
> > > > > > > > interface as soon as the wiphy is registered?
> > > > > > > 
> > > > > > > With '<---IRQ---(RX packet)' I assumed wcn36xx is delivering a frame to
> > > > > > > mac80211 using ieee80211_rx(), but of course I'm just guessing here.
> > > > > > 
> > > > > > Yeah, but that could be legitimate?
> > > > > 
> > > > > Ah, I misunderstood then. The way I have understood is that no rx frames
> > > > > should be delivered (= calling ieee80211_rx()_ before start() is called,
> > > > > but if that's not the case please ignore me :)
> > > > 
> > > > No no, that _is_ the case. But I think the "start wlan0" could end up
> > > > calling it?

> I am still confused, without ieee80211_if_add how can the userspace
> bring up the interface?

It can add its own interface. Maybe that won't be 'wlan0' but something
else?

like

iw phy0 interface add wlan0 type station
ip link set wlan0 up

johannes
Krishna Chaitanya April 6, 2020, 2:25 p.m. UTC | #14
On Mon, Apr 6, 2020 at 7:31 PM Johannes Berg <johannes@sipsolutions.net> wrote:
>
> On Mon, 2020-04-06 at 19:25 +0530, Krishna Chaitanya wrote:
> > On Mon, Apr 6, 2020 at 6:57 PM Kalle Valo <kvalo@codeaurora.org> wrote:
> > > Sumit Garg <sumit.garg@linaro.org> writes:
> > >
> > > > On Mon, 6 Apr 2020 at 18:38, Johannes Berg <johannes@sipsolutions.net> wrote:
> > > > > On Mon, 2020-04-06 at 16:04 +0300, Kalle Valo wrote:
> > > > > > Johannes Berg <johannes@sipsolutions.net> writes:
> > > > > >
> > > > > > > On Mon, 2020-04-06 at 15:52 +0300, Kalle Valo wrote:
> > > > > > > > Johannes Berg <johannes@sipsolutions.net> writes:
> > > > > > > >
> > > > > > > > > On Mon, 2020-04-06 at 15:44 +0300, Kalle Valo wrote:
> > > > > > > > > > >     user-space  ieee80211_register_hw()  RX IRQ
> > > > > > > > > > >     +++++++++++++++++++++++++++++++++++++++++++++
> > > > > > > > > > >        |                    |             |
> > > > > > > > > > >        |<---wlan0---wiphy_register()      |
> > > > > > > > > > >        |----start wlan0---->|             |
> > > > > > > > > > >        |                    |<---IRQ---(RX packet)
> > > > > > > > > > >        |              Kernel crash        |
> > > > > > > > > > >        |              due to unallocated  |
> > > > > > > > > > >        |              workqueue.          |
> > > > > > > > >
> > > > > > > > > [snip]
> > > > > > > > >
> > > > > > > > > > I have understood that no frames should be received until mac80211 calls
> > > > > > > > > > struct ieee80211_ops::start:
> > > > > > > > > >
> > > > > > > > > >  * @start: Called before the first netdevice attached to the hardware
> > > > > > > > > >  *         is enabled. This should turn on the hardware and must turn on
> > > > > > > > > >  *         frame reception (for possibly enabled monitor interfaces.)
> > > > > > > > >
> > > > > > > > > True, but I think he's saying that you can actually add and configure an
> > > > > > > > > interface as soon as the wiphy is registered?
> > > > > > > >
> > > > > > > > With '<---IRQ---(RX packet)' I assumed wcn36xx is delivering a frame to
> > > > > > > > mac80211 using ieee80211_rx(), but of course I'm just guessing here.
> > > > > > >
> > > > > > > Yeah, but that could be legitimate?
> > > > > >
> > > > > > Ah, I misunderstood then. The way I have understood is that no rx frames
> > > > > > should be delivered (= calling ieee80211_rx()_ before start() is called,
> > > > > > but if that's not the case please ignore me :)
> > > > >
> > > > > No no, that _is_ the case. But I think the "start wlan0" could end up
> > > > > calling it?
>
> > I am still confused, without ieee80211_if_add how can the userspace
> > bring up the interface?
>
> It can add its own interface. Maybe that won't be 'wlan0' but something
> else?
>
> like
>
> iw phy0 interface add wlan0 type station
> ip link set wlan0 up
Ah okay, got it, thanks. Very narrow window though :-) as the
alloc_ordered_workqueue
doesn't need RTNL and there is a long way to go to do if_add() from
user and setup
the driver for interrupts. Again depends on the driver though, it
should properly handle
pending ieee80211_register_hw() with start().
Johannes Berg April 6, 2020, 3:06 p.m. UTC | #15
On Mon, 2020-04-06 at 19:55 +0530, Krishna Chaitanya wrote:

> > iw phy0 interface add wlan0 type station
> > ip link set wlan0 up
> Ah okay, got it, thanks. Very narrow window though :-) as the
> alloc_ordered_workqueue
> doesn't need RTNL and there is a long way to go to do if_add() from
> user and setup
> the driver for interrupts. 

True, I do wonder how this is hit. Maybe something with no preempt and a
uevent triggering things?

> Again depends on the driver though, it
> should properly handle
> pending ieee80211_register_hw() with start().

It could, but it'd be really tricky. Much better to fix mac80211.

johannes
Krishna Chaitanya April 6, 2020, 6:08 p.m. UTC | #16
On Mon, Apr 6, 2020 at 8:36 PM Johannes Berg <johannes@sipsolutions.net> wrote:
>
> On Mon, 2020-04-06 at 19:55 +0530, Krishna Chaitanya wrote:
>
> > > iw phy0 interface add wlan0 type station
> > > ip link set wlan0 up
> > Ah okay, got it, thanks. Very narrow window though :-) as the
> > alloc_ordered_workqueue
> > doesn't need RTNL and there is a long way to go to do if_add() from
> > user and setup
> > the driver for interrupts.
>
> True, I do wonder how this is hit. Maybe something with no preempt and a
> uevent triggering things?
Probably, it might be specific to the dragonboard410c configuration

> > Again depends on the driver though, it
> > should properly handle
> > pending ieee80211_register_hw() with start().

> It could, but it'd be really tricky. Much better to fix mac80211.
Sure, anyways it is a good change.
Sumit Garg April 7, 2020, 6:42 a.m. UTC | #17
On Mon, 6 Apr 2020 at 23:39, Krishna Chaitanya <chaitanya.mgit@gmail.com> wrote:
>
> On Mon, Apr 6, 2020 at 8:36 PM Johannes Berg <johannes@sipsolutions.net> wrote:
> >
> > On Mon, 2020-04-06 at 19:55 +0530, Krishna Chaitanya wrote:
> >
> > > > iw phy0 interface add wlan0 type station
> > > > ip link set wlan0 up
> > > Ah okay, got it, thanks. Very narrow window though :-) as the
> > > alloc_ordered_workqueue
> > > doesn't need RTNL and there is a long way to go to do if_add() from
> > > user and setup
> > > the driver for interrupts.
> >
> > True, I do wonder how this is hit. Maybe something with no preempt and a
> > uevent triggering things?

The crash is reproducible while working with iwd [1] which is
basically a wireless daemon. It can be started as "iwd.service" during
boot that can detect wiphy registration events and configure
interfaces. Have a look at this text [2] from iwd manager.

To have a simple reproducer, please have a look at this trigger script
[3] from Matthias in CC. With this script I am able to reproduce the
kernel crash with approx. frequency of 1/10 across reboots on
dragonboard 410c.

There is nothing special like no preempt.

[1] https://wiki.archlinux.org/index.php/Iwd
[2] https://git.kernel.org/pub/scm/network/wireless/iwd.git/tree/src/manager.c#n563
[3] https://github.com/DasRoteSkelett/meta-iwd/blob/master/recipes-trigger/trigger/trigger/trigger.sh

> Probably, it might be specific to the dragonboard410c configuration
>

As described above, it isn't specific to any dragonboard 410c
configuration and one should be able to reproduce it on other boards
too using iwd depending on how long it takes to start corresponding
wiphy device.

> > > Again depends on the driver though, it
> > > should properly handle
> > > pending ieee80211_register_hw() with start().
>
> > It could, but it'd be really tricky. Much better to fix mac80211.

+1

-Sumit

> Sure, anyways it is a good change.
diff mbox series

Patch

diff --git a/net/mac80211/main.c b/net/mac80211/main.c
index 4c2b5ba..4ca62fc 100644
--- a/net/mac80211/main.c
+++ b/net/mac80211/main.c
@@ -1051,7 +1051,7 @@  int ieee80211_register_hw(struct ieee80211_hw *hw)
 		local->hw.wiphy->signal_type = CFG80211_SIGNAL_TYPE_UNSPEC;
 		if (hw->max_signal <= 0) {
 			result = -EINVAL;
-			goto fail_wiphy_register;
+			goto fail_workqueue;
 		}
 	}
 
@@ -1113,7 +1113,7 @@  int ieee80211_register_hw(struct ieee80211_hw *hw)
 
 	result = ieee80211_init_cipher_suites(local);
 	if (result < 0)
-		goto fail_wiphy_register;
+		goto fail_workqueue;
 
 	if (!local->ops->remain_on_channel)
 		local->hw.wiphy->max_remain_on_channel_duration = 5000;
@@ -1139,10 +1139,6 @@  int ieee80211_register_hw(struct ieee80211_hw *hw)
 
 	local->hw.wiphy->max_num_csa_counters = IEEE80211_MAX_CSA_COUNTERS_NUM;
 
-	result = wiphy_register(local->hw.wiphy);
-	if (result < 0)
-		goto fail_wiphy_register;
-
 	/*
 	 * We use the number of queues for feature tests (QoS, HT) internally
 	 * so restrict them appropriately.
@@ -1254,6 +1250,14 @@  int ieee80211_register_hw(struct ieee80211_hw *hw)
 		local->sband_allocated |= BIT(band);
 	}
 
+	rtnl_unlock();
+
+	result = wiphy_register(local->hw.wiphy);
+	if (result < 0)
+		goto fail_wiphy_register;
+
+	rtnl_lock();
+
 	/* add one default STA interface if supported */
 	if (local->hw.wiphy->interface_modes & BIT(NL80211_IFTYPE_STATION) &&
 	    !ieee80211_hw_check(hw, NO_AUTO_VIF)) {
@@ -1293,6 +1297,8 @@  int ieee80211_register_hw(struct ieee80211_hw *hw)
 #if defined(CONFIG_INET) || defined(CONFIG_IPV6)
  fail_ifa:
 #endif
+	wiphy_unregister(local->hw.wiphy);
+ fail_wiphy_register:
 	rtnl_lock();
 	rate_control_deinitialize(local);
 	ieee80211_remove_interfaces(local);
@@ -1302,8 +1308,6 @@  int ieee80211_register_hw(struct ieee80211_hw *hw)
 	ieee80211_led_exit(local);
 	destroy_workqueue(local->workqueue);
  fail_workqueue:
-	wiphy_unregister(local->hw.wiphy);
- fail_wiphy_register:
 	if (local->wiphy_ciphers_allocated)
 		kfree(local->hw.wiphy->cipher_suites);
 	kfree(local->int_scan_req);
@@ -1353,8 +1357,8 @@  void ieee80211_unregister_hw(struct ieee80211_hw *hw)
 	skb_queue_purge(&local->skb_queue_unreliable);
 	skb_queue_purge(&local->skb_queue_tdls_chsw);
 
-	destroy_workqueue(local->workqueue);
 	wiphy_unregister(local->hw.wiphy);
+	destroy_workqueue(local->workqueue);
 	ieee80211_led_exit(local);
 	kfree(local->int_scan_req);
 }