diff mbox

[1/2] cfg80211: ignore netif running state when changing iftype

Message ID 1432039021-29666-1-git-send-email-michal.kazior@tieto.com (mailing list archive)
State Superseded
Delegated to: Johannes Berg
Headers show

Commit Message

Michal Kazior May 19, 2015, 12:37 p.m. UTC
This isn't a revert of f8cdddb8d61d ("cfg80211:
check iface combinations only when iface is
running") as far as functionality is considred
because b6a550156bc ("cfg80211/mac80211: move more
combination checks to mac80211") moved the logic
somewhere else.

It was possible for mac80211 to be coerced into an
unexpected flow causing sdata union to become
corrupted. Station pointer was put into
sdata->u.vlan.sta memory location while it was
really master AP's sdata->u.ap.next_beacon. This
led to station entry being later freed as CSA
beacon before __sta_info_flush() in
ieee80211_stop_ap() and a subsequent invalid
pointer dereference crash.

The problem was observed with the following test
steps:

 1. prepare 2 devices
 2. start hostapd AP with wds_sta=1
 3. connect client with 4addr
 4. disconnect
 5. swap roles & connect
 6. disconnect
    [ During AP (which was a client first)
      teardown kernel would crash. ]

Fixes: f8cdddb8d61d ("cfg80211: check iface combinations only when iface is running")
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
---
 net/wireless/util.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Johannes Berg May 20, 2015, 1:17 p.m. UTC | #1
On Tue, 2015-05-19 at 14:37 +0200, Michal Kazior wrote:
> This isn't a revert of f8cdddb8d61d ("cfg80211:
> check iface combinations only when iface is
> running") as far as functionality is considred
> because b6a550156bc ("cfg80211/mac80211: move more
> combination checks to mac80211") moved the logic
> somewhere else.
> 
> It was possible for mac80211 to be coerced into an
> unexpected flow causing sdata union to become
> corrupted. Station pointer was put into
> sdata->u.vlan.sta memory location while it was
> really master AP's sdata->u.ap.next_beacon. This
> led to station entry being later freed as CSA
> beacon before __sta_info_flush() in
> ieee80211_stop_ap() and a subsequent invalid
> pointer dereference crash.
> 
> The problem was observed with the following test
> steps:
> 
>  1. prepare 2 devices
>  2. start hostapd AP with wds_sta=1
>  3. connect client with 4addr
>  4. disconnect
>  5. swap roles & connect
>  6. disconnect
>     [ During AP (which was a client first)
>       teardown kernel would crash. ]

That doesn't really explain how it crashes?

> +++ b/net/wireless/util.c
> @@ -944,7 +944,7 @@ int cfg80211_change_iface(struct cfg80211_registered_device *rdev,
>  	     ntype == NL80211_IFTYPE_P2P_CLIENT))
>  		return -EBUSY;
>  
> -	if (ntype != otype && netif_running(dev)) {
> +	if (ntype != otype) {
>  		dev->ieee80211_ptr->use_4addr = false;
>  		dev->ieee80211_ptr->mesh_id_up_len = 0;
>  		wdev_lock(dev->ieee80211_ptr);

I don't think that makes much sense - the code within this block really
only makes sense when the interface *is* running, like disconnecting
etc. Doing that when it's *not* would be entirely unexpected to the
drivers, no?

johannes

--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Johannes Berg May 20, 2015, 1:19 p.m. UTC | #2
On Wed, 2015-05-20 at 15:17 +0200, Johannes Berg wrote:

> > -	if (ntype != otype && netif_running(dev)) {
> > +	if (ntype != otype) {
> >  		dev->ieee80211_ptr->use_4addr = false;
> >  		dev->ieee80211_ptr->mesh_id_up_len = 0;
> >  		wdev_lock(dev->ieee80211_ptr);
> 
> I don't think that makes much sense - the code within this block really
> only makes sense when the interface *is* running, like disconnecting
> etc. Doing that when it's *not* would be entirely unexpected to the
> drivers, no?

The real problem here might be the assignment to use_4addr *before*
we've actually disconnected or anything, perhaps that should be moved?

Similarly, the mesh_id_up_len should probably be moved into the mesh
point switch case...

johannes

--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Michal Kazior May 21, 2015, 7:44 a.m. UTC | #3
On 20 May 2015 at 15:19, Johannes Berg <johannes@sipsolutions.net> wrote:
> On Wed, 2015-05-20 at 15:17 +0200, Johannes Berg wrote:
>
>> > -   if (ntype != otype && netif_running(dev)) {
>> > +   if (ntype != otype) {
>> >             dev->ieee80211_ptr->use_4addr = false;
>> >             dev->ieee80211_ptr->mesh_id_up_len = 0;
>> >             wdev_lock(dev->ieee80211_ptr);
>>
>> I don't think that makes much sense - the code within this block really
>> only makes sense when the interface *is* running, like disconnecting
>> etc. Doing that when it's *not* would be entirely unexpected to the
>> drivers, no?

The netif_running() was originally introduced in f8cdddb8d61d which
did in for entirely different purpose. Hence stripping netif_running()
shouldn't be a problem for drivers, can it? The patch was also made
kind of obsolete by b6a550156bc. Perhaps there are some other
behavioral changes that I'm unaware of in stuff that gets called upon
entering this condition down the stack..?

Maybe it's just safer to move use_4addr/mesh_id_up_len clearing into a
separate `oftype != ntype` condition. What do you think?


> The real problem here might be the assignment to use_4addr *before*
> we've actually disconnected or anything, perhaps that should be moved?
>
> Similarly, the mesh_id_up_len should probably be moved into the mesh
> point switch case...

The problem isn't use_4addr clearing ordering per se. The problem is
it wasn't cleared at all on interface changes. The minimal subset of
commands that would crash the system was:

  # host A and host B have just booted; no wpa_s/hostapd running; all
vifs are down
  host A> iw wlan0 set type station
  host A> iw wlan0 set 4addr on
  host A> printf
'interface=wlan0\nssid=4addrcrash\nchannel=1\nwds_sta=1' > /tmp/hconf
  host A> hostapd -B /tmp/conf
  host B> iw wlan0 set 4addr on
  host B> ifconfig wlan0 up
  host B> iw wlan0 connect -w hostAssid
  host A> pkill hostapd
  # host A crashes:

[  127.928192] BUG: unable to handle kernel NULL pointer dereference
at 00000000000006c8
[  127.929014] IP: [<ffffffff816f4f32>] __sta_info_flush+0xac/0x158
...
[  127.934578]  [<ffffffff8170789e>] ieee80211_stop_ap+0x139/0x26c
[  127.934578]  [<ffffffff8100498f>] ? dump_trace+0x279/0x28a
[  127.934578]  [<ffffffff816dc661>] __cfg80211_stop_ap+0x84/0x191
[  127.934578]  [<ffffffff816dc7ad>] cfg80211_stop_ap+0x3f/0x58
[  127.934578]  [<ffffffff816c5ad6>] nl80211_stop_ap+0x1b/0x1d
[  127.934578]  [<ffffffff815e53f8>] genl_family_rcv_msg+0x259/0x2b5

; gdb vmlinux -ex 'l * __sta_info_flush+0xac' -ex quit --quiet
Reading symbols from vmlinux...done.
0xffffffff816f4f32 is in __sta_info_flush
(/devel/src/linux/net/mac80211/sta_info.c:1033).
1028     WARN_ON(vlans && !sdata->bss);
1029
1030     mutex_lock(&local->sta_mtx);
1031     list_for_each_entry_safe(sta, tmp, &local->sta_list, list) {
1032           if (sdata == sta->sdata ||
1033               (vlans && sdata->bss == sta->sdata->bss)) {
1034            if (!WARN_ON(__sta_info_destroy_part1(sta)))
1035                    list_add(&sta->free_list, &free_list);
1036            ret++;
1037           }

This is because:

(cfg.c, ieee80211_change_station)
  1397                  if (params->vlan->ieee80211_ptr->use_4addr) {
  1398                          if (vlansdata->u.vlan.sta) {
  1399                                  err = -EBUSY;
  1400                                  goto out_err;
  1401                          }
  1402
  1403                          rcu_assign_pointer(vlansdata->u.vlan.sta, sta);
  1404                          new_4addr = true;
  1405                  }
  1406

@1397 would eval to true even though vlansdata is IFTYPE_AP.
u.vlan.sta has identical memory offset as u.ap.next_beacon. During
ieee80211_stop_ap() it would:

(cfg.c, ieee80211_stop_ap)
   913          kfree(sdata->u.ap.next_beacon);
   914          sdata->u.ap.next_beacon = NULL;
...
   929          __sta_info_flush(sdata, true);

Effectively sta_info was freed at @914 and then __sta_info_flush()
tried to use it (after it was freed).

Do you want me to put the above into the commit log? Should I put a
copy into the mac80211 patch as well?


Micha?
--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Johannes Berg May 22, 2015, 8:34 a.m. UTC | #4
On Thu, 2015-05-21 at 09:44 +0200, Michal Kazior wrote:
> On 20 May 2015 at 15:19, Johannes Berg <johannes@sipsolutions.net> wrote:
> > On Wed, 2015-05-20 at 15:17 +0200, Johannes Berg wrote:
> >
> >> > -   if (ntype != otype && netif_running(dev)) {
> >> > +   if (ntype != otype) {
> >> >             dev->ieee80211_ptr->use_4addr = false;
> >> >             dev->ieee80211_ptr->mesh_id_up_len = 0;
> >> >             wdev_lock(dev->ieee80211_ptr);
> >>
> >> I don't think that makes much sense - the code within this block really
> >> only makes sense when the interface *is* running, like disconnecting
> >> etc. Doing that when it's *not* would be entirely unexpected to the
> >> drivers, no?
> 
> The netif_running() was originally introduced in f8cdddb8d61d which
> did in for entirely different purpose. Hence stripping netif_running()
> shouldn't be a problem for drivers, can it? The patch was also made
> kind of obsolete by b6a550156bc. Perhaps there are some other
> behavioral changes that I'm unaware of in stuff that gets called upon
> entering this condition down the stack..?

Perhaps it doesn't matter actually, since all the functions that are
called here will double-check things before calling the driver?

> Maybe it's just safer to move use_4addr/mesh_id_up_len clearing into a
> separate `oftype != ntype` condition. What do you think?

Or maybe we should just look at the functions we call in more detail :)

> > The real problem here might be the assignment to use_4addr *before*
> > we've actually disconnected or anything, perhaps that should be moved?
> >
> > Similarly, the mesh_id_up_len should probably be moved into the mesh
> > point switch case...
> 
> The problem isn't use_4addr clearing ordering per se. The problem is
> it wasn't cleared at all on interface changes. 

Ah.

> Do you want me to put the above into the commit log? Should I put a
> copy into the mac80211 patch as well?

I think just noting (more explicitly) that it was *missing* a clearing
in these cases would have been helpful.

johannes

--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/net/wireless/util.c b/net/wireless/util.c
index 70051ab52f4f..7e4e3fffe7ce 100644
--- a/net/wireless/util.c
+++ b/net/wireless/util.c
@@ -944,7 +944,7 @@  int cfg80211_change_iface(struct cfg80211_registered_device *rdev,
 	     ntype == NL80211_IFTYPE_P2P_CLIENT))
 		return -EBUSY;
 
-	if (ntype != otype && netif_running(dev)) {
+	if (ntype != otype) {
 		dev->ieee80211_ptr->use_4addr = false;
 		dev->ieee80211_ptr->mesh_id_up_len = 0;
 		wdev_lock(dev->ieee80211_ptr);