mbox series

pull-request: mac80211-next 2019-07-31

Message ID 20190731155057.23035-1-johannes@sipsolutions.net
State Not Applicable
Headers show
Series pull-request: mac80211-next 2019-07-31 | expand

Pull-request

git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next.git tags/mac80211-next-for-davem-2019-07-31

Message

Johannes Berg July 31, 2019, 3:50 p.m. UTC
Hi Dave,

There's a fair number of changes here, so I thought I'd get them out.
I've included two Intel driver cleanups because Luca is on vacation,
I'm covering for him, and doing it all in one tree let me merge all
of the patches at once (including mac80211 that depends on that);
Kalle is aware.

Also, though this isn't very interesting yet, I've started looking at
weaning the wireless subsystem off the RTNL for all operations, as it
can cause significant lock contention, especially with slow USB devices.
The real patches for that are some way off, but one preparation here is
to use generic netlink's parallel_ops=true, to avoid trading one place
with contention for another in the future, and to avoid adding more
genl_family_attrbuf() usage (since that's no longer possible with the
parallel_ops setting).

Please pull and let me know if there's any problem.

Thanks,
johannes



The following changes since commit 00c33afbf9dd06f77a2f15117cd4bdc2a54b51d7:

  net: mvneta: use devm_platform_ioremap_resource() to simplify code (2019-07-25 17:28:11 -0700)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next.git tags/mac80211-next-for-davem-2019-07-31

for you to fetch changes up to f39b07fdfb688724fedabf5507e15eaf398f2500:

  mac80211: HE STA disassoc due to QOS NULL not sent (2019-07-31 13:26:41 +0200)

----------------------------------------------------------------
We have a reasonably large number of changes:
 * lots more HE (802.11ax) support, particularly things
   relevant for the the AP side, but also mesh support
 * debugfs cleanups from Greg
 * some more work on extended key ID
 * start using genl parallel_ops, as preparation for
   weaning ourselves off RTNL and getting parallelism
 * various other changes all over

----------------------------------------------------------------
Alexander Wetzel (3):
      mac80211_hwsim: Extended Key ID API update
      mac80211: Simplify Extended Key ID API
      mac80211: AMPDU handling for rekeys with Extended Key ID

Ard Biesheuvel (1):
      lib80211: use crypto API ccm(aes) transform for CCMP processing

Christophe JAILLET (1):
      mac80211_hwsim: Fix a typo in the name of function 'mac80211_hswim_he_capab()'

Colin Ian King (1):
      mac80211: add missing null return check from call to ieee80211_get_sband

Denis Kenzior (2):
      nl80211: document uapi for CMD_FRAME_WAIT_CANCEL
      nl80211: Include wiphy address setup in NEW_WIPHY

Emmanuel Grumbach (1):
      mac80211: pass the vif to cancel_remain_on_channel

Erik Stromdahl (1):
      mac80211: add tx dequeue function for process context

Greg Kroah-Hartman (4):
      iwlwifi: dvm: no need to check return value of debugfs_create functions
      iwlwifi: mvm: remove unused .remove_sta_debugfs callback
      mac80211: remove unused and unneeded remove_sta_debugfs callback
      cfg80211: no need to check return value of debugfs_create functions

Johannes Berg (6):
      cfg80211: clean up cfg80211_inform_single_bss_frame_data()
      cfg80211: don't parse MBSSID if transmitting BSS isn't created
      cfg80211: give all multi-BSSID BSS entries the same timestamp
      mac80211_hwsim: fill boottime_ns in netlink RX path
      cfg80211: use parallel_ops for genl
      nl80211: add strict start type

John Crispin (10):
      mac80211: add support for parsing ADDBA_EXT IEs
      mac80211: add xmit rate to struct ieee80211_tx_status
      mac80211: propagate struct ieee80211_tx_status into ieee80211_tx_monitor()
      mac80211: add struct ieee80211_tx_status support to ieee80211_add_tx_radiotap_header
      mac80211: HE: add Spatial Reuse element parsing support
      mac80211: fix ieee80211_he_oper_size() comment
      mac80211: propagate HE operation info into bss_conf
      mac80211: add support for the ADDBA extension element
      cfg80211: add support for parsing OBBS_PD attributes
      mac80211: allow setting spatial reuse parameters from bss_conf

Karthikeyan Periyasamy (1):
      mac80211: reject zero MAC address in add station

Lorenzo Bianconi (1):
      mac80211: add IEEE80211_KEY_FLAG_GENERATE_MMIE to ieee80211_key_flags

Michael Vassernis (1):
      cfg80211: fix dfs channels remain DFS_AVAILABLE after ch_switch

Sergey Matyukevich (2):
      cfg80211: refactor cfg80211_bss_update
      cfg80211: fix duplicated scan entries after channel switch

Shay Bar (1):
      mac80211: HE STA disassoc due to QOS NULL not sent

Sven Eckelmann (1):
      mac80211: implement HE support for mesh

 drivers/net/wireless/ath/ath10k/mac.c             |   3 +-
 drivers/net/wireless/ath/ath9k/main.c             |   3 +-
 drivers/net/wireless/intel/iwlwifi/dvm/rs.c       |  29 +--
 drivers/net/wireless/intel/iwlwifi/dvm/rs.h       |   4 -
 drivers/net/wireless/intel/iwlwifi/mvm/mac80211.c |   3 +-
 drivers/net/wireless/intel/iwlwifi/mvm/rs.c       |   5 -
 drivers/net/wireless/mac80211_hwsim.c             |  20 +-
 drivers/net/wireless/rsi/rsi_91x_mac80211.c       |   3 +-
 drivers/net/wireless/ti/wlcore/main.c             |   3 +-
 include/linux/ieee80211.h                         |  63 ++++-
 include/net/cfg80211.h                            |  15 ++
 include/net/mac80211.h                            |  53 ++++-
 include/uapi/linux/nl80211.h                      |  31 ++-
 net/mac80211/agg-rx.c                             |  72 +++++-
 net/mac80211/cfg.c                                |   7 +-
 net/mac80211/debugfs.c                            |   3 +-
 net/mac80211/driver-ops.h                         |   8 +-
 net/mac80211/he.c                                 |  39 ++++
 net/mac80211/ht.c                                 |   2 +-
 net/mac80211/ieee80211_i.h                        |  17 +-
 net/mac80211/key.c                                |  16 +-
 net/mac80211/main.c                               |  18 +-
 net/mac80211/mesh.c                               |  62 +++++
 net/mac80211/mesh.h                               |   4 +
 net/mac80211/mesh_plink.c                         |  12 +-
 net/mac80211/mlme.c                               |   7 +-
 net/mac80211/offchannel.c                         |   5 +-
 net/mac80211/rate.h                               |   9 -
 net/mac80211/sta_info.c                           |   1 -
 net/mac80211/status.c                             | 180 +++++++++++++--
 net/mac80211/trace.h                              |   7 +-
 net/mac80211/tx.c                                 |   5 +-
 net/mac80211/util.c                               |  60 +++++
 net/mac80211/wpa.c                                |   6 +-
 net/wireless/Kconfig                              |   2 +
 net/wireless/core.c                               |  17 +-
 net/wireless/core.h                               |   2 +
 net/wireless/lib80211_crypt_ccmp.c                | 197 +++++++---------
 net/wireless/nl80211.c                            | 182 ++++++++++++---
 net/wireless/scan.c                               | 269 ++++++++++++++--------
 40 files changed, 1070 insertions(+), 374 deletions(-)

Comments

David Miller July 31, 2019, 4 p.m. UTC | #1
From: Johannes Berg <johannes@sipsolutions.net>
Date: Wed, 31 Jul 2019 17:50:56 +0200

> There's a fair number of changes here, so I thought I'd get them out.
> I've included two Intel driver cleanups because Luca is on vacation,
> I'm covering for him, and doing it all in one tree let me merge all
> of the patches at once (including mac80211 that depends on that);
> Kalle is aware.
> 
> Also, though this isn't very interesting yet, I've started looking at
> weaning the wireless subsystem off the RTNL for all operations, as it
> can cause significant lock contention, especially with slow USB devices.
> The real patches for that are some way off, but one preparation here is
> to use generic netlink's parallel_ops=true, to avoid trading one place
> with contention for another in the future, and to avoid adding more
> genl_family_attrbuf() usage (since that's no longer possible with the
> parallel_ops setting).
> 
> Please pull and let me know if there's any problem.

Pulled, thanks Johannes.

I'm sure people like Florian Westphal will also appreciate your RTNL
elimination work...
Arnd Bergmann Oct. 21, 2019, 7:39 p.m. UTC | #2
On Wed, Jul 31, 2019 at 5:53 PM Johannes Berg <johannes@sipsolutions.net> wrote:
> John Crispin (10):
>       mac80211: add support for parsing ADDBA_EXT IEs
>       mac80211: add xmit rate to struct ieee80211_tx_status
>       mac80211: propagate struct ieee80211_tx_status into ieee80211_tx_monitor()
>       mac80211: add struct ieee80211_tx_status support to ieee80211_add_tx_radiotap_header
>       mac80211: HE: add Spatial Reuse element parsing support

Hi Johannes and John,

It looks like one of the last additions pushed the stack usage over
the 1024 byte limit
for 32-bit architectures:

net/mac80211/mlme.c:4063:6: error: stack frame size of 1032 bytes in
function 'ieee80211_sta_rx_queued_mgmt' [-Werror,-Wframe-larger-than=]

struct ieee802_11_elems is fairly large, and just grew another two pointers.
When ieee80211_rx_mgmt_assoc_resp() and ieee80211_assoc_success()
are inlined into ieee80211_sta_rx_queued_mgmt(), there are three copies
of this structure, which is slightly too much.

Marking any of those functions as __noinline_for_stack would shut up the
warning but not fix the underlying issue. Silencing the warning might
be enough if there is a fairly short call chain leading up to
ieee80211_sta_rx_queued_mgmt(). Another option would be a dynamic
allocation.

Thoughts?

      Arnd
Johannes Berg Oct. 28, 2019, 9:51 a.m. UTC | #3
Hi Arnd,

> It looks like one of the last additions pushed the stack usage over
> the 1024 byte limit
> for 32-bit architectures:
> 
> net/mac80211/mlme.c:4063:6: error: stack frame size of 1032 bytes in
> function 'ieee80211_sta_rx_queued_mgmt' [-Werror,-Wframe-larger-than=]
> 
> struct ieee802_11_elems is fairly large, and just grew another two pointers.
> When ieee80211_rx_mgmt_assoc_resp() and ieee80211_assoc_success()
> are inlined into ieee80211_sta_rx_queued_mgmt(), there are three copies
> of this structure, which is slightly too much.

Hmm. I guess that means the compiler isn't smart enough to make the
copies from the inlined sub-functions alias each other? I mean, if I
have

fn1(...) { struct ... elems1; ... }
fn2(...) { struct ... elems2; ... }

fn(...)
{
  fn1();
  fn2();
}

then it could reasonably use the same stack memory for elems1 and
elems2, at least theoretically, but you're saying it doesn't do that I
guess?

It could even do that for different BBs, in theory ...

If it does, I'd have suggested to move the code from the outer function
inside the "case IEEE80211_STYPE_ACTION:" block into a new sub-function, 
but that won't work then.

I don't think dynamic allocation would be nice - but we could manually
do this by passing the elems pointer into the
ieee80211_rx_mgmt_assoc_resp() and ieee80211_assoc_success() functions.


Why do you say 32-bit btw, it should be *bigger* on 64-bit, but I didn't
see this ... hmm.

johannes
Arnd Bergmann Oct. 28, 2019, 10:53 a.m. UTC | #4
On Mon, Oct 28, 2019 at 10:52 AM Johannes Berg
<johannes@sipsolutions.net> wrote:
>
> > It looks like one of the last additions pushed the stack usage over
> > the 1024 byte limit
> > for 32-bit architectures:
> >
> > net/mac80211/mlme.c:4063:6: error: stack frame size of 1032 bytes in
> > function 'ieee80211_sta_rx_queued_mgmt' [-Werror,-Wframe-larger-than=]
> >
> > struct ieee802_11_elems is fairly large, and just grew another two pointers.
> > When ieee80211_rx_mgmt_assoc_resp() and ieee80211_assoc_success()
> > are inlined into ieee80211_sta_rx_queued_mgmt(), there are three copies
> > of this structure, which is slightly too much.
>
> Hmm. I guess that means the compiler isn't smart enough to make the
> copies from the inlined sub-functions alias each other? I mean, if I
> have
>
> fn1(...) { struct ... elems1; ... }
> fn2(...) { struct ... elems2; ... }
>
> fn(...)
> {
>   fn1();
>   fn2();
> }
>
> then it could reasonably use the same stack memory for elems1 and
> elems2, at least theoretically, but you're saying it doesn't do that I
> guess?

No, that's not the problem here (it can happen if the compiler is
unable to prove
the object lifetimes are non-overlapping).

What we have here are multiple functions that are in the same call chain:

fn1()
{
     struct ieee802_11_elems e ;
}

fn2()
{
   struct ieee802_11_elems e;
  ...
   fn1();
}

fn3()
{
   struct ieee802_11_elems e;
  ...
   fn2();
}

Here, the object lifetimes actually do overlap, so the compiler cannot easily
optimize that away.

> I don't think dynamic allocation would be nice - but we could manually
> do this by passing the elems pointer into the
> ieee80211_rx_mgmt_assoc_resp() and ieee80211_assoc_success() functions.

Ah, so you mean we can reuse the objects manually? I think that would
be great. I could not tell if that's possible when reading the source, but
if you can show that this works, that would be an easy solution.

> Why do you say 32-bit btw, it should be *bigger* on 64-bit, but I didn't
> see this ... hmm.

That is correct. For historic reasons, both the total amount of stack space
per thread and the warning limit on 64 bit are twice the amount that we
have on 32-bit kernels, so even though the problem is more serious on
64-bit architectures, we do not see a warning about it because we remain
well under the warning limit.

      Arnd
Johannes Berg Oct. 28, 2019, 10:59 a.m. UTC | #5
On Mon, 2019-10-28 at 11:53 +0100, Arnd Bergmann wrote:
> On Mon, Oct 28, 2019 at 10:52 AM Johannes Berg
> <johannes@sipsolutions.net> wrote:
> > > It looks like one of the last additions pushed the stack usage over
> > > the 1024 byte limit
> > > for 32-bit architectures:
> > > 
> > > net/mac80211/mlme.c:4063:6: error: stack frame size of 1032 bytes in
> > > function 'ieee80211_sta_rx_queued_mgmt' [-Werror,-Wframe-larger-than=]
> > > 
> > > struct ieee802_11_elems is fairly large, and just grew another two pointers.
> > > When ieee80211_rx_mgmt_assoc_resp() and ieee80211_assoc_success()
> > > are inlined into ieee80211_sta_rx_queued_mgmt(), there are three copies
> > > of this structure, which is slightly too much.
> > 
> > Hmm. I guess that means the compiler isn't smart enough to make the
> > copies from the inlined sub-functions alias each other? I mean, if I
> > have
> > 
> > fn1(...) { struct ... elems1; ... }
> > fn2(...) { struct ... elems2; ... }
> > 
> > fn(...)
> > {
> >   fn1();
> >   fn2();
> > }
> > 
> > then it could reasonably use the same stack memory for elems1 and
> > elems2, at least theoretically, but you're saying it doesn't do that I
> > guess?
> 
> No, that's not the problem here (it can happen if the compiler is
> unable to prove the object lifetimes are non-overlapping).
> 
> What we have here are multiple functions that are in the same call chain:
> 
> fn1()
> {
>      struct ieee802_11_elems e ;
> }
> 
> fn2()
> {
>    struct ieee802_11_elems e;
>   ...
>    fn1();
> }
> 
> fn3()
> {
>    struct ieee802_11_elems e;
>   ...
>    fn2();
> }
> 
> Here, the object lifetimes actually do overlap, so the compiler cannot easily
> optimize that away.

Ah, yes, you're right. I didn't look closely enough, sorry.

> > I don't think dynamic allocation would be nice - but we could manually
> > do this by passing the elems pointer into the
> > ieee80211_rx_mgmt_assoc_resp() and ieee80211_assoc_success() functions.
> 
> Ah, so you mean we can reuse the objects manually? I think that would
> be great. I could not tell if that's possible when reading the source, but
> if you can show that this works, that would be an easy solution.

Now that I look more closely, I'm not even sure why we parse it again in
ieee80211_assoc_success() after already having done it in
_rx_mgmt_assoc_resp(), they're doing exactly the same thing:

        ieee802_11_parse_elems(pos, len - (pos - (u8 *)mgmt), false, &elems,
                               mgmt->bssid, assoc_data->bss->bssid);

(need to track the variables a bit more closely, but ...)

So I think we can even just avoid duplicate work.

And for the third copy - it's in a different switch case. Do you think
we could rely on the compiler being able to prove non-overlapping
lifetime? Or better to just pass a pointer down to
_rx_mgmt_assoc_resp()?

> > Why do you say 32-bit btw, it should be *bigger* on 64-bit, but I didn't
> > see this ... hmm.
> 
> That is correct. For historic reasons, both the total amount of stack space
> per thread and the warning limit on 64 bit are twice the amount that we
> have on 32-bit kernels, so even though the problem is more serious on
> 64-bit architectures, we do not see a warning about it because we remain
> well under the warning limit.

Ah, ok, thanks.

johannes
Johannes Berg Oct. 28, 2019, 11:08 a.m. UTC | #6
On Mon, 2019-10-28 at 11:53 +0100, Arnd Bergmann wrote:
> 
> > Why do you say 32-bit btw, it should be *bigger* on 64-bit, but I didn't
> > see this ... hmm.
> 
> That is correct. For historic reasons, both the total amount of stack space
> per thread and the warning limit on 64 bit are twice the amount that we
> have on 32-bit kernels, so even though the problem is more serious on
> 64-bit architectures, we do not see a warning about it because we remain
> well under the warning limit.

Hmm, but I have:

CONFIG_FRAME_WARN=1024

in my compilation?

Maybe I do in fact have merging of the storage space, and you don't? I
see another copy of it that shouldn't be merged ("bss_elems"), but ...

Hmm.

johannes
Johannes Berg Oct. 28, 2019, 11:38 a.m. UTC | #7
On Mon, 2019-10-28 at 12:08 +0100, Johannes Berg wrote:
> On Mon, 2019-10-28 at 11:53 +0100, Arnd Bergmann wrote:
> > > Why do you say 32-bit btw, it should be *bigger* on 64-bit, but I didn't
> > > see this ... hmm.
> > 
> > That is correct. For historic reasons, both the total amount of stack space
> > per thread and the warning limit on 64 bit are twice the amount that we
> > have on 32-bit kernels, so even though the problem is more serious on
> > 64-bit architectures, we do not see a warning about it because we remain
> > well under the warning limit.
> 
> Hmm, but I have:
> 
> CONFIG_FRAME_WARN=1024
> 
> in my compilation

But the compiler decides not to inline it, for whatever reason. Only if
I mark it as __always_inline, does it actually inline it.

But it does seem to merge the storage, if I inline only assoc_success()
I get 888 bytes (after the fix), if I inline also
ieee80211_rx_mgmt_assoc_resp() then I get 904 bytes.

johannes