Message ID | 20190731155057.23035-1-johannes@sipsolutions.net (mailing list archive) |
---|---|
State | Not Applicable |
Headers | show |
Series | pull-request: mac80211-next 2019-07-31 | expand |
From: Johannes Berg <johannes@sipsolutions.net> Date: Wed, 31 Jul 2019 17:50:56 +0200 > There's a fair number of changes here, so I thought I'd get them out. > I've included two Intel driver cleanups because Luca is on vacation, > I'm covering for him, and doing it all in one tree let me merge all > of the patches at once (including mac80211 that depends on that); > Kalle is aware. > > Also, though this isn't very interesting yet, I've started looking at > weaning the wireless subsystem off the RTNL for all operations, as it > can cause significant lock contention, especially with slow USB devices. > The real patches for that are some way off, but one preparation here is > to use generic netlink's parallel_ops=true, to avoid trading one place > with contention for another in the future, and to avoid adding more > genl_family_attrbuf() usage (since that's no longer possible with the > parallel_ops setting). > > Please pull and let me know if there's any problem. Pulled, thanks Johannes. I'm sure people like Florian Westphal will also appreciate your RTNL elimination work...
On Wed, Jul 31, 2019 at 5:53 PM Johannes Berg <johannes@sipsolutions.net> wrote: > John Crispin (10): > mac80211: add support for parsing ADDBA_EXT IEs > mac80211: add xmit rate to struct ieee80211_tx_status > mac80211: propagate struct ieee80211_tx_status into ieee80211_tx_monitor() > mac80211: add struct ieee80211_tx_status support to ieee80211_add_tx_radiotap_header > mac80211: HE: add Spatial Reuse element parsing support Hi Johannes and John, It looks like one of the last additions pushed the stack usage over the 1024 byte limit for 32-bit architectures: net/mac80211/mlme.c:4063:6: error: stack frame size of 1032 bytes in function 'ieee80211_sta_rx_queued_mgmt' [-Werror,-Wframe-larger-than=] struct ieee802_11_elems is fairly large, and just grew another two pointers. When ieee80211_rx_mgmt_assoc_resp() and ieee80211_assoc_success() are inlined into ieee80211_sta_rx_queued_mgmt(), there are three copies of this structure, which is slightly too much. Marking any of those functions as __noinline_for_stack would shut up the warning but not fix the underlying issue. Silencing the warning might be enough if there is a fairly short call chain leading up to ieee80211_sta_rx_queued_mgmt(). Another option would be a dynamic allocation. Thoughts? Arnd
Hi Arnd, > It looks like one of the last additions pushed the stack usage over > the 1024 byte limit > for 32-bit architectures: > > net/mac80211/mlme.c:4063:6: error: stack frame size of 1032 bytes in > function 'ieee80211_sta_rx_queued_mgmt' [-Werror,-Wframe-larger-than=] > > struct ieee802_11_elems is fairly large, and just grew another two pointers. > When ieee80211_rx_mgmt_assoc_resp() and ieee80211_assoc_success() > are inlined into ieee80211_sta_rx_queued_mgmt(), there are three copies > of this structure, which is slightly too much. Hmm. I guess that means the compiler isn't smart enough to make the copies from the inlined sub-functions alias each other? I mean, if I have fn1(...) { struct ... elems1; ... } fn2(...) { struct ... elems2; ... } fn(...) { fn1(); fn2(); } then it could reasonably use the same stack memory for elems1 and elems2, at least theoretically, but you're saying it doesn't do that I guess? It could even do that for different BBs, in theory ... If it does, I'd have suggested to move the code from the outer function inside the "case IEEE80211_STYPE_ACTION:" block into a new sub-function, but that won't work then. I don't think dynamic allocation would be nice - but we could manually do this by passing the elems pointer into the ieee80211_rx_mgmt_assoc_resp() and ieee80211_assoc_success() functions. Why do you say 32-bit btw, it should be *bigger* on 64-bit, but I didn't see this ... hmm. johannes
On Mon, Oct 28, 2019 at 10:52 AM Johannes Berg <johannes@sipsolutions.net> wrote: > > > It looks like one of the last additions pushed the stack usage over > > the 1024 byte limit > > for 32-bit architectures: > > > > net/mac80211/mlme.c:4063:6: error: stack frame size of 1032 bytes in > > function 'ieee80211_sta_rx_queued_mgmt' [-Werror,-Wframe-larger-than=] > > > > struct ieee802_11_elems is fairly large, and just grew another two pointers. > > When ieee80211_rx_mgmt_assoc_resp() and ieee80211_assoc_success() > > are inlined into ieee80211_sta_rx_queued_mgmt(), there are three copies > > of this structure, which is slightly too much. > > Hmm. I guess that means the compiler isn't smart enough to make the > copies from the inlined sub-functions alias each other? I mean, if I > have > > fn1(...) { struct ... elems1; ... } > fn2(...) { struct ... elems2; ... } > > fn(...) > { > fn1(); > fn2(); > } > > then it could reasonably use the same stack memory for elems1 and > elems2, at least theoretically, but you're saying it doesn't do that I > guess? No, that's not the problem here (it can happen if the compiler is unable to prove the object lifetimes are non-overlapping). What we have here are multiple functions that are in the same call chain: fn1() { struct ieee802_11_elems e ; } fn2() { struct ieee802_11_elems e; ... fn1(); } fn3() { struct ieee802_11_elems e; ... fn2(); } Here, the object lifetimes actually do overlap, so the compiler cannot easily optimize that away. > I don't think dynamic allocation would be nice - but we could manually > do this by passing the elems pointer into the > ieee80211_rx_mgmt_assoc_resp() and ieee80211_assoc_success() functions. Ah, so you mean we can reuse the objects manually? I think that would be great. I could not tell if that's possible when reading the source, but if you can show that this works, that would be an easy solution. > Why do you say 32-bit btw, it should be *bigger* on 64-bit, but I didn't > see this ... hmm. That is correct. For historic reasons, both the total amount of stack space per thread and the warning limit on 64 bit are twice the amount that we have on 32-bit kernels, so even though the problem is more serious on 64-bit architectures, we do not see a warning about it because we remain well under the warning limit. Arnd
On Mon, 2019-10-28 at 11:53 +0100, Arnd Bergmann wrote: > On Mon, Oct 28, 2019 at 10:52 AM Johannes Berg > <johannes@sipsolutions.net> wrote: > > > It looks like one of the last additions pushed the stack usage over > > > the 1024 byte limit > > > for 32-bit architectures: > > > > > > net/mac80211/mlme.c:4063:6: error: stack frame size of 1032 bytes in > > > function 'ieee80211_sta_rx_queued_mgmt' [-Werror,-Wframe-larger-than=] > > > > > > struct ieee802_11_elems is fairly large, and just grew another two pointers. > > > When ieee80211_rx_mgmt_assoc_resp() and ieee80211_assoc_success() > > > are inlined into ieee80211_sta_rx_queued_mgmt(), there are three copies > > > of this structure, which is slightly too much. > > > > Hmm. I guess that means the compiler isn't smart enough to make the > > copies from the inlined sub-functions alias each other? I mean, if I > > have > > > > fn1(...) { struct ... elems1; ... } > > fn2(...) { struct ... elems2; ... } > > > > fn(...) > > { > > fn1(); > > fn2(); > > } > > > > then it could reasonably use the same stack memory for elems1 and > > elems2, at least theoretically, but you're saying it doesn't do that I > > guess? > > No, that's not the problem here (it can happen if the compiler is > unable to prove the object lifetimes are non-overlapping). > > What we have here are multiple functions that are in the same call chain: > > fn1() > { > struct ieee802_11_elems e ; > } > > fn2() > { > struct ieee802_11_elems e; > ... > fn1(); > } > > fn3() > { > struct ieee802_11_elems e; > ... > fn2(); > } > > Here, the object lifetimes actually do overlap, so the compiler cannot easily > optimize that away. Ah, yes, you're right. I didn't look closely enough, sorry. > > I don't think dynamic allocation would be nice - but we could manually > > do this by passing the elems pointer into the > > ieee80211_rx_mgmt_assoc_resp() and ieee80211_assoc_success() functions. > > Ah, so you mean we can reuse the objects manually? I think that would > be great. I could not tell if that's possible when reading the source, but > if you can show that this works, that would be an easy solution. Now that I look more closely, I'm not even sure why we parse it again in ieee80211_assoc_success() after already having done it in _rx_mgmt_assoc_resp(), they're doing exactly the same thing: ieee802_11_parse_elems(pos, len - (pos - (u8 *)mgmt), false, &elems, mgmt->bssid, assoc_data->bss->bssid); (need to track the variables a bit more closely, but ...) So I think we can even just avoid duplicate work. And for the third copy - it's in a different switch case. Do you think we could rely on the compiler being able to prove non-overlapping lifetime? Or better to just pass a pointer down to _rx_mgmt_assoc_resp()? > > Why do you say 32-bit btw, it should be *bigger* on 64-bit, but I didn't > > see this ... hmm. > > That is correct. For historic reasons, both the total amount of stack space > per thread and the warning limit on 64 bit are twice the amount that we > have on 32-bit kernels, so even though the problem is more serious on > 64-bit architectures, we do not see a warning about it because we remain > well under the warning limit. Ah, ok, thanks. johannes
On Mon, 2019-10-28 at 11:53 +0100, Arnd Bergmann wrote: > > > Why do you say 32-bit btw, it should be *bigger* on 64-bit, but I didn't > > see this ... hmm. > > That is correct. For historic reasons, both the total amount of stack space > per thread and the warning limit on 64 bit are twice the amount that we > have on 32-bit kernels, so even though the problem is more serious on > 64-bit architectures, we do not see a warning about it because we remain > well under the warning limit. Hmm, but I have: CONFIG_FRAME_WARN=1024 in my compilation? Maybe I do in fact have merging of the storage space, and you don't? I see another copy of it that shouldn't be merged ("bss_elems"), but ... Hmm. johannes
On Mon, 2019-10-28 at 12:08 +0100, Johannes Berg wrote: > On Mon, 2019-10-28 at 11:53 +0100, Arnd Bergmann wrote: > > > Why do you say 32-bit btw, it should be *bigger* on 64-bit, but I didn't > > > see this ... hmm. > > > > That is correct. For historic reasons, both the total amount of stack space > > per thread and the warning limit on 64 bit are twice the amount that we > > have on 32-bit kernels, so even though the problem is more serious on > > 64-bit architectures, we do not see a warning about it because we remain > > well under the warning limit. > > Hmm, but I have: > > CONFIG_FRAME_WARN=1024 > > in my compilation But the compiler decides not to inline it, for whatever reason. Only if I mark it as __always_inline, does it actually inline it. But it does seem to merge the storage, if I inline only assoc_success() I get 888 bytes (after the fix), if I inline also ieee80211_rx_mgmt_assoc_resp() then I get 904 bytes. johannes