Message ID | 20160817144531.4285-1-toke@toke.dk (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Johannes Berg |
Headers | show |
Hi, You need to work on coding style, a lot of your indentation is completely messed up. > + switch (sdata->vif.type) { > + case NL80211_IFTYPE_STATION: > + if (sdata->u.mgd.use_4addr) { > + pn_offs = 30; > + break; > + } > + pn_offs = 24; > + break; > + case NL80211_IFTYPE_AP_VLAN: > + if (sdata->wdev.use_4addr) { > + pn_offs = 30; > + break; > + } > + /* fall through */ > + case NL80211_IFTYPE_ADHOC: > + case NL80211_IFTYPE_AP: > + pn_offs = 24; > + break; > + default: > + return; > + } > + > + if (sta->sta.wme) { > + pn_offs += 2; > + } I think you just reinvented ieee80211_hdrlen(). No? > - if (fast_tx->pn_offs) { > - u64 pn; > - u8 *crypto_hdr = skb->data + fast_tx->pn_offs; No need to undo the pn_offs optimisation for the !txq case, you can pass it in to the new function that will fill it. However, you're still doing it wrong - now you haven't fixed anything for TKIP, which won't hit the fastpath. johannes
On Wed, Aug 17, 2016 at 9:49 PM, Johannes Berg <johannes@sipsolutions.net> wrote: > Hi, > > You need to work on coding style, a lot of your indentation is > completely messed up. > >> + switch (sdata->vif.type) { >> + case NL80211_IFTYPE_STATION: >> + if (sdata->u.mgd.use_4addr) { >> + pn_offs = 30; >> + break; >> + } >> + pn_offs = 24; >> + break; >> + case NL80211_IFTYPE_AP_VLAN: >> + if (sdata->wdev.use_4addr) { >> + pn_offs = 30; >> + break; >> + } >> + /* fall through */ >> + case NL80211_IFTYPE_ADHOC: >> + case NL80211_IFTYPE_AP: >> + pn_offs = 24; >> + break; >> + default: >> + return; >> + } >> + >> + if (sta->sta.wme) { >> + pn_offs += 2; >> + } > > I think you just reinvented ieee80211_hdrlen(). No? > >> - if (fast_tx->pn_offs) { >> - u64 pn; >> - u8 *crypto_hdr = skb->data + fast_tx->pn_offs; > > No need to undo the pn_offs optimisation for the !txq case, you can > pass it in to the new function that will fill it. > > However, you're still doing it wrong - now you haven't fixed anything > for TKIP, which won't hit the fastpath. well, we're getting there. the results of both patch attempts were really nice, and brought encrypted performance with fq back into line with unencrypted. Still running crypted tests as I write... So fixing TKIP would be next, forcing the AP to use that? What other scenarios do we have to worry about? WDS? > johannes > _______________________________________________ > Make-wifi-fast mailing list > Make-wifi-fast@lists.bufferbloat.net > https://lists.bufferbloat.net/listinfo/make-wifi-fast
> well, we're getting there. the results of both patch attempts were > really nice, and brought encrypted performance with fq back into line > with unencrypted. Still running crypted tests as I write... > > So fixing TKIP would be next, forcing the AP to use that? What other > scenarios do we have to worry about? WDS? > I don't think there's anything else, I just don't really feel it's getting anywhere. This is a mere symptom of the design. Felix had worked around the SN assignment in a similar way, but I feel that perhaps the whole thing isn't quite the right architecture. Why are we applying FQ after the wifi conversion, when clearly that doesn't work well? Seems to me that it would make more sense to let the frames sit on the queues as they come in, and do most of the wifi handling only when needed (obviously, things like control port would still have to be done). We even count those packets that are dropped for TX statistics, which would seem to be a big behavioural difference vs. applying a qdisc. Now, it's unlikely to be that simple - fragmentation, for example, might mess this up. Overall though, I'm definitely wondering if it should be this way, since all the special cases just add complexity. johannes
Johannes Berg <johannes@sipsolutions.net> writes: >> well, we're getting there. the results of both patch attempts were >> really nice, and brought encrypted performance with fq back into line >> with unencrypted. Still running crypted tests as I write... >> >> So fixing TKIP would be next, forcing the AP to use that? What other >> scenarios do we have to worry about? WDS? >> > > I don't think there's anything else, I just don't really feel it's > getting anywhere. This is a mere symptom of the design. > > Felix had worked around the SN assignment in a similar way, but I feel > that perhaps the whole thing isn't quite the right architecture. Why > are we applying FQ after the wifi conversion, when clearly that doesn't > work well? Seems to me that it would make more sense to let the frames > sit on the queues as they come in, and do most of the wifi handling > only when needed (obviously, things like control port would still have > to be done). I suppose that could be a way to do it (i.e. have ieee80211_tx_dequeue call all the TX hooks etc), but am not sure whether there would be problems doing all this work in the loop that's building aggregates (which is what would happen for ath9k at least). An alternative could be to split the process up in two: An "early" and "late" stage, where the early stage does everything that is not sensitive to reordering and the occasional drop, and the late stage is everything that is. Then the queueing step could happen in-between the two stages, and the non-queueing path could just call both stages at once. In effect, this would just make the current work-arounds be more explicit in the structure, rather than being marked as exceptions. > We even count those packets that are dropped for TX statistics, which > would seem to be a big behavioural difference vs. applying a qdisc. While you're right in principle, in practice I don't think this has too big of an impact. In normal operation, CoDel drops (at most) dozens of packets per *minute*, so it's not going to skew the statistics too much. > Now, it's unlikely to be that simple - fragmentation, for example, > might mess this up. > > Overall though, I'm definitely wondering if it should be this way, > since all the special cases just add complexity. I agree that the work-arounds are iffy, but I do also think it's important to keep in mind that we are improving latency by orders of magnitude here. A few special cases is worth it to achieve that, IMO. And then iterating towards a design that don't need them, of course :) -Toke
On Mon, 2016-08-22 at 16:47 +0200, Toke Høiland-Jørgensen wrote: > > I suppose that could be a way to do it (i.e. have > ieee80211_tx_dequeue call all the TX hooks etc), but am not sure > whether there would be problems doing all this work in the loop > that's building aggregates (which is what would happen for ath9k at > least). I don't know, but it seems that it's worth trying. > An alternative could be to split the process up in two: An "early" > and "late" stage, where the early stage does everything that is not > sensitive to reordering and the occasional drop, and the late stage > is everything that is. Then the queueing step could happen in-between > the two stages, and the non-queueing path could just call both stages > at once. In effect, this would just make the current work-arounds be > more explicit in the structure, rather than being marked as > exceptions. I'm not sure that works the way you think it does. What you did works for fast-xmit, but *only* because that doesn't do software crypto. If, for some reason, the TXQ stuff combines with software crypto, which doesn't seem impossible (ath9k even has a module parameter, iirc), then you have no way for this to work. > > Now, it's unlikely to be that simple - fragmentation, for example, > > might mess this up. > > > > Overall though, I'm definitely wondering if it should be this way, > > since all the special cases just add complexity. > > I agree that the work-arounds are iffy, but I do also think it's > important to keep in mind that we are improving latency by orders of > magnitude here. A few special cases is worth it to achieve that, IMO. > And then iterating towards a design that don't need them, of course > :) I don't really agree, I'm not going to treat this unlike any other feature, which gets merged when it's ready for that. Right now, your code here obviously isn't, since it doesn't even address the cases that ath9k could run into, so either ath9k shouldn't use this mac80211 feature, or the mac80211 feature needs to be fixed before ath9k can use it. I have no problems with documenting that the TXQ stuff can only be used with full hardware crypto, but then we should add some checks and warnings in mac80211 to ensure that, i.e. not allow software keys when TXQ stuff is used, nor allow keys with mac80211 PN assignment, etc. Even QoS-seqno assignment will be broken btw, so you do need a bunch more offloads to make this work. johannes
Johannes Berg <johannes@sipsolutions.net> writes: > On Mon, 2016-08-22 at 16:47 +0200, Toke Høiland-Jørgensen wrote: >> >> I suppose that could be a way to do it (i.e. have >> ieee80211_tx_dequeue call all the TX hooks etc), but am not sure >> whether there would be problems doing all this work in the loop >> that's building aggregates (which is what would happen for ath9k at >> least). > > I don't know, but it seems that it's worth trying. > >> An alternative could be to split the process up in two: An "early" >> and "late" stage, where the early stage does everything that is not >> sensitive to reordering and the occasional drop, and the late stage >> is everything that is. Then the queueing step could happen in-between >> the two stages, and the non-queueing path could just call both stages >> at once. In effect, this would just make the current work-arounds be >> more explicit in the structure, rather than being marked as >> exceptions. > > I'm not sure that works the way you think it does. > > What you did works for fast-xmit, but *only* because that doesn't do > software crypto. If, for some reason, the TXQ stuff combines with > software crypto, which doesn't seem impossible (ath9k even has a module > parameter, iirc), then you have no way for this to work. Yeah, I realised that when I started reviewing the slow path (sorry for not realising that straight away). The v3 takes the "split handlers" approach for this reason. That saved having to deal with fragmentation on TXQ dequeue, and it means that some of the processing can be done before queueing (such as GSO splitting; having packets be as small as possible before applying FQ to them is a good thing if we want to realise the full potential). It seems there are still some bugs to work out with that patch, but I'd be grateful if you could glance at it and comment on whether you think this is a viable way forward (provided we can work out all the bugs, of course). >> > Now, it's unlikely to be that simple - fragmentation, for example, >> > might mess this up. >> > >> > Overall though, I'm definitely wondering if it should be this way, >> > since all the special cases just add complexity. >> >> I agree that the work-arounds are iffy, but I do also think it's >> important to keep in mind that we are improving latency by orders of >> magnitude here. A few special cases is worth it to achieve that, IMO. >> And then iterating towards a design that don't need them, of course >> :) > > I don't really agree, I'm not going to treat this unlike any other > feature, which gets merged when it's ready for that. > > Right now, your code here obviously isn't, since it doesn't even > address the cases that ath9k could run into, so either ath9k shouldn't > use this mac80211 feature, or the mac80211 feature needs to be fixed > before ath9k can use it. Yeah, I agree now that I've looked at it some more :) > I have no problems with documenting that the TXQ stuff can only be > used with full hardware crypto, but then we should add some checks and > warnings in mac80211 to ensure that, i.e. not allow software keys when > TXQ stuff is used, nor allow keys with mac80211 PN assignment, etc. I'd much rather fix things so it works in all cases. My patch to ath9k to use this stuff completely removes the old TX path, and things like the airtime fairness scheduler needs the intermediate queues to work. -Toke
diff --git a/net/mac80211/sta_info.h b/net/mac80211/sta_info.h index 0556be3..c9d4d69 100644 --- a/net/mac80211/sta_info.h +++ b/net/mac80211/sta_info.h @@ -266,7 +266,6 @@ struct sta_ampdu_mlme { * @hdr_len: actual 802.11 header length * @sa_offs: offset of the SA * @da_offs: offset of the DA - * @pn_offs: offset where to put PN for crypto (or 0 if not needed) * @band: band this will be transmitted on, for tx_info * @rcu_head: RCU head to free this struct * @@ -277,7 +276,7 @@ struct sta_ampdu_mlme { struct ieee80211_fast_tx { struct ieee80211_key *key; u8 hdr_len; - u8 sa_offs, da_offs, pn_offs; + u8 sa_offs, da_offs; u8 band; u8 hdr[30 + 2 + IEEE80211_FAST_XMIT_MAX_IV + sizeof(rfc1042_header)] __aligned(2); diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c index 1d0746d..9caf75f 100644 --- a/net/mac80211/tx.c +++ b/net/mac80211/tx.c @@ -1074,6 +1074,64 @@ ieee80211_tx_h_calculate_duration(struct ieee80211_tx_data *tx) return TX_CONTINUE; } +static void ieee80211_gen_crypto_iv(struct ieee80211_key_conf *conf, + struct sta_info *sta, struct sk_buff *skb) +{ + struct ieee80211_sub_if_data *sdata; + u64 pn; + u8 *crypto_hdr; + u8 pn_offs = 0; + + if (!conf || !sta || !(conf->flags & IEEE80211_KEY_FLAG_GENERATE_IV)) + return; + + sdata = sta->sdata; + + switch (sdata->vif.type) { + case NL80211_IFTYPE_STATION: + if (sdata->u.mgd.use_4addr) { + pn_offs = 30; + break; + } + pn_offs = 24; + break; + case NL80211_IFTYPE_AP_VLAN: + if (sdata->wdev.use_4addr) { + pn_offs = 30; + break; + } + /* fall through */ + case NL80211_IFTYPE_ADHOC: + case NL80211_IFTYPE_AP: + pn_offs = 24; + break; + default: + return; + } + + if (sta->sta.wme) { + pn_offs += 2; + } + + crypto_hdr = skb->data + pn_offs; + switch (conf->cipher) { + case WLAN_CIPHER_SUITE_CCMP: + case WLAN_CIPHER_SUITE_CCMP_256: + case WLAN_CIPHER_SUITE_GCMP: + case WLAN_CIPHER_SUITE_GCMP_256: + pn = atomic64_inc_return(&conf->tx_pn); + crypto_hdr[0] = pn; + crypto_hdr[1] = pn >> 8; + crypto_hdr[4] = pn >> 16; + crypto_hdr[5] = pn >> 24; + crypto_hdr[6] = pn >> 32; + crypto_hdr[7] = pn >> 40; + break; + } +} + + + /* actual transmit path */ static bool ieee80211_tx_prep_agg(struct ieee80211_tx_data *tx, @@ -1503,6 +1561,11 @@ struct sk_buff *ieee80211_tx_dequeue(struct ieee80211_hw *hw, sta); struct ieee80211_tx_info *info = IEEE80211_SKB_CB(skb); + if (info->control.hw_key) { + ieee80211_gen_crypto_iv(info->control.hw_key, + container_of(txq->sta, struct sta_info, sta), skb); + } + hdr->seq_ctrl = ieee80211_tx_next_seq(sta, txq->tid); if (test_bit(IEEE80211_TXQ_AMPDU, &txqi->flags)) info->flags |= IEEE80211_TX_CTL_AMPDU; @@ -2874,7 +2937,6 @@ void ieee80211_check_fast_xmit(struct sta_info *sta) if (gen_iv) { (build.hdr + build.hdr_len)[3] = 0x20 | (build.key->conf.keyidx << 6); - build.pn_offs = build.hdr_len; } if (gen_iv || iv_spc) build.hdr_len += IEEE80211_CCMP_HDR_LEN; @@ -2885,7 +2947,6 @@ void ieee80211_check_fast_xmit(struct sta_info *sta) if (gen_iv) { (build.hdr + build.hdr_len)[3] = 0x20 | (build.key->conf.keyidx << 6); - build.pn_offs = build.hdr_len; } if (gen_iv || iv_spc) build.hdr_len += IEEE80211_GCMP_HDR_LEN; @@ -3289,24 +3350,8 @@ static bool ieee80211_xmit_fast(struct ieee80211_sub_if_data *sdata, sta->tx_stats.bytes[skb_get_queue_mapping(skb)] += skb->len; sta->tx_stats.packets[skb_get_queue_mapping(skb)]++; - if (fast_tx->pn_offs) { - u64 pn; - u8 *crypto_hdr = skb->data + fast_tx->pn_offs; - - switch (fast_tx->key->conf.cipher) { - case WLAN_CIPHER_SUITE_CCMP: - case WLAN_CIPHER_SUITE_CCMP_256: - case WLAN_CIPHER_SUITE_GCMP: - case WLAN_CIPHER_SUITE_GCMP_256: - pn = atomic64_inc_return(&fast_tx->key->conf.tx_pn); - crypto_hdr[0] = pn; - crypto_hdr[1] = pn >> 8; - crypto_hdr[4] = pn >> 16; - crypto_hdr[5] = pn >> 24; - crypto_hdr[6] = pn >> 32; - crypto_hdr[7] = pn >> 40; - break; - } + if (fast_tx->key && !local->ops->wake_tx_queue) { + ieee80211_gen_crypto_iv(&fast_tx->key->conf, sta, skb); } if (sdata->vif.type == NL80211_IFTYPE_AP_VLAN)