| Message ID | 20191119060610.76681-5-kyan@google.com (mailing list archive) |
|---|---|
| State | Accepted |
| Delegated to: | Johannes Berg |
| Series | Add Airtime Queue Limits (AQL) to mac80211 |
Hi,

I've taken a closer look at the AQL implementation, and I found some corner cases that need to be addressed soon:

- AQL estimated airtime does not take A-MPDU into account, so it significantly overestimates airtime use for aggregated traffic, especially at high rates. My proposed solution would be to check for a running aggregation session and set the estimated tx time to something like: expected_airtime(16 * skb->len) / 16.

- We need an API that allows the driver to change the pending airtime values, e.g. subtract the estimated tx time for a packet. mt76 and ath9k can queue packets inside the driver that are not currently in the hardware queues. Typically, if the txqs have more data than what gets put into the hardware queue, both drivers will pull an extra frame and queue it in their private txq struct. This frame gets used on the next txq scheduling round for that particular station. If you have lots of stations doing traffic (or holding driver-buffered frames in powersave mode), this could use up a sizable chunk of the AQL budget. While removing the airtime of those packets would lead to AQL temporarily underestimating airtime, I think that would be better than overestimating it. I will work on some patches.

What do you think about these issues and my proposed fixes?

- Felix
Felix Fietkau <nbd@nbd.name> writes:

> Hi,
>
> I've taken a closer look at the AQL implementation, and I found some
> corner cases that need to be addressed soon:
>
> - AQL estimated airtime does not take into account A-MPDU, so it is
> significantly overestimating airtime use for aggregated traffic,
> especially on high rates.
> My proposed solution would be to check for a running aggregation session
> and set estimated tx time to something like:
> expected_airtime(16 * skb->len) / 16.

This seems reasonable. Not sure if it'll do anything for ath10k (does mac80211 know the aggregation state for that?), but maybe this is not such a big issue on that hardware?

> - We need an API that allows the driver to change the pending airtime
> values, e.g. subtract estimated tx time for a packet.
> mt76 and ath9k can queue packets inside the driver that are not currently
> in the hardware queues. Typically if the txqs have more data than what
> gets put into the hardware queue, both drivers will pull an extra frame
> and queue it in its private txq struct. This frame will get used on the
> next txq scheduling round for that particular station.
> If you have lots of stations doing traffic (or having driver-buffered
> frames in powersave mode), this could use up a sizable chunk of the AQL
> budget.

I'm a bit more skeptical about this. If the driver buffers a bunch of packets that are not accounted for, that will hurt that station due to extra latency when it wakes up. For ath9k, this is the retry_q you're talking about, right? The number of packets queued on that is fairly limited, isn't it? What kind of powersave buffering is the driver doing, and why can't it leave the packets on the TXQ? That would allow them to be scheduled along with any new ones that might have arrived in the meantime, which would be a benefit for latency.

I can see how it can be a problem that many stations in powersave can block transmissions for *other* stations, though. Maybe an alternative to the driver subtracting airtime could be to have mac80211 do something like this when a station is put into powersave mode:

    local->aql_total_pending_airtime -= sum(sta->airtime[ac].aql_tx_pending)

(where sum is the summation over all ACs)

and the reverse when the station wakes back up? That should keep the whole device from throttling but still prevent a big queue building up for that sta when it wakes back up. Would that work, do you think?

-Toke
> - AQL estimated airtime does not take into account A-MPDU, so it is
> significantly overestimating airtime use for aggregated traffic,
> especially on high rates.
> My proposed solution would be to check for a running aggregation session
> and set estimated tx time to something like:
> expected_airtime(16 * skb->len) / 16.

Yes, that's a known limitation that needs to be improved. I actually posted a comment on this for the patch "[PATCH v10 2/4] mac80211: Import airtime calculation code from mt76":

> When a txq is subjected to the queue limit, there is usually a
> significant amount of frames being queued, and those frames are likely
> being sent out in large aggregations. So, in most cases when AQL is
> active, the medium access overhead can be amortized over many frames
> and the per-frame overhead could be considerably lower than 36,
> especially at higher data rates. As a result, the pending airtime
> calculated this way could be higher than the actual airtime. In my
> test, I had to compensate for that by increasing "aql_txq_limit" via
> debugfs to get the same peak throughput as without AQL.

> My proposed solution would be to check for a running aggregation session
> and set estimated tx time to something like:
> expected_airtime(16 * skb->len) / 16.

I think that's a reasonable approximation, but I doubt aggregation information is available in all architectures. In some architectures, firmware may only report aggregation information after the frame has been transmitted.

In my earlier version of AQL for the out-of-tree ChromeOS kernel, A-MPDU is dealt with this way: the medium access overhead is only counted once for each TXQ for all frames released to the hardware over a 4ms period, assuming those frames are likely to be aggregated together.

Instead of calculating the estimated airtime using the last known phy rate and then trying to add some estimated overhead for medium access time, another potentially better approach is to use the average data rate, which is bytes_transmitted / firmware_reported_actual_airtime. The average rate not only includes medium access overhead, but also takes retries into account.

> - We need an API that allows the driver to change the pending airtime
> values, e.g. subtract estimated tx time for a packet.
> mt76 and ath9k can queue packets inside the driver that are not currently
> in the hardware queues. Typically if the txqs have more data than what
> gets put into the hardware queue, both drivers will pull an extra frame
> and queue it in its private txq struct. This frame will get used on the
> next txq scheduling round for that particular station.
> If you have lots of stations doing traffic (or having driver-buffered
> frames in powersave mode), this could use up a sizable chunk of the AQL
> budget.

Not sure I fully understand your concerns here. The AQL budget is per STA, per TID, so frames queued in the driver's special queue for one station won't impact another station's budget. Those frames are counted towards the per-interface pending airtime, which could trigger AQL to switch to the lower queue limit. In this case, that could be the desirable behavior when there is heavy traffic.

Regards,
Kan

On Wed, Feb 26, 2020 at 1:56 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> Felix Fietkau <nbd@nbd.name> writes:
>
> > Hi,
> >
> > I've taken a closer look at the AQL implementation, and I found some
> > corner cases that need to be addressed soon:
> >
> > - AQL estimated airtime does not take into account A-MPDU, so it is
> > significantly overestimating airtime use for aggregated traffic,
> > especially on high rates.
> > My proposed solution would be to check for a running aggregation session
> > and set estimated tx time to something like:
> > expected_airtime(16 * skb->len) / 16.
>
> This seems reasonable. Not sure if it'll do anything for ath10k (does
> mac80211 know the aggregation state for that?), but maybe this is not
> such a big issue on that hardware?
>
> > - We need an API that allows the driver to change the pending airtime
> > values, e.g. subtract estimated tx time for a packet.
> > mt76 and ath9k can queue packets inside the driver that are not currently
> > in the hardware queues. Typically if the txqs have more data than what
> > gets put into the hardware queue, both drivers will pull an extra frame
> > and queue it in its private txq struct. This frame will get used on the
> > next txq scheduling round for that particular station.
> > If you have lots of stations doing traffic (or having driver-buffered
> > frames in powersave mode), this could use up a sizable chunk of the AQL
> > budget.
>
> I'm a bit more skeptical about this. If the driver buffers a bunch of
> packets that are not accounted for, that will hurt that station due to
> extra latency when it wakes up. For ath9k, this is the retry_q you're
> talking about, right? The number of packets queued on that is fairly
> limited, isn't it? What kind of powersave buffering is the driver doing,
> and why can't it leave the packets on the TXQ? That would allow them to
> be scheduled along with any new ones that might have arrived in the
> meantime, which would be a benefit for latency.
>
> I can see how it can be a problem that many stations in powersave can
> block transmissions for *other* stations, though. Maybe an alternative
> to the driver subtracting airtime could be to have mac80211 do something
> like this when a station is put into powersave mode:
>
> local->aql_total_pending_airtime -= sum(sta->airtime[ac].aql_tx_pending)
>
> (where sum is the summation over all ACs)
>
> and the reverse when the station wakes back up? That should keep the
> whole device from throttling but still prevent a big queue building up
> for that sta when it wakes back up. Would that work, do you think?
>
> -Toke
On 2020-02-26 22:56, Toke Høiland-Jørgensen wrote:
> Felix Fietkau <nbd@nbd.name> writes:
>> - We need an API that allows the driver to change the pending airtime
>> values, e.g. subtract estimated tx time for a packet.
>> mt76 and ath9k can queue packets inside the driver that are not currently
>> in the hardware queues. Typically if the txqs have more data than what
>> gets put into the hardware queue, both drivers will pull an extra frame
>> and queue it in its private txq struct. This frame will get used on the
>> next txq scheduling round for that particular station.
>> If you have lots of stations doing traffic (or having driver-buffered
>> frames in powersave mode), this could use up a sizable chunk of the AQL
>> budget.
>
> I'm a bit more skeptical about this. If the driver buffers a bunch of
> packets that are not accounted for, that will hurt that station due to
> extra latency when it wakes up. For ath9k, this is the retry_q you're
> talking about, right? The number of packets queued on that is fairly
> limited, isn't it? What kind of powersave buffering is the driver doing,
> and why can't it leave the packets on the TXQ? That would allow them to
> be scheduled along with any new ones that might have arrived in the
> meantime, which would be a benefit for latency.

For mt76 there should be at most one frame in the retry queue; it's just a frame that was pulled from the txq in a transmission attempt but that couldn't be put in the hw queue because it didn't fit in the current aggregate batch. If such a frame is in the retry queue and the sta ends up switching to powersave mode, that frame stays buffered in the driver queue until the sta wakes up.

> I can see how it can be a problem that many stations in powersave can
> block transmissions for *other* stations, though. Maybe an alternative
> to the driver subtracting airtime could be to have mac80211 do something
> like this when a station is put into powersave mode:
>
> local->aql_total_pending_airtime -= sum(sta->airtime[ac].aql_tx_pending)
>
> (where sum is the summation over all ACs)
>
> and the reverse when the station wakes back up? That should keep the
> whole device from throttling but still prevent a big queue building up
> for that sta when it wakes back up. Would that work, do you think?

That solves most of the issue but is still incomplete. It also gets tricky when the driver starts dequeueing packets in powersave mode (e.g. on PS-Poll). The remaining corner case is when a large number of stations each have a single frame in their retry queue (see above). Having those frames counted towards pending airtime could reduce the aggregation size for each station, even though those frames are not in the hw queue.

- Felix
> ieee80211_report_used_skb(). As an optimisation, we also subtract the
> airtime on regular TX completion, zeroing out the value stored in the
> packet afterwards, to avoid having to do an expensive lookup of the
> station from the packet data on every packet.
>
> This patch does *not* include any mechanism to wake a throttled TXQ
> again, on the assumption that this will happen anyway as a side effect
> of whatever freed the skb (most commonly a TX completion).

I recall a recent patch for ath10k sdio that disabled tx acknowledgement for performance gains, and am wondering if that will be problematic? Presumably not, since it would be caught at the dequeue, but thought I'd ask. I'm wondering what the effect of failed tx's or block acknowledgement is on this stuff; I'll need to study the code some more.

https://lore.kernel.org/linux-wireless/0101016eb1903db0-ef7063b4-0f42-4a01-8886-327541e6c1a4-000000@us-west-2.amazonses.com/T/#t
Felix Fietkau <nbd@nbd.name> writes:

> On 2020-02-26 22:56, Toke Høiland-Jørgensen wrote:
>> Felix Fietkau <nbd@nbd.name> writes:
>>> - We need an API that allows the driver to change the pending airtime
>>> values, e.g. subtract estimated tx time for a packet.
>>> mt76 and ath9k can queue packets inside the driver that are not currently
>>> in the hardware queues. Typically if the txqs have more data than what
>>> gets put into the hardware queue, both drivers will pull an extra frame
>>> and queue it in its private txq struct. This frame will get used on the
>>> next txq scheduling round for that particular station.
>>> If you have lots of stations doing traffic (or having driver-buffered
>>> frames in powersave mode), this could use up a sizable chunk of the AQL
>>> budget.
>>
>> I'm a bit more skeptical about this. If the driver buffers a bunch of
>> packets that are not accounted for, that will hurt that station due to
>> extra latency when it wakes up. For ath9k, this is the retry_q you're
>> talking about, right? The number of packets queued on that is fairly
>> limited, isn't it? What kind of powersave buffering is the driver
>> doing, and why can't it leave the packets on the TXQ? That would allow
>> them to be scheduled along with any new ones that might have arrived in
>> the meantime, which would be a benefit for latency.
> For mt76 there should be max. 1 frame in the retry queue, it's just a
> frame that was pulled from the txq in a transmission attempt but that it
> couldn't put in the hw queue because it didn't fit in the current
> aggregate batch.

Wait, if it's only a single frame that is queued in the driver, how is this causing problems? We deliberately set the limit so there was a bit of slack above the size of an aggregate for things like this. Could you please describe in a bit more detail what symptoms you are seeing of this problem? :)

-Toke
Justin Capella <justincapella@gmail.com> writes:

>> ieee80211_report_used_skb(). As an optimisation, we also subtract the
>> airtime on regular TX completion, zeroing out the value stored in the
>> packet afterwards, to avoid having to do an expensive lookup of the
>> station from the packet data on every packet.
>>
>> This patch does *not* include any mechanism to wake a throttled TXQ
>> again, on the assumption that this will happen anyway as a side effect
>> of whatever freed the skb (most commonly a TX completion).
>
> I recall a recent patch for ath10k sdio that disabled tx
> acknowledgement for performance gains, and am wondering if that will be
> problematic? Presumably not, since it would be caught at the dequeue,
> but thought I'd ask. I'm wondering what the effect of failed tx's or
> block acknowledgement is on this stuff; I'll need to study the code
> some more.
>
> https://lore.kernel.org/linux-wireless/0101016eb1903db0-ef7063b4-0f42-4a01-8886-327541e6c1a4-000000@us-west-2.amazonses.com/T/#t

It looks like that patch will just end up disabling AQL (because packets will be immediately completed as far as mac80211 is concerned) and replace it with whatever that credit-based scheme does. No idea how that will impact latency; you should go ask the developers of that series! As usual, the patch description only mentions throughput numbers :/

-Toke
On 2020-02-27 09:24, Kan Yan wrote:
>> - AQL estimated airtime does not take into account A-MPDU, so it is
>> significantly overestimating airtime use for aggregated traffic,
>> especially on high rates.
>> My proposed solution would be to check for a running aggregation session
>> and set estimated tx time to something like:
>> expected_airtime(16 * skb->len) / 16.
>
> Yes, that's a known limitation that needs to be improved. I actually
> posted a comment on this for the patch "[PATCH v10 2/4] mac80211:
> Import airtime calculation code from mt76":
>
>> When a txq is subjected to the queue limit, there is usually a
>> significant amount of frames being queued, and those frames are likely
>> being sent out in large aggregations. So, in most cases when AQL is
>> active, the medium access overhead can be amortized over many frames
>> and the per-frame overhead could be considerably lower than 36,
>> especially at higher data rates. As a result, the pending airtime
>> calculated this way could be higher than the actual airtime. In my
>> test, I had to compensate for that by increasing "aql_txq_limit" via
>> debugfs to get the same peak throughput as without AQL.
>
>> My proposed solution would be to check for a running aggregation session
>> and set estimated tx time to something like:
>> expected_airtime(16 * skb->len) / 16.
>
> I think that's a reasonable approximation, but I doubt aggregation
> information is available in all architectures. In some architectures,
> firmware may only report aggregation information after the frame has
> been transmitted.

I'm not proposing to use per-frame aggregation information from tx status. It would be a simple approximation that uses a fixed assumed average aggregation size whenever an aggregation session is established.

> In my earlier version of AQL for the out-of-tree ChromeOS kernel,
> A-MPDU is dealt with this way: the medium access overhead is only
> counted once for each TXQ for all frames released to the hardware over
> a 4ms period, assuming those frames are likely to be aggregated
> together.

I think that would be more accurate, but probably also more expensive to calculate.

> Instead of calculating the estimated airtime using the last known phy
> rate and then trying to add some estimated overhead for medium access
> time, another potentially better approach is to use the average data
> rate, which is bytes_transmitted / firmware_reported_actual_airtime.
> The average rate not only includes medium access overhead, but also
> takes retries into account.

Also an interesting idea, though probably not available on all hardware.

>> - We need an API that allows the driver to change the pending airtime
>> values, e.g. subtract estimated tx time for a packet.
>> mt76 and ath9k can queue packets inside the driver that are not currently
>> in the hardware queues. Typically if the txqs have more data than what
>> gets put into the hardware queue, both drivers will pull an extra frame
>> and queue it in its private txq struct. This frame will get used on the
>> next txq scheduling round for that particular station.
>> If you have lots of stations doing traffic (or having driver-buffered
>> frames in powersave mode), this could use up a sizable chunk of the AQL
>> budget.
>
> Not sure I fully understand your concerns here. The AQL budget is per
> STA, per TID, so frames queued in the driver's special queue for one
> station won't impact another station's budget. Those frames are
> counted towards the per-interface pending airtime, which could trigger
> AQL to switch to the lower queue limit. In this case, that could
> be the desirable behavior when there is heavy traffic.

Yes, the per-interface limit is what I'm concerned about. I'm not sure if it will be an issue in practice; it's just something that I noticed while reviewing the code.

- Felix
Felix Fietkau <nbd@nbd.name> writes:

>> Not sure I fully understand your concerns here. The AQL budget is per
>> STA, per TID, so frames queued in the driver's special queue for one
>> station won't impact another station's budget. Those frames are
>> counted towards the per-interface pending airtime, which could trigger
>> AQL to switch to the lower queue limit. In this case, that could
>> be the desirable behavior when there is heavy traffic.
> Yes, the per-interface limit is what I'm concerned about. I'm not sure
> if it will be an issue in practice; it's just something that I noticed
> while reviewing the code.

Ah, right. Note that it's not a hard limit, though; it's just a threshold for when the per-station limit switches to a smaller value. The idea is that when there are many stations with outstanding data, each station's latency will be higher, since they also have to wait for their turn to transmit. And so we compensate for this by lowering each station's queue limit, since the hardware is unlikely to become idle when there are many stations to send to.

The lower limit is still 5ms of data, though, so there should still be enough for a full aggregate queued for each station. Arguably the limit could actually be made *smaller*: on a very busy network with lots of stations transmitting at once, IMO we *don't* want to send full-sized aggregates, as that will add up to way too much latency. Better to sacrifice a bit of network efficiency to raise responsiveness :)

-Toke
On 2020-02-27 11:07, Toke Høiland-Jørgensen wrote:
> Felix Fietkau <nbd@nbd.name> writes:
>
>> On 2020-02-26 22:56, Toke Høiland-Jørgensen wrote:
>>> Felix Fietkau <nbd@nbd.name> writes:
>>>> - We need an API that allows the driver to change the pending airtime
>>>> values, e.g. subtract estimated tx time for a packet.
>>>> mt76 and ath9k can queue packets inside the driver that are not currently
>>>> in the hardware queues. Typically if the txqs have more data than what
>>>> gets put into the hardware queue, both drivers will pull an extra frame
>>>> and queue it in its private txq struct. This frame will get used on the
>>>> next txq scheduling round for that particular station.
>>>> If you have lots of stations doing traffic (or having driver-buffered
>>>> frames in powersave mode), this could use up a sizable chunk of the AQL
>>>> budget.
>>>
>>> I'm a bit more skeptical about this. If the driver buffers a bunch of
>>> packets that are not accounted for, that will hurt that station due to
>>> extra latency when it wakes up. For ath9k, this is the retry_q you're
>>> talking about, right? The number of packets queued on that is fairly
>>> limited, isn't it? What kind of powersave buffering is the driver
>>> doing, and why can't it leave the packets on the TXQ? That would allow
>>> them to be scheduled along with any new ones that might have arrived in
>>> the meantime, which would be a benefit for latency.
>> For mt76 there should be max. 1 frame in the retry queue, it's just a
>> frame that was pulled from the txq in a transmission attempt but that it
>> couldn't put in the hw queue because it didn't fit in the current
>> aggregate batch.
>
> Wait, if it's only a single frame that is queued in the driver, how is
> this causing problems? We deliberately set the limit so there was a bit
> of slack above the size of an aggregate for things like this. Could you
> please describe in a bit more detail what symptoms you are seeing of
> this problem? :)

It would be a single frame per sta/txq. I don't know if it will cause problems in practice; it's just a potential corner case that I found during review. I'd imagine this would probably show up in some benchmarks at least. I'm not seeing any symptoms myself, but I also haven't run any intricate tests yet.

- Felix
Felix Fietkau <nbd@nbd.name> writes:

> On 2020-02-27 11:07, Toke Høiland-Jørgensen wrote:
>> Felix Fietkau <nbd@nbd.name> writes:
>>
>>> On 2020-02-26 22:56, Toke Høiland-Jørgensen wrote:
>>>> Felix Fietkau <nbd@nbd.name> writes:
>>>>> - We need an API that allows the driver to change the pending airtime
>>>>> values, e.g. subtract estimated tx time for a packet.
>>>>> mt76 and ath9k can queue packets inside the driver that are not currently
>>>>> in the hardware queues. Typically if the txqs have more data than what
>>>>> gets put into the hardware queue, both drivers will pull an extra frame
>>>>> and queue it in its private txq struct. This frame will get used on the
>>>>> next txq scheduling round for that particular station.
>>>>> If you have lots of stations doing traffic (or having driver-buffered
>>>>> frames in powersave mode), this could use up a sizable chunk of the AQL
>>>>> budget.
>>>>
>>>> I'm a bit more skeptical about this. If the driver buffers a bunch of
>>>> packets that are not accounted for, that will hurt that station due to
>>>> extra latency when it wakes up. For ath9k, this is the retry_q you're
>>>> talking about, right? The number of packets queued on that is fairly
>>>> limited, isn't it? What kind of powersave buffering is the driver
>>>> doing, and why can't it leave the packets on the TXQ? That would allow
>>>> them to be scheduled along with any new ones that might have arrived in
>>>> the meantime, which would be a benefit for latency.
>>> For mt76 there should be max. 1 frame in the retry queue, it's just a
>>> frame that was pulled from the txq in a transmission attempt but that it
>>> couldn't put in the hw queue because it didn't fit in the current
>>> aggregate batch.
>>
>> Wait, if it's only a single frame that is queued in the driver, how is
>> this causing problems? We deliberately set the limit so there was a bit
>> of slack above the size of an aggregate for things like this. Could you
>> please describe in a bit more detail what symptoms you are seeing of
>> this problem? :)
> It would be a single frame per sta/txq. I don't know if it will cause
> problems in practice; it's just a potential corner case that I found
> during review. I'd imagine this would probably show up in some
> benchmarks at least.
> I'm not seeing any symptoms myself, but I also haven't run any intricate
> tests yet.

See my other reply; I'm not convinced this behaviour is wrong :)

-Toke
On 2020-02-27 12:07, Toke Høiland-Jørgensen wrote:
> Felix Fietkau <nbd@nbd.name> writes:
>
>>> Not sure I fully understand your concerns here. The AQL budget is per
>>> STA, per TID, so frames queued in the driver's special queue for one
>>> station won't impact another station's budget. Those frames are
>>> counted towards the per-interface pending airtime, which could trigger
>>> AQL to switch to the lower queue limit. In this case, that could
>>> be the desirable behavior when there is heavy traffic.
>> Yes, the per-interface limit is what I'm concerned about. I'm not sure
>> if it will be an issue in practice; it's just something that I noticed
>> while reviewing the code.
>
> Ah, right. Note that it's not a hard limit, though; it's just a
> threshold for when the per-station limit switches to a smaller value.
> The idea is that when there are many stations with outstanding data,
> each station's latency will be higher, since they also have to wait for
> their turn to transmit. And so we compensate for this by lowering each
> station's queue limit, since the hardware is unlikely to become idle
> when there are many stations to send to.
>
> The lower limit is still 5ms of data, though, so there should still be
> enough for a full aggregate queued for each station. Arguably the limit
> could actually be made *smaller*: on a very busy network with lots of
> stations transmitting at once, IMO we *don't* want to send full-sized
> aggregates, as that will add up to way too much latency. Better to
> sacrifice a bit of network efficiency to raise responsiveness :)

Makes sense, thanks.

- Felix
diff --git a/include/net/mac80211.h b/include/net/mac80211.h
index ba3f33cc41ea..aa145808e57a 100644
--- a/include/net/mac80211.h
+++ b/include/net/mac80211.h
@@ -1060,6 +1060,22 @@ struct ieee80211_tx_info {
 	};
 };
 
+static inline u16
+ieee80211_info_set_tx_time_est(struct ieee80211_tx_info *info, u16 tx_time_est)
+{
+	/* We only have 10 bits in tx_time_est, so store airtime
+	 * in increments of 4us and clamp the maximum to 2**12-1
+	 */
+	info->tx_time_est = min_t(u16, tx_time_est, 4095) >> 2;
+	return info->tx_time_est << 2;
+}
+
+static inline u16
+ieee80211_info_get_tx_time_est(struct ieee80211_tx_info *info)
+{
+	return info->tx_time_est << 2;
+}
+
 /**
  * struct ieee80211_tx_status - extended tx status info for rate control
  *
diff --git a/net/mac80211/status.c b/net/mac80211/status.c
index 0e51def35b8a..39da82b35be9 100644
--- a/net/mac80211/status.c
+++ b/net/mac80211/status.c
@@ -670,12 +670,26 @@ static void ieee80211_report_used_skb(struct ieee80211_local *local,
 				      struct sk_buff *skb, bool dropped)
 {
 	struct ieee80211_tx_info *info = IEEE80211_SKB_CB(skb);
+	u16 tx_time_est = ieee80211_info_get_tx_time_est(info);
 	struct ieee80211_hdr *hdr = (void *)skb->data;
 	bool acked = info->flags & IEEE80211_TX_STAT_ACK;
 
 	if (dropped)
 		acked = false;
 
+	if (tx_time_est) {
+		struct sta_info *sta;
+
+		rcu_read_lock();
+
+		sta = sta_info_get_by_addrs(local, hdr->addr1, hdr->addr2);
+		ieee80211_sta_update_pending_airtime(local, sta,
+						     skb_get_queue_mapping(skb),
+						     tx_time_est,
+						     true);
+		rcu_read_unlock();
+	}
+
 	if (info->flags & IEEE80211_TX_INTFL_MLME_CONN_TX) {
 		struct ieee80211_sub_if_data *sdata;
 
@@ -877,6 +891,7 @@ static void __ieee80211_tx_status(struct ieee80211_hw *hw,
 	struct ieee80211_bar *bar;
 	int shift = 0;
 	int tid = IEEE80211_NUM_TIDS;
+	u16 tx_time_est;
 
 	rates_idx = ieee80211_tx_get_rates(hw, info, &retry_count);
 
@@ -986,6 +1001,17 @@ static void __ieee80211_tx_status(struct ieee80211_hw *hw,
 			ieee80211_sta_register_airtime(&sta->sta, tid,
 						       info->status.tx_time, 0);
 
+		if ((tx_time_est = ieee80211_info_get_tx_time_est(info)) > 0) {
+			/* Do this here to avoid the expensive lookup of the sta
+			 * in ieee80211_report_used_skb().
+			 */
+			ieee80211_sta_update_pending_airtime(local, sta,
+							     skb_get_queue_mapping(skb),
+							     tx_time_est,
+							     true);
+			ieee80211_info_set_tx_time_est(info, 0);
+		}
+
 		if (ieee80211_hw_check(&local->hw, REPORTS_TX_ACK_STATUS)) {
 			if (info->flags & IEEE80211_TX_STAT_ACK) {
 				if (sta->status_stats.lost_packets)
diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c
index 59db921d1086..dfbaab92dbf2 100644
--- a/net/mac80211/tx.c
+++ b/net/mac80211/tx.c
@@ -3551,6 +3551,9 @@ struct sk_buff *ieee80211_tx_dequeue(struct ieee80211_hw *hw,
 
 	WARN_ON_ONCE(softirq_count() == 0);
 
+	if (!ieee80211_txq_airtime_check(hw, txq))
+		return NULL;
+
 begin:
 	spin_lock_bh(&fq->lock);
 
@@ -3661,6 +3664,21 @@ struct sk_buff *ieee80211_tx_dequeue(struct ieee80211_hw *hw,
 	}
 
 	IEEE80211_SKB_CB(skb)->control.vif = vif;
+
+	if (local->airtime_flags & AIRTIME_USE_AQL) {
+		u32 airtime;
+
+		airtime = ieee80211_calc_expected_tx_airtime(hw, vif, txq->sta,
+							     skb->len);
+		if (airtime) {
+			airtime = ieee80211_info_set_tx_time_est(info, airtime);
+			ieee80211_sta_update_pending_airtime(local, tx.sta,
+							     txq->ac,
+							     airtime,
+							     false);
+		}
+	}
+
 	return skb;
 
 out: