Message ID | CA+BoTQn3KRym0qCJ+fLav5LQgjfiN_-Eqz0xPSc2eXJ-ijNjQw@mail.gmail.com (mailing list archive) |
---|---|
State | Not Applicable |
Headers | show |
On Mon, 2015-02-09 at 14:47 +0100, Michal Kazior wrote: > diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c > index 65caf8b..5e249bf 100644 > --- a/net/ipv4/tcp_output.c > +++ b/net/ipv4/tcp_output.c > @@ -1996,6 +1996,7 @@ static bool tcp_write_xmit(struct sock *sk, > unsigned int mss_now, int nonagle, > max_segs = tcp_tso_autosize(sk, mss_now); > while ((skb = tcp_send_head(sk))) { > unsigned int limit; > + unsigned int amount; > > tso_segs = tcp_init_tso_segs(sk, skb, mss_now); > BUG_ON(!tso_segs); > @@ -2053,7 +2054,9 @@ static bool tcp_write_xmit(struct sock *sk, > unsigned int mss_now, int nonagle, > * of queued bytes to ensure line rate. > * One example is wifi aggregation (802.11 AMPDU) > */ > - limit = max(2 * skb->truesize, sk->sk_pacing_rate >> 10); > + amount = sk->sk_tx_completion_delay_us * > + (sk->sk_pacing_rate >> 10); > + limit = max(2 * skb->truesize, amount >> 10); > limit = min_t(u32, limit, sysctl_tcp_limit_output_bytes); > > if (atomic_read(&sk->sk_wmem_alloc) > limit) { This is not what I suggested. If you test this on any other network device, you'll have sk->sk_tx_completion_delay_us == 0 amount = 0 * (sk->sk_pacing_rate >> 10); --> 0 limit = max(2 * skb->truesize, amount >> 10); --> 2 * skb->truesize So non TSO/GSO NIC will not be able to queue more than 2 MSS (one MSS per skb) Then if you store only the last tx completion, you have the possibility of having a last packet of a train (say a retransmit) to make it very low. Ideally the formula would be in TCP something very fast to compute : amount = (sk->sk_pacing_rate >> 10) + sk->tx_completion_delay_cushion; limit = max(2 * skb->truesize, amount); limit = min_t(u32, limit, sysctl_tcp_limit_output_bytes); So a 'problematic' driver would have to do the math (64 bit maths) like this : sk->tx_completion_delay_cushion = ewma_tx_delay * sk->sk_pacing_rate; -- To unsubscribe from this list: send the line "unsubscribe linux-wireless" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
To revive an old thread ... On Mon, 2015-02-09 at 07:11 -0800, Eric Dumazet wrote: > Ideally the formula would be in TCP something very fast to compute : > > amount = (sk->sk_pacing_rate >> 10) + sk->tx_completion_delay_cushion; > limit = max(2 * skb->truesize, amount); > limit = min_t(u32, limit, sysctl_tcp_limit_output_bytes); > > So a 'problematic' driver would have to do the math (64 bit maths) like > this : > > > sk->tx_completion_delay_cushion = ewma_tx_delay * sk->sk_pacing_rate; The whole notion with "ewma_tx_delay" seems very wrong to me. Measuring something while also trying to control it (or something closely related) seems a bit strange, but perhaps you meant to measure something different than what Michal implemented. What he implemented was measuring the time it takes from submission to the hardware queues, but that seems to create a bad feedback cycle: Allowing it as extra transmit "cushion" will, IIUC, cause TCP to submit more data to the queues, which will in turn cause the next packets to be potentially delayed more (since there are still waiting ones) thus causing a longer delay to be measured, which in turn allows even more data to be submitted etc. IOW, while traffic is flowing this will likely cause feedback that completely removes the point of this, no? Leaving only sysctl_tcp_limit_output_bytes as the limit (*). It seems it'd be better to provide a calculated estimate, perhaps based on current transmit rate and (if available) CCA/TXOP acquisition time. johannes (*) Which, btw, isn't all that big given that a maximally sized A-MPDU is like 1MB containing close to that much actual data! Don't think that can actually be done at current transmit rates though. -- To unsubscribe from this list: send the line "unsubscribe linux-wireless" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/drivers/net/wireless/ath/ath10k/core.h b/drivers/net/wireless/ath/ath10k/core.h index 3be3a59..4ff0ae8 100644 --- a/drivers/net/wireless/ath/ath10k/core.h +++ b/drivers/net/wireless/ath/ath10k/core.h @@ -82,6 +82,7 @@ struct ath10k_skb_cb { dma_addr_t paddr; u8 eid; u8 vdev_id; + ktime_t stamp; struct { u8 tid; diff --git a/drivers/net/wireless/ath/ath10k/mac.c b/drivers/net/wireless/ath/ath10k/mac.c index 15e47f4..5efb2a7 100644 --- a/drivers/net/wireless/ath/ath10k/mac.c +++ b/drivers/net/wireless/ath/ath10k/mac.c @@ -2620,6 +2620,7 @@ static void ath10k_tx(struct ieee80211_hw *hw, if (info->flags & IEEE80211_TX_CTL_NO_CCK_RATE) ath10k_dbg(ar, ATH10K_DBG_MAC, "IEEE80211_TX_CTL_NO_CCK_RATE\n"); + ATH10K_SKB_CB(skb)->stamp = ktime_get(); ATH10K_SKB_CB(skb)->htt.is_offchan = false; ATH10K_SKB_CB(skb)->htt.tid = ath10k_tx_h_get_tid(hdr); ATH10K_SKB_CB(skb)->vdev_id = ath10k_tx_h_get_vdev_id(ar, vif); diff --git a/drivers/net/wireless/ath/ath10k/txrx.c b/drivers/net/wireless/ath/ath10k/txrx.c index 3f00cec..0d5539b 100644 --- a/drivers/net/wireless/ath/ath10k/txrx.c +++ b/drivers/net/wireless/ath/ath10k/txrx.c @@ -15,6 +15,7 @@ * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE. */ +#include <net/sock.h> #include "core.h" #include "txrx.h" #include "htt.h" @@ -82,6 +83,13 @@ void ath10k_txrx_tx_unref(struct ath10k_htt *htt, ath10k_report_offchan_tx(htt->ar, msdu); + if (msdu->sk) { + ACCESS_ONCE(msdu->sk->sk_tx_completion_delay_us) = + ktime_to_ns(ktime_sub(ktime_get(), + skb_cb->stamp)) / + NSEC_PER_USEC; + } + info = IEEE80211_SKB_CB(msdu); memset(&info->status, 0, sizeof(info->status)); trace_ath10k_txrx_tx_unref(ar, tx_done->msdu_id); diff --git a/include/net/sock.h b/include/net/sock.h index 2210fec..6b15d71 100644 --- a/include/net/sock.h +++ b/include/net/sock.h @@ -390,6 +390,7 @@ struct sock { int sk_wmem_queued; gfp_t sk_allocation; u32 sk_pacing_rate; /* bytes per second */ + u32 sk_tx_completion_delay_us; u32 sk_max_pacing_rate; netdev_features_t sk_route_caps; netdev_features_t sk_route_nocaps; diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index 65caf8b..5e249bf 100644 --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -1996,6 +1996,7 @@ static bool tcp_write_xmit(struct sock *sk, unsigned int mss_now, int nonagle, max_segs = tcp_tso_autosize(sk, mss_now); while ((skb = tcp_send_head(sk))) { unsigned int limit; + unsigned int amount; tso_segs = tcp_init_tso_segs(sk, skb, mss_now);