Message ID | 1532589677-16428-3-git-send-email-wgong@codeaurora.org (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | Johannes Berg |
Headers | show |
Series | Change sk_pacing_shift in ieee80211_hw for best tx throughput | expand |
Wen Gong <wgong@codeaurora.org> writes: > Upstream kernel has an interface to help adjust sk_pacing_shift to help > improve TCP UL throughput. > The sk_pacing_shift is 8 in mac80211, this is based on test with 11N > WiFi chips with ath9k. For QCA6174/QCA9377 PCI 11AC chips, the 11AC > VHT80 TCP UL throughput testing result shows 6 is the optimal. > Overwrite the sk_pacing_shift to 6 in ath10k driver. When I tested this, a pacing shift of 8 was quite close to optimal as well for ath10k. Why are you getting different results? > Tested with QCA6174 PCI with firmware > WLAN.RM.4.4.1-00109-QCARMSWPZ-1, but this will also affect QCA9377 PCI. > It's not a regression with new firmware releases. > > There have 2 test result of different settings: > > ARM CPU based device with QCA6174A PCI with different > sk_pacing_shift: > > sk_pacing_shift throughput(Mbps) CPU utilization > 6 500(-P5) ~75% idle, Focus on CPU1: ~14%idle > 7 454(-P5) ~80% idle, Focus on CPU1: ~4%idle > 8 288 ~90% idle, Focus on CPU1: ~35%idle > 9 ~200 ~92% idle, Focus on CPU1: ~50%idle Your tests do not include latency values; please try running a test that also measures latency. The tcp_nup test in Flent (https://flent.org) will do that, for instance. Also, is this a single TCP flow? -Toke
On 26 July 2018 at 13:45, Toke Høiland-Jørgensen <toke@toke.dk> wrote: > Wen Gong <wgong@codeaurora.org> writes: > >> Upstream kernel has an interface to help adjust sk_pacing_shift to help >> improve TCP UL throughput. >> The sk_pacing_shift is 8 in mac80211, this is based on test with 11N >> WiFi chips with ath9k. For QCA6174/QCA9377 PCI 11AC chips, the 11AC >> VHT80 TCP UL throughput testing result shows 6 is the optimal. >> Overwrite the sk_pacing_shift to 6 in ath10k driver. > > When I tested this, a pacing shift of 8 was quite close to optimal as > well for ath10k. Why are you getting different results? > >> Tested with QCA6174 PCI with firmware >> WLAN.RM.4.4.1-00109-QCARMSWPZ-1, but this will also affect QCA9377 PCI. >> It's not a regression with new firmware releases. >> >> There have 2 test result of different settings: >> >> ARM CPU based device with QCA6174A PCI with different >> sk_pacing_shift: Different firmware releases have different tx buffering characteristics. In some 10.2 firmware running on QCA9888 you can have up to 5ms of delayed aggregation. Ideally sk_pacing_shift should be adjusted per firmware release. Maybe this should become part of the ath10k firmware wrapping "fw features" stuff? Michał
On 2018-07-26 19:45, Toke Høiland-Jørgensen wrote: > Wen Gong <wgong@codeaurora.org> writes: > >> Upstream kernel has an interface to help adjust sk_pacing_shift to >> help >> improve TCP UL throughput. >> The sk_pacing_shift is 8 in mac80211, this is based on test with 11N >> WiFi chips with ath9k. For QCA6174/QCA9377 PCI 11AC chips, the 11AC >> VHT80 TCP UL throughput testing result shows 6 is the optimal. >> Overwrite the sk_pacing_shift to 6 in ath10k driver. > > When I tested this, a pacing shift of 8 was quite close to optimal as > well for ath10k. Why are you getting different results? the default value is still 8 in the patch: https://patchwork.kernel.org/patch/10545361/ In my test, pacing shift 6 is better than 8. The test is for ath10k/11AC WiFi chips. Test result is show in the commit logs before. > >> Tested with QCA6174 PCI with firmware >> WLAN.RM.4.4.1-00109-QCARMSWPZ-1, but this will also affect QCA9377 >> PCI. >> It's not a regression with new firmware releases. >> >> There have 2 test result of different settings: >> >> ARM CPU based device with QCA6174A PCI with different >> sk_pacing_shift: >> >> sk_pacing_shift throughput(Mbps) CPU utilization >> 6 500(-P5) ~75% idle, Focus on CPU1: ~14%idle >> 7 454(-P5) ~80% idle, Focus on CPU1: ~4%idle >> 8 288 ~90% idle, Focus on CPU1: ~35%idle >> 9 ~200 ~92% idle, Focus on CPU1: ~50%idle > > Your tests do not include latency values; please try running a test > that > also measures latency. The tcp_nup test in Flent (https://flent.org) > will do that, for instance. Also, is this a single TCP flow? > It is not a single TCP flow, it is 500Mbps with 5 flows. below is result show in commit log before: 5G TCP UL VTH80 on X86 platform with QCA6174A PCI with sk_packing_shift set to 6: tcp_limit_output_bytes throughput(Mbps) default(262144)+1 Stream 336 default(262144)+2 Streams 558 default(262144)+3 Streams 584 default(262144)+4 Streams 602 default(262144)+5 Streams 598 changed(2621440)+1 Stream 598 changed(2621440)+2 Streams 601 > -Toke
On 2018-07-26 21:02, Michał Kazior wrote: > On 26 July 2018 at 13:45, Toke Høiland-Jørgensen <toke@toke.dk> wrote: >> Wen Gong <wgong@codeaurora.org> writes: >> >>> Upstream kernel has an interface to help adjust sk_pacing_shift to >>> help >>> improve TCP UL throughput. >>> The sk_pacing_shift is 8 in mac80211, this is based on test with 11N >>> WiFi chips with ath9k. For QCA6174/QCA9377 PCI 11AC chips, the 11AC >>> VHT80 TCP UL throughput testing result shows 6 is the optimal. >>> Overwrite the sk_pacing_shift to 6 in ath10k driver. >> >> When I tested this, a pacing shift of 8 was quite close to optimal as >> well for ath10k. Why are you getting different results? >> >>> Tested with QCA6174 PCI with firmware >>> WLAN.RM.4.4.1-00109-QCARMSWPZ-1, but this will also affect QCA9377 >>> PCI. >>> It's not a regression with new firmware releases. >>> >>> There have 2 test result of different settings: >>> >>> ARM CPU based device with QCA6174A PCI with different >>> sk_pacing_shift: > > Different firmware releases have different tx buffering > characteristics. In some 10.2 firmware running on QCA9888 you can have > up to 5ms of delayed aggregation. Ideally sk_pacing_shift should be > adjusted per firmware release. Maybe this should become part of the > ath10k firmware wrapping "fw features" stuff? > recently we do not want to do like this since no test data for each firmware. > > Michał
On 27 July 2018 at 11:39, Wen Gong <wgong@codeaurora.org> wrote: > On 2018-07-26 21:02, Michał Kazior wrote: >> >> On 26 July 2018 at 13:45, Toke Høiland-Jørgensen <toke@toke.dk> wrote: >>> >>> Wen Gong <wgong@codeaurora.org> writes: >>> >>>> Upstream kernel has an interface to help adjust sk_pacing_shift to help >>>> improve TCP UL throughput. >>>> The sk_pacing_shift is 8 in mac80211, this is based on test with 11N >>>> WiFi chips with ath9k. For QCA6174/QCA9377 PCI 11AC chips, the 11AC >>>> VHT80 TCP UL throughput testing result shows 6 is the optimal. >>>> Overwrite the sk_pacing_shift to 6 in ath10k driver. >>> >>> >>> When I tested this, a pacing shift of 8 was quite close to optimal as >>> well for ath10k. Why are you getting different results? >>> >>>> Tested with QCA6174 PCI with firmware >>>> WLAN.RM.4.4.1-00109-QCARMSWPZ-1, but this will also affect QCA9377 PCI. >>>> It's not a regression with new firmware releases. >>>> >>>> There have 2 test result of different settings: >>>> >>>> ARM CPU based device with QCA6174A PCI with different >>>> sk_pacing_shift: >> >> >> Different firmware releases have different tx buffering >> characteristics. In some 10.2 firmware running on QCA9888 you can have >> up to 5ms of delayed aggregation. Ideally sk_pacing_shift should be >> adjusted per firmware release. Maybe this should become part of the >> ath10k firmware wrapping "fw features" stuff? >> > recently we do not want to do like this since no test data for each > firmware. All the more reason to *not* change the pacing shift from 8 to 6 for entire ath10k because you have no idea what impact that is going to have on other chips/firmwares, e.g. QCA4019, QCA9888X, QCA9984. Michał
Wen Gong <wgong@codeaurora.org> writes: > On 2018-07-26 19:45, Toke Høiland-Jørgensen wrote: >> Wen Gong <wgong@codeaurora.org> writes: >> >>> Upstream kernel has an interface to help adjust sk_pacing_shift to >>> help >>> improve TCP UL throughput. >>> The sk_pacing_shift is 8 in mac80211, this is based on test with 11N >>> WiFi chips with ath9k. For QCA6174/QCA9377 PCI 11AC chips, the 11AC >>> VHT80 TCP UL throughput testing result shows 6 is the optimal. >>> Overwrite the sk_pacing_shift to 6 in ath10k driver. >> >> When I tested this, a pacing shift of 8 was quite close to optimal as >> well for ath10k. Why are you getting different results? > > the default value is still 8 in the patch: > https://patchwork.kernel.org/patch/10545361/ > > In my test, pacing shift 6 is better than 8. > The test is for ath10k/11AC WiFi chips. > Test result is show in the commit logs before. >> >>> Tested with QCA6174 PCI with firmware >>> WLAN.RM.4.4.1-00109-QCARMSWPZ-1, but this will also affect QCA9377 >>> PCI. >>> It's not a regression with new firmware releases. >>> >>> There have 2 test result of different settings: >>> >>> ARM CPU based device with QCA6174A PCI with different >>> sk_pacing_shift: >>> >>> sk_pacing_shift throughput(Mbps) CPU utilization >>> 6 500(-P5) ~75% idle, Focus on CPU1: ~14%idle >>> 7 454(-P5) ~80% idle, Focus on CPU1: ~4%idle >>> 8 288 ~90% idle, Focus on CPU1: ~35%idle >>> 9 ~200 ~92% idle, Focus on CPU1: ~50%idle >> >> Your tests do not include latency values; please try running a test >> that >> also measures latency. The tcp_nup test in Flent (https://flent.org) >> will do that, for instance. Also, is this a single TCP flow? >> > > It is not a single TCP flow, it is 500Mbps with 5 flows. > > below is result show in commit log before: > 5G TCP UL VTH80 on X86 platform with QCA6174A PCI with sk_packing_shift > set to 6: > > tcp_limit_output_bytes throughput(Mbps) > default(262144)+1 Stream 336 > default(262144)+2 Streams 558 > default(262144)+3 Streams 584 > default(262144)+4 Streams 602 > default(262144)+5 Streams 598 > changed(2621440)+1 Stream 598 > changed(2621440)+2 Streams 601 This is useless without latency numbers. The whole point of sk_pacing_shift is to control the tradeoff between latency and throughput. You're only showing the throughput, so it's impossible to judge if setting the pacing shift to 6 is right (and from your results I suspect the sweet spot is actually 7). -Toke
diff --git a/drivers/net/wireless/ath/ath10k/mac.c b/drivers/net/wireless/ath/ath10k/mac.c index f31ae3b..40d24c1 100644 --- a/drivers/net/wireless/ath/ath10k/mac.c +++ b/drivers/net/wireless/ath/ath10k/mac.c @@ -8348,6 +8348,8 @@ int ath10k_mac_register(struct ath10k *ar) ar->hw->wiphy->max_scan_ssids = WLAN_SCAN_PARAMS_MAX_SSID; ar->hw->wiphy->max_scan_ie_len = WLAN_SCAN_PARAMS_MAX_IE_LEN; + ar->hw->tx_sk_pacing_shift = 6; + ar->hw->vif_data_size = sizeof(struct ath10k_vif); ar->hw->sta_data_size = sizeof(struct ath10k_sta); ar->hw->txq_data_size = sizeof(struct ath10k_txq);
Upstream kernel has an interface to help adjust sk_pacing_shift to help improve TCP UL throughput. The sk_pacing_shift is 8 in mac80211, this is based on test with 11N WiFi chips with ath9k. For QCA6174/QCA9377 PCI 11AC chips, the 11AC VHT80 TCP UL throughput testing result shows 6 is the optimal. Overwrite the sk_pacing_shift to 6 in ath10k driver. Tested with QCA6174 PCI with firmware WLAN.RM.4.4.1-00109-QCARMSWPZ-1, but this will also affect QCA9377 PCI. It's not a regression with new firmware releases. There have 2 test result of different settings: ARM CPU based device with QCA6174A PCI with different sk_pacing_shift: sk_pacing_shift throughput(Mbps) CPU utilization 6 500(-P5) ~75% idle, Focus on CPU1: ~14%idle 7 454(-P5) ~80% idle, Focus on CPU1: ~4%idle 8 288 ~90% idle, Focus on CPU1: ~35%idle 9 ~200 ~92% idle, Focus on CPU1: ~50%idle 5G TCP UL VTH80 on X86 platform with QCA6174A PCI with sk_packing_shift set to 6: tcp_limit_output_bytes throughput(Mbps) default(262144)+1 Stream 336 default(262144)+2 Streams 558 default(262144)+3 Streams 584 default(262144)+4 Streams 602 default(262144)+5 Streams 598 changed(2621440)+1 Stream 598 changed(2621440)+2 Streams 601 Signed-off-by: Wen Gong <wgong@codeaurora.org> --- drivers/net/wireless/ath/ath10k/mac.c | 2 ++ 1 file changed, 2 insertions(+)