diff mbox series

[2/2] ath10k: Set sk_pacing_shift to 6 for 11AC WiFi chips

Message ID 1532589677-16428-3-git-send-email-wgong@codeaurora.org (mailing list archive)
State Superseded
Delegated to: Johannes Berg
Headers show
Series Change sk_pacing_shift in ieee80211_hw for best tx throughput | expand

Commit Message

Wen Gong July 26, 2018, 7:21 a.m. UTC
Upstream kernel has an interface to help adjust sk_pacing_shift to help
improve TCP UL throughput.
The sk_pacing_shift is 8 in mac80211, this is based on test with 11N
WiFi chips with ath9k. For QCA6174/QCA9377 PCI 11AC chips, the 11AC
VHT80 TCP UL throughput testing result shows 6 is the optimal.
Overwrite the sk_pacing_shift to 6 in ath10k driver.

Tested with QCA6174 PCI with firmware
WLAN.RM.4.4.1-00109-QCARMSWPZ-1, but this will also affect QCA9377 PCI.
It's not a regression with new firmware releases.

There have 2 test result of different settings:

ARM CPU based device with QCA6174A PCI with different
sk_pacing_shift:

 sk_pacing_shift  throughput(Mbps)             CPU utilization
         6            500(-P5)      ~75% idle, Focus on CPU1: ~14%idle
         7            454(-P5)      ~80% idle, Focus on CPU1: ~4%idle
         8               288        ~90% idle, Focus on CPU1: ~35%idle
         9              ~200        ~92% idle, Focus on CPU1: ~50%idle

5G TCP UL VTH80 on X86 platform with QCA6174A PCI with sk_packing_shift
set to 6:

  tcp_limit_output_bytes            throughput(Mbps)
 default(262144)+1 Stream                 336
 default(262144)+2 Streams                558
 default(262144)+3 Streams                584
 default(262144)+4 Streams                602
 default(262144)+5 Streams                598
 changed(2621440)+1 Stream                598
 changed(2621440)+2 Streams               601

Signed-off-by: Wen Gong <wgong@codeaurora.org>
---
 drivers/net/wireless/ath/ath10k/mac.c | 2 ++
 1 file changed, 2 insertions(+)

Comments

Toke Høiland-Jørgensen July 26, 2018, 11:45 a.m. UTC | #1
Wen Gong <wgong@codeaurora.org> writes:

> Upstream kernel has an interface to help adjust sk_pacing_shift to help
> improve TCP UL throughput.
> The sk_pacing_shift is 8 in mac80211, this is based on test with 11N
> WiFi chips with ath9k. For QCA6174/QCA9377 PCI 11AC chips, the 11AC
> VHT80 TCP UL throughput testing result shows 6 is the optimal.
> Overwrite the sk_pacing_shift to 6 in ath10k driver.

When I tested this, a pacing shift of 8 was quite close to optimal as
well for ath10k. Why are you getting different results?

> Tested with QCA6174 PCI with firmware
> WLAN.RM.4.4.1-00109-QCARMSWPZ-1, but this will also affect QCA9377 PCI.
> It's not a regression with new firmware releases.
>
> There have 2 test result of different settings:
>
> ARM CPU based device with QCA6174A PCI with different
> sk_pacing_shift:
>
>  sk_pacing_shift  throughput(Mbps)             CPU utilization
>          6            500(-P5)      ~75% idle, Focus on CPU1: ~14%idle
>          7            454(-P5)      ~80% idle, Focus on CPU1: ~4%idle
>          8               288        ~90% idle, Focus on CPU1: ~35%idle
>          9              ~200        ~92% idle, Focus on CPU1: ~50%idle

Your tests do not include latency values; please try running a test that
also measures latency. The tcp_nup test in Flent (https://flent.org)
will do that, for instance. Also, is this a single TCP flow?

-Toke
Michal Kazior July 26, 2018, 1:02 p.m. UTC | #2
On 26 July 2018 at 13:45, Toke Høiland-Jørgensen <toke@toke.dk> wrote:
> Wen Gong <wgong@codeaurora.org> writes:
>
>> Upstream kernel has an interface to help adjust sk_pacing_shift to help
>> improve TCP UL throughput.
>> The sk_pacing_shift is 8 in mac80211, this is based on test with 11N
>> WiFi chips with ath9k. For QCA6174/QCA9377 PCI 11AC chips, the 11AC
>> VHT80 TCP UL throughput testing result shows 6 is the optimal.
>> Overwrite the sk_pacing_shift to 6 in ath10k driver.
>
> When I tested this, a pacing shift of 8 was quite close to optimal as
> well for ath10k. Why are you getting different results?
>
>> Tested with QCA6174 PCI with firmware
>> WLAN.RM.4.4.1-00109-QCARMSWPZ-1, but this will also affect QCA9377 PCI.
>> It's not a regression with new firmware releases.
>>
>> There have 2 test result of different settings:
>>
>> ARM CPU based device with QCA6174A PCI with different
>> sk_pacing_shift:

Different firmware releases have different tx buffering
characteristics. In some 10.2 firmware running on QCA9888 you can have
up to 5ms of delayed aggregation. Ideally sk_pacing_shift should be
adjusted per firmware release. Maybe this should become part of the
ath10k firmware wrapping "fw features" stuff?


Michał
Wen Gong July 27, 2018, 9:29 a.m. UTC | #3
On 2018-07-26 19:45, Toke Høiland-Jørgensen wrote:
> Wen Gong <wgong@codeaurora.org> writes:
> 
>> Upstream kernel has an interface to help adjust sk_pacing_shift to 
>> help
>> improve TCP UL throughput.
>> The sk_pacing_shift is 8 in mac80211, this is based on test with 11N
>> WiFi chips with ath9k. For QCA6174/QCA9377 PCI 11AC chips, the 11AC
>> VHT80 TCP UL throughput testing result shows 6 is the optimal.
>> Overwrite the sk_pacing_shift to 6 in ath10k driver.
> 
> When I tested this, a pacing shift of 8 was quite close to optimal as
> well for ath10k. Why are you getting different results?

the default value is still 8 in the patch:
https://patchwork.kernel.org/patch/10545361/

In my test, pacing shift 6 is better than 8.
The test is for ath10k/11AC WiFi chips.
Test result is show in the commit logs before.
> 
>> Tested with QCA6174 PCI with firmware
>> WLAN.RM.4.4.1-00109-QCARMSWPZ-1, but this will also affect QCA9377 
>> PCI.
>> It's not a regression with new firmware releases.
>> 
>> There have 2 test result of different settings:
>> 
>> ARM CPU based device with QCA6174A PCI with different
>> sk_pacing_shift:
>> 
>>  sk_pacing_shift  throughput(Mbps)             CPU utilization
>>          6            500(-P5)      ~75% idle, Focus on CPU1: ~14%idle
>>          7            454(-P5)      ~80% idle, Focus on CPU1: ~4%idle
>>          8               288        ~90% idle, Focus on CPU1: ~35%idle
>>          9              ~200        ~92% idle, Focus on CPU1: ~50%idle
> 
> Your tests do not include latency values; please try running a test 
> that
> also measures latency. The tcp_nup test in Flent (https://flent.org)
> will do that, for instance. Also, is this a single TCP flow?
> 

It is not a single TCP flow, it is 500Mbps with 5 flows.

below is result show in commit log before:
5G TCP UL VTH80 on X86 platform with QCA6174A PCI with sk_packing_shift
set to 6:

   tcp_limit_output_bytes            throughput(Mbps)
  default(262144)+1 Stream                 336
  default(262144)+2 Streams                558
  default(262144)+3 Streams                584
  default(262144)+4 Streams                602
  default(262144)+5 Streams                598
  changed(2621440)+1 Stream                598
  changed(2621440)+2 Streams               601

> -Toke
Wen Gong July 27, 2018, 9:39 a.m. UTC | #4
On 2018-07-26 21:02, Michał Kazior wrote:
> On 26 July 2018 at 13:45, Toke Høiland-Jørgensen <toke@toke.dk> wrote:
>> Wen Gong <wgong@codeaurora.org> writes:
>> 
>>> Upstream kernel has an interface to help adjust sk_pacing_shift to 
>>> help
>>> improve TCP UL throughput.
>>> The sk_pacing_shift is 8 in mac80211, this is based on test with 11N
>>> WiFi chips with ath9k. For QCA6174/QCA9377 PCI 11AC chips, the 11AC
>>> VHT80 TCP UL throughput testing result shows 6 is the optimal.
>>> Overwrite the sk_pacing_shift to 6 in ath10k driver.
>> 
>> When I tested this, a pacing shift of 8 was quite close to optimal as
>> well for ath10k. Why are you getting different results?
>> 
>>> Tested with QCA6174 PCI with firmware
>>> WLAN.RM.4.4.1-00109-QCARMSWPZ-1, but this will also affect QCA9377 
>>> PCI.
>>> It's not a regression with new firmware releases.
>>> 
>>> There have 2 test result of different settings:
>>> 
>>> ARM CPU based device with QCA6174A PCI with different
>>> sk_pacing_shift:
> 
> Different firmware releases have different tx buffering
> characteristics. In some 10.2 firmware running on QCA9888 you can have
> up to 5ms of delayed aggregation. Ideally sk_pacing_shift should be
> adjusted per firmware release. Maybe this should become part of the
> ath10k firmware wrapping "fw features" stuff?
> 
recently we do not want to do like this since no test data for each 
firmware.
> 
> Michał
Michal Kazior July 27, 2018, 12:33 p.m. UTC | #5
On 27 July 2018 at 11:39, Wen Gong <wgong@codeaurora.org> wrote:
> On 2018-07-26 21:02, Michał Kazior wrote:
>>
>> On 26 July 2018 at 13:45, Toke Høiland-Jørgensen <toke@toke.dk> wrote:
>>>
>>> Wen Gong <wgong@codeaurora.org> writes:
>>>
>>>> Upstream kernel has an interface to help adjust sk_pacing_shift to help
>>>> improve TCP UL throughput.
>>>> The sk_pacing_shift is 8 in mac80211, this is based on test with 11N
>>>> WiFi chips with ath9k. For QCA6174/QCA9377 PCI 11AC chips, the 11AC
>>>> VHT80 TCP UL throughput testing result shows 6 is the optimal.
>>>> Overwrite the sk_pacing_shift to 6 in ath10k driver.
>>>
>>>
>>> When I tested this, a pacing shift of 8 was quite close to optimal as
>>> well for ath10k. Why are you getting different results?
>>>
>>>> Tested with QCA6174 PCI with firmware
>>>> WLAN.RM.4.4.1-00109-QCARMSWPZ-1, but this will also affect QCA9377 PCI.
>>>> It's not a regression with new firmware releases.
>>>>
>>>> There have 2 test result of different settings:
>>>>
>>>> ARM CPU based device with QCA6174A PCI with different
>>>> sk_pacing_shift:
>>
>>
>> Different firmware releases have different tx buffering
>> characteristics. In some 10.2 firmware running on QCA9888 you can have
>> up to 5ms of delayed aggregation. Ideally sk_pacing_shift should be
>> adjusted per firmware release. Maybe this should become part of the
>> ath10k firmware wrapping "fw features" stuff?
>>
> recently we do not want to do like this since no test data for each
> firmware.

All the more reason to *not* change the pacing shift from 8 to 6 for
entire ath10k because you have no idea what impact that is going to
have on other chips/firmwares, e.g. QCA4019, QCA9888X, QCA9984.


Michał
Toke Høiland-Jørgensen July 27, 2018, 8:06 p.m. UTC | #6
Wen Gong <wgong@codeaurora.org> writes:

> On 2018-07-26 19:45, Toke Høiland-Jørgensen wrote:
>> Wen Gong <wgong@codeaurora.org> writes:
>> 
>>> Upstream kernel has an interface to help adjust sk_pacing_shift to 
>>> help
>>> improve TCP UL throughput.
>>> The sk_pacing_shift is 8 in mac80211, this is based on test with 11N
>>> WiFi chips with ath9k. For QCA6174/QCA9377 PCI 11AC chips, the 11AC
>>> VHT80 TCP UL throughput testing result shows 6 is the optimal.
>>> Overwrite the sk_pacing_shift to 6 in ath10k driver.
>> 
>> When I tested this, a pacing shift of 8 was quite close to optimal as
>> well for ath10k. Why are you getting different results?
>
> the default value is still 8 in the patch:
> https://patchwork.kernel.org/patch/10545361/
>
> In my test, pacing shift 6 is better than 8.
> The test is for ath10k/11AC WiFi chips.
> Test result is show in the commit logs before.
>> 
>>> Tested with QCA6174 PCI with firmware
>>> WLAN.RM.4.4.1-00109-QCARMSWPZ-1, but this will also affect QCA9377 
>>> PCI.
>>> It's not a regression with new firmware releases.
>>> 
>>> There have 2 test result of different settings:
>>> 
>>> ARM CPU based device with QCA6174A PCI with different
>>> sk_pacing_shift:
>>> 
>>>  sk_pacing_shift  throughput(Mbps)             CPU utilization
>>>          6            500(-P5)      ~75% idle, Focus on CPU1: ~14%idle
>>>          7            454(-P5)      ~80% idle, Focus on CPU1: ~4%idle
>>>          8               288        ~90% idle, Focus on CPU1: ~35%idle
>>>          9              ~200        ~92% idle, Focus on CPU1: ~50%idle
>> 
>> Your tests do not include latency values; please try running a test 
>> that
>> also measures latency. The tcp_nup test in Flent (https://flent.org)
>> will do that, for instance. Also, is this a single TCP flow?
>> 
>
> It is not a single TCP flow, it is 500Mbps with 5 flows.
>
> below is result show in commit log before:
> 5G TCP UL VTH80 on X86 platform with QCA6174A PCI with sk_packing_shift
> set to 6:
>
>    tcp_limit_output_bytes            throughput(Mbps)
>   default(262144)+1 Stream                 336
>   default(262144)+2 Streams                558
>   default(262144)+3 Streams                584
>   default(262144)+4 Streams                602
>   default(262144)+5 Streams                598
>   changed(2621440)+1 Stream                598
>   changed(2621440)+2 Streams               601

This is useless without latency numbers. The whole point of
sk_pacing_shift is to control the tradeoff between latency and
throughput. You're only showing the throughput, so it's impossible to
judge if setting the pacing shift to 6 is right (and from your results I
suspect the sweet spot is actually 7).

-Toke
diff mbox series

Patch

diff --git a/drivers/net/wireless/ath/ath10k/mac.c b/drivers/net/wireless/ath/ath10k/mac.c
index f31ae3b..40d24c1 100644
--- a/drivers/net/wireless/ath/ath10k/mac.c
+++ b/drivers/net/wireless/ath/ath10k/mac.c
@@ -8348,6 +8348,8 @@  int ath10k_mac_register(struct ath10k *ar)
 	ar->hw->wiphy->max_scan_ssids = WLAN_SCAN_PARAMS_MAX_SSID;
 	ar->hw->wiphy->max_scan_ie_len = WLAN_SCAN_PARAMS_MAX_IE_LEN;
 
+	ar->hw->tx_sk_pacing_shift = 6;
+
 	ar->hw->vif_data_size = sizeof(struct ath10k_vif);
 	ar->hw->sta_data_size = sizeof(struct ath10k_sta);
 	ar->hw->txq_data_size = sizeof(struct ath10k_txq);