diff mbox series

a nuking the mac80211 changing codel parameters patch

Message ID CAA93jw6NJ2cmLmMauz0xAgC2MGbBq6n0ZiZzAdkK0u4b+O2yXg@mail.gmail.com (mailing list archive)
State RFC
Delegated to: Johannes Berg
Headers show
Series a nuking the mac80211 changing codel parameters patch | expand

Commit Message

Dave Taht Dec. 20, 2022, 8:24 p.m. UTC
This is the single, most buggy, piece of code in "my" portion of wifi
today. It is so wrong, yet thus far I cannot get it out of linux or
find an acceptable substitute. It makes it hard to sleep at night
knowing this code has been so wrong... and now in millions , maybe
even 10s of millions, of devices by now.... Since I've been ranting
about the wrongness of this for years, I keep hoping that we can
excise it, especially for wifi6 devices and even more especially on
6ghz spectrum... but just about everything, somehow, would benefit
hugely if we could somehow do more of the right thing here.

I'd tried, last time I got this bee in my bonnet, tried to nuke this call here:

https://forum.openwrt.org/t/reducing-multiplexing-latencies-still-further-in-wifi/133605/

As it is, I really encourage folk, especially with mt79 and to some
extent mt76 ac or ath10k, to try out the attached patch, measure tcp
rtts, and throughput, etc. A slightly less aggressive patch might suit
wifi-n....

Maybe there's a reason for keeping this code in linux wifi that I do
not understand. But here are my pithy comments as to why this part of
mac80211 is so wrong...

 static void sta_update_codel_params(struct sta_info *sta, u32 thr)
 {
-       if (thr && thr < STA_SLOW_THRESHOLD * sta->local->num_sta) {

1) sta->local->num_sta is the number of associated, rather than
active, stations. "Active" stations in the last 50ms or so, might have
been a better thing to use, but as most people have far more than that
associated, we end up with really lousy codel parameters, all the
time. Mistake numero uno!

2) The STA_SLOW_THRESHOLD was completely arbitrary in 2016.

-               sta->cparams.target = MS2TIME(50);

This, by itself, was probably not too bad. 30ms might have been
better, at the time, when we were battling powersave etc, but 20ms was
enough,
really, to cover most scenarios, even where we had low rate 2Ghz
multicast to cope with. Even then, codel has a hard time finding any
sane drop
rate at all, with a target this high.

-               sta->cparams.interval = MS2TIME(300);

But this was horrible, a total mistake, that is leading to codel being
completely ineffective in almost any scenario on clients or APS.
100ms, even 80ms, here, would be vastly better than this insanity. I'm
seeing 5+seconds of delay accumulated in a bunch of otherwise happily
fq-ing APs....

100ms of  observed jitter during a flow is enough. Certainly (in 2016)
there were interactions with powersave that I did not understand, and
still don't, but
if you are transmitting in the first place, powersave shouldn't be a
problemmmm.....

-               sta->cparams.ecn = false;

At the time we were pretty nervous about ecn, I'm kind of sanguine
about it now, and reliably indicating ecn seems better than turning it
off for
any reason.

-       } else {
-               sta->cparams.target = MS2TIME(20);
-               sta->cparams.interval = MS2TIME(100);
-               sta->cparams.ecn = true;
-       }

And if we aint gonna fiddle with these, we don't need these either.

In production, on p2p wireless, I've had 8ms and 80ms for target and
interval for years now, and it works great. It is obviously too low,
for those that
prize bandwidth over latency (I personally would prefer TXOPs shrink
intelligently as well as bandwidth, as you add stations, some of which
happens naturally by fq-codels scheduling mechanisms, others don't, I
even run with 2ms txops by default on everything myself)

+       return;

Ideally we could kill this entire call off entirely.

 }

A pre-thx for anyone actually trying the attached patch and reporting
back on any results.

https://forum.openwrt.org/t/reducing-multiplexing-latencies-still-further-in-wifi/133605/

Comments

Dave Taht Aug. 4, 2023, 11:03 a.m. UTC | #1
I cannot help but wonder what the results have been on people applying
this patch over the past 6 months?

On Tue, Dec 20, 2022 at 12:24 PM Dave Taht <dave.taht@gmail.com> wrote:
>
> This is the single, most buggy, piece of code in "my" portion of wifi
> today. It is so wrong, yet thus far I cannot get it out of linux or
> find an acceptable substitute. It makes it hard to sleep at night
> knowing this code has been so wrong... and now in millions , maybe
> even 10s of millions, of devices by now.... Since I've been ranting
> about the wrongness of this for years, I keep hoping that we can
> excise it, especially for wifi6 devices and even more especially on
> 6ghz spectrum... but just about everything, somehow, would benefit
> hugely if we could somehow do more of the right thing here.
>
> I'd tried, last time I got this bee in my bonnet, tried to nuke this call here:
>
> https://forum.openwrt.org/t/reducing-multiplexing-latencies-still-further-in-wifi/133605/
>
> As it is, I really encourage folk, especially with mt79 and to some
> extent mt76 ac or ath10k, to try out the attached patch, measure tcp
> rtts, and throughput, etc. A slightly less aggressive patch might suit
> wifi-n....
>
> Maybe there's a reason for keeping this code in linux wifi that I do
> not understand. But here are my pithy comments as to why this part of
> mac80211 is so wrong...
>
>  static void sta_update_codel_params(struct sta_info *sta, u32 thr)
>  {
> -       if (thr && thr < STA_SLOW_THRESHOLD * sta->local->num_sta) {
>
> 1) sta->local->num_sta is the number of associated, rather than
> active, stations. "Active" stations in the last 50ms or so, might have
> been a better thing to use, but as most people have far more than that
> associated, we end up with really lousy codel parameters, all the
> time. Mistake numero uno!
>
> 2) The STA_SLOW_THRESHOLD was completely arbitrary in 2016.
>
> -               sta->cparams.target = MS2TIME(50);
>
> This, by itself, was probably not too bad. 30ms might have been
> better, at the time, when we were battling powersave etc, but 20ms was
> enough,
> really, to cover most scenarios, even where we had low rate 2Ghz
> multicast to cope with. Even then, codel has a hard time finding any
> sane drop
> rate at all, with a target this high.
>
> -               sta->cparams.interval = MS2TIME(300);
>
> But this was horrible, a total mistake, that is leading to codel being
> completely ineffective in almost any scenario on clients or APS.
> 100ms, even 80ms, here, would be vastly better than this insanity. I'm
> seeing 5+seconds of delay accumulated in a bunch of otherwise happily
> fq-ing APs....
>
> 100ms of  observed jitter during a flow is enough. Certainly (in 2016)
> there were interactions with powersave that I did not understand, and
> still don't, but
> if you are transmitting in the first place, powersave shouldn't be a
> problemmmm.....
>
> -               sta->cparams.ecn = false;
>
> At the time we were pretty nervous about ecn, I'm kind of sanguine
> about it now, and reliably indicating ecn seems better than turning it
> off for
> any reason.
>
> -       } else {
> -               sta->cparams.target = MS2TIME(20);
> -               sta->cparams.interval = MS2TIME(100);
> -               sta->cparams.ecn = true;
> -       }
>
> And if we aint gonna fiddle with these, we don't need these either.
>
> In production, on p2p wireless, I've had 8ms and 80ms for target and
> interval for years now, and it works great. It is obviously too low,
> for those that
> prize bandwidth over latency (I personally would prefer TXOPs shrink
> intelligently as well as bandwidth, as you add stations, some of which
> happens naturally by fq-codels scheduling mechanisms, others don't, I
> even run with 2ms txops by default on everything myself)
>
> +       return;
>
> Ideally we could kill this entire call off entirely.
>
>  }
>
> A pre-thx for anyone actually trying the attached patch and reporting
> back on any results.
>
> https://forum.openwrt.org/t/reducing-multiplexing-latencies-still-further-in-wifi/133605/
>
>
> --
> This song goes out to all the folk that thought Stadia would work:
> https://www.linkedin.com/posts/dtaht_the-mushroom-song-activity-6981366665607352320-FXtz
> Dave Täht CEO, TekLibre, LLC
diff mbox series

Patch

diff --git a/net/mac80211/sta_info.c b/net/mac80211/sta_info.c
index 97d24e9..457209a 100644
--- a/net/mac80211/sta_info.c
+++ b/net/mac80211/sta_info.c
@@ -2766,15 +2766,7 @@  unsigned long ieee80211_sta_last_active(struct sta_info *sta)
 
 static void sta_update_codel_params(struct sta_info *sta, u32 thr)
 {
-	if (thr && thr < STA_SLOW_THRESHOLD * sta->local->num_sta) {
-		sta->cparams.target = MS2TIME(50);
-		sta->cparams.interval = MS2TIME(300);
-		sta->cparams.ecn = false;
-	} else {
-		sta->cparams.target = MS2TIME(20);
-		sta->cparams.interval = MS2TIME(100);
-		sta->cparams.ecn = true;
-	}
+	return;
 }
 
 void ieee80211_sta_set_expected_throughput(struct ieee80211_sta *pubsta,