diff mbox series

mac80211: only schedule TXQ when reasonable airtime reporting

Message ID c48c3555ab2261d6b6674ac7de8203359b80b127.1612529311.git.ryder.lee@mediatek.com (mailing list archive)
State Changes Requested
Delegated to: Johannes Berg
Headers show
Series mac80211: only schedule TXQ when reasonable airtime reporting | expand

Commit Message

Ryder Lee Feb. 5, 2021, 12:55 p.m. UTC
For some drivers and hardware may report faulty airtime, which ends up
with excessive hold time (~0.9s on mt7915 multiclent tests) impacting
system performance.

Although issue has been fixed in driver, but it make sense to select txqi
depends on a reasonable airtime reporting to prevent such a case from
happening again.

Tested-by: Jiao Bo <jiao.bao@mediatek.com>
Tested-by: Sujuan Chen <sujuan.chen@mediatek.com>
Signed-off-by: Ryder Lee <ryder.lee@mediatek.com>
---
 net/mac80211/tx.c | 4 ++++
 1 file changed, 4 insertions(+)

Comments

Toke Høiland-Jørgensen Feb. 5, 2021, 1:29 p.m. UTC | #1
Ryder Lee <ryder.lee@mediatek.com> writes:

> For some drivers and hardware may report faulty airtime, which ends up
> with excessive hold time (~0.9s on mt7915 multiclent tests) impacting
> system performance.
>
> Although issue has been fixed in driver, but it make sense to select txqi
> depends on a reasonable airtime reporting to prevent such a case from
> happening again.

I think I see what you're trying to do with the patch, but this commit
message makes no sense. What, exactly, was the error you were seeing
that this is supposed to fix?

> Tested-by: Jiao Bo <jiao.bao@mediatek.com>
> Tested-by: Sujuan Chen <sujuan.chen@mediatek.com>
> Signed-off-by: Ryder Lee <ryder.lee@mediatek.com>
> ---
>  net/mac80211/tx.c | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c
> index 6422da6690f7..0b8a8c3600f4 100644
> --- a/net/mac80211/tx.c
> +++ b/net/mac80211/tx.c
> @@ -3770,6 +3770,10 @@ struct ieee80211_txq *ieee80211_next_txq(struct ieee80211_hw *hw, u8 ac)
>  				sta->airtime_weight;
>  
>  		if (deficit < 0 || !aql_check) {
> +			if (txqi->schedule_round == local->schedule_round[ac])
> +				goto out;
> +
> +			txqi->schedule_round = local->schedule_round[ac];

I think this change may be worth making anyway, but for a different
reason: Without it, a station that fails aql_check will keep getting
recycled through the list, advancing its deficit. Which could actually
be the reason AQL breaks airtime fairness; did you observe any
difference in fairness with this change?

-Toke
Ryder Lee Feb. 7, 2021, 2:41 a.m. UTC | #2
On Fri, 2021-02-05 at 14:29 +0100, Toke Høiland-Jørgensen wrote:
> Ryder Lee <ryder.lee@mediatek.com> writes:
> 
> > For some drivers and hardware may report faulty airtime, which ends up
> > with excessive hold time (~0.9s on mt7915 multiclent tests) impacting
> > system performance.
> >
> > Although issue has been fixed in driver, but it make sense to select txqi
> > depends on a reasonable airtime reporting to prevent such a case from
> > happening again.
> 
> I think I see what you're trying to do with the patch, but this commit
> message makes no sense. What, exactly, was the error you were seeing
> that this is supposed to fix?

I will make commit message more straightforward - if a station takes
large amount of airtime and fails the check that will keep getting
recycled through the list along with excessive lock hold time. Add this
patch to avoid breaking fairness.

> > Tested-by: Jiao Bo <jiao.bao@mediatek.com>
> > Tested-by: Sujuan Chen <sujuan.chen@mediatek.com>
> > Signed-off-by: Ryder Lee <ryder.lee@mediatek.com>
> > ---
> >  net/mac80211/tx.c | 4 ++++
> >  1 file changed, 4 insertions(+)
> >
> > diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c
> > index 6422da6690f7..0b8a8c3600f4 100644
> > --- a/net/mac80211/tx.c
> > +++ b/net/mac80211/tx.c
> > @@ -3770,6 +3770,10 @@ struct ieee80211_txq *ieee80211_next_txq(struct ieee80211_hw *hw, u8 ac)
> >  				sta->airtime_weight;
> >  
> >  		if (deficit < 0 || !aql_check) {
> > +			if (txqi->schedule_round == local->schedule_round[ac])
> > +				goto out;
> > +
> > +			txqi->schedule_round = local->schedule_round[ac];
> 
> I think this change may be worth making anyway, but for a different
> reason: Without it, a station that fails aql_check will keep getting
> recycled through the list, advancing its deficit. Which could actually
> be the reason AQL breaks airtime fairness; did you observe any
> difference in fairness with this change?

Our case is: mt7915 provides per-peer airtime counters. However, some of
them were not properly configured, so certain stations reported large
amount of airtime which led to deficit < 0, and as you said, ending up
with recycle + very longer lock hold time (0.9s in our tests) and
breaking fairness.


Ryder
Ryder Lee Feb. 8, 2021, 2:53 p.m. UTC | #3
On Sun, 2021-02-07 at 10:41 +0800, Ryder Lee wrote:
> On Fri, 2021-02-05 at 14:29 +0100, Toke Høiland-Jørgensen wrote:

> > > @@ -3770,6 +3770,10 @@ struct ieee80211_txq *ieee80211_next_txq(struct ieee80211_hw *hw, u8 ac)
> > >  				sta->airtime_weight;
> > >  
> > >  		if (deficit < 0 || !aql_check) {
> > > +			if (txqi->schedule_round == local->schedule_round[ac])
> > > +				goto out;
> > > +
> > > +			txqi->schedule_round = local->schedule_round[ac];
> > 
> > I think this change may be worth making anyway, but for a different
> > reason: Without it, a station that fails aql_check will keep getting
> > recycled through the list, advancing its deficit. Which could actually
> > be the reason AQL breaks airtime fairness; did you observe any
> > difference in fairness with this change?
> 
> Our case is: mt7915 provides per-peer airtime counters. However, some of
> them were not properly configured, so certain stations reported large
> amount of airtime which led to deficit < 0, and as you said, ending up
> with recycle + very longer lock hold time (0.9s in our tests) and
> breaking fairness.
> 
> 
Found a problem when we are in low traffic with this patch.This will
increase latency (i.e ping)


So, we have to

	if (deficit < 0 || !aql_check) {
		if (txqi->schedule_round == local->schedule_round[ac])
			// re-schedule
			goto out;
			....
	}
}

if (txqi->schedule_round == local->schedule_round[ac])
	// re-schedule
	goto out;

Ryder
Toke Høiland-Jørgensen Feb. 8, 2021, 3:53 p.m. UTC | #4
Ryder Lee <ryder.lee@mediatek.com> writes:

> On Sun, 2021-02-07 at 10:41 +0800, Ryder Lee wrote:
>> On Fri, 2021-02-05 at 14:29 +0100, Toke Høiland-Jørgensen wrote:
>
>> > > @@ -3770,6 +3770,10 @@ struct ieee80211_txq *ieee80211_next_txq(struct ieee80211_hw *hw, u8 ac)
>> > >  				sta->airtime_weight;
>> > >  
>> > >  		if (deficit < 0 || !aql_check) {
>> > > +			if (txqi->schedule_round == local->schedule_round[ac])
>> > > +				goto out;
>> > > +
>> > > +			txqi->schedule_round = local->schedule_round[ac];
>> > 
>> > I think this change may be worth making anyway, but for a different
>> > reason: Without it, a station that fails aql_check will keep getting
>> > recycled through the list, advancing its deficit. Which could actually
>> > be the reason AQL breaks airtime fairness; did you observe any
>> > difference in fairness with this change?
>> 
>> Our case is: mt7915 provides per-peer airtime counters. However, some of
>> them were not properly configured, so certain stations reported large
>> amount of airtime which led to deficit < 0, and as you said, ending up
>> with recycle + very longer lock hold time (0.9s in our tests) and
>> breaking fairness.

First of all, if the driver reports wrong airtime values, of course it
is going to affect fairness. The right thing in that case is to fix the
driver, or turn off reporting if it can't be fixed.

> Found a problem when we are in low traffic with this patch.This will
> increase latency (i.e ping)
>
>
> So, we have to
>
> 	if (deficit < 0 || !aql_check) {
> 		if (txqi->schedule_round == local->schedule_round[ac])
> 			// re-schedule

You mean, signal the driver to start over? But then you're just undoing
the check you just inserted here...


...and thinking about it a bit more, I don't actually think adding this
check is the right thing to do. As you've just discovered, the deficit
scheduler relies on the "goto begin" below (and thus being able to
keep spinning and increasing the deficit) to make progress. So if you
short-circuit that, you'll get blocking, but if you keep rotating the
queues for other reasons (like AQL does) you no longer get fairness.

Ultimately this comes from using two different sources of airtime:
predicted values (in AQL) and after-the-fact reporting (in the fairness
scheduler). There's a time lag between when these two values are
applied, which leads to the fairness scheduler insisting that a station
should be the next one to transmit even though AQL is blocking it.

Hmm, I wonder what would happen if we just accounted the AQL balance in
the fairness deficit as well? Something like the patch below
(compile-tested only). I'm not sure what the effect of running the
deficit backwards like this is; we may get weird oscillating values when
we subtract the AQL value and the "real" value hasn't been accounted
yet. But it may also turn out to not be a big issue; worth testing,
maybe?

The alternative would be to switch to using only the AQL values for
fairness as well; if the AQL predictions are reasonably accurate this
would likely work well enough. Got any idea how much they are off?

-Toke

diff --git a/net/mac80211/sta_info.c b/net/mac80211/sta_info.c
index ec6973ee88ef..86718a6429e6 100644
--- a/net/mac80211/sta_info.c
+++ b/net/mac80211/sta_info.c
@@ -1893,12 +1893,10 @@ void ieee80211_sta_set_buffered(struct ieee80211_sta *pubsta,
 }
 EXPORT_SYMBOL(ieee80211_sta_set_buffered);
 
-void ieee80211_sta_register_airtime(struct ieee80211_sta *pubsta, u8 tid,
-                                   u32 tx_airtime, u32 rx_airtime)
+static void __ieee80211_sta_register_airtime(struct ieee80211_local *local,
+                                            struct sta_info *sta, u8 ac,
+                                            u32 tx_airtime, u32 rx_airtime)
 {
-       struct sta_info *sta = container_of(pubsta, struct sta_info, sta);
-       struct ieee80211_local *local = sta->sdata->local;
-       u8 ac = ieee80211_ac_from_tid(tid);
        u32 airtime = 0;
 
        if (sta->local->airtime_flags & AIRTIME_USE_TX)
@@ -1912,6 +1910,16 @@ void ieee80211_sta_register_airtime(struct ieee80211_sta *pubsta, u8 tid,
        sta->airtime[ac].deficit -= airtime;
        spin_unlock_bh(&local->active_txq_lock[ac]);
 }
+
+void ieee80211_sta_register_airtime(struct ieee80211_sta *pubsta, u8 tid,
+                                   u32 tx_airtime, u32 rx_airtime)
+{
+       struct sta_info *sta = container_of(pubsta, struct sta_info, sta);
+       struct ieee80211_local *local = sta->sdata->local;
+       u8 ac = ieee80211_ac_from_tid(tid);
+
+       __ieee80211_sta_register_airtime(local, sta, ac, tx_airtime, rx_airtime);
+}
 EXPORT_SYMBOL(ieee80211_sta_register_airtime);
 
 void ieee80211_sta_update_pending_airtime(struct ieee80211_local *local,
@@ -1924,9 +1932,11 @@ void ieee80211_sta_update_pending_airtime(struct ieee80211_local *local,
                return;
 
        if (!tx_completed) {
-               if (sta)
+               if (sta) {
                        atomic_add(tx_airtime,
                                   &sta->airtime[ac].aql_tx_pending);
+                       __ieee80211_sta_register_airtime(local, sta, ac, tx_airtime, 0);
+               }
 
                atomic_add(tx_airtime, &local->aql_total_pending_airtime);
                return;
@@ -1938,6 +1948,7 @@ void ieee80211_sta_update_pending_airtime(struct ieee80211_local *local,
                if (tx_pending < 0)
                        atomic_cmpxchg(&sta->airtime[ac].aql_tx_pending,
                                       tx_pending, 0);
+               __ieee80211_sta_register_airtime(local, sta, ac, -tx_airtime, 0);
        }
 
        tx_pending = atomic_sub_return(tx_airtime,
diff mbox series

Patch

diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c
index 6422da6690f7..0b8a8c3600f4 100644
--- a/net/mac80211/tx.c
+++ b/net/mac80211/tx.c
@@ -3770,6 +3770,10 @@  struct ieee80211_txq *ieee80211_next_txq(struct ieee80211_hw *hw, u8 ac)
 				sta->airtime_weight;
 
 		if (deficit < 0 || !aql_check) {
+			if (txqi->schedule_round == local->schedule_round[ac])
+				goto out;
+
+			txqi->schedule_round = local->schedule_round[ac];
 			list_move_tail(&txqi->schedule_order,
 				       &local->active_txqs[txqi->txq.ac]);
 			goto begin;