mbox series

[RFC,0/2] ath10k: provide survey info as accumulated data

Message ID 20190918124259.17804-1-sven@narfation.org (mailing list archive)
Headers show
Series ath10k: provide survey info as accumulated data | expand

Message

Sven Eckelmann Sept. 18, 2019, 12:42 p.m. UTC
From: Sven Eckelmann <seckelmann@datto.com>

Hi,

it was observed that ath9k provides accumulated survey counters but ath10k
neither provides deltas nor accumulated counters. Instead it returns
some value which was returned at some point from the firmware.

But as it turns out, this data is not reliable. To make it more useful,
ath10k has to:

* retrieve counters rather frequently for hardware which is known to use
  firmware versions with low number counter bits (for only 14-30s)
* clean up received counter values 
* accumulate counters from firmware

A comparison of the resulting output with these fixes can be seen under
https://stats.freifunk-vogtland.net/d/ffv_node/nodeinfo?orgId=1&var-node=ac86749f4d60&fullscreen&panelId=5&from=1568782046974&to=1568807068706

The left side of the graph shows the output before the patches were applied
and the right side the output with the patches applied. Just as reference, an
ath9k device in the same building is
https://stats.freifunk-vogtland.net/d/ffv_node/nodeinfo?orgId=1&var-node=ac86740037e0&fullscreen&panelId=5&from=1568782046974&to=1568807068706

Kind regards,
	Sven

Sven Eckelmann (2):
  ath10k: report survey info as accumulated values
  ath10k: regularly fetch survey counters

 drivers/net/wireless/ath/ath10k/core.c |  8 ++++
 drivers/net/wireless/ath/ath10k/core.h |  3 ++
 drivers/net/wireless/ath/ath10k/hw.c   | 13 +++--
 drivers/net/wireless/ath/ath10k/mac.c  | 52 ++++++++++++++++++++
 drivers/net/wireless/ath/ath10k/mac.h  |  3 ++
 drivers/net/wireless/ath/ath10k/wmi.c  | 66 ++++++++++++++++++++++----
 6 files changed, 130 insertions(+), 15 deletions(-)

Comments

Kalle Valo Oct. 11, 2019, 8:44 a.m. UTC | #1
Sven Eckelmann <sven@narfation.org> writes:

> it was observed that ath9k provides accumulated survey counters but ath10k
> neither provides deltas nor accumulated counters. Instead it returns
> some value which was returned at some point from the firmware.
>
> But as it turns out, this data is not reliable. To make it more useful,
> ath10k has to:
>
> * retrieve counters rather frequently for hardware which is known to use
>   firmware versions with low number counter bits (for only 14-30s)
> * clean up received counter values 
> * accumulate counters from firmware
>
> A comparison of the resulting output with these fixes can be seen under
> https://stats.freifunk-vogtland.net/d/ffv_node/nodeinfo?orgId=1&var-node=ac86749f4d60&fullscreen&panelId=5&from=1568782046974&to=1568807068706
>
> The left side of the graph shows the output before the patches were applied
> and the right side the output with the patches applied. Just as reference, an
> ath9k device in the same building is
> https://stats.freifunk-vogtland.net/d/ffv_node/nodeinfo?orgId=1&var-node=ac86740037e0&fullscreen&panelId=5&from=1568782046974&to=1568807068706

Thanks, this looks very good to me and I had only cosmetic comments.
Please submit next version without RFC so that I can apply these.
Sebastian Gottschall Oct. 13, 2019, 10:15 p.m. UTC | #2
i checked your patch on 10.4 based chipsets with 9984. the values are 
now looking bogus and wrong at all. busy and active time time in ms does 
increase in hours each second
the problem seem to be that your patch is 10.2.4 only related. 
ath_clean_survey does not trigger on 10.4 so the values double itself 
each time the event raises since you add the full values and not just a 
delta on top

Sebastian

Am 18.09.2019 um 14:42 schrieb Sven Eckelmann:
> From: Sven Eckelmann <seckelmann@datto.com>
>
> Hi,
>
> it was observed that ath9k provides accumulated survey counters but ath10k
> neither provides deltas nor accumulated counters. Instead it returns
> some value which was returned at some point from the firmware.
>
> But as it turns out, this data is not reliable. To make it more useful,
> ath10k has to:
>
> * retrieve counters rather frequently for hardware which is known to use
>    firmware versions with low number counter bits (for only 14-30s)
> * clean up received counter values
> * accumulate counters from firmware
>
> A comparison of the resulting output with these fixes can be seen under
> https://stats.freifunk-vogtland.net/d/ffv_node/nodeinfo?orgId=1&var-node=ac86749f4d60&fullscreen&panelId=5&from=1568782046974&to=1568807068706
>
> The left side of the graph shows the output before the patches were applied
> and the right side the output with the patches applied. Just as reference, an
> ath9k device in the same building is
> https://stats.freifunk-vogtland.net/d/ffv_node/nodeinfo?orgId=1&var-node=ac86740037e0&fullscreen&panelId=5&from=1568782046974&to=1568807068706
>
> Kind regards,
> 	Sven
>
> Sven Eckelmann (2):
>    ath10k: report survey info as accumulated values
>    ath10k: regularly fetch survey counters
>
>   drivers/net/wireless/ath/ath10k/core.c |  8 ++++
>   drivers/net/wireless/ath/ath10k/core.h |  3 ++
>   drivers/net/wireless/ath/ath10k/hw.c   | 13 +++--
>   drivers/net/wireless/ath/ath10k/mac.c  | 52 ++++++++++++++++++++
>   drivers/net/wireless/ath/ath10k/mac.h  |  3 ++
>   drivers/net/wireless/ath/ath10k/wmi.c  | 66 ++++++++++++++++++++++----
>   6 files changed, 130 insertions(+), 15 deletions(-)
>
Sven Eckelmann Oct. 14, 2019, 7:07 a.m. UTC | #3
On Monday, 14 October 2019 00:15:20 CEST Sebastian Gottschall wrote:
> i checked your patch on 10.4 based chipsets with 9984. the values are 
> now looking bogus and wrong at all. busy and active time time in ms does 
> increase in hours each second
> the problem seem to be that your patch is 10.2.4 only related. 
> ath_clean_survey does not trigger on 10.4 so the values double itself 
> each time the event raises since you add the full values and not just a 
> delta on top

Thanks for the feedback. So we have now a firmware 10.2.4 which is counting 
busy + active up and has wraparound problems. And then we have a 10.4 firmware 
(on QCA9888 and QCA4019) which is clearing everything as expected with 
WMI_BSS_SURVEY_REQ_TYPE_READ_CLEAR  and then we have some 10.4 firmware (one 
QCA9984) which behaves more like ath 10.2.4 firmware but is marked as 
ATH10K_HW_CC_WRAP_SHIFTED_EACH like the QCA4019.

So I have no idea how to fix this when QCA4019 and QCA9984 are currently 
marked the same but behave differently. Does somebody have a overview how the 
different HW versions should behave or is there some special bit in the data 
reported by the firmware which can be used to evaluate the expected behavior?

Kind regards,
	Sven
Kalle Valo Oct. 14, 2019, 8:57 a.m. UTC | #4
Sven Eckelmann <sven@narfation.org> writes:

> On Monday, 14 October 2019 00:15:20 CEST Sebastian Gottschall wrote:
>> i checked your patch on 10.4 based chipsets with 9984. the values are 
>> now looking bogus and wrong at all. busy and active time time in ms does 
>> increase in hours each second
>> the problem seem to be that your patch is 10.2.4 only related. 
>> ath_clean_survey does not trigger on 10.4 so the values double itself 
>> each time the event raises since you add the full values and not just a 
>> delta on top
>
> Thanks for the feedback. So we have now a firmware 10.2.4 which is counting 
> busy + active up and has wraparound problems. And then we have a 10.4 firmware 
> (on QCA9888 and QCA4019) which is clearing everything as expected with 
> WMI_BSS_SURVEY_REQ_TYPE_READ_CLEAR  and then we have some 10.4 firmware (one 
> QCA9984) which behaves more like ath 10.2.4 firmware but is marked as 
> ATH10K_HW_CC_WRAP_SHIFTED_EACH like the QCA4019.
>
> So I have no idea how to fix this when QCA4019 and QCA9984 are currently 
> marked the same but behave differently. Does somebody have a overview how the 
> different HW versions should behave or is there some special bit in the data 
> reported by the firmware which can be used to evaluate the expected behavior?

I hope there's an easy way to detect this behaviour change, but if
nothing else we could add a new bit to enum ath10k_fw_features. But of
course that's the last resort, maintaining the firmware features
bitfield accross different firmware branches is quite cumbersome.
Sebastian Gottschall Oct. 14, 2019, 9:32 a.m. UTC | #5
10.4 has additional 64 bit cycle counters. see wmi_pdev_bss_chan_info_event

the standard 32 cycle counters do individual wrap around as far as i know

Am 14.10.2019 um 09:07 schrieb Sven Eckelmann:
> On Monday, 14 October 2019 00:15:20 CEST Sebastian Gottschall wrote:
>> i checked your patch on 10.4 based chipsets with 9984. the values are
>> now looking bogus and wrong at all. busy and active time time in ms does
>> increase in hours each second
>> the problem seem to be that your patch is 10.2.4 only related.
>> ath_clean_survey does not trigger on 10.4 so the values double itself
>> each time the event raises since you add the full values and not just a
>> delta on top
> Thanks for the feedback. So we have now a firmware 10.2.4 which is 
> counting
> busy + active up and has wraparound problems. And then we have a 10.4 
> firmware
> (on QCA9888 and QCA4019) which is clearing everything as expected with
> WMI_BSS_SURVEY_REQ_TYPE_READ_CLEAR  and then we have some 10.4 
> firmware (one
> QCA9984) which behaves more like ath 10.2.4 firmware but is marked as
> ATH10K_HW_CC_WRAP_SHIFTED_EACH like the QCA4019.
>
> So I have no idea how to fix this when QCA4019 and QCA9984 are currently
> marked the same but behave differently. Does somebody have a overview 
> how the
> different HW versions should behave or is there some special bit in 
> the data
> reported by the firmware which can be used to evaluate the expected 
> behavior?
>
> Kind regards,
>     Sven