diff mbox series

[1/2] netdev: Add logging for CQM messages

Message ID 20220801100631.41605-1-mjohnson459@gmail.com (mailing list archive)
State Not Applicable, archived
Headers show
Series [1/2] netdev: Add logging for CQM messages | expand

Checks

Context Check Description
tedd_an/pre-ci_am success Success
prestwoj/iwd-alpine-ci-fetch success Fetch PR
prestwoj/iwd-ci-gitlint success GitLint
prestwoj/iwd-ci-fetch success Fetch PR
prestwoj/iwd-alpine-ci-makedistcheck success Make Distcheck
prestwoj/iwd-alpine-ci-build success Build - Configure
prestwoj/iwd-ci-makedistcheck success Make Distcheck
prestwoj/iwd-ci-build success Build - Configure
prestwoj/iwd-alpine-ci-makecheckvalgrind success Make Check w/Valgrind
prestwoj/iwd-alpine-ci-makecheck success Make Check
prestwoj/iwd-ci-makecheckvalgrind success Make Check w/Valgrind
prestwoj/iwd-ci-clang success clang PASS
prestwoj/iwd-ci-makecheck success Make Check
prestwoj/iwd-alpine-ci-incremental_build success Incremental Build with patches
prestwoj/iwd-ci-incremental_build success Incremental Build with patches
prestwoj/iwd-ci-testrunner success test-runner PASS

Commit Message

Michael Johnson Aug. 1, 2022, 10:06 a.m. UTC
Add extra logging around CQM events to help track wifi status.

The signal strength logging matches wpa_supplicant's which looks like
this:

CTRL-EVENT-SIGNAL-CHANGE above=1 signal=-60 noise=-96 txrate=6000
---
 src/netdev.c | 22 ++++++++++++++++++++--
 1 file changed, 20 insertions(+), 2 deletions(-)

Comments

Denis Kenzior Aug. 1, 2022, 3:17 p.m. UTC | #1
Hi Michael,

On 8/1/22 05:06, Michael Johnson wrote:
> Add extra logging around CQM events to help track wifi status.
> 
> The signal strength logging matches wpa_supplicant's which looks like
> this:
> 
> CTRL-EVENT-SIGNAL-CHANGE above=1 signal=-60 noise=-96 txrate=6000
> ---
>   src/netdev.c | 22 ++++++++++++++++++++--
>   1 file changed, 20 insertions(+), 2 deletions(-)
> 

<snip>

> @@ -1106,10 +1120,14 @@ static void netdev_cqm_event(struct l_genl_msg *msg, struct netdev *netdev)
>   	}
>   
>   	if (rssi_event) {
> -		if (rssi_val)
> +		if (rssi_val) {
> +			l_info("Signal change event (above=%d signal=%d)",

I'm not sure 'above=%d' is going to work.  rssi_event is only set on 
THRESHOLD_EVENT_LOW/THRESHOLD_EVENT_HIGH.  If RSSI_LEVEL is being reported, that 
pointer is still NULL.

> +							*rssi_event, *rssi_val);
>   			netdev_cqm_event_rssi_value(netdev, *rssi_val);

Hmm, I don't think wpa_s uses RSSI_LEVEL stuff in the kernel, does it?  The 
reason I ask is that we do have UIs that setup RSSI level reporting for 
displaying connection link quality (in number of bars for example).  Since such 
events are quite common, I think you're going to get quite a bit of spam in the 
logs.  Are you only interested in cases where the roaming threshold is reached? 
  If so, you may need to modify netdev_cqm_event_rssi_value() instead.

> -		else
> +		} else {
> +			l_info("Signal change event (above=%d)", *rssi_event);
>   			netdev_cqm_event_rssi_threshold(netdev, *rssi_event);
> +		}
>   	}
>   }
>   

Patch 2 looks fine btw, so I applied that one.

Regards,
-Denis
Michael Johnson Aug. 1, 2022, 4:18 p.m. UTC | #2
Hi Denis,

On Mon, 1 Aug 2022 at 16:27, Denis Kenzior <denkenz@gmail.com> wrote:

> > @@ -1106,10 +1120,14 @@ static void netdev_cqm_event(struct l_genl_msg *msg, struct netdev *netdev)
> >       }
> >
> >       if (rssi_event) {
> > -             if (rssi_val)
> > +             if (rssi_val) {
> > +                     l_info("Signal change event (above=%d signal=%d)",
>
> I'm not sure 'above=%d' is going to work.  rssi_event is only set on
> THRESHOLD_EVENT_LOW/THRESHOLD_EVENT_HIGH.  If RSSI_LEVEL is being reported, that
> pointer is still NULL.

Should the top level `if (rssi_event) {` not cover that case? I
believe it checks both
pointers before the info statement. I didn't add any output in the case of an
RSSI_LEVEL but no THRESHOLD_EVENT_* as I haven't seen this actually occurring
so assumed iwd didn't set up non-threshold events?

>
> > +                                                     *rssi_event, *rssi_val);
> >                       netdev_cqm_event_rssi_value(netdev, *rssi_val);
>
> Hmm, I don't think wpa_s uses RSSI_LEVEL stuff in the kernel, does it?  The
> reason I ask is that we do have UIs that setup RSSI level reporting for
> displaying connection link quality (in number of bars for example).  Since such
> events are quite common, I think you're going to get quite a bit of spam in the
> logs.  Are you only interested in cases where the roaming threshold is reached?
>   If so, you may need to modify netdev_cqm_event_rssi_value() instead.

Sorry, I should have included more reasoning behind the change.

In my use case the client is running headless and so the logs are
really the only
insight into why the network might have degraded to the point of causing issues.
For example, if the client stops receiving data, it is useful to know
what the rough
signal level is that caused it as that might imply a dead zone or
interference. All
debugging is done remotely as well so we can't ever debug the network while
it is currently in a bad state hence the log part.

The useful data isn't just the threshold but the general signal level
at the time of
an incident. If all is working then we would have roamed before the signal gets
bad enough  to cause issues (it follows that we don't really care about _good_
signal values so maybe we could reduce spam there?)

As for the general spamminess, does the UI RSSI reporting cause more events
than running normally? I believe I was only seeing events when the signal had
changed more than the hysteresis value in netdev_build_cmd_cqm_rssi_update
(5 dBm). This doesn't seem too frequent (for me at least) when the client is
mostly static. When it is moving it is generally useful information.

For wpa_supplicant, I'm not sure the exact mechanism it uses but it does report
almost identical output at a fairly high frequency if the client is moving e.g.
```
May 08 08:56:04 p3-2019 wpa_supplicant[842]: wlp2s0:
CTRL-EVENT-SIGNAL-CHANGE above=1 signal=-62 noise=-93 txrate=6000
May 08 08:56:14 p3-2019 wpa_supplicant[842]: wlp2s0:
CTRL-EVENT-SIGNAL-CHANGE above=0 signal=-76 noise=-93 txrate=6000
May 08 08:56:27 p3-2019 wpa_supplicant[842]: wlp2s0:
CTRL-EVENT-SIGNAL-CHANGE above=0 signal=-79 noise=-94 txrate=6000
```

Finally, the reason I made this `info` level was in the hope that I could
remove the `-d` flag from iwd which does produce a lot of spam in the logs.
This isn't really feasible at the moment as there isn't enough `info` logging
to track down any issues that might occur on production systems where
changes are harder to make. If the `info` level becomes to verbose, maybe
iwd could set the default logging level to `warn` to reduce spam?

>
> > -             else
> > +             } else {
> > +                     l_info("Signal change event (above=%d)", *rssi_event);
> >                       netdev_cqm_event_rssi_threshold(netdev, *rssi_event);
> > +             }
> >       }
> >   }
> >
>
> Patch 2 looks fine btw, so I applied that one.
>
> Regards,
> -Denis

Regards,
Michael
Denis Kenzior Aug. 1, 2022, 5:14 p.m. UTC | #3
Hi Michael,

>> I'm not sure 'above=%d' is going to work.  rssi_event is only set on
>> THRESHOLD_EVENT_LOW/THRESHOLD_EVENT_HIGH.  If RSSI_LEVEL is being reported, that
>> pointer is still NULL.
> 
> Should the top level `if (rssi_event) {` not cover that case? I
> believe it checks both
> pointers before the info statement. I didn't add any output in the case of an
> RSSI_LEVEL but no THRESHOLD_EVENT_* as I haven't seen this actually occurring
> so assumed iwd didn't set up non-threshold events?

Ah you're right.  I guess the kernel does send the threshold event attribute as 
well.  I'm really not sure why actually?  This seems useless since it depends on 
knowing the currently used high/low thresholds, and only the kernel knows those.
>>
>>> +                                                     *rssi_event, *rssi_val);
>>>                        netdev_cqm_event_rssi_value(netdev, *rssi_val);
>>
>> Hmm, I don't think wpa_s uses RSSI_LEVEL stuff in the kernel, does it?  The
>> reason I ask is that we do have UIs that setup RSSI level reporting for
>> displaying connection link quality (in number of bars for example).  Since such
>> events are quite common, I think you're going to get quite a bit of spam in the
>> logs.  Are you only interested in cases where the roaming threshold is reached?
>>    If so, you may need to modify netdev_cqm_event_rssi_value() instead.
> 
> Sorry, I should have included more reasoning behind the change.
> 
> In my use case the client is running headless and so the logs are
> really the only
> insight into why the network might have degraded to the point of causing issues.
> For example, if the client stops receiving data, it is useful to know
> what the rough
> signal level is that caused it as that might imply a dead zone or
> interference. All
> debugging is done remotely as well so we can't ever debug the network while
> it is currently in a bad state hence the log part.

I wonder if we should report this over D-Bus instead?  I mean you can always 
setup your own SignalLevelAgent, or we could even add more reporting via 
StationDiagnostic interface?

> 
> The useful data isn't just the threshold but the general signal level
> at the time of
> an incident. If all is working then we would have roamed before the signal gets
> bad enough  to cause issues (it follows that we don't really care about _good_
> signal values so maybe we could reduce spam there?)
> 
> As for the general spamminess, does the UI RSSI reporting cause more events
> than running normally? I believe I was only seeing events when the signal had
> changed more than the hysteresis value in netdev_build_cmd_cqm_rssi_update
> (5 dBm). This doesn't seem too frequent (for me at least) when the client is
> mostly static. When it is moving it is generally useful information.

It would.  By default we program only the low signal threshold value.  So you 
should only see signal changes around that value.  If an rssi agent is used, 
then multiple thresholds are programmed, so you'd see rssi events even when the 
signal is very good.

> 
> For wpa_supplicant, I'm not sure the exact mechanism it uses but it does report
> almost identical output at a fairly high frequency if the client is moving e.g.
> ```
> May 08 08:56:04 p3-2019 wpa_supplicant[842]: wlp2s0:
> CTRL-EVENT-SIGNAL-CHANGE above=1 signal=-62 noise=-93 txrate=6000
> May 08 08:56:14 p3-2019 wpa_supplicant[842]: wlp2s0:
> CTRL-EVENT-SIGNAL-CHANGE above=0 signal=-76 noise=-93 txrate=6000
> May 08 08:56:27 p3-2019 wpa_supplicant[842]: wlp2s0:
> CTRL-EVENT-SIGNAL-CHANGE above=0 signal=-79 noise=-94 txrate=6000
> ```
> 

Right, but I think it sets a single CQM threshold, not multiple like we (might) 
do.  Also, wpa_s is very spammy since it is really (only) meant for testing wifi 
implementations in a lab ;)

> Finally, the reason I made this `info` level was in the hope that I could
> remove the `-d` flag from iwd which does produce a lot of spam in the logs.
> This isn't really feasible at the moment as there isn't enough `info` logging
> to track down any issues that might occur on production systems where
> changes are harder to make. If the `info` level becomes to verbose, maybe
> iwd could set the default logging level to `warn` to reduce spam?

You can use a glob match for the -d flag.  Something like:
	#iwd -d '*Signal change event*'

So you can make the rssi messages as 'debug', and have iwd only print those you 
want.  Multiple globs are supported too, separate them by ':' or ','.

Regards,
-Denis
Michael Johnson Aug. 1, 2022, 6:09 p.m. UTC | #4
Hi Denis,

On Mon, 1 Aug 2022 at 18:24, Denis Kenzior <denkenz@gmail.com> wrote:
> I wonder if we should report this over D-Bus instead?  I mean you can always
> setup your own SignalLevelAgent, or we could even add more reporting via
> StationDiagnostic interface?

I actually do record the signal level separately but it's stored in an
annoying binary format so access is more painful :(. This was just to
make things easier and reduce the jump from wpa_supplicant to iwd for
anyone on plain ubuntu etc. I can understand why you don't want the
spam though.

> Right, but I think it sets a single CQM threshold, not multiple like we (might)
> do.  Also, wpa_s is very spammy since it is really (only) meant for testing wifi
> implementations in a lab ;)

haha fair. Of course, it's still the default on most linux systems ;).

> > changes are harder to make. If the `info` level becomes to verbose, maybe
> > iwd could set the default logging level to `warn` to reduce spam?
>
> You can use a glob match for the -d flag.  Something like:
>         #iwd -d '*Signal change event*'
>
> So you can make the rssi messages as 'debug', and have iwd only print those you
> want.  Multiple globs are supported too, separate them by ':' or ','.

That's a smart idea although as a user i feel like you only ever know
what messages are important after the fact.

>
> Regards,
> -Denis

What would you like to do with the patch? I'm happy to drop it if its
too spammy or we can lower the level to debug? Or make it only print
when the threshold changes.
Are you happy with the packet loss and beacon loss events or are they
too much as well?

Regards,
Michael
Denis Kenzior Aug. 1, 2022, 6:30 p.m. UTC | #5
Hi Michael,

On 8/1/22 13:09, Michael Johnson wrote:
> Hi Denis,
> 
> On Mon, 1 Aug 2022 at 18:24, Denis Kenzior <denkenz@gmail.com> wrote:
>> I wonder if we should report this over D-Bus instead?  I mean you can always
>> setup your own SignalLevelAgent, or we could even add more reporting via
>> StationDiagnostic interface?
> 
> I actually do record the signal level separately but it's stored in an
> annoying binary format so access is more painful :(. This was just to
> make things easier and reduce the jump from wpa_supplicant to iwd for
> anyone on plain ubuntu etc. I can understand why you don't want the
> spam though.

I don't mind the spam in debug mode, that's what it is for.  But not sure they 
really belong at info/warn levels.  I think we can consider putting in roaming 
related messages at the info level since that type of information might be 
generally useful.

> 
>> Right, but I think it sets a single CQM threshold, not multiple like we (might)
>> do.  Also, wpa_s is very spammy since it is really (only) meant for testing wifi
>> implementations in a lab ;)
> 
> haha fair. Of course, it's still the default on most linux systems ;).
> 

And we're working to change that.

> 
> What would you like to do with the patch? I'm happy to drop it if its
> too spammy or we can lower the level to debug? Or make it only print
> when the threshold changes.

I don't mind these at debug level, if you want to start with that.

> Are you happy with the packet loss and beacon loss events or are they
> too much as well?

As I mentioned before, kernel will generally disconnect us as soon as the beacon 
loss event happens (depends on the STA capability if I recall correctly).  Not 
sure how really useful this is, but since beacon loss goes via dmesg at 'info' 
level, I think it would be fine for iwd to do so as well.

The packet loss one I'm not sure.  Certaintly not 'warn'.  I think I'd start 
with 'debug' since this event isn't being reacted to (yet).

Regards,
-Denis
diff mbox series

Patch

diff --git a/src/netdev.c b/src/netdev.c
index 5a6a7b70..f9acd639 100644
--- a/src/netdev.c
+++ b/src/netdev.c
@@ -1092,12 +1092,26 @@  static void netdev_cqm_event(struct l_genl_msg *msg, struct netdev *netdev)
 					rssi_event = (uint32_t *) data;
 					break;
 
+				case NL80211_ATTR_CQM_PKT_LOSS_EVENT:
+					if (len != 4)
+						continue;
+
+					l_warn("Packets lost event: %d",
+							*(uint32_t *) data);
+					break;
+
+				case NL80211_ATTR_CQM_BEACON_LOSS_EVENT:
+					l_warn("Beacon lost event");
+					break;
+
 				case NL80211_ATTR_CQM_RSSI_LEVEL:
 					if (len != 4)
 						continue;
 
 					rssi_val = (int32_t *) data;
 					break;
+				default:
+					l_debug("Unknown CQM event: %d", type);
 				}
 			}
 
@@ -1106,10 +1120,14 @@  static void netdev_cqm_event(struct l_genl_msg *msg, struct netdev *netdev)
 	}
 
 	if (rssi_event) {
-		if (rssi_val)
+		if (rssi_val) {
+			l_info("Signal change event (above=%d signal=%d)",
+							*rssi_event, *rssi_val);
 			netdev_cqm_event_rssi_value(netdev, *rssi_val);
-		else
+		} else {
+			l_info("Signal change event (above=%d)", *rssi_event);
 			netdev_cqm_event_rssi_threshold(netdev, *rssi_event);
+		}
 	}
 }