diff mbox series

[v2,2/2] netcons: Add udp send fail statistics to netconsole

Message ID 20240828214524.1867954-2-max@kutsevol.com (mailing list archive)
State Changes Requested
Delegated to: Netdev Maintainers
Headers show
Series [v2,1/2] netpoll: Make netpoll_send_udp return status instead of void | expand

Checks

Context Check Description
netdev/tree_selection success Guessing tree name failed - patch did not apply

Commit Message

Maksym Kutsevol Aug. 28, 2024, 9:33 p.m. UTC
Enhance observability of netconsole. Packet sends can fail.
Start tracking at least two failure possibilities: ENOMEM and
NET_XMIT_DROP for every target. Stats are exposed via an additional
attribute in CONFIGFS.

The exposed statistics allows easier debugging of cases when netconsole
messages were not seen by receivers, eliminating the guesswork if the
sender thinks that messages in question were sent out.

Stats are not reset on enable/disable/change remote ip/etc, they
belong to the netcons target itself.

Reported-by: Breno Leitao <leitao@debian.org>
Closes: https://lore.kernel.org/all/ZsWoUzyK5du9Ffl+@gmail.com/
Signed-off-by: Maksym Kutsevol <max@kutsevol.com>
---
Changelog:

v2:
 * fixed commit message wording and reported-by reference.
 * not hiding netconsole_target_stats when CONFIG_NETCONSOLE_DYNAMIC
   is not enabled.
 * rename stats attribute in configfs to transmit_errors and make it
   a single u64 value, which is a sum of errors that occured.
 * make a wrapper function to count errors instead of a return result
   classifier one.
 * use u64_stats_sync.h to manage stats.

v1:
 * https://lore.kernel.org/netdev/20240824215130.2134153-2-max@kutsevol.com/
 Documentation/networking/netconsole.rst |  5 +-
 drivers/net/netconsole.c                | 63 +++++++++++++++++++++++--
 2 files changed, 63 insertions(+), 5 deletions(-)

Comments

Breno Leitao Aug. 30, 2024, 8:45 a.m. UTC | #1
Hello Maksym,

On Wed, Aug 28, 2024 at 02:33:49PM -0700, Maksym Kutsevol wrote:
> diff --git a/drivers/net/netconsole.c b/drivers/net/netconsole.c
> index 9c09293b5258..e14b13a8e0d2 100644
> --- a/drivers/net/netconsole.c
> +++ b/drivers/net/netconsole.c
> @@ -36,6 +36,7 @@
>  #include <linux/inet.h>
>  #include <linux/configfs.h>
>  #include <linux/etherdevice.h>
> +#include <linux/u64_stats_sync.h>
>  #include <linux/utsname.h>
>  
>  MODULE_AUTHOR("Maintainer: Matt Mackall <mpm@selenic.com>");

I am afraid that you are not build the patch against net-next, since
this line was changed a while ago.

https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/commit/?id=10a6545f0bdc

Please develop on top of net-next, otherwise the patch might not apply
on top of net-next.

> +/**
> + * netpoll_send_udp_count_errs - Wrapper for netpoll_send_udp that counts errors
> + * @nt: target to send message to
> + * @msg: message to send
> + * @len: length of message
> + *
> + * Calls netpoll_send_udp and classifies the return value. If an error
> + * occurred it increments statistics in nt->stats accordingly.
> + * Noop if CONFIG_NETCONSOLE_DYNAMIC is disabled.
> + */
> +// static void netpoll_send_udp_count_errs(struct netpoll *np, const char *msg, int len)

Have you forgot to remove the line above?

> +static void netpoll_send_udp_count_errs(struct netconsole_target *nt, const char *msg, int len)
> +{
> +#ifdef CONFIG_NETCONSOLE_DYNAMIC
> +	int result = netpoll_send_udp(&nt->np, msg, len);
> +	result = NET_XMIT_DROP;

Could you please clarify why do you want to overwrite `result` here with
NET_XMIT_DROP? It seems wrong.

> +	if (result == NET_XMIT_DROP) {
> +		u64_stats_update_begin(&nt->stats.syncp);
> +		u64_stats_inc(&nt->stats.xmit_drop_count);
> +		u64_stats_update_end(&nt->stats.syncp);
> +	} else if (result == -ENOMEM) {
> +		u64_stats_update_begin(&nt->stats.syncp);
> +		u64_stats_inc(&nt->stats.enomem_count);
> +		u64_stats_update_end(&nt->stats.syncp);
> +	};
> +#else
> +	netpoll_send_udp(&nt->np, msg, len);
> +#endif

I am not sure this if/else/endif is the best way. I am wondering if
something like this would make the code simpler (uncompiled/untested):

static void netpoll_send_udp_count_errs(struct netconsole_target *nt, const char *msg, int len)
{
        int __maybe_unused result;

        result = netpoll_send_udp(&nt->np, msg, len);
#ifdef CONFIG_NETCONSOLE_DYNAMIC
	switch (result) {
	case NET_XMIT_DROP:
                u64_stats_update_begin(&nt->stats.syncp);
                u64_stats_inc(&nt->stats.xmit_drop_count);
                u64_stats_update_end(&nt->stats.syncp);
		breadk;
        case ENOMEM:
                u64_stats_update_begin(&nt->stats.syncp);
                u64_stats_inc(&nt->stats.enomem_count);
                u64_stats_update_end(&nt->stats.syncp);
		break;
        };
#endif

Thanks for working on it.
--breno
Maksym Kutsevol Aug. 30, 2024, 12:58 p.m. UTC | #2
Hello Breno,


On Fri, Aug 30, 2024 at 4:45 AM Breno Leitao <leitao@debian.org> wrote:
>
> Hello Maksym,
>
> On Wed, Aug 28, 2024 at 02:33:49PM -0700, Maksym Kutsevol wrote:
> > diff --git a/drivers/net/netconsole.c b/drivers/net/netconsole.c
> > index 9c09293b5258..e14b13a8e0d2 100644
> > --- a/drivers/net/netconsole.c
> > +++ b/drivers/net/netconsole.c
> > @@ -36,6 +36,7 @@
> >  #include <linux/inet.h>
> >  #include <linux/configfs.h>
> >  #include <linux/etherdevice.h>
> > +#include <linux/u64_stats_sync.h>
> >  #include <linux/utsname.h>
> >
> >  MODULE_AUTHOR("Maintainer: Matt Mackall <mpm@selenic.com>");
>
> I am afraid that you are not build the patch against net-next, since
> this line was changed a while ago.
Yes, that's true. Jacub sent me the link to the net-tree specific
contribution doc, I also found
that. Will fix it in the next set.

> https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/commit/?id=10a6545f0bdc
>
> Please develop on top of net-next, otherwise the patch might not apply
> on top of net-next.
>
> > +/**
> > + * netpoll_send_udp_count_errs - Wrapper for netpoll_send_udp that counts errors
> > + * @nt: target to send message to
> > + * @msg: message to send
> > + * @len: length of message
> > + *
> > + * Calls netpoll_send_udp and classifies the return value. If an error
> > + * occurred it increments statistics in nt->stats accordingly.
> > + * Noop if CONFIG_NETCONSOLE_DYNAMIC is disabled.
> > + */
> > +// static void netpoll_send_udp_count_errs(struct netpoll *np, const char *msg, int len)
>
> Have you forgot to remove the line above?
Yes, thank you.

> > +static void netpoll_send_udp_count_errs(struct netconsole_target *nt, const char *msg, int len)
> > +{
> > +#ifdef CONFIG_NETCONSOLE_DYNAMIC
> > +     int result = netpoll_send_udp(&nt->np, msg, len);
> > +     result = NET_XMIT_DROP;
>
> Could you please clarify why do you want to overwrite `result` here with
> NET_XMIT_DROP? It seems wrong.
Unfortunately I sent this patch with my debugging addons, this is plainly wrong.
Will remove.

> > +     if (result == NET_XMIT_DROP) {
> > +             u64_stats_update_begin(&nt->stats.syncp);
> > +             u64_stats_inc(&nt->stats.xmit_drop_count);
> > +             u64_stats_update_end(&nt->stats.syncp);
> > +     } else if (result == -ENOMEM) {
> > +             u64_stats_update_begin(&nt->stats.syncp);
> > +             u64_stats_inc(&nt->stats.enomem_count);
> > +             u64_stats_update_end(&nt->stats.syncp);
> > +     };
> > +#else
> > +     netpoll_send_udp(&nt->np, msg, len);
> > +#endif
>
> I am not sure this if/else/endif is the best way. I am wondering if
> something like this would make the code simpler (uncompiled/untested):
Two calls in two different places to netpoll_send_udp bothers you or
the way it has to distinct cases for enabled/disabled and you prefer to
have it as added steps for the case when it's enabled?


> static void netpoll_send_udp_count_errs(struct netconsole_target *nt, const char *msg, int len)
> {
>         int __maybe_unused result;
>
>         result = netpoll_send_udp(&nt->np, msg, len);
> #ifdef CONFIG_NETCONSOLE_DYNAMIC
>         switch (result) {
>         case NET_XMIT_DROP:
>                 u64_stats_update_begin(&nt->stats.syncp);
>                 u64_stats_inc(&nt->stats.xmit_drop_count);
>                 u64_stats_update_end(&nt->stats.syncp);
>                 breadk;
>         case ENOMEM:
>                 u64_stats_update_begin(&nt->stats.syncp);
>                 u64_stats_inc(&nt->stats.enomem_count);
>                 u64_stats_update_end(&nt->stats.syncp);
>                 break;
>         };
> #endif
>
> Thanks for working on it.
> --breno
Breno Leitao Aug. 30, 2024, 2:12 p.m. UTC | #3
Hello Maksym,

On Fri, Aug 30, 2024 at 08:58:12AM -0400, Maksym Kutsevol wrote:

> > I am not sure this if/else/endif is the best way. I am wondering if
> > something like this would make the code simpler (uncompiled/untested):

> Two calls in two different places to netpoll_send_udp bothers you or
> the way it has to distinct cases for enabled/disabled and you prefer to
> have it as added steps for the case when it's enabled?

I would say both. I try to reduce as much as possible the number of
similar calls and #else(s) is the goal.

At the same time, I admit it is easier said than done, and Jakub is
usually the one that helps me to reach the last mile.
Maksym Kutsevol Aug. 30, 2024, 3:37 p.m. UTC | #4
Hello Breno,

On Fri, Aug 30, 2024 at 10:12 AM Breno Leitao <leitao@debian.org> wrote:
>
> Hello Maksym,
>
> On Fri, Aug 30, 2024 at 08:58:12AM -0400, Maksym Kutsevol wrote:
>
> > > I am not sure this if/else/endif is the best way. I am wondering if
> > > something like this would make the code simpler (uncompiled/untested):
>
> > Two calls in two different places to netpoll_send_udp bothers you or
> > the way it has to distinct cases for enabled/disabled and you prefer to
> > have it as added steps for the case when it's enabled?
>
> I would say both. I try to reduce as much as possible the number of
> similar calls and #else(s) is the goal.
>
> At the same time, I admit it is easier said than done, and Jakub is
> usually the one that helps me to reach the last mile.
I see, ok.
I was more looking for possible (maybe impossible) compiler
optimizations in this case.

but access to nt->np probably makes it impossible anyway, I don't know
compilers well
enough to say.

I'll make it an if then, e.g.
if (IS_ENABLED(CONFIG_NETCONSOLE_DYNAMIC)){

}
diff mbox series

Patch

diff --git a/Documentation/networking/netconsole.rst b/Documentation/networking/netconsole.rst
index d55c2a22ec7a..94c4680fdf3e 100644
--- a/Documentation/networking/netconsole.rst
+++ b/Documentation/networking/netconsole.rst
@@ -124,7 +124,7 @@  To remove a target::
 
 The interface exposes these parameters of a netconsole target to userspace:
 
-	==============  =================================       ============
+	=============== =================================       ============
 	enabled		Is this target currently enabled?	(read-write)
 	extended	Extended mode enabled			(read-write)
 	release		Prepend kernel release to message	(read-write)
@@ -135,7 +135,8 @@  The interface exposes these parameters of a netconsole target to userspace:
 	remote_ip	Remote agent's IP address		(read-write)
 	local_mac	Local interface's MAC address		(read-only)
 	remote_mac	Remote agent's MAC address		(read-write)
-	==============  =================================       ============
+	transmit_errors	Number of packet send errors		(read-only)
+	=============== =================================       ============
 
 The "enabled" attribute is also used to control whether the parameters of
 a target can be updated or not -- you can modify the parameters of only
diff --git a/drivers/net/netconsole.c b/drivers/net/netconsole.c
index 9c09293b5258..e14b13a8e0d2 100644
--- a/drivers/net/netconsole.c
+++ b/drivers/net/netconsole.c
@@ -36,6 +36,7 @@ 
 #include <linux/inet.h>
 #include <linux/configfs.h>
 #include <linux/etherdevice.h>
+#include <linux/u64_stats_sync.h>
 #include <linux/utsname.h>
 
 MODULE_AUTHOR("Maintainer: Matt Mackall <mpm@selenic.com>");
@@ -82,6 +83,12 @@  static DEFINE_SPINLOCK(target_list_lock);
  */
 static struct console netconsole_ext;
 
+struct netconsole_target_stats  {
+	u64_stats_t xmit_drop_count;
+	u64_stats_t enomem_count;
+	struct u64_stats_sync syncp;
+} __aligned(2 * sizeof(u64));
+
 /**
  * struct netconsole_target - Represents a configured netconsole target.
  * @list:	Links this target into the target_list.
@@ -89,6 +96,7 @@  static struct console netconsole_ext;
  * @userdata_group:	Links to the userdata configfs hierarchy
  * @userdata_complete:	Cached, formatted string of append
  * @userdata_length:	String length of userdata_complete
+ * @stats:	Packet send stats for the target. Used for debugging.
  * @enabled:	On / off knob to enable / disable target.
  *		Visible from userspace (read-write).
  *		We maintain a strict 1:1 correspondence between this and
@@ -115,6 +123,7 @@  struct netconsole_target {
 	struct config_group	userdata_group;
 	char userdata_complete[MAX_USERDATA_ENTRY_LENGTH * MAX_USERDATA_ITEMS];
 	size_t			userdata_length;
+	struct netconsole_target_stats stats;
 #endif
 	bool			enabled;
 	bool			extended;
@@ -227,6 +236,7 @@  static struct netconsole_target *alloc_and_init(void)
  *				|	remote_ip
  *				|	local_mac
  *				|	remote_mac
+ *				|	transmit_errors
  *				|	userdata/
  *				|		<key>/
  *				|			value
@@ -323,6 +333,21 @@  static ssize_t remote_mac_show(struct config_item *item, char *buf)
 	return sysfs_emit(buf, "%pM\n", to_target(item)->np.remote_mac);
 }
 
+static ssize_t transmit_errors_show(struct config_item *item, char *buf)
+{
+	struct netconsole_target *nt = to_target(item);
+	u64 xmit_drop_count, enomem_count;
+	unsigned int start;
+
+	do {
+		start = u64_stats_fetch_begin(&nt->stats.syncp);
+		xmit_drop_count = u64_stats_read(&nt->stats.xmit_drop_count);
+		enomem_count = u64_stats_read(&nt->stats.enomem_count);
+	} while (u64_stats_fetch_retry(&nt->stats.syncp, start));
+
+	return sysfs_emit(buf, "%llu\n", xmit_drop_count + enomem_count);
+}
+
 /*
  * This one is special -- targets created through the configfs interface
  * are not enabled (and the corresponding netpoll activated) by default.
@@ -795,6 +820,7 @@  CONFIGFS_ATTR(, remote_ip);
 CONFIGFS_ATTR_RO(, local_mac);
 CONFIGFS_ATTR(, remote_mac);
 CONFIGFS_ATTR(, release);
+CONFIGFS_ATTR_RO(, transmit_errors);
 
 static struct configfs_attribute *netconsole_target_attrs[] = {
 	&attr_enabled,
@@ -807,6 +833,7 @@  static struct configfs_attribute *netconsole_target_attrs[] = {
 	&attr_remote_ip,
 	&attr_local_mac,
 	&attr_remote_mac,
+	&attr_transmit_errors,
 	NULL,
 };
 
@@ -1015,6 +1042,36 @@  static struct notifier_block netconsole_netdev_notifier = {
 	.notifier_call  = netconsole_netdev_event,
 };
 
+/**
+ * netpoll_send_udp_count_errs - Wrapper for netpoll_send_udp that counts errors
+ * @nt: target to send message to
+ * @msg: message to send
+ * @len: length of message
+ *
+ * Calls netpoll_send_udp and classifies the return value. If an error
+ * occurred it increments statistics in nt->stats accordingly.
+ * Noop if CONFIG_NETCONSOLE_DYNAMIC is disabled.
+ */
+// static void netpoll_send_udp_count_errs(struct netpoll *np, const char *msg, int len)
+static void netpoll_send_udp_count_errs(struct netconsole_target *nt, const char *msg, int len)
+{
+#ifdef CONFIG_NETCONSOLE_DYNAMIC
+	int result = netpoll_send_udp(&nt->np, msg, len);
+	result = NET_XMIT_DROP;
+	if (result == NET_XMIT_DROP) {
+		u64_stats_update_begin(&nt->stats.syncp);
+		u64_stats_inc(&nt->stats.xmit_drop_count);
+		u64_stats_update_end(&nt->stats.syncp);
+	} else if (result == -ENOMEM) {
+		u64_stats_update_begin(&nt->stats.syncp);
+		u64_stats_inc(&nt->stats.enomem_count);
+		u64_stats_update_end(&nt->stats.syncp);
+	};
+#else
+	netpoll_send_udp(&nt->np, msg, len);
+#endif
+}
+
 /**
  * send_ext_msg_udp - send extended log message to target
  * @nt: target to send message to
@@ -1063,7 +1120,7 @@  static void send_ext_msg_udp(struct netconsole_target *nt, const char *msg,
 					     "%s", userdata);
 
 		msg_ready = buf;
-		netpoll_send_udp(&nt->np, msg_ready, msg_len);
+		netpoll_send_udp_count_errs(nt, msg_ready, msg_len);
 		return;
 	}
 
@@ -1126,7 +1183,7 @@  static void send_ext_msg_udp(struct netconsole_target *nt, const char *msg,
 			this_offset += this_chunk;
 		}
 
-		netpoll_send_udp(&nt->np, buf, this_header + this_offset);
+		netpoll_send_udp_count_errs(nt, buf, this_header + this_offset);
 		offset += this_offset;
 	}
 }
@@ -1172,7 +1229,7 @@  static void write_msg(struct console *con, const char *msg, unsigned int len)
 			tmp = msg;
 			for (left = len; left;) {
 				frag = min(left, MAX_PRINT_CHUNK);
-				netpoll_send_udp(&nt->np, tmp, frag);
+				netpoll_send_udp_count_errs(nt, tmp, frag);
 				tmp += frag;
 				left -= frag;
 			}