diff mbox series

[2/2] netcons: Add udp send fail statistics to netconsole

Message ID 20240824215130.2134153-2-max@kutsevol.com (mailing list archive)
State Superseded
Delegated to: Netdev Maintainers
Headers show
Series [1/2] netpoll: Make netpoll_send_udp return status instead of void | expand

Checks

Context Check Description
netdev/series_format warning Single patches do not need cover letters; Target tree name not specified in the subject
netdev/tree_selection success Guessed tree name to be net-next
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit fail Errors and warnings before: 16 this patch: 18
netdev/build_tools success No tools touched, skip
netdev/cc_maintainers warning 2 maintainers not CCed: corbet@lwn.net linux-doc@vger.kernel.org
netdev/build_clang success Errors and warnings before: 16 this patch: 16
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 16 this patch: 16
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 127 lines checked
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Maksym Kutsevol Aug. 24, 2024, 9:50 p.m. UTC
Enhance observability of netconsole. UDP sends can fail. Start tracking at
least two failure possibilities: ENOMEM and NET_XMIT_DROP for every target.
Stats are exposed via an additional attribute in CONFIGFS.

The exposed statistics allows easier debugging of cases when netconsole
messages were not seen by receivers, eliminating the guesswork if the
sender thinks that messages in question were sent out.

Stats are not reset on enable/disable/change remote ip/etc, they
belong to the netcons target itself.

Signed-off-by: Maksym Kutsevol <max@kutsevol.com>
---
 Documentation/networking/netconsole.rst |  1 +
 drivers/net/netconsole.c                | 54 +++++++++++++++++++++++--
 2 files changed, 52 insertions(+), 3 deletions(-)

Comments

Jakub Kicinski Aug. 26, 2024, 9:35 p.m. UTC | #1
On Sat, 24 Aug 2024 14:50:24 -0700 Maksym Kutsevol wrote:
> Enhance observability of netconsole. UDP sends can fail. Start tracking at

nit: "UDP sends" sounds a bit too much like it's using sockets
maybe "packet sends"?

> least two failure possibilities: ENOMEM and NET_XMIT_DROP for every target.
> Stats are exposed via an additional attribute in CONFIGFS.

Please provide a reference to another configfs user in the kernel which
exposes stats. To help reviewers validate it's a legit use case.

> +#ifdef CONFIG_NETCONSOLE_DYNAMIC
> +struct netconsole_target_stats  {
> +	size_t xmit_drop_count;
> +	size_t enomem_count;
> +};
> +#endif

Don't hide types under ifdefs
In fact I'm not sure if hiding stats if DYNAMIC isn't enabled makes
sense. They don't take up much space.

> +static ssize_t stats_show(struct config_item *item, char *buf)
> +{
> +	struct netconsole_target *nt = to_target(item);
> +
> +	return sysfs_emit(buf, "xmit_drop: %lu enomem: %lu\n",
> +		nt->stats.xmit_drop_count, nt->stats.enomem_count);

does configfs require value per file like sysfs or this is okay?

>  /**
>   * send_ext_msg_udp - send extended log message to target
>   * @nt: target to send message to
> @@ -1063,7 +1102,9 @@ static void send_ext_msg_udp(struct netconsole_target *nt, const char *msg,
>  					     "%s", userdata);
>  
>  		msg_ready = buf;
> -		netpoll_send_udp(&nt->np, msg_ready, msg_len);
> +		count_udp_send_stats(nt, netpoll_send_udp(&nt->np,
> +							  msg_ready,
> +							  msg_len));

Please add a wrapper which calls netpoll_send_udp() and counts the
stats. This sort of nested function calls are unlikely to meet kernel
coding requirements.
Maksym Kutsevol Aug. 26, 2024, 11:55 p.m. UTC | #2
Hey Jakub,
thank you for your time looking into this.

PS. Sorry for the html message, noob mistake.

On Mon, Aug 26, 2024 at 5:35 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Sat, 24 Aug 2024 14:50:24 -0700 Maksym Kutsevol wrote:
> > Enhance observability of netconsole. UDP sends can fail. Start tracking at
>
> nit: "UDP sends" sounds a bit too much like it's using sockets
> maybe "packet sends"?


Sure, it makes sense, I will update it.

>
> > least two failure possibilities: ENOMEM and NET_XMIT_DROP for every target.
> > Stats are exposed via an additional attribute in CONFIGFS.
>
> Please provide a reference to another configfs user in the kernel which
> exposes stats. To help reviewers validate it's a legit use case.
>
doc/Documentation/block/stat.txt describes what stats block devices expose.
The idea is the same there - a single read gives a coherent snapshot of stats.
Did you mean that the commit message needs updating or info provided
here is enough?

> > +#ifdef CONFIG_NETCONSOLE_DYNAMIC
> > +struct netconsole_target_stats  {
> > +     size_t xmit_drop_count;
> > +     size_t enomem_count;
> > +};
> > +#endif
>
> Don't hide types under ifdefs
> In fact I'm not sure if hiding stats if DYNAMIC isn't enabled makes
> sense. They don't take up much space.
>
I'll remove the ifdef.

Without _DYNAMIC it will not create the sysfs config group, so there
will be no place to expose the stats from,
hence the attachment to dynamic config.
The reason to hide the type comes from the same idea. It's not about
saving space, but about the fact that
it can't exist without it the way it's currently implemented. There's
no way to expose those metrics if _DYNAMIC
is not enabled.

> > +static ssize_t stats_show(struct config_item *item, char *buf)
> > +{
> > +     struct netconsole_target *nt = to_target(item);
> > +
> > +     return
> > +             nt->stats.xmit_drop_count, nt->stats.enomem_count);
>
> does configfs require value per file like sysfs or this is okay?


Docs say (Documentation/filesystems/sysfs.txt):

Attributes should be ASCII text files, preferably with only one value
per file. It is noted that it may not be efficient to contain only one
value per file, so it is socially acceptable to express an array of
values of the same type.

Given those are of the same type, I thought it's ok. To make it less
"fancy" maybe move to
just values separated by whitespace + a block in
Documentation/networking/netconsole.rst describing the format?
E.g. sysfs_emit(buf, "%lu %lu\n", .....) ? I really don't want to have
multiple files for it.
What do you think?

>
>
> >  /**
> >   * send_ext_msg_udp - send extended log message to target
> >   * @nt: target to send message to
> > @@ -1063,7 +1102,9 @@ static void send_ext_msg_udp(struct netconsole_target *nt, const char *msg,
> >                                            "%s", userdata);
> >
> >               msg_ready = buf;
> > -             netpoll_send_udp(&nt->np, msg_ready, msg_len);
> > +             count_udp_send_stats(nt, netpoll_send_udp(&nt->np,
> > +                                                       msg_ready,
> > +                                                       msg_len));
>
> Please add a wrapper which calls netpoll_send_udp() and counts the
> stats. This sort of nested function calls are unlikely to meet kernel
> coding requirements.


I see, will do. Noob question -  I couldn't find any written guidance
regarding this, if you have it handy - I'd appreciate
a link to some guidance regarding passing the result of a function to
a classifier vs wrapper function.

> --
> pw-bot: cr
kernel test robot Aug. 27, 2024, 6:32 a.m. UTC | #3
Hi Maksym,

kernel test robot noticed the following build warnings:

[auto build test WARNING on 8af174ea863c72f25ce31cee3baad8a301c0cf0f]

url:    https://github.com/intel-lab-lkp/linux/commits/Maksym-Kutsevol/netcons-Add-udp-send-fail-statistics-to-netconsole/20240826-163850
base:   8af174ea863c72f25ce31cee3baad8a301c0cf0f
patch link:    https://lore.kernel.org/r/20240824215130.2134153-2-max%40kutsevol.com
patch subject: [PATCH 2/2] netcons: Add udp send fail statistics to netconsole
config: hexagon-allmodconfig (https://download.01.org/0day-ci/archive/20240827/202408271419.3JJwwxE7-lkp@intel.com/config)
compiler: clang version 20.0.0git (https://github.com/llvm/llvm-project 08e5a1de8227512d4774a534b91cb2353cef6284)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240827/202408271419.3JJwwxE7-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202408271419.3JJwwxE7-lkp@intel.com/

All warnings (new ones prefixed by >>):

   In file included from drivers/net/netconsole.c:27:
   In file included from include/linux/mm.h:2228:
   include/linux/vmstat.h:517:36: warning: arithmetic between different enumeration types ('enum node_stat_item' and 'enum lru_list') [-Wenum-enum-conversion]
     517 |         return node_stat_name(NR_LRU_BASE + lru) + 3; // skip "nr_"
         |                               ~~~~~~~~~~~ ^ ~~~
   In file included from drivers/net/netconsole.c:35:
   In file included from include/linux/netpoll.h:11:
   In file included from include/linux/netdevice.h:38:
   In file included from include/net/net_namespace.h:43:
   In file included from include/linux/skbuff.h:17:
   In file included from include/linux/bvec.h:10:
   In file included from include/linux/highmem.h:12:
   In file included from include/linux/hardirq.h:11:
   In file included from ./arch/hexagon/include/generated/asm/hardirq.h:1:
   In file included from include/asm-generic/hardirq.h:17:
   In file included from include/linux/irq.h:20:
   In file included from include/linux/io.h:14:
   In file included from arch/hexagon/include/asm/io.h:328:
   include/asm-generic/io.h:548:31: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
     548 |         val = __raw_readb(PCI_IOBASE + addr);
         |                           ~~~~~~~~~~ ^
   include/asm-generic/io.h:561:61: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
     561 |         val = __le16_to_cpu((__le16 __force)__raw_readw(PCI_IOBASE + addr));
         |                                                         ~~~~~~~~~~ ^
   include/uapi/linux/byteorder/little_endian.h:37:51: note: expanded from macro '__le16_to_cpu'
      37 | #define __le16_to_cpu(x) ((__force __u16)(__le16)(x))
         |                                                   ^
   In file included from drivers/net/netconsole.c:35:
   In file included from include/linux/netpoll.h:11:
   In file included from include/linux/netdevice.h:38:
   In file included from include/net/net_namespace.h:43:
   In file included from include/linux/skbuff.h:17:
   In file included from include/linux/bvec.h:10:
   In file included from include/linux/highmem.h:12:
   In file included from include/linux/hardirq.h:11:
   In file included from ./arch/hexagon/include/generated/asm/hardirq.h:1:
   In file included from include/asm-generic/hardirq.h:17:
   In file included from include/linux/irq.h:20:
   In file included from include/linux/io.h:14:
   In file included from arch/hexagon/include/asm/io.h:328:
   include/asm-generic/io.h:574:61: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
     574 |         val = __le32_to_cpu((__le32 __force)__raw_readl(PCI_IOBASE + addr));
         |                                                         ~~~~~~~~~~ ^
   include/uapi/linux/byteorder/little_endian.h:35:51: note: expanded from macro '__le32_to_cpu'
      35 | #define __le32_to_cpu(x) ((__force __u32)(__le32)(x))
         |                                                   ^
   In file included from drivers/net/netconsole.c:35:
   In file included from include/linux/netpoll.h:11:
   In file included from include/linux/netdevice.h:38:
   In file included from include/net/net_namespace.h:43:
   In file included from include/linux/skbuff.h:17:
   In file included from include/linux/bvec.h:10:
   In file included from include/linux/highmem.h:12:
   In file included from include/linux/hardirq.h:11:
   In file included from ./arch/hexagon/include/generated/asm/hardirq.h:1:
   In file included from include/asm-generic/hardirq.h:17:
   In file included from include/linux/irq.h:20:
   In file included from include/linux/io.h:14:
   In file included from arch/hexagon/include/asm/io.h:328:
   include/asm-generic/io.h:585:33: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
     585 |         __raw_writeb(value, PCI_IOBASE + addr);
         |                             ~~~~~~~~~~ ^
   include/asm-generic/io.h:595:59: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
     595 |         __raw_writew((u16 __force)cpu_to_le16(value), PCI_IOBASE + addr);
         |                                                       ~~~~~~~~~~ ^
   include/asm-generic/io.h:605:59: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
     605 |         __raw_writel((u32 __force)cpu_to_le32(value), PCI_IOBASE + addr);
         |                                                       ~~~~~~~~~~ ^
>> drivers/net/netconsole.c:341:3: warning: format specifies type 'unsigned long' but the argument has type 'size_t' (aka 'unsigned int') [-Wformat]
     340 |         return sysfs_emit(buf, "xmit_drop: %lu enomem: %lu\n",
         |                                            ~~~
         |                                            %zu
     341 |                 nt->stats.xmit_drop_count, nt->stats.enomem_count);
         |                 ^~~~~~~~~~~~~~~~~~~~~~~~~
   drivers/net/netconsole.c:341:30: warning: format specifies type 'unsigned long' but the argument has type 'size_t' (aka 'unsigned int') [-Wformat]
     340 |         return sysfs_emit(buf, "xmit_drop: %lu enomem: %lu\n",
         |                                                        ~~~
         |                                                        %zu
     341 |                 nt->stats.xmit_drop_count, nt->stats.enomem_count);
         |                                            ^~~~~~~~~~~~~~~~~~~~~~
   9 warnings generated.


vim +341 drivers/net/netconsole.c

   335	
   336	static ssize_t stats_show(struct config_item *item, char *buf)
   337	{
   338		struct netconsole_target *nt = to_target(item);
   339	
   340		return sysfs_emit(buf, "xmit_drop: %lu enomem: %lu\n",
 > 341			nt->stats.xmit_drop_count, nt->stats.enomem_count);
   342	}
   343
kernel test robot Aug. 27, 2024, 9:36 a.m. UTC | #4
Hi Maksym,

kernel test robot noticed the following build warnings:

[auto build test WARNING on 8af174ea863c72f25ce31cee3baad8a301c0cf0f]

url:    https://github.com/intel-lab-lkp/linux/commits/Maksym-Kutsevol/netcons-Add-udp-send-fail-statistics-to-netconsole/20240826-163850
base:   8af174ea863c72f25ce31cee3baad8a301c0cf0f
patch link:    https://lore.kernel.org/r/20240824215130.2134153-2-max%40kutsevol.com
patch subject: [PATCH 2/2] netcons: Add udp send fail statistics to netconsole
config: arc-allyesconfig (https://download.01.org/0day-ci/archive/20240827/202408271711.RhzKTDRD-lkp@intel.com/config)
compiler: arceb-elf-gcc (GCC) 13.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240827/202408271711.RhzKTDRD-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202408271711.RhzKTDRD-lkp@intel.com/

All warnings (new ones prefixed by >>):

   drivers/net/netconsole.c: In function 'stats_show':
>> drivers/net/netconsole.c:340:46: warning: format '%lu' expects argument of type 'long unsigned int', but argument 3 has type 'size_t' {aka 'unsigned int'} [-Wformat=]
     340 |         return sysfs_emit(buf, "xmit_drop: %lu enomem: %lu\n",
         |                                            ~~^
         |                                              |
         |                                              long unsigned int
         |                                            %u
     341 |                 nt->stats.xmit_drop_count, nt->stats.enomem_count);
         |                 ~~~~~~~~~~~~~~~~~~~~~~~~~     
         |                          |
         |                          size_t {aka unsigned int}
   drivers/net/netconsole.c:340:58: warning: format '%lu' expects argument of type 'long unsigned int', but argument 4 has type 'size_t' {aka 'unsigned int'} [-Wformat=]
     340 |         return sysfs_emit(buf, "xmit_drop: %lu enomem: %lu\n",
         |                                                        ~~^
         |                                                          |
         |                                                          long unsigned int
         |                                                        %u
     341 |                 nt->stats.xmit_drop_count, nt->stats.enomem_count);
         |                                            ~~~~~~~~~~~~~~~~~~~~~~
         |                                                     |
         |                                                     size_t {aka unsigned int}


vim +340 drivers/net/netconsole.c

   335	
   336	static ssize_t stats_show(struct config_item *item, char *buf)
   337	{
   338		struct netconsole_target *nt = to_target(item);
   339	
 > 340		return sysfs_emit(buf, "xmit_drop: %lu enomem: %lu\n",
   341			nt->stats.xmit_drop_count, nt->stats.enomem_count);
   342	}
   343
Breno Leitao Aug. 27, 2024, 12:18 p.m. UTC | #5
On Sat, Aug 24, 2024 at 02:50:24PM -0700, Maksym Kutsevol wrote:
> Enhance observability of netconsole. UDP sends can fail. Start tracking at
> least two failure possibilities: ENOMEM and NET_XMIT_DROP for every target.
> Stats are exposed via an additional attribute in CONFIGFS.
> 
> The exposed statistics allows easier debugging of cases when netconsole
> messages were not seen by receivers, eliminating the guesswork if the
> sender thinks that messages in question were sent out.
> 
> Stats are not reset on enable/disable/change remote ip/etc, they
> belong to the netcons target itself.
> 
> Signed-off-by: Maksym Kutsevol <max@kutsevol.com>

Would you mind adding a "Reported-by" me in this case?

https://lore.kernel.org/all/ZsWoUzyK5du9Ffl+@gmail.com/

> diff --git a/drivers/net/netconsole.c b/drivers/net/netconsole.c
> index 9c09293b5258..45c07ec7842d 100644
> --- a/drivers/net/netconsole.c
> +++ b/drivers/net/netconsole.c
> @@ -82,6 +82,13 @@ static DEFINE_SPINLOCK(target_list_lock);
>   */
>  static struct console netconsole_ext;
>  
> +#ifdef CONFIG_NETCONSOLE_DYNAMIC
> +struct netconsole_target_stats  {
> +	size_t xmit_drop_count;
> +	size_t enomem_count;

I am looking at other drivers, and they use a specific type for these
counters, u64_stats_sync.

if it is possible to use this format, then you can leverage the
`__u64_stats_update` helpers, and not worry about locking/overflow!?

> @@ -1015,6 +1035,25 @@ static struct notifier_block netconsole_netdev_notifier = {
>  	.notifier_call  = netconsole_netdev_event,
>  };
>  
> +/**
> + * count_udp_send_stats - Classify netpoll_send_udp result and count errors.
> + * @nt: target that was sent to
> + * @result: result of netpoll_send_udp
> + *
> + * Takes the result of netpoll_send_udp and classifies the type of error that
> + * occurred. Increments statistics in nt->stats accordingly.
> + */
> +static void count_udp_send_stats(struct netconsole_target *nt, int result)
> +{
> +#ifdef CONFIG_NETCONSOLE_DYNAMIC
> +	if (result == NET_XMIT_DROP) {
> +		nt->stats.xmit_drop_count++;

If you change the type, you can use the `u64_stats_inc` helper here.

> @@ -1126,7 +1167,11 @@ static void send_ext_msg_udp(struct netconsole_target *nt, const char *msg,
>  			this_offset += this_chunk;
>  		}
>  
> -		netpoll_send_udp(&nt->np, buf, this_header + this_offset);
> +		count_udp_send_stats(nt,
> +				     netpoll_send_udp(&nt->np,
> +						      buf,
> +						      this_header + this_offset)
> +		);

as Jakub said, this is not a format that is common in the Linux kenrel.
Jakub Kicinski Aug. 27, 2024, 1:59 p.m. UTC | #6
On Mon, 26 Aug 2024 19:55:36 -0400 Maksym Kutsevol wrote:
> > > +static ssize_t stats_show(struct config_item *item, char *buf)
> > > +{
> > > +     struct netconsole_target *nt = to_target(item);
> > > +
> > > +     return
> > > +             nt->stats.xmit_drop_count, nt->stats.enomem_count);  
> >
> > does configfs require value per file like sysfs or this is okay?  
> 
> Docs say (Documentation/filesystems/sysfs.txt):
> 
> Attributes should be ASCII text files, preferably with only one value
> per file. It is noted that it may not be efficient to contain only one
> value per file, so it is socially acceptable to express an array of
> values of the same type.

Right, but this is for sysfs, main question is whether configfs has 
the same expectations.

> Given those are of the same type, I thought it's ok. To make it less
> "fancy" maybe move to
> just values separated by whitespace + a block in
> Documentation/networking/netconsole.rst describing the format?
> E.g. sysfs_emit(buf, "%lu %lu\n", .....) ? I really don't want to have
> multiple files for it.
> What do you think?

Stats as an array are quite hard to read / understand
Maksym Kutsevol Aug. 28, 2024, 3:03 p.m. UTC | #7
Hey Jakub,
thanks for looking into this.

PS. A couple more email send mistakes and I'll go install mutt, sorry
for the noise :)

On Tue, Aug 27, 2024 at 9:59 AM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Mon, 26 Aug 2024 19:55:36 -0400 Maksym Kutsevol wrote:
> > > > +static ssize_t stats_show(struct config_item *item, char *buf)
> > > > +{
> > > > +     struct netconsole_target *nt = to_target(item);
> > > > +
> > > > +     return
> > > > +             nt->stats.xmit_drop_count, nt->stats.enomem_count);
> > >
> > > does configfs require value per file like sysfs or this is okay?
> >
> > Docs say (Documentation/filesystems/sysfs.txt):
> >
> > Attributes should be ASCII text files, preferably with only one value
> > per file. It is noted that it may not be efficient to contain only one
> > value per file, so it is socially acceptable to express an array of
> > values of the same type.
>
> Right, but this is for sysfs, main question is whether configfs has
> the same expectations.
Eh, my bad, thank you :)

Docs on configfs (Documentation/filesystems/configfs.rst) say approximately
the same, quote:
* Normal attributes, which similar to sysfs attributes, are small ASCII text
  files, with a maximum size of one page (PAGE_SIZE, 4096 on i386).  Preferably
  only one value per file should be used, and the same caveats from sysfs apply.
  Configfs expects write(2) to store the entire buffer at once.  When writing to
  normal configfs attributes, userspace processes should first read the entire
  file, modify the portions they wish to change, and then write the entire
  buffer back.

so based on sysfs+configfs docs it looks ok to do so. What do you think?

Regarding the overall idea of exposing stats via configfs I found this:
https://github.com/torvalds/linux/blob/master/drivers/target/iscsi/iscsi_target_stat.c#L82-L87
as an example of another place doing it, which exposes the number of
active sessions.

> > Given those are of the same type, I thought it's ok. To make it less
> > "fancy" maybe move to
> > just values separated by whitespace + a block in
> > Documentation/networking/netconsole.rst describing the format?
> > E.g. sysfs_emit(buf, "%lu %lu\n", .....) ? I really don't want to have
> > multiple files for it.
> > What do you think?
>
> Stats as an array are quite hard to read / understand
I agree with that.
I couldn't find examples of multiple values exported as stats from
configfs. Only from sysfs,
e.g. https://www.kernel.org/doc/Documentation/block/stat.txt, which
describes a whitespace
separated file with stats.

I want to lean on the opinion of someone more experienced in kernel
dev on how to proceed here.
- as is
- whitespace separated like blockdev stats
- multiple files and stop talking about it? :)
Maksym Kutsevol Aug. 28, 2024, 3:04 p.m. UTC | #8
Hey Breno,
thanks for looking at it again :)


On Tue, Aug 27, 2024 at 8:18 AM Breno Leitao <leitao@debian.org> wrote:
>
> On Sat, Aug 24, 2024 at 02:50:24PM -0700, Maksym Kutsevol wrote:
> > Enhance observability of netconsole. UDP sends can fail. Start tracking at
> > least two failure possibilities: ENOMEM and NET_XMIT_DROP for every target.
> > Stats are exposed via an additional attribute in CONFIGFS.
> >
> > The exposed statistics allows easier debugging of cases when netconsole
> > messages were not seen by receivers, eliminating the guesswork if the
> > sender thinks that messages in question were sent out.
> >
> > Stats are not reset on enable/disable/change remote ip/etc, they
> > belong to the netcons target itself.
> >
> > Signed-off-by: Maksym Kutsevol <max@kutsevol.com>
>
> Would you mind adding a "Reported-by" me in this case?
>
> https://lore.kernel.org/all/ZsWoUzyK5du9Ffl+@gmail.com/

Absolutely!

> > diff --git a/drivers/net/netconsole.c b/drivers/net/netconsole.c
> > index 9c09293b5258..45c07ec7842d 100644
> > --- a/drivers/net/netconsole.c
> > +++ b/drivers/net/netconsole.c
> > @@ -82,6 +82,13 @@ static DEFINE_SPINLOCK(target_list_lock);
> >   */
> >  static struct console netconsole_ext;
> >
> > +#ifdef CONFIG_NETCONSOLE_DYNAMIC
> > +struct netconsole_target_stats  {
> > +     size_t xmit_drop_count;
> > +     size_t enomem_count;
>
> I am looking at other drivers, and they use a specific type for these
> counters, u64_stats_sync.
> if it is possible to use this format, then you can leverage the
> `__u64_stats_update` helpers, and not worry about locking/overflow!?
>
Do you think that these counters really need more than an int?
Switching them to unsigned int instead might be better?
I'd argue that at the point when an external system collection
interval is not short enough
to see the unsigned counter going to a lesser value (counters are
unsigned, they go to 0 at UINT_MAX+1).
I need advice/pointer on locking - I'm looking at it and it looks to
me as if there's no locking needed when
updating a member of nt there. Tell me if I'm wrong.

> > @@ -1015,6 +1035,25 @@ static struct notifier_block netconsole_netdev_notifier = {
> >       .notifier_call  = netconsole_netdev_event,
> >  };
> >
> > +/**
> > + * count_udp_send_stats - Classify netpoll_send_udp result and count errors.
> > + * @nt: target that was sent to
> > + * @result: result of netpoll_send_udp
> > + *
> > + * Takes the result of netpoll_send_udp and classifies the type of error that
> > + * occurred. Increments statistics in nt->stats accordingly.
> > + */
> > +static void count_udp_send_stats(struct netconsole_target *nt, int result)
> > +{
> > +#ifdef CONFIG_NETCONSOLE_DYNAMIC
> > +     if (result == NET_XMIT_DROP) {
> > +             nt->stats.xmit_drop_count++;
>
> If you change the type, you can use the `u64_stats_inc` helper here.
ok

> > @@ -1126,7 +1167,11 @@ static void send_ext_msg_udp(struct netconsole_target *nt, const char *msg,
> >                       this_offset += this_chunk;
> >               }
> >
> > -             netpoll_send_udp(&nt->np, buf, this_header + this_offset);
> > +             count_udp_send_stats(nt,
> > +                                  netpoll_send_udp(&nt->np,
> > +                                                   buf,
> > +                                                   this_header + this_offset)
> > +             );
>
> as Jakub said, this is not a format that is common in the Linux kenrel.
ok
Breno Leitao Aug. 28, 2024, 3:24 p.m. UTC | #9
Hello Maksym,

On Wed, Aug 28, 2024 at 11:03:09AM -0400, Maksym Kutsevol wrote:

> > Stats as an array are quite hard to read / understand

> I agree with that.
> I couldn't find examples of multiple values exported as stats from
> configfs. Only from sysfs,
> e.g. https://www.kernel.org/doc/Documentation/block/stat.txt, which
> describes a whitespace
> separated file with stats.
> 
> I want to lean on the opinion of someone more experienced in kernel
> dev on how to proceed here.
> - as is
> - whitespace separated like blockdev stats
> - multiple files and stop talking about it? :)

Do we really need both stats/numbers here? Would it be easier if we just have
a "dropped_packet" counter that count packets that netconsole dropped
for one reason or another?

I don't see statistics for lack of memory in regular netdev interfaces.
We probably don't need it for netconsole.
Maksym Kutsevol Aug. 28, 2024, 3:31 p.m. UTC | #10
Hey Breno,
thanks for looking at it.


On Wed, Aug 28, 2024 at 11:19 AM Breno Leitao <leitao@debian.org> wrote:
>
> Hello Maksym,
>
> On Wed, Aug 28, 2024 at 10:26:20AM -0400, Maksym Kutsevol wrote:
> > > > diff --git a/drivers/net/netconsole.c b/drivers/net/netconsole.c
> > > > index 9c09293b5258..45c07ec7842d 100644
> > > > --- a/drivers/net/netconsole.c
> > > > +++ b/drivers/net/netconsole.c
> > > > @@ -82,6 +82,13 @@ static DEFINE_SPINLOCK(target_list_lock);
> > > >   */
> > > >  static struct console netconsole_ext;
> > > >
> > > > +#ifdef CONFIG_NETCONSOLE_DYNAMIC
> > > > +struct netconsole_target_stats  {
> > > > +     size_t xmit_drop_count;
> > > > +     size_t enomem_count;
> > >
> > > I am looking at other drivers, and they use a specific type for these
> > > counters, u64_stats_sync.
> > > if it is possible to use this format, then you can leverage the
> > > `__u64_stats_update` helpers, and not worry about locking/overflow!?
> > >
> > Do you think that these counters really need more than an int?
>
> An int can overflow and become negative, so, you will see negative
> values, right?
It's an unsigned int (maybe not an int on every platform, but still unsigned),
it will become 0, not negative. E.g.
https://www.programiz.com/online-compiler/3qbkayylX5Cmf

> > Switching them to unsigned int instead might be better?
>
> Why not `u64_stats_sync` ?
>
Only because it's smaller and does the job. Unless locking is needed.

> > I'd argue that at the point when an external system collection
> > interval is not short enough
> > to see the unsigned counter going to a lesser value (counters are
> > unsigned, they go to 0 at UINT_MAX+1).
> > I need advice/pointer on locking - I'm looking at it and it looks to
> > me as if there's no locking needed when
> > updating a member of nt there. Tell me if I'm wrong.
>
> well, they are updated while holding `target_list_lock` right? But I
> would not rely on it.
Then I'll update to use them instead.

> If you just convert the values to u64_stats_sync, you get the
> synchronization for free, basically doing the following:
>
>         u64_stats_update_begin()
>         u64_stats_inc()
>         u64_stats_update_end()
>
> Thanks for working on it,
> --breno
Absolutely, my pleasure :)
Maksym Kutsevol Aug. 28, 2024, 3:33 p.m. UTC | #11
Hey Breno,


On Wed, Aug 28, 2024 at 11:25 AM Breno Leitao <leitao@debian.org> wrote:
>
> Hello Maksym,
>
> On Wed, Aug 28, 2024 at 11:03:09AM -0400, Maksym Kutsevol wrote:
>
> > > Stats as an array are quite hard to read / understand
>
> > I agree with that.
> > I couldn't find examples of multiple values exported as stats from
> > configfs. Only from sysfs,
> > e.g. https://www.kernel.org/doc/Documentation/block/stat.txt, which
> > describes a whitespace
> > separated file with stats.
> >
> > I want to lean on the opinion of someone more experienced in kernel
> > dev on how to proceed here.
> > - as is
> > - whitespace separated like blockdev stats
> > - multiple files and stop talking about it? :)
>
> Do we really need both stats/numbers here? Would it be easier if we just have
> a "dropped_packet" counter that count packets that netconsole dropped
> for one reason or another?
>
> I don't see statistics for lack of memory in regular netdev interfaces.
> We probably don't need it for netconsole.
I like this idea! I'll send a V2 with only one attribute.
Thanks, Breno!
diff mbox series

Patch

diff --git a/Documentation/networking/netconsole.rst b/Documentation/networking/netconsole.rst
index d55c2a22ec7a..733d4a93878e 100644
--- a/Documentation/networking/netconsole.rst
+++ b/Documentation/networking/netconsole.rst
@@ -135,6 +135,7 @@  The interface exposes these parameters of a netconsole target to userspace:
 	remote_ip	Remote agent's IP address		(read-write)
 	local_mac	Local interface's MAC address		(read-only)
 	remote_mac	Remote agent's MAC address		(read-write)
+	stats		Send error stats			(read-only)
 	==============  =================================       ============
 
 The "enabled" attribute is also used to control whether the parameters of
diff --git a/drivers/net/netconsole.c b/drivers/net/netconsole.c
index 9c09293b5258..45c07ec7842d 100644
--- a/drivers/net/netconsole.c
+++ b/drivers/net/netconsole.c
@@ -82,6 +82,13 @@  static DEFINE_SPINLOCK(target_list_lock);
  */
 static struct console netconsole_ext;
 
+#ifdef CONFIG_NETCONSOLE_DYNAMIC
+struct netconsole_target_stats  {
+	size_t xmit_drop_count;
+	size_t enomem_count;
+};
+#endif
+
 /**
  * struct netconsole_target - Represents a configured netconsole target.
  * @list:	Links this target into the target_list.
@@ -89,6 +96,7 @@  static struct console netconsole_ext;
  * @userdata_group:	Links to the userdata configfs hierarchy
  * @userdata_complete:	Cached, formatted string of append
  * @userdata_length:	String length of userdata_complete
+ * @stats:	UDP send stats for the target. Used for debugging.
  * @enabled:	On / off knob to enable / disable target.
  *		Visible from userspace (read-write).
  *		We maintain a strict 1:1 correspondence between this and
@@ -115,6 +123,7 @@  struct netconsole_target {
 	struct config_group	userdata_group;
 	char userdata_complete[MAX_USERDATA_ENTRY_LENGTH * MAX_USERDATA_ITEMS];
 	size_t			userdata_length;
+	struct netconsole_target_stats stats;
 #endif
 	bool			enabled;
 	bool			extended;
@@ -227,6 +236,7 @@  static struct netconsole_target *alloc_and_init(void)
  *				|	remote_ip
  *				|	local_mac
  *				|	remote_mac
+ *				|	stats
  *				|	userdata/
  *				|		<key>/
  *				|			value
@@ -323,6 +333,14 @@  static ssize_t remote_mac_show(struct config_item *item, char *buf)
 	return sysfs_emit(buf, "%pM\n", to_target(item)->np.remote_mac);
 }
 
+static ssize_t stats_show(struct config_item *item, char *buf)
+{
+	struct netconsole_target *nt = to_target(item);
+
+	return sysfs_emit(buf, "xmit_drop: %lu enomem: %lu\n",
+		nt->stats.xmit_drop_count, nt->stats.enomem_count);
+}
+
 /*
  * This one is special -- targets created through the configfs interface
  * are not enabled (and the corresponding netpoll activated) by default.
@@ -795,6 +813,7 @@  CONFIGFS_ATTR(, remote_ip);
 CONFIGFS_ATTR_RO(, local_mac);
 CONFIGFS_ATTR(, remote_mac);
 CONFIGFS_ATTR(, release);
+CONFIGFS_ATTR_RO(, stats);
 
 static struct configfs_attribute *netconsole_target_attrs[] = {
 	&attr_enabled,
@@ -807,6 +826,7 @@  static struct configfs_attribute *netconsole_target_attrs[] = {
 	&attr_remote_ip,
 	&attr_local_mac,
 	&attr_remote_mac,
+	&attr_stats,
 	NULL,
 };
 
@@ -1015,6 +1035,25 @@  static struct notifier_block netconsole_netdev_notifier = {
 	.notifier_call  = netconsole_netdev_event,
 };
 
+/**
+ * count_udp_send_stats - Classify netpoll_send_udp result and count errors.
+ * @nt: target that was sent to
+ * @result: result of netpoll_send_udp
+ *
+ * Takes the result of netpoll_send_udp and classifies the type of error that
+ * occurred. Increments statistics in nt->stats accordingly.
+ */
+static void count_udp_send_stats(struct netconsole_target *nt, int result)
+{
+#ifdef CONFIG_NETCONSOLE_DYNAMIC
+	if (result == NET_XMIT_DROP) {
+		nt->stats.xmit_drop_count++;
+	} else if (result == -ENOMEM) {
+		nt->stats.enomem_count++;
+	};
+#endif
+}
+
 /**
  * send_ext_msg_udp - send extended log message to target
  * @nt: target to send message to
@@ -1063,7 +1102,9 @@  static void send_ext_msg_udp(struct netconsole_target *nt, const char *msg,
 					     "%s", userdata);
 
 		msg_ready = buf;
-		netpoll_send_udp(&nt->np, msg_ready, msg_len);
+		count_udp_send_stats(nt, netpoll_send_udp(&nt->np,
+							  msg_ready,
+							  msg_len));
 		return;
 	}
 
@@ -1126,7 +1167,11 @@  static void send_ext_msg_udp(struct netconsole_target *nt, const char *msg,
 			this_offset += this_chunk;
 		}
 
-		netpoll_send_udp(&nt->np, buf, this_header + this_offset);
+		count_udp_send_stats(nt,
+				     netpoll_send_udp(&nt->np,
+						      buf,
+						      this_header + this_offset)
+		);
 		offset += this_offset;
 	}
 }
@@ -1172,7 +1217,10 @@  static void write_msg(struct console *con, const char *msg, unsigned int len)
 			tmp = msg;
 			for (left = len; left;) {
 				frag = min(left, MAX_PRINT_CHUNK);
-				netpoll_send_udp(&nt->np, tmp, frag);
+				int send_result = netpoll_send_udp(&nt->np,
+								   tmp,
+								   frag);
+				count_udp_send_stats(nt, send_result);
 				tmp += frag;
 				left -= frag;
 			}